Disk Queue Length

I was reading an article on Yellow Bricks about "command queue latency" and "scheduled Number of Requests Outstanding". Here is the link, and a quote that starts you down the path to configuring this important performance tweak correctly.

http://www.yellow-bricks.com/2008/07/21/queuedepth-and-whats-next/

The number of virtual machines (and hence the number of ESX hosts) that can share a single VMFS volume depends on the I/O activity of the virtual machines and also on the capabilities of the storage array. In general, there are two constraints that govern how virtual machines are distributed among ESX hosts and how many LUNs (or VMFS volumes) are needed. The constraints are:
  • A maximum LUN queue depth of 64 per ESX host. If an ESX host has exclusive access to a LUN, you can increase the queue depth to 128 if an application really demands it. However, if multiple ESX hosts share the same LUN, it is not recommended to have a queue depth setting of more than 64, because it is possible to oversubscribe the storage array if it is short of hardware capabilities.
  • A maximum number of outstanding I/O commands to the shared LUN (or VMFS volume) that depends on the storage array. This number must be determined for a particular storage array configuration supporting multiple ESX hosts. If the storage array has a per‐LUN queue depth, exceeding this value causes high latencies. If the storage array does not have a per‐LUN queue depth, the bottleneck is shifted to the disks, and latencies increase. In either case, it is important to ensure that there are enough disks to support the influx of commands. It is hard to recommend an upper threshold for latency because it depends on individual applications. However, a 50 millisecond latency is high enough for most applications, and you should add more physical resources if you reach that point.

http://www.vmware.com/files/pdf/scalable_storage_performance.pdf
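
To make the two constraints in that quote concrete, here is a rough sanity check in Python. It is only a sketch of the arithmetic: the 64, 128 and 50 ms figures come from the quoted text, while `array_lun_limit` is a hypothetical number that you would have to determine for your own storage array, and the names `LunPlan` and `review` are made up for illustration.

```python
from dataclasses import dataclass
from typing import List

SHARED_LUN_MAX_DEPTH = 64      # quoted guidance when several ESX hosts share a LUN
EXCLUSIVE_LUN_MAX_DEPTH = 128  # quoted guidance when one host has exclusive access
HIGH_LATENCY_MS = 50.0         # quoted point at which to add physical resources


@dataclass
class LunPlan:
    hosts: int            # number of ESX hosts sharing the LUN
    queue_depth: int      # LUN queue depth configured on each host
    array_lun_limit: int  # ASSUMPTION: outstanding commands the array can take per LUN


def review(plan: LunPlan) -> List[str]:
    """Return warnings for a planned queue depth on a (possibly shared) LUN."""
    warnings = []

    # Constraint 1: per-host queue depth cap depends on whether the LUN is shared.
    cap = EXCLUSIVE_LUN_MAX_DEPTH if plan.hosts == 1 else SHARED_LUN_MAX_DEPTH
    if plan.queue_depth > cap:
        warnings.append(
            f"queue depth {plan.queue_depth} exceeds the suggested cap of {cap}"
        )

    # Constraint 2: worst case, every host fills its queue at the same time.
    worst_case = plan.hosts * plan.queue_depth
    if worst_case > plan.array_lun_limit:
        warnings.append(
            f"{worst_case} possible outstanding commands exceed "
            f"the array limit of {plan.array_lun_limit}"
        )
    return warnings


def latency_is_high(latency_ms: float) -> bool:
    """True once observed latency reaches the 50 ms mark from the quote."""
    return latency_ms >= HIGH_LATENCY_MS


if __name__ == "__main__":
    # Example numbers only; substitute values measured in your environment.
    for warning in review(LunPlan(hosts=8, queue_depth=64, array_lun_limit=256)):
        print(warning)
```

The worst case is deliberately pessimistic, since it assumes every host has a full queue at once. It is a quick way to see whether a shared LUN could oversubscribe the array, not a substitute for measuring latency.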

We have been struggling with command aborts and latency issues for quite some time, and I had been searching for a solution on the assumption that the problem was coming from the EMC Clariion we are running. D'oh!

This knowledge base article explains how to change the relevant kernel module options:
http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=1267