Windows 2008 R2 Failover Clustering and ESX

We ran into an issue the other day trying to configure a Windows 2008 R2 failover cluster on two DL585 G1 servers (Opteron 875, 4x dual-core processors). The OS had no native Microsoft driver support for the RAID card in these servers, so we considered purchasing new hardware for this SQL 2008 DEV/TEST cluster initiative.

I came up with the brainchild of installing ESXi on the hosts and running the two sides of the cluster as virtual machines, removing the dependency on the hardware.  So we charged down this path to implement a Windows 2008 R2 failover cluster:

  • installing the ESXi server - flawless
  • installing 2008 R2 - flawless 
  • installing the failover cluster - failed.  

What?  Two areas failed during cluster validation. One was the lack of HBAs in the servers (no problem); the second confused us: SCSI-3 Persistent Reservation.  Validation failed because it reported that "putting PR reserve on cluster disk 0 was successful when it should have failed."  It recommended checking the storage configuration so that it functions properly for failover clusters.

After a bit of examination on the web I stumbled upon this link:

and it talks about running a failover cluster in VMware Workstation using a FreeNAS iSCSI target.  Hmmmm, would that work?

Downloading Openfiler as I type...


Over a year later and hopefully a whole lot smarter.  Openfiler didn't support SCSI-3 reservations, and I quickly came to understand that without them clustering will never work.  SCSI-3 reservations govern how the LUN or disk is locked when a computer is talking to it.

Here is a good breakdown from Symantec's site:

SCSI-3 persistent reservations

SCSI-3 Persistent Reservations (SCSI-3 PR) are required for I/O fencing and resolve the issues of using SCSI reservations in a clustered SAN environment. SCSI-3 PR enables access for multiple nodes to a device and simultaneously blocks access for other nodes.
SCSI-3 reservations are persistent across SCSI bus resets and support multiple paths from a host to a disk. In contrast, only one host can use SCSI-2 reservations with one path. If the need arises to block access to a device because of data integrity concerns, only one host and one path remain active. The requirements for larger clusters, with multiple nodes reading and writing to storage in a controlled manner, make SCSI-2 reservations obsolete.
SCSI-3 PR uses a concept of registration and reservation. Each system registers its own "key" with a SCSI-3 device. Multiple systems registering keys form a membership and establish a reservation, typically set to "Write Exclusive Registrants Only." The WERO setting enables only registered systems to perform write operations. For a given disk, only one reservation can exist amidst numerous registrations.
With SCSI-3 PR technology, blocking write access is as simple as removing a registration from a device. Only registered members can "eject" the registration of another member. A member wishing to eject another member issues a "preempt and abort" command. Ejecting a node is final and atomic; an ejected node cannot eject another node. In VCS, a node registers the same key for all paths to the device. A single preempt and abort command ejects a node from all paths to the storage device. 
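
To make the registration and reservation idea a little more concrete, here is a toy in-memory model in Python. It is only a sketch of the behaviour described above, not how any real storage target or cluster product implements it: the class name PRDisk, the method names, and the node names and keys are all made up for illustration, and things like multiple paths and bus resets are ignored.

```python
class PRDisk:
    """Toy model of one disk using a "Write Exclusive Registrants Only" reservation.

    Hypothetical illustration only; real SCSI-3 PR lives in the storage target.
    """

    def __init__(self):
        self.registrations = {}          # node name -> registered key
        self.reservation_active = False  # only one reservation per disk

    def register(self, node, key):
        # Each system registers its own key with the device.
        self.registrations[node] = key

    def reserve(self, node):
        # Only a registered node may establish the reservation.
        if node not in self.registrations:
            raise PermissionError(f"{node} is not registered")
        self.reservation_active = True

    def can_write(self, node):
        # With the reservation in place, only registrants may write.
        if not self.reservation_active:
            return True
        return node in self.registrations

    def preempt_and_abort(self, node, victim_key):
        # Only a registered member can eject another member's registration.
        if node not in self.registrations:
            raise PermissionError(f"{node} is not registered, cannot eject")
        for name, key in list(self.registrations.items()):
            if key == victim_key:
                del self.registrations[name]


disk = PRDisk()
disk.register("node1", 0xA1)
disk.register("node2", 0xB2)
disk.reserve("node1")

print(disk.can_write("node2"))       # node2 is registered, so it may write
disk.preempt_and_abort("node1", 0xB2)
print(disk.can_write("node2"))       # node2 was ejected, so writes are blocked
```

Storage that does not honour these semantics behaves as if can_write always returned True, which is roughly why a cluster cannot trust it for fencing.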

Confused yet?

The end of the story: make sure the storage you are using is capable of SCSI-3 persistent reservations, share the storage with both cluster nodes, and cluster away.