Host Disconnected, will not reconnect.

Hello All,

I ran into an issue last week where a host disconnected from vCenter and refused to reconnect.  I could connect directly to the host with the VI Client, yet vCenter would not connect to the host's vpxa agent.

Searching the communities website, I found a post describing how a VM with too many snapshots can disconnect its host from vCenter.  After some digging I found the VM with the issue: 235 snapshots!  The backup program had attempted to quiesce the VM's filesystem, but the VM was too busy for the quiesce to complete and the backup eventually gave up.  The resulting snapshots did not show up in the Snapshot Manager and, as expected, needed to be consolidated.
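
For anyone hunting for a VM like this, here is a rough sketch of how the leftover snapshot files could be counted from the host shell. It assumes the snapshot delta files follow the usual <name>-00000N-delta.vmdk naming and that each VM lives in its own folder under the datastore. count_snapshot_deltas is a made-up helper name, not a VMware tool, and this is not necessarily how I found mine.

```shell
#!/bin/sh
# Sketch only: list VM folders that contain snapshot delta disks, with a count.
# Assumption: delta files are named like myvm-000001-delta.vmdk.
# count_snapshot_deltas is a hypothetical helper, not part of ESXi.
count_snapshot_deltas() {
    root="$1"   # datastore path, e.g. /vmfs/volumes/<datastore>

    for dir in "$root"/*/; do
        [ -d "$dir" ] || continue

        # Count delta disks directly inside this VM folder.
        n=$(find "$dir" -maxdepth 1 -type f \
                -name '*-[0-9][0-9][0-9][0-9][0-9][0-9]-delta.vmdk' \
            | wc -l | tr -d ' ')

        # Print only folders that actually have delta files.
        if [ "$n" -gt 0 ]; then
            echo "$n $dir"
        fi
    done
}

# Example (path is a placeholder):
#   count_snapshot_deltas /vmfs/volumes/<datastore> | sort -rn | head
```

A folder with a count far above what the Snapshot Manager shows is a likely candidate.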

I shut the VM down (after securing an outage window) and selected the Consolidate Disks option in the Snapshot Manager menu.  It attempted to consolidate the disks but didn't do a thing.  The task was frozen, the host had 40 VMs on it, and it was unable to connect to vCenter.  A call to VMware support was the next step.

There were a few ways to resolve this, but the first step was to reconnect the host to vCenter.
The vpxa.cfg file on the host has an entry <ThreadStackSizeKb> whose value is set to 128.  When the vpxa service attempts to connect to vCenter, this thread stack size is too small to cope with a VM that has too many snapshots, so the connection fails.  The setting is better left at its default in normal operation, since the disconnect alerts the admin to the problem.

Commands are as follows:
  1. Enable SSH to the Host
  2. cd /etc/vmware/vpxa
  3. cp vpxa.cfg vpxa.cfg.old 
  4. vi vpxa.cfg
  5. Change the entry <ThreadStackSizeKb>128</ThreadStackSizeKb> to <ThreadStackSizeKb>1024</ThreadStackSizeKb>
  6. Save and exit
  7. /etc/init.d/vpxa restart
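
If you would rather not edit the file by hand in vi, steps 3 to 5 above could be scripted roughly like this. It is only a sketch: set_thread_stack_kb is a made-up helper name, and it assumes the entry sits on one line exactly as shown in step 5. It keeps the same vpxa.cfg.old backup as step 3.

```shell
#!/bin/sh
# Sketch only: set <ThreadStackSizeKb> in a vpxa.cfg-style file, keeping a backup.
# set_thread_stack_kb is a hypothetical helper, not a VMware tool.
set_thread_stack_kb() {
    cfg="$1"      # config file, e.g. /etc/vmware/vpxa/vpxa.cfg
    new_kb="$2"   # new value in KB, e.g. 1024

    # Refuse to touch a file that does not contain the entry.
    if ! grep -q '<ThreadStackSizeKb>[0-9]*</ThreadStackSizeKb>' "$cfg"; then
        echo "ThreadStackSizeKb entry not found in $cfg" >&2
        return 1
    fi

    # Same backup as step 3, so the original can be put back later.
    cp "$cfg" "$cfg.old" || return 1

    # Read the backup and write the edited copy over the live file,
    # changing only the number between the tags.
    sed "s|<ThreadStackSizeKb>[0-9]*</ThreadStackSizeKb>|<ThreadStackSizeKb>${new_kb}</ThreadStackSizeKb>|" \
        "$cfg.old" > "$cfg"
}

# On the host, followed by the restart from step 7:
#   set_thread_stack_kb /etc/vmware/vpxa/vpxa.cfg 1024 && /etc/init.d/vpxa restart
```
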
Once this was done we reconnected the host to vCenter and moved the rest of the VMs off it.  Once the host was clear of the other VMs, I rebooted it to clear the zombie task.  I followed KB article 1002310 (and, for the most part, 1027876) to clone all the snaps into one file, removed the .vmsd file, and pointed the disk at the clone.

Once these steps are complete, you have to delete the leftover snapshot files from the VM's folder.

Voila, the VM is good again.

Make sure to return the host's vpxa.cfg to its original configuration before calling it done.
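
Putting the original back can be as simple as copying the vpxa.cfg.old backup over the edited file. A small sketch, assuming the backup was made in the same folder as in the steps above; restore_vpxa_cfg is a made-up helper name, and restarting vpxa afterwards is my assumption about what is needed for the old value to take effect.

```shell
#!/bin/sh
# Sketch only: restore a config file from the .old copy made before editing.
# restore_vpxa_cfg is a hypothetical helper, not a VMware tool.
restore_vpxa_cfg() {
    cfg="$1"   # config file, e.g. /etc/vmware/vpxa/vpxa.cfg

    # Do nothing if there is no backup to restore from.
    if [ ! -f "$cfg.old" ]; then
        echo "no backup found at $cfg.old" >&2
        return 1
    fi

    # Overwrite the edited file with the saved original.
    cp "$cfg.old" "$cfg"
}

# On the host (restart assumed necessary to pick up the change):
#   restore_vpxa_cfg /etc/vmware/vpxa/vpxa.cfg && /etc/init.d/vpxa restart
```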