NFS Disconnects in VMware vSphere 5.5 U1

By Nick Howell on 04/17/2014.

UPDATE [Jul 18 23:30 PDT]: We continue to receive a lot of calls from customers hitting this issue, so I wanted to share the official position from NetApp CSS (support) with our customers, as well as our field and partner community, on the current status of the 5.5U1 APD issue as of today:

Build 1881737 (ESXi 5.5 Express P4) corrects this issue; however, it is not yet fully qualified by NetApp. We anticipate it will be on our IMT in the next week or two.

Our current fully supported recommendation remains to back the ESXi hosts down to 5.5 flat (build 1331820) and, if the SSL Heartbleed issue is of concern, to apply patch ESXi550-201404401-SG, located here: http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=2076121

Hopefully, within the next week or so, QA/Interop will complete testing and have this latest P4 patch listed on the IMT. When it hits the IMT, I’ll update you all here.
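For anyone auditing a fleet of hosts, here is a minimal sketch of checking a host's build against the two builds named in this update. It assumes you have captured the output of `vmware -v` from each host (a line such as `VMware ESXi 5.5.0 build-1331820`). The function names and status wording are hypothetical, and the statuses only reflect this update; any build not listed here is deliberately left unclassified rather than guessed at.

```python
import re

# Only the two builds named in this update are listed. The status text
# paraphrases the update and is an illustration, not an official matrix.
KNOWN_BUILDS = {
    1331820: "ESXi 5.5 GA: current fully supported recommendation",
    1881737: "ESXi 5.5 Express Patch 4: corrects the APD issue, "
             "not yet fully qualified by NetApp",
}

UNKNOWN = "build not covered by this post: check the IMT before acting"


def parse_build(version_line):
    """Extract the integer build number from a `vmware -v` style line."""
    match = re.search(r"build-(\d+)", version_line)
    if match is None:
        raise ValueError("no build number found in: %r" % version_line)
    return int(match.group(1))


def describe_build(version_line):
    """Return the status for a host, based only on exact build matches."""
    return KNOWN_BUILDS.get(parse_build(version_line), UNKNOWN)


if __name__ == "__main__":
    print(describe_build("VMware ESXi 5.5.0 build-1331820"))
```

Exact matching is intentional: build numbers between these two include releases other than 5.5U1, so a range check would mislabel hosts.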


UPDATE [Jun 12 11:40 PDT]: VMware has released Patch 4 {link} to address the NFS APD issue. If you are currently running 5.5U1, the recommendation is to update immediately.


UPDATE [Apr 18 17:00 PDT]: VMware has released KB 2076392, noting that this is a known issue affecting ESXi 5.5 Update 1 hosts with connected NFS storage. VMware is working towards providing a resolution to customers. To work around this issue, VMware recommends using ESXi 5.5 GA. It was also brought to my attention today that 5.5U1 had not yet made it onto the NetApp IMT, as the QA teams had not finished their thorough interop testing. This is one of those lessons in checking the IMTs for all your gear and software before upgrading on a whim. #FoodForThought


UPDATE [Apr 17 16:25 PDT]: For now, if you are experiencing this condition, the recommendation is to downgrade ESXi to 5.5.

"REMAIN CALM!"

Recently, several NetApp customers running NFS on vSphere 5.5U1 uncovered an issue where their datastores would go offline randomly, multiple times throughout the day. If you have not yet upgraded to 5.5 U1, DON’T! There is an ongoing internal thread at NetApp about this issue, so if you’re a NetApp employee, make sure you’re following the server-virt distribution list. When I first heard the news, my inclination was to post an alert on Twitter. Little did I know how widespread this had become.

My first troubleshooting thought was that this was another iteration of the vSphere 5.5 change of the NFS queue depth from 64 to 4 billion. I can confirm that it is NOT related to the issue found in KB 2016122. VMware has confirmed the issue in vSphere and is working closely with NetApp to determine the root cause, and we should expect a public KB very soon. This post will be updated with findings as they’re released. Stay tuned…

Nick Howell
Tech Evangelist at DatacenterDude, LLC
One of the top technology evangelists in the IT Industry, Nick Howell has spent 15 years driving multi-million dollar solutions across multiple Fortune 500 companies and implementing new systems from an engineering, architectural, and consultant capacity across many industry verticals including finance, defense, retail & more.
