This week at the AWS Summit, Amazon announced the new Amazon Elastic File System (EFS).
From their product page:
> Amazon Elastic File System (Amazon EFS) is a file storage service for Amazon Elastic Compute Cloud (Amazon EC2) instances. Amazon EFS is easy to use and provides a simple interface that allows you to create and configure file systems quickly and easily. With Amazon EFS, storage capacity is elastic, growing and shrinking automatically as you add and remove files, so your applications have the storage they need, when they need it.
>
> Amazon EFS supports the Network File System version 4 (NFSv4) protocol, so the applications and tools that you use today work seamlessly with Amazon EFS. Multiple Amazon EC2 instances can access an Amazon EFS file system at the same time, providing a common data source for workloads and applications running on more than one instance.
This is HUGE news, and not just for Amazon. I'm the Technical Marketing Engineer for NFS at NetApp, so you can imagine how excited I am about this.
This is huge news for every storage provider that offers NFS as a file service option. Why? Amazon has just validated using NAS in the cloud.
That said, the details on their offering are slim, unfortunately. For one, why only NFSv4 … ?
I'm supposing that NFS was the first choice for Amazon EFS because it isn't as reliant on external services (such as Active Directory and DNS) as CIFS/SMB is. However, the NFSv4 standard (RFC 3530) does call for required* ID domain mapping and even Kerberos!
On ID domain mapping:
> To provide a greater degree of compatibility with previous versions of NFS (i.e., v2 and v3), which identified users and groups by 32-bit unsigned uid's and gid's, owner and group strings that consist of decimal numeric values with no leading zeros can be given a special interpretation by clients and servers which choose to provide such support. The receiver may treat such a user or group string as representing the same user as would be represented by a v2/v3 uid or gid having the corresponding numeric value. A server is not obligated to accept such a string, but may return an NFS4ERR_BADOWNER instead. To avoid this mechanism being used to subvert user and group translation, so that a client might pass all of the owners and groups in numeric form, a server SHOULD return an NFS4ERR_BADOWNER error when there is a valid translation for the user or owner designated in this way. In that case, the client must use the appropriate name@domain string and not the special form for compatibility. The owner string "nobody" may be used to designate an anonymous user, which will be associated with a file created by a security principal that cannot be mapped through normal means to the owner attribute.
On security:

> To meet end-to-end security requirements, the RPCSEC_GSS framework [RFC2203] will be used to extend the basic RPC security. With the use of RPCSEC_GSS, various mechanisms can be provided to offer authentication, integrity, and privacy to the NFS version 4 protocol. Kerberos V5 will be used as described in [RFC1964] to provide one security framework.
* I say required, but many NFS servers (including NetApp Data ONTAP) allow bypassing of name-to-ID mapping and don't require the use of Kerberos with NFSv4.
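To make the ID domain mapping requirement concrete: on a typical Linux client, the NFSv4 ID domain lives in /etc/idmapd.conf, and if it doesn't match what the server expects, file owners show up as "nobody." A minimal, illustrative sketch (the domain name is a placeholder):

```ini
# /etc/idmapd.conf -- minimal NFSv4 ID mapping sketch (illustrative only)
[General]
# Must agree with the server's NFSv4 ID domain, or owners map to "nobody"
Domain = example.com

[Mapping]
# Fallback identities for principals that can't be mapped normally
Nobody-User = nobody
Nobody-Group = nobody
```

How Amazon would surface (or hide) this knob for EFS users is exactly the kind of detail we don't have yet.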
I suspect that NFSv4 is being used to help sell people on the security aspect. One of the biggest concerns about putting data into the cloud is being able to confidently secure it. If you're using NFSv3, you're probably more concerned with performance than security in most cases.
NFSv4, in general, has seen its fair share of criticisms regarding performance vs. NFSv3. There are a number of reasons for that.
- NFSv3 has 15+ years on NFSv4 as a mature protocol.
- NFSv4 adoption hasn't been as high as NFSv3's. Fewer users means fewer use cases to find and fix problems.
- NFSv3 doesn’t do integrated locking like NFSv4 does.
- NFSv3 splits locking, mounting, port mapping, and so on into separate processes and ports.
- NFSv4 has added security over NFSv3.
- NFSv4 uses compound calls. The result: fewer packets, but a larger payload per call.
- NFSv4 is intended to allow for a unified namespace, so there are natural traversals of paths that may cross physical hardware boundaries.
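The compound-call point is the easiest one to quantify, because it's really about network round trips. Here's a toy sketch, not a real NFS client, that just counts RPCs under the assumption of one round trip per RPC:

```python
# Toy model of RPC round trips for reading a file; not a real NFS implementation.

def nfsv3_round_trips(path_depth):
    """NFSv3: one LOOKUP per path component, then a GETATTR and a READ,
    each issued as its own RPC (its own network round trip)."""
    return path_depth + 2  # LOOKUPs + GETATTR + READ

def nfsv4_round_trips(path_depth):
    """NFSv4: the same sequence of operations can be batched into a single
    COMPOUND request -- one round trip carrying a larger payload."""
    return 1

# Reading /a/b/c/file (4 path components):
print(nfsv3_round_trips(4))  # 6 separate RPCs
print(nfsv4_round_trips(4))  # 1 COMPOUND
```

Over a LAN the difference is noise; over WAN latencies to a cloud endpoint, round trips are exactly what you want to minimize, which makes the compound model a genuinely good fit for something like EFS.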
Ultimately, the issues with NFSv4 come down to the same thing that makes it a superior protocol to NFSv3: it just does more. Doing more means more processing, more CPU, and so on. It also means you can do more with your storage system. For example, many database application providers are looking at NFSv4 as a serious replacement for NFSv3, simply for the vastly improved locking mechanisms.
No more stale locks! No more need to clean up locks when a database restarts! Less chance for database corruption!
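The reason stale locks go away comes down to NFSv4's lease model: locks belong to a client lease that must be periodically renewed, so a crashed client's locks simply age out instead of lingering until someone cleans them up. Here's a toy sketch of the idea, not how any particular server implements it (the 90-second lease is just a commonly used order of magnitude):

```python
# Toy lease-based lock table in the spirit of NFSv4; not a real lock manager.

LEASE_SECONDS = 90  # NFSv4 servers commonly use a lease period on this order

class LeaseLockTable:
    def __init__(self):
        self._locks = {}  # path -> (client_id, last_renewed_time)

    def lock(self, path, client_id, now):
        self.expire_stale(now)
        if path in self._locks:
            return False  # still held by a live (renewing) client
        self._locks[path] = (client_id, now)
        return True

    def renew(self, client_id, now):
        # A healthy client renews its lease, refreshing all of its locks.
        for path, (owner, _) in list(self._locks.items()):
            if owner == client_id:
                self._locks[path] = (owner, now)

    def expire_stale(self, now):
        # A crashed client stops renewing; its locks age out on their own.
        self._locks = {p: (c, t) for p, (c, t) in self._locks.items()
                       if now - t < LEASE_SECONDS}

table = LeaseLockTable()
table.lock("/db/datafile", "client-A", now=0)
# client-A crashes and never renews; after the lease period the lock is gone.
print(table.lock("/db/datafile", "client-B", now=100))  # True
```

Contrast that with NFSv3's NLM sideband protocol, where a lock left behind by a dead client sits there until recovery machinery (or an admin) clears it.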
Ultimately, the answer to “why NFSv4” is that “it’s better” and “it’s the future.” But it will be interesting to see how Amazon EFS solves the problems with NFSv4 performance, especially when dealing with network latencies over WAN connections in the cloud.
Why not NFSv4.1?
This choice was more of a head scratcher for me. NFSv4 is now 15 years old. NFSv4.1 provides a better feature-set, as well as pNFS support, which could be useful in multi-node clustered file systems (which I assume AWS is going to be using). Additionally, VMware recently announced support for NFSv4.1 in vSphere 6. Given the large swath of virtualized environments using ESXi, wouldn’t it make sense to support NFSv4.1 as well? I mean, you’re not going to serve up Hyper-V environments on EFS. Or maybe Amazon doesn’t project VMs being a large use case for Amazon EFS…
Regardless of VMware, why not add support for NFSv4.1 at launch? Why stick with boring, flawed NFSv4?
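For context, on a Linux client the NFS version is just a mount option, so a server adding NFSv4.1 support costs clients essentially nothing to consume. An illustrative /etc/fstab sketch (server and export names are placeholders):

```
# Illustrative /etc/fstab entries -- server:/export and mount points are placeholders
server:/export/data  /mnt/v3   nfs  vers=3    0 0
server:/export/data  /mnt/v4   nfs  vers=4.0  0 0
# NFSv4.1 adds sessions and pNFS; layouts are negotiated if the server offers them
server:/export/data  /mnt/v41  nfs  vers=4.1  0 0
```

That's part of why the omission is puzzling: the barrier to NFSv4.1 adoption sits almost entirely on the server side.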
Speaking of which …
What about the Amazon EFS NFSv4 implementation?
I’m mostly left asking a lot of questions …
- Will all components of NFSv4 be supported?
- How will they handle the domain ID mapping issue?
  - I assume via AWS Identity and Access Management (IAM).
  - Will name services like LDAP or NIS be supported as well?
- How will they approach Kerberos?
  - What enctypes will they support?
  - How will they approach the implementation?
  - Will they leverage AWS Directory Service?
  - (I can personally vouch for the fact that using NFS Kerberos is not exactly the easiest or most straightforward thing in the world.)
- How will they make the transition to the cloud simple for users wanting to leverage NFS?
- What about age-old problems, like the 16 auxiliary GID limitation with AUTH_SYS?
- And what about new-ish stuff like RDMA?
- Will they be adding CIFS/SMB support in the future?
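On the Kerberos question: from the client's point of view, the security flavor is just another mount option, but behind that one option sit keytabs, ID mapping, and time synchronization on both ends, which is where the "not exactly easy" part comes in. Illustrative only; all names are placeholders:

```
# Illustrative /etc/fstab entry -- server, export, and mount point are placeholders
# sec=krb5  : Kerberos authentication only
# sec=krb5i : authentication + per-message integrity checking
# sec=krb5p : authentication + integrity + privacy (encrypted NFS traffic)
server:/export/secure  /mnt/secure  nfs  vers=4,sec=krb5p  0 0
```

Whether EFS exposes those flavors, and against which KDC, is one of the bigger open questions here.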
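On the 16-GID problem: the limit isn't a server quirk; it's baked into the AUTH_SYS credential format itself (the RPC specification, RFC 5531, allows at most 16 auxiliary GIDs in the credential). A toy Python sketch of what that effectively means for a client, not a real RPC encoder:

```python
# Toy illustration of the AUTH_SYS credential limit (RFC 5531 authsys_parms
# defines "unsigned int gids<16>", i.e., at most 16 auxiliary GIDs).

AUTH_SYS_MAX_GIDS = 16

def build_auth_sys_cred(uid, gid, aux_gids):
    """Return the (uid, gid, gids) triple an AUTH_SYS credential can carry.
    Auxiliary groups beyond the 16th simply don't fit in the credential."""
    return (uid, gid, list(aux_gids)[:AUTH_SYS_MAX_GIDS])

# A user in 20 supplementary groups: the server only ever sees 16 of them,
# so access checks against the missing 4 groups silently fail.
uid, gid, gids = build_auth_sys_cred(1000, 1000, range(2000, 2020))
print(len(gids))  # 16
```

The usual workarounds, Kerberos (where the server resolves groups itself) or server-side group lookup options, are exactly the kind of thing I'd want Amazon to spell out.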
I know I will be watching closely to see how all of this transpires, because it foretells the future of NAS protocols in the cloud, whether public, private, or otherwise.
Until then, check out these awesome resources regarding NFS on NetApp, and we'll be sure to keep you abreast of any news on Amazon AWS/EFS as it develops!