Amazon Web Services Elastic File System has been to my knowledge the service to have the longest beta testing period: reason for this may be that not as many client as expected tested it and AWS received too few feedback on it or that there were issues not to release GA. I don’t want to speculate on which one is correct but now that it has been officially released I decided to give it a try and of course compare it to a self-managed solution on the same platform.
If you followed AWS evolution you may agree that EFS has been introduced to fill the gap between EBS storage and S3: before EFS was live there was no “easy” way of having a distributed file system in AWS, you could only set up your own using a combination of EC2 instances mounting Elastic Block Storage volumes and S3. Now with EFS you can have a AWS-managed distributed file system to be used in your cloud environment or even across the internet (will try that on a public subnet) with all the benefits of offloading the high-availability and replication burden to Amazon, and at a reasonable price. Will performance be enough compared to a self-managed solution?
I use terraform to create an infrastructure template to run the tests, you can see it here. Once
has finished, you’ll end up with:
- An EFS with General Purpose performance mode
- An EFS mount target for 1 Availability Zone
- 1 EC2 instance named “client” to mount remote file systems
- 2 EC2 instances named “server_X” each one with a 10 GB General Purpose EBS, they will serve a self-managed distributed, replicated file system
This is the terraform output and the steps to run on the 2 server nodes to have a running GlusterFS replicated volume; to configure the NFS export on server1, I used this guide.
On the client I mount the EFS target with NFS4.1, the GlusterFS volume from the server in the same subnet via the GlusterFS native client_ _and the NFS export on the client’s designated mount points. I use the server on the same subnet as the client is, because the EFS target exposes a mount point in the same subnet and latency is a key factor in remote file system.
I used fio installed on the client box with a command suggested by this BinaryLane post and run it against the mount point for EFS, GlusterFS and NFS v4.0 with the following command
changing the target directory each time I tun the test; each test is run isolated.
I did not customize any storage option for GlusterFS or NFS, so I’m using the default options.
NFS 4.0 (not replicated)
EFS pricing is linear, you get billed a fixed amount for GB/month; this is not true with a self-managed cluster where you can surely reach higher performance, but the TCO is increasing every time you add capacity. If you’re not satisfied with EFS throughput you need a dedicated team to manage a distributed file system cluster and its operations and maintenance.
If you need a stable, realiable file system in AWS to be shared between EC2 instances, go and use EFS and don’t reinvent the wheel: GlusterFS is outperformed in both read and write IOPS and bandwith with the default options! The workload simulated here is showing poor write performance, so if your use case is a lot of concurrent writes on many files, consider another solution. A good workload could be a WORM (Write Once Read Many) share for permanent stored contents (images, archives?).
The performance are still low in absolute terms or compared to a non-replicated mount such as a stock NFS v4.0 server without the high-availability burden, but if you don’t care about 100% uptime you should definitely set up NFS with DRDB to a secondary node, and switch your mounts when the primary node fails. But still: why manage all of this if AWS can do it for you?