MMS • Renato Losio
Article originally posted on InfoQ.
On the latest Pi Day (March 14), AWS announced Mountpoint for Amazon S3, an open-source file client that delivers high-throughput access to Amazon S3. Currently in alpha, the local mount point provides high single-instance transfer rates and is primarily intended for data lake applications.
Mountpoint for Amazon S3 translates local file system API calls to S3 object API calls like GET and LIST. The client supports random and sequential read operations on files and the listing of files and directories. The alpha release does not support writes (PUTs), and the client is expected to support only sequential writes to new objects in the future.
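As a rough illustration of the read-only access patterns described above, the following Python sketch uses only ordinary file APIs. It assumes a bucket is already mounted at a local directory. The helper names (`list_entries`, `read_sequential`, `read_range`) and the mount path in the usage note are hypothetical and not part of Mountpoint. How the client maps each call to S3 requests is not shown here.

```python
import os
from typing import Iterator, List


def list_entries(directory: str) -> List[str]:
    """List names under a directory (on a mount, presumably served by S3 LIST)."""
    return sorted(os.listdir(directory))


def read_sequential(path: str, chunk_size: int = 1024 * 1024) -> Iterator[bytes]:
    """Stream a file front to back in fixed-size chunks."""
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            yield chunk


def read_range(path: str, offset: int, length: int) -> bytes:
    """Random read: seek to an offset and read at most `length` bytes."""
    with open(path, "rb") as f:
        f.seek(offset)
        return f.read(length)
```

With a hypothetical mount at `/mnt/my-bucket`, `list_entries("/mnt/my-bucket")` would list the top-level keys and prefixes, and `read_range("/mnt/my-bucket/data.parquet", 0, 4)` would fetch the first four bytes of an object. Nothing in the sketch opens a file for writing, in line with the alpha's read-only scope.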
James Bornholt, scholar at AWS and assistant professor at the University of Texas, Devabrat Kumar, senior product manager at AWS, and Andy Warfield, distinguished engineer at AWS, acknowledge that the client is not a general-purpose networked file system and comes with some restrictions on file operations. They write:
Mountpoint is designed for large-scale analytics applications that read and generate large amounts of S3 data in parallel but don’t require the ability to write to the middle of existing objects. Mountpoint allows you to map S3 buckets or prefixes into your instance’s file system namespace, traverse the contents of your buckets as if they were local files, and achieve high throughput access to objects.
The open-source client does not emulate operations such as directory renames, which would require many S3 API calls, or POSIX file system features that the S3 APIs do not support.
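To see why a directory rename is costly on an object store, consider a toy model of a flat key space. This is not Mountpoint code. It is a minimal sketch that assumes a rename has to be emulated as one copy plus one delete per object under the prefix; the function name `rename_prefix` and the sample keys are made up for illustration.

```python
from typing import Dict, Tuple


def rename_prefix(
    objects: Dict[str, bytes], old: str, new: str
) -> Tuple[Dict[str, bytes], int]:
    """Emulate a directory rename over a flat key space.

    Returns the new key space and the number of simulated API calls,
    assuming one copy and one delete per moved object. Listing calls
    are not counted.
    """
    result: Dict[str, bytes] = {}
    calls = 0
    for key, body in objects.items():
        if key.startswith(old):
            # Each object under the old prefix must be moved individually.
            result[new + key[len(old):]] = body
            calls += 2  # one copy + one delete
        else:
            result[key] = body
    return result, calls


bucket = {
    "logs/2023/a.csv": b"a",
    "logs/2023/b.csv": b"b",
    "other/c.csv": b"c",
}
renamed, api_calls = rename_prefix(bucket, "logs/2023/", "archive/2023/")
```

In this model the number of calls grows linearly with the number of objects under the prefix, so a "directory" holding millions of objects would need millions of requests, whereas a local file system can usually rename a directory in a single operation.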
Mountpoint for S3 is not the first client to present S3 as a file system: Goofys and s3fs are popular open-source options for mounting a bucket via FUSE. While some developers on Reddit question the need for a new client and worry that it will be used outside the data lake space, Bornholt, Kumar and Warfield write:
Mountpoint is not the first file client for accessing S3—there are several open-source file clients that our customers have experience with. A common theme we’ve heard from these customers, however, is that they want these clients to offer the same stability, performance, and technical support that they get from S3’s REST APIs and the AWS SDKs.
Built in Rust on the AWS Common Runtime (CRT) used by most AWS SDKs, the new client relies on automated reasoning to validate the file system semantics. Corey Quinn, chief cloud economist at The Duckbill Group, tweets:
Oh no, what has AWS done? I didn’t spend fifteen years yelling at people not to use S3 as a file system just to be undone by the S3 team itself!
Ben Kehoe, cloud expert and AWS Serverless Hero, warns:
Thinking about S3 using file concepts is going to mislead you about the semantics of the API, and you will end up making the wrong assumptions and being sad when your system is always slightly broken because those assumptions don’t hold.
Released under the Apache License 2.0, Mountpoint is not yet ready for production workloads. The initial alpha release and the public roadmap are available on GitHub.