S3 API
Alluxio supports a RESTful API that is compatible with the basic operations of the Amazon S3 API.
The Alluxio S3 API is intended for applications that are designed to communicate with S3-like storage and that would benefit from the other features provided by Alluxio, such as data caching, data sharing with file system based applications, and storage system abstraction (e.g., using Ceph instead of S3 as the backing store). For example, a simple application that downloads reports generated by analytic tasks can use the S3 API instead of the more complex file system API.
Prerequisites
To use the S3 API provided in the worker process, you need to modify conf/alluxio-site.properties to include:
It is recommended to set up a load balancer to distribute the API calls among all the worker nodes. You can consider using different load balancing solutions such as DNS, Nginx, or LVS.
Limitations and Disclaimers
Alluxio Filesystem Limitations
Only top-level Alluxio directories are treated as buckets by the S3 API. Hence the root directory of the Alluxio filesystem is not treated as an S3 bucket. Any root-level objects (e.g., alluxio://file) will be inaccessible through the Alluxio S3 API.
Alluxio uses / as a reserved separator. Therefore, any S3 path with an object or folder named / (e.g., s3://example-bucket//) will cause undefined behavior.
Also note that the Alluxio filesystem does not handle the following special characters and patterns:
Question mark ('?')
Patterns with period (./ and ../)
Backslash ('\')
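The restrictions above can be checked on the client side before a key is sent to Alluxio. The following is a minimal sketch, not part of Alluxio: the function name `find_unsupported_patterns` is hypothetical, and treating "patterns with period" as path segments equal to `.` or `..` is an assumption about how that restriction should be read.

```python
from typing import List


def find_unsupported_patterns(key: str) -> List[str]:
    """Return the problems found in an S3 object key (empty list if none).

    Hypothetical helper: the checks mirror the limitations listed above,
    but the exact matching rules are an assumption, not Alluxio's own logic.
    """
    problems = []
    if "?" in key:
        problems.append("question mark")
    if "\\" in key:
        problems.append("backslash")
    # A leading or doubled separator implies an object or folder named "/".
    if key.startswith("/") or "//" in key:
        problems.append("empty path segment")
    # Assumption: "./" and "../" are interpreted as "." or ".." path segments.
    if any(segment in (".", "..") for segment in key.split("/")):
        problems.append("dot segment")
    return problems


if __name__ == "__main__":
    for candidate in ("reports/2024/summary.csv", "a//b", "a/../b?x"):
        print(candidate, find_unsupported_patterns(candidate))
```

Rejecting such keys before upload avoids relying on the undefined behavior described above.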
No Bucket Virtual Hosting
Virtual hosting of buckets is not supported. S3 clients must use path-style requests (i.e., http://s3.amazonaws.com/{bucket}/{object}) rather than virtual-hosted-style requests (http://{bucket}.s3.amazonaws.com/{object}).
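As an illustration of the path-style form, the sketch below builds a request URL in which the bucket is the first path segment. The helper name `path_style_url` and the endpoint `http://alluxio-lb.example.com` are placeholders chosen for this example; substitute the address of your own worker or load balancer.

```python
from urllib.parse import quote


def path_style_url(endpoint: str, bucket: str, key: str) -> str:
    """Build a path-style URL: {endpoint}/{bucket}/{object}.

    Hypothetical helper for illustration. "/" is left unescaped in the key
    so that key prefixes remain separate path segments.
    """
    base = endpoint.rstrip("/")
    return f"{base}/{quote(bucket, safe='')}/{quote(key, safe='/')}"


if __name__ == "__main__":
    # The endpoint below is a placeholder, not a real Alluxio address.
    print(path_style_url("http://alluxio-lb.example.com", "reports", "2024/q1 summary.csv"))
```

Most S3 SDKs expose a setting for this. In boto3, for example, path-style addressing can be requested with `Config(s3={'addressing_style': 'path'})`.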
S3 Writes Implicitly Overwrite
As described in the AWS S3 docs for PutObject:
Amazon S3 is a distributed system. If it receives multiple write requests for the same object simultaneously, it overwrites all but the last object written. Amazon S3 does not provide object locking; if you need this, make sure to build it into your application layer or use versioning instead.
Note that the Alluxio S3 API does not currently support object versioning.
Alluxio S3 will overwrite the existing key and the temporary directory for multipart upload.
Folders in ListObjects(V2)
All sub-directories in Alluxio are returned as 0-byte folders when using ListObjects(V2). This matches the behavior you would see if you had used the AWS S3 console to create all parent folders for each object.
Tagging & Metadata Limits
To support the Tagging function in the S3 API, you need to modify conf/alluxio-site.properties to include:
User-defined tags on buckets & objects are limited to 10 and obey the S3 tag restrictions.
Set the property key alluxio.proxy.s3.tagging.restrictions.enabled=false to disable this behavior.
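A client can pre-check a tag set against these limits before sending a tagging request. The sketch below is hypothetical and is not Alluxio code: the limit of 10 tags comes from the text above, while the key and value length limits (128 and 256 characters) are the ones documented by AWS for S3 tags. Whether Alluxio enforces exactly the same lengths is an assumption.

```python
from typing import Dict, List

MAX_TAGS = 10               # limit stated above
MAX_TAG_KEY_LENGTH = 128    # AWS S3 tag restriction (assumed to apply here)
MAX_TAG_VALUE_LENGTH = 256  # AWS S3 tag restriction (assumed to apply here)


def validate_tags(tags: Dict[str, str]) -> List[str]:
    """Return a list of human-readable violations (empty if the tags pass)."""
    errors = []
    if len(tags) > MAX_TAGS:
        errors.append(f"too many tags: {len(tags)} > {MAX_TAGS}")
    for key, value in tags.items():
        if not key or len(key) > MAX_TAG_KEY_LENGTH:
            errors.append(f"invalid key length for tag {key!r}")
        if len(value) > MAX_TAG_VALUE_LENGTH:
            errors.append(f"value too long for tag {key!r}")
    return errors


if __name__ == "__main__":
    print(validate_tags({"team": "analytics", "env": "prod"}))
    print(validate_tags({f"k{i}": "v" for i in range(11)}))
```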
The maximum size for user-defined metadata in PUT-requests is 2KB by default in accordance with S3 object metadata restrictions.
Set the property key alluxio.proxy.s3.header.metadata.max.size to change this behavior.
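To estimate whether a set of user-defined metadata fits within the default limit, the size can be computed the way AWS describes it for S3: the sum of the UTF-8 byte lengths of every key and value. The sketch below follows that rule. The helper names are hypothetical, and it is an assumption that Alluxio measures the size in the same way and that the `x-amz-meta-` header prefix is not counted.

```python
from typing import Dict

DEFAULT_MAX_METADATA_BYTES = 2 * 1024  # 2KB default stated above


def metadata_size(metadata: Dict[str, str]) -> int:
    """Sum of UTF-8 byte lengths of all keys and values (AWS S3 convention)."""
    return sum(
        len(key.encode("utf-8")) + len(value.encode("utf-8"))
        for key, value in metadata.items()
    )


def fits_metadata_limit(
    metadata: Dict[str, str], limit: int = DEFAULT_MAX_METADATA_BYTES
) -> bool:
    """Return True if the metadata is within the given byte limit."""
    return metadata_size(metadata) <= limit


if __name__ == "__main__":
    sample = {"owner": "analytics"}
    print(metadata_size(sample), fits_metadata_limit(sample))
```

If the limit has been changed through alluxio.proxy.s3.header.metadata.max.size, pass the configured value as `limit`.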
Performance
Because the S3 API implementation uses a redirection mechanism and zero-copy data transfer, clients must support HTTP redirection in order to access it.
Global request headers
Supported S3 API Actions
The following table describes the support status for current S3 API Actions:
Example Usage
Please refer to the S3 API Usage.