Alluxio supports a RESTful API that is compatible with the basic operations of the Amazon S3 API.
The Alluxio S3 API should be used by applications that are designed to communicate with S3-like storage and that would benefit from the additional features Alluxio provides, such as data caching, data sharing with file-system-based applications, and storage system abstraction (e.g., using Ceph instead of S3 as the backing store). For example, a simple application that downloads reports generated by analytic tasks can use the S3 API instead of the more complex file system API.
Using the S3 API has performance implications: requests go through the Alluxio proxy, which introduces an extra network hop. For optimal performance, it is recommended to run a proxy server and an Alluxio worker on each compute node, and to put all the proxy servers behind a load balancer.
Feature support
The following table describes the support status for current Amazon S3 functional features:
| S3 Feature                 | Status        |
|----------------------------|---------------|
| List Buckets               | Supported     |
| List Objects               | Supported     |
| List ObjectsV2             | Supported     |
| Delete Buckets             | Supported     |
| Create Bucket              | Supported     |
| Bucket Lifecycle           | Not Supported |
| Policy (Buckets, Objects)  | Not Supported |
| Bucket ACLs (Get, Put)     | Not Supported |
| Bucket Location            | Not Supported |
| Bucket Notification        | Not Supported |
| Bucket Object Versions     | Not Supported |
| Get Bucket Info (HEAD)     | Not Supported |
| Put Object                 | Supported     |
| Delete Object              | Supported     |
| Delete Objects             | Supported     |
| Get Object                 | Supported     |
| Get Object Info (HEAD)     | Supported     |
| Get Object (Range Query)   | Supported     |
| Object ACLs (Get, Put)     | Not Supported |
| POST Object                | Not Supported |
| Copy Object                | Supported     |
| Multipart Uploads          | Supported     |
Language support
The Alluxio S3 API can be accessed from clients in many programming languages, such as C++, Java, Python, Golang, and Ruby. In this documentation, we use curl REST calls and the Python S3 client as usage examples.
Example Usage
REST API
For example, you can run the following RESTful API calls against an Alluxio cluster running on localhost. The Alluxio proxy listens on port 39999 by default.
Authorization
By default, any FileSystem operation is performed as the user that launched the proxy process. This can be changed by providing an Authorization header. The header format is defined by the AWS S3 REST API reference.
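As a sketch, a request can be issued as a different user by attaching an S3-style Authorization header. The username `testuser`, the date, region, and signature fields below are placeholders, and the `/api/v1/s3` endpoint prefix is assumed to be the proxy's default REST path:

```shell
# Create a bucket as "testuser" by supplying an AWS-SigV4-style Authorization header.
# In this sketch, the user is taken from the Credential component; the signature is a placeholder.
curl -i -X PUT \
  -H "Authorization: AWS4-HMAC-SHA256 Credential=testuser/20240101/us-east-1/s3/aws4_request, SignedHeaders=host, Signature=placeholder" \
  http://localhost:39999/api/v1/s3/testbucket
```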
Create a bucket
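For example, assuming a bucket named `testbucket` and the proxy's default REST endpoint prefix `/api/v1/s3`:

```shell
# A PUT on the bucket path creates the bucket
curl -i -X PUT http://localhost:39999/api/v1/s3/testbucket
```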
List all buckets owned by the user
You must authenticate as a user for this operation to return that user's buckets.
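A sketch of such a request, assuming the user `testuser` and placeholder signature fields as in the Authorization example above:

```shell
# A GET on the service root lists the buckets owned by the authenticated user
curl -i \
  -H "Authorization: AWS4-HMAC-SHA256 Credential=testuser/20240101/us-east-1/s3/aws4_request, SignedHeaders=host, Signature=placeholder" \
  http://localhost:39999/api/v1/s3
```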
Get the bucket (listing objects)
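For example, assuming the bucket `testbucket` created earlier:

```shell
# A GET on the bucket path returns a listing of the objects in the bucket
curl -i http://localhost:39999/api/v1/s3/testbucket
```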
Put an object
Assuming there is an existing file on local file system called LICENSE:
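The file can then be uploaded with a PUT on the object path (the bucket name `testbucket` is an assumption carried over from the earlier examples):

```shell
# Upload the local LICENSE file as an object named "LICENSE" in testbucket
curl -i -X PUT -T LICENSE http://localhost:39999/api/v1/s3/testbucket/LICENSE
```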
To upload a larger file in multiple parts, the Python boto S3 client can be used together with FileChunkIO. The connection settings, bucket name, and object key below are illustrative:

import math, os
import boto
import boto.s3.connection
from filechunkio import FileChunkIO

# Connect to the Alluxio proxy's S3 endpoint (illustrative settings;
# credentials are left empty in this sketch)
conn = boto.connect_s3('', '', host='localhost', port=39999,
                       path='/api/v1/s3', is_secure=False,
                       calling_format=boto.s3.connection.OrdinaryCallingFormat())
bucket = conn.get_bucket('testbucket')  # assumes the bucket already exists
largeObjectFile = 'LICENSE'             # local file to upload
mp = bucket.initiate_multipart_upload('largeObject')
# Use a chunk size of 1MB (feel free to change this)
chunkSize = 1048576
sourceSize = os.stat(largeObjectFile).st_size
chunkCount = int(math.ceil(sourceSize / float(chunkSize)))
for i in range(chunkCount):
    offset = chunkSize * i
    partBytes = min(chunkSize, sourceSize - offset)
    with FileChunkIO(largeObjectFile, 'r', offset=offset, bytes=partBytes) as fp:
        mp.upload_part_from_file(fp, part_num=i + 1)
mp.complete_upload()