S3 API

Alluxio supports a RESTful API that is compatible with the basic operations of the Amazon S3 API.

The Alluxio S3 API should be used by applications that are designed to communicate with S3-like storage and that would benefit from Alluxio's other features, such as data caching, data sharing with file-system-based applications, and storage system abstraction (e.g., using Ceph instead of S3 as the backing store). For example, a simple application that downloads reports generated by analytic tasks can use the S3 API instead of the more complex file system API.

Using the S3 API has performance implications: requests pass through the Alluxio proxy, which introduces an extra network hop. For optimal performance, run a proxy server and an Alluxio worker on each compute node, and put all proxy servers behind a load balancer.

Feature support

The following table describes the support status for current Amazon S3 functional features:

| S3 Feature | Status |
| --- | --- |
| List Buckets | Supported |
| List Objects | Supported |
| List ObjectsV2 | Supported |
| Delete Buckets | Supported |
| Create Bucket | Supported |
| Bucket Lifecycle | Not Supported |
| Policy (Buckets, Objects) | Not Supported |
| Bucket ACLs (Get, Put) | Not Supported |
| Bucket Location | Not Supported |
| Bucket Notification | Not Supported |
| Bucket Object Versions | Not Supported |
| Get Bucket Info (HEAD) | Not Supported |
| Put Object | Supported |
| Delete Object | Supported |
| Delete Objects | Supported |
| Get Object | Supported |
| Get Object Info (HEAD) | Supported |
| Get Object (Range Query) | Supported |
| Object ACLs (Get, Put) | Not Supported |
| POST Object | Not Supported |
| Copy Object | Supported |
| Multipart Uploads | Supported |

Language support

The Alluxio S3 API can be used from clients written in various languages, such as C++, Java, Python, Go, and Ruby. This documentation uses curl REST calls and the Python S3 client (boto) as usage examples.

Example Usage

REST API

The following examples issue RESTful API calls to an Alluxio cluster running on localhost. The Alluxio proxy listens on port 39999 by default.

Authorization

By default, file system operations are performed as the user that launched the proxy process. This can be changed by providing the Authorization header, whose format is defined by the AWS S3 REST API reference.

Create a bucket
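A minimal curl sketch, assuming the proxy's default S3 endpoint path /api/v1/s3 and an example bucket name testbucket (the trailing `|| true` only lets the sketch exit cleanly when no proxy is running):

```shell
# PUT on the bucket path creates the bucket, backed by the
# Alluxio directory /testbucket. Assumes a proxy at localhost:39999.
curl -i -X PUT http://localhost:39999/api/v1/s3/testbucket || true
```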

List all buckets owned by the user

Authenticating as a user is necessary to have buckets returned by this operation.
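A sketch of listing buckets while authenticating as a user. The username testuser, the date scope, and the Signature value are placeholders; this assumes the proxy reads only the access key (the username) from the AWS-format Authorization header:

```shell
# GET on the service root lists buckets owned by the authenticated user.
curl -i -X GET \
  -H "Authorization: AWS4-HMAC-SHA256 Credential=testuser/20240101/us-east-1/s3/aws4_request, SignedHeaders=host, Signature=placeholder" \
  http://localhost:39999/api/v1/s3 || true
```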

Get the bucket (listing objects)
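A curl sketch of listing the objects in a bucket, assuming the testbucket created earlier:

```shell
# GET on the bucket path returns a ListObjects XML response.
curl -i -X GET http://localhost:39999/api/v1/s3/testbucket || true
```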

Put an object

Assuming there is an existing file called LICENSE on the local file system:
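A curl sketch of the upload; testobject is the example object key:

```shell
# -T uploads the local file LICENSE as the request body of a PUT.
curl -i -X PUT -T "LICENSE" \
  http://localhost:39999/api/v1/s3/testbucket/testobject || true
```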

Get the object:
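A curl sketch of reading the object back:

```shell
# GET on the object path returns the object's bytes in the response body.
curl -i -X GET http://localhost:39999/api/v1/s3/testbucket/testobject || true
```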

Listing a bucket with one object

Listing a bucket with multiple objects

You can upload more files and use max-keys and marker as GET bucket request parameters. For example:
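A sketch of paging through the bucket; the parameter values are examples:

```shell
# max-keys limits the page size; marker resumes listing after the given key.
curl -i -X GET \
  "http://localhost:39999/api/v1/s3/testbucket?max-keys=1&marker=testobject" || true
```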

You can also verify that those objects are represented as Alluxio files under the /testbucket directory.

Copy an Object
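A curl sketch of a server-side copy; the destination key testobject_copy is an example:

```shell
# x-amz-copy-source names the source as /<bucket>/<key>;
# the request path is the destination object.
curl -i -X PUT -H "x-amz-copy-source: /testbucket/testobject" \
  http://localhost:39999/api/v1/s3/testbucket/testobject_copy || true
```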

Delete Single Objects

Delete Multiple Objects
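The two delete operations above can be sketched as follows. The DeleteObjects XML body is the standard S3 format; real S3 additionally requires a Content-MD5 header, and whether the proxy enforces that is not covered here:

```shell
# Delete a single object.
curl -i -X DELETE http://localhost:39999/api/v1/s3/testbucket/testobject_copy || true

# Delete several objects in one POST ?delete request.
curl -i -X POST "http://localhost:39999/api/v1/s3/testbucket?delete" \
  -d '<Delete><Object><Key>testobject</Key></Object><Object><Key>testobject_copy</Key></Object></Delete>' || true
```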

Initiate a multipart upload

Since testobject was deleted by the previous command, create another testobject before initiating a multipart upload.

Note that the multipart upload commands below require the upload ID returned by the initiation request; it is not necessarily 3.

Upload part

List parts

Complete a multipart upload

Abort a multipart upload

A non-completed upload can be aborted:
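The multipart steps above (initiate, upload a part, list parts, complete, or abort) can be sketched together with curl. The UPLOAD_ID value and the ETag in the completion body are placeholders taken from the earlier responses:

```shell
# 1. Initiate; the XML response contains an UploadId.
curl -i -X POST "http://localhost:39999/api/v1/s3/testbucket/testobject?uploads" || true

UPLOAD_ID="UploadId-from-step-1"   # placeholder

# 2. Upload part 1 from the local LICENSE file.
curl -i -X PUT -T "LICENSE" \
  "http://localhost:39999/api/v1/s3/testbucket/testobject?partNumber=1&uploadId=${UPLOAD_ID}" || true

# 3. List the uploaded parts (returns each part's number and ETag).
curl -i -X GET \
  "http://localhost:39999/api/v1/s3/testbucket/testobject?uploadId=${UPLOAD_ID}" || true

# 4. Complete, referencing each part's number and ETag from step 3.
curl -i -X POST \
  "http://localhost:39999/api/v1/s3/testbucket/testobject?uploadId=${UPLOAD_ID}" \
  -d '<CompleteMultipartUpload><Part><PartNumber>1</PartNumber><ETag>ETAG_FROM_LIST_PARTS</ETag></Part></CompleteMultipartUpload>' || true

# 5. Or abort instead of completing.
curl -i -X DELETE \
  "http://localhost:39999/api/v1/s3/testbucket/testobject?uploadId=${UPLOAD_ID}" || true
```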

Delete an empty bucket
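A curl sketch of the bucket deletion:

```shell
# DELETE succeeds only when the bucket is empty.
curl -i -X DELETE http://localhost:39999/api/v1/s3/testbucket || true
```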

Python S3 Client

These examples were tested with Python 2.7.

Create a connection:

Note that the boto package must be installed first (e.g., pip install boto).
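A sketch of connecting boto to the Alluxio proxy, assuming the proxy's default S3 endpoint path /api/v1/s3; empty keys authenticate as the user that launched the proxy:

```python
import boto
import boto.s3.connection

# boto.connect_s3 builds the connection lazily; no request is sent yet.
conn = boto.connect_s3(
    aws_access_key_id='',
    aws_secret_access_key='',
    host='localhost',
    port=39999,
    path='/api/v1/s3',
    is_secure=False,
    calling_format=boto.s3.connection.OrdinaryCallingFormat(),
)
```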

Authenticating as a user:

By default, authenticating with an empty aws_access_key_id performs the file system actions as the user that launched the proxy.

Set aws_access_key_id to a different username to perform the actions as that user.
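A sketch of authenticating as a different user. The username exampleuser is hypothetical, and this assumes the proxy interprets the access key ID as the Alluxio username without validating the secret key:

```python
import boto
import boto.s3.connection

# aws_access_key_id names the Alluxio user performing the actions.
conn = boto.connect_s3(
    aws_access_key_id='exampleuser',   # hypothetical username
    aws_secret_access_key='',
    host='localhost',
    port=39999,
    path='/api/v1/s3',
    is_secure=False,
    calling_format=boto.s3.connection.OrdinaryCallingFormat(),
)
```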

Create a bucket
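A boto sketch of creating the bucket; the try/except only lets the example run when no proxy is listening locally:

```python
import boto
import boto.s3.connection

conn = boto.connect_s3(
    aws_access_key_id='', aws_secret_access_key='',
    host='localhost', port=39999, path='/api/v1/s3', is_secure=False,
    calling_format=boto.s3.connection.OrdinaryCallingFormat())

try:
    bucket = conn.create_bucket('testbucket')  # issues PUT /api/v1/s3/testbucket
except Exception:
    bucket = None  # reached when no proxy is running locally
```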

List all buckets owned by the user

Authenticating as a user is necessary to have buckets returned by this operation.
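A boto sketch of listing the buckets owned by the authenticated user:

```python
import boto
import boto.s3.connection

conn = boto.connect_s3(
    aws_access_key_id='', aws_secret_access_key='',
    host='localhost', port=39999, path='/api/v1/s3', is_secure=False,
    calling_format=boto.s3.connection.OrdinaryCallingFormat())

try:
    for b in conn.get_all_buckets():  # GET on the service root
        print(b.name)
except Exception:
    pass  # reached when no proxy is running locally
```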

PUT a small object

Get the small object
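The small-object PUT and GET above can be sketched together; the key name testobject and the payload string are examples:

```python
import boto
import boto.s3.connection

conn = boto.connect_s3(
    aws_access_key_id='', aws_secret_access_key='',
    host='localhost', port=39999, path='/api/v1/s3', is_secure=False,
    calling_format=boto.s3.connection.OrdinaryCallingFormat())

try:
    bucket = conn.get_bucket('testbucket', validate=False)
    key = bucket.new_key('testobject')
    key.set_contents_from_string('Hello Alluxio!')  # PUT the object
    print(key.get_contents_as_string())             # GET it back
except Exception:
    pass  # reached when no proxy is running locally
```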

Upload a large object

Create an 8 MB file on the local file system.

Then use the Python S3 client to upload it as an object.

Get the large object
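The large-object steps above can be sketched as follows; the local file name 8mb.data and the key largeobject are examples:

```python
import os

import boto
import boto.s3.connection

# Create an 8 MB local file to upload.
with open('8mb.data', 'wb') as f:
    f.write(os.urandom(8 * 1024 * 1024))

conn = boto.connect_s3(
    aws_access_key_id='', aws_secret_access_key='',
    host='localhost', port=39999, path='/api/v1/s3', is_secure=False,
    calling_format=boto.s3.connection.OrdinaryCallingFormat())

try:
    bucket = conn.get_bucket('testbucket', validate=False)
    key = bucket.new_key('largeobject')
    key.set_contents_from_filename('8mb.data')  # upload the large object
    key.get_contents_to_filename('8mb.copy')    # download it again
except Exception:
    pass  # reached when no proxy is running locally
```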

Delete the objects
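A boto sketch of deleting the objects created above:

```python
import boto
import boto.s3.connection

conn = boto.connect_s3(
    aws_access_key_id='', aws_secret_access_key='',
    host='localhost', port=39999, path='/api/v1/s3', is_secure=False,
    calling_format=boto.s3.connection.OrdinaryCallingFormat())

try:
    bucket = conn.get_bucket('testbucket', validate=False)
    bucket.delete_key('testobject')   # DELETE each object by key
    bucket.delete_key('largeobject')
except Exception:
    pass  # reached when no proxy is running locally
```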

Initiate a multipart upload

Upload parts

Complete the multipart upload

Abort the multipart upload

Non-completed uploads can be aborted.
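The multipart sequence above (initiate, upload parts, complete, or abort) can be sketched with boto; the 6 MB in-memory payload and key name are examples:

```python
import io
import os

import boto
import boto.s3.connection

conn = boto.connect_s3(
    aws_access_key_id='', aws_secret_access_key='',
    host='localhost', port=39999, path='/api/v1/s3', is_secure=False,
    calling_format=boto.s3.connection.OrdinaryCallingFormat())

# In S3, every part except the last must be at least 5 MB.
data = os.urandom(6 * 1024 * 1024)

try:
    bucket = conn.get_bucket('testbucket', validate=False)
    mp = bucket.initiate_multipart_upload('largeobject')    # initiate
    mp.upload_part_from_file(io.BytesIO(data), part_num=1)  # upload part 1
    mp.complete_upload()                                    # complete
    # ...or call mp.cancel_upload() instead to abort a non-completed upload
except Exception:
    pass  # reached when no proxy is running locally
```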

Delete the bucket
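A boto sketch of the final cleanup:

```python
import boto
import boto.s3.connection

conn = boto.connect_s3(
    aws_access_key_id='', aws_secret_access_key='',
    host='localhost', port=39999, path='/api/v1/s3', is_secure=False,
    calling_format=boto.s3.connection.OrdinaryCallingFormat())

try:
    conn.delete_bucket('testbucket')  # succeeds only when the bucket is empty
except Exception:
    pass  # reached when no proxy is running locally
```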
