S3 API


Alluxio supports a RESTful API that is compatible with the basic operations of the Amazon S3 API.

The Alluxio S3 API is intended for applications that are designed to communicate with S3-like storage and would benefit from the other features provided by Alluxio, such as data caching, data sharing with file-system-based applications, and storage system abstraction (e.g., using Ceph instead of S3 as the backing store). For example, a simple application that downloads reports generated by analytic tasks can use the S3 API instead of the more complex file system API.

Prerequisites

To use the S3 API provided in the worker process, you need to modify conf/alluxio-site.properties to include:

alluxio.worker.s3.api.enabled=true

It is recommended to set up a load balancer to distribute the API calls among all the worker nodes. You can consider using different load balancing solutions such as DNS, Nginx, or LVS.
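As a sketch, a minimal Nginx configuration that round-robins S3 API calls across the worker nodes might look like the following. The worker hostnames and ports are placeholders for your deployment, not Alluxio defaults.

```nginx
# Minimal sketch: distribute S3 API calls across Alluxio worker nodes.
# Hostnames and ports below are placeholders for your deployment.
upstream alluxio_s3_workers {
    server alluxio-worker-0.example.com:29998;  # replace with your workers' S3 API port
    server alluxio-worker-1.example.com:29998;
}

server {
    listen 80;
    location / {
        proxy_pass http://alluxio_s3_workers;
        proxy_set_header Host $host;
    }
}
```

Other load-balancing solutions (DNS round-robin, LVS) follow the same pattern of presenting one address that fans out to all workers.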

Limitations and Disclaimers

Alluxio Filesystem Limitations

Only top-level Alluxio directories are treated as buckets by the S3 API; the root directory of the Alluxio filesystem is not itself treated as an S3 bucket. Any root-level objects (e.g., alluxio://file) are inaccessible through the Alluxio S3 API.

Alluxio uses / as a reserved separator. Therefore, any S3 path containing an object or folder named / (e.g., s3://example-bucket//) will cause undefined behavior.

Also note that the Alluxio filesystem does not handle the following special characters and patterns:

  • Question mark ('?')

  • Patterns with period ('./' and '../')

  • Backslash ('\')
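These restrictions can be checked on the client before uploading. The helper below is an illustrative sketch derived from the rules above; it is not an Alluxio API:

```python
# Sketch: rejecting object keys that the Alluxio filesystem cannot represent,
# based on the reserved separator and special-character limitations above.
# The function name and rules are an illustrative client-side check, not an
# Alluxio API.
def is_valid_alluxio_key(key: str) -> bool:
    if "?" in key or "\\" in key:
        return False                 # '?' and backslash are not handled
    if "//" in key:
        return False                 # '/' is reserved; empty path components are undefined
    parts = key.split("/")
    if "." in parts or ".." in parts:
        return False                 # './' and '../' patterns are not handled
    return True

print(is_valid_alluxio_key("reports/2024.csv"))  # True
print(is_valid_alluxio_key("example//object"))   # False
```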

No Bucket Virtual Hosting

Virtual hosting of buckets is not supported. S3 clients must use path-style requests (i.e., http://s3.amazonaws.com/{bucket}/{object}) rather than virtual-hosted-style requests (http://{bucket}.s3.amazonaws.com/{object}).
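Because only path-style addressing is available, clients address objects as {endpoint}/{bucket}/{object}. A minimal sketch of building such URLs, assuming a hypothetical helper and a placeholder endpoint:

```python
# Sketch: building path-style request URLs for the Alluxio S3 API, since
# bucket virtual hosting is unavailable. The endpoint and helper are
# illustrative placeholders, not part of Alluxio.
from urllib.parse import quote

def object_url(endpoint: str, bucket: str, key: str) -> str:
    """Return a path-style URL of the form {endpoint}/{bucket}/{key}."""
    return f"{endpoint.rstrip('/')}/{quote(bucket)}/{quote(key)}"

print(object_url("http://alluxio-s3.example.com", "example-bucket", "reports/2024.csv"))
# http://alluxio-s3.example.com/example-bucket/reports/2024.csv
```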

S3 Writes Implicitly Overwrite

As described in the AWS S3 documentation for PutObject:

Amazon S3 is a distributed system. If it receives multiple write requests for the same object simultaneously, it overwrites all but the last object written. Amazon S3 does not provide object locking; if you need this, make sure to build it into your application layer or use versioning instead.

  • Note that the Alluxio S3 API does not currently support object versioning.

Alluxio S3 overwrites both the existing key and the temporary directory used for multipart uploads.

Folders in ListObjects(V2)

All sub-directories in Alluxio are returned as 0-byte folders when using ListObjects(V2). This behavior matches what you would see if you had used the AWS S3 console to create all parent folders for each object.

Tagging & Metadata Limits

To support the Tagging function in the S3 API, you need to modify conf/alluxio-site.properties to include:

alluxio.underfs.xattr.change.enabled=true

User-defined tags on buckets & objects are limited to 10 and obey the S3 tag restrictions.

  • Set the property key alluxio.proxy.s3.tagging.restrictions.enabled=false to disable this behavior.

The maximum size for user-defined metadata in PUT requests is 2KB by default, in accordance with the S3 object metadata restrictions.

  • Set the property key alluxio.proxy.s3.header.metadata.max.size to change this behavior.
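A client can mirror these default limits before issuing tagging or metadata requests. The constants and helpers below are illustrative sketches, not Alluxio APIs; computing metadata size as the UTF-8 byte length of keys plus values is an assumption about how the limit is measured:

```python
# Sketch: client-side checks mirroring the default limits above: at most 10
# user-defined tags, and at most 2KB of user-defined metadata per PUT.
# The constants and helpers are illustrative, not Alluxio APIs; metadata size
# is assumed to be the UTF-8 byte length of keys plus values.
MAX_TAGS = 10
MAX_METADATA_BYTES = 2 * 1024

def tags_within_limit(tags: dict) -> bool:
    return len(tags) <= MAX_TAGS

def metadata_within_limit(metadata: dict) -> bool:
    size = sum(len(k.encode("utf-8")) + len(v.encode("utf-8"))
               for k, v in metadata.items())
    return size <= MAX_METADATA_BYTES

print(tags_within_limit({f"k{i}": "v" for i in range(11)}))  # False
```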

HTTP Persistent Connection

HTTP persistent connection (also called HTTP keep-alive) is the practice of using a single TCP connection to send and receive multiple HTTP request/response pairs, as opposed to opening a new connection for every single request/response pair.

The main advantages of persistent connections include:

  • Reduced Latency: Minimizes delay caused by frequent requests.

  • Resource Savings: Reduces server and client resource consumption through fewer connections and less repeated connection-setup overhead.

  • Real-time Capability: Enables quick transmission of the latest data.

However, long connections also have some drawbacks, such as:

  • Increased Server Pressure: Many open connections can increase the memory and CPU burden on the server.

  • Timeout Issues: Requires handling cases where connections are unresponsive for a long time to ensure the effectiveness of timeout mechanisms.

In summary, HTTP persistent connections are an effective technique for scenarios with high real-time requirements that also aim to conserve resources.

To enable HTTP keep-alive for the S3 API, modify the conf/alluxio-site.properties file to include the following:

alluxio.worker.s3.connection.keep.alive.enabled=true

# Connections idle for longer than this value are closed. 0 disables the idle timeout.
alluxio.worker.s3.connection.idle.max.time=0sec
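To see the effect of keep-alive from the client side, the sketch below sends several requests over a single TCP connection. The stub server merely stands in for an Alluxio worker; the object path and port are illustrative:

```python
# Sketch: reusing one TCP connection for several HTTP request/response pairs,
# as enabled by the keep-alive setting above. A local stub server stands in
# for an Alluxio worker; the object path is illustrative.
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

class Handler(BaseHTTPRequestHandler):
    protocol_version = "HTTP/1.1"  # HTTP/1.1 defaults to persistent connections

    def do_GET(self):
        body = b"ok"
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))  # required for keep-alive
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the demo quiet

server = HTTPServer(("127.0.0.1", 0), Handler)
threading.Thread(target=server.serve_forever, daemon=True).start()

# One connection, three request/response pairs: no per-request TCP handshake.
conn = http.client.HTTPConnection("127.0.0.1", server.server_port)
statuses = []
for _ in range(3):
    conn.request("GET", "/example-bucket/example-object")
    resp = conn.getresponse()
    resp.read()  # drain the body before reusing the connection
    statuses.append(resp.status)
conn.close()
server.shutdown()
print(statuses)  # [200, 200, 200]
```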

Performance

Since the S3 API implementation uses an HTTP redirection mechanism together with zero-copy data transfer, clients must support and follow HTTP redirects in order to access it.
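In practice this means the client library must follow redirect responses transparently. The sketch below simulates the pattern with a local stub server that redirects a GET to another location, as a redirect-capable client (here, urllib) would experience; the paths are illustrative, not Alluxio's actual routes:

```python
# Sketch: a client following an HTTP redirect to fetch object bytes.
# The stub server stands in for Alluxio's redirect mechanism; the
# paths here are illustrative, not Alluxio's actual routes.
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path == "/example-bucket/example-object":
            # Redirect the client to the location that actually serves the bytes.
            self.send_response(307)
            self.send_header("Location", "/data/example-object")
            self.send_header("Content-Length", "0")
            self.end_headers()
        else:
            body = b"object-bytes"
            self.send_response(200)
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the demo quiet

server = HTTPServer(("127.0.0.1", 0), Handler)
threading.Thread(target=server.serve_forever, daemon=True).start()

# urllib follows redirects by default, so the extra hop is transparent.
url = f"http://127.0.0.1:{server.server_port}/example-bucket/example-object"
with urllib.request.urlopen(url) as resp:
    data = resp.read()
server.shutdown()
print(data)  # b'object-bytes'
```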

Global request headers

| Header | Content | Description |
| --- | --- | --- |
| Authorization | AWS4-HMAC-SHA256 Credential={user}/..., SignedHeaders=..., Signature=... | Specifies the Alluxio ACL username to perform the operation with, in the AWS Signature Version 4 format. |

There is currently no support for access & secret keys in the Alluxio S3 API. The only supported authentication scheme is the SIMPLE authentication type. By default, the user used to perform any operation is the user that launched the Alluxio process. Therefore, this header is used exclusively to specify an Alluxio ACL username to perform an operation with. To remain compatible with other S3 clients, the header is still expected to follow the AWS Signature Version 4 format: when supplying an access key to an S3 client, put the intended Alluxio ACL username. The secret key is unused, so you may use any dummy value.

Supported S3 API Actions

The following table describes the support status for the current S3 API actions:

| S3 API Action | Supported Headers | Supported Query Parameters |
| --- | --- | --- |
| AbortMultipartUpload | N/A | N/A |
| CompleteMultipartUpload | N/A | N/A |
| CopyObject | Content-Type, x-amz-copy-source, x-amz-metadata-directive, x-amz-tagging-directive, x-amz-tagging | N/A |
| CreateMultipartUpload | N/A | N/A |
| DeleteBucketTagging | N/A | N/A |
| DeleteObject | N/A | N/A |
| DeleteObjects | N/A | N/A |
| DeleteObjectTagging | N/A | N/A |
| GetBucketTagging | N/A | N/A |
| GetObject | Range | N/A |
| GetObjectTagging | N/A | N/A |
| HeadBucket | N/A | N/A |
| HeadObject | N/A | N/A |
| ListBuckets | N/A | N/A |
| ListMultipartUploads | N/A | N/A |
| ListObjects | N/A | delimiter, encoding-type, marker, max-keys, prefix |
| ListObjectsV2 | N/A | continuation-token, delimiter, encoding-type, max-keys, prefix, start-after |
| ListParts | N/A | N/A |
| PutBucketTagging | N/A | N/A |
| PutObject | Content-Length, Content-MD5, Content-Type, x-amz-tagging | N/A |
| PutObjectTagging | N/A | N/A |
| UploadPart | Content-Length, Content-MD5 | N/A |

Example Usage

For detailed request examples, please refer to the S3 API Usage reference.
