Alluxio
ProductsLanguageHome
  • Alluxio概览
  • 用户指南
    • 快速上手指南
    • 架构
    • FAQ
    • 作业服务器
    • 应用场景
  • 核心功能
    • 缓存
    • 统一命名空间
  • 部署 Alluxio
    • 在Kubernetes上部署Alluxio
    • 本地运行Alluxio
    • 在集群上独立运行Alluxio
    • 在Docker上运行Alluxio
    • 在具有HA的群集上部署Alluxio
    • 使用Docker部署AlluxioFuse加速深度学习训练(试验)
    • 基本要求
  • 云源生
    • Tencent EMR
  • 计算应用
    • Apache Spark
    • Presto
    • Spark on Kubernetes
    • Apache Flink
    • Apache Hadoop MapReduce
    • Presto on Iceberg (Experimental)
    • Trino
    • Apache Hive
    • 深度学习框架
    • Tensorflow
  • 底层存储系统
    • Alluxio集成Amazon AWS S3作为底层存储
    • Alluxio集成GCS作为底层存储
    • Alluxio集成Azure Blob Store作为底层存储
    • Azure Data Lake Storage Gen2
    • Azure 数据湖存储
    • Alluxio集成HDFS作为底层存储
    • Alluxio集成COS作为底层存储
    • Alluxio集成COSN作为底层存储
    • Alluxio集成Ceph Object Storage作为底层存储
    • Alluxio集成NFS作为底层存储
    • Alluxio集成Kodo作为底层存储
    • Alluxio集成Swift作为底层存储
    • Alluxio集成WEB作为底层存储
    • Alluxio集成Minio作为底层存储
    • 阿里云对象存储服务
    • Alluxio集成Ozone作为底层存储
    • Alluxio集成CephFS作为底层存储
  • 安全设置
    • 安全性
  • 运维指南
    • 配置项设置
    • 命令行接口
    • 管理员命令行接口
    • Web界面
    • 日志
    • 度量指标系统
    • 远程记录日志
  • 管理
    • 升级
    • 异常诊断与调试
  • APIs
    • Filesystem API
    • S3 Client
    • POSIX API
    • REST API
    • Python Client
    • 兼容Hadoop的Java
    • Go 客户端
  • 开发者资源
    • 编译Alluxio源代码
    • 开发指南
    • 代码规范
    • 如何开发单元测试
    • 文档规范
  • 参考
    • 配置项列表
    • List of Metrics
  • REST API
    • Master REST API
    • Worker REST API
    • Proxy REST API
    • Job REST API
  • Javadoc
Powered by GitBook
On this page
  • S3 Client
  • 特性支持
  • 语言支持
  • 使用示例
  • REST API
  • Python S3 Client
  1. APIs

S3 Client

Last updated 6 months ago

S3 Client

Alluxio支持RESTful API,兼容 的基本操作。

会在Alluxio构建时生成并且可以通过${ALLUXIO_HOME}/core/server/proxy/target/miredot/index.html获得。

使用HTTP代理会带来一些性能的影响,尤其是在使用代理的时候会增加一个额外的跳计数。为了达到最优的性能,推荐代理服务和一个Alluxio worker运行在一个计算节点上。或者,推荐将所有的代理服务器放到load balancer之后。

特性支持

下表描述了对当前Amazon S3基础特性的支持情况:

Action
Headers
Query Params

AbortMultipartUpload

None

CompleteMultipartUpload

None

CopyObject

Content-Type | x-amz-copy-source | x-amz-metadata-directive | x-amz-tagging-directive | x-amz-tagging

CreateBucket

None

CreateMultipartUpload

Content-Type | x-amz-tagging

DeleteBucket

None

DeleteBucketTagging

None

DeleteObject

None

DeleteObjects

None

DeleteObjectTagging

None

GetBucketTagging

None

GetObject

Range

None

GetObjectTagging

None

None

HeadBucket

None

None

HeadObject

None

None

ListBuckets

ListMultipartUploads

None

None

ListObjects

None

delimiter | encoding-type | marker | max-keys | prefix

ListObjectsV2

None

continuation-token | delimiter | encoding-type | max-keys | prefix | start-after

ListParts

None

None

PutBucketTagging

None

PutObject

Content-Length | Content-MD5 | Content-Type | x-amz-tagging

PutObjectTagging

None

None

UploadPart

Content-Length | Content-MD5

UploadPartCopy

x-amz-copy-source

语言支持

Alluxio S3 客户端支持各种编程语言,比如C++、Java、Python、Golang、Ruby等。在这个文档中,我们使用curl REST调用和python S3 client作为使用示例。

使用示例

REST API

举个例子,你可以使用如下的RESTful API调用方式在本地运行一个Alluxio集群。Alluxio代理会默认在39999端口监听。

创建bucket

$ curl -i -X PUT http://localhost:39999/api/v1/s3/testbucket
HTTP/1.1 200 OK
Date: Tue, 29 Aug 2017 22:34:41 GMT
Content-Length: 0
Server: Jetty(9.2.z-SNAPSHOT)

获取bucket(objects列表)

$ curl -i -X GET http://localhost:39999/api/v1/s3/testbucket
HTTP/1.1 200 OK
Date: Tue, 29 Aug 2017 22:35:00 GMT
Content-Type: application/xml
Content-Length: 200
Server: Jetty(9.2.z-SNAPSHOT)

<ListBucketResult xmlns=""><Name>/testbucket</Name><Prefix/><ContinuationToken/><NextContinuationToken/><KeyCount>0</KeyCount><MaxKeys>1000</MaxKeys><IsTruncated>false</IsTruncated></ListBucketResult>

加入object

假定本地现存一个文件LICENSE。

$ curl -i -X PUT -T "LICENSE" http://localhost:39999/api/v1/s3/testbucket/testobject
HTTP/1.1 100 Continue

HTTP/1.1 200 OK
Date: Tue, 29 Aug 2017 22:36:03 GMT
ETag: "9347237b67b0be183499e5893128704e"
Content-Length: 0
Server: Jetty(9.2.z-SNAPSHOT)

获取object

$ curl -i -X GET http://localhost:39999/api/v1/s3/testbucket/testobject
HTTP/1.1 200 OK
Date: Tue, 29 Aug 2017 22:37:34 GMT
Last-Modified: Tue, 29 Aug 2017 22:36:03 GMT
Content-Type: application/xml
Content-Length: 26847
Server: Jetty(9.2.z-SNAPSHOT)

.................. Content of the test file ...................

列出含有单个object的bucket

$ curl -i -X GET http://localhost:39999/api/v1/s3/testbucket
HTTP/1.1 200 OK
Date: Tue, 29 Aug 2017 22:38:48 GMT
Content-Type: application/xml
Content-Length: 363
Server: Jetty(9.2.z-SNAPSHOT)

<ListBucketResult xmlns=""><Name>/testbucket</Name><Prefix/><ContinuationToken/><NextContinuationToken/><KeyCount>1</KeyCount><MaxKeys>1000</MaxKeys><IsTruncated>false</IsTruncated><Contents><Key>testobject</Key><LastModified>2017-08-29T15:36:03.613Z</LastModified><ETag></ETag><Size>26847</Size><StorageClass>STANDARD</StorageClass></Contents></ListBucketResult>

列出含有多个objects的bucket

你可以上传更多的文件并且使用max-keys和continuation-token作为GET bucket request参数,比如:

$ curl -i -X PUT -T "LICENSE" http://localhost:39999/api/v1/s3/testbucket/key1
$ curl -i -X PUT -T "LICENSE" http://localhost:39999/api/v1/s3/testbucket/key2
$ curl -i -X PUT -T "LICENSE" http://localhost:39999/api/v1/s3/testbucket/key3
$ curl -i -X GET http://localhost:39999/api/v1/s3/testbucket\?max-keys\=2
HTTP/1.1 200 OK
Date: Tue, 29 Aug 2017 22:40:45 GMT
Content-Type: application/xml
Content-Length: 537
Server: Jetty(9.2.z-SNAPSHOT)

<ListBucketResult xmlns=""><Name>/testbucket</Name><Prefix/><ContinuationToken/><NextContinuationToken>key3</NextContinuationToken><KeyCount>2</KeyCount><MaxKeys>2</MaxKeys><IsTruncated>true</IsTruncated><Contents><Key>key1</Key><LastModified>2017-08-29T15:40:42.213Z</LastModified><ETag></ETag><Size>26847</Size><StorageClass>STANDARD</StorageClass></Contents><Contents><Key>key2</Key><LastModified>2017-08-29T15:40:43.269Z</LastModified><ETag></ETag><Size>26847</Size><StorageClass>STANDARD</StorageClass></Contents></ListBucketResult>

# curl -i -X GET http://localhost:39999/api/v1/s3/testbucket\?max-keys\=2\&continuation-token\=key3
HTTP/1.1 200 OK
Date: Tue, 29 Aug 2017 22:41:18 GMT
Content-Type: application/xml
Content-Length: 540
Server: Jetty(9.2.z-SNAPSHOT)

<ListBucketResult xmlns=""><Name>/testbucket</Name><Prefix/><ContinuationToken>key3</ContinuationToken><NextContinuationToken/><KeyCount>2</KeyCount><MaxKeys>2</MaxKeys><IsTruncated>false</IsTruncated><Contents><Key>key3</Key><LastModified>2017-08-29T15:40:44.002Z</LastModified><ETag></ETag><Size>26847</Size><StorageClass>STANDARD</StorageClass></Contents><Contents><Key>testobject</Key><LastModified>2017-08-29T15:36:03.613Z</LastModified><ETag></ETag><Size>26847</Size><StorageClass>STANDARD</StorageClass></Contents></ListBucketResult>

你还可以验证这些对象是否为Alluxio文件,在/testbucket目录下。

$ ./bin/alluxio fs ls -R /testbucket

删除objects

$ curl -i -X DELETE http://localhost:39999/api/v1/s3/testbucket/key1
HTTP/1.1 204 No Content
Date: Tue, 29 Aug 2017 22:43:22 GMT
Server: Jetty(9.2.z-SNAPSHOT)
$ curl -i -X DELETE http://localhost:39999/api/v1/s3/testbucket/key2
$ curl -i -X DELETE http://localhost:39999/api/v1/s3/testbucket/key3
$ curl -i -X DELETE http://localhost:39999/api/v1/s3/testbucket/testobject

初始化multipart upload

$ curl -i -X POST http://localhost:39999/api/v1/s3/testbucket/testobject?uploads
HTTP/1.1 200 OK
Date: Tue, 29 Aug 2017 22:43:22 GMT
Content-Length: 197
Server: Jetty(9.2.z-SNAPSHOT)

<?xml version="1.0" encoding="UTF-8"?>
<InitiateMultipartUploadResult xmlns="">
  <Bucket>testbucket</Bucket>
  <Key>testobject</Key>
  <UploadId>2</UploadId>
</InitiateMultipartUploadResult>

上传分块

$ curl -i -X PUT http://localhost:39999/api/v1/s3/testbucket/testobject?partNumber=1&uploadId=2
HTTP/1.1 200 OK
Date: Tue, 29 Aug 2017 22:43:22 GMT
ETag: "b54357faf0632cce46e942fa68356b38"
Server: Jetty(9.2.z-SNAPSHOT)

罗列已上传的分块

$ curl -i -X GET http://localhost:39999/api/v1/s3/testbucket/testobject?uploadId=2
HTTP/1.1 200 OK
Date: Tue, 29 Aug 2017 22:43:22 GMT
Content-Length: 985
Server: Jetty(9.2.z-SNAPSHOT)

<?xml version="1.0" encoding="UTF-8"?>
<ListPartsResult xmlns="">
  <Bucket>testbucket</Bucket>
  <Key>testobject</Key>
  <UploadId>2</UploadId>
  <StorageClass>STANDARD</StorageClass>
  <IsTruncated>false</IsTruncated>
  <Part>
    <PartNumber>1</PartNumber>
    <LastModified>2017-08-29T20:48:34.000Z</LastModified>
    <ETag>"b54357faf0632cce46e942fa68356b38"</ETag>
    <Size>10485760</Size>
  </Part>
</ListPartsResult>

完成multipart upload

$ curl -i -X POST http://localhost:39999/api/v1/s3/testbucket/testobject?uploadId=2 -d '
<CompleteMultipartUpload>
  <Part>
    <PartNumber>1</PartNumber>
    <ETag>"b54357faf0632cce46e942fa68356b38"</ETag>
  </Part>
</CompleteMultipartUpload>'

HTTP/1.1 200 OK
Date: Tue, 29 Aug 2017 22:43:22 GMT
Server: Jetty(9.2.z-SNAPSHOT)

<?xml version="1.0" encoding="UTF-8"?>
<CompleteMultipartUploadResult xmlns="">
  <Location>/testbucket/testobjectLocation>
  <Bucket>testbucket</Bucket>
  <Key>testobject</Key>
  <ETag>"b54357faf0632cce46e942fa68356b38"</ETag>
</CompleteMultipartUploadResult>

中止multipart upload

$ curl -i -X DELETE http://localhost:39999/api/v1/s3/testbucket/testobject?uploadId=2
HTTP/1.1 204 OK
Date: Tue, 29 Aug 2017 22:43:22 GMT
Content-Length: 0
Server: Jetty(9.2.z-SNAPSHOT)

删除空bucket

$ curl -i -X DELETE http://localhost:39999/api/v1/s3/testbucket
HTTP/1.1 204 No Content
Date: Tue, 29 Aug 2017 22:45:19 GMT

Python S3 Client

创建连接

import boto
import boto.s3.connection

conn = boto.connect_s3(
    aws_access_key_id = '',
    aws_secret_access_key = '',
    host = 'localhost',
    port = 39999,
    path = '/api/v1/s3',
    is_secure=False,
    calling_format = boto.s3.connection.OrdinaryCallingFormat(),
)

创建bucket

bucketName = 'bucket-for-testing'
bucket = conn.create_bucket(bucketName)

加入small object

smallObjectKey = 'small.txt'
smallObjectContent = 'Hello World!'

key = bucket.new_key(smallObjectKey)
key.set_contents_from_string(smallObjectContent)

获取small object

assert smallObjectContent == key.get_contents_as_string()

上传large object

在本地文件系统创建一个8MB文件

$ dd if=/dev/zero of=8mb.data bs=1048576 count=8

使用python S3 client把它作为object上传

largeObjectKey = 'large.txt'
largeObjectFile = '8mb.data'

key = bucket.new_key(largeObjectKey)
with open(largeObjectFile, 'rb') as f:
    key.set_contents_from_file(f)
with open(largeObjectFile, 'rb') as f:
    largeObject = f.read()

获取large objecy

assert largeObject == key.get_contents_as_string()

删除objects

bucket.delete_key(smallObjectKey)
bucket.delete_key(largeObjectKey)

初始化multipart upload

mp = bucket.initiate_multipart_upload(largeObjectFile)

上传分块

import math, os

from filechunkio import FileChunkIO

# Use a chunk size of 1MB (feel free to change this)
sourceSize = os.stat(largeObjectFile).st_size
chunkSize = 1048576
chunkCount = int(math.ceil(sourceSize / float(chunkSize)))

for i in range(chunkCount):
    offset = chunkSize * i
    bytes = min(chunkSize, sourceSize - offset)
    with FileChunkIO(largeObjectFile, 'r', offset=offset, bytes=bytes) as fp:
        mp.upload_part_from_file(fp, part_num=i + 1)

完成multipart upload

mp.complete_upload()

中止multipart upload

mp.cancel_upload()

删除bucket

conn.delete_bucket(bucketName)
Amazon S3 API
REST API 手册