COS

Last updated 6 months ago

This guide describes how to configure Alluxio with Tencent Cloud Object Storage (COS) as the under storage system. COS is a distributed storage service offered by Tencent Cloud for massive data, accessible via HTTP/HTTPS protocols. It can store massive amounts of data and expands bandwidth and capacity transparently, which makes it well suited as a data pool for big data computation and analytics.

Prerequisites

Alluxio runs on multiple machines in cluster mode, so its binary package must be deployed on each of those machines.

Before using COS with Alluxio, either create a new bucket or use an existing one. Additionally, identify the directory you wish to use within that bucket, whether by creating a new directory or selecting an existing one. For this guide, the COS bucket name is COS_BUCKET, the directory within the bucket is COS_DATA, and the bucket region is COS_REGION.

Basic Setup

Use the mount table operations to add a new mount point, specifying the Alluxio path to create the mount on and the COS path as the UFS URI. Credentials and configuration options can also be specified as part of the mount command with the --option flag, as described in configuring mount points.

An example command to mount cos://<COS_BUCKET>/<COS_DATA> to /cos:

bin/alluxio mount add --path /cos/ --ufs-uri cos://<COS_BUCKET>/<COS_DATA> \
  --option fs.cos.access.key=<COS_SECRET_ID> --option fs.cos.secret.key=<COS_SECRET_KEY> \
  --option fs.cos.region=<COS_REGION> --option fs.cos.app.id=<COS_APP_ID>

Note that if you want to mount the root of the COS bucket, add a trailing slash after the bucket name (e.g. cos://COS_BUCKET/).
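The trailing-slash rule above can be sketched as a small dry-run script. It reuses only the flags and option keys from the example command in this guide. All values assigned to the variables are hypothetical placeholders, and the script prints the command for review instead of running it.

```shell
# Hypothetical placeholder values; replace them with your own.
COS_BUCKET="my-bucket"
COS_REGION="ap-guangzhou"
COS_APP_ID="1250000000"
COS_SECRET_ID="my-secret-id"
COS_SECRET_KEY="my-secret-key"

# UFS URI for the bucket root: the bucket name followed by a trailing slash.
UFS_URI="cos://${COS_BUCKET}/"

# Print the mount command rather than executing it (dry run).
echo bin/alluxio mount add --path /cos/ --ufs-uri "${UFS_URI}" \
  --option "fs.cos.access.key=${COS_SECRET_ID}" \
  --option "fs.cos.secret.key=${COS_SECRET_KEY}" \
  --option "fs.cos.region=${COS_REGION}" \
  --option "fs.cos.app.id=${COS_APP_ID}"
```

Once the printed command looks right, remove the leading echo to run it against a live cluster.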

Advanced Setup

Note that configuration options can be specified as mount options or as configuration properties in conf/alluxio-site.properties. The following sections will describe how to set configurations as properties, but they can also be set as mount options via --option <key>=<value>.

COS multipart upload

By default, a file is uploaded to COS in a single pass from start to end. With the multipart upload method, a file is uploaded in multiple parts, and each part is uploaded by a single thread. No temporary files are generated during the upload.

To enable COS multipart upload, you need to modify conf/alluxio-site.properties to include:

alluxio.underfs.cos.multipart.upload.enabled=true
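As a minimal sketch, the property can also be added from a shell so that repeated runs do not create duplicate lines. The CONF_FILE variable is an assumption of this example. It defaults to the path named in this guide, relative to the Alluxio installation directory.

```shell
# Assumption: run from the Alluxio installation directory, or set CONF_FILE.
CONF_FILE="${CONF_FILE:-conf/alluxio-site.properties}"
KEY="alluxio.underfs.cos.multipart.upload.enabled"

# Make sure the file exists before inspecting it.
mkdir -p "$(dirname "$CONF_FILE")"
touch "$CONF_FILE"

# Append the property only if no uncommented line already sets this key.
if ! grep -q "^${KEY}=" "$CONF_FILE"; then
  echo "${KEY}=true" >> "$CONF_FILE"
fi
```

If the key is already present with a different value, this sketch leaves it untouched, so check the file by hand in that case.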

There are other parameters you can specify in conf/alluxio-site.properties to potentially speed up the upload.

# Timeout for uploading part when using multipart upload.
alluxio.underfs.object.store.multipart.upload.timeout
# Thread pool size for COS multipart upload.
alluxio.underfs.cos.multipart.upload.threads
# Multipart upload partition size for COS. The default partition size is 64MB. 
alluxio.underfs.cos.multipart.upload.partition.size
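To get a feel for how the partition size relates to the number of parts, here is a rough arithmetic sketch. It assumes that a file is split into fixed-size parts with a smaller final part, and that 1 MB means 1024 * 1024 bytes. Neither assumption is confirmed by this guide, and the file size is a hypothetical example.

```shell
# Default partition size from this guide: 64MB (assumed to be 64 * 1024 * 1024 bytes).
PART_SIZE_BYTES=$((64 * 1024 * 1024))

# Hypothetical file size: 1 GiB plus one byte.
FILE_SIZE_BYTES=$((1024 * 1024 * 1024 + 1))

# Ceiling division: a trailing partial part still counts as one part.
PARTS=$(( (FILE_SIZE_BYTES + PART_SIZE_BYTES - 1) / PART_SIZE_BYTES ))

echo "parts: ${PARTS}"
```

Under these assumptions, the first 1 GiB fills 16 parts of 64MB each and the extra byte needs a 17th part, so a larger partition size means fewer parts per file.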