
Tencent COS

Last updated 9 days ago

This guide describes how to configure Alluxio with Tencent COS (Cloud Object Storage) as the under storage system. COS is a distributed storage service offered by Tencent Cloud and is accessible over HTTP and HTTPS. It is designed to store large volumes of data and to scale bandwidth and capacity transparently, which makes it suitable as a data pool for big data computation and analytics.

Prerequisites

Alluxio runs on multiple machines in cluster mode, so its binary package must be deployed on each machine.

Before using COS with Alluxio, either create a new bucket or use an existing one. Additionally, identify the directory you wish to use within that bucket, whether by creating a new directory or selecting an existing one. For this guide, the COS bucket name is COS_BUCKET, the directory within the bucket is COS_DATA, and the bucket region is COS_REGION.

Basic Setup

Use the mount table operations to add a new mount point, specifying the Alluxio path to create the mount on and the COS path as the UFS URI. Credentials and configuration options can also be specified as part of the mount operation, as described in configuring mount points.

An example ufs.yaml to create a mount point with the operator:

apiVersion: k8s-operator.alluxio.com/v1
kind: UnderFileSystem
metadata:
  name: alluxio-cos
  namespace: alx-ns
spec:
  alluxioCluster: alluxio-cluster
  path: cos://<COS_BUCKET>/<COS_DATA>
  mountPath: /cos
  mountOptions:
    fs.cos.access.key: <COS_SECRET_ID>
    fs.cos.secret.key: <COS_SECRET_KEY>
    fs.cos.region: <COS_REGION>
    fs.cos.app.id: <COS_APP_ID>

An example command to mount cos://<COS_BUCKET>/<COS_DATA> to /cos if not using the operator:

bin/alluxio mount add --path /cos/ --ufs-uri cos://<COS_BUCKET>/<COS_DATA> \
  --option fs.cos.access.key=<COS_SECRET_ID> --option fs.cos.secret.key=<COS_SECRET_KEY> \
  --option fs.cos.region=<COS_REGION> --option fs.cos.app.id=<COS_APP_ID>

Note: The full name of a COS bucket is <COS_BUCKET>-<COS_APP_ID>. The UFS path should include only the <COS_BUCKET> portion and omit the <COS_APP_ID> suffix, giving cos://<COS_BUCKET>/<COS_DATA>. Set fs.cos.app.id to <COS_APP_ID> separately.
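The naming rule in the note can be illustrated with a small helper. This is only a sketch: the function names are hypothetical, and it assumes the APPID is the all-digit suffix after the last hyphen of the full bucket name.

```python
from typing import Tuple


def split_full_bucket_name(full_name: str) -> Tuple[str, str]:
    """Split '<COS_BUCKET>-<COS_APP_ID>' into (bucket, app_id).

    Assumption: the APPID is the all-digit suffix after the last hyphen.
    """
    bucket, sep, app_id = full_name.rpartition("-")
    if not sep or not bucket or not app_id.isdigit():
        raise ValueError(f"not of the form <COS_BUCKET>-<COS_APP_ID>: {full_name!r}")
    return bucket, app_id


def cos_ufs_uri(bucket: str, directory: str = "") -> str:
    """Build the UFS URI; an empty directory gives the bucket root with a trailing slash."""
    return f"cos://{bucket}/{directory.strip('/')}"
```

For a full bucket name such as examplebucket-1250000000 (an illustrative value), the first element goes into the UFS URI and the second into fs.cos.app.id.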

To mount the root of the COS bucket, add a trailing slash after the bucket name (e.g. cos://<COS_BUCKET>/).

Advanced Setup

Note that configuration options can be specified as mount options or as configuration properties in conf/alluxio-site.properties. The following sections will describe how to set configurations as properties, but they can also be set as mount options via --option <key>=<value>.
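As a rough illustration of the mount-option form, the following sketch renders a dictionary of properties as --option <key>=<value> flags on the mount command shown earlier. The helper name and the sample values are hypothetical; only the bin/alluxio mount add, --path, --ufs-uri and --option syntax comes from this guide.

```python
import shlex
from typing import Dict


def mount_add_command(path: str, ufs_uri: str, options: Dict[str, str]) -> str:
    """Render a 'bin/alluxio mount add' command line with one --option per property."""
    args = ["bin/alluxio", "mount", "add", "--path", path, "--ufs-uri", ufs_uri]
    for key, value in options.items():
        args += ["--option", f"{key}={value}"]
    # Quote each argument so the result can be pasted into a shell.
    return " ".join(shlex.quote(arg) for arg in args)
```

The same keys could instead be written as key=value lines in conf/alluxio-site.properties.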

Enabling HTTPS

To use HTTPS for communication with COS, which secures data in transit, set the following property in conf/alluxio-site.properties:

alluxio.underfs.cos.secure.http.enabled=true

COS multipart upload

The default upload method uploads each file in a single pass, from start to finish. With multipart upload, a file is split into multiple parts and each part is uploaded in its own thread. No temporary files are generated during the upload.

To enable COS multipart upload, you need to modify conf/alluxio-site.properties to include:

alluxio.underfs.cos.multipart.upload.enabled=true

There are other parameters you can specify in conf/alluxio-site.properties to potentially speed up the upload.

# Timeout for uploading a single part when using multipart upload.
alluxio.underfs.object.store.multipart.upload.timeout

# Thread pool size for COS multipart upload.
alluxio.underfs.cos.multipart.upload.threads

# Multipart upload partition size for COS. The default partition size is 16MB. 
alluxio.underfs.cos.multipart.upload.part.size
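To get a feel for how the part size affects the number of upload tasks, here is a minimal sketch that splits a file size into part ranges. It assumes fixed-size contiguous parts with a shorter final part and reads the 16MB default as 16 MiB; the way Alluxio actually splits files is not described in this guide, and the function name is hypothetical.

```python
from typing import List, Tuple

# Assumption: the documented 16MB default is interpreted as 16 MiB.
DEFAULT_PART_SIZE = 16 * 1024 * 1024


def part_ranges(file_size: int, part_size: int = DEFAULT_PART_SIZE) -> List[Tuple[int, int]]:
    """Return (offset, length) for each part; the last part may be shorter."""
    if file_size < 0 or part_size <= 0:
        raise ValueError("file_size must be >= 0 and part_size must be > 0")
    return [
        (offset, min(part_size, file_size - offset))
        for offset in range(0, file_size, part_size)
    ]
```

Under these assumptions a 40 MiB file yields three parts, so a thread pool larger than three would not speed up that particular upload.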

Setting Request Retry Policy

Requests to the UFS can fail when the server is temporarily unable to respond. You can configure a retry policy for such requests.

For each I/O request sent to the UFS, such as getObject, putObject, or MultipartUpload, Alluxio checks the response. If the response is an error and the error code indicates that it may be retryable, the request is resubmitted according to the configured retry policy. Alluxio keeps retrying until the request succeeds or the maximum number of retries is reached. The wait interval between successive retries gradually increases from the configured base sleep time up to the maximum sleep time.

The following error codes are categorized as retryable: 500 HTTP_INTERNAL_ERROR, 502 HTTP_BAD_GATEWAY, 503 HTTP_UNAVAILABLE, 503 Slow Down, and 504 HTTP_GATEWAY_TIMEOUT.

Note:

  • 4xx status codes usually represent client errors, such as NOT_FOUND, PERMISSION_DENIED, and UNAUTHENTICATED. Such errors should never be retried, since the issue is on the client side.

  • 5xx status codes usually represent server errors, but not all 5xx errors should be retried. For example, 501 HTTP_NOT_IMPLEMENTED should not be retried.
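The classification above can be summarized in a few lines. This is an illustrative sketch keyed on HTTP status codes only, not Alluxio's internal check; the names are hypothetical.

```python
# Status codes listed as retryable in this guide (503 covers both
# HTTP_UNAVAILABLE and Slow Down).
RETRYABLE_STATUS_CODES = frozenset({500, 502, 503, 504})


def is_retryable(status_code: int) -> bool:
    """Return True only for the status codes this guide lists as retryable."""
    return status_code in RETRYABLE_STATUS_CODES
```

Using an explicit allow-list rather than a "status >= 500" test keeps codes such as 501 out of the retry path.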

To set the retry policy for UFS requests, modify conf/alluxio-site.properties to include:

# Maximum number of retries for a single UFS request.
alluxio.underfs.business.retry.max.num=10

# Base sleep time between retries; the wait interval starts at this value after the initial failure.
alluxio.underfs.business.retry.base.sleep=30ms

# Maximum sleep time between two consecutive retries.
alluxio.underfs.business.retry.max.sleep=30s
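To see how these three values interact, the sketch below computes one possible sequence of wait intervals. The exact growth function Alluxio uses is not stated in this guide; doubling without jitter is assumed here purely for illustration, and the function name is hypothetical.

```python
from typing import List


def backoff_schedule_ms(base_sleep_ms: int, max_sleep_ms: int, max_retries: int) -> List[int]:
    """Wait before each retry, assuming the interval doubles and is capped at max_sleep_ms."""
    return [
        min(base_sleep_ms * 2 ** attempt, max_sleep_ms)
        for attempt in range(max_retries)
    ]
```

With the values above (30ms base, 30s maximum, 10 retries) and this assumed doubling, the waits grow from 30 ms to 15360 ms and never reach the 30s cap, for a total of roughly 30.7 seconds of waiting before the request is given up.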