Alluxio
AI-3.5 (stable)

Volcengine TOS

Last updated 3 months ago

This guide describes how to configure Volcengine TOS as Alluxio's under storage system. Tinder Object Storage Service (TOS) is a secure, low-cost, easy-to-use, highly reliable, and highly available distributed cloud storage service for massive amounts of data, provided by Volcengine.

Prerequisites

Before using TOS with Alluxio, follow the TOS quick start guide to sign up for TOS and create a TOS bucket.

Before you get started, please ensure you have the required information listed below:

<TOS_BUCKET>

Create a new TOS bucket or use an existing bucket

<TOS_DIRECTORY>

The directory you want to use in the bucket, either by creating a new directory or using an existing one

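<TOS_ACCESS_KEY_ID>

The Access Key ID for TOS, which is created and managed in the TOS AccessKey management console

<TOS_ACCESS_KEY_SECRET>

The Secret Access Key for TOS, which is created and managed in the TOS AccessKey management console

<TOS_ENDPOINT>

The internet endpoint of the bucket, which can be found on the bucket overview page, with values like tos-cn-beijing.volces.com and tos-cn-guangzhou.volces.com. Available endpoints are listed in the TOS Internet Endpoints documentation.

<TOS_REGION>

The region where the bucket is located, such as cn-beijing or cn-guangzhou. Available regions are listed in the TOS Regions documentation.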

Basic Setup

Use the mount table operations to add a new mount point, specifying the Alluxio path to create the mount on and the TOS path as the UFS URI. Credentials and configuration options can also be specified as part of the mount command with the --option flag, as described in configuring mount points.

An example command to mount tos://<TOS_BUCKET>/<TOS_DIRECTORY> to /tos:

bin/alluxio mount add --path /tos/ --ufs-uri tos://<TOS_BUCKET>/<TOS_DIRECTORY> \
  --option fs.tos.accessKeyId=<TOS_ACCESS_KEY_ID> --option fs.tos.accessKeySecret=<TOS_ACCESS_KEY_SECRET> \
  --option fs.tos.endpoint=<TOS_ENDPOINT> --option fs.tos.region=<TOS_REGION>

Note that to mount the root of the TOS bucket, you must add a trailing slash after the bucket name (e.g., tos://<TOS_BUCKET>/).
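When mount URIs are assembled in scripts, the bucket-root case is easy to miss. The function below is a hypothetical helper (tos_ufs_uri is not part of Alluxio) that builds the --ufs-uri value for bin/alluxio mount add. It assumes the directory is given relative to the bucket. my-bucket and datasets are placeholder names.

```shell
#!/bin/sh
# Hypothetical helper, not part of Alluxio: build the --ufs-uri value for
# "bin/alluxio mount add". Without a directory it returns the bucket root
# with the trailing slash that bucket-root mounts need.
tos_ufs_uri() {
  local bucket dir
  bucket=$1
  dir=${2:-}
  dir=${dir#/}   # drop one leading slash so the URI never contains "//"
  if [ -z "$dir" ]; then
    printf 'tos://%s/\n' "$bucket"
  else
    printf 'tos://%s/%s\n' "$bucket" "$dir"
  fi
}

tos_ufs_uri my-bucket            # prints tos://my-bucket/
tos_ufs_uri my-bucket datasets   # prints tos://my-bucket/datasets
```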

Advanced Setup

Note that configuration options can be specified either as mount options or as configuration properties in conf/alluxio-site.properties. The following sections describe how to set them as properties, but each can also be set as a mount option via --option <key>=<value>.
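Because any property can also be passed with --option, one way to reuse a properties snippet for a single mount is to convert its lines into flags. The sketch below is a hypothetical helper (props_to_options is not an Alluxio command). It assumes simple key=value lines and does not handle features of the full Java properties syntax such as ':' separators or line continuations.

```shell
#!/bin/sh
# Hypothetical helper, not part of Alluxio: convert key=value lines (as in
# conf/alluxio-site.properties) into "--option" arguments for
# "bin/alluxio mount add". Each flag and its value are printed on separate
# lines. Assumes simple key=value lines only.
props_to_options() {
  local line
  # With the default IFS, read trims leading and trailing whitespace.
  # The second test also handles a final line with no trailing newline.
  while read -r line || [ -n "$line" ]; do
    case "$line" in
      ''|'#'*|'!'*) continue ;;   # skip blank lines and comments
    esac
    printf '%s\n' '--option' "$line"
  done
}

printf '%s\n' \
  '# HTTPS settings' \
  'fs.tos.endpoint=https://<TOS_ENDPOINT>' \
  'alluxio.underfs.tos.secure.http.enabled=true' | props_to_options
```

In bash, a hypothetical tos.properties file could be applied with `mapfile -t opts < <(props_to_options < tos.properties)`, with "${opts[@]}" then appended to the bin/alluxio mount add command. Printing the flag and its value on separate lines keeps each key=value pair a single argument, even if the value contains spaces.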

Enabling HTTPS

To use the HTTPS protocol for secure communication with TOS, which adds a layer of security to data transfers, configure the following settings in conf/alluxio-site.properties:

fs.tos.endpoint=https://<TOS_ENDPOINT>
alluxio.underfs.tos.secure.http.enabled=true

TOS multipart upload

By default, a file is uploaded to TOS in a single request, from start to end. Multipart upload instead splits a file into multiple parts and uploads each part in its own thread. It does not generate any temporary files while uploading.

To enable TOS multipart upload, add the following to conf/alluxio-site.properties:

alluxio.underfs.tos.multipart.upload.enabled=true

You can also set the following parameters in conf/alluxio-site.properties to potentially speed up the upload:

# Timeout for uploading a part when using multipart upload.
alluxio.underfs.object.store.multipart.upload.timeout
# Thread pool size for TOS multipart upload.
alluxio.underfs.tos.multipart.upload.threads
# Multipart upload partition size for TOS. The default partition size is 64MB.
alluxio.underfs.tos.multipart.upload.partition.size
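The sketch below works through the arithmetic for a single file, to show how partition size and thread pool size interact. It rests on these assumptions:

- Every part except the last is exactly one partition long.
- 64MB means 64 * 1024 * 1024 bytes.
- The 8-thread pool is an arbitrary example, not a documented default.

Real uploads overlap parts rather than running in strict "waves", so the wave count is only a rough lower bound on sequential rounds.

```shell
#!/bin/sh
# Back-of-the-envelope sizing for a multipart upload. ASSUMPTIONS: every
# part except the last is one full partition, and 64MB = 64 * 1024 * 1024 bytes.
DEFAULT_PARTITION_BYTES=$(( 64 * 1024 * 1024 ))

# Number of parts needed for a file of the given size (ceiling division).
num_parts() {
  local file_bytes partition_bytes
  file_bytes=$1
  partition_bytes=${2:-$DEFAULT_PARTITION_BYTES}
  echo $(( (file_bytes + partition_bytes - 1) / partition_bytes ))
}

# Rough number of rounds if at most <threads> parts upload concurrently.
upload_waves() {
  local parts threads
  parts=$1
  threads=$2
  echo $(( (parts + threads - 1) / threads ))
}

parts=$(num_parts $(( 1024 * 1024 * 1024 )))   # a 1 GiB file
echo "parts: $parts, waves with 8 threads: $(upload_waves "$parts" 8)"
# prints: parts: 16, waves with 8 threads: 2
```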

Setting Request Retry Policy

Requests to the UFS may occasionally fail because the server is temporarily unable to respond. To handle such failures, you can configure a retry policy for UFS requests.

For each I/O request sent to the UFS, such as getObject, putObject, or MultipartUpload, Alluxio checks the response. If the response is an error and the error code indicates that the request may be retried, the request is resubmitted according to the configured retry policy. Alluxio keeps retrying until the request succeeds or the maximum number of retries is reached. The wait interval between successive retries gradually increases from the configured base sleep time up to the maximum sleep time.

The following error codes are categorized as retryable: 500 HTTP_INTERNAL_ERROR, 502 HTTP_BAD_GATEWAY, 503 HTTP_UNAVAILABLE, 503 Slow Down, and 504 HTTP_GATEWAY_TIMEOUT.

Note:

  • 4xx status codes usually represent client errors, such as NOT_FOUND, PERMISSION_DENIED, and UNAUTHENTICATED. Such errors should never be retried, since the issue is on the client side.

  • 5xx status codes usually represent server errors, but not all 5xx errors should be retried. For example, 501 HTTP_NOT_IMPLEMENTED should not be retried.
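The retry rules can be summarized as a lookup on the HTTP status code. This is an illustrative sketch only, not Alluxio's implementation. Because 503 Slow Down shares the 503 status with HTTP_UNAVAILABLE, both map to retryable here.

```shell
#!/bin/sh
# Illustrative only, not Alluxio's implementation: classify an HTTP status
# code from a UFS response according to the rules listed in this guide.
is_retryable() {
  case "$1" in
    500|502|503|504) echo yes ;;  # internal error, bad gateway, unavailable / slow down, gateway timeout
    *) echo no ;;                 # 4xx client errors, 501, and every other code
  esac
}

for code in 404 500 501 503; do
  echo "$code -> $(is_retryable "$code")"
done
```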

To set the retry policy for UFS requests, add the following to conf/alluxio-site.properties:

# Maximum number of retries for a single UFS request
alluxio.underfs.business.retry.max.num=10

# Base sleep time before the first retry after the initial failure
alluxio.underfs.business.retry.base.sleep=30ms

# Maximum sleep time between two retries
alluxio.underfs.business.retry.max.sleep=30s
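This guide does not state the exact function by which the wait grows from base.sleep to max.sleep. The sketch below assumes a common capped exponential backoff in which the interval doubles after each retry; Alluxio's actual schedule may differ, for example by adding jitter. Under that assumption, it prints the wait before each retry, in milliseconds, for the example settings.

```shell
#!/bin/sh
# Sketch of a capped exponential backoff schedule. ASSUMPTION: the interval
# doubles after each retry and is capped at max_ms; Alluxio's real growth
# function may differ.
backoff_schedule() {
  local max_num base_ms max_ms sleep_ms i
  max_num=$1
  base_ms=$2
  max_ms=$3
  sleep_ms=$base_ms
  i=1
  while [ "$i" -le "$max_num" ]; do
    echo "$sleep_ms"                 # wait before retry number i
    sleep_ms=$(( sleep_ms * 2 ))
    if [ "$sleep_ms" -gt "$max_ms" ]; then
      sleep_ms=$max_ms
    fi
    i=$(( i + 1 ))
  done
}

# Example settings: max.num=10, base.sleep=30ms, max.sleep=30s
backoff_schedule 10 30 30000
```

Under this assumption, the 30s cap is never reached within 10 retries. The last wait is 15360ms, and the total waiting time is 30 * (2^10 - 1) = 30690ms, or about 30.7 seconds.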

High Concurrency Tuning

When integrating Alluxio with TOS under high concurrency, you can tune performance by adjusting the following properties:

  • alluxio.underfs.tos.retry.max: Controls the number of retries with TOS. Default value is 3.

  • alluxio.underfs.tos.read.timeout: Controls read timeout with TOS. Default value is 30000 milliseconds.

  • alluxio.underfs.tos.write.timeout: Controls write timeout with TOS. Default value is 30000 milliseconds.

  • alluxio.underfs.tos.streaming.upload.partition.size: Controls the partition size for TOS streaming upload. Default value is 64MB.

  • alluxio.underfs.tos.connect.timeout: Controls the connection timeout with TOS. Default value is 30000 milliseconds.
