File Writing


This feature is experimental.

In certain scenarios, the performance and bandwidth of the underlying file system (UFS) may not meet the needs of large-scale data writes. To address this, Alluxio offers an option to write data only to the Alluxio cluster. Since this process never touches the UFS, write performance and bandwidth depend entirely on the performance and bandwidth of the Alluxio cluster. This feature is called CACHE_ONLY.

The recommended use cases for CACHE_ONLY include:

  • Temporarily saving checkpoint files during AI training

  • Shuffle files generated during big data computations

In these use cases, the written files are temporary in nature and not meant to be persisted for long-term use.

Additionally, for scenarios requiring eventual persistence of CACHE_ONLY files, Alluxio supports an optional async persistence feature, which can be configured as described in the Enabling Async Persistence section below.

Enabling CACHE_ONLY

To use the CACHE_ONLY feature, the CACHE_ONLY storage component must be deployed separately. Note that the Alluxio client interfaces directly with CACHE_ONLY storage and does not communicate with the Alluxio worker. Data and metadata in CACHE_ONLY storage are managed independently by the CACHE_ONLY storage itself. Because its files are managed separately, files in CACHE_ONLY storage cannot interact with the other files served by the Alluxio workers.

Deploying CACHE_ONLY storage on Kubernetes

The deployment of CACHE_ONLY storage is integrated into the Alluxio operator. Enable it by populating the cacheOnly field in the Alluxio deployment file.

apiVersion: k8s-operator.alluxio.com/v1
kind: AlluxioCluster
metadata:
  name: alluxio
spec:
  image: <PRIVATE_REGISTRY>/alluxio-enterprise
  imageTag: <TAG>
  properties:

  worker:
    count: 2

  pagestore:
    size: 100Gi

  cacheOnly:
    enabled: true
    mountPath: "/cache-only"
    image: <PRIVATE_REGISTRY>/alluxio-cacheonly
    imageTag: <TAG>
    imagePullPolicy: IfNotPresent

    # Replace with base64 encoded license generated by
    # cat /path/to/license.json | base64 | tr -d "\n"
    license:

    properties:

    journal:
      storageClass: "gp2"

    worker:
      count: 2
    tieredstore:
      levels:
        - level: 0
          alias: SSD
          mediumtype: SSD
          path: /data1/cacheonly/worker
          type: hostPath
          quota: 10Gi

Note: The CACHE_ONLY Worker requires local disk storage for CACHE_ONLY data. This disk space is completely independent of the Alluxio Worker cache, so estimate the required capacity and reserve disk space accordingly.

Configuring Resource Usage

Configure cacheOnly.master.resources and cacheOnly.worker.resources in the same way as the top-level coordinator and worker fields.

apiVersion: k8s-operator.alluxio.com/v1
kind: AlluxioCluster
metadata:
  name: alluxio
spec:
  cacheOnly:
    enabled: true
    master:
      count: 1
      resources:
        limits:
          cpu: "8"
          memory: "40Gi"
        requests:
          cpu: "8"
          memory: "40Gi"
      jvmOptions:
        - "-Xmx24g"
        - "-Xms24g"
        - "-XX:MaxDirectMemorySize=8g"
    worker:
      count: 2
      resources:
        limits:
          cpu: "8"
          memory: "20Gi"
        requests:
          cpu: "8"
          memory: "20Gi"
      jvmOptions:
        - "-Xmx8g"
        - "-Xms8g"
        - "-XX:MaxDirectMemorySize=8g"

The recommended memory calculation is:

(${Xmx} + ${MaxDirectMemorySize}) * 1.1 <= ${requests} = ${limit}
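For example, the master and worker settings shown above both satisfy this rule. A quick sanity check of the arithmetic (an illustrative helper, not an official sizing tool):

```python
def memory_ok(xmx_gi: float, direct_gi: float,
              requests_gi: float, limit_gi: float) -> bool:
    """Check the rule (Xmx + MaxDirectMemorySize) * 1.1 <= requests == limit."""
    return (xmx_gi + direct_gi) * 1.1 <= requests_gi and requests_gi == limit_gi

# Master example above: -Xmx24g, -XX:MaxDirectMemorySize=8g, requests=limit=40Gi
print(memory_ok(24, 8, 40, 40))  # (24 + 8) * 1.1 = 35.2 <= 40 -> True

# Worker example above: -Xmx8g, -XX:MaxDirectMemorySize=8g, requests=limit=20Gi
print(memory_ok(8, 8, 20, 20))   # (8 + 8) * 1.1 = 17.6 <= 20 -> True
```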

Accessing CACHE_ONLY

Once CACHE_ONLY storage is deployed, all requests to its mount point will be treated as CACHE_ONLY requests. You can access CACHE_ONLY data in various ways.

Access using the Alluxio CLI:

bin/alluxio fs ls /cache-only

Access using the Alluxio FUSE interface:

cd ${fuse_mount_path}/cache-only
echo '123' > test.txt
cat test.txt

Enabling Async Persistence

For scenarios where temporary data written to Alluxio requires eventual persistence, Alluxio offers an async persistence mechanism. This allows data written to a CACHE_ONLY mount point to be asynchronously uploaded to a corresponding UFS path as configured.

This is especially useful in environments where immediate persistence is not necessary, but eventual consistency is desired.

Limitations

  1. Limited metadata operations: Only basic file persistence is supported; operations like renaming are not reliably handled.

  2. No UFS cleanup: Deleting files from Alluxio does not automatically remove the corresponding data from UFS.

  3. Weak recovery semantics: If the UFS and Alluxio versions of a file diverge, Alluxio cannot currently reconcile them.

  4. File modifications retrigger persistence: Modifying a file in Alluxio will schedule a new async persistence task, potentially creating inconsistent versions across Alluxio and UFS.

  5. Cache isolation: Files written via CACHE_ONLY and later persisted are not removed from CACHE_ONLY even after being written to UFS. Reading through the original CACHE_ONLY path will hit the CACHE_ONLY cache, while reading through the UFS path will use the standard Alluxio Worker pipeline — the two caches do not share data.

Enabling the feature

To enable the feature, you need to:

  1. Enable CACHE_ONLY, which is a prerequisite for async persistence.

  2. Set alluxio.gemini.master.async.upload.local.file.path and place the corresponding mapping JSON file at that path. Note that this must be set on both the Alluxio and the Alluxio CACHE_ONLY machines (see the instructions below).

  3. Enable the Alluxio coordinator (async persistence relies on the job service).

  4. Make sure the CACHE_ONLY masters can connect to ETCD.

  5. If you deploy with the operator, make sure the Alluxio properties are also set in the Alluxio CACHE_ONLY properties. Due to a current operator limitation, some operator-generated properties must be specified manually on the CACHE_ONLY components.
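The steps above can be sketched in the operator deployment file as follows (the coordinator field and the /opt/alluxio/conf path are illustrative assumptions; adjust to your deployment):

```yaml
apiVersion: k8s-operator.alluxio.com/v1
kind: AlluxioCluster
metadata:
  name: alluxio
spec:
  coordinator:
    enabled: true   # step 3: async persistence relies on the job service
  properties:
    # step 2: set on the Alluxio side (path is an assumed example)
    "alluxio.gemini.master.async.upload.local.file.path": "/opt/alluxio/conf/async-upload.json"
  cacheOnly:
    enabled: true   # step 1: CACHE_ONLY is a prerequisite
    properties:
      # steps 2 and 5: repeat the property on the CACHE_ONLY side manually
      "alluxio.gemini.master.async.upload.local.file.path": "/opt/alluxio/conf/async-upload.json"
```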

Configuration Options

| Property | Description | Default |
| --- | --- | --- |
| alluxio.gemini.master.async.upload.local.file.path | Path to the async upload path mapping JSON file | N/A |
| alluxio.gemini.master.persistence.checker.interval | Interval to check and update async persistence status | 1s |
| alluxio.gemini.master.persistence.scheduler.interval | Interval to schedule new async persistence tasks | 1s |

Async Upload Path Mapping Configuration File

The file path specified in alluxio.gemini.master.async.upload.local.file.path should be in JSON format. Example:

{
  "cacheOnlyMountPoint": "/cache-only",
  "asyncUploadPathMapping": {
    "/cache-only/a": "/s3/a",
    "/cache-only/b": "/local/c"
  },
  "blackList": [
    ".tmp"
  ]
}

Supported Keys

| Key | Required | Description |
| --- | --- | --- |
| cacheOnlyMountPoint | Yes | The mount point path for CACHE_ONLY storage |
| asyncUploadPathMapping | Yes | Key is the CACHE_ONLY sub-path; value is the Alluxio path to persist to (resolved by the mount table) |
| blackList | Optional | Simple filename pattern exclusion list (non-regex) |


Fault Tolerance

  1. Worker failure: If a CACHE_ONLY worker goes offline, Alluxio can retrieve data from other CACHE_ONLY workers that have replicas (if replication is enabled).

  2. Master failover: Metadata required for async persistence is stored in the Alluxio master journal. When a master fails, a standby master can recover the metadata by reading the journal.

  3. Coordinator restart: Async persistence is managed by the Alluxio Coordinator, which stores job state in a local RocksDB. The coordinator can resume ongoing jobs after a restart by reading the RocksDB state.

  4. Worker reassignment: If the worker responsible for uploading to UFS fails, the coordinator will reschedule the task to another worker.


Restoring Lost Data from UFS

When a file is lost from CACHE_ONLY, Alluxio can restore the data from UFS through the following mechanisms:

Restore Triggers

  1. File open with missing blocks: If a file opened through a CACHE_ONLY path has incomplete blocks, the Alluxio client will attempt to fetch the missing content from UFS and cache it.

  2. File read errors: If Alluxio encounters an error reading from CACHE_ONLY, the client will fall back to reading the file directly from UFS, without caching it back into Alluxio.

Preconditions for Restore

  • The file was previously stored in UFS via async persistence.

  • The modification time in UFS is newer than the one in Alluxio.

  • The file length in UFS matches that in Alluxio metadata.
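The three preconditions can be expressed as a single predicate. This is an illustrative sketch of the documented rules, not Alluxio's internal logic:

```python
def restorable_from_ufs(persisted_to_ufs: bool,
                        alluxio_mtime_ms: int, ufs_mtime_ms: int,
                        alluxio_length: int, ufs_length: int) -> bool:
    """True if a file lost from CACHE_ONLY is eligible for restore from UFS."""
    return (
        persisted_to_ufs                     # previously stored in UFS via async persistence
        and ufs_mtime_ms > alluxio_mtime_ms  # the UFS copy is newer than Alluxio's
        and ufs_length == alluxio_length     # length matches Alluxio metadata
    )

# Eligible: persisted, newer in UFS, identical length
print(restorable_from_ufs(True, 1_000, 2_000, 4096, 4096))  # True
# Not eligible: lengths diverge
print(restorable_from_ufs(True, 1_000, 2_000, 4096, 1024))  # False
```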

Advanced Configurations

Enabling Multi-Replica

CACHE_ONLY supports multi-replica writes. Enable this feature by adding the alluxio.gemini.user.file.replication configuration in the deployment file:

apiVersion: k8s-operator.alluxio.com/v1
kind: AlluxioCluster
metadata:
  name: alluxio
spec:
  properties:
    "alluxio.gemini.user.file.replication": "2"

Enabling Multipart Upload

Alluxio supports temporarily storing data in memory and uploading it to the CACHE_ONLY cluster in the background using multipart uploads to improve write performance. To enable this feature, add the following configurations:

apiVersion: k8s-operator.alluxio.com/v1
kind: AlluxioCluster
metadata:
  name: alluxio
spec:
  properties:
    "alluxio.gemini.user.file.cache.only.multipart.upload.enabled": "true"
    "alluxio.gemini.user.file.cache.only.multipart.upload.threads": "16"
    "alluxio.gemini.user.file.cache.only.multipart.upload.buffer.number": "16"

| Configuration Item | Default | Description |
| --- | --- | --- |
| alluxio.gemini.user.file.cache.only.multipart.upload.enabled | false | Enables the multipart upload feature |
| alluxio.gemini.user.file.cache.only.multipart.upload.threads | 16 | Maximum number of threads for multipart upload |
| alluxio.gemini.user.file.cache.only.multipart.upload.buffer.number | 16 | Number of memory buffers for multipart upload |

Note: Enabling multipart upload will significantly increase the memory usage of the Alluxio client. The memory usage is calculated as follows:

${alluxio.gemini.user.file.cache.only.multipart.upload.buffer.number} * 64MB
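With the default of 16 buffers, this works out to 16 * 64 MB = 1024 MB of additional client memory. A quick check of the formula (the 64 MB buffer size comes from the formula above):

```python
BUFFER_SIZE_MB = 64  # per-buffer size from the formula above

def multipart_client_memory_mb(buffer_number: int) -> int:
    """Extra client memory consumed by multipart upload buffers, in MB."""
    return buffer_number * BUFFER_SIZE_MB

print(multipart_client_memory_mb(16))  # 1024
```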

Cache Eviction

Files stored as CACHE_ONLY will not be automatically evicted. If capacity is nearly full, manually delete files to free up space: via Alluxio FUSE with rm ${file_path}, or with the Alluxio CLI command bin/alluxio fs rm ${file_path}.
