Optimizing Writes

Accelerate write-intensive workloads in Alluxio by decoupling writes from the under file system (UFS). This guide covers two primary methods for improving throughput and reducing latency: a Client-Side Write Cache (FUSE only) and a Cluster-Level Write Cache.

Client-Side Write Cache (FUSE Only)

The Client-Side Write Cache, a FUSE-only feature, boosts write performance by buffering data to a local disk before asynchronously persisting it to the UFS. This approach minimizes network overhead and is ideal for performance-sensitive tasks.

Overview

Key benefits include:

  • High Throughput: Achieves write speeds approaching those of the local disk for large, sequential writes.

  • Fast Small File Creation: Allows for the rapid creation of many small files without waiting for UFS operations.

Use Cases and Limitations

Before using the Client-Side Write Cache, understand these critical trade-offs:

  • Data Loss Risk: Data on the local disk buffer will be lost if the disk fails before the asynchronous upload to the UFS is complete.

  • Hardware Dependency: The local buffer must be a high-performance SSD. Using a standard HDD will not yield significant performance gains.

  • Read-After-Write Consistency: A file written through the cache cannot be read back until its asynchronous upload to the UFS completes.

Due to these limitations, the Client-Side Write Cache is best for performance-critical tasks that can tolerate a small risk of data loss, such as writing intermediate AI model checkpoints.

Enabling the Client-Side Write Cache

Configuration requires setting static properties on FUSE clients and defining dynamic path-based rules.

Static Configuration

Add the following to conf/alluxio-site.properties on each FUSE client node. Note that the property names use the internal term write.back.

# Enable the client-side write cache feature
alluxio.user.write.back.enabled=true
# Set the local directory for the write cache buffer (must be on an SSD)
alluxio.user.fuse.write.back.dir=/data/alluxio/writeback
# Define the local disk quota for the buffer (0 means no limit)
alluxio.user.fuse.write.back.dir.quota=1TB
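
Before starting the FUSE client, it is worth confirming that the buffer directory exists and actually sits on the intended SSD. A minimal sketch, reusing the /data/alluxio/writeback path from the configuration above:

mkdir -p /data/alluxio/writeback
# Confirm which device backs the directory; it should be your SSD
df -h /data/alluxio/writeback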

Dynamic Path-Based Configuration

Activate the cache for specific Alluxio paths using regular expressions. You can set these rules dynamically by sending a PUT request to the coordinator's REST API.

curl -sS 'http://<coordinator-host>:19999/api/v1/conf' -X PUT \
  -H 'Content-Type: application/json' \
  --data '{"key":"PathConfigEntity","conf":"{\"configs\":[{\"randomWriteEnabled\":false,\"localWriteBackEnabled\":true,\"uploadIntervalMs\":null,\"regexPattern\":\"/test/b/.*\"}]}"}'

The conf value is a JSON string defining the path-based rules. For example:

{
  "configs": [
    {
      "localWriteBackEnabled": true,
      "regexPattern": "/user/ai_user1/checkpoint/.*"
    },
    {
      "localWriteBackEnabled": true,
      "regexPattern": "/user/ai_user2/checkpoint/.*"
    }
  ]
}

Note: The API currently only supports replacing the entire configuration.
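
Because each PUT replaces all rules, keep the complete rule set in a local file and resubmit the whole file whenever a rule changes. A sketch using jq to escape the rules into the conf string, assuming the JSON above is saved as rules.json (an illustrative filename):

# Stringify rules.json into a JSON-quoted value for the "conf" field
CONF=$(jq -c . rules.json | jq -Rs .)
curl -sS 'http://<coordinator-host>:19999/api/v1/conf' -X PUT \
  -H 'Content-Type: application/json' \
  --data "{\"key\":\"PathConfigEntity\",\"conf\":$CONF}"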

Degraded Mode on Insufficient Space

To prevent write failures when the local disk quota is exhausted, you can configure the cache to fall back to synchronous writing. In this "degraded mode," in-flight writes block until buffered files have been uploaded to the UFS and space is freed; subsequent writes then bypass the buffer and go directly to the UFS until the buffer returns to a healthy state.

Enable this fallback behavior with the following property:

alluxio.user.fuse.write.back.degraded.sync.write.on.insufficient.space=true
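
Degraded mode is a safety valve rather than a steady state, so it helps to watch buffer usage against the quota. A simple sketch, reusing the directory and 1TB quota from the static configuration above:

# Report buffer usage every 10 seconds; compare it against the configured quota
watch -n 10 'du -sh /data/alluxio/writeback'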

Accelerating Small File Writes

To speed up writing many small files, enable a Bloom filter. It avoids costly UFS existence checks by quickly determining that a file does not exist.

alluxio.user.fuse.write.back.status.bloom.filter.enabled=true

The Bloom filter has a default capacity of 10 million items and refreshes every 5 minutes. If you observe a high false-positive rate (via the alluxio_write_back_bloom_filter_fpp metric), you can shorten the refresh period:

alluxio.user.fuse.write.back.status.bloom.filter.refresh.period=1min
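
How you read the metric depends on your monitoring stack. The sketch below assumes the metrics are scraped into Prometheus (an assumption about your deployment) and uses the standard Prometheus query API with the metric name documented above:

curl -sG 'http://<prometheus-host>:9090/api/v1/query' \
  --data-urlencode 'query=alluxio_write_back_bloom_filter_fpp'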

By default, empty files are flushed to the UFS synchronously. To accelerate the creation of many empty files, you can disable this synchronous flush:

alluxio.user.fuse.write.back.sync.flush.empty.file=false

Handling Upload Failures

If a file fails to upload to the UFS three consecutive times, it is marked as failed and moved to an UPLOAD_FAILED subdirectory within your cache directory. These files are not deleted automatically to prevent data loss. Monitor the alluxio_upload_manager_upload_failed_files metric and handle these files manually.
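
One manual recovery approach is to list the failed files and re-upload them yourself. A sketch, reusing the cache directory from the static configuration above and assuming an S3 UFS (bucket and paths are placeholders):

# List files whose uploads failed three consecutive times
ls -lR /data/alluxio/writeback/UPLOAD_FAILED
# Re-upload a file manually, e.g. with the AWS CLI for an S3 UFS
aws s3 cp /data/alluxio/writeback/UPLOAD_FAILED/<file> s3://<bucket>/<path>/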

Cluster-Level Write Cache

For workloads where UFS performance is a major bottleneck, the Cluster-Level Write Cache offers maximum write performance. Data is written directly to a dedicated set of Alluxio components that form the write cache, completely bypassing the UFS for the initial write.

This feature is ideal for temporary data that does not require immediate persistence, such as:

  • AI model training checkpoints.

  • Shuffle data from large-scale data processing jobs.

Overview

The Cluster-Level Write Cache is a storage cluster managed independently of the standard Alluxio workers. Clients interact directly with the cache components for any path mounted under the write cache.

Deploying the Cluster-Level Write Cache

Deploy the Cluster-Level Write Cache using the Alluxio Operator for Kubernetes. The deployment is configured under the cacheOnly section in your AlluxioCluster definition, which reflects the internal component name.

apiVersion: k8s-operator.alluxio.com/v1
kind: AlluxioCluster
metadata:
  name: alluxio
spec:
  image: <PRIVATE_REGISTRY>/alluxio-enterprise
  imageTag: <TAG>
  
  # ... other configurations ...

  cacheOnly:
    enabled: true
    mountPath: "/write-cache"
    image: <PRIVATE_REGISTRY>/alluxio-cacheonly
    imageTag: <TAG>
    license: <YOUR_BASE64_ENCODED_LICENSE>
    
    journal:
      storageClass: "gp2"

    worker:
      count: 2
    tieredstore:
      levels:
        - level: 0
          alias: SSD
          mediumtype: SSD
          path: /data1/cacheonly/worker
          type: hostPath
          quota: 10Gi
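
After saving the spec (for example as alluxio-cluster.yaml, an illustrative filename), apply it and check that the write cache components start alongside the standard ones:

kubectl apply -f alluxio-cluster.yaml
# Pod names vary by deployment; look for the cache-only master and worker pods
kubectl get pods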

Accessing the Cluster-Level Write Cache

Once deployed, all requests to the configured mountPath (e.g., /write-cache) are routed to the Cluster-Level Write Cache.

Access via Alluxio CLI:

bin/alluxio fs ls /write-cache

Access via Alluxio FUSE:

cd ${fuse_mount_path}/write-cache
echo '123' > test.txt
cat test.txt
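
Both entry points address the same namespace, so a file written through FUSE should also be visible from the CLI (continuing the example above):

bin/alluxio fs ls /write-cache/test.txt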

Asynchronous Persistence to UFS

For data that eventually needs to be persisted, Alluxio offers an optional async persistence mechanism to upload data from a cache path to a corresponding UFS path.

Limitations

  • Metadata Operations: Only basic file persistence is supported; operations like rename are not reliably handled.

  • No UFS Cleanup: Deleting a file from the cache does not delete it from the UFS.

  • No Reconciliation: Alluxio cannot automatically reconcile diverging versions of a file between the cache and the UFS.

Configuration

To enable async persistence, configure a path mapping file on both the standard Alluxio masters and the masters for the Cluster-Level Write Cache.

  1. Set the property in alluxio-site.properties (see the example after this list):

    Property: alluxio.gemini.master.async.upload.local.file.path
    Description: Path to the async upload path mapping JSON file.

  2. Create the JSON mapping file. This file defines the mappings from cache paths to UFS-backed Alluxio paths. Note the cacheOnlyMountPoint key is required.

    {
      "cacheOnlyMountPoint": "/write-cache",
      "asyncUploadPathMapping": {
        "/write-cache/a": "/s3/a",
        "/write-cache/b": "/local/c"
      },
      "blackList": [
        ".tmp"
      ]
    }
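
For example, assuming the mapping file above is saved at /etc/alluxio/async-upload-mapping.json (an illustrative location), point both sets of masters at it:

alluxio.gemini.master.async.upload.local.file.path=/etc/alluxio/async-upload-mapping.json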

Advanced Configurations

Multi-Replica Writes

Enable multi-replica writes for fault tolerance within the Cluster-Level Write Cache by setting the replication factor.

# In your AlluxioCluster spec:
properties:
  "alluxio.gemini.user.file.replication": "2"

Multipart Upload

Improve write performance by buffering data in client memory and uploading it in the background. Note that property names use the internal cache.only term.

# In your AlluxioCluster spec:
properties:
  "alluxio.gemini.user.file.cache.only.multipart.upload.enabled": "true"
  "alluxio.gemini.user.file.cache.only.multipart.upload.threads": "16"
  "alluxio.gemini.user.file.cache.only.multipart.upload.buffer.number": "16"

Note: This feature significantly increases client memory usage.

Cache Eviction

Files in the Cluster-Level Write Cache are not evicted automatically. You must manually delete them, using rm through a FUSE mount or alluxio fs rm through the CLI, to free up space.
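
For example, to reclaim space once a training run's checkpoints are no longer needed (the path is illustrative):

bin/alluxio fs rm -R /write-cache/checkpoints/run1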
