Optimizing Writes
Accelerate write-intensive workloads in Alluxio by decoupling writes from the under file system (UFS). This guide covers two primary methods for improving throughput and reducing latency: the Client-Side Write Cache (FUSE only) and the Cluster-Level Write Cache.
Client-Side Write Cache (FUSE Only)
This feature is experimental.
The Client-Side Write Cache, a FUSE-only feature, boosts write performance by buffering data to a local disk before asynchronously persisting it to the UFS. This approach minimizes network overhead and is ideal for performance-sensitive tasks.
Overview
Key benefits include:
High Throughput: Achieves write speeds approaching that of the local disk for large, sequential writes.
Fast Small File Creation: Allows for the rapid creation of many small files without waiting for UFS operations.
Use Cases and Limitations
Before using the Client-Side Write Cache, understand these critical trade-offs:
Data Loss Risk: Data on the local disk buffer will be lost if the disk fails before the asynchronous upload to the UFS is complete.
Hardware Dependency: The local buffer must be a high-performance SSD. Using a standard HDD will not yield significant performance gains.
Read-After-Write Consistency: A file written via the cache cannot be read until the asynchronous upload is complete.
Due to these limitations, the Client-Side Write Cache is best for performance-critical tasks that can tolerate a small risk of data loss, such as writing intermediate AI model checkpoints.
Enabling the Client-Side Write Cache
Configuration requires setting static properties on FUSE clients and defining dynamic path-based rules.
Static Configuration
Add the following to conf/alluxio-site.properties on each FUSE client node. Note that the property names use the internal term write.back.
# Enable the client-side write cache feature
alluxio.user.write.back.enabled=true
# Set the local directory for the write cache buffer (must be on an SSD)
alluxio.user.fuse.write.back.dir=/data/alluxio/writeback
# Define the local disk quota for the buffer (0 means no limit)
alluxio.user.fuse.write.back.dir.quota=1TB
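The buffer directory must exist on the SSD and be writable by the FUSE process before the client starts. A minimal preparation step, assuming the example path above:
# Create the write cache buffer directory on the SSD
mkdir -p /data/alluxio/writeback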
Dynamic Path-Based Configuration
Activate the cache for specific Alluxio paths using regular expressions. You can set these rules dynamically by sending a PUT request to the coordinator's REST API.
$ curl -sS 'http://<coordinator-host>:19999/api/v1/conf' -X PUT \
  -H 'Content-Type: application/json' \
  --data '{"key":"PathConfigEntity","conf":"{\"configs\":[{\"randomWriteEnabled\":false,\"localWriteBackEnabled\":true,\"uploadIntervalMs\":null,\"regexPattern\":\"/test/b/.*\"}]}"}'
The conf value is a JSON string defining the path-based rules. For example:
{
  "configs": [
    {
      "localWriteBackEnabled": true,
      "regexPattern": "/user/ai_user1/checkpoint/.*"
    },
    {
      "localWriteBackEnabled": true,
      "regexPattern": "/user/ai_user2/checkpoint/.*"
    }
  ]
}
Note: The API currently only supports replacing the entire configuration.
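Because each request replaces every rule, it can be convenient to keep the full rule set in a local file and generate the escaped payload from it. A minimal sketch, assuming the rules above are saved as rules.json and python3 is available on the client for JSON escaping:
# Wrap rules.json as the "conf" string expected by the endpoint, then PUT it
payload=$(python3 -c 'import json; print(json.dumps({"key": "PathConfigEntity", "conf": json.dumps(json.load(open("rules.json")))}))')
curl -sS "http://<coordinator-host>:19999/api/v1/conf" -X PUT \
  -H 'Content-Type: application/json' --data "$payload"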
Degraded Mode on Insufficient Space
To prevent errors when the local disk quota is exceeded, you can configure the cache to fall back to synchronous writing. In this "degraded mode," a write that finds the buffer full blocks until buffered files have been uploaded to the UFS and space is freed; subsequent writes then bypass the buffer and go directly to the UFS until the buffer returns to a healthy state.
Enable this fallback behavior with the following property:
alluxio.user.fuse.write.back.degraded.sync.write.on.insufficient.space=true
Accelerating Small File Writes
To speed up writing many small files, enable the bloom filter, which avoids costly UFS checks by quickly determining that a file does not exist.
alluxio.user.fuse.write.back.status.bloom.filter.enabled=true
The bloom filter has a default capacity of 10 million items and refreshes every 5 minutes. If you observe a high false-positive rate (via the alluxio_write_back_bloom_filter_fpp metric), you can shorten the refresh period:
alluxio.user.fuse.write.back.status.bloom.filter.refresh.period=1min
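How you read that metric depends on your metrics setup. With a Prometheus-style endpoint exposed for the FUSE client (an assumption about your deployment; the host and port are placeholders), a spot check could look like:
# Hypothetical metrics endpoint; substitute whatever your deployment exposes
curl -s http://<fuse-client-host>:<metrics-port>/metrics | grep alluxio_write_back_bloom_filter_fpp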
By default, empty files are flushed to the UFS synchronously. If you need to create many empty files quickly, you can disable this synchronous flush:
alluxio.user.fuse.write.back.sync.flush.empty.file=false
Handling Upload Failures
If a file fails to upload to the UFS three consecutive times, it is marked as failed and moved to an UPLOAD_FAILED subdirectory within your cache directory. These files are not deleted automatically, to prevent data loss. Monitor the alluxio_upload_manager_upload_failed_files metric and handle these files manually.
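A manual recovery pass might list the failed files and re-copy them into a UFS-backed Alluxio path. A minimal sketch, assuming the cache directory from the static configuration above and the 2.x-style copyFromLocal command; the file and destination names are hypothetical:
# Inspect files that failed to upload
ls -lR /data/alluxio/writeback/UPLOAD_FAILED
# Re-copy a failed file through Alluxio so it persists to the UFS
bin/alluxio fs copyFromLocal /data/alluxio/writeback/UPLOAD_FAILED/model.ckpt /user/ai_user1/checkpoint/model.ckpt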
Cluster-Level Write Cache
This feature is experimental.
For workloads where UFS performance is a major bottleneck, the Cluster-Level Write Cache offers maximum write performance. Data is written directly to a dedicated set of Alluxio components that form the write cache, completely bypassing the UFS for the initial write.
This feature is ideal for temporary data that does not require immediate persistence, such as:
AI model training checkpoints.
Shuffle data from large-scale data processing jobs.
Overview
The Cluster-Level Write Cache is a storage cluster managed independently from standard Alluxio workers. Clients interact directly with the cache components for any paths mounted for the write cache.
Deploying the Cluster-Level Write Cache
Deploy the Cluster-Level Write Cache using the Alluxio Operator for Kubernetes. The deployment is configured under the cacheOnly section in your AlluxioCluster definition, which reflects the internal component name.
apiVersion: k8s-operator.alluxio.com/v1
kind: AlluxioCluster
metadata:
  name: alluxio
spec:
  image: <PRIVATE_REGISTRY>/alluxio-enterprise
  imageTag: <TAG>
  # ... other configurations ...
  cacheOnly:
    enabled: true
    mountPath: "/write-cache"
    image: <PRIVATE_REGISTRY>/alluxio-cacheonly
    imageTag: <TAG>
    license: <YOUR_BASE64_ENCODED_LICENSE>
    journal:
      storageClass: "gp2"
    worker:
      count: 2
    tieredstore:
      levels:
        - level: 0
          alias: SSD
          mediumtype: SSD
          path: /data1/cacheonly/worker
          type: hostPath
          quota: 10Gi
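With the manifest saved (the file name below is a placeholder), apply it as you would any other custom resource and confirm the pods start:
# Create or update the cluster; the operator picks up the cacheOnly section
kubectl apply -f alluxio-cluster.yaml
# Verify the write cache pods reach the Running state
kubectl get pods -n <namespace>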
Accessing the Cluster-Level Write Cache
Once deployed, all requests to the configured mountPath (e.g., /write-cache) are routed to the Cluster-Level Write Cache.
Access via Alluxio CLI:
bin/alluxio fs ls /write-cache
Access via Alluxio FUSE:
cd ${fuse_mount_path}/write-cache
echo '123' > test.txt
cat test.txt
Asynchronous Persistence to UFS
For data that eventually needs to be persisted, Alluxio offers an optional async persistence mechanism to upload data from a cache path to a corresponding UFS path.
Limitations
Metadata Operations: Only basic file persistence is supported; operations like rename are not reliably handled.
No UFS Cleanup: Deleting a file from the cache does not delete it from the UFS.
No Reconciliation: Alluxio cannot automatically reconcile diverging versions of a file between the cache and the UFS.
Configuration
To enable async persistence, configure a path mapping file on both the standard Alluxio masters and the masters for the Cluster-Level Write Cache.
Set the alluxio.gemini.master.async.upload.local.file.path property in alluxio-site.properties to the path of the async upload path mapping JSON file.
Create the JSON mapping file. This file defines the mappings from cache paths to UFS-backed Alluxio paths. Note that the cacheOnlyMountPoint key is required.
{
  "cacheOnlyMountPoint": "/write-cache",
  "asyncUploadPathMapping": {
    "/write-cache/a": "/s3/a",
    "/write-cache/b": "/local/c"
  },
  "blackList": [".tmp"]
}
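Tying the two steps together, a minimal sketch of the site property pointing at the mapping file; the file location here is a hypothetical choice:
# In alluxio-site.properties on both the standard and write cache masters
alluxio.gemini.master.async.upload.local.file.path=/opt/alluxio/conf/async-upload-mapping.json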
Advanced Configurations
Multi-Replica Writes
Enable multi-replica writes for fault tolerance within the Cluster-Level Write Cache by setting the replication factor.
# In your AlluxioCluster spec:
properties:
  "alluxio.gemini.user.file.replication": "2"
Multipart Upload
Improve write performance by buffering data in client memory and uploading it in the background. Note that the property names use the internal cache.only term.
# In your AlluxioCluster spec:
properties:
  "alluxio.gemini.user.file.cache.only.multipart.upload.enabled": "true"
  "alluxio.gemini.user.file.cache.only.multipart.upload.threads": "16"
  "alluxio.gemini.user.file.cache.only.multipart.upload.buffer.number": "16"
Note: This feature significantly increases client memory usage.
Cache Eviction
Files in the Cluster-Level Write Cache are not evicted automatically. You must manually delete them using rm (over FUSE) or alluxio fs rm to free up space.
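For example, to reclaim space once a job's intermediate data is no longer needed (the path is hypothetical):
# Recursively remove data that no longer needs to be cached
bin/alluxio fs rm -R /write-cache/shuffle/job-0042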