# Optimizing Reads
Alluxio provides several advanced features to accelerate file reading performance. Whether you are working with sequential access patterns, very large files, or specialized AI/ML workloads, you can tune Alluxio to meet your needs. This guide covers the key mechanisms for optimizing read operations.
## Tune Client-Side Prefetching for Sequential Reads
For sequential file reads, the Alluxio client automatically prefetches data that is likely to be read next. This data is cached in the client's local memory, allowing subsequent read requests to be served directly from the client without needing a network request to a worker.
The prefetch window is self-adjusting: it increases during continuous sequential reads and decreases for non-continuous or random reads. For completely random access patterns, the prefetch window will eventually shrink to zero.
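No application changes are needed to benefit from prefetching; any sequential read through the Alluxio client triggers it. As a minimal sketch, assuming the Alluxio Java client API (`FileSystem`, `FileInStream`) and an illustrative file path:

```java
import alluxio.AlluxioURI;
import alluxio.client.file.FileInStream;
import alluxio.client.file.FileSystem;

// Minimal sketch: plain sequential reads through the Alluxio Java client.
// The client grows the prefetch window as consecutive reads continue, so
// later reads are often served from client-local memory.
public class SequentialReadExample {
  public static void main(String[] args) throws Exception {
    FileSystem fs = FileSystem.Factory.get();
    // Path is illustrative
    try (FileInStream in = fs.openFile(new AlluxioURI("/data/train/part-00000"))) {
      byte[] buf = new byte[4 * 1024 * 1024]; // matches the default 4MB prefetch unit
      long total = 0;
      int n;
      while ((n = in.read(buf)) != -1) {
        total += n; // each consecutive sequential read extends the prefetch window
      }
      System.out.println("Read " + total + " bytes");
    }
  }
}
```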
### Standard Prefetching Configuration
Client-side prefetching is enabled by default. You can tune its behavior using the following properties in `conf/alluxio-site.properties`:
| Property | Default | Description |
| --- | --- | --- |
| `alluxio.user.position.reader.streaming.async.prefetch.thread` | `64` | The overall concurrency of the async prefetch thread pool. |
| `alluxio.user.position.reader.streaming.async.prefetch.part.length` | `4MB` | The size of each prefetch unit. |
| `alluxio.user.position.reader.streaming.async.prefetch.max.part.number` | `8` | The maximum number of units a single opened file can have. For example, with a 4MB unit size and a maximum of 8 units, Alluxio prefetches up to 32MB of data ahead for a single file. |
| `alluxio.user.position.reader.streaming.async.prefetch.file.length.threshold` | `4MB` | If a file's size is below this threshold, Alluxio immediately maximizes the prefetch window instead of starting small, which improves read performance for small files. |
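In `conf/alluxio-site.properties`, these settings look as follows; for instance, doubling `max.part.number` to 16 would raise the maximum window from 32MB to 64MB:

```properties
# Defaults shown; widen the window by raising the unit size or unit count
alluxio.user.position.reader.streaming.async.prefetch.thread=64
alluxio.user.position.reader.streaming.async.prefetch.part.length=4MB
alluxio.user.position.reader.streaming.async.prefetch.max.part.number=8
alluxio.user.position.reader.streaming.async.prefetch.file.length.threshold=4MB
```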
### Using the Slow Prefetch Pool for Mixed Workloads
Some workloads, like cold reads, may benefit from different prefetching parameters (e.g., higher concurrency) than others. Alluxio provides a secondary, "slow" prefetch pool that can be configured independently for these scenarios.
To enable and configure the slow prefetch pool, set the following properties:
| Property | Value | Description |
| --- | --- | --- |
| `alluxio.user.position.reader.streaming.async.prefetch.use.slow.thread.pool` | `true` | Set to `true` to enable the slow prefetch pool. |
| `alluxio.user.position.reader.streaming.async.prefetch.use.slow.thread.pool.for.cold.read` | `true` | If `true`, the slow pool is used for cold reads. Otherwise, it is used only for cache filter reads. |
| `alluxio.user.position.reader.streaming.slow.async.prefetch.thread` | `256` | The overall async prefetch concurrency of the slow pool. |
| `alluxio.user.position.reader.streaming.slow.async.prefetch.part.length` | `1MB` | The size of each prefetch unit used by the slow pool. |
| `alluxio.user.position.reader.streaming.slow.async.prefetch.max.part.number` | `64` | The maximum number of units a single opened file can have in the slow pool. |
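Taken together, a slow-pool configuration using the values above looks like this in `conf/alluxio-site.properties`:

```properties
alluxio.user.position.reader.streaming.async.prefetch.use.slow.thread.pool=true
alluxio.user.position.reader.streaming.async.prefetch.use.slow.thread.pool.for.cold.read=true
alluxio.user.position.reader.streaming.slow.async.prefetch.thread=256
alluxio.user.position.reader.streaming.slow.async.prefetch.part.length=1MB
alluxio.user.position.reader.streaming.slow.async.prefetch.max.part.number=64
```

With these values, cold reads prefetch in 1MB units with up to 64 units in flight per file (up to 64MB ahead), while hot reads continue to use the standard pool.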
## Accelerate Large File Reads
Reading very large files presents unique challenges, such as initial cold read latency and single-worker bottlenecks. Alluxio offers two features to address these issues.
### Preloading Entire Files for Faster Cold Reads
For workloads that read an entire large file from beginning to end, you can enable full-file preloading. When a client begins reading a large file for the first time, this feature triggers Alluxio workers to concurrently load the entire file from the UFS into the cache. This can make cold read performance nearly as fast as a fully cached hot read.
> **Note:** This feature causes read amplification if your application only reads a small portion of the file, as the entire file is loaded into Alluxio regardless.
To enable this feature, set the following properties:
| Property | Value | Description |
| --- | --- | --- |
| `alluxio.user.position.reader.preload.data.enabled` | `true` | Set to `true` to enable large file preloading. |
| `alluxio.user.position.reader.preload.data.file.size.threshold.min` | `1GB` | The minimum file size that triggers the async preload. |
| `alluxio.user.position.reader.preload.data.file.size.threshold.max` | `200GB` | The maximum file size that triggers the async preload. This prevents extremely large files from filling the entire cache and causing excessive evictions. |
| `alluxio.worker.preload.data.thread.pool.size` | `64` | The number of concurrent jobs each worker uses to load parts of the file from the UFS in parallel. For example, with a 4MB page size and 64 threads, a worker can load up to 256MB per iteration. |
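For example, the following `conf/alluxio-site.properties` entries preload files between 1GB and 200GB in full on their first read:

```properties
alluxio.user.position.reader.preload.data.enabled=true
alluxio.user.position.reader.preload.data.file.size.threshold.min=1GB
alluxio.user.position.reader.preload.data.file.size.threshold.max=200GB
# Worker-side parallelism for loading file parts from the UFS
alluxio.worker.preload.data.thread.pool.size=64
```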
### Segmenting Large Files Across Multiple Workers
By default, an entire file is cached on a single Alluxio worker. For very large files, this can create a bottleneck if multiple clients request the same file. File segmentation solves this by breaking a large file into smaller, independent segments, each of which can be cached on a different worker. When a client reads the file, it can fetch different segments from multiple workers in parallel, dramatically increasing read throughput.
#### How It Works
A file is treated as an ordered list of segments. Each segment is identified by a unique Segment ID, which is a combination of the original file's ID and the segment's index within the file.
```
Segment ID := (fileId, segmentIndex)
```
When a client needs to read part of a file, Alluxio uses this Segment ID—not the file ID—as the key for its worker selection algorithm. This ensures that different segments are mapped to different workers.
When a client reads a region that spans multiple segments, the request is broken down, and each segment is read from its corresponding worker. This parallelizes the I/O and distributes the load across the cluster.
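The sketch below illustrates this mapping in Java. The names (`SegmentId`, `workerForRead`) and the hash-based selection are simplified stand-ins for Alluxio's actual worker selection algorithm, not its real internals:

```java
import java.util.List;
import java.util.Objects;

// Hypothetical sketch of segment-aware worker selection; not Alluxio's real API.
public final class SegmentedWorkerSelection {
  /** A segment is keyed by (fileId, segmentIndex) rather than by fileId alone. */
  record SegmentId(String fileId, long segmentIndex) {}

  /** Map a read at byte `offset` of file `fileId` to a worker. */
  static String workerForRead(String fileId, long offset, long segmentSize, List<String> workers) {
    long segmentIndex = offset / segmentSize; // which segment the offset falls in
    SegmentId key = new SegmentId(fileId, segmentIndex);
    // Stand-in for Alluxio's worker selection, keyed by the segment, not the file
    int idx = Math.floorMod(Objects.hash(key.fileId(), key.segmentIndex()), workers.size());
    return workers.get(idx);
  }

  public static void main(String[] args) {
    List<String> workers = List.of("worker-0", "worker-1", "worker-2");
    long oneGiB = 1L << 30; // default segment size
    // Reads into different 1 GiB segments of the same file map to different workers.
    System.out.println(workerForRead("fileA", 0, oneGiB, workers));               // segment 0
    System.out.println(workerForRead("fileA", 5 * oneGiB + 42, oneGiB, workers)); // segment 5
  }
}
```

In a real cluster the selection is consistent-hashing based so that membership changes move few segments; the essential point is only that the key includes the segment index, so segments of one file spread across workers.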
#### Limitations
- Files created and written directly by clients into Alluxio cannot be segmented.
- The segment size is a cluster-wide setting and cannot be configured on a per-file basis.
#### Enabling File Segmentation
To enable this feature, set the following properties on all Alluxio nodes (masters, workers, and clients):
| Property | Value | Description |
| --- | --- | --- |
| `alluxio.dora.file.segment.read.enabled` | `true` | Set to `true` to enable file segmentation. |
| `alluxio.dora.file.segment.size` | (depends on use case) | The size of each segment. Defaults to 1 GiB. |
Choosing the right segment size is a trade-off. If segments are too small, clients may switch between workers too frequently, underutilizing network bandwidth. If they are too large, the risk of uneven cache distribution across workers increases. A common range is from several gigabytes to tens of gigabytes.
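For example, to enable segmentation with a 4GB segment size (an illustrative value within that common range, not a recommendation):

```properties
# Set on masters, workers, and clients alike
alluxio.dora.file.segment.read.enabled=true
# Illustrative size; tune per the trade-off above (default is 1 GiB)
alluxio.dora.file.segment.size=4GB
```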
## Optimize for AI Model File Loading
Alluxio is highly effective at accelerating AI model file loading, a common bottleneck in production machine learning systems. In a typical workflow, trained models are stored in a central UFS, and online inference services need to load them quickly to serve predictions. These model files are often large, and conventional filesystems can struggle with the high-frequency, concurrent read requests from many service replicas, leading to traffic spikes and slow startup times.
By using Alluxio as a caching layer—often accessed via Alluxio FUSE to present models as a local filesystem—you can dramatically improve model loading speed and reduce load on the UFS.
While standard client prefetching is often sufficient, you can enable enhanced prefetching logic designed specifically for the high-concurrency reads common in model serving. When multiple services read the same model file through a single Alluxio FUSE instance, this feature can provide up to a 3x performance improvement.
To enable this optimization, set the following properties:

```properties
alluxio.user.position.reader.streaming.async.prefetch.by.file.enabled=true
alluxio.user.position.reader.streaming.async.prefetch.shared.cache.enabled=true
```