File Reading
This document introduces features that improve Alluxio file reading performance in specific scenarios.
Client Prefetching
If the current file is being read sequentially, the Alluxio client will prefetch a range of data after the current read position and start to cache this data on the client. When the current read reaches the cached data, the Alluxio client will return the cached data instead of sending an RPC to the worker.
The prefetch window is self-adjusting. If each read starts at the end of the previous read, the prefetch window will increase. If the reads are not contiguous, the prefetch window will decrease. If the reads are completely random, the prefetch window will eventually shrink to 0.
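The window-adjustment behavior described above can be sketched as follows. This is an illustrative model only; the class name, growth/shrink factors, and cap are hypothetical and do not reflect Alluxio's actual implementation.

```python
class PrefetchWindow:
    """Illustrative sketch of a self-adjusting prefetch window (hypothetical)."""

    def __init__(self, initial=4 * 1024 * 1024, maximum=32 * 1024 * 1024):
        self.size = initial      # current prefetch window in bytes
        self.maximum = maximum   # upper bound on the window
        self.last_end = 0        # end offset of the previous read

    def on_read(self, offset, length):
        if offset == self.last_end:
            # Sequential read: grow the window, capped at the maximum
            self.size = min(self.size * 2, self.maximum)
        else:
            # Non-contiguous read: shrink the window; fully random access
            # eventually drives it to 0
            self.size //= 2
        self.last_end = offset + length
        return self.size

w = PrefetchWindow()
w.on_read(0, 1024)     # sequential start: window grows
w.on_read(1024, 1024)  # contiguous with the previous read: window grows again
```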
Async prefetch caches data in the client's direct memory. Performance can be improved by increasing the direct memory allocated to the JVM process.
The client async prefetch is always enabled. The following parameters allow the user to tune the feature.
| Property | Value | Description |
| --- | --- | --- |
| `alluxio.user.position.reader.streaming.async.prefetch.thread` | `64` | The overall async prefetch concurrency. |
| `alluxio.user.position.reader.streaming.async.prefetch.part.length` | `4MB` | The size of each prefetch unit. |
| `alluxio.user.position.reader.streaming.async.prefetch.max.part.number` | `8` | The maximum number of units a single opened file can have. For example, if the prefetch unit size is 4MB and the maximum number of units is 8, Alluxio will prefetch at most 32MB of data ahead for an opened file. |
| `alluxio.user.position.reader.streaming.async.prefetch.file.length.threshold` | `4MB` | If the file size is less than this threshold, Alluxio maxes out the prefetch window immediately instead of starting with a small window. This improves small file read performance. |
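For example, to let a client prefetch further ahead during long sequential scans, these properties could be set in `alluxio-site.properties` (the values below are illustrative, not recommendations):

```properties
# Illustrative tuning: 8MB units x 16 units = up to 128MB prefetched per open file
alluxio.user.position.reader.streaming.async.prefetch.thread=128
alluxio.user.position.reader.streaming.async.prefetch.part.length=8MB
alluxio.user.position.reader.streaming.async.prefetch.max.part.number=16
```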
Enable the slow async prefetch pool
Users may have situations that require different async prefetch parameters, such as cold reads versus cache filter reads.
Cold reads usually require more concurrency to saturate the network bandwidth and achieve the best performance. Alluxio provides a secondary async prefetch pool dedicated to alternative configurations, labeled the slow thread pool. To enable and configure this secondary pool, set the following configuration:
| Property | Value | Description |
| --- | --- | --- |
| `alluxio.user.position.reader.streaming.async.prefetch.use.slow.thread.pool` | `true` | Set to `true` to enable the slow pool. |
| `alluxio.user.position.reader.streaming.async.prefetch.use.slow.thread.pool.for.cold.read` | `true` | If set to `true`, the slow pool is also used for cold reads. Otherwise, the slow pool is only used for cache filter reads. |
| `alluxio.user.position.reader.streaming.slow.async.prefetch.thread` | `256` | The overall async prefetch concurrency for the slow pool. |
| `alluxio.user.position.reader.streaming.slow.async.prefetch.part.length` | `1MB` | The size of each prefetch unit used by the slow pool. |
| `alluxio.user.position.reader.streaming.slow.async.prefetch.max.part.number` | `64` | The maximum number of units a single opened file can have in the slow pool. |
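Putting the slow pool settings together, an `alluxio-site.properties` fragment might look like this (values mirror those listed above; whether they suit a given workload depends on the deployment):

```properties
# Enable the secondary (slow) prefetch pool and route cold reads through it
alluxio.user.position.reader.streaming.async.prefetch.use.slow.thread.pool=true
alluxio.user.position.reader.streaming.async.prefetch.use.slow.thread.pool.for.cold.read=true
alluxio.user.position.reader.streaming.slow.async.prefetch.thread=256
alluxio.user.position.reader.streaming.slow.async.prefetch.part.length=1MB
alluxio.user.position.reader.streaming.slow.async.prefetch.max.part.number=64
```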
Large File Reading
Cold Read Optimization
Large file preload is an optimization for cold reads of large files. When the feature is enabled, Alluxio concurrently loads the whole file into Alluxio workers as soon as a client starts reading it. When running the FIO benchmark on a single 100GB file stored on S3, cold read performance with this feature is comparable to that of a fully cached hot read.
Deduplication is handled on both the client and worker side to avoid excessive RPC calls and redundant traffic to the UFS. Note that since Alluxio always fully loads the file, this feature can cause read amplification if the application does not need to read the whole file.
To enable this feature, set the following configuration:
| Property | Value | Description |
| --- | --- | --- |
| `alluxio.user.position.reader.preload.data.enabled` | `true` | Set to `true` to enable large file preloading. |
| `alluxio.user.position.reader.preload.data.file.size.threshold.min` | `1GB` | The minimum file size that triggers the async preload. |
| `alluxio.user.position.reader.preload.data.file.size.threshold.max` | `200GB` | The maximum file size that triggers the async preload. This is useful to avoid loading extremely large files that would completely fill up the page store capacity and trigger cache eviction. |
| `alluxio.worker.preload.data.thread.pool.size` | `64` | The number of concurrent jobs on the worker loading the file's data from the UFS in parallel. Each job loads one page into Alluxio. For example, if the page size is 4MB and this is set to 64, the worker loads 256MB concurrently per iteration. |
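As a complete example, enabling preload with the thresholds listed above might look like this in `alluxio-site.properties` (illustrative values; tune the thresholds to the workload and the workers' page store capacity):

```properties
# Preload whole files between 1GB and 200GB on first read
alluxio.user.position.reader.preload.data.enabled=true
alluxio.user.position.reader.preload.data.file.size.threshold.min=1GB
alluxio.user.position.reader.preload.data.file.size.threshold.max=200GB
# Worker-side load concurrency (one page per job)
alluxio.worker.preload.data.thread.pool.size=64
```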
Large File Segmentation
Each file in Alluxio has its own unique file ID. The file ID is used as the key to a worker selection algorithm that determines which worker is responsible for caching the metadata and data of that file. The identical algorithm is implemented on the client side, so a client knows which worker to contact to fetch the cached file. The worker caches the file in its entirety, regardless of the size of the file. When reading the file, clients always go to the same worker, regardless of which part of the file they are trying to read.
This scheme works fine when the files stored in Alluxio are small to medium in size compared to the capacity of a worker's cache storage. A worker can easily handle a large number of such not-so-large files, and the worker selection algorithm will distribute the files approximately evenly across workers. However, for very large files whose sizes are comparable to the cache capacity of a single worker, it becomes increasingly difficult to cache these huge files efficiently. If multiple clients request the same file, that single worker can easily be overloaded, throttling the overall read performance.
File segmentation is a feature of Alluxio that allows a huge file to be cached in multiple segments on multiple workers. The segment's size is configurable by administrators and is usually significantly smaller than the file size. Segments of files can be efficiently served by multiple workers, reducing the possibility of worker load imbalance.
The following use cases may benefit from file segmentation:
Storing very large files in Alluxio cache, where the files are larger than or close to a worker's cache capacity
High read performance applications that could benefit from multiple workers serving the same file
How File Segmentation Works
A segment of a file is defined by the file ID along with the segment's index within the file, as if the file were an ordered list of segments. The segment ID of a segment is thus a tuple containing the file ID and the segment index. When a client needs to locate the multiple parts of a segmented file, the segment ID is used in place of the file ID as the key to the worker selection algorithm.
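The effect of keying worker selection by segment ID rather than file ID can be illustrated with a simplified hash-based selection. This is a sketch only: `pick_worker`, the key format, and the hash scheme are hypothetical, and Alluxio's actual worker selection algorithm differs.

```python
import hashlib

def pick_worker(key: str, workers: list) -> str:
    """Deterministically map a key to a worker via a simple hash.
    (Illustrative only; not Alluxio's actual selection algorithm.)"""
    digest = int(hashlib.md5(key.encode()).hexdigest(), 16)
    return workers[digest % len(workers)]

workers = ["worker-0", "worker-1", "worker-2"]

# Unsegmented: the file ID alone decides the worker for the entire file
whole_file_worker = pick_worker("file-123", workers)

# Segmented: the (file ID, segment index) tuple is the key, so different
# segments of the same file can land on different workers
segment_workers = [pick_worker(f"file-123:{i}", workers) for i in range(4)]
```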
Reading a segmented file can be broken down into sequential reads of its segments. For example, given a file that consists of 4 segments, an unsegmented read spanning the region of 3 segments can be split into reads of those 3 segments, each served by a different worker.
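The splitting described above can be sketched as follows. `split_read` is a hypothetical helper, not an Alluxio API; it only illustrates how a byte range maps onto segment indices and in-segment offsets.

```python
def split_read(offset: int, length: int, segment_size: int):
    """Split a read of [offset, offset + length) into per-segment sub-reads.
    Returns a list of (segment_index, offset_in_segment, sub_length) tuples."""
    reads = []
    end = offset + length
    while offset < end:
        index = offset // segment_size
        seg_start = index * segment_size
        # Read up to the end of this segment or the end of the request
        sub_len = min(end, seg_start + segment_size) - offset
        reads.append((index, offset - seg_start, sub_len))
        offset += sub_len
    return reads

# A 2 GiB read starting mid-segment spans 3 segments of a 1 GiB-segmented file
SEG = 1 << 30
split_read(offset=SEG // 2, length=2 * SEG, segment_size=SEG)
```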
Currently, there are a few limitations with file segmentation:
Files created and written directly by clients into Alluxio cannot be segmented
Segment size is set cluster-wide and all nodes must share the same segment size. It cannot be set on a per-file basis.
Enabling File Segmentation
| Property | Value | Description |
| --- | --- | --- |
| `alluxio.dora.file.segment.read.enabled` | `true` | Set to `true` to enable file segmentation. |
| `alluxio.dora.file.segment.size` | (depends on use case) | The size of the segments. Defaults to 1 GiB. |
1. Set `alluxio.dora.file.segment.read.enabled` to `true` on all nodes of Alluxio, including clients.
2. Set `alluxio.dora.file.segment.size` to the desired segment size; this value must also be consistent across all nodes.
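As a sketch, the two properties might be set in `alluxio-site.properties` on every node; the 4GB segment size below is purely illustrative, not a recommendation:

```properties
# Must be identical on all Alluxio nodes, including clients
alluxio.dora.file.segment.read.enabled=true
alluxio.dora.file.segment.size=4GB
```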
The best segment size can be determined by considering the following factors:
Different segments are likely mapped to different workers. When reading a file sequentially, a client needs to switch among different workers to read the different segments. If the size of the segments is too small, the client will have to frequently switch between workers and suffer from underutilized network bandwidth.
The data of a segment is stored in its entirety in a single worker. If the segment size is too large, the chance of uneven cache usage on different workers will increase.
The best segment size strikes a compromise between read performance and the even distribution of cached data. A common segment size ranges from several gigabytes to tens of gigabytes.
AI Model Loading
Alluxio is optimized to accelerate model loading.
In a typical model loading workflow, users upload trained models to the UFS and use Alluxio as a caching layer. These models are then loaded locally via Alluxio Fuse for consumption by online services or inference systems. In this process, Alluxio serves as a caching intermediary that significantly improves model loading speed and reduces pressure on the underlying storage system.
A single Alluxio Fuse node may handle read requests for multiple models simultaneously (for example, multiple online services concurrently loading model files), which can lead to significant concurrent access pressure and traffic spikes. Model files are typically large, and conventional file systems often struggle to handle high-frequency concurrent reads effectively. Introducing Alluxio as a caching layer is therefore well suited to model distribution scenarios.
Furthermore, when multiple concurrent reads of the same model file go through a single Alluxio Fuse instance, an enhanced prefetching logic can make model loading up to 3x faster than the normal case. It can be enabled using the following configuration: