File Replication
Overview
A copy of a file's data and metadata on an Alluxio worker node is called a replica. File replication allows multiple workers to serve the same file by creating and distributing multiple replicas across different workers. This feature is primarily used to increase I/O throughput and availability, especially when a dataset is cached and accessed concurrently by a large number of clients.
If file replication is not enabled (i.e., the replication level is set to 1), a file will only have one replica in the cluster at a time. This means all client access to the file's data will be served by the single worker where the replica is stored. In cases where this file needs to be accessed by many clients concurrently, that single worker can become a performance bottleneck.
Note: A worker will have at most one replica of any given file. It will not store multiple copies of the same file.
Enabling File Replication
File replication is controlled by several system settings that determine how many replicas are created and how they are accessed.
Number of Replicas
By default, the replication level is 1, meaning each file has only one replica. To increase this, you have two options:
Create Replica Rules (Recommended): This approach offers granular control by allowing you to define replication levels for specific paths. For example, a rule can set the replica count to 3 for all files under a given path.
Set a Cluster-Wide Default: Use the deprecated alluxio.user.file.replication.min property to set a default replication level for the entire cluster. This method is less flexible than using replica rules.
Replica Selection Policy
When reading a file with multiple replicas, the replica selection policy determines which worker a client reads from. This is key for load balancing and ensuring high availability. If the chosen worker is unavailable, the client automatically fails over to another replica.
The policy is configured using the alluxio.user.replica.selection.policy property, with the following options available:
GLOBAL_FIXED (Default): All clients read from workers in the same fixed order.
CLIENT_FIXED: Each client has its own fixed order of workers, distributing the load across replicas.
RANDOM: Each client chooses a random replica for each read request.
To configure the policy, set the following property:
alluxio.user.replica.selection.policy=GLOBAL_FIXED
Optimizing Replica Reading Performance
By default, a client follows the replica selection policy without considering if the data is already cached on the selected worker, which can lead to unnecessary cold reads. To optimize this, you can enable two key features: cached replica prioritization and passive caching.
Cached Replica Prioritization: The client first checks which of the available replica workers have the file fully cached. It prioritizes reading from one of these "hot" workers to avoid a slow cold read from the UFS.
Passive Caching: This feature complements replica prioritization. If the client's most preferred worker does not have the data cached but another worker does, the client will read from the cached source while simultaneously triggering an asynchronous background job to load the file onto the preferred worker. This ensures data becomes cached on the ideal worker for future reads.
To enable both features, set the following properties:
alluxio.user.replica.prefer.cached.replicas=true
alluxio.user.file.passive.cache.enabled=true
Note: A worker with a partially cached file is not prioritized over a worker with no cache at all.
Configuring Replication Rules
The number of replicas for files is managed through replica rules. These rules are configured via a REST API endpoint on the coordinator.
API Endpoint: http://<coordinator_host>:<coordinator_api_port>/api/v1/replica-rule
Creating a Replica Rule
To add a new rule, send a POST request with a JSON body specifying the path and numReplicasPerCluster.
The following command sets the replication level to 2 for all files under the root path (/):
$ curl -X POST -H 'Content-Type: application/json' -d '{"path": "/", "numReplicasPerCluster": 2}' http://<coordinator_host>:<coordinator_api_port>/api/v1/replica-rule
You can create multiple rules for different paths. For example, to set 3 replicas for /s3/vip_data and 1 replica for /s3/temp_data:
$ curl -X POST -H 'Content-Type: application/json' -d '{"path": "/s3/vip_data", "numReplicasPerCluster": 3}' http://<coordinator_host>:<coordinator_api_port>/api/v1/replica-rule
$ curl -X POST -H 'Content-Type: application/json' -d '{"path": "/s3/temp_data", "numReplicasPerCluster": 1}' http://<coordinator_host>:<coordinator_api_port>/api/v1/replica-rule
Limitation on Nested Paths: Currently, replica rule paths cannot be nested. For example, you cannot have a rule for /parent and another for /parent/child simultaneously. This also means a rule on the root path (/) cannot coexist with any other rule. This limitation will be lifted in a future release.
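Because rules cannot be nested, switching from a cluster-wide rule on the root path to per-path rules requires removing the root rule first. A possible sequence, using the same placeholder host and port as the examples above (the DELETE endpoint is described under Removing a Replica Rule):
$ curl -X DELETE -H 'Content-Type: application/json' -d '{"path": "/"}' http://<coordinator_host>:<coordinator_api_port>/api/v1/replica-rule
$ curl -X POST -H 'Content-Type: application/json' -d '{"path": "/s3/vip_data", "numReplicasPerCluster": 3}' http://<coordinator_host>:<coordinator_api_port>/api/v1/replica-rule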
Listing Replica Rules
To see all active replica rules, send a GET request to the endpoint:
$ curl -G http://<coordinator_host>:<coordinator_api_port>/api/v1/replica-rule
The response is a JSON object containing all rules. The following example shows a default rule on the root path (/) with a replication level of 1.
{
  "/": {
    "pathPrefix": "/",
    "clusterReplicaMap": {
      "DefaultAlluxioCluster": 1
    },
    "totalReplicas": 1,
    "type": "UNIFORM"
  }
}
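For scripting, the listing response can be inspected with standard tools. A minimal sketch, assuming python3 is available on the client machine; the JSON literal below simply mirrors the example response above rather than querying a live coordinator:

```shell
# Parse a replica-rule listing and print each rule's path and total replica count.
response='{"/": {"pathPrefix": "/", "clusterReplicaMap": {"DefaultAlluxioCluster": 1}, "totalReplicas": 1, "type": "UNIFORM"}}'
echo "$response" | python3 -c "
import json, sys
rules = json.load(sys.stdin)
for path, rule in sorted(rules.items()):
    print('%s: %d replica(s), type=%s' % (path, rule['totalReplicas'], rule['type']))
"
```

In a live cluster, response would instead be populated from the GET request shown above.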
Updating a Replica Rule
To update an existing rule, send a POST request with the same path but a new numReplicasPerCluster value. The following example updates the rule for /s3/vip_data to 1 replica:
$ curl -X POST -H 'Content-Type: application/json' -d '{"path": "/s3/vip_data", "numReplicasPerCluster": 1}' http://<coordinator_host>:<coordinator_api_port>/api/v1/replica-rule
Removing a Replica Rule
To remove a rule, send a DELETE request with the path of the rule to delete in the JSON body:
$ curl -X DELETE -H 'Content-Type: application/json' -d '{"path": "/s3/vip_data"}' http://<coordinator_host>:<coordinator_api_port>/api/v1/replica-rule
Creating Replicas
Replicas can be created in two ways: actively via a distributed load job or passively when clients read data.
Creating Replicas Actively
The most reliable way to create a specific number of replicas is to use the job load command. This command launches a distributed job that loads the file into Alluxio and creates the number of replicas specified by the matching replica rule.
For example, if a rule for /s3 is set to 3 replicas, the following command will load /s3/file with 3 replicas on 3 different workers:
bin/alluxio job load --path s3://bucket/file --submit
The load job will report as successful even if some replicas fail to load. Failed replicas will be created passively upon the next client request.
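Because a successful report does not guarantee that every replica loaded, it can be useful to check the job's progress report afterwards. A sketch, assuming the --progress option of the job load command is available in your Alluxio version:
bin/alluxio job load --path s3://bucket/file --progress
Where supported, the progress output reports how much data was loaded and whether any failures occurred.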
Creating Replicas Passively
Replicas can be created passively in two scenarios: on a cold read or through passive caching.
On Cold Read: When a client reads a file that is not cached in Alluxio, it triggers a "cold read," and the file is loaded into the cache of the worker serving the request, creating a replica. If subsequent client requests for the same file are directed to different workers (based on the replica selection policy), additional replicas will be created passively over time. The number of replicas created this way is not guaranteed and depends on client access patterns.
With Passive Caching: When passive caching is enabled, reading a file can trigger asynchronous replica creation. If a client's preferred worker does not have the data, the client reads from another source while an asynchronous background job loads the file to the preferred worker. This creates a new replica on the preferred worker, ensuring better data locality for future reads.
Monitoring
Alluxio provides metrics to monitor the behavior of file replication and passive caching.
Read Traffic Metrics
When alluxio.user.replica.prefer.cached.replicas is enabled, the alluxio_multi_replica_read_from_workers_bytes_total metric tracks data read patterns.
Type: counter
Component: client
Description: Total bytes read by a client from workers for multi-replica files. It helps analyze read distribution and cache effectiveness.
Labels:
cluster_name: The cluster where the serving worker resides.
local_cluster: true if the worker is in the same cluster as the client.
hot_read: true if the worker had a fully cached replica.
Passive Caching Metrics
When passive caching is enabled, the alluxio_passive_cache_async_loaded_files metric tracks files loaded asynchronously.
Type: counter
Component: worker
Description: Total number of files loaded by a worker triggered by passive cache.
Labels:
result: The result of the load job (submitted, success, or failure).
Limitations
Consistency with Mutable Data
File replication works best with immutable data. If the underlying data in the UFS is mutable, different replicas loaded at different times may become inconsistent.
To mitigate this, you can:
Manually Refresh Metadata: Use the job load command with the --metadataOnly flag to check for UFS updates and invalidate stale replicas.
bin/alluxio job load --path s3://bucket/file --metadataOnly --submit
Use a Cache Filter Policy: Set a maxAge policy to automatically invalidate replicas after a certain period, forcing them to be reloaded from the UFS. For example, a 10-minute maxAge:
{
  "apiVersion": 1,
  "data": {
    "defaultType": "maxAge",
    "defaultMaxAge": "10min"
  },
  "metadata": {
    "defaultType": "maxAge",
    "defaultMaxAge": "10min"
  }
}
Replication Level is Not Actively Maintained
Alluxio does not actively add or remove replicas to match the target replication level set in the rules. The actual number of replicas can be lower or higher than the target due to:
Cache Eviction: Replicas may be evicted from workers due to insufficient cache capacity.
Passive Creation: If a file is always served from an existing replica, new replicas may not be created.
Worker Membership Changes: When workers join or leave the cluster, the replica selection algorithm may change, leading to "orphaned" replicas that are no longer selected but still exist in the cache. These orphaned replicas are eventually evicted.