I/O Resiliency

This document describes how Alluxio handles both expected and unexpected downtime, such as when Alluxio workers become unavailable.

Multiple mechanisms are in place to handle application requests gracefully and ensure the overall I/O resiliency of the Alluxio system.

UFS fallback mechanism

Alluxio clients read both metadata and data from Alluxio workers. If the requested worker becomes unavailable or returns an error, the client can fall back to requesting this information from the UFS instead.

The UFS fallback feature is enabled by default. In certain situations it may cause undesirable behavior, such as overloading the UFS when a large number of clients fall back at the same time, commonly referred to as the thundering herd problem.

To disable the feature, set the following property:

alluxio.dora.client.ufs.fallback.enabled=false
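
This property is read on the client side; a common place to set it is the conf/alluxio-site.properties file used by the client (the file path below assumes the default configuration layout):

# conf/alluxio-site.properties
alluxio.dora.client.ufs.fallback.enabled=false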

FUSE UFS fallback while reading a file

If the user is reading a file via the FUSE interface and the worker becomes unavailable in the middle of the operation, Alluxio seamlessly switches the read source from the worker to the UFS. This eliminates the need for the client to explicitly retry the read operation when a worker is removed.
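
As a rough sketch of the idea (not the actual Alluxio FUSE implementation), the fallback can be pictured as a read that, when the worker connection fails, reopens the same file directly from the UFS at the current offset and continues. The class, the UfsOpener interface, and the method below are hypothetical placeholders:

import java.io.IOException;
import java.io.InputStream;

// Hypothetical sketch only; not part of the real Alluxio client or FUSE code.
public class FuseFallbackRead {

  // Opens the file directly in the under file system, starting at a given offset.
  interface UfsOpener {
    InputStream open(String path, long offset) throws IOException;
  }

  static int readWithFallback(InputStream workerStream, UfsOpener ufs,
                              String path, long offset, byte[] buf) throws IOException {
    try {
      // Preferred path: serve the read from the Alluxio worker.
      return workerStream.read(buf);
    } catch (IOException workerFailure) {
      // Worker became unavailable mid-read: reopen the same file directly from
      // the UFS at the current offset and continue, so the caller never sees the failure.
      try (InputStream ufsStream = ufs.open(path, offset)) {
        return ufsStream.read(buf);
      }
    }
  }
}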

Retry across replicas

When multiple replicas are enabled and the client fails to get a response from a worker, the client retries on the other workers holding replicas. This applies to both data and metadata access.

Again, this eliminates the need for the client to explicitly retry the read operation if the target worker suddenly becomes unresponsive.
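
Conceptually, the client walks the set of workers that hold replicas of the requested path and moves on to the next one only when the current one fails. The sketch below is illustrative only; the Worker interface is a hypothetical placeholder, not the real Alluxio client API:

import java.io.IOException;
import java.util.List;

// Illustrative sketch of retrying across replica workers; not the actual Alluxio client code.
public class ReplicaRetry {

  // Hypothetical placeholder for "a worker holding a replica of the path".
  interface Worker {
    byte[] read(String path) throws IOException;
  }

  static byte[] readFromReplicas(List<Worker> replicaWorkers, String path) throws IOException {
    IOException lastFailure = null;
    for (Worker worker : replicaWorkers) {
      try {
        return worker.read(path);   // first replica that responds wins
      } catch (IOException e) {
        lastFailure = e;            // remember the failure and try the next replica
      }
    }
    throw lastFailure != null ? lastFailure : new IOException("no replica workers available");
  }
}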

Retry across multiple Availability Zones

When multiple Alluxio clusters are deployed across multiple Availability Zones (AZ), an Alluxio client in one cluster can be served by clusters in other AZs, even if the Alluxio cluster in its local AZ is unavailable due to a failure in that AZ. See Multi AZ High Availability for instructions on enabling and configuring the feature.

When the preferred worker fails to serve a request, the Alluxio client will retry from different sources, in the following order (illustrated by the sketch after this list):

  1. If multiple replicas are enabled, the client will first retry with the replica workers in the same cluster.

  2. If multiple AZ support is enabled and all the local workers fail to provide the service, the client then retries with workers from other clusters, possibly located in different AZs.

  3. If all the remote workers from other clusters are also unavailable, the client will fall back to the UFS as a last resort.
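
Putting these pieces together, the retry can be viewed as three tiers of candidate sources, where the next tier is consulted only after the previous one is exhausted. The sketch below illustrates that ordering and is not Alluxio's actual retry logic; the Source interface and parameter names are hypothetical:

import java.io.IOException;
import java.util.List;

// Simplified illustration of the three-tier retry order described above.
public class TieredRetryOrder {

  interface Source {
    byte[] read(String path) throws IOException;
  }

  static byte[] read(List<Source> localReplicas,   // tier 1: replica workers in the local cluster
                     List<Source> remoteAzWorkers, // tier 2: workers in clusters in other AZs
                     Source ufs,                   // tier 3: the under file system itself
                     String path) throws IOException {
    for (List<Source> tier : List.of(localReplicas, remoteAzWorkers)) {
      for (Source source : tier) {
        try {
          return source.read(path);
        } catch (IOException e) {
          // fall through to the next candidate, then the next tier
        }
      }
    }
    // Last resort: read directly from the UFS.
    return ufs.read(path);
  }
}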

Membership changes

Adding workers

When a worker is added to the cluster, the newly added worker takes over part of the hash range from existing workers according to the consistent hashing algorithm. Clients continue to read files from the existing set of workers until they detect that the membership has changed. Once requests are directed to the new worker, read operations may slow down because the worker needs to load the data from the UFS again, but the operations should not encounter any errors.
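
For intuition, a minimal consistent-hash ring is sketched below; this is not Alluxio's implementation, and the hash function, virtual-node count, and worker identity format are illustrative assumptions. The point is that adding a worker moves only the keys whose range the new worker takes over, while all other keys keep their previous owner:

import java.nio.charset.StandardCharsets;
import java.util.TreeMap;
import java.util.zip.CRC32;

// Minimal consistent-hash ring for illustration only; Alluxio's real ring uses
// its own hash function, virtual-node count, and worker identity format.
public class ConsistentHashRing {
  private final TreeMap<Long, String> ring = new TreeMap<>();
  private final int virtualNodes;

  ConsistentHashRing(int virtualNodes) { this.virtualNodes = virtualNodes; }

  private static long hash(String key) {
    CRC32 crc = new CRC32();
    crc.update(key.getBytes(StandardCharsets.UTF_8));
    return crc.getValue();
  }

  void addWorker(String workerId) {
    for (int i = 0; i < virtualNodes; i++) {
      ring.put(hash(workerId + "#" + i), workerId);
    }
  }

  void removeWorker(String workerId) {
    for (int i = 0; i < virtualNodes; i++) {
      ring.remove(hash(workerId + "#" + i));
    }
  }

  // A path is owned by the first virtual node clockwise from the path's hash.
  String ownerOf(String path) {
    Long key = ring.ceilingKey(hash(path));
    return ring.get(key != null ? key : ring.firstKey());
  }

  public static void main(String[] args) {
    ConsistentHashRing ring = new ConsistentHashRing(100);
    ring.addWorker("worker-1");
    ring.addWorker("worker-2");
    String before = ring.ownerOf("s3://bucket/data/part-00000");
    ring.addWorker("worker-3"); // takes over part of the hash range
    String after = ring.ownerOf("s3://bucket/data/part-00000");
    // The owner changes only if this path's range was taken over by worker-3.
    System.out.println(before + " -> " + after);
  }
}

In this sketch, TreeMap.ceilingKey finds the first virtual node clockwise from the path's hash, which is what keeps lookups cheap and confines membership changes to the affected range.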

Removing workers

When a worker is removed, the paths previously directed to the removed worker are distributed to other workers, again according to the consistent hashing algorithm. Ongoing requests served by the removed worker will fail. If the UFS fallback mechanism is enabled, the client will retry the request directly against the UFS. Once the client detects the membership change, these requests will be directed to the remaining live workers.

With the FUSE UFS fallback feature, the FUSE client should not encounter any errors when workers are removed. For the S3 and fsspec interfaces, the application might see transient read errors for ongoing reads, but should recover once the worker membership is updated.

Restarting workers

Restarting a worker behaves the same way as removing a worker and subsequently adding it back. Because each worker has a unique identity, which is used to form the consistent hash ring, the restarted worker remains responsible for the same paths it previously managed, so the cached data on the worker remains valid. Again, FUSE clients should not observe any read failures when workers restart.
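
The stability comes from the fact that a worker's ring positions are a deterministic function of its persistent identity rather than of anything that changes across a restart. The snippet below illustrates the contrast with a placeholder hash; the identity string and addresses are made up for illustration:

import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;

// Illustration only: ring positions derive from the worker's persistent identity,
// so a restarted worker re-occupies the same positions and its cached data stays valid.
public class StableIdentity {
  static long position(String key) {
    CRC32 crc = new CRC32();
    crc.update(key.getBytes(StandardCharsets.UTF_8));
    return crc.getValue();
  }

  public static void main(String[] args) {
    // Hashing the persistent identity: a restart does not move the worker on the ring.
    System.out.println(position("worker-id-7f3a#0") == position("worker-id-7f3a#0")); // true
    // If an ephemeral attribute (such as a changing port) were hashed instead,
    // a restart would move the worker and invalidate its cached data.
    System.out.println(position("10.0.0.5:29999#0") == position("10.0.0.5:31244#0")); // false
  }
}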
