Core Concepts

This section explains the fundamental concepts behind Alluxio's architecture and capabilities. Understanding these ideas will help you get the most out of the system.

1. Decentralized Architecture

Alluxio is built on a decentralized, master-less architecture designed for high availability and massive scalability. Unlike traditional systems that rely on a central master node, Alluxio distributes responsibilities across the cluster. The core of this architecture is consistent hashing, which allows clients to locate data and metadata efficiently without querying a central service.

This decentralized design provides several key advantages:

  • No Single Point of Failure: The system remains available even if some worker nodes fail.

  • Linear Scalability: Metadata and data capacity scale horizontally as you add more workers.

  • Low Latency: Clients can resolve metadata and data locations in a single network hop.

The Consistent Hash Ring

Alluxio uses a consistent hashing algorithm to map file paths to a set of distributed workers. Imagine a virtual "hash ring" where all the workers in the cluster are placed. Each worker is responsible for a segment of this ring. This approach ensures even data distribution, scalability, and fault tolerance.
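
The sketch below illustrates the idea in Python. It is a minimal illustration, not Alluxio's actual implementation: each worker is placed on the ring at several virtual positions, and a file path is served by the first worker at or after the path's hash, wrapping around the ring.

    import bisect
    import hashlib

    def ring_hash(key: str) -> int:
        """Map a string to a stable position on the ring."""
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    class HashRing:
        """Minimal consistent hash ring with virtual nodes per worker."""

        def __init__(self, workers, virtual_nodes=100):
            self._ring = sorted(
                (ring_hash(f"{worker}#{i}"), worker)
                for worker in workers
                for i in range(virtual_nodes)
            )
            self._positions = [pos for pos, _ in self._ring]

        def worker_for(self, path: str) -> str:
            """Return the worker responsible for a given file path."""
            index = bisect.bisect(self._positions, ring_hash(path)) % len(self._ring)
            return self._ring[index][1]

    ring = HashRing(["worker-1", "worker-2", "worker-3"])
    print(ring.worker_for("/s3/data/part-00000.parquet"))  # prints one of the three workers

Because every client can compute the same mapping locally, no central lookup service is needed on the read path.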

Key Components

This architecture is composed of two main components that work together:

  1. Alluxio Worker: Workers are typically co-located with your compute applications (e.g., on the same Kubernetes nodes). They use local storage (memory, SSD, or HDD) to store cached data. Crucially, each worker is also responsible for managing the metadata for its portion of the namespace, as determined by the consistent hash ring.

  2. Alluxio Client: The Alluxio client is a library embedded in your compute framework. It contains the consistent hashing logic to locate the correct worker for any given file path.

How It Works

When an application needs to access a file:

  1. Hashing: The Alluxio client applies a hash function to the file path.

  2. Mapping: The output of the hash determines which worker on the ring is responsible for that file's metadata and its cached data.

  3. Direct Communication: The client communicates directly with the responsible worker to get the information it needs, avoiding the bottleneck of a central master.

To keep the view of the hash ring consistent, Alluxio uses a distributed key-value store like etcd to manage cluster membership and track which workers are online.
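
As a rough sketch of how a client-side process might refresh its view of cluster membership, assuming the python etcd3 client and a hypothetical key prefix under which workers register themselves (the real registration scheme may differ):

    import etcd3

    # Hypothetical key prefix used only for illustration.
    WORKER_PREFIX = "/alluxio/workers/"

    def live_workers(etcd_host="localhost", etcd_port=2379):
        """Return the addresses of workers currently registered in etcd."""
        client = etcd3.client(host=etcd_host, port=etcd_port)
        return [value.decode() for value, _metadata in client.get_prefix(WORKER_PREFIX)]

    # The hash ring is rebuilt whenever this membership list changes, for example:
    # ring = HashRing(live_workers())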

2. Unified Namespace

Alluxio provides a unified namespace that presents all your connected storage systems as a single, logical file system. This is achieved by "mounting" different Under File Systems (UFS) to paths within Alluxio.

For example, you can mount an S3 bucket and a GCS bucket into Alluxio. The resulting mount table would look like this:

  Alluxio Path    Under File System (UFS) Path
  /s3/            s3://bucketA/data/
  /gcs/           gcs://bucketB/records/

Now, your applications can access data from both S3 and GCS through a single, consistent API without explicitly providing credentials or other system-specific information for each underlying storage system. The mapping between Alluxio paths and UFS paths is kept in a mount table, which is stored in etcd, a distributed key-value store, so that it remains highly available and consistent across all Alluxio components.
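
As an illustration of the idea, a mount table like the one above can be resolved with longest-prefix matching; this is a simplified sketch, not Alluxio's actual resolution code:

    # Mount table from the example above: Alluxio path prefix -> UFS path prefix.
    MOUNT_TABLE = {
        "/s3/": "s3://bucketA/data/",
        "/gcs/": "gcs://bucketB/records/",
    }

    def resolve(alluxio_path: str) -> str:
        """Translate an Alluxio path into the corresponding UFS path."""
        matches = [prefix for prefix in MOUNT_TABLE if alluxio_path.startswith(prefix)]
        if not matches:
            raise FileNotFoundError(f"No mount point covers {alluxio_path}")
        mount = max(matches, key=len)  # longest matching mount point wins
        return MOUNT_TABLE[mount] + alluxio_path[len(mount):]

    print(resolve("/s3/2024/01/events.parquet"))
    # -> s3://bucketA/data/2024/01/events.parquet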

For a deeper dive, see Connecting to Storage.

3. I/O Resiliency and High Availability

Alluxio is designed to be highly resilient to failures. It has multiple mechanisms to ensure that I/O operations can continue gracefully even when components become unavailable.

  • UFS Fallback: If a client tries to read data from a worker and that worker is unavailable, the client can automatically fall back to reading the data directly from the Under File System. This ensures the application's read request succeeds without interruption, even if the Alluxio cluster becomes unresponsive.

  • Retry Across Replicas: When data replication is enabled, if a client fails to get a response from one worker, it will automatically retry the request on other workers that may hold a replica of the data.

  • Multi-AZ High Availability: For maximum fault tolerance, you can deploy Alluxio clusters across multiple Availability Zones (AZs). If the local Alluxio cluster becomes unavailable, the client will fail over and request data from clusters in other AZs.

These features work together to create a robust data access layer that is resilient to common failures.

Failover Priority: The Order of Operations

When an Alluxio client fails to read data from its preferred worker, it will automatically attempt to recover by trying different sources in the following order:

  1. Retry with Local Replicas: If data replication is enabled, the client will first try to read from other workers in the same cluster that have a replica of the requested data.

  2. Failover to Remote Clusters (Multi-AZ): If all local replicas are unavailable and multi-AZ support is enabled, the client will then try to read from workers in other federated clusters, typically located in different Availability Zones.

  3. Fallback to UFS: If all Alluxio workers (both local and remote) are unavailable, the client will fall back to reading the data directly from the Under File System (UFS) as a last resort.

This layered approach maximizes the chance of a successful read while prioritizing the fastest available data source.
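
A simplified sketch of this order of operations; the read_from_worker and read_from_ufs callables below are hypothetical stand-ins for the real client calls:

    def read_with_failover(path, local_replicas, remote_clusters,
                           read_from_worker, read_from_ufs):
        """Try the fastest available source first, ending with the UFS as a last resort."""
        # 1. Retry with replicas in the local cluster.
        for worker in local_replicas:
            try:
                return read_from_worker(worker, path)
            except ConnectionError:
                continue
        # 2. Fail over to workers in other federated clusters (other AZs).
        for cluster in remote_clusters:
            for worker in cluster:
                try:
                    return read_from_worker(worker, path)
                except ConnectionError:
                    continue
        # 3. Fall back to reading directly from the under file system.
        return read_from_ufs(path)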

How Worker Lifecycle Events Affect I/O

The decentralized architecture is designed to handle dynamic changes in cluster membership.

  • When a Worker is Added: The new worker joins the consistent hash ring and takes over a portion of the hash range, so only the paths in that slice are remapped (see the sketch after this list). Read operations for data in its range may initially be slow as it loads data from the UFS, but the operations will succeed.

  • When a Worker is Removed or Fails: Ongoing requests to the removed worker will fail. The client will then trigger the failover process described above. For FUSE clients, this process is seamless. For S3 or FSSpec clients, the application might see a transient error but should recover on retry once the client's view of the worker list is updated.

  • When a Worker is Restarted: This is treated as a worker removal followed by an addition. As long as the worker's identity is persisted, it will rejoin the ring and reclaim its original hash range, making its cached data available again.
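
The sketch below illustrates why these membership changes are comparatively cheap with consistent hashing: adding a worker only remaps the paths in the slice of the ring it takes over, rather than reshuffling everything. It is a toy model (a single hash position per worker, no virtual nodes), not Alluxio's implementation.

    import hashlib

    def owner(path, workers):
        """Assign a path to the worker whose ring position follows the path's hash."""
        position = int(hashlib.md5(path.encode()).hexdigest(), 16)
        ring = sorted((int(hashlib.md5(w.encode()).hexdigest(), 16), w) for w in workers)
        for pos, worker in ring:
            if position <= pos:
                return worker
        return ring[0][1]  # wrap around to the first worker on the ring

    paths = [f"/data/part-{i:05d}.parquet" for i in range(10_000)]
    before = {p: owner(p, ["worker-1", "worker-2", "worker-3"]) for p in paths}
    after = {p: owner(p, ["worker-1", "worker-2", "worker-3", "worker-4"]) for p in paths}

    moved = sum(before[p] != after[p] for p in paths)
    print(f"{moved / len(paths):.0%} of paths changed owner")  # only worker-4's slice, not 100%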

For a deeper dive, see Accessing Data with High Availability.

4. The Data Cache Lifecycle

Effectively managing the data cached in Alluxio is key to achieving maximum performance and resource efficiency. The life of cached data can be understood in three main phases: Loading, Managing, and Removing.

Loading Data

Data enters the Alluxio cache in two primary ways. It can be loaded on-demand (passively) when an application first reads it, which is the default behavior. Alternatively, data can be loaded proactively through preloading jobs that "warm up" the cache in anticipation of future workloads, ensuring initial requests are served at memory speed.
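
The difference between the two modes looks roughly like this; the CacheClient class and its methods are hypothetical, for illustration only, and are not Alluxio's API:

    class CacheClient:
        """Hypothetical client used only to contrast passive and active loading."""

        def __init__(self):
            self.cache = {}

        def read(self, path, load_from_ufs):
            """Passive (on-demand) loading: the first read misses and populates the cache."""
            if path not in self.cache:
                self.cache[path] = load_from_ufs(path)  # cold read goes to the UFS
            return self.cache[path]                     # subsequent reads are cache hits

        def preload(self, paths, load_from_ufs):
            """Active loading: warm the cache before any application reads the data."""
            for path in paths:
                self.cache.setdefault(path, load_from_ufs(path))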

Managing Data

Once data is in the cache, its lifecycle is governed by a set of flexible policies. Administrators can define rules to control what data gets cached, set storage quotas to ensure fair resource allocation between different datasets or tenants, and apply time-to-live (TTL) settings to automatically purge stale information. These controls allow for fine-grained management of the cache's contents and resource footprint.

Removing Data

Data is removed from the cache either automatically or manually. When the cache is full, automatic eviction makes space for new data by removing existing items based on a configured policy, such as Least Recently Used (LRU). For more direct control, manual eviction allows administrators to explicitly free specific files or directories from the cache.
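
The toy cache below shows these removal mechanisms in miniature: TTL-based expiry of stale entries, LRU eviction when the cache is full, and a manual free operation. Alluxio's real policies are configurable and richer than this; treat it only as a sketch of the concepts.

    import time
    from collections import OrderedDict

    class LruTtlCache:
        """Toy cache combining LRU eviction with per-entry time-to-live (TTL)."""

        def __init__(self, capacity=1000, ttl_seconds=3600):
            self.capacity = capacity
            self.ttl = ttl_seconds
            self.entries = OrderedDict()  # path -> (data, time inserted)

        def put(self, path, data):
            if len(self.entries) >= self.capacity:
                self.entries.popitem(last=False)   # automatic eviction of the LRU entry
            self.entries[path] = (data, time.time())

        def get(self, path):
            data, inserted = self.entries.pop(path)  # raises KeyError on a cache miss
            if time.time() - inserted > self.ttl:
                return None                          # expired by TTL: purged, treated as a miss
            self.entries[path] = (data, inserted)    # re-insert as most recently used
            return data

        def free(self, path):
            """Manual eviction of a specific path from the cache."""
            self.entries.pop(path, None)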

For a deeper dive, see Managing Cache.

5. Multi-Tenancy and Cluster Federation

For large enterprise environments, Alluxio supports multi-tenancy and the federation of multiple clusters.

  • Multi-Tenancy: Alluxio can enforce tenant isolation, allowing different teams or business units to share a single Alluxio deployment securely. This includes per-tenant cache quotas, access policies, and configurations. Authentication and authorization are handled through integrations with enterprise identity providers, such as Okta, and policy engines, such as OPA.

  • Cluster Federation: When you have multiple Alluxio clusters (e.g., one for each region or business unit), a central Management Console and API Gateway can provide a unified view for monitoring, licensing, and operations. This simplifies the management of a large-scale, distributed data environment.

For a deeper dive, see Multi-Tenancy and Federation.
