Worker Management and Consistent Hashing

This document explores the interaction between Alluxio workers and etcd, an open-source distributed key-value store, in managing component inventory and state. It details how workers register, maintain membership, and collaboratively serve Alluxio queries in a distributed environment. By fine-tuning various configuration settings, users can optimize their Alluxio cluster to meet specific performance, reliability, and scalability objectives.

Key Concepts

  • Worker Registration: Alluxio workers use etcd to register themselves within the cluster.

  • Membership Maintenance: etcd helps maintain an up-to-date view of the cluster's composition.

  • Distributed Query Handling: Workers collaborate to efficiently process Alluxio queries across the cluster.

Consistent Hashing

Alluxio employs consistent hashing to map client requests directly to specific workers, eliminating the need to query a centralized metadata service. Each worker in the cluster is represented by one or more virtual nodes on a hash ring, providing a scalable and fault-tolerant mechanism for sharding data across the cluster. This approach ensures efficient load balancing and enables scaling out for improved performance in a distributed caching system. Key features of this system include:

  • Load Distribution: Virtual nodes ensure even data distribution across workers, minimizing hotspots and maximizing resource utilization.

  • Scalability: Adding or removing workers requires minimal data redistribution, allowing the cluster to scale seamlessly.

  • Fault Tolerance: Worker failures only affect the data mapped to the failed worker's segment of the hash ring, minimizing the impact on the rest of the cluster.

Virtual Nodes for Load Balancing

To evenly distribute data across workers, Alluxio assigns a number of virtual nodes to each worker on the hash ring. Virtual nodes enhance load balancing by spreading I/O requests more evenly across the cluster. The number of virtual nodes per worker can be configured via: alluxio.user.worker.selection.policy.consistent.hash.virtual.node.count.per.worker (default: 2000). Adjusting this value allows fine-tuning of load distribution, particularly in clusters with varying workloads.
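
As an illustration of the mechanism only (this is not Alluxio's actual implementation), the following minimal Java sketch places a configurable number of virtual nodes per worker on a ring and routes each key to the first virtual node at or after the key's hash. The class name and the CRC32 hash function are purely illustrative; the virtualNodeCount parameter plays the role of alluxio.user.worker.selection.policy.consistent.hash.virtual.node.count.per.worker.

import java.nio.charset.StandardCharsets;
import java.util.SortedMap;
import java.util.TreeMap;
import java.util.zip.CRC32;

// Illustrative hash ring: each worker contributes virtualNodeCount points on the ring.
public class SimpleHashRing {
  private final TreeMap<Long, String> mRing = new TreeMap<>();
  private final int mVirtualNodeCount;

  public SimpleHashRing(int virtualNodeCount) {
    mVirtualNodeCount = virtualNodeCount;
  }

  private static long hash(String key) {
    CRC32 crc = new CRC32();
    crc.update(key.getBytes(StandardCharsets.UTF_8));
    return crc.getValue();
  }

  // Place the worker's virtual nodes on the ring; positions are derived from the worker ID.
  public void addWorker(String workerId) {
    for (int i = 0; i < mVirtualNodeCount; i++) {
      mRing.put(hash(workerId + "#" + i), workerId);
    }
  }

  // Remove only this worker's virtual nodes; keys mapped to other workers are unaffected.
  public void removeWorker(String workerId) {
    for (int i = 0; i < mVirtualNodeCount; i++) {
      mRing.remove(hash(workerId + "#" + i));
    }
  }

  // Route a key (e.g. a file path) to its owning worker: the first virtual node at or
  // after the key's hash, wrapping around to the start of the ring if necessary.
  // Assumes at least one worker has been added.
  public String getWorker(String key) {
    SortedMap<Long, String> tail = mRing.tailMap(hash(key));
    return tail.isEmpty() ? mRing.firstEntry().getValue() : tail.get(tail.firstKey());
  }
}

Because removing a worker deletes only that worker's own virtual nodes, keys owned by the remaining workers keep their mapping, which is what limits data redistribution when the cluster scales in or out.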

Handling Heterogeneous Workers

By default, the consistent hashing algorithm assumes that all workers have equal capacity and assigns the same number of virtual nodes per worker. However, in clusters with heterogeneous configurations (e.g., workers with different storage or network capacities), this default behavior can result in sub-optimal resource utilization.

To address this, Alluxio supports capacity-based virtual node allocation. When the property alluxio.user.worker.selection.policy.consistent.hash.provider.impl is set to CAPACITY, virtual nodes are allocated proportionally to each worker's relative capacity (the number of vnodes is derived by dividing the worker's storage capacity by 1 GB). This ensures fair data distribution and prevents smaller-capacity workers from being overloaded.
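
For example, to switch to capacity-based allocation, add the following to alluxio-site.properties:

alluxio.user.worker.selection.policy.consistent.hash.provider.impl=CAPACITY

With this setting, a worker with 2 TB (2048 GB) of storage would be assigned roughly 2048 virtual nodes, while a 512 GB worker would receive roughly 512, so the larger worker owns a proportionally larger share of the hash ring (the capacities here are only illustrative).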

Dynamic vs. Static Hash Ring

The hash ring in Alluxio can operate in two modes: dynamic (default) or static, configurable via the property alluxio.user.dynamic.consistent.hash.ring.enabled.

  • Dynamic Hash Ring (set to true, the default): Includes only currently online workers in the hash ring. Workers that go offline are removed from the ring along with their virtual nodes.

  • Static Hash Ring (set to false): Includes all registered workers, regardless of their online status.

Note: The default dynamic ring is well suited for most use cases, providing optimal performance and adaptability in dynamic environments. A static ring is suitable for scenarios that require a consistent view of all registered workers, even if some are temporarily offline, since keeping offline workers on the ring minimizes repeated data loading from the UFS into Alluxio when they return.
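
For example, to switch to a static hash ring, set the following in alluxio-site.properties:

alluxio.user.dynamic.consistent.hash.ring.enabled=false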

Worker Management

Alluxio supports dynamic scaling, enabling the addition or removal of workers based on traffic demands for improved resource efficiency. Workers must register themselves with the system to participate in the consistent hashing ring. Alluxio utilizes etcd by default to maintain membership records and perform dynamic liveness tracking, ensuring an accurate and up-to-date view of the cluster's state.

To configure Alluxio to use etcd for maintaining cluster information, add the following to the alluxio-site.properties file:

alluxio.worker.membership.manager.type=ETCD
alluxio.etcd.endpoints=<connection URI of ETCD cluster>
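
For example, an etcd deployment reachable at host etcd-0 (the hostname is illustrative; 2379 is etcd's default client port) would be configured as:

alluxio.etcd.endpoints=http://etcd-0:2379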

Registering a Worker

When a worker starts or restarts, it registers itself in etcd using a unique worker ID, which is typically auto-generated. The virtual nodes (vnodes) for each worker are derived from its worker ID. These registry entries establish the worker's membership in the cluster and are used to construct the consistent hash ring.

Tracking Liveness of Workers

etcd tracks the liveness of each worker in the cluster. If a worker fails to communicate with etcd within the timeout period (configurable via alluxio.worker.failure.detection.timeout, default: 2 minutes), it is considered inactive or offline, and, with the default dynamic hash ring, its virtual nodes are removed from the consistent hash ring.
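
For example, to tolerate longer interruptions before a worker is marked offline, the timeout can be raised in alluxio-site.properties (the 5-minute value and duration format shown are only illustrative):

alluxio.worker.failure.detection.timeout=5min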

You can view the currently online and offline workers in the cluster by running the following command:

bin/alluxio info nodes

Rejoining a Worker to the Ring

Alluxio maintains worker registration records in etcd even when a worker goes offline. This feature supports a crucial use case: allowing workers to seamlessly rejoin the cluster with their previously cached data intact. For instance, if a worker undergoes maintenance while retaining its cached data, it can reintegrate into the consistent hash ring without data loss.

To enable this functionality, the worker must rejoin with the same worker ID it previously used, and therefore with the same set of virtual nodes (vnodes) derived from that ID. To ensure this, configure the following property correctly:

alluxio.worker.identity.uuid.file.path=/mnt/alluxio/system-info/worker_identity

The /mnt/alluxio/system-info/worker_identity file should contain the same identity that the worker had during its last run. If the worker instance was migrated to a different host, make sure to copy that identity file to the new host.
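
For example, when migrating a worker to a replacement host, the identity file could be copied over before starting the worker there (the hostname is illustrative; the path matches the property above):

# run on the new host, before starting the Alluxio worker
scp old-worker-host:/mnt/alluxio/system-info/worker_identity /mnt/alluxio/system-info/worker_identity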

This setting is vital for preserving the worker's identity within the consistent hash ring. By maintaining a consistent identity, the worker can reclaim its position in the cluster upon rejoining, ensuring that:

  1. The cached data remains valid and accessible.

  2. The load distribution across the cluster remains balanced.

  3. Client requests are routed correctly to the rejoined worker.

Removing a Worker from the Ring

Alluxio's consistent hash ring maintains worker registrations even when a worker is shut down. This design ensures ring stability when workers undergo temporary maintenance and are expected to return online.

However, if a worker has permanently left the cluster, its registration record can be deleted from etcd. It's important to note that removing a worker from the consistent hash ring has consequences:

  1. It triggers a re-mapping of files to the remaining worker instances.

  2. This re-mapping can lead to temporary cache misses as the system adjusts.

To remove a worker from the ring, follow these steps:

  1. Ensure the worker to be removed has been shut down.

  2. Run the command bin/alluxio info nodes to get a list of currently registered workers in the cluster.

  3. Identify the worker to be removed by checking its worker ID, network addresses, and liveness status.

  4. Run the command bin/alluxio process remove-worker -n <worker_id>, where <worker_id> is the ID of the worker as shown in the output of step 2.

  5. Re-run bin/alluxio info nodes to verify that the worker has been successfully removed from the cluster.

Refreshing Worker Lists

The Alluxio client periodically refreshes its internal snapshot of the worker list from etcd to maintain an up-to-date view of the cluster. The refresh interval is configurable via alluxio.user.worker.list.refresh.interval (default: 2 minutes). This periodic refresh ensures the client can adapt to dynamic changes in the cluster, such as workers going online or offline.
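
For example, to have clients pick up membership changes more quickly, the interval can be shortened in alluxio-site.properties (the 30-second value and duration format shown are only illustrative):

alluxio.user.worker.list.refresh.interval=30sec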
