Managing Alluxio
This guide provides a comprehensive overview of administrative operations for managing a running Alluxio cluster on Kubernetes. It covers day-to-day tasks such as configuration updates, scaling, and upgrades, as well as advanced topics like namespace and multi-tenancy management.
1. Cluster Lifecycle and Configuration
This section covers fundamental operations related to the cluster's lifecycle, such as scaling, upgrades, and dynamic configuration updates.
Scaling the Cluster
You can dynamically scale the number of Alluxio workers up or down to adjust to workload changes.
To Scale Up Workers:
Modify your alluxio-cluster.yaml file and increase the count under the worker section, as sketched below; the example scales from 2 to 3 workers. Then apply the change to your cluster.
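A minimal sketch of the relevant excerpt of alluxio-cluster.yaml (other fields omitted; adjust it to match your existing file):
# alluxio-cluster.yaml (excerpt) -- minimal sketch, other spec fields omitted
apiVersion: k8s-operator.alluxio.com/v1
kind: AlluxioCluster
metadata:
  name: alluxio-cluster
  namespace: alx-ns
spec:
  worker:
    count: 3   # increased from 2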
# Apply the changes to Kubernetes
$ kubectl apply -f alluxio-cluster.yaml
alluxiocluster.k8s-operator.alluxio.com/alluxio-cluster configured
# Verify the new worker pods are being created
$ kubectl -n alx-ns get pod
NAME READY STATUS RESTARTS AGE
...
alluxio-cluster-worker-58999f8ddd-p6n59 0/1 PodInitializing 0 4s
# Wait for all workers to become ready
$ kubectl -n alx-ns get pod -l app.kubernetes.io/component=worker
NAME READY STATUS RESTARTS AGE
alluxio-cluster-worker-58999f8ddd-cd6r2 1/1 Running 0 5m21s
alluxio-cluster-worker-58999f8ddd-rtftk 1/1 Running 0 4m21s
alluxio-cluster-worker-58999f8ddd-p6n59 1/1 Running 0 34s
Upgrading Alluxio
The upgrade process involves two main steps: upgrading the Alluxio Operator and then upgrading the Alluxio cluster itself.
Step 1: Upgrade the Operator
The operator is stateless and can be safely re-installed without affecting the running Alluxio cluster.
Obtain the new Docker images for the operator and the new Helm chart.
Uninstall the old operator and install the new one.
# Uninstall the current operator
$ helm uninstall operator
release "operator" uninstalled
# Ensure the operator namespace is fully removed
$ kubectl get ns alluxio-operator
Error from server (NotFound): namespaces "alluxio-operator" not found
# Create any new CRDs from the new Helm chart directory and replace the existing ones
$ kubectl create -f alluxio-operator/crds 2>/dev/null
$ kubectl replace -f alluxio-operator/crds 2>/dev/null
customresourcedefinition.apiextensions.k8s.io/alluxioclusters.k8s-operator.alluxio.com replaced
customresourcedefinition.apiextensions.k8s.io/clustergroups.k8s-operator.alluxio.com replaced
customresourcedefinition.apiextensions.k8s.io/collectinfoes.k8s-operator.alluxio.com replaced
customresourcedefinition.apiextensions.k8s.io/licenses.k8s-operator.alluxio.com replaced
customresourcedefinition.apiextensions.k8s.io/underfilesystems.k8s-operator.alluxio.com replaced
# Install the new operator using your configuration file (update the image tag)
$ helm install operator -f operator-config.yaml alluxio-operator
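The install above assumes operator-config.yaml carries the new operator image reference. A hedged sketch of the values involved (the exact value keys are an assumption; check the chart's values.yaml):
# operator-config.yaml (sketch) -- value keys are assumptions, verify against the chart
image: <PRIVATE_REGISTRY>/alluxio-operator
imageTag: <NEW_OPERATOR_VERSION>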
Step 2: Upgrade the Alluxio Cluster
The operator will perform a rolling upgrade of the Alluxio components.
Upload the new Alluxio Docker images to your registry.
Update the imageTag in your alluxio-cluster.yaml to the new version, as sketched below, then apply the configuration change.
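A hedged excerpt of the updated alluxio-cluster.yaml (placing imageTag directly under spec is an assumption; keep the rest of your spec unchanged):
# alluxio-cluster.yaml (excerpt)
apiVersion: k8s-operator.alluxio.com/v1
kind: AlluxioCluster
metadata:
  name: alluxio-cluster
  namespace: alx-ns
spec:
  imageTag: AI-3.7-13.0.0   # new version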
# Apply the updated cluster definition
$ kubectl apply -f alluxio-cluster.yaml
alluxiocluster.k8s-operator.alluxio.com/alluxio-cluster configured
# Monitor the rolling upgrade process
$ kubectl -n alx-ns get pod
NAME READY STATUS RESTARTS AGE
alluxio-cluster-coordinator-0 0/1 Init:0/2 0 7s
...
alluxio-cluster-worker-58999f8ddd-cd6r2 0/1 Init:0/2 0 7s
alluxio-cluster-worker-5d6786f5bf-cxv5j 1/1 Running 0 10m
# Check the cluster status until it returns to 'Ready'
$ kubectl -n alx-ns get alluxiocluster
NAME CLUSTERPHASE AGE
alluxio-cluster Updating 10m
...
NAME CLUSTERPHASE AGE
alluxio-cluster Ready 12m
# Verify the new version is running
$ kubectl -n alx-ns exec -it alluxio-cluster-coordinator-0 -- alluxio info version 2>/dev/null
AI-3.7-13.0.0
Dynamically Updating Configuration
You can change Alluxio properties in a running cluster by editing its ConfigMap.
Find the ConfigMap for your cluster.
$ kubectl -n alx-ns get configmap
NAME                           DATA   AGE
alluxio-cluster-alluxio-conf   4      7m48s
...
Edit the ConfigMap to modify alluxio-site.properties, alluxio-env.sh, etc.
$ kubectl -n alx-ns edit configmap alluxio-cluster-alluxio-conf
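Inside the editor, the properties files appear as keys under data. A hedged excerpt, reusing the global S3 credentials from the UFS section below as illustrative properties (your ConfigMap layout may differ):
# ConfigMap excerpt as shown in the editor (values illustrative)
data:
  alluxio-site.properties: |
    s3a.accessKeyId=<S3_ACCESS_KEY>
    s3a.secretKey=<S3_SECRET_KEY>
  alluxio-env.sh: |
    ...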
Restart components to apply the changes.
Coordinator:
kubectl -n alx-ns rollout restart statefulset alluxio-cluster-coordinator
Workers:
kubectl -n alx-ns rollout restart deployment alluxio-cluster-worker
DaemonSet FUSE:
kubectl -n alx-ns rollout restart daemonset alluxio-fuse
CSI FUSE: These pods must be restarted by restarting the application pod that mounts the volume, or by manually deleting the FUSE pod (kubectl -n alx-ns delete pod <fuse-pod-name>).
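If you are unsure which pod is the FUSE pod, a label selector similar to the worker selector used earlier may help; the label value here is an assumption, so verify it against your pods:
# Label value is an assumption; confirm with 'kubectl -n alx-ns get pod --show-labels'
$ kubectl -n alx-ns get pod -l app.kubernetes.io/component=fuse
$ kubectl -n alx-ns delete pod <fuse-pod-name>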
2. Worker Management
Alluxio's decentralized architecture relies on workers that are managed via a consistent hash ring.
Checking Worker Status
To see a list of all registered workers and their current status (online or offline):
bin/alluxio info nodes
Adding a New Worker
To add a new worker to the cluster:
Install the Alluxio software on the new node.
Ensure the alluxio-site.properties file is configured to point to your etcd cluster (see the sketch after these steps).
Start the worker process. It will automatically register itself in etcd and join the consistent hashing ring.
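A hedged example of the etcd-related setting (the property name and endpoints are illustrative assumptions; confirm them against your deployment's configuration reference):
# alluxio-site.properties (excerpt) -- illustrative values
alluxio.etcd.endpoints=http://etcd-0:2379,http://etcd-1:2379,http://etcd-2:2379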
Removing a Worker Permanently
If you need to decommission a worker permanently:
Shut down the worker process on the target node.
Get the worker ID by running bin/alluxio info nodes.
Remove the worker using its ID:
bin/alluxio process remove-worker -n <worker_id>
Verify removal by running bin/alluxio info nodes again.
Important: Removing a worker is a permanent action that will cause its portion of the hash ring to be redistributed, potentially causing a temporary increase in cache misses.
Restarting a Worker
If you restart a worker for maintenance, it will be temporarily marked as offline. As long as its identity is preserved (via alluxio.worker.identity.uuid.file.path), it will rejoin the cluster with its cached data intact and available.
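A hedged example of pinning the identity file to persistent storage (the path is illustrative; it must survive the restart):
# alluxio-site.properties (excerpt) -- path is illustrative
alluxio.worker.identity.uuid.file.path=/mnt/alluxio/worker_identity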
3. UFS Mount Management
Alluxio's unified namespace allows you to mount multiple under storage systems (UFS) into a single logical view. The mount table that manages these connections is stored in etcd for high availability. Alluxio components periodically poll etcd for the latest mount table, so any changes are automatically propagated throughout the cluster.
Managing Mount Points
Use the UnderFileSystem resource to manage UFS mounts. To add a new mount, see the Connect to Storage guide.
List Mounts:
# You can also list all submitted configurations with the Kubernetes CLI.
$ kubectl -n alx-ns get ufs
NAME         PHASE   AGE
alluxio-s3   Ready   13d
Remove a Mount:
$ kubectl -n alx-ns delete ufs alluxio-s3
underfilesystem.k8s-operator.alluxio.com "alluxio-s3" deleted
Configuring UFS Credentials
You can provide credentials for a UFS globally or on a per-mount basis.
Global Configuration: Set properties for all mounts of a certain type (e.g., all S3 mounts) in alluxio-site.properties.
# alluxio-site.properties
s3a.accessKeyId=<S3_ACCESS_KEY>
s3a.secretKey=<S3_SECRET_KEY>
Per-Mount Configuration (Recommended): Provide credentials as mount options on the specific mount, as shown in the UnderFileSystem resource below. This is the most flexible and secure method, ideal for connecting to multiple systems with different credentials. Per-mount options override global settings.
apiVersion: k8s-operator.alluxio.com/v1
kind: UnderFileSystem
metadata:
  name: alluxio-s3
  namespace: alx-ns
spec:
  alluxioCluster: alluxio-cluster
  path: s3://bucket-a/images
  mountPath: /s3-images
  mountOptions:
    s3a.accessKeyId: <S3_ACCESS_KEY_ID>
    s3a.secretKey: <S3_SECRET_KEY>
Rules for Mounting
When defining your namespace, you must follow two important rules to ensure a valid and unambiguous mount table.
Rule 1: Mounts Must Be Direct Children of the Root (/)
You can only create mount points at the top level of the Alluxio namespace. You cannot mount to the root path (/) itself, nor can you create a mount point inside a non-existent directory.
Examples:
Scenario              Alluxio Path    UFS Path                  Valid?   Reason
Mount a bucket        /s3-data        s3://my-bucket/           ✔️ Yes   Mount point is a direct child of root.
Mount to root         /               s3://my-bucket/           ❌ No    The root path cannot be a mount point.
Mount to a sub-path   /data/images    s3://my-bucket/images/    ❌ No    Mount points cannot be created in subdirectories.
Rule 2: Mounts Cannot Be Nested
One mount point cannot be created inside another, either in the Alluxio namespace or in the UFS namespace. For example, if /data is mounted to s3://my-bucket/data, you cannot create a new mount at /data/tables (nested Alluxio path) or mount another UFS to s3://my-bucket/data/tables (nested UFS path).
Example Scenario:
Suppose you have an existing mount point:
Alluxio Path: /data
UFS Path: s3://my-bucket/data
The following new mounts would be invalid:
Alluxio Path   UFS Path                     Valid?   Reason
/data/tables   hdfs://namenode/tables       ❌ No    The Alluxio path /data/tables is nested inside the existing /data mount.
/tables        s3://my-bucket/data/tables   ❌ No    The UFS path s3://.../data/tables is nested inside the existing s3://.../data mount.
4. Multi-Tenancy and Federation
For large-scale enterprise deployments, Alluxio provides advanced features for multi-tenancy and cluster federation. This allows multiple teams and business units to share data infrastructure securely and efficiently while simplifying administrative overhead.
The reference architecture below features an API Gateway that centrally handles authentication and authorization across multiple Alluxio clusters.
Core Concepts
Authentication
Alluxio integrates with external enterprise identity providers like Okta. When a user logs in, the provider authenticates them and generates a JSON Web Token (JWT). This JWT is then sent with every subsequent request to the Alluxio API Gateway to verify the user's identity.
Authorization
Once a user is authenticated, Alluxio uses an external policy engine, Open Policy Agent (OPA), to determine what actions the user is authorized to perform. Administrators can write fine-grained access control policies in OPA's declarative language, Rego, to control which users can access which resources. The API Gateway queries OPA for every request to ensure it is authorized.
Multi-Tenancy and Isolation
Alluxio enforces isolation between tenants to ensure security and prevent interference. This is achieved through:
User Roles: Defining different roles with specific access levels and permissions.
Cache Isolation: Assigning tenant-specific cache configurations, including quotas, TTLs, and eviction policies, ensuring one tenant's workload does not negatively impact another's.
Cluster Federation
For organizations with multiple Alluxio clusters (e.g., across different regions or for different business units), federation simplifies management. A central Management Console provides a single pane of glass for:
Cross-cluster monitoring and metrics.
Executing operations across multiple clusters simultaneously.
Centralized license management for all clusters.
Example Workflow: Updating a Cache Policy
This workflow demonstrates how the components work together:
Authentication: A user logs into the Management Console, which redirects them to Okta for authentication. Upon success, Okta issues a JWT.
Request Submission: The user uses the console to submit a request to change a cache TTL. The request, containing the JWT, is sent to the API Gateway.
Authorization: The API Gateway validates the JWT and queries the OPA Policy Engine to check if the user has permission to modify cache settings for the target tenant.
Execution: If the request is authorized, the API Gateway forwards the command to the coordinator of the relevant Alluxio cluster, which then applies the new TTL policy.