Installing on Kubernetes
This documentation shows how to install Alluxio on Kubernetes using the Alluxio Operator, a Kubernetes extension for managing applications.
Installation Steps
1. Preparation
Before you begin, ensure you have reviewed the Resource Prerequisites and Compatibility.
It is assumed that the required container images, for both Alluxio and third-party components, are accessible to the Kubernetes cluster. If your cluster cannot access public image repositories, see Appendix A: Handling Images for instructions.
First, download and extract the operator helm chart:
# The command will extract the files to the directory alluxio-operator/
$ tar zxf alluxio-operator-3.3.6-helmchart.tgz
This creates the alluxio-operator directory containing the Helm chart.
2. Deploy Alluxio Operator
Create an alluxio-operator/alluxio-operator.yaml file to specify the operator image:
global:
  image: <PRIVATE_REGISTRY>/alluxio-operator
  imageTag: 3.3.6
alluxio-csi:
  enabled: false
Now, deploy the operator using Helm:
$ cd alluxio-operator
# The last argument is the path to the Helm chart; "." means the current directory
$ helm install operator -f alluxio-operator.yaml .
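Optionally, confirm that the Helm release was created; helm list -A shows releases across all namespaces:
$ helm list -A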
Verify that the operator pods are running:
$ kubectl -n alluxio-operator get pod
NAME READY STATUS RESTARTS AGE
alluxio-cluster-controller-5647cc664d-lrx84 1/1 Running 0 14s
alluxio-collectinfo-controller-667b746fd6-hfzqk 1/1 Running 0 14s
...
If the operator pods fail to start due to image pull errors, your Kubernetes cluster may not have access to public image registries. Please refer to Appendix A.2: Unable to access public image registry.
3. Deploy Alluxio Cluster
For a production environment, we recommend deploying the Alluxio cluster with specific node selectors and persistent storage for metadata.
First, label the Kubernetes nodes where you want to run the Alluxio coordinator and workers:
kubectl label nodes <node-name> alluxio-role=coordinator
kubectl label nodes <node-name> alluxio-role=worker
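You can confirm the labels were applied with kubectl; the -L flag adds a column showing each node's value for the given label:
$ kubectl get nodes -L alluxio-role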
Next, create an alluxio-cluster.yaml file. This example includes a cluster license, a nodeSelector for production environments, and a persistent metastore.
A cluster license is the simplest way to get started. For other options, including the recommended deployment license for production, see Appendix D: License Management. For more advanced cluster configurations, see Appendix C: Advanced Configuration.
apiVersion: k8s-operator.alluxio.com/v1
kind: AlluxioCluster
metadata:
  name: alluxio-cluster
  namespace: alx-ns
spec:
  image: <PRIVATE_REGISTRY>/alluxio-enterprise
  imageTag: DA-3.7-13.0.1
  properties:
    alluxio.license: <YOUR_CLUSTER_LICENSE>
  coordinator:
    nodeSelector:
      alluxio-role: coordinator
    metastore:
      type: persistentVolumeClaim
      storageClass: "gp2"
      size: 4Gi
  worker:
    nodeSelector:
      alluxio-role: worker
    count: 2
    pagestore:
      size: 100Gi
  fuse:
    type: none
Deploy the Alluxio cluster:
$ kubectl create namespace alx-ns
$ kubectl create -f alluxio-cluster.yaml
Check the status of the cluster. It may take a few minutes for all pods to become Ready.
# Check the cluster status
$ kubectl -n alx-ns get alluxiocluster
NAME CLUSTERPHASE AGE
alluxio-cluster Ready 2m18s
# Check the running pods
$ kubectl -n alx-ns get pod
NAME READY STATUS RESTARTS AGE
alluxio-cluster-coordinator-0 1/1 Running 0 2m3s
alluxio-cluster-etcd-0 1/1 Running 0 2m3s
...
alluxio-cluster-worker-85fd45db46-c7n9p 1/1 Running 0 2m3s
...
If any component fails to start, refer to Appendix B: Troubleshooting for guidance.
4. Connect to Storage
Alluxio unifies access to your existing data by connecting to various storage systems, known as Under File Systems (UFS). You can mount a UFS by creating an UnderFileSystem resource.
For a complete list of supported storage systems, see the Connecting to Storage guide.
The following example shows how to mount an S3 bucket. Create a ufs.yaml file:
apiVersion: k8s-operator.alluxio.com/v1
kind: UnderFileSystem
metadata:
  name: alluxio-s3
  namespace: alx-ns
spec:
  alluxioCluster: alluxio-cluster
  path: s3://<S3_BUCKET>/<S3_DIRECTORY>
  mountPath: /s3
  mountOptions:
    s3a.accessKeyId: <S3_ACCESS_KEY_ID>
    s3a.secretKey: <S3_SECRET_KEY>
    alluxio.underfs.s3.region: <S3_REGION>
Apply the configuration to mount the storage:
$ kubectl create -f ufs.yaml
Verify the mount status:
# Verify the UFS resource is ready
$ kubectl -n alx-ns get ufs
NAME PHASE AGE
alluxio-s3 Ready 46s
# Check the mount table in Alluxio
$ kubectl -n alx-ns exec -it alluxio-cluster-coordinator-0 -- alluxio mount list 2>/dev/null
Listing all mount points
s3://my-bucket/path/to/mount on /s3/ properties={...}
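Optionally, list the mounted path through the Alluxio CLI on the coordinator pod. This assumes the fs ls subcommand is available in your Alluxio version; adjust if your CLI differs:
$ kubectl -n alx-ns exec -it alluxio-cluster-coordinator-0 -- alluxio fs ls /s3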
5. Access Data
Alluxio provides several APIs for applications to access data. For a general overview, see Accessing Data.
S3 API: Ideal for applications already using S3 SDKs; see S3 API
HDFS API: For Java applications that integrate with the HDFS API; see Java HDFS-compatible API
Appendix
A. Handling Images
Two types of container images are required for deployment:
Alluxio images: Provided by your Alluxio sales representative.
Third-party images: For components like etcd, typically pulled from public registries.
All images must be accessible to your Kubernetes cluster. If your cluster is in an air-gapped environment or cannot access public registries, you must pull all necessary images and push them to your private registry.
A.1. Alluxio Images
The primary Alluxio images are:
alluxio-operator-3.3.6-docker.tar
alluxio-enterprise-DA-3.7-13.0.1-docker.tar
Load and push them to your private registry:
# Load the images locally
$ docker load -i alluxio-operator-3.3.6-docker.tar
$ docker load -i alluxio-enterprise-DA-3.7-13.0.1-docker.tar
# Retag the images for your private registry
$ docker tag alluxio/operator:3.3.6 <PRIVATE_REGISTRY>/alluxio-operator:3.3.6
$ docker tag alluxio/alluxio-enterprise:DA-3.7-13.0.1 <PRIVATE_REGISTRY>/alluxio-enterprise:DA-3.7-13.0.1
# Push to the remote registry
$ docker push <PRIVATE_REGISTRY>/alluxio-operator:3.3.6
$ docker push <PRIVATE_REGISTRY>/alluxio-enterprise:DA-3.7-13.0.1
A.2. Unable to access public image registry
If your cluster cannot pull from public registries, you will see pods stuck in ContainerCreating or ImagePullBackOff status. You must manually pull, retag, and push the required third-party images to your private registry.
Third-Party Dependent Images
cluster ETCD: docker.io/bitnami/etcd:3.5.9-debian-11-r24
cluster ETCD: docker.io/bitnami/os-shell:11-debian-11-r2
cluster monitor: grafana/grafana:10.4.5
cluster monitor: prom/prometheus:v2.52.0
Commands to Relocate Images
# Pull the Docker images (specify --platform if needed)
...
# Tag the images with your private registry
...
# Push the images to your private registry
...
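As a sketch based on the image list above (the registry and repository placeholders match the YAML examples below; add --platform if your nodes use a different architecture):
$ docker pull docker.io/bitnami/etcd:3.5.9-debian-11-r24
$ docker pull docker.io/bitnami/os-shell:11-debian-11-r2
$ docker pull grafana/grafana:10.4.5
$ docker pull prom/prometheus:v2.52.0
$ docker tag docker.io/bitnami/etcd:3.5.9-debian-11-r24 <PRIVATE_REGISTRY>/<PRIVATE_REPOSITORY>/etcd:3.5.9-debian-11-r24
$ docker push <PRIVATE_REGISTRY>/<PRIVATE_REPOSITORY>/etcd:3.5.9-debian-11-r24
# Repeat the tag and push steps for the remaining images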
Update YAML Files
Update alluxio-operator.yaml and alluxio-cluster.yaml to point to the images in your private registry.
alluxio-operator.yaml example:
global:
  image: <PRIVATE_REGISTRY>/alluxio-operator
  imageTag: 3.3.6
alluxio-cluster.yaml example:
spec:
  image: <PRIVATE_REGISTRY>/alluxio-enterprise
  imageTag: DA-3.7-13.0.1
  etcd:
    image:
      registry: <PRIVATE_REGISTRY>
      repository: <PRIVATE_REPOSITORY>/etcd
      tag: 3.5.9-debian-11-r24
  ...
B. Troubleshooting
B.1. etcd pod stuck in pending status
If etcd pods are Pending, it is often due to storage issues. Use kubectl describe pod <etcd-pod-name> to check events.
Symptom: Event message shows pod has unbound immediate PersistentVolumeClaims.
Cause: No storageClass is set for the PVC, or no PV is available.
Solution: Specify a storageClass in alluxio-cluster.yaml:
spec:
  etcd:
    persistence:
      storageClass: <YOUR_STORAGE_CLASS>
      size: 10Gi # Example size
Then, delete the old cluster and PVCs before recreating the cluster.
Symptom: Event message shows waiting for first consumer.
Cause: The storageClass does not support dynamic provisioning, and a volume must be manually created by an administrator.
Solution: Either use a dynamic provisioner or manually create a PersistentVolume that satisfies the claim.
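For the manual route, the following is a minimal sketch of a statically provisioned PersistentVolume using a hostPath; the name, path, capacity, and storage class are illustrative and must match what the etcd PVC requests:
apiVersion: v1
kind: PersistentVolume
metadata:
  name: etcd-pv-0
spec:
  capacity:
    storage: 10Gi
  accessModes:
    - ReadWriteOnce
  persistentVolumeReclaimPolicy: Retain
  storageClassName: <YOUR_STORAGE_CLASS>
  hostPath:
    path: /mnt/etcd-data # illustrative path on the node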
C. Advanced Configuration
This section describes common configurations to adapt to different scenarios.
C.1. Configuring Alluxio Properties
To modify Alluxio's configuration, edit the .spec.properties field in the alluxio-cluster.yaml file. These properties are appended to the alluxio-site.properties file inside the Alluxio pods.
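For illustration, the relevant fragment of alluxio-cluster.yaml looks like the following; the key/value pair is a placeholder for whichever site property you need to set:
spec:
  properties:
    <PROPERTY_KEY>: <PROPERTY_VALUE>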
C.2. Configuring the Hash Ring
The consistent hash ring determines how data is mapped to workers. It is critical to define your hash ring strategy before deploying the cluster, as changing these settings later is a destructive operation that will cause all cached data to be lost.
Key properties to consider, which should be set in alluxio-cluster.yaml under .spec.properties (see the example after this list):
Hash Ring Mode (alluxio.user.dynamic.consistent.hash.ring.enabled):
  true (default): Dynamic mode. Includes only online workers. Best for most environments.
  false: Static mode. Includes all registered workers, online or offline. Use this if you need a stable ring view despite temporary worker unavailability.
Virtual Nodes (alluxio.user.worker.selection.policy.consistent.hash.virtual.node.count.per.worker):
  Default: 2000. Controls load-balancing granularity.
Worker Capacity (alluxio.user.worker.selection.policy.consistent.hash.provider.impl):
  DEFAULT (default): Assumes all workers have equal capacity.
  CAPACITY: Allocates virtual nodes based on worker storage capacity. Use this for heterogeneous clusters.
For more details, see Hash Ring Management.
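As a sketch, the three properties above are set under .spec.properties like any other site property; the values shown are the documented defaults, except for the CAPACITY provider chosen here for a heterogeneous cluster:
apiVersion: k8s-operator.alluxio.com/v1
kind: AlluxioCluster
spec:
  properties:
    alluxio.user.dynamic.consistent.hash.ring.enabled: "true"
    alluxio.user.worker.selection.policy.consistent.hash.virtual.node.count.per.worker: "2000"
    alluxio.user.worker.selection.policy.consistent.hash.provider.impl: "CAPACITY"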
C.3. Resource and JVM Tuning
You can configure resource limits and JVM options for each component.
apiVersion: k8s-operator.alluxio.com/v1
kind: AlluxioCluster
spec:
  worker:
    count: 2
    resources:
      limits:
        cpu: "12"
        memory: "36Gi"
      requests:
        cpu: "1"
        memory: "32Gi"
    jvmOptions:
      - "-Xmx22g"
      - "-Xms22g"
      - "-XX:MaxDirectMemorySize=10g"
  coordinator:
    resources:
      limits:
        cpu: "12"
        memory: "36Gi"
      requests:
        cpu: "1"
        memory: "32Gi"
    jvmOptions:
      - "-Xmx4g"
      - "-Xms1g"
The container's total memory limit should be slightly more than the sum of its heap size (-Xmx) and direct memory size (-XX:MaxDirectMemorySize) to avoid out-of-memory errors. In the worker example above, the 22g heap plus 10g of direct memory totals 32Gi, leaving headroom under the 36Gi limit.
C.4. Use PVC for Page Store
To persist worker cache data, specify a PersistentVolumeClaim (PVC) for the page store.
apiVersion: k8s-operator.alluxio.com/v1
kind: AlluxioCluster
spec:
  worker:
    pagestore:
      type: persistentVolumeClaim
      storageClass: "" # Defaults to "standard"; can be empty for static binding
      size: 100Gi
      reservedSize: 10Gi # Recommended 5-10% of the cache size
C.5. Mount Custom ConfigMaps or Secrets
You can mount custom ConfigMap or Secret resources into your Alluxio pods. This is useful for providing configuration files like core-site.xml or credentials.
Example: Mount a Secret
Create the secret from a local file:
kubectl -n alx-ns create secret generic my-secret --from-file=/path/to/my-file
Specify the secret to load and the mount path in alluxio-cluster.yaml:
apiVersion: k8s-operator.alluxio.com/v1
kind: AlluxioCluster
spec:
  secrets:
    worker:
      my-secret: /opt/alluxio/secret
    coordinator:
      my-secret: /opt/alluxio/secret
The file my-file will be available at /opt/alluxio/secret/my-file on the pods.
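One way to confirm the mount is to list the directory inside a pod, for example on the coordinator:
$ kubectl -n alx-ns exec -it alluxio-cluster-coordinator-0 -- ls /opt/alluxio/secret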
C.6. Use External ETCD
If you have an external ETCD cluster, you can configure Alluxio to use it instead of the one deployed by the operator.
apiVersion: k8s-operator.alluxio.com/v1
kind: AlluxioCluster
spec:
  etcd:
    enabled: false
  properties:
    alluxio.etcd.endpoints: http://external-etcd:2379
    # If using TLS for ETCD, add the following:
    # alluxio.etcd.tls.enabled: "true"
C.7. Customize ETCD configuration
The fields under spec.etcd follow the Bitnami ETCD Helm chart. For example, to set node affinity for the etcd pods, use the affinity field as described in the Kubernetes documentation.
apiVersion: k8s-operator.alluxio.com/v1
kind: AlluxioCluster
spec:
  etcd:
    affinity:
      nodeAffinity:
        requiredDuringSchedulingIgnoredDuringExecution:
          nodeSelectorTerms:
            - matchExpressions:
                - key: topology.kubernetes.io/zone
                  operator: In
                  values:
                    - antarctica-east1
                    - antarctica-west1
D. License Management
Alluxio requires a license provided by your sales representative. There are two types: a cluster license (for single test clusters) and a deployment license (recommended for production).
D.1. Cluster License
A cluster license is set directly in the alluxio-cluster.yaml file. This method is not recommended for production.
apiVersion: k8s-operator.alluxio.com/v1
kind: AlluxioCluster
spec:
  properties:
    alluxio.license: <YOUR_CLUSTER_LICENSE>
D.2. Deployment License
A deployment license is the recommended method for production and can cover multiple clusters. It is applied by creating a separate License resource after the cluster has been created.
Step 1: Create the Cluster without a License
Deploy the Alluxio cluster as described in Step 3 of the main guide, but do not include the alluxio.license property in alluxio-cluster.yaml. The pods will start but remain in an Init state, waiting for the license.
Step 2: Apply the License
Create an alluxio-license.yaml file as shown below. The name and namespace in this file must match the metadata of your AlluxioCluster.
apiVersion: k8s-operator.alluxio.com/v1
kind: License
metadata:
  name: alluxio-license
  namespace: alx-ns
spec:
  clusters:
    - name: alluxio-cluster
      namespace: alx-ns
  licenseString: <YOUR_DEPLOYMENT_LICENSE>
Apply this file with kubectl create -f alluxio-license.yaml. The Alluxio pods will detect the license and transition to Running.
Warning: Only specify running clusters in the clusters list. If the operator cannot find a listed cluster, the license operation will fail for all clusters.
D.3. Updating a Deployment License
To update an existing deployment license, update the licenseString in your alluxio-license.yaml and re-apply it:
kubectl delete -f alluxio-license.yaml
kubectl create -f alluxio-license.yaml
D.4. Checking License Status
You can check the license details and utilization from within the Alluxio coordinator pod.
# Get a shell into the coordinator pod
$ kubectl -n alx-ns exec -it alluxio-cluster-coordinator-0 -- /bin/bash
# View license details (expiration, capacity)
$ alluxio license show
# View current utilization
$ alluxio license status