Cluster Administration

This document describes administrative operations on a running Alluxio cluster on Kubernetes, such as upgrading to a new version and adding new workers.

Dynamically Update Alluxio Configuration in a Running Cluster

  1. Get the configmap

$ kubectl -n alx-ns get configmap
NAME                              DATA   AGE
alluxio-cluster-alluxio-conf      4      7m48s
...
  2. Run kubectl edit configmap to update the Alluxio configuration

$ kubectl -n alx-ns edit configmap alluxio-cluster-alluxio-conf

There should be 4 files inside: alluxio-env.sh, alluxio-site.properties, log4j2.xml, and metrics.properties. Edit them as needed, then save the configmap. A sketch of such an edit follows this step.

configmap/alluxio-cluster-alluxio-conf edited
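
For reference, here is a sketch of what an edit inside alluxio-site.properties might look like. The property key shown is hypothetical, for illustration only; substitute the properties you actually need to change:

# excerpt of the configmap as shown in the editor
data:
  alluxio-site.properties: |
    # ... existing properties ...
    alluxio.example.property=value   # hypothetical key, for illustration only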
  3. Restart the Alluxio components as needed. For the cluster alluxio-cluster in the alx-ns namespace:

  • coordinator: kubectl -n alx-ns rollout restart statefulset alluxio-cluster-coordinator

  • worker: kubectl -n alx-ns rollout restart deployment alluxio-cluster-worker

  • daemonset fuse (fuse.type = daemonSet): kubectl -n alx-ns rollout restart daemonset alluxio-fuse

  • csi fuse (fuse.type = csi): the CSI FUSE pods don't support a rollout restart command and must be restarted either by exiting the attached user's pod or by manually deleting them with kubectl -n alx-ns delete pod alluxio-fuse-xxx.
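
After triggering a restart, you can wait for each rollout to finish before moving on. A minimal check, assuming the default resource names used above:

# wait for the restarted components to become ready again
$ kubectl -n alx-ns rollout status statefulset alluxio-cluster-coordinator
$ kubectl -n alx-ns rollout status deployment alluxio-cluster-worker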

Upgrading to a newer Alluxio version

Upgrade the Operator

  1. Upload the new docker images corresponding to the new Alluxio operator version to your image registry and unpack the helm chart of the operator. Refer to the installation doc for details.

  2. Run the following command to apply the new changes to the cluster.

# uninstall the operator. the operator is independent and the status of the operator won't affect the existing Alluxio cluster
$ helm uninstall operator
release "operator" uninstalled

# check if all the resources are removed. the namespace will be the last resource to remove
$ kubectl get ns alluxio-operator
Error from server (NotFound): namespaces "alluxio-operator" not found

# run the command in the new helm chart directory to upgrade the CRDs first
$ kubectl apply -f alluxio-operator/crds 2>/dev/null
customresourcedefinition.apiextensions.k8s.io/alluxioclusters.k8s-operator.alluxio.com configured
customresourcedefinition.apiextensions.k8s.io/underfilesystems.k8s-operator.alluxio.com configured

# use the same operator-config.yaml with only the tag of the image changed to restart the operator
$ helm install operator -f operator-config.yaml alluxio-operator
NAME: operator
LAST DEPLOYED: Thu Jun 27 15:47:44 2024
NAMESPACE: default
STATUS: deployed
REVISION: 1
TEST SUITE: None
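
The operator-config.yaml itself stays the same apart from the image tag. A sketch of the relevant fields, with placeholder values; the exact key names depend on your chart version, so verify them against the unpacked chart:

# operator-config.yaml (sketch; key names assumed)
image: <PRIVATE_REGISTRY>/alluxio-operator   # your registry, unchanged
imageTag: <NEW_OPERATOR_VERSION>             # only the tag changes for the upgrade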

Upgrade the Alluxio cluster

Before the operation, you should know:

  • When the upgrade operation starts, the coordinator, workers, and the DaemonSet FUSE will perform a rolling upgrade to use the new image. The existing CSI FUSE pods will not be restarted or upgraded; only newly created pods will use the new image.

  • While the cluster is being upgraded, the cache hit rate may decrease slightly, but it will recover fully once the cluster is running again.

Follow these steps to upgrade the cluster:

  1. Upload the new docker images corresponding to the new Alluxio version to your image registry. Refer to the installation doc for details.

  2. Update the imageTag fields in alluxio-cluster.yaml to reflect the new Alluxio version (a sketch follows this list). In the following example the new imageTag will be AI-3.6-12.0.2.

  3. Run the following command to apply the new changes to the cluster.
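
For step 2, the change is limited to the image tag fields. A sketch of the relevant portion of alluxio-cluster.yaml, assuming a typical AlluxioCluster layout (verify the field names and apiVersion against your installed CRD):

apiVersion: k8s-operator.alluxio.com/v1   # version suffix assumed
kind: AlluxioCluster
metadata:
  name: alluxio-cluster
  namespace: alx-ns
spec:
  image: <PRIVATE_REGISTRY>/alluxio-enterprise   # repository assumed; keep yours unchanged
  imageTag: AI-3.6-12.0.2                        # the new version tag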

# apply the changes to Kubernetes
$ kubectl apply -f alluxio-cluster.yaml
alluxiocluster.k8s-operator.alluxio.com/alluxio-cluster configured

# verify the upgrade. you can see the new pods are spawning
$ kubectl -n alx-ns get pod
NAME                                          READY   STATUS     RESTARTS   AGE
alluxio-cluster-coordinator-0                 0/1     Init:0/2   0          7s
alluxio-cluster-etcd-0                        1/1     Running    0          10m
alluxio-cluster-etcd-1                        1/1     Running    0          10m
alluxio-cluster-etcd-2                        1/1     Running    0          10m
alluxio-cluster-grafana-b89bf9dbb-77pb6       1/1     Running    0          10m
alluxio-cluster-prometheus-59b7b8bd64-b95jh   1/1     Running    0          10m
alluxio-cluster-worker-58999f8ddd-cd6r2       0/1     Init:0/2   0          7s
alluxio-cluster-worker-5d6786f5bf-cxv5j       1/1     Running    0          10m

# check the status of the cluster
$ kubectl -n alx-ns get alluxiocluster
NAME              CLUSTERPHASE   AGE
alluxio-cluster   Updating       10m

# wait until the cluster is ready again
$ kubectl -n alx-ns get alluxiocluster
NAME              CLUSTERPHASE   AGE
alluxio-cluster   Ready          12m

# check the pods of the cluster. you can see the age of the alluxio pods are changed
$ kubectl -n alx-ns get pod
NAME                                          READY   STATUS    RESTARTS   AGE
alluxio-cluster-coordinator-0                 1/1     Running   0          93s
alluxio-cluster-etcd-0                        1/1     Running   0          12m
alluxio-cluster-etcd-1                        1/1     Running   0          12m
alluxio-cluster-etcd-2                        1/1     Running   0          12m
alluxio-cluster-grafana-b89bf9dbb-77pb6       1/1     Running   0          12m
alluxio-cluster-prometheus-59b7b8bd64-b95jh   1/1     Running   0          12m
alluxio-cluster-worker-58999f8ddd-cd6r2       1/1     Running   0          93s
alluxio-cluster-worker-58999f8ddd-rtftk       1/1     Running   0          33s

# double check the version string
$ kubectl -n alx-ns exec -it alluxio-cluster-coordinator-0 -- alluxio info version 2>/dev/null
AI-3.6-12.0.2

Scaling the size of the cluster

Scale Up the Workers

Before the operation, you should know:

  • While the workers are being scaled, the cache hit rate may decrease slightly, but it will recover fully once the cluster is fully running again.

Follow these steps to scale up the workers:

  1. Edit alluxio-cluster.yaml to increase the count under worker (a sketch follows this list). In the following example we scale from 2 workers to 3 workers.

  2. Run the following command to apply the new changes to the cluster.
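
For step 1, the change is a single field. A sketch of the relevant portion of alluxio-cluster.yaml, under the same layout assumptions as in the upgrade section:

apiVersion: k8s-operator.alluxio.com/v1   # version suffix assumed
kind: AlluxioCluster
metadata:
  name: alluxio-cluster
  namespace: alx-ns
spec:
  worker:
    count: 3   # previously 2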

# apply the changes to Kubernetes
$ kubectl apply -f alluxio-cluster.yaml
alluxiocluster.k8s-operator.alluxio.com/alluxio-cluster configured

# verify the cluster is scaling. you can see the new pod is spawning
$ kubectl -n alx-ns get pod
NAME                                          READY   STATUS            RESTARTS   AGE
alluxio-cluster-coordinator-0                 1/1     Running           0          4m51s
alluxio-cluster-etcd-0                        1/1     Running           0          15m
alluxio-cluster-etcd-1                        1/1     Running           0          15m
alluxio-cluster-etcd-2                        1/1     Running           0          15m
alluxio-cluster-grafana-b89bf9dbb-77pb6       1/1     Running           0          15m
alluxio-cluster-prometheus-59b7b8bd64-b95jh   1/1     Running           0          15m
alluxio-cluster-worker-58999f8ddd-cd6r2       1/1     Running           0          4m51s
alluxio-cluster-worker-58999f8ddd-rtftk       1/1     Running           0          3m51s
alluxio-cluster-worker-58999f8ddd-p6n59       0/1     PodInitializing   0          4s

# check if the new instances are ready
$ kubectl -n alx-ns get pod
NAME                                          READY   STATUS    RESTARTS   AGE
alluxio-cluster-coordinator-0                 1/1     Running   0          5m21s
alluxio-cluster-etcd-0                        1/1     Running   0          16m
alluxio-cluster-etcd-1                        1/1     Running   0          16m
alluxio-cluster-etcd-2                        1/1     Running   0          16m
alluxio-cluster-grafana-b89bf9dbb-77pb6       1/1     Running   0          16m
alluxio-cluster-prometheus-59b7b8bd64-b95jh   1/1     Running   0          16m
alluxio-cluster-worker-58999f8ddd-cd6r2       1/1     Running   0          5m21s
alluxio-cluster-worker-58999f8ddd-rtftk       1/1     Running   0          4m21s
alluxio-cluster-worker-58999f8ddd-p6n59       1/1     Running   0          34s
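
As with an upgrade, you can confirm the cluster has settled by checking its phase; it should return to Ready once the new worker has joined:

# check the status of the cluster
$ kubectl -n alx-ns get alluxiocluster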
