This page describes how to deploy Alluxio on Kubernetes and run FIO as validation.
Prerequisites
Kubernetes
A Kubernetes cluster with version at least 1.19, with feature gates enabled.
Ensure the cluster's Kubernetes Network Policy allows connectivity between applications (Alluxio clients) and the Alluxio pods on the defined ports; a sketch of such a policy follows this list.
Helm 3 with version at least 3.6.0 installed.
An image registry for storing and managing container images.
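If the cluster enforces network policies, an allow rule for the Alluxio namespace may be needed. The following is a minimal sketch; the namespace, selectors, and ports (taken from the metrics annotations in the cluster configuration below) are assumptions to adapt to your environment.

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-alluxio-clients
  namespace: alluxio-test      # assumed Alluxio namespace
spec:
  podSelector: {}              # applies to all Alluxio pods in the namespace; narrow with labels as needed
  ingress:
    - from:
        - namespaceSelector: {}   # allow traffic from all namespaces; restrict as needed
      ports:
        - protocol: TCP
          port: 19999             # master port
        - protocol: TCP
          port: 30000             # worker port
```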
Alluxio Operator
Permission to create CRDs (Custom Resource Definitions)
Permission to create ServiceAccount, ClusterRole, and ClusterRoleBinding for the operator pod
Permission to create the namespace that the operator will be in
FIO
Refer to https://fio.readthedocs.io/en/latest/fio_doc.html for installing and using the FIO tool.
Preparation
Download files
```shell
# helm chart for alluxio operator
# extract this tarball, which will create the directory alluxio-operator/
alluxio-operator-1.1.2-helmchart.tgz

# docker images
# use docker load to load the respective images into docker
# alluxio operator docker image
alluxio-k8s-operator-1.1.2-docker.tar
# alluxio/alluxio-enterprise docker image
alluxio-enterprise-AI-3.1-3.3.2-docker.tar
# alluxio csi docker image
alluxio-csi-1.1.2-docker.tar
```
Extract Operator helm chart
```shell
# untar the Operator helm chart. This will extract to the directory alluxio-operator/
$ tar -xzf alluxio-operator-1.1.2-helmchart.tgz
```
Upload images
This example shows how to upload the Alluxio operator image. Repeat these steps for the Alluxio CSI and Alluxio Enterprise images.
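A sketch of the upload steps for the operator image; `<your-registry>` is a placeholder for your accessible registry address.

```shell
# load the image from the provided tarball
$ docker load -i alluxio-k8s-operator-1.1.2-docker.tar
# retag the image for the private registry
$ docker tag alluxio/operator:1.1.2 <your-registry>/alluxio/operator:1.1.2
# push the image to the registry
$ docker push <your-registry>/alluxio/operator:1.1.2
```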
Create the following configuration files within the extracted directory of the alluxio operator helm chart.
Create the operator configuration in alluxio-operator/alluxio-operator.yaml
```yaml
nameOverride: alluxio-operator
image: alluxio/operator # set this value to be an accessible registry containing this image
imageTag: 1.1.2
imagePullPolicy: Always
alluxio-csi:
  # disable CSI
  enabled: false
```
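With this configuration in place, the operator can be installed from the extracted chart. A minimal sketch, assuming the chart directory alluxio-operator/ from the extraction step and that the operator runs in a namespace named alluxio-operator (both assumptions):

```shell
# install the operator using the configuration above
$ helm install operator -f alluxio-operator/alluxio-operator.yaml alluxio-operator/
# verify that the operator pod is running (namespace name is an assumption)
$ kubectl get pods -n alluxio-operator
```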
Create the dataset configuration in alluxio-operator/dataset.yaml
Note that placeholder values are used for the dataset name and path; without placeholders, the dataset will not work with the mount table feature.
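A sketch of alluxio-operator/dataset.yaml with such placeholders; the exact spec fields depend on the Dataset CRD version, so verify against your operator's CRD.

```yaml
apiVersion: k8s-operator.alluxio.com/v1alpha1
kind: Dataset
metadata:
  name: null-dataset # placeholder name, referenced by .spec.dataset in the cluster configuration
spec:
  dataset:
    path: s3://null   # placeholder path; actual UFS mounts are added later via the mount table
```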
Create the cluster configuration in alluxio-operator/alluxio-cluster.yaml
```yaml
apiVersion: k8s-operator.alluxio.com/v1alpha1
kind: AlluxioCluster
metadata:
  name: alluxio
spec:
  dataset: null-dataset # note this matches .metadata.name in the dataset configuration
  image: alluxio/alluxio-enterprise # set this value to be an accessible registry containing this image
  imageTag: AI-3.1-3.3.2
  imagePullPolicy: Always
  properties:
    # adjust the following alluxio-site.properties as needed
    alluxio.master.journal.type: "NOOP"
    alluxio.master.scheduler.initial.wait.time: "1s"
    alluxio.master.scheduler.restore.job.from.journal: "false"
    alluxio.user.file.writetype.default: "THROUGH"
    alluxio.user.metadata.cache.max.size: "0"
    alluxio.user.file.replication.min: "1"
    alluxio.fuse.web.enabled: "true"
    alluxio.mount.table.source: "ETCD"
    alluxio.worker.membership.manager.type: "ETCD"
    alluxio.dora.ufs.list.status.cache.nr.files: "0"
    alluxio.security.authorization.permission.enabled: "false"
    alluxio.security.authentication.type: "NOSASL"
    alluxio.network.tls.enabled: "false"
    alluxio.user.fuse.sync.close.enabled: "false"
    alluxio.license: "xxxxxxx"
  master:
    count: 1
    resources:
      limits:
        cpu: "1"
        memory: "10Gi"
      requests:
        cpu: "1"
        memory: "2Gi"
    jvmOptions:
      - "-Xmx4g"
      - "-Xms1g"
      - "-XX:MaxDirectMemorySize=4g"
    # nodeSelector: # change node label based on customer's environment
    #   alluxio-node: "true"
    #   localfs-server: "true"
  worker:
    count: 2
    resources:
      limits:
        cpu: "10"
        memory: "20Gi"
      requests:
        cpu: "1"
        memory: "4Gi"
    jvmOptions:
      - "-Xmx8g"
      - "-Xms2g"
      - "-XX:MaxDirectMemorySize=8g"
    # nodeSelector: # change it based on real node labels
    #   alluxio-node: "true"
  pagestore:
    type: hostPath
    quota: 10Gi
    hostPath: /mnt/alluxio/page
  metastore:
    type: hostPath
    hostPath: /mnt/alluxio/meta
  fuse:
    enabled: true
    hostPathForMount: /mnt/alluxio/fuse
    resources:
      requests:
        cpu: "1"
        memory: "4Gi"
      limits:
        cpu: "6"
        memory: "16Gi"
    jvmOptions:
      - "-Xmx4g"
      - "-Xms1g"
      - "-XX:MaxDirectMemorySize=8g"
    mountOptions:
      - allow_other
      - kernel_cache
      - entry_timeout=10000
      - attr_timeout=10000
  metrics:
    prometheusMetricsServlet:
      enabled: true
      podAnnotations:
        prometheus.io/scrape: "true"
        prometheus.io/masterPort: "19999"
        prometheus.io/workerPort: "30000"
        prometheus.io/fusePort: "49999"
        prometheus.io/path: "/metrics/"
  etcd:
    enabled: true
    replicaCount: 3
  alluxio-monitor:
    enabled: true
```
Verify configurations
Modify the image, imageTag, and dataset values in alluxio-operator/alluxio-cluster.yaml. Adjust the cpu, memory, and count values for the master, worker, etcd, and fuse components as needed. Use nodeSelector to control which nodes the pods are scheduled on, as shown below.
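For example, to make the commented nodeSelector entries in the cluster configuration match, label the target nodes accordingly:

```shell
# label the nodes that should host Alluxio pods (label key/value from the commented example above)
$ kubectl label node <node-name> alluxio-node=true
```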
Bind SSD paths. The above configuration defines two hostPath mounts:
Alluxio pods will use the hostPath mount at /mnt/alluxio/meta to store Alluxio's metadata. It is recommended to use an SSD for this directory.
Alluxio workers will use the hostPath mount at /mnt/alluxio/page to store Alluxio's cached data. It is also recommended to use an SSD for this directory.
The path for FUSE's local_data_cache is at /mnt/alluxio/fuse-local-cache.
Other hostPath configuration:
For mounting NAS, first add the corresponding mount path in the hostPaths section of the workers.
To use a different path for the FUSE local data cache, also add the corresponding mount path in the hostPaths section of FUSE; see the sketch below.
By default, FUSE is mounted at /mnt/alluxio/fuse. You can view the mounted UFS storage file list in the host’s directory /mnt/alluxio/fuse.
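A hypothetical sketch of such additions to alluxio-operator/alluxio-cluster.yaml; the NAS and cache paths are examples, and the hostPaths schema may differ across operator versions, so verify it against your CRD before use.

```yaml
worker:
  hostPaths:
    - /mnt/nas               # NAS mount path on the host (example)
fuse:
  hostPaths:
    - /mnt/fuse-local-cache  # custom FUSE local data cache path (example)
```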
S3 ECR configuration:
Replace the configuration values for the docker images with your AWS ECR registry address so that the images can be pulled from the corresponding ECR repositories.
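Once the configuration is verified, the dataset and the cluster can be created. A sketch, assuming the alluxio-test namespace used in the examples below:

```shell
# create the namespace and deploy the dataset and cluster resources
$ kubectl create namespace alluxio-test
$ kubectl create -f alluxio-operator/dataset.yaml -n alluxio-test
$ kubectl create -f alluxio-operator/alluxio-cluster.yaml -n alluxio-test
# wait until all pods are ready
$ kubectl get pods -n alluxio-test
```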
In this example, an existing S3 bucket is mounted to Alluxio.
```shell
# go into alluxio worker pod
$ pod_worker=$(kubectl get pods -l name=alluxio-worker -o jsonpath='{.items[0].metadata.name}' -n alluxio-test)
$ kubectl exec -it $pod_worker -n alluxio-test -- bash

# mount ufs
$ alluxio mount add \
    --option aws.accessKeyId=xxx \
    --option aws.secretKey=xxx \
    --option alluxio.underfs.s3.region=us-east-1 \
    --path /bucket \
    --ufs-uri s3://test/
Mounted ufsPath=s3://test/ to alluxioPath=/bucket with 3 options

# check mount point status
$ alluxio mount list
s3://test/ on /bucket/ properties={aws.secretKey=xxx, alluxio.underfs.s3.region=us-east-1, aws.accessKeyId=xxx}

# go into alluxio fuse pod to check data in the mount point
$ pod_fuse=$(kubectl get pods -l role=alluxio-fuse -o jsonpath='{.items[0].metadata.name}' -n alluxio-test)
$ kubectl exec -it $pod_fuse -n alluxio-test -- bash
$ ls -l /mnt/alluxio/fuse/bucket/
drwx------ 1 root root      0 Jan  1 1970 2023-10-17/
drwx------ 1 root root      0 Jan  1 1970 alluxio/
drwx------ 1 root root      0 Jan  1 1970 alluxio_ufs/
-rwx------ 1 root root 173279 Oct 17 08:26 log.tar.gz*
drwx------ 1 root root      0 Jan  1 1970 pach_alluxio/

# unmount (as needed)
$ alluxio mount remove --path /bucket
Unmounted /bucket from Alluxio.
```
Quick Verification - FIO
Follow the FIO documentation referenced in the prerequisites to install the FIO tool on the FUSE pod.
Execute the following tests against the Alluxio FUSE mount with FIO.
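A sketch of a sequential write and read test; the file path under the FUSE mount and the job parameters are assumptions to adjust for your environment.

```shell
# write a 1 GiB test file through the FUSE mount, then read it back sequentially
$ fio --name=seq_write --filename=/mnt/alluxio/fuse/bucket/fio-test-file \
      --rw=write --bs=1M --size=1G --group_reporting
$ fio --name=seq_read --filename=/mnt/alluxio/fuse/bucket/fio-test-file \
      --rw=read --bs=1M --size=1G --group_reporting
```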
A Grafana dashboard is deployed in the same namespace as the Alluxio cluster and exposed through port 8080 on its host machine. This port must be open and not blocked by the host machine's firewall.
If using EKS
Run kubectl get pods -owide -n <alluxio namespace> | grep grafana to get the hostname of the node. It should be in the form of ip-10-0-6-132.ec2.internal.
If the machine used to access Grafana is in the same private network as the host machine, access the Grafana UI directly through http://<hostname>:8080. Otherwise, identify the external IP of the host machine to use as the hostname in the URL. Run kubectl get nodes -owide to find the corresponding external IP.
```shell
[centos@ip-172-31-92-52 ~]$ kubectl get nodes -owide
NAME                         STATUS   ROLES    AGE    VERSION                INTERNAL-IP   EXTERNAL-IP      OS-IMAGE         KERNEL-VERSION                 CONTAINER-RUNTIME
ip-10-0-6-132.ec2.internal   Ready    <none>   210d   v1.22.17-eks-0a21954   10.0.6.132    35.173.122.123   Amazon Linux 2   5.4.247-162.350.amzn2.x86_64   docker://20.10.23
```
In this example, the machine has an external IP of 35.173.122.123, so the Grafana UI should be accessible through http://35.173.122.123:8080.
Appendix: Access Alluxio via Kubernetes CSI
Applications can use Alluxio FUSE as a Persistent Volume Claim (PVC) via CSI.
CSI yaml configuration file
The default configuration file is at alluxio-operator/charts/alluxio-csi/values.yaml.
If you cannot access the internet, download the two dependent CSI images, upload them to the local image registry, and modify the values of provisioner.image and driverRegistrar.image to point to the corresponding local image addresses, as sketched below.
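A sketch of the offline upload steps, using the image addresses from the default configuration below; `<your-registry>` is a placeholder.

```shell
# pull, retag, and push the dependent CSI images to the local registry
$ docker pull registry.k8s.io/sig-storage/csi-provisioner:v2.0.5
$ docker tag registry.k8s.io/sig-storage/csi-provisioner:v2.0.5 <your-registry>/csi-provisioner:v2.0.5
$ docker push <your-registry>/csi-provisioner:v2.0.5
# repeat for registry.k8s.io/sig-storage/csi-node-driver-registrar:v2.0.0
```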
```yaml
nameOverride: alluxio
image: alluxio/csi
imageTag: latest
imagePullPolicy: IfNotPresent
imagePullSecrets:
hostNetwork: false
dnsPolicy:
kubeletPath: /var/lib/kubelet

controllerPlugin:
  # NodeSelector for scheduling Alluxio CSI controller
  nodeSelector: {}
  # Schedule Alluxio CSI controller with affinity.
  affinity: {}
  # Additional tolerations for scheduling Alluxio CSI controller
  tolerations: []
  provisioner:
    image: registry.k8s.io/sig-storage/csi-provisioner:v2.0.5
    resources:
      limits:
        cpu: 100m
        memory: 300Mi
      requests:
        cpu: 10m
        memory: 20Mi
  controller:
    resources:
      limits:
        cpu: 200m
        memory: 200Mi
      requests:
        cpu: 10m
        memory: 20Mi

nodePlugin:
  # NodeSelector for scheduling Alluxio CSI nodePlugin
  nodeSelector: {}
  # Schedule Alluxio CSI nodePlugin with affinity.
  affinity: {}
  # Additional tolerations for scheduling Alluxio CSI nodePlugin
  tolerations: []
  nodeserver:
    resources:
      limits:
        cpu: 200m
        memory: 200Mi
      requests:
        cpu: 10m
        memory: 20Mi
  driverRegistrar:
    image: registry.k8s.io/sig-storage/csi-node-driver-registrar:v2.0.0
    resources:
      limits:
        cpu: 100m
        memory: 100Mi
      requests:
        cpu: 10m
        memory: 20Mi
```
Update the Alluxio operator configuration at alluxio-operator/alluxio-operator.yaml
```yaml
nameOverride: alluxio-operator
image: alluxio/operator # set to the accessible registry with the images
imageTag: 1.1.2
imagePullPolicy: Always
alluxio-csi:
  # enable CSI
  enabled: true
  image: alluxio/csi # set to the accessible registry with the images
  imageTag: 1.1.2
```
To disable the FUSE daemonset, update the following section of Alluxio cluster configuration at alluxio-operator/alluxio-cluster.yaml
```yaml
spec:
  fuse:
    enabled: false
```
Check Alluxio configuration
The following steps add a CSI FUSE volume to the application pod.
Define a sample pod in alluxio-operator/app.yaml, as sketched below.
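A sketch of alluxio-operator/app.yaml; the PVC name here is an assumption, so check the PVC the operator actually created with kubectl get pvc -n alluxio-test.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: fuse-test
spec:
  containers:
    - name: fuse-test
      image: busybox
      command: ["sleep", "3600"]   # keep the pod running for inspection
      volumeMounts:
        - name: alluxio-fuse-volume
          mountPath: /data         # the FUSE mount appears under /data in the container
  volumes:
    - name: alluxio-fuse-volume
      persistentVolumeClaim:
        claimName: alluxio-csi-fuse-pvc   # hypothetical PVC name
```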
```shell
# Run the pod
$ kubectl apply -f alluxio-operator/app.yaml -n alluxio-test
pod/fuse-test created

# Enter the pod and check
$ kubectl exec -it fuse-test -n alluxio-test -- sh
$ ls -l /data/bucket/
drwx------ 1 root root      0 Jan  1 1970 2023-10-17
drwx------ 1 root root      0 Jan  1 1970 alluxio
drwx------ 1 root root      0 Jan  1 1970 alluxio_ufs
-rwx------ 1 root root 173279 Oct 17 08:26 log.tar.gz
```
Troubleshooting
Inspect and delete the operator's CRDs
```shell
$ kubectl get crd
# delete all crds ending with k8s-operator.alluxio.com
$ kubectl delete crd datasets.k8s-operator.alluxio.com loads.k8s-operator.alluxio.com updates.k8s-operator.alluxio.com unloads.k8s-operator.alluxio.com alluxioclusters.k8s-operator.alluxio.com
```