Install on Kubernetes
This documentation shows how to install Alluxio on Kubernetes via Operator, a Kubernetes extension for managing applications.
Preparation
Please see Resource Prerequisites and Compatibility for resource planning recommendations.
It is assumed that the required container images, for both Alluxio and third-party components, are accessible to the Kubernetes cluster. See handling images for instructions on how to extract and upload the Alluxio images from a provided tarball package, and for which third-party images to upload if the Kubernetes cluster does not have access to public image repositories.
Extract the helm chart for operator
Download the operator helm chart tarball into a location with access to deploy on a running Kubernetes cluster.
```shell
# the command will extract the files to the directory alluxio-operator/
$ tar zxf alluxio-operator-3.2.1-helmchart.tgz
```
The extracted alluxio-operator directory contains the Helm chart files responsible for deploying the operator.
Deployment
Deploy Alluxio operator
Create the alluxio-operator/alluxio-operator.yaml file to specify the image and version used for deploying the operator. The following example shows how to specify the operator image and version:
```yaml
global:
  image: <PRIVATE_REGISTRY>/alluxio-operator
  imageTag: 3.2.1
```
Move to the alluxio-operator directory and execute the following command to deploy the operator:
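As a sketch, the deployment might look like the following; the release name operator and the namespace alluxio-operator are assumptions, so adjust them to your environment:

```shell
# install the operator chart from the extracted directory,
# passing the image overrides defined above
$ cd alluxio-operator
$ helm install operator -f alluxio-operator.yaml .

# verify that the operator pods start successfully
# (the namespace is an assumption)
$ kubectl get pods -n alluxio-operator
```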
Deploying the Alluxio operator requires pulling dependent images from a public image repository. If the deployment of alluxio-operator fails because your network environment cannot reach the public image repository, please refer to Configuring alluxio-operator image.
Deploy Alluxio
Create the alluxio-operator/alluxio-cluster.yaml file to deploy the Alluxio cluster. The file below shows the minimal configuration, which is recommended for testing scenarios.
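A minimal sketch of such a file is shown below; the apiVersion, kind, and field names are assumptions based on common Alluxio operator conventions, so verify them against the chart version you extracted:

```yaml
apiVersion: k8s-operator.alluxio.com/v1
kind: AlluxioCluster
metadata:
  name: alluxio
spec:
  # image coordinates are placeholders; use the images you uploaded
  image: <PRIVATE_REGISTRY>/alluxio
  imageTag: <ALLUXIO_VERSION>
  worker:
    count: 2
```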
The minimal configuration provided above can help you quickly deploy the Alluxio cluster for testing and validation. In a production environment where restarts are anticipated, we recommend deploying the Alluxio cluster using labels and selectors, as well as persisting information on PVCs.
Select a group of Kubernetes nodes to run the Alluxio cluster, and label the nodes accordingly:
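For example, assuming a hypothetical label key alluxio-node, the nodes could be labeled as follows:

```shell
# label each node that should run Alluxio components;
# the label key/value is an assumption and must match the
# nodeSelector used in alluxio-cluster.yaml
$ kubectl label nodes <NODE_NAME> alluxio-node=true
```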
The following configuration is a starting template for production scenarios, where nodeSelector and metastore fields are added.
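A sketch of those additional fields follows, reusing the hypothetical alluxio-node label from above; the metastore field names are assumptions and should be checked against your chart version:

```yaml
spec:
  # schedule Alluxio pods only on the labeled nodes
  nodeSelector:
    alluxio-node: "true"
  # persist metadata on a PVC so it survives restarts
  metastore:
    type: persistentVolumeClaim
    storageClass: <STORAGE_CLASS>
    size: <SIZE>
```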
Move to the alluxio-operator directory and execute the following commands to deploy the Alluxio cluster:
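A sketch of those commands, assuming the cluster is created in the current namespace:

```shell
$ cd alluxio-operator
$ kubectl create -f alluxio-cluster.yaml

# watch the coordinator, workers, etcd, and monitoring pods start
$ kubectl get pods
```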
In Alluxio 3.x, the coordinator is a stateless control component that serves as an interface to the whole cluster, for example by serving jobs such as distributed load.
If some components in the cluster do not reach the Running state, you can use kubectl describe pod to view detailed information and identify the issue. For specific issues encountered during deployment, refer to the FAQ section.
Alluxio cluster also includes etcd and monitoring components. If the image cannot be pulled from the public image registry, causing etcd and monitoring to fail to start, please refer to Configuring Alluxio Cluster Image.
Mount storage to Alluxio
Alluxio supports integration with various underlying storage systems, including S3, HDFS, OSS, COS, and TOS.
With the operator, you can mount underlying storage by creating UnderFileSystem resources. Each UnderFileSystem resource corresponds to a mount point in Alluxio. For the relationship between the Alluxio namespace and the underlying storage namespaces, please refer to Alluxio Namespace and Under File System Namespaces.
Create the alluxio-operator/ufs.yaml file to specify the UFS configuration. The following example shows how to mount an S3 bucket to Alluxio.
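A minimal sketch of such a file follows; the apiVersion, kind, and mount option property names are assumptions based on common Alluxio operator and S3 connector conventions, so verify them against your version:

```yaml
apiVersion: k8s-operator.alluxio.com/v1
kind: UnderFileSystem
metadata:
  name: alluxio-s3
spec:
  # the Alluxio cluster this mount belongs to
  alluxioCluster: alluxio
  # the S3 path to mount; bucket and prefix are placeholders
  path: s3://<S3_BUCKET>/<S3_DIRECTORY>
  mountOptions:
    s3a.accessKeyId: <S3_ACCESS_KEY_ID>
    s3a.secretKey: <S3_SECRET_KEY>
    alluxio.underfs.s3.region: <S3_REGION>
```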
Find more details about mounting S3 to Alluxio in Amazon AWS S3.
Executing the mount
First, ensure that the Alluxio cluster is up and running with a Ready or WaitingForReady status.
Execute the following command to create the UnderFileSystem resource and mount it into the Alluxio namespace:
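A sketch of those commands; the resource name underfilesystem used with kubectl get is an assumption derived from the CRD kind:

```shell
$ cd alluxio-operator
$ kubectl create -f ufs.yaml

# confirm that the mount resource was created and is ready
$ kubectl get underfilesystem
```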
Monitoring
The Alluxio cluster enables monitoring by default. You can view various Alluxio metrics visually through Grafana. Please refer to the Monitoring and Metrics section on Kubernetes Operator.
Data Access Acceleration
In the steps above, you deployed the Alluxio cluster and mounted the under file system to Alluxio. Training tasks that read data through Alluxio can achieve improved training speed and GPU utilization. Alluxio provides three main ways for applications to access data:
FUSE based POSIX API: Please refer to FUSE based POSIX API.
S3 API: Please refer to S3 API.
FSSpec Python API: Please refer to FSSpec Python API.
FAQ
etcd pod stuck in pending status
For example, if three etcd pods remain in the Pending state, you can use kubectl describe pod to view detailed information:
Based on the error message, the etcd pods are stuck in the Pending state because no storage class is set. You can resolve this issue by specifying the storage class for etcd in the alluxio-operator/alluxio-cluster.yaml file:
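As a sketch, the storage class might be set as follows; the exact field path under etcd is an assumption (it commonly follows the Bitnami etcd chart's persistence values), so verify it against your chart version:

```yaml
spec:
  etcd:
    persistence:
      # set this to a storage class available in your cluster
      storageClass: <STORAGE_CLASS>
      size: <SIZE>
```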
First, delete the Alluxio cluster and the etcd PVC, then recreate the Alluxio cluster:
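A sketch of those commands; the label selector used to find the etcd PVCs is an assumption, so check the actual PVC names with kubectl get pvc first:

```shell
# delete the cluster, then remove the etcd PVCs so they are
# recreated with the new storage class
$ kubectl delete -f alluxio-cluster.yaml
$ kubectl delete pvc -l app.kubernetes.io/name=etcd

# recreate the cluster
$ kubectl create -f alluxio-cluster.yaml
```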
Another possible issue is that the etcd PVC specifies a storage class, yet both the etcd pod and the PVC remain in the Pending state. For example, as shown in the detailed PVC information below, the storage class specified for the etcd PVC does not support dynamic provisioning, so the storage volume must be created manually by the cluster administrator.
For similar issues where etcd pods remain in the Pending state, you can use the above method for troubleshooting.
alluxio-cluster-fuse PVC in pending status
After creating the cluster, you might notice that alluxio-cluster-fuse is in the Pending status. This is normal. The PVC will automatically bind to a PV and its status will change to Bound when it is used by a client pod.