# Install Alluxio on Kubernetes

This document shows how to install Alluxio on Kubernetes using an [Operator](https://kubernetes.io/zh-cn/docs/concepts/extend-kubernetes/operator/), a Kubernetes extension for managing applications.

## Prerequisites

* Kubernetes
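  * A Kubernetes cluster running version 1.19 or later, with support for [feature gates](https://kubernetes.io/zh-cn/docs/reference/command-line-tools-reference/feature-gates/)
  * Make sure the cluster's Kubernetes network policy allows connections between applications (Alluxio clients) and the Alluxio Pods on the defined ports
  * Helm 3, version 3.6.0 or later, installed for the Kubernetes cluster
  * An image registry for storing and managing container images
* Permissions. Reference: [Using RBAC Authorization](https://kubernetes.io/zh-cn/docs/reference/access-authn-authz/rbac/)
  * Permission to create CRDs (Custom Resource Definitions)
  * Permission to create the ServiceAccount, ClusterRole, and ClusterRoleBinding for the operator Pod
  * Permission to create the namespace that the operator runs in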
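
The requirements above can be checked before installing. The following is a minimal preflight sketch, not part of the Alluxio tooling: the helper `version_ge` and the variable `required_resources` are hypothetical names, and the version comparison assumes a `sort` that supports `-V`. It relies only on `helm version --short` and `kubectl auth can-i`, and skips a check when the corresponding tool is not installed.

```shell
#!/usr/bin/env bash
# Preflight sketch: compare the Helm version and check RBAC permissions.
# Assumes `sort -V` (GNU coreutils or a recent BSD sort) is available.

# version_ge A B: succeeds when version A is greater than or equal to B.
version_ge() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Helm must be 3.6.0 or later. `helm version --short` prints e.g. v3.14.0+g3fc9f4b.
if command -v helm >/dev/null 2>&1; then
  helm_version="$(helm version --short | sed -e 's/^v//' -e 's/+.*$//')"
  if version_ge "$helm_version" "3.6.0"; then
    echo "helm $helm_version: ok"
  else
    echo "helm $helm_version: older than 3.6.0"
  fi
fi

# Resources the installing user must be able to create.
required_resources="customresourcedefinitions serviceaccounts clusterroles clusterrolebindings namespaces"

if command -v kubectl >/dev/null 2>&1; then
  for resource in $required_resources; do
    # `kubectl auth can-i` prints yes or no for the current user.
    # Namespaced resources are checked in the current namespace.
    answer="$(kubectl auth can-i create "$resource" --request-timeout=5s 2>/dev/null || true)"
    echo "create $resource: ${answer:-unknown}"
  done
fi
```

The Kubernetes server version can be compared the same way by passing the version reported by `kubectl version` to `version_ge` together with `1.19`.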

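## Preparation

### Download the files for the Alluxio operator and the Alluxio cluster

* `alluxio-operator-1.2.0-helmchart.tgz` is the helm chart for deploying the Alluxio operator
* `alluxio-k8s-operator-1.2.0-docker.tar` is the docker image of the Alluxio operator
* `alluxio-csi-1.2.0-docker.tar` is the docker image of Alluxio CSI, which is required by default
* `alluxio-enterprise-AI-3.2-5.2.0-docker.tar` is the docker image of Alluxio

### Upload the images to an image registry

The following example shows how to upload the Alluxio operator image. Repeat these steps for every image listed above.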

```shell
# load the image to local
$ docker load -i alluxio-k8s-operator-1.2.0-docker.tar

# retag the image with your private registry
$ docker tag alluxio/k8s-operator:1.2.0 <YOUR.PRIVATE.REGISTRY.HERE>/alluxio-operator:1.2.0

# push to the remote registry
$ docker push <YOUR.PRIVATE.REGISTRY.HERE>/alluxio-operator:1.2.0
```
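
Repeating the three steps for each archive can be scripted. The sketch below is one possible way to do it, under stated assumptions: `loaded_image`, `upload_image`, `REGISTRY`, and `RUN_UPLOAD` are hypothetical names, the default registry host is a placeholder, and the script assumes `docker load` prints a `Loaded image: <name>:<tag>` line for each archive. The target repository names and tags are the ones used in the configuration examples on this page.

```shell
#!/usr/bin/env bash
# Sketch: load, retag, and push each downloaded image archive.
# REGISTRY is a placeholder; the default below is hypothetical.
REGISTRY="${REGISTRY:-registry.example.com}"

# Reads `docker load` output on stdin and prints the first loaded image reference.
loaded_image() {
  sed -n 's/^Loaded image: //p' | head -n 1
}

# upload_image TARBALL TARGET: load the archive, retag it, and push it.
upload_image() {
  local src
  src="$(docker load -i "$1" | loaded_image)"
  if [ -z "$src" ]; then
    echo "no image reference found in $1" >&2
    return 1
  fi
  docker tag "$src" "$2"
  docker push "$2"
}

# Nothing is uploaded unless RUN_UPLOAD=1 is set.
if [ "${RUN_UPLOAD:-0}" = "1" ]; then
  upload_image alluxio-k8s-operator-1.2.0-docker.tar "$REGISTRY/alluxio-operator:1.2.0"
  upload_image alluxio-csi-1.2.0-docker.tar "$REGISTRY/alluxio-csi:1.2.0"
  upload_image alluxio-enterprise-AI-3.2-5.2.0-docker.tar "$REGISTRY/alluxio:AI-3.2-5.2.0"
fi
```

Reading the source reference from the `docker load` output avoids hard-coding the image names stored inside the archives, which this page lists only for the operator image.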

### Extract the helm chart for the operator

```shell
# the command will extract the files to the directory alluxio-operator/
$ tar zxf alluxio-operator-1.2.0-helmchart.tgz
```

### Prepare the configuration files

* For the operator, put the configuration in `alluxio-operator/alluxio-operator.yaml`

```yaml
image: <YOUR.PRIVATE.REGISTRY.HERE>/alluxio-operator
imageTag: 1.2.0
alluxio-csi:
  image: <YOUR.PRIVATE.REGISTRY.HERE>/alluxio-csi
  imageTag: 1.2.0
```

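* For the Alluxio cluster, put the configuration in `alluxio-operator/alluxio-cluster.yaml` to describe the cluster. To create a standard cluster, you can start from the minimal configuration given later on this page. The properties under the `.spec.properties` field are passed to the Alluxio processes through an `alluxio-site.properties` configuration file.
* To mount external storage to the Alluxio cluster, put the configuration in `alluxio-operator/ufs.yaml`. The example mounts an existing S3 path to Alluxio.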

```yaml
apiVersion: k8s-operator.alluxio.com/v1
kind: UnderFileSystem
metadata:
  name: alluxio-s3
spec:
  alluxioCluster: alluxio
  path: s3://my-bucket/path/to/mount
  mountPath: /s3
  mountOptions:
    s3a.accessKeyId: xxx
    s3a.secretKey: xxx
    alluxio.underfs.s3.region: us-east-1
```

## Deployment

### Deploy the Alluxio operator

```shell
# the last parameter is the directory to the helm chart
$ helm install operator -f alluxio-operator/alluxio-operator.yaml alluxio-operator
NAME: operator
LAST DEPLOYED: Wed May 15 17:32:34 2024
NAMESPACE: default
STATUS: deployed
REVISION: 1
TEST SUITE: None

# verify if the operator is running as expected
$ kubectl get pod -n alluxio-operator
NAME                                              READY   STATUS    RESTARTS   AGE
alluxio-controller-6b449d8b68-njx7f               1/1     Running   0          45s
operator-alluxio-csi-controller-765f9fd65-drjm4   2/2     Running   0          45s
operator-alluxio-csi-nodeplugin-ks262             2/2     Running   0          45s
operator-alluxio-csi-nodeplugin-vk8r4             2/2     Running   0          45s
ufs-controller-65f7c84cbd-kll8q                   1/1     Running   0          45s
```
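
When the pod list is long, the READY column can be summarized instead of read by eye. The following sketch assumes the default `kubectl get pod` table layout shown above, with `NAME` and `READY` as the first two columns. The function name `ready_summary` is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: summarize the READY column of `kubectl get pod` output read from stdin.
# Prints "<fully ready pods>/<total pods>".
ready_summary() {
  awk 'NR > 1 { total++; split($2, r, "/"); if (r[1] == r[2]) ready++ }
       END { printf "%d/%d\n", ready, total }'
}

if command -v kubectl >/dev/null 2>&1; then
  summary="$(kubectl get pod -n alluxio-operator --request-timeout=5s 2>/dev/null | ready_summary)"
  echo "operator pods ready: $summary"
fi
```

A result of `0/0` means no pods were listed, for example because the namespace does not exist yet or the cluster could not be reached.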

### Deploy Alluxio

```shell
$ kubectl create -f alluxio-operator/alluxio-cluster.yaml
alluxiocluster.k8s-operator.alluxio.com/alluxio created

# the cluster will be starting
$ kubectl get pod
NAME                                          READY   STATUS              RESTARTS   AGE
alluxio-etcd-0                                0/1     ContainerCreating   0          7s
alluxio-etcd-1                                0/1     ContainerCreating   0          7s
alluxio-etcd-2                                0/1     ContainerCreating   0          7s
alluxio-master-0                              0/1     Init:0/1            0          7s
alluxio-monitor-grafana-847fd46f4b-84wgg      0/1     Running             0          7s
alluxio-monitor-prometheus-778547fd75-rh6r6   1/1     Running             0          7s
alluxio-worker-76c846bfb6-2jkmr               0/1     Init:0/2            0          7s
alluxio-worker-76c846bfb6-nqldm               0/1     Init:0/2            0          7s

# check the status of the cluster
$ kubectl get alluxiocluster
NAME      CLUSTERPHASE   AGE
alluxio   Ready          2m18s

# and check the running pods after the cluster is ready
$ kubectl get pod
NAME                                          READY   STATUS    RESTARTS   AGE
alluxio-etcd-0                                1/1     Running   0          2m3s
alluxio-etcd-1                                1/1     Running   0          2m3s
alluxio-etcd-2                                1/1     Running   0          2m3s
alluxio-master-0                              1/1     Running   0          2m3s
alluxio-monitor-grafana-7b9477d66-mmcc5       1/1     Running   0          2m3s
alluxio-monitor-prometheus-78dbb89994-xxr4c   1/1     Running   0          2m3s
alluxio-worker-85fd45db46-c7n9p               1/1     Running   0          2m3s
alluxio-worker-85fd45db46-sqv2c               1/1     Running   0          2m3s
```

In Alluxio 3.x, the "master" component is no longer on the critical I/O path. It is a stateless component that only hosts jobs such as distributed load.
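
Scripts that deploy the cluster often need to wait until `CLUSTERPHASE` reports `Ready`. The sketch below polls the same `kubectl get alluxiocluster` table shown above. The function names, the 5-second interval, and the default of 60 attempts are arbitrary choices for illustration and are not part of the operator.

```shell
#!/usr/bin/env bash
# Sketch: poll `kubectl get alluxiocluster` until CLUSTERPHASE is Ready.

# Reads the table output on stdin and prints the CLUSTERPHASE of cluster $1.
cluster_phase() {
  awk -v name="$1" 'NR > 1 && $1 == name { print $2 }'
}

# wait_for_ready NAME [ATTEMPTS]: poll every 5 seconds, fail after ATTEMPTS tries.
wait_for_ready() {
  local attempts="${2:-60}" phase
  while [ "$attempts" -gt 0 ]; do
    phase="$(kubectl get alluxiocluster 2>/dev/null | cluster_phase "$1")"
    [ "$phase" = "Ready" ] && return 0
    attempts=$((attempts - 1))
    sleep 5
  done
  return 1
}

# Example: wait_for_ready alluxio && kubectl get pod
```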

### Mount the underlying storage to Alluxio

```shell
$ kubectl create -f alluxio-operator/ufs.yaml
underfilesystem.k8s-operator.alluxio.com/alluxio-s3 created

# verify the status of the storage
$ kubectl get ufs
NAME         PHASE   AGE
alluxio-s3   Ready   46s

# also check the mount table via Alluxio command line
$ kubectl exec -it alluxio-master-0 -- alluxio mount list 2>/dev/null
Listing all mount points
s3://my-bucket/path/to/mount  on  /s3/ properties={s3a.secretKey=xxx, alluxio.underfs.s3.region=us-east-1, s3a.accessKeyId=xxx}
```

## Configuration

### Minimal configuration

Starting from the minimal configuration, we can build a standard cluster:

```yaml
apiVersion: k8s-operator.alluxio.com/v1
kind: AlluxioCluster
metadata:
  name: alluxio
spec:
  image: <YOUR.PRIVATE.REGISTRY.HERE>/alluxio
  imageTag: AI-3.2-5.2.0
  properties:
    alluxio.license: "xxx"

  worker:
    count: 2

  pagestore:
    quota: 1000Gi
```

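### Common use cases

#### Change the resource limits

For each component, such as the worker, master, and FUSE, resource usage can be changed with the following configuration: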

```yaml
apiVersion: k8s-operator.alluxio.com/v1
kind: AlluxioCluster
spec:
  worker:
    resources:
      limits:
        cpu: "12"
        memory: "36Gi"
      requests:
        cpu: "1"
        memory: "32Gi"
    jvmOptions:
      - "-Xmx22g"
      - "-Xms22g"
      - "-XX:MaxDirectMemorySize=10g"
```

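* A container can never use more resources than its limits allow, and its requests are what the scheduler takes into account when placing the Pod. For more information, see [Resource Management for Pods and Containers](https://kubernetes.io/zh-cn/docs/concepts/configuration/manage-resources-containers/).
* The memory limit should be slightly larger than the sum of the heap size (`-Xmx`) and the direct memory size (`-XX:MaxDirectMemorySize=10g`) to avoid running out of memory.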
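
As a worked check of the worker example above, the arithmetic can be written out. This is only an illustration with the values copied from that example. The variable names are hypothetical, and the JVM `g` suffix is treated as GiB.

```shell
#!/usr/bin/env bash
# Worked check for the worker example: the JVM g suffix means GiB.
heap_gib=22        # -Xmx22g
direct_gib=10      # -XX:MaxDirectMemorySize=10g
limit_gib=36       # resources.limits.memory: 36Gi

jvm_total_gib=$((heap_gib + direct_gib))
headroom_gib=$((limit_gib - jvm_total_gib))

echo "heap + direct = ${jvm_total_gib}Gi, limit = ${limit_gib}Gi, headroom = ${headroom_gib}Gi"
if [ "$headroom_gib" -le 0 ]; then
  echo "memory limit leaves no headroom above heap + direct memory" >&2
fi
```

With these values the heap and direct memory add up to 32Gi, which leaves 4Gi of headroom under the 36Gi limit for other JVM and container memory.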

#### Use PVC for the page store

```yaml
apiVersion: k8s-operator.alluxio.com/v1
kind: AlluxioCluster
spec:
  pagestore:
    type: persistentVolumeClaim
    storageClass: ""
    quota: 1000Gi
```

* The PVC is created by the operator
* `storageClass` defaults to `standard`, but can be set to an empty string for static binding

#### Mount NAS or other host paths

```yaml
apiVersion: k8s-operator.alluxio.com/v1
kind: AlluxioCluster
spec:
  hostPaths:
    worker:
      /mnt/nas: /ufs/data
    fuse:
      /mnt/nas: /ufs/data
```

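* The key is the host path on the node, and the value is the mount path inside the container.
* If NAS is used as the UFS, both the worker and the FUSE process need to mount the same path, so that FUSE can fall back if any error occurs

#### Mount custom config maps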

```yaml
apiVersion: k8s-operator.alluxio.com/v1
kind: AlluxioCluster
spec:
  configMaps:
    custom-config-map: /etc/custom
```

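* The key is the name of the `ConfigMap`, and the value is the mount path inside the container
* `/opt/alluxio/conf` is already mounted by default. Custom config maps must be mounted to other paths

#### Use the root user

The FUSE pod always runs as the root user. The other processes run as the user with uid 1000 by default. Inside the container, that user is named `alluxio`. To switch them to the root user, use this configuration: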

```yaml
apiVersion: k8s-operator.alluxio.com/v1
kind: AlluxioCluster
spec:
  user: 0
  group: 0
  fsGroup: 0
```

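* If the files are accessible by the root group, specifying `.spec.fsGroup = 0` alone is enough.
* When switching to the root user, ownership of the mounted host paths, such as the page store path and the log path, is transferred to root.

### Supported mount options

The `mountOptions` in the `UnderFileSystem` configuration accept the same settings as the Alluxio cluster properties. Specifically, the following properties are available for each type of storage: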

| UFS | Configuration               | Description                                                                                                     |
| --- | --------------------------- | --------------------------------------------------------------------------------------------------------------- |
| S3  | `s3a.accessKeyId`           | The access key of S3 bucket                                                                                     |
|     | `s3a.secretKey`             | The secret key of S3 bucket                                                                                     |
|     | `alluxio.underfs.s3.region` | Optionally, set the S3 bucket region. If not provided, will enable the global bucket access with extra requests |

