# Image Management

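A deployment requires two kinds of container images:

1. Alluxio images (provided by your Alluxio sales representative)
2. Publicly available images for third-party components

All of these images must be served from an image registry that the Kubernetes cluster can access.

The Alluxio images usually need to be uploaded to a registry that you manage yourself.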

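> An image registry is a central place to store and share container images. It can be public or private. Many cloud providers offer registry services, for example:\
> [Amazon Elastic Container Registry (ECR)](https://aws.amazon.com/ecr/), [Azure Container Registry (ACR)](https://azure.microsoft.com/en-in/products/container-registry), and [Google Container Registry (GCR)](https://cloud.google.com/artifact-registry?hl=en).\
> A private registry can also be hosted on premises or inside an organization's internal network.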

## Alluxio Images

Alluxio provides the following image files:

* `alluxio-operator-3.2.1-docker.tar`: Docker image for the Alluxio Operator components
* `alluxio-enterprise-AI-3.6-12.0.2-docker.tar`: Docker image for the Alluxio coordinator and worker

The following images may also be provided (optional):

* `alluxio-gateway-AI-3.6-12.0.2-docker.tar`: Docker image for the Alluxio API gateway
* `alluxio-dashboard-AI-3.6-12.0.2-docker.tar`: Docker image for the Alluxio management dashboard

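The following example loads the Alluxio Operator and Alluxio Enterprise images, retags them for a private registry, and pushes them: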

```console
# Load the images into the local Docker daemon
$ docker load -i alluxio-operator-3.2.1-docker.tar
$ docker load -i alluxio-enterprise-AI-3.6-12.0.2-docker.tar

# Retag the images for the private registry
$ docker tag alluxio/operator:3.2.1 <PRIVATE_REGISTRY>/alluxio-operator:3.2.1
$ docker tag alluxio/alluxio-enterprise:AI-3.6-12.0.2 <PRIVATE_REGISTRY>/alluxio-enterprise:AI-3.6-12.0.2

# Push the images to the remote registry
$ docker push <PRIVATE_REGISTRY>/alluxio-operator:3.2.1
$ docker push <PRIVATE_REGISTRY>/alluxio-enterprise:AI-3.6-12.0.2
```
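The load, tag, and push steps above can be wrapped in a small script so the same commands run for every tarball. This is a minimal sketch, assuming a Bash shell and a Docker CLI that is already logged in to the private registry. The array and function names are hypothetical, and the image name pairs are the ones used in the commands above:

```shell
#!/usr/bin/env bash
# Sketch only: ALLUXIO_TARBALLS, ALLUXIO_IMAGES, run and push_alluxio_images
# are hypothetical names. Set DRY_RUN=1 to print the docker commands
# instead of executing them.

ALLUXIO_TARBALLS=(
  alluxio-operator-3.2.1-docker.tar
  alluxio-enterprise-AI-3.6-12.0.2-docker.tar
)

# Each entry: "<image name after docker load> <repository:tag in the private registry>"
ALLUXIO_IMAGES=(
  "alluxio/operator:3.2.1 alluxio-operator:3.2.1"
  "alluxio/alluxio-enterprise:AI-3.6-12.0.2 alluxio-enterprise:AI-3.6-12.0.2"
)

# Run a command, or only print it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Usage: push_alluxio_images <private registry>
push_alluxio_images() {
  local registry="${1:-}" tarball pair source target
  [ -n "$registry" ] || { echo "usage: push_alluxio_images <registry>" >&2; return 2; }
  # Load every tarball into the local Docker daemon.
  for tarball in "${ALLUXIO_TARBALLS[@]}"; do
    run docker load -i "$tarball" || return 1
  done
  # Retag each loaded image with the private-registry name and push it.
  for pair in "${ALLUXIO_IMAGES[@]}"; do
    read -r source target <<< "$pair"
    run docker tag "$source" "${registry}/${target}" || return 1
    run docker push "${registry}/${target}" || return 1
  done
}
```

For example, `push_alluxio_images registry.example.com`, where `registry.example.com` stands in for `<PRIVATE_REGISTRY>`. If you also received the optional gateway or dashboard tarballs, add them to both arrays; `docker load` prints the name of each image it loads.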

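## No Access to Public Image Registries

If the Kubernetes cluster cannot reach the public internet, images that are normally pulled from public registries cannot be downloaded.

To deploy Alluxio successfully, you must upload these images to a private registry that the cluster can access.

If your network cannot reach the public registries, pulling these images will time out and the pods that depend on them will not start: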

```console
# Check whether the operator is running
$ kubectl -n alluxio-operator get pod
NAME                                              READY   STATUS              RESTARTS   AGE
alluxio-cluster-controller-65b59f65b4-5d667       1/1     Running             0          22s
alluxio-collectinfo-controller-667b746fd6-hfzqk   1/1     Running             0          22s
alluxio-csi-controller-c85f8f759-sqc56            0/2     ContainerCreating   0          22s
alluxio-csi-nodeplugin-5pgmg                      0/2     ContainerCreating   0          22s
alluxio-csi-nodeplugin-fpkcq                      0/2     ContainerCreating   0          22s
alluxio-csi-nodeplugin-j9wll                      0/2     ContainerCreating   0          22s
alluxio-ufs-controller-5f69bbb878-7km58           1/1     Running             0          22s
```

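Note that `cluster controller`, `ufs controller`, and `collectinfo controller` have started, but `csi controller` and `csi nodeplugin` are still in the `ContainerCreating` state. This is because pulling the third-party images they depend on timed out.

Run `kubectl describe pod` to see the details. You will see errors similar to the following, where the Alluxio operator image is pulled successfully but pulling the `csi-provisioner` image from `registry.k8s.io` times out: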

```console
$ kubectl -n alluxio-operator describe pod -l app.kubernetes.io/component=csi-controller

Events:
  Type     Reason          Age                    From               Message
  ----     ------          ----                   ----               -------
  Normal   Scheduled       10m                    default-scheduler  Successfully assigned alluxio-operator/alluxio-csi-controller-c85f8f759-sqc56 to <nodeName>
  Normal   AllocIPSucceed  10m                    terway-daemon      Alloc IP 10.0.0.27/24 took 28.443992ms
  Normal   Pulling         10m                    kubelet            Pulling image "registry.xxx.com/alluxio/operator:3.2.1"
  Normal   Pulled          10m                    kubelet            Successfully pulled image "registry.xxx.com/alluxio/operator:3.2.1" in 5.55s (5.55s including waiting)
  Normal   Created         10m                    kubelet            Created container csi-controller
  Normal   Started         10m                    kubelet            Started container csi-controller
  Warning  Failed          8m20s (x2 over 10m)    kubelet            Failed to pull image "registry.k8s.io/sig-storage/csi-provisioner:v2.0.5": failed to pull and unpack image "registry.k8s.io/sig-storage/csi-provisioner:v2.0.5": failed to resolve reference "registry.k8s.io/sig-storage/csi-provisioner:v2.0.5": failed to do request: Head "https://us-west2-docker.pkg.dev/v2/k8s-artifacts-prod/images/sig-storage/csi-provisioner/manifests/v2.0.5": dial tcp 142.251.8.82:443: i/o timeout
  Warning  Failed          8m20s (x3 over 10m)    kubelet            Error: ErrImagePull
  Warning  Failed          7m40s (x5 over 10m)    kubelet            Error: ImagePullBackOff
  Warning  Failed          6m56s (x2 over 9m19s)  kubelet            Failed to pull image "registry.k8s.io/sig-storage/csi-provisioner:v2.0.5": rpc error: code = DeadlineExceeded desc = failed to pull and unpack image "registry.k8s.io/sig-storage/csi-provisioner:v2.0.5": failed to resolve reference "registry.k8s.io/sig-storage/csi-provisioner:v2.0.5": failed to do request: Head "https://us-west2-docker.pkg.dev/v2/k8s-artifacts-prod/images/sig-storage/csi-provisioner/manifests/v2.0.5": dial tcp 64.233.187.82:443: i/o timeout
  Normal   Pulling         5m29s (x5 over 10m)    kubelet            Pulling image "registry.k8s.io/sig-storage/csi-provisioner:v2.0.5"
  Normal   BackOff         30s (x28 over 10m)     kubelet            Back-off pulling image "registry.k8s.io/sig-storage/csi-provisioner:v2.0.5"
```
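To find out which images need mirroring without reading the events of every pod, you can list the containers whose image pull is failing. This is a minimal sketch, assuming `kubectl` is configured for the cluster; `failing_images` is a hypothetical helper name, and only regular containers (not init containers) are checked:

```shell
#!/usr/bin/env bash
# Sketch only: print the images whose pull is failing in a namespace.
# failing_images is a hypothetical name; it relies on kubectl's JSONPath output.

failing_images() {
  local ns="${1:-alluxio-operator}"
  # One line per container: "<image><TAB><waiting reason>" (reason is empty when running).
  kubectl -n "$ns" get pods \
    -o jsonpath='{range .items[*]}{range .status.containerStatuses[*]}{.image}{"\t"}{.state.waiting.reason}{"\n"}{end}{end}' \
    | awk -F'\t' '$2 ~ /^(ErrImagePull|ImagePullBackOff)$/ { print $1 }' \
    | sort -u
}
```

Run it as `failing_images alluxio-operator`. Each image it prints is a candidate for mirroring into the private registry, as described in the next section.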

### Third-Party Dependency Images

| Component          | Image                                                 | Version             | Purpose                                   |
| ------------------ | ----------------------------------------------------- | ------------------- | ----------------------------------------- |
| Operator CSI       | registry.k8s.io/sig-storage/csi-node-driver-registrar | v2.0.0              | CSI driver registrar                      |
| Operator CSI       | registry.k8s.io/sig-storage/csi-provisioner           | v2.0.5              | CSI provisioner                           |
| Cluster etcd       | docker.io/bitnami/etcd                                | 3.5.9-debian-11-r24 | etcd                                      |
| Cluster etcd       | docker.io/bitnami/os-shell                            | 11-debian-11-r2     | os-shell (used by etcd `volumePermissions`) |
| Cluster monitoring | grafana/grafana                                       | 10.4.5              | Monitoring dashboards                     |
| Cluster monitoring | prom/prometheus                                       | v2.52.0             | Metrics collection                        |

The following commands pull the Docker images and push them to a private registry. Set `--platform` to match the architecture of your Kubernetes nodes (for example, `linux/amd64`):

```console
# Pull the Docker images
$ docker pull --platform linux/amd64 registry.k8s.io/sig-storage/csi-node-driver-registrar:v2.0.0
$ docker pull --platform linux/amd64 registry.k8s.io/sig-storage/csi-provisioner:v2.0.5
$ docker pull --platform linux/amd64 docker.io/bitnami/etcd:3.5.9-debian-11-r24
$ docker pull --platform linux/amd64 docker.io/bitnami/os-shell:11-debian-11-r2
$ docker pull --platform linux/amd64 grafana/grafana:10.4.5
$ docker pull --platform linux/amd64 prom/prometheus:v2.52.0

# Tag the images for the private registry
$ docker tag registry.k8s.io/sig-storage/csi-node-driver-registrar:v2.0.0 <PRIVATE_REGISTRY>/csi-node-driver-registrar:v2.0.0
$ docker tag registry.k8s.io/sig-storage/csi-provisioner:v2.0.5 <PRIVATE_REGISTRY>/csi-provisioner:v2.0.5
$ docker tag docker.io/bitnami/etcd:3.5.9-debian-11-r24 <PRIVATE_REGISTRY>/etcd:3.5.9-debian-11-r24
$ docker tag docker.io/bitnami/os-shell:11-debian-11-r2 <PRIVATE_REGISTRY>/os-shell:11-debian-11-r2
$ docker tag grafana/grafana:10.4.5 <PRIVATE_REGISTRY>/grafana:10.4.5
$ docker tag prom/prometheus:v2.52.0 <PRIVATE_REGISTRY>/prometheus:v2.52.0

# Push the images to the private registry
$ docker push <PRIVATE_REGISTRY>/csi-node-driver-registrar:v2.0.0
$ docker push <PRIVATE_REGISTRY>/csi-provisioner:v2.0.5
$ docker push <PRIVATE_REGISTRY>/etcd:3.5.9-debian-11-r24
$ docker push <PRIVATE_REGISTRY>/os-shell:11-debian-11-r2
$ docker push <PRIVATE_REGISTRY>/grafana:10.4.5
$ docker push <PRIVATE_REGISTRY>/prometheus:v2.52.0
```
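Because every image in the table goes through the same pull, tag, and push sequence, the commands above can be generated in a loop. This is a minimal sketch, assuming Bash and a Docker CLI logged in to the private registry. `THIRD_PARTY_IMAGES`, `private_ref`, and `mirror_images` are hypothetical names. The naming rule (keep only the last path component of the public image) reproduces the names used above:

```shell
#!/usr/bin/env bash
# Sketch only: mirror the third-party images from the table into a private registry.
# PLATFORM defaults to linux/amd64, as in the commands above.
# Set DRY_RUN=1 to print the docker commands instead of executing them.

THIRD_PARTY_IMAGES=(
  registry.k8s.io/sig-storage/csi-node-driver-registrar:v2.0.0
  registry.k8s.io/sig-storage/csi-provisioner:v2.0.5
  docker.io/bitnami/etcd:3.5.9-debian-11-r24
  docker.io/bitnami/os-shell:11-debian-11-r2
  grafana/grafana:10.4.5
  prom/prometheus:v2.52.0
)

# Run a command, or only print it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Keep only the last path component of the public image, e.g.
# registry.k8s.io/sig-storage/csi-provisioner:v2.0.5 -> <registry>/csi-provisioner:v2.0.5
private_ref() {
  echo "${1}/${2##*/}"
}

# Usage: mirror_images <private registry>
mirror_images() {
  local registry="${1:-}" image target
  [ -n "$registry" ] || { echo "usage: mirror_images <registry>" >&2; return 2; }
  for image in "${THIRD_PARTY_IMAGES[@]}"; do
    target="$(private_ref "$registry" "$image")"
    run docker pull --platform "${PLATFORM:-linux/amd64}" "$image" || return 1
    run docker tag "$image" "$target" || return 1
    run docker push "$target" || return 1
  done
}
```

For example, run `mirror_images registry.example.com`, or `PLATFORM=linux/arm64 mirror_images registry.example.com` for ARM nodes. Running with `DRY_RUN=1` first lets you review the generated commands.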

### Modify the alluxio-operator.yaml File

In `alluxio-operator/alluxio-operator.yaml`, change the operator image address and add the image addresses for the CSI `provisioner` and `driverRegistrar`:

```yaml
global:
  image: <PRIVATE_REGISTRY>/alluxio-operator
  imageTag: 3.2.1

alluxio-csi:
  controllerPlugin:
    provisioner:
      image: <PRIVATE_REGISTRY>/csi-provisioner:v2.0.5
  nodePlugin:
    driverRegistrar:
      image: <PRIVATE_REGISTRY>/csi-node-driver-registrar:v2.0.0
```
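After redeploying the operator with the modified file, one way to confirm that the change took effect is to list every container image in the namespace that does not come from your private registry. This is a minimal sketch, assuming `kubectl` access to the cluster; `images_outside_registry` is a hypothetical helper, and init containers are not included:

```shell
#!/usr/bin/env bash
# Sketch only: print container images in a namespace that are not served
# from the given private registry. images_outside_registry is a hypothetical name.

images_outside_registry() {
  local ns="${1:-}" prefix="${2:-}"
  [ -n "$ns" ] && [ -n "$prefix" ] || {
    echo "usage: images_outside_registry <namespace> <registry>" >&2
    return 2
  }
  # List every container image once, then drop those starting with "<registry>/".
  kubectl -n "$ns" get pods \
    -o jsonpath='{range .items[*]}{range .spec.containers[*]}{.image}{"\n"}{end}{end}' \
    | sort -u \
    | awk -v p="${prefix}/" 'index($0, p) != 1'
}
```

For example, `images_outside_registry alluxio-operator registry.example.com` prints nothing when every container image starts with `registry.example.com/`.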

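### Modify the alluxio-cluster.yaml File

Update the image addresses in `alluxio-operator/alluxio-cluster.yaml` accordingly. For etcd and os-shell, the image is split into separate `registry`, `repository`, and `tag` fields (the layout used by the Bitnami etcd chart), which are combined as `registry/repository:tag`. Make sure the combined reference matches the address you pushed the image to.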

```yaml
apiVersion: k8s-operator.alluxio.com/v1
kind: AlluxioCluster
metadata:
  name: alluxio-cluster
  namespace: alx-ns
spec:
  image: <PRIVATE_REGISTRY>/alluxio-enterprise
  imageTag: AI-3.6-12.0.2
  properties:
  worker:
    count: 2
    pagestore:
      size: 100Gi
  etcd:
    image:
      registry: <PRIVATE_REGISTRY>
      repository: <PRIVATE_REPOSITORY>/etcd
      tag: 3.5.9-debian-11-r24
    volumePermissions:
      image:
        registry: <PRIVATE_REGISTRY>
        repository: <PRIVATE_REPOSITORY>/os-shell
        tag: 11-debian-11-r2
  prometheus:
    image: <PRIVATE_REGISTRY>/prometheus
    imageTag: v2.52.0
  grafana:
    image: <PRIVATE_REGISTRY>/grafana
    imageTag: 10.4.5
```
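Before creating the cluster, it can help to confirm that every image address in both YAML files actually resolves in the private registry. This is a minimal sketch, assuming the Docker CLI is logged in to the registry. `docker manifest inspect` exits with a non-zero status when a reference cannot be resolved (older Docker CLI versions required experimental features to be enabled for this command). `check_images` is a hypothetical helper:

```shell
#!/usr/bin/env bash
# Sketch only: report image references that cannot be resolved in the registry.
# Returns 0 if every reference resolves, 1 otherwise. check_images is a hypothetical name.

check_images() {
  local ref missing=0
  for ref in "$@"; do
    # docker manifest inspect fails if the registry has no manifest for the reference.
    if ! docker manifest inspect "$ref" > /dev/null 2>&1; then
      echo "missing: $ref"
      missing=1
    fi
  done
  return "$missing"
}
```

For example, `check_images registry.example.com/alluxio-enterprise:AI-3.6-12.0.2 registry.example.com/prometheus:v2.52.0` should print nothing and return 0 if both images were pushed. For etcd and os-shell, pass the combined `registry/repository:tag` reference.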
