# Managing Images

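A deployment requires two kinds of container images:

1. Alluxio images (provided by your Alluxio sales representative)
2. Publicly available images of third-party components

These images must be served from an image registry that the Kubernetes cluster can reach.

Alluxio images usually have to be uploaded to a registry that you manage yourself.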

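> An image registry is a central place to store and share container images, and it can be public or private. Many cloud providers offer a registry service, for example:\
> [Amazon Elastic Container Registry (ECR)](https://aws.amazon.com/ecr/), [Azure Container Registry (ACR)](https://azure.microsoft.com/en-in/products/container-registry), and [Google Artifact Registry](https://cloud.google.com/artifact-registry?hl=en).\
> A private registry can also be deployed on a local system or inside an organization's internal network.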

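## Alluxio Images

Alluxio provides the following image files:

* `alluxio-operator-3.2.1-docker.tar`: Docker images for the Alluxio Operator components
* `alluxio-enterprise-AI-3.6-12.0.2-docker.tar`: Docker image for the Alluxio coordinator and worker

The following optional images may also be provided:

* `alluxio-gateway-AI-3.6-12.0.2-docker.tar`: Docker image for the Alluxio API gateway
* `alluxio-dashboard-AI-3.6-12.0.2-docker.tar`: Docker image for the Alluxio management console

The following example shows how to upload the Alluxio Operator and Alluxio enterprise images: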

```console
# Load the images into the local Docker daemon
$ docker load -i alluxio-operator-3.2.1-docker.tar
$ docker load -i alluxio-enterprise-AI-3.6-12.0.2-docker.tar

# Re-tag the images for the private registry
$ docker tag alluxio/operator:3.2.1 <PRIVATE_REGISTRY>/alluxio-operator:3.2.1
$ docker tag alluxio/alluxio-enterprise:AI-3.6-12.0.2 <PRIVATE_REGISTRY>/alluxio-enterprise:AI-3.6-12.0.2

# Push the images to the remote registry
$ docker push <PRIVATE_REGISTRY>/alluxio-operator:3.2.1
$ docker push <PRIVATE_REGISTRY>/alluxio-enterprise:AI-3.6-12.0.2
```

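## When Public Image Registries Are Unreachable

When the Kubernetes cluster has no access to the public network, images that are normally fetched from public registries cannot be downloaded.

To deploy Alluxio successfully, these images must be uploaded to a private registry that the cluster can reach.

If your network cannot reach public registries, image pulls time out and the affected pods do not start: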

```console
# Check whether the operator is running
$ kubectl -n alluxio-operator get pod
NAME                                              READY   STATUS              RESTARTS   AGE
alluxio-cluster-controller-65b59f65b4-5d667       1/1     Running             0          22s
alluxio-collectinfo-controller-667b746fd6-hfzqk   1/1     Running             0          22s
alluxio-csi-controller-c85f8f759-sqc56            0/2     ContainerCreating   0          22s
alluxio-csi-nodeplugin-5pgmg                      0/2     ContainerCreating   0          22s
alluxio-csi-nodeplugin-fpkcq                      0/2     ContainerCreating   0          22s
alluxio-csi-nodeplugin-j9wll                      0/2     ContainerCreating   0          22s
alluxio-ufs-controller-5f69bbb878-7km58           1/1     Running             0          22s
```

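Note that `cluster controller`, `ufs controller`, and `collectinfo controller` have started successfully, while `csi controller` and `csi nodeplugin` remain in the `ContainerCreating` state. This is caused by timeouts while pulling the images they depend on.

Running `kubectl describe pod` shows the details, including errors similar to the following: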

```console
$ kubectl -n alluxio-operator describe pod -l app.kubernetes.io/component=csi-controller

Events:
  Type     Reason          Age                    From               Message
  ----     ------          ----                   ----               -------
  Normal   Scheduled       10m                    default-scheduler  Successfully assigned alluxio-operator/alluxio-csi-controller-c85f8f759-sqc56 to <nodeName>
  Normal   AllocIPSucceed  10m                    terway-daemon      Alloc IP 10.0.0.27/24 took 28.443992ms
  Normal   Pulling         10m                    kubelet            Pulling image "registry.xxx.com/alluxio/operator:3.2.1"
  Normal   Pulled          10m                    kubelet            Successfully pulled image "registry.xxx.com/alluxio/operator:3.2.1" in 5.55s (5.55s including waiting)
  Normal   Created         10m                    kubelet            Created container csi-controller
  Normal   Started         10m                    kubelet            Started container csi-controller
  Warning  Failed          8m20s (x2 over 10m)    kubelet            Failed to pull image "registry.k8s.io/sig-storage/csi-provisioner:v2.0.5": failed to pull and unpack image "registry.k8s.io/sig-storage/csi-provisioner:v2.0.5": failed to resolve reference "registry.k8s.io/sig-storage/csi-provisioner:v2.0.5": failed to do request: Head "https://us-west2-docker.pkg.dev/v2/k8s-artifacts-prod/images/sig-storage/csi-provisioner/manifests/v2.0.5": dial tcp 142.251.8.82:443: i/o timeout
  Warning  Failed          8m20s (x3 over 10m)    kubelet            Error: ErrImagePull
  Warning  Failed          7m40s (x5 over 10m)    kubelet            Error: ImagePullBackOff
  Warning  Failed          6m56s (x2 over 9m19s)  kubelet            Failed to pull image "registry.k8s.io/sig-storage/csi-provisioner:v2.0.5": rpc error: code = DeadlineExceeded desc = failed to pull and unpack image "registry.k8s.io/sig-storage/csi-provisioner:v2.0.5": failed to resolve reference "registry.k8s.io/sig-storage/csi-provisioner:v2.0.5": failed to do request: Head "https://us-west2-docker.pkg.dev/v2/k8s-artifacts-prod/images/sig-storage/csi-provisioner/manifests/v2.0.5": dial tcp 64.233.187.82:443: i/o timeout
  Normal   Pulling         5m29s (x5 over 10m)    kubelet            Pulling image "registry.k8s.io/sig-storage/csi-provisioner:v2.0.5"
  Normal   BackOff         30s (x28 over 10m)     kubelet            Back-off pulling image "registry.k8s.io/sig-storage/csi-provisioner:v2.0.5"
```
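
To work out which images need to be mirrored, the image names can be extracted from the pull-failure events. The snippet below is a minimal sketch: `failed_images` is a hypothetical helper name (not part of Alluxio or kubectl), and the sample lines are abbreviated from the event output above so that the snippet runs without a cluster.

```shell
#!/bin/sh
set -eu

# Hypothetical helper: read `kubectl describe pod` output on stdin and print
# each image named in a "Failed to pull image" event exactly once.
failed_images() {
  sed -n 's/.*Failed to pull image "\([^"]*\)".*/\1/p' | sort -u
}

# Demo on abbreviated sample event lines (no cluster required).
failed_images <<'EOF'
  Warning  Failed   8m20s (x2 over 10m)  kubelet  Failed to pull image "registry.k8s.io/sig-storage/csi-provisioner:v2.0.5": i/o timeout
  Normal   Pulling  5m29s (x5 over 10m)  kubelet  Pulling image "registry.k8s.io/sig-storage/csi-provisioner:v2.0.5"
EOF
```

Against a live cluster, the same function can be fed from the command shown earlier: `kubectl -n alluxio-operator describe pod -l app.kubernetes.io/component=csi-controller | failed_images`.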

### Third-Party Dependency Images

| Component          | Image                                                 | Version             | Purpose                            |
| ------------------ | ----------------------------------------------------- | ------------------- | ---------------------------------- |
| operator CSI       | registry.k8s.io/sig-storage/csi-node-driver-registrar | v2.0.0              | CSI driver registration dependency |
| operator CSI       | registry.k8s.io/sig-storage/csi-provisioner           | v2.0.5              | CSI provisioner dependency         |
| cluster etcd       | docker.io/bitnami/etcd                                | 3.5.9-debian-11-r24 | etcd dependency                    |
| cluster etcd       | docker.io/bitnami/os-shell                            | 11-debian-11-r2     | os-shell dependency                |
| cluster monitoring | grafana/grafana                                       | 10.4.5              | Monitoring dashboards              |
| cluster monitoring | prom/prometheus                                       | v2.52.0             | Metrics collection                 |

The following commands pull the Docker images and upload them to a private registry. Set the platform to match your Kubernetes cluster environment (for example, `linux/amd64`):

```console
# Pull the Docker images
$ docker pull --platform linux/amd64 registry.k8s.io/sig-storage/csi-node-driver-registrar:v2.0.0
$ docker pull --platform linux/amd64 registry.k8s.io/sig-storage/csi-provisioner:v2.0.5
$ docker pull --platform linux/amd64 docker.io/bitnami/etcd:3.5.9-debian-11-r24
$ docker pull --platform linux/amd64 docker.io/bitnami/os-shell:11-debian-11-r2
$ docker pull --platform linux/amd64 grafana/grafana:10.4.5
$ docker pull --platform linux/amd64 prom/prometheus:v2.52.0

# Tag the images for the private registry
$ docker tag registry.k8s.io/sig-storage/csi-node-driver-registrar:v2.0.0 <PRIVATE_REGISTRY>/csi-node-driver-registrar:v2.0.0
$ docker tag registry.k8s.io/sig-storage/csi-provisioner:v2.0.5 <PRIVATE_REGISTRY>/csi-provisioner:v2.0.5
$ docker tag docker.io/bitnami/etcd:3.5.9-debian-11-r24 <PRIVATE_REGISTRY>/etcd:3.5.9-debian-11-r24
$ docker tag docker.io/bitnami/os-shell:11-debian-11-r2 <PRIVATE_REGISTRY>/os-shell:11-debian-11-r2
$ docker tag grafana/grafana:10.4.5 <PRIVATE_REGISTRY>/grafana:10.4.5
$ docker tag prom/prometheus:v2.52.0 <PRIVATE_REGISTRY>/prometheus:v2.52.0

# Push the images to the private registry
$ docker push <PRIVATE_REGISTRY>/csi-node-driver-registrar:v2.0.0
$ docker push <PRIVATE_REGISTRY>/csi-provisioner:v2.0.5
$ docker push <PRIVATE_REGISTRY>/etcd:3.5.9-debian-11-r24
$ docker push <PRIVATE_REGISTRY>/os-shell:11-debian-11-r2
$ docker push <PRIVATE_REGISTRY>/grafana:10.4.5
$ docker push <PRIVATE_REGISTRY>/prometheus:v2.52.0
```
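
The pull, tag, and push steps above follow the same pattern for every image, so they can be scripted as a loop. The script below is a minimal sketch under stated assumptions: `registry.example.com` is a placeholder for `<PRIVATE_REGISTRY>`, and `target_for`, `run`, and the `DRY_RUN` switch are hypothetical names introduced for illustration. As in the commands above, the target name keeps only the last path component of the source image.

```shell
#!/bin/sh
set -eu

# Assumptions: replace the placeholder registry with your own address.
PRIVATE_REGISTRY="${PRIVATE_REGISTRY:-registry.example.com}"
PLATFORM="${PLATFORM:-linux/amd64}"
# DRY_RUN=1 (the default here) only prints the docker commands.
DRY_RUN="${DRY_RUN:-1}"

# Third-party images from the table above.
IMAGES="
registry.k8s.io/sig-storage/csi-node-driver-registrar:v2.0.0
registry.k8s.io/sig-storage/csi-provisioner:v2.0.5
docker.io/bitnami/etcd:3.5.9-debian-11-r24
docker.io/bitnami/os-shell:11-debian-11-r2
grafana/grafana:10.4.5
prom/prometheus:v2.52.0
"

# Map a source image to <PRIVATE_REGISTRY>/<name>:<tag> by dropping
# everything up to the last slash.
target_for() {
  printf '%s/%s\n' "${PRIVATE_REGISTRY}" "${1##*/}"
}

# Print the command in dry-run mode, otherwise execute it.
run() {
  if [ "${DRY_RUN}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

for src in ${IMAGES}; do
  dst="$(target_for "${src}")"
  run docker pull --platform "${PLATFORM}" "${src}"
  run docker tag "${src}" "${dst}"
  run docker push "${dst}"
done
```

Running the script as-is prints the commands for review; setting `DRY_RUN=0` and `PRIVATE_REGISTRY` to the real registry address executes them.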

### Modify alluxio-operator.yaml

Update the image addresses in `alluxio-operator/alluxio-operator.yaml`, and add the image addresses for `provisioner` and `driverRegistrar`:

```yaml
global:
  image: <PRIVATE_REGISTRY>/alluxio-operator
  imageTag: 3.2.1

alluxio-csi:
  controllerPlugin:
    provisioner:
      image: <PRIVATE_REGISTRY>/csi-provisioner:v2.0.5
  nodePlugin:
    driverRegistrar:
      image: <PRIVATE_REGISTRY>/csi-node-driver-registrar:v2.0.0
```

### Modify alluxio-cluster.yaml

Update the image addresses in `alluxio-operator/alluxio-cluster.yaml` accordingly:

```yaml
apiVersion: k8s-operator.alluxio.com/v1
kind: AlluxioCluster
metadata:
  name: alluxio-cluster
  namespace: alx-ns
spec:
  image: <PRIVATE_REGISTRY>/alluxio-enterprise
  imageTag: AI-3.6-12.0.2
  properties:
  worker:
    count: 2
    pagestore:
      size: 100Gi
  etcd:
    image:
      registry: <PRIVATE_REGISTRY>
      repository: <PRIVATE_REPOSITORY>/etcd
      tag: 3.5.9-debian-11-r24
    volumePermissions:
      image:
        registry: <PRIVATE_REGISTRY>
        repository: <PRIVATE_REPOSITORY>/os-shell
        tag: 11-debian-11-r2
  prometheus:
    image: <PRIVATE_REGISTRY>/prometheus
    imageTag: v2.52.0
  grafana:
    image: <PRIVATE_REGISTRY>/grafana
    imageTag: 10.4.5
```
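
After editing both files, it can help to confirm that no image reference still points outside the private registry. The check below is a rough sketch, not an Alluxio tool: `unmirrored_images` is a hypothetical helper, `registry.example.com` stands in for `<PRIVATE_REGISTRY>`, and only `image:` and `registry:` lines that carry a value are inspected (`repository:` lines are not checked).

```shell
#!/bin/sh
set -eu

# Hypothetical helper: print every `image:` or `registry:` line that has a
# value but does not contain the given private registry prefix.
# $1: YAML file, $2: private registry prefix
unmirrored_images() {
  grep -nE '^[[:space:]]*(image|registry):[[:space:]]*[^[:space:]]' "$1" \
    | grep -vF -e "$2" || true
}

# Demo on a throwaway file, so the snippet runs without the real YAML files.
tmp="$(mktemp)"
trap 'rm -f "${tmp}"' EXIT
cat > "${tmp}" <<'EOF'
spec:
  image: registry.example.com/alluxio-enterprise
  prometheus:
    image: prom/prometheus
EOF
unmirrored_images "${tmp}" registry.example.com
```

The demo reports the `prom/prometheus` line because it does not reference the private registry. For the real files, pass the file path and your registry address, for example `unmirrored_images alluxio-operator/alluxio-cluster.yaml <PRIVATE_REGISTRY>`; an empty result means every inspected line references the private registry.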

