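Image Management
A deployment requires two kinds of container images:
Alluxio images (provided by your Alluxio sales representative)
Publicly available images of third-party components
These images must be served from an image registry that the Kubernetes cluster can reach.
Alluxio images usually have to be uploaded to a registry that you manage yourself.
An image registry is a central place for storing and sharing container images, and it can be public or private. Many cloud providers offer a registry service, for example Amazon Elastic Container Registry (ECR), Azure Container Registry (ACR), and Google Container Registry (GCR). A private registry can also be hosted on-premises or inside an organization's internal network.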
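Alluxio images
Alluxio provides the following image files:
alluxio-operator-3.2.1-docker.tar: Docker images for the Alluxio Operator components
alluxio-enterprise-AI-3.6-12.0.2-docker.tar: Docker image for the Alluxio coordinator and worker
The following optional images may also be provided:
alluxio-gateway-AI-3.6-12.0.2-docker.tar: Docker image for the Alluxio API gateway
alluxio-dashboard-AI-3.6-12.0.2-docker.tar: Docker image for the Alluxio management console
The following example loads the Alluxio Operator and Alluxio images and uploads them to a private registry: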
# Load the images into the local Docker daemon
$ docker load -i alluxio-operator-3.2.1-docker.tar
$ docker load -i alluxio-enterprise-AI-3.6-12.0.2-docker.tar
# Retag the images for the private registry
$ docker tag alluxio/operator:3.2.1 <PRIVATE_REGISTRY>/alluxio-operator:3.2.1
$ docker tag alluxio/alluxio-enterprise:AI-3.6-12.0.2 <PRIVATE_REGISTRY>/alluxio-enterprise:AI-3.6-12.0.2
# Push the images to the remote registry
$ docker push <PRIVATE_REGISTRY>/alluxio-operator:3.2.1
$ docker push <PRIVATE_REGISTRY>/alluxio-enterprise:AI-3.6-12.0.2
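The image names inside a tarball may differ from the ones shown above. `docker load` prints a `Loaded image: <name>:<tag>` line for each tagged image it imports, so the names can be read from its output rather than guessed. A minimal sketch, assuming that output format; `loaded_images` is a hypothetical helper name, not part of Docker or Alluxio:

```shell
# Print the image references reported by `docker load`.
# Reads `docker load` output on stdin and keeps only the
# "Loaded image: <name>:<tag>" lines, stripped of their prefix.
loaded_images() {
  sed -n 's/^Loaded image: //p'
}

# Usage (requires Docker and the tarball):
#   docker load -i alluxio-operator-3.2.1-docker.tar | loaded_images
```

Each printed reference can then be used as the source argument of `docker tag`.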
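Unable to access public image registries
When the Kubernetes cluster has no access to the public network, images that are normally pulled from public registries cannot be downloaded.
To deploy Alluxio successfully, these images must be uploaded to a private registry that the cluster can reach.
If your network cannot reach public registries, image pulls will time out: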
# Check whether the operator is running properly
$ kubectl -n alluxio-operator get pod
NAME                                              READY   STATUS              RESTARTS   AGE
alluxio-cluster-controller-65b59f65b4-5d667       1/1     Running             0          22s
alluxio-collectinfo-controller-667b746fd6-hfzqk   1/1     Running             0          22s
alluxio-csi-controller-c85f8f759-sqc56            0/2     ContainerCreating   0          22s
alluxio-csi-nodeplugin-5pgmg                      0/2     ContainerCreating   0          22s
alluxio-csi-nodeplugin-fpkcq                      0/2     ContainerCreating   0          22s
alluxio-csi-nodeplugin-j9wll                      0/2     ContainerCreating   0          22s
alluxio-ufs-controller-5f69bbb878-7km58           1/1     Running             0          22s
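Note that the cluster controller, ufs controller, and collectinfo controller have started successfully, while the csi controller and csi nodeplugin remain in the ContainerCreating state. This happens because pulling their dependency images timed out.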
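In a longer listing, the affected pods can be picked out by filtering on the STATUS column. A rough sketch, assuming the default `kubectl get pod` column layout (NAME, READY, STATUS, ...); `pods_not_running` is a hypothetical helper name:

```shell
# Print the names of pods whose STATUS column is not "Running".
# Reads default `kubectl get pod` output on stdin and skips the header row.
pods_not_running() {
  awk 'NR > 1 && $3 != "Running" { print $1 }'
}

# Usage (requires cluster access):
#   kubectl -n alluxio-operator get pod | pods_not_running
```

Note that this also lists pods in other non-Running states, such as Completed, so the result is a starting point for inspection rather than a list of failures.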
Running kubectl describe pod shows the details, including error messages similar to the following:
$ kubectl -n alluxio-operator describe pod -l app.kubernetes.io/component=csi-controller
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 10m default-scheduler Successfully assigned alluxio-operator/alluxio-csi-controller-c85f8f759-sqc56 to <nodeName>
Normal AllocIPSucceed 10m terway-daemon Alloc IP 10.0.0.27/24 took 28.443992ms
Normal Pulling 10m kubelet Pulling image "registry.xxx.com/alluxio/operator:3.2.1"
Normal Pulled 10m kubelet Successfully pulled image "registry.xxx.com/alluxio/operator:3.2.1" in 5.55s (5.55s including waiting)
Normal Created 10m kubelet Created container csi-controller
Normal Started 10m kubelet Started container csi-controller
Warning Failed 8m20s (x2 over 10m) kubelet Failed to pull image "registry.k8s.io/sig-storage/csi-provisioner:v2.0.5": failed to pull and unpack image "registry.k8s.io/sig-storage/csi-provisioner:v2.0.5": failed to resolve reference "registry.k8s.io/sig-storage/csi-provisioner:v2.0.5": failed to do request: Head "https://us-west2-docker.pkg.dev/v2/k8s-artifacts-prod/images/sig-storage/csi-provisioner/manifests/v2.0.5": dial tcp 142.251.8.82:443: i/o timeout
Warning Failed 8m20s (x3 over 10m) kubelet Error: ErrImagePull
Warning Failed 7m40s (x5 over 10m) kubelet Error: ImagePullBackOff
Warning Failed 6m56s (x2 over 9m19s) kubelet Failed to pull image "registry.k8s.io/sig-storage/csi-provisioner:v2.0.5": rpc error: code = DeadlineExceeded desc = failed to pull and unpack image "registry.k8s.io/sig-storage/csi-provisioner:v2.0.5": failed to resolve reference "registry.k8s.io/sig-storage/csi-provisioner:v2.0.5": failed to do request: Head "https://us-west2-docker.pkg.dev/v2/k8s-artifacts-prod/images/sig-storage/csi-provisioner/manifests/v2.0.5": dial tcp 64.233.187.82:443: i/o timeout
Normal Pulling 5m29s (x5 over 10m) kubelet Pulling image "registry.k8s.io/sig-storage/csi-provisioner:v2.0.5"
Normal BackOff 30s (x28 over 10m) kubelet Back-off pulling image "registry.k8s.io/sig-storage/csi-provisioner:v2.0.5"
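The events above name each image the kubelet could not pull. To collect the list of images that need to be mirrored, the references can be extracted from the `Failed to pull image` messages. A sketch, assuming the event message format shown above; `failed_images` is a hypothetical helper name:

```shell
# Print the unique image references found in kubelet
# "Failed to pull image" events.
# Reads `kubectl describe pod` output on stdin.
failed_images() {
  grep -o 'Failed to pull image "[^"]*"' | cut -d '"' -f 2 | sort -u
}

# Usage (requires cluster access):
#   kubectl -n alluxio-operator describe pod | failed_images
```

Only images that have already failed appear in the events, so images needed by containers that have not been scheduled yet will not be listed.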
Third-party dependency images
Component            Image                                                   Version               Usage
operator CSI         registry.k8s.io/sig-storage/csi-node-driver-registrar   v2.0.0                CSI driver registrar dependency
operator CSI         registry.k8s.io/sig-storage/csi-provisioner             v2.0.5                CSI provisioner dependency
cluster ETCD         docker.io/bitnami/etcd                                  3.5.9-debian-11-r24   etcd dependency
cluster ETCD         docker.io/bitnami/os-shell                              11-debian-11-r2       os-shell dependency
cluster monitoring   grafana/grafana                                         10.4.5                Monitoring dashboard
cluster monitoring   prom/prometheus                                         v2.52.0               Metrics collection
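The following commands pull the Docker images and upload them to a private image registry. Make sure the platform matches your Kubernetes cluster environment (for example, linux/amd64):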
# Pull the Docker images
$ docker pull --platform linux/amd64 registry.k8s.io/sig-storage/csi-node-driver-registrar:v2.0.0
$ docker pull --platform linux/amd64 registry.k8s.io/sig-storage/csi-provisioner:v2.0.5
$ docker pull --platform linux/amd64 docker.io/bitnami/etcd:3.5.9-debian-11-r24
$ docker pull --platform linux/amd64 docker.io/bitnami/os-shell:11-debian-11-r2
$ docker pull --platform linux/amd64 grafana/grafana:10.4.5
$ docker pull --platform linux/amd64 prom/prometheus:v2.52.0
# Tag the images for the private registry
$ docker tag registry.k8s.io/sig-storage/csi-node-driver-registrar:v2.0.0 <PRIVATE_REGISTRY>/csi-node-driver-registrar:v2.0.0
$ docker tag registry.k8s.io/sig-storage/csi-provisioner:v2.0.5 <PRIVATE_REGISTRY>/csi-provisioner:v2.0.5
$ docker tag docker.io/bitnami/etcd:3.5.9-debian-11-r24 <PRIVATE_REGISTRY>/etcd:3.5.9-debian-11-r24
$ docker tag docker.io/bitnami/os-shell:11-debian-11-r2 <PRIVATE_REGISTRY>/os-shell:11-debian-11-r2
$ docker tag grafana/grafana:10.4.5 <PRIVATE_REGISTRY>/grafana:10.4.5
$ docker tag prom/prometheus:v2.52.0 <PRIVATE_REGISTRY>/prometheus:v2.52.0
# Push the images to the private registry
$ docker push <PRIVATE_REGISTRY>/csi-node-driver-registrar:v2.0.0
$ docker push <PRIVATE_REGISTRY>/csi-provisioner:v2.0.5
$ docker push <PRIVATE_REGISTRY>/etcd:3.5.9-debian-11-r24
$ docker push <PRIVATE_REGISTRY>/os-shell:11-debian-11-r2
$ docker push <PRIVATE_REGISTRY>/grafana:10.4.5
$ docker push <PRIVATE_REGISTRY>/prometheus:v2.52.0
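The same pull, tag, and push sequence can be written as a loop. The sketch below follows the naming used above, placing the last path component of each source image under the private registry. The default value of `PRIVATE_REGISTRY`, the `DRY_RUN` switch, and the function names are assumptions made for illustration. By default it only prints the commands it would run.

```shell
#!/bin/sh
# Mirror the third-party images listed above into a private registry.
PRIVATE_REGISTRY="${PRIVATE_REGISTRY:-registry.example.com}"  # placeholder value
PLATFORM="${PLATFORM:-linux/amd64}"
DRY_RUN="${DRY_RUN:-1}"  # set DRY_RUN=0 to execute the docker commands

IMAGES="
registry.k8s.io/sig-storage/csi-node-driver-registrar:v2.0.0
registry.k8s.io/sig-storage/csi-provisioner:v2.0.5
docker.io/bitnami/etcd:3.5.9-debian-11-r24
docker.io/bitnami/os-shell:11-debian-11-r2
grafana/grafana:10.4.5
prom/prometheus:v2.52.0
"

# Map a source reference to its private name by keeping only "<name>:<tag>".
target_ref() {
  printf '%s/%s\n' "$PRIVATE_REGISTRY" "${1##*/}"
}

# Print the command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Pull, tag, and push every image in IMAGES.
mirror_all() {
  # $IMAGES is intentionally unquoted so that it splits on newlines.
  for src in $IMAGES; do
    dst="$(target_ref "$src")"
    run docker pull --platform "$PLATFORM" "$src"
    run docker tag "$src" "$dst"
    run docker push "$dst"
  done
}

mirror_all
```

Reviewing the printed commands first, then rerunning with `DRY_RUN=0`, keeps a typo in the registry name from producing a batch of wrongly tagged images.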
Modify the alluxio-operator.yaml file
Update the image addresses in alluxio-operator/alluxio-operator.yaml, and add the image addresses for provisioner and driverRegistrar:
global:
  image: <PRIVATE_REGISTRY>/alluxio-operator
  imageTag: 3.2.1
alluxio-csi:
  controllerPlugin:
    provisioner:
      image: <PRIVATE_REGISTRY>/csi-provisioner:v2.0.5
  nodePlugin:
    driverRegistrar:
      image: <PRIVATE_REGISTRY>/csi-node-driver-registrar:v2.0.0
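Before installing, it can help to confirm that no image line in the file still points outside the private registry. A minimal sketch, assuming every image value is written on an `image:` line as in the snippet above; `non_private_images` is a hypothetical helper name:

```shell
# Print every "image: <value>" line in a YAML file that does not mention
# the given private registry. No output means all image lines were updated.
# Usage: non_private_images <yaml-file> <private-registry>
non_private_images() {
  grep -E '^[[:space:]]*image:[[:space:]]*[^[:space:]]' "$1" | grep -vF "$2"
}

# Example (registry.example.com is a placeholder):
#   non_private_images alluxio-operator/alluxio-operator.yaml registry.example.com
```

The check only covers lines that are present in the file. An image that is not overridden at all, and therefore falls back to a default, will not be reported.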
Modify the alluxio-cluster.yaml file
Update the image addresses in alluxio-operator/alluxio-cluster.yaml accordingly.
apiVersion: k8s-operator.alluxio.com/v1
kind: AlluxioCluster
metadata:
  name: alluxio-cluster
  namespace: alx-ns
spec:
  image: <PRIVATE_REGISTRY>/alluxio-enterprise
  imageTag: AI-3.6-12.0.2
  properties:
  worker:
    count: 2
  pagestore:
    size: 100Gi
  etcd:
    image:
      registry: <PRIVATE_REGISTRY>
      repository: <PRIVATE_REPOSITORY>/etcd
      tag: 3.5.9-debian-11-r24
    volumePermissions:
      image:
        registry: <PRIVATE_REGISTRY>
        repository: <PRIVATE_REPOSITORY>/os-shell
        tag: 11-debian-11-r2
  prometheus:
    image: <PRIVATE_REGISTRY>/prometheus
    imageTag: v2.52.0
  grafana:
    image: <PRIVATE_REGISTRY>/grafana
    imageTag: 10.4.5
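The etcd and os-shell images are given as separate registry, repository, and tag fields. Assuming these fields are joined as `<registry>/<repository>:<tag>` (the convention used by Bitnami charts; this page does not confirm it), the resulting reference should match the name that was pushed earlier. A small sketch with hypothetical values; `image_ref` is an illustrative helper name:

```shell
# Join registry, repository, and tag into a full image reference.
# Usage: image_ref <registry> <repository> <tag>
image_ref() {
  printf '%s/%s:%s\n' "$1" "$2" "$3"
}

# Hypothetical values; substitute your own registry and repository.
image_ref registry.example.com mirror/etcd 3.5.9-debian-11-r24
# prints: registry.example.com/mirror/etcd:3.5.9-debian-11-r24
```

If the printed reference differs from the tag used with `docker push`, for example because the image was pushed without a repository prefix, the cluster will fail to pull it.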