> For the complete documentation index, see [llms.txt](https://documentation.alluxio.io/ee-ai-cn/llms.txt). Markdown versions of documentation pages are available by appending `.md` to page URLs; this page is available as [Markdown](https://documentation.alluxio.io/ee-ai-cn/start/installing-on-docker.md).

# Installing on Docker

This document describes how to deploy Alluxio with Docker containers on bare-metal Linux hosts, EC2 instances, or Slurm clusters, without Kubernetes.

## Overview

### Architecture

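Each Alluxio component runs in its own Docker container with `--net=host`, sharing the host's network. Components talk to each other directly by IP address, so no port mapping is needed.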

This guide deploys the following topology:

| Host                       | Containers                                     |
| -------------------------- | ---------------------------------------------- |
| Coordinator node           | ETCD, Alluxio Coordinator, Prometheus, Grafana |
| Worker nodes (one or more) | Alluxio Worker (one per host)                  |
| FUSE client nodes          | Alluxio FUSE (one per host)                    |

### Artifacts

You will receive a download link for the Alluxio Docker image:

| Artifact      | File name                                                 | Purpose                                              |
| ------------- | --------------------------------------------------------- | ---------------------------------------------------- |
| Alluxio image | `alluxio-enterprise-AI-3.9-16.0.0-linux-amd64-docker.tar` | Container image for the coordinator, worker, and FUSE |
| License       |                                                           | Required to activate the cluster                     |

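> **Version**: The version string in the file names above is only an example. The download link you receive contains the actual version provided by your sales representative.
>
> **Platform**: Use `-linux-amd64-docker.tar` on x86 hosts and `-linux-arm64-docker.tar` on ARM hosts.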
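
If you script the load step across a mix of x86 and ARM hosts, you can choose the suffix from `uname -m`. The following is a minimal sketch. `alluxio_tar_suffix` is a hypothetical helper name. It assumes the usual `uname -m` values (`x86_64` and `aarch64`) and also accepts `amd64` and `arm64` as aliases.

```shell
# Hypothetical helper: map a machine architecture (default: this host's
# `uname -m`) to the tarball suffix described in the note above.
alluxio_tar_suffix() {
  local arch=${1:-$(uname -m)}
  case "$arch" in
    x86_64|amd64)  echo "-linux-amd64-docker.tar" ;;
    aarch64|arm64) echo "-linux-arm64-docker.tar" ;;
    *) echo "unsupported architecture: $arch" >&2; return 1 ;;
  esac
}

# Usage on the current host:
# echo "alluxio-enterprise-AI-3.9-16.0.0$(alluxio_tar_suffix)"
```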

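The Docker image is a single multi-role image. Every node loads the same `.tar` file, and the role (`coordinator`, `worker`, or `fuse`) is selected by the argument passed after the image name in `docker run`.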

## Before You Begin

Run the following checks before you start. Skipping this step is the most common cause of failed deployments.

* [ ] **Docker** is installed and running on every host:

  ```shell
  docker --version
  docker info
  ```
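* [ ] **A Linux host** that supports `--net=host`. Linux bare-metal servers and cloud VMs support it by default. Docker Desktop on macOS does not, so use a Linux VM or an EC2 instance instead.
* [ ] **All hosts can reach each other**
* [ ] **Security groups / firewalls** allow the required ports between all hosts. For the full port list, see [Prerequisites → Network](https://documentation.alluxio.io/ee-ai-cn/start/pages/u5ytEWeeFKI3ZN07lNwB#网络).
* [ ] **The Alluxio Docker image `.tar` file** is available to load
* [ ] **The Alluxio license string** is ready
* [ ] **UFS credentials** are ready (an S3 access key/secret, or an IAM role attached to the host)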

> **Using an IAM role on EC2?** Attach an IAM instance profile to the EC2 instance before you start the Alluxio containers. You then do not need to put an access key in any Docker command.

> **FUSE client hosts**: Hosts that run the Alluxio FUSE client need libfuse 3.10 or later. (On Kubernetes, libfuse is bundled in the container image, so no host installation is needed.)
>
> ```shell
> fusermount3 --version
> # If it is not installed:
> apt-get install -y fuse3   # Ubuntu / Debian
> yum install -y fuse3       # RHEL / CentOS / Amazon Linux
> ```
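
To check the 3.10 minimum in a script instead of by eye, you can parse the version from the `fusermount3 --version` output. This is a minimal sketch. It assumes the output contains a version such as `fusermount3 version: 3.10.5`. `libfuse_ok` is a hypothetical helper name.

```shell
# Hypothetical helper: succeed if the first "X.Y" version found in the
# given text is at least 3.10. Example input: "fusermount3 version: 3.10.5"
libfuse_ok() {
  local ver major minor
  ver=$(printf '%s\n' "$1" | grep -oE '[0-9]+\.[0-9]+' | head -n1)
  [ -n "$ver" ] || return 1          # no version found (e.g. command missing)
  major=${ver%%.*}
  minor=${ver#*.}
  [ "$major" -gt 3 ] || { [ "$major" -eq 3 ] && [ "$minor" -ge 10 ]; }
}

# Usage on a FUSE client host:
# libfuse_ok "$(fusermount3 --version 2>&1)" && echo "libfuse OK" || echo "libfuse missing or older than 3.10"
```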

For per-component resource sizing (CPU, memory, cache disk), see [Prerequisites → Resource sizing](https://documentation.alluxio.io/ee-ai-cn/start/pages/u5ytEWeeFKI3ZN07lNwB#资源规格).

## Installation Steps

### 0. Load the Alluxio image on every host

On **every host** (the coordinator, each worker, and each FUSE client), run:

```shell
docker load -i alluxio-enterprise-AI-3.9-16.0.0-linux-amd64-docker.tar
```

**✅ Verify:**

```shell
docker images | grep alluxio
```

```console
alluxio/alluxio-enterprise   AI-3.9-16.0.0   4091c3d8dbc4   ...   2.58GB
```

Note the exact image name and tag, because every `docker run` command uses them.
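
`docker load` prints the reference of the image it loaded, as a line such as `Loaded image: alluxio/alluxio-enterprise:AI-3.9-16.0.0`. You can capture that reference instead of copying it by hand. This is a minimal sketch. The helper `loaded_image_ref` and the variable `ALLUXIO_IMAGE` are hypothetical names, not part of Alluxio.

```shell
# Hypothetical helper: read `docker load` output on stdin and print the
# first "name:tag" reference it reports (nothing if the image was untagged).
loaded_image_ref() {
  sed -n 's/^Loaded image: //p' | head -n1
}

# Usage: capture the reference once and reuse it in later docker run commands.
# ALLUXIO_IMAGE=$(docker load -i alluxio-enterprise-AI-3.9-16.0.0-linux-amd64-docker.tar | loaded_image_ref)
# echo "$ALLUXIO_IMAGE"
```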

### 1. Start ETCD

SSH into the **coordinator host** (which also runs ETCD in this topology) and start a single-node ETCD cluster:

```shell
docker run -p 2379:2379 -p 2380:2380 -d --name etcd-standalone \
  quay.io/coreos/etcd:v3.5.9 etcd \
  --listen-client-urls http://0.0.0.0:2379 \
  --advertise-client-urls http://<COORDINATOR_PRIVATE_IP>:2379
```

Replace `<COORDINATOR_PRIVATE_IP>` with the coordinator host's private IP (for example `172.31.31.133`). Use the **private IP** rather than a hostname, so that workers on other hosts can reliably reach ETCD.

> For production, run a 3-node ETCD cluster for high availability. A single-node ETCD is suitable for evaluation and non-critical workloads.

**✅ Verify:**

```shell
docker exec etcd-standalone etcdctl endpoint status --cluster -w table
```

```console
+-----------------------------------+------------------+---------+---------+-----------+
|             ENDPOINT              |        ID        | VERSION | DB SIZE | IS LEADER |
+-----------------------------------+------------------+---------+---------+-----------+
| http://172.31.31.133:2379         | 8e9e05c52164694d |   3.5.9 |   20 kB |      true |
+-----------------------------------+------------------+---------+---------+-----------+
```

### 2. Start the Coordinator (coordinator host)

Write the license and the cluster properties to `alluxio-site.properties` on the host, then mount that file into the container.

```shell
# Create the configuration file
sudo mkdir -p /etc/alluxio
cat <<EOF | sudo tee /etc/alluxio/alluxio-site.properties
alluxio.license=<YOUR_LICENSE>
alluxio.etcd.endpoints=http://<COORDINATOR_PRIVATE_IP>:2379
alluxio.coordinator.hostname=<COORDINATOR_PRIVATE_IP>
alluxio.mount.table.source=ETCD
EOF

# Start the coordinator
docker run -d --net=host --name=alluxio-coordinator \
  -v /etc/alluxio/alluxio-site.properties:/opt/alluxio/conf/alluxio-site.properties \
  alluxio/alluxio-enterprise:AI-3.9-16.0.0 coordinator
```

Key properties (set in `alluxio-site.properties`):

| Property                          | Description                                                    |
| --------------------------------- | -------------------------------------------------------------- |
| `alluxio.license`                 | Alluxio license string                                         |
| `alluxio.etcd.endpoints`          | ETCD endpoint (private IP + port 2379)                         |
| `alluxio.coordinator.hostname`    | Private IP of this host; workers register using this address   |
| `alluxio.mount.table.source=ETCD` | Store the mount table in ETCD so it survives coordinator restarts |
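
The license contains `+`, `/`, and `=`. For that reason it can help to generate the file from shell variables with `printf '%s'`, which writes values verbatim, and to fail fast if a `<PLACEHOLDER>` was left unreplaced. This is a minimal sketch under those assumptions. `render_site_properties` is a hypothetical helper, and it writes only the four coordinator properties from the table.

```shell
# Hypothetical helper: write the coordinator's alluxio-site.properties.
# printf '%s' copies the license verbatim, so base64 characters are never
# reinterpreted by the shell.
render_site_properties() {
  local license=$1 coord_ip=$2 out=$3
  if [ -z "$license" ] || [ -z "$coord_ip" ] || [ -z "$out" ]; then
    echo "usage: render_site_properties LICENSE COORDINATOR_IP OUTPUT_FILE" >&2
    return 1
  fi
  {
    printf 'alluxio.license=%s\n' "$license"
    printf 'alluxio.etcd.endpoints=http://%s:2379\n' "$coord_ip"
    printf 'alluxio.coordinator.hostname=%s\n' "$coord_ip"
    printf 'alluxio.mount.table.source=ETCD\n'
  } > "$out"
  # Refuse a file that still contains an unreplaced <PLACEHOLDER>.
  if grep -q '<[A-Z_]*>' "$out"; then
    echo "unreplaced placeholder in $out" >&2
    return 1
  fi
}

# Usage (render to a temp file, then install it where the container expects it):
# render_site_properties "$ALLUXIO_LICENSE" 172.31.31.133 /tmp/alluxio-site.properties
# sudo install -m 644 /tmp/alluxio-site.properties /etc/alluxio/alluxio-site.properties
```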

**✅ Verify:**

```shell
docker logs alluxio-coordinator 2>&1 | grep -i "started\|listening\|etcd"
```

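The coordinator has started normally if `docker logs alluxio-coordinator 2>&1 | grep ERROR` returns no lines during the first 30 seconds.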

### 3. Start Workers (each worker host)

SSH into each **worker host**. Use the same `alluxio-site.properties` as on the coordinator, append the two worker-specific page store properties, and mount the file into the container:

```shell
# Create the cache directory (no sudo needed under /tmp)
mkdir -p /tmp/alluxio-cache

# Create the configuration file
sudo mkdir -p /etc/alluxio
cat <<EOF | sudo tee /etc/alluxio/alluxio-site.properties
alluxio.license=<YOUR_LICENSE>
alluxio.etcd.endpoints=http://<COORDINATOR_PRIVATE_IP>:2379
alluxio.coordinator.hostname=<COORDINATOR_PRIVATE_IP>
alluxio.mount.table.source=ETCD
alluxio.worker.page.store.dirs=/tmp/alluxio-cache
alluxio.worker.page.store.sizes=<CACHE_SIZE>
EOF

# Start the worker
docker run -d --net=host --name=alluxio-worker \
  -v /tmp/alluxio-cache:/tmp/alluxio-cache \
  -v /etc/alluxio/alluxio-site.properties:/opt/alluxio/conf/alluxio-site.properties \
  -e ALLUXIO_JAVA_OPTS="-Xmx8g -Xms2g -XX:MaxDirectMemorySize=8g" \
  alluxio/alluxio-enterprise:AI-3.9-16.0.0 worker
```

Replace `<CACHE_SIZE>` with about 80% of the available disk capacity on the cache path (for example `50GB`, `200GB`, or `1TB`).
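
You can compute the 80% figure from `df`. This is a minimal sketch. `suggest_cache_size` and `kb_to_cache_size` are hypothetical helpers. They round down to whole GiB and print the result with the `GB` suffix used in the examples above. Run them after the cache directory exists, because `df` needs an existing path.

```shell
# Convert available space in KiB to 80% of it, rounded down to whole GiB.
kb_to_cache_size() {
  echo "$(( $1 * 8 / 10 / 1024 / 1024 ))GB"
}

# Hypothetical helper: suggest a page store size for the filesystem that
# holds the cache directory. `df -Pk` reports 1024-byte blocks; on the
# second line, column 4 is the available space.
suggest_cache_size() {
  local dir=${1:-/tmp/alluxio-cache} avail_kb
  avail_kb=$(df -Pk "$dir" | awk 'NR == 2 { print $4 }')
  kb_to_cache_size "$avail_kb"
}

# Usage on a worker host:
# echo "alluxio.worker.page.store.sizes=$(suggest_cache_size /tmp/alluxio-cache)"
```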

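> **About `/tmp`**: `/tmp/alluxio-cache` is cleared when the host reboots, so it is only suitable for evaluation. In production, use a persistent path (for example `/data/alluxio-cache`). See [Worker configuration](/ee-ai-cn/administration/managing-worker.md).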

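> **JVM sizing**: The `-Xmx8g -XX:MaxDirectMemorySize=8g` settings above suit a host with 32 GB of memory. Alluxio stores cached data on disk rather than in heap memory, so set `-Xmx` and `-XX:MaxDirectMemorySize` to about 25% of host memory each. See [Worker configuration](/ee-ai-cn/administration/managing-worker.md).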
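
You can turn the same 25% guideline into flags by reading `/proc/meminfo`. This is a minimal sketch. `jvm_opts_for_mem_kb` is a hypothetical helper. It sets `-Xms` to a quarter of `-Xmx` to match the `8g`/`2g` ratio in the example, which is an assumption rather than a documented rule. It also rounds down, so a nominal 32 GB host whose `MemTotal` is slightly below 32 GiB gets `-Xmx7g`.

```shell
# Hypothetical helper: derive worker JVM flags from total memory in KiB.
# Heap and direct memory are each ~25% of RAM; -Xms is a quarter of -Xmx.
jvm_opts_for_mem_kb() {
  local mem_kb=$1 gb xms
  gb=$(( mem_kb / 1024 / 1024 / 4 ))
  [ "$gb" -ge 1 ] || gb=1
  xms=$(( gb / 4 ))
  [ "$xms" -ge 1 ] || xms=1
  echo "-Xmx${gb}g -Xms${xms}g -XX:MaxDirectMemorySize=${gb}g"
}

# Usage on a worker host (MemTotal in /proc/meminfo is reported in kB):
# ALLUXIO_JAVA_OPTS="$(jvm_opts_for_mem_kb "$(awk '/^MemTotal:/ { print $2 }' /proc/meminfo)")"
```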

**✅ Verify (run on the coordinator host):**

```shell
docker exec alluxio-coordinator alluxio info nodes
```

```console
WorkerId                                         Address                          Status
worker-15ed4a17-2154-4454-ba0b-32b46ff06bfb     ip-172-31-26-247...:29999        ONLINE
worker-704c4d42-d189-41ff-b6f5-775a7d1551b3     ip-172-31-18-67...:29999         ONLINE
```

All workers should show `ONLINE`. After its container starts, a worker may need 10–15 seconds to register.
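
While workers register, you can poll until the expected number are `ONLINE` instead of re-running the command by hand. This is a minimal sketch. `retry` and `workers_online` are hypothetical helpers. The check simply counts lines that contain `ONLINE` in the `alluxio info nodes` output shown above.

```shell
# Hypothetical helper: run a command up to N times, sleeping between
# failed attempts; succeed as soon as one attempt succeeds.
retry() {
  local attempts=$1 delay=$2 i=1
  shift 2
  while [ "$i" -le "$attempts" ]; do
    "$@" && return 0
    [ "$i" -lt "$attempts" ] && sleep "$delay"
    i=$((i + 1))
  done
  return 1
}

# Hypothetical check: succeed once at least N workers are reported ONLINE.
workers_online() {
  local want=$1 have
  have=$(docker exec alluxio-coordinator alluxio info nodes | grep -c 'ONLINE')
  [ "$have" -ge "$want" ]
}

# Usage: wait up to about 60 s (12 attempts, 5 s apart) for 2 workers.
# retry 12 5 workers_online 2 && echo "all workers ONLINE"
```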

### 4. Mount storage

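Use `alluxio mount add` to mount the UFS. The commands below run it inside the coordinator container through `docker exec`, so execute them on the coordinator host. For all UFS configuration options, see [Under storage](/ee-ai-cn/ufs.md).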

**Mount S3 with an IAM role (recommended on EC2):**

If the EC2 host has an IAM instance profile attached, the coordinator picks up credentials automatically and no access key is needed:

```shell
docker exec alluxio-coordinator alluxio mount add \
  --path /s3 \
  --ufs-uri s3://<S3_BUCKET>/ \
  --option alluxio.underfs.s3.region=<S3_REGION>
```

If you get a credentials error, see [S3 mount reports credential errors](#s3-挂载报凭据错误) in Appendix A.

**Mount S3 with an access key/secret:**

```shell
docker exec alluxio-coordinator alluxio mount add \
  --path /s3 \
  --ufs-uri s3://<S3_BUCKET>/ \
  --option alluxio.underfs.s3.region=<S3_REGION> \
  --option s3a.accessKeyId=<ACCESS_KEY> \
  --option s3a.secretKey=<SECRET_KEY>
```
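
Typing keys directly into the command leaves them in shell history. The following minimal sketch wraps the command above and reads the keys from the standard `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY` environment variables instead. `mount_s3_with_keys` is a hypothetical helper. The keys still appear as `docker exec` arguments in the host's process list while the command runs.

```shell
# Hypothetical wrapper around the access-key mount command above.
# Keys come from the standard AWS environment variables instead of being
# typed on the command line.
mount_s3_with_keys() {
  local bucket=$1 region=$2 path=${3:-/s3}
  if [ -z "${AWS_ACCESS_KEY_ID:-}" ] || [ -z "${AWS_SECRET_ACCESS_KEY:-}" ]; then
    echo "export AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY first" >&2
    return 1
  fi
  docker exec alluxio-coordinator alluxio mount add \
    --path "$path" \
    --ufs-uri "s3://${bucket}/" \
    --option "alluxio.underfs.s3.region=${region}" \
    --option "s3a.accessKeyId=${AWS_ACCESS_KEY_ID}" \
    --option "s3a.secretKey=${AWS_SECRET_ACCESS_KEY}"
}

# Usage, with both variables already exported in this shell:
# mount_s3_with_keys <S3_BUCKET> <S3_REGION>
```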

**✅ Verify:**

```shell
docker exec alluxio-coordinator alluxio mount list
```

```console
Listing all mount points
s3://<S3_BUCKET>/  on  /s3/  properties={alluxio.underfs.s3.region=<S3_REGION>}
```

### 5. Start FUSE (FUSE client host)

SSH into the **FUSE client host**. If this is a reinstall, first clear any stale mount left over from the previous installation:

```shell
sudo umount -l /mnt/alluxio/fuse 2>/dev/null || true
```

Create the mount directory:

```shell
sudo mkdir -p /mnt/alluxio/fuse
sudo chown $(whoami) /mnt/alluxio/fuse
chmod 755 /mnt/alluxio/fuse
```

The `-o allow_other` option at the end of the FUSE `docker run` command requires `user_allow_other` to be set in `/etc/fuse.conf` on the FUSE host. Without it, FUSE fails with `fusermount: option allow_other only allowed if 'user_allow_other' is set in /etc/fuse.conf`. Enable it with:

```shell
grep -q user_allow_other /etc/fuse.conf || echo user_allow_other | sudo tee -a /etc/fuse.conf
```

Start the FUSE container:

```shell
docker run -d --privileged --net=host --name=alluxio-fuse \
  -v /mnt/alluxio:/mnt/alluxio:shared \
  -e ALLUXIO_JAVA_OPTS="\
    -Xmx4g -Xms1g -XX:MaxDirectMemorySize=4g \
    -Dalluxio.etcd.endpoints=http://<COORDINATOR_PRIVATE_IP>:2379 \
    -Dalluxio.coordinator.hostname=<COORDINATOR_PRIVATE_IP> \
    -Dalluxio.mount.table.source=ETCD" \
  alluxio/alluxio-enterprise:AI-3.9-16.0.0 fuse -o allow_other /mnt/alluxio/fuse
```

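> **`--privileged` is required** in this minimal setup so that the container can mount the FUSE filesystem. The `-v /mnt/alluxio:/mnt/alluxio:shared` bind mount then exposes that mount to the host.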

**✅ Verify** (wait about 10 seconds):

```shell
ls /mnt/alluxio/fuse/
```

```console
s3
```

The FUSE mount exposes each Alluxio mount point as a subdirectory. Files under `/mnt/alluxio/fuse/s3/` map directly to `s3://<S3_BUCKET>/`.
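
To confirm that the path is really a FUSE mount and not a plain directory, look it up in `/proc/mounts`. This is a minimal sketch. `is_fuse_mounted` is a hypothetical helper that treats any filesystem type starting with `fuse` as a FUSE mount.

```shell
# Hypothetical helper: succeed if TARGET appears as a FUSE mount in a mounts
# table (default /proc/mounts). Field 2 is the mount point, field 3 the type.
is_fuse_mounted() {
  local target=$1 table=${2:-/proc/mounts}
  awk -v t="$target" '$2 == t && $3 ~ /^fuse/ { found = 1 } END { exit !found }' "$table"
}

# Usage on the FUSE client host:
# is_fuse_mounted /mnt/alluxio/fuse && echo "FUSE mounted" || echo "not mounted"
```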

> The FUSE container above is a minimal smoke test: it uses `--privileged` and evaluation-sized JVM settings. For production deployments (fine-grained capabilities instead of `--privileged`, high-throughput JVM settings, tuned mount options, multiple clients), see [POSIX API → Docker / Bare-Metal](/ee-ai-cn/data-access/fuse-based-posix-api.md#fang-fa-3-docker-baremetal).

### 6. Verify data access

```shell
docker exec alluxio-coordinator alluxio fs ls /s3/
```

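**✅ Success:** The command lists the files and directories in the S3 bucket without errors. An empty bucket returns an empty list, also without errors. Example: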

```console
-rwx------  0  0  0  B  PERSISTED  01-01-2024 00:00:00:000  /s3/dataset/
-rwx------  0  0  1234  B  PERSISTED  01-01-2024 00:00:00:000  /s3/README.md
```

If the command fails, see [Appendix A: Troubleshooting](#a-问题排查).

**Test reads and writes through FUSE:**

```shell
# Read an existing file (replace test.txt with a file that exists in your bucket)
cat /mnt/alluxio/fuse/s3/test.txt

# Write a new file
echo "hello alluxio" > /mnt/alluxio/fuse/s3/hello.txt

# Verify that the file appears in S3 (requires the AWS CLI)
aws s3 ls s3://<S3_BUCKET>/hello.txt
```

## Restart and Recovery

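After an EC2 instance or host reboots, the Docker containers stay stopped unless a restart policy (`docker run --restart` or `docker update --restart`) was configured in advance. A stopped container keeps its filesystem, so `docker start` brings it back with its data. Data stored only inside a container, such as ETCD data when no persistent volume is mounted, is lost when that container is removed and recreated.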
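
One way to avoid manual restarts is to set a restart policy on the existing containers with `docker update --restart`. This is a minimal sketch. `enable_autorestart` is a hypothetical helper that skips containers not present on the host. Docker does not start containers in any particular order, so this sketch assumes the Alluxio processes tolerate ETCD becoming available slightly later. Check the logs after the first reboot, and keep `/mnt/alluxio/fuse` on a persistent filesystem so that the FUSE container finds it.

```shell
# Hypothetical helper: give every Alluxio-related container on this host the
# "unless-stopped" restart policy; containers that do not exist are skipped.
enable_autorestart() {
  local c
  for c in etcd-standalone alluxio-coordinator alluxio-worker alluxio-fuse; do
    if docker inspect "$c" >/dev/null 2>&1; then
      docker update --restart unless-stopped "$c"
    fi
  done
}

# Usage: run once on each host after its containers are created.
# enable_autorestart
```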

### Recovering after a host reboot

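Run the following on each host, in this order. If ETCD was started **without** a persistent volume and its container was removed and recreated, the mount table is lost. In that case, rerun step 4 after the coordinator is back.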

**ETCD node (the coordinator host in this topology):**

```shell
docker start etcd-standalone
```

**Coordinator node:**

```shell
docker start alluxio-coordinator
```

**Each worker node:**

```shell
docker start alluxio-worker
```

**FUSE client node:**

```shell
# The mount directory must exist before the container starts
sudo mkdir -p /mnt/alluxio/fuse
docker start alluxio-fuse
```

**✅ Verify that all components have recovered:**

```shell
# Run on the coordinator node
docker exec alluxio-coordinator alluxio info nodes
```

### Persisting the ETCD mount table

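By default, ETCD stores its data in the container filesystem, which `docker rm` deletes together with the container. All mount points are lost when the container is recreated. To keep the mount table independently of the container, mount a host data directory in the ETCD `docker run` command. If you recreate an existing `etcd-standalone` container this way, the new data directory starts empty, so rerun step 4 afterwards: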

```shell
docker run -p 2379:2379 -p 2380:2380 -d --name etcd-standalone \
  -v /data/etcd:/etcd-data \
  quay.io/coreos/etcd:v3.5.9 etcd \
  --data-dir /etcd-data \
  --listen-client-urls http://0.0.0.0:2379 \
  --advertise-client-urls http://<COORDINATOR_PRIVATE_IP>:2379
```
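
After recreating ETCD, you can confirm that its data directory comes from the host by inspecting the container's mounts. This is a minimal sketch that uses the standard `docker inspect --format` template syntax. `etcd_data_persisted` is a hypothetical helper. A named volume mounted at `/etcd-data` would also pass the check.

```shell
# Hypothetical check: succeed if some mount of the ETCD container targets
# /etcd-data, i.e. the data directory lives outside the container filesystem.
etcd_data_persisted() {
  docker inspect \
    --format '{{range .Mounts}}{{.Source}}:{{.Destination}}{{"\n"}}{{end}}' \
    "${1:-etcd-standalone}" | grep -q ':/etcd-data$'
}

# Usage on the coordinator host:
# etcd_data_persisted && echo "ETCD data is on the host" || echo "ETCD data lives only in the container"
```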

## Uninstall

On each host, stop and remove the containers in the reverse order of installation:

**FUSE client host:**

```shell
docker stop alluxio-fuse && docker rm alluxio-fuse
sudo umount -l /mnt/alluxio/fuse   # use if the mount is stuck after the container is removed
```

**Each worker host:**

```shell
docker stop alluxio-worker && docker rm alluxio-worker
```

**Coordinator host:**

```shell
docker stop alluxio-coordinator && docker rm alluxio-coordinator
docker stop etcd-standalone && docker rm etcd-standalone
# If you started the optional monitoring containers:
docker rm -f prometheus grafana
```

## Monitoring (optional)

First, create the Prometheus scrape configuration and the Grafana data source files as described in [Monitoring → Prometheus configuration](https://documentation.alluxio.io/ee-ai-cn/start/pages/Er5rhWWY1Ja663vVECcf#prometheus-配置). Then start Prometheus and Grafana on the coordinator node:

```shell
docker run -d --net=host --name=prometheus \
  -v ~/monitoring/prometheus/prometheus.yml:/etc/prometheus/prometheus.yml \
  prom/prometheus --config.file=/etc/prometheus/prometheus.yml

docker run -d --net=host --name=grafana \
  -v ~/monitoring/grafana/provisioning:/etc/grafana/provisioning \
  -e GF_SECURITY_ADMIN_USER=admin \
  -e GF_SECURITY_ADMIN_PASSWORD=grafana \
  grafana/grafana
```

Open `http://localhost:3000` in a browser. On EC2, use an SSH tunnel or open ports 3000 and 9090 in the security group. For dashboard import, alert rules, and Datadog integration, see [Monitoring](/ee-ai-cn/administration/monitoring-alluxio.md).

## Appendix

### A. Troubleshooting

**`License checksum error` at startup (coordinator or worker)**

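This is almost always caused by the license string being corrupted in transit. Alluxio's base64 license string contains `+`, `/`, and `=`. These characters survive a single shell layer but get mangled across several nested layers, as in the typical chain local shell → `ssh` → `docker -e ALLUXIO_JAVA_OPTS="...-Dalluxio.license=${LICENSE}..."` → Java. Steps 2 and 3 both write the license into `/etc/alluxio/alluxio-site.properties` and mount it into the container with `-v`, which bypasses every shell quoting layer. If you previously started Alluxio with `-Dalluxio.license=...` in `ALLUXIO_JAVA_OPTS`, switch to the file-mount approach from the corresponding step.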

**A worker does not appear in `alluxio info nodes`**

1. From the worker host, verify that ETCD is reachable:

   ```shell
   curl http://<COORDINATOR_PRIVATE_IP>:2379/health
   ```

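   Expected: a response containing `"health":"true"`, for example `{"health":"true"}`. Some ETCD versions also include a `"reason"` field.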
2. Check the worker container logs:

   ```shell
   docker logs alluxio-worker 2>&1 | grep -i "error\|etcd\|register" | tail -20
   ```
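
You can script the reachability check from item 1 so that it fails on anything other than a healthy response. This is a minimal sketch. `etcd_healthy` is a hypothetical helper. It matches the `"health":"true"` field rather than the whole body, because some ETCD versions return additional fields.

```shell
# Hypothetical helper: succeed only if ETCD at the given IP answers /health
# with "health":"true". curl -f also fails on HTTP error statuses.
etcd_healthy() {
  local ip=$1
  curl -fsS --max-time 5 "http://${ip}:2379/health" | grep -q '"health":"true"'
}

# Usage from a worker host:
# etcd_healthy <COORDINATOR_PRIVATE_IP> && echo "ETCD reachable and healthy"
```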

**The FUSE mount is not visible after the container starts**

1. Verify that the mount directory was created **before the container was started**:

   ```shell
   ls -la /mnt/alluxio/fuse
   ```

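   If the directory does not exist, the container still starts successfully. However, the mount is skipped, and the only sign of this is in the container logs.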
2. Check the container logs:

   ```shell
   docker logs alluxio-fuse 2>&1 | tail -20
   ```

   A log line such as `Mount point '/mnt/alluxio/fuse' does not exist` confirms that the directory is missing.
3. After fixing it, remove and recreate the container:

   ```shell
   docker rm -f alluxio-fuse
   # Rerun the docker run command from step 5
   ```

**`Transport endpoint is not connected` after the container is removed**

The FUSE filesystem remains registered in the kernel after the container exits. Unmount it manually:

```shell
sudo umount -l /mnt/alluxio/fuse
```

**`alluxio mount add` reports `unknown command`**

The `alluxio mount add` CLI takes named flags. The old positional syntax (`alluxio mount add /path s3://...`) is no longer supported. Use:

```shell
alluxio mount add --path /s3 --ufs-uri s3://<BUCKET>/ --option alluxio.underfs.s3.region=<REGION>
```

**S3 mount reports credential errors**

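Because the coordinator runs with `--net=host`, it shares the host's network stack and can reach the EC2 Instance Metadata Service (IMDS). IMDS is an endpoint built into every EC2 instance that issues short-lived credentials to processes running on that host. First, confirm that the instance has an IAM role attached: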

```shell
# Should print the attached IAM role name; empty output means no role is attached
curl -s http://169.254.169.254/latest/meta-data/iam/security-credentials/
```

If no role is attached, remount using the access key method from step 4. If a role is attached but the error persists, check that the role's policy grants `s3:GetObject`, `s3:ListBucket`, and `s3:PutObject` on the target bucket.
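
On instances configured to require IMDSv2, IMDS rejects the plain `curl` above with HTTP 401 instead of returning the role name, because IMDSv2 requires a session token. The following is a minimal sketch of the token-based request. `imds_role` is a hypothetical helper, while the token endpoint and the two headers are the standard IMDSv2 ones.

```shell
# Print the IAM role attached to this EC2 instance using IMDSv2:
# first obtain a session token (PUT), then send it with the metadata request.
imds_role() {
  local base=http://169.254.169.254/latest token
  token=$(curl -fsS --max-time 2 -X PUT "$base/api/token" \
    -H "X-aws-ec2-metadata-token-ttl-seconds: 21600") || return 1
  curl -fsS --max-time 2 -H "X-aws-ec2-metadata-token: $token" \
    "$base/meta-data/iam/security-credentials/"
}

# Usage on the coordinator host:
# imds_role || echo "no role attached or IMDS unreachable"
```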

### B. Worker node identity

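The first time a worker starts, Alluxio automatically creates `/opt/alluxio/conf/worker_identity` and writes a UUID to it. If this file is lost, the worker re-registers with a new UUID. Its ring slots are remapped, previously cached data becomes inaccessible, and the old UUID stays in etcd as a stale entry until it is removed manually or cleaned up automatically (dynamic mode only).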

**Scenarios that cause identity loss:**

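* A host reboot followed by recreating the containers instead of restarting them with `docker start` (a stopped container keeps its filesystem, but a newly created one does not)
* Container recreation with `docker rm` + `docker run`, the most common trigger when updating `ALLUXIO_JAVA_OPTS`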

**Cleaning up OFFLINE entries** (remediation after identity loss):

```shell
docker exec alluxio-coordinator alluxio process remove-worker -n <WORKER_ID>
```

The worker ID is shown in the `alluxio info nodes` output.
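
When several workers have been recreated, you can extract the `OFFLINE` IDs from the listing instead of copying them one by one. This is a minimal sketch. It assumes the column layout shown in step 3, with the WorkerId first and the status last. `offline_worker_ids` is a hypothetical helper. Review the printed IDs before removing anything.

```shell
# Hypothetical helper: read `alluxio info nodes` output on stdin and print
# the WorkerId (first column) of each row whose last column is OFFLINE.
offline_worker_ids() {
  awk '$NF == "OFFLINE" { print $1 }'
}

# Usage on the coordinator host (inspect the list first):
# docker exec alluxio-coordinator alluxio info nodes | offline_worker_ids
# for id in $(docker exec alluxio-coordinator alluxio info nodes | offline_worker_ids); do
#   docker exec alluxio-coordinator alluxio process remove-worker -n "$id"
# done
```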

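**Preventing identity loss**: Use `alluxio.worker.identity.uuid.file.path` to redirect the identity file into a directory bind-mounted from the host. On first start, Alluxio writes the UUID to that path. Because the path is inside the mounted directory, the file lands on the host immediately and survives any number of `docker rm` + `docker run` cycles.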

On each worker host, run the following **before starting the worker**:

```shell
# Create a persistent directory owned by the Alluxio container user (UID 1000)
sudo mkdir -p /etc/alluxio/identity
sudo chown 1000 /etc/alluxio/identity
```

Add the following to `/etc/alluxio/alluxio-site.properties`:

```properties
alluxio.worker.identity.uuid.file.path=/etc/alluxio/identity/worker_identity
```

Add `-v /etc/alluxio/identity:/etc/alluxio/identity` to the worker `docker run` command from step 3:

```shell
docker run -d --net=host --name=alluxio-worker \
  -v /tmp/alluxio-cache:/tmp/alluxio-cache \
  -v /etc/alluxio/alluxio-site.properties:/opt/alluxio/conf/alluxio-site.properties \
  -v /etc/alluxio/identity:/etc/alluxio/identity \
  -e ALLUXIO_JAVA_OPTS="-Xmx8g -Xms2g -XX:MaxDirectMemorySize=8g" \
  alluxio/alluxio-enterprise:AI-3.9-16.0.0 worker
```

On first start, Alluxio creates `/etc/alluxio/identity/worker_identity` on the host. Every later start reads the UUID from that file, with no extra copy step.

{% hint style="info" %}
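If you cannot edit `alluxio-site.properties` before the first start, start the worker without any identity mount. Then copy the generated file out with `sudo docker cp alluxio-worker:/opt/alluxio/conf/worker_identity /etc/alluxio/worker_identity && sudo chmod 666 /etc/alluxio/worker_identity`, and recreate the container with `-v /etc/alluxio/worker_identity:/opt/alluxio/conf/worker_identity`. Note: if the host path does not exist, Docker creates a directory (not a file) at that path, and the container fails at startup with `IsADirectoryException`.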
{% endhint %}

For a full explanation of why identity persistence matters and how it affects the hash ring, see [Restarting workers](/ee-ai-cn/administration/managing-ring.md#zhong-qi-worker).

### C. Collecting logs for technical support

```shell
# Coordinator logs
docker cp alluxio-coordinator:/opt/alluxio/logs /tmp/coordinator-logs

# Worker logs (run on each worker host)
docker cp alluxio-worker:/opt/alluxio/logs /tmp/worker-logs
```
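
To hand support a single file, you can bundle the logs from every Alluxio container present on a host into one tarball. This is a minimal sketch. `collect_alluxio_logs` is a hypothetical helper. It assumes the FUSE container also writes to `/opt/alluxio/logs`, as the coordinator and worker do. It also saves each container's `docker logs` output next to the copied directory.

```shell
# Hypothetical helper: copy /opt/alluxio/logs and the `docker logs` output
# of each Alluxio container on this host, then pack everything into
# OUT_DIR.tar.gz and print the tarball path.
collect_alluxio_logs() {
  local out=${1:-/tmp/alluxio-logs-$(hostname)-$(date +%Y%m%d%H%M%S)} c
  mkdir -p "$out"
  for c in alluxio-coordinator alluxio-worker alluxio-fuse; do
    if docker inspect "$c" >/dev/null 2>&1; then
      docker cp "$c:/opt/alluxio/logs" "$out/$c"
      docker logs "$c" > "$out/$c.stdout.log" 2>&1
    fi
  done
  tar -czf "$out.tar.gz" -C "$(dirname "$out")" "$(basename "$out")"
  echo "$out.tar.gz"
}

# Usage on any Alluxio host:
# collect_alluxio_logs
```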

### D. Updating configuration

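In the main installation flow, both the coordinator and worker containers already mount `/etc/alluxio/alluxio-site.properties` from the host (see [Start the Coordinator](#id-2.-qi-dong-coordinatorcoordinator-zhu-ji) and [Start Workers](#id-3.-qi-dong-workermei-tai-worker-zhu-ji)). To change any Alluxio property after installation, edit the file on the host and `docker restart` the container. `docker restart` keeps the worker ID. By contrast, changing configuration through `-e ALLUXIO_JAVA_OPTS` forces you to recreate the container. Unless the identity file is persisted as described in [Appendix B](#b-worker-节点身份), that generates a new worker ID and leaves an `OFFLINE` record in ETCD.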

```shell
# Edit /etc/alluxio/alluxio-site.properties on the host, then restart:
docker restart alluxio-worker     # or alluxio-coordinator
```

After the restart, verify that the worker rejoins the cluster with the same identity:

```shell
docker exec alluxio-coordinator alluxio info nodes
# Expected: the same WorkerId as before, with status ONLINE
```

## Related documentation

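* [Cluster management](/ee-ai-cn/administration/managing-alluxio.md): post-deployment operations such as scaling, hash ring tuning, worker lifecycle, and UFS mount management
* [Amazon S3 UFS](/ee-ai-cn/ufs/s3.md): S3 credentials and configuration options
* [POSIX API (FUSE)](/ee-ai-cn/data-access/fuse-based-posix-api.md): FUSE mount options and tuning
* [S3 API](/ee-ai-cn/data-access/s3-api.md): using the Alluxio S3-compatible endpoint
* [Monitoring](/ee-ai-cn/administration/monitoring-alluxio.md): alert rules, Datadog integration, and metrics reference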