Docker Installation
Deploy Alluxio on bare-metal Linux hosts, EC2 instances, or Slurm-managed clusters using Docker — no Kubernetes required.
Overview
Architecture
Each Alluxio component runs in its own Docker container using --net=host, sharing the host network stack. Components communicate directly by IP — no port mapping needed.
| Node | Components |
| --- | --- |
| Coordinator node | ETCD, Alluxio Coordinator, Prometheus, Grafana |
| Worker node(s) | Alluxio Worker (one per host) |
| FUSE client node | Alluxio FUSE (one per host) |
Artifacts
You will receive a download link for two artifacts:
| Artifact | Purpose |
| --- | --- |
| Alluxio image (alluxio-enterprise-AI-3.8-15.1.2-linux-amd64-docker.tar) | Single image for coordinator, worker, and FUSE roles |
| License | Required to activate the cluster |
Platform: Use -linux-amd64-docker.tar for x86 hosts, -linux-arm64-docker.tar for ARM.
The same image is loaded on every node. The role — coordinator, worker, or fuse — is set by the argument passed to docker run.
Before You Start
Run these checks before starting. Skipping them is the most common cause of deployment failures.
EC2 + IAM roles: Attach the IAM instance profile before launch. No access keys are needed in the docker run commands.
Installation Steps
0. Load the Alluxio Image
Run on every host (coordinator, each worker, FUSE client):
✅ Verify:
Note the image name and tag — you will use them in every subsequent docker run command.
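A minimal sketch of the load-and-verify step, assuming the amd64 tarball named under Artifacts (the exact image name and tag printed by docker load depend on your download):

```shell
# Load the image tarball into the local Docker image store
docker load -i alluxio-enterprise-AI-3.8-15.1.2-linux-amd64-docker.tar

# Verify: the Alluxio image should now appear in the local image list
docker image ls | grep alluxio
```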
1. Start ETCD
SSH into the ETCD node:
Use the private IP for --advertise-client-urls — not the public DNS or hostname — to ensure workers on other hosts can reach ETCD reliably.
For production, run a 3-node ETCD cluster for high availability. Single-node is suitable for evaluation only.
✅ Verify:
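A single-node evaluation sketch, assuming the quay.io/coreos/etcd image (the image, tag, and <ETCD_PRIVATE_IP> placeholder are assumptions, not values from this guide):

```shell
# Single-node ETCD for evaluation only; run a 3-node cluster in production
docker run -d --net=host --name etcd \
  quay.io/coreos/etcd:v3.5.9 \
  etcd --listen-client-urls http://0.0.0.0:2379 \
       --advertise-client-urls http://<ETCD_PRIVATE_IP>:2379

# Verify: the health endpoint should return {"health":"true"}
curl http://<ETCD_PRIVATE_IP>:2379/health
```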
2. Start Coordinator (Coordinator host)
Key properties:
| Property | Purpose |
| --- | --- |
| alluxio.license | License string |
| alluxio.etcd.endpoints | ETCD address (private IP + port 2379) |
| alluxio.coordinator.hostname | Private IP workers use to register |
| alluxio.mount.table.source=ETCD | Persist mount table in ETCD across restarts |
✅ Verify:
The absence of ERROR lines in the first 30 seconds indicates a healthy start.
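A sketch of the coordinator launch, assuming the role is selected by the argument passed to docker run (as noted under Artifacts) and using the key properties above; the container name, image placeholder, and exact flag layout are assumptions:

```shell
docker run -d --net=host --name alluxio-coordinator \
  <ALLUXIO_IMAGE>:<TAG> coordinator \
  -Dalluxio.license=<LICENSE_STRING> \
  -Dalluxio.etcd.endpoints=http://<ETCD_PRIVATE_IP>:2379 \
  -Dalluxio.coordinator.hostname=<COORDINATOR_PRIVATE_IP> \
  -Dalluxio.mount.table.source=ETCD

# Verify: watch the first ~30 seconds of logs for ERROR lines
docker logs -f alluxio-coordinator
```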
3. Start Workers (each Worker host)
SSH into each worker host:
Set <CACHE_SIZE> to ~80% of available space on /data/alluxio-cache (e.g., 50GB, 200GB, 1TB). Avoid /tmp — it is cleared on reboot.
S3 API (optional): Add -Dalluxio.worker.s3.api.enabled=true to enable the S3-compatible endpoint on each worker (port 29998). Only needed if clients will access Alluxio via the S3 API.
JVM sizing — Alluxio stores cached data on disk, not in heap. The JVM heap does not need to be large relative to cache size:
| Host RAM | -Xmx | -XX:MaxDirectMemorySize |
| --- | --- | --- |
| 16 GB | 4g | 4g |
| 32 GB | 8g | 8g |
| 64 GB | 16g | 16g |
| 128 GB+ | 32g | 32g |
✅ Verify (from coordinator host):
Workers may take 10–15 seconds to register after starting.
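A worker launch sketch for a 32 GB host. The cache-directory property names and the way JVM flags are passed here are assumptions; consult your image's documentation for the exact property keys:

```shell
docker run -d --net=host --name alluxio-worker \
  -v /data/alluxio-cache:/data/alluxio-cache \
  <ALLUXIO_IMAGE>:<TAG> worker \
  -Xmx8g -XX:MaxDirectMemorySize=8g \
  -Dalluxio.license=<LICENSE_STRING> \
  -Dalluxio.etcd.endpoints=http://<ETCD_PRIVATE_IP>:2379 \
  -Dalluxio.worker.page.store.dirs=/data/alluxio-cache \
  -Dalluxio.worker.page.store.sizes=<CACHE_SIZE>

# From the coordinator host, confirm registration (allow 10-15 seconds)
docker exec alluxio-coordinator alluxio info nodes
```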
4. Mount Storage
Run alluxio mount add from any host that has access to the coordinator. For full UFS configuration options, see Underlying Storage.
S3 with IAM role (recommended on EC2):
S3 with access key/secret:
✅ Verify:
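A mount sketch covering both credential styles. The named flags (--path, --ufs-uri, --option) and option keys shown are assumptions based on the named-flag syntax this guide requires; the container name is a placeholder:

```shell
# S3 with IAM role (recommended on EC2): no credentials in the command
docker exec alluxio-coordinator \
  alluxio mount add --path /s3 --ufs-uri s3://<S3_BUCKET>/

# S3 with access key/secret
docker exec alluxio-coordinator \
  alluxio mount add --path /s3 --ufs-uri s3://<S3_BUCKET>/ \
  --option s3a.accessKeyId=<ACCESS_KEY> \
  --option s3a.secretKey=<SECRET_KEY>

# Verify: the new mount should be listed
docker exec alluxio-coordinator alluxio mount list
```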
5. Verify Data Access
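A listing check like the following (the mount path /s3 and container name are placeholders carried over from earlier examples in this sketch, not fixed names):

```shell
docker exec alluxio-coordinator alluxio fs ls /s3/
```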
Returns the contents of your S3 bucket — no errors means the mount is working.
6. Start FUSE (FUSE client host)
SSH into the FUSE client host. Create the mount directory:
Start the FUSE container:
--privileged is required for FUSE to mount inside the container and propagate to the host via -v /mnt/alluxio:/mnt/alluxio:shared.
✅ Verify (wait ~10 seconds):
Each Alluxio mount point appears as a subdirectory. /mnt/alluxio/fuse/s3/ maps directly to s3://<S3_BUCKET>/.
Test read and write:
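A FUSE launch sketch, assuming the role argument fuse and the shared bind mount described above (image placeholder and property list are assumptions):

```shell
# Create the mount directory on the host first
sudo mkdir -p /mnt/alluxio

docker run -d --net=host --name alluxio-fuse --privileged \
  -v /mnt/alluxio:/mnt/alluxio:shared \
  <ALLUXIO_IMAGE>:<TAG> fuse \
  -Dalluxio.etcd.endpoints=http://<ETCD_PRIVATE_IP>:2379

# Verify after ~10 seconds: each mount appears as a subdirectory
ls /mnt/alluxio/fuse/s3/

# Test read and write through the POSIX mount
echo "hello" > /mnt/alluxio/fuse/s3/test.txt
cat /mnt/alluxio/fuse/s3/test.txt
```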
Uninstall
Stop and remove containers on each host in reverse order:
FUSE client host:
Each worker host:
Coordinator host:
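The teardown might look like the following; the container names are illustrative placeholders matching the launch sketches in this guide, not fixed names:

```shell
# FUSE client host
docker stop alluxio-fuse && docker rm alluxio-fuse
sudo fusermount -u /mnt/alluxio/fuse 2>/dev/null || true

# Each worker host
docker stop alluxio-worker && docker rm alluxio-worker

# Coordinator host (last)
docker stop alluxio-coordinator && docker rm alluxio-coordinator
docker stop etcd && docker rm etcd
```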
Mount table persistence: ETCD stores the mount table in its container filesystem. Running docker rm on the ETCD container will lose all mount points. To persist them across restarts, add -v /data/etcd:/etcd-data --data-dir /etcd-data to the ETCD docker run command.
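A persistent variant of the ETCD launch might look like this (the quay.io image, tag, and IP placeholder are assumptions):

```shell
docker run -d --net=host --name etcd \
  -v /data/etcd:/etcd-data \
  quay.io/coreos/etcd:v3.5.9 \
  etcd --data-dir /etcd-data \
       --listen-client-urls http://0.0.0.0:2379 \
       --advertise-client-urls http://<ETCD_PRIVATE_IP>:2379
```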
Monitoring (Optional)
Set up Prometheus and Grafana on the coordinator host using Docker Compose.
Create ~/monitoring/prometheus/prometheus.yml:
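A minimal scrape config sketch; the metrics ports shown are placeholders you must replace with the ports your Alluxio components actually expose (check the monitoring documentation):

```yaml
global:
  scrape_interval: 15s
scrape_configs:
  - job_name: alluxio
    metrics_path: /metrics
    static_configs:
      - targets:
          - <COORDINATOR_PRIVATE_IP>:<COORDINATOR_METRICS_PORT>
          - <WORKER_PRIVATE_IP>:<WORKER_METRICS_PORT>
```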
Create ~/monitoring/grafana/datasource.yml:
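A standard Grafana provisioning sketch pointing at the local Prometheus (port 9090 is Prometheus's default; adjust if you change it):

```yaml
apiVersion: 1
datasources:
  - name: Prometheus
    type: prometheus
    access: proxy
    url: http://localhost:9090
    isDefault: true
```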
Create ~/monitoring/compose.yaml:
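A Compose sketch tying the two files together; host networking mirrors the rest of this deployment, and the admin password matches the login noted below (image tags are assumptions):

```yaml
services:
  prometheus:
    image: prom/prometheus:latest
    network_mode: host
    volumes:
      - ./prometheus/prometheus.yml:/etc/prometheus/prometheus.yml
  grafana:
    image: grafana/grafana:latest
    network_mode: host
    environment:
      - GF_SECURITY_ADMIN_PASSWORD=grafana
    volumes:
      - ./grafana/datasource.yml:/etc/grafana/provisioning/datasources/datasource.yml
```

Bring the stack up with docker compose up -d from ~/monitoring.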
Access Grafana at http://<COORDINATOR_PUBLIC_IP>:3000 (login: admin / grafana).
Import the Alluxio dashboard JSON from the monitoring documentation.
Appendix
A. Troubleshooting
Workers not appearing in alluxio info nodes
Verify ETCD is reachable from the worker host:
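A reachability check like the following, run from the worker host (IP placeholder assumed):

```shell
curl http://<ETCD_PRIVATE_IP>:2379/health
```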
Expected:
{"health":"true"}
Check worker logs:
Confirm alluxio.coordinator.hostname is set to an IP reachable from the worker. If unreachable, registration silently fails.
FUSE mount not visible after container starts
Check that the mount directory exists:
If missing, the container starts successfully but the mount is silently skipped.
Check container logs:
Mount point '/mnt/alluxio/fuse' does not exist confirms the directory was missing.
Fix the directory, then recreate the container:
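The recovery might look like this (container name is an illustrative placeholder):

```shell
sudo mkdir -p /mnt/alluxio
docker rm -f alluxio-fuse
# then re-run the FUSE docker run command from step 6
```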
Transport endpoint is not connected after FUSE container is removed
The FUSE filesystem stays registered with the kernel after the container exits. Unmount manually:
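For example (the mount path matches step 6; use lazy umount as a fallback):

```shell
sudo fusermount -u /mnt/alluxio/fuse
# or, if fusermount is unavailable:
sudo umount -l /mnt/alluxio/fuse
```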
alluxio mount add fails with unknown command
Use named flags — the old positional syntax is no longer supported:
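A named-flag sketch; the exact flag names (--path, --ufs-uri) are assumptions to be checked against alluxio mount add --help:

```shell
alluxio mount add --path /s3 --ufs-uri s3://<S3_BUCKET>/
```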
B. Worker Identity After Reboot
If a worker host reboots, the container loses its identity file and registers as a new worker. The old entry stays in ETCD as OFFLINE. Remove it:
Worker IDs are shown in alluxio info nodes.
C. Collecting Logs for Support
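One way to gather container logs from a host into a single archive for support; the container names are illustrative placeholders, and containers that do not exist on a given host are skipped:

```shell
# Dump logs from each Alluxio-related container, tolerating missing ones
mkdir -p /tmp/alluxio-logs
for c in alluxio-coordinator alluxio-worker alluxio-fuse etcd; do
  docker logs "$c" > "/tmp/alluxio-logs/$c.log" 2>&1 || true
done
# Bundle everything into one tarball to attach to the support ticket
tar -czf alluxio-logs.tar.gz -C /tmp alluxio-logs
```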
Related Documentation
Amazon S3 UFS — S3 credentials and configuration options
POSIX API (FUSE) — FUSE mount options and tuning
S3 API — Using the Alluxio S3-compatible endpoint
Monitoring — Grafana dashboard import and metrics reference