Docker Installation

Deploy Alluxio on bare-metal Linux hosts, EC2 instances, or Slurm-managed clusters using Docker — no Kubernetes required.

Overview

Architecture

Each Alluxio component runs in its own Docker container using --net=host, sharing the host network stack. Components communicate directly by IP — no port mapping needed.

| Host | Containers |
|---|---|
| Coordinator node | ETCD, Alluxio Coordinator, Prometheus, Grafana |
| Worker node(s) | Alluxio Worker (one per host) |
| FUSE client node | Alluxio FUSE (one per host) |

Artifacts

You will receive a download link for two artifacts:

| Artifact | Filename | Purpose |
|---|---|---|
| Alluxio image | alluxio-enterprise-AI-3.8-15.1.2-linux-amd64-docker.tar | Single image for coordinator, worker, and FUSE roles |
| License | | Required to activate the cluster |

Platform: Use -linux-amd64-docker.tar for x86 hosts, -linux-arm64-docker.tar for ARM.

The same image is loaded on every node. The role — coordinator, worker, or fuse — is set by the argument passed to docker run.

Before You Start

Run these checks before starting. Skipping them is the most common cause of deployment failures.

EC2 + IAM roles: Attach the IAM instance profile before launch. No access keys are needed in the docker run commands.

Installation Steps

0. Load the Alluxio Image

Run on every host (coordinator, each worker, FUSE client):
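A sketch of the load step, assuming the amd64 tarball from the artifacts table is in the current directory:

```shell
# Load the Alluxio image from the distribution tarball
docker load -i alluxio-enterprise-AI-3.8-15.1.2-linux-amd64-docker.tar
```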

✅ Verify:
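One way to confirm the image loaded:

```shell
# The repository name and tag shown here are used in every later docker run
docker images | grep alluxio
```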

Note the image name and tag — you will use them in every subsequent docker run command.

1. Start ETCD

SSH into the ETCD node:

Use the private IP for --advertise-client-urls — not the public DNS or hostname — to ensure workers on other hosts can reach ETCD reliably.
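A minimal single-node sketch, assuming the upstream etcd image from quay.io; the image tag is a placeholder, and the listen/advertise flags are standard etcd options:

```shell
docker run -d --net=host --name etcd \
  quay.io/coreos/etcd:v3.5.13 \
  /usr/local/bin/etcd \
  --name etcd0 \
  --listen-client-urls http://0.0.0.0:2379 \
  --advertise-client-urls http://<ETCD_PRIVATE_IP>:2379
```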

For production, run a 3-node ETCD cluster for high availability. Single-node is suitable for evaluation only.

✅ Verify:
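The health endpoint can be probed with curl:

```shell
# Expected output: {"health":"true"}
curl http://<ETCD_PRIVATE_IP>:2379/health
```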

2. Start Coordinator (Coordinator host)
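A sketch of the coordinator container, assuming the image name/tag noted in step 0, the role argument described in the Artifacts section, and -D system-property style for the key properties below; treat the exact flag spellings as assumptions and adapt to your release:

```shell
docker run -d --net=host --name alluxio-coordinator \
  <ALLUXIO_IMAGE>:<TAG> coordinator \
  -Dalluxio.license=<LICENSE_STRING> \
  -Dalluxio.etcd.endpoints=http://<ETCD_PRIVATE_IP>:2379 \
  -Dalluxio.coordinator.hostname=<COORDINATOR_PRIVATE_IP> \
  -Dalluxio.mount.table.source=ETCD
```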

Key properties:

| Property | Purpose |
|---|---|
| alluxio.license | License string |
| alluxio.etcd.endpoints | ETCD address (private IP + port 2379) |
| alluxio.coordinator.hostname | Private IP workers use to register |
| alluxio.mount.table.source=ETCD | Persist mount table in ETCD across restarts |

✅ Verify:
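Follow the startup logs (container name assumed from the run command on this host):

```shell
# Watch for ERROR lines during the first ~30 seconds
docker logs -f alluxio-coordinator
```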

No ERROR lines in the first 30 seconds indicates a healthy start.

3. Start Workers (each Worker host)

SSH into each worker host:
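A sketch of the worker container, assuming the same image and -D property style as the coordinator; the page-store property names are assumptions drawn from Alluxio's cache configuration, so verify them against your release's property reference:

```shell
docker run -d --net=host --name alluxio-worker \
  -v /data/alluxio-cache:/data/alluxio-cache \
  <ALLUXIO_IMAGE>:<TAG> worker \
  -Dalluxio.etcd.endpoints=http://<ETCD_PRIVATE_IP>:2379 \
  -Dalluxio.worker.page.store.dirs=/data/alluxio-cache \
  -Dalluxio.worker.page.store.sizes=<CACHE_SIZE>
```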

Set <CACHE_SIZE> to ~80% of available space on /data/alluxio-cache (e.g., 50GB, 200GB, 1TB). Avoid /tmp — it is cleared on reboot.

S3 API (optional): Add -Dalluxio.worker.s3.api.enabled=true to enable the S3-compatible endpoint on each worker (port 29998). Only needed if clients will access Alluxio via the S3 API.

JVM sizing — Alluxio stores cached data on disk, not in heap. The JVM heap does not need to be large relative to cache size:

| Host RAM | -Xmx | -XX:MaxDirectMemorySize |
|---|---|---|
| 16 GB | 4g | 4g |
| 32 GB | 8g | 8g |
| 64 GB | 16g | 16g |
| 128 GB+ | 32g | 32g |

✅ Verify (from coordinator host):
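The registered workers can be listed with the alluxio info nodes command referenced in the appendix, run inside the coordinator container:

```shell
# Each started worker should appear; allow 10-15 s after startup
docker exec alluxio-coordinator alluxio info nodes
```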

Workers may take 10–15 seconds to register after starting.

4. Mount Storage

Run alluxio mount add from any host that has access to the coordinator. For full UFS configuration options, see Underlying Storage.

S3 with IAM role (recommended on EC2):
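With an instance profile attached, no credentials are passed. The flag names below (--path, --ufs-uri) are assumptions based on the named-flag syntax noted in the appendix; check alluxio mount add --help for the exact spelling:

```shell
docker exec alluxio-coordinator alluxio mount add \
  --path /s3 \
  --ufs-uri s3://<S3_BUCKET>/
```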

S3 with access key/secret:
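The same mount with explicit credentials; the --option mechanism and the s3a.* property names are assumptions, so confirm them against the Underlying Storage documentation:

```shell
docker exec alluxio-coordinator alluxio mount add \
  --path /s3 \
  --ufs-uri s3://<S3_BUCKET>/ \
  --option s3a.accessKeyId=<ACCESS_KEY_ID> \
  --option s3a.secretKey=<SECRET_KEY>
```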

✅ Verify:
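One way to confirm, assuming a mount list subcommand alongside mount add:

```shell
# The new mount point should appear in the mount table
docker exec alluxio-coordinator alluxio mount list
```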

5. Verify Data Access
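A quick check through the Alluxio filesystem CLI, assuming the mount path /s3 from the previous step:

```shell
# Lists the mounted bucket through Alluxio
docker exec alluxio-coordinator alluxio fs ls /s3/
```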

Returns the contents of your S3 bucket — no errors means the mount is working.

6. Start FUSE (FUSE client host)

SSH into the FUSE client host. Create the mount directory:
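The mount directory used throughout this guide:

```shell
sudo mkdir -p /mnt/alluxio/fuse
```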

Start the FUSE container:
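A sketch using the --privileged and shared-volume flags described below; image name/tag and the -D property style are assumptions carried over from the earlier steps:

```shell
docker run -d --net=host --name alluxio-fuse \
  --privileged \
  -v /mnt/alluxio:/mnt/alluxio:shared \
  <ALLUXIO_IMAGE>:<TAG> fuse \
  -Dalluxio.etcd.endpoints=http://<ETCD_PRIVATE_IP>:2379
```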

--privileged is required for FUSE to mount inside the container and propagate to the host via -v /mnt/alluxio:/mnt/alluxio:shared.

✅ Verify (wait ~10 seconds):
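List the mount root on the host:

```shell
# Each Alluxio mount point appears as a subdirectory (e.g. s3/)
ls /mnt/alluxio/fuse/
```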

Each Alluxio mount point appears as a subdirectory. /mnt/alluxio/fuse/s3/ maps directly to s3://<S3_BUCKET>/.

Test read and write:
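A round trip through the FUSE mount (the test filename is arbitrary):

```shell
# Write a file through FUSE, then read it back; it lands in s3://<S3_BUCKET>/
echo hello > /mnt/alluxio/fuse/s3/fuse-test.txt
cat /mnt/alluxio/fuse/s3/fuse-test.txt
```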

Uninstall

Stop and remove containers on each host in reverse order:

FUSE client host:
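Assuming the container names used in the install steps:

```shell
docker stop alluxio-fuse && docker rm alluxio-fuse
# Unmount if the kernel still shows the FUSE filesystem (see troubleshooting)
sudo umount /mnt/alluxio/fuse 2>/dev/null || true
```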

Each worker host:
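```shell
docker stop alluxio-worker && docker rm alluxio-worker
```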

Coordinator host:
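```shell
docker stop alluxio-coordinator && docker rm alluxio-coordinator
# Read the mount-table persistence note below before removing ETCD
docker stop etcd && docker rm etcd
```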

Mount table persistence: ETCD stores the mount table in its container filesystem. Running docker rm on the ETCD container will lose all mount points. To persist them across restarts, add -v /data/etcd:/etcd-data --data-dir /etcd-data to the ETCD docker run command.

Monitoring (Optional)

Set up Prometheus and Grafana on the coordinator host using Docker Compose.

Create ~/monitoring/prometheus/prometheus.yml:
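A minimal scrape configuration; the metrics ports are placeholders, since the ports your Alluxio processes expose depend on your configuration:

```yaml
global:
  scrape_interval: 15s
scrape_configs:
  - job_name: alluxio
    static_configs:
      - targets:
          - "<COORDINATOR_PRIVATE_IP>:<COORDINATOR_METRICS_PORT>"
          - "<WORKER_PRIVATE_IP>:<WORKER_METRICS_PORT>"
```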

Create ~/monitoring/grafana/datasource.yml:
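A standard Grafana datasource provisioning file pointing at the local Prometheus:

```yaml
apiVersion: 1
datasources:
  - name: Prometheus
    type: prometheus
    access: proxy
    url: http://localhost:9090
    isDefault: true
```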

Create ~/monitoring/compose.yaml:
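A sketch of the compose file; host networking matches the rest of this guide, and the admin password matches the login noted below:

```yaml
services:
  prometheus:
    image: prom/prometheus:latest
    network_mode: host
    volumes:
      - ./prometheus/prometheus.yml:/etc/prometheus/prometheus.yml:ro
  grafana:
    image: grafana/grafana:latest
    network_mode: host
    environment:
      - GF_SECURITY_ADMIN_PASSWORD=grafana
    volumes:
      - ./grafana/datasource.yml:/etc/grafana/provisioning/datasources/datasource.yml:ro
```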

Access Grafana at http://<COORDINATOR_PUBLIC_IP>:3000 (login: admin / grafana).

Import the Alluxio dashboard JSON from the monitoring documentation.

Appendix

A. Troubleshooting

Workers not appearing in alluxio info nodes

  1. Verify ETCD is reachable from the worker host:

    Expected: {"health":"true"}

  2. Check worker logs:

  3. Confirm alluxio.coordinator.hostname is set to an IP reachable from the worker. If unreachable, registration silently fails.
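The checks above, as commands (container names follow the install steps):

```shell
# 1. ETCD health from the worker host; expected: {"health":"true"}
curl http://<ETCD_PRIVATE_IP>:2379/health

# 2. Recent worker errors
docker logs alluxio-worker 2>&1 | grep -i error | tail -20
```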

FUSE mount not visible after container starts

  1. Check that the mount directory exists:

    If missing, the container starts successfully but the mount is silently skipped.

  2. Check container logs:

    Mount point '/mnt/alluxio/fuse' does not exist confirms the directory was missing.

  3. Fix the directory, then recreate the container:
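The three steps above, as commands:

```shell
# 1. The mount directory must exist on the host
ls -ld /mnt/alluxio/fuse

# 2. Look for "Mount point '/mnt/alluxio/fuse' does not exist" in the logs
docker logs alluxio-fuse 2>&1 | tail -50

# 3. Recreate the directory, remove the container, then re-run step 6
sudo mkdir -p /mnt/alluxio/fuse
docker rm -f alluxio-fuse
```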

Transport endpoint is not connected after FUSE container is removed

The FUSE filesystem stays registered with the kernel after the container exits. Unmount manually:
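```shell
sudo umount /mnt/alluxio/fuse
# If the mount point is busy, fall back to a lazy unmount
sudo umount -l /mnt/alluxio/fuse
```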

alluxio mount add fails with unknown command

Use named flags — the old positional syntax is no longer supported:
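An illustration of the difference; the exact flag names are assumptions, so confirm with alluxio mount add --help:

```shell
# Old positional form (no longer supported):
#   alluxio mount add s3://<S3_BUCKET>/ /s3
# Named-flag form:
alluxio mount add --path /s3 --ufs-uri s3://<S3_BUCKET>/
```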

B. Worker Identity After Reboot

If a worker host reboots, the container loses its identity file and registers as a new worker. The old entry stays in ETCD as OFFLINE. Remove it:

Worker IDs are shown in alluxio info nodes.

C. Collecting Logs for Support
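A generic sketch for bundling container logs into a tarball for a support ticket; the container names follow this guide and are assumptions about your deployment:

```shell
mkdir -p alluxio-logs
for c in alluxio-coordinator alluxio-worker alluxio-fuse etcd; do
  docker logs "$c" > "alluxio-logs/$c.log" 2>&1 || true
done
tar czf alluxio-logs.tar.gz alluxio-logs
```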
