# Cache Loading

Alluxio populates its cache in two ways: **passively** on first read (automatic, no setup) and **actively** via the `job load` command (explicit preload before your job runs).

## Prerequisites

* A running Alluxio cluster with at least one worker
* At least one UFS mount configured (`alluxio mount list` to verify)

{% hint style="info" %}
Alluxio will automatically evict cached data to make room for new data according to the configured eviction policy. You do not need to pre-clear space before submitting a load job.
{% endhint %}

## Passive Caching

On every cache miss, Alluxio fetches the file from UFS and writes it into the worker cache while streaming it to the application. No configuration is needed; subsequent reads of the same data are served from cache.

This is the default behavior. Use active preloading when you cannot afford the first-read latency.
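Any read that misses the cache populates it, so one way to see passive caching at work is to read a directory once through an Alluxio client. The sketch below assumes a FUSE mount. The `/mnt/alluxio/fuse` path used later on this page is only an example, and `warm_by_reading` is a hypothetical helper name. It reads files one after another from a single client, which is why the distributed `job load` command described below is usually the better tool for large datasets.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read every regular file under a directory once and
# discard the bytes. Through an Alluxio FUSE mount, each read that misses the
# cache is fetched from UFS and cached as a side effect.
warm_by_reading() {
  # `-exec ... {} +` passes many paths to each cat invocation; find exits
  # non-zero if any cat call fails or the directory cannot be traversed.
  find "$1" -type f -exec cat -- {} + > /dev/null
}
```

Example: `warm_by_reading /mnt/alluxio/fuse/dataset/`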

## Active Preloading with `job load`

`job load` submits a distributed load job: the coordinator splits the work across all workers, and each worker pulls its assigned files directly from UFS.

### Submit and Monitor

`--path` accepts either a UFS path (e.g. `s3://my-bucket/dataset/`) or an Alluxio virtual path (e.g. `/mnt/dataset/`). See the [CLI reference](https://documentation.alluxio.io/ee-ai-en/reference/user-cli#job-load) for details.

{% tabs %}
{% tab title="Kubernetes (Operator)" %}

```shell
# Submit (returns immediately)
kubectl exec -n <NAMESPACE> alluxio-cluster-coordinator-0 -- \
  alluxio job load --path <ufs-or-alluxio-path> --submit

# Monitor progress
kubectl exec -n <NAMESPACE> alluxio-cluster-coordinator-0 -- \
  alluxio job load --path <ufs-or-alluxio-path> --progress
```

{% endtab %}

{% tab title="Docker / Bare-Metal" %}

```shell
# Submit (returns immediately)
bin/alluxio job load --path <ufs-or-alluxio-path> --submit

# Monitor progress
bin/alluxio job load --path <ufs-or-alluxio-path> --progress
```

{% endtab %}
{% endtabs %}

Example progress output:

```console
Progress for loading path 's3://my-bucket/dataset/':
        Settings:       bandwidth: unlimited    verify: false
        Job State: SUCCEEDED
        Files Processed: 1000
        Bytes Loaded: 125.00MiB
        Throughput: 2509.80KiB/s
        Block load failure rate: 0.00%
        Files Failed: 0
```
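When scripting around `job load`, it can help to extract individual fields from the progress report. The following is a minimal sketch that assumes the `Label: value` layout shown in the example output above. That layout is not documented as a stable interface and may change between releases. `progress_field` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the value of one field (e.g. "Job State") from
# `job load --progress` output read on stdin. Prints nothing if the field is absent.
progress_field() {
  awk -v label="$1" '
    {
      line = $0
      sub(/^[ \t]+/, "", line)            # drop the report indentation
      prefix = label ":"
      if (index(line, prefix) == 1) {     # line starts with "<label>:"
        value = substr(line, length(prefix) + 1)
        sub(/^[ \t]+/, "", value)         # trim whitespace around the value
        sub(/[ \t]+$/, "", value)
        print value
        exit
      }
    }'
}
```

Example: `bin/alluxio job load --path s3://my-bucket/dataset/ --progress | progress_field 'Job State'`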

### Stop a Running Job

{% tabs %}
{% tab title="Kubernetes (Operator)" %}

```shell
kubectl exec -n <NAMESPACE> alluxio-cluster-coordinator-0 -- \
  alluxio job load --path <ufs-or-alluxio-path> --stop
```

{% endtab %}

{% tab title="Docker / Bare-Metal" %}

```shell
bin/alluxio job load --path <ufs-or-alluxio-path> --stop
```

{% endtab %}
{% endtabs %}

### Key Flags

| Flag                      | Description                                                                                  |
| ------------------------- | -------------------------------------------------------------------------------------------- |
| `--submit`                | Submit the job asynchronously (returns immediately)                                          |
| `--progress`              | Show progress of a submitted job                                                             |
| `--stop`                  | Stop a running job                                                                           |
| `--verify`                | After load completes, verify all files are cached and reload any that are missing            |
| `--replicas <n>`          | Load `n` replicas per file (default: 1); useful for high-concurrency reads                   |
| `--skip-if-exists`        | Skip files that are already fully cached (safe to re-run a load job)                         |
| `--metadata-only`         | Load file metadata without caching file data                                                 |
| `--batch-size <n>`        | Number of files per batch per worker; tune for large directories                             |
| `--partial-listing`       | Start loading before the full directory listing completes; useful for very large directories |
| `--index-file <ufs-path>` | Load a specific list of files defined in a UFS index file (one path per line)                |

For the full flag reference, see [`job load` CLI documentation](https://documentation.alluxio.io/ee-ai-en/reference/user-cli#job-load).

### Loading from an Index File

Use an index file for selective loading, or when the directory tree is too large to traverse up front:

{% tabs %}
{% tab title="Kubernetes (Operator)" %}

```shell
kubectl exec -n <NAMESPACE> alluxio-cluster-coordinator-0 -- \
  alluxio job load --index-file s3://my-bucket/load-manifest.txt --submit
```

{% endtab %}

{% tab title="Docker / Bare-Metal" %}

```shell
bin/alluxio job load --index-file s3://my-bucket/load-manifest.txt --submit
```

{% endtab %}
{% endtabs %}

Index file format: one UFS path per line. Lines starting with `#` are treated as comments and skipped:

```text
s3://my-bucket/dataset/train/
s3://my-bucket/dataset/val/file.parquet
# s3://my-bucket/dataset/test/   <- skipped
```

Directories must end with `/` to be loaded recursively.
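Before uploading an index file, a quick local preview can catch entries that will not be handled the way you expect, such as a directory written without its trailing `/`, which would show up as a `file` entry. This sketch applies only the rules stated above. Skipping blank lines is an assumption about how Alluxio treats them, and `preview_manifest` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list the entries of a local index file that would be
# loaded, labeling each as a recursive directory (ends in '/') or a single file.
preview_manifest() {
  local manifest="$1" line
  # `|| [ -n "$line" ]` also handles a last line without a trailing newline.
  while IFS= read -r line || [ -n "$line" ]; do
    case "$line" in
      '#'*|'') continue ;;   # comment lines, plus blank lines (assumption)
    esac
    if [ "${line%/}" != "$line" ]; then
      printf 'dir   %s\n' "$line"
    else
      printf 'file  %s\n' "$line"
    fi
  done < "$manifest"
}
```

Example: `preview_manifest load-manifest.txt` on a local copy of the file before uploading it to `s3://my-bucket/`.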

## Integrating with ML Training

A typical workflow: load data → verify → run training.

{% tabs %}
{% tab title="Kubernetes (Operator)" %}

```shell
# 1. Submit load
kubectl exec -n <NAMESPACE> alluxio-cluster-coordinator-0 -- \
  alluxio job load --path s3://my-bucket/dataset/ --submit --verify

# 2. Poll until SUCCEEDED
kubectl exec -n <NAMESPACE> alluxio-cluster-coordinator-0 -- \
  alluxio job load --path s3://my-bucket/dataset/ --progress
# Repeat until "Job State: SUCCEEDED", then launch training pods
```

{% endtab %}

{% tab title="Docker / Bare-Metal" %}

```shell
# 1. Submit load
bin/alluxio job load --path s3://my-bucket/dataset/ --submit --verify

# 2. Poll until SUCCEEDED
bin/alluxio job load --path s3://my-bucket/dataset/ --progress
# Repeat until "Job State: SUCCEEDED"

# 3. Start training
python train.py --data /mnt/alluxio/fuse/dataset/
```

{% endtab %}
{% endtabs %}
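The "repeat until SUCCEEDED" step can be automated with a small polling loop. This is a sketch under a few assumptions: the progress report contains a `Job State:` line as in the example output, `SUCCEEDED` and `FAILED` are the only final states the loop needs to recognize, and any other state means the job is still in progress. The attempt limit keeps the loop from spinning forever if the job ends up in a state it does not recognize. `wait_for_load` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: poll a `job load --progress` command until a final state.
# Usage: wait_for_load <interval-seconds> <max-attempts> <command...>
# Exit codes: 0 = SUCCEEDED, 1 = FAILED, 2 = the progress command itself failed,
#             3 = max attempts reached without SUCCEEDED or FAILED.
wait_for_load() {
  local interval="$1" max_attempts="$2" attempt=0 output state
  shift 2
  while [ "$attempt" -lt "$max_attempts" ]; do
    attempt=$((attempt + 1))
    output="$("$@")" || return 2
    # Take the value after "Job State:" from the progress report.
    state="$(printf '%s\n' "$output" | sed -n 's/^[[:space:]]*Job State:[[:space:]]*//p' | head -n 1)"
    case "$state" in
      SUCCEEDED) echo "load succeeded"; return 0 ;;
      FAILED)    echo "load failed" >&2; return 1 ;;
      # Any other value is treated as still in progress (assumption).
      *)         echo "attempt $attempt: job state '${state:-unknown}', retrying in ${interval}s"
                 sleep "$interval" ;;
    esac
  done
  return 3
}
```

For example, on bare metal: `wait_for_load 30 240 bin/alluxio job load --path s3://my-bucket/dataset/ --progress && python train.py --data /mnt/alluxio/fuse/dataset/`. On Kubernetes, pass the full `kubectl exec ... -- alluxio job load ... --progress` command instead. The 30-second interval and 240-attempt limit (about two hours) are arbitrary example values.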

## Failure Modes

**`Job State: FAILED` with `Files Failed > 0`**

Check the file-level failure list:

{% tabs %}
{% tab title="Kubernetes (Operator)" %}

```shell
kubectl exec -n <NAMESPACE> alluxio-cluster-coordinator-0 -- \
  alluxio job load --path <path> --progress --file-status FAILURE
```

{% endtab %}

{% tab title="Docker / Bare-Metal" %}

```shell
bin/alluxio job load --path <path> --progress --file-status FAILURE
```

{% endtab %}
{% endtabs %}

Common causes: UFS access errors, network timeouts, or missing credentials. Fix the underlying issue, then resubmit with `--skip-if-exists` to avoid re-loading already-cached files.
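The check-then-resubmit step can also be scripted. The sketch below is a hypothetical helper, `needs_resubmit`, that reads a progress report on stdin and succeeds only when the job reports `FAILED` with a nonzero `Files Failed` count. It assumes the field layout of the example output. It does not fix the underlying UFS, network, or credential problem, so run it only after you have addressed the cause.

```shell
#!/usr/bin/env bash
# Hypothetical helper: exit 0 when the progress report on stdin shows
# "Job State: FAILED" with "Files Failed" > 0, exit 1 otherwise.
needs_resubmit() {
  awk '
    /^[ \t]*Job State:/    { state = $NF }        # last word is the state
    /^[ \t]*Files Failed:/ { failed = $NF + 0 }   # coerce the count to a number
    END { exit !(state == "FAILED" && failed > 0) }'
}
```

Example: `bin/alluxio job load --path <path> --progress | needs_resubmit && bin/alluxio job load --path <path> --submit --skip-if-exists`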

**`Job State: FAILED` immediately after submit**

Run `--progress --verbose` for details:

{% tabs %}
{% tab title="Kubernetes (Operator)" %}

```shell
kubectl exec -n <NAMESPACE> alluxio-cluster-coordinator-0 -- \
  alluxio job load --path <path> --progress --verbose
```

{% endtab %}

{% tab title="Docker / Bare-Metal" %}

```shell
bin/alluxio job load --path <path> --progress --verbose
```

{% endtab %}
{% endtabs %}

Common causes: the path is not found in the mount table (verify with `alluxio mount list`), or the cache quota is insufficient.

**Load succeeds but reads still go to UFS**

Verify that specific files are actually cached:

{% tabs %}
{% tab title="Kubernetes (Operator)" %}

```shell
kubectl exec -n <NAMESPACE> alluxio-cluster-coordinator-0 -- \
  alluxio fs check-cached <path>
```

{% endtab %}

{% tab title="Docker / Bare-Metal" %}

```shell
bin/alluxio fs check-cached <path>
```

{% endtab %}
{% endtabs %}

If files show as uncached after a successful load, data may have been evicted. Check cache capacity and eviction settings — see [Cache Eviction](https://documentation.alluxio.io/ee-ai-en/cache/removing-data-from-the-cache). For cluster-wide cache hit rate, see [Monitoring](https://documentation.alluxio.io/ee-ai-en/administration/monitoring-alluxio).

## Retention of Historical Jobs

Completed job records are kept for a configurable period, 7 days by default. To change the retention period:

```properties
# retain completed job records for 3 days (default: 7d)
alluxio.job.retention.time=3d
```

## Related

* [Cache Eviction](https://documentation.alluxio.io/ee-ai-en/cache/removing-data-from-the-cache) — control what gets removed when cache is full
* [Multiple Replicas](https://documentation.alluxio.io/ee-ai-en/high-availability/multiple-replicas) — load multiple copies per file for fault tolerance
* [`job load` CLI Reference](https://documentation.alluxio.io/ee-ai-en/reference/user-cli#job-load)
