# Cache Eviction

Data leaves the Alluxio cache in three ways:

1. **Automatic eviction** — workers evict data when the cache fills up, in the order determined by the configured policy (LRU by default)
2. **TTL expiry** — background scan removes data whose lifetime has elapsed, regardless of access or priority
3. **Manual eviction** — `job free` explicitly purges a path on demand

## Automatic Eviction

When a worker needs space for new data, it runs an evictor to select which cached pages to remove. Three policies are available:

| Policy          | Evicts                                 |
| --------------- | -------------------------------------- |
| `LRU` (default) | Data not accessed for the longest time |
| `LFU`           | Data accessed the fewest times overall |
| `FIFO`          | Data written earliest                  |

To change the eviction policy, set the following in `alluxio-site.properties` on all workers:

```properties
# Use LFU instead of the default LRU
alluxio.worker.page.store.evictor.type=LFU
```

### Asynchronous Eviction

By default, eviction runs synchronously during writes, which can add latency. Asynchronous eviction runs in the background to keep headroom available before the cache fills up:

```properties
alluxio.worker.page.store.async.eviction.enabled=true
# start evicting when cache usage exceeds this threshold (default: 0.9)
alluxio.worker.page.store.async.eviction.high.watermark=0.85
# stop evicting when cache usage drops below this threshold (default: 0.8)
alluxio.worker.page.store.async.eviction.low.watermark=0.75
# how often to check cache usage (default: 1min)
alluxio.worker.page.store.async.eviction.check.interval=30s
```

{% hint style="info" %}
TTL-based eviction and Cache Priority also affect what gets evicted and when. See [Cache Policies](https://documentation.alluxio.io/ee-ai-en/cache/managing-data-in-the-cache) for details.
{% endhint %}

## Manual Eviction: The `free` Job

Use `job free` to explicitly purge cached data for a path — without touching the underlying UFS data. Common scenarios:

* **Model version update**: free the old version before (or after) loading the new one
* **Post-job cleanup**: release space after a batch job completes
* **Force re-cache**: free then reload to pick up UFS changes for an immutable-policy path

### Submit and Monitor

{% tabs %}
{% tab title="Kubernetes (Operator)" %}

```shell
# Submit (returns immediately)
kubectl exec -n <NAMESPACE> alluxio-cluster-coordinator-0 -- \
  alluxio job free --path <ufs-or-alluxio-path> --submit

# Monitor progress
kubectl exec -n <NAMESPACE> alluxio-cluster-coordinator-0 -- \
  alluxio job free --path <ufs-or-alluxio-path> --progress
```

{% endtab %}

{% tab title="Docker / Bare-Metal" %}

```shell
# Submit (returns immediately)
bin/alluxio job free --path <ufs-or-alluxio-path> --submit

# Monitor progress
bin/alluxio job free --path <ufs-or-alluxio-path> --progress
```

{% endtab %}
{% endtabs %}

Example progress output:

```console
Progress for Free path file '<path>':
    Job Id: b21ce9fb-f332-4d39-8bb4-554f9a4fa601
    Job Submitted: Fri Feb 02 21:28:56 CST 2024
    Job path: <path>
    Job State: SUCCEEDED, finished at Fri Feb 02 21:29:01 CST 2024
    Free Info:  totalFile:4 totalByte:3072.00KB
    Free Files Failed: 0
    Free Bytes Failed: 0B
    Free Files Succeeded: 4
    Free Bytes Succeeded: 3072.00KB
```
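When scripting around `--progress`, the job state can be scraped from this output. A minimal sketch, assuming the output format shown above stays stable:

```shell
# Extract the state ("SUCCEEDED", "FAILED", ...) from a progress report.
# The sample line is copied from the example output above.
PROGRESS='    Job State: SUCCEEDED, finished at Fri Feb 02 21:29:01 CST 2024'
STATE=$(printf '%s\n' "$PROGRESS" | sed -n 's/.*Job State: \([A-Z]*\).*/\1/p')
echo "$STATE"   # SUCCEEDED
```

Against a live cluster, the same filter can gate a polling loop: pipe `alluxio job free --path <ufs-or-alluxio-path> --progress` through the `sed` expression every few seconds until it prints `SUCCEEDED` or `FAILED`.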

### Stop a Running Free Job

{% tabs %}
{% tab title="Kubernetes (Operator)" %}

```shell
kubectl exec -n <NAMESPACE> alluxio-cluster-coordinator-0 -- \
  alluxio job free --path <ufs-or-alluxio-path> --stop
```

{% endtab %}

{% tab title="Docker / Bare-Metal" %}

```shell
bin/alluxio job free --path <ufs-or-alluxio-path> --stop
```

{% endtab %}
{% endtabs %}

Stopping a job mid-run leaves the path partially freed: data already evicted stays evicted, and the remainder stays cached. To resume, submit the job again with `--submit`.

### Version Update Pattern

To replace a pinned dataset with a newer version:

{% tabs %}
{% tab title="Kubernetes (Operator)" %}

```shell
# 1. Remove priority pin from old version
kubectl exec -n <NAMESPACE> alluxio-cluster-coordinator-0 -- \
  alluxio priority remove --path <old-version-path>

# 2. Free old version from cache
kubectl exec -n <NAMESPACE> alluxio-cluster-coordinator-0 -- \
  alluxio job free --path <old-version-path> --submit

# 3. Load and pin new version
kubectl exec -n <NAMESPACE> alluxio-cluster-coordinator-0 -- \
  alluxio job load --path <new-version-path> --submit --verify
kubectl exec -n <NAMESPACE> alluxio-cluster-coordinator-0 -- \
  alluxio priority add --path <new-version-path> --priority high
```

{% endtab %}

{% tab title="Docker / Bare-Metal" %}

```shell
# 1. Remove priority pin from old version
bin/alluxio priority remove --path <old-version-path>

# 2. Free old version from cache
bin/alluxio job free --path <old-version-path> --submit

# 3. Load and pin new version
bin/alluxio job load --path <new-version-path> --submit --verify
bin/alluxio priority add --path <new-version-path> --priority high
```

{% endtab %}
{% endtabs %}

For a complete list of `job free` flags, see the [`job free` CLI reference](https://documentation.alluxio.io/ee-ai-en/reference/user-cli#job-free).

You can also trigger and manage free jobs via the [REST API](https://documentation.alluxio.io/ee-ai-en/reference/rest-api#free-cache).

## Stale Cache Cleaning

Cluster topology changes can leave data cached on workers that no longer "own" that data according to the consistent hash ring. This stale data consumes space but is never served to clients.

**When this happens:**

* Workers are added or removed (ownership redistributes)
* A file's replication factor is reduced
* A worker goes offline temporarily and its data migrates, then it rejoins
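The mechanism behind all three cases is the same: a page's owner is derived from a hash of its identity and the current worker set, so changing the worker set changes who owns what. The toy sketch below uses naive hash-modulo placement, which is *not* Alluxio's consistent hash ring (a real ring reshuffles far less data on membership changes); it only illustrates why topology changes strand cached data:

```shell
# Toy illustration only: place a page on a worker by hash modulo worker count.
# NOT Alluxio's actual consistent hash ring; it just shows that the "owner"
# of the same page can change when the number of workers changes.
owner() {
  # owner <page-id> <num-workers> -> index of the worker that "owns" the page
  local h
  h=$(printf '%s' "$1" | cksum | awk '{print $1}')
  echo $(( h % $2 ))
}

owner page-42 3   # owner among 3 workers
owner page-42 4   # may print a different index once a 4th worker joins
```

If the two calls print different indices, the copy cached on the old owner is exactly the kind of stale data this section is about.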

### Trigger Stale Cleaning

```shell
curl -X POST <coordinator-host>:<coordinator-api-port>/api/v1/cache \
  -d '{"op":"clear-stale"}' \
  -H "Content-Type: application/json"
```

This submits an async job to each worker. Workers scan local storage, verify ownership against the current hash ring, and delete any data they no longer own. Monitor progress via the `alluxio_cleared_stale_cached_data` Prometheus metric or worker logs.
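The Prometheus counter can be aggregated across workers when checking overall progress. A sketch of summing it from scraped exposition-format text (the metric name comes from the paragraph above; the worker labels and byte values below are fabricated samples):

```shell
# Sum alluxio_cleared_stale_cached_data across scraped worker samples.
# The two sample lines are made up for illustration.
METRICS='alluxio_cleared_stale_cached_data{worker="w1"} 1048576
alluxio_cleared_stale_cached_data{worker="w2"} 2097152'
printf '%s\n' "$METRICS" |
  awk '/^alluxio_cleared_stale_cached_data/ { sum += $NF } END { print sum }'
# prints 3145728
```

In practice the input would come from your Prometheus server (or each worker's metrics endpoint) rather than a shell variable.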

For more details, see the [REST API reference](https://documentation.alluxio.io/ee-ai-en/reference/rest-api).

{% hint style="warning" %}
Mass file deletion from stale cleaning can create metadata I/O pressure that delays Worker Pod termination. Run during a maintenance window on large clusters.
{% endhint %}
