# Stale Cache Cleaning

In Alluxio, clients use consistent hashing to determine which worker serves reads and writes for a given file, so each file is normally cached only on its designated worker. Under certain conditions, however, a worker can end up holding cached data that no longer belongs to it. To reclaim that space and keep cache utilization high, Alluxio provides a mechanism to **clear stale cached data** from workers.

This operation triggers each worker to scan its local cache, verify whether it still owns the cached data, and delete any data that no longer belongs to it.
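
The ownership check can be pictured with a toy hash ring. The sketch below is illustrative only: it uses `cksum` as the hash function and hypothetical helper names (`ring_hash`, `owner_of`, `is_stale`), and it is not Alluxio's actual hashing scheme or worker code.

```shell
# Toy hash ring for illustration; NOT Alluxio's real hashing scheme.
# ring_hash prints the CRC of its argument, used here as a ring position.
ring_hash() {
  printf '%s' "$1" | cksum | cut -d ' ' -f 1
}

# owner_of <file-path> <worker>...
# Prints the first worker clockwise from the path's ring position,
# wrapping around to the worker with the smallest position.
owner_of() {
  path_pos=$(ring_hash "$1")
  shift
  for w in "$@"; do
    printf '%s %s\n' "$(ring_hash "$w")" "$w"
  done | sort -n | awk -v pos="$path_pos" '
    NR == 1 { first = $2 }                      # remember the wrap-around target
    $1 >= pos && !found { print $2; found = 1 } # first worker at or after the path
    END { if (!found) print first }
  '
}

# is_stale <this-worker> <file-path> <worker>...
# Succeeds when the current ring assigns the path to a different worker.
is_stale() {
  self=$1
  path=$2
  shift 2
  [ "$(owner_of "$path" "$@")" != "$self" ]
}
```

For example, `is_stale worker1 /data/a worker1 worker2 worker3` succeeds only if the three-worker ring maps `/data/a` to a worker other than `worker1`. Re-running the check with a different worker list mimics a membership change.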

## When Stale Cache Occurs

Stale data may exist on a worker due to the following situations:

1. **Replica Reduction**\
   If the replication factor of a file is reduced (e.g., from 3 to 2), the third worker still holds a redundant replica that is no longer needed.
2. **Dynamic Hash Ring Membership Change**\
   With a dynamic hash ring, a worker that goes offline temporarily has its responsibilities taken over by other workers. If the original worker later rejoins, the workers that served in its place may hold stale data.
3. **Cluster Expansion**\
   Adding new workers can change the ownership of cached files. Data previously cached on old workers may now be the responsibility of newly added workers.

To clean up such stale data, trigger the clear stale cache operation manually.

## Differences between Clearing Stale Cache and Free Job

Clearing Stale Cache and Free Job are two mechanisms for cache cleanup in Alluxio, and they are often confused due to their similar purposes. The table below outlines their differences:

| Aspect               | Clearing Stale Cache                                      | Free Job                                                          |
| -------------------- | --------------------------------------------------------- | ----------------------------------------------------------------- |
| Primary Use Case     | Cleaning up stale or misplaced data after cluster changes | Releasing cache for data that is no longer needed by applications |
| Type of Cache Freed  | Incorrect/invalid cache                                   | Valid cache that is no longer needed                              |
| Target of Cleanup    | Removes cached files that shouldn't reside on a worker    | Removes files from workers based on the input specification       |
| Interface            | REST API                                                  | REST API & CLI                                                    |
| Input Parameters     | None                                                      | Requires a directory path or an index file as input               |
| Scheduling Mechanism | Immediately executes on all workers                       | Relies on the job system for scheduling                           |

## Usage

This feature is currently accessible **via REST API** only. Please refer to the [API reference page](/ee-ai-en/ai-3.6/reference/rest-api.md#Clear-Stale-Cache) for more details.

### Submit Task

The following API triggers the `clear stale cache` task to be asynchronously executed across all workers:

```shell
curl -X POST <coordinator-host>:<coordinator-api-port>/api/v1/cache -d '{"op":"clear-stale"}' -H "Content-Type: application/json"
```

This command submits a background job to all workers. Submitting the same request again while an earlier job is still running does not start a duplicate execution.

**Example Response** (submission failed on two workers):

```json
{
  "errors": {
    "worker1-host": "Connection refused",
    "worker2-host": "Timeout"
  }
}
```

> An empty `errors` object indicates that the job was submitted to all workers successfully. Otherwise, the `errors` field maps the hostname of each worker where submission failed to the corresponding error message. Submission to a worker fails if the network connection fails or if a job submitted earlier has not finished running.
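
A minimal sketch of scripting the submission and the success check, assuming the response has the shape shown above. The `COORDINATOR` variable and the function names are placeholders introduced for this example, not part of the Alluxio API.

```shell
# COORDINATOR is a placeholder for <coordinator-host>:<coordinator-api-port>.
submit_clear_stale() {
  curl -sS -X POST "$COORDINATOR/api/v1/cache" \
    -H "Content-Type: application/json" \
    -d '{"op":"clear-stale"}'
}

# submission_ok <response-json>
# Succeeds when the response carries an empty `errors` object.
submission_ok() {
  # Strip whitespace so pretty-printed JSON is handled too.
  printf '%s' "$1" | tr -d '[:space:]' | grep -qF '"errors":{}'
}

# Example usage (requires a reachable coordinator):
#   response=$(submit_clear_stale)
#   submission_ok "$response" || printf 'submission errors: %s\n' "$response" >&2
```

The check is a plain string match rather than a JSON parse, which keeps the sketch free of extra tools. A JSON processor would be more robust if the response format ever gains additional fields.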

### Stop Task

To cancel the task (if needed), send a DELETE request with the same `op`:

```shell
curl -X DELETE <coordinator-host>:<coordinator-api-port>/api/v1/cache -d '{"op":"clear-stale"}' -H "Content-Type: application/json"
```

This request will stop the background task on all workers. If no such task is running, the command will still succeed without error.

## Monitoring Task Progress

There is currently **no RPC to track the progress** of the clear stale cache job. However, you can monitor its progress in the following ways:

### Via Logs

When the task completes on a worker, a line like the following appears in that worker's log:

```
2025-04-21T19:51:22,889 INFO  AsyncJobWorker - Clear stale cached files finished. 104857600 bytes released
```

This log message indicates that the job has completed on that worker and reports the number of bytes of stale data removed (104857600 bytes, or 100 MiB, in this example).
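
As a sketch, the released byte count can be extracted from worker log lines with `sed`, assuming the message format shown above. The function name is hypothetical, and the log file location depends on your deployment, so the function simply reads from standard input.

```shell
# bytes_released: reads worker log lines on stdin and prints the byte count
# from each "Clear stale cached files finished" message, one per line.
bytes_released() {
  sed -n 's/.*Clear stale cached files finished\. \([0-9][0-9]*\) bytes released.*/\1/p'
}

# Example usage (the log path is an assumption):
#   bytes_released < /path/to/worker.log
```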

### Via Prometheus Metrics

Alluxio exposes a metric to track stale cache clearance:

```
alluxio_cleared_stale_cached_data
```

This metric accumulates the total number of bytes cleared by the clear stale cache operation on a worker. Once the job has completed on every worker, the sum of this metric across all workers stops increasing. You can use this metric to monitor and alert on cache cleanup trends across your cluster.
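
The plateau can be checked by summing the metric across scraped worker output. The sketch below assumes the metric is exposed in the Prometheus text format under exactly the name above and that samples carry no trailing timestamp. The function name is hypothetical, and how you fetch each worker's metrics depends on your deployment.

```shell
# sum_cleared_bytes: reads Prometheus text-format metrics on stdin
# (for example, the concatenated scrapes of all workers) and prints the
# sum of every alluxio_cleared_stale_cached_data sample.
sum_cleared_bytes() {
  awk '
    # Match the bare metric name or the name followed by a label set;
    # comment lines start with "#" and are skipped by this pattern.
    $1 ~ /^alluxio_cleared_stale_cached_data([{]|$)/ { sum += $NF }
    END { printf "%.0f\n", sum }
  '
}
```

Running this at intervals and comparing consecutive totals gives a simple completion signal: when the total stops changing, the job has most likely finished on all workers.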
