Removing Data from the Cache
While Alluxio's automatic eviction policies (like LRU, TTL, and quotas) handle most cache cleanup, there are times when you need to manually or systematically remove data. This guide covers three primary mechanisms for removing data from the Alluxio cache:
Automatic Eviction: The standard process driven by cache capacity and policies.
Manual Eviction: Forcibly removing specific files or directories using a job.
Stale Cache Cleaning: A specialized cleanup for removing misplaced data after cluster changes.
Automatic Eviction
Automatic eviction is the most common way data is removed from the cache. It is triggered when a worker's cache storage reaches its capacity and needs to make room for new data. The eviction process is governed by policies you define.
Note: In addition to the capacity-based eviction described here, Alluxio offers advanced eviction strategies like Time-to-Live (TTL) and Priority-based eviction. These policies allow you to remove data based on its age or importance, giving you more granular control over the cache. You can learn more in the Managing Data in the Cache guide.
How It Works
When a worker needs to write a new page of data but its cache is full, it runs an evictor to select and remove existing pages. The primary eviction strategies are:
LRU (Least Recently Used): The default policy. It evicts the data that hasn't been accessed for the longest time.
LFU (Least Frequently Used): Evicts the data that has been accessed the fewest times.
FIFO (First-In, First-Out): Evicts the data that was the first to be written.
You can configure the eviction policy in alluxio-site.properties:
# Sets the eviction policy for the worker cache
alluxio.worker.page.store.evictor.type=LRU
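To make the default LRU behavior described above concrete, here is a minimal Python sketch of a page cache that evicts the least recently used page when it is full. It is only an illustration of the policy, not Alluxio's evictor implementation; the class name `LRUPageCache` and its methods are hypothetical.

```python
from collections import OrderedDict


class LRUPageCache:
    """Toy page cache (hypothetical, not Alluxio code) with LRU eviction."""

    def __init__(self, capacity_pages):
        if capacity_pages <= 0:
            raise ValueError("capacity_pages must be positive")
        self.capacity_pages = capacity_pages
        # Maps page_id -> data, ordered from least to most recently used.
        self.pages = OrderedDict()

    def get(self, page_id):
        """Return the cached page, or None on a miss."""
        if page_id not in self.pages:
            return None
        self.pages.move_to_end(page_id)  # mark as most recently used
        return self.pages[page_id]

    def put(self, page_id, data):
        """Store a page; return the evicted page id, or None if nothing was evicted."""
        evicted_id = None
        if page_id in self.pages:
            self.pages.move_to_end(page_id)
        elif len(self.pages) >= self.capacity_pages:
            # Cache is full: drop the page that was accessed longest ago.
            evicted_id, _ = self.pages.popitem(last=False)
        self.pages[page_id] = data
        return evicted_id


cache = LRUPageCache(capacity_pages=2)
cache.put("page-a", b"1")
cache.put("page-b", b"2")
cache.get("page-a")                   # page-a is now the most recently used
evicted = cache.put("page-c", b"3")   # evicts page-b, the least recently used
```

LFU and FIFO differ only in how the victim is chosen: LFU would track an access count per page, and FIFO would skip the reordering on access.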
Asynchronous Eviction
To avoid performance degradation caused by synchronous eviction during a write operation, Alluxio also performs background asynchronous eviction. This process periodically checks cache usage and proactively evicts data to maintain free space.
You can configure high and low "watermarks" to control this process:
# Enable background asynchronous eviction
alluxio.worker.page.store.async.eviction.enabled=true
# Check cache usage every minute
alluxio.worker.page.store.async.eviction.check.interval=1min
# Start evicting when usage hits 90%
alluxio.worker.page.store.async.eviction.high.watermark=0.9
# Stop evicting when usage falls to 80%
alluxio.worker.page.store.async.eviction.low.watermark=0.8
With this setup, free space is almost always available for new writes, which reduces the chance that a write has to wait for synchronous eviction.
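The watermark logic can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than Alluxio's actual code: the function name `bytes_to_evict` is hypothetical, and it assumes eviction starts once usage reaches the high watermark and continues until usage is back at the low watermark.

```python
def bytes_to_evict(used_bytes, capacity_bytes, high_watermark=0.9, low_watermark=0.8):
    """Return how many bytes one background pass would need to free.

    Hypothetical helper: returns 0 while usage is below the high watermark,
    otherwise the amount needed to bring usage down to the low watermark.
    """
    if capacity_bytes <= 0:
        raise ValueError("capacity_bytes must be positive")
    if not 0 <= low_watermark <= high_watermark <= 1:
        raise ValueError("watermarks must satisfy 0 <= low <= high <= 1")

    usage = used_bytes / capacity_bytes
    if usage < high_watermark:
        return 0  # below the high watermark: nothing to do on this check

    target_bytes = int(capacity_bytes * low_watermark)
    return max(0, used_bytes - target_bytes)


# With a 1000-byte cache and the watermarks from the configuration above:
to_free_when_full = bytes_to_evict(950, 1000)   # usage 95%, above the high watermark
to_free_when_idle = bytes_to_evict(850, 1000)   # usage 85%, below the high watermark
```

The gap between the two watermarks is what keeps the background task from evicting on every check: after one pass the cache has room to grow before the next pass is needed.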
Manual Eviction: The free Job
Sometimes you need to explicitly remove data from the cache, even if there is plenty of space available. For example, you might want to clear a dataset after a job is finished to make room for the next one.
You can trigger a manual eviction using the CLI or the REST API.
Using the CLI
The job free command allows you to manually trigger the eviction of all data within a specified UFS path or from a list of files. For a complete list of commands and flags, please refer to the CLI guide.
To free all cached data under the path s3://bucket/path:
$ bin/alluxio job free --path s3://bucket/path --submit
You can monitor the progress of the free job:
$ {ALLUXIO_HOME}/bin/alluxio job free --path s3://alluxio/path --progress
Progress for Free path file 's3://alluxio/path':
Job Id: b21ce9fb-f332-4d39-8bb4-554f9a4fa601
Job Submitted: Fri Feb 02 21:28:56 CST 2024
Job path: s3://alluxio/path
Job State: SUCCEEDED, finished at Fri Feb 02 21:29:01 CST 2024
Free Info : totalFile:4 totalByte:3072.00KB
Free Files Failed: 0
Free Bytes Failed: 0B
Free Files Succeeded: 4
Free Bytes Succeeded: 3072.00KB
You can stop a running free job. Note that stopping a job partway through leaves some files in the cache.
$ {ALLUXIO_HOME}/bin/alluxio job free --path s3://alluxio/path --stop
This operation is useful for administrative tasks and for giving application owners direct control over the cache lifecycle of their data.
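If you drive the CLI from a script, a small wrapper can help. The Python sketch below builds the argument list for the three flags shown above (`--submit`, `--progress`, `--stop`) and extracts the job state from progress output in the format of the sample. The helper names are hypothetical, and the parser assumes the "Job State:" line always looks like the one in the sample; only the SUCCEEDED state appears in this guide, so treat any other value as something to verify against your own output.

```python
import re


def free_job_command(path, action, alluxio_bin="bin/alluxio"):
    """Build the argv for a free job; action is 'submit', 'progress', or 'stop'."""
    if action not in ("submit", "progress", "stop"):
        raise ValueError(f"unsupported action: {action}")
    return [alluxio_bin, "job", "free", "--path", path, f"--{action}"]


def parse_job_state(progress_output):
    """Return the state from a 'Job State: ...' line, or None if it is absent."""
    match = re.search(r"^Job State:\s*([A-Z_]+)", progress_output, re.MULTILINE)
    return match.group(1) if match else None


sample_output = """Progress for Free path file 's3://alluxio/path':
Job Id: b21ce9fb-f332-4d39-8bb4-554f9a4fa601
Job State: SUCCEEDED, finished at Fri Feb 02 21:29:01 CST 2024
Free Files Succeeded: 4
"""

submit_argv = free_job_command("s3://bucket/path", "submit")
state = parse_job_state(sample_output)
```

The argument list can be passed to `subprocess.run(submit_argv, capture_output=True, text=True)` on a host where the Alluxio CLI is installed.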
Using the REST API
You can also trigger and manage free jobs programmatically via the REST API. This is useful for integrating manual eviction into automated workflows. For more details, please refer to the REST API guide.
Stale Cache Cleaning
Stale cache refers to data that is present on a worker but no longer "belongs" there according to the cluster's consistent hash ring. This can happen in several situations:
Cluster Resizing: When you add or remove workers, the ownership of files can change. Data previously owned by one worker may now belong to another.
Replica Reduction: If you reduce a file's replication factor (e.g., from 3 to 2), one worker will be left with a now-redundant copy.
Temporary Worker Unavailability: If a worker goes offline and its duties are temporarily taken over by others, those other workers may hold stale data after the original worker rejoins the cluster.
This stale data consumes valuable cache space without providing any benefit, as it will not be served to clients.
How It Works
The stale cache cleaning operation instructs each worker to scan its local storage, verify ownership of every file against the current hash ring, and delete any data that it no longer owns.
This is a specialized administrative task designed to maintain cache hygiene in a dynamic cluster environment.
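The ownership check can be illustrated with a simplified consistent hash ring in Python. This is a sketch under stated assumptions, not Alluxio's ring: the class `HashRing`, the MD5-based hash, the virtual-node count, and the single-owner model (no replicas) are all choices made for the example.

```python
import bisect
import hashlib


def _hash(key):
    """Map a string to a 64-bit position on the ring (MD5 is an arbitrary choice)."""
    return int.from_bytes(hashlib.md5(key.encode("utf-8")).digest()[:8], "big")


class HashRing:
    """Simplified consistent hash ring with a single owner per file."""

    def __init__(self, workers, virtual_nodes=100):
        if not workers:
            raise ValueError("at least one worker is required")
        # Each worker is placed on the ring at several virtual positions.
        self._points = sorted(
            (_hash(f"{worker}#{i}"), worker)
            for worker in workers
            for i in range(virtual_nodes)
        )
        self._positions = [position for position, _ in self._points]

    def owner(self, file_id):
        """Return the worker at the first ring position after the file's hash."""
        idx = bisect.bisect_right(self._positions, _hash(file_id)) % len(self._points)
        return self._points[idx][1]


def find_stale(worker, cached_files, ring):
    """Return the cached files that this worker no longer owns on the given ring."""
    return [f for f in cached_files if ring.owner(f) != worker]


files = [f"s3://bucket/path/file-{i}" for i in range(50)]
before = HashRing(["worker-1"])
after = HashRing(["worker-1", "worker-2"])          # cluster resized
stale_before = find_stale("worker-1", files, before)
stale_after = find_stale("worker-1", files, after)   # files that moved to worker-2
```

In this toy model, adding worker-2 moves part of the key space away from worker-1, so whatever worker-1 still holds for those files is stale by the definition above.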
Usage
This feature is currently available via the REST API. To trigger a stale cache cleaning job across all workers, please refer to the REST API guide for endpoint details and examples.
A sample request looks like this:
curl -X POST <coordinator-host>:<coordinator-api-port>/api/v1/cache -d '{"op":"clear-stale"}' -H "Content-Type: application/json"
This submits an asynchronous job to each worker. You can monitor the progress by checking the worker logs or by observing the alluxio_cleared_stale_cached_data Prometheus metric, which tracks the total bytes of stale data cleared on each worker.
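For automation, the sample curl request above can also be built with Python's standard library. This sketch mirrors the curl call (same path, body, and header); the function name is hypothetical, the host and port values are placeholders you must replace, and the `http://` scheme is an assumption based on curl's default when no scheme is given.

```python
import json
import urllib.request


def build_clear_stale_request(coordinator_host, coordinator_api_port):
    """Build (but do not send) the POST request that triggers stale cache cleaning."""
    # Assumption: plain HTTP, matching curl's default for a scheme-less URL.
    url = f"http://{coordinator_host}:{coordinator_api_port}/api/v1/cache"
    body = json.dumps({"op": "clear-stale"}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Placeholder host and port for illustration only.
request = build_clear_stale_request("coordinator.example", 8080)
# To send it against a real coordinator:
# with urllib.request.urlopen(request) as response:
#     print(response.status)
```

Building the request separately from sending it makes the call easy to inspect or unit test before pointing it at a live cluster.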