Removing Data from the Cache

While Alluxio's automatic eviction policies (like LRU, TTL, and quotas) handle most cache cleanup, there are times when you need to manually or systematically remove data. This guide covers three primary mechanisms for removing data from the Alluxio cache:

  1. Automatic Eviction: The standard process driven by cache capacity and policies.

  2. Manual Eviction: Forcibly removing specific files or directories using a job.

  3. Stale Cache Cleaning: A specialized cleanup for removing misplaced data after cluster changes.

Automatic Eviction

Automatic eviction is the most common way data is removed from the cache. It is triggered when a worker's cache storage reaches its capacity and needs to make room for new data. The eviction process is governed by policies you define.

Note: In addition to the capacity-based eviction described here, Alluxio offers advanced eviction strategies like Time-to-Live (TTL) and Cache Priority. These policies allow you to remove data based on its age or importance, giving you more granular control over the cache. You can learn more in the Managing Data in the Cache guide.

How It Works

When a worker needs to write a new page of data but its cache is full, it runs an evictor to select and remove existing pages. The primary eviction strategies are:

  • LRU (Least Recently Used): The default policy. It evicts the data that hasn't been accessed for the longest time.

  • LFU (Least Frequently Used): Evicts the data that has been accessed the fewest times.

  • FIFO (First-In, First-Out): Evicts the data that was the first to be written.

You can configure the eviction policy in alluxio-site.properties:

# Sets the eviction policy for the worker cache
alluxio.worker.page.store.evictor.type=LRU

Asynchronous Eviction

To avoid performance degradation caused by synchronous eviction during a write operation, Alluxio also performs background asynchronous eviction. This process periodically checks cache usage and proactively evicts data to maintain free space.

You can configure high and low "watermarks" to control this process:
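
For example (the property names below are illustrative assumptions modeled on Alluxio's tiered-store watermark settings; confirm the exact keys in the configuration reference for your version):

```properties
# Start background eviction once cache usage crosses the high watermark (95% here),
# and keep evicting until usage falls back below the low watermark (70% here).
# Property names are assumptions; verify against your version's configuration reference.
alluxio.worker.tieredstore.level0.watermark.high.ratio=0.95
alluxio.worker.tieredstore.level0.watermark.low.ratio=0.7
```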

This setup ensures that there is almost always space available for new writes, minimizing latency.

Manual Eviction: The free Job

Sometimes you need to explicitly remove data from the cache, even if there is plenty of space available. For example, you might want to clear a dataset after a job is finished to make room for the next one.

You can trigger a manual eviction using the CLI or the REST API.

Using the CLI

The job free command allows you to manually trigger the eviction of all data within a specified UFS path or from a list of files. For a complete list of commands and flags, please refer to the CLI guide.

To free all cached data under the path s3://bucket/path:
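
A sketch of the command, assuming the free job follows the same `--path`/`--submit` flag pattern as Alluxio's other distributed jobs (check the CLI guide for the exact syntax):

```shell
# Submit a job that evicts all cached data under the given UFS path
# (flag names are assumptions; verify against the CLI guide)
bin/alluxio job free --path s3://bucket/path --submit
```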

You can monitor the progress of the free job:
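
A sketch, assuming the free job exposes a `--progress` flag in the style of other distributed jobs (verify in the CLI guide):

```shell
# Report the status of the free job submitted for this path
# (flag name is an assumption; verify against the CLI guide)
bin/alluxio job free --path s3://bucket/path --progress
```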

You can also stop a running free job. Note that interrupting the job will leave any not-yet-freed files in the cache.
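
A sketch, assuming a `--stop` flag in the style of other distributed jobs (verify in the CLI guide):

```shell
# Interrupt the running free job; data not yet evicted stays cached
# (flag name is an assumption; verify against the CLI guide)
bin/alluxio job free --path s3://bucket/path --stop
```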

This operation is useful for administrative tasks and for giving application owners direct control over the cache lifecycle of their data.

Using the REST API

You can also trigger and manage free jobs programmatically via the REST API. This is useful for integrating manual eviction into automated workflows. For more details, please refer to the REST API guide.

Stale Cache Cleaning

Changes in the cluster topology or data lifecycle can sometimes leave "stale" data on workers. This section describes how to clean up these stale entries to reclaim space.

Cleaning Stale Data

Stale data refers to file content cached on a worker that no longer "belongs" to that worker according to the cluster's consistent hash ring. This can happen in several situations:

  • Cluster Resizing: When you add or remove workers, the ownership of files changes. Data previously owned by one worker may now belong to another.

  • Replica Reduction: If you reduce a file's replication factor (e.g., from 3 to 2), a worker might retain a copy that is no longer needed.

  • Temporary Worker Unavailability: If a worker goes offline and its workload is temporarily reassigned to other workers, those workers may retain cached data they no longer own after the original worker returns.

This stale data consumes cache space without being useful, as it will not be served to clients.

How It Works

The stale data cleaning operation instructs each worker to scan its local storage, verify the ownership of each cached page against the current hash ring, and delete any data it no longer owns.

Usage

This operation is available via the REST API. To trigger a stale cache cleaning job across all workers:
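
The sketch below is a placeholder invocation, not the documented API; substitute the actual host, port, and endpoint path from the REST API guide:

```shell
# Hypothetical: submit the stale cache cleaning job to the cluster
# (HOST, PORT, and the endpoint path are placeholders, not real routes)
curl -X POST "http://HOST:PORT/<stale-cache-clean-endpoint>"
```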

This submits an asynchronous job to each worker. You can monitor the progress by checking worker logs or the alluxio_cleared_stale_cached_data Prometheus metric. For more details, refer to the REST API documentation.

Known Limitation

Mass file deletion generates heavy metadata I/O, which can block kernel flushing and delay Worker Pod termination.
