Managing Cache
Effectively managing the data cached in Alluxio is key to achieving maximum performance and resource efficiency. This section provides a comprehensive overview of the data caching lifecycle in Alluxio, from loading data into the cache to managing its presence and eventual removal.
The Data Cache Lifecycle
We can think of the life of cached data in three main phases:
Loading Data: How data gets into the cache in the first place.
Managing Data: How to control what is cached, how much space it uses, and for how long.
Removing Data: How data is evicted or deleted from the cache.
Loading Data into the Cache
This guide covers the two primary ways data is loaded into Alluxio.
Passive Caching: The default behavior where data is automatically cached the first time it is read by an application.
Active Preloading: Proactively loading data into the cache before it is needed using the distributed job load command. This is ideal for warming up the cache for performance-sensitive workloads.
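The difference between the two loading modes can be illustrated with a small toy model. This is a minimal sketch, not Alluxio's API: the ReadThroughCache class, its read and preload methods, and the dictionary standing in for the underlying storage are all hypothetical names chosen for illustration.

```python
class ReadThroughCache:
    """Toy model of passive caching and active preloading (not Alluxio's API)."""

    def __init__(self, under_store):
        self.under_store = under_store  # dict standing in for the underlying storage
        self.cache = {}
        self.under_store_reads = 0      # counts how often we had to go to storage

    def _fetch(self, path):
        self.under_store_reads += 1
        return self.under_store[path]

    def read(self, path):
        # Passive caching: a miss fetches from storage and keeps a copy.
        if path not in self.cache:
            self.cache[path] = self._fetch(path)
        return self.cache[path]

    def preload(self, prefix):
        # Active preloading: warm every path under a prefix before any read.
        loaded = []
        for path in sorted(self.under_store):
            if path.startswith(prefix) and path not in self.cache:
                self.cache[path] = self._fetch(path)
                loaded.append(path)
        return loaded


store = {"/data/a": b"A", "/data/b": b"B", "/tmp/c": b"C"}
cache = ReadThroughCache(store)
cache.read("/tmp/c")              # first read goes to the underlying storage
cache.read("/tmp/c")              # second read is served from the cache
warmed = cache.preload("/data/")  # loads /data/a and /data/b ahead of time
```

After the preload, reads of anything under /data/ are cache hits, which is the effect that warming the cache is meant to achieve for performance-sensitive workloads.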
Learn more about Loading Data...
Managing Data in the Cache
Once data is in the cache, Alluxio provides a powerful set of tools to control its lifecycle and resource consumption.
Cache Filter Policies: Define rules to selectively cache or ignore files based on their path. This is crucial for managing mutable data and optimizing cache space.
Cache Quotas: Set limits on the amount of cache space a specific directory tree can consume, which is essential for multi-tenancy and resource isolation.
Time-to-Live (TTL): Automatically expire and evict cached data after a defined period, ensuring that stale or temporary data is cleaned up.
Eviction Priority: Assign priorities to different datasets to influence which data is evicted first when the cache is full, protecting critical data from being removed.
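To make the first three controls concrete, the sketch below models a path filter, a per-directory quota, and a TTL in one toy class. Everything here is an assumption for illustration: ManagedCache, admit, expire, and the field names are hypothetical and do not reflect Alluxio's configuration format or behavior. In particular, this sketch simply rejects data that would exceed a quota, which is a simplification, and it does not model eviction priority.

```python
from dataclasses import dataclass, field


@dataclass
class ManagedCache:
    """Toy cache with a path filter, per-directory quotas, and a TTL."""
    skip_prefixes: tuple = ()                     # paths that are never cached
    quotas: dict = field(default_factory=dict)    # directory prefix -> max bytes
    ttl_seconds: float = 60.0
    entries: dict = field(default_factory=dict)   # path -> (size, cached_at)

    def usage(self, prefix):
        # Total cached bytes under a directory prefix.
        return sum(size for p, (size, _) in self.entries.items()
                   if p.startswith(prefix))

    def admit(self, path, size, now):
        # Filter policy: ignore paths that match a skip rule.
        if path.startswith(self.skip_prefixes):
            return False
        # Quota: refuse data that would push a directory tree over its limit.
        for prefix, limit in self.quotas.items():
            if path.startswith(prefix) and self.usage(prefix) + size > limit:
                return False
        self.entries[path] = (size, now)
        return True

    def expire(self, now):
        # TTL: drop entries that have been cached for ttl_seconds or longer.
        expired = sorted(p for p, (_, cached_at) in self.entries.items()
                         if now - cached_at >= self.ttl_seconds)
        for p in expired:
            del self.entries[p]
        return expired


cache = ManagedCache(skip_prefixes=("/logs/",), quotas={"/team-a/": 100})
filtered = cache.admit("/logs/app.log", 10, now=0)   # rejected by the filter
first = cache.admit("/team-a/x", 80, now=0)          # fits within the quota
over = cache.admit("/team-a/y", 30, now=0)           # 80 + 30 exceeds 100
other = cache.admit("/team-b/z", 30, now=30)         # no quota on /team-b/
expired = cache.expire(now=60)                       # only /team-a/x is 60s old
```

Passing the current time in as an argument, rather than reading a clock inside the class, keeps the TTL behavior deterministic and easy to reason about.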
Learn more about Managing Data...
Removing Data from the Cache
This guide details the different ways data is removed from the Alluxio cache, either automatically or manually.
Automatic Eviction: The standard process where Alluxio removes data based on policies like LRU (Least Recently Used) when the cache reaches its capacity.
Manual Eviction: Forcibly remove specific files or directories from the cache using the job free command, giving you direct control over cache contents.
Stale Cache Cleaning: A specialized administrative tool to find and remove misplaced or redundant data that can occur after cluster topology changes.
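Automatic LRU eviction and manual removal can be contrasted in a few lines. The following is a minimal sketch of the general LRU technique, not Alluxio's implementation: LruCache and its get, put, and free methods are hypothetical names, and capacity is counted in entries rather than bytes to keep the example short.

```python
from collections import OrderedDict


class LruCache:
    """Toy LRU cache with a manual free operation (not Alluxio's API)."""

    def __init__(self, capacity):
        self.capacity = capacity      # maximum number of entries
        self.entries = OrderedDict()  # least recently used entry comes first

    def get(self, path):
        if path not in self.entries:
            return None
        self.entries.move_to_end(path)  # mark as most recently used
        return self.entries[path]

    def put(self, path, data):
        self.entries[path] = data
        self.entries.move_to_end(path)
        # Automatic eviction: drop least recently used entries while over capacity.
        evicted = []
        while len(self.entries) > self.capacity:
            victim, _ = self.entries.popitem(last=False)
            evicted.append(victim)
        return evicted

    def free(self, prefix):
        # Manual eviction: remove everything under a path prefix on request.
        victims = [p for p in self.entries if p.startswith(prefix)]
        for p in victims:
            del self.entries[p]
        return victims


cache = LruCache(capacity=2)
cache.put("/a", 1)
cache.put("/b", 2)
cache.get("/a")               # /a becomes the most recently used entry
evicted = cache.put("/c", 3)  # the cache is full, so /b is evicted
freed = cache.free("/a")      # explicit removal, regardless of recency
```

Automatic eviction only runs when capacity is exceeded and picks its own victims, while the manual path removes exactly what the caller names.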
Learn more about Removing Data...