Cache Filtering
Cache Filter Overview
In some cases, the total size of the dataset exceeds the disk space allocated for the Alluxio cache. To address this, Alluxio provides a cache filter feature that makes it possible to cache only hot data. By configuring the cache filter, users can choose which files to cache based on their file paths.
Configuration
To enable the cache filter feature, add the following properties to the alluxio-site.properties file:
alluxio.user.client.cache.filter.config.file: the path of the JSON file that sets the filtering rules
alluxio.user.client.cache.filter.config.check.interval: the interval for checking the JSON file
These two properties allow the filter rules to be dynamically updated on a running cluster.
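As a minimal sketch, the two properties might be set as follows. The file path and the interval value are hypothetical placeholders chosen for illustration, not recommended defaults; the interval reuses the duration format of the 10s value that appears in the rules example.

```properties
# Hypothetical location of the JSON rules file (assumption; use your own path)
alluxio.user.client.cache.filter.config.file=/opt/alluxio/conf/cache_filter.json
# Hypothetical interval at which the rules file is checked (assumption)
alluxio.user.client.cache.filter.config.check.interval=60s
```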
Setting rules of filtering in the JSON file
There are three different types of rules for filtering in Alluxio:
Immutable: assume that the file on UFS will never change; these files are cached and never evicted
Skip Cache: assume that the file on UFS changes frequently; these files are never cached and always read from the UFS
Max Age: assume that a file on the UFS will change within a specified time interval; these files are cached but set to expire after a certain time interval
To illustrate how to set the rules, this section walks through an example JSON file.
The JSON file has two parts: one for metadata and one for data. Let's take the data part as an example.
A singleton list with the regex pattern .*/immutable_tables/.* is set for the immutable key, so these files will never be evicted.
A singleton list with the regex pattern .*/skip_cache_tables/.* is set for the skipCache key, so these files will never be cached.
A single-entry map {".*/mutable_tables/.*":"10s"} is set for the maxAge key. Files matching the .*/mutable_tables/.* regex pattern will be cached and will expire after 10s.
The metadata part is set in an identical way, so the metadata for the same files will have the same caching behavior. The two types are separately defined to allow different behaviors for each.
Finally, we can set the default type for files that don't match any configured regex pattern. In the example, the defaultType is immutable.
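Putting the pieces described above together, the rules file might look roughly like the sketch below. The immutable, skipCache, maxAge, and defaultType keys and their values come from the walkthrough above. The top-level key names data and metadata, and the placement of defaultType inside each part, are assumptions made for illustration. The actual file format may also require fields that are not shown here, so check the schema for your Alluxio version before using it.

```json
{
  "data": {
    "defaultType": "immutable",
    "immutable": [".*/immutable_tables/.*"],
    "skipCache": [".*/skip_cache_tables/.*"],
    "maxAge": {".*/mutable_tables/.*": "10s"}
  },
  "metadata": {
    "defaultType": "immutable",
    "immutable": [".*/immutable_tables/.*"],
    "skipCache": [".*/skip_cache_tables/.*"],
    "maxAge": {".*/mutable_tables/.*": "10s"}
  }
}
```

Because the two parts are defined separately, the metadata rules could differ from the data rules, for example by giving metadata a shorter maxAge than the data for the same path pattern.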