# Cache Filtering

## Cache Filter Overview

The total size of a dataset can exceed the disk space allocated to the Alluxio cache. To handle this, Alluxio provides a cache filter that restricts caching to hot data. Users configure the filter with rules that select which files to cache based on their file paths.

## Configuration

To enable the cache filter feature, add the following configurations to the `alluxio-site.properties` file:

```properties
alluxio.user.client.cache.filter.enabled=true
alluxio.user.client.cache.filter.class=alluxio.client.file.cache.filter.RuleSetBasedCacheFilter
alluxio.user.client.cache.filter.type=RULE_SET
alluxio.user.client.cache.filter.config.file=${ALLUXIO_HOME}/conf/cache_filter.json
alluxio.user.client.cache.filter.config.check.interval=5min
```

* `alluxio.user.client.cache.filter.config.file` specifies the path of the JSON file that defines the filtering rules
* `alluxio.user.client.cache.filter.config.check.interval` is the interval at which the JSON file is checked for changes

These two properties allow the filter rules to be dynamically updated on a running cluster.
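As a rough illustration of how a periodic check can pick up rule changes without a restart, the sketch below reloads a JSON rule file whenever its modification time changes. It is not Alluxio's implementation: the `RuleFileWatcher` class is hypothetical, and detecting changes by modification time is an assumption made only for this example.

```python
import json
import os


class RuleFileWatcher:
    """Hypothetical helper: reloads a JSON rule file when its mtime changes."""

    def __init__(self, path):
        self.path = path
        self.mtime_ns = None  # modification time seen at the last reload
        self.rules = None     # most recently parsed rule set

    def check(self):
        """Reload the file if it changed since the last check.

        Returns True when the rules were (re)loaded, False otherwise.
        """
        mtime_ns = os.stat(self.path).st_mtime_ns
        if mtime_ns == self.mtime_ns:
            return False
        with open(self.path, encoding="utf-8") as f:
            self.rules = json.load(f)
        self.mtime_ns = mtime_ns
        return True
```

A caller would invoke `check()` once per check interval (for example, every 5 minutes to mirror the `5min` setting above) and keep using `rules` between checks.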

### Setting filtering rules in the JSON file

There are three different types of rules for filtering in Alluxio:

* Immutable: assumes that the file on the UFS never changes; these files are cached and never evicted
* Skip Cache: assumes that the file on the UFS changes frequently; these files are never cached and are always read from the UFS
* Max Age: assumes that the file on the UFS may change within a specified time interval; these files are cached but expire after that interval
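The three rule types can be summarized as a decision about whether an already cached copy may be served. The Python sketch below is only a model of the behavior described in the list: the `serve_from_cache` function and its parameters are hypothetical names, not part of Alluxio.

```python
def serve_from_cache(rule_type, cached_at, now, max_age_seconds=None):
    """Decide whether a cached copy may be served (illustrative model only).

    rule_type: "immutable", "skipCache", or "maxAge"
    cached_at, now: timestamps in seconds
    max_age_seconds: required when rule_type is "maxAge"
    """
    if rule_type == "skipCache":
        # Never cached: always read from the UFS.
        return False
    if rule_type == "immutable":
        # Assumed never to change: the cached copy stays valid.
        return True
    if rule_type == "maxAge":
        # Valid only until the configured age has elapsed.
        return (now - cached_at) < max_age_seconds
    raise ValueError(f"unknown rule type: {rule_type}")


print(serve_from_cache("immutable", cached_at=0, now=3600))                  # True
print(serve_from_cache("skipCache", cached_at=0, now=1))                     # False
print(serve_from_cache("maxAge", cached_at=0, now=5, max_age_seconds=10))    # True
print(serve_from_cache("maxAge", cached_at=0, now=15, max_age_seconds=10))   # False
```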

The following example JSON file illustrates how to set the rules.

```json
{
  "apiVersion": 1,
  "data": {
    "immutable": [".*/immutable_tables/.*"],
    "skipCache": [".*/skip_cache_tables/.*"],
    "maxAge": {".*/mutable_tables/.*":"10s"},
    "defaultType": "immutable"
  },
  "metadata": {
    "immutable": [".*/immutable_tables/.*"],
    "skipCache": [".*/skip_cache_tables/.*"],
    "maxAge": {".*/mutable_tables/.*":"10s"},
    "defaultType": "immutable"
  }
}
```

The JSON file has two parts: one for **data** and one for **metadata**. Take the **data** part as an example.

* A singleton list with the regex pattern `.*/immutable_tables/.*` is set for the `immutable` key, so files matching this pattern are never evicted.
* A singleton list with the regex pattern `.*/skip_cache_tables/.*` is set for the `skipCache` key, so files matching this pattern are never cached.
* A single-entry map `{".*/mutable_tables/.*":"10s"}` is set for the `maxAge` key. Files matching the `.*/mutable_tables/.*` regex pattern are cached and expire after `10s`.

The **metadata** part is set in an identical way, so the metadata for the same files has the same caching behavior. The two parts are defined separately so that data and metadata can be given different behaviors.

Finally, `defaultType` sets the rule type for files that do not match any configured regex pattern. In the example, the `defaultType` is `immutable`.
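The way a path maps to a rule type, including the `defaultType` fallback, can be sketched in Python using the **data** part of the example file. This is an illustration, not Alluxio's implementation: the `classify` helper is hypothetical, and both the evaluation order (`immutable`, then `skipCache`, then `maxAge`) and the use of full-string regex matching are assumptions that this page does not confirm.

```python
import json
import re

# The "data" part of the example rule file from this page.
RULES_JSON = """
{
  "apiVersion": 1,
  "data": {
    "immutable": [".*/immutable_tables/.*"],
    "skipCache": [".*/skip_cache_tables/.*"],
    "maxAge": {".*/mutable_tables/.*": "10s"},
    "defaultType": "immutable"
  }
}
"""


def classify(path, section):
    """Return (rule_type, max_age) for a path; max_age is None unless maxAge matched.

    Hypothetical helper. Evaluation order and full-string matching are assumptions.
    """
    for pattern in section.get("immutable", []):
        if re.fullmatch(pattern, path):
            return ("immutable", None)
    for pattern in section.get("skipCache", []):
        if re.fullmatch(pattern, path):
            return ("skipCache", None)
    for pattern, age in section.get("maxAge", {}).items():
        if re.fullmatch(pattern, path):
            return ("maxAge", age)
    # No pattern matched: fall back to the configured default type.
    return (section["defaultType"], None)


data_rules = json.loads(RULES_JSON)["data"]
print(classify("/warehouse/immutable_tables/t1/part-0", data_rules))   # ('immutable', None)
print(classify("/warehouse/skip_cache_tables/t2/part-0", data_rules))  # ('skipCache', None)
print(classify("/warehouse/mutable_tables/t3/part-0", data_rules))     # ('maxAge', '10s')
print(classify("/warehouse/other/t4/part-0", data_rules))              # ('immutable', None)
```

The last call matches no pattern, so it falls through to `defaultType`. The same function could be applied to the **metadata** part by passing that section instead.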

