Cache Evicting by Priority

Alluxio supports assigning different eviction priorities to different files. This document describes the use cases, design and operations of priority eviction.

Use cases

By default, Alluxio uses an LRU-based evictor. The LRU-based evictor observes data access patterns over time and selects the least recently used pages as eviction candidates. However, the admin sometimes knows in advance that certain files are more important than their access pattern suggests, because a cache miss on them would have a greater impact than a miss on other files. In that case the admin wants these files to stay in the cache longer, even when the LRU evictor would otherwise choose to evict them. For example, the admin is going to load a dataset into Alluxio for an important compute job and wants the loaded files to remain cached for as long as the job takes to consume the data. During this period, no files in the dataset should be evicted by other concurrent, less important jobs. Such a preference can be expressed by assigning different eviction priorities to the files.

Design overview

If the priority evictor is enabled and the admin has assigned priorities to different files, the priority evictor ensures that files with a higher priority are not evicted until no files with a lower priority remain cached in the page store. Among files with the same priority, the evictor behaves exactly like an LRU or LFU evictor, as configured by the admin. When pages must be evicted to make room for a new page, but the new page belongs to a file whose priority is lower than that of every file currently cached, no eviction happens and the new page is not cached.
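The behavior described above can be illustrated with a small toy model. This is only a sketch under stated assumptions, not Alluxio's implementation: the class name `PriorityLruCache`, the numeric ranks, and the page-count capacity are all hypothetical, and LRU is used as the same-priority policy.

```python
from collections import OrderedDict
from typing import Dict, Optional

# Level names come from the documentation; the numeric ranks are an
# illustrative assumption (a lower rank is evicted first).
RANK = {"LOW": 0, "MEDIUM": 1, "HIGH": 2}


class PriorityLruCache:
    """Toy page cache: evicts the lowest priority first, LRU within a priority."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity  # maximum number of cached pages (hypothetical unit)
        # One LRU queue per priority level; the oldest entry comes first.
        self.queues: Dict[str, OrderedDict] = {p: OrderedDict() for p in RANK}

    def __len__(self) -> int:
        return sum(len(q) for q in self.queues.values())

    def _lowest_cached(self) -> Optional[str]:
        """Return the lowest priority level that currently holds pages."""
        cached = [p for p, q in self.queues.items() if q]
        return min(cached, key=RANK.__getitem__) if cached else None

    def get(self, page_id: str) -> bool:
        """Return True on a hit and mark the page as most recently used."""
        for q in self.queues.values():
            if page_id in q:
                q.move_to_end(page_id)
                return True
        return False

    def put(self, page_id: str, priority: str) -> bool:
        """Try to cache a page; return False if the page is rejected."""
        if self.get(page_id):
            return True
        if len(self) >= self.capacity:
            victim_level = self._lowest_cached()
            # Reject the new page if it ranks below everything already cached.
            if victim_level is None or RANK[priority] < RANK[victim_level]:
                return False
            # Otherwise evict the least recently used page of the lowest level.
            self.queues[victim_level].popitem(last=False)
        self.queues[priority][page_id] = None
        return True
```

With a capacity of two pages, caching one HIGH and one LOW page and then adding a MEDIUM page evicts the LOW page; a subsequent LOW page is rejected because everything still cached has a higher priority.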

Note that each worker evaluates the priority rules independently, without knowledge of the cache status on the other workers in the cluster. In other words, a worker may start evicting HIGH priority pages because no lower priority pages are cached in its own page store, while another worker still holds LOW priority pages.

Priority rules

The admin can define priority rules to tell Alluxio which files are assigned which priorities. A priority rule consists of a UFS path prefix and a priority level. Three priority levels are available: HIGH, MEDIUM, and LOW. The UFS path prefix decides which files the rule applies to. It can be a path denoting a particular file in the UFS, in which case the rule applies only to that file. It can also be a path denoting a directory in the UFS, in which case the rule applies to every file under that directory tree.

Multiple rules can be defined at the same time. The user-defined rules, along with a default rule that matches any file and has a LOW priority level, form the rule set. Rules may be nested in a rule set, e.g. two rules with path prefixes s3://bucket/data and s3://bucket/data/dir may exist at the same time. When resolving the priority level of a file, the most specific matching rule wins. For example, given the following rule set:

/a -> HIGH
/a/b -> MEDIUM
/a/b/c -> LOW
(default) -> LOW

For file /a/file, only rule /a matches, so its priority is HIGH. For file /a/b/file, both rules /a and /a/b match, but /a/b is the more specific match, so its priority is MEDIUM. For file /a/b/c/file, all three user-defined rules match, and /a/b/c is the most specific, so its priority is LOW. For file /d/file, none of the user-defined rules match, so the default rule takes effect and its priority is LOW.
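The resolution in this example amounts to a longest-prefix match. The following is a minimal sketch of that idea, not Alluxio's actual code: the `RULES` dictionary and the `resolve_priority` function are hypothetical names, and matching on whole path components (so that /a/bc does not match the rule /a/b) is an assumption about how prefixes are compared.

```python
from typing import Dict

# Hypothetical in-memory copy of the example rule set above.
RULES: Dict[str, str] = {"/a": "HIGH", "/a/b": "MEDIUM", "/a/b/c": "LOW"}
DEFAULT_PRIORITY = "LOW"  # the default rule matches any file


def resolve_priority(path: str, rules: Dict[str, str] = RULES) -> str:
    """Return the priority of the most specific (longest) matching rule."""
    best_prefix = ""
    best_priority = DEFAULT_PRIORITY
    for prefix, priority in rules.items():
        normalized = prefix.rstrip("/")
        # Assumption: a rule matches the exact path or anything below it,
        # compared on whole path components.
        matches = path == normalized or path.startswith(normalized + "/")
        if matches and len(normalized) > len(best_prefix):
            best_prefix, best_priority = normalized, priority
    return best_priority


for sample in ("/a/file", "/a/b/file", "/a/b/c/file", "/d/file"):
    print(sample, "->", resolve_priority(sample))
```

The loop prints the priority resolved for each of the four example files discussed above.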

Enabling priority eviction

To enable priority eviction, add the following configuration:

alluxio.worker.page.store.evictor.priority.enabled=true

This configuration needs to be applied to all client and worker nodes.

Since the priority rules are persisted in etcd, the admin needs to make sure the connection details of etcd have been properly configured:

alluxio.etcd.endpoints=http://etcd-host:2379/

Configure authentication, secure connection, etc., of etcd as needed.

After making updates to the configuration, restart all worker nodes.

Adding and updating priority rules

Use the following command to add a new priority rule:

$ bin/alluxio priority add --path s3://bucket/data --priority high

where --path specifies the path prefix of the rule, and --priority defines the priority level.

List the existing rules with:

$ bin/alluxio priority list

Delete an existing rule with:

$ bin/alluxio priority remove --path s3://bucket/data

Note that simply removing a priority rule will not cause the files affected by the rule to be evicted from the cache.

Update the priority level of an existing rule with:

$ bin/alluxio priority update --path s3://bucket/data --priority medium

When the admin updates the priority rule set, the workers are notified and update their internal data structures to reflect the new rules. Because this update takes some time, users may see increased I/O latency on the workers while it is in progress.

For a complete description of the commands, please refer to the CLI reference.

Monitoring status of the priority evictor

The metric alluxio_cached_storage_by_priority_bytes reports the amount of cached data at each priority level:

$ curl "http://localhost:30000/metrics/"
(snip)
# HELP alluxio_cached_storage_by_priority_bytes amount of the cached data
# TYPE alluxio_cached_storage_by_priority_bytes gauge
alluxio_cached_storage_by_priority_bytes{priority="HIGH"} 2.147483648E9
alluxio_cached_storage_by_priority_bytes{priority="LOW"} 1048576.0
alluxio_cached_storage_by_priority_bytes{priority="MEDIUM"} 7.79091968E8
# HELP alluxio_cached_storage_bytes amount of the cached data
# TYPE alluxio_cached_storage_bytes gauge
alluxio_cached_storage_bytes 2.927624192E9
(snip)
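To turn this output into a per-priority breakdown, the text can be parsed with a few lines of script. The sketch below works on a copy of the sample output above rather than on a live endpoint; the helper names `bytes_by_priority` and `shares` are hypothetical and not part of Alluxio.

```python
import re
from typing import Dict

# Copy of the sample metrics output shown above.
SAMPLE = """\
# HELP alluxio_cached_storage_by_priority_bytes amount of the cached data
# TYPE alluxio_cached_storage_by_priority_bytes gauge
alluxio_cached_storage_by_priority_bytes{priority="HIGH"} 2.147483648E9
alluxio_cached_storage_by_priority_bytes{priority="LOW"} 1048576.0
alluxio_cached_storage_by_priority_bytes{priority="MEDIUM"} 7.79091968E8
# HELP alluxio_cached_storage_bytes amount of the cached data
# TYPE alluxio_cached_storage_bytes gauge
alluxio_cached_storage_bytes 2.927624192E9
"""

# Matches only the per-priority series, capturing the label and the value.
LINE = re.compile(
    r'^alluxio_cached_storage_by_priority_bytes\{priority="(\w+)"\}\s+(\S+)$'
)


def bytes_by_priority(text: str) -> Dict[str, float]:
    """Extract cached bytes per priority level from metrics text."""
    result: Dict[str, float] = {}
    for line in text.splitlines():
        match = LINE.match(line.strip())
        if match:
            result[match.group(1)] = float(match.group(2))
    return result


def shares(by_priority: Dict[str, float]) -> Dict[str, float]:
    """Return each priority level's fraction of the total cached bytes."""
    total = sum(by_priority.values())
    return {p: v / total for p, v in by_priority.items()} if total else {}


cached = bytes_by_priority(SAMPLE)
for level, fraction in sorted(shares(cached).items()):
    print(f"{level}: {cached[level]:.0f} bytes ({fraction:.1%})")
```

In the sample, the three per-priority values add up to the alluxio_cached_storage_bytes total, which gives a quick consistency check when reading the output.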
