Stale Cache Cleaning

In Alluxio, clients use consistent hashing to determine which worker should serve reads and writes for a file, so each file is normally cached only on its designated worker. Under certain conditions, however, a worker can end up holding cached data that it no longer owns. To reclaim cache space and keep cache utilization high, Alluxio provides an operation that clears this stale cached data from workers.

This operation triggers each worker to scan its local cache, verify whether it still owns the cached data, and delete any data that no longer belongs to it.

When Stale Cache Occurs

Stale data may exist on a worker due to the following situations:

  1. Replica Reduction: If the replication factor of a file is reduced (e.g., from 3 to 2), the third worker still holds a redundant replica that is no longer needed.

  2. Dynamic Hash Ring Membership Change: When a dynamic hash ring is used, a worker may temporarily go offline, and its responsibilities are taken over by other workers. If the original worker later rejoins, the workers that served in its place may hold stale data.

  3. Cluster Expansion: Adding new workers can change the ownership of cached files. Data previously cached on old workers may now be the responsibility of the newly added workers.

To clean up such stale data, the clear stale cache operation can be manually triggered.
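The cluster expansion case can be illustrated with a toy consistent-hash ring. This is only a sketch of the general technique, not Alluxio's actual hashing or cleanup code: `ToyHashRing`, `find_stale`, the worker names, and the file paths are all hypothetical. It shows how a worker could compare its local cache against the current ring to find entries it no longer owns.

```python
import bisect
import hashlib


def _hash(key: str) -> int:
    # Stable hash so results do not vary between runs (unlike built-in hash()).
    return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)


class ToyHashRing:
    """Illustrative consistent-hash ring; NOT Alluxio's implementation."""

    def __init__(self, workers, virtual_nodes=100):
        # Each worker is placed on the ring at several points (virtual nodes).
        self._points = sorted(
            (_hash(f"{w}#{i}"), w) for w in workers for i in range(virtual_nodes)
        )
        self._keys = [point for point, _ in self._points]

    def owner(self, path: str) -> str:
        # First ring point clockwise from the file's hash, wrapping around.
        idx = bisect.bisect_right(self._keys, _hash(path)) % len(self._keys)
        return self._points[idx][1]


def find_stale(worker: str, cached_paths, ring: ToyHashRing):
    """Return the cached paths that `worker` does not own under `ring`."""
    return [p for p in cached_paths if ring.owner(p) != worker]


# Populate each worker's cache according to a three-worker ring.
workers = ["worker-1", "worker-2", "worker-3"]
files = [f"/data/part-{i}" for i in range(1000)]
old_ring = ToyHashRing(workers)
cache = {w: [] for w in workers}
for f in files:
    cache[old_ring.owner(f)].append(f)

# Cluster expansion: a fourth worker joins and takes over part of the key space.
new_ring = ToyHashRing(workers + ["worker-4"])
stale = {w: find_stale(w, paths, new_ring) for w, paths in cache.items()}
for w, paths in stale.items():
    print(w, "stale files:", len(paths))
```

In this toy model, every file flagged as stale on an old worker is now owned by the new worker, which is the situation the clear stale cache operation is meant to clean up.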

Differences between Clearing Stale Cache and Free Job

Clearing Stale Cache and Free Job are two mechanisms for cache cleanup in Alluxio, and they are often confused due to their similar purposes. The table below outlines their differences:

| Aspect | Clearing Stale Cache | Free Job |
|---|---|---|
| Primary Use Case | Cleaning up stale or misplaced data after cluster changes | Releasing cache for data that is no longer needed by applications |
| Type of Cache Freed | Incorrect/invalid cache | Valid cache that is no longer needed |
| Target of Cleanup | Removes cached files that shouldn't reside on a worker | Removes files from workers based on the input specification |
| Interface | RESTful API | RESTful API & CLI |
| Input Parameters | None | Requires a directory path or an index file as input |
| Scheduling Mechanism | Immediately executes on all workers | Relies on the job system for scheduling |

Usage

This feature is currently accessible via RESTful API only.

Submit Task

The following API triggers the clear stale cache task to be asynchronously executed across all workers:

curl -X POST <coordinator-host>:<coordinator-api-port>/api/v1/cache -d '{"op":"clear-stale"}' -H "Content-Type: application/json"

This command submits a background job to all workers. Submitting the same request multiple times will not cause duplicate executions.

Example Response:

{
  "errors": {
    "worker1-host": "Connection refused",
    "worker2-host": "Timeout"
  }
}

An empty errors object indicates that the job was submitted successfully to all workers. Otherwise, the errors field maps the hostname of each worker where submission failed to the corresponding error message. Submission to a worker fails if the network connection to it fails, or if a job submitted earlier has not finished running on it.
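For scripted use, the same request and the error check can be sketched in Python with only the standard library. The helper names below are hypothetical, and the `http://` scheme is an assumption (the curl command above omits the scheme; a TLS-enabled deployment would need `https://`). The response handling assumes only the `errors` field shown in the example response.

```python
import json
import urllib.request


def build_clear_stale_request(host: str, port: int) -> urllib.request.Request:
    """Build the POST request equivalent to the curl command above."""
    body = json.dumps({"op": "clear-stale"}).encode("utf-8")
    return urllib.request.Request(
        url=f"http://{host}:{port}/api/v1/cache",  # scheme is an assumption
        data=body,
        method="POST",
        headers={"Content-Type": "application/json"},
    )


def failed_workers(response_text: str) -> dict:
    """Return {worker hostname: error message}; empty if all submissions succeeded."""
    payload = json.loads(response_text)
    return payload.get("errors") or {}


def submit_clear_stale(host: str, port: int, timeout: float = 10.0) -> dict:
    """Send the request and return the per-worker submission errors."""
    request = build_clear_stale_request(host, port)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return failed_workers(response.read().decode("utf-8"))
```

A caller would pass the coordinator host and API port of its own deployment to `submit_clear_stale` and treat a non-empty result as a partial failure, for example by logging each hostname and message before deciding whether to resubmit.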

Stop Task

To cancel the task (if needed), send a DELETE request with the same op:

curl -X DELETE <coordinator-host>:<coordinator-api-port>/api/v1/cache -d '{"op":"clear-stale"}' -H "Content-Type: application/json"

This request will stop the background task on all workers. If no such task is running, the command will still succeed without error.

Monitoring Task Progress

There is currently no RPC to track the progress of the clear stale cache job. However, you can monitor its progress in the following ways:

Via Logs

When the task completes on a worker, a log entry like the following appears in that worker's log:

2025-04-21T19:51:22,889 INFO  AsyncJobWorker - Clear stale cached files finished. 104857600 bytes released

This log message indicates the job completion and the amount of stale data removed.
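If worker logs are collected centrally, the completion message can be picked out with a small parser. The sketch below assumes the message text matches the sample line above exactly; `bytes_released` is a hypothetical helper, not part of Alluxio.

```python
import re
from typing import Optional

# Matches the completion message shown in the sample log line above.
_FINISHED = re.compile(r"Clear stale cached files finished\. (\d+) bytes released")


def bytes_released(log_line: str) -> Optional[int]:
    """Return the number of bytes released, or None if the line is unrelated."""
    match = _FINISHED.search(log_line)
    return int(match.group(1)) if match else None


line = (
    "2025-04-21T19:51:22,889 INFO  AsyncJobWorker - "
    "Clear stale cached files finished. 104857600 bytes released"
)
released = bytes_released(line)
print(released)              # 104857600
print(released / 1024 ** 2)  # 100.0 (MiB)
```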

Via Prometheus Metrics

Alluxio exposes a metric to track stale cache clearance:

alluxio_cleared_stale_cached_data

This metric accumulates the total number of bytes cleared by the clear stale cache operation on a worker. At the completion of the job, the aggregated sum of this metric across all workers will plateau. You can use this metric to monitor and alert on cache cleanup trends across your cluster.
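One way to turn the plateau into a completion signal is to sample the cluster-wide sum periodically (for example with a PromQL query such as `sum(alluxio_cleared_stale_cached_data)`) and check whether the last few samples are identical. The sketch below is an assumption-laden illustration: `has_plateaued`, the sample values, and the choice of three stable samples are all hypothetical, and how the samples are collected is left to your monitoring setup.

```python
from typing import Sequence


def has_plateaued(totals: Sequence[float], stable_samples: int = 3) -> bool:
    """True when the last `stable_samples` cluster-wide totals are identical."""
    if stable_samples < 2 or len(totals) < stable_samples:
        return False
    tail = totals[-stable_samples:]
    return all(value == tail[0] for value in tail)


# Hypothetical sums of the metric across all workers, one per scrape.
still_running = [0, 52428800, 104857600, 157286400]
finished = [0, 52428800, 104857600, 157286400, 157286400, 157286400]

print(has_plateaued(still_running))  # False
print(has_plateaued(finished))       # True
```

Note that a flat series is only a heuristic: the sum is also flat before the job starts or when there is nothing to clean, so it is best combined with the completion log message.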
