Access Log
1. Overview
The Aggregated Access Log mechanism supports enterprise-level data governance and data access monitoring while significantly reducing log storage and processing load in high-concurrency scenarios.
Note that the Access Log runs in the Worker process. It records the key operations performed on files cached in the Worker throughout their lifecycle.
2. Key Operations Recorded
The Access Log focuses on capturing the following five key operations during a file's cache lifecycle:
LOAD: File data blocks are loaded into the Alluxio cache for the first time.
HOT_READ: Data is read from the Alluxio cache (cache hit). This confirms that the cache is accelerating access and identifies the data as "hot."
COLD_READ: Data is read from the UFS (cache miss). These entries help analyze how data transitions from cold to hot.
EVICT: Data is evicted from the Alluxio cache by the eviction policy, for example when cache space is insufficient.
DELETE: Cached data is deleted or released at the explicit request of a user.
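As a rough illustration of how these five operation types relate, the sketch below models them as an enumeration and shows how a read would be classified by cache hit or miss. The names AccessOp and classify_read are hypothetical and are not part of the Alluxio API.

```python
from enum import Enum


class AccessOp(Enum):
    """The five operations listed above (hypothetical type, for illustration only)."""
    LOAD = "LOAD"            # blocks loaded into the cache for the first time
    HOT_READ = "HOT_READ"    # read served from the cache (hit)
    COLD_READ = "COLD_READ"  # read served from the UFS (miss)
    EVICT = "EVICT"          # removed by the eviction policy
    DELETE = "DELETE"        # removed at the user's request


def classify_read(cache_hit: bool) -> AccessOp:
    """Map a read to HOT_READ on a cache hit and COLD_READ on a miss."""
    return AccessOp.HOT_READ if cache_hit else AccessOp.COLD_READ
```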
3. Deduplication Mechanism and False Positives
To keep the log concise, the system does not write every operation to the log as it happens. Instead, it uses a Bloom filter to deduplicate entries, so an operation that has already been recorded is not logged again. This significantly reduces log noise, performance overhead, and storage costs.
Because a Bloom filter is probabilistic, it can produce false positives: occasionally an entry that has not been recorded before is mistaken for a duplicate. As a result, a very small number of entries that should have been recorded are omitted.
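The deduplication idea can be sketched with a small, self-contained Bloom filter. This is a minimal illustration, not the Worker's implementation: the BloomFilter class, the deduplicate helper, the example paths, and the choice of "operation plus file path" as the deduplication key are all assumptions made for the example.

```python
import hashlib


class BloomFilter:
    """Tiny Bloom filter for illustration; not the Worker's implementation."""

    def __init__(self, num_bits: int, num_hashes: int) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, key: str):
        # Double hashing: derive all bit positions from two halves of one digest.
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        for i in range(self.num_hashes):
            yield (h1 + i * h2) % self.num_bits

    def add_if_absent(self, key: str) -> bool:
        """Set the key's bits; return True if at least one bit was not yet set."""
        is_new = False
        for pos in self._positions(key):
            byte, mask = pos // 8, 1 << (pos % 8)
            if not self.bits[byte] & mask:
                is_new = True
                self.bits[byte] |= mask
        return is_new


def deduplicate(events, bloom):
    """Keep only the events that the filter does not report as already seen."""
    # Assumed key: operation plus file path.
    return [(path, op) for path, op in events if bloom.add_if_absent(f"{op}:{path}")]


events = [
    ("/data/a.parquet", "LOAD"),
    ("/data/a.parquet", "HOT_READ"),
    ("/data/a.parquet", "HOT_READ"),   # repeat, suppressed
    ("/data/b.parquet", "COLD_READ"),
    ("/data/a.parquet", "HOT_READ"),   # repeat, suppressed
]
logged = deduplicate(events, BloomFilter(num_bits=1 << 16, num_hashes=7))
```

A repeated entry is always suppressed, because a Bloom filter never reports a key it has seen as absent. The false positive case is the reverse: a genuinely new entry whose bits all happen to be set already is dropped, which is the omission described above.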
4. Peak Throughput and Configuration Standards
To keep recording efficient even under extremely high concurrency, the current configuration is sized for a peak throughput of 432 million operation entries within 3 hours (about 40,000 entries per second) while holding the false positive rate to 0.0001 (0.01%). In massive data access scenarios, this compresses the log volume effectively while keeping core access monitoring data highly reliable.
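To put these figures in perspective, the sketch below converts them to a per-second rate and applies the textbook sizing formulas for a standard Bloom filter. It assumes that a single classic Bloom filter holds all entries of the 3-hour window; the system's actual data structure and memory footprint may differ.

```python
import math

entries = 432_000_000       # peak entries per window, from the text
window_seconds = 3 * 3600   # 3 hours
target_fpr = 0.0001         # false positive rate, from the text

# Average rate implied by the peak figure.
ops_per_second = entries // window_seconds

# Textbook sizing for a classic Bloom filter (an assumption, see above):
#   bits per entry = -ln(p) / (ln 2)^2,  hash count = bits per entry * ln 2
bits_per_entry = -math.log(target_fpr) / (math.log(2) ** 2)
num_hashes = round(bits_per_entry * math.log(2))
total_bits = math.ceil(entries * bits_per_entry)
size_gib = total_bits / 8 / 2**30

print(f"{ops_per_second} entries/s")
print(f"{bits_per_entry:.2f} bits per entry, {num_hashes} hash functions")
print(f"{size_gib:.2f} GiB for the whole window")
```

Under these assumptions the rate is 40,000 entries per second, and the filter needs roughly 19 bits per entry, which comes to about 1 GiB for the full window.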
5. Usage
The access log is enabled and tuned through the same REST API and User CLI used by the audit log. See Dynamic Configuration via REST API and CLI.