# Metrics

## Alluxio Metrics

### Cache Storage

| Metric                               | Labels        | Type    | Component   | Description                                                        |
| ------------------------------------ | ------------- | ------- | ----------- | ------------------------------------------------------------------ |
| `alluxio_cached_storage`             | `type`        | gauge   | worker      | amount of the cached data                                          |
| `alluxio_cached_capacity`            | `type`        | gauge   | worker      | configured maximum cache storage                                   |
| `alluxio_data_cached_pages`          | -             | gauge   | worker      | number of pages being cached in the page store                     |
| `alluxio_data_cached_files`          | -             | gauge   | worker      | number of cached files, including fully and partially cached files |
| `alluxio_cached_storage_by_priority` | `priority`    | gauge   | worker      | amount of the cached data, broken down by priority                 |
| `alluxio_eviction_by_ttl`            | `policy_path` | counter | worker      | Total number of bytes evicted from Alluxio workers by TTL policy.  |
| `alluxio_quota_size_used`            | `dir`         | gauge   | coordinator | Bytes used in the given quota scope.                               |
| `alluxio_quota_size_capacity`        | `dir`         | gauge   | coordinator | Capacity of the given quota scope as defined in the quota rules.   |
| `alluxio_eviction_by_quota`          | `dir`         | counter | worker      | Total number of bytes evicted from Alluxio workers by quota rules. |
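
As a minimal sketch of how the gauges above might be combined, the following Python snippet derives cache utilization per `type` label from `alluxio_cached_storage` and `alluxio_cached_capacity`. It assumes the metrics are scraped in the Prometheus text exposition format. The sample payload, the `SSD` label value, and the helper names `parse_samples` and `cache_utilization` are hypothetical.

```python
import re

# Hypothetical scrape output: the label value and numbers are illustrative only.
SAMPLE = """\
alluxio_cached_storage{type="SSD"} 4.0e9
alluxio_cached_capacity{type="SSD"} 1.0e10
"""

LINE_RE = re.compile(r'^(\w+)\{([^}]*)\}\s+(\S+)$')
LABEL_RE = re.compile(r'(\w+)="([^"]*)"')


def parse_samples(text):
    """Parse labeled sample lines into (name, labels, value) tuples."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        match = LINE_RE.match(line)
        if match is None:
            continue  # this sketch ignores unlabeled samples
        name, raw_labels, value = match.groups()
        samples.append((name, dict(LABEL_RE.findall(raw_labels)), float(value)))
    return samples


def cache_utilization(samples):
    """Return {type: cached_storage / cached_capacity} for each `type` label."""
    used, capacity = {}, {}
    for name, labels, value in samples:
        key = labels.get("type", "")
        if name == "alluxio_cached_storage":
            used[key] = value
        elif name == "alluxio_cached_capacity":
            capacity[key] = value
    return {k: used.get(k, 0.0) / cap for k, cap in capacity.items() if cap > 0}


utilization = cache_utilization(parse_samples(SAMPLE))
```

Because the result is a ratio of two gauges, it does not depend on the unit in which the storage amounts are reported.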

### Cache Access

| Metric                                         | Labels                         | Type      | Component | Description                                                                                                                                                                                                                                           |
| ---------------------------------------------- | ------------------------------ | --------- | --------- | ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `alluxio_data_access`                          | `method`                       | histogram | worker    | aggregated all data access requests                                                                                                                                                                                                                   |
| `alluxio_data_throughput`                      | `dir`, `method`, `destination` | counter   | worker    | counter of data throughput of all data access                                                                                                                                                                                                         |
| `alluxio_meta_operation`                       | `op`                           | counter   | worker    | counter of rpc calls of the meta operations                                                                                                                                                                                                           |
| `alluxio_meta_operation_latency_ms`            | `op`, `state`                  | histogram | worker    | latency of rpc calls of the meta operations                                                                                                                                                                                                           |
| `alluxio_meta_operation_errors`                | `op`                           | counter   | worker    | counter of errors during handling of rpc calls of the meta operations                                                                                                                                                                                 |
| `alluxio_ufs_error`                            | `ufs_type`, `error_code`       | counter   | worker    | counter of the rpc calls                                                                                                                                                                                                                              |
| `alluxio_ufs_latency_ms`                       | `method`, `ufs_type`           | histogram | worker    | Histogram of ufs call latency                                                                                                                                                                                                                         |
| `alluxio_ufs_client_latency_ms`                | `method`, `ufs_type`, `state`  | histogram | worker    | Histogram of ufs client api call latency                                                                                                                                                                                                              |
| `alluxio_ufs_client_call_processing`           | `method`, `ufs_type`           | gauge     | worker    | Gauge of the ufs client calls that are being processed                                                                                                                                                                                                |
| `alluxio_ufs_data_access`                      | `dir`, `method`                | counter   | worker    | amount of the ufs access                                                                                                                                                                                                                              |
| `alluxio_ufs_fallback`                         | `method`                       | counter   | worker    | amount of the ufs fallback                                                                                                                                                                                                                            |
| `alluxio_cached_data_read`                     | `dir`, `is_pinned`             | counter   | worker    | amount of data that, when read, was present in and served from the Alluxio cache                                                                                                                                                                      |
| `alluxio_missed_data_read`                     | `dir`, `is_pinned`             | counter   | worker    | amount of data that, when read, was absent from the Alluxio cache                                                                                                                                                                                     |
| `alluxio_cache_hit_calls`                      | -                              | counter   | worker    | number of cache hits in page store                                                                                                                                                                                                                    |
| `alluxio_cache_miss_calls`                     | -                              | counter   | worker    | number of cache misses in page store                                                                                                                                                                                                                  |
| `alluxio_external_data_read`                   | `dir`                          | counter   |           | amount of the read data when cache missed on client                                                                                                                                                                                                   |
| `alluxio_cleared_stale_cached_data`            | -                              | counter   | worker    | amount of cleared stale cached data                                                                                                                                                                                                                   |
| `alluxio_cached_evicted_data`                  | -                              | counter   | worker    | amount of the evicted data                                                                                                                                                                                                                            |
| `alluxio_cached_async_evicted_data`            | -                              | counter   | worker    | amount of the async evicted data                                                                                                                                                                                                                      |
| `alluxio_trigger_async_evicted_total`          | -                              | counter   | worker    | counter of times asynchronous eviction is triggered                                                                                                                                                                                                   |
| `alluxio_page_store_operation_errors`          | `op`, `cause`                  | counter   | worker    | counter of failures in page store operations                                                                                                                                                                                                          |
| `alluxio_page_store_dir_operation_errors`      | `dir`                          | counter   | worker    | counter of failures in specific page store directory                                                                                                                                                                                                  |
| `alluxio_page_store_dir_operations`            | `dir`                          | counter   | worker    | operation counter in specific page store directory                                                                                                                                                                                                    |
| `alluxio_page_store_io_latency_microseconds`   | `dir`, `op`, `success`         | histogram | worker    | latency of IO operations in page store                                                                                                                                                                                                                |
| `alluxio_metadata_cache_hit_calls`             | `type`                         | counter   | worker    | counter of metadata retrieval calls that resulted in a cache hit                                                                                                                                                                                      |
| `alluxio_external_file_metadata_request_calls` | -                              | counter   | worker    | counter of file metadata retrieval calls that are fetched from UFS, usually as a result of a cache miss in the file metadata cache                                                                                                                    |
| `alluxio_metadata_cache_miss_calls`            | `type`                         | counter   | worker    | counter of metadata retrieval calls that resulted in a cache miss. Note that this is different from `alluxio_external_file_metadata_request_calls` in that a cache miss for a file does not always result in a request to an external data source.    |
| `alluxio_passive_cache_async_loaded_files`     | `result`                       | counter   | worker    | number of async loaded files when passive cache is enabled                                                                                                                                                                                            |
| `alluxio_page_store_device_total_capacity`     | `dir`                          | gauge     | worker    | the total capacity of the physical storage device where the page store directory resides                                                                                                                                                              |
| `alluxio_page_store_device_available_capacity` | `dir`                          | gauge     | worker    | the available capacity of the physical storage device where the page store directory resides                                                                                                                                                          |
| `alluxio_metastore_storage_size`               | `dir`                          | gauge     | worker    | Total logical size of files in the metastore RocksDB directory.                                                                                                                                                                                       |
| `alluxio_metastore_disk_capacity`              | `dir`                          | gauge     | worker    | The capacity of the disk where the metastore rocksdb is located.                                                                                                                                                                                      |
| `alluxio_netty_data_ingress`                   | -                              | counter   | worker    | number of ingress bytes from clients to worker, excluding TLS                                                                                                                                                                                         |
| `alluxio_netty_data_egress`                    | -                              | counter   | worker    | number of egress bytes from worker to clients, excluding TLS                                                                                                                                                                                          |
| `alluxio_worker_thread_pool_rejections`        | `dir`                          | counter   | worker    | counter of rejections in worker thread pool                                                                                                                                                                                                           |
| `alluxio_rpc_executor_current_queue_length`    | `executor_name`                | gauge     | worker    | number of RPC requests currently being processed and pending processing                                                                                                                                                                               |
| `alluxio_rpc_executor_active_threads`          | `executor_name`                | gauge     | worker    | number of threads that are actively executing RPCs                                                                                                                                                                                                    |
| `alluxio_rpc_executor_current_threads`         | `executor_name`                | gauge     | worker    | number of threads for executing RPCs, both occupied and idle                                                                                                                                                                                          |
| `alluxio_rpc_executor_max_threads`             | `executor_name`                | gauge     | worker    | maximum number of threads for executing RPCs                                                                                                                                                                                                          |
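
The counters `alluxio_cached_data_read` and `alluxio_missed_data_read` can be combined into a byte-level cache hit ratio. The sketch below assumes that the two counters together cover all data read through the worker, and that both are cumulative values sampled at two points in time. The function name `byte_hit_ratio` and the snapshot dictionaries are hypothetical.

```python
def byte_hit_ratio(prev, curr):
    """Estimate the cache hit ratio between two counter snapshots.

    `prev` and `curr` map metric names to cumulative counter values.
    Returns None when no data was read or when a counter was reset.
    """
    hit = curr["alluxio_cached_data_read"] - prev["alluxio_cached_data_read"]
    miss = curr["alluxio_missed_data_read"] - prev["alluxio_missed_data_read"]
    if hit < 0 or miss < 0:
        return None  # a counter went backwards, e.g. after a process restart
    total = hit + miss
    return hit / total if total > 0 else None


# Illustrative values only.
previous = {"alluxio_cached_data_read": 1000, "alluxio_missed_data_read": 500}
current = {"alluxio_cached_data_read": 4000, "alluxio_missed_data_read": 1500}
ratio = byte_hit_ratio(previous, current)
```

Using deltas rather than raw totals keeps the ratio representative of the sampled interval instead of the whole process lifetime.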

### S3 API

| Metric                            | Labels                       | Type      | Component | Description                                            |
| --------------------------------- | ---------------------------- | --------- | --------- | ------------------------------------------------------ |
| `alluxio_s3_api_throughput`       | `method`                     | histogram | worker    | histogram of S3 API throughput                         |
| `alluxio_s3_api_call_latency_ms`  | `method`, `state`            | histogram | worker    | latency of S3 API calls                                |
| `alluxio_s3_api_call_processing`  | `method`                     | gauge     | worker    | number of the S3 API calls that are being processed    |
| `alluxio_s3_authn_latency_ms`     | `result`, `reason`           | histogram | worker    | Latency of S3 authentication.                          |
| `alluxio_s3_authz_latency_ms`     | `result`, `method`, `reason` | histogram | worker    | Latency of S3 authorization.                           |
| `alluxio_sts_api_call_processing` | `method`                     | gauge     | worker    | Number of the STS API calls that are being processed.  |
| `alluxio_sts_requests_total`      | `result`, `reason`           | counter   | worker    | Total number of STS requests.                          |
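
Latency histograms such as `alluxio_s3_api_call_latency_ms` are usually read as percentiles. The sketch below estimates a quantile by linear interpolation inside the matching bucket, in the spirit of PromQL's `histogram_quantile`. It assumes Prometheus-style cumulative buckets keyed by an upper bound (`le`). The bucket boundaries, the counts, and the function name `estimate_quantile` are hypothetical.

```python
import math


def estimate_quantile(q, buckets):
    """Estimate the q-quantile from cumulative histogram buckets.

    `buckets` is a list of (upper_bound, cumulative_count) pairs sorted by
    upper bound, ending with an (inf, total_count) bucket.
    """
    total = buckets[-1][1]
    if total == 0:
        return None
    rank = q * total
    prev_bound, prev_count = 0.0, 0
    for bound, count in buckets:
        if count >= rank:
            if math.isinf(bound):
                return prev_bound  # cannot interpolate inside the +Inf bucket
            if count == prev_count:
                return bound
            fraction = (rank - prev_count) / (count - prev_count)
            return prev_bound + (bound - prev_bound) * fraction
        prev_bound, prev_count = bound, count
    return prev_bound


# Illustrative buckets in milliseconds: (upper bound, cumulative count).
latency_buckets = [(10.0, 50), (50.0, 90), (100.0, 100), (math.inf, 100)]
p95 = estimate_quantile(0.95, latency_buckets)
```

The estimate is only as precise as the bucket layout: every observation inside a bucket is treated as evenly spread between its bounds.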

### FUSE

| Metric                               | Labels                 | Type      | Component | Description                                                                                                                          |
| ------------------------------------ | ---------------------- | --------- | --------- | ------------------------------------------------------------------------------------------------------------------------------------ |
| `alluxio_fuse_concurrency`           | `method`               | gauge     | fuse      | record the realtime concurrency for fuse method                                                                                      |
| `alluxio_fuse_call_latency_ms`       | `method`, `state`      | histogram | fuse      | latency of fuse operations                                                                                                           |
| `alluxio_fuse_result`                | `method`, `state`      | counter   | fuse      | counter of fuse operation results                                                                                                    |
| `alluxio_fuse_path_cache_hits_bytes` | -                      | counter   | fuse      | counter of fuse path cache hits                                                                                                      |
| `alluxio_fuse_path_cache_misses`     | -                      | counter   | fuse      | counter of fuse path cache misses                                                                                                    |
| `alluxio_fuse_buffer_size`           | `method`, `sequential` | histogram | fuse      | Record sequential or random read/write and its buffer size.                                                                          |
| `alluxio_fuse_block_size`            | `method`               | histogram | fuse      | Record the block size during random reads/writes. 'Block size' can be understood as the 'bs' parameter specified during fio testing. |
| `alluxio_fuse_open_files`            | -                      | gauge     | fuse      | The number of fuse open files.                                                                                                       |

### Client SDK

| Metric                                                      | Labels                                      | Type      | Component                 | Description                                                                                                                                        |
| ----------------------------------------------------------- | ------------------------------------------- | --------- | ------------------------- | -------------------------------------------------------------------------------------------------------------------------------------------------- |
| `alluxio_grpc_client_call_latency_ms`                       | `method`, `instance`, `state`               | histogram | worker, coordinator, fuse | latency of gRPC calls from the client                                                                                                              |
| `alluxio_grpc_client_concurrency`                           | `method`, `instance`                        | gauge     | worker, coordinator, fuse | concurrency of gRPC calls from the client                                                                                                          |
| `alluxio_grpc_client_errors`                                | `method`, `status_code`, `instance`         | counter   | worker, coordinator, fuse | total number of gRPC errors from the client                                                                                                        |
| `alluxio_grpc_client_successes`                             | `method`, `instance`                        | counter   | worker, coordinator, fuse | total number of successful gRPC calls from the client                                                                                              |
| `alluxio_netty_operations`                                  | `op`                                        | counter   | worker, coordinator, fuse | number of netty operations (e.g. read and write requests)                                                                                          |
| `alluxio_netty_operation_errors`                            | `op`, `reason`, `instance`                  | counter   | worker, coordinator, fuse | total number of Netty operation errors from the client                                                                                             |
| `alluxio_read_from_workers`                                 | `instance`                                  | counter   | worker, coordinator, fuse | total number of client read bytes from worker                                                                                                      |
| `alluxio_async_prefetch_cache_bytes`                        | `instance`                                  | counter   | worker, coordinator, fuse | total number of bytes that the client asynchronously prefetched to the local cache                                                                 |
| `alluxio_async_prefetch_hit_cache_bytes`                    | `instance`                                  | counter   | worker, coordinator, fuse | total number of bytes that the client read from the async prefetch cache                                                                           |
| `alluxio_async_prefetch_random_read_requests`               | `instance`                                  | counter   | worker, coordinator, fuse | total number of client random reads recorded by async prefetch                                                                                     |
| `alluxio_multi_replica_read_from_workers`                   | `cluster_name`, `local_cluster`, `hot_read` | counter   | worker                    | number of bytes read by a client from Alluxio workers when reading multi-replica files                                                             |
| `alluxio_rpc_retry_on_different_workers`                    | `op`, `retry_count`                         | counter   | worker                    | counter of client retries on different workers when multi-replica is enabled.                                                                      |
| `alluxio_rpc_position_reader_read_calls`                    | `component`                                 | counter   | worker                    | counter of successful reads by the client position reader.                                                                                         |
| `alluxio_rpc_position_reader_data_read`                     | `component`                                 | counter   | worker                    | counter of bytes read by client position reader.                                                                                                   |
| `alluxio_rpc_position_reader_read_failed_total`             | `component`, `final_attempt`                | counter   | worker                    | counter of failed reads by the client position reader.                                                                                             |
| `alluxio_client_netty_read_time_to_receive_first_packet_ms` | -                                           | histogram | fuse                      | latency between when the client sends a read request to the worker, and when the worker sends the first packet of the response back to the client. |

### Job Service

| Metric                                          | Labels                              | Type    | Component   | Description                                                               |
| ----------------------------------------------- | ----------------------------------- | ------- | ----------- | ------------------------------------------------------------------------- |
| `alluxio_completed_job`                         | `type`, `state`                     | counter | coordinator | counter of the completed jobs.                                            |
| `alluxio_job_process_file`                      | `type`, `state`                     | counter | coordinator | counter of the files processed by the job service.                        |
| `alluxio_job_process_file_size`                 | `type`, `state`                     | counter | coordinator | cumulative size of the files that are processed by job service.           |
| `alluxio_active_job_count`                      | `type`                              | gauge   | coordinator | number of jobs in scheduler. The value of `type` is running or waiting    |
| `alluxio_distributed_load_job_dispatched_size`  | -                                   | counter | coordinator | counter of the bytes dispatched in distributed load                       |
| `alluxio_distributed_load_job_failure`          | `reason`, `final_attempt`, `worker` | counter | coordinator | counter of the distributed load failures                                  |
| `alluxio_distributed_load_job_loaded_bytes`     | -                                   | counter | coordinator | counter of the bytes loaded in distributed load                           |
| `alluxio_distributed_load_job_processed`        | -                                   | counter | coordinator | counter of the non empty file copies loaded in distributed load           |
| `alluxio_distributed_load_job_scanned`          | -                                   | counter | coordinator | counter of the inodes scanned in distributed load                         |
| `alluxio_distributed_load_job_skipped`          | -                                   | counter | coordinator | counter of the inodes skipped in distributed load                         |
| `alluxio_distributed_load_data_loaded`          | -                                   | counter | worker      | counter of the bytes loaded by a worker in distributed load               |
| `alluxio_distributed_load_data_loaded_from_ufs` | -                                   | counter | worker      | counter of the bytes loaded by a worker from ufs in distributed load      |
| `alluxio_worker_job_task_count`                 | -                                   | gauge   | coordinator | Number of tasks currently executed by each worker.                        |

### Write Cache

| Metric                                                   | Labels                     | Type      | Component | Description                                                  |
| -------------------------------------------------------- | -------------------------- | --------- | --------- | ------------------------------------------------------------ |
| `alluxio_write_buffer_write_status`                      | `status`                   | counter   | worker    | the status of write buffer writes                            |
| `alluxio_write_buffer_worker_failure`                    | `worker`                   | counter   | worker    | the failure count of writes to workers                       |
| `alluxio_write_buffer_worker_bytes_written`              | `worker`                   | counter   | worker    | the bytes written to workers                                 |
| `alluxio_write_buffer_unique_bytes_written`              | -                          | counter   | worker    | the unique bytes written by the client                       |
| `alluxio_write_buffer_foundationdb_call_latency_ms`      | `method`, `state`          | histogram | worker    | the latency of FoundationDB calls                            |
| `alluxio_write_buffer_persist_tasks`                     | `status`                   | counter   | worker    | the number of persist tasks                                  |
| `alluxio_write_buffer_transition_worker`                 | `worker`                   | counter   | worker    | the number of worker transitions                             |
| `alluxio_write_buffer_async_persist_throughput`          | -                          | counter   | worker    | the throughput of async persist                              |
| `alluxio_write_buffer_async_file_checker_abnormal_files` | -                          | counter   | worker    | the number of abnormal files found by the async file checker |
| `alluxio_dual_buffer_file_system_requests`               | `operation`, `buffer_type` | counter   | worker    | counter of requests to dual buffer file system.              |

### Authorization

| Metric                                       | Labels | Type    | Component    | Description                                                             |
| -------------------------------------------- | ------ | ------- | ------------ | ----------------------------------------------------------------------- |
| `alluxio_auth_permission_check_total`        | -      | counter | worker, fuse | The total number of authorization permission checks.                    |
| `alluxio_auth_permission_check_cache_misses` | -      | counter | worker, fuse | The total number of misses in the authorization permission check cache. |

### Cluster & Process

| Metric                                                         | Labels                                             | Type      | Component                 | Description                                                                      |
| -------------------------------------------------------------- | -------------------------------------------------- | --------- | ------------------------- | -------------------------------------------------------------------------------- |
| `alluxio_version`                                              | `version`                                          | gauge     | worker, coordinator, fuse | Alluxio component version information                                            |
| `alluxio_license_expiration_date`                              | -                                                  | gauge     | coordinator               | the license expiration date in epoch time format                                 |
| `alluxio_cumulative_unavailable_workers`                       | `worker_addr`                                      | counter   | worker, coordinator       | cumulative number of times a client encountered an unavailable worker            |
| `alluxio_unavailable_worker_probe_attempts`                    | `worker_addr`, `attempt`                           | counter   | worker, coordinator       | number of liveness probe attempts for an unavailable worker                      |
| `alluxio_worker_membership_refresh_count`                      | -                                                  | counter   | worker                    | total number of worker membership refreshes                                      |
| `alluxio_dynamic_resource_pool_current_resources`              | `pool_name`                                        | gauge     | worker                    | current number of resources in the dynamic resource pool                         |
| `alluxio_dynamic_resource_pool_capacity`                       | `pool_name`                                        | gauge     | worker                    | capacity of the dynamic resource pool                                            |
| `alluxio_dynamic_resource_pool_acquisition_timeouts`           | `pool_name`                                        | counter   | worker                    | total number of acquisition timeouts in the dynamic resource pool                |
| `alluxio_dynamic_resource_pool_create_new_resource_latency_ms` | `pool_name`                                        | histogram | worker                    | latency of creating a new resource in the dynamic resource pool                  |
| `alluxio_etcd_call_errors`                                     | `type`                                             | counter   | coordinator, worker, fuse | total number of etcd call errors                                                 |
| `alluxio_etcd_client_calls`                                    | `type`                                             | counter   | coordinator, worker, fuse | total number of etcd client calls                                                |
| `alluxio_etcd_client_call_latency_ms`                          | `type`                                             | histogram | coordinator, worker, fuse | latency of etcd client calls                                                     |
| `alluxio_netty_direct_memory_usage`                            | -                                                  | gauge     | worker                    | direct memory usage of Netty                                                     |
| `alluxio_rocksdb_memory_usage`                                 | -                                                  | gauge     | worker                    | memory usage of RocksDB                                                          |
| `process_start_time_seconds`                                   | -                                                  | gauge     | coordinator, worker, fuse | start time of the process since Unix epoch in seconds                            |
| `process_cpu_seconds_total`                                    | -                                                  | counter   | coordinator, worker, fuse | total user and system CPU time spent in seconds                                  |
| `jvm_threads_current`                                          | -                                                  | gauge     | coordinator, worker, fuse | current thread count of a JVM                                                    |
| `jvm_memory_used_bytes`                                        | `area=heap/nonheap`                                | gauge     | coordinator, worker, fuse | used bytes of a given JVM memory area                                            |
| `jvm_memory_max_bytes`                                         | `area=heap/nonheap`                                | gauge     | coordinator, worker, fuse | maximum bytes of a given JVM memory area                                         |
| `jvm_gc_collection_seconds`                                    | `gc="G1 Young Generation"/"G1 Old Generation"/...` | summary   | coordinator, worker, fuse | time spent in a given JVM garbage collector in seconds                           |
| `jvm_buffer_pool_used_bytes`                                   | `pool=direct/mapped`                               | gauge     | coordinator, worker, fuse | used bytes of a given JVM buffer pool                                            |
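
As an illustration of combining the JVM memory gauges, the sketch below parses a few lines of Prometheus text-format output and computes utilization per memory area as `jvm_memory_used_bytes / jvm_memory_max_bytes`. The sample text and its values are hypothetical, and the parser only handles the single-label form shown here. It also assumes that a non-positive maximum means the limit is undefined for that area, so such areas are skipped.

```python
import re

# Hypothetical scrape excerpt; the values are made up for illustration.
SAMPLE = """\
jvm_memory_used_bytes{area="heap"} 536870912.0
jvm_memory_used_bytes{area="nonheap"} 104857600.0
jvm_memory_max_bytes{area="heap"} 2147483648.0
jvm_memory_max_bytes{area="nonheap"} -1.0
"""

# Matches lines such as: metric_name{area="heap"} 123.0
LINE = re.compile(r'^(\w+)\{area="(\w+)"\}\s+(\S+)$')


def memory_utilization(text):
    """Return {area: used / max} for areas that report a positive maximum."""
    used, maxes = {}, {}
    for line in text.splitlines():
        match = LINE.match(line.strip())
        if not match:
            continue  # skip comments, blank lines, and other metrics
        name, area, value = match.group(1), match.group(2), float(match.group(3))
        if name == "jvm_memory_used_bytes":
            used[area] = value
        elif name == "jvm_memory_max_bytes":
            maxes[area] = value
    # Skip areas whose maximum is missing or non-positive (assumed undefined).
    return {a: used[a] / maxes[a] for a in used if maxes.get(a, 0) > 0}


print(memory_utilization(SAMPLE))
```

In a real deployment the same ratio would more commonly be expressed as a query in the monitoring system; the Python form is only meant to make the arithmetic explicit.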
