# Metrics

## Alluxio Metrics

### Cache Storage

| Metric | Labels | Type | Component | Description |
| --- | --- | --- | --- | --- |
| `alluxio_cached_storage` | `type` | gauge | worker | Amount of data cached |
| `alluxio_cached_capacity` | `type` | gauge | worker | Configured maximum cache storage capacity |
| `alluxio_data_cached_pages` | - | gauge | worker | Number of pages cached in the page store |
| `alluxio_data_cached_files` | - | gauge | worker | Number of files cached, including fully and partially cached files |
| `alluxio_cached_storage_by_priority` | `priority` | gauge | worker | Amount of data cached, broken down by priority |
| `alluxio_eviction_by_ttl` | `policy_path` | counter | worker | Total bytes evicted from the Alluxio worker by TTL policies |
| `alluxio_quota_size_used` | `dir` | gauge | coordinator | Bytes used within the given quota scope |
| `alluxio_quota_size_capacity` | `dir` | gauge | coordinator | Capacity of the given quota scope, as defined in the quota rule |
| `alluxio_eviction_by_quota` | `dir` | counter | worker | Total bytes evicted from the Alluxio worker by quota rules |
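
As a rough illustration of how the two capacity gauges above can be combined, the sketch below estimates cache utilization from a scraped metrics payload. It assumes the metrics are exposed in the Prometheus text format and that summing samples across `type` label values is meaningful for your deployment; the helper names `sum_metric` and `cache_utilization` and the label values in the usage example are hypothetical.

```python
import re
from typing import Optional

# Matches "<name>{labels} <value>" or "<name> <value>" in Prometheus text format.
# Simplification: label values containing "}" are not handled.
SAMPLE_RE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{[^}]*\})?\s+(\S+)')


def sum_metric(exposition_text: str, name: str) -> float:
    """Sum every sample of the metric `name`, across all label sets."""
    total = 0.0
    for line in exposition_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):  # skip blanks, HELP and TYPE lines
            continue
        match = SAMPLE_RE.match(line)
        if match and match.group(1) == name:
            total += float(match.group(2))
    return total


def cache_utilization(exposition_text: str) -> Optional[float]:
    """Return cached bytes / configured capacity, or None if capacity is unknown."""
    capacity = sum_metric(exposition_text, "alluxio_cached_capacity")
    if capacity <= 0:
        return None
    return sum_metric(exposition_text, "alluxio_cached_storage") / capacity


if __name__ == "__main__":
    # Hypothetical scrape output; the label values are made up for illustration.
    sample = '''
# TYPE alluxio_cached_storage gauge
alluxio_cached_storage{type="a"} 250
alluxio_cached_storage{type="b"} 250
alluxio_cached_capacity{type="a"} 500
alluxio_cached_capacity{type="b"} 500
'''
    print(cache_utilization(sample))
```

A utilization that stays close to 1.0 while `alluxio_eviction_by_ttl` or `alluxio_eviction_by_quota` keeps growing may suggest the cache is under pressure, though the right threshold depends on the workload.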

### Cache Access

| Metric | Labels | Type | Component | Description |
| --- | --- | --- | --- | --- |
| `alluxio_data_access` | `method` | histogram | worker | Aggregate of all data access requests |
| `alluxio_data_throughput` | `dir`, `method`, `destination` | counter | worker | Counter of data throughput across all data accesses |
| `alluxio_meta_operation` | `op` | counter | worker | Counter of RPC calls for metadata operations |
| `alluxio_meta_operation_latency_ms` | `op`, `state` | histogram | worker | Latency of RPC calls for metadata operations |
| `alluxio_meta_operation_errors` | `op` | counter | worker | Counter of errors raised while handling RPC calls for metadata operations |
| `alluxio_ufs_error` | `ufs_type`, `error_code` | counter | worker | Counter of UFS errors, by UFS type and error code |
| `alluxio_ufs_latency_ms` | `method`, `ufs_type` | histogram | worker | Histogram of UFS call latency |
| `alluxio_ufs_client_latency_ms` | `method`, `ufs_type`, `state` | histogram | worker | Histogram of UFS client API call latency |
| `alluxio_ufs_client_call_processing` | `method`, `ufs_type` | gauge | worker | Number of UFS client calls currently being processed |
| `alluxio_ufs_data_access` | `dir`, `method` | counter | worker | UFS access volume |
| `alluxio_ufs_fallback` | `method` | counter | worker | UFS fallback volume |
| `alluxio_cached_data_read` | `dir`, `is_pinned` | counter | worker | Amount of data read that hit the Alluxio cache and was served from it |
| `alluxio_missed_data_read` | `dir`, `is_pinned` | counter | worker | Amount of data read that missed the Alluxio cache |
| `alluxio_cache_hit_calls` | - | counter | worker | Number of cache hits in the page store |
| `alluxio_cache_miss_calls` | - | counter | worker | Number of cache misses in the page store |
| `alluxio_external_data_read` | `dir` | counter |  | Amount of data read on client cache misses |
| `alluxio_cleared_stale_cached_data` | - | counter | worker | Amount of stale cached data cleared |
| `alluxio_cached_evicted_data` | - | counter | worker | Amount of data evicted |
| `alluxio_cached_async_evicted_data` | - | counter | worker | Amount of data evicted asynchronously |
| `alluxio_trigger_async_evicted_total` | - | counter | worker | Counter of times asynchronous eviction was triggered |
| `alluxio_page_store_operation_errors` | `op`, `cause` | counter | worker | Counter of failed page store operations |
| `alluxio_page_store_dir_operation_errors` | `dir` | counter | worker | Counter of failures in a specific page store directory |
| `alluxio_page_store_dir_operations` | `dir` | counter | worker | Counter of operations in a specific page store directory |
| `alluxio_page_store_io_latency_microseconds` | `dir`, `op`, `success` | histogram | worker | Latency of IO operations in the page store |
| `alluxio_metadata_cache_hit_calls` | `type` | counter | worker | Counter of metadata retrieval calls that hit the cache |
| `alluxio_external_file_metadata_request_calls` | - | counter | worker | Counter of calls that fetch file metadata from the UFS, usually as the result of a file metadata cache miss |
| `alluxio_metadata_cache_miss_calls` | `type` | counter | worker | Counter of metadata retrieval calls that missed the cache. A file cache miss does not always lead to a request to the external data source, so this metric differs from `alluxio_external_file_metadata_request_calls`. |
| `alluxio_passive_cache_async_loaded_files` | `result` | counter | worker | Number of files loaded asynchronously when passive cache is enabled |
| `alluxio_page_store_device_total_capacity` | `dir` | gauge | worker | Total capacity of the physical storage device that hosts the page store directory |
| `alluxio_page_store_device_available_capacity` | `dir` | gauge | worker | Available capacity of the physical storage device that hosts the page store directory |
| `alluxio_metastore_storage_size` | `dir` | gauge | worker | Total logical size of the files in the metastore RocksDB directory |
| `alluxio_metastore_disk_capacity` | `dir` | gauge | worker | Capacity of the disk that hosts the metastore RocksDB |
| `alluxio_netty_data_ingress` | - | counter | worker | Inbound bytes from clients to the worker (excluding TLS) |
| `alluxio_netty_data_egress` | - | counter | worker | Outbound bytes from the worker to clients (excluding TLS) |
| `alluxio_worker_thread_pool_rejections` | `dir` | counter | worker | Counter of rejections in the worker thread pool |
| `alluxio_rpc_executor_current_queue_length` | `executor_name` | gauge | worker | Number of RPC requests currently being processed or pending |
| `alluxio_rpc_executor_active_threads` | `executor_name` | gauge | worker | Number of active threads executing RPCs |
| `alluxio_rpc_executor_current_threads` | `executor_name` | gauge | worker | Number of threads available for executing RPCs (both busy and idle) |
| `alluxio_rpc_executor_max_threads` | `executor_name` | gauge | worker | Maximum number of threads for executing RPCs |
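
Because `alluxio_cached_data_read` and `alluxio_missed_data_read` are counters, a hit ratio is usually derived from the increase between two scrapes rather than from the raw values. The following is a minimal sketch of that arithmetic; the function name `byte_hit_ratio` and the choice to return `None` on a counter reset are assumptions made for illustration, not part of Alluxio.

```python
from typing import Optional


def byte_hit_ratio(cached_start: float, cached_end: float,
                   missed_start: float, missed_end: float) -> Optional[float]:
    """Byte-level cache hit ratio over the interval between two counter snapshots.

    Returns None when a counter went backwards (for example after a worker
    restart) or when no data was read during the interval.
    """
    hit_bytes = cached_end - cached_start
    miss_bytes = missed_end - missed_start
    if hit_bytes < 0 or miss_bytes < 0:
        return None  # counter reset: the interval cannot be evaluated
    total = hit_bytes + miss_bytes
    if total == 0:
        return None  # no reads in the interval
    return hit_bytes / total


if __name__ == "__main__":
    # 300 bytes served from cache and 100 bytes missed during the interval.
    print(byte_hit_ratio(100, 400, 50, 150))
```

The same arithmetic applies to `alluxio_cache_hit_calls` and `alluxio_cache_miss_calls` if a call-level ratio is preferred over a byte-level one.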

### S3 API

| Metric | Labels | Type | Component | Description |
| --- | --- | --- | --- | --- |
| `alluxio_s3_api_throughput` | `method` | histogram | worker | Histogram of S3 API throughput |
| `alluxio_s3_api_call_latency_ms` | `method`, `state` | histogram | worker | S3 API call latency |
| `alluxio_s3_api_call_processing` | `method` | gauge | worker | Number of S3 API calls currently being processed |
| `alluxio_s3_authn_latency_ms` | `result`, `reason` | histogram | worker | S3 authentication latency |
| `alluxio_s3_authz_latency_ms` | `result`, `method`, `reason` | histogram | worker | S3 authorization latency |
| `alluxio_sts_api_call_processing` | `method` | gauge | worker | Number of STS API calls currently being processed |
| `alluxio_sts_requests_total` | `result`, `reason` | counter | worker | Total number of STS requests |

### FUSE

| Metric | Labels | Type | Component | Description |
| --- | --- | --- | --- | --- |
| `alluxio_fuse_concurrency` | `method` | gauge | fuse | Real-time concurrency of FUSE methods |
| `alluxio_fuse_call_latency_ms` | `method`, `state` | histogram | fuse | FUSE operation latency |
| `alluxio_fuse_result` | `method`, `state` | counter | fuse | Counter of FUSE operation results |
| `alluxio_fuse_path_cache_hits_bytes` | - | counter | fuse | Counter of FUSE path cache hits |
| `alluxio_fuse_path_cache_misses` | - | counter | fuse | Counter of FUSE path cache misses |
| `alluxio_fuse_buffer_size` | `method`, `sequential` | histogram | fuse | Records sequential or random reads and writes together with their buffer sizes |
| `alluxio_fuse_block_size` | `method` | histogram | fuse | Records the block size of random reads and writes. "Block size" here corresponds to the `bs` parameter specified in fio tests. |
| `alluxio_fuse_open_files` | - | gauge | fuse | Number of files opened by FUSE |

### Client SDK

| Metric | Labels | Type | Component | Description |
| --- | --- | --- | --- | --- |
| `alluxio_grpc_client_call_latency_ms` | `method`, `instance`, `state` | histogram | worker, coordinator, fuse | Client gRPC call latency |
| `alluxio_grpc_client_concurrency` | `method`, `instance` | gauge | worker, coordinator, fuse | Client gRPC call concurrency |
| `alluxio_grpc_client_errors` | `method`, `status_code`, `instance` | counter | worker, coordinator, fuse | Total number of client gRPC errors |
| `alluxio_grpc_client_successes` | `method`, `instance` | counter | worker, coordinator, fuse | Total number of successful client gRPC calls |
| `alluxio_netty_operations` | `op` | counter | worker, coordinator, fuse | Number of Netty operations (such as read and write requests) |
| `alluxio_netty_operation_errors` | `op`, `reason`, `instance` | counter | worker, coordinator, fuse | Total number of client Netty operation errors |
| `alluxio_read_from_workers` | `instance` | counter | worker, coordinator, fuse | Total bytes the client read from workers |
| `alluxio_async_prefetch_cache_bytes` | `instance` | counter | worker, coordinator, fuse | Total bytes the client asynchronously prefetched to local storage |
| `alluxio_async_prefetch_hit_cache_bytes` | `instance` | counter | worker, coordinator, fuse | Total bytes for which the client hit the async prefetch cache |
| `alluxio_async_prefetch_random_read_requests` | `instance` | counter | worker, coordinator, fuse | Total number of client random read requests recorded by async prefetch |
| `alluxio_multi_replica_read_from_workers` | `cluster_name`, `local_cluster`, `hot_read` | counter | worker | Bytes the client read from Alluxio workers when reading multi-replica files |
| `alluxio_rpc_retry_on_different_workers` | `op`, `retry_count` | counter | worker | Counter of client retries on different workers when multi-replica is enabled |
| `alluxio_rpc_position_reader_read_calls` | `component` | counter | worker | Counter of successful client position reader reads |
| `alluxio_rpc_position_reader_data_read` | `component` | counter | worker | Counter of bytes read by the client position reader |
| `alluxio_rpc_position_reader_read_failed_total` | `component`, `final_attempt` | counter | worker | Counter of failed client position reader reads |
| `alluxio_client_netty_read_time_to_receive_first_packet_ms` | - | histogram | fuse | Latency from the client sending a read request to the worker until the worker sends the first packet of the response back to the client |
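
Latency histograms such as `alluxio_grpc_client_call_latency_ms` are typically consumed as percentiles. The sketch below estimates a quantile from cumulative bucket counts by linear interpolation, in the spirit of the Prometheus `histogram_quantile` function. It assumes the histogram is exposed as Prometheus-style cumulative `le` buckets including a `+Inf` bucket; the function name `estimate_quantile` and the bucket boundaries in the usage example are hypothetical.

```python
import math
from typing import List, Optional, Tuple


def estimate_quantile(q: float,
                      buckets: List[Tuple[float, float]]) -> Optional[float]:
    """Estimate the q-quantile (0 <= q <= 1) from cumulative histogram buckets.

    `buckets` holds (upper_bound, cumulative_count) pairs and must include
    the +Inf bucket. Values inside a bucket are assumed evenly distributed.
    """
    buckets = sorted(buckets)
    if not buckets or not math.isinf(buckets[-1][0]):
        return None  # malformed input: the +Inf bucket is required
    total = buckets[-1][1]
    if total == 0:
        return None  # no observations
    rank = q * total
    prev_bound, prev_count = 0.0, 0.0
    for bound, count in buckets:
        if count >= rank:
            if math.isinf(bound):
                return prev_bound  # beyond the last finite bound: report that bound
            if count == prev_count:
                return bound
            # Interpolate linearly between the bucket's lower and upper bounds.
            fraction = (rank - prev_count) / (count - prev_count)
            return prev_bound + (bound - prev_bound) * fraction
        prev_bound, prev_count = bound, count
    return prev_bound


if __name__ == "__main__":
    # Hypothetical latency buckets in milliseconds: (le, cumulative count).
    latency_buckets = [(10.0, 50.0), (50.0, 90.0), (100.0, 100.0), (math.inf, 100.0)]
    print(estimate_quantile(0.5, latency_buckets))
```

The estimate is only as precise as the bucket boundaries allow, so a quantile that lands in a wide bucket should be read as an approximation.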

### Job Service

| Metric | Labels | Type | Component | Description |
| --- | --- | --- | --- | --- |
| `alluxio_completed_job` | `type`, `state` | counter | coordinator | Counter of jobs |
| `alluxio_job_process_file` | `type`, `state` | counter | coordinator | Counter of files |
| `alluxio_job_process_file_size` | `type`, `state` | counter | coordinator | Cumulative size of the files processed by the job service |
| `alluxio_active_job_count` | `type` | gauge | coordinator | Number of jobs in the scheduler; the `type` value is `running` or `waiting` |
| `alluxio_distributed_load_job_dispatched_size` | - | counter | coordinator | Counter of bytes dispatched in distributed load |
| `alluxio_distributed_load_job_failure` | `reason`, `final_attempt`, `worker` | counter | coordinator | Counter of distributed load failures |
| `alluxio_distributed_load_job_loaded_bytes` | - | counter | coordinator | Counter of bytes loaded in distributed load |
| `alluxio_distributed_load_job_processed` | - | counter | coordinator | Counter of non-empty file replicas loaded in distributed load |
| `alluxio_distributed_load_job_scanned` | - | counter | coordinator | Counter of inodes scanned in distributed load |
| `alluxio_distributed_load_job_skipped` | - | counter | coordinator | Counter of inodes skipped in distributed load |
| `alluxio_distributed_load_data_loaded` | - | counter | worker | Counter of bytes loaded by each worker in distributed load |
| `alluxio_distributed_load_data_loaded_from_ufs` | - | counter | worker | Counter of bytes loaded from the UFS by each worker in distributed load |
| `alluxio_worker_job_task_count` | - | gauge | coordinator | Number of tasks each worker is currently executing |

### Write Cache

| Metric | Labels | Type | Component | Description |
| --- | --- | --- | --- | --- |
| `alluxio_write_buffer_write_status` | `status` | counter | worker | Write buffer write status |
| `alluxio_write_buffer_worker_failure` | `worker` | counter | worker | Count of failed writes to workers |
| `alluxio_write_buffer_worker_bytes_written` | `worker` | counter | worker | Bytes written to workers |
| `alluxio_write_buffer_unique_bytes_written` | - | counter | worker | Unique bytes written by clients |
| `alluxio_write_buffer_foundationdb_call_latency_ms` | `method`, `state` | histogram | worker | FoundationDB call latency |
| `alluxio_write_buffer_persist_tasks` | `status` | counter | worker | Number of persist tasks |
| `alluxio_write_buffer_transition_worker` | `worker` | counter | worker | Number of worker transitions |
| `alluxio_write_buffer_async_persist_throughput` | - | counter | worker | Async persist throughput |
| `alluxio_write_buffer_async_file_checker_abnormal_files` | - | counter | worker | Number of abnormal files found by the async file checker |
| `alluxio_dual_buffer_file_system_requests` | `operation`, `buffer_type` | counter | worker | Counter of dual-buffer file system requests |

### Authorization

| Metric | Labels | Type | Component | Description |
| --- | --- | --- | --- | --- |
| `alluxio_auth_permission_check_total` | - | counter | worker, fuse | Total number of authorization permission checks |
| `alluxio_auth_permission_check_cache_misses` | - | counter | worker, fuse | Total number of authorization permission check cache misses |

### Cluster and Process

| Metric | Labels | Type | Component | Description |
| --- | --- | --- | --- | --- |
| `alluxio_version` | `version` | gauge | worker, coordinator, fuse | Alluxio component version information |
| `alluxio_license_expiration_date` | - | gauge | coordinator | License expiration date, expressed as epoch time |
| `alluxio_cumulative_unavailable_workers` | `worker_addr` | counter | worker, coordinator | Cumulative number of times clients encountered an unavailable worker |
| `alluxio_unavailable_worker_probe_attempts` | `worker_addr`, `attempt` | counter | worker, coordinator | Number of liveness probe attempts against unavailable workers |
| `alluxio_worker_membership_refresh_count` | - | counter | worker | Total number of worker membership refreshes |
| `alluxio_dynamic_resource_pool_current_resources` | `pool_name` | gauge | worker | Current number of resources in the dynamic resource pool |
| `alluxio_dynamic_resource_pool_capacity` | `pool_name` | gauge | worker | Capacity of the dynamic resource pool |
| `alluxio_dynamic_resource_pool_acquisition_timeouts` | `pool_name` | counter | worker | Total number of acquisition timeouts in the dynamic resource pool |
| `alluxio_dynamic_resource_pool_create_new_resource_latency_ms` | `pool_name` | histogram | worker | Latency of creating a new resource in the dynamic resource pool |
| `alluxio_etcd_call_errors` | `type` | counter | coordinator, worker, fuse | Total number of etcd call errors |
| `alluxio_etcd_client_calls` | `type` | counter | coordinator, worker, fuse | Total number of etcd client calls |
| `alluxio_etcd_client_call_latency_ms` | `type` | histogram | coordinator, worker, fuse | etcd client call latency |
| `alluxio_netty_direct_memory_usage` | - | gauge | worker | Netty direct memory usage |
| `alluxio_rocksdb_memory_usage` | - | gauge | worker | RocksDB memory usage |
| `process_start_time_seconds` | - | gauge | coordinator, worker, fuse | Start time of the process since the Unix epoch, in seconds |
| `process_cpu_seconds_total` | - | counter | coordinator, worker, fuse | Total user and system CPU time, in seconds |
| `jvm_threads_current` | - | gauge | coordinator, worker, fuse | Current thread count of the JVM |
| `jvm_memory_used_bytes` | `area=heap/nonheap` | gauge | coordinator, worker, fuse | Used bytes of a given JVM memory area |
| `jvm_memory_max_bytes` | `area=heap/nonheap` | gauge | coordinator, worker, fuse | Maximum bytes of a given JVM memory area |
| `jvm_gc_collection_seconds` | `gc="G1 Young Generation"/"G1 Old Generation"/...` | summary | coordinator, worker, fuse | Time spent in a given JVM garbage collector, in seconds |
| `jvm_buffer_pool_used_bytes` | `pool=direct/mapped` | gauge | coordinator, worker, fuse | Used bytes of a given JVM buffer pool |
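
Since `process_start_time_seconds` is a Unix timestamp, process uptime can be derived by subtracting it from the current time, which can help spot unexpected restarts. The following is a minimal sketch; the function name `uptime_seconds` and the sample timestamps are hypothetical.

```python
import time
from typing import Optional


def uptime_seconds(process_start_time_seconds: float,
                   now: Optional[float] = None) -> float:
    """Seconds elapsed since the process started.

    `now` defaults to the local clock; pass it explicitly to evaluate a
    historical scrape. Clock skew is clamped so the result is never negative.
    """
    if now is None:
        now = time.time()
    return max(0.0, now - process_start_time_seconds)


if __name__ == "__main__":
    # A process that started at t=1000 s, observed at t=4600 s, has run for one hour.
    print(uptime_seconds(1000.0, now=4600.0))
```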

