List of Metrics
There are two types of metrics in Alluxio: cluster-wide aggregated metrics and per-process detailed metrics.
Cluster metrics are collected and calculated by the leading master and displayed in the metrics tab of the web UI. These metrics are designed to provide a snapshot of the cluster state and the overall amount of data and metadata served by Alluxio.
Process metrics are collected by each Alluxio process and exposed in a machine-readable format through any configured sinks. Process metrics are highly detailed and are intended to be consumed by third-party monitoring tools. Users can then view fine-grained dashboards with time-series graphs of each metric, such as data transferred or the number of RPC invocations.
Metrics in Alluxio have the following format for master node metrics:
Metrics in Alluxio have the following format for non-master node metrics:
There is generally an Alluxio metric for every RPC invocation, to Alluxio or to the under store.
Tags are additional pieces of metadata for the metric such as user name or under storage location. Tags can be used to further filter or aggregate on various characteristics.
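As a rough illustration of filtering and aggregating by tag, the sketch below uses a hypothetical in-memory representation of metric samples (a name, a dictionary of tags, and a value). The `Sample` type, the helper functions, the tag keys `UFS` and `User`, and the sample data are all assumptions made for illustration; they are not Alluxio's metric name format or API.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, NamedTuple


class Sample(NamedTuple):
    """One metric reading; a hypothetical structure for illustration only."""
    name: str
    tags: Dict[str, str]
    value: float


def filter_by_tag(samples: Iterable[Sample], key: str, value: str) -> List[Sample]:
    """Keep only the samples whose tag `key` equals `value`."""
    return [s for s in samples if s.tags.get(key) == value]


def aggregate_by_tag(samples: Iterable[Sample], key: str) -> Dict[str, float]:
    """Sum sample values per distinct value of tag `key`."""
    totals: Dict[str, float] = defaultdict(float)
    for s in samples:
        if key in s.tags:
            totals[s.tags[key]] += s.value
    return dict(totals)


# Made-up readings; the tag keys and values are not taken from a real cluster.
samples = [
    Sample("BytesReadPerUfs", {"UFS": "ufs-a", "User": "alice"}, 100.0),
    Sample("BytesReadPerUfs", {"UFS": "ufs-b", "User": "alice"}, 50.0),
    Sample("BytesReadPerUfs", {"UFS": "ufs-a", "User": "bob"}, 25.0),
]

per_ufs = aggregate_by_tag(samples, "UFS")
alice_only = filter_by_tag(samples, "User", "alice")
```

In practice this kind of grouping is usually done in the monitoring tool that consumes the sink output rather than in custom code.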
Cluster Metrics
Workers and clients send metrics data to the Alluxio master through heartbeats. The interval is defined by the properties alluxio.master.worker.heartbeat.interval and alluxio.user.metrics.heartbeat.interval, respectively.
Bytes metrics are values aggregated from workers or clients. Bytes throughput metrics are calculated on the leading master. The value of a bytes throughput metric equals the corresponding bytes counter value divided by the metrics record time, and is shown as bytes per minute.
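The throughput calculation described above can be sketched as follows. This is a minimal illustration, not the master's actual implementation: the function name is hypothetical, and measuring the record time in milliseconds is an assumption.

```python
def bytes_per_minute(counter_bytes: int, record_time_ms: int) -> float:
    """Divide a bytes counter by the elapsed record time, expressed in minutes.

    Assumes the record time is given in milliseconds (an assumption made
    for this sketch). Returns 0.0 when no time has elapsed.
    """
    if record_time_ms <= 0:
        return 0.0
    minutes = record_time_ms / 60_000
    return counter_bytes / minutes


# Example: 6,000,000 bytes recorded over 2 minutes.
throughput = bytes_per_minute(6_000_000, 120_000)
```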
Cluster.ActiveRpcReadCount
COUNTER
The number of active read-RPCs managed by workers
Cluster.ActiveRpcWriteCount
COUNTER
The number of active write-RPCs managed by workers
Cluster.BytesReadDirect
COUNTER
Total number of bytes read from all workers without external RPC involved. Data exists in worker storage or is fetched by workers from UFSes. This records data read by worker internal calls (e.g. clients embedded in workers).
Cluster.BytesReadDirectThroughput
GAUGE
Bytes read per minute throughput from all workers without external RPC involved. Data exists in worker storage or is fetched by workers from UFSes. This records data read by worker internal calls (e.g. clients embedded in workers).
Cluster.BytesReadDomain
COUNTER
Total number of bytes read from all workers via domain socket
Cluster.BytesReadDomainThroughput
GAUGE
Bytes read per minute throughput from all workers via domain socket
Cluster.BytesReadLocal
COUNTER
Total number of bytes short-circuit read reported by all clients. Each client reads data from the collocated worker data storage directly.
Cluster.BytesReadLocalThroughput
GAUGE
Bytes per minute throughput short-circuit read reported by all clients
Cluster.BytesReadPerUfs
COUNTER
Total number of bytes read from a specific UFS by all workers
Cluster.BytesReadRemote
COUNTER
Total number of bytes read from all workers via network (RPC). Data exists in worker storage or is fetched by workers from UFSes. This does not include short-circuit local reads and domain socket reads
Cluster.BytesReadRemoteThroughput
GAUGE
Bytes read per minute throughput from all workers via network (RPC calls). Data exists in worker storage or is fetched by workers from UFSes. This does not include short-circuit local reads and domain socket reads
Cluster.BytesReadUfsAll
COUNTER
Total number of bytes read from all Alluxio UFSes by all workers
Cluster.BytesReadUfsThroughput
GAUGE
Bytes read per minute throughput from all Alluxio UFSes by all workers
Cluster.BytesWrittenDomain
COUNTER
Total number of bytes written to all workers via domain socket
Cluster.BytesWrittenDomainThroughput
GAUGE
Throughput of bytes written per minute to all workers via domain socket
Cluster.BytesWrittenLocal
COUNTER
Total number of bytes short-circuit written to local worker data storage by all clients
Cluster.BytesWrittenLocalThroughput
GAUGE
Bytes per minute throughput written to local worker data storage by all clients
Cluster.BytesWrittenPerUfs
COUNTER
Total number of bytes written to a specific Alluxio UFS by all workers
Cluster.BytesWrittenRemote
COUNTER
Total number of bytes written to workers via network (RPC). Data is written to worker storage or is written by workers to underlying UFSes. This does not include short-circuit local writes and domain socket writes.
Cluster.BytesWrittenRemoteThroughput
GAUGE
Bytes written per minute throughput to workers via network (RPC). Data is written to worker storage or is written by workers to underlying UFSes. This does not include short-circuit local writes and domain socket writes.
Cluster.BytesWrittenUfsAll
COUNTER
Total number of bytes written to all Alluxio UFSes by all workers
Cluster.BytesWrittenUfsThroughput
GAUGE
Bytes written per minute throughput to all Alluxio UFSes by all workers
Cluster.CacheHitRate
GAUGE
Cache hit rate: (# bytes read from cache) / (# bytes requested)
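The ratio above can be written as a small helper. This is only a sketch of the stated formula: the function and parameter names are hypothetical, and returning 0.0 when nothing has been requested is an assumption, not documented behavior.

```python
def cache_hit_rate(bytes_read_from_cache: int, bytes_requested: int) -> float:
    """(# bytes read from cache) / (# bytes requested).

    Returns 0.0 when no bytes were requested (an assumption of this sketch).
    """
    if bytes_requested == 0:
        return 0.0
    return bytes_read_from_cache / bytes_requested


# Example with made-up numbers: 750 of 1000 requested bytes came from cache.
rate = cache_hit_rate(750, 1000)
```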
Cluster.CapacityFree
GAUGE
Total free bytes on all tiers, on all workers of Alluxio
Cluster.CapacityTotal
GAUGE
Total capacity (in bytes) on all tiers, on all workers of Alluxio
Cluster.CapacityUsed
GAUGE
Total used bytes on all tiers, on all workers of Alluxio
Cluster.LeaderId
GAUGE
Display current leader id
Cluster.LeaderIndex
GAUGE
Index of current leader
Cluster.LostWorkers
GAUGE
Total number of lost workers inside the cluster
Cluster.RootUfsCapacityFree
GAUGE
Free capacity of the Alluxio root UFS in bytes
Cluster.RootUfsCapacityTotal
GAUGE
Total capacity of the Alluxio root UFS in bytes
Cluster.RootUfsCapacityUsed
GAUGE
Used capacity of the Alluxio root UFS in bytes
Cluster.Workers
GAUGE
Total number of active workers inside the cluster
Process Metrics
Metrics shared by all Alluxio server and client processes.
Process.pool.direct.mem.used
GAUGE
The direct memory used by the NIO direct buffer pool
Server Metrics
Metrics shared by the Alluxio server processes.
Server.JvmPauseMonitorInfoTimeExceeded
GAUGE
The total number of times that the JVM slept for longer than the info level threshold defined by alluxio.jvm.monitor.info.threshold
Server.JvmPauseMonitorTotalExtraTime
GAUGE
The total time that JVM slept and didn't do GC
Server.JvmPauseMonitorWarnTimeExceeded
GAUGE
The total number of times that the JVM slept for longer than the warn level threshold defined by alluxio.jvm.monitor.warn.threshold
Master Metrics
Default master metrics:
Master.AbsentCacheHits
GAUGE
Number of cache hits on the absent cache
Master.AbsentCacheMisses
GAUGE
Number of cache misses on the absent cache
Master.AbsentCacheSize
GAUGE
Size of the absent cache
Master.AbsentPathCacheQueueSize
GAUGE
Alluxio maintains a cache of absent UFS paths. This is the number of UFS paths being processed.
Master.AsyncPersistCancel
COUNTER
The number of cancelled AsyncPersist operations
Master.AsyncPersistFail
COUNTER
The number of failed AsyncPersist operations
Master.AsyncPersistFileCount
COUNTER
The number of files created by AsyncPersist operations
Master.AsyncPersistFileSize
COUNTER
The total size of files created by AsyncPersist operations
Master.AsyncPersistSuccess
COUNTER
The number of successful AsyncPersist operations
Master.AuditLogEntriesSize
GAUGE
The size of the audit log entries blocking queue
Master.BlockHeapSize
GAUGE
An estimate of the blocks heap size
Master.BlockReplicaCount
GAUGE
Total number of block replicas in Alluxio
Master.CachedBlockLocations
GAUGE
Total number of cached block locations
Master.CompleteFileOps
COUNTER
Total number of the CompleteFile operations
Master.CompletedOperationRetryCount
COUNTER
Total number of completed operations that have been retried by clients.
Master.CreateDirectoryOps
COUNTER
Total number of the CreateDirectory operations
Master.CreateFileOps
COUNTER
Total number of the CreateFile operations
Master.DeletePathOps
COUNTER
Total number of the Delete operations
Master.DirectoriesCreated
COUNTER
Total number of successful CreateDirectory operations
Master.EdgeCacheEvictions
GAUGE
Total number of edges (inode metadata) that were evicted from the cache. The edge cache is responsible for managing the mapping from (parentId, childName) to childId.
Master.EdgeCacheHits
GAUGE
Total number of hits in the edge (inode metadata) cache. The edge cache is responsible for managing the mapping from (parentId, childName) to childId.
Master.EdgeCacheLoadTimes
GAUGE
Total load times in the edge (inode metadata) cache that resulted from a cache miss. The edge cache is responsible for managing the mapping from (parentId, childName) to childId.
Master.EdgeCacheMisses
GAUGE
Total number of misses in the edge (inode metadata) cache. The edge cache is responsible for managing the mapping from (parentId, childName) to childId.
Master.EdgeCacheSize
GAUGE
Total number of edges (inode metadata) cached. The edge cache is responsible for managing the mapping from (parentId, childName) to childId.
Master.EdgeLockPoolSize
GAUGE
The size of master edge lock pool
Master.EmbeddedJournalLastSnapshotDownloadDiskSize
GAUGE
Describes the size on disk of the snapshot downloaded from other masters in the cluster the previous time the download occurred. Only valid when using the embedded journal.
Master.EmbeddedJournalLastSnapshotDownloadDurationMs
GAUGE
Describes the amount of time taken to download journal snapshots from other masters in the cluster the previous time the download occurred. Only valid when using the embedded journal.
Master.EmbeddedJournalLastSnapshotDownloadSize
GAUGE
Describes the size of the snapshot downloaded from other masters in the cluster the previous time the download occurred. Only valid when using the embedded journal.
Master.EmbeddedJournalLastSnapshotDurationMs
GAUGE
Describes the amount of time taken to generate the last local journal snapshots on this master. Only valid when using the embedded journal.
Master.EmbeddedJournalLastSnapshotEntriesCount
GAUGE
Describes the number of entries in the last local journal snapshots on this master. Only valid when using the embedded journal.
Master.EmbeddedJournalLastSnapshotReplayDurationMs
GAUGE
Represents the time the last restore from checkpoint operation took in milliseconds.
Master.EmbeddedJournalLastSnapshotReplayEntriesCount
GAUGE
Represents the number of entries replayed in the last restore from checkpoint operation.
Master.EmbeddedJournalLastSnapshotUploadDiskSize
GAUGE
Describes the size on disk of the snapshot uploaded to other masters in the cluster the previous time the upload occurred. Only valid when using the embedded journal.
Master.EmbeddedJournalLastSnapshotUploadDurationMs
GAUGE
Describes the amount of time taken to upload journal snapshots to another master in the cluster the previous time the upload occurred. Only valid when using the embedded journal.
Master.EmbeddedJournalLastSnapshotUploadSize
GAUGE
Describes the size of the snapshot uploaded to other masters in the cluster the previous time the upload occurred. Only valid when using the embedded journal.
Master.EmbeddedJournalSnapshotDownloadDiskHistogram
HISTOGRAM
Describes the size on disk of the snapshot downloaded from another master in the cluster. Only valid when using the embedded journal. Long running average.
Master.EmbeddedJournalSnapshotDownloadGenerate
TIMER
Describes the amount of time taken to download journal snapshots from other masters in the cluster. Only valid when using the embedded journal. Long running average.
Master.EmbeddedJournalSnapshotDownloadHistogram
HISTOGRAM
Describes the size of the snapshot downloaded from another master in the cluster. Only valid when using the embedded journal. Long running average.
Master.EmbeddedJournalSnapshotGenerateTimer
TIMER
Describes the amount of time taken to generate local journal snapshots on this master. Only valid when using the embedded journal. Use this metric to measure the performance of Alluxio's snapshot generation.
Master.EmbeddedJournalSnapshotInstallTimer
TIMER
Describes the amount of time taken to install a downloaded journal snapshot from another master. Only valid when using the embedded journal. Use this metric to determine the performance of Alluxio when installing snapshots from the leader. Higher numbers may indicate a slow disk or CPU contention.
Master.EmbeddedJournalSnapshotLastIndex
GAUGE
Represents the latest journal index that was recorded by this master in the most recent local snapshot or from a snapshot downloaded from another master in the cluster. Only valid when using the embedded journal.
Master.EmbeddedJournalSnapshotReplayTimer
TIMER
Describes the amount of time taken to replay a journal snapshot onto the master's state machine. Only valid when using the embedded journal. Use this metric to determine the performance of Alluxio when replaying a journal snapshot file. Higher numbers may indicate a slow disk or CPU contention.
Master.EmbeddedJournalSnapshotUploadDiskHistogram
HISTOGRAM
Describes the size on disk of the snapshot uploaded to another master in the cluster. Only valid when using the embedded journal. Long running average.
Master.EmbeddedJournalSnapshotUploadHistogram
HISTOGRAM
Describes the size of the snapshot uploaded to another master in the cluster. Only valid when using the embedded journal. Long running average.
Master.EmbeddedJournalSnapshotUploadTimer
TIMER
Describes the amount of time taken to upload journal snapshots to another master in the cluster. Only valid when using the embedded journal. Long running average.
Master.FileBlockInfosGot
COUNTER
Total number of successful GetFileBlockInfo operations
Master.FileInfosGot
COUNTER
Total number of successful GetFileInfo operations
Master.FileSize
GAUGE
File size distribution
Master.FilesCompleted
COUNTER
Total number of successful CompleteFile operations
Master.FilesCreated
COUNTER
Total number of successful CreateFile operations
Master.FilesFreed
COUNTER
Total number of successful FreeFile operations
Master.FilesPersisted
COUNTER
Total number of successfully persisted files
Master.FilesPinned
GAUGE
Total number of currently pinned files. Note that IDs for these files are stored in memory.
Master.FilesToBePersisted
GAUGE
Total number of currently to be persisted files. Note that the IDs for these files are stored in memory.
Master.FreeFileOps
COUNTER
Total number of FreeFile operations
Master.GetFileBlockInfoOps
COUNTER
Total number of GetFileBlockInfo operations
Master.GetFileInfoOps
COUNTER
Total number of the GetFileInfo operations
Master.GetNewBlockOps
COUNTER
Total number of the GetNewBlock operations
Master.InodeCacheEvictions
GAUGE
Total number of inodes that were evicted from the cache.
Master.InodeCacheHitRatio
GAUGE
Inode Cache hit ratio
Master.InodeCacheHits
GAUGE
Total number of hits in the inodes (inode metadata) cache.
Master.InodeCacheLoadTimes
GAUGE
Total load times in the inodes (inode metadata) cache that resulted from a cache miss.
Master.InodeCacheMisses
GAUGE
Total number of misses in the inodes (inode metadata) cache.
Master.InodeCacheSize
GAUGE
Total number of inodes (inode metadata) cached.
Master.InodeHeapSize
GAUGE
An estimate of the inode heap size
Master.InodeLockPoolSize
GAUGE
The size of master inode lock pool
Master.JobCanceled
COUNTER
The number of jobs in canceled status
Master.JobCompleted
COUNTER
The number of jobs in completed status
Master.JobCount
GAUGE
The number of jobs in any status
Master.JobCreated
COUNTER
The number of jobs in created status
Master.JobDistributedLoadBlockSizes
COUNTER
The total block size loaded by load commands
Master.JobDistributedLoadCancel
COUNTER
The number of cancelled DistributedLoad operations
Master.JobDistributedLoadFail
COUNTER
The number of failed DistributedLoad operations
Master.JobDistributedLoadFileCount
COUNTER
The number of files loaded by DistributedLoad operations
Master.JobDistributedLoadFileSizes
COUNTER
The total size of files loaded by DistributedLoad operations
Master.JobDistributedLoadRate
METER
The average DistributedLoad loading rate
Master.JobDistributedLoadSuccess
COUNTER
The number of successful DistributedLoad operations
Master.JobFailed
COUNTER
The number of jobs in failed status
Master.JobLoadBlockCount
COUNTER
The number of blocks loaded by load commands
Master.JobLoadBlockFail
COUNTER
The number of blocks that failed to be loaded by load commands
Master.JobLoadFail
COUNTER
The number of failed Load commands
Master.JobLoadRate
METER
The average loading rate of Load commands
Master.JobLoadSuccess
COUNTER
The number of successful Load commands
Master.JobRunning
COUNTER
The number of jobs in running status
Master.JournalCheckpointWarn
GAUGE
If the raft log index exceeds alluxio.master.journal.checkpoint.period.entries and the time since the last checkpoint exceeds alluxio.master.journal.checkpoint.warning.threshold.time, this metric returns 1 to indicate that a warning is required; otherwise it returns 0.
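The two-condition check behind this gauge can be sketched as below. This is an illustrative reading of the description, not Alluxio's implementation: the function and parameter names are hypothetical, the time condition is interpreted as the time elapsed since the last checkpoint, and milliseconds are an assumed unit.

```python
def journal_checkpoint_warn(
    raft_log_index: int,
    checkpoint_period_entries: int,
    ms_since_last_checkpoint: int,
    warning_threshold_ms: int,
) -> int:
    """Return 1 only when both thresholds are exceeded, otherwise 0.

    checkpoint_period_entries stands for the value of
    alluxio.master.journal.checkpoint.period.entries, and
    warning_threshold_ms for the value of
    alluxio.master.journal.checkpoint.warning.threshold.time.
    """
    too_many_entries = raft_log_index > checkpoint_period_entries
    too_long_ago = ms_since_last_checkpoint > warning_threshold_ms
    return 1 if (too_many_entries and too_long_ago) else 0


# Made-up values: both limits exceeded, so a warning is indicated.
warn = journal_checkpoint_warn(
    raft_log_index=3_000,
    checkpoint_period_entries=2_000,
    ms_since_last_checkpoint=10_000,
    warning_threshold_ms=5_000,
)
```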
Master.JournalEntriesSinceCheckPoint
GAUGE
Journal entries since last checkpoint
Master.JournalFlushFailure
COUNTER
Total number of failed journal flushes
Master.JournalFlushTimer
TIMER
The timer statistics of journal flush
Master.JournalFreeBytes
GAUGE
Bytes left on the journal disk(s) for an Alluxio master. This metric is only valid on Linux and when embedded journal is used. Use this metric to monitor whether your journal is running out of disk space.
Master.JournalFreePercent
GAUGE
Percentage of free space left on the journal disk(s) for an Alluxio master. This metric is only valid on Linux and when embedded journal is used. Use this metric to monitor whether your journal is running out of disk space.
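For an independent spot check of journal disk space outside of Alluxio, the free bytes and free percentage of a disk can be read with Python's standard `shutil.disk_usage`. This sketch is not how Alluxio computes the two metrics above, and the `journal_free_space` helper and the directory passed to it are placeholders.

```python
import shutil
from typing import Tuple


def journal_free_space(journal_dir: str) -> Tuple[int, float]:
    """Return (free bytes, free percentage) for the disk holding journal_dir."""
    usage = shutil.disk_usage(journal_dir)  # named tuple: total, used, free
    percent_free = 100.0 * usage.free / usage.total if usage.total else 0.0
    return usage.free, percent_free


# "." stands in for the journal directory; substitute the real path.
free_bytes, percent_free = journal_free_space(".")
print(f"{free_bytes} bytes free ({percent_free:.1f}%)")
```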
Master.JournalGainPrimacyTimer
TIMER
The timer statistics of journal gain primacy
Master.JournalLastAppliedCommitIndex
GAUGE
The last raft log index which was applied to the state machine
Master.JournalLastCheckPointTime
GAUGE
Last Journal Checkpoint Time
Master.JournalSequenceNumber
GAUGE
Current journal sequence number
Master.LastBackupEntriesCount
GAUGE
The total number of entries written in the last leading master metadata backup
Master.LastBackupRestoreCount
GAUGE
The total number of entries restored from backup when a leading master initializes its metadata
Master.LastBackupRestoreTimeMs
GAUGE
The process time of the last restore from backup
Master.LastBackupTimeMs
GAUGE
The process time of the last backup
Master.LastGainPrimacyTime
GAUGE
The last time the master gained primacy
Master.LastLosePrimacyTime
GAUGE
The last time the master lost primacy
Master.ListingCacheEvictions
COUNTER
The total number of evictions in master listing cache
Master.ListingCacheHits
COUNTER
The total number of hits in master listing cache
Master.ListingCacheLoadTimes
COUNTER
The total load time (in nanoseconds) in master listing cache that resulted from a cache miss.
Master.ListingCacheMisses
COUNTER
The total number of misses in master listing cache
Master.ListingCacheSize
GAUGE
The size of master listing cache
Master.LostBlockCount
GAUGE
Count of lost unique blocks
Master.LostFileCount
GAUGE
Count of lost files. This number is cached and may not be in sync with Master.LostBlockCount
Master.MetadataSyncActivePaths
COUNTER
The number of in-progress paths from all InodeSyncStream instances
Master.MetadataSyncExecutor
EXECUTOR_SERVICE
Metrics concerning the master metadata sync executor threads. Master.MetadataSyncExecutor.submitted is a meter of the tasks submitted to the executor. Master.MetadataSyncExecutor.completed is a meter of the tasks completed by the executor. Master.MetadataSyncExecutor.activeTaskQueue is an exponentially-decaying random reservoir of the number of active tasks (running or submitted) at the executor, calculated each time a new task is added to the executor. The max value is the maximum number of active tasks at any time during execution. Master.MetadataSyncExecutor.running is the number of tasks actively being run by the executor. Master.MetadataSyncExecutor.idle is the time spent idling by the submitted tasks (i.e. waiting in the queue before being executed). Master.MetadataSyncExecutor.duration is the time spent running the submitted tasks. If the executor is a thread pool executor, then Master.MetadataSyncExecutor.queueSize is the size of the task queue.
Master.MetadataSyncExecutorQueueSize
GAUGE
The number of queuing sync tasks in the metadata sync thread pool controlled by alluxio.master.metadata.sync.executor.pool.size
Master.MetadataSyncFail
COUNTER
The number of InodeSyncStream instances that failed, either partially or fully
Master.MetadataSyncNoChange
COUNTER
The number of InodeSyncStream instances that finished with no change to inodes.
Master.MetadataSyncOpsCount
COUNTER
The number of metadata sync operations. Each sync operation corresponds to one InodeSyncStream instance.
Master.MetadataSyncPathsCancel
COUNTER
The number of pending paths from all InodeSyncStream instances that are ignored in the end instead of processed
Master.MetadataSyncPathsFail
COUNTER
The number of paths that failed during metadata sync from all InodeSyncStream instances
Master.MetadataSyncPathsSuccess
COUNTER
The number of paths sync-ed from all InodeSyncStream instances
Master.MetadataSyncPendingPaths
COUNTER
The number of pending paths from all active InodeSyncStream instances, waiting for metadata sync
Master.MetadataSyncPrefetchCancel
COUNTER
Number of cancelled prefetch jobs from metadata sync
Master.MetadataSyncPrefetchExecutor
EXECUTOR_SERVICE
Metrics concerning the master metadata sync prefetch executor threads. Master.MetadataSyncPrefetchExecutor.submitted is a meter of the tasks submitted to the executor. Master.MetadataSyncPrefetchExecutor.completed is a meter of the tasks completed by the executor. Master.MetadataSyncPrefetchExecutor.activeTaskQueue is an exponentially-decaying random reservoir of the number of active tasks (running or submitted) at the executor, calculated each time a new task is added to the executor. The max value is the maximum number of active tasks at any time during execution. Master.MetadataSyncPrefetchExecutor.running is the number of tasks actively being run by the executor. Master.MetadataSyncPrefetchExecutor.idle is the time spent idling by the submitted tasks (i.e. waiting in the queue before being executed). Master.MetadataSyncPrefetchExecutor.duration is the time spent running the submitted tasks. If the executor is a thread pool executor, then Master.MetadataSyncPrefetchExecutor.queueSize is the size of the task queue.
Master.MetadataSyncPrefetchExecutorQueueSize
GAUGE
The number of queuing prefetch tasks in the metadata sync thread pool controlled by alluxio.master.metadata.sync.ufs.prefetch.pool.size
Master.MetadataSyncPrefetchFail
COUNTER
Number of failed prefetch jobs from metadata sync
Master.MetadataSyncPrefetchOpsCount
COUNTER
The number of prefetch operations handled by the prefetch thread pool
Master.MetadataSyncPrefetchPaths
COUNTER
Total number of UFS paths fetched by prefetch jobs from metadata sync
Master.MetadataSyncPrefetchRetries
COUNTER
Number of retries when getting results from prefetch jobs during metadata sync
Master.MetadataSyncPrefetchSuccess
COUNTER
Number of successful prefetch jobs from metadata sync
Master.MetadataSyncSkipped
COUNTER
The number of InodeSyncStream instances that are skipped because the Alluxio metadata is fresher than alluxio.user.file.metadata.sync.interval
Master.MetadataSyncSuccess
COUNTER
The number of InodeSyncStream instances that succeeded
Master.MetadataSyncTimeMs
COUNTER
The total time elapsed in all InodeSyncStream instances
Master.MetadataSyncUfsMount.
COUNTER
The number of UFS sync operations for a given mount point
Master.MigrateJobCancel
COUNTER
The number of cancelled MigrateJob operations
Master.MigrateJobFail
COUNTER
The number of failed MigrateJob operations
Master.MigrateJobFileCount
COUNTER
The number of MigrateJob files
Master.MigrateJobFileSize
COUNTER
The total size of MigrateJob files
Master.MigrateJobSuccess
COUNTER
The number of successful MigrateJob operations
Master.MountOps
COUNTER
Total number of Mount operations
Master.NewBlocksGot
COUNTER
Total number of successful GetNewBlock operations
Master.PathsDeleted
COUNTER
Total number of successful Delete operations
Master.PathsMounted
COUNTER
Total number of successful Mount operations
Master.PathsRenamed
COUNTER
Total number of successful Rename operations
Master.PathsUnmounted
COUNTER
Total number of successful Unmount operations
Master.RenamePathOps
COUNTER
Total number of Rename operations
Master.ReplicaMgmtActiveJobSize
GAUGE
Number of active block replication/eviction jobs. These jobs are created by the master to maintain the block replica factor. The value is an estimate with lag.
Master.ReplicationLimitedFiles
COUNTER
Number of files that have a replication count set to a non-default value. Note that these files have IDs that are stored in memory.
Master.RocksBlockBackgroundErrors
GAUGE
RocksDB block table. Accumulated number of background errors.
Master.RocksBlockBlockCacheCapacity
GAUGE
RocksDB block table. Block cache capacity.
Master.RocksBlockBlockCachePinnedUsage
GAUGE
RocksDB block table. Memory size for the entries being pinned.
Master.RocksBlockBlockCacheUsage
GAUGE
RocksDB block table. Memory size for the entries residing in block cache.
Master.RocksBlockCompactionPending
GAUGE
RocksDB block table. This metric reports 1 if at least one compaction is pending; otherwise, the metric reports 0.
Master.RocksBlockCurSizeActiveMemTable
GAUGE
RocksDB block table. Approximate size of active memtable in bytes.
Master.RocksBlockCurSizeAllMemTables
GAUGE
RocksDB block table. Approximate size of active and unflushed immutable memtables in bytes.
Master.RocksBlockEstimateNumKeys
GAUGE
RocksDB block table. Estimated number of total keys in the active and unflushed immutable memtables and storage.
Master.RocksBlockEstimatePendingCompactionBytes
GAUGE
RocksDB block table. Estimated total number of bytes a compaction needs to rewrite on disk to get all levels down to under target size. In other words, this metric relates to the write amplification in level compaction. Thus, this metric is not valid for compactions other than level-based.
Master.RocksBlockEstimateTableReadersMem
GAUGE
RocksDB block table. Estimated memory in bytes used for reading SST tables, excluding memory used in block cache (e.g., filter and index blocks). This metric records the memory used by iterators as well as filters and indices if the filters and indices are not maintained in the block cache. Basically this metric reports the memory used outside the block cache to read data.
Master.RocksBlockEstimatedMemUsage
GAUGE
RocksDB block table. This metric estimates the memory usage of the RocksDB block table by aggregating the values of Master.RocksBlockBlockCacheUsage, Master.RocksBlockEstimateTableReadersMem, Master.RocksBlockCurSizeAllMemTables, and Master.RocksBlockBlockCachePinnedUsage
Master.RocksBlockLiveSstFilesSize
GAUGE
RocksDB block table. Total size in bytes of all SST files that belong to the latest LSM tree.
Master.RocksBlockMemTableFlushPending
GAUGE
RocksDB block table. This metric returns 1 if a memtable flush is pending; otherwise it returns 0.
Master.RocksBlockNumDeletesActiveMemTable
GAUGE
RocksDB block table. Total number of delete entries in the active memtable.
Master.RocksBlockNumDeletesImmMemTables
GAUGE
RocksDB block table. Total number of delete entries in the unflushed immutable memtables.
Master.RocksBlockNumEntriesActiveMemTable
GAUGE
RocksDB block table. Total number of entries in the active memtable.
Master.RocksBlockNumEntriesImmMemTables
GAUGE
RocksDB block table. Total number of entries in the unflushed immutable memtables.
Master.RocksBlockNumImmutableMemTable
GAUGE
RocksDB block table. Number of immutable memtables that have not yet been flushed.
Master.RocksBlockNumLiveVersions
GAUGE
RocksDB block table. Number of live versions. More live versions often mean more SST files are held from being deleted, by iterators or unfinished compactions.
Master.RocksBlockNumRunningCompactions
GAUGE
RocksDB block table. Number of currently running compactions.
Master.RocksBlockNumRunningFlushes
GAUGE
RocksDB block table. Number of currently running flushes.
Master.RocksBlockSizeAllMemTables
GAUGE
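RocksDB block table. Approximate size of active, unflushed immutable, and pinned immutable memtables in bytes. Pinned immutable memtables are flushed memtables that are kept in memory to maintain write history in memory.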
Master.RocksBlockTotalSstFilesSize
GAUGE
RocksDB block table. Total size in bytes of all SST files.
Master.RocksInodeBackgroundErrors
GAUGE
RocksDB inode table. Accumulated number of background errors.
Master.RocksInodeBlockCacheCapacity
GAUGE
RocksDB inode table. Block cache capacity.
Master.RocksInodeBlockCachePinnedUsage
GAUGE
RocksDB inode table. Memory size for the entries being pinned.
Master.RocksInodeBlockCacheUsage
GAUGE
RocksDB inode table. Memory size for the entries residing in block cache.
Master.RocksInodeCompactionPending
GAUGE
RocksDB inode table. This metric reports 1 if at least one compaction is pending; otherwise, the metric reports 0.
Master.RocksInodeCurSizeActiveMemTable
GAUGE
RocksDB inode table. Approximate size of active memtable in bytes.
Master.RocksInodeCurSizeAllMemTables
GAUGE
RocksDB inode table. Approximate size of active and unflushed immutable memtables in bytes.
Master.RocksInodeEstimateNumKeys
GAUGE
RocksDB inode table. Estimated number of total keys in the active and unflushed immutable memtables and storage.
Master.RocksInodeEstimatePendingCompactionBytes
GAUGE
RocksDB inode table. Estimated total number of bytes a compaction needs to rewrite on disk to get all levels down to under target size. In other words, this metric relates to the write amplification in level compaction. Thus, this metric is not valid for compactions other than level-based.
Master.RocksInodeEstimateTableReadersMem
GAUGE
RocksDB inode table. Estimated memory in bytes used for reading SST tables, excluding memory used in block cache (e.g., filter and index blocks). This metric records the memory used by iterators as well as filters and indices if the filters and indices are not maintained in the block cache. Basically this metric reports the memory used outside the block cache to read data.
Master.RocksInodeEstimatedMemUsage
GAUGE
RocksDB inode table. This metric estimates the memory usage of the RocksDB inode table by aggregating the values of Master.RocksInodeBlockCacheUsage, Master.RocksInodeEstimateTableReadersMem, Master.RocksInodeCurSizeAllMemTables, and Master.RocksInodeBlockCachePinnedUsage
Master.RocksInodeLiveSstFilesSize
GAUGE
RocksDB inode table. Total size in bytes of all SST files that belong to the latest LSM tree.
Master.RocksInodeMemTableFlushPending
GAUGE
RocksDB inode table. This metric returns 1 if a memtable flush is pending; otherwise it returns 0.
Master.RocksInodeNumDeletesActiveMemTable
GAUGE
RocksDB inode table. Total number of delete entries in the active memtable.
Master.RocksInodeNumDeletesImmMemTables
GAUGE
RocksDB inode table. Total number of delete entries in the unflushed immutable memtables.
Master.RocksInodeNumEntriesActiveMemTable
GAUGE
RocksDB inode table. Total number of entries in the active memtable.
Master.RocksInodeNumEntriesImmMemTables
GAUGE
RocksDB inode table. Total number of entries in the unflushed immutable memtables.
Master.RocksInodeNumImmutableMemTable
GAUGE
RocksDB inode table. Number of immutable memtables that have not yet been flushed.
Master.RocksInodeNumLiveVersions
GAUGE
RocksDB inode table. Number of live versions. More live versions often mean more SST files are held from being deleted, by iterators or unfinished compactions.
Master.RocksInodeNumRunningCompactions
GAUGE
RocksDB inode table. Number of currently running compactions.
Master.RocksInodeNumRunningFlushes
GAUGE
RocksDB inode table. Number of currently running flushes.
Master.RocksInodeSizeAllMemTables
GAUGE
RocksDB inode table. Approximate size of active, unflushed immutable, and pinned immutable memtables in bytes. Pinned immutable memtables are flushed memtables that are kept in memory to maintain write history in memory.
Master.RocksInodeTotalSstFilesSize
GAUGE
RocksDB inode table. Total size in bytes of all SST files.
Master.RocksTotalEstimatedMemUsage
GAUGE
This metric gives an estimate of the total memory used by RocksDB by aggregating the values of Master.RocksBlockEstimatedMemUsage and Master.RocksInodeEstimatedMemUsage
Master.RoleId
GAUGE
Display master role id
Master.RpcQueueLength
GAUGE
Length of the master RPC queue. Use this metric to monitor the RPC pressure on the master.
Master.RpcThreadActiveCount
GAUGE
The number of threads that are actively executing tasks in the master RPC executor thread pool. Use this metric to monitor the RPC pressure on master.
Master.RpcThreadCurrentCount
GAUGE
Current count of threads in the master RPC executor thread pool. Use this metric to monitor the RPC pressure on master.
Master.SetAclOps
COUNTER
Total number of SetAcl operations
Master.SetAttributeOps
COUNTER
Total number of SetAttribute operations
Master.StartTime
GAUGE
The start time of the master process
Master.TTLBuckets
GAUGE
The number of TTL buckets at the master. Note that these buckets are stored in memory.
Master.TTLInodes
GAUGE
The total number of inodes contained in TTL buckets at the master. Note that these inodes are stored in memory.
Master.ToRemoveBlockCount
GAUGE
Count of block replicas to be removed from the workers. If 1 block is to be removed from 2 workers, 2 will be counted here.
Master.TotalPaths
GAUGE
Total number of files and directories in the Alluxio namespace
Master.TotalRpcs
TIMER
Throughput of master RPC calls. This metric indicates how busy the master is serving client and worker requests
Master.UfsJournalCatchupTimer
TIMER
The timer statistics of journal catchup. Only valid when the UFS journal is used. This provides a summary of how long a standby master takes to catch up with the primary master, and should be monitored if master transition takes too long
Master.UfsJournalFailureRecoverTimer
TIMER
The timer statistics of UFS journal failure recovery
Master.UfsJournalInitialReplayTimeMs
GAUGE
The processing time of the UFS journal initial replay. Only valid when the UFS journal is used. It records the time taken by the very first journal replay. Use this metric to investigate when your master boot-up time is high.
Master.UfsStatusCacheChildrenSize
COUNTER
Total number of UFS file metadata cached. The cache is used during metadata sync.
Master.UfsStatusCacheSize
COUNTER
Total number of Alluxio paths being processed by the metadata sync prefetch thread pool.
Master.UniqueBlocks
GAUGE
Total number of unique blocks in Alluxio
Master.UnmountOps
COUNTER
Total number of Unmount operations
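The RocksDB memory estimates listed above are described as sums of other gauges. As a minimal sketch of that arithmetic, the following Python snippet recomputes Master.RocksInodeEstimatedMemUsage and Master.RocksTotalEstimatedMemUsage from a snapshot of their components. The snapshot dictionary and its byte values are made up for illustration, and the helper function names are hypothetical; only the metric names come from the descriptions above.

```python
# Hypothetical snapshot of master gauges (values in bytes); the numbers are made up.
MIB = 1024 * 1024
snapshot = {
    "Master.RocksInodeBlockCacheUsage": 64 * MIB,
    "Master.RocksInodeEstimateTableReadersMem": 8 * MIB,
    "Master.RocksInodeCurSizeAllMemTables": 16 * MIB,
    "Master.RocksInodeBlockCachePinnedUsage": 2 * MIB,
    "Master.RocksBlockEstimatedMemUsage": 40 * MIB,
}

# Components named in the description of Master.RocksInodeEstimatedMemUsage.
INODE_COMPONENTS = (
    "Master.RocksInodeBlockCacheUsage",
    "Master.RocksInodeEstimateTableReadersMem",
    "Master.RocksInodeCurSizeAllMemTables",
    "Master.RocksInodeBlockCachePinnedUsage",
)


def inode_estimated_mem_usage(metrics):
    """Sum the four inode table components."""
    return sum(metrics[name] for name in INODE_COMPONENTS)


def total_estimated_mem_usage(metrics):
    """Add the block table estimate to the inode table estimate."""
    return metrics["Master.RocksBlockEstimatedMemUsage"] + inode_estimated_mem_usage(metrics)


inode_bytes = inode_estimated_mem_usage(snapshot)
total_bytes = total_estimated_mem_usage(snapshot)
print(f"inode table: {inode_bytes // MIB} MiB, total: {total_bytes // MIB} MiB")
```

Comparing the recomputed totals against the reported gauges can help spot which component dominates RocksDB memory use on the master.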
Dynamically generated master metrics:
Master.CapacityTotalTier{TIER_NAME}
Total capacity in tier {TIER_NAME} of the Alluxio file system in bytes
Master.CapacityUsedTier{TIER_NAME}
Used capacity in tier {TIER_NAME} of the Alluxio file system in bytes
Master.CapacityFreeTier{TIER_NAME}
Free capacity in tier {TIER_NAME} of the Alluxio file system in bytes
Master.UfsSessionCount-Ufs:{UFS_ADDRESS}
The total number of currently opened UFS sessions to connect to the given {UFS_ADDRESS}
Master.{UFS_RPC_NAME}.UFS:{UFS_ADDRESS}.UFS_TYPE:{UFS_TYPE}.User:{USER}
The details of UFS RPC operations performed by the current master
Master.PerUfsOp{UFS_RPC_NAME}.UFS:{UFS_ADDRESS}
The aggregated number of UFS operations {UFS_RPC_NAME} run on UFS {UFS_ADDRESS} by the leading master
Master.{LEADING_MASTER_RPC_NAME}
The duration statistics of RPC calls exposed on leading master
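Dynamically generated names embed their tags in the metric name itself, so a monitoring script usually has to split them apart again. The sketch below parses the Master.{UFS_RPC_NAME}.UFS:{UFS_ADDRESS}.UFS_TYPE:{UFS_TYPE}.User:{USER} pattern with a regular expression. The sample metric name, the RPC name in it, and the function name are hypothetical, and the sketch assumes the address appears unescaped; a real deployment may escape characters in the address, in which case the pattern would need adjusting.

```python
import re

# The UFS address may itself contain dots, so the pattern anchors on the
# literal tag prefixes instead of splitting the name on ".".
UFS_OP_PATTERN = re.compile(
    r"^Master\.(?P<rpc>[^.]+)"
    r"\.UFS:(?P<ufs>.+)"
    r"\.UFS_TYPE:(?P<ufs_type>[^.]+)"
    r"\.User:(?P<user>.+)$"
)


def parse_ufs_op_metric(name):
    """Return the tag values as a dict, or None if the name does not match."""
    match = UFS_OP_PATTERN.match(name)
    return match.groupdict() if match else None


# Hypothetical metric name, for illustration only.
sample = "Master.GetFileStatus.UFS:hdfs://namenode.example.com:9000/data.UFS_TYPE:hdfs.User:alice"
parsed = parse_ufs_op_metric(sample)
print(parsed)
```

Once the tags are separated, the values can be grouped by UFS address or by user to aggregate the per-operation metrics.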
Worker Metrics
Default worker metrics:
Worker.ActiveClients
COUNTER
The number of clients actively reading from or writing to this worker
Worker.ActiveRpcReadCount
COUNTER
The number of active read-RPCs managed by this worker
Worker.ActiveRpcWriteCount
COUNTER
The number of active write-RPCs managed by this worker
Worker.BlockReaderCompleteTaskCount
GAUGE
The approximate total number of block read tasks that have completed execution
Worker.BlockReaderThreadActiveCount
GAUGE
The approximate number of block read threads that are actively executing tasks in reader thread pool
Worker.BlockReaderThreadCurrentCount
GAUGE
The current number of read threads in the reader thread pool
Worker.BlockReaderThreadMaxCount
GAUGE
The maximum allowed number of block read threads in the reader thread pool
Worker.BlockRemoverBlocksRemovedCount
COUNTER
The total number of blocks successfully removed from this worker by asynchronous block remover.
Worker.BlockRemoverRemovingBlocksSize
GAUGE
The size of blocks being removed from this worker at a given moment by the asynchronous block remover.
Worker.BlockRemoverTryRemoveBlocksSize
GAUGE
The number of blocks to be removed from this worker at a given moment by the asynchronous block remover.
Worker.BlockRemoverTryRemoveCount
COUNTER
The total number of blocks this worker attempted to remove with asynchronous block remover.
Worker.BlockSerializedCompleteTaskCount
GAUGE
The approximate total number of block serialized tasks that have completed execution
Worker.BlockSerializedThreadActiveCount
GAUGE
The approximate number of block serialized threads that are actively executing tasks in serialized thread pool
Worker.BlockSerializedThreadCurrentCount
GAUGE
The current number of serialized threads in the serialized thread pool
Worker.BlockSerializedThreadMaxCount
GAUGE
The maximum allowed number of block serialized threads in the serialized thread pool
Worker.BlockWriterCompleteTaskCount
GAUGE
The approximate total number of block write tasks that have completed execution
Worker.BlockWriterThreadActiveCount
GAUGE
The approximate number of block write threads that are actively executing tasks in writer thread pool
Worker.BlockWriterThreadCurrentCount
GAUGE
The current number of write threads in the writer thread pool
Worker.BlockWriterThreadMaxCount
GAUGE
The maximum allowed number of block write threads in the writer thread pool
Worker.BlocksAccessed
COUNTER
Total number of times any one of the blocks in this worker is accessed.
Worker.BlocksCached
GAUGE
Total number of blocks used for caching data in an Alluxio worker
Worker.BlocksCancelled
COUNTER
Total number of aborted temporary blocks in this worker.
Worker.BlocksDeleted
COUNTER
Total number of deleted blocks in this worker by external requests.
Worker.BlocksEvicted
COUNTER
Total number of evicted blocks in this worker.
Worker.BlocksEvictionRate
METER
Block eviction rate in this worker.
Worker.BlocksLost
COUNTER
Total number of lost blocks in this worker.
Worker.BlocksPromoted
COUNTER
Total number of times any one of the blocks in this worker moved to a new tier.
Worker.BlocksReadLocal
COUNTER
Total number of local blocks read by this worker.
Worker.BlocksReadRemote
COUNTER
Total number of remote blocks read by this worker.
Worker.BlocksReadUfs
COUNTER
Total number of UFS blocks read by this worker.
Worker.BytesReadDirect
COUNTER
Total number of bytes read from this worker without external RPC involved. Data exists in worker storage or is fetched by this worker from underlying UFSes. This records data read by worker internal calls (e.g. a client embedded in this worker).
Worker.BytesReadDirectThroughput
METER
Throughput of bytes read from this worker without external RPC involved. Data exists in worker storage or is fetched by this worker from underlying UFSes. This records data read by worker internal calls (e.g. a client embedded in this worker).
Worker.BytesReadDomain
COUNTER
Total number of bytes read from this worker via domain socket
Worker.BytesReadDomainThroughput
METER
Bytes read throughput from this worker via domain socket
Worker.BytesReadPerUfs
COUNTER
Total number of bytes read from a specific Alluxio UFS by this worker
Worker.BytesReadRemote
COUNTER
Total number of bytes read from this worker via network (RPC). Data exists in worker storage or is fetched by this worker from underlying UFSes. This does not include short-circuit local reads and domain socket reads.
Worker.BytesReadRemoteThroughput
METER
Throughput of bytes read from this worker via network (RPC). Data exists in worker storage or is fetched by this worker from underlying UFSes. This does not include short-circuit local reads and domain socket reads.
Worker.BytesReadUfsThroughput
METER
Bytes read throughput from all Alluxio UFSes by this worker
Worker.BytesWrittenDirect
COUNTER
Total number of bytes written to this worker without external RPC involved. Data is written to worker storage or is written by this worker to underlying UFSes. This records data written by worker internal calls (e.g. a client embedded in this worker).
Worker.BytesWrittenDirectThroughput
METER
Throughput of bytes written to this worker without external RPC involved. Data is written to worker storage or is written by this worker to underlying UFSes. This records data written by worker internal calls (e.g. a client embedded in this worker).
Worker.BytesWrittenDomain
COUNTER
Total number of bytes written to this worker via domain socket
Worker.BytesWrittenDomainThroughput
METER
Throughput of bytes written to this worker via domain socket
Worker.BytesWrittenPerUfs
COUNTER
Total number of bytes written to a specific Alluxio UFS by this worker
Worker.BytesWrittenRemote
COUNTER
Total number of bytes written to this worker via network (RPC). Data is written to worker storage or is written by this worker to underlying UFSes. This does not include short-circuit local writes and domain socket writes.
Worker.BytesWrittenRemoteThroughput
METER
Bytes write throughput to this worker via network (RPC). Data is written to worker storage or is written by this worker to underlying UFSes. This does not include short-circuit local writes and domain socket writes.
Worker.BytesWrittenUfsThroughput
METER
Bytes write throughput to all Alluxio UFSes by this worker
Worker.CacheBlocksSize
COUNTER
Total number of bytes that are being cached through cache requests
Worker.CacheFailedBlocks
COUNTER
Total number of failed cache blocks in this worker
Worker.CacheManagerCompleteTaskCount
GAUGE
The approximate total number of block cache tasks that have completed execution
Worker.CacheManagerThreadActiveCount
GAUGE
The approximate number of block cache threads that are actively executing tasks in the cache manager thread pool
Worker.CacheManagerThreadCurrentCount
GAUGE
The current number of cache threads in the cache manager thread pool
Worker.CacheManagerThreadMaxCount
GAUGE
The maximum allowed number of block cache threads in the cache manager thread pool
Worker.CacheManagerThreadQueueWaitingTaskCount
GAUGE
The current number of tasks waiting in the work queue in the cache manager thread pool, bounded by alluxio.worker.network.async.cache.manager.queue.max
Worker.CacheRemoteBlocks
COUNTER
Total number of blocks that need to be cached from remote source
Worker.CacheRequests
COUNTER
Total number of cache requests received by this worker
Worker.CacheRequestsAsync
COUNTER
Total number of async cache requests received by this worker
Worker.CacheRequestsSync
COUNTER
Total number of sync cache requests received by this worker
Worker.CacheSucceededBlocks
COUNTER
Total number of successfully cached blocks in this worker
Worker.CacheUfsBlocks
COUNTER
Total number of blocks that need to be cached from local source
Worker.CapacityFree
GAUGE
Total free bytes on all tiers of a specific Alluxio worker
Worker.CapacityTotal
GAUGE
Total capacity (in bytes) on all tiers of a specific Alluxio worker
Worker.CapacityUsed
GAUGE
Total used bytes on all tiers of a specific Alluxio worker
Worker.MasterRegistrationSuccessCount
COUNTER
Total number of successful master registrations.
Worker.RpcQueueLength
GAUGE
Length of the worker RPC queue. Use this metric to monitor the RPC pressure on the worker.
Worker.RpcThreadActiveCount
GAUGE
The number of threads that are actively executing tasks in the worker RPC executor thread pool. Use this metric to monitor the RPC pressure on worker.
Worker.RpcThreadCurrentCount
GAUGE
Current count of threads in the worker RPC executor thread pool. Use this metric to monitor the RPC pressure on worker.
Dynamically generated worker metrics:
Worker.UfsSessionCount-Ufs:{UFS_ADDRESS}
The total number of currently opened UFS sessions to connect to the given {UFS_ADDRESS}
Worker.{RPC_NAME}
The duration statistics of RPC calls exposed on workers
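Several worker thread pools above expose both an active thread count and a maximum thread count, which together indicate how close a pool is to saturation. The following sketch derives a utilization ratio per pool from a metrics snapshot. The snapshot values and the function name are hypothetical; only the metric names are taken from the list above.

```python
# Pools that expose both ...ThreadActiveCount and ...ThreadMaxCount gauges.
POOLS = ("BlockReader", "BlockWriter", "BlockSerialized", "CacheManager")


def pool_utilization(metrics, pool):
    """Return active / max for one worker thread pool, or 0.0 if max is not positive."""
    active = metrics[f"Worker.{pool}ThreadActiveCount"]
    maximum = metrics[f"Worker.{pool}ThreadMaxCount"]
    return active / maximum if maximum > 0 else 0.0


# Hypothetical gauge values for a single worker.
snapshot = {
    "Worker.BlockReaderThreadActiveCount": 48,
    "Worker.BlockReaderThreadMaxCount": 64,
    "Worker.BlockWriterThreadActiveCount": 0,
    "Worker.BlockWriterThreadMaxCount": 64,
    "Worker.BlockSerializedThreadActiveCount": 1,
    "Worker.BlockSerializedThreadMaxCount": 4,
    "Worker.CacheManagerThreadActiveCount": 2,
    "Worker.CacheManagerThreadMaxCount": 8,
}

utilization = {pool: pool_utilization(snapshot, pool) for pool in POOLS}
for pool, ratio in utilization.items():
    print(f"{pool}: {ratio:.0%}")
```

A ratio that stays near 1.0 suggests the pool is a bottleneck; the active counts are documented as approximate, so the ratio should be read as a trend rather than an exact figure.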
Client Metrics
Each client metric is recorded with its local hostname, or with alluxio.user.app.id if that property is configured. If alluxio.user.app.id is configured, multiple clients can be combined into a logical application.
Client.BlockMasterClientCount
COUNTER
Number of instances in the BlockMasterClientPool.
Client.BlockReadChunkRemote
TIMER
The timer statistics of reading block data in chunks from remote Alluxio workers via RPC framework. This metric is only recorded when alluxio.user.block.read.metrics.enabled is set to true
Client.BlockWorkerClientCount
COUNTER
Number of instances in the BlockWorkerClientPool.
Client.BusyExceptionCount
COUNTER
Total number of BusyException observed
Client.BytesReadLocal
COUNTER
Total number of bytes short-circuit read from worker data storage that is collocated with the client
Client.BytesReadLocalThroughput
METER
Bytes throughput short-circuit read from worker data storage that is collocated with this client
Client.BytesWrittenLocal
COUNTER
Total number of bytes short-circuit written to local storage by this client
Client.BytesWrittenLocalThroughput
METER
Bytes throughput short-circuit written to local storage by this client
Client.BytesWrittenUfs
COUNTER
Total number of bytes written to Alluxio UFS by this client
Client.CacheBytesDiscarded
METER
Total number of bytes discarded when restoring the page store.
Client.CacheBytesEvicted
METER
Total number of bytes evicted from the client cache.
Client.CacheBytesReadCache
METER
Total number of bytes read from the client cache.
Client.CacheBytesReadExternal
METER
Total number of bytes read from external storage due to a cache miss on the client cache.
Client.CacheBytesReadInStreamBuffer
METER
Total number of bytes read from the client cache's in stream buffer.
Client.CacheBytesRequestedExternal
METER
Total number of bytes the user requested to read which resulted in a cache miss. This number may be smaller than Client.CacheBytesReadExternal due to chunk reads.
Client.CacheBytesWrittenCache
METER
Total number of bytes written to the client cache.
Client.CacheCleanErrors
COUNTER
Number of failures when cleaning out the existing cache directory to initialize a new cache.
Client.CacheCleanupGetErrors
COUNTER
Number of failures when cleaning up a failed cache read.
Client.CacheCleanupPutErrors
COUNTER
Number of failures when cleaning up a failed cache write.
Client.CacheCreateErrors
COUNTER
Number of failures when creating a cache in the client cache.
Client.CacheDeleteErrors
COUNTER
Number of failures when deleting cached data in the client cache.
Client.CacheDeleteFromStoreErrors
COUNTER
Number of failures when deleting pages from page stores.
Client.CacheDeleteNonExistingPageErrors
COUNTER
Number of failures when deleting pages due to absence.
Client.CacheDeleteNotReadyErrors
COUNTER
Number of failures when cache is not ready to delete pages.
Client.CacheGetErrors
COUNTER
Number of failures when getting cached data in the client cache.
Client.CacheGetNotReadyErrors
COUNTER
Number of failures when cache is not ready to get pages.
Client.CacheGetStoreReadErrors
COUNTER
Number of failures when getting cached data in the client cache due to failed read from page stores.
Client.CacheHitRate
GAUGE
Cache hit rate: (# bytes read from cache) / (# bytes requested).
Client.CachePageReadCacheTimeNanos
METER
Time in nanoseconds taken to read a page from the client cache when the cache hits.
Client.CachePageReadExternalTimeNanos
METER
Time in nanoseconds taken to read a page from external source when the cache misses.
Client.CachePages
COUNTER
Total number of pages in the client cache.
Client.CachePagesDiscarded
METER
Total number of pages discarded when restoring the page store.
Client.CachePagesEvicted
METER
Total number of pages evicted from the client cache.
Client.CachePutAsyncRejectionErrors
COUNTER
Number of failures when putting cached data in the client cache due to failed injection to async write queue.
Client.CachePutBenignRacingErrors
COUNTER
Number of failures when adding pages due to racing eviction. This error is benign.
Client.CachePutErrors
COUNTER
Number of failures when putting cached data in the client cache.
Client.CachePutEvictionErrors
COUNTER
Number of failures when putting cached data in the client cache due to failed eviction.
Client.CachePutInsufficientSpaceErrors
COUNTER
Number of failures when putting cached data in the client cache due to insufficient space made after eviction.
Client.CachePutNotReadyErrors
COUNTER
Number of failures when cache is not ready to add pages.
Client.CachePutStoreDeleteErrors
COUNTER
Number of failures when putting cached data in the client cache due to failed deletes in page store.
Client.CachePutStoreWriteErrors
COUNTER
Number of failures when putting cached data in the client cache due to failed writes to page store.
Client.CachePutStoreWriteNoSpaceErrors
COUNTER
Number of failures when putting cached data in the client cache because the disk is full before the cache capacity is reached. This can happen if the storage overhead ratio to write data is underestimated.
Client.CacheShadowCacheBytes
COUNTER
Amount of bytes in the client shadow cache.
Client.CacheShadowCacheBytesHit
COUNTER
Total number of bytes that hit the client shadow cache.
Client.CacheShadowCacheBytesRead
COUNTER
Total number of bytes read from the client shadow cache.
Client.CacheShadowCacheFalsePositiveRatio
COUNTER
Probability that the working set bloom filter makes an error. The value is 0-100. If the value is too high, allocate more space.
Client.CacheShadowCachePages
COUNTER
Amount of pages in the client shadow cache.
Client.CacheShadowCachePagesHit
COUNTER
Total number of pages that hit the client shadow cache.
Client.CacheShadowCachePagesRead
COUNTER
Total number of pages read from the client shadow cache.
Client.CacheSpaceAvailable
GAUGE
Amount of bytes available in the client cache.
Client.CacheSpaceUsed
GAUGE
Amount of bytes used by the client cache.
Client.CacheSpaceUsedCount
COUNTER
Amount of bytes used by the client cache as a counter.
Client.CacheState
COUNTER
State of the cache: 0 (NOT_IN_USE), 1 (READ_ONLY) and 2 (READ_WRITE)
Client.CacheStoreDeleteTimeout
COUNTER
Number of timeouts when deleting pages from page store.
Client.CacheStoreGetTimeout
COUNTER
Number of timeouts when reading pages from page store.
Client.CacheStorePutTimeout
COUNTER
Number of timeouts when writing new pages to page store.
Client.CacheStoreThreadsRejected
COUNTER
Number of rejections of I/O threads when submitting tasks to the thread pool, likely due to an unresponsive local file system.
Client.CloseAlluxioOutStreamLatency
TIMER
Latency of closing the Alluxio outstream
Client.CloseUFSOutStreamLatency
TIMER
Latency of closing the UFS outstream
Client.DefaultHiveClientCount
COUNTER
Number of instances in the DefaultHiveClientPool.
Client.FileSystemMasterClientCount
COUNTER
Number of instances in the FileSystemMasterClientPool.
Client.MetadataCacheSize
GAUGE
The total number of files and directories whose metadata is cached on the client-side. Only valid if the filesystem is alluxio.client.file.MetadataCachingBaseFileSystem.
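Client.CacheHitRate is described above as (# bytes read from cache) / (# bytes requested). As a minimal sketch, the ratio can be recomputed from the byte meters, under the assumption that the bytes requested equal the bytes served from the cache (Client.CacheBytesReadCache) plus the bytes requested on a miss (Client.CacheBytesRequestedExternal). That decomposition is an assumption for illustration, not something the descriptions above state, and the function name and sample counts are hypothetical.

```python
def cache_hit_rate(bytes_read_cache, bytes_requested_external):
    """Hit rate = cache bytes / (cache bytes + bytes requested on a miss).

    Assumes total requested bytes is the sum of the two inputs.
    Returns 0.0 when nothing has been requested yet.
    """
    requested = bytes_read_cache + bytes_requested_external
    return bytes_read_cache / requested if requested > 0 else 0.0


# Hypothetical meter counts from one client.
hit_rate = cache_hit_rate(bytes_read_cache=750, bytes_requested_external=250)
print(f"hit rate: {hit_rate:.2f}")
```

Client.CacheBytesRequestedExternal is used here rather than Client.CacheBytesReadExternal because, as noted above, chunk reads can make the bytes actually read from external storage larger than what the user requested.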
Fuse Metrics
Fuse is a long-running Alluxio client. Depending on how it is launched, Fuse metrics show up as:
client metrics when the Fuse client is launched in a standalone AlluxioFuse process.
worker metrics when the Fuse client is embedded in the AlluxioWorker process.
Fuse metrics include:
Fuse.CachedPathCount
GAUGE
Total number of FUSE-to-Alluxio path mappings being cached. This value will be smaller than or equal to alluxio.fuse.cached.paths.max
Fuse.ReadWriteFileCount
GAUGE
Total number of files being opened for reading or writing concurrently.
Fuse.TotalCalls
TIMER
Throughput of JNI FUSE operation calls. This metric indicates how busy the Alluxio Fuse application is serving requests
The Fuse reading/writing file count can be used as an indicator of Fuse application pressure. If a large number of concurrent reads and writes occur in a short period of time, each read or write operation may take longer to finish.
When a user or an application runs a filesystem command under the Fuse mount point, the command is processed and translated by the operating system, which triggers the related Fuse operations exposed in AlluxioFuse. The number of times each operation is called and the duration of each call are recorded dynamically under the metric name Fuse.<FUSE_OPERATION_NAME>.
The important Fuse metrics include:
Fuse.readdir
The duration metrics of listing a directory
Fuse.getattr
The duration metrics of getting the metadata of a file
Fuse.open
The duration metrics of opening a file for read or overwrite
Fuse.read
The duration metrics of reading a part of a file
Fuse.create
The duration metrics of creating a file for write
Fuse.write
The duration metrics of writing a file
Fuse.release
The duration metrics of closing a file after read or write. Note that release is async so fuse threads will not wait for release to finish
Fuse.mkdir
The duration metrics of creating a directory
Fuse.unlink
The duration metrics of removing a file or a directory
Fuse.rename
The duration metrics of renaming a file or a directory
Fuse.chmod
The duration metrics of modifying the mode of a file or a directory
Fuse.chown
The duration metrics of modifying the user and/or group ownership of a file or a directory
Fuse related metrics include:
Client.TotalRPCClients shows the total number of RPC clients that are in use or can be used to connect to the master or workers for operations.
Worker metrics with the Direct keyword. When Fuse is embedded in the worker process, it can go through the worker internal API to read from or write to this worker. The related metrics end with Direct. For example, Worker.BytesReadDirect shows how many bytes are served by this worker to its embedded Fuse client for reads.
If alluxio.user.block.read.metrics.enabled=true is configured, Client.BlockReadChunkRemote will be recorded. This metric shows the duration statistics of reading data from remote workers via gRPC.
The Client.TotalRPCClients and Fuse.TotalCalls metrics are good indicators of the current load of the Fuse applications. If applications (e.g. Tensorflow) are running on top of Alluxio Fuse but these two metrics show much lower values than before, the training job may be stuck with Alluxio.
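The check described above, where both load indicators fall well below their earlier values, can be sketched as a simple comparison against a baseline. The 20% threshold, the sample values, and the function name are hypothetical, and the sketch assumes that Fuse.TotalCalls is read as a rate (for example, calls per minute taken from the timer) rather than a cumulative count.

```python
LOAD_METRICS = ("Client.TotalRPCClients", "Fuse.TotalCalls")


def load_dropped(baseline, current, ratio=0.2):
    """Return True when every load metric is below `ratio` times its baseline value."""
    return all(current[name] < baseline[name] * ratio for name in LOAD_METRICS)


# Hypothetical readings: a baseline taken while training ran normally,
# and two later samples.
baseline = {"Client.TotalRPCClients": 40, "Fuse.TotalCalls": 1200.0}
stalled = {"Client.TotalRPCClients": 3, "Fuse.TotalCalls": 15.0}
healthy = {"Client.TotalRPCClients": 38, "Fuse.TotalCalls": 1100.0}

print(load_dropped(baseline, stalled))
print(load_dropped(baseline, healthy))
```

Requiring both metrics to drop reduces false alarms from normal fluctuation in either one; a drop is only a hint that the job may be stuck and still needs to be confirmed against the application itself.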
Process Common Metrics
The following metrics are collected on each instance (Master, Worker or Client).
JVM Attributes
name
The name of the JVM
uptime
The uptime of the JVM
vendor
The current JVM vendor
Garbage Collector Statistics
PS-MarkSweep.count
Total number of mark and sweep
PS-MarkSweep.time
The time used to mark and sweep
PS-Scavenge.count
Total number of scavenge
PS-Scavenge.time
The time used to scavenge
Memory Usage
Alluxio provides overall and detailed memory usage information. Detailed memory usage information of code cache, compressed class space, metaspace, PS Eden space, PS old gen, and PS survivor space is collected in each process.
A subset of the memory usage metrics is listed below:
total.committed
The amount of memory in bytes that is guaranteed to be available for use by the JVM
total.init
The amount of the memory in bytes that is available for use by the JVM
total.max
The maximum amount of memory in bytes that is available for use by the JVM
total.used
The amount of memory currently used in bytes
heap.committed
The amount of memory from heap area guaranteed to be available
heap.init
The amount of memory from heap area available at initialization
heap.max
The maximum amount of memory from heap area that is available
heap.usage
The amount of memory from heap area currently used in GB
heap.used
The amount of memory from heap area that has been used
pools.Code-Cache.used
Used memory of collection usage from the pool from which memory is used for compilation and storage of native code
pools.Compressed-Class-Space.used
Used memory of collection usage from the pool from which memory is used for class metadata
pools.PS-Eden-Space.used
Used memory of collection usage from the pool from which memory is initially allocated for most objects
pools.PS-Survivor-Space.used
Used memory of collection usage from the pool containing objects that have survived the garbage collection of the Eden space
ClassLoading Statistics
loaded
The total number of classes loaded
unloaded
The total number of unloaded classes
Thread Statistics
count
The current number of live threads
daemon.count
The current number of live daemon threads
peak.count
The peak live thread count
total_started.count
The total number of threads started
deadlock.count
The number of deadlocked threads
deadlock
The call stack of each thread involved in a deadlock
new.count
The number of threads with new state
blocked.count
The number of threads with blocked state
runnable.count
The number of threads with runnable state
terminated.count
The number of threads with terminated state
timed_waiting.count
The number of threads with timed_waiting state
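The thread statistics above lend themselves to a simple health check: any deadlocked thread is worth flagging, and a large share of blocked threads often points to lock contention. The sketch below applies both checks to a metrics snapshot. The 50% blocked-thread limit, the sample values, and the function name are hypothetical; the metric names are the ones listed above.

```python
def thread_warnings(metrics, blocked_ratio_limit=0.5):
    """Return a list of human-readable warnings derived from thread statistics."""
    warnings = []
    if metrics["deadlock.count"] > 0:
        warnings.append(f"{metrics['deadlock.count']} deadlocked thread(s)")
    live = metrics["count"]
    if live > 0 and metrics["blocked.count"] / live > blocked_ratio_limit:
        warnings.append(f"{metrics['blocked.count']} of {live} threads are blocked")
    return warnings


# Hypothetical readings from two processes.
unhealthy = {"count": 200, "blocked.count": 120, "deadlock.count": 2}
healthy = {"count": 200, "blocked.count": 5, "deadlock.count": 0}

print(thread_warnings(unhealthy))
print(thread_warnings(healthy))
```

When deadlock.count is non-zero, the deadlock metric can then be consulted for the call stacks of the affected threads.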