Monitoring Alluxio
Metrics provide invaluable insight into your Alluxio cluster's health and performance. Alluxio exposes metrics in the Prometheus exposition format, allowing for easy integration with modern monitoring stacks.
This guide covers how to monitor your Alluxio cluster, from using the pre-configured dashboards provided by the Alluxio Operator to setting up your own monitoring manually.
Default Monitoring with the Alluxio Operator
The easiest way to monitor Alluxio on Kubernetes is with the Alluxio Operator. By default, the operator deploys a complete monitoring stack alongside your Alluxio cluster, including Prometheus for metrics collection and Grafana for visualization.
Accessing the Grafana Dashboard
The Grafana dashboard is the primary tool for visualizing your cluster's metrics. You can access it in two ways:
1. Accessing via Port Forwarding (Recommended)
Use kubectl port-forward to securely access the Grafana UI from your local machine.
# Find the Grafana pod and forward port 3000
kubectl -n alx-ns port-forward $(kubectl -n alx-ns get pod -l app.kubernetes.io/component=grafana -o jsonpath="{.items[0].metadata.name}") 3000:3000
You can then open your browser and navigate to http://localhost:3000.
2. Accessing via Node Hostname
If your Kubernetes nodes are directly accessible on your network, you can reach Grafana via its NodePort.
# Get the hostname of the node where Grafana is running
kubectl -n alx-ns get pod $(kubectl -n alx-ns get pod -l app.kubernetes.io/component=grafana --no-headers -o custom-columns=:metadata.name) -o jsonpath='{.spec.nodeName}'
Assuming the hostname is foo.kubernetes.org, you can access the Grafana service at http://foo.kubernetes.org:8080/.
Understanding the Dashboard
The default dashboard provides a comprehensive overview of your cluster's state.

The Cluster section gives a high-level summary of the cluster status.
The Process section details resource consumption (CPU, memory) and JVM metrics for each Alluxio component.
Other sections provide detailed metrics for specific components like the coordinator and workers.
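The panels in these sections are backed by ordinary PromQL queries, so you can reproduce or extend them in your own dashboards. A minimal sketch using the alluxio_cached_capacity_bytes metric shown later in this guide; check the Metrics Reference for other metric names:
# Total cache capacity across all workers
sum(alluxio_cached_capacity_bytes)

# Cache capacity broken down per worker instance
sum by (instance) (alluxio_cached_capacity_bytes)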
Disabling the Default Grafana
If you wish to use your own Grafana instance, you can disable the default one by setting spec.grafana.enabled to false in your AlluxioCluster definition. Prometheus is a core component and cannot be disabled.
apiVersion: k8s-operator.alluxio.com/v1
kind: AlluxioCluster
spec:
  grafana:
    enabled: false
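As with any other change to the AlluxioCluster resource, apply the updated definition and let the operator reconcile the cluster. A minimal sketch, where the file name is illustrative:
# Apply the updated cluster definition (use your own manifest file)
kubectl -n alx-ns apply -f alluxio-cluster.yaml

# Confirm the Grafana pod has been removed
kubectl -n alx-ns get pod -l app.kubernetes.io/component=grafana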
Advanced: Querying Metrics Directly
For advanced analysis or debugging, you can query the Prometheus and component endpoints directly.
Querying with Promtool
You can execute queries directly against the Prometheus server running in your cluster.
# Open a shell into the Prometheus pod
kubectl -n alx-ns exec -it $(kubectl -n alx-ns get pod -l app.kubernetes.io/component=prometheus --no-headers -o custom-columns=:metadata.name) -- /bin/sh
# Example: List all available Alluxio metrics
promtool query instant http://localhost:9090 'count({__name__=~".+"}) by (__name__)' | grep alluxio_
# Example: Get the total cache capacity
promtool query instant http://localhost:9090 'alluxio_cached_capacity_bytes'
# Example output:
# alluxio_cached_capacity_bytes{instance="worker:30000", job="worker"} => 10737418240 @[1753677978.351]
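If you prefer not to open a shell inside the pod, the same queries can be run against Prometheus' standard HTTP API from your local machine. A minimal sketch, assuming the default Prometheus port 9090:
# Forward the Prometheus port to your local machine
kubectl -n alx-ns port-forward $(kubectl -n alx-ns get pod -l app.kubernetes.io/component=prometheus -o jsonpath="{.items[0].metadata.name}") 9090:9090

# Run an instant query via the HTTP API
curl -s 'http://localhost:9090/api/v1/query?query=alluxio_cached_capacity_bytes'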
Querying Component Endpoints
Alluxio components (coordinator, workers, FUSE) expose a /metrics/ endpoint for scraping.
# Get metrics directly from a component (e.g., local coordinator)
$ curl 127.0.0.1:19999/metrics/
Refer to the Metrics Reference for a complete list of available metrics.
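The /metrics/ endpoint returns plain Prometheus exposition text, so you can filter it with standard tools when looking for a specific metric family. A minimal sketch using the cache capacity metric from the earlier example:
# Filter the raw exposition output for cache-related metrics
curl -s 127.0.0.1:19999/metrics/ | grep alluxio_cached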
Integrating with an Existing Monitoring System
If you are not using the Alluxio Operator or have an existing monitoring infrastructure, you can integrate Alluxio with it manually.
Integrating with Prometheus
Add the following scrape jobs to your prometheus.yml to collect metrics from Alluxio.
Standalone Prometheus
For a standalone Prometheus instance, use static_configs:
global:
  scrape_interval: 60s
scrape_configs:
  - job_name: "coordinator"
    static_configs:
      - targets: [ '<COORDINATOR_HOSTNAME>:<COORDINATOR_WEB_PORT>' ]
  - job_name: "worker"
    static_configs:
      - targets: [ '<WORKER_HOSTNAME>:<WORKER_WEB_PORT>' ]
  - job_name: "fuse"
    static_configs:
      - targets: [ '<FUSE_HOSTNAME>:<FUSE_WEB_PORT>' ]
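Before reloading Prometheus, it is worth validating the edited configuration. A minimal sketch using promtool and the standard reload endpoint:
# Validate the configuration file before (re)starting Prometheus
promtool check config prometheus.yml

# Reload a running Prometheus instance without restarting it
# (requires Prometheus to be started with --web.enable-lifecycle)
curl -X POST http://localhost:9090/-/reload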
Prometheus in Kubernetes
For Prometheus running in Kubernetes, use kubernetes_sd_configs to automatically discover Alluxio pods. Ensure your Alluxio pods have the required labels and annotations.
# prometheus.yml snippet for Kubernetes service discovery
scrape_configs:
  - job_name: 'alluxio-components'
    kubernetes_sd_configs:
      - role: pod
    relabel_configs:
      # Keep only pods with the prometheus.io/scrape=true annotation
      - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
        action: keep
        regex: true
      # Scrape only Alluxio components
      - source_labels: [__meta_kubernetes_pod_label_app_kubernetes_io_name]
        action: keep
        regex: alluxio
      # Use the annotated path, default to /metrics
      - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_path]
        action: replace
        target_label: __metrics_path__
        regex: (.+)
      # Use the annotated port
      - source_labels: [__address__, __meta_kubernetes_pod_annotation_prometheus_io_port]
        action: replace
        regex: ([^:]+)(?::\d+)?;(\d+)
        replacement: $1:$2
        target_label: __address__
      # Create a 'job' label from the component name
      - source_labels: [__meta_kubernetes_pod_label_app_kubernetes_io_component]
        action: replace
        target_label: job
Your Alluxio pods must have the following metadata:
# Example metadata for an Alluxio worker pod
metadata:
  labels:
    app.kubernetes.io/name: alluxio
    app.kubernetes.io/component: worker # (or coordinator, fuse)
  annotations:
    prometheus.io/scrape: "true"
    prometheus.io/port: "30000" # (19999 for coordinator, 49999 for fuse)
    prometheus.io/path: "/metrics/"
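You can confirm that a running pod actually carries these annotations before troubleshooting on the Prometheus side. A minimal sketch for a worker pod, assuming the alx-ns namespace used earlier:
# Show the scrape-related annotations on a worker pod
kubectl -n alx-ns get pod -l app.kubernetes.io/component=worker \
  -o jsonpath='{.items[0].metadata.annotations}'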
Integrating with Grafana
1. Add Prometheus as a Data Source: In Grafana, add your Prometheus server as a new data source (see the provisioning sketch after this list for a file-based alternative).
2. Import the Alluxio Dashboard: Download the official Alluxio dashboard template (alluxio-ai-dashboard-template.json) and import it into Grafana, following the Grafana import guide.
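If you manage Grafana declaratively, the data source can also be added through Grafana's file-based provisioning instead of the UI. A minimal sketch, where the Prometheus URL and file path are assumptions for your environment:
# /etc/grafana/provisioning/datasources/alluxio-prometheus.yaml
apiVersion: 1
datasources:
  - name: Alluxio Prometheus
    type: prometheus
    access: proxy
    url: http://<PROMETHEUS_HOSTNAME>:9090
    isDefault: false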
Integrating with Datadog
Datadog can ingest metrics directly from Alluxio's Prometheus endpoints.
Ensure your Datadog agent can reach the Alluxio components' metrics ports (19999 for the coordinator, 30000 for workers). In your Datadog configuration, add the Alluxio endpoints to your Prometheus check configuration.
Example conf.d/prometheus.d/conf.yaml snippet:
init_config:

instances:
  - prometheus_url: http://<alluxio-coordinator-hostname>:19999/metrics
    namespace: alluxio
    metrics:
      - "*"
  - prometheus_url: http://<alluxio-worker-1-hostname>:30000/metrics
    namespace: alluxio
    metrics:
      - "*"
  # Add an entry for each worker
This configuration allows Datadog to collect, monitor, and alert on your Alluxio cluster's metrics.
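After updating the check configuration, the Datadog Agent needs to load the new file. A minimal sketch, assuming a standard Linux host install of the Agent:
# Restart the agent so it picks up the new check configuration
sudo systemctl restart datadog-agent

# Verify the check is running and collecting metrics
sudo datadog-agent status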