Monitoring
Prometheus Setup
    kubectl -n alx-ns get pod -l app.kubernetes.io/component=prometheus

    NAME                                          READY   STATUS    RESTARTS   AGE
    alluxio-cluster-prometheus-6f697b6db8-sbvvg   1/1     Running   0          2m

    mkdir -p ~/monitoring/prometheus

    global:
      scrape_interval: 60s
    scrape_configs:
      - job_name: "coordinator"
        static_configs:
          - targets: ["<COORDINATOR_PRIVATE_IP>:19999"]
      - job_name: "workers"
        static_configs:
          - targets: ["<WORKER1_PRIVATE_IP>:30000", "<WORKER2_PRIVATE_IP>:30000"]
      - job_name: "fuse"
        static_configs:
          - targets: ["<FUSE_PRIVATE_IP>:49999"]

    docker run -d --net=host --name=prometheus \
      -v ~/monitoring/prometheus/prometheus.yml:/etc/prometheus/prometheus.yml \
      prom/prometheus --config.file=/etc/prometheus/prometheus.yml

Kubernetes: Bring Your Own Prometheus
    apiVersion: k8s-operator.alluxio.com/v1
    kind: AlluxioCluster
    spec:
      prometheus:
        enabled: false

    scrape_configs:
      - job_name: 'alluxio-components'
        kubernetes_sd_configs:
          - role: pod
        relabel_configs:
          # Keep only pods with prometheus.io/scrape=true
          - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
            action: keep
            regex: true
          # Keep only Alluxio components
          - source_labels: [__meta_kubernetes_pod_label_app_kubernetes_io_name]
            action: keep
            regex: alluxio
          # Use the annotated metrics path, default to /metrics
          - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_path]
            action: replace
            target_label: __metrics_path__
            regex: (.+)
          # Use the annotated port
          - source_labels: [__address__, __meta_kubernetes_pod_annotation_prometheus_io_port]
            action: replace
            regex: ([^:]+)(?::\d+)?;(\d+)
            replacement: $1:$2
            target_label: __address__
          # Set job label from the component name
          - source_labels: [__meta_kubernetes_pod_label_app_kubernetes_io_component]
            action: replace
            target_label: job

Grafana Setup
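The address rewrite in the Bring Your Own Prometheus scrape configuration above (the rule that targets __address__) can be checked offline before it is deployed. The following is a minimal Python sketch, not part of Alluxio or Prometheus: the function name rewrite_address and the example IP addresses are made up for illustration. It assumes Prometheus joins the source labels with its default separator ";" and anchors the regex at both ends, which re.fullmatch reproduces.

```python
import re

# Same pattern as the relabel rule. Prometheus anchors relabel regexes at both
# ends, so re.fullmatch is used below to get the same behaviour.
ADDRESS_PORT_RE = re.compile(r"([^:]+)(?::\d+)?;(\d+)")


def rewrite_address(address: str, port_annotation: str) -> str:
    """Return the scrape address after the 'use the annotated port' rule.

    Hypothetical helper for illustration only.
    """
    # Source label values are joined with the default separator ";".
    joined = f"{address};{port_annotation}"
    match = ADDRESS_PORT_RE.fullmatch(joined)
    if match is None:
        # No match: a replace action leaves the target label unchanged.
        return address
    # Equivalent of `replacement: $1:$2`.
    return f"{match.group(1)}:{match.group(2)}"


if __name__ == "__main__":
    # Example values only; real pod IPs and ports will differ.
    print(rewrite_address("10.0.0.5:8080", "19999"))
    print(rewrite_address("10.0.0.5", "19999"))
    print(rewrite_address("10.0.0.5:8080", ""))
```

A pod without a port annotation produces a joined value ending in ";", which the pattern does not match, so its discovered address is kept as it is.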
Dashboard Import
Understanding the Dashboard

Alert Rules
Process Availability — ETCD
Process Availability — Worker Count
Process Resource
Cache — Cache Hit Rate
Cache — Utilization
Cache — Eviction Correlation
FUSE — UFS Fallback
Read Throughput
Data — Read Request Rate
License — Expiration
License — Version Mismatch
Querying Metrics Directly
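Metrics can be read without Grafana through the Prometheus HTTP API, whose instant-query endpoint is /api/v1/query. The sketch below is one possible way to do that from Python using only the standard library. It rests on assumptions: Prometheus is taken to listen on http://localhost:9090 (its default port), the helper names are invented for this example, and the query uses the built-in up metric with the "workers" job name from the static scrape configuration rather than any Alluxio-specific metric.

```python
import json
import urllib.parse
import urllib.request

# Assumption: Prometheus listens on its default port on the local host.
PROMETHEUS_URL = "http://localhost:9090"


def build_query_url(base_url: str, promql: str) -> str:
    """Return the instant-query URL for a PromQL expression."""
    params = urllib.parse.urlencode({"query": promql})
    return f"{base_url.rstrip('/')}/api/v1/query?{params}"


def parse_instant_vector(payload: dict) -> dict:
    """Map each series' sorted label pairs to its current value."""
    if payload.get("status") != "success":
        raise RuntimeError(f"query failed: {payload.get('error', 'unknown error')}")
    data = payload["data"]
    if data["resultType"] != "vector":
        raise ValueError(f"expected an instant vector, got {data['resultType']}")
    values = {}
    for series in data["result"]:
        labels = tuple(sorted(series["metric"].items()))
        _timestamp, value = series["value"]  # the value arrives as a string
        values[labels] = float(value)
    return values


def query(promql: str, base_url: str = PROMETHEUS_URL, timeout: float = 10.0) -> dict:
    """Run an instant query against Prometheus and return the parsed values."""
    url = build_query_url(base_url, promql)
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return parse_instant_vector(json.load(response))


if __name__ == "__main__":
    # `up` is 1 for every target whose last scrape succeeded, 0 otherwise.
    for labels, value in query('up{job="workers"}').items():
        print(dict(labels), value)
```

A value of 0 for a series indicates that Prometheus could not scrape that target on its last attempt.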
Datadog Integration