Metrics provide insight into what is going on in the cluster. They are an invaluable resource for monitoring and debugging. Alluxio has a configurable metrics system based on the Prometheus Official Metrics Library. The metrics system exposes metrics in Prometheus exposition format.
Alluxio’s metrics are partitioned into different instances corresponding to Alluxio components. The following instances are currently supported:
Master: The Alluxio master process.
Worker: The Alluxio worker process.
FUSE process: The Alluxio FUSE process, whether it runs as a DaemonSet process or is launched via CSI.
Usage
Send an HTTP request to the /metrics/ endpoint of the target Alluxio process to get a snapshot of all of its metrics.
# Get the metrics from Alluxio processes
$ curl <MASTER_HOSTNAME>:<MASTER_WEB_PORT>/metrics/
$ curl <WORKER_HOSTNAME>:<WORKER_WEB_PORT>/metrics/
$ curl <FUSE_HOSTNAME>:<FUSE_WEB_PORT>/metrics/
For example, for the local processes:
# Get the local master metrics with its default web port 19999
$ curl 127.0.0.1:19999/metrics/
# Get the local worker metrics with its default web port 30000
$ curl 127.0.0.1:30000/metrics/
# Get the local fuse metrics with its default web port 49999
$ curl 127.0.0.1:49999/metrics/
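A snapshot in Prometheus exposition format is plain text: comment lines starting with `#` (such as `# HELP` and `# TYPE`) and one sample per line. As a minimal sketch, assuming only the Python standard library, the snapshot can be fetched and reduced to a name-to-value mapping as shown here. The helper names `fetch_metrics` and `parse_samples` are illustrative and not part of Alluxio, and the parser is deliberately simplified (it ignores timestamps and does not validate label syntax).

```python
import urllib.request
from typing import Dict


def fetch_metrics(host: str, port: int, timeout: float = 5.0) -> str:
    """Return the raw text served at http://<host>:<port>/metrics/."""
    url = f"http://{host}:{port}/metrics/"
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8")


def parse_samples(text: str) -> Dict[str, float]:
    """Map each series identifier ('name' or 'name{labels}') to its value.

    Simplified parser: comment and blank lines are skipped, and an
    optional trailing timestamp is ignored.
    """
    samples: Dict[str, float] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        if "}" in line:
            # Label values may contain spaces, so split after the closing brace.
            idx = line.rindex("}") + 1
            series, rest = line[:idx], line[idx:]
        else:
            series, _, rest = line.partition(" ")
        fields = rest.split()
        if fields:
            samples[series] = float(fields[0])
    return samples
```

For a local master on its default web port, `parse_samples(fetch_metrics("127.0.0.1", 19999))` would return the current snapshot as a dictionary, which is convenient for quick checks in scripts before a full Prometheus setup is in place.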
Integration
Prometheus
Configure the Prometheus service using the sample prometheus.yml to scrape the metrics. Note that the job_name should not be changed if the Grafana integration is needed.
Grafana is metrics analytics and visualization software for time series data. You can use it to visualize the metrics that Alluxio collects, which makes changes in memory, storage, and completed operations easier to see.
Grafana supports visualizing data from Prometheus. The following steps set up Alluxio monitoring based on Grafana and Prometheus.
Import the template JSON file to create a dashboard. See this example for importing a dashboard.
Add the Prometheus data source to Grafana with a custom name, for example, prometheus-alluxio. Refer to the tutorial for help on adding a data source.
If your Grafana dashboard appears like the screenshot below, you have built your monitoring successfully.
By default, only the Cluster row is expanded, showing a summary of the current status. The Process row shows resource consumption and JVM-related metrics, which can be filtered by service or by instance at the top. The other rows show the details of specific components and can be filtered by instance.
Kubernetes Operator
The operator supports building a cluster with bundled Prometheus and Grafana. The configuration and the Grafana template are already included. Just set the following switch in the AlluxioCluster configuration:
spec:
  alluxio-monitor:
    enabled: true
Grafana exposes its service on port 8080 of its host. Use kubectl to get the hostname:
kubectl get pod $(kubectl get pod -l name=alluxio-monitor-grafana --no-headers -o custom-columns=:metadata.name) -o jsonpath='{.spec.nodeName}'
Assuming the hostname is foo.kubernetes.org, you can access the Grafana service at:
http://foo.kubernetes.org:8080/
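The lookup and URL construction above can be scripted. The following is a minimal sketch that wraps the same two kubectl queries with Python's `subprocess` module; the function names `grafana_node_name` and `grafana_url` are hypothetical, and the sketch assumes kubectl is on the PATH, is configured for the target cluster, and that exactly one Grafana pod matches the label selector.

```python
import subprocess

GRAFANA_PORT = 8080  # port on which the bundled Grafana is exposed on its host


def grafana_url(node_name: str, port: int = GRAFANA_PORT) -> str:
    """Build the Grafana URL from the name of the node hosting the pod."""
    return f"http://{node_name.strip()}:{port}/"


def _kubectl(*args: str) -> str:
    """Run kubectl with the given arguments and return its trimmed stdout."""
    result = subprocess.run(
        ["kubectl", *args], check=True, capture_output=True, text=True
    )
    return result.stdout.strip()


def grafana_node_name() -> str:
    """Find the Grafana pod by label, then read the node it is scheduled on."""
    pod = _kubectl(
        "get", "pod", "-l", "name=alluxio-monitor-grafana",
        "--no-headers", "-o", "custom-columns=:metadata.name",
    )
    return _kubectl("get", "pod", pod, "-o", "jsonpath={.spec.nodeName}")
```

Calling `grafana_url(grafana_node_name())` then yields the address to open in a browser.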
Prometheus with Kubernetes
Add the following snippet to the Prometheus configuration. This configuration makes Prometheus scrape the Kubernetes pods that carry certain annotations.
Note that the job_name in scrape_configs must remain unmodified, since the dashboard uses it as a filter.
The following metadata is required:
labels:
  app.kubernetes.io/instance: alluxio # used to distinguish different alluxio cluster
  app.kubernetes.io/component: worker # values from operator deployment are master, worker, fuse, and csi-fuse. depends on the pod
annotations:
  prometheus.io/scrape: "true"
  # values should match with the port of the component. By default, it's 19999 for master, 30000 for worker, and 49999 for fuse
  prometheus.io/port: "30000"
  prometheus.io/path: "/metrics/"
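Before pointing Prometheus at a pod, it can help to confirm that the pod carries the required metadata. The following is a minimal sketch of such a check, not an Alluxio or Prometheus tool: the function name `check_pod_metadata` is hypothetical, the label and annotation keys and default ports are taken from the listing above, and the port comparison is only a hint because the web port may have been reconfigured.

```python
from typing import Dict, List

REQUIRED_LABELS = (
    "app.kubernetes.io/instance",
    "app.kubernetes.io/component",
)
REQUIRED_ANNOTATIONS = (
    "prometheus.io/scrape",
    "prometheus.io/port",
    "prometheus.io/path",
)
# Default web ports stated in the documentation for each component.
DEFAULT_WEB_PORTS = {"master": "19999", "worker": "30000", "fuse": "49999"}


def check_pod_metadata(labels: Dict[str, str],
                       annotations: Dict[str, str]) -> List[str]:
    """Return human-readable findings; an empty list means nothing was flagged."""
    findings = [f"missing label {k}" for k in REQUIRED_LABELS if k not in labels]
    findings += [
        f"missing annotation {k}" for k in REQUIRED_ANNOTATIONS
        if k not in annotations
    ]
    scrape = annotations.get("prometheus.io/scrape")
    if scrape is not None and scrape != "true":
        findings.append('prometheus.io/scrape is not "true"')
    # Hint only: a non-default port is fine if the component was reconfigured.
    component = labels.get("app.kubernetes.io/component")
    expected = DEFAULT_WEB_PORTS.get(component)
    port = annotations.get("prometheus.io/port")
    if expected and port and port != expected:
        findings.append(
            f"port {port} differs from the default {expected} for {component}"
        )
    return findings
```

The dictionaries can be filled from a pod's `metadata.labels` and `metadata.annotations`, for example from the JSON printed by `kubectl get pod <name> -o json`.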