# Monitoring

Alluxio exposes metrics in the [Prometheus exposition format](https://prometheus.io/docs/instrumenting/exposition_formats/), enabling integration with standard monitoring stacks. This guide covers Prometheus and Grafana setup, dashboard import, alert rules, and direct metric queries for both Kubernetes (Operator) and Docker/Bare-Metal deployments, and ends with a Datadog integration example.

## Prometheus Setup

{% tabs %}
{% tab title="Kubernetes (Operator)" %}
The Alluxio Operator deploys a Prometheus instance alongside your cluster automatically. No manual configuration is required.

Verify Prometheus is running:

```shell
kubectl -n alx-ns get pod -l app.kubernetes.io/component=prometheus
```

```console
NAME                                          READY   STATUS    RESTARTS   AGE
alluxio-cluster-prometheus-6f697b6db8-sbvvg   1/1     Running   0          2m
```

{% endtab %}

{% tab title="Docker / Bare-Metal" %}
Run Prometheus on the coordinator node with a static scrape configuration pointing at all Alluxio components.

**Step 1: Create the Prometheus config**

```shell
mkdir -p ~/monitoring/prometheus
```

Create `~/monitoring/prometheus/prometheus.yml`:

```yaml
global:
  scrape_interval: 60s

scrape_configs:
  - job_name: "coordinator"
    static_configs:
      - targets: ["<COORDINATOR_PRIVATE_IP>:19999"]
  - job_name: "workers"
    static_configs:
      - targets: ["<WORKER1_PRIVATE_IP>:30000", "<WORKER2_PRIVATE_IP>:30000"]
  - job_name: "fuse"
    static_configs:
      - targets: ["<FUSE_PRIVATE_IP>:49999"]
```

Add one `<WORKER_PRIVATE_IP>:30000` entry per worker to the `targets` list of the `workers` job.
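With many workers, typing the `targets` list by hand is error-prone. The sketch below prints a `targets` value for the `workers` job from a list of worker IPs. It assumes every worker serves metrics on port 30000, as in the config above; the function name `worker_targets` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: build the "targets" list for the workers job in prometheus.yml.
# Assumes every worker serves metrics on port 30000, as in the config above.
worker_targets() {
  local port=30000 out="" ip
  for ip in "$@"; do
    # Separate entries with ", " (no separator before the first one)
    if [ -n "$out" ]; then
      out="$out, "
    fi
    out="$out\"$ip:$port\""
  done
  printf '[%s]\n' "$out"
}

# Example with two workers; prints ["10.0.0.11:30000", "10.0.0.12:30000"]
worker_targets 10.0.0.11 10.0.0.12
```

Paste the printed list after `targets:` in the `workers` job.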

**Step 2: Start Prometheus**

```shell
docker run -d --net=host --name=prometheus \
  -v ~/monitoring/prometheus/prometheus.yml:/etc/prometheus/prometheus.yml \
  prom/prometheus --config.file=/etc/prometheus/prometheus.yml
```

**Step 3: Verify targets are UP**

{% hint style="info" %}
Prometheus scrapes on the `scrape_interval` configured above (60 s). Targets show `unknown` until the first scrape completes — wait up to 60 seconds before checking.
{% endhint %}

Open `http://localhost:9090/targets` in a browser (or via SSH tunnel — see [Grafana Setup](#grafana-setup) for access options), or query the API:

```shell
curl -s 'http://localhost:9090/api/v1/targets' | \
  python3 -c "import sys,json; [print(t['labels']['job'], t['health']) for t in json.load(sys.stdin)['data']['activeTargets']]"
```

```console
coordinator up
workers up
fuse up
```
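For scripted checks, such as a provisioning pipeline, the sketch below reads the `/api/v1/targets` response on stdin and exits non-zero if any active target is not `up` (or if there are no targets at all). It uses the same `data.activeTargets[].labels.job` and `health` fields as the command above, plus `scrapeUrl` for reporting; the function name `check_targets_up` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: fail if any Prometheus active target is not healthy.
# Reads the JSON returned by GET /api/v1/targets on stdin.
check_targets_up() {
  python3 -c '
import json, sys
targets = json.load(sys.stdin)["data"]["activeTargets"]
down = [t for t in targets if t["health"] != "up"]
for t in down:
    # Report each unhealthy target with its job and scrape URL
    print("DOWN:", t["labels"]["job"], t.get("scrapeUrl", "?"))
# Non-zero exit if anything is down, or if no targets were discovered
sys.exit(1 if down or not targets else 0)
'
}

# Usage against a live Prometheus (not run here):
#   curl -s 'http://localhost:9090/api/v1/targets' | check_targets_up && echo "all targets up"
```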

{% endtab %}
{% endtabs %}

### Kubernetes: Bring Your Own Prometheus

If your cluster already has a Prometheus instance, you can disable the Operator-managed one and use Kubernetes service discovery instead.

Disable the Operator-managed Prometheus:

```yaml
apiVersion: k8s-operator.alluxio.com/v1
kind: AlluxioCluster
spec:
  prometheus:
    enabled: false
```

Add the following scrape config to your existing `prometheus.yml` to automatically discover Alluxio pods by annotation:

```yaml
scrape_configs:
  - job_name: 'alluxio-components'
    kubernetes_sd_configs:
      - role: pod
    relabel_configs:
      # Keep only pods with prometheus.io/scrape=true
      - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
        action: keep
        regex: true
      # Keep only Alluxio components
      - source_labels: [__meta_kubernetes_pod_label_app_kubernetes_io_name]
        action: keep
        regex: alluxio
      # Use the annotated metrics path, default to /metrics
      - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_path]
        action: replace
        target_label: __metrics_path__
        regex: (.+)
      # Use the annotated port
      - source_labels: [__address__, __meta_kubernetes_pod_annotation_prometheus_io_port]
        action: replace
        regex: ([^:]+)(?::\d+)?;(\d+)
        replacement: $1:$2
        target_label: __address__
      # Set job label from the component name
      - source_labels: [__meta_kubernetes_pod_label_app_kubernetes_io_component]
        action: replace
        target_label: job
```
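The address rewrite is the least obvious rule. Prometheus joins the `source_labels` values with `;`, anchors the regex at both ends, and substitutes the capture groups into `replacement`; if the regex does not match, `__address__` is left unchanged. The sketch below reproduces that rule with Python's `re`, whose syntax agrees with Prometheus's RE2 for this pattern on ASCII input. The pod IPs and the helper name are illustrative.

```shell
#!/usr/bin/env bash
# Emulate the __address__ relabel rule: regex ([^:]+)(?::\d+)?;(\d+) -> $1:$2.
# Input is "<__address__>;<prometheus.io/port annotation>", as Prometheus joins them.
relabel_address() {
  python3 -c '
import re, sys
m = re.fullmatch(r"([^:]+)(?::\d+)?;(\d+)", sys.argv[1])
# Without a match, Prometheus keeps the original __address__ value
print(m.group(1) + ":" + m.group(2) if m else sys.argv[1].split(";")[0])
' "$1"
}

relabel_address '10.244.1.7:8080;30000'   # pod IP with a container port -> 10.244.1.7:30000
relabel_address '10.244.1.7;30000'        # pod IP without a port        -> 10.244.1.7:30000
```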

Your Alluxio pods must carry the following labels and annotations for discovery to work:

```yaml
# Example metadata for an Alluxio worker pod
metadata:
  labels:
    app.kubernetes.io/name: alluxio
    app.kubernetes.io/component: worker   # or coordinator, fuse
  annotations:
    prometheus.io/scrape: "true"
    prometheus.io/port: "30000"           # 19999 for coordinator, 49999 for fuse
    prometheus.io/path: "/metrics/"
```

## Grafana Setup

{% tabs %}
{% tab title="Kubernetes (Operator)" %}
The Operator deploys Grafana automatically alongside your cluster.

### Access via Port Forwarding (Recommended)

```shell
kubectl -n alx-ns port-forward \
  $(kubectl -n alx-ns get pod -l app.kubernetes.io/component=grafana -o jsonpath="{.items[0].metadata.name}") \
  3000:3000
```

Then open `http://localhost:3000` in your browser.

### Access via Node Hostname

If Kubernetes nodes are directly accessible on your network, look up the node where Grafana is scheduled:

```shell
kubectl -n alx-ns get pod \
  $(kubectl -n alx-ns get pod -l app.kubernetes.io/component=grafana --no-headers -o custom-columns=:metadata.name) \
  -o jsonpath='{.spec.nodeName}'
```

Then access Grafana at `http://<node-hostname>:8080/`.

### Disabling the Default Grafana

To use your own Grafana instance, disable the Operator-managed one:

```yaml
apiVersion: k8s-operator.alluxio.com/v1
kind: AlluxioCluster
spec:
  grafana:
    enabled: false
```

{% hint style="info" %}
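Disabling Grafana leaves the Operator-managed Prometheus running. To replace Prometheus with your own instance, see [Kubernetes: Bring Your Own Prometheus](#kubernetes-bring-your-own-prometheus).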
{% endhint %}
{% endtab %}

{% tab title="Docker / Bare-Metal" %}
Run Grafana on the coordinator node alongside Prometheus. Pre-provision the Prometheus datasource so no manual setup is needed after Grafana starts.

**Step 1: Create the datasource provisioning file**

```shell
mkdir -p ~/monitoring/grafana/provisioning/datasources
```

Create `~/monitoring/grafana/provisioning/datasources/prometheus.yml`:

```yaml
apiVersion: 1
datasources:
  - name: Prometheus
    type: prometheus
    url: http://localhost:9090
    isDefault: true
    access: proxy
    editable: true
```

**Step 2: Start Grafana**

```shell
docker run -d --net=host --name=grafana \
  -v ~/monitoring/grafana/provisioning:/etc/grafana/provisioning \
  -e GF_SECURITY_ADMIN_USER=admin \
  -e GF_SECURITY_ADMIN_PASSWORD=grafana \
  grafana/grafana
```

**Step 3: Access Grafana**

{% hint style="info" %}
Ports 3000 (Grafana) and 9090 (Prometheus) are closed by default in EC2 Security Groups. Either open them in your Security Group, or use an SSH tunnel:

```shell
ssh -L 3000:localhost:3000 -L 9090:localhost:9090 user@<COORDINATOR_PUBLIC_IP>
```

Then access Grafana at `http://localhost:3000`.
{% endhint %}

If the ports are open in your Security Group, access Grafana directly at `http://<COORDINATOR_PUBLIC_IP>:3000`. In either case, log in as `admin` / `grafana`, the credentials set in Step 2.
{% endtab %}
{% endtabs %}

## Dashboard Import

Download the official Alluxio dashboard template and import it into Grafana:

```shell
wget -O /tmp/alluxio-dashboard.json \
  https://alluxio-binaries.s3.amazonaws.com/artifactsBundle/ee/AI-3.8-15.1.0/alluxio-ai-dashboard-template.json
```

In Grafana: **Dashboards → Import → Upload JSON file** → select `/tmp/alluxio-dashboard.json` → select **Prometheus** as the data source → click **Import**. For detailed import options, see the [Grafana import guide](https://grafana.com/docs/grafana/latest/dashboards/export-import/#importing-a-dashboard).

### Understanding the Dashboard

<figure><img src="/files/NYbV4QWfcEXEP6fiUs64" alt=""><figcaption></figcaption></figure>

* The **Cluster** section gives a high-level summary of the cluster status.
* The **Process** section shows resource consumption (CPU, memory) and JVM metrics for each component.
* Additional sections provide detailed metrics for the coordinator, workers, and cache.

## Alert Rules

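The queries below can be used to build Prometheus alert rules or Grafana alert panels. Thresholds are recommended starting points; tune them to your workload and cluster size. Some queries use the Grafana dashboard variables `$service`, `$instance`, and `$cluster`; when writing Prometheus alert rules, replace them with concrete label values or remove those matchers. The `job` values (`worker`, `fuse`) match the Operator deployment. With the Docker/Bare-Metal scrape configuration above, the worker job is named `workers`, so adjust the `job` matcher accordingly.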
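As a starting point, the sketch below writes a Prometheus rules file from three of the queries in the tables that follow (worker count, heap usage, and license expiration) and validates it with `promtool check rules` when `promtool` is on the `PATH`. The desired worker count of 3, the Operator-style `job="worker"` label, and the file name `alluxio-alerts.yml` are assumptions; change them for your cluster.

```shell
#!/usr/bin/env bash
# Write a sample Prometheus rules file from queries in this section.
# Assumptions: Operator-style job label (worker), 3 desired workers, and the
# hypothetical file name alluxio-alerts.yml. Adjust all three for your cluster.
cat > alluxio-alerts.yml <<'EOF'
groups:
  - name: alluxio
    rules:
      - alert: AlluxioWorkerDown
        expr: sum(up{job="worker"}) < 3
        for: 5m
        labels:
          severity: critical
      - alert: AlluxioHeapPressure
        expr: jvm_memory_used_bytes{area="heap"} / jvm_memory_max_bytes{area="heap"} > 0.75
        for: 5m
        labels:
          severity: warning
      - alert: AlluxioLicenseExpiringSoon
        expr: (max by (cluster_name) (alluxio_license_expiration_date) - time()) / 86400 < 30
        labels:
          severity: warning
EOF

# Validate the file if promtool is available (it ships in the prom/prometheus image)
if command -v promtool >/dev/null 2>&1; then
  promtool check rules alluxio-alerts.yml
fi
```

To load the rules, list the file under `rule_files` in `prometheus.yml` (for the Docker setup, also mount it into the container next to the config). Prometheus only evaluates the rules; sending notifications requires a separately configured Alertmanager.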

### Process Availability — ETCD

| Field              | Value                                            |
| ------------------ | ------------------------------------------------ |
| Component          | Process Availability - ETCD                      |
| Metric             | `etcd_server_has_leader`                         |
| Metric Explanation | Shows if each etcd member currently has a leader |
| Query              | `sum(etcd_server_has_leader{job="etcd"})`        |
| Query Explanation  | Sums all members that currently have a leader    |
| Trigger Condition  | value < 3                                        |
| Threshold/Value    | 3 members expected                               |
| Meaning            | One or more etcd pods are down or quorum is lost |
| Note               | Assumes a 3-member etcd cluster; set the threshold to your member count |

| Field              | Value                                                                                       |
| ------------------ | ------------------------------------------------------------------------------------------- |
| Component          | Process Availability - ETCD                                                                 |
| Metric             | `etcd_server_leader_changes_seen_total`                                                     |
| Metric Explanation | Counts how many times the leader has changed                                                |
| Query              | `changes(etcd_server_leader_changes_seen_total{job="etcd"}[5m])`                            |
| Query Explanation  | Calculates the number of leader changes (elections) that occurred within the last 5 minutes |
| Trigger Condition  | > 0 for 5+ min                                                                              |
| Threshold/Value    | Any change > 0                                                                              |
| Meaning            | Leader flapping; indicates etcd instability or network issues                               |
| Note               | The bundled dashboard panel uses a 1d window; change it to 5m for alerting                  |

### Process Availability — Worker Count

| Field              | Value                                                     |
| ------------------ | --------------------------------------------------------- |
| Component          | Process Availability - Worker count                       |
| Metric             | `up{job="worker"}`                                        |
| Metric Explanation | 1 for each worker target whose last scrape succeeded, 0 otherwise |
| Query              | `sum(up{job="worker"})`                                   |
| Query Explanation  | Counts the number of live worker targets                  |
| Trigger Condition  | value < desired worker count                              |
| Threshold/Value    | < desired worker count                                    |
| Meaning            | One or more workers are down or not responding            |
| Note               | Set desired worker count to match production cluster size |

### Process Resource

| Field              | Value                                                                                                                |
| ------------------ | -------------------------------------------------------------------------------------------------------------------- |
| Component          | Process Resource                                                                                                     |
| Metric             | `jvm_memory_used_bytes`                                                                                              |
| Metric Explanation | Bytes of JVM memory currently in use; the `area` label selects heap memory |
| Query              | `jvm_memory_used_bytes{area="heap"}/jvm_memory_max_bytes{area="heap"}`                                               |
| Query Explanation  | Calculates current heap usage as a fraction (0–1) of the maximum heap |
| Trigger Condition  | > 0.75 for 5+ min                                                                                                    |
| Threshold/Value    | 75–80%                                                                                                               |
| Meaning            | Component is using a high share of its heap, indicating memory pressure or impending GC thrashing |
| Note               | Applies to all components (coordinator, workers, fuse, etc.)                                                         |

| Field              | Value                                                             |
| ------------------ | ----------------------------------------------------------------- |
| Component          | Process Resource                                                  |
| Metric             | `jvm_gc_collection_seconds_sum`                                   |
| Metric Explanation | Time spent in old GC collections                                  |
| Query              | `rate(jvm_gc_collection_seconds_sum{gc="G1 Old Generation"}[5m])` |
| Query Explanation  | Average seconds spent in old/full GC per second, over a 5-minute window |
| Trigger Condition  | > 5s/min for 5+ min                                               |
| Threshold/Value    | > 0.083                                                           |
| Meaning            | JVM doing frequent full GCs → major pause risk                    |
| Note               | Combine with old GC count to confirm                              |

| Field              | Value                                                               |
| ------------------ | ------------------------------------------------------------------- |
| Component          | Process Resource                                                    |
| Metric             | `jvm_gc_collection_seconds_count`                                   |
| Metric Explanation | Frequency of old GC collections                                     |
| Query              | `rate(jvm_gc_collection_seconds_count{gc="G1 Old Generation"}[5m])` |
| Query Explanation  | Calculates old/full GCs per second, averaged over 5 minutes         |
| Trigger Condition  | > 1/min for 5+ min                                                  |
| Threshold/Value    | > 0.0167 (1 per minute divided by 60)                               |
| Meaning            | JVM doing many full GCs, likely due to memory pressure              |
| Note               | Early memory pressure warning                                       |

| Field              | Value                                                               |
| ------------------ | ------------------------------------------------------------------- |
| Component          | Process Resource                                                    |
| Metric             | `jvm_gc_collection_seconds_sum`                                     |
| Metric Explanation | Time spent in young GC collections                                  |
| Query              | `rate(jvm_gc_collection_seconds_sum{gc="G1 Young Generation"}[5m])` |
| Query Explanation  | Average seconds spent in young GC per second, over a 5-minute window |
| Trigger Condition  | > 10s/min for 5+ min                                                |
| Threshold/Value    | > 0.166                                                             |
| Meaning            | High GC overhead slowing throughput                                 |
| Note               | Only alert if persistent                                            |
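`rate()` returns a per-second value, so a threshold stated per minute must be divided by 60 before it is compared with the GC queries above. The sketch below performs that conversion for the three GC thresholds (5 s/min of old-GC time, 1 old GC per minute, and 10 s/min of young-GC time); the helper name `per_minute_to_rate` is hypothetical.

```shell
#!/usr/bin/env bash
# Convert a per-minute threshold into the per-second value that rate() returns.
per_minute_to_rate() {
  awk -v x="$1" 'BEGIN { printf "%.4f\n", x / 60 }'
}

per_minute_to_rate 5    # old-GC seconds per minute   -> 0.0833
per_minute_to_rate 1    # old GCs per minute          -> 0.0167
per_minute_to_rate 10   # young-GC seconds per minute -> 0.1667
```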

| Field              | Value                                                                                                  |
| ------------------ | ------------------------------------------------------------------------------------------------------ |
| Component          | Process Resource                                                                                       |
| Metric             | `process_cpu_seconds_total`                                                                            |
| Metric Explanation | Measures total user + system CPU time consumed by the process                                          |
| Query              | `irate(process_cpu_seconds_total{job=~"$service",instance=~"$instance",cluster_name=~"$cluster"}[5m])` |
| Query Explanation  | Calculates per-second CPU usage, in cores, from the two most recent samples in the 5-minute window (`irate`) |
| Trigger Condition  | stays consistently high for 5+ min                                                                     |
| Threshold/Value    | > 80% of 1 CPU core (≈ 0.8)                                                                            |
| Meaning            | Process is CPU bound or stuck consuming full CPU                                                       |
| Note               | Tune threshold based on node vCPU cores; alert if usage is flat and near saturation                    |
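`process_cpu_seconds_total` accumulates CPU seconds, so its per-second rate is the number of cores the process keeps busy, and 0.8 means 80% of one core. To apply the same guideline to a whole node, multiply by the node's vCPU count, as in this sketch. The helper name is hypothetical; on Linux, `nproc` reports the core count, for example `cpu_threshold "$(nproc)"`.

```shell
#!/usr/bin/env bash
# The CPU query returns CPU-seconds per second, i.e. the number of cores in use.
# Scale an "X% of the node" threshold (default 80%) by the node's vCPU count.
cpu_threshold() {
  local cores="$1" fraction="${2:-0.8}"
  awk -v c="$cores" -v f="$fraction" 'BEGIN { printf "%.2f\n", c * f }'
}

cpu_threshold 1        # single core, as in the table -> 0.80
cpu_threshold 8        # 8-vCPU node at 80%           -> 6.40
cpu_threshold 16 0.9   # 16-vCPU node at 90%          -> 14.40
```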

### Cache — Cache Hit Rate

| Field              | Value                                                                                                                                                                                                                                                                                       |
| ------------------ | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| Component          | Cache - Cache Hit Rate                                                                                                                                                                                                                                                                      |
| Metric             | `alluxio_cached_data_read_bytes_total` & `alluxio_missed_data_read_bytes_total`                                                                                                                                                                                                             |
| Metric Explanation | Measures how much read data was served from cache vs fetched from UFS                                                                                                                                                                                                                       |
| Query              | `sum(irate(alluxio_cached_data_read_bytes_total{job="worker",cluster_name=~"$cluster"}[5m])) / (sum(irate(alluxio_cached_data_read_bytes_total{job="worker",cluster_name=~"$cluster"}[5m])) + sum(irate(alluxio_missed_data_read_bytes_total{job="worker",cluster_name=~"$cluster"}[5m])))` |
| Query Explanation  | Calculates cache hit ratio over 5 minutes                                                                                                                                                                                                                                                   |
| Trigger Condition  | cache hit % stays low for 5+ min                                                                                                                                                                                                                                                            |
| Threshold/Value    | < 80%                                                                                                                                                                                                                                                                                       |
| Meaning            | High UFS reads, cache not being utilized effectively                                                                                                                                                                                                                                        |
| Note               | Adjust threshold based on workload (e.g. 70–90%)                                                                                                                                                                                                                                            |
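The hit-ratio query divides bytes served from cache by all bytes read (cached plus missed). The sketch below applies the same formula to two read rates and compares the result with the 80% threshold; the sample rates and the helper name are made up for illustration. When both rates are zero the PromQL query yields NaN, which the sketch reports as `n/a`.

```shell
#!/usr/bin/env bash
# hit_ratio CACHED_RATE MISSED_RATE -> cached / (cached + missed), as in the query above.
hit_ratio() {
  awk -v hit="$1" -v miss="$2" 'BEGIN {
    total = hit + miss
    if (total == 0) { print "n/a"; exit }   # no reads in the window
    printf "%.2f\n", hit / total
  }'
}

# Example: 300 MiB/s served from cache, 100 MiB/s fetched from the UFS
ratio=$(hit_ratio 300 100)
echo "hit ratio: $ratio"   # prints: hit ratio: 0.75
# Compare against the 0.80 threshold from the table
if awk -v r="$ratio" 'BEGIN { exit !(r < 0.80) }'; then
  echo "ALERT: hit ratio below 80%"
fi
```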

### Cache — Utilization

| Field              | Value                                                                                                                                                  |
| ------------------ | ------------------------------------------------------------------------------------------------------------------------------------------------------ |
| Component          | Cache - Utilization                                                                                                                                    |
| Metric             | `alluxio_cached_storage_bytes` & `alluxio_cached_capacity_bytes`                                                                                       |
| Metric Explanation | Shows how much of the configured cache capacity is currently used                                                                                      |
| Query              | `sum(alluxio_cached_storage_bytes{job="worker",cluster_name=~"$cluster"}) / sum(alluxio_cached_capacity_bytes{job="worker",cluster_name=~"$cluster"})` |
| Query Explanation  | Calculates current used/total cache ratio                                                                                                              |
| Trigger Condition  | > 0.85 (warning), > 0.95 (critical) for 5+ min                                                                                                         |
| Threshold/Value    | 85–95% utilization                                                                                                                                     |
| Meaning            | Cache is nearly full, risk of eviction thrash or write failures                                                                                        |
| Note               | Adjust thresholds based on cluster size and workload pattern                                                                                           |
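The utilization alert has two levels. The sketch below maps a used/capacity ratio onto them with the 0.85 and 0.95 thresholds from the table; the byte counts in the examples and the helper name are illustrative.

```shell
#!/usr/bin/env bash
# Classify cache utilization (used bytes / capacity bytes) as ok, warning, or critical.
cache_level() {
  awk -v used="$1" -v cap="$2" 'BEGIN {
    if (cap == 0) { print "n/a"; exit }   # avoid division by zero
    u = used / cap
    level = (u > 0.95) ? "critical" : ((u > 0.85) ? "warning" : "ok")
    printf "%.2f %s\n", u, level
  }'
}

cache_level 500 1000   # -> 0.50 ok
cache_level 920 1000   # -> 0.92 warning
cache_level 960 1000   # -> 0.96 critical
```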

### Cache — Eviction Correlation

| Field              | Value                                                                                                                                                                                                 |
| ------------------ | ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| Component          | Cache - Cache Eviction - Correlation                                                                                                                                                                  |
| Metric             | `alluxio_cached_evicted_data_bytes_total`, `alluxio_block_store_used_bytes` & `alluxio_block_store_capacity_bytes` |
| Metric Explanation | Tracks evicted bytes and current cache usage to detect cache pressure                                                                                                                                 |
| Query              | `(sum(irate(alluxio_cached_evicted_data_bytes_total{job="worker"}[5m])) > 0) and ((sum(alluxio_block_store_used_bytes{job="worker"}) / sum(alluxio_block_store_capacity_bytes{job="worker"})) > 0.8)` |
| Query Explanation  | Checks if evictions are occurring while cache usage is above 80%                                                                                                                                      |
| Trigger Condition  | Evictions > 0 while usage > 80% for 5+ minutes                                                                                                                                                        |
| Threshold/Value    | Usage > 80% and Evictions > 0                                                                                                                                                                         |
| Meaning            | Indicates cache thrashing or pressure: data is being evicted while cache utilization is already high |
| Note               | Needs to be created manually as a new panel                                                                                                                                                           |

### FUSE — UFS Fallback

| Field              | Value                                                                                               |
| ------------------ | --------------------------------------------------------------------------------------------------- |
| Component          | Fuse - UFS Fallback                                                                                 |
| Metric             | `alluxio_ufs_data_access_bytes_total`                                                               |
| Metric Explanation | Tracks read traffic from Fuse pods going directly to the UFS (bypassing Alluxio cache)              |
| Query              | `irate(alluxio_ufs_data_access_bytes_total{job="fuse",method="read",cluster_name=~"$cluster"}[5m])` |
| Query Explanation  | Calculates Fuse-driven UFS read throughput over 5 minutes                                           |
| Trigger Condition  | FUSE-driven UFS read throughput rises and stays high                                                |
| Threshold/Value    | > 10 MiB/s (10485760 bytes/s) sustained for > 5m                                                    |
| Meaning            | FUSE clients are falling back to the UFS instead of reading through the Alluxio cache              |
| Note               | Correlate with cache hit % and request rate; fallback above 10–20 MiB/s is usually worth investigating |

### Read Throughput

| Field              | Value                                                                                                      |
| ------------------ | ---------------------------------------------------------------------------------------------------------- |
| Component          | Read Throughput                                                                                            |
| Metric             | `alluxio_data_throughput_bytes_total`                                                                      |
| Metric Explanation | Measures read throughput served by workers                                                                 |
| Query              | `sum(irate(alluxio_data_throughput_bytes_total{job="worker",method="read",cluster_name=~"$cluster"}[5m]))` |
| Query Explanation  | Calculates worker read throughput over 5m                                                                  |
| Trigger Condition  | worker read throughput drops                                                                               |
| Threshold/Value    | < set baseline (e.g. < 10 MiB/s) while UFS read goes up                                                    |
| Meaning            | Cache not serving data, workload hitting UFS                                                               |
| Note               | Tune threshold based on normal workload pattern                                                            |

### Data — Read Request Rate

| Field              | Value                                                                    |
| ------------------ | ------------------------------------------------------------------------ |
| Component          | Data                                                                     |
| Metric             | `alluxio_data_access_bytes_count{method="read"}`                         |
| Metric Explanation | Counts the number of read operations (requests) served by workers        |
| Query              | `irate(alluxio_data_access_bytes_count{method="read",job="worker"}[5m])` |
| Query Explanation  | Calculates read request rate (req/s) over 5 minutes                      |
| Trigger Condition  | Alert when rate drops to 0 while workload is expected                    |
| Threshold/Value    | near 0 for > 5 min                                                       |
| Meaning            | Worker not serving data — possible worker crash or cache unavailable     |
| Note               | Correlate with workload schedule to avoid false positives                |

### License — Expiration

| Field              | Value                                                                                                                          |
| ------------------ | ------------------------------------------------------------------------------------------------------------------------------ |
| Component          | License - Expiration                                                                                                           |
| Metric             | `alluxio_license_expiration_date`                                                                                              |
| Metric Explanation | Shows the UNIX timestamp when the Alluxio license will expire                                                                  |
| Query              | `(max by (cluster_name) (alluxio_license_expiration_date) - time()) / 86400`                                                   |
| Query Explanation  | Calculates the number of days remaining until license expiration by subtracting current time from the license expiry timestamp |
| Trigger Condition  | < 30 (Warning), < 7 (Critical)                                                                                                 |
| Threshold/Value    | 30 days, 7 days                                                                                                                |
| Meaning            | License is about to expire; renew before it lapses                                                                             |
| Note               | Needs to be created manually as a new panel                                                                                    |
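The license query subtracts the current time from the expiry timestamp and divides by 86400, the number of seconds in a day. The sketch below does the same arithmetic in the shell and applies the 30-day and 7-day levels. Pass it the value of `alluxio_license_expiration_date` as integer UNIX seconds; the optional second argument (the current time) only makes the example reproducible, and the helper name is hypothetical.

```shell
#!/usr/bin/env bash
# days_left EXPIRY_EPOCH [NOW_EPOCH] -> whole days until license expiry, plus a severity.
days_left() {
  local expiry="$1" now="${2:-$(date +%s)}"
  local days=$(( (expiry - now) / 86400 ))   # integer division drops the fractional day
  if [ "$days" -lt 7 ]; then
    echo "$days critical"
  elif [ "$days" -lt 30 ]; then
    echo "$days warning"
  else
    echo "$days ok"
  fi
}

# Example: expiry exactly 20 days after a fixed "now"
days_left $(( 1700000000 + 20 * 86400 )) 1700000000   # -> 20 warning
```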

### License — Version Mismatch

| Field              | Value                                                                       |
| ------------------ | --------------------------------------------------------------------------- |
| Component          | License - Version Mismatch                                                  |
| Metric             | `alluxio_version_info`                                                      |
| Metric Explanation | Shows the version of each running Alluxio component (via version label)     |
| Query              | `count(count by (version) (alluxio_version_info)) > 1`                      |
| Query Explanation  | Checks if more than one unique Alluxio version is running across components |
| Trigger Condition  | > 1                                                                         |
| Threshold/Value    | More than 1 version                                                         |
| Meaning            | Version mismatch between Alluxio components                                 |
| Note               | Needs to be created manually as a new panel                                 |
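The version-mismatch query relies on the `version` label of `alluxio_version_info`. To spot-check versions without Prometheus, the sketch below extracts the distinct `version` values from the metrics text of one or more endpoints. The helper name is hypothetical, and it assumes the label appears as `version="..."` in the exposition output.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print each distinct value of the "version" label found on
# alluxio_version_info lines in Prometheus exposition text read from stdin.
alluxio_versions() {
  grep '^alluxio_version_info' | sed -n 's/.*[{,]version="\([^"]*\)".*/\1/p' | sort -u
}

# Usage (not run here): concatenate several endpoints and count distinct versions.
#   for url in http://localhost:19999/metrics/ http://localhost:30000/metrics/; do
#     curl -s "$url"
#   done | alluxio_versions | wc -l
```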

## Querying Metrics Directly

For advanced analysis or debugging, query Prometheus or component endpoints directly.

{% tabs %}
{% tab title="Kubernetes (Operator)" %}
Open a shell into the Prometheus pod:

```shell
kubectl -n alx-ns exec -it \
  $(kubectl -n alx-ns get pod -l app.kubernetes.io/component=prometheus --no-headers -o custom-columns=:metadata.name) \
  -- /bin/sh
```

Then use `promtool` to run instant queries:

```shell
# List all available Alluxio metrics
promtool query instant http://localhost:9090 'count({__name__=~".+"}) by (__name__)' | grep alluxio_

# Check total cache capacity across all workers
promtool query instant http://localhost:9090 'alluxio_cached_capacity_bytes'
# Example output:
# alluxio_cached_capacity_bytes{instance="worker:30000", job="worker"} => 10737418240 @[...]
```

Query component endpoints directly from within a pod:

```shell
# Coordinator metrics
kubectl -n alx-ns exec alluxio-cluster-coordinator-0 -- curl -s http://localhost:19999/metrics/ | head -20

# Worker metrics
kubectl -n alx-ns exec alluxio-cluster-worker-0 -- curl -s http://localhost:30000/metrics/ | head -20
```

{% endtab %}

{% tab title="Docker / Bare-Metal" %}
Query via the Prometheus HTTP API from the coordinator host:

```shell
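# Check cache capacity for all workers
curl -s -G 'http://localhost:9090/api/v1/query' \
  --data-urlencode 'query=alluxio_cached_capacity_bytes' | python3 -m json.tool

# Check live worker count (-G with --data-urlencode encodes the braces and quotes,
# which curl would otherwise treat as URL globbing syntax)
curl -s -G 'http://localhost:9090/api/v1/query' \
  --data-urlencode 'query=sum(up{job="workers"})' | python3 -m json.tool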
```

Query component endpoints directly:

```shell
# Coordinator metrics (from coordinator host)
curl http://localhost:19999/metrics/

# Worker metrics (from worker host)
curl http://localhost:30000/metrics/

# FUSE metrics (from FUSE host)
curl http://localhost:49999/metrics/
```
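The `/metrics/` endpoints return plain-text exposition output, so standard text tools are enough to list what a component exports, whether you run `curl` on the host or through `kubectl exec`. The sketch below prints the distinct Alluxio metric names from that output; the helper name is hypothetical. Summary and histogram metrics appear under their exposed series names, for example with `_count` or `_sum` suffixes.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list distinct Alluxio metric names from exposition text on stdin.
alluxio_metric_names() {
  # Comment lines (# HELP / # TYPE) are skipped because they do not start with
  # "alluxio_"; the sed strips the label set and value after each metric name.
  grep '^alluxio_' | sed 's/[{ ].*//' | sort -u
}

# Usage (not run here):
#   curl -s http://localhost:30000/metrics/ | alluxio_metric_names
```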

{% endtab %}
{% endtabs %}

Refer to the [Metrics Reference](/ee-ai-en/ai-3.8-15.1.x/reference/metrics.md) for a complete list of available metrics and their descriptions.

## Datadog Integration

Datadog can ingest metrics directly from Alluxio's Prometheus endpoints.

1. Ensure your Datadog agent can reach the Alluxio metrics ports: `19999` (coordinator), `30000` (workers), `49999` (FUSE).
2. Add the following to your `conf.d/prometheus.d/conf.yaml`:

```yaml
instances:
  - prometheus_url: http://<alluxio-coordinator-hostname>:19999/metrics
    namespace: alluxio
    metrics:
      - "*"
  - prometheus_url: http://<alluxio-worker-1-hostname>:30000/metrics
    namespace: alluxio
    metrics:
      - "*"
  # Add one entry per worker (port 30000) and one per FUSE host (port 49999)
```

This configuration instructs the Datadog agent to scrape each listed endpoint and forward all of its metrics under the `alluxio` namespace.
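For larger clusters, the instance list can be generated rather than edited by hand. The sketch below prints a `conf.yaml` body in the same shape as the example above, for one coordinator and any number of workers; FUSE hosts can be added the same way on port 49999. The function names and hostnames are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print a conf.d/prometheus.d/conf.yaml body for Alluxio.
# Ports are the defaults from this page: 19999 (coordinator), 30000 (workers).
datadog_instance() {
  # One instance entry, in the same shape as the example above
  printf '  - prometheus_url: http://%s/metrics\n    namespace: alluxio\n    metrics:\n      - "*"\n' "$1"
}

# Usage: datadog_instances COORDINATOR_HOST [WORKER_HOST ...]
datadog_instances() {
  local coordinator="$1" worker
  shift
  echo "instances:"
  datadog_instance "$coordinator:19999"
  for worker in "$@"; do
    datadog_instance "$worker:30000"
  done
}

# Example (not run here; hostnames are placeholders):
#   datadog_instances coord.example.internal worker1.example.internal > conf.yaml
```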


