Diagnostic Snapshot

To diagnose an issue with Alluxio, you need a "Diagnostic Snapshot" containing various cluster information, which is obtained by running the collectinfo function.

The results of collectinfo contain the following types of information:

  • config: The configuration files in Alluxio's conf/ directory

  • hardware: CPU, memory, and other hardware specifications of the K8s nodes running the coordinator, worker, fuse, and operator components.

  • etcd: Information stored in etcd within the Alluxio cluster, including mount, quota, priority, TTL, workers, and license information.

  • logs: Logs from the coordinator, worker, fuse, and operator components. Tailing is supported, so that only a specified number of lines from the end of each log is collected.

  • metrics: Cluster metrics. The duration and step settings define the time range and the sampling interval of the collected metrics.

  • job history: The historical records of load, free, and copy jobs within the Alluxio cluster, including detailed job information and status.
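The duration and step settings for metrics can be illustrated with a little arithmetic: the number of samples per metric series is approximately duration ÷ step. The sketch below assumes Go-style duration strings such as 24h and 1m; the accepted format is an assumption (check Detailed Configuration), and the helper names are hypothetical.

```shell
# Hypothetical helper: convert a duration such as "30s", "5m", "24h", or "7d"
# into seconds. Only one integer followed by one unit suffix is handled.
to_seconds() {
  _unit=${1#"${1%?}"}   # last character, e.g. "h"
  _value=${1%?}         # everything before it, e.g. "24"
  case $_unit in
    s) echo "$_value" ;;
    m) echo $((_value * 60)) ;;
    h) echo $((_value * 3600)) ;;
    d) echo $((_value * 86400)) ;;
    *) return 1 ;;      # unknown unit
  esac
}

# Approximate number of samples per metric series: duration / step.
samples() {
  echo $(( $(to_seconds "$1") / $(to_seconds "$2") ))
}

samples 24h 1m   # prints 1440
```

The same helper confirms the default expiration used later on this page: `to_seconds 720h` gives 2592000 seconds, which is exactly 30 days.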

Prerequisite

First, ensure that the operator has started successfully and that the alluxio-collectinfo-controller is running. The output below lists the operator pods and shows that the alluxio-collectinfo-controller is running. If no alluxio-collectinfo-controller pod exists, the current version of the operator does not support the collectinfo feature; in that case, upgrade the operator.

kubectl -n alluxio-operator get pod
NAME                                             READY   STATUS    RESTARTS   AGE
alluxio-cluster-controller-8656d54bc-x6ms6       1/1     Running   0          19s
alluxio-collectinfo-controller-cc49c56b6-wlw8k   1/1     Running   0          19s
alluxio-csi-controller-84df9646fd-4d5b8          2/2     Running   0          19s
alluxio-csi-nodeplugin-fcp7b                     2/2     Running   0          19s
alluxio-csi-nodeplugin-t59ch                     2/2     Running   0          19s
alluxio-csi-nodeplugin-vbq2q                     2/2     Running   0          19s
alluxio-ufs-controller-57fbdf8d5c-2f79l          1/1     Running   0          19s
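The check above can be scripted. The sketch below is a hypothetical helper that reads `kubectl get pod` output on stdin and succeeds only if a pod whose name starts with alluxio-collectinfo-controller- is in the Running state; it relies only on the column layout shown in the output above.

```shell
# Succeeds (exit 0) if the `kubectl get pod` output on stdin lists an
# alluxio-collectinfo-controller pod whose STATUS is Running; fails otherwise.
collectinfo_controller_running() {
  awk '$1 ~ /^alluxio-collectinfo-controller-/ && $3 == "Running" { found = 1 }
       END { exit !found }'
}

# Usage against a live cluster:
# if kubectl -n alluxio-operator get pod | collectinfo_controller_running; then
#   echo "collectinfo is supported"
# else
#   echo "alluxio-collectinfo-controller not found: upgrade the operator"
# fi
```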

Next, ensure that the Alluxio cluster has started successfully and that all of its components are running.
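One way to check the cluster is a hypothetical helper that reads `kubectl get pod` output for the cluster namespace (alx-ns, the example namespace used on this page) and succeeds only if every pod is Running with all containers ready (READY shows n/n). Pods in other states, including Completed, are treated as not ready, so adjust the check if your cluster runs one-off jobs.

```shell
# Succeeds only if every pod row in the `kubectl get pod` output on stdin is
# Running with READY n/n. An empty pod list is treated as a failure.
all_pods_ready() {
  awk 'NR > 1 {
         rows++
         split($2, ready, "/")
         if ($3 != "Running" || ready[1] != ready[2]) bad = 1
       }
       END { exit (bad || rows == 0) }'
}

# Usage:
# kubectl -n alx-ns get pod | all_pods_ready && echo "Alluxio cluster is up"
```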

Collecting Information

The collectinfo tool offers two collection methods: scheduled collection and one-time collection.

  • Scheduled collection runs at an interval you set, such as daily, weekly, or monthly.

  • One-time collection triggers an immediate collection task.

Once an Alluxio cluster is created, a scheduled collection is automatically generated for collecting cluster information.

By default, the scheduled collection runs daily. You can check the progress of the collection by listing the collectinfo resource with kubectl. The LASTSCHEDULEDTIME field indicates the next scheduled time for the collection task, while the LASTSUCCESSFULTIME field shows the most recent successful collection.
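A minimal sketch for reading those fields, assuming the resource can be listed as `kubectl -n alx-ns get collectinfo` and that its header names contain no spaces (as LASTSCHEDULEDTIME and LASTSUCCESSFULTIME suggest). The helper name is hypothetical; it prints the NAME column together with whichever column you ask for.

```shell
# Print NAME (first column) and the column whose header matches $1, for each
# row of tabular `kubectl get` output read from stdin. Assumes that no cell
# contains spaces; prints nothing if the header is not found.
show_column() {
  awk -v col="$1" '
    NR == 1 { for (i = 1; i <= NF; i++) if ($i == col) idx = i; next }
    idx     { print $1, $idx }
  '
}

# Usage (resource name "collectinfo" is assumed to be accepted by kubectl):
# kubectl -n alx-ns get collectinfo | show_column LASTSUCCESSFULTIME
```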

After a collection completes, the results are saved to the coordinator pod so that the data persists. Results are deleted according to the value of the expiration field (see Detailed Configuration for more info). By default, expiration is set to 720h (30 days), so collected results are deleted after 30 days.

You can access the collected results by opening a shell in the coordinator pod with kubectl exec.
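To open that shell you first need the coordinator pod's name. The sketch below is a hypothetical helper that picks the first pod whose name contains "coordinator" from `kubectl get pod` output; the namespace alx-ns and the availability of bash inside the coordinator image are assumptions.

```shell
# Print the name of the first pod whose name contains "coordinator" in the
# `kubectl get pod` output on stdin (the header row is skipped).
coordinator_pod() {
  awk 'NR > 1 && $1 ~ /coordinator/ { print $1; exit }'
}

# Usage:
# pod=$(kubectl -n alx-ns get pod | coordinator_pod)
# kubectl -n alx-ns exec -it "$pod" -- bash
```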

The collection results are stored in a tar.gz file. The file name includes the collection task name, the Alluxio cluster's namespace, and the collection time.

To modify the collection content or schedule, do the following:

  1. Delete the existing collectinfo resource

  2. Create a new scheduled collection YAML file

Assuming the Alluxio cluster is in the alx-ns namespace, create collectinfo.yaml defining a daily scheduled collection with the cron expression "0 0 * * *", which runs every day at midnight. See the Cron schedule syntax for more information on how to build a cron expression.

The Detailed Configuration section lists the available fields and their descriptions.

  3. Apply the new scheduled collection
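The steps above can be sketched as a small script. It assumes the namespace alx-ns, a hypothetical existing resource named example-collectinfo, and that kubectl accepts "collectinfo" as the resource name. Commands are only printed unless DRY_RUN=0, so you can review them first. The cron check is a basic sanity test for the five-field form (minute, hour, day of month, month, day of week); it would reject macros such as @daily.

```shell
NS=alx-ns                   # namespace used in the examples above
OLD=example-collectinfo     # hypothetical name of the existing resource

# Run a command, or only print it unless DRY_RUN is set to 0.
run() {
  if [ "${DRY_RUN:-1}" = 0 ]; then "$@"; else echo "$*"; fi
}

# Succeeds if the cron expression in $1 has exactly five fields.
cron_has_five_fields() {
  set -f          # keep "*" from expanding to file names
  set -- $1       # split the expression on whitespace
  set +f
  [ "$#" -eq 5 ]
}

cron_has_five_fields "0 0 * * *" || { echo "bad cron expression" >&2; exit 1; }
run kubectl -n "$NS" delete collectinfo "$OLD"    # step 1: delete the old resource
# step 2: write collectinfo.yaml (see Detailed Configuration for its fields)
run kubectl -n "$NS" apply -f collectinfo.yaml    # step 3: apply the new schedule
```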

Downloading Results

Use kubectl cp to copy the collected information to your local machine.
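`kubectl cp` takes its source in the form namespace/pod:path-inside-pod, and it requires the tar binary to be present in the container image. The sketch below builds that argument with a hypothetical helper; the pod name and the in-pod path of the tar.gz file are placeholders you must look up first (for example from a shell in the coordinator pod).

```shell
# Build the source argument for `kubectl cp`:
# <namespace>/<pod>:<path-inside-pod>
cp_source() {
  printf '%s/%s:%s\n' "$1" "$2" "$3"
}

# Usage (replace the placeholders with real values):
# kubectl cp "$(cp_source alx-ns <coordinator-pod> <path-to-result.tar.gz>)" ./collectinfo.tar.gz
# tar -tzf ./collectinfo.tar.gz    # list the archive contents
# tar -xzf ./collectinfo.tar.gz    # extract them
```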

Detailed Configuration
