# Audit Log

The Alluxio Audit Log is a critical component for security and compliance, providing a detailed, structured record of operations performed within the system. It helps you monitor sensitive data access, track administrative commands, and meet regulatory requirements.

Audit logs are generated in a structured **JSON** format, allowing for easy parsing and integration with analysis tools like Splunk or the ELK Stack. Logging covers two primary categories of operations:

1. **Management Operations**: API requests for management tasks, such as loading data or managing mount points.
2. **Data Access**: Read and write operations performed through various interfaces, including **S3, HDFS, FUSE, and the Python SDK**.

This guide explains how to enable and configure audit logging for both categories.

## Enabling and Configuring Audit Logging

Audit logging is configured separately for management operations and for data access interfaces.

### Step 1: Enable Management Operation Audit Logging

This configuration logs all administrative API requests. This feature is **enabled by default**.

On the Gateway host machine, audit logs are stored in the `/mnt/alluxio/logs/gateway/audit/` directory; inside the Gateway pod, they are stored in `/opt/alluxio/logs/audit/`. To change the log location or other settings, update the `gateway` section of your `alluxio-cluster.yaml` file:

```yaml
spec:
  gateway:
    enabled: true
    log:
      level: info
      type: hostPath
      hostPath: /mnt/alluxio/logs/gateway
      size: 1Gi
      storageClass: standard
    auditLog:
      enabled: true
```

### Step 2: Enable Data Access Audit Logging

This configuration logs data access operations from clients using the S3, HDFS, and FUSE interfaces.

To enable it, add the following properties to your `alluxio-cluster.yaml` file:

```yaml
spec:
  properties:
    alluxio.audit.logging.enabled: "true"
    alluxio.audit.logging.poll.interval: "5s"
```

* By default, data access audit logs are stored in the `${audit.logs.dir}` directory. You can customize this location by modifying the `*_AUDIT_LOG` appenders in your `conf/log4j2.xml` file.
* For operator versions 3.3.3 and later, the required audit log appenders are included in `log4j2.xml` by default. For earlier versions, you must manually add the logger configuration to the `log4j2.xml` section of your `alluxio-configmap`.

| Property                              | Description                                                       | Default Value |
| ------------------------------------- | ----------------------------------------------------------------- | ------------- |
| `alluxio.audit.logging.enabled`       | Set to `true` to enable audit logging for data access operations. | `false`       |
| `alluxio.audit.logging.poll.interval` | The interval at which audit logs are written to the file.         | `5s`          |

## Dynamic Configuration via REST API and CLI

Both audit logs and access logs support dynamic configuration through the REST API or the User CLI. Changes apply cluster-wide and persist across cluster restarts.

{% hint style="info" %}
If you plan to enable audit logs or access logs, discuss the log retention policy with customer support first.
{% endhint %}

### REST API

Use the `LogConfigEntity` to control both log types.

Create a `LogConfigEntity.json` file:

```json
{
  "accessLogEnabled": false,
  "auditLogEnabled": false,
  "bfRefreshInterval": "6000s",
  "logPollInterval": "5s"
}
```

| Field               | Description                                                                               | Default |
| ------------------- | ----------------------------------------------------------------------------------------- | ------- |
| `accessLogEnabled`  | Enable the access log.                                                                    | `false` |
| `auditLogEnabled`   | Enable the audit log.                                                                     | `false` |
| `bfRefreshInterval` | Bloom Filter refresh interval for the access log. Recommended: 100 minutes.               | —       |
| `logPollInterval`   | Interval at which access and audit logs are flushed from memory to disk. Recommended: 5s. | —       |

Apply the configuration:

```shell
jq -n --rawfile conf ./LogConfigEntity.json '{key: "LogConfigEntity", conf: $conf}' | curl -sS 'http://<coordinator_host>:19999/api/v1/conf' -X PUT -H 'Content-Type: application/json' --data @-
```

Verify the configuration:

```shell
curl -sS 'http://<coordinator_host>:19999/api/v1/conf?key=LogConfigEntity' | jq
```

### User CLI

The `alluxio log` command group manages the dynamic configuration of audit and access logs.

#### log access

Manage the access log enable state and the Bloom Filter refresh interval.

```shell
bin/alluxio log access [--enable <true|false>] [--bfRefreshInterval <duration>]
```

* `--enable`: enable or disable the access log. Values: `true` or `false`.
* `--bfRefreshInterval`: Bloom Filter refresh interval. Accepts duration units such as `30s`, `10min`, or `1h`. Recommended: 100 minutes.

Examples:

```shell
# Enable the access log with a 30-minute refresh interval
bin/alluxio log access --enable=true --bfRefreshInterval=30min

# Disable the access log
bin/alluxio log access --enable=false
```

#### log audit

Manage the audit log enable state.

```shell
bin/alluxio log audit [--enable <true|false>]
```

* `--enable`: enable or disable the audit log. Values: `true` or `false`.

Examples:

```shell
# Enable the audit log
bin/alluxio log audit --enable=true

# Disable the audit log
bin/alluxio log audit --enable=false
```

#### log status

Display the current audit and access log configuration. The command takes no arguments.

```shell
bin/alluxio log status
```

Sample output:

```console
Access Log Enabled: true
Audit Log Enabled: false
Bloom Filter Refresh Interval: 3600000ms
Log Poll Interval: 5000ms
```
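
For scripted checks, the `Key: value` lines of the sample output can be read into a dictionary. This is a minimal sketch that assumes the output keeps the exact shape shown above; the helper names `parse_log_status` and `to_millis` are hypothetical and not part of the Alluxio CLI.

```python
def parse_log_status(output: str) -> dict:
    """Split each `Key: value` line of the status output into a dict entry."""
    status = {}
    for line in output.splitlines():
        if ":" not in line:
            continue  # ignore blank or unexpected lines
        key, _, value = line.partition(":")
        status[key.strip()] = value.strip()
    return status


def to_millis(value: str) -> int:
    """Convert a value such as `5000ms` to an integer number of milliseconds."""
    if not value.endswith("ms"):
        raise ValueError(f"expected a value ending in 'ms', got {value!r}")
    return int(value[:-2])


sample = """Access Log Enabled: true
Audit Log Enabled: false
Bloom Filter Refresh Interval: 3600000ms
Log Poll Interval: 5000ms
"""

status = parse_log_status(sample)
audit_enabled = status["Audit Log Enabled"] == "true"
refresh_ms = to_millis(status["Bloom Filter Refresh Interval"])
print(audit_enabled, refresh_ms)
```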

{% hint style="info" %}

* **Persistence**: changes made via `log access` and `log audit` persist across cluster restarts.
* **Scope**: configuration changes propagate to all affected processes through the dynamic configuration mechanism.
{% endhint %}

## Understanding the Audit Log Format

Each audit log entry is a structured JSON object with a consistent schema, making it easy to process and analyze.

### Audit Log Schema

| Field            | Type                 | Description                                                                                                                                                            |
| ---------------- | -------------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `timestamp`      | string               | The timestamp of the operation in ISO 8601 format with timezone (e.g., `2025-08-02T09:12:30.123456+08:00[Asia/Singapore]`).                                            |
| `user`           | JSON Object          | Information about the user who performed the operation, containing `name` (user identifier), `group` (list of groups), and `role` (list of roles).                     |
| `interface`      | string               | The access interface used. Possible values include `S3`, `FUSE`, `HADOOP_FS`, `HTTP`, and `GATEWAY`. `GATEWAY` represents management operations; `HTTP` appears for requests made through the Python SDK (see Example 5). |
| `operation`      | string / JSON Object | The specific operation or API name. For data access interfaces, this is a string (e.g., `GetObject`). For the Gateway, this is a JSON object with `method` and `path`. |
| `resource`       | JSON Object / string | The resource involved. The content varies by interface. For example, an S3 operation includes `bucket` and `object`, while an HDFS operation includes a `path`.        |
| `status`         | string               | The result of the operation, such as `SUCCESS`, `FAILURE`, `FORBIDDEN`, `ALLOWED`, or `UNAUTHORIZED`.                                                                  |
| `errorMessage`   | string               | If the operation failed, this field contains the error message.                                                                                                        |
| `clientIp`       | string               | The IP address of the client that initiated the request.                                                                                                               |
| `clientPort`     | string               | The source port number of the client connection.                                                                                                                       |
| `reqContentLen`  | string               | The content length of the request, if applicable.                                                                                                                      |
| `respContentLen` | string               | The content length of the response, if applicable.                                                                                                                     |
| `requestId`      | string               | A unique identifier for the request, primarily used by the Gateway.                                                                                                    |
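
Because `operation` is a string for data access interfaces and a JSON object for the Gateway, consumers of the log usually need to handle both shapes. The sketch below is one way to do that in Python. The function names and the choice of which statuses count as a problem are assumptions made for illustration, and the two entries are abridged, hypothetical records shaped like the schema above.

```python
# Statuses treated as "needs attention" in this sketch (an assumption, not a product rule).
PROBLEM_STATUSES = {"FAILURE", "FORBIDDEN", "UNAUTHORIZED"}


def describe_operation(entry: dict) -> str:
    """Return a one-line label whether `operation` is a string or a method/path object."""
    operation = entry.get("operation")
    if isinstance(operation, dict):
        return f"{operation.get('method', '?')} {operation.get('path', '?')}"
    return str(operation)


def needs_attention(entry: dict) -> bool:
    """True when the entry's status is one of the problem statuses above."""
    return entry.get("status") in PROBLEM_STATUSES


gateway_entry = {
    "interface": "GATEWAY",
    "operation": {"method": "GET", "path": "/api/v1/mount"},
    "status": "SUCCESS",
}
s3_entry = {
    "interface": "S3",
    "operation": "GetObject",
    "status": "FORBIDDEN",
}

for entry in (gateway_entry, s3_entry):
    print(entry["interface"], describe_operation(entry), needs_attention(entry))
```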

### Log Examples

#### Example 1: Management Operation

A user lists all mount points via the API.

```json
{
    "timestamp": "2025-07-29T15:21:21.846416+08:00",
    "user": {
        "name": "super-user",
        "group": ["Storage"],
        "role": ["SuperAdmin"]
    },
    "interface": "GATEWAY",
    "operation": {
        "method": "GET",
        "path": "/api/v1/mount"
    },
    "resource": {
        "parameters": {},
        "body": {}
    },
    "status": "SUCCESS",
    "errorMessage": "",
    "clientIp": "192.168.124.21",
    "requestId": "b3c9efe4-35aa-42d0-8690-ab044126452c"
}
```

#### Example 2: S3 API Data Access

A user fetches an object using an S3 client.

```json
{
    "timestamp": "2025-07-24T14:45:59.911358+08:00[Asia/Shanghai]",
    "user": {
        "name": "search-admin@alluxio.com",
        "group": ["Search"],
        "role": ["GroupAdmin"]
    },
    "interface": "S3",
    "operation": "GetObject",
    "resource": {
        "bucket": "testbucket",
        "object": "hosts3",
        "sourcePath": null,
        "prefix": null,
        "path": null
    },
    "status": "SUCCESS",
    "errorMessage": null,
    "clientIp": "127.0.0.1",
    "clientPort": "60304",
    "reqContentLen": "None",
    "respContentLen": "268",
    "requestId": null
}
```

#### Example 3: HDFS API Data Access

A user opens a file using a Hadoop client.

```json
{
    "timestamp": "2025-07-24T16:37:28.71468+08:00[Asia/Shanghai]",
    "user": {
        "name": "search-admin@alluxio.com",
        "group": ["Search"],
        "role": ["GroupAdmin"]
    },
    "interface": "HADOOP_FS",
    "operation": "HadoopFs.Open",
    "resource": {
        "path": "/testbucket/hosts3"
    },
    "status": "ALLOWED",
    "clientIp": "192.168.1.104"
}
```
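
The timestamps in these examples may carry a bracketed zone ID (for example `[Asia/Shanghai]`) and a varying number of fractional-second digits, which some ISO 8601 parsers reject. The following sketch parses only the shapes shown on this page, using a numeric offset with an optional zone suffix. The function name `parse_audit_timestamp` is hypothetical, and other formats such as a trailing `Z` are deliberately not handled.

```python
import re
from datetime import datetime, timedelta, timezone

_TIMESTAMP = re.compile(
    r"^(\d{4})-(\d{2})-(\d{2})T(\d{2}):(\d{2}):(\d{2})"  # date and time
    r"(?:\.(\d+))?"                                       # optional fraction
    r"([+-])(\d{2}):(\d{2})"                              # numeric UTC offset
    r"(?:\[([^\]]+)\])?$"                                 # optional [Zone/ID]
)


def parse_audit_timestamp(text: str):
    """Return (timezone-aware datetime, zone ID or None) for an audit timestamp."""
    match = _TIMESTAMP.match(text)
    if match is None:
        raise ValueError(f"unsupported timestamp format: {text!r}")
    year, month, day, hour, minute, second = (int(g) for g in match.groups()[:6])
    # Pad or truncate the fraction to exactly six digits (microseconds).
    micros = int((match.group(7) or "")[:6].ljust(6, "0"))
    sign = 1 if match.group(8) == "+" else -1
    offset = sign * timedelta(hours=int(match.group(9)), minutes=int(match.group(10)))
    moment = datetime(year, month, day, hour, minute, second, micros,
                      tzinfo=timezone(offset))
    return moment, match.group(11)


moment, zone = parse_audit_timestamp("2025-07-24T16:37:28.71468+08:00[Asia/Shanghai]")
print(moment.isoformat(), zone)
```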

#### Example 4: FUSE Data Access

An anonymous user reads a file from a FUSE mount (`cat testbucket/hosts`).

```json
{
    "timestamp": "2025-07-24T14:48:14.566555+08:00[Asia/Shanghai]",
    "user": {
        "name": "anonymous",
        "group": null,
        "role": null
    },
    "interface": "FUSE",
    "operation": "Fuse.Open",
    "resource": {
        "path": "/testbucket/hosts"
    },
    "status": "SUCCESS",
    "errorMessage": null,
    "clientIp": null,
    "clientPort": null,
    "reqContentLen": null,
    "respContentLen": null,
    "requestId": null
}
{
    "timestamp": "2025-07-24T14:48:14.650128+08:00[Asia/Shanghai]",
    "user": {
        "name": "anonymous",
        "group": null,
        "role": null
    },
    "interface": "FUSE",
    "operation": "Fuse.Read",
    "resource": {
        "fd": 3,
        "path": "/testbucket/hosts"
    },
    "status": "SUCCESS",
    "errorMessage": null,
    "clientIp": null,
    "clientPort": null,
    "reqContentLen": null,
    "respContentLen": "268",
    "requestId": null
}
{
    "timestamp": "2025-07-24T14:48:14.650381+08:00[Asia/Shanghai]",
    "user": {
        "name": "anonymous",
        "group": null,
        "role": null
    },
    "interface": "FUSE",
    "operation": "Fuse.Release",
    "resource": {
        "path": "/testbucket/hosts"
    },
    "status": "SUCCESS",
    "errorMessage": null,
    "clientIp": null,
    "clientPort": null,
    "reqContentLen": null,
    "respContentLen": null,
    "requestId": null
}
```
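
The block above contains several JSON objects back to back, so it cannot be loaded with a single `json.loads` call. A minimal sketch for reading such a sequence uses `json.JSONDecoder.raw_decode`, which reports where each object ends. Whether the on-disk log is pretty-printed as shown here or written one object per line is not stated on this page; the sketch handles both. The function name `iter_json_objects` and the abridged sample are illustrative only.

```python
import json


def iter_json_objects(text: str):
    """Yield each JSON value from a string of concatenated JSON documents."""
    decoder = json.JSONDecoder()
    pos = 0
    while pos < len(text):
        # raw_decode does not skip leading whitespace, so do it here.
        while pos < len(text) and text[pos].isspace():
            pos += 1
        if pos >= len(text):
            break
        obj, pos = decoder.raw_decode(text, pos)
        yield obj


sample = """
{
    "interface": "FUSE",
    "operation": "Fuse.Open",
    "resource": {"path": "/testbucket/hosts"}
}
{
    "interface": "FUSE",
    "operation": "Fuse.Release",
    "resource": {"path": "/testbucket/hosts"}
}
"""

operations = [entry["operation"] for entry in iter_json_objects(sample)]
print(operations)
```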

#### Example 5: Python SDK Data Access

A user deletes a file via the Python SDK.

```json
{
    "timestamp": "2025-07-24T15:42:59.002146+08:00[Asia/Shanghai]",
    "user": {
        "name": "anonymous",
        "group": null,
        "role": null
    },
    "interface": "HTTP",
    "operation": "HttpServer.Rm",
    "resource": {
        "path": null,
        "srcPath": null,
        "dstPath": null,
        "ufsFullPath": "/testbucket/test.txt"
    },
    "status": "SUCCESS",
    "errorMessage": null,
    "clientIp": "0:0:0:0:0:0:0:1",
    "clientPort": "48304",
    "reqContentLen": null,
    "respContentLen": "38",
    "requestId": null
}
```
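
Across the examples on this page, `clientIp` appears as an IPv4 address, as an uncompressed IPv6 address (`0:0:0:0:0:0:0:1`), and as `null`. When grouping entries by client, it can help to normalize these forms first. The sketch below uses Python's standard `ipaddress` module; the helper name `normalize_client_ip` is hypothetical.

```python
import ipaddress


def normalize_client_ip(raw):
    """Return a canonical address string, or None when the log has no client IP."""
    if not raw:
        return None
    return str(ipaddress.ip_address(raw))


def is_local_client(raw) -> bool:
    """True when the client address is a loopback address (IPv4 or IPv6)."""
    if not raw:
        return False
    return ipaddress.ip_address(raw).is_loopback


for value in ("0:0:0:0:0:0:0:1", "127.0.0.1", "192.168.124.21", None):
    print(value, normalize_client_ip(value), is_local_client(value))
```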

## Appendix: Audited Operations by Interface

### Management Operations

All administrative API calls are audited.

### S3 API

All S3 API calls are audited. For batch operations like `DeleteObjects`, a separate log entry is generated for each sub-operation.

### Hadoop Filesystem (HDFS) API

The following Hadoop Filesystem API operations are audited:

* `HadoopFs.Authenticate`
* `HadoopFs.Create`
* `HadoopFs.Append`
* `HadoopFs.Delete`
* `HadoopFs.GetFileStatus`
* `HadoopFs.GetFileBlockLocations`
* `HadoopFs.ListStatus`
* `HadoopFs.Mkdirs`
* `HadoopFs.Open`
* `HadoopFs.Rename`
* `HadoopFs.SetOwner`
* `HadoopFs.SetPermission`

**Note on `clientIp`**: For the Hadoop Filesystem interface, `clientIp` is an IPv4 address. If the client machine has multiple network interfaces, it may not be possible to determine which one was used for the operation. Rely on the `user` field to identify the actor.

### FUSE API

The following FUSE operations are audited:

* `Fuse.Create`
* `Fuse.Open`
* `Fuse.Opendir`
* `Fuse.Release`
* `Fuse.Mkdir`
* `Fuse.Rmdir`
* `Fuse.Unlink`
* `Fuse.Rename`
* `Fuse.Chown`
* `Fuse.Chmod`
* `Fuse.Truncate`
* `Fuse.Symlink`
* `Fuse.Link`

**Note on FUSE logging**: A typical file access sequence involves `Fuse.Open`, followed by `read` or `write` operations, and then `Fuse.Release`. To avoid generating excessive logs, individual `read` and `write` calls are not audited. Instead, information about whether a read or write occurred is logged as part of the `Fuse.Release` event.
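
Since a file access is bracketed by `Fuse.Open` and `Fuse.Release`, one way to reconstruct access sessions is to pair each release with the earliest unmatched open on the same path. The sketch below does this under a loud assumption: entries are matched by `resource.path` only, so concurrent opens of the same path cannot be told apart. The function name `pair_fuse_sessions` and the abridged sample entries are hypothetical.

```python
from collections import defaultdict


def pair_fuse_sessions(entries):
    """Pair Fuse.Open with the next Fuse.Release on the same path (FIFO per path)."""
    pending = defaultdict(list)  # path -> open entries not yet released
    sessions = []
    for entry in entries:
        if entry.get("interface") != "FUSE":
            continue
        path = (entry.get("resource") or {}).get("path")
        operation = entry.get("operation")
        if operation == "Fuse.Open":
            pending[path].append(entry)
        elif operation == "Fuse.Release" and pending[path]:
            opened = pending[path].pop(0)
            sessions.append({
                "path": path,
                "opened": opened.get("timestamp"),
                "released": entry.get("timestamp"),
            })
    return sessions


entries = [
    {"interface": "FUSE", "operation": "Fuse.Open",
     "resource": {"path": "/testbucket/hosts"}, "timestamp": "t1"},
    {"interface": "S3", "operation": "GetObject",
     "resource": {"bucket": "testbucket", "object": "hosts3"}, "timestamp": "t2"},
    {"interface": "FUSE", "operation": "Fuse.Release",
     "resource": {"path": "/testbucket/hosts"}, "timestamp": "t3"},
]

sessions = pair_fuse_sessions(entries)
print(sessions)
```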

### HTTP Server (Used by the Python SDK)

The following HTTP server operations are audited:

* `HttpServer.GetFile`
* `HttpServer.WriteFile`
* `HttpServer.FilterParquetFile`
* `HttpServer.FilterParquetFileRaw`
* `HttpServer.Mkdir`
* `HttpServer.Touch`
* `HttpServer.Mv`
* `HttpServer.Rm`
* `HttpServer.Copy`
* `HttpServer.ListFiles`
* `HttpServer.GetFileStatus`
* `HttpServer.Load`
* `HttpServer.Tail`
* `HttpServer.Head`
* `HttpServer.GetRange`

Operations such as `HealthCheck`, `GetCache`, and `PostCache` are not audited.

