Enabling the Audit Log

The Alluxio Audit Log is a critical component for security and compliance, providing a detailed, structured record of operations performed within the system. It helps you monitor sensitive data access, track administrative commands, and meet regulatory requirements.

Audit logs are generated in a structured JSON format, allowing for easy parsing and integration with analysis tools like Splunk or the ELK Stack. Logging covers two primary categories of operations:

  1. Management Operations: API requests for management tasks, such as loading data or managing mount points.

  2. Data Access: Read and write operations performed through various interfaces, including S3, HDFS, FUSE, and the Python SDK.

This guide explains how to enable and configure audit logging for both categories.

Enabling and Configuring Audit Logging

Audit logging is configured separately for management operations and for data access interfaces.

Step 1: Enable Management Operation Audit Logging

This configuration logs all administrative API requests and is enabled by default.

The audit logs are stored in the /mnt/alluxio/logs/gateway/audit/ directory on the Gateway host machine and in the /opt/alluxio/logs/audit/ directory inside the Gateway pod. To modify the log location or other settings, update the gateway section in your alluxio-cluster.yaml file:

spec:
  gateway:
    enabled: true
    log:
      level: info
      type: hostPath
      hostPath: /mnt/alluxio/logs/gateway
      size: 1Gi
      storageClass: standard
    auditLog:
      enabled: true

Step 2: Enable Data Access Audit Logging

This configuration logs data access operations from clients using the S3, HDFS, and FUSE interfaces.

To enable it, add the following properties to your alluxio-cluster.yaml file:

spec:
  properties:
    alluxio.audit.logging.enabled: "true"
    alluxio.audit.logging.poll.interval: "5s"

  • By default, data access audit logs are stored in the ${audit.logs.dir} directory. You can customize this location by modifying the *_AUDIT_LOG appenders in your conf/log4j2.xml file.

  • For operator versions 3.3.3 and later, the required audit log appenders are included in log4j2.xml by default. For earlier versions, you must manually add the logger configuration to the log4j2.xml section of your alluxio-configmap.
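
Before applying the change, it can help to confirm that the properties landed where the operator expects them. The following is a minimal sketch in Python (not part of the product; requires PyYAML) that checks the spec.properties section of your alluxio-cluster.yaml:

# Minimal sketch: verify that the audit logging properties are present in
# alluxio-cluster.yaml before applying it. Requires PyYAML (pip install pyyaml).
import yaml

REQUIRED = {
    "alluxio.audit.logging.enabled": "true",
    "alluxio.audit.logging.poll.interval": "5s",
}

with open("alluxio-cluster.yaml") as f:
    cluster = yaml.safe_load(f)

properties = cluster.get("spec", {}).get("properties", {}) or {}
for key, expected in REQUIRED.items():
    actual = properties.get(key)
    marker = "ok" if actual == expected else "MISSING or different"
    print(f"{key} = {actual!r} ({marker})")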

Property | Description | Default Value
alluxio.audit.logging.enabled | Set to true to enable audit logging for data access operations. | false
alluxio.audit.logging.poll.interval | The interval at which audit logs are written to the file. | 5s

Understanding the Audit Log Format

Each audit log entry is a structured JSON object with a consistent schema, making it easy to process and analyze.

Audit Log Schema

Field | Type | Description
timestamp | string | The timestamp of the operation in ISO 8601 format with timezone (e.g., 2025-08-02T09:12:30.123456+08:00[Asia/Singapore]).
user | JSON Object | Information about the user who performed the operation, containing name (user identifier), group (list of groups), and role (list of roles).
interface | string | The access interface used. Possible values: S3, FUSE, HADOOP_FS, HTTP, and GATEWAY. GATEWAY represents management operations; HTTP represents the HTTP server used by the Python SDK.
operation | string / JSON Object | The specific operation or API name. For data access interfaces, this is a string (e.g., GetObject). For the Gateway, this is a JSON object with method and path.
resource | JSON Object / string | The resource involved; the content varies by interface. For example, an S3 operation includes bucket and object, while an HDFS operation includes a path.
status | string | The result of the operation, such as SUCCESS, FAILURE, FORBIDDEN, ALLOWED, or UNAUTHORIZED.
errorMessage | string | If the operation failed, this field contains the error message.
clientIp | string | The IP address of the client that initiated the request.
clientPort | string | The source port number of the client connection.
reqContentLen | string | The content length of the request, if applicable.
respContentLen | string | The content length of the response, if applicable.
requestId | string | A unique identifier for the request, primarily used by the Gateway.
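
Because every entry shares this schema, the logs are easy to post-process. Below is a minimal Python sketch that reads entries and prints who did what; it assumes each entry is serialized as a single JSON object per line, and the file name is illustrative:

# Minimal sketch: parse audit log entries and summarize each one.
# Assumes one JSON object per line; the file name is illustrative.
import json

def parse_entries(path):
    with open(path) as f:
        for line in f:
            line = line.strip()
            if line:
                yield json.loads(line)

for entry in parse_entries("audit.log"):
    user = (entry.get("user") or {}).get("name", "unknown")
    op = entry.get("operation")
    if isinstance(op, dict):  # Gateway operations are objects with method and path
        op = f"{op.get('method')} {op.get('path')}"
    print(entry.get("timestamp"), entry.get("interface"), user, op, entry.get("status"))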

Log Examples

Example 1: Management Operation

A user lists all mount points via the API.

{
    "timestamp": "2025-07-29T15:21:21.846416+08:00",
    "user": {
        "name": "super-user",
        "group": ["Storage"],
        "role": ["SuperAdmin"]
    },
    "interface": "GATEWAY",
    "operation": {
        "method": "GET",
        "path": "/api/v1/mount"
    },
    "resource": {
        "parameters": {},
        "body": {}
    },
    "status": "SUCCESS",
    "errorMessage": "",
    "clientIp": "192.168.124.21",
    "requestId": "b3c9efe4-35aa-42d0-8690-ab044126452c"
}

Example 2: S3 API Data Access

A user fetches an object using an S3 client.

{
    "timestamp": "2025-07-24T14:45:59.911358+08:00[Asia/Shanghai]",
    "user": {
        "name": "[email protected]",
        "group": ["Search"],
        "role": ["GroupAdmin"]
    },
    "interface": "S3",
    "operation": "GetObject",
    "resource": {
        "bucket": "testbucket",
        "object": "hosts3",
        "sourcePath": null,
        "prefix": null,
        "path": null
    },
    "status": "SUCCESS",
    "errorMessage": null,
    "clientIp": "127.0.0.1",
    "clientPort": "60304",
    "reqContentLen": "None",
    "respContentLen": "268",
    "requestId": null
}

Example 3: HDFS API Data Access

A user opens a file using a Hadoop client.

{
    "timestamp": "2025-07-24T16:37:28.71468+08:00[Asia/Shanghai]",
    "user": {
        "name": "[email protected]",
        "group": ["Search"],
        "role": ["GroupAdmin"]
    },
    "interface": "HADOOP_FS",
    "operation": "HadoopFs.Open",
    "resource": {
        "path": "/testbucket/hosts3"
    },
    "status": "ALLOWED",
    "clientIp": "192.168.1.104"
}

Example 4: FUSE Data Access

An anonymous user reads a file from a FUSE mount (cat testbucket/hosts).

{
    "timestamp": "2025-07-24T14:48:14.566555+08:00[Asia/Shanghai]",
    "user": {
        "name": "anonymous",
        "group": null,
        "role": null
    },
    "interface": "FUSE",
    "operation": "Fuse.Open",
    "resource": {
        "path": "/testbucket/hosts"
    },
    "status": "SUCCESS",
    "errorMessage": null,
    "clientIp": null,
    "clientPort": null,
    "reqContentLen": null,
    "respContentLen": null,
    "requestId": null
}
{
    "timestamp": "2025-07-24T14:48:14.650128+08:00[Asia/Shanghai]",
    "user": {
        "name": "anonymous",
        "group": null,
        "role": null
    },
    "interface": "FUSE",
    "operation": "Fuse.Read",
    "resource": {
        "fd": 3,
        "path": "/testbucket/hosts"
    },
    "status": "SUCCESS",
    "errorMessage": null,
    "clientIp": null,
    "clientPort": null,
    "reqContentLen": null,
    "respContentLen": "268",
    "requestId": null
}
{
    "timestamp": "2025-07-24T14:48:14.650381+08:00[Asia/Shanghai]",
    "user": {
        "name": "anonymous",
        "group": null,
        "role": null
    },
    "interface": "FUSE",
    "operation": "Fuse.Release",
    "resource": {
        "path": "/testbucket/hosts"
    },
    "status": "SUCCESS",
    "errorMessage": null,
    "clientIp": null,
    "clientPort": null,
    "reqContentLen": null,
    "respContentLen": null,
    "requestId": null
}

Example 5: Python SDK Data Access

A user deletes a file via the Python SDK.

{
    "timestamp": "2025-07-24T15:42:59.002146+08:00[Asia/Shanghai]",
    "user": {
        "name": "anonymous",
        "group": null,
        "role": null
    },
    "interface": "HTTP",
    "operation": "HttpServer.Rm",
    "resource": {
        "path": null,
        "srcPath": null,
        "dstPath": null,
        "ufsFullPath": "/testbucket/test.txt"
    },
    "status": "SUCCESS",
    "errorMessage": null,
    "clientIp": "0:0:0:0:0:0:0:1",
    "clientPort": "48304",
    "reqContentLen": null,
    "respContentLen": "38",
    "requestId": null
}
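
The status field makes it straightforward to surface failed or denied operations across every interface, which is often the first thing a compliance review looks for. A minimal sketch, reusing the same one-JSON-object-per-line assumption and an illustrative file name:

# Minimal sketch: flag failed or denied operations in an audit log.
import json

DENIED = {"FAILURE", "FORBIDDEN", "UNAUTHORIZED"}

with open("audit.log") as f:  # illustrative path
    for line in filter(str.strip, f):
        entry = json.loads(line)
        if entry.get("status") in DENIED:
            user = (entry.get("user") or {}).get("name")
            print(entry.get("timestamp"), entry.get("interface"), user,
                  entry.get("operation"), entry.get("errorMessage"))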

Appendix: Audited Operations by Interface

Management Operations

All administrative API calls are audited.

S3 API

All S3 API calls are audited. For batch operations like DeleteObjects, a separate log entry is generated for each sub-operation.

Hadoop Filesystem (HDFS) API

The following Hadoop Filesystem API operations are audited:

  • HadoopFs.Authenticate

  • HadoopFs.Create

  • HadoopFs.Append

  • HadoopFs.Delete

  • HadoopFs.GetFileStatus

  • HadoopFs.GetFileBlockLocations

  • HadoopFs.ListStatus

  • HadoopFs.Mkdirs

  • HadoopFs.Open

  • HadoopFs.Rename

  • HadoopFs.SetOwner

  • HadoopFs.SetPermission

Note on clientIp: For the Hadoop Filesystem interface, the logged clientIp is an IPv4 address of the client machine. If that machine has multiple network interfaces, it may not be possible to determine which one was actually used for the operation, so rely on the user field to identify the actor.
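
Given that caveat, a per-user summary of Hadoop Filesystem activity built from the user field is more dependable than anything keyed on clientIp. A minimal sketch, under the same assumptions as the earlier examples:

# Minimal sketch: count audited Hadoop Filesystem operations per user,
# since clientIp is not a dependable identifier for this interface.
import json
from collections import Counter

per_user = Counter()
with open("audit.log") as f:  # illustrative path
    for line in filter(str.strip, f):
        entry = json.loads(line)
        if entry.get("interface") == "HADOOP_FS":
            user = (entry.get("user") or {}).get("name", "unknown")
            per_user[(user, entry.get("operation"))] += 1

for (user, op), count in per_user.most_common():
    print(f"{user:30} {op:35} {count}")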

FUSE API

The following FUSE operations are audited:

  • Fuse.Create

  • Fuse.Open

  • Fuse.Opendir

  • Fuse.Release

  • Fuse.Mkdir

  • Fuse.Rmdir

  • Fuse.Unlink

  • Fuse.Rename

  • Fuse.Chown

  • Fuse.Chmod

  • Fuse.Truncate

  • Fuse.Symlink

  • Fuse.Link

Note on FUSE logging: A typical file access sequence involves Fuse.Open, followed by read or write operations, and then Fuse.Release. To avoid generating excessive logs, individual read and write calls are not audited. Instead, information about whether a read or write occurred is logged as part of the Fuse.Release event.
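
One consequence is that reconstructing a file-access session means pairing each Fuse.Open with the following Fuse.Release for the same path. A minimal sketch of that correlation (same one-object-per-line assumption and illustrative file name; a production tool would also track file descriptors and overlapping opens):

# Minimal sketch: pair Fuse.Open with the matching Fuse.Release per path
# to reconstruct file-access sessions from FUSE audit entries.
import json

pending_opens = {}  # path -> timestamp of the unmatched Fuse.Open
with open("audit.log") as f:  # illustrative path
    for line in filter(str.strip, f):
        entry = json.loads(line)
        if entry.get("interface") != "FUSE":
            continue
        path = (entry.get("resource") or {}).get("path")
        if entry.get("operation") == "Fuse.Open":
            pending_opens[path] = entry.get("timestamp")
        elif entry.get("operation") == "Fuse.Release":
            opened = pending_opens.pop(path, None)
            print(f"{path}: opened {opened}, released {entry.get('timestamp')}")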

HTTP Server (Used by the Python SDK)

The following HTTP server operations are audited:

  • HttpServer.GetFile

  • HttpServer.WriteFile

  • HttpServer.FilterParquetFile

  • HttpServer.FilterParquetFileRaw

  • HttpServer.Mkdir

  • HttpServer.Touch

  • HttpServer.Mv

  • HttpServer.Rm

  • HttpServer.Copy

  • HttpServer.ListFiles

  • HttpServer.GetFileStatus

  • HttpServer.Load

  • HttpServer.Tail

  • HttpServer.Head

  • HttpServer.GetRange

Operations such as HealthCheck, GetCache, and PostCache are not audited.
