Audit Log

Overview

The Audit Log is a critical component in Alluxio for security and compliance. It provides a detailed record of various operations performed within the system, which is essential for:

  • Security Monitoring: Logging all sensitive operations to help you track who accessed what data or executed which administrative commands, and when.

  • Regulatory Compliance: Meeting the detailed logging requirements mandated by many industries (such as finance and healthcare).

Alluxio's audit logs are generated in a structured JSON format, which makes them easy to parse and to integrate with log analysis tools such as Splunk and the ELK Stack.

Audit logging primarily covers two categories of operations:

  1. Management Operations: API requests executed via the Gateway component, such as loading data, mounting storage systems, etc.

  2. Data Access: Read and write operations on files and data performed through various interfaces, including S3, HDFS, FUSE, and the Python SDK.

How to Enable and Configure Audit Logging

Enabling audit logging requires configuration in two parts: one for the management Gateway and another for all data access interfaces.

Step 1: Enable Management Gateway Logging

This configuration logs all administrative API requests sent via the Gateway.

In your alluxio-cluster.yml file, add or modify the following configuration:

spec:
  gateway:
    enabled: true
    auditLog:
      enabled: true
      type: hostPath
      hostPath: /mnt/alluxio/gateway/logs/audit/gateway-audit.log
      size: 1Gi
      storageClass: standard

Step 2: Enable Data Access Logging

This configuration logs data access operations performed through interfaces like S3, HDFS, FUSE, and the Python SDK.

In your alluxio-site.properties file, add or modify the following properties:

alluxio.audit.logging.enabled=true
alluxio.audit.logging.poll.interval=5s

  • By default, data access logs are stored in the ${audit.logs.dir} directory. You can customize the storage location by modifying the *_AUDIT_LOG appenders in the conf/log4j2.xml file.

| Property | Description | Default Value |
| --- | --- | --- |
| alluxio.audit.logging.enabled | Set to true to enable audit logging for data access operations. | false |
| alluxio.audit.logging.poll.interval | The interval at which audit logs are written to the file. | 5s |
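As a quick sanity check before restarting a cluster, the two properties above can be read back from a properties file. The following is a minimal sketch: the helper names `parse_properties` and `audit_logging_enabled` are hypothetical, and the parser only handles plain `key=value` lines, not the full Java properties syntax.

```python
from typing import Dict


def parse_properties(text: str) -> Dict[str, str]:
    """Parse simple key=value lines, skipping blank lines and # comments."""
    props: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props


def audit_logging_enabled(props: Dict[str, str]) -> bool:
    # The documented default is false, so a missing key means disabled.
    return props.get("alluxio.audit.logging.enabled", "false").lower() == "true"


# Inline sample standing in for the contents of alluxio-site.properties.
sample = """
# alluxio-site.properties (excerpt)
alluxio.audit.logging.enabled=true
alluxio.audit.logging.poll.interval=5s
"""

props = parse_properties(sample)
print("audit logging enabled:", audit_logging_enabled(props))
print("poll interval:", props.get("alluxio.audit.logging.poll.interval", "5s"))
```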

The Audit Log Format

Each audit log entry is a structured JSON object containing a standard set of fields, which allows for consistent processing and analysis.

Audit Log Schema

| Field | Type | Description |
| --- | --- | --- |
| timestamp | string | The timestamp of the operation in ISO 8601 format with a UTC offset, optionally followed by a bracketed zone ID (e.g., 2025-08-02T09:12:30.123456+08:00[Asia/Singapore]). |
| user | JSON Object | Information about the user who performed the operation, containing name (user identifier), group (list of groups), and role (list of roles). |
| interface | string | The access interface used for the operation. Possible values include S3, HTTP, FUSE, HADOOP_FS, and GATEWAY. HTTP is used for the Python SDK. |
| operation | string / JSON Object | The specific operation or API name. For data access interfaces (S3, HDFS, etc.), this is a string like GetObject. For the Gateway interface, this is an expanded JSON object containing method and path. |
| resource | JSON Object / string | The resource involved in the operation. This is an expanded JSON object whose content varies by interface and operation. For example, an S3 operation will include bucket and object fields, while an HDFS operation will include a path field. |
| status | string | The result of the operation, such as SUCCESS, FAILURE, FORBIDDEN, ALLOWED, or UNAUTHORIZED. |
| errorMessage | string | If the operation failed, this field records the error message. |
| clientIp | string | The IP address of the client that initiated the request. |
| clientPort | string | The source port number of the client connection. |
| reqContentLen | string | The content length of the request, if applicable. |
| respContentLen | string | The content length of the response, if applicable. |
| requestId | string | A unique identifier for the request, primarily used by the Gateway. |
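Two schema details tend to trip up consumers: the timestamp may carry a bracketed zone ID and more than six fractional digits, and operation may be either a string or an object. The sketch below shows one way to normalize both with the Python standard library. The helper names `parse_timestamp` and `operation_name` are hypothetical, and the accepted timestamp shapes are inferred only from the examples on this page.

```python
import re
from datetime import datetime
from typing import Any, Dict

_TIMESTAMP = re.compile(
    r"^(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2})"  # date and time to the second
    r"(?:\.(\d+))?"                             # optional fractional seconds
    r"([+-]\d{2}:\d{2}|Z)"                      # UTC offset
    r"(?:\[[^\]]+\])?$"                         # optional [Zone/Id] suffix
)


def parse_timestamp(value: str) -> datetime:
    """Convert an audit log timestamp into a timezone-aware datetime."""
    match = _TIMESTAMP.match(value)
    if match is None:
        raise ValueError(f"unrecognized timestamp: {value!r}")
    base, fraction, offset = match.groups()
    # datetime keeps microseconds only, so truncate or pad to six digits.
    micros = (fraction or "")[:6].ljust(6, "0")
    if offset == "Z":
        offset = "+00:00"
    return datetime.fromisoformat(f"{base}.{micros}{offset}")


def operation_name(entry: Dict[str, Any]) -> str:
    """Return a flat operation label for both string and object forms."""
    op = entry["operation"]
    if isinstance(op, dict):
        # Gateway entries carry an object with method and path.
        return f"{op['method']} {op['path']}"
    return op


entry = {
    "timestamp": "2025-07-24T16:37:28.71468+08:00[Asia/Shanghai]",
    "interface": "HADOOP_FS",
    "operation": "HadoopFs.Open",
}
print(parse_timestamp(entry["timestamp"]).isoformat(), operation_name(entry))
```

The bracketed zone ID is dropped here because the numeric offset already fixes the instant; keep it separately if the named zone matters for reporting.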

Audit Log Examples

Example 1: Management Gateway Operations

A user lists all mount points and then deletes a mount point via the API.

{
    "timestamp": "2025-07-29T15:21:21.846416+08:00",
    "user": {
        "name": "super-user",
        "group": ["Storage"],
        "role": ["SuperAdmin"]
    },
    "interface": "GATEWAY",
    "operation": {
        "method": "GET",
        "path": "/api/v1/mount"
    },
    "resource": {
        "parameters": {},
        "body": {}
    },
    "status": "SUCCESS",
    "errorMessage": "",
    "clientIp": "192.168.124.21",
    "requestId": "b3c9efe4-35aa-42d0-8690-ab044126452c"
}
{
    "timestamp": "2025-07-29T15:21:30.738177787+08:00",
    "user": {
        "name": "super-user",
        "group": ["Storage"],
        "role": ["SuperAdmin"]
    },
    "interface": "GATEWAY",
    "operation": {
        "method": "DELETE",
        "path": "/api/v1/mount"
    },
    "resource": {
        "parameters": {
            "path": ["testbucket"]
        },
        "body": {
            "path": "/testbucket"
        }
    },
    "status": "SUCCESS",
    "errorMessage": "",
    "clientIp": "192.168.124.21",
    "clientPort": "1870",
    "reqContentLen": "29",
    "respContentLen": "15",
    "requestId": "c4225178-341b-4b83-ad7b-3816e034d5da"
}
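The two entries above appear back to back rather than inside a JSON array. Assuming a log file holds entries in that concatenated form (possibly pretty-printed across several lines, which this page does not confirm), a reader can decode them one at a time with `json.JSONDecoder.raw_decode`. The function name `iter_json_objects` and the abbreviated sample are illustrative only.

```python
import json
from typing import Any, Dict, Iterator


def iter_json_objects(text: str) -> Iterator[Dict[str, Any]]:
    """Yield each JSON object from a string of concatenated objects."""
    decoder = json.JSONDecoder()
    pos, end = 0, len(text)
    while pos < end:
        # raw_decode does not skip leading whitespace, so do it here.
        while pos < end and text[pos].isspace():
            pos += 1
        if pos >= end:
            break
        obj, pos = decoder.raw_decode(text, pos)
        yield obj


# Abbreviated versions of the two Gateway entries shown above.
sample = """
{
    "interface": "GATEWAY",
    "operation": {"method": "GET", "path": "/api/v1/mount"},
    "status": "SUCCESS"
}
{
    "interface": "GATEWAY",
    "operation": {"method": "DELETE", "path": "/api/v1/mount"},
    "status": "SUCCESS"
}
"""

entries = list(iter_json_objects(sample))
for e in entries:
    print(e["operation"]["method"], e["operation"]["path"], e["status"])
```

If the file turns out to contain one object per line, the same function still works, since each line is itself a complete object.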

Example 2: S3 API Data Access

Note on the S3 resource field: For S3 operations, the resource field is a JSON object containing relevant details such as bucket, object, sourcePath, prefix, and path.

A user fetches an object using an S3 client.

{
    "timestamp": "2025-07-24T14:45:59.911358+08:00[Asia/Shanghai]",
    "user": {
        "name": "[email protected]",
        "group": ["Search"],
        "role": ["GroupAdmin"]
    },
    "interface": "S3",
    "operation": "GetObject",
    "resource": {
        "bucket": "testbucket",
        "object": "hosts3",
        "sourcePath": null,
        "prefix": null,
        "path": null
    },
    "status": "SUCCESS",
    "errorMessage": null,
    "clientIp": "127.0.0.1",
    "clientPort": "60304",
    "reqContentLen": "None",
    "respContentLen": "268",
    "requestId": null
}
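In the examples on this page, reqContentLen and respContentLen show up as a numeric string ("268"), as the literal string "None", as null, or not at all. A consumer that wants byte counts could normalize these cases as sketched below. The helper name `content_length` is hypothetical, and treating every non-numeric value as "unknown" is an assumption rather than documented behavior.

```python
from typing import Any, Dict, Optional


def content_length(entry: Dict[str, Any], field: str) -> Optional[int]:
    """Return the content length as an int, or None when it is not known."""
    raw = entry.get(field)
    if raw is None:
        return None
    text = str(raw).strip()
    # Values such as "None" or "" are treated as unknown, not as zero.
    return int(text) if text.isdigit() else None


s3_entry = {
    "interface": "S3",
    "operation": "GetObject",
    "reqContentLen": "None",
    "respContentLen": "268",
}
print(content_length(s3_entry, "reqContentLen"))
print(content_length(s3_entry, "respContentLen"))
```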

Example 3: HDFS API Data Access

A user opens a file using a Hadoop client.

{
    "timestamp": "2025-07-24T16:37:28.71468+08:00[Asia/Shanghai]",
    "user": {
        "name": "[email protected]",
        "group": ["Search"],
        "role": ["GroupAdmin"]
    },
    "interface": "HADOOP_FS",
    "operation": "HadoopFs.Open",
    "resource": {
        "path": "/testbucket/hosts3"
    },
    "status": "ALLOWED",
    "clientIp": "192.168.1.104"
}

Example 4: FUSE Data Access

An anonymous user reads a file from a FUSE mount (cat testbucket/hosts).

{
    "timestamp": "2025-07-24T14:48:14.566555+08:00[Asia/Shanghai]",
    "user": {
        "name": "anonymous",
        "group": null,
        "role": null
    },
    "interface": "FUSE",
    "operation": "Fuse.Open",
    "resource": {
        "path": "/testbucket/hosts"
    },
    "status": "SUCCESS",
    "errorMessage": null,
    "clientIp": null,
    "clientPort": null,
    "reqContentLen": null,
    "respContentLen": null,
    "requestId": null
}
{
    "timestamp": "2025-07-24T14:48:14.650128+08:00[Asia/Shanghai]",
    "user": {
        "name": "anonymous",
        "group": null,
        "role": null
    },
    "interface": "FUSE",
    "operation": "Fuse.Read",
    "resource": {
        "fd": 3,
        "path": "/testbucket/hosts"
    },
    "status": "SUCCESS",
    "errorMessage": null,
    "clientIp": null,
    "clientPort": null,
    "reqContentLen": null,
    "respContentLen": "268",
    "requestId": null
}
{
    "timestamp": "2025-07-24T14:48:14.650381+08:00[Asia/Shanghai]",
    "user": {
        "name": "anonymous",
        "group": null,
        "role": null
    },
    "interface": "FUSE",
    "operation": "Fuse.Release",
    "resource": {
        "path": "/testbucket/hosts"
    },
    "status": "SUCCESS",
    "errorMessage": null,
    "clientIp": null,
    "clientPort": null,
    "reqContentLen": null,
    "respContentLen": null,
    "requestId": null
}
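Where Fuse.Read entries are present, as in the sequence above, their respContentLen values can be summed to estimate how many bytes were read per file. This is a rough sketch over already-parsed entries; the function name `bytes_read_by_path` is hypothetical, and it assumes respContentLen on a Fuse.Read entry is the number of bytes returned by that read.

```python
from collections import defaultdict
from typing import Any, Dict, Iterable


def bytes_read_by_path(entries: Iterable[Dict[str, Any]]) -> Dict[str, int]:
    """Sum respContentLen over successful Fuse.Read entries, keyed by path."""
    totals: Dict[str, int] = defaultdict(int)
    for entry in entries:
        if entry.get("interface") != "FUSE":
            continue
        if entry.get("operation") != "Fuse.Read" or entry.get("status") != "SUCCESS":
            continue
        raw = entry.get("respContentLen")
        # Skip entries whose length is null or not a plain number.
        if isinstance(raw, str) and raw.isdigit():
            totals[entry["resource"]["path"]] += int(raw)
    return dict(totals)


# Abbreviated versions of the three FUSE entries shown above.
fuse_entries = [
    {"interface": "FUSE", "operation": "Fuse.Open", "status": "SUCCESS",
     "resource": {"path": "/testbucket/hosts"}, "respContentLen": None},
    {"interface": "FUSE", "operation": "Fuse.Read", "status": "SUCCESS",
     "resource": {"fd": 3, "path": "/testbucket/hosts"}, "respContentLen": "268"},
    {"interface": "FUSE", "operation": "Fuse.Release", "status": "SUCCESS",
     "resource": {"path": "/testbucket/hosts"}, "respContentLen": None},
]
print(bytes_read_by_path(fuse_entries))
```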

Example 5: Python SDK Data Access

A user deletes a file via the Python SDK.

{
    "timestamp": "2025-07-24T15:42:59.002146+08:00[Asia/Shanghai]",
    "user": {
        "name": "anonymous",
        "group": null,
        "role": null
    },
    "interface": "HTTP",
    "operation": "HttpServer.Rm",
    "resource": {
        "path": null,
        "srcPath": null,
        "dstPath": null,
        "ufsFullPath": "/testbucket/test.txt"
    },
    "status": "SUCCESS",
    "errorMessage": null,
    "clientIp": "0:0:0:0:0:0:0:1",
    "clientPort": "48304",
    "reqContentLen": null,
    "respContentLen": "38",
    "requestId": null
}

Appendix: Audited Operations by Interface

  • S3 API: All S3 API calls are audited. For batch operations like DeleteObjects, a separate log entry is generated for each sub-operation.

  • Gateway: All administrative API calls made via the Gateway are audited.

Hadoop Filesystem (HDFS) API

The following Hadoop Filesystem API operations are audited:

  • HadoopFs.Authenticate

  • HadoopFs.Create

  • HadoopFs.Append

  • HadoopFs.Delete

  • HadoopFs.GetFileStatus

  • HadoopFs.GetFileBlockLocations

  • HadoopFs.ListStatus

  • HadoopFs.Mkdirs

  • HadoopFs.Open

  • HadoopFs.Rename

  • HadoopFs.SetOwner

  • HadoopFs.SetPermission

Note on clientIp: For the Hadoop Filesystem interface, an IPv4 address is obtained. If the machine has multiple network interfaces, it may not be possible to determine which one was used for the operation. To identify the actor, rely on the user field (for example, user.name) rather than on clientIp.
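Because the client address can be ambiguous, attributing activity by user name is usually the safer aggregation. The sketch below counts denied operations per user.name from already-parsed entries. The function name `denied_by_user`, the choice of FORBIDDEN and UNAUTHORIZED as the "denied" set, and the sample entries are all illustrative assumptions.

```python
from collections import Counter
from typing import Any, Dict, Iterable

# Status values from the schema that this sketch treats as denials.
DENIED = {"FORBIDDEN", "UNAUTHORIZED"}


def denied_by_user(entries: Iterable[Dict[str, Any]]) -> Counter:
    """Count denied operations per user name, ignoring clientIp entirely."""
    counts: Counter = Counter()
    for entry in entries:
        if entry.get("status") not in DENIED:
            continue
        user = entry.get("user") or {}
        counts[user.get("name") or "unknown"] += 1
    return counts


# Made-up entries for illustration; only the fields used here are included.
sample_entries = [
    {"user": {"name": "alice"}, "operation": "HadoopFs.Open", "status": "ALLOWED"},
    {"user": {"name": "bob"}, "operation": "HadoopFs.Delete", "status": "FORBIDDEN"},
    {"user": {"name": "bob"}, "operation": "HadoopFs.Rename", "status": "UNAUTHORIZED"},
    {"user": None, "operation": "HadoopFs.Open", "status": "FORBIDDEN"},
]
for name, count in denied_by_user(sample_entries).most_common():
    print(name, count)
```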

FUSE API

The following FUSE operations are audited:

  • Fuse.Create

  • Fuse.Open

  • Fuse.Opendir

  • Fuse.Release

  • Fuse.Mkdir

  • Fuse.Rmdir

  • Fuse.Unlink

  • Fuse.Rename

  • Fuse.Chown

  • Fuse.Chmod

  • Fuse.Truncate

  • Fuse.Symlink

  • Fuse.Link

Note on FUSE logging: A typical file access sequence involves Fuse.Open, followed by read or write operations, and then Fuse.Release. To avoid generating excessive logs, individual read and write calls are not audited. Instead, information about whether a read or write occurred is logged as part of the Fuse.Release event.

HTTP Server (Used by the Python SDK)

The following HTTP server operations are audited:

  • HttpServer.GetFile

  • HttpServer.WriteFile

  • HttpServer.FilterParquetFile

  • HttpServer.FilterParquetFileRaw

  • HttpServer.Mkdir

  • HttpServer.Touch

  • HttpServer.Mv

  • HttpServer.Rm

  • HttpServer.Copy

  • HttpServer.ListFiles

  • HttpServer.GetFileStatus

  • HttpServer.Load

  • HttpServer.Tail

  • HttpServer.Head

  • HttpServer.GetRange

Operations such as HealthCheck, GetCache, and PostCache are not audited.
