Authorization

This document outlines the authorization policies and Access Control List (ACL) permission design for Alluxio, focusing on the S3 API and Hadoop filesystem integrations. Proper ACL configuration is essential for data security: it ensures that access is controlled by the type of resource targeted (directory or object) and by the action performed (read, write, delete, etc.).

S3 API Authorization

Alluxio supports a subset of the S3 API. Authorization is divided into object-level and bucket-level operations; the matrices below list the recommended ACL check for each operation.

Object Operations ACL Matrix

| Operation Type | HTTP Method | Condition | Target Resource | Required Permission |
| --- | --- | --- | --- | --- |
| ListParts | GET | uploadId in query | Current Object | READ |
| GetObjectTagging | GET | tagging in query | Current Object | READ |
| GetObject | GET | (GET method for other cases) | Current Object | READ |
| PutObjectTagging | PUT | tagging in query | Current Object | WRITE |
| UploadPartCopy | PUT | uploadId in query AND x-amz-copy-source header | Current Object / Target parent Dir | READ / WRITE |
| UploadPart | PUT | uploadId in query | Current Object | WRITE |
| CopyObject | PUT | x-amz-copy-source header | Source Object / Target parent Dir | READ / WRITE |
| PutObject | PUT | (PUT method for other cases) | Parent Directory | WRITE |
| CreateMultipartUpload | POST | uploads in query | Parent Directory | WRITE |
| CompleteMultipartUpload | POST | uploadId in query | Parent Directory | WRITE |
| HeadObject | HEAD | - | Current Object | READ |
| AbortMultipartUpload | DELETE | uploadId in query | Current Object | WRITE |
| DeleteObjectTagging | DELETE | tagging in query | Current Object | WRITE |
| DeleteObject | DELETE | (DELETE method for other cases) | Current Object | WRITE |
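Two of these operations touch a second resource: UploadPartCopy and CopyObject read from a source and write under a target parent directory, so both permissions must hold for the request to succeed. For illustration, a CopyObject call with the AWS CLI might look like this (the endpoint URL, bucket, and keys are examples):

# CopyObject: requires READ on testbucket/data/src.txt
# and WRITE on the target parent directory testbucket/archive/
aws s3api copy-object \
  --endpoint-url http://alluxio-s3:29998 \
  --copy-source testbucket/data/src.txt \
  --bucket testbucket \
  --key archive/src.txt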

Bucket Operations ACL Matrix

| Operation Type | HTTP Method | Condition | Target Resource | Required Permission |
| --- | --- | --- | --- | --- |
| ListBuckets | GET | bucket is empty | Root Path | EXECUTE |
| GetBucketTagging | GET | tagging in query | Bucket Directory | READ |
| ListMultipartUploads | GET | uploads in query | Bucket Directory | EXECUTE |
| ListObjects | GET | (GET method for other cases) | Requested Directory | EXECUTE |
| PutBucketTagging | PUT | tagging in query | Bucket Directory | WRITE |
| CreateBucket | PUT | (PUT method for other cases) | Root Path | WRITE |
| DeleteObjects | POST | delete in query | Each Object Specified | WRITE (per object) |
| HeadBucket | HEAD | - | Bucket Directory | READ |
| DeleteBucketTagging | DELETE | tagging in query | Bucket Directory | WRITE |
| DeleteBucket | DELETE | (DELETE method for other cases) | Bucket Directory | WRITE |
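Note that DeleteObjects checks WRITE on each object named in the request rather than on the bucket itself. For illustration with the AWS CLI (endpoint and keys are examples):

# WRITE is checked individually on uploads/a.txt and uploads/b.txt
aws s3api delete-objects \
  --endpoint-url http://alluxio-s3:29998 \
  --bucket testbucket \
  --delete '{"Objects":[{"Key":"uploads/a.txt"},{"Key":"uploads/b.txt"}]}'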

Parquet Task (Read Permission Check)

GetParquetRaw: requires READ permission on the path supplied in the query parameter.

Hadoop Filesystem Authorization

Alluxio's Hadoop filesystem integration performs permission checks on file and directory operations, summarized below.

Hadoop Filesystem Functions and Permissions

| Function | CheckPermission | Check Path | Permission | Comments |
| --- | --- | --- | --- | --- |
| append | Yes | Current path | WRITE | Appends data to an existing file. |
| create | Yes | Parent path | WRITE | Creates a new file or overwrites an existing one. |
| delete | Yes | Parent path | WRITE | Deletes a file or directory. |
| getFileBlockLocations | Yes | Current path | READ | Returns block locations for a file. |
| getFileStatus | Yes | Current path | READ | Returns status of a file or directory. |
| setOwner | Yes | Current path | WRITE | Sets owner/group for a file or directory. |
| setPermission | Yes | Current path | WRITE | Sets permissions for a file or directory. |
| listStatus | Yes | Current path | EXECUTE | Lists files/directories in a path. |
| mkdirs | Yes | Parent path | WRITE | Creates directory and necessary parents. |
| open | Yes | Current path | READ | Opens a file for reading. |
| rename | Yes | Src Parent, Dst Parent | WRITE, WRITE | Renames a file or directory. |
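For illustration, here is how these checks surface when the Hadoop CLI is pointed at Alluxio (the master address is an example):

# open: requires READ on /data/file.txt
hadoop fs -cat alluxio://alluxio-master:19998/data/file.txt

# create: requires WRITE on the parent directory /data
hadoop fs -put local.txt alluxio://alluxio-master:19998/data/new.txt

# listStatus: requires EXECUTE on /data
hadoop fs -ls alluxio://alluxio-master:19998/data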

Managing Alluxio Permissions with Ranger

Prerequisites

Before configuring Ranger integration with Alluxio, ensure the following requirements are met:

  1. Plugin JAR Availability: Ensure the authorization plugin jar (e.g., lib/alluxio-authorization-hdfs-AI-3.7-13.0.0.jar) is available on all Alluxio nodes.

  2. Ranger HDFS Plugin: Verify that the HDFS plugin is enabled in your Ranger configuration.

  3. HDFS Repository Setup: Follow the instructions in Cloudera's HDFS Service documentation to set up a new HDFS repository for Alluxio.

Important Note about Repository Configuration: When configuring the HDFS repository in Ranger Admin, you can use any placeholder URL in the "Name Node URL" field (e.g., hdfs://alluxio-namenode:8020). The connection test will fail with an error similar to:

Connection Failed.
Unable to retrieve any files using given parameters. You can still save the repository and start creating policies, but you would not be able to use autocomplete for resource names.

org.apache.ranger.plugin.client.HadoopException: listFilesInternal: Unable to get listing of files for directory /null from Hadoop environment [alluxio-test].
No FileSystem for scheme "alluxio".

This error is expected and can be safely ignored. You can proceed to save the repository configuration and create policies manually without the autocomplete functionality.

Configuration Files Setup

Copy the following configuration files from /etc/hadoop/conf/ on the HDFS namenode to a directory on Alluxio nodes, such as /opt/alluxio/conf/ranger/:

  • ranger-hdfs-security.xml

  • ranger-hdfs-audit.xml

  • ranger-policymgr-ssl.xml
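For example, assuming SSH access to the namenode host (the hostname is illustrative):

mkdir -p /opt/alluxio/conf/ranger
scp hdfs-namenode:/etc/hadoop/conf/ranger-hdfs-security.xml \
    hdfs-namenode:/etc/hadoop/conf/ranger-hdfs-audit.xml \
    hdfs-namenode:/etc/hadoop/conf/ranger-policymgr-ssl.xml \
    /opt/alluxio/conf/ranger/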

Update Ranger Configuration

Update the configuration settings in ranger-hdfs-security.xml to use the new HDFS repository:

  1. Set ranger.plugin.hdfs.policy.cache.dir to a valid directory on Alluxio nodes where you want to store the policy cache

  2. Set ranger.plugin.hdfs.policy.rest.ssl.config.file to point to the path of the ranger-policymgr-ssl.xml file on Alluxio nodes

  3. Set ranger.plugin.hdfs.service.name to be the new HDFS repository name, e.g. alluxio-test

  4. Verify that ranger.plugin.hdfs.policy.rest.url is pointing to the correct Ranger service URL

  5. Set xasecure.add-hadoop-authorization to false to disable fallback to Hadoop authorization
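Put together, the relevant entries in ranger-hdfs-security.xml might look like the following sketch (all values are illustrative, for a repository named alluxio-test and a Ranger Admin at ranger-admin:6080):

<property>
  <name>ranger.plugin.hdfs.service.name</name>
  <value>alluxio-test</value>
</property>
<property>
  <name>ranger.plugin.hdfs.policy.cache.dir</name>
  <value>/opt/alluxio/conf/ranger/policycache</value>
</property>
<property>
  <name>ranger.plugin.hdfs.policy.rest.ssl.config.file</name>
  <value>/opt/alluxio/conf/ranger/ranger-policymgr-ssl.xml</value>
</property>
<property>
  <name>ranger.plugin.hdfs.policy.rest.url</name>
  <value>http://ranger-admin:6080</value>
</property>
<property>
  <name>xasecure.add-hadoop-authorization</name>
  <value>false</value>
</property>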

Configure Alluxio Nodes

Configure the Alluxio nodes to use the Ranger plugin for authorization. In alluxio-site.properties, add the following properties:

For S3 API:

alluxio.worker.s3.authorization.enabled=true
alluxio.worker.s3.authorizer.classname=alluxio.s3.auth.ExtendableAuthorizer
alluxio.security.authorization.plugin.name=ranger-privacera-4.7
# Point to wherever you copied the three Ranger configuration XML files
alluxio.security.authorization.plugin.paths=/opt/alluxio/conf/ranger/

For Hadoop Filesystem: On the client side, enable authorization by adding:

alluxio.security.authorization.plugin.name=ranger-privacera-4.7
# Point to wherever you copied the three Ranger configuration XML files
alluxio.security.authorization.plugin.paths=/opt/alluxio/conf/ranger/

Apply Configuration

Restart all Alluxio nodes to apply the new configuration (or set these properties before starting the cluster). You can then add policies to the Alluxio repository in Ranger and verify that they take effect in Alluxio.

Example: Using Ranger with Alluxio S3 API

This example demonstrates how to set up and test authorization policies using Ranger with Alluxio's S3 API.

Step 1: Mount S3 Bucket

Mount an S3 bucket to Alluxio namespace:

alluxio fs mount --path /testbucket --ufs s3a://my-test-bucket

This creates an Alluxio path /testbucket that maps to the S3 bucket my-test-bucket.
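You can verify the mount, for example by listing the new path:

alluxio fs ls /testbucket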

Step 2: Create Ranger Policies

In the Ranger web console, create policies for the Alluxio service:

  1. Read Policy for File Access:

    • Resource Path: /testbucket/data/file.txt

    • User: userA

    • Permissions: Read

    • Description: Allow userA to read specific file

  2. Write Policy for Directory Access:

    • Resource Path: /testbucket/uploads/*

    • User: userB

    • Permissions: Write

    • Description: Allow userB to write to uploads directory

  3. Execute Policy for Bucket Listing:

    • Resource Path: /testbucket

    • User: userA, userB

    • Permissions: Execute

    • Description: Allow users to list bucket contents

Step 3: Test Authorization Scenarios

Scenario 1: Successful Read Access

# Using the AWS CLI with userA's credentials (including aws_session_token)
aws s3api get-object \
  --endpoint-url http://alluxio-s3:29998 \
  --bucket testbucket \
  --key data/file.txt \
  /tmp/downloaded-file.txt

Expected Result: SUCCESS - userA has READ permission for /testbucket/data/file.txt

Scenario 2: Failed Write Access

# Using AWS CLI with userA credentials
aws s3api put-object \
  --endpoint-url http://alluxio-s3:29998 \
  --bucket testbucket \
  --key uploads/new-file.txt \
  --body /tmp/local-file.txt

Expected Result: FORBIDDEN - userA lacks WRITE permission for /testbucket/uploads/

Scenario 3: Successful Write Access

# Using AWS CLI with userB credentials
aws s3api put-object \
  --endpoint-url http://alluxio-s3:29998 \
  --bucket testbucket \
  --key uploads/new-file.txt \
  --body /tmp/local-file.txt

Expected Result: SUCCESS - userB has WRITE permission for /testbucket/uploads/*

Scenario 4: Bucket Listing

# Using AWS CLI with userA credentials
aws s3api list-objects-v2 \
  --endpoint-url http://alluxio-s3:29998 \
  --bucket testbucket

Expected Result: SUCCESS - userA has EXECUTE permission for /testbucket

Important Notes:

  1. Path Mapping: Alluxio paths in Ranger policies correspond to the mounted paths in Alluxio's filesystem, not the underlying S3 bucket paths.

  2. Permission Hierarchy:

    • GetObject operations require READ permission on the file path

    • PutObject operations require WRITE permission on the parent directory

    • Listing operations require EXECUTE permission on the directory

  3. Policy Precedence: More specific policies take precedence over general ones. A deny policy will override an allow policy for the same resource.

  4. Troubleshooting: If authorization fails unexpectedly, check:

    • Whether the Ranger policy cache has refreshed

    • The Alluxio logs for detailed error messages

    • Whether user authentication is working correctly
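For the log check, one quick approach is to grep the worker log for denial messages; the log path and search term below are assumptions, so adjust them for your deployment:

grep -i "denied" /opt/alluxio/logs/worker.log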

Gateway Authorization with OPA

Alluxio Gateway provides comprehensive authorization through Open Policy Agent (OPA), enabling fine-grained, policy-based access control for its APIs. This approach allows administrators to define rules based on user identity (roles and groups) and the resources being accessed.

Core Concepts

Before configuring, it's helpful to understand the three main components:

  1. OPA Policy (.rego file): The "logic engine" for authorization. This file contains a set of rules written in the Rego language that define the conditions for allowing or denying a request.

  2. OPA Data (.yaml file): The "configuration" that the policy uses. This file provides the specific data—such as lists of super-admins, group definitions, and path permissions—that the Rego rules consult to make decisions.

  3. Kubernetes ConfigMap: The standard Kubernetes method for providing the policy and data files to the Alluxio Gateway Pod.

Setup and Configuration

Follow these steps to enable and configure OPA authorization for the Gateway.

Step 1: Prepare OPA Policy and Data Files

First, create the policy (opa_auth_policy.rego) and data (opa_data.yaml) files locally.

1. opa_auth_policy.rego (Policy File)

This is a comprehensive reference policy that you can use as a starting point or customize for your needs.

    package opa_auth_policy

    import rego.v1

    # -----------------------------------------------------------------------------
    # --- Default Policy
    # -----------------------------------------------------------------------------

    # By default, deny all requests.
    default allow := false

    # -----------------------------------------------------------------------------
    # --- Main Authorization Rules
    # --- These are the top-level rules that determine the final 'allow' decision.
    # -----------------------------------------------------------------------------

    # Rule 1: SuperAdmins are allowed to do anything, bypassing all other checks.
    allow if {
        user_is_superadmin
    }

    # Rule 2: For users who pass the base checks, allow GET access to globally allowed APIs.
    allow if {
        base_checks_pass
        method_is_get
        api_in_global_allow_list
    }

    # Rule 3: Admins can list resources (i.e., a GET request with no specific path or ID).
    allow if {
        base_checks_pass
        user_is_admin
        method_is_get
        request_is_for_listing
    }

    # Rule 4: Allow any user who passes base checks to GET resources if all requested paths are permitted.
    allow if {
        base_checks_pass
        method_is_get
        all_request_paths_are_allowed
    }

    # Rule 5: Allow Admins to perform update actions if all requested paths are permitted.
    allow if {
        base_checks_pass
        user_is_admin
        method_is_update
        all_request_paths_are_allowed
    }

    # Rule 6: Allow requests that provide an ID but no path.
    # This handles a specific update scenario.
    allow if {
        base_checks_pass
        user_is_admin
        method_is_update
        request_has_id_but_no_path
    }

    # Rule 7: Allow requests that provide an ID but no path.
    # This handles a specific GET scenario.
    allow if {
        base_checks_pass
        method_is_get
        request_has_id_but_no_path
    }

    # -----------------------------------------------------------------------------
    # --- Core Logic Helpers
    # -----------------------------------------------------------------------------

    # Combines common checks for non-superadmin users to improve performance.
    base_checks_pass if {
        not user_is_superadmin
        user_is_valid
        not api_in_global_deny_list
    }

    # Checks if all paths in the request are in the user's allow list and not in the deny list.
    all_request_paths_are_allowed if {
        count(relevant_paths) > 0
        count({x | relevant_paths[x]; path_is_allowed(x)}) == count(relevant_paths)
        count({x | relevant_paths[x]; path_is_denied(x)}) == 0
    }

    # Determines if the request is a "listing" operation (no ID and no paths).
    request_is_for_listing if {
        not request_has_id
        count(relevant_paths) == 0
    }

    # Determines if the request contains an ID but no paths.
    request_has_id_but_no_path if {
        request_has_id
        count(relevant_paths) == 0
    }

    # -----------------------------------------------------------------------------
    # --- Permission Detail Helpers
    # -----------------------------------------------------------------------------

    # Checks if a path is in the user's allowed prefixes.
    path_is_allowed(path) if {
        some i, j
        some group in input_groups
        group == data.groups[i].group
        clean_path := trim_suffix(path, "/")
        clean_prefix := trim_suffix(data.groups[i].allow.pathPrefixes[j].prefix, "/")
        strings.any_prefix_match(clean_path, clean_prefix)
        is_valid_prefix_match(clean_path, clean_prefix)
        api_is_valid_for_path_rule(data.groups[i].allow.pathPrefixes[j])
    }

    # Checks if a path is in the user's denied prefixes.
    path_is_denied(path) if {
        some i, j
        some group in input_groups
        group == data.groups[i].group
        clean_path := trim_suffix(path, "/")
        clean_prefix := trim_suffix(data.groups[i].deny.pathPrefixes[j].prefix, "/")
        strings.any_prefix_match(clean_path, clean_prefix)
        is_valid_prefix_match(clean_path, clean_prefix)
        api_is_valid_for_path_rule(data.groups[i].deny.pathPrefixes[j])
    }

    # Validates that the prefix match is legitimate.
    # This rule is true if the prefix is an exact match to the path.
    is_valid_prefix_match(path, prefix) if {
        strings.any_prefix_match(path, prefix)
        suffix := trim_prefix(path, prefix)
        suffix == ""
    }

    # This rule is true if the prefix matches a directory boundary.
    # Example: prefix "/a/b" matches path "/a/b/c".
    is_valid_prefix_match(path, prefix) if {
        strings.any_prefix_match(path, prefix)
        suffix := trim_prefix(path, prefix)
        startswith(suffix, "/")
    }

    # Checks if the current API is valid for the given path rule.
    # Rule 1: If the path rule does not specify an 'apis' list, it applies to all APIs.
    api_is_valid_for_path_rule(rule) if {
        not rule.apis
    }

    # Rule 2: If the path rule specifies an 'apis' list, the current API must be in that list.
    api_is_valid_for_path_rule(rule) if {
        input_api in rule.apis
    }

    # -----------------------------------------------------------------------------
    # --- Request Parsing Helpers
    # -----------------------------------------------------------------------------

    # Extract the API endpoint from the request path.
    input_api := split(input.path, "/v1")[1]

    # HTTP method checks.
    method_is_get if input.method == "GET"
    method_is_update if input.method != "GET"

    # Extracts paths from multiple possible locations in the request.
    # Uses the 'contains' keyword to incrementally build the set of paths.
    paths_from_request contains path if {
        path := input.query.path[0]
    }

    paths_from_request contains path if {
        path := input.parsed_body.path
    }

    paths_from_request contains path if {
        path := input.parsed_body.paths[_]
    }

    paths_from_request contains path if {
        path := input.parsed_body.index
    }

    # Collects all non-empty paths from the request for validation.
    relevant_paths := {p | some p in paths_from_request; p != ""}

    # Checks if the request contains an ID.
    request_has_id if input.parsed_body.id != ""
    request_has_id if input.query.id != ""

    # Global API list checks.
    api_in_global_deny_list if input_api in data.denyApis
    api_in_global_allow_list if input_api in data.allowApis

    # -----------------------------------------------------------------------------
    # --- User & Role Helpers
    # -----------------------------------------------------------------------------

    claims := payload if {
        token := input.header.Authorization
        count(token) != 0
        startswith(token[0], "Bearer ")
        bearer_token := substring(token[0], count("Bearer "), -1)
        [_, payload, _] := io.jwt.decode(bearer_token)
    }

    else := user_info if {
        token := input.header.Authorization
        count(token) != 0
        not startswith(token[0], "Bearer ")
        base64.is_valid(token[0])
        ui = base64.decode(token[0])
        json.is_valid(ui)
        user_info = json.unmarshal(ui)
    }

    default input_roles := []

    input_roles := claims.roleFieldName if {
        claims.roleFieldName != ""
        is_array(claims.roleFieldName)
    }

    else := [claims.roleFieldName] if {
        claims.roleFieldName != ""
        is_string(claims.roleFieldName)
    }

    else := claims.role if  {
        claims.role != ""
        is_array(claims.role)
    }

    else := [claims.role] if {
        claims.role != ""
        is_string(claims.role)
    }

    default input_groups := []

    input_groups := claims.groupFieldName if {
        claims.groupFieldName != ""
        is_array(claims.groupFieldName)
    }

    else := [claims.groupFieldName] if {
        claims.groupFieldName != ""
        is_string(claims.groupFieldName)
    }

    else := claims.group if {
        claims.group != ""
        is_array(claims.group)
    }

    else := [claims.group] if {
        claims.group != ""
        is_string(claims.group)
    }

    user_is_valid if {
        count(input_roles) > 0
        count(input_groups) > 0
    }

    user_is_superadmin if {
        count(input_roles) > 0
        some i
        some role in input_roles
        role == data.superAdmin[i]
    }

    user_is_admin if {
        some i
        some role in input_roles
        role == data.groupAdmin[i]
    }

2. opa_data.yaml (Data File)

Define your roles, API restrictions, and group-based path permissions in this file.

# Super Admin definition
superAdmin: ["SuperAdmin"]

# Admin User definition  
groupAdmin: ["GroupAdmin"]

# APIs that can only be accessed by superAdmin
denyApis:
  - /file_index
  - /nodes
  - /rebalance
  - /cache
  - /mount

# APIs that everyone can access
allowApis: 
  - /nodes

# Team-based access control policies
groups:
  - group: Search
    allow: 
      pathPrefixes:
        - prefix: /testbucket/search-bucket/中文目录文件.txt
        - prefix: s3://search-bucket/dir1/dir2
        # If 'apis' is not defined, the rule applies to all APIs
        - prefix: s3://search-bucket/dir1/dir3/
          apis:
            - /load
            
  - group: Recommend
    allow:  
      pathPrefixes:
        - prefix: s3://recommend-bucket/dir1/dir2
        - prefix: s3://recommend-bucket/dir1/dir3
          apis:
            - /load

Step 2: Create the Kubernetes ConfigMap

Use the kubectl CLI to create a ConfigMap from the two files you prepared.

kubectl create configmap opa-gateway-configmap \
  --from-file=opa_auth_policy.rego \
  --from-file=opa_data.yaml
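You can confirm that both files were stored, for example with:

kubectl describe configmap opa-gateway-configmap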

Step 3: Configure the Alluxio Gateway

Finally, modify your alluxio-cluster.yml to enable OPA and instruct the Gateway to use the ConfigMap you created.

spec:
  global:
    authentication:
      enabled: true
      type: oidc
      oidc:
        jwksUri: 
        nbfCheck: false
        roleFieldName: 
        userFieldName: 
        groupFieldName:
    authorization:
      enabled: true
      opa:
        components:
          gateway:
            configMapName: opa-gateway-configmap
            filenames:
              - opa_auth_policy.rego
              - opa_data.yaml
            query: data.opa_auth_policy.allow

After applying this configuration, the Alluxio Gateway will use your OPA policy to authorize incoming API requests.

How It Works

OPA Input Format

For every API request it receives, the Gateway constructs a JSON object that serves as the input for the OPA policy evaluation. This object contains all critical information about the request.

{
    "header": {
        "Accept": [
            "*/*"
        ],
        "Accept-Encoding": [
            "gzip, deflate, br"
        ],
        "Authorization": [
            "Bearer eyJuYW1lIjoxxxx="
        ],
        "Connection": [
            "keep-alive"
        ],
        "Postman-Token": [
            "b9844ab1-27b5-41a2-83a5-40f4d9fb74f4"
        ],
        "User-Agent": [
            "PostmanRuntime/7.42.0"
        ],
        "X-Request-Id": [
            "97be1ace-dc50-40a2-b3a1-e02061ca9504"
        ]
    },
    "method": "POST",
    "parsed_body": {
        "paths": [
            "s3://search-bucket/dir1/dir1",
			"",
			"s3://search-bucket/dir1/dir3"
        ],
        "options": {
            "batchSize": 0,
            "fileFilterRegx": "",
            "replicas": 0,
            "skipIfExists": false
        }
    },
    "path": "/api/v1/load"
}

Note: When using OIDC authentication, clients must provide an Authorization header with the Bearer prefix.
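Before deploying, you can exercise the reference policy locally with the OPA CLI. The sketch below builds a non-Bearer token (a base64-encoded JSON user-info object, which the policy's claims rule decodes) for a SuperAdmin user; the file names match those created in Step 1, and the request body is illustrative:

# Build a base64 user-info token that the claims rule can decode
TOKEN=$(echo -n '{"role":["SuperAdmin"],"group":["Search"]}' | base64)

cat > input.json <<EOF
{
  "header": {"Authorization": ["$TOKEN"]},
  "method": "POST",
  "path": "/api/v1/load",
  "parsed_body": {"paths": ["s3://search-bucket/dir1/dir2"]}
}
EOF

# Expect "true": SuperAdmins bypass all other checks (Rule 1)
opa eval -d opa_auth_policy.rego -d opa_data.yaml -i input.json \
  'data.opa_auth_policy.allow'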

Policy Decision Flow

The reference OPA policy follows a clear decision-making process:

  1. Default Deny: All requests are denied unless explicitly allowed by a rule.

  2. SuperAdmin Override: If the user has the SuperAdmin role, the request is always allowed.

  3. Global Rules: The policy checks if the API is on a global deny-list or allow-list.

  4. Path-Based Permissions: The policy checks if the user's groups have permission to access the resource paths specified in the request.

  5. Action-Specific Rules: The final decision is made based on the action type (e.g., GET vs. POST/PUT) and the user's role (e.g., GroupAdmin vs. a standard user).

Implementation Guidelines and Testing

  • Secure Defaults: The policy is built on a "default-deny" principle, ensuring that only explicitly permitted actions are allowed.

  • Role Privileges: SuperAdmin has unrestricted access. GroupAdmin has elevated privileges to update and list resources within their authorized paths.

  • Testing: Test your policies by sending requests with different user tokens to ensure they behave as expected.

# Test using a user's OIDC token
curl -H "Authorization: Bearer YOUR_OIDC_TOKEN" \
     -H "Content-Type: application/json" \
     -d '{"path": "s3://search-bucket/dir1/dir2/test"}' \
     http://gateway-host:port/api/v1/load
