Enabling Authorization

Alluxio provides a flexible authorization framework to secure both data access and management operations. Depending on your security requirements, you can choose the appropriate integration:

  • For Data Access Control (S3 API & Hadoop FS): Integrate with Apache Ranger to manage fine-grained permissions for users and applications accessing files and directories.

  • For Management API Control (Gateway): Integrate with Open Policy Agent (OPA) to enforce sophisticated, policy-based authorization for administrative actions performed through the Alluxio Gateway.

This guide provides step-by-step instructions for configuring these integrations.

Data Access Authorization with Apache Ranger

Apache Ranger provides centralized, fine-grained authorization for data access operations. By integrating Alluxio with Ranger, you can manage policies for S3 API and Hadoop Filesystem clients, ensuring that users and applications only have the permissions they need to read, write, or manage data within the Alluxio namespace. This is the recommended solution for controlling access to the data plane.

Prerequisites

Before configuring Ranger integration with Alluxio, ensure the following requirements are met:

  1. Plugin JAR Availability: Ensure the authorization plugin jar (e.g., lib/alluxio-authorization-hdfs-DA-3.7-13.0.1.jar) is available on all Alluxio nodes.

  2. Ranger HDFS Plugin: Verify that the HDFS plugin is enabled in your Ranger configuration.

  3. HDFS Service Configuration: Follow the instructions in Cloudera's HDFS Service documentation to configure a new HDFS service for Alluxio.

Configuration

Follow these steps to configure the integration.

Step 1: Set Up Configuration Files

Copy the following configuration files to a directory on each Alluxio node, such as /opt/alluxio/conf/ranger/, or create them there:

  • ranger-hdfs-security.xml

  • ranger-hdfs-audit.xml

Below are example configurations for these files.

ranger-hdfs-security.xml

ranger-hdfs-audit.xml

Step 2: Update Ranger Configuration

Update the configuration settings in ranger-hdfs-security.xml to use the new HDFS repository:

  1. Set ranger.plugin.hdfs.policy.cache.dir to a valid directory on Alluxio nodes where you want to store the policy cache.

  2. Set ranger.plugin.hdfs.service.name to the name of the new HDFS service, e.g., alluxio-test.

  3. Verify that ranger.plugin.hdfs.policy.rest.url is pointing to the correct Ranger service URL.

  4. Set xasecure.add-hadoop-authorization to false to disable fallback to Hadoop authorization.

Step 3: Configure Alluxio Nodes

To secure the S3 API with the Ranger plugin for authorization on the workers, add the following to the properties section of your alluxio-cluster.yaml.

For S3 API:

For Hadoop Filesystem: On the client side, enable authorization by adding:

Step 4: Apply Configuration

Restart all Alluxio nodes to apply the new configuration, or apply it before starting the Alluxio cluster for the first time. You can then add policies to the Alluxio service (e.g., alluxio-test) in Ranger and verify that they take effect in Alluxio.

Important Notes

  1. Path Mapping: Paths in Ranger policies correspond to paths in the Alluxio namespace, not to the underlying S3 objects.

  2. Permission Hierarchy:

    • Reading a file requires READ permission on the file itself.

    • Writing or creating a file requires WRITE permission on the parent directory.

    • Listing the contents of a directory requires EXECUTE permission on that directory.

  3. Policy Precedence: More specific policies take precedence over general ones. A deny policy will override an allow policy for the same resource.

  4. Repository Connection: When configuring the HDFS service in Ranger for Alluxio, the connection test is expected to fail because Ranger cannot directly communicate with Alluxio's filesystem scheme. This is normal. You can safely ignore the error, save the service, and proceed to create policies manually.

  5. Troubleshooting: If authorization fails unexpectedly, check:

    • The Ranger policy cache refresh status.

    • The Alluxio logs for detailed error messages.

    • Whether user authentication is working correctly.

Management API Authorization with Open Policy Agent (OPA)

For controlling access to administrative and management operations, Alluxio Gateway integrates with Open Policy Agent (OPA). OPA enables you to define and enforce fine-grained, policy-based access control for all Gateway API endpoints. This is the recommended solution for securing the management plane, allowing you to create sophisticated rules based on user identity (role, group) and the specific API operations being requested.

Core Concepts

Before configuring, it's helpful to understand the three main components:

  1. OPA Policy (.rego file): The "logic engine" for authorization. This file contains a set of rules written in the Rego language that define the conditions for allowing or denying a request.

  2. OPA Data (.yaml file): The "configuration" that the policy uses. This file provides the specific data, such as lists of super-admins, group definitions, and path permissions, that the Rego rules consult to make decisions.

  3. Kubernetes ConfigMap: The standard Kubernetes method for providing the policy and data files to the Alluxio Gateway Pod.

Setup and Configuration

Follow these steps to enable and configure OPA authorization for the Gateway.

Step 1: Prepare OPA Policy and Data Files

First, create the policy (opa_auth_policy.rego) and data (opa_data.yaml) files locally.

1. opa_auth_policy.rego (Policy File)

This is a comprehensive reference policy that you can use as a starting point or customize for your needs.

2. opa_data.yaml (Data File)

Define your roles, API restrictions, and group and path-based permissions in this file.

Step 2: Create the Kubernetes ConfigMap

Use the kubectl CLI to create a ConfigMap from the two files you prepared.

Step 3: Configure the Alluxio Gateway

Finally, modify your alluxio-cluster.yaml to enable OPA and instruct the Gateway to use the ConfigMap you created.

After applying this configuration, the Alluxio Gateway will use your OPA policy to authorize incoming API requests.

How It Works

OPA Input Format

For every API request it receives, the Gateway constructs a JSON object that serves as the input for the OPA policy evaluation. This object contains all critical information about the request.

Note: When using OIDC authentication, clients must provide an Authorization header with the Bearer prefix.
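
As a minimal sketch of the note above, the following Python helper builds the headers a client might attach to a Gateway request. The function name and the token value are hypothetical; only the "Authorization: Bearer <token>" shape comes from the note.

```python
def bearer_headers(token: str) -> dict[str, str]:
    """Return HTTP headers that carry an OIDC token with the Bearer prefix."""
    token = token.strip()
    # Avoid doubling the prefix if the caller already included it.
    if token.lower().startswith("bearer "):
        token = token[len("bearer "):].strip()
    if not token:
        raise ValueError("token must not be empty")
    return {"Authorization": f"Bearer {token}"}


# Hypothetical token value, for illustration only.
headers = bearer_headers("example-oidc-token")
```

The resulting dictionary can be passed to whichever HTTP client you use to call the Gateway.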

Policy Decision Flow

The reference OPA policy follows a clear decision-making process:

  1. Default Deny: All requests are denied unless explicitly allowed by a rule.

  2. SuperAdmin Override: If the user has the SuperAdmin role, the request is always allowed.

  3. Global Rules: The policy checks if the API is on a global deny-list or allow-list.

  4. Group and Path Based Permissions: The policy checks if the user's groups have permission to access the resource paths specified in the request.

  5. Action-Specific Rules: The final decision is made based on the action type (e.g., GET vs. POST/PUT) and the user's role (e.g., GroupAdmin vs. a standard user).
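
The five steps above can be modeled in a few lines of Python. This is an illustrative sketch only, not the reference Rego policy: the Request and PolicyData shapes, the field names, and the API identifiers are hypothetical. It also assumes that the deny-list is consulted before the allow-list, that an allow-listed API is permitted immediately, that standard users may only issue GET requests, and that GroupAdmin may additionally issue POST and PUT within authorized paths.

```python
from dataclasses import dataclass, field


@dataclass
class Request:
    user_roles: set[str]
    user_groups: set[str]
    method: str                      # e.g. "GET", "POST", "PUT"
    api: str                         # hypothetical API identifier
    paths: list[str] = field(default_factory=list)


@dataclass
class PolicyData:
    deny_apis: set[str] = field(default_factory=set)
    allow_apis: set[str] = field(default_factory=set)
    group_paths: dict[str, list[str]] = field(default_factory=dict)


def path_permitted(path: str, prefixes: list[str]) -> bool:
    """True if path equals, or lies underneath, one of the permitted prefixes."""
    return any(
        path == p or path.startswith(p.rstrip("/") + "/") for p in prefixes
    )


def is_allowed(req: Request, data: PolicyData) -> bool:
    # Step 2: SuperAdmin override.
    if "SuperAdmin" in req.user_roles:
        return True
    # Step 3: global rules (assumed order: deny-list first, then allow-list).
    if req.api in data.deny_apis:
        return False
    if req.api in data.allow_apis:
        return True
    # Step 4: every requested path must be covered by one of the user's groups.
    prefixes = [p for g in req.user_groups for p in data.group_paths.get(g, [])]
    if not req.paths or not all(path_permitted(p, prefixes) for p in req.paths):
        return False
    # Step 5: action-specific rules (assumed role/method mapping).
    if req.method == "GET":
        return True
    if req.method in {"POST", "PUT"}:
        return "GroupAdmin" in req.user_roles
    # Step 1: default deny for anything not explicitly allowed.
    return False
```

Placing the default deny last means that any request shape the rules do not anticipate is rejected, which mirrors the secure-by-default intent of the reference policy.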

Implementation Guidelines and Testing

  • Secure Defaults: The policy is built on a "default-deny" principle, ensuring that only explicitly permitted actions are allowed.

  • Role Privileges: SuperAdmin has unrestricted access. GroupAdmin has elevated privileges to update and list resources within their authorized paths.

  • Testing: Test your policies by sending requests with different user tokens to ensure they behave as expected.

Appendix: Permission Reference

This appendix provides a detailed breakdown of the permissions required for various S3 API and Hadoop Filesystem operations in Alluxio.

S3 API Authorization

Authorization is divided into object-level and bucket-level tasks, with recommended ACLs for each operation.

Object Operations ACL Matrix

| Operation Type | HTTP Method | Condition | Target Resource | Required Permission |
|---|---|---|---|---|
| ListParts | GET | uploadId in query | Current Object | READ |
| GetObjectTagging | GET | tagging in query | Current Object | READ |
| GetObject | GET | (GET method for other cases) | Current Object | READ |
| PutObjectTagging | PUT | tagging in query | Current Object | WRITE |
| UploadPartCopy | PUT | uploadId in query AND x-amz-copy-source | Current Object, Target parent Dir | READ, WRITE |
| UploadPart | PUT | uploadId in query | Current Object | WRITE |
| CopyObject | PUT | x-amz-copy-source header | Source Object, Target parent Dir | READ, WRITE |
| PutObject | PUT | (PUT method for other cases) | Parent Directory | WRITE |
| CreateMultipartUpload | POST | uploads in query | Parent Directory | WRITE |
| CompleteMultipartUpload | POST | uploadId in query | Parent Directory | WRITE |
| HeadObject | HEAD | - | Current Object | READ |
| AbortMultipartUpload | DELETE | uploadId in query | Current Object | WRITE |
| DeleteObjectTagging | DELETE | tagging in query | Current Object | WRITE |
| DeleteObject | DELETE | (DELETE method for other cases) | Current Object | WRITE |
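
To make the object-level matrix easier to read as a decision procedure, here is a minimal Python sketch that maps a request to an operation and its required permissions. It is not Alluxio's implementation: the function name and its parameters are hypothetical, and it assumes the conditions are evaluated in the order the rows appear in the matrix, with the catch-all row for each HTTP method checked last.

```python
def classify_object_request(method, query_keys=(), header_names=()):
    """Return (operation, [(target resource, permission), ...]) for an object request."""
    method = method.upper()
    query = set(query_keys)
    headers = {name.lower() for name in header_names}
    has_copy_source = "x-amz-copy-source" in headers

    if method == "GET":
        if "uploadId" in query:
            return "ListParts", [("Current Object", "READ")]
        if "tagging" in query:
            return "GetObjectTagging", [("Current Object", "READ")]
        return "GetObject", [("Current Object", "READ")]

    if method == "PUT":
        if "tagging" in query:
            return "PutObjectTagging", [("Current Object", "WRITE")]
        if "uploadId" in query and has_copy_source:
            return "UploadPartCopy", [("Current Object", "READ"),
                                      ("Target parent Dir", "WRITE")]
        if "uploadId" in query:
            return "UploadPart", [("Current Object", "WRITE")]
        if has_copy_source:
            return "CopyObject", [("Source Object", "READ"),
                                  ("Target parent Dir", "WRITE")]
        return "PutObject", [("Parent Directory", "WRITE")]

    if method == "POST":
        if "uploads" in query:
            return "CreateMultipartUpload", [("Parent Directory", "WRITE")]
        if "uploadId" in query:
            return "CompleteMultipartUpload", [("Parent Directory", "WRITE")]
        raise ValueError("POST request matches no row in the object matrix")

    if method == "HEAD":
        return "HeadObject", [("Current Object", "READ")]

    if method == "DELETE":
        if "uploadId" in query:
            return "AbortMultipartUpload", [("Current Object", "WRITE")]
        if "tagging" in query:
            return "DeleteObjectTagging", [("Current Object", "WRITE")]
        return "DeleteObject", [("Current Object", "WRITE")]

    raise ValueError(f"unsupported HTTP method: {method}")
```

For example, a PUT carrying an x-amz-copy-source header and no uploadId is classified as CopyObject, so a Ranger policy must grant READ on the source object and WRITE on the target's parent directory.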

Bucket Operations ACL Matrix

| Operation Type | HTTP Method | Condition | Target Resource | Required Permission |
|---|---|---|---|---|
| ListBuckets | GET | bucket is empty | Root Path | EXECUTE |
| GetBucketTagging | GET | tagging in query | Bucket Directory | READ |
| ListMultipartUploads | GET | uploads in query | Bucket Directory | EXECUTE |
| ListObjects | GET | (GET method for other cases) | Requested Directory | EXECUTE |
| PutBucketTagging | PUT | tagging in query | Bucket Directory | WRITE |
| CreateBucket | PUT | (PUT method for other cases) | Root Path | WRITE |
| DeleteObjects | POST | delete in query | Each Object Specified | WRITE (per object) |
| HeadBucket | HEAD | - | Bucket Directory | READ |
| DeleteBucketTagging | DELETE | tagging in query | Bucket Directory | WRITE |
| DeleteBucket | DELETE | (DELETE method for other cases) | Bucket Directory | WRITE |

Hadoop Filesystem Authorization

Alluxio's Hadoop filesystem integration requires permission checks for file and directory operations.

Hadoop Filesystem Functions and Permissions

| Function | CheckPermission | Check Path | Permission | Comments |
|---|---|---|---|---|
| append | Yes | Current path | WRITE | Appends data to an existing file. |
| create | Yes | Parent path | WRITE | Creates a new file or overwrites an existing one. |
| delete | Yes | Parent path | WRITE | Deletes a file or directory. |
| getFileBlockLocations | Yes | Current path | READ | Returns block locations for a file. |
| getFileStatus | Yes | Current path | READ | Returns status of a file or directory. |
| setOwner | Yes | Current path | WRITE | Sets owner/group for a file or directory. |
| setPermission | Yes | Current path | WRITE | Sets permissions for a file or directory. |
| listStatus | Yes | Current path | EXECUTE | Lists files/directories in a path. |
| mkdirs | Yes | Parent path | WRITE | Creates directory and necessary parents. |
| open | Yes | Current path | READ | Opens a file for reading. |
| rename | Yes | Src Parent, Dst Parent | WRITE, WRITE | Renames a file or directory. |
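
When writing Ranger policies it can help to see which path each Hadoop Filesystem call is checked against. The sketch below encodes the table above as a lookup in Python; the names HADOOP_FS_CHECKS and required_checks are hypothetical helpers for illustration and are not part of Alluxio. Parent paths are derived with posixpath.dirname, which is an assumption about how "Parent path" resolves.

```python
import posixpath

# Function name -> (which path is checked, required permission), per the table.
HADOOP_FS_CHECKS = {
    "append": ("current", "WRITE"),
    "create": ("parent", "WRITE"),
    "delete": ("parent", "WRITE"),
    "getFileBlockLocations": ("current", "READ"),
    "getFileStatus": ("current", "READ"),
    "setOwner": ("current", "WRITE"),
    "setPermission": ("current", "WRITE"),
    "listStatus": ("current", "EXECUTE"),
    "mkdirs": ("parent", "WRITE"),
    "open": ("current", "READ"),
}


def parent_of(path):
    """Parent directory of an absolute path; the root is its own parent."""
    return posixpath.dirname(path.rstrip("/") or "/")


def required_checks(function, path, dst=None):
    """Return a list of (path to check, permission) pairs for one call."""
    if function == "rename":
        if dst is None:
            raise ValueError("rename needs a destination path")
        # rename checks WRITE on both the source and destination parents.
        return [(parent_of(path), "WRITE"), (parent_of(dst), "WRITE")]
    target, permission = HADOOP_FS_CHECKS[function]
    check_path = path if target == "current" else parent_of(path)
    return [(check_path, permission)]
```

For instance, required_checks("create", "/data/new.txt") yields a single WRITE check on /data, so a policy that grants WRITE only on /data/new.txt itself would not be sufficient to create that file.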
