Enabling Authorization
Alluxio provides a flexible authorization framework to secure both data access and management operations. Depending on your security requirements, you can choose the appropriate integration:
- For Data Access Control (S3 API & Hadoop FS): Integrate with Apache Ranger to manage fine-grained permissions for users and applications accessing files and directories.
- For Management API Control (Gateway): Integrate with Open Policy Agent (OPA) to enforce sophisticated, policy-based authorization for administrative actions performed through the Alluxio Gateway.
This guide provides step-by-step instructions for configuring these integrations.
Data Access Authorization with Apache Ranger
Apache Ranger provides centralized, fine-grained authorization for data access operations. By integrating Alluxio with Ranger, you can manage policies for S3 API and Hadoop Filesystem clients, ensuring that users and applications only have the permissions they need to read, write, or manage data within the Alluxio namespace. This is the recommended solution for controlling access to the data plane.
Prerequisites
Before configuring Ranger integration with Alluxio, ensure the following requirements are met:
- Plugin JAR Availability: Ensure the authorization plugin JAR (e.g., lib/alluxio-authorization-hdfs-AI-3.7-13.0.0.jar) is available on all Alluxio nodes (see the distribution sketch after this list).
- Ranger HDFS Plugin: Verify that the HDFS plugin is enabled in your Ranger configuration.
- HDFS Service Configuration: Follow the instructions in Cloudera's HDFS Service documentation to configure a new HDFS service for Alluxio.
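The exact distribution mechanism depends on your environment. As a rough sketch, assuming SSH access, the JAR and configuration directory could be copied to each node with scp (hostnames and paths below are placeholders):

# Hypothetical example: adjust hostnames and paths to your deployment.
for node in alluxio-worker-1 alluxio-worker-2 alluxio-coordinator-1; do
  scp lib/alluxio-authorization-hdfs-AI-3.7-13.0.0.jar "${node}:/opt/alluxio/lib/"
  scp -r conf/ranger/ "${node}:/opt/alluxio/conf/"
done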
Configuration
Follow these steps to configure the integration.
Step 1: Set Up Configuration Files
Copy or create the following configuration files in a directory on the Alluxio nodes, such as /opt/alluxio/conf/ranger/:
- ranger-hdfs-security.xml
- ranger-hdfs-audit.xml

Below are example configurations for these files.

ranger-hdfs-security.xml
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration xmlns:xi="http://www.w3.org/2001/XInclude">
  <property>
    <name>ranger.plugin.hdfs.service.name</name>
    <value>alluxio-test</value>
    <description>
      Name of the Ranger service containing policies for this Alluxio instance
    </description>
  </property>
  <property>
    <name>ranger.plugin.hdfs.policy.source.impl</name>
    <value>org.apache.ranger.admin.client.RangerAdminRESTClient</value>
    <description>
      Class to retrieve policies from the source
    </description>
  </property>
  <property>
    <name>ranger.plugin.hdfs.policy.rest.url</name>
    <value>http://your-ranger-ip:6080</value>
    <description>
      URL to Ranger Admin
    </description>
  </property>
  <property>
    <name>ranger.plugin.hdfs.policy.pollIntervalMs</name>
    <value>30000</value>
    <description>
      How often to poll for changes in policies, in milliseconds
    </description>
  </property>
  <property>
    <name>ranger.plugin.hdfs.policy.cache.dir</name>
    <value>/path/to/policycache</value>
    <description>
      Directory where Ranger policies are cached after successful retrieval from the source
    </description>
  </property>
  <property>
    <name>ranger.plugin.hdfs.policy.rest.client.connection.timeoutMs</name>
    <value>120000</value>
    <description>
      HDFS plugin RangerRestClient connection timeout in milliseconds
    </description>
  </property>
  <property>
    <name>ranger.plugin.hdfs.policy.rest.client.read.timeoutMs</name>
    <value>30000</value>
    <description>
      HDFS plugin RangerRestClient read timeout in milliseconds
    </description>
  </property>
  <!-- The following fields are used to customize the audit logging feature -->
  <property>
    <name>xasecure.auditlog.xasecureAcl.name</name>
    <value>ranger-acl</value>
    <description>
      The module name listed in the audit log when the permission check is done by RangerACL
    </description>
  </property>
  <property>
    <name>xasecure.add-hadoop-authorization</name>
    <value>false</value>
    <description>
      Enable/disable falling back to the default Hadoop authorization (based on
      rwxrwxrwx permissions on the resource) if Ranger authorization fails.
    </description>
  </property>
</configuration>
ranger-hdfs-audit.xml
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration xmlns:xi="http://www.w3.org/2001/XInclude">
  <property>
    <name>xasecure.audit.is.enabled</name>
    <value>false</value>
  </property>
</configuration>
Step 2: Update Ranger Configuration
Update the configuration settings in ranger-hdfs-security.xml to use the new HDFS repository:
- Set ranger.plugin.hdfs.policy.cache.dir to a valid directory on the Alluxio nodes where the policy cache should be stored.
- Set ranger.plugin.hdfs.service.name to the new HDFS service name, e.g. alluxio-test.
- Verify that ranger.plugin.hdfs.policy.rest.url points to the correct Ranger service URL.
- Set xasecure.add-hadoop-authorization to false to disable fallback to Hadoop authorization.
Step 3: Configure Alluxio Nodes
To secure the S3 API with the Ranger plugin for authorization on the workers, add the following to the properties section of your alluxio-cluster.yaml.

For S3 API:

properties:
  alluxio.worker.s3.authorization.enabled: "true"
  alluxio.worker.s3.authorizer.classname: "alluxio.s3.auth.ExtendableAuthorizer"
  alluxio.security.authorization.plugin.name: "ranger-privacera-4.7"
  alluxio.security.authorization.plugin.paths: "/opt/alluxio/conf/ranger/"
  # or wherever you copied the ranger configuration xml files
For Hadoop Filesystem: On the client side, enable authorization by adding:
alluxio.security.authorization.plugin.name=ranger-privacera-4.7
alluxio.security.authorization.plugin.paths=/opt/alluxio/conf/ranger/
# or wherever you copied the ranger configuration xml files
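Where exactly these client properties live depends on how the client is launched. As a minimal sketch, assuming the client reads alluxio-site.properties from its classpath, they can be appended there:

# Hypothetical: append the client-side settings to alluxio-site.properties.
cat >> /opt/alluxio/conf/alluxio-site.properties <<'EOF'
alluxio.security.authorization.plugin.name=ranger-privacera-4.7
alluxio.security.authorization.plugin.paths=/opt/alluxio/conf/ranger/
EOF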
Step 4: Apply Configuration
Restart all Alluxio nodes to apply the new configuration, or set it before starting the Alluxio cluster. You can now add policies to the Alluxio service (e.g., alluxio-test) in Ranger and verify that they take effect in Alluxio.
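One way to verify a policy end to end is to issue S3 requests as a governed user. The sketch below uses the AWS CLI against the Alluxio S3 endpoint; the endpoint, bucket, and file names are placeholders. A request the policy permits should succeed, while a denied one should return an Access Denied error:

# Listing requires EXECUTE on the directory (see Important Notes below).
aws --endpoint-url http://<worker-host>:<s3-port> s3 ls s3://test-bucket/
# Uploading requires WRITE on the parent directory.
aws --endpoint-url http://<worker-host>:<s3-port> s3 cp ./file.txt s3://test-bucket/file.txt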
Important Notes
Path Mapping: Paths in Ranger policies correspond to the paths in Alluxio namespace, not the underlying S3 objects.
Permission Hierarchy:
- Reading a file requires READ permission on the file itself.
- Writing or creating a file requires WRITE permission on the parent directory.
- Listing the contents of a directory requires EXECUTE permission on that directory.
Policy Precedence: More specific policies take precedence over general ones. A deny policy will override an allow policy for the same resource.
Repository Connection: When configuring the HDFS service in Ranger for Alluxio, the connection test is expected to fail because Ranger cannot directly communicate with Alluxio's filesystem scheme. This is normal. You can safely ignore the error, save the service, and proceed to create policies manually.
Troubleshooting: If authorization fails unexpectedly, check the following (example commands below):
- The Ranger policy cache refresh status.
- The Alluxio logs for detailed error messages.
- That user authentication is working correctly.
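A few example commands for these checks (the log and cache locations are assumptions; substitute your configured paths):

# Has the plugin fetched policies from Ranger Admin? The cache directory
# should contain a JSON snapshot of the service's policies.
ls -l /path/to/policycache
# Look for plugin initialization and policy refresh messages.
grep -i ranger /opt/alluxio/logs/worker.log | tail -20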
Management API Authorization with Open Policy Agent (OPA)
For controlling access to administrative and management operations, Alluxio Gateway integrates with Open Policy Agent (OPA). OPA enables you to define and enforce fine-grained, policy-based access control for all Gateway API endpoints. This is the recommended solution for securing the management plane, allowing you to create sophisticated rules based on user identity (role, group) and the specific API operations being requested.
Core Concepts
Before configuring, it's helpful to understand the three main components:
- OPA Policy (.rego file): The "logic engine" for authorization. This file contains a set of rules written in the Rego language that define the conditions for allowing or denying a request.
- OPA Data (.yaml file): The "configuration" that the policy uses. This file provides the specific data, such as lists of super-admins, group definitions, and path permissions, that the Rego rules consult to make decisions.
- Kubernetes ConfigMap: The standard Kubernetes method for providing the policy and data files to the Alluxio Gateway Pod.
Setup and Configuration
Follow these steps to enable and configure OPA authorization for the Gateway.
Step 1: Prepare OPA Policy and Data Files
First, create the policy (opa_auth_policy.rego) and data (opa_data.yaml) files locally.

1. opa_auth_policy.rego (Policy File)
This is a comprehensive reference policy that you can use as a starting point or customize for your needs.
package opa_auth_policy
import rego.v1
# -----------------------------------------------------------------------------
# --- Default Policy
# -----------------------------------------------------------------------------
# By default, deny all requests.
default allow := false
# -----------------------------------------------------------------------------
# --- Main Authorization Rules
# --- These are the top-level rules that determine the final 'allow' decision.
# -----------------------------------------------------------------------------
# Rule 1: SuperAdmins are allowed to do anything, bypassing all other checks.
allow if {
user_is_superadmin
}
# Rule 2: For users who pass the base checks, allow GET access to globally allowed APIs.
allow if {
base_checks_pass
method_is_get
api_in_global_allow_list
}
# Rule 3: Admins can list resources (i.e., a GET request with no specific path or ID).
allow if {
base_checks_pass
user_is_admin
method_is_get
request_is_for_listing
}
# Rule 4: Allow any user who passes base checks to GET resources if all requested paths are permitted.
allow if {
base_checks_pass
method_is_get
all_request_paths_are_allowed
}
# Rule 5: Allow Admins to perform update actions if all requested paths are permitted.
allow if {
base_checks_pass
user_is_admin
method_is_update
all_request_paths_are_allowed
}
# Rule 6: Allow requests that provide an ID but no path.
# This handles a specific update scenario.
allow if {
base_checks_pass
user_is_admin
method_is_update
request_has_id_but_no_path
}
# Rule 7: Allow requests that provide an ID but no path.
# This handles a specific GET scenario.
allow if {
base_checks_pass
method_is_get
request_has_id_but_no_path
}
# -----------------------------------------------------------------------------
# --- Core Logic Helpers
# -----------------------------------------------------------------------------
# Combines common checks for non-superadmin users to improve performance.
base_checks_pass if {
not user_is_superadmin
user_is_valid
not api_in_global_deny_list
}
# Checks if all paths in the request are in the user's allow list and not in the deny list.
all_request_paths_are_allowed if {
count(relevant_paths) > 0
count({x | relevant_paths[x]; path_is_allowed(x)}) == count(relevant_paths)
count({x | relevant_paths[x]; path_is_denied(x)}) == 0
}
# Determines if the request is a "listing" operation (no ID and no paths).
request_is_for_listing if {
not request_has_id
count(relevant_paths) == 0
}
# Determines if the request contains an ID but no paths.
request_has_id_but_no_path if {
request_has_id
count(relevant_paths) == 0
}
# -----------------------------------------------------------------------------
# --- Permission Detail Helpers
# -----------------------------------------------------------------------------
# Checks if a path is in the user's allowed prefixes.
path_is_allowed(path) if {
some i, j
some group in input_groups
group == data.groups[i].group
clean_path := trim_suffix(path, "/")
clean_prefix := trim_suffix(data.groups[i].allow.pathPrefixes[j].prefix, "/")
strings.any_prefix_match(clean_path, clean_prefix)
is_valid_prefix_match(clean_path, clean_prefix)
api_is_valid_for_path_rule(data.groups[i].allow.pathPrefixes[j])
}
# Checks if a path is in the user's denied prefixes.
path_is_denied(path) if {
some i, j
some group in input_groups
group == data.groups[i].group
clean_path := trim_suffix(path, "/")
clean_prefix := trim_suffix(data.groups[i].deny.pathPrefixes[j].prefix, "/")
strings.any_prefix_match(clean_path, clean_prefix)
is_valid_prefix_match(clean_path, clean_prefix)
api_is_valid_for_path_rule(data.groups[i].deny.pathPrefixes[j])
}
# Validates that the prefix match is legitimate.
# This rule is true if the prefix is an exact match to the path.
is_valid_prefix_match(path, prefix) if {
strings.any_prefix_match(path, prefix)
suffix := trim_prefix(path, prefix)
suffix == ""
}
# This rule is true if the prefix matches a directory boundary.
# Example: prefix "/a/b" matches path "/a/b/c".
is_valid_prefix_match(path, prefix) if {
strings.any_prefix_match(path, prefix)
suffix := trim_prefix(path, prefix)
startswith(suffix, "/")
}
# Checks if the current API is valid for the given path rule.
# Rule 1: If the path rule does not specify an 'apis' list, it applies to all APIs.
api_is_valid_for_path_rule(rule) if {
not rule.apis
}
# Rule 2: If the path rule specifies an 'apis' list, the current API must be in that list.
api_is_valid_for_path_rule(rule) if {
input_api in rule.apis
}
# -----------------------------------------------------------------------------
# --- Request Parsing Helpers
# -----------------------------------------------------------------------------
# Extract the API endpoint from the request path.
input_api := split(input.path, "/v1")[1]
# HTTP method checks.
method_is_get if input.method == "GET"
method_is_update if input.method != "GET"
# Extracts paths from multiple possible locations in the request.
# Uses the 'contains' keyword to incrementally build the set of paths.
paths_from_request contains path if {
path := input.query.path[0]
}
paths_from_request contains path if {
path := input.parsed_body.path
}
paths_from_request contains path if {
path := input.parsed_body.paths[_]
}
paths_from_request contains path if {
path := input.parsed_body.index
}
# Collects all non-empty paths from the request for validation.
relevant_paths := {p | some p in paths_from_request; p != ""}
# Checks if the request contains an ID.
request_has_id if input.parsed_body.id != ""
request_has_id if input.query.id != ""
# Global API list checks.
api_in_global_deny_list if input_api in data.denyApis
api_in_global_allow_list if input_api in data.allowApis
# -----------------------------------------------------------------------------
# --- User & Role Helpers
# -----------------------------------------------------------------------------
claims := payload if {
token := input.header.Authorization
count(token) != 0
startswith(token[0], "Bearer ")
bearer_token := substring(token[0], count("Bearer "), -1)
[_, payload, _] := io.jwt.decode(bearer_token)
}
else := user_info if {
token := input.header.Authorization
count(token) != 0
not startswith(token[0], "Bearer ")
base64.is_valid(token[0])
ui = base64.decode(token[0])
json.is_valid(ui)
user_info = json.unmarshal(ui)
}
default input_roles := []
input_roles := claims.roleFieldName if {
claims.roleFieldName != ""
is_array(claims.roleFieldName)
}
else := [claims.roleFieldName] if {
claims.roleFieldName != ""
is_string(claims.roleFieldName)
}
else := claims.role if {
claims.role != ""
is_array(claims.role)
}
else := [claims.role] if {
claims.role != ""
is_string(claims.role)
}
default input_groups := []
input_groups := claims.groupFieldName if {
claims.groupFieldName != ""
is_array(claims.groupFieldName)
}
else := [claims.groupFieldName] if {
claims.groupFieldName != ""
is_string(claims.groupFieldName)
}
else := claims.group if {
claims.group != ""
is_array(claims.group)
}
else := [claims.group] if {
claims.group != ""
is_string(claims.group)
}
user_is_valid if {
count(input_roles) > 0
count(input_groups) > 0
}
user_is_superadmin if {
count(input_roles) > 0
some i
some role in input_roles
role == data.superAdmin[i]
}
user_is_admin if {
some i
some role in input_roles
role == data.groupAdmin[i]
}
2. opa_data.yaml (Data File)

Define your roles, API restrictions, and group- and path-based permissions in this file.
# Super Admin definition
superAdmin: ["SuperAdmin"]
# Admin User definition
groupAdmin: ["GroupAdmin"]
# APIs that can only be accessed by superAdmin
denyApis:
  - /file_index
  - /nodes
  - /rebalance
  - /cache
  - /mount
# APIs that everyone can access
allowApis:
# Group- and path-based permission definitions
groups:
  - group: Search
    allow:
      pathPrefixes:
        # If apis is not defined, all APIs are allowed by default
        - prefix: s3://search-bucket/dir1/dir2
        - prefix: s3://search-bucket/dir1/dir3/
          apis:
            - /load
  - group: Recommend
    allow:
      pathPrefixes:
        - prefix: s3://recommend-bucket/dir1/dir2
        - prefix: s3://recommend-bucket/dir1/dir3
          apis:
            - /load
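Before deploying, you can sanity-check the policy and data locally with the opa CLI (opa eval accepts YAML data files via -d). The sketch below assumes a sample request saved as input.json in the format shown under "OPA Input Format":

opa eval \
  -d opa_auth_policy.rego \
  -d opa_data.yaml \
  -i input.json \
  "data.opa_auth_policy.allow"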
Step 2: Create the Kubernetes ConfigMap
Use the kubectl CLI to create a ConfigMap from the two files you prepared.
kubectl create configmap opa-gateway-configmap \
--from-file=opa_auth_policy.rego \
--from-file=opa_data.yaml
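Before wiring the ConfigMap into the Gateway, you can confirm that both files were captured (add -n <namespace> if you created it in a specific namespace):

kubectl describe configmap opa-gateway-configmap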
Step 3: Configure the Alluxio Gateway
Finally, modify your alluxio-cluster.yaml to enable OPA and instruct the Gateway to use the ConfigMap you created.
spec:
  global:
    authentication:
      enabled: true
      type: oidc
      oidc:
        jwksUri:
        nbfCheck: false
        roleFieldName:
        userFieldName:
        groupFieldName:
    authorization:
      enabled: true
      opa:
        components:
          gateway:
            configMapName: opa-gateway-configmap
            filenames:
              - opa_auth_policy.rego
              - opa_data.yaml
            query: data.opa_auth_policy.allow
After applying this configuration, the Alluxio Gateway will use your OPA policy to authorize incoming API requests.
How It Works
OPA Input Format
For every API request it receives, the Gateway constructs a JSON object that serves as the input for the OPA policy evaluation. This object contains all critical information about the request.
{
  "header": {
    "Accept": [
      "*/*"
    ],
    "Accept-Encoding": [
      "gzip, deflate, br"
    ],
    "Authorization": [
      "Bearer eyJuYW1lIjoxxxx="
    ],
    "Connection": [
      "keep-alive"
    ],
    "Postman-Token": [
      "b9844ab1-27b5-41a2-83a5-40f4d9fb74f4"
    ],
    "User-Agent": [
      "PostmanRuntime/7.42.0"
    ],
    "X-Request-Id": [
      "97be1ace-dc50-40a2-b3a1-e02061ca9504"
    ]
  },
  "method": "POST",
  "parsed_body": {
    "paths": [
      "s3://search-bucket/dir1/dir1",
      "",
      "s3://search-bucket/dir1/dir3"
    ],
    "options": {
      "batchSize": 0,
      "fileFilterRegx": "",
      "replicas": 0,
      "skipIfExists": false
    }
  },
  "path": "/api/v1/load"
}
Note: When using OIDC authentication, clients must provide an Authorization header with the Bearer prefix.
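For example, the input_api helper in the reference policy derives the API name from the request path by splitting on "/v1"; the result is then matched against denyApis and allowApis. You can confirm the string handling with a one-off query:

opa eval 'split("/api/v1/load", "/v1")'
# => ["/api", "/load"]; element [1], "/load", is the API name the policy checks.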
Policy Decision Flow
The reference OPA policy follows a clear decision-making process:
1. Default Deny: All requests are denied unless explicitly allowed by a rule.
2. SuperAdmin Override: If the user has the SuperAdmin role, the request is always allowed.
3. Global Rules: The policy checks whether the API is on a global deny-list or allow-list.
4. Group and Path Based Permissions: The policy checks whether the user's groups have permission to access the resource paths specified in the request.
5. Action-Specific Rules: The final decision is made based on the action type (e.g., GET vs. POST/PUT) and the user's role (e.g., GroupAdmin vs. a standard user).
Implementation Guidelines and Testing
- Secure Defaults: The policy is built on a "default-deny" principle, ensuring that only explicitly permitted actions are allowed.
- Role Privileges: SuperAdmin has unrestricted access. GroupAdmin has elevated privileges to update and list resources within their authorized paths.
- Testing: Test your policies by sending requests with different user tokens to ensure they behave as expected.
# Test using a user's OIDC token
curl --location 'http://gateway-host:port/api/v1/load' \
--header 'Authorization: Bearer <YOUR_OIDC_TOKEN>' \
--header 'Content-Type: application/json' \
--data '{"paths": ["s3://search-bucket/dir1/dir2/test"], "alias":"loads3a"}'
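It is equally worth testing the negative case. For example, reusing the request above with a path outside the token's allowed prefixes should be rejected by the policy's default-deny rule (the Gateway is expected to answer with an authorization error rather than process the request):

# Same call, but targeting a path the Search group is not granted.
curl --location 'http://gateway-host:port/api/v1/load' \
--header 'Authorization: Bearer <YOUR_OIDC_TOKEN>' \
--header 'Content-Type: application/json' \
--data '{"paths": ["s3://recommend-bucket/dir1/dir2"], "alias":"loads3a"}'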
Appendix: Permission Reference
This appendix provides a detailed breakdown of the permissions required for various S3 API and Hadoop Filesystem operations in Alluxio.
S3 API Authorization
Authorization is divided into object-level and bucket-level tasks, with recommended ACLs for each operation.
Object Operations ACL Matrix
| Operation | Method | Condition | Resource Checked | Required ACL |
| --- | --- | --- | --- | --- |
| ListParts | GET | uploadId in query | Current Object | READ |
| GetObjectTagging | GET | tagging in query | Current Object | READ |
| GetObject | GET | (GET method for other cases) | Current Object | READ |
| PutObjectTagging | PUT | tagging in query | Current Object | WRITE |
| UploadPartCopy | PUT | uploadId in query AND x-amz-copy-source header | Current Object; Target Parent Dir | READ; WRITE |
| UploadPart | PUT | uploadId in query | Current Object | WRITE |
| CopyObject | PUT | x-amz-copy-source header | Source Object; Target Parent Dir | READ; WRITE |
| PutObject | PUT | (PUT method for other cases) | Parent Directory | WRITE |
| CreateMultipartUpload | POST | uploads in query | Parent Directory | WRITE |
| CompleteMultipartUpload | POST | uploadId in query | Parent Directory | WRITE |
| HeadObject | HEAD | - | Current Object | READ |
| AbortMultipartUpload | DELETE | uploadId in query | Current Object | WRITE |
| DeleteObjectTagging | DELETE | tagging in query | Current Object | WRITE |
| DeleteObject | DELETE | (DELETE method for other cases) | Current Object | WRITE |
Bucket Operations ACL Matrix
| Operation | Method | Condition | Resource Checked | Required ACL |
| --- | --- | --- | --- | --- |
| ListBuckets | GET | bucket is empty | Root Path | EXECUTE |
| GetBucketTagging | GET | tagging in query | Bucket Directory | READ |
| ListMultipartUploads | GET | uploads in query | Bucket Directory | EXECUTE |
| ListObjects | GET | (GET method for other cases) | Requested Directory | EXECUTE |
| PutBucketTagging | PUT | tagging in query | Bucket Directory | WRITE |
| CreateBucket | PUT | (PUT method for other cases) | Root Path | WRITE |
| DeleteObjects | POST | delete in query | Each Object Specified | WRITE (per object) |
| HeadBucket | HEAD | - | Bucket Directory | READ |
| DeleteBucketTagging | DELETE | tagging in query | Bucket Directory | WRITE |
| DeleteBucket | DELETE | (DELETE method for other cases) | Bucket Directory | WRITE |
Hadoop Filesystem Authorization
Alluxio's Hadoop filesystem integration requires permission checks for file and directory operations.
Hadoop Filesystem Functions and Permissions
| Function | Permission Check | Path Checked | Required Permission | Description |
| --- | --- | --- | --- | --- |
| append | Yes | Current path | WRITE | Appends data to an existing file. |
| create | Yes | Parent path | WRITE | Creates a new file or overwrites an existing one. |
| delete | Yes | Parent path | WRITE | Deletes a file or directory. |
| getFileBlockLocations | Yes | Current path | READ | Returns block locations for a file. |
| getFileStatus | Yes | Current path | READ | Returns status of a file or directory. |
| setOwner | Yes | Current path | WRITE | Sets owner/group for a file or directory. |
| setPermission | Yes | Current path | WRITE | Sets permissions for a file or directory. |
| listStatus | Yes | Current path | EXECUTE | Lists files/directories in a path. |
| mkdirs | Yes | Parent path | WRITE | Creates a directory and necessary parents. |
| open | Yes | Current path | READ | Opens a file for reading. |
| rename | Yes | Src Parent, Dst Parent | WRITE, WRITE | Renames a file or directory. |
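To make the table concrete, here is a hedged sketch of how these checks surface to a Hadoop client, assuming the Alluxio client JAR and the authorization properties from Step 3 are on the client's classpath (host, port, and paths are placeholders):

hadoop fs -ls alluxio://<host>:<port>/data/                      # listStatus: EXECUTE on /data
hadoop fs -put local.txt alluxio://<host>:<port>/data/local.txt  # create: WRITE on /data
hadoop fs -cat alluxio://<host>:<port>/data/local.txt            # open: READ on the file

Each operation the user lacks permission for should fail with an authorization error attributable to the matching Ranger policy.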