Alluxio
ProductsLanguageHome
  • Introduction
  • Overview
    • Architecture
    • Job Service
    • Quick Start Guide
    • FAQ
    • Use Cases
  • Core Services
    • Caching
    • Unified Namespace
  • Install Alluxio
    • Local Machine
    • Cluster
    • Cluster with HA
    • Docker
    • Software Requirements
  • Kubernetes
    • Deploy
    • Spark on Kubernetes
    • Metrics
  • Cloud Native
    • Alibaba Cloud ACK
    • AWS EMR
    • Tencent EMR
    • Google Dataproc
  • Compute Integration
    • Apache Spark
    • Apache Hadoop MapReduce
    • Apache Flink
    • Apache Hive
    • Presto on Iceberg (Experimental)
    • Presto
    • Trino
    • Tensorflow
  • Storage Integrations
    • Amazon AWS S3
    • HDFS
    • Azure Blob Store
    • Azure Data Lake Storage Gen2
    • Azure Data Lake Storage
    • Google Cloud Storage
    • Qiniu Kodo
    • COSN
    • CephObjectStorage
    • MinIO
    • NFS
    • Aliyun Object Storage Service
    • Ozone
    • Swift
    • WEB
    • CephFS
  • Security
  • Operations
    • Configuration Settings
    • User CLI
    • Admin CLI
    • Web UI
    • Journal Management
    • Metastore Management
    • Metrics
  • Administration
    • Troubleshooting
    • Basic Logging
    • Remote Logging
    • Performance Tuning
    • Scalability Tuning
    • StressBench (Experimental)
    • Upgrading
  • Solutions
  • Client APIs
    • Java API
    • S3 API
    • REST API
    • POSIX API
  • Contributor Resources
    • Building Alluxio From Source
    • Contribution Guide
    • Code Conventions
    • Documentation Conventions
    • Contributor Tools
  • Reference
    • List Of Configuration Properties
    • List of Metrics
  • REST API
    • Master REST API
    • Worker REST API
    • Proxy REST API
    • Job REST API
  • Javadoc
Powered by GitBook
On this page
  • Prerequisites
  • Setup with Shared Key
  • Root Mount
  • Nested Mount
  • Setup with OAuth 2.0 Client Credentials
  • Root Mount
  • Nested Mount
  • Setup with Azure Managed Identities
  • Root Mount
  • Nested Mount
  • Running Alluxio Locally with Data Lake Storage
  1. Storage Integrations

Azure Data Lake Storage Gen2

Last updated 6 months ago

This guide describes how to configure Alluxio with as the under storage system.

Prerequisites

The Alluxio binaries must be on your machine. You can either , or .

In preparation for using Azure Data Lake storage with Alluxio, or use an existing Data Lake storage. You should also note the directory you want to use, either by creating a new directory, or using an existing one. You also need a . For the purposes of this guide, the Azure storage account name is called <AZURE_ACCOUNT>, the directory in that storage account is called <AZURE_DIRECTORY>, and the name of the container is called <AZURE_CONTAINER>.

Setup with Shared Key

Root Mount

To use Azure Data Lake Storage as the UFS of Alluxio root mount point, you need to configure Alluxio to use under storage systems by modifying conf/alluxio-site.properties. If it does not exist, create the configuration file from the template.

$ cp conf/alluxio-site.properties.template conf/alluxio-site.properties

Specify the underfs address by modifying conf/alluxio-site.properties to include:

alluxio.master.mount.table.root.ufs=abfs://<AZURE_CONTAINER>@<AZURE_ACCOUNT>.dfs.core.windows.net/<AZURE_DIRECTORY>/

Specify the Shared Key by adding the following property in conf/alluxio-site.properties:

alluxio.master.mount.table.root.option.fs.azure.account.key.<AZURE_ACCOUNT>.dfs.core.windows.net=<SHARED_KEY>

Nested Mount

An Azure Data Lake store location can be mounted at a nested directory in the Alluxio namespace to have unified access to multiple under storage systems. Alluxio's can be used for this purpose.

$ ./bin/alluxio fs mount \
  --option fs.azure.account.key.<AZURE_ACCOUNT>.dfs.core.windows.net=<SHARED_KEY> \
  /mnt/abfs abfs://<AZURE_CONTAINER>@<AZURE_ACCOUNT>.dfs.core.windows.net/<AZURE_DIRECTORY>/

After these changes, Alluxio should be configured to work with Azure Data Lake storage as its under storage system, and you can run Alluxio locally with it.

Setup with OAuth 2.0 Client Credentials

Root Mount

To use Azure Data Lake Storage as the UFS of Alluxio root mount point, you need to configure Alluxio to use under storage systems by modifying conf/alluxio-site.properties. If it does not exist, create the configuration file from the template.

$ cp conf/alluxio-site.properties.template conf/alluxio-site.properties

Specify the underfs address by modifying conf/alluxio-site.properties to include:

alluxio.master.mount.table.root.ufs=abfs://<AZURE_CONTAINER>@<AZURE_ACCOUNT>.dfs.core.windows.net/<AZURE_DIRECTORY>/

Specify the OAuth 2.0 Client Credentials by adding the following property in conf/alluxio-site.properties: (Please note that for URL Endpoint, use the V1 token endpoint)

alluxio.master.mount.table.root.option.fs.azure.account.oauth2.client.endpoint=<OAUTH_ENDPOINT>
alluxio.master.mount.table.root.option.fs.azure.account.oauth2.client.id=<CLIENT_ID>
alluxio.master.mount.table.root.option.fs.azure.account.oauth2.client.secret=<CLIENT_SECRET>

Nested Mount

$ ./bin/alluxio fs mount \
  --option fs.azure.account.oauth2.client.endpoint=<OAUTH_ENDPOINT> \
  --option fs.azure.account.oauth2.client.id=<CLIENT_ID> \
  --option fs.azure.account.oauth2.client.secret=<CLIENT_SECRET> \
  /mnt/abfs abfs://<AZURE_CONTAINER>@<AZURE_ACCOUNT>.dfs.core.windows.net/<AZURE_DIRECTORY>/

After these changes, Alluxio should be configured to work with Azure Data Lake storage as its under storage system, and you can run Alluxio locally with it.

Setup with Azure Managed Identities

Root Mount

To use Azure Data Lake Storage as the UFS of Alluxio root mount point, you need to configure Alluxio to use under storage systems by modifying conf/alluxio-site.properties. If it does not exist, create the configuration file from the template.

$ cp conf/alluxio-site.properties.template conf/alluxio-site.properties

Specify the underfs address by modifying conf/alluxio-site.properties to include:

alluxio.master.mount.table.root.ufs=abfs://<AZURE_CONTAINER>@<AZURE_ACCOUNT>.dfs.core.windows.net/<AZURE_DIRECTORY>/

Specify the Azure Managed Identities by adding the following property in conf/alluxio-site.properties:

alluxio.master.mount.table.root.option.fs.azure.account.oauth2.msi.endpoint=<MSI_ENDPOINT>
alluxio.master.mount.table.root.option.fs.azure.account.oauth2.client.id=<CLIENT_ID>
alluxio.master.mount.table.root.option.fs.azure.account.oauth2.msi.tenant=<TENANT>

Nested Mount

$ ./bin/alluxio fs mount \
  --option fs.azure.account.oauth2.msi.endpoint=<MSI_ENDPOINT> \
  --option fs.azure.account.oauth2.client.id=<CLIENT_ID> \
  --option fs.azure.account.oauth2.msi.tenant=<TENANT> \
  /mnt/abfs abfs://<AZURE_CONTAINER>@<AZURE_ACCOUNT>.dfs.core.windows.net/<AZURE_DIRECTORY>/

After these changes, Alluxio should be configured to work with Azure Data Lake storage as its under storage system, and you can run Alluxio locally with it.

Running Alluxio Locally with Data Lake Storage

Start up Alluxio locally to see that everything works.

./bin/alluxio format
./bin/alluxio-start.sh local

Run a simple example program:

./bin/alluxio runTests

Visit your directory <AZURE_DIRECTORY> to verify the files and directories created by Alluxio exist. For this test, you should see files named like:

<AZURE_DIRECTORY>/default_tests_files/BASIC_CACHE_PROMOTE_CACHE_THROUGH

To stop Alluxio, you can run:

./bin/alluxio-stop.sh local

An Azure Data Lake store location can be mounted at a nested directory in the Alluxio namespace to have unified access to multiple under storage systems. Alluxio's can be used for this purpose.

An Azure Data Lake store location can be mounted at a nested directory in the Alluxio namespace to have unified access to multiple under storage systems. Alluxio's can be used for this purpose.

This should start an Alluxio master and an Alluxio worker. You can see the master UI at .

Azure Data Lake Storage Gen2
compile the binaries from Alluxio source code
download the precompiled binaries directly
create a new Data Lake storage in your Azure account
SharedKey
Command Line Interface
Command Line Interface
Command Line Interface
http://localhost:19999