# Azure Data Lake Storage Gen2

This guide describes how to configure Alluxio with [Azure Data Lake Storage Gen2](https://docs.microsoft.com/en-in/azure/storage/blobs/data-lake-storage-introduction) as the under storage system.

## Prerequisites

The Alluxio binaries must be on your machine. You can either [compile the binaries from Alluxio source code](https://documentation.alluxio.io/os-en/contributor/building-alluxio-from-source), or [download the precompiled binaries directly](https://documentation.alluxio.io/os-en/install-alluxio/running-alluxio-locally).

In preparation for using Azure Data Lake Storage Gen2 with Alluxio, [create a new Data Lake Storage account in your Azure subscription](https://docs.microsoft.com/en-in/azure/storage/blobs/create-data-lake-storage-account) or use an existing one. Also note the directory you want Alluxio to use, either by creating a new directory or by using an existing one. If you plan to authenticate with a [Shared Key](https://docs.microsoft.com/en-us/rest/api/storageservices/authorize-with-shared-key), you also need the storage account's access key. In this guide, the Azure storage account name is `<AZURE_ACCOUNT>`, the container in that storage account is `<AZURE_CONTAINER>`, and the directory in that container is `<AZURE_DIRECTORY>`.
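The three placeholders combine into the single ABFS URI used throughout this guide. The following is a minimal sketch with hypothetical example values (`myaccount`, `mycontainer`, `alluxio`) showing how each placeholder maps onto `abfs://<AZURE_CONTAINER>@<AZURE_ACCOUNT>.dfs.core.windows.net/<AZURE_DIRECTORY>/`; substitute your own values.

```shell
#!/usr/bin/env bash
# Hypothetical example values; replace them with your own account, container, and directory.
AZURE_ACCOUNT="myaccount"
AZURE_CONTAINER="mycontainer"
AZURE_DIRECTORY="alluxio"

# URI layout: abfs://<container>@<account>.dfs.core.windows.net/<directory>/
ABFS_URI="abfs://${AZURE_CONTAINER}@${AZURE_ACCOUNT}.dfs.core.windows.net/${AZURE_DIRECTORY}/"
echo "${ABFS_URI}"
```

Note that the container name comes before the `@` and the account name after it; swapping the two is an easy mistake to make.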

## Setup with Shared Key

### Root Mount

To use Azure Data Lake Storage as the UFS of the Alluxio root mount point, configure Alluxio by modifying `conf/alluxio-site.properties`. If the file does not exist, create it from the template:

```console
$ cp conf/alluxio-site.properties.template conf/alluxio-site.properties
```

Specify the underfs address by modifying `conf/alluxio-site.properties` to include:

```properties
alluxio.master.mount.table.root.ufs=abfs://<AZURE_CONTAINER>@<AZURE_ACCOUNT>.dfs.core.windows.net/<AZURE_DIRECTORY>/
```

Specify the Shared Key by adding the following property in `conf/alluxio-site.properties`:

```properties
alluxio.master.mount.table.root.option.fs.azure.account.key.<AZURE_ACCOUNT>.dfs.core.windows.net=<SHARED_KEY>
```
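If you have the Azure CLI installed and are logged in (`az login`), one way to look up a value for `<SHARED_KEY>` is to list the storage account's access keys. This is a sketch: `<RESOURCE_GROUP>` is a placeholder for the resource group that contains the storage account and is not used elsewhere in this guide. The query selects the first of the account's two keys.

```console
$ az storage account keys list \
  --account-name <AZURE_ACCOUNT> \
  --resource-group <RESOURCE_GROUP> \
  --query "[0].value" \
  --output tsv
```

Treat the key like a password. Anyone who has it has full access to the storage account.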

### Nested Mount

An Azure Data Lake Storage location can be mounted at a nested directory in the Alluxio namespace, giving unified access to multiple under storage systems. Use Alluxio's [Command Line Interface](https://documentation.alluxio.io/os-en/operation/user-cli) to create the mount:

```console
$ ./bin/alluxio fs mount \
  --option fs.azure.account.key.<AZURE_ACCOUNT>.dfs.core.windows.net=<SHARED_KEY> \
  /mnt/abfs abfs://<AZURE_CONTAINER>@<AZURE_ACCOUNT>.dfs.core.windows.net/<AZURE_DIRECTORY>/
```

After these changes, Alluxio should be configured to work with Azure Data Lake storage as its under storage system, and you can run Alluxio locally with it.
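To check a nested mount once Alluxio is running, a reasonable sequence is to print the current mount table (running `fs mount` with no arguments lists existing mount points), then list the mounted directory. If you no longer need the mount, remove it. The `/mnt/abfs` path matches the example above.

```console
$ ./bin/alluxio fs mount
$ ./bin/alluxio fs ls /mnt/abfs
$ ./bin/alluxio fs unmount /mnt/abfs
```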

## Setup with OAuth 2.0 Client Credentials

### Root Mount

To use Azure Data Lake Storage as the UFS of the Alluxio root mount point, configure Alluxio by modifying `conf/alluxio-site.properties`. If the file does not exist, create it from the template:

```console
$ cp conf/alluxio-site.properties.template conf/alluxio-site.properties
```

Specify the underfs address by modifying `conf/alluxio-site.properties` to include:

```properties
alluxio.master.mount.table.root.ufs=abfs://<AZURE_CONTAINER>@<AZURE_ACCOUNT>.dfs.core.windows.net/<AZURE_DIRECTORY>/
```

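Specify the OAuth 2.0 client credentials by adding the following properties to `conf/alluxio-site.properties`. For `<OAUTH_ENDPOINT>`, use the v1 token endpoint of your tenant, which has the form `https://login.microsoftonline.com/<TENANT_ID>/oauth2/token`, not the v2 endpoint.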

```properties
alluxio.master.mount.table.root.option.fs.azure.account.oauth2.client.endpoint=<OAUTH_ENDPOINT>
alluxio.master.mount.table.root.option.fs.azure.account.oauth2.client.id=<CLIENT_ID>
alluxio.master.mount.table.root.option.fs.azure.account.oauth2.client.secret=<CLIENT_SECRET>
```
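Because the v1 and v2 token endpoints differ only in their path, a v2 endpoint can slip in by accident. The following is a minimal sketch that builds `<OAUTH_ENDPOINT>` from a tenant ID and rejects the v2 form before it reaches `alluxio-site.properties`. The tenant ID value and the `is_v1_endpoint` helper are hypothetical, for illustration only.

```shell
#!/usr/bin/env bash
# Hypothetical tenant ID; use your own Microsoft Entra ID (Azure AD) tenant ID.
TENANT_ID="00000000-0000-0000-0000-000000000000"

# v1 token endpoint; a v2 endpoint would end in /oauth2/v2.0/token instead.
OAUTH_ENDPOINT="https://login.microsoftonline.com/${TENANT_ID}/oauth2/token"

# Hypothetical helper: succeeds only for a v1 token endpoint path.
is_v1_endpoint() {
  case "$1" in
    */oauth2/v2.0/token) return 1 ;;  # v2 endpoint: reject
    */oauth2/token)      return 0 ;;  # v1 endpoint: accept
    *)                   return 1 ;;  # anything else: reject
  esac
}

if is_v1_endpoint "${OAUTH_ENDPOINT}"; then
  # Print the property line to paste into conf/alluxio-site.properties.
  echo "alluxio.master.mount.table.root.option.fs.azure.account.oauth2.client.endpoint=${OAUTH_ENDPOINT}"
else
  echo "not a v1 token endpoint: ${OAUTH_ENDPOINT}" >&2
fi
```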

### Nested Mount

An Azure Data Lake Storage location can be mounted at a nested directory in the Alluxio namespace, giving unified access to multiple under storage systems. Use Alluxio's [Command Line Interface](https://documentation.alluxio.io/os-en/operation/user-cli) to create the mount:

```console
$ ./bin/alluxio fs mount \
  --option fs.azure.account.oauth2.client.endpoint=<OAUTH_ENDPOINT> \
  --option fs.azure.account.oauth2.client.id=<CLIENT_ID> \
  --option fs.azure.account.oauth2.client.secret=<CLIENT_SECRET> \
  /mnt/abfs abfs://<AZURE_CONTAINER>@<AZURE_ACCOUNT>.dfs.core.windows.net/<AZURE_DIRECTORY>/
```

After these changes, Alluxio should be configured to work with Azure Data Lake storage as its under storage system, and you can run Alluxio locally with it.

## Setup with Azure Managed Identities

### Root Mount

To use Azure Data Lake Storage as the UFS of the Alluxio root mount point, configure Alluxio by modifying `conf/alluxio-site.properties`. If the file does not exist, create it from the template:

```console
$ cp conf/alluxio-site.properties.template conf/alluxio-site.properties
```

Specify the underfs address by modifying `conf/alluxio-site.properties` to include:

```properties
alluxio.master.mount.table.root.ufs=abfs://<AZURE_CONTAINER>@<AZURE_ACCOUNT>.dfs.core.windows.net/<AZURE_DIRECTORY>/
```

Specify the Azure managed identity by adding the following properties to `conf/alluxio-site.properties`:

```properties
alluxio.master.mount.table.root.option.fs.azure.account.oauth2.msi.endpoint=<MSI_ENDPOINT>
alluxio.master.mount.table.root.option.fs.azure.account.oauth2.client.id=<CLIENT_ID>
alluxio.master.mount.table.root.option.fs.azure.account.oauth2.msi.tenant=<TENANT>
```
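On an Azure VM, managed identity tokens are normally served by the Azure Instance Metadata Service (IMDS) at `http://169.254.169.254/metadata/identity/oauth2/token`, which is a common choice for `<MSI_ENDPOINT>`. Before configuring Alluxio, you can check that the identity can obtain a token for Azure Storage. The following is a sketch under those assumptions: the client ID is a hypothetical value for a user-assigned identity, and the `curl` call only works from inside the VM. The identity also needs access to the storage account, for example through a role such as Storage Blob Data Contributor.

```shell
#!/usr/bin/env bash
# Assumed MSI endpoint: the IMDS token endpoint, reachable only from inside an Azure VM.
MSI_ENDPOINT="http://169.254.169.254/metadata/identity/oauth2/token"
# Hypothetical client ID of a user-assigned managed identity.
CLIENT_ID="11111111-1111-1111-1111-111111111111"

# Request a token for Azure Storage; client_id selects which identity to use.
TOKEN_URL="${MSI_ENDPOINT}?api-version=2018-02-01&resource=https://storage.azure.com/&client_id=${CLIENT_ID}"
echo "${TOKEN_URL}"

# On the VM, a JSON response containing "access_token" indicates the identity works:
#   curl -s -H "Metadata: true" "${TOKEN_URL}"
```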

### Nested Mount

An Azure Data Lake Storage location can be mounted at a nested directory in the Alluxio namespace, giving unified access to multiple under storage systems. Use Alluxio's [Command Line Interface](https://documentation.alluxio.io/os-en/operation/user-cli) to create the mount:

```console
$ ./bin/alluxio fs mount \
  --option fs.azure.account.oauth2.msi.endpoint=<MSI_ENDPOINT> \
  --option fs.azure.account.oauth2.client.id=<CLIENT_ID> \
  --option fs.azure.account.oauth2.msi.tenant=<TENANT> \
  /mnt/abfs abfs://<AZURE_CONTAINER>@<AZURE_ACCOUNT>.dfs.core.windows.net/<AZURE_DIRECTORY>/
```

After these changes, Alluxio should be configured to work with Azure Data Lake storage as its under storage system, and you can run Alluxio locally with it.

## Running Alluxio Locally with Data Lake Storage

Start Alluxio locally to verify that everything works:

```console
$ ./bin/alluxio format
$ ./bin/alluxio-start.sh local
```

This should start an Alluxio master and an Alluxio worker. You can see the master UI at <http://localhost:19999>.

Run a simple example program:

```console
$ ./bin/alluxio runTests
```

Browse `<AZURE_DIRECTORY>` in your storage account to verify that the files and directories created by Alluxio exist. For this test, you should see files with names like:

```
<AZURE_DIRECTORY>/default_tests_files/BASIC_CACHE_PROMOTE_CACHE_THROUGH
```
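Instead of using the Azure portal, you can list the test output with the Azure CLI. This sketch assumes the Shared Key setup. With the OAuth or managed identity setups, replace `--account-key <SHARED_KEY>` with `--auth-mode login` and sign in with an identity that has read access to the container.

```console
$ az storage fs file list \
  --file-system <AZURE_CONTAINER> \
  --path <AZURE_DIRECTORY>/default_tests_files \
  --account-name <AZURE_ACCOUNT> \
  --account-key <SHARED_KEY> \
  --output table
```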

To stop Alluxio, you can run:

```console
$ ./bin/alluxio-stop.sh local
```


