Alluxio Namespace and Under File System Namespaces
Alluxio enables effective data management across different storage systems by providing a unified view of all data sources. It achieves this by using a mount table to map paths in Alluxio to paths in those storage systems.
We use the term "Under File System (UFS)" for a storage system managed and cached by Alluxio. Alluxio is built on top of the storage layer, providing cache speed-up and various other data management functionalities. Therefore, those storage systems are "under" the Alluxio layer.
A user "mounts" a UFS to an Alluxio path. The example below illustrates how a user mounts an S3 bucket and a GCS bucket to Alluxio.
The mount table for the example above will look like:
Alluxio supports using ETCD to store the mount table information. When the mount table is stored in ETCD, all Alluxio processes (clients, workers, etc.) read the mount table information from ETCD.
To use ETCD as the mount table backend, add the following configurations to alluxio-site.properties:
Set alluxio.etcd.endpoints to the list of instances in the ETCD cluster.
ETCD must be running when Alluxio processes start, because they read it to initialize the mount table. After startup, they regularly poll ETCD for mount table updates. The poll interval is specified by alluxio.mount.table.etcd.polling.interval.ms in alluxio-site.properties.
In a large cluster with thousands of Alluxio clients and hundreds of Alluxio workers, you may want to use a larger interval to reduce the pressure on ETCD. If your mount table is seldom updated, feel free to use a much larger interval like 30s or more.
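The trade-off above can be made concrete with a rough estimate. The sketch below assumes, purely for illustration, that every Alluxio process issues one ETCD read per polling interval; the function name and the cluster sizes are hypothetical and not taken from Alluxio.

```python
def etcd_reads_per_second(num_processes, polling_interval_ms):
    """Estimate aggregate ETCD read rate from mount table polling.

    Assumption (not from the Alluxio documentation): each process
    issues exactly one read per polling interval.
    """
    if polling_interval_ms <= 0:
        raise ValueError("polling interval must be positive")
    return num_processes * 1000 / polling_interval_ms


# Hypothetical cluster: 3000 clients plus 300 workers.
processes = 3000 + 300
for interval_ms in (3000, 30000):
    rate = etcd_reads_per_second(processes, interval_ms)
    print(f"interval={interval_ms} ms -> about {rate:.0f} reads/s")
```

Under this assumption, the read rate falls in proportion to the interval, which is why a seldom-updated mount table can tolerate an interval of 30s or more.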
Add or remove mount points using the Alluxio command line:
List the mount table using the bin/alluxio mount list command:
Note: It takes a little while for all Alluxio components to reload the updated mount table from ETCD. The time depends on alluxio.mount.table.etcd.polling.interval.ms.
Alluxio processes will need configurations specific to the UFS in order to correctly access it, most notably security credentials.
You can leave all the UFS configurations in alluxio-site.properties and Alluxio will use those configurations for all mount points of that UFS type. For example:
All mount points will use the same S3 credentials and HDFS configurations. This is the simplest way to configure mount points when every UFS of a given type can share the same configuration, because all configuration properties are managed in a single alluxio-site.properties file.
It is common that a user may want to use different configurations for different mount points. For example, if a user has two mount points to S3-flavor paths, one backed by AWS S3 and the other backed by MinIO, they will need to use different credentials and endpoints for each mount point.
You can specify mount options when you add a new mount point.
In this way, you specify configuration properties for this specific mount point.
Note: If you specify mount options in the command line, please remove those configuration options from the alluxio-site.properties file to avoid confusion, as the mount options will take precedence.
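The precedence rule in the note can be pictured as a dictionary merge in which per-mount options override cluster-wide properties. This is a minimal sketch of the rule only, not Alluxio's implementation; the function name, the property keys, and the endpoint values are all hypothetical.

```python
def effective_config(site_properties, mount_options):
    """Merge cluster-wide properties with per-mount options.

    Mount options win whenever the same key appears in both.
    """
    merged = dict(site_properties)   # start from alluxio-site.properties
    merged.update(mount_options)     # per-mount options take precedence
    return merged


# Hypothetical keys and values, for illustration only.
site_properties = {
    "example.s3.endpoint": "https://aws.example.invalid",
    "example.s3.region": "us-east-1",
}
minio_mount_options = {"example.s3.endpoint": "https://minio.example.invalid"}

aws_config = effective_config(site_properties, {})
minio_config = effective_config(site_properties, minio_mount_options)
print(aws_config["example.s3.endpoint"])
print(minio_config["example.s3.endpoint"])
```

A key that is set in both places resolves to the mount option, which is why leaving a stale value in alluxio-site.properties can be confusing even though it has no effect on that mount point.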
To update the mount options of an existing mount point, first unmount it and then mount it again with the updated options.
Let's look at a more complicated mount table example.
The mount table from the example above consists of 4 entries. The first column lists the paths of the mount points in the Alluxio namespace, and the second column lists the corresponding UFS paths that are mounted on Alluxio.
The first mount entry defines a mapping from an S3 path s3://my-bucket/data/images to an Alluxio path /s3-images. Any objects with the S3 prefix s3://my-bucket/data/images will be available under the Alluxio directory /s3-images. For example, s3://my-bucket/data/images/collections/20240101/sample.png can be found at Alluxio path /s3-images/collections/20240101/sample.png.
The second mount entry defines a mapping from an S3 path s3://my-bucket/data/tables to an Alluxio path /s3-tables. The UFS path is actually from the same bucket as the first mount entry. As this example shows, users may freely choose parts of their UFS namespaces to mount to Alluxio.
The third and fourth entries map the Alluxio paths /hive and /presto to two directories in the same HDFS, hdfs://hdfs-cluster.company.com/user/hive and hdfs://hdfs-cluster.company.com/user/presto respectively. Similarly, files and directories under the two HDFS directory trees will be available at their corresponding Alluxio paths. For example, hdfs://hdfs-cluster.company.com/user/hive/schema/table/part1.parquet becomes /hive/schema/table/part1.parquet in the Alluxio namespace.
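The path translation described above amounts to a prefix substitution against the mount table. The following is a minimal sketch of that idea using the four entries from the example; the function names are hypothetical and this is not how Alluxio itself is implemented.

```python
# Mount table from the example: Alluxio mount point -> UFS path.
MOUNT_TABLE = {
    "/s3-images": "s3://my-bucket/data/images",
    "/s3-tables": "s3://my-bucket/data/tables",
    "/hive": "hdfs://hdfs-cluster.company.com/user/hive",
    "/presto": "hdfs://hdfs-cluster.company.com/user/presto",
}


def _strip_prefix(path, prefix):
    """Return the part of path below prefix, or None if path is not under it."""
    if path == prefix:
        return ""
    if path.startswith(prefix + "/"):
        return path[len(prefix):]
    return None


def alluxio_to_ufs(alluxio_path, table=MOUNT_TABLE):
    """Translate an Alluxio path to the UFS path it maps to."""
    for mount_point, ufs_root in table.items():
        rest = _strip_prefix(alluxio_path, mount_point)
        if rest is not None:
            return ufs_root + rest
    raise ValueError(f"{alluxio_path} is not under any mount point")


def ufs_to_alluxio(ufs_path, table=MOUNT_TABLE):
    """Translate a UFS path to the Alluxio path where it is visible."""
    for mount_point, ufs_root in table.items():
        rest = _strip_prefix(ufs_path, ufs_root)
        if rest is not None:
            return mount_point + rest
    raise ValueError(f"{ufs_path} is not mounted in Alluxio")


print(alluxio_to_ufs("/s3-images/collections/20240101/sample.png"))
print(ufs_to_alluxio(
    "hdfs://hdfs-cluster.company.com/user/hive/schema/table/part1.parquet"))
```

The prefix check compares whole path components, so a path such as /s3-images-old is not mistaken for something under /s3-images.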
A few rules must be followed when defining mount points in Alluxio.
Rule 1. Mount directly under root path /
A mount point in Alluxio MUST be a direct child of the root path /. For example, /s3-images, /hive and /presto are valid mount points. The root path / is just a virtual node in the Alluxio namespace. It does NOT map to any UFS path.
Rule 2. No nested mount points
Mount points cannot be nested. The Alluxio path of one mount point cannot be under the Alluxio path of another mount point. Similarly, the UFS path of one mount cannot be under the UFS path of another mount point.
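The two rules can be expressed as a small validation routine. This is an illustrative sketch, not Alluxio's own validation logic; the function names are hypothetical, and treating an identical path as a conflict is an assumption made here.

```python
def _is_under(path, parent):
    """True if path equals parent or lies below it, compared by component."""
    path, parent = path.rstrip("/"), parent.rstrip("/")
    return path == parent or path.startswith(parent + "/")


def check_mount(table, alluxio_path, ufs_path):
    """Return the rule violations for a proposed mount (empty list if valid).

    table maps existing Alluxio mount points to their UFS paths.
    """
    problems = []

    # Rule 1: the mount point must be a direct child of the root path /.
    parts = alluxio_path.strip("/").split("/")
    if not alluxio_path.startswith("/") or len(parts) != 1 or parts[0] == "":
        problems.append("rule 1: mount point must be a direct child of /")

    # Rule 2: no nesting, on the Alluxio side or on the UFS side.
    for existing_path, existing_ufs in table.items():
        if _is_under(alluxio_path, existing_path) or _is_under(existing_path, alluxio_path):
            problems.append(f"rule 2: Alluxio path nests with {existing_path}")
        if _is_under(ufs_path, existing_ufs) or _is_under(existing_ufs, ufs_path):
            problems.append(f"rule 2: UFS path nests with {existing_ufs}")
    return problems


existing = {
    "/s3-images": "s3://my-bucket/data/images",
    "/hive": "hdfs://hdfs-cluster.company.com/user/hive",
}
print(check_mount(existing, "/s3-tables", "s3://my-bucket/data/tables"))
print(check_mount(existing, "/s3-images/raw", "s3://other-bucket/raw"))
print(check_mount(existing, "/thumbs", "s3://my-bucket/data/images/thumbs"))
```

Mounting s3://my-bucket/data/tables next to s3://my-bucket/data/images passes because the two UFS paths are siblings, while s3://my-bucket/data/images/thumbs is rejected because it lies under an already mounted UFS path.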