FUSE based POSIX API
The Alluxio POSIX API is a client-side protocol that allows mounting an Alluxio File System as a standard file system on most Unix variants. By using this protocol, standard tools like `ls`, `cat`, or `mkdir` can access the distributed cache managed by Alluxio. More importantly, with POSIX API integration, applications can interact with Alluxio regardless of the language they are written in (C, C++, Python, Ruby, Perl, or Java), without integrating any Alluxio library into existing applications.
Note that Alluxio-FUSE is different from projects like S3Fs or mountable HDFS, which mount specific storage services like S3 or HDFS to the local filesystem. The Alluxio POSIX API is a generic solution for the many storage systems supported by Alluxio. Data orchestration and caching features from Alluxio speed up I/O access to frequently used data.
Currently, the Alluxio POSIX API is widely used in model training and model distribution to inference servers.
The Alluxio POSIX API is based on the Filesystem in Userspace (FUSE) project. Most basic file system operations are supported. However, given the intrinsic characteristics of Alluxio, like its write-once/read-many-times file data model, the mounted file system does not have full POSIX semantics and has some limitations. See the functionalities and limitations for details.
Some special characters and patterns in file path names are not supported in Alluxio. Avoid creating file path names that contain them, or handle them on the client side:
Question mark (`?`)
Patterns with periods (`./` and `../`)
Backslash (`\`)
Before following the instructions, make sure a functional Alluxio cluster has been installed. For more information, please refer to the installing Alluxio on Kubernetes page.
The Container Storage Interface (CSI) is a standard for exposing storage systems to containers on orchestrators such as Kubernetes. It is the default way to use Alluxio FUSE on Kubernetes.
The operator creates a PVC named `alluxio-alluxio-csi-fuse-pvc` after the cluster is installed. You can mount this PVC in any pod that needs it, and the operator will create and bind the appropriate PV.
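A minimal sketch of a pod that mounts this PVC at `/data`. The pod name, container name, image, and command are hypothetical placeholders, and `HostToContainer` is assumed as the `mountPropagation` value; only the PVC name comes from this page.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: fuse-test-0                # hypothetical pod name
  # The pod must live in the same namespace as the PVC it references.
spec:
  containers:
    - name: fuse-test              # hypothetical container name
      image: ubuntu:22.04          # assumption: any image with a shell works for testing
      command: ["/bin/bash", "-c", "while true; do sleep 30; done"]
      volumeMounts:
        - name: alluxio-pvc
          mountPath: /data         # the FUSE file system appears under this directory
          # Assumed value: lets a re-created FUSE mount on the host
          # propagate into the container.
          mountPropagation: HostToContainer
  volumes:
    - name: alluxio-pvc
      persistentVolumeClaim:
        claimName: alluxio-alluxio-csi-fuse-pvc   # PVC created by the operator
```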
When the PVC is mounted at `/data` in a pod, the FUSE file system is available under that directory. Note the following details about the configuration:
All the pods or replica sets can use the same PVC. The pods on the same node will share the same FUSE process.
The `mountPropagation` setting is necessary for auto-recovery when the FUSE process crashes.
You can run I/O operations (e.g., shell commands, training) on top of the local directory. Here is a simple example:
The operations will be translated and executed by the Alluxio system and may be executed on the under storage based on configuration.
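The same kind of I/O can also be run from a short-lived pod. The sketch below assumes the PVC is mounted at `/data`; the pod name, image, and the `fuse-example` directory are hypothetical, and the write steps only succeed if that path is writable in your Alluxio namespace.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: fuse-io-example            # hypothetical pod name
spec:
  restartPolicy: Never             # run the commands once and exit
  containers:
    - name: io
      image: ubuntu:22.04          # assumption: any image with a shell
      command: ["/bin/bash", "-c"]
      args:
        - |
          set -e
          ls -l /data                                   # list the Alluxio namespace root
          mkdir -p /data/fuse-example                   # hypothetical, must be a writable path
          echo "hello" > /data/fuse-example/hello.txt   # sequential write
          cat /data/fuse-example/hello.txt              # sequential read
      volumeMounts:
        - name: alluxio-pvc
          mountPath: /data
          mountPropagation: HostToContainer             # assumed value
  volumes:
    - name: alluxio-pvc
      persistentVolumeClaim:
        claimName: alluxio-alluxio-csi-fuse-pvc
```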
Most basic file system operations are supported. However, due to Alluxio's intrinsic characteristics, some operations are not fully supported.
Category | Supported Operations | Unsupported Operations |
---|---|---|
Metadata Write | Create file, delete file, create directory, delete directory, rename, change owner, change group, change mode | Symlink, link, change access/modification time (utimens), change special file attributes (chattr), sticky bit |
Metadata Read | Get file status, get directory status, list directory status | |
Data Write | Sequential write, append write, random write, overwrite, truncate | Concurrent writes to the same file by multiple threads/clients |
Data Read | Sequential read, random read, multiple threads/clients concurrently reading the same file | |
Combinations | | FIFO special file type |
To enable append write and random write, set the configuration `alluxio.user.fuse.random.access.file.stream.enabled=true`.
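As a sketch only, this is how the property might be set when the cluster is managed through the operator's cluster YAML. The `apiVersion`, `kind`, cluster name, and the `spec.properties` map are assumptions that are not stated on this page; check them against your installed operator before use.

```yaml
apiVersion: k8s-operator.alluxio.com/v1   # assumption: operator API group and version
kind: AlluxioCluster                      # assumption: custom resource kind
metadata:
  name: alluxio                           # hypothetical cluster name
spec:
  properties:                             # assumption: map of Alluxio configuration properties
    # Enables append write and random write through FUSE.
    alluxio.user.fuse.random.access.file.stream.enabled: "true"
```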
You can update the `mountOptions` configuration in the Alluxio Cluster YAML file to set mount options. If no mount option is provided, the value of the Alluxio configuration `alluxio.fuse.mount.options` (default: `direct_io`) is used. The available Linux mount options are listed here.
Mount option | Default value | Tuning suggestion | Description |
---|---|---|---|
direct_io | enabled by default | Set when deploying AlluxioFuse in a Kubernetes environment | When `direct_io` is enabled, the kernel does not cache data or perform read-ahead. This bypasses the system buffer cache and improves pod stability in Kubernetes environments |
kernel_cache | | | `kernel_cache` uses kernel system caching and improves read performance. This should only be enabled on file systems where the file data is never changed externally through the underlying storage |
auto_cache | | Set when deploying AlluxioFuse on a plain machine | `auto_cache` uses kernel system caching and improves read performance. Instead of unconditionally keeping cached data, the cached data is invalidated if the modification time or the size of the file has changed since it was last opened. See [libfuse documentation](https://libfuse.github.io/doxygen/structfuse__config.html#a9db154b1f75284dd4fccc0248be71f66) for more info |
attr_timeout=N | 1.0 | 600 | The timeout in seconds for which file/directory attributes are cached |
big_writes | | Set | Stops FUSE from splitting I/O into small chunks and speeds up writes. [Not supported in libfuse3](https://github.com/libfuse/libfuse/blob/master/ChangeLog.rst#libfuse-300-2016-12-08). Ignored if libfuse3 is used |
entry_timeout=N | 1.0 | 600 | The timeout in seconds for which name lookups are cached |
max_read=N | 131072 | Use default value | The maximum size of data that can be read in a single FUSE request. The libfuse default is unlimited, but read requests are in any case limited to 32 pages (128 KB on i386) |
max_background=N | 12 | 256 | The maximum number of outstanding background requests that the FUSE kernel driver is allowed to submit |
max_idle_threads=N | 10 | 256 | The maximum number of idle FUSE daemon threads allowed. If the value is too small, FUSE may frequently create and destroy threads, which introduces extra performance overhead |
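A sketch of how the tuning suggestions from the table might look in the cluster YAML. The `apiVersion`, `kind`, cluster name, and the placement of `mountOptions` as a list under `spec.fuse` are assumptions; the option names and values are taken from the table.

```yaml
apiVersion: k8s-operator.alluxio.com/v1   # assumption: operator API group and version
kind: AlluxioCluster                      # assumption: custom resource kind
metadata:
  name: alluxio                           # hypothetical cluster name
spec:
  fuse:
    mountOptions:                         # assumption: list of mount options under spec.fuse
      - direct_io                         # suggested for Kubernetes deployments
      - attr_timeout=600                  # cache file/directory attributes for 600 s
      - entry_timeout=600                 # cache name lookups for 600 s
      - max_background=256                # allow more outstanding background requests
      - max_idle_threads=256              # avoid frequent thread creation and teardown
```

Longer `attr_timeout` and `entry_timeout` values reduce metadata round trips, at the cost of the mount seeing metadata changes later.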
If your Kubernetes version doesn't support CSI, or the cloud vendor doesn't grant the permissions needed to use CSI, you can use the DaemonSet type of Alluxio FUSE instead. With this type, the FUSE pods must be deployed on all nodes beforehand. You can use `nodeSelector` to restrict the deployment to specific nodes.
To use DaemonSet FUSE, change the `alluxio-cluster.yaml` configuration before deploying the cluster:
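A sketch of the relevant part of `alluxio-cluster.yaml`. The `hostPathForMount` and `nodeSelector` fields and the node label are named on this page; the `apiVersion`, `kind`, cluster name, the `type: daemonSet` value, and the host path `/mnt/alluxio/fuse` are assumptions for illustration.

```yaml
apiVersion: k8s-operator.alluxio.com/v1   # assumption: operator API group and version
kind: AlluxioCluster                      # assumption: custom resource kind
metadata:
  name: alluxio                           # hypothetical cluster name
spec:
  fuse:
    type: daemonSet                       # assumption: value that selects DaemonSet FUSE
    hostPathForMount: /mnt/alluxio/fuse   # hypothetical host path for the FUSE mount point
    nodeSelector:
      # Label values are strings in Kubernetes, so "true" is quoted.
      alluxio.com/selected-for-fuse: "true"
```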
FUSE pods will be deployed on all nodes with the label `alluxio.com/selected-for-fuse: true`.
DaemonSet FUSE mounts the FUSE file system at the host path specified by `hostPathForMount`. To mount the FUSE file system in your pod, add a `hostPath` volume:
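A minimal sketch of such a pod, assuming `hostPathForMount` is set to the hypothetical path `/mnt/alluxio/fuse`, so the volume mounts its parent directory `/mnt/alluxio`. The pod name, image, and `HostToContainer` value are also assumptions.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: fuse-test-0                # hypothetical pod name
spec:
  containers:
    - name: fuse-test
      image: ubuntu:22.04          # assumption: any image with a shell
      command: ["/bin/bash", "-c", "while true; do sleep 30; done"]
      volumeMounts:
        - name: alluxio-fuse-mount
          mountPath: /mnt/alluxio  # FUSE data is then visible at /mnt/alluxio/fuse
          # Assumed value: a FUSE mount re-created on the host
          # becomes visible inside the container.
          mountPropagation: HostToContainer
  volumes:
    - name: alluxio-fuse-mount
      hostPath:
        path: /mnt/alluxio         # parent directory of the FUSE mount point
        type: Directory
```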
The example mounts the parent directory of the FUSE mount point and sets `mountPropagation`. In this way, the mount point in the container can auto-recover when the FUSE process crashes.
The default way to mount a FUSE device will allow access to the root of the Alluxio namespace, which contains all the mount points. For those who want to provide FUSE for others and want to keep them from accessing the wrong files or modifying paths, here are some methods:
Mount the PVC with a sub-path. This is suitable when you have control over the user’s pod.
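A minimal sketch using the standard Kubernetes `subPath` field on the volume mount. The pod name and image are hypothetical, and `/s3/path/to/files` is an example Alluxio path; note that `subPath` is written relative to the volume root, without a leading slash.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: fuse-subpath-test          # hypothetical pod name
spec:
  containers:
    - name: fuse-test
      image: ubuntu:22.04          # assumption: any image with a shell
      command: ["/bin/bash", "-c", "while true; do sleep 30; done"]
      volumeMounts:
        - name: alluxio-pvc
          mountPath: /data
          subPath: s3/path/to/files          # only this sub-tree is visible at /data
          mountPropagation: HostToContainer  # assumed value
  volumes:
    - name: alluxio-pvc
      persistentVolumeClaim:
        claimName: alluxio-alluxio-csi-fuse-pvc
```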
In the example configuration, accessing the `/data` path in the container is the same as accessing `/s3/path/to/files` in the Alluxio namespace.
DaemonSet FUSE can also use `subPath`, but this breaks the propagation of a new FUSE mount point to the mount path in the container, which prevents auto-recovery. Use with caution.
Create a PVC with a custom StorageClass so that the PVC is bound to a sub-path. This requires additional steps, but it is suitable when you can't control the user's pod.
Create the StorageClass and the PVC above, then mount the PVC to the container. Accessing the mount point in the container is equivalent to accessing `/s3/path/to/files` in the Alluxio namespace.
A PVC is a namespace-scoped resource in Kubernetes. To use CSI FUSE from a pod in a namespace other than the Alluxio cluster's, create a PVC in that pod's namespace:
Save the file to `csi-pvc.yaml` and then run:
Note that the CSI FUSE pod will still be started in the same namespace as the Alluxio cluster.