POSIX API via FUSE
Alluxio's POSIX API allows you to mount the Alluxio namespace as a standard filesystem on most Unix-like operating systems. This feature, commonly known as "Alluxio FUSE," lets you use standard command-line tools (ls, cat, mkdir) and existing applications to interact with data in Alluxio without any code changes.
Unlike specific filesystem wrappers like S3FS, Alluxio FUSE acts as a generic caching and data orchestration layer for the many storage systems Alluxio supports, making it ideal for accelerating I/O in workloads like AI/ML model training and serving.

Based on the Filesystem in Userspace (FUSE) project, the mounted filesystem provides most basic operations but is not fully POSIX-compliant due to Alluxio's distributed nature. See the Functionality and Limitations section for details.
Unsupported Path Names: Avoid using special characters (?, \) or patterns (./, ../) in file paths.
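A pre-flight check along these lines can catch problematic names before files are written through the mount. This is a minimal sketch, not part of Alluxio: the helper name find_unsupported_parts is hypothetical, and it assumes the ./ and ../ patterns refer to path components named "." or "..".

```python
def find_unsupported_parts(path: str) -> list[str]:
    """Return the reasons a path may be unsupported; an empty list means none were found."""
    problems = []
    # Special characters called out in the note above.
    for char in ("?", "\\"):
        if char in path:
            problems.append(f"contains special character {char!r}")
    # Assumption: "./" and "../" mean path components named "." or "..".
    for component in path.split("/"):
        if component in (".", ".."):
            problems.append(f"contains relative component {component!r}")
    return problems
```

For example, find_unsupported_parts("/data/s3/what?.txt") reports one problem, while a plain path such as /data/s3/train/part-0001.parquet reports none.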
When to Use FUSE
The FUSE interface is particularly powerful for traditional applications and modern AI/ML workloads. Common use cases include:
AI/ML Model Training: When training models with frameworks like PyTorch or TensorFlow, you can read datasets directly from the mounted FUSE path. This simplifies data access and leverages Alluxio's caching to dramatically speed up training jobs.
Model Serving: For inference servers that need to load models quickly, FUSE provides low-latency access to models stored in Alluxio.
Legacy Applications: Applications that expect a standard filesystem can be pointed to the FUSE mount to read and write data from Alluxio without modification.
Interactive Data Exploration: Data scientists and engineers can use shell commands (ls, cat, head) to explore and interact with data in Alluxio just like a local filesystem.
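Because the mount behaves like a local directory, ordinary file APIs are enough to enumerate a dataset. The sketch below uses only the Python standard library; the function names are illustrative, and any path you pass (for example a directory under your own mount point) is an assumption about your layout.

```python
from pathlib import Path


def list_dataset_files(root: str, suffix: str) -> list[Path]:
    """Recursively collect files under root whose names end with suffix, sorted for a stable order."""
    return sorted(p for p in Path(root).rglob(f"*{suffix}") if p.is_file())


def total_bytes(files: list[Path]) -> int:
    """Sum the sizes of the given files, using the metadata the filesystem reports."""
    return sum(p.stat().st_size for p in files)
```

A training script could pass the resulting list to its data loader; nothing in the code is specific to Alluxio, which is the point of the POSIX interface.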
Getting Started with FUSE on Kubernetes
The most common way to deploy Alluxio FUSE is on a Kubernetes cluster alongside your applications.
Prerequisites
Ensure you have a running Alluxio cluster in your Kubernetes environment. For instructions, see Installing Alluxio on Kubernetes.
Method 1: Using CSI (Recommended)
The Container Storage Interface (CSI) is the standard, recommended way to use Alluxio FUSE in Kubernetes. The Alluxio Operator automatically provisions a PersistentVolumeClaim (PVC) named alluxio-cluster-fuse when the cluster is installed.
To use it, mount this PVC into your application pods. The operator will handle the creation and binding of the underlying PersistentVolume (PV).
Example Pod Configuration:
Save the following configuration to a file named fuse-pod.yaml:
apiVersion: v1
kind: Pod
metadata:
  name: fuse-test-0
  namespace: alx-ns
  labels:
    app: alluxio
spec:
  containers:
    - image: ubuntu:22.04
      imagePullPolicy: IfNotPresent
      name: fuse-test
      command: ["sleep", "infinity"]
      volumeMounts:
        - mountPath: /data
          name: alluxio-pvc
          mountPropagation: HostToContainer
  volumes:
    - name: alluxio-pvc
      persistentVolumeClaim:
        claimName: alluxio-cluster-fuse
Then, create the pod using kubectl apply -f fuse-pod.yaml.
Key details:
Shared FUSE Process: Multiple pods on the same Kubernetes node can use the same PVC and will share a single Alluxio FUSE process for efficiency.
mountPropagation: HostToContainer: This setting is critical. It ensures that if the FUSE process crashes, the mount point can be automatically recovered and re-propagated to your container.
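While the mount is being recovered, an application may briefly see I/O errors. One way to tolerate that window is to retry the failed call. The sketch below is an assumption about application-side handling, not an Alluxio feature: it retries only on ENOTCONN ("Transport endpoint is not connected"), the error Linux typically returns for a FUSE mount whose daemon has gone away, and the function name and defaults are hypothetical.

```python
import errno
import time


def read_with_retry(read_fn, attempts=5, delay_seconds=1.0):
    """Call read_fn(), retrying while the mount reports a disconnected endpoint."""
    for attempt in range(1, attempts + 1):
        try:
            return read_fn()
        except OSError as err:
            # Re-raise anything that is not ENOTCONN, and give up after the last attempt.
            if err.errno != errno.ENOTCONN or attempt == attempts:
                raise
            time.sleep(delay_seconds)
```

A caller would wrap a read, for example read_with_retry(lambda: open(path, "rb").read()), where path points into the mounted directory. File handles opened before the crash are not restored by retrying; the file has to be reopened, which the lambda does on each attempt.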
Once mounted, you can interact with the /data directory as if it were the root of your Alluxio namespace.
Example Usage:
# Connect to the test pod
$ kubectl -n alx-ns exec -it fuse-test-0 -- bash
# List the contents of the mounted directory
root@fuse-test-0:/$ ls /data/
s3
# Write a file and read it back
root@fuse-test-0:/$ echo "hello, world!" >/data/s3/message.txt
root@fuse-test-0:/$ cat /data/s3/message.txt
hello, world!
# Verify the file exists in Alluxio from another pod
$ kubectl -n alx-ns exec -it alluxio-cluster-coordinator-0 -- alluxio fs ls /s3/message.txt
14 06-27-2024 07:54:40:000 FILE /message.txt
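The same write-then-read check can be scripted from inside the application container. This is a minimal sketch using only the Python standard library; the function name is illustrative, and the directory you pass (such as /data/s3 in the session above) depends on your own mount.

```python
from pathlib import Path


def write_and_read_back(directory: str, name: str = "message.txt", text: str = "hello, world!\n") -> int:
    """Write text to directory/name, read it back, and return its size in bytes."""
    target = Path(directory) / name
    target.write_text(text, encoding="utf-8")
    read_back = target.read_text(encoding="utf-8")
    if read_back != text:
        raise RuntimeError(f"read-back mismatch for {target}")
    return len(read_back.encode("utf-8"))
```

With the default text the function returns 14, the same size that the alluxio fs ls output reports for the file written by echo, which appends a trailing newline.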
Method 2: Using a DaemonSet
If your Kubernetes version or environment does not support CSI, you can deploy FUSE using a DaemonSet. This approach runs a FUSE pod on each node (or a subset of nodes you select).
Configure the DaemonSet: Before deploying your Alluxio cluster, modify your alluxio-cluster.yaml to use the daemonSet type and specify a host path for the mount.
apiVersion: k8s-operator.alluxio.com/v1
kind: AlluxioCluster
spec:
  fuse:
    type: daemonSet
    hostPathForMount: /mnt/alluxio/fuse # will use /mnt/alluxio/fuse if not specified
    nodeSelector:
      alluxio.com/selected-for-fuse: "true" # quoted, because label values must be strings
This will deploy FUSE pods on all nodes with the label alluxio.com/selected-for-fuse: true.
Mount in Your Application Pod: In your application pod, mount the hostPath where the FUSE DaemonSet exposes the filesystem.
apiVersion: v1
kind: Pod
metadata:
  name: fuse-test-0
  namespace: alx-ns
  labels:
    app: alluxio
spec:
  containers:
    - image: ubuntu:22.04
      imagePullPolicy: IfNotPresent
      name: fuse-test
      command: ["sleep", "infinity"]
      volumeMounts:
        - mountPath: /mnt/alluxio
          name: alluxio-fuse-mount
          mountPropagation: HostToContainer
  volumes:
    - name: alluxio-fuse-mount
      hostPath:
        path: /mnt/alluxio
        type: Directory
Similar to the CSI method, mountPropagation is essential for auto-recovery.
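With a hostPath volume, the directory exists on the node even when no FUSE process is serving it, so an application may want to confirm that a FUSE filesystem is actually mounted before writing. The sketch below parses text in the format of /proc/mounts on Linux. The helper names are hypothetical, and it assumes the mount's filesystem type begins with "fuse"; the exact type string Alluxio reports is not stated here.

```python
def fuse_mount_points(mounts_text: str) -> dict[str, str]:
    """Map mount point -> filesystem type for entries whose type starts with 'fuse'."""
    result = {}
    for line in mounts_text.splitlines():
        # /proc/mounts fields: device, mount point, filesystem type, options, dump, pass
        fields = line.split()
        if len(fields) >= 3 and fields[2].startswith("fuse"):
            result[fields[1]] = fields[2]
    return result


def is_fuse_mounted(path: str, mounts_text: str) -> bool:
    """Return True if path is itself the mount point of a FUSE filesystem."""
    normalized = path.rstrip("/") or "/"
    return normalized in fuse_mount_points(mounts_text)
```

Inside the pod above, a call such as is_fuse_mounted("/mnt/alluxio/fuse", open("/proc/mounts").read()) would check the default hostPathForMount location.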
Advanced Configuration
Enabling Append and Random Writes
To enable append and random write operations, set the following property in your Alluxio configuration (alluxio-site.properties or via the Helm chart values):
alluxio.user.fuse.random.access.file.stream.enabled=true
This allows applications to modify existing files, which is useful for workloads like logging or databases, but may have performance implications.
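From the application's side, these operations are ordinary file calls. The sketch below shows what an append and an in-place (random) write look like in Python; the function names are illustrative. On a FUSE mount they are expected to succeed only when the property above is enabled, while on a local filesystem they always work.

```python
def append_line(path: str, line: str) -> None:
    """Append one line to an existing file (an append write)."""
    with open(path, "a", encoding="utf-8", newline="\n") as f:
        f.write(line + "\n")


def overwrite_at(path: str, offset: int, data: bytes) -> None:
    """Overwrite bytes in the middle of an existing file (a random write)."""
    with open(path, "r+b") as f:
        f.seek(offset)
        f.write(data)
```

A logging workload mostly resembles append_line, while a database-style workload resembles overwrite_at.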
Isolating Data Access
By default, the FUSE mount provides access to the entire Alluxio namespace. For multi-tenant environments, you may want to restrict a user's access to a specific subdirectory.
Using subPath (CSI only)
You can mount a specific subdirectory within the Alluxio namespace into your pod using the subPath field. This is the simplest method for data isolation.
# ... pod spec ...
volumeMounts:
  - mountPath: /data
    name: alluxio-pvc
    mountPropagation: HostToContainer
    subPath: s3/path/to/files
# ...
In this example, the /data directory inside the container maps directly to /s3/path/to/files in Alluxio.
Caution: Using subPath with the DaemonSet method is not recommended, as it breaks the auto-recovery mechanism.
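The mapping between a path seen inside the container and the corresponding Alluxio path is a simple prefix substitution. The following sketch makes that explicit; the function name is hypothetical, and its defaults mirror the mountPath and subPath values from the example above.

```python
import posixpath


def to_alluxio_path(container_path: str, mount_path: str = "/data",
                    sub_path: str = "s3/path/to/files") -> str:
    """Translate a path under mount_path into the Alluxio path it maps to via sub_path."""
    container_path = posixpath.normpath(container_path)
    mount_path = posixpath.normpath(mount_path)
    prefix = mount_path.rstrip("/") + "/"
    if container_path != mount_path and not container_path.startswith(prefix):
        raise ValueError(f"{container_path} is not under {mount_path}")
    relative = posixpath.relpath(container_path, mount_path)
    alluxio_root = posixpath.join("/", sub_path)
    # relpath returns "." when the container path is the mount point itself.
    return alluxio_root if relative == "." else posixpath.join(alluxio_root, relative)
```

For instance, /data/train/part-0.csv inside the container corresponds to /s3/path/to/files/train/part-0.csv in Alluxio.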
Using Separate PVCs (CSI only)
For more robust isolation where you cannot control the user's pod spec, you can create a dedicated StorageClass and PersistentVolumeClaim that are pre-bound to a specific Alluxio path.
Create a custom StorageClass and PVC: Save the following to a file named custom-sc-pvc.yaml:
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: default-alluxio-s3
  namespace: alx-ns
parameters:
  alluxioClusterName: alluxio-cluster
  alluxioClusterNamespace: alx-ns
  mountPath: /s3/path/to/files
provisioner: alluxio
volumeBindingMode: WaitForFirstConsumer
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: alluxio-csi-s3
  namespace: alx-ns
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 1Mi
  storageClassName: default-alluxio-s3
Now, apply the configuration:
kubectl apply -f custom-sc-pvc.yaml
Mount the new PVC: The user can now mount the alluxio-csi-s3 PVC, and their access will be automatically scoped to /s3/path/to/files.
Accessing FUSE from Another Namespace
If your application runs in a different namespace from the Alluxio cluster, you must create a corresponding PVC in your application's namespace.
Create the PVC in your namespace: The storageClassName must point to the FUSE StorageClass created by the operator in the Alluxio namespace (e.g., alx-ns-alluxio-cluster-fuse). Save the following to a file named csi-pvc.yaml:
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: alluxio-fuse
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 1Mi
  storageClassName: alx-ns-alluxio-cluster-fuse
Apply the PVC:
kubectl create -f csi-pvc.yaml -n <my-namespace>
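If many application namespaces need such a claim, the manifest can be generated instead of copied by hand. The sketch below builds the same PVC as a Python dictionary and serializes it as JSON, which kubectl accepts in place of YAML. The function name is hypothetical, and the StorageClass name must be the one the operator actually created in your cluster.

```python
import json


def fuse_pvc_manifest(pvc_name: str, storage_class_name: str, storage: str = "1Mi") -> dict:
    """Build a PersistentVolumeClaim manifest that points at an existing FUSE StorageClass."""
    return {
        "apiVersion": "v1",
        "kind": "PersistentVolumeClaim",
        "metadata": {"name": pvc_name},
        "spec": {
            "accessModes": ["ReadWriteOnce"],
            "resources": {"requests": {"storage": storage}},
            "storageClassName": storage_class_name,
        },
    }


def to_json(manifest: dict) -> str:
    """Serialize a manifest so it can be saved to a .json file and passed to kubectl -f."""
    return json.dumps(manifest, indent=2)
```

Saving to_json(fuse_pvc_manifest("alluxio-fuse", "alx-ns-alluxio-cluster-fuse")) to a file and creating it with the -n flag has the same effect as the YAML version above.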
Customizing FUSE Mount Options
You can tune FUSE performance by providing mount options in the AlluxioCluster YAML. These options are passed directly to the underlying FUSE driver. For a full list, see the FUSE documentation.
Example Configuration:
fuse:
  mountOptions:
    - allow_other
    - kernel_cache
    - entry_timeout=60
    - attr_timeout=60
    - max_idle_threads=128
    - max_background=128
Commonly Tuned Options:
kernel_cache (suggested: enable): Allows the kernel to cache file data, which can significantly improve read performance. Only use this if the underlying files are not modified externally (i.e., outside of Alluxio).
auto_cache (suggested: enable for bare-metal): Similar to kernel_cache, but the cache is invalidated if the file's modification time or size changes.
attr_timeout=N (default: 1.0, suggested: 60): The number of seconds for which file and directory attributes (like permissions and size) are cached by the kernel. Increasing this reduces metadata overhead.
entry_timeout=N (default: 1.0, suggested: 60): The number of seconds for which filename lookups are cached. Increasing this speeds up path resolutions.
max_background=N (default: 12, suggested: 128): The maximum number of outstanding background requests that the FUSE kernel driver is allowed to submit.
max_idle_threads=N (default: 10, suggested: 128): The maximum number of idle FUSE daemon threads. Increasing this can prevent performance overhead from frequent thread creation/destruction under heavy load.
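When reviewing a configuration, it can help to see which numeric options deviate from their defaults. The sketch below parses a mountOptions list and reports those differences. It is illustrative only: the function names are hypothetical, and the defaults table assumes that the first number listed for each option above is its default.

```python
# Assumption: defaults taken from the option list above.
LISTED_DEFAULTS = {
    "attr_timeout": 1.0,
    "entry_timeout": 1.0,
    "max_background": 12,
    "max_idle_threads": 10,
}


def parse_mount_options(options: list[str]) -> dict:
    """Turn ['allow_other', 'attr_timeout=60'] into {'allow_other': True, 'attr_timeout': '60'}."""
    parsed = {}
    for option in options:
        name, sep, value = option.partition("=")
        parsed[name] = value if sep else True
    return parsed


def overrides(options: list[str]) -> dict:
    """Return {name: (default, configured)} for numeric options that differ from the listed default."""
    changed = {}
    for name, value in parse_mount_options(options).items():
        if name in LISTED_DEFAULTS and float(value) != float(LISTED_DEFAULTS[name]):
            changed[name] = (LISTED_DEFAULTS[name], float(value))
    return changed
```

Applied to the example configuration, overrides reports all four numeric options, since each is raised well above its default.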
Customizing Resource Limits
You can adjust the CPU and memory resources allocated to the FUSE pods and their JVMs. See Advanced Cluster Configuration for more details.
apiVersion: k8s-operator.alluxio.com/v1
kind: AlluxioCluster
spec:
  fuse:
    resources:
      limits:
        cpu: "12"
        memory: "36Gi"
      requests:
        cpu: "1"
        memory: "32Gi"
    jvmOptions:
      - "-Xmx22g"
      - "-Xms22g"
      - "-XX:MaxDirectMemorySize=10g"
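When changing these values, it is worth checking that the JVM's heap plus direct memory still fits under the container memory limit. The sketch below does that arithmetic for the example above; the helper names are hypothetical, and the Kubernetes parser handles only the binary suffixes Ki, Mi, and Gi.

```python
JVM_UNITS = {"k": 2**10, "m": 2**20, "g": 2**30}


def jvm_size_bytes(value: str) -> int:
    """Parse a JVM size such as '22g'; JVM size suffixes are binary (g = 2**30 bytes)."""
    suffix = value[-1].lower()
    if suffix in JVM_UNITS:
        return int(value[:-1]) * JVM_UNITS[suffix]
    return int(value)


def k8s_binary_bytes(value: str) -> int:
    """Parse a Kubernetes quantity with a binary suffix (Ki, Mi, Gi) into bytes."""
    for suffix, factor in (("Ki", 2**10), ("Mi", 2**20), ("Gi", 2**30)):
        if value.endswith(suffix):
            return int(value[: -len(suffix)]) * factor
    raise ValueError(f"unsupported quantity: {value}")


def jvm_headroom_bytes(jvm_options: list[str], memory_limit: str) -> int:
    """Return the memory limit minus (max heap + max direct memory), in bytes."""
    heap = direct = 0
    for option in jvm_options:
        if option.startswith("-Xmx"):
            heap = jvm_size_bytes(option[len("-Xmx"):])
        elif option.startswith("-XX:MaxDirectMemorySize="):
            direct = jvm_size_bytes(option.split("=", 1)[1])
    return k8s_binary_bytes(memory_limit) - (heap + direct)
```

For the example, 22 GiB of heap plus 10 GiB of direct memory leaves 4 GiB under the 36Gi limit. That headroom matters because the JVM also uses memory outside the heap and direct buffers, such as metaspace and thread stacks.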
Functionality and Limitations
While most standard filesystem operations are supported, Alluxio FUSE does not provide full POSIX compliance due to the nature of a distributed, write-once system.
Metadata Write
Supported: Create file, delete file, create directory, delete directory, rename, change owner, change group, change mode
Not supported: Symlink, link, change access/modification time (utimens), change special file attributes (chattr), sticky bit
Metadata Read
Supported: Get file status, get directory status, list directory status
Data Write
Supported: Sequential write, append write, random write, overwrite, truncate
Not supported: Concurrent writes to the same file by multiple threads/clients
Data Read
Supported: Sequential read, random read, multiple threads/clients concurrently reading the same file
Combinations
Not supported: FIFO special file type