POSIX API
The Alluxio POSIX API is a feature that allows mounting an Alluxio File System as a standard file system on most flavors of Unix. By using this feature, standard tools (for example, ls, cat or mkdir) have basic access to the Alluxio namespace. More importantly, with the POSIX API integration, applications can interact with Alluxio regardless of the language they are written in (C, C++, Python, Ruby, Perl, or Java), without any Alluxio library integration.
Note that Alluxio-FUSE is different from projects such as s3fs or mountableHdfs, which mount a specific storage service like S3 or HDFS to the local file system. The Alluxio POSIX API is a generic solution for the many storage systems supported by Alluxio, and Alluxio's data orchestration and caching features speed up I/O access to frequently used data.

The Alluxio POSIX API is based on the Filesystem in Userspace (FUSE) project. Most basic file system operations are supported. However, given the intrinsic characteristics of Alluxio, like its write-once/read-many-times file data model, the mounted file system does not have full POSIX semantics and has some limitations. See the Assumptions and Limitations section for details.
Choose POSIX API Implementation
The Alluxio POSIX API has two implementations:
Alluxio JNR-Fuse: Alluxio's first-generation FUSE implementation, which uses JNR-Fuse for FUSE on Java. JNR-Fuse targets low-concurrency scenarios and has some known performance limitations.
Alluxio JNI-Fuse: Alluxio's default in-house implementation based on JNI (Java Native Interface), which targets more performance-sensitive applications (like model training workloads). It was initiated by researchers from Nanjing University and engineers from Alibaba Inc.
Use the following guidelines to choose between JNR-Fuse and JNI-Fuse:
Workloads: If your data access is highly concurrent (e.g., deep learning training), JNI-Fuse is better and more stable.
Maintenance: JNI-Fuse is under active development (check out our developer meeting notes). The Alluxio community will focus on developing JNI-Fuse and will eventually deprecate Alluxio JNR-Fuse.
JNI-Fuse is enabled by default for better performance. If JNR-Fuse is needed for legacy reasons, set alluxio.fuse.jnifuse.enabled to false in ${ALLUXIO_HOME}/conf/alluxio-site.properties:
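A minimal sketch of that change as a shell step, assuming `ALLUXIO_HOME` points at the Alluxio installation (the script falls back to the current directory if it is unset). The `grep` guard keeps the property from being appended twice.

```shell
# Assumption: ALLUXIO_HOME points at the Alluxio installation; if it is unset,
# the current directory is used, as when running from ${ALLUXIO_HOME}.
ALLUXIO_HOME="${ALLUXIO_HOME:-$PWD}"
SITE_PROPS="${ALLUXIO_HOME}/conf/alluxio-site.properties"
mkdir -p "${ALLUXIO_HOME}/conf"

# Switch back to the legacy JNR-Fuse implementation (append only once).
PROP='alluxio.fuse.jnifuse.enabled=false'
grep -qxF "$PROP" "$SITE_PROPS" 2>/dev/null || echo "$PROP" >> "$SITE_PROPS"
```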
Choose Deployment Mode
There are two approaches to deploy Alluxio POSIX integration:
Serving POSIX API by a standalone FUSE process: Alluxio POSIX integration can be launched as a standalone process, independent of any running Alluxio cluster. Each process is essentially a long-running Alluxio client that serves a file system mount point mapping an Alluxio path to a local path. This approach is flexible: users can enable or disable POSIX integration on a host regardless of whether Alluxio servers are running locally. However, the FUSE process needs to communicate with the Alluxio service over the network.
Enabling FUSE on worker: Alluxio POSIX integration can also be provided by a running Alluxio worker process. This integration provides better performance because the FUSE service can communicate with the Alluxio worker without invoking RPCs, which improves read/write throughput on local cache hits.
Use the following guidelines to choose between them:
Workloads: If your workload is expected to have a good local cache hit ratio and performs many reads/writes of small files, FUSE embedded in the worker process can achieve higher performance with less resource overhead.
Deployment: If you want to enable multiple local mount points on a single host, choose the standalone process. Otherwise, running FUSE on the worker saves deploying one extra process.
Requirements
The following are the basic requirements for running the Alluxio FUSE integration to support the POSIX API in a standalone way. Installing Alluxio using Docker and Kubernetes can further simplify the setup.
Install JDK 1.8 or newer
Basic Setup
The basic setup deploys the standalone process. After reading the basic setup section, check out the FUSE on worker setup if it suits your needs.
Mount Alluxio as a FUSE mount point
After configuring and starting an Alluxio cluster, run the following command on the node where you want to create the mount point:
This spawns a background user-space Java process (alluxio-fuse) that mounts the Alluxio path specified at <alluxio_path> to the local file system on the specified <mount_point>.
For example, running the following commands from the ${ALLUXIO_HOME} directory will mount the Alluxio path /people to the folder /mnt/people on the local file system.
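The following sketch shows the general form and the /people example. It assumes a running cluster, that the commands run from ${ALLUXIO_HOME}, that the Alluxio directory /people may still need to be created with `bin/alluxio fs mkdir`, and that `sudo` is available to create the local mount point; adjust ownership and permissions to your environment.

```shell
# General form: integration/fuse/bin/alluxio-fuse mount <mount_point> [<alluxio_path>]
cd "${ALLUXIO_HOME}"

# Create the Alluxio directory to expose (skip if it already exists).
bin/alluxio fs mkdir /people

# The local mount point must exist, be empty, and be owned by the mounting user.
sudo mkdir -p /mnt/people
sudo chown "$(whoami)" /mnt/people
chmod 755 /mnt/people

# Mount Alluxio /people at the local folder /mnt/people.
integration/fuse/bin/alluxio-fuse mount /mnt/people /people
```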
When <alluxio_path> is not given, the value defaults to the root (/). Note that the <mount_point> must be an existing and empty path in your local file system hierarchy, and that the user that runs the integration/fuse/bin/alluxio-fuse script must own the mount point and have read and write permissions on it. Multiple Alluxio FUSE mount points can be created on the same node. All AlluxioFuse processes share the same log output at ${ALLUXIO_HOME}/logs/fuse.log, which is useful for troubleshooting errors in operations on the mounted file system.
Unmount Alluxio from FUSE
To unmount a previously mounted Alluxio-FUSE file system, run the following on the node where the file system is mounted:
This unmounts the file system at the mount point and stops the corresponding Alluxio-FUSE process. For example,
By default, the unmount operation waits up to 120 seconds for any in-progress read/write operations to finish. If read/write operations haven't finished after 120 seconds, the FUSE process is forcibly killed, which may cause those read/write operations to fail. Add -s to keep the FUSE process from being killed when in-progress read/write operations remain after the timeout. For example:
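Both unmount forms, sketched for the /mnt/people mount point used in the mount example (the path is only an example) and run from ${ALLUXIO_HOME}:

```shell
# General form: integration/fuse/bin/alluxio-fuse unmount [-s] <mount_point>
# Unmount and stop the FUSE process; it is killed if I/O is still running after 120 seconds.
integration/fuse/bin/alluxio-fuse unmount /mnt/people

# With -s, the FUSE process is not killed if in-progress reads/writes remain after the timeout.
integration/fuse/bin/alluxio-fuse unmount -s /mnt/people
```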
Check the Alluxio POSIX API mounting status
To list the mount points, run the following on the node where the file system is mounted:
This outputs the pid, mount_point, alluxio_path of all the running Alluxio-FUSE processes.
For example, the output could be:
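The listing command itself is short; per the description above, it prints one row per running Alluxio-FUSE process with pid, mount_point and alluxio_path columns.

```shell
# Run from ${ALLUXIO_HOME} on the node that hosts the mount points.
integration/fuse/bin/alluxio-fuse stat
```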
Advanced Setup
Fuse on worker process
Unlike standalone Fuse, which you can mount at any time without involving an Alluxio worker, the embedded Fuse has exactly the same life cycle as the worker process it is embedded in. When the worker starts, Fuse is mounted based on the worker configuration. When the worker stops, the embedded Fuse is unmounted automatically. To modify the Fuse mount, change the configuration and restart the worker process.
Enable FUSE on worker by setting alluxio.worker.fuse.enabled to true in the ${ALLUXIO_HOME}/conf/alluxio-site.properties:
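A minimal sketch of the same setting as a shell step, assuming `ALLUXIO_HOME` points at the installation (falling back to the current directory). Because the embedded FUSE follows the worker's life cycle, the worker has to be restarted for the change to take effect.

```shell
# Assumption: ALLUXIO_HOME points at the Alluxio installation (defaults to the current directory).
ALLUXIO_HOME="${ALLUXIO_HOME:-$PWD}"
SITE_PROPS="${ALLUXIO_HOME}/conf/alluxio-site.properties"
mkdir -p "${ALLUXIO_HOME}/conf"

# Embed FUSE in the worker process; takes effect on the next worker start.
PROP='alluxio.worker.fuse.enabled=true'
grep -qxF "$PROP" "$SITE_PROPS" 2>/dev/null || echo "$PROP" >> "$SITE_PROPS"
```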
By default, Fuse on worker will mount the Alluxio root path / to default local mount point /mnt/alluxio-fuse with no extra mount options. You can change the alluxio path, mount point, and mount options through Alluxio configuration:
For example, one can mount Alluxio path /people to local path /mnt/people with kernel_cache,entry_timeout=7200,attr_timeout=7200 mount options when starting the Alluxio worker process:
Configure Alluxio fuse options
These are the configuration parameters for Alluxio POSIX API.
alluxio.fuse.cached.paths.max (default: 500): Defines the size of the internal Alluxio-FUSE cache that maintains the most frequently used translations between local file system paths and Alluxio file URIs.
alluxio.fuse.debug.enabled (default: false): Enables FUSE debug output. This output is redirected to a `fuse.out` log file inside `alluxio.logs.dir`.
alluxio.fuse.fs.name (default: alluxio-fuse): Descriptive name used by FUSE to mount the file system.
alluxio.fuse.jnifuse.enabled (default: true): Uses the JNI-Fuse library for better performance. If disabled, JNR-Fuse is used.
alluxio.fuse.shared.caching.reader.enabled (default: false): (Experimental) Uses a shared gRPC data reader for better performance on multi-process file reading through Alluxio JNI-Fuse. Block data is cached on the client side, so the Fuse process requires more memory.
alluxio.fuse.logging.threshold (default: 10s): Logs a FUSE API call when it takes more time than the threshold.
alluxio.fuse.maxwrite.bytes (default: 131072): The desired granularity of FUSE write upcalls in bytes. Note that 128K is currently an upper bound imposed by the Linux kernel.
alluxio.fuse.user.group.translation.enabled (default: false): Whether to translate Alluxio users and groups into Unix users and groups when exposing Alluxio files through the FUSE API. When this property is set to false, the user and group for all FUSE files match the user who started the alluxio-fuse process.
Configure mount point options
You can use -o [mount options] to set mount options. If you want to set multiple mount options, you can pass in comma separated mount options as the value of -o. The -o [mount options] must follow the mount command.
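For instance, several options can be combined in one comma-separated `-o` value placed right after `mount`. The options shown are the ones used in the FUSE on worker example; the paths are illustrative.

```shell
# -o must come right after "mount"; multiple options are comma-separated with no spaces.
integration/fuse/bin/alluxio-fuse mount -o kernel_cache,entry_timeout=7200,attr_timeout=7200 /mnt/people /people
```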
Different versions of libfuse and osxfuse may support different mount options. The available Linux mount options are listed here. The mount options of MacOS with osxfuse are listed here. Some mount options (e.g. allow_other and allow_root) need additional setup, and the setup process may differ depending on the platform.
Tuning mount options
direct_io (default: set by default in JNR-Fuse; suggestion: do not set in JNI-Fuse): When `direct_io` is enabled, the kernel does not cache data or perform read-ahead. `direct_io` is enabled by default in JNR-Fuse, but it is recommended not to set it in JNI-Fuse because it may cause stability issues under high I/O load.
kernel_cache (suggestion: cannot be set in JNR-Fuse; recommended in JNI-Fuse depending on the workload): `kernel_cache` utilizes kernel system caching and improves read performance. It should only be enabled on file systems where the file data is never changed externally (that is, not through the mounted FUSE file system).
auto_cache (suggestion: an alternative to `kernel_cache`; cannot be set in JNR-Fuse): `auto_cache` utilizes kernel system caching and improves read performance. Instead of unconditionally keeping cached data, the cached data is invalidated if the modification time or the size of the file has changed since it was last opened. See [libfuse documentation](https://libfuse.github.io/doxygen/structfuse__config.html#a9db154b1f75284dd4fccc0248be71f66) for more info.
attr_timeout=N (default: 1.0; suggestion: 7200): The timeout in seconds for which file/directory attributes are cached. The default is 1 second. A larger value is recommended to reduce the time spent retrieving file metadata from the Alluxio master and improve performance.
big_writes (suggestion: set): Stops FUSE from splitting I/O into small chunks and speeds up writes.
entry_timeout=N (default: 1.0; suggestion: 7200): The timeout in seconds for which name lookups are cached. The default is 1 second. A larger value is recommended to reduce file metadata operations in Alluxio-Fuse and improve performance.
max_read=N (default: 131072; suggestion: use the default value): Defines the maximum size of data that can be read in a single FUSE request. libfuse itself sets no limit, but the size of read requests is limited anyway to 32 pages (which is 128 KB, i.e. 131072 bytes, on i386).
A special mount option is max_idle_threads=N, which defines the maximum number of idle FUSE daemon threads allowed. If the value is too small, FUSE may frequently create and destroy threads, which introduces extra performance overhead. Note that libfuse introduced this mount option in 3.2, while JNI-Fuse supports libfuse 2.9.x during its experimental stage. The Alluxio Docker image alluxio/alluxio-enterprise enables this property by modifying the libfuse source code.
In the Alluxio Docker image, the default value of MAX_IDLE_THREADS is 64. To use another value in your container, set it via an environment variable at container start time:
Example: `allow_other` and `allow_root`
By default, Alluxio-FUSE mount point can only be accessed by the user mounting the Alluxio namespace to the local filesystem.
For Linux, add the following line to the file /etc/fuse.conf to allow other users or root to access the mounted folder:
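The line in question is libfuse's `user_allow_other` setting. A sketch that appends it only if it is not already present, assuming root access through `sudo`:

```shell
# /etc/fuse.conf is owned by root, so write it through sudo tee.
grep -qx 'user_allow_other' /etc/fuse.conf 2>/dev/null \
  || echo 'user_allow_other' | sudo tee -a /etc/fuse.conf
```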
Only after this step do non-root users have permission to specify the allow_other or allow_root mount options.
For MacOS, follow the osxfuse allow_other instructions to allow other users to use the allow_other and allow_root mount options.
After setting up, pass the allow_other or allow_root mount options when mounting Alluxio-FUSE:
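Using the /people example (paths illustrative), mounting with `allow_other` could look like this; substitute `allow_root` to grant access to root in addition to the mounting user.

```shell
# Run from ${ALLUXIO_HOME}; requires user_allow_other in /etc/fuse.conf for non-root users.
integration/fuse/bin/alluxio-fuse mount -o allow_other /mnt/people /people
```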
Note that only one of allow_other or allow_root can be set.
Assumptions and Limitations
Currently, most basic file system operations are supported. However, due to Alluxio's inherent characteristics, please be aware that:
Files can be written only once, only sequentially, and never modified. That means overwriting a file is not allowed; an explicit delete followed by a create is needed. For example, the cp command fails when the destination file exists. vi and vim only succeed in modifying files if the underlying operating system first deletes the original file and then creates a new file with the modified content.
Alluxio does not have hard links or soft links, so commands like ln are not supported. The hard link count is not displayed in ll output.
The user and group are mapped to Unix users and groups only when the Alluxio POSIX API is configured to use the shell user group translation service, by setting alluxio.fuse.user.group.translation.enabled to true. Otherwise chown and chgrp are no-ops, and ll returns the user and group of the user who started the Alluxio-FUSE process. The translation service does not change the actual file permissions when running ll.
Performance Optimization
Because of the additional FUSE layer, the performance of the mounted file system is expected to be lower than using the Alluxio Java client directly.
Most of the overhead comes from the several memory copies made for each read or write call. FUSE caps the maximum granularity of writes at 128KB. This could probably be improved to a large extent by leveraging the FUSE cache write-back feature introduced in Linux kernel 3.15 (supported by libfuse 3.x but not yet supported in jnr-fuse/jni-fuse).
The following client options are useful when running deep learning workloads against Alluxio JNI-Fuse based on our experience.
If you find other options useful, please share them with us via the Alluxio community Slack channel. Note that these changes should be made before the mounting steps.
Enable Metadata Caching
The Alluxio Fuse process can cache file metadata locally to reduce the overhead of repeatedly requesting metadata of the same file from the Alluxio master. Enable it when the workload repeatedly gets information about numerous files.
alluxio.user.metadata.cache.enabled (default: false): If enabled, metadata of paths is cached. Cached metadata is evicted when it expires after alluxio.user.metadata.cache.expiration.time or when the cache size exceeds the limit of alluxio.user.metadata.cache.max.size.
alluxio.user.metadata.cache.max.size (default: 100000): Maximum number of paths with cached metadata. Only valid if alluxio.user.metadata.cache.enabled is set to true.
alluxio.user.metadata.cache.expiration.time (default: 10min): Metadata expires and is evicted after being cached for this time period. Only valid if alluxio.user.metadata.cache.enabled is set to true.
For example, a workload that repeatedly gets information of 1 million files and runs for 50 minutes can set the following configuration:
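One possible configuration for that workload, written as a shell step that appends to alluxio-site.properties. The values are illustrative: a max size above 1 million leaves headroom for all paths, and a 2h expiration outlasts the 50-minute run so entries do not expire mid-job. As before, `ALLUXIO_HOME` is assumed to point at the installation.

```shell
# Assumption: ALLUXIO_HOME points at the Alluxio installation (defaults to the current directory).
ALLUXIO_HOME="${ALLUXIO_HOME:-$PWD}"
SITE_PROPS="${ALLUXIO_HOME}/conf/alluxio-site.properties"
mkdir -p "${ALLUXIO_HOME}/conf"

# Illustrative values for ~1 million files and a 50-minute run.
for prop in \
  'alluxio.user.metadata.cache.enabled=true' \
  'alluxio.user.metadata.cache.max.size=2000000' \
  'alluxio.user.metadata.cache.expiration.time=2h'; do
  # Append each property only if the exact line is not already present.
  grep -qxF "$prop" "$SITE_PROPS" 2>/dev/null || echo "$prop" >> "$SITE_PROPS"
done
```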
The metadata of 1 million files usually takes between 25MB and 100MB. Enabling the metadata cache may also introduce some overhead, but likely less than a client data cache.
Other Performance or Debugging Tips
The following client options may affect training performance or provide more information about training.
alluxio.user.metrics.collection.enabled (default: false): Enables the collection of FUSE client-side metrics, like short-circuit read/write information, to show on the Alluxio Web UI.
alluxio.user.logging.threshold (default: 10s): Logs a client RPC when it takes more time than the threshold.
alluxio.user.unsafe.direct.local.io.enabled (default: false): (Experimental) If enabled, clients read from the local worker directly without invoking extra RPCs to the worker to request block locations. Note that this optimization is only safe when the workload is read-only and the worker has only one tier with one storage directory in that tier.
alluxio.user.update.file.accesstime.disabled (default: false): (Experimental) By default, an RPC is issued to the Alluxio master to update the file access time whenever a user accesses a file. If this is enabled, the client doesn't update the file access time, which may improve file access performance but can cause issues for some applications.
alluxio.user.block.worker.client.pool.max (default: 1024): Limits the number of block worker clients Alluxio JNI-Fuse uses to read data from remote workers or validate block locations. Some deep learning training jobs don't release block worker clients immediately and may get stuck waiting for an available one.
alluxio.user.block.master.client.pool.size.max (default: 1024): Limits the number of block master clients Alluxio JNI-Fuse uses to get block information.
alluxio.user.file.master.client.pool.size.max (default: 1024): Limits the number of file master clients Alluxio JNI-Fuse uses to get or update file metadata.
Increase Direct Memory Size
If the FUSE process runs out of direct memory, add the following JVM option to ${ALLUXIO_HOME}/conf/alluxio-env.sh to increase the maximum amount of direct memory:
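A sketch of that addition, assuming the standalone FUSE process reads its JVM options from `ALLUXIO_FUSE_JAVA_OPTS` in alluxio-env.sh; for FUSE on worker the option would go into `ALLUXIO_WORKER_JAVA_OPTS` instead. The 8G value is only an example; size it to the host's memory.

```shell
# Assumption: ALLUXIO_HOME points at the Alluxio installation (defaults to the current directory).
ALLUXIO_HOME="${ALLUXIO_HOME:-$PWD}"
ENV_SH="${ALLUXIO_HOME}/conf/alluxio-env.sh"
mkdir -p "${ALLUXIO_HOME}/conf"

# Single quotes keep the line literal; the += is evaluated later, when the
# Alluxio scripts source alluxio-env.sh.
OPT_LINE='ALLUXIO_FUSE_JAVA_OPTS+=" -XX:MaxDirectMemorySize=8G"'
grep -qxF "$OPT_LINE" "$ENV_SH" 2>/dev/null || echo "$OPT_LINE" >> "$ENV_SH"
```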
Troubleshooting
This section describes how to troubleshoot issues related to the Alluxio POSIX API. Note that errors or problems in the Alluxio POSIX API may come from the underlying Alluxio system. For general troubleshooting guidelines, please refer to the troubleshooting documentation.
Input/output error
Unlike the Alluxio CLI, which may show more detailed error messages, user operations via the Alluxio Fuse mount point only receive an error code on failure, together with the predefined FUSE error message for that code. For example, once an error happens, it is common to see:
In this case, check Alluxio Fuse logs for the actual error message. The logs are in logs/fuse.log (deployed via standalone fuse process) or logs/worker.log (deployed via fuse in worker process).
Check FUSE operations in Debug Log
Each I/O operation by users can be translated into a sequence of Fuse operations. Sometimes the error comes from unexpected Fuse operation combinations. In this case, enabling debug logging in FUSE operations helps understand the sequence and shows time elapsed of each Fuse operation.
For example, a typical flow to write a file seen by FUSE is an initial Fuse.create which creates a file, followed by a sequence of Fuse.write to write data to that file, and lastly a Fuse.release to close file to commit a file written to Alluxio file system.
To understand this sequence as seen and executed by FUSE, modify ${ALLUXIO_HOME}/conf/log4j.properties to customize logging levels and restart the corresponding processes. For example, set the log level of alluxio.fuse.AlluxioJniFuseFileSystem to DEBUG.
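In the log4j 1.x properties syntax used by `log4j.properties`, a per-class level is set with a `log4j.logger.<class>=<LEVEL>` line. A sketch that appends it, again assuming `ALLUXIO_HOME` points at the installation (falling back to the current directory):

```shell
# Assumption: ALLUXIO_HOME points at the Alluxio installation (defaults to the current directory).
ALLUXIO_HOME="${ALLUXIO_HOME:-$PWD}"
LOG4J="${ALLUXIO_HOME}/conf/log4j.properties"
mkdir -p "${ALLUXIO_HOME}/conf"

# Log every JNI-Fuse operation at DEBUG level; restart the FUSE (or worker) process afterwards.
LINE='log4j.logger.alluxio.fuse.AlluxioJniFuseFileSystem=DEBUG'
grep -qxF "$LINE" "$LOG4J" 2>/dev/null || echo "$LINE" >> "$LOG4J"
```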
Then you will see the detailed Fuse operation sequence shown in debug logs.
If Fuse is deployed in the worker process, one can modify server logging at runtime. For example, you can update the log level of all classes in alluxio.fuse package in all workers to DEBUG with the following command:
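A sketch using the `bin/alluxio logLevel` command, assuming it is run from ${ALLUXIO_HOME} against a running cluster; `--target=workers` applies the change to all workers, and running it again with a level such as INFO restores quieter logging.

```shell
# Raise every logger under the alluxio.fuse package to DEBUG on all workers, without a restart.
bin/alluxio logLevel --logName=alluxio.fuse --target=workers --level=DEBUG
```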
For more information about logging, please check out this page.
Fuse metrics
To monitor Fuse-related metrics for the standalone Fuse process, set alluxio.fuse.web.enabled to true in ${ALLUXIO_HOME}/conf/alluxio-site.properties before launching the standalone Fuse process. Check out the Fuse metrics doc for how to get Fuse metrics for both the standalone Fuse process and the Fuse on worker process, and what each metric is used for.
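A minimal sketch of that setting, assuming `ALLUXIO_HOME` points at the installation (falling back to the current directory), applied before the standalone Fuse process is launched:

```shell
# Assumption: ALLUXIO_HOME points at the Alluxio installation (defaults to the current directory).
ALLUXIO_HOME="${ALLUXIO_HOME:-$PWD}"
SITE_PROPS="${ALLUXIO_HOME}/conf/alluxio-site.properties"
mkdir -p "${ALLUXIO_HOME}/conf"

# Expose Fuse metrics from the standalone Fuse process.
PROP='alluxio.fuse.web.enabled=true'
grep -qxF "$PROP" "$SITE_PROPS" 2>/dev/null || echo "$PROP" >> "$SITE_PROPS"
```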
Performance Tuning
The following diagram shows the stack when using Alluxio POSIX API:

Essentially, the Alluxio POSIX API is implemented as a FUSE integration, which is simply a long-running Alluxio client. In this stack, performance overhead can be introduced by one or more of the following components:
Application
Fuse library
Alluxio related components
Application Level
It is very helpful to understand the following questions with respect to how the applications interact with Alluxio POSIX API:
How are the applications accessing the Alluxio POSIX API? Is the workload mostly reads, mostly writes, or mixed?
Is the access heavy in data or metadata?
Is the concurrency level sufficient to sustain high throughput?
Is there any lock contention?
Fuse Level
Fuse, especially the libfuse and FUSE kernel code, may also introduce performance overhead.
libfuse worker threads
The concurrency of the Alluxio POSIX API is the combined result of:
The concurrency of application operations interacting with the FUSE kernel code and libfuse
The concurrency of libfuse worker threads interacting with the Alluxio POSIX API, limited by the MAX_IDLE_THREADS libfuse configuration
Enlarge MAX_IDLE_THREADS to make sure it is not the performance bottleneck. One can use jstack or VisualVM to see how many libfuse threads exist and whether libfuse threads keep being created and destroyed.
Alluxio Level
Alluxio general performance tuning provides more information about how to investigate and tune the performance of Alluxio Java client and servers.
Clock time tracing
Tracing is a good method to understand which operation consumes most of the clock time.
From the Fuse.<FUSE_OPERATION_NAME> metrics documented in the Fuse metrics doc, we can see how long each operation takes and which operation(s) dominate the time spent in Alluxio. For example, if the application is metadata heavy, Fuse.getattr or Fuse.readdir may have a much longer total duration than other operations. If the application is data heavy, Fuse.read or Fuse.write may consume most of the clock time. Fuse metrics help narrow down the target of a performance investigation.
If Fuse.read consumes most of the clock time, enable the Alluxio property alluxio.user.block.read.metrics.enabled=true so that the Alluxio metric Client.BlockReadChunkRemote is recorded. This metric shows the duration statistics of reading data from remote workers via gRPC.
If the application spends a relatively long time in RPC calls, try enlarging the client pool size properties (such as those listed under Other Performance or Debugging Tips) based on the workload.
If the thread pool size is not the limitation, try enlarging the CPU/memory resources, since gRPC threads consume CPU.
One can follow the Alluxio opentelemetry doc to trace the gRPC calls. If some gRPC calls take an extremely long time and only a small amount of that time is spent on actual work, there may be too many concurrent gRPC calls or high resource contention. If a long time is spent fulfilling the gRPC requests, we can jump to the server side to see where the slowness comes from.
CPU/memory/lock tracing
Async Profiler can trace the following kinds of events:
CPU cycles
Allocations in Java Heap
Contended lock attempts, including both Java object monitors and ReentrantLocks
Install async profiler and run the following commands to get this information for the target Alluxio process:
-d defines the duration; try to cover the whole POSIX API testing duration
-e defines the profiling target
-f defines the file name to dump the profile information to
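A sketch of the three profiles, assuming async-profiler 2.x (whose launcher is `profiler.sh`) and a standalone FUSE JVM that `jps` lists as `AlluxioFuse` (use `AlluxioWorker` for FUSE on worker). The 30-second duration and output file names are examples; an `.html` file name yields a flame graph.

```shell
# Find the PID of the target Alluxio JVM.
PID=$(jps | awk '/AlluxioFuse/ {print $1}')

# CPU cycles
./profiler.sh -e cpu -d 30 -f cpu.html "$PID"
# Allocations in the Java heap
./profiler.sh -e alloc -d 30 -f alloc.html "$PID"
# Contended lock attempts
./profiler.sh -e lock -d 30 -f lock.html "$PID"
```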