FUSE Full POSIX Workspace
This feature has been experimental since AI-3.9-16.0.0.
This guide shows how to deploy the FUSE Full POSIX Workspace using FoundationDB (FDB) as the metadata backend. Unlike the basic FUSE Write Optimization (write-once, no rename), this mode supports random writes, overwrites, truncation, rename, and other standard POSIX operations through a FUSE mount.
Currently, FUSE Workspace only supports TRANSIENT path configuration. Data written through the Workspace is not automatically persisted to UFS. If using Worker PageStore, ensure data durability through application-level checkpointing or use UFS PageStore for UFS-native durability.
FDB mode uses FoundationDB as a distributed, strongly consistent metadata store, enabling multi-node access to the same dataset. Data can be stored on Worker NVMe (low latency) or UFS PageStore (high durability).
How It Relates to Other Modes
| | S3-API Write Optimization | FUSE Write Optimization | FUSE Full POSIX Workspace |
| --- | --- | --- | --- |
| Interface | S3 API (PUT, GET) | FUSE (POSIX) | FUSE (POSIX) |
| Write model | Sequential (multipart upload) | Sequential, write-once after close | Random write, overwrite, truncate |
| POSIX support | N/A | Limited (no rename, write-once) | Full |
| Metadata backend | FDB | FDB | FDB (distributed) |
| Data storage | Worker NVMe | Worker NVMe | Worker NVMe or UFS PageStore |
| FDB required | Yes | Yes | Yes |
| Multi-node access | Yes | Yes | Yes |
POSIX Compatibility
| Operation | Supported | Notes |
| --- | --- | --- |
| open / close | ✅ | |
| read / write (sequential) | ✅ | |
| read / write (random, seek) | ✅ | Log-structured; enable compaction for read performance |
| rename (file and directory) | ✅ | Atomic within the same namespace |
| truncate / ftruncate | ✅ | |
| mkdir / rmdir | ✅ | |
| unlink / rm | ✅ | Soft-delete with background cleanup |
| chmod / chown / utimes | ✅ | |
| stat / fstat | ✅ | Accurate size and timestamps |
| listdir | ✅ | Paginated |
| setxattr / getxattr | ✅ | Extended attributes |
| fsync / fdatasync | ✅ | Flushes in-memory buffer and commits metadata |
| symlink / readlink | ✅ | Symbolic links |
| mmap | ✅ | |
Before You Start
Deploy
1. Enable FUSE Workspace
Add the following to your alluxio-cluster.yaml:
Apply:
2. Set Path Configuration to TRANSIENT
Set the path configuration so that the target paths use the TRANSIENT policy mode. This tells Alluxio to keep data in the Workspace without persisting it to UFS.
Expected: Update successful!
Verify:
Expected output should show policyMode: TRANSIENT.
3. Choose Data Storage Mode
Option A: Worker PageStore (Default — Low Latency)
No additional configuration needed. Data is stored on Worker NVMe.
Write latency: Sub-ms to a few ms (local network to Worker)
Capacity: Limited by Worker disk
Durability: Transient; data is lost if the Worker fails (only TRANSIENT path configuration is currently supported)
Option B: UFS PageStore (High Durability)
Data is written directly to UFS:
Write latency: Higher (depends on UFS type and network)
Capacity: Unlimited (UFS capacity)
Durability: High (UFS-native replication, e.g., HDFS 3-replica)
Orphan cleanup: Coordinator runs periodic UFS orphan file scans
4. Tune Compaction (Optional)
Compaction is enabled by default. It merges accumulated write logs to reduce read amplification. Compaction can be triggered by write-log count or by storage space amplification (physical size vs. logical size). You can tune the thresholds:
5. Enable Metadata Cache (Recommended)
For read-heavy workloads (frequent stat, getattr, ls), enable the metadata cache layer to reduce FDB load:
The metadata cache uses a 3-second TTL. During this window, concurrent writers on other nodes may see slightly stale metadata.
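To make the staleness window concrete, here is a minimal sketch of a TTL-based cache. It is not Alluxio's implementation: the `TtlCache` class, the `load` callback (standing in for an FDB read), and the injectable `clock` are all hypothetical. Only the 3-second TTL comes from the text above.

```python
import time
from typing import Callable, Dict, List, Tuple


class TtlCache:
    """Hypothetical TTL cache; illustrates staleness, not Alluxio's code."""

    def __init__(self, load: Callable[[str], int], ttl_s: float = 3.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._load = load          # fetches fresh metadata (stands in for FDB)
        self._ttl_s = ttl_s        # 3-second TTL, as stated in the text
        self._clock = clock        # injectable so the demo needs no sleeping
        self._entries: Dict[str, Tuple[float, int]] = {}

    def get(self, path: str) -> int:
        now = self._clock()
        hit = self._entries.get(path)
        if hit is not None and now - hit[0] < self._ttl_s:
            return hit[1]          # served from cache, possibly stale
        value = self._load(path)   # expired or missing: reload from the store
        self._entries[path] = (now, value)
        return value


def demo() -> List[int]:
    store = {"/data/test/a.bin": 100}   # path -> file size in the "metastore"
    now = [0.0]                         # fake clock, advanced manually
    cache = TtlCache(store.__getitem__, clock=lambda: now[0])

    seen = [cache.get("/data/test/a.bin")]      # miss: loads 100
    store["/data/test/a.bin"] = 200             # another node grows the file
    now[0] = 1.0
    seen.append(cache.get("/data/test/a.bin"))  # within TTL: still 100
    now[0] = 3.5
    seen.append(cache.get("/data/test/a.bin"))  # TTL expired: reloads 200
    return seen


if __name__ == "__main__":
    print(demo())
```

The second lookup returns the old size because the cached entry is still inside its TTL. Applications that need a fresh view across nodes should not rely on metadata read within that window.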
6. Verify
✅ Success: Workers and FDB pods Running; config returns GENERIC_FDB_BACKED_V2.
Key Configuration
Core
Required settings to enable FUSE Workspace. These properties activate the Workspace subsystem, select the FDB-backed metadata backend, and configure the FDB connection.
| Property | Default | Description |
| --- | --- | --- |
| alluxio.write.cache.enabled | false | Workspace master switch. |
| alluxio.write.cache.dual.buffer.file.system.type | (none) | Set to GENERIC_FDB_BACKED_V2. |
| alluxio.fuse.v2.enabled | false | Enable FUSE V2 interface (required). |
| alluxio.foundationdb.cluster.file.path | ${alluxio.conf.dir}/fdb.cluster | Path to FDB cluster file. Auto-injected by Operator. |
| alluxio.user.write.cache.in.memory.write.buffer.size | 16MiB | In-memory write buffer per file. Flushed to page store when full. |
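As a quick reference, the enabling values for the core properties can be collected in one place. The sketch below only assembles the property names listed above and prints them as flat `key: "value"` lines. The helper names are hypothetical, the `"true"` values are assumed from the descriptions, and where the fragment belongs inside alluxio-cluster.yaml is not covered here.

```python
from typing import Dict


def core_workspace_properties() -> Dict[str, str]:
    """Core properties from the table above, set to their enabling values."""
    return {
        "alluxio.write.cache.enabled": "true",
        "alluxio.write.cache.dual.buffer.file.system.type": "GENERIC_FDB_BACKED_V2",
        "alluxio.fuse.v2.enabled": "true",
    }


def render_fragment(props: Dict[str, str], indent: int = 2) -> str:
    """Render properties as indented `key: "value"` lines (flat fragment only)."""
    pad = " " * indent
    return "\n".join(f'{pad}{key}: "{value}"' for key, value in props.items())


if __name__ == "__main__":
    print(render_fragment(core_workspace_properties()))
```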
Coordinator Background Tasks
The Coordinator runs periodic background tasks to maintain system health, including cleaning up expired FDB locks and scanning for orphan files in UFS. These tasks run automatically when enabled and require no manual intervention.
| Property | Default | Description |
| --- | --- | --- |
| alluxio.coordinator.write.cache.background.tasks.enabled | true | Master switch for Coordinator background tasks. |
| alluxio.coordinator.write.cache.cleanup.invalid.locks.grace.duration | 24h | Grace period before expired FDB locks are cleaned. |
| alluxio.coordinator.write.cache.check.ufs.orphan.file.period | 6h | UFS orphan file scan interval (UFS mode only). |
| alluxio.coordinator.write.cache.cleanup.ufs.orphan.file.grace.duration | 24h | Grace period before UFS orphan files are deleted (UFS mode only). |
Compaction
Write operations use a log-structured format — each write appends a new write-log entry rather than modifying data in place. Over time, this causes read amplification because reads must merge all overlapping log entries. Compaction merges these logs into consolidated data, reclaiming storage and improving read performance. Compaction can be triggered by two conditions (whichever is met first): write-log count per file, or space amplification (ratio of physical storage to logical file size).
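The read-amplification effect can be illustrated with a toy model. This is a sketch under simplifying assumptions, not the actual on-disk format: a write log is modeled as a hypothetical `(offset, payload)` tuple, a read replays every entry in write order, and compaction collapses the entries into one.

```python
from typing import List, Tuple

WriteLog = Tuple[int, bytes]  # (file offset, payload); hypothetical layout


def read_merged(logs: List[WriteLog], length: int) -> bytes:
    """Replay every log entry in write order; later writes win on overlap."""
    buf = bytearray(length)            # unwritten ranges read as zero bytes
    for offset, data in logs:
        end = min(offset + len(data), length)
        if end > offset:
            buf[offset:end] = data[: end - offset]
    return bytes(buf)


def compact(logs: List[WriteLog], length: int) -> List[WriteLog]:
    """Collapse all entries into one consolidated entry with the same content."""
    return [(0, read_merged(logs, length))]


if __name__ == "__main__":
    logs = [(0, b"aaaaaaaa"), (2, b"BB"), (4, b"cccc"), (3, b"D")]
    print(read_merged(logs, 8))            # merges four entries
    print(len(compact(logs, 8)))           # one entry after compaction
```

In this model a read touches four entries before compaction and one entry afterwards, while returning the same bytes.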
| Property | Default | Description |
| --- | --- | --- |
| alluxio.write.cache.compaction.enabled | true | Enable write-log compaction. |
| alluxio.user.write.cache.trigger.compaction.on.write.log.count | MAX_INT | Per-file write-log count threshold for compaction. |
| alluxio.user.write.cache.compaction.space.amplification.percent | 50 | Space amplification percentage threshold. Compaction is triggered when physical storage exceeds logical file size by this percentage. For example, 50 means compaction triggers at 150% of logical size. |
| alluxio.user.write.cache.compaction.space.amplification.min.file.size | 64MiB | Minimum logical file size to evaluate space amplification. Files smaller than this threshold skip the space amplification check to avoid unnecessary I/O. |
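The two trigger conditions can be written down as a small predicate. This is a sketch of the rule as described above, not the actual implementation. The function name is hypothetical, `MAX_INT` is assumed to mean the 32-bit signed maximum, and treating both thresholds as inclusive (`>=`) is an assumption.

```python
MAX_INT = 2**31 - 1           # assumed meaning of the MAX_INT default
MIB = 1024 * 1024


def should_compact(write_log_count: int, physical_bytes: int, logical_bytes: int,
                   log_count_threshold: int = MAX_INT,
                   space_amp_percent: int = 50,
                   min_file_size: int = 64 * MIB) -> bool:
    """Return True if either trigger condition from the table is met."""
    # Trigger 1: too many write-log entries for this file.
    if write_log_count >= log_count_threshold:
        return True
    # Trigger 2: space amplification, skipped for small files.
    if logical_bytes < min_file_size:
        return False
    # Integer form of: physical >= logical * (100 + percent) / 100
    return physical_bytes * 100 >= logical_bytes * (100 + space_amp_percent)


if __name__ == "__main__":
    # 100 MiB logical, 150 MiB physical: exactly 150% of logical size.
    print(should_compact(10, 150 * MIB, 100 * MIB))
```

With the defaults, the count threshold is effectively unreachable, so space amplification is the condition that fires unless the write-log count threshold is lowered.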
UFS PageStore (UFS Mode Only)
When UFS PageStore is enabled, file data is written directly to UFS (e.g., HDFS, S3, NAS) instead of being cached on Worker NVMe. This provides UFS-native durability at the cost of higher write latency.
| Property | Default | Description |
| --- | --- | --- |
| alluxio.user.write.cache.random.access.ufs.page.store.enabled | false | Use UFS as data store instead of Worker NVMe. |
| alluxio.user.write.cache.random.access.ufs.page.store.path | (none) | UFS base path for page store data. |
Metadata Cache & Optimization
FDB mode stores all metadata in FoundationDB. For read-heavy workloads with frequent stat, getattr, or ls operations, enabling the in-memory metadata cache reduces FDB read load by caching metadata locally with a short TTL. The deferred attribute update option further reduces FDB writes by batching file attribute updates into the next data flush.
| Property | Default | Description |
| --- | --- | --- |
| alluxio.write.cache.metastore.cache.enabled | false | Enable in-memory metadata cache (3s TTL). |
| alluxio.user.fuse.write.cache.defer.open.file.attr.update.enabled | false | Batch attribute updates into next data flush, reducing FDB writes. |
Monitoring
Coordinator Metrics
| Metric | Description |
| --- | --- |
| alluxio_pfs_background_tasks{type, state} | Background task status. State: RUNNING / SUCCESS / FAILED. |
| alluxio_pfs_clean_invalid_lock_count | Invalid FDB locks cleaned up. |
Client — Stream I/O Metrics
| Metric | Description |
| --- | --- |
| alluxio_pfs_stream_open_streams | Number of currently open stream instances. |
| alluxio_pfs_stream_bytes_read{type} | Total bytes read. Label type: positioned / internal. |
| alluxio_pfs_stream_read_latency_ms{type} | Read latency histogram (ms). Label type: positioned / internal. |
| alluxio_pfs_stream_bytes_written{type} | Total bytes written. Label type: position_write / append_write. |
| alluxio_pfs_stream_write_latency_ms{type} | Write latency histogram (ms). Label type: position_write / append_write. |
| alluxio_pfs_stream_truncate_count{type} | Truncate operations. Label type: shrink / grow / noop. |
| alluxio_pfs_stream_flush_count{committed} | Flush operations. Label committed: true (data persisted) / false. |
| alluxio_pfs_stream_flush_latency_ms | Flush latency histogram (ms). |
| alluxio_pfs_stream_persist_count{type} | Data persist from memory to page store. Label type: sync / async. |
| alluxio_pfs_stream_read_ahead_triggered | Number of times async read-ahead was triggered. |
| alluxio_pfs_stream_inode_not_found{recovered} | Inode not found during flush (deleted file). Label recovered: true if recovery succeeded. |
Client — Memory Management Metrics
| Metric | Description |
| --- | --- |
| alluxio_pfs_memory_total_allocated_bytes | Total direct memory currently held by all stream instances. |
| alluxio_pfs_memory_alloc_count | Total buffer allocation count. |
| alluxio_pfs_memory_alloc_bytes | Total bytes of direct memory allocated. |
| alluxio_pfs_memory_release_count | Total buffer release count. |
| alluxio_pfs_memory_alloc_timeout_count | Buffer allocation timeouts (waiting for quota). |
| alluxio_pfs_memory_force_release_count | Unreleased buffers force-freed during stream close. |
Client — Page Store Metrics
| Metric | Description |
| --- | --- |
| alluxio_pfs_page_store_write_count{backend} | Page store write operations. Label backend: worker / ufs. |
| alluxio_pfs_page_store_write_bytes{backend} | Bytes written to page store. |
| alluxio_pfs_page_store_write_latency_ms{backend} | Page store write latency histogram (ms). |
| alluxio_pfs_page_store_read_count{backend} | Page store read operations. |
| alluxio_pfs_page_store_read_bytes{backend} | Bytes read from page store. |
| alluxio_pfs_page_store_read_latency_ms{backend} | Page store read latency histogram (ms). |
| alluxio_pfs_page_store_unpin_count | Page store file unpin (cleanup) operations. |
Client — Compaction Metrics
| Metric | Description |
| --- | --- |
| alluxio_pfs_compaction_triggered_count{source} | Compaction trigger count. |
| alluxio_pfs_compaction_latency_ms | Compaction latency histogram (ms). Also reported on Coordinator. |
| alluxio_pfs_compaction_throughput | Compaction write throughput (bytes). Also reported on Worker. |
| alluxio_pfs_compaction_reload_count | Extra reload iterations after compaction. |
| alluxio_pfs_compaction_block_outcome_count{outcome} | Per-block compaction outcome. Label outcome: worker_finished / worker_fallback / local. |
| alluxio_pfs_compaction_worker_submit_count{result} | Compaction task submission to worker. Label result: success / failure. |
| alluxio_pfs_compaction_worker_status_check_count{result} | Compaction status check probes. |
| alluxio_pfs_compaction_range_result_count{result} | Per-range compaction outcome. Label result: compacted / skipped. |
| alluxio_pfs_write_log_load_count | Write-log load operations count. |
| alluxio_pfs_write_log_load_latency_ms | Write-log load latency histogram (ms). |
Client — Read-Ahead Metrics
| Metric | Description |
| --- | --- |
| alluxio_pfs_read_pattern_sequential_count | Reads classified as sequential. |
| alluxio_pfs_read_pattern_total_count | Total reads recorded by read-ahead tracker. |
| alluxio_pfs_read_ahead_decision_count{should_prefetch} | Read-ahead decisions. Label should_prefetch: true / false. |
| alluxio_pfs_read_ahead_memory_used_bytes | Memory consumed by prefetch buffers. |
| alluxio_pfs_read_ahead_submitted_count | Read-ahead tasks submitted. |
| alluxio_pfs_read_ahead_skipped_count{reason} | Read-ahead tasks skipped. Label reason: memory_limit / duplicate. |
| alluxio_pfs_read_ahead_bytes_fetched | Total bytes prefetched. |
| alluxio_pfs_read_ahead_failed_count | Read-ahead tasks failed with I/O error. |
| alluxio_pfs_read_ahead_eviction_count{type} | Buffer evictions. Label type: stale / lru. |
| alluxio_pfs_read_ahead_evicted_bytes | Bytes evicted from prefetch buffers. |
FDB Metrics
| Metric | Description |
| --- | --- |
| alluxio_pfs_foundationdb_call_latency_ms{method, success} | FDB call latency histogram (ms). |
| alluxio_pfs_fdb_iterator_batch_read_count{iterator_type} | FDB iterator batch read count. |
| alluxio_pfs_fdb_iterator_entries_scanned{iterator_type} | Key-value entries scanned by FDB iterators. |
| alluxio_pfs_fdb_iterator_batch_read_latency_ms{iterator_type} | FDB iterator batch read latency histogram (ms). |
| alluxio_pfs_fdb_iterator_errors{iterator_type, error_type} | FDB iterator errors. Label error_type: transaction_too_old / pb_parse_error. |
Troubleshooting
Write latency spikes periodically
Symptom: Write operations show periodic latency spikes every few seconds.
Cause: In-memory write buffer is full and must flush to the data store (Worker or UFS), which blocks the write call.
Fix:
For Worker mode: ensure Worker NVMe has sufficient IOPS.
For UFS mode: use a low-latency UFS (local NAS or HDFS with fast disks).
Check if compaction is running concurrently — compaction reads and writes compete with front-end I/O.
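The periodic pattern follows from how a fill-then-flush buffer behaves. The sketch below is a toy model of that cause, not Alluxio's buffer: the `WriteBuffer` class and its `flush` callback are hypothetical, and it uses a 16-byte capacity in place of the 16MiB default so the effect shows up after a few writes.

```python
from typing import Callable, List


class WriteBuffer:
    """Hypothetical per-file buffer that flushes synchronously when full."""

    def __init__(self, capacity: int, flush: Callable[[bytes], None]) -> None:
        self._capacity = capacity      # must be greater than zero
        self._flush = flush            # stands in for a page-store write
        self._buf = bytearray()

    def write(self, data: bytes) -> int:
        """Buffer data; return how many flushes this call had to wait for."""
        flushes = 0
        view = memoryview(data)
        while len(view) > 0:
            room = self._capacity - len(self._buf)
            self._buf.extend(view[:room])
            view = view[room:]
            if len(self._buf) == self._capacity:
                self._flush(bytes(self._buf))   # the caller blocks here
                self._buf.clear()
                flushes += 1
        return flushes


def demo() -> List[int]:
    flushed: List[bytes] = []
    buf = WriteBuffer(capacity=16, flush=flushed.append)  # 16 bytes, not 16MiB
    return [buf.write(b"x" * 6) for _ in range(6)]


if __name__ == "__main__":
    print(demo())
```

Most writes return without flushing, and every third write in this demo pays for a flush. With a slow data store, those are the calls that show up as latency spikes.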
Read latency increases over time
Symptom: Read operations slow down for files that are frequently overwritten.
Cause: Log-structured writes accumulate many write-log entries per block. Each read must merge all overlapping logs (read amplification).
Fix:
Ensure alluxio.write.cache.compaction.enabled is true.
Lower alluxio.user.write.cache.trigger.compaction.on.write.log.count (e.g., to 1024 for write-heavy files).
Monitor alluxio_pfs_compaction_throughput to verify compaction is keeping up.
FDB transaction conflicts
Symptom: Write operations fail intermittently with FDB transaction conflict errors.
Cause: Multiple clients writing to the same file concurrently. FDB uses MVCC and detects conflicting transactions.
Fix:
This is usually transient — the client retries automatically.
If frequent, avoid multiple processes writing the same file simultaneously.
Monitor FDB cluster health:
Orphan files accumulating (UFS mode)
Symptom: UFS storage usage grows even after files are deleted.
Fix:
Reduce alluxio.coordinator.write.cache.check.ufs.orphan.file.period and alluxio.coordinator.write.cache.cleanup.ufs.orphan.file.grace.duration if faster cleanup is acceptable.
Files younger than the grace duration are intentionally kept to avoid deleting data being actively written.
Out-of-space on Workers (Worker mode)
Symptom: Writes fail with space errors even though UFS has capacity.
Fix:
Increase alluxio.worker.page.store.pinned.file.capacity.limit.ratio (default 0.3, raise to 0.5).
Add Worker NVMe capacity or add more Workers.
FDB connection failure on startup
Symptom: FUSE pod or Worker fails to start with FDB connection errors.
Fix:
Verify FDB pods are running:
If using Operator-managed FDB, the cluster file is auto-injected. For external FDB, set alluxio.foundationdb.cluster.file.path explicitly.
Performance Tuning
| Symptom | Recommendation |
| --- | --- |
| High FDB load | Enable alluxio.write.cache.metastore.cache.enabled and reduce metadata read frequency. |
| Read latency growing | Enable compaction and lower alluxio.user.write.cache.trigger.compaction.on.write.log.count. |
| Worker disk pressure | Add more Workers or increase Worker NVMe capacity. |
| Slow stat / getattr | Enable alluxio.write.cache.metastore.cache.enabled and alluxio.user.fuse.write.cache.defer.open.file.attr.update.enabled. |
| Heavy random writes | Ensure compaction is enabled with an appropriate trigger threshold. |
Verify Full POSIX Operations
Write and Read-After-Write
✅ Success: Output shows hello posix write cache.
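A Python sketch of this check, assuming a directory under the FUSE mount is writable. `WORKSPACE_DIR` is a hypothetical environment variable and `hello.txt` is an arbitrary file name; without the variable, the script falls back to a local temp directory so it can be dry-run anywhere.

```python
import os
import tempfile


def write_then_read(base_dir: str) -> str:
    """Write a file, close it, and immediately read it back."""
    path = os.path.join(base_dir, "hello.txt")
    with open(path, "w") as f:
        f.write("hello posix write cache\n")
    with open(path) as f:
        return f.read().strip()


if __name__ == "__main__":
    # WORKSPACE_DIR is hypothetical; point it at a directory on the mount.
    base = os.environ.get("WORKSPACE_DIR") or tempfile.mkdtemp()
    print(write_then_read(base))
```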
Random Write and Overwrite
✅ Success: Output shows overwritten.
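Random writes can be exercised from Python with positioned I/O. This is a sketch, not the guide's original command: `WORKSPACE_DIR` is a hypothetical environment variable, `random.bin` is an arbitrary name, and the script falls back to a local temp directory.

```python
import os
import tempfile


def random_overwrite(base_dir: str) -> str:
    """Write filler bytes, then overwrite two ranges out of order."""
    path = os.path.join(base_dir, "random.bin")
    fd = os.open(path, os.O_RDWR | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        os.write(fd, b"x" * 11)          # initial sequential write
        os.pwrite(fd, b"written", 4)     # random write at offset 4
        os.pwrite(fd, b"over", 0)        # overwrite the first 4 bytes
        os.fsync(fd)                     # flush buffer and commit metadata
        return os.pread(fd, 11, 0).decode()
    finally:
        os.close(fd)


if __name__ == "__main__":
    # WORKSPACE_DIR is hypothetical; point it at a directory on the mount.
    base = os.environ.get("WORKSPACE_DIR") or tempfile.mkdtemp()
    print(random_overwrite(base))
```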
Rename (Atomic)
✅ Success: Output shows atomic save.
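The usual write-to-temp-then-rename pattern relies on this behavior. A Python sketch, with hypothetical file names and a hypothetical `WORKSPACE_DIR` environment variable (falling back to a local temp directory):

```python
import os
import tempfile


def atomic_save(base_dir: str) -> str:
    """Write to a temp file, fsync it, then rename it over the final name."""
    tmp = os.path.join(base_dir, "config.json.tmp")
    final = os.path.join(base_dir, "config.json")
    with open(tmp, "w") as f:
        f.write("atomic save")
        f.flush()
        os.fsync(f.fileno())
    os.replace(tmp, final)               # rename within the same directory
    with open(final) as f:
        return f.read()


if __name__ == "__main__":
    # WORKSPACE_DIR is hypothetical; point it at a directory on the mount.
    base = os.environ.get("WORKSPACE_DIR") or tempfile.mkdtemp()
    print(atomic_save(base))
```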
Truncate
✅ Success: Output shows 4.
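A Python sketch of a truncate check that reports the resulting file size. The file name and the `WORKSPACE_DIR` environment variable are hypothetical, and the script falls back to a local temp directory.

```python
import os
import tempfile


def truncate_to_four(base_dir: str) -> int:
    """Write 11 bytes, shrink the file to 4 bytes, and return its size."""
    path = os.path.join(base_dir, "truncate.txt")
    with open(path, "w") as f:
        f.write("hello posix")
    os.truncate(path, 4)
    return os.stat(path).st_size


if __name__ == "__main__":
    # WORKSPACE_DIR is hypothetical; point it at a directory on the mount.
    base = os.environ.get("WORKSPACE_DIR") or tempfile.mkdtemp()
    print(truncate_to_four(base))
```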
Directory Rename
✅ Success: Output shows file.txt.
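A Python sketch of a directory rename check: create a directory with one file, rename the directory, and list the new path. The directory names `dir_old` and `dir_new` and the `WORKSPACE_DIR` environment variable are hypothetical.

```python
import os
import tempfile
from typing import List


def rename_directory(base_dir: str) -> List[str]:
    """Rename a non-empty directory and return the listing of the new path."""
    old = os.path.join(base_dir, "dir_old")
    new = os.path.join(base_dir, "dir_new")
    os.mkdir(old)
    with open(os.path.join(old, "file.txt"), "w") as f:
        f.write("content")
    os.rename(old, new)
    return os.listdir(new)


if __name__ == "__main__":
    # WORKSPACE_DIR is hypothetical; point it at a directory on the mount.
    base = os.environ.get("WORKSPACE_DIR") or tempfile.mkdtemp()
    print("\n".join(rename_directory(base)))
```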
Symbolic Link
✅ Success: Output shows link target and /data/test/original.txt.
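A Python sketch of a symlink check that reads through the link and then resolves it. The link name and the `WORKSPACE_DIR` environment variable are hypothetical. The printed target path depends on the base directory, so it matches /data/test/original.txt only when the base is /data/test.

```python
import os
import tempfile
from typing import Tuple


def symlink_roundtrip(base_dir: str) -> Tuple[str, str]:
    """Create a symlink, read through it, and return (content, link target)."""
    target = os.path.join(base_dir, "original.txt")
    link = os.path.join(base_dir, "link.txt")
    with open(target, "w") as f:
        f.write("link target")
    os.symlink(target, link)
    with open(link) as f:
        content = f.read()
    return content, os.readlink(link)


if __name__ == "__main__":
    # WORKSPACE_DIR is hypothetical; point it at a directory on the mount.
    base = os.environ.get("WORKSPACE_DIR") or tempfile.mkdtemp()
    content, dest = symlink_roundtrip(base)
    print(content)
    print(dest)
```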
See Also
S3-API Write Optimization — Write Cache via S3 API (sequential writes, no POSIX)
FUSE Write Optimization — FUSE access to basic Write Cache (write-once, limited POSIX)
POSIX API — FUSE deployment details, mount options, and read-cache mode