FUSE Full POSIX Workspace

This guide shows how to deploy the FUSE Full POSIX Workspace using FoundationDB (FDB) as the metadata backend. Unlike the basic FUSE Write Optimization (write-once, no rename), this mode supports random writes, overwrites, truncation, rename, and other standard POSIX operations through a FUSE mount.

FUSE Workspace currently supports only the TRANSIENT path configuration: data written through the Workspace is not automatically persisted to UFS. With Worker PageStore, ensure durability through application-level checkpointing, or use UFS PageStore for UFS-native durability.

FDB mode uses FoundationDB as a distributed, strongly consistent metadata store, enabling multi-node access to the same dataset. Data can be stored on Worker NVMe (low latency) or UFS PageStore (high durability).

How It Relates to Other Modes

| | S3 API | FUSE Write Optimization | FUSE Full POSIX Workspace (this guide) |
| --- | --- | --- | --- |
| Interface | S3 API (PUT, GET) | FUSE (POSIX) | FUSE (POSIX) |
| Write model | Sequential (multipart upload) | Sequential, write-once after close | Random write, overwrite, truncate |
| POSIX support | N/A | Limited (no rename, write-once) | Full |
| Metadata backend | FDB | FDB | FDB (distributed) |
| Data storage | Worker NVMe | Worker NVMe | Worker NVMe or UFS PageStore |
| FDB required | Yes | Yes | Yes |
| Multi-node access | Yes | Yes | Yes |

POSIX Compatibility

| Operation | Supported | Notes |
| --- | --- | --- |
| open / close | | |
| read / write (sequential) | | |
| read / write (random, seek) | | Log-structured; enable compaction for read performance |
| rename (file and directory) | | Atomic within the same namespace |
| truncate / ftruncate | | |
| mkdir / rmdir | | |
| unlink / rm | | Soft-delete with background cleanup |
| chmod / chown / utimes | | |
| stat / fstat | | Accurate size and timestamps |
| listdir | | Paginated |
| setxattr / getxattr | | Extended attributes |
| fsync / fdatasync | | Flushes in-memory buffer and commits metadata |
| symlink / readlink | | Symbolic links |
| mmap | | |


Before You Start

Deploy

1. Enable FUSE Workspace

Add the following to your alluxio-cluster.yaml:

Apply:

2. Set Path Configuration to TRANSIENT

Set the path configuration so that the target paths use the TRANSIENT policy mode. This tells Alluxio to keep data in the Workspace without persisting it to UFS.

Expected: Update successful!

Verify:

Expected output should show policyMode: TRANSIENT.

3. Choose Data Storage Mode

Option A: Worker PageStore (Default — Low Latency)

No additional configuration needed. Data is stored on Worker NVMe.

| Aspect | Detail |
| --- | --- |
| Write latency | Sub-ms to a few ms (local network to Worker) |
| Capacity | Limited by Worker disk |
| Durability | Transient: data is lost if the Worker fails (only TRANSIENT path configuration is currently supported) |

Option B: UFS PageStore (High Durability)

Data is written directly to UFS:

| Aspect | Detail |
| --- | --- |
| Write latency | Higher (depends on UFS type and network) |
| Capacity | Unlimited (UFS capacity) |
| Durability | High (UFS-native replication, e.g., HDFS 3-replica) |
| Orphan cleanup | Coordinator runs periodic UFS orphan file scans |

4. Tune Compaction (Optional)

Compaction is enabled by default. It merges accumulated write logs to reduce read amplification. Compaction can be triggered by write-log count or by storage space amplification (physical size vs. logical size). You can tune the thresholds:

5. Enable Metadata Cache (Optional)

For read-heavy workloads (frequent stat, getattr, ls), enable the metadata cache layer to reduce FDB load:

The metadata cache uses a 3-second TTL. During this window, concurrent writers on other nodes may see slightly stale metadata.
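
To illustrate the staleness window described above, here is a minimal sketch of a TTL cache in Python. It is not Alluxio's implementation: the class name TtlCache, the injectable clock, and the dictionary standing in for FDB are all assumptions made for illustration.

```python
import time


class TtlCache:
    """Tiny TTL cache: entries are served from memory until they expire."""

    def __init__(self, ttl_seconds=3.0, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock          # injectable so the demo can fake time
        self._entries = {}           # key -> (value, expiry time)

    def get(self, key, load):
        """Return the cached value; call load(key) only on a miss or after expiry."""
        now = self._clock()
        hit = self._entries.get(key)
        if hit is not None and now < hit[1]:
            return hit[0]
        value = load(key)
        self._entries[key] = (value, now + self._ttl)
        return value


if __name__ == "__main__":
    backend = {"/data/a.txt": 10}    # hypothetical stand-in for file sizes in FDB
    fake_now = [0.0]
    cache = TtlCache(ttl_seconds=3.0, clock=lambda: fake_now[0])

    print(cache.get("/data/a.txt", backend.get))   # 10, loaded from the backend
    backend["/data/a.txt"] = 20                    # another node updates the file
    fake_now[0] = 1.0
    print(cache.get("/data/a.txt", backend.get))   # 10, stale but inside the TTL
    fake_now[0] = 3.5
    print(cache.get("/data/a.txt", backend.get))   # 20, reloaded after expiry
```

The trade-off is the usual one for TTL caching: fewer metadata reads against the backend in exchange for a bounded window in which a reader can observe an older value.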

6. Verify

✅ Success: Workers and FDB pods Running; config returns GENERIC_FDB_BACKED_V2.

Key Configuration

Core

Required settings to enable FUSE Workspace. These properties activate the Workspace subsystem, select the FDB-backed metadata backend, and configure the FDB connection.

| Property | Default | Description |
| --- | --- | --- |
| alluxio.write.cache.enabled | false | Workspace master switch. |
| alluxio.write.cache.dual.buffer.file.system.type | | Set to GENERIC_FDB_BACKED_V2. |
| alluxio.fuse.v2.enabled | false | Enable FUSE V2 interface (required). |
| alluxio.foundationdb.cluster.file.path | ${alluxio.conf.dir}/fdb.cluster | Path to FDB cluster file. Auto-injected by Operator. |
| alluxio.user.write.cache.in.memory.write.buffer.size | 16MiB | In-memory write buffer per file. Flushed to page store when full. |
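
As a quick sanity check before deploying, the required Core settings can be compared against a candidate property map. The sketch below is illustrative only: the helper name missing_core_settings is hypothetical, and it assumes that "enabling" a switch means setting it to true.

```python
# Required Core settings, taken from the table above.
# Assumption: the two boolean switches are enabled by setting them to "true".
REQUIRED = {
    "alluxio.write.cache.enabled": "true",
    "alluxio.write.cache.dual.buffer.file.system.type": "GENERIC_FDB_BACKED_V2",
    "alluxio.fuse.v2.enabled": "true",
}


def missing_core_settings(props):
    """Return {key: expected} for each required setting that is absent or different."""
    return {
        key: expected
        for key, expected in REQUIRED.items()
        if str(props.get(key, "")).strip().lower() != expected.lower()
    }


if __name__ == "__main__":
    candidate = {
        "alluxio.write.cache.enabled": "true",
        "alluxio.fuse.v2.enabled": "false",
    }
    for key, expected in missing_core_settings(candidate).items():
        print(f"{key} should be {expected}")
```

Run against the candidate map above, the check reports the file system type (missing) and the FUSE V2 switch (still false).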

Coordinator Background Tasks

The Coordinator runs periodic background tasks to maintain system health, including cleaning up expired FDB locks and scanning for orphan files in UFS. These tasks run automatically when enabled and require no manual intervention.

| Property | Default | Description |
| --- | --- | --- |
| alluxio.coordinator.write.cache.background.tasks.enabled | true | Master switch for Coordinator background tasks. |
| alluxio.coordinator.write.cache.cleanup.invalid.locks.grace.duration | 24h | Grace period before expired FDB locks are cleaned. |
| alluxio.coordinator.write.cache.check.ufs.orphan.file.period | 6h | UFS orphan file scan interval (UFS mode only). |
| alluxio.coordinator.write.cache.cleanup.ufs.orphan.file.grace.duration | 24h | Grace period before UFS orphan files are deleted (UFS mode only). |

Compaction

Write operations use a log-structured format — each write appends a new write-log entry rather than modifying data in place. Over time, this causes read amplification because reads must merge all overlapping log entries. Compaction merges these logs into consolidated data, reclaiming storage and improving read performance. Compaction can be triggered by two conditions (whichever is met first): write-log count per file, or space amplification (ratio of physical storage to logical file size).
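
The read-amplification effect can be seen in a toy model of a log-structured file. This is a conceptual sketch only, not Alluxio's on-disk format: the WriteLog class and its methods are hypothetical, and real entries would be tracked per block rather than per file.

```python
class WriteLog:
    """Toy log-structured file: every write appends an entry instead of editing in place."""

    def __init__(self):
        self.entries = []   # (offset, data) tuples in write order
        self.size = 0       # logical file size

    def write(self, offset, data):
        self.entries.append((offset, bytes(data)))
        self.size = max(self.size, offset + len(data))

    def read(self):
        """Replay every entry in order; later writes win where ranges overlap."""
        buf = bytearray(self.size)
        for offset, data in self.entries:
            buf[offset:offset + len(data)] = data
        return bytes(buf)

    def physical_size(self):
        """Bytes actually stored, including overwritten (dead) data."""
        return sum(len(data) for _, data in self.entries)

    def compact(self):
        """Merge all entries into one consolidated entry."""
        merged = self.read()
        self.entries = [(0, merged)] if merged else []


if __name__ == "__main__":
    log = WriteLog()
    log.write(0, b"hello world")
    log.write(6, b"POSIX")                         # overwrite part of the first entry
    print(log.read())                              # b'hello POSIX'
    print(len(log.entries), log.physical_size())   # 2 16
    log.compact()
    print(len(log.entries), log.physical_size())   # 1 11
```

Before compaction a read has to replay two entries and the store holds 16 bytes for an 11-byte file; afterwards both numbers drop, which is the effect the two triggers are designed to bound.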

| Property | Default | Description |
| --- | --- | --- |
| alluxio.write.cache.compaction.enabled | true | Enable write-log compaction. |
| alluxio.user.write.cache.trigger.compaction.on.write.log.count | MAX_INT | Per-file write-log count threshold for compaction. |
| alluxio.user.write.cache.compaction.space.amplification.percent | 50 | Space amplification percentage threshold. Compaction is triggered when physical storage exceeds logical file size by this percentage. For example, 50 means compaction triggers at 150% of logical size. |
| alluxio.user.write.cache.compaction.space.amplification.min.file.size | 64MiB | Minimum logical file size to evaluate space amplification. Files smaller than this threshold skip the space amplification check to avoid unnecessary I/O. |
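
The way these thresholds combine can be sketched as a single decision function. This is an interpretation of the descriptions above, not Alluxio's code: the function name should_compact is hypothetical, MAX_INT is assumed to be the 32-bit signed maximum, and the exact boundary behavior (strictly greater than versus greater than or equal) is an assumption.

```python
MIB = 1024 * 1024
MAX_INT = 2**31 - 1   # assumption: MAX_INT means the 32-bit signed integer maximum


def should_compact(write_log_count, physical_bytes, logical_bytes,
                   log_count_threshold=MAX_INT,
                   amplification_percent=50,
                   min_file_size=64 * MIB):
    """Return True if either trigger fires (whichever is met first)."""
    # Trigger 1: too many write-log entries for this file.
    if write_log_count >= log_count_threshold:
        return True
    # Small files skip the space amplification check entirely.
    if logical_bytes < min_file_size:
        return False
    # Trigger 2: physical storage exceeds logical size by the given percentage.
    # Integer arithmetic avoids floating-point rounding at the boundary.
    return physical_bytes * 100 > logical_bytes * (100 + amplification_percent)


if __name__ == "__main__":
    print(should_compact(10, 160 * MIB, 100 * MIB))   # True: 160% exceeds 150%
    print(should_compact(10, 140 * MIB, 100 * MIB))   # False: below 150%
    print(should_compact(10, 32 * MIB, 16 * MIB))     # False: file under 64MiB
    print(should_compact(1024, 1, 1, log_count_threshold=1024))  # True: log count
```

With the default MAX_INT log-count threshold, the space amplification check is effectively the only trigger, which is why lowering the count threshold matters for files that receive many small overwrites.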

UFS PageStore (UFS Mode Only)

When UFS PageStore is enabled, file data is written directly to UFS (e.g., HDFS, S3, NAS) instead of being cached on Worker NVMe. This provides UFS-native durability at the cost of higher write latency.

| Property | Default | Description |
| --- | --- | --- |
| alluxio.user.write.cache.random.access.ufs.page.store.enabled | false | Use UFS as data store instead of Worker NVMe. |
| alluxio.user.write.cache.random.access.ufs.page.store.path | | UFS base path for page store data. |

Metadata Cache & Optimization

FDB mode stores all metadata in FoundationDB. For read-heavy workloads with frequent stat, getattr, or ls operations, enabling the in-memory metadata cache reduces FDB read load by caching metadata locally with a short TTL. The deferred attribute update option further reduces FDB writes by batching file attribute updates into the next data flush.

| Property | Default | Description |
| --- | --- | --- |
| alluxio.write.cache.metastore.cache.enabled | false | Enable in-memory metadata cache (3s TTL). |
| alluxio.user.fuse.write.cache.defer.open.file.attr.update.enabled | false | Batch attribute updates into next data flush, reducing FDB writes. |

Monitoring

Coordinator Metrics

| Metric | Description |
| --- | --- |
| alluxio_pfs_background_tasks{type, state} | Background task status. State: RUNNING / SUCCESS / FAILED. |
| alluxio_pfs_clean_invalid_lock_count | Invalid FDB locks cleaned up. |

Client — Stream I/O Metrics

| Metric | Description |
| --- | --- |
| alluxio_pfs_stream_open_streams | Number of currently open stream instances. |
| alluxio_pfs_stream_bytes_read{type} | Total bytes read. Label type: positioned / internal. |
| alluxio_pfs_stream_read_latency_ms{type} | Read latency histogram (ms). Label type: positioned / internal. |
| alluxio_pfs_stream_bytes_written{type} | Total bytes written. Label type: position_write / append_write. |
| alluxio_pfs_stream_write_latency_ms{type} | Write latency histogram (ms). Label type: position_write / append_write. |
| alluxio_pfs_stream_truncate_count{type} | Truncate operations. Label type: shrink / grow / noop. |
| alluxio_pfs_stream_flush_count{committed} | Flush operations. Label committed: true (data persisted) / false. |
| alluxio_pfs_stream_flush_latency_ms | Flush latency histogram (ms). |
| alluxio_pfs_stream_persist_count{type} | Data persist from memory to page store. Label type: sync / async. |
| alluxio_pfs_stream_read_ahead_triggered | Number of times async read-ahead was triggered. |
| alluxio_pfs_stream_inode_not_found{recovered} | Inode not found during flush (deleted file). Label recovered: true if recovery succeeded. |

Client — Memory Management Metrics

| Metric | Description |
| --- | --- |
| alluxio_pfs_memory_total_allocated_bytes | Total direct memory currently held by all stream instances. |
| alluxio_pfs_memory_alloc_count | Total buffer allocation count. |
| alluxio_pfs_memory_alloc_bytes | Total bytes of direct memory allocated. |
| alluxio_pfs_memory_release_count | Total buffer release count. |
| alluxio_pfs_memory_alloc_timeout_count | Buffer allocation timeouts (waiting for quota). |
| alluxio_pfs_memory_force_release_count | Unreleased buffers force-freed during stream close. |

Client — Page Store Metrics

| Metric | Description |
| --- | --- |
| alluxio_pfs_page_store_write_count{backend} | Page store write operations. Label backend: worker / ufs. |
| alluxio_pfs_page_store_write_bytes{backend} | Bytes written to page store. |
| alluxio_pfs_page_store_write_latency_ms{backend} | Page store write latency histogram (ms). |
| alluxio_pfs_page_store_read_count{backend} | Page store read operations. |
| alluxio_pfs_page_store_read_bytes{backend} | Bytes read from page store. |
| alluxio_pfs_page_store_read_latency_ms{backend} | Page store read latency histogram (ms). |
| alluxio_pfs_page_store_unpin_count | Page store file unpin (cleanup) operations. |

Client — Compaction Metrics

| Metric | Description |
| --- | --- |
| alluxio_pfs_compaction_triggered_count{source} | Compaction trigger count. |
| alluxio_pfs_compaction_latency_ms | Compaction latency histogram (ms). Also reported on Coordinator. |
| alluxio_pfs_compaction_throughput | Compaction write throughput (bytes). Also reported on Worker. |
| alluxio_pfs_compaction_reload_count | Extra reload iterations after compaction. |
| alluxio_pfs_compaction_block_outcome_count{outcome} | Per-block compaction outcome. Label outcome: worker_finished / worker_fallback / local. |
| alluxio_pfs_compaction_worker_submit_count{result} | Compaction task submission to worker. Label result: success / failure. |
| alluxio_pfs_compaction_worker_status_check_count{result} | Compaction status check probes. |
| alluxio_pfs_compaction_range_result_count{result} | Per-range compaction outcome. Label result: compacted / skipped. |
| alluxio_pfs_write_log_load_count | Write-log load operations count. |
| alluxio_pfs_write_log_load_latency_ms | Write-log load latency histogram (ms). |

Client — Read-Ahead Metrics

| Metric | Description |
| --- | --- |
| alluxio_pfs_read_pattern_sequential_count | Reads classified as sequential. |
| alluxio_pfs_read_pattern_total_count | Total reads recorded by read-ahead tracker. |
| alluxio_pfs_read_ahead_decision_count{should_prefetch} | Read-ahead decisions. Label should_prefetch: true / false. |
| alluxio_pfs_read_ahead_memory_used_bytes | Memory consumed by prefetch buffers. |
| alluxio_pfs_read_ahead_submitted_count | Read-ahead tasks submitted. |
| alluxio_pfs_read_ahead_skipped_count{reason} | Read-ahead tasks skipped. Label reason: memory_limit / duplicate. |
| alluxio_pfs_read_ahead_bytes_fetched | Total bytes prefetched. |
| alluxio_pfs_read_ahead_failed_count | Read-ahead tasks failed with I/O error. |
| alluxio_pfs_read_ahead_eviction_count{type} | Buffer evictions. Label type: stale / lru. |
| alluxio_pfs_read_ahead_evicted_bytes | Bytes evicted from prefetch buffers. |

FDB Metrics

| Metric | Description |
| --- | --- |
| alluxio_pfs_foundationdb_call_latency_ms{method, success} | FDB call latency histogram (ms). |
| alluxio_pfs_fdb_iterator_batch_read_count{iterator_type} | FDB iterator batch read count. |
| alluxio_pfs_fdb_iterator_entries_scanned{iterator_type} | Key-value entries scanned by FDB iterators. |
| alluxio_pfs_fdb_iterator_batch_read_latency_ms{iterator_type} | FDB iterator batch read latency histogram (ms). |
| alluxio_pfs_fdb_iterator_errors{iterator_type, error_type} | FDB iterator errors. Label error_type: transaction_too_old / pb_parse_error. |

Troubleshooting

Write latency spikes periodically

Symptom: Write operations show periodic latency spikes every few seconds.

Cause: In-memory write buffer is full and must flush to the data store (Worker or UFS), which blocks the write call.

Fix:

  1. For Worker mode: ensure Worker NVMe has sufficient IOPS.

  2. For UFS mode: use a low-latency UFS (local NAS or HDFS with fast disks).

  3. Check if compaction is running concurrently — compaction reads and writes compete with front-end I/O.


Read latency increases over time

Symptom: Read operations slow down for files that are frequently overwritten.

Cause: Log-structured writes accumulate many write-log entries per block. Each read must merge all overlapping logs (read amplification).

Fix:

  1. Ensure alluxio.write.cache.compaction.enabled is true.

  2. Lower alluxio.user.write.cache.trigger.compaction.on.write.log.count (e.g., to 1024 for write-heavy files).

  3. Monitor alluxio_pfs_compaction_throughput to verify compaction is keeping up.


FDB transaction conflicts

Symptom: Write operations fail intermittently with FDB transaction conflict errors.

Cause: Multiple clients writing to the same file concurrently. FDB uses MVCC and detects conflicting transactions.

Fix:

  1. This is usually transient — the client retries automatically.

  2. If frequent, avoid multiple processes writing the same file simultaneously.

  3. Monitor FDB cluster health:


Orphan files accumulating (UFS mode)

Symptom: UFS storage usage grows even after files are deleted.

Fix:

  1. Reduce alluxio.coordinator.write.cache.check.ufs.orphan.file.period and alluxio.coordinator.write.cache.cleanup.ufs.orphan.file.grace.duration if faster cleanup is acceptable.

  2. Files younger than the grace duration are intentionally kept to avoid deleting data being actively written.


Out-of-space on Workers (Worker mode)

Symptom: Writes fail with space errors even though UFS has capacity.

Fix:

  1. Increase alluxio.worker.page.store.pinned.file.capacity.limit.ratio (default 0.3, raise to 0.5).

  2. Add Worker NVMe capacity or add more Workers.


FDB connection failure on startup

Symptom: FUSE pod or Worker fails to start with FDB connection errors.

Fix:

  1. Verify FDB pods are running:

  2. If using Operator-managed FDB, the cluster file is auto-injected. For external FDB, set alluxio.foundationdb.cluster.file.path explicitly.

Performance Tuning

| Scenario | Recommendation |
| --- | --- |
| High FDB load | Enable alluxio.write.cache.metastore.cache.enabled, reduce metadata read frequency. |
| Read latency growing | Enable compaction, lower alluxio.user.write.cache.trigger.compaction.on.write.log.count. |
| Worker disk pressure | Add more Workers or increase Worker NVMe capacity. |
| Slow stat / getattr | Enable alluxio.write.cache.metastore.cache.enabled + alluxio.user.fuse.write.cache.defer.open.file.attr.update.enabled. |
| Heavy random writes | Ensure compaction is enabled with an appropriate trigger threshold. |


Verify Full POSIX Operations

Write and Read-After-Write

✅ Success: Output shows hello posix write cache.
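
The same check can also be scripted. The following is a minimal Python sketch, assuming only standard library calls; the helper name write_then_read and the file name hello.txt are hypothetical, and a temporary directory stands in for the FUSE mount point, whose real path depends on your deployment.

```python
import os
import tempfile


def write_then_read(directory, text="hello posix write cache"):
    """Write a file, fsync it, then read it back through the same directory."""
    path = os.path.join(directory, "hello.txt")
    with open(path, "w") as f:
        f.write(text)
        f.flush()
        os.fsync(f.fileno())   # ask the file system to flush buffered data
    with open(path) as f:
        return f.read()


if __name__ == "__main__":
    # Replace the temporary directory with a directory under your FUSE mount.
    with tempfile.TemporaryDirectory() as mount:
        print(write_then_read(mount))   # hello posix write cache
```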

Random Write and Overwrite

✅ Success: Output shows overwritten.
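
A scripted version of a random-write check might look like the sketch below. It uses os.pwrite for positional writes issued out of order; the helper name random_overwrite and the file name random.bin are hypothetical, and a temporary directory stands in for the FUSE mount point.

```python
import os
import tempfile


def random_overwrite(directory):
    """Overwrite an existing file in place using out-of-order positional writes."""
    path = os.path.join(directory, "random.bin")
    with open(path, "wb") as f:
        f.write(b"x" * 11)               # placeholder content to be overwritten
    fd = os.open(path, os.O_RDWR)
    try:
        os.pwrite(fd, b"written", 4)     # write the tail first (bytes 4..10)
        os.pwrite(fd, b"over", 0)        # then the head (bytes 0..3)
    finally:
        os.close(fd)
    with open(path, "rb") as f:
        return f.read().decode()


if __name__ == "__main__":
    # Replace the temporary directory with a directory under your FUSE mount.
    with tempfile.TemporaryDirectory() as mount:
        print(random_overwrite(mount))   # overwritten
```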

Rename (Atomic)

✅ Success: Output shows atomic save.
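
Atomic rename is what makes the common "write to a temporary file, then rename over the target" save pattern safe. A minimal sketch of that pattern in Python follows; the helper name atomic_save and the file names are hypothetical, and a temporary directory stands in for the FUSE mount point.

```python
import os
import tempfile


def atomic_save(directory, text="atomic save"):
    """Write to a temp file, then rename it over the final name in one step."""
    final = os.path.join(directory, "config.txt")
    tmp = final + ".tmp"
    with open(tmp, "w") as f:
        f.write(text)
        f.flush()
        os.fsync(f.fileno())
    os.replace(tmp, final)               # rename; replaces the target if it exists
    with open(final) as f:
        return f.read(), os.path.exists(tmp)


if __name__ == "__main__":
    # Replace the temporary directory with a directory under your FUSE mount.
    with tempfile.TemporaryDirectory() as mount:
        content, tmp_left_behind = atomic_save(mount)
        print(content)                   # atomic save
        print(tmp_left_behind)           # False
```

Readers of config.txt see either the old content or the new content, never a partially written file, provided the rename itself is atomic as stated in the compatibility table.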

Truncate

✅ Success: Output shows 4.
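
A scripted truncate check could be sketched as follows. The helper name truncate_to and the file name trunc.txt are hypothetical, and a temporary directory stands in for the FUSE mount point.

```python
import os
import tempfile


def truncate_to(directory, length=4):
    """Create an 8-byte file, shrink it, and return the size reported by stat."""
    path = os.path.join(directory, "trunc.txt")
    with open(path, "w") as f:
        f.write("12345678")
    os.truncate(path, length)            # shrink the file to `length` bytes
    return os.stat(path).st_size


if __name__ == "__main__":
    # Replace the temporary directory with a directory under your FUSE mount.
    with tempfile.TemporaryDirectory() as mount:
        print(truncate_to(mount))        # 4
```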

Directory Rename

✅ Success: Output shows file.txt.
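
A directory rename check can be scripted in the same way. In this sketch the helper name rename_directory and the directory names dir_a and dir_b are hypothetical, and a temporary directory stands in for the FUSE mount point.

```python
import os
import tempfile


def rename_directory(directory):
    """Rename a non-empty directory and list the contents under the new name."""
    src = os.path.join(directory, "dir_a")
    dst = os.path.join(directory, "dir_b")
    os.mkdir(src)
    with open(os.path.join(src, "file.txt"), "w") as f:
        f.write("payload")
    os.rename(src, dst)                  # children move with the directory
    return os.listdir(dst), os.path.exists(src)


if __name__ == "__main__":
    # Replace the temporary directory with a directory under your FUSE mount.
    with tempfile.TemporaryDirectory() as mount:
        names, old_exists = rename_directory(mount)
        print(names[0])                  # file.txt
        print(old_exists)                # False
```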

Symbolic Link

✅ Success: Output shows link target and /data/test/original.txt.
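
A symbolic link check can be sketched in Python as below. The helper name check_symlink and the link name link.txt are hypothetical, and a temporary directory stands in for the FUSE mount point, so the printed target path will differ from /data/test/original.txt.

```python
import os
import tempfile


def check_symlink(directory):
    """Create a symlink, read through it, and return (content, link target)."""
    original = os.path.join(directory, "original.txt")
    link = os.path.join(directory, "link.txt")
    with open(original, "w") as f:
        f.write("link target")
    os.symlink(original, link)           # link.txt -> original.txt
    with open(link) as f:                # open() follows the symlink
        content = f.read()
    return content, os.readlink(link)


if __name__ == "__main__":
    # Replace the temporary directory with a directory under your FUSE mount.
    with tempfile.TemporaryDirectory() as mount:
        content, target = check_symlink(mount)
        print(content)                   # link target
        print(target)                    # path of original.txt under the directory
```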
