# FUSE Write Optimization

{% hint style="warning" %}
This feature has been experimental since AI-3.8-15.1.4.
{% endhint %}

This guide shows how to use the [Write Cache](/ee-ai-en/ai-3.8-15.1.x/performance/s3-write-cache.md) backend through the FUSE POSIX interface, enabling low-latency writes via standard filesystem calls (`write()`, `open()`, `close()`). Write Cache must already be deployed before following this guide.

## How It Relates to S3-API Write Cache

The Write Cache backend (FoundationDB metadata + NVMe data + async UFS persistence) is **shared** between the two access interfaces:

|                            | [S3-API Write Optimization](/ee-ai-en/ai-3.8-15.1.x/performance/s3-write-cache.md) | FUSE Write Optimization (this guide)                                                    |
| -------------------------- | ---------------------------------------------------------------------------------- | --------------------------------------------------------------------------------------- |
| **Interface**              | S3-compatible API (`PUT`, `GET`)                                                   | POSIX filesystem calls (`write`, `read`)                                                |
| **Client**                 | AWS CLI, boto3, s3fs, any S3 client                                                | Any POSIX application, ML frameworks, shell tools                                       |
| **Write policies**         | `WRITE_THROUGH`, `WRITE_BACK`, `TRANSIENT`                                         | Same                                                                                    |
| **FoundationDB**           | Required                                                                           | Required (same FDB cluster)                                                             |
| **Additional limitations** | None beyond S3 semantics                                                           | See [POSIX Compatibility in Write-Cache Mode](#posix-compatibility-in-write-cache-mode) |

## Before You Start

* [ ] **Write Cache is already deployed**: FDB must be running and `alluxio.write.cache.enabled` must be set to `"true"`. Verify:

  ```shell
  kubectl exec -n <NAMESPACE> <CLUSTER_NAME>-coordinator-0 -- \
    alluxio conf get alluxio.write.cache.enabled
  ```

  Expected: `true`. If not, complete [S3-API Write Optimization](/ee-ai-en/ai-3.8-15.1.x/performance/s3-write-cache.md) first.
* [ ] **FDB pods are healthy**:

  ```shell
  kubectl get pods -n <NAMESPACE> | grep fdb
  ```

  Expected: FDB controller, log, and storage pods all `Running`.
* [ ] **FUSE PVC exists**:

  ```shell
  kubectl get pvc -n <NAMESPACE> <CLUSTER_NAME>-fuse
  ```

  Expected: the PVC is present. A `Pending` status is normal until a pod mounts it.
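The FDB health check in the list above can also be scripted. The following is a minimal sketch, assuming the default `kubectl get pods` column layout (`NAME READY STATUS RESTARTS AGE`) and that FDB pod names contain `fdb`. The helper name `fdb_pods_not_running` and the sample pod names are hypothetical.

```shell
# Reads `kubectl get pods` output on stdin and prints every FDB pod whose
# STATUS column is not Running. Returns 1 if any such pod is found.
# Assumption: pod names containing "fdb" belong to FoundationDB.
fdb_pods_not_running() {
  awk '$1 ~ /fdb/ && $3 != "Running" { print $1 " is " $3; bad = 1 }
       END { exit (bad + 0) }'
}

# Hypothetical sample output, for illustration only.
sample='NAME                    READY   STATUS    RESTARTS   AGE
demo-fdb-controller-0   1/1     Running   0          5m
demo-fdb-log-0          1/1     Running   0          5m
demo-fdb-storage-0      0/1     Pending   0          5m'

printf '%s\n' "$sample" | fdb_pods_not_running || echo "FDB is not ready yet"

# Against a live cluster (requires kubectl access):
#   kubectl get pods -n "$NAMESPACE" | fdb_pods_not_running
```

With the sample input, the helper reports `demo-fdb-storage-0 is Pending` and the fallback message is printed.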

## Recommended Cluster Configuration

When FUSE Write Cache is active, a larger fraction of NVMe capacity is consumed by unpersisted write data. Increase the pinned-space ratio from the default `0.3` to `0.5` in your `alluxio-cluster.yaml`:

```yaml
spec:
  properties:
    alluxio.write.cache.enabled: "true"
    # Raise from default 0.3 to 0.5 to accommodate write-heavy FUSE workloads
    alluxio.worker.page.store.pinned.file.capacity.limit.ratio: "0.5"
  fdb:
    enabled: true
```

Apply the change:

```shell
# Idempotent
kubectl apply -f alluxio-cluster.yaml
```

> Size workers' NVMe accordingly: at ratio `0.5` and total cache `1 TiB`, up to 512 GiB can be occupied by unpersisted write data at any time. If incoming write throughput exceeds persistence throughput, the pinned space fills and Alluxio returns `out-of-space` errors. See [Cache Space Management](/ee-ai-en/ai-3.8-15.1.x/performance/s3-write-cache.md#cache-space-management).
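The sizing arithmetic can be checked with a short script. This is a rough sketch: the helper names are hypothetical, and the write and persist rates in the last line are made-up numbers chosen only to show how quickly the pinned space can fill when writes outpace persistence.

```shell
# Capacity available for unpersisted (pinned) write data, in GiB.
# $1 = total cache size in GiB, $2 = pinned.file.capacity.limit.ratio
pinned_capacity_gib() {
  awk -v total="$1" -v ratio="$2" 'BEGIN { printf "%.1f\n", total * ratio }'
}

# Seconds until the pinned space fills, or "never" if persistence keeps up.
# $1 = pinned capacity in GiB, $2 = write rate in MiB/s, $3 = persist rate in MiB/s
seconds_to_fill() {
  awk -v cap="$1" -v w="$2" -v p="$3" 'BEGIN {
    if (w <= p) { print "never"; exit }
    printf "%.0f\n", cap * 1024 / (w - p)
  }'
}

pinned_capacity_gib 1024 0.5     # 1 TiB cache at ratio 0.5  -> 512.0
pinned_capacity_gib 1024 0.3     # same cache, default ratio -> 307.2
seconds_to_fill 512 2000 1000    # hypothetical rates        -> 524
```

In the last case, a sustained 1000 MiB/s gap between ingest and persistence would exhaust 512 GiB of pinned space in under nine minutes.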

## Deploy a FUSE Client Pod

The operator creates a PVC named `<CLUSTER_NAME>-fuse` during cluster installation. Mount it with `mountPropagation: HostToContainer` so the mount recovers automatically if the FUSE process restarts. Save the following manifest as `fuse-pod.yaml`:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: fuse-test-0
  namespace: <NAMESPACE>
spec:
  containers:
    - image: ubuntu:22.04
      name: fuse-test
      command: ["sleep", "infinity"]
      volumeMounts:
        - mountPath: /data
          name: alluxio-pvc
          mountPropagation: HostToContainer
  volumes:
    - name: alluxio-pvc
      persistentVolumeClaim:
        claimName: <CLUSTER_NAME>-fuse
```

```shell
# Idempotent
kubectl apply -f fuse-pod.yaml
kubectl -n <NAMESPACE> get pod fuse-test-0
```

Expected: `STATUS = Running`, `READY = 1/1`.

For full FUSE deployment options (DaemonSet, Docker / Bare-Metal), see [POSIX API](/ee-ai-en/ai-3.8-15.1.x/data-access/fuse-based-posix-api.md).

## Configure Write-Back Paths

Write policies are configured at the path level — the same as in [S3-API Write Optimization](/ee-ai-en/ai-3.8-15.1.x/performance/s3-write-cache.md#path-level-configuration). The paths refer to the Alluxio namespace (e.g., `/s3/checkpoints`), not the FUSE mount path (`/data/s3/checkpoints`).
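Because path rules use Alluxio namespace paths, scripts that start from a file under the FUSE mount need to strip the mount prefix first. A minimal sketch, assuming the `/data` mount path from the example pod; the helper name `to_alluxio_path` is hypothetical.

```shell
# Convert a path under the FUSE mount into the Alluxio namespace path
# expected by `alluxio pathconfig`.
# $1 = path under the mount, $2 = mount point (defaults to /data)
to_alluxio_path() {
  local fuse_path="$1" mount_point="${2:-/data}"
  mount_point="${mount_point%/}"            # drop a trailing slash, if any
  case "$fuse_path" in
    "$mount_point")   printf '/\n' ;;
    "$mount_point"/*) printf '%s\n' "${fuse_path#"$mount_point"}" ;;
    *) echo "not under $mount_point: $fuse_path" >&2; return 1 ;;
  esac
}

to_alluxio_path /data/s3/checkpoints      # prints /s3/checkpoints
```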

Non-interactive configuration (for scripting):

```shell
kubectl exec -i -n <NAMESPACE> <CLUSTER_NAME>-coordinator-0 -- \
  bash -c 'EDITOR="cp /dev/stdin" alluxio pathconfig edit' << 'EOF'
{
  "apiVersion": "v1.0",
  "defaultRule": {
    "description": "Global default",
    "policyMode": "WRITE_THROUGH"
  },
  "pathRules": [
    {
      "alluxioPath": "/s3/checkpoints/**",
      "description": "Low-latency checkpoint writes",
      "policyMode": "WRITE_BACK",
      "properties": { "writeReplicas": 1 }
    }
  ]
}
EOF
```

Expected: `Update successful!`

Verify a specific path resolves to the expected policy:

```shell
kubectl exec -n <NAMESPACE> <CLUSTER_NAME>-coordinator-0 -- \
  alluxio pathconfig test --path /s3/checkpoints/epoch-1/model.pt
```

Expected: output contains `"policyMode": "WRITE_BACK"`.
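In automation, the same check can be turned into a pass/fail assertion. The sketch below assumes only what is stated above, namely that the output contains a JSON-style `"policyMode": "<VALUE>"` pair. The helper names and the sample line are hypothetical.

```shell
# Extract the first policyMode value from text on stdin.
policy_mode() {
  sed -n 's/.*"policyMode"[[:space:]]*:[[:space:]]*"\([A-Z_]*\)".*/\1/p' | head -n 1
}

# Fail with a message unless the resolved policy matches the expected one.
# $1 = expected policy, for example WRITE_BACK
assert_policy() {
  local expected="$1" actual
  actual="$(policy_mode)"
  if [ "$actual" != "$expected" ]; then
    echo "expected $expected, got ${actual:-nothing}" >&2
    return 1
  fi
  echo "policy OK: $actual"
}

# Hypothetical sample output, for illustration only.
printf '%s\n' '{ "policyMode": "WRITE_BACK", "properties": { "writeReplicas": 1 } }' \
  | assert_policy WRITE_BACK

# Against a live cluster, pipe the real command output instead:
#   kubectl exec ... -- alluxio pathconfig test --path /s3/checkpoints/epoch-1/model.pt \
#     | assert_policy WRITE_BACK
```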

## Verify Write-Back via FUSE

Write a file through FUSE and confirm it is eventually persisted to UFS:

```shell
# Write through FUSE
kubectl exec -i -n <NAMESPACE> fuse-test-0 -- \
  bash -c 'echo "hello from fuse write cache" > /data/s3/checkpoints/test.txt'

# Confirm visible in Alluxio namespace
kubectl exec -i -n <NAMESPACE> <CLUSTER_NAME>-coordinator-0 -- \
  alluxio fs ls /s3/checkpoints/test.txt
```

Wait for async persistence (up to `alluxio.write.cache.async.file.check.period`, default `10min`):

```shell
# Poll until PERSISTED (8 iterations × 15 s = 2 minutes)
for i in $(seq 1 8); do
  echo "--- Check $i/8 ---"
  NOT_PERSISTED=$(kubectl exec -i -n <NAMESPACE> <CLUSTER_NAME>-coordinator-0 -- \
    alluxio fs ls /s3/checkpoints/ 2>&1 | grep -c "NOT_PERSISTED" || true)
  if [ "${NOT_PERSISTED}" = "0" ]; then
    echo "All files PERSISTED."
    break
  fi
  echo "${NOT_PERSISTED} file(s) still pending. Waiting 15s..."
  sleep 15
done
```

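Expected: `All files PERSISTED.` within 2 minutes. If files are still pending when the loop ends, run it again: persistence can take up to the check period noted above.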
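For scripts that need to wait for the full check period rather than 2 minutes, the polling logic can be factored into a reusable helper. This is a sketch, not part of the Alluxio tooling: `retry_until` and `all_persisted` are hypothetical names, and the sample listing lines are illustrative rather than real `alluxio fs ls` output.

```shell
# retry_until ATTEMPTS INTERVAL_SECONDS COMMAND [ARGS...]
# Runs COMMAND until it succeeds or ATTEMPTS is exhausted.
retry_until() {
  local attempts="$1" interval="$2" i=1
  shift 2
  while [ "$i" -le "$attempts" ]; do
    if "$@"; then
      return 0
    fi
    if [ "$i" -lt "$attempts" ]; then
      sleep "$interval"
    fi
    i=$((i + 1))
  done
  echo "gave up after $attempts attempts" >&2
  return 1
}

# Succeeds when the listing on stdin contains no NOT_PERSISTED entries.
all_persisted() { ! grep -q "NOT_PERSISTED"; }

# Illustrative listing lines only.
printf '%s\n' 'f1 PERSISTED' 'f2 NOT_PERSISTED' | all_persisted || echo "still pending"

# Example wiring against a live cluster (about 10 minutes in total):
#   check() { kubectl exec -i -n "$NAMESPACE" "${CLUSTER_NAME}-coordinator-0" -- \
#               alluxio fs ls /s3/checkpoints/ | all_persisted; }
#   retry_until 40 15 check
```

Unlike the inline loop, the helper returns a non-zero status on timeout, so a calling script can tell "persisted" apart from "gave up".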

***

## POSIX Compatibility in Write-Cache Mode

{% hint style="warning" %}
The FUSE Write Cache is built on the same backend as the S3-API Write Cache. As a result, the FUSE mount has additional restrictions that go beyond both standard POSIX and standard FUSE semantics. These restrictions apply to **all write-cache policies** (`WRITE_THROUGH`, `WRITE_BACK`, `TRANSIENT`). Review them before migrating workloads to a write-cache FUSE mount.
{% endhint %}

### rename() returns EIO

All `rename()` operations fail with `EIO` while any write-cache policy is active, regardless of whether the file has been persisted to UFS. This affects:

* Shell `mv` and `rename`
* Python `os.rename()`, `pathlib.Path.rename()`
* Write-then-rename patterns (write to `.tmp`, rename into place)

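**Workaround for open-file deletion (silly rename interception, opt-in):** When an application runs `rm` on a file that is still open, the FUSE layer on Linux internally issues a `rename()` to `.fuse_hidden*`, which would otherwise fail with `EIO`. Enable the interceptor to handle this case transparently. It does not make application-issued `rename()` calls succeed.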

```yaml
spec:
  properties:
    # CLIENT-scoped: set in FUSE client properties
    alluxio.fuse.silly.rename.interceptor.enabled: "true"
```

With this option enabled, Alluxio intercepts `.fuse_hidden*` renames and handles open-file deletion without triggering an S3 `CopyObject + DeleteObject`. Default is `false`.
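The sequence that triggers a silly rename is easy to reproduce. The sketch below runs against a local temporary directory, where no FUSE layer is involved, so it only illustrates the delete-while-open pattern. On a FUSE mount, the same sequence is what produces the hidden rename to `.fuse_hidden*`. The function name is hypothetical.

```shell
# Delete a file while it is still open, then read it through the open descriptor.
# $1 = directory to run the demonstration in
read_after_unlink() {
  local f="$1/open-file.txt"
  echo "still readable" > "$f"
  exec 9< "$f"        # keep the file open on descriptor 9
  rm "$f"             # unlink while the descriptor is still open
  cat <&9             # the data remains reachable through the descriptor
  exec 9<&-           # close the descriptor; the file is now really gone
}

tmp="$(mktemp -d)"
read_after_unlink "$tmp"    # prints: still readable
rm -rf "$tmp"
```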

### Files are write-once after close

Once a file is closed, it cannot be re-opened for writing, appending, or truncating:

| Operation                                                            | errno    |
| -------------------------------------------------------------------- | -------- |
| `open(path, O_CREAT \| O_EXCL)` — file already exists                | `EEXIST` |
| `open(path, O_WRONLY)` or `open(path, O_RDWR)` — file already exists | `EACCES` |

**Impact:** Applications that update files in place (databases, log rotation, config rewriters) will not work through a write-cache FUSE mount. The write-cache FUSE mount is best suited for write-once workloads: model checkpoints, training datasets, ETL stage outputs.
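One way to fit a job to the write-once model is to give every output a new name instead of overwriting or renaming. The following sketch shows that pattern with two hypothetical helpers. It runs against a temporary local directory, and the `epoch-N.pt` naming scheme is only an example.

```shell
# Copy SRC to DEST only if DEST does not exist yet; never overwrite or rename.
write_once() {
  local src="$1" dest="$2"
  if [ -e "$dest" ]; then
    echo "refusing to overwrite existing file: $dest" >&2
    return 1
  fi
  cp -- "$src" "$dest"
}

# Print the first unused epoch-N.pt path in a directory.
next_checkpoint_path() {
  local dir="$1" n=1
  while [ -e "$dir/epoch-$n.pt" ]; do
    n=$((n + 1))
  done
  printf '%s\n' "$dir/epoch-$n.pt"
}

# Demonstration in a temporary local directory (not a FUSE mount).
demo="$(mktemp -d)"
echo "weights" > "$demo/model.tmp"
write_once "$demo/model.tmp" "$(next_checkpoint_path "$demo")"   # creates epoch-1.pt
write_once "$demo/model.tmp" "$(next_checkpoint_path "$demo")"   # creates epoch-2.pt
ls "$demo"
rm -rf "$demo"
```

The existence check and the copy are not atomic, so this pattern assumes a single writer per directory.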

### Hard links are not supported

`link()` returns `EOPNOTSUPP`. Tools that rely on hard links (`rsync --hard-links`, some package managers) will not work through the mount.

### Cache page reclaim on delete (15.1.3+ behavior)

When a file is deleted via FUSE `rm` or `rm -rf`, cached pages are reclaimed on **all** workers that hold copies of the file — not only the hash-ring owner. In builds prior to 15.1.3, only the owner worker reclaimed pages; other workers retained orphaned pages until the next eviction cycle.

***

## Monitoring Async Persistence

Two CLI commands (available 15.1.3+) let you inspect in-flight persist operations directly instead of repeatedly polling `alluxio fs ls`:

```shell
# List all files pending or in progress
kubectl exec -i -n <NAMESPACE> <CLUSTER_NAME>-worker-0 -- \
  alluxio async-persist list

# Scope to a single worker by ID
kubectl exec -i -n <NAMESPACE> <CLUSTER_NAME>-worker-0 -- \
  alluxio async-persist list --worker <WORKER_ID>

# Check the persist state and retry count for a specific path
kubectl exec -i -n <NAMESPACE> <CLUSTER_NAME>-coordinator-0 -- \
  alluxio async-persist stat --path /s3/checkpoints/epoch-1/model.pt
```

Use `async-persist stat` when `alluxio fs ls` shows a file stuck in `NOT_PERSISTED` to determine whether the issue is in the queue or the upload itself.

## Key Configuration

| Property                                                     | Default | Description                                                                                                                                                         |
| ------------------------------------------------------------ | ------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `alluxio.write.cache.enabled`                                | `false` | Enables Write Cache (shared with S3 API).                                                                                                                           |
| `alluxio.worker.page.store.pinned.file.capacity.limit.ratio` | `0.3`   | Max fraction of NVMe capacity for unpersisted write data. Raise to `0.5` for write-heavy FUSE workloads.                                                            |
| `alluxio.write.cache.async.file.check.period`                | `10min` | Scan interval for orphan detection. Shorter values increase FDB load.                                                                                               |
| `alluxio.write.cache.async.check.orphan.timeout`             | `1h`    | Uncommitted writes older than this are treated as abandoned and cleaned up.                                                                                         |
| `alluxio.fuse.silly.rename.interceptor.enabled`              | `false` | CLIENT-scoped. Intercepts `.fuse_hidden*` rename/unlink for transparent `rm` of open files.                                                                         |
| `alluxio.worker.mark.writing.files.duration`                 | `10min` | If a file is open for write but receives no new data for this duration, the worker treats it as a dangling write eligible for cleanup. Timer resets on every write. |

## Troubleshooting

### Directory deletion returns DEADLINE\_EXCEEDED

Running `alluxio fs rm -R` or `rm -rf` on a `WRITE_BACK` path may fail with:

```
io.grpc.StatusRuntimeException: DEADLINE_EXCEEDED: CallOptions deadline exceeded after ~5s
```

{% hint style="danger" %}
Despite the error, the underlying files **may have already been deleted from UFS** before the timeout. Do not assume the data is still present.
{% endhint %}

**Recovery steps:**

1. Verify UFS state directly:

   ```shell
   aws s3 ls s3://<BUCKET>/<path>/ --recursive | head -20
   ```
2. If files are gone from S3, the deletion succeeded at the data layer. Re-running `alluxio fs rm -R` will confirm by returning `Path does not exist`.
3. Pagestore disk space may not shrink immediately — orphaned pages are reclaimed on the next eviction cycle.

***

### Files stuck in NOT\_PERSISTED

```shell
kubectl exec -i -n <NAMESPACE> <CLUSTER_NAME>-coordinator-0 -- \
  alluxio fs ls /s3/checkpoints/
```

If files remain `NOT_PERSISTED` beyond `alluxio.write.cache.async.file.check.period`:

1. Check async-persist queue:

   ```shell
   kubectl exec -i -n <NAMESPACE> <CLUSTER_NAME>-worker-0 -- \
     alluxio async-persist list
   ```
2. Check specific file status:

   ```shell
   kubectl exec -i -n <NAMESPACE> <CLUSTER_NAME>-coordinator-0 -- \
     alluxio async-persist stat --path /s3/checkpoints/<filename>
   ```
3. Check worker logs for upload errors:

   ```shell
   kubectl logs -n <NAMESPACE> <CLUSTER_NAME>-worker-0 --tail=100 | \
     grep -i "persist\|upload\|flush"
   ```
4. If UFS is unreachable, retries enter exponential backoff (up to `alluxio.worker.write.cache.async.persist.retry.max.interval`, default `1h`). Verify UFS connectivity from the worker pod.
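To get a feel for how long a file can stay `NOT_PERSISTED` during a UFS outage, the capped backoff in step 4 can be tabulated. Only the `1h` cap comes from this guide. The 60-second initial delay and the doubling factor below are assumptions made for illustration, and `backoff_schedule` is a hypothetical helper.

```shell
# backoff_schedule INITIAL_SECONDS MAX_SECONDS RETRIES
# Prints the wait before each retry, doubling each time up to MAX_SECONDS.
# Assumption: the real initial delay and growth factor may differ.
backoff_schedule() {
  local delay="$1" max="$2" retries="$3" i=1
  while [ "$i" -le "$retries" ]; do
    echo "retry $i: wait ${delay}s"
    delay=$((delay * 2))
    if [ "$delay" -gt "$max" ]; then
      delay="$max"
    fi
    i=$((i + 1))
  done
}

backoff_schedule 60 3600 8
```

Under these assumptions the wait reaches the 3600-second cap at the seventh retry, so once UFS connectivity is restored a file may still wait up to an hour for its next attempt.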

***

### rename() returns EIO unexpectedly

This is expected behavior when any write-cache policy is active (see [rename() returns EIO](#rename-returns-eio)). If your application relies on rename:

* Switch the affected path to `NO_CACHE` policy to bypass the write cache entirely for that path.
* Enable `alluxio.fuse.silly.rename.interceptor.enabled: "true"` if the rename is triggered by `rm` of an open file.

***

### FUSE pod OOM or mount not connected

These are not write-cache-specific. See [FUSE Troubleshooting](/ee-ai-en/ai-3.8-15.1.x/data-access/fuse-based-posix-api.md#troubleshooting).

## See Also

* [S3-API Write Optimization](/ee-ai-en/ai-3.8-15.1.x/performance/s3-write-cache.md) — write cache via S3 API; deploy this first
* [POSIX API](/ee-ai-en/ai-3.8-15.1.x/data-access/fuse-based-posix-api.md) — FUSE deployment details, mount options, read-cache mode
* [S3 API Benchmarks](/ee-ai-en/ai-3.8-15.1.x/benchmark/s3-api.md) — S3-side write throughput baselines
* [Benchmarking POSIX Performance](/ee-ai-en/ai-3.8-15.1.x/benchmark/benchmarking-posix-performance.md) — FUSE-side throughput baselines


