# Release Notes

### Alluxio Enterprise AI 3.9

Alluxio AI 3.9 transforms Alluxio from a read-centric data cache into a full read/write data acceleration platform for AI, unlocking new use cases across the AI workflow while improving performance, reliability, and operational maturity.

This release advances two major product priorities:

* **Road to a Fast Read/Write Cache System** — Alluxio now supports write-intensive workloads with POSIX-compatible write cache and S3 multipart upload support, enabling model checkpointing and data preprocessing.
* **Strengthening the Data Engine for AI** — Native RDMA data transport, zero-copy and worker I/O optimizations, and stronger reliability improvements further strengthen Alluxio for demanding AI infrastructure.

#### New Features

**MLOps Workspace — FUSE Full POSIX Workspace**

{% hint style="warning" %}
Experimental since AI 3.9
{% endhint %}

Alluxio AI 3.9 introduces FUSE Full POSIX Workspace, enabling ML engineers to run interactive workloads directly on Alluxio-backed FUSE mounts with full POSIX semantics. Compared with basic FUSE write optimization, this mode supports random writes, overwrites, truncation, rename, symlinks, and other standard POSIX operations through a FUSE mount.

FDB-backed metadata enables multi-node access to the same dataset, while data can be stored on Worker NVMe for low latency or UFS PageStore for higher durability. Typical workloads include `git clone`, `vim`, `pip install`, continuous logging, data preprocessing, and migration of legacy POSIX applications without code changes. In validation testing, the Workspace reached up to **8.99 GB/s** peak sequential write throughput and **8.01 GB/s** peak hot-cache read throughput.
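As a rough illustration of the operations listed above, the following sketch runs an overwrite at an offset, a truncation, a rename, and a symlink through ordinary Python file APIs. It is not Alluxio-specific: a temporary directory stands in for the FUSE mount point, and the function and file names are hypothetical.

```python
import os
import tempfile


def exercise_posix_ops(root: str) -> dict:
    """Run a few standard POSIX operations under `root` and report the results."""
    path = os.path.join(root, "notes.txt")

    # Sequential write, followed by an in-place overwrite at byte offset 6.
    with open(path, "wb") as f:
        f.write(b"hello world")
    with open(path, "r+b") as f:
        f.seek(6)
        f.write(b"WORLD")
    with open(path, "rb") as f:
        after_overwrite = f.read()

    # Truncate the file to its first five bytes.
    os.truncate(path, 5)
    size_after_truncate = os.path.getsize(path)

    # Rename the file, then create a symlink that points at the new name.
    renamed = os.path.join(root, "renamed.txt")
    os.rename(path, renamed)
    link = os.path.join(root, "latest")
    os.symlink(renamed, link)
    with open(link, "rb") as f:
        via_symlink = f.read()

    return {
        "after_overwrite": after_overwrite,
        "size_after_truncate": size_after_truncate,
        "via_symlink": via_symlink,
    }


def demo() -> dict:
    # Assumption: a temporary directory stands in for the FUSE mount point.
    with tempfile.TemporaryDirectory() as mount_stand_in:
        return exercise_posix_ops(mount_stand_in)
```

To try the same calls against a real Workspace, pass the mount path to `exercise_posix_ops` instead of using `demo()`.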

See [FUSE Full POSIX Workspace](/ee-ai-en/performance/fuse-workspace.md) for configuration and usage details.

**Model Training Checkpointing — S3 and FUSE**

{% hint style="warning" %}
Experimental since AI 3.9
{% endhint %}

Alluxio AI 3.9 adds high-performance checkpointing support for model training through both S3 and FUSE interfaces.

S3 write cache now supports standard Multipart Upload (MPU), enabling multi-gigabyte checkpoint files. In addition, checkpoint data is written to local cache first and then persisted asynchronously to object storage, reducing application-visible checkpoint latency and helping minimize GPU idle time during checkpoint operations. Validation testing showed up to **10.20 GB/s** single-worker checkpoint write throughput.
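To make the multipart flow concrete, here is a minimal sketch of how a client might split a checkpoint file into parts before uploading. The limits used are the documented Amazon S3 ones (5 MiB minimum for every part except the last, at most 10,000 parts). Whether the Alluxio S3 endpoint enforces the same limits is an assumption, and `plan_parts` is a hypothetical helper, not part of any SDK.

```python
MIB = 1024 * 1024
MIN_PART_SIZE = 5 * MIB  # Amazon S3 minimum for every part except the last
MAX_PARTS = 10_000       # Amazon S3 maximum number of parts per upload


def plan_parts(total_size: int, part_size: int = 64 * MIB) -> list:
    """Return (part_number, offset, length) tuples covering total_size bytes."""
    if total_size <= 0:
        raise ValueError("total_size must be positive")
    if part_size < MIN_PART_SIZE:
        raise ValueError("part_size is below the 5 MiB minimum")

    # Grow the part size if the requested one would exceed the part-count limit.
    smallest_allowed = -(-total_size // MAX_PARTS)  # ceiling division
    part_size = max(part_size, smallest_allowed)

    parts = []
    offset = 0
    number = 1  # S3 part numbers start at 1
    while offset < total_size:
        length = min(part_size, total_size - offset)
        parts.append((number, offset, length))
        offset += length
        number += 1
    return parts
```

Each tuple maps to one part-upload request that reads `length` bytes starting at `offset`. Only the final part may be smaller than the chosen part size.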

See [S3-API Write Optimization](/ee-ai-en/performance/s3-write-cache.md) and [FUSE Write Optimization](/ee-ai-en/performance/fuse-write-cache.md) for configuration and usage details.

**Native RDMA Data Transport**

{% hint style="warning" %}
Experimental since AI 3.9
{% endhint %}

Alluxio AI 3.9 adds native RDMA (Remote Direct Memory Access) transport for read I/O, bypassing the kernel networking stack to improve throughput and latency for data access workloads such as model loading, training data reads, and inference serving.

In single-node testing, RDMA reached **23.2 GB/s** on **200 Gbps InfiniBand** and **49.5 GB/s** on **400 Gbps InfiniBand**. In a 3-worker, 3-client cluster running on **200 Gbps InfiniBand** nodes, RDMA scaled to **62.5 GB/s** aggregate throughput. Small-read latency reached **64 µs** P99 for 4 KB reads on 200 Gbps and approximately **59 µs** P99.9 on 400 Gbps.
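For context, the throughput figures above can be compared against the nominal line rate of each link. The sketch below does that arithmetic under stated assumptions: GB means 10^9 bytes, each node has a single link at the quoted speed, protocol overhead is ignored, and the cluster aggregate is spread evenly across the three workers. None of these assumptions are stated in the release notes, so treat the ratios as rough estimates.

```python
def line_rate_gb_per_s(link_gbps: float) -> float:
    """Convert a nominal link speed in Gbps to GB/s (8 bits per byte)."""
    return link_gbps / 8.0


def utilization(measured_gb_per_s: float, link_gbps: float) -> float:
    """Fraction of the nominal line rate that a measured throughput represents."""
    return measured_gb_per_s / line_rate_gb_per_s(link_gbps)


# Single-node figures quoted in the release notes.
single_200 = utilization(23.2, 200)  # 23.2 GB/s against a 25 GB/s line rate
single_400 = utilization(49.5, 400)  # 49.5 GB/s against a 50 GB/s line rate

# Cluster figure, assuming the 62.5 GB/s aggregate divides evenly over 3 workers.
per_worker_gb_per_s = 62.5 / 3
cluster_200 = utilization(per_worker_gb_per_s, 200)

for label, value in [
    ("single node, 200 Gbps", single_200),
    ("single node, 400 Gbps", single_400),
    ("per worker in 3-worker cluster, 200 Gbps", cluster_200),
]:
    print(f"{label}: {value:.1%} of nominal line rate")
```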

RDMA support in this release applies to read I/O only. The write paths of the write cache continue to use the standard TCP transport.

See [RDMA Networking](/ee-ai-en/performance/rdma-networking.md) for configuration and usage details.

#### Enhancements

**Cache Usage Insights from Access Logs**

Alluxio AI 3.9 introduces a cache observability framework that provides fine-grained access logs in addition to time-series metrics.

This helps with cache capacity planning, per-business-unit usage auditing, and chargeback analysis. The framework adds file-level visibility into hot and cold data distribution, per-workload access pattern analysis, and operational controls such as dynamic configuration, CLI-based log management, time-window deduplication, and sampling-rate tuning.
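The two log-volume controls named above, time-window deduplication and sampling, can be sketched in general terms as follows. This is only an illustration of the ideas, assuming fixed time windows and a deterministic 1-in-n sample. The record fields and function names are hypothetical and do not reflect the actual Alluxio access log format or implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AccessRecord:
    timestamp: float  # seconds since the epoch
    path: str
    client: str


def dedupe_by_window(records, window_seconds: float):
    """Keep the first record per (path, client) within each fixed time window."""
    if window_seconds <= 0:
        raise ValueError("window_seconds must be positive")
    seen = set()
    kept = []
    for rec in sorted(records, key=lambda r: r.timestamp):
        bucket = int(rec.timestamp // window_seconds)
        key = (rec.path, rec.client, bucket)
        if key not in seen:
            seen.add(key)
            kept.append(rec)
    return kept


def sample_every_nth(records, n: int):
    """Deterministic 1-in-n sampling, a simple stand-in for a sampling rate."""
    if n < 1:
        raise ValueError("n must be at least 1")
    return [rec for i, rec in enumerate(records) if i % n == 0]
```

With a 10-second window, three reads of the same file by the same client at t=0, t=1, and t=9 collapse to one record, while a read at t=10 starts a new window and is kept.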

See [Access Log](/ee-ai-en/administration/audit-access-logs/access-log.md) for configuration and usage details.

**Cluster Operation Enhancements**

Alluxio AI 3.9 also improves cluster operations for large-scale deployments.

* **Recoverable data isolation for multi-tenant Kubernetes** — CSI-based subdirectory isolation replaces fragile `volumeMounts.subPath`, and mounted data survives FUSE pod restarts.
* **Independent worker service binding** — Worker RPC, REST, web, data, and RDMA services can use separate NIC and device bindings.
* **Job service reliability improvements** — Zombie job reconciliation, stable Job ID-based management, and stronger etcd-backed scheduler state handling improve operational robustness.
* **Write cache operational background tasks** — Async persist scanning, replica checks, orphan cleanup, invalid lock cleanup, and temp-file promotion are automated.
* **FUSE and deployment improvements** — Additional HDFS 3.4 compatibility, NAS UFS improvements, FUSE log rotation, and cluster diagnostics improve operability.


