# What is Alluxio?

Alluxio is a distributed caching layer that sits between your object storage (S3, GCS, Azure Blob, HDFS) and your compute (PyTorch, vLLM, Spark, Ray). It pulls hot data onto the local NVMe or SSD of each compute node, so workloads read at local storage speed instead of crossing the network to object storage on every access — without moving or copying your data.

<figure><img src="/files/zK1eDV5pWeHJXOFiOKlV" alt=""><figcaption><p>Alluxio is deployed between storage and compute. Hot data is cached on local NVMe/SSD of each compute node and served to frameworks via POSIX, S3-compatible API, or Python FSSpec.</p></figcaption></figure>
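
For a concrete sense of the access model, the sketch below reads a dataset object through Alluxio's S3-compatible API with a standard boto3 client. The endpoint URL, bucket, and key are illustrative placeholders, not values defined on this page.

```python
import boto3

# Point a standard S3 client at an Alluxio S3-compatible endpoint instead of AWS.
# The endpoint URL, bucket, and key below are placeholders for this sketch.
s3 = boto3.client("s3", endpoint_url="http://alluxio-s3.example.internal:39999")

# Hot data is served from the node-local NVMe/SSD cache; a miss is fetched
# from the underlying object store and cached for subsequent reads.
obj = s3.get_object(Bucket="training-data", Key="datasets/shard-0001.parquet")
payload = obj["Body"].read()
```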

## What Alluxio is used for

**Model training — read acceleration and checkpointing**

Training jobs read the same dataset files repeatedly across epochs. Without caching, every read crosses the network to object storage, leaving GPUs idle. Alluxio caches the dataset on the GPU cluster's local SSDs after the first pass, so subsequent epochs run at local storage speed, often 10× or more faster than repeated S3 fetches.
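
As a rough sketch, assuming an Alluxio FUSE mount at a placeholder path such as `/mnt/alluxio`, the training code itself does not change: the DataLoader keeps reading ordinary file paths, and the second and later epochs are served from the node-local cache.

```python
import glob
import torch
from torch.utils.data import Dataset, DataLoader

DATA_DIR = "/mnt/alluxio/datasets/train"  # placeholder Alluxio FUSE mount path

class ShardDataset(Dataset):
    """Reads raw shard files from an Alluxio-backed POSIX path."""

    def __init__(self, root):
        self.paths = sorted(glob.glob(f"{root}/*.bin"))

    def __len__(self):
        return len(self.paths)

    def __getitem__(self, idx):
        # Epoch 1: cache miss, fetched from object storage and cached locally.
        # Epoch 2+: the same read is served from the node's NVMe/SSD.
        with open(self.paths[idx], "rb") as f:
            return torch.frombuffer(bytearray(f.read()), dtype=torch.uint8)

loader = DataLoader(ShardDataset(DATA_DIR), batch_size=1, num_workers=8)
```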

For checkpoint writes, Alluxio's write-back cache (requires FoundationDB) bounds checkpoint latency to local NVMe speed and flushes to object storage asynchronously, removing the checkpoint stall from the training loop.
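
From the training loop's perspective, a checkpoint is then just a local file write, as the hedged sketch below shows; the mount point is a placeholder, and the asynchronous upload to object storage is handled by Alluxio rather than by the training code.

```python
import torch

CKPT_DIR = "/mnt/alluxio/checkpoints"  # placeholder Alluxio-backed path

def save_checkpoint(model, optimizer, step):
    # torch.save returns once the local write completes; with the write-back
    # cache enabled, the upload to object storage happens asynchronously,
    # so the training loop does not stall on the network.
    torch.save(
        {
            "step": step,
            "model": model.state_dict(),
            "optimizer": optimizer.state_dict(),
        },
        f"{CKPT_DIR}/step-{step:08d}.pt",
    )
```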

**Model serving — eliminating cold starts**

When a new inference replica starts, it must download model weights — often tens to hundreds of gigabytes — before serving its first request. Alluxio caches model weights on GPU nodes so replicas load from local NVMe instead of object storage. It also handles multi-cloud model distribution: weights are fetched once and served at line rate to any number of concurrent GPU replicas across clouds, without re-downloading per replica or per provider.
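
As a sketch of the serving side, assuming the model repository is exposed through an Alluxio FUSE mount (the path below is a placeholder), a vLLM replica simply points at that path; the first replica on a node pulls the weights through Alluxio, and later replicas on that node load them from local NVMe.

```python
from vllm import LLM, SamplingParams

# Placeholder path: model weights as exposed through an Alluxio FUSE mount.
MODEL_PATH = "/mnt/alluxio/models/llama-3-70b-instruct"

# Cold start on this node: the weights are read through Alluxio and cached.
# Subsequent replicas on the same node load them at local NVMe speed.
llm = LLM(model=MODEL_PATH)
outputs = llm.generate(["Hello"], SamplingParams(max_tokens=16))
```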

**Low-latency feature stores**

ML pipelines that retrieve features from Parquet files or other large datasets on object storage incur latency that can be too high for real-time inference or short trading windows. Alluxio caches the hot feature data on local NVMe and presents it through a POSIX or S3-compatible interface, so existing query engines and dataframe libraries need no changes.
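
For illustration, and assuming the feature tables sit behind an Alluxio FUSE mount (the path and column names below are placeholders), a lookup in the serving path is an ordinary Parquet read:

```python
import pandas as pd

# Placeholder path and columns: feature tables behind an Alluxio FUSE mount.
FEATURES_PATH = "/mnt/alluxio/features/market/latest.parquet"

# The first read pulls the file from object storage and caches it locally;
# later reads in the serving path hit the node's NVMe/SSD cache.
df = pd.read_parquet(FEATURES_PATH, columns=["entity_id", "feature_a", "feature_b"])
row = df.set_index("entity_id").loc["node-0042"]
```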

## Who uses Alluxio

* [**Fireworks AI**](https://www.alluxio.io/customer-stories/fireworks-ai-accelerates-inference-cold-starts-across-multiple-gpu-clouds-with-alluxio) — inference cold start across 10+ GPU clouds: model load time 20+ min → 2–3 min per replica; 50% egress cost reduction; \~2 PB served daily
* [**Dyna Robotics**](https://www.alluxio.io/customer-stories/dyna-robotics) — foundation model training on 10,000–100,000 HDF5 files/day: 30%+ throughput slowdowns eliminated; 88 TB distributed SSD cache across 16 H100 nodes; multi-cloud GPU scheduling without pipeline rewrites
* [**Blackout Power Trading**](https://www.alluxio.io/customer-stories/blackout-power-trading) — ML feature store for real-time power trading: inference query latency 37–83× faster (3,727 ms → 45 ms); scaled from 5,000 to 100,000+ models within the same 15-minute trading window

## Next Steps

* **Learn how it works:** [How Alluxio Works](/ee-ai-en/how-alluxio-works.md) — architecture, caching model, and data flow.
* **Deploy Alluxio:** [Get Started Guide](/ee-ai-en/start.md) — Kubernetes (Operator) or Docker on a Linux host.


---

# Agent Instructions: Querying This Documentation

If you need additional information that is not directly available on this page, you can query the documentation dynamically by asking a question.

Perform an HTTP GET request on the current page URL with the `ask` query parameter:

```
GET https://documentation.alluxio.io/ee-ai-en/what-is-alluxio.md?ask=<question>
```

The question should be specific, self-contained, and written in natural language.
The response will contain a direct answer to the question and relevant excerpts and sources from the documentation.

Use this mechanism when the answer is not explicitly present in the current page, you need clarification or additional context, or you want to retrieve related documentation sections.
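
For example, a minimal sketch using Python's requests library (the question text is illustrative); passing the question through `params` takes care of URL encoding:

```python
import requests

resp = requests.get(
    "https://documentation.alluxio.io/ee-ai-en/what-is-alluxio.md",
    params={"ask": "Which storage systems can Alluxio cache data from?"},
)
print(resp.text)  # direct answer plus relevant excerpts and sources
```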
