# Optimizing Writes

Accelerate write-intensive workloads in Alluxio by decoupling writes from the underlying storage (UFS). This guide covers how to improve throughput and reduce latency using a **Cluster-Level Write Cache**.

## Cluster-Level Write Cache

{% hint style="warning" %}
This feature is experimental.
{% endhint %}

For workloads where UFS performance is a major bottleneck, the **Cluster-Level Write Cache** offers maximum write performance. Data is written directly to a dedicated set of Alluxio components that form the write cache, completely bypassing the UFS for the initial write.

This feature is ideal for temporary data that does not require immediate persistence, such as shuffle data from large-scale data processing jobs.

### Overview

The Cluster-Level Write Cache is a storage cluster managed independently from standard Alluxio workers. Clients interact directly with the cache components for any paths mounted for the write cache.

### Deploying the Cluster-Level Write Cache

Deploy the Cluster-Level Write Cache using the Alluxio Operator for Kubernetes. The deployment is configured under the `cacheOnly` section in your `AlluxioCluster` definition, which reflects the internal component name.

```yaml
apiVersion: k8s-operator.alluxio.com/v1
kind: AlluxioCluster
metadata:
  name: alluxio
spec:
  image: <PRIVATE_REGISTRY>/alluxio-enterprise
  imageTag: <TAG>
  
  # ... other configurations ...

  cacheOnly:
    enabled: true
    mountPath: "/write-cache"
    image: <PRIVATE_REGISTRY>/alluxio-cacheonly
    imageTag: <TAG>
    license: <YOUR_BASE64_ENCODED_LICENSE>
    
    journal:
      storageClass: "gp2"

    worker:
      count: 2
    tieredstore:
      levels:
        - level: 0
          alias: SSD
          mediumtype: SSD
          path: /data1/cacheonly/worker
          type: hostPath
          quota: 10Gi
```
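
The `license` field above expects a base64-encoded value. As a minimal sketch, assuming the license is delivered as a file (the `license.json` path below is hypothetical), the single-line string can be produced like this:

```python
import base64
from pathlib import Path


def encode_license(raw: bytes) -> str:
    """Return the license bytes as a single-line base64 string."""
    # b64encode does not insert line breaks, so the result can be pasted
    # into a single YAML scalar.
    return base64.b64encode(raw).decode("ascii")


if __name__ == "__main__":
    # Hypothetical location of the license file; adjust to your environment.
    license_path = Path("license.json")
    if license_path.exists():
        print(encode_license(license_path.read_bytes()))
    else:
        print(f"license file not found: {license_path}")
```

Paste the printed value in place of `<YOUR_BASE64_ENCODED_LICENSE>`.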

### Accessing the Cluster-Level Write Cache

Once deployed, all requests to the configured `mountPath` (e.g., `/write-cache`) are routed to the Cluster-Level Write Cache.

**Access via Alluxio CLI:**

```shell
bin/alluxio fs ls /write-cache
```

### Asynchronous Persistence to UFS

For data that eventually needs to be persisted, Alluxio offers an optional async persistence mechanism to upload data from a cache path to a corresponding UFS path.

**Limitations**

* **Metadata Operations:** Only basic file persistence is supported; operations like `rename` are not reliably handled.
* **No UFS Cleanup:** Deleting a file from the cache does not delete it from the UFS.
* **No Reconciliation:** Alluxio cannot automatically reconcile diverging versions of a file between the cache and the UFS.

**Configuration**

To enable async persistence, configure a path mapping file on both the standard Alluxio masters and the masters for the Cluster-Level Write Cache.

1. **Set the property** in `alluxio-site.properties`:

   | Property                                             | Description                                      |
   | ---------------------------------------------------- | ------------------------------------------------ |
   | `alluxio.gemini.master.async.upload.local.file.path` | Path to the async upload path mapping JSON file. |

2. **Create the JSON mapping file**. This file maps cache paths to UFS-backed Alluxio paths. The `cacheOnlyMountPoint` key is required.

   ```json
   {
     "cacheOnlyMountPoint": "/write-cache",
     "asyncUploadPathMapping": {
       "/write-cache/a": "/s3/a",
       "/write-cache/b": "/local/c"
     },
     "blackList": [
       ".tmp"
     ]
   }
   ```
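
Because the same mapping file is configured on two sets of masters, it can help to sanity-check it before distributing it. The following is a minimal sketch of such a check, not part of Alluxio. The helper names are hypothetical, and two of the rules are assumptions rather than documented requirements: that every source path lies under `cacheOnlyMountPoint`, and that no target path points back into the write cache.

```python
import json


def is_under(path: str, root: str) -> bool:
    """True if path equals root or lies beneath it (compared by path segment)."""
    root_prefix = root.rstrip("/") + "/"
    return (path.rstrip("/") + "/").startswith(root_prefix)


def validate_mapping(config: dict) -> list:
    """Return a list of human-readable problems; empty means none were found."""
    mount = config.get("cacheOnlyMountPoint")
    if not isinstance(mount, str) or not mount.startswith("/"):
        return ["cacheOnlyMountPoint is required and must be an absolute path"]

    mapping = config.get("asyncUploadPathMapping", {})
    if not isinstance(mapping, dict):
        return ["asyncUploadPathMapping must be a JSON object"]

    problems = []
    for source, target in mapping.items():
        # Assumption: every source path should live under the mount point.
        if not is_under(source, mount):
            problems.append(f"{source}: not under mount point {mount}")
        if not isinstance(target, str) or not target.startswith("/"):
            problems.append(f"{source}: target must be an absolute path")
        # Assumption: a target inside the write cache is a mistake.
        elif is_under(target, mount):
            problems.append(f"{source}: target {target} is inside the write cache")

    black_list = config.get("blackList", [])
    if not isinstance(black_list, list) or not all(
        isinstance(entry, str) for entry in black_list
    ):
        problems.append("blackList must be a list of strings")
    return problems


if __name__ == "__main__":
    example = json.loads("""
    {
      "cacheOnlyMountPoint": "/write-cache",
      "asyncUploadPathMapping": {
        "/write-cache/a": "/s3/a",
        "/write-cache/b": "/local/c"
      },
      "blackList": [".tmp"]
    }
    """)
    print(validate_mapping(example) or "no problems found")
```

The check only inspects the file's structure. It does not interpret `blackList` entries, since how they are matched is not described here.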

### Advanced Configurations

**Multi-Replica Writes**

Enable multi-replica writes for fault tolerance within the Cluster-Level Write Cache by setting the replication factor.

```yaml
# In your AlluxioCluster spec:
properties:
  "alluxio.gemini.user.file.replication": "2"
```
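
Replication trades capacity for fault tolerance. The sketch below gives a rough sizing estimate under two assumptions that this page does not state: that each replica stores a full copy of the file, and that the `quota` value applies per worker. The function name is hypothetical.

```python
def usable_capacity_gib(worker_count: int, quota_gib_per_worker: float,
                        replication: int) -> float:
    """Estimate unique data capacity when every file is stored `replication` times."""
    if worker_count < 1 or replication < 1:
        raise ValueError("worker_count and replication must be at least 1")
    # Raw capacity is shared among all replicas of each file.
    return worker_count * quota_gib_per_worker / replication


if __name__ == "__main__":
    # Values taken from the deployment example: 2 workers, 10Gi quota each.
    print(usable_capacity_gib(2, 10, replication=1))
    print(usable_capacity_gib(2, 10, replication=2))
```

Under these assumptions, doubling the replication factor halves the amount of unique data the write cache can hold.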

**Multipart Upload**

Improve write performance by buffering data in client memory and uploading it in the background. Note that property names use the internal `cache.only` term.

```yaml
# In your AlluxioCluster spec:
properties:
  "alluxio.gemini.user.file.cache.only.multipart.upload.enabled": "true"
  "alluxio.gemini.user.file.cache.only.multipart.upload.threads": "16"
  "alluxio.gemini.user.file.cache.only.multipart.upload.buffer.number": "16"
```

**Note:** This feature significantly increases client memory usage.

### Cache Eviction

Files in the Cluster-Level Write Cache are not evicted automatically. You must manually delete them using `rm` or `alluxio fs rm` to free up space.
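
Since cleanup is manual, a periodic job can handle it. The following is a minimal sketch, assuming the write cache path is reachable through a local POSIX mount (for example, a FUSE mount) and that file modification times on that mount are meaningful. The mount location `/mnt/alluxio/write-cache`, the one-day threshold, and the function names are all hypothetical.

```python
import os
import time
from pathlib import Path


def find_stale_files(root, max_age_seconds, now=None):
    """Return files under root whose mtime is older than max_age_seconds."""
    now = time.time() if now is None else now
    stale = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = Path(dirpath) / name
            if now - path.stat().st_mtime > max_age_seconds:
                stale.append(path)
    return sorted(stale)


def delete_files(paths, dry_run=True):
    """Delete the given files; with dry_run=True only report them."""
    for path in paths:
        if dry_run:
            print(f"would delete {path}")
        else:
            path.unlink()


if __name__ == "__main__":
    # Hypothetical mount location of the write cache path.
    cache_root = "/mnt/alluxio/write-cache"
    if os.path.isdir(cache_root):
        # Report files older than one day; pass dry_run=False to remove them.
        delete_files(find_stale_files(cache_root, 24 * 3600), dry_run=True)
    else:
        print(f"mount not found: {cache_root}")
```

The dry-run default is deliberate: if async persistence is not configured for a path, deleting a file from the cache removes the only copy.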


