# Writing Temporary Files

{% hint style="warning" %}
This feature is experimental.
{% endhint %}

In some scenarios, the performance and bandwidth of the underlying file system (UFS) cannot meet the needs of large-scale data writes. To address this, Alluxio offers an option to write data only to the Alluxio cluster. Because these writes never touch the UFS, write performance and bandwidth depend entirely on the Alluxio cluster itself. This feature is called CACHE\_ONLY.

The recommended use cases for CACHE\_ONLY include:

* Checkpoint files saved temporarily during AI training
* Shuffle files generated during big data computations

In these use cases, the files are temporary and are not meant to be persisted to storage for long-term use.

## Enabling CACHE\_ONLY

To use CACHE\_ONLY, the CACHE\_ONLY storage component must be deployed separately. The Alluxio client communicates directly with CACHE\_ONLY storage and does not go through the Alluxio workers. CACHE\_ONLY storage manages its own data and metadata independently, so files in CACHE\_ONLY storage cannot interact with the other files served by the Alluxio workers.

<figure><img src="/files/G5sc5B0jwuQGcZixrn6c" alt=""><figcaption></figcaption></figure>

### Deploying CACHE\_ONLY storage on Kubernetes

The deployment of CACHE\_ONLY storage is integrated into the Alluxio operator. Enable it by populating the `cacheOnly` field in the Alluxio deployment file.

```yaml
apiVersion: k8s-operator.alluxio.com/v1
kind: AlluxioCluster
metadata:
  name: alluxio
spec:
  image: <PRIVATE_REGISTRY>/alluxio-enterprise
  imageTag: <TAG>
  properties:

  worker:
    count: 2

  pagestore:
    size: 100Gi

  cacheOnly:
    enabled: true
    mountPath: "/cache-only"
    image: <PRIVATE_REGISTRY>/alluxio-cacheonly
    imageTag: <TAG>
    imagePullPolicy: IfNotPresent

    # Replace with the base64-encoded license generated by
    # cat /path/to/license.json | base64 | tr -d "\n"
    license:

    properties:

    journal:
      storageClass: "gp2"

    worker:
      count: 2
    tieredstore:
      levels:
        - level: 0
          alias: SSD
          mediumtype: SSD
          path: /data1/cacheonly/worker
          type: hostPath
          quota: 10Gi
```

**Note:** CACHE\_ONLY workers store CACHE\_ONLY data on local disk, as configured under `cacheOnly.tieredstore`. This disk space is completely separate from the Alluxio worker cache, so estimate the required capacity and reserve disk space accordingly.
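Before deploying, it can help to confirm that each node hosting a CACHE\_ONLY worker has enough free space for the configured `quota`. The following is a minimal sketch, not an Alluxio tool: `check_cacheonly_disk` is a hypothetical helper name, and it assumes the directory you pass already exists on the node (for example, the mount point of the disk that will hold `/data1/cacheonly/worker`).

```shell
# Hypothetical helper (not part of Alluxio): checks that the file system
# holding DIR has at least QUOTA_GIB GiB available.
# Usage: check_cacheonly_disk DIR QUOTA_GIB
check_cacheonly_disk() {
  local dir="$1" quota_gib="$2"
  local avail_kib
  # -P forces single-line POSIX output and -k reports 1024-byte blocks;
  # field 4 of the data line is the "Available" column.
  avail_kib=$(df -P -k "$dir" 2>/dev/null | awk 'NR == 2 { print $4 }')
  if [ -z "$avail_kib" ]; then
    echo "cannot read free space for $dir" >&2
    return 1
  fi
  local need_kib=$((quota_gib * 1024 * 1024))
  local avail_gib=$((avail_kib / 1024 / 1024))
  if [ "$avail_kib" -ge "$need_kib" ]; then
    echo "OK: $dir has ${avail_gib} GiB free (quota ${quota_gib} GiB)"
  else
    echo "INSUFFICIENT: $dir has ${avail_gib} GiB free (quota ${quota_gib} GiB)" >&2
    return 1
  fi
}
```

On a node, `check_cacheonly_disk /data1 10` reports whether the file system holding `/data1` can accommodate the `10Gi` quota from the example above.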

### Configuring Resource Usage

Configure `cacheOnly.master.resources` and `cacheOnly.worker.resources` the same way as the resources of the `coordinator` and `worker` components. JVM options for each component are set with `jvmOptions`.

```yaml
apiVersion: k8s-operator.alluxio.com/v1
kind: AlluxioCluster
metadata:
  name: alluxio
spec:
  cacheOnly:
    enabled: true
    master:
      count: 1
      resources:
        limits:
          cpu: "8"
          memory: "40Gi"
        requests:
          cpu: "8"
          memory: "40Gi"
      jvmOptions:
        - "-Xmx24g"
        - "-Xms24g"
        - "-XX:MaxDirectMemorySize=8g"
    worker:
      count: 2
      resources:
        limits:
          cpu: "8"
          memory: "20Gi"
        requests:
          cpu: "8"
          memory: "20Gi"
      jvmOptions:
        - "-Xmx8g"
        - "-Xms8g"
        - "-XX:MaxDirectMemorySize=8g"
```

We recommend setting `requests` equal to `limits` and sizing memory so that:

```
(${Xmx} + ${MaxDirectMemorySize}) * 1.1 <= ${requests} = ${limit}
```
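As a quick sanity check, the inequality can be evaluated with integer shell arithmetic by multiplying both sides by 10. This is a minimal sketch: `check_jvm_memory` is a hypothetical helper, it takes whole GiB values (the JVM `g` suffix and the Kubernetes `Gi` suffix are both binary gigabytes), and it assumes `requests` and `limits` are set to the same value, as the formula requires.

```shell
# Hypothetical helper: checks (Xmx + MaxDirectMemorySize) * 1.1 <= requests.
# Both sides are multiplied by 10 so only integer arithmetic is needed.
# Usage: check_jvm_memory XMX_GIB MAX_DIRECT_GIB REQUEST_GIB
check_jvm_memory() {
  local xmx_gib="$1" direct_gib="$2" request_gib="$3"
  local need_x10=$(( (xmx_gib + direct_gib) * 11 ))
  local have_x10=$(( request_gib * 10 ))
  # Print the required amount with one decimal place, e.g. 352 -> 35.2
  local need="$(( need_x10 / 10 )).$(( need_x10 % 10 ))"
  if [ "$need_x10" -le "$have_x10" ]; then
    echo "OK: needs ${need} GiB, requests ${request_gib} GiB"
  else
    echo "TOO SMALL: needs ${need} GiB, requests ${request_gib} GiB" >&2
    return 1
  fi
}
```

With the values from the example above, `check_jvm_memory 24 8 40` (master) and `check_jvm_memory 8 8 20` (worker) both pass, since 35.2 GiB and 17.6 GiB fit within 40 GiB and 20 GiB respectively.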

### Accessing CACHE\_ONLY

Once CACHE\_ONLY storage is deployed, all requests under its mount point (the path set in `cacheOnly.mountPath`) are treated as CACHE\_ONLY requests. You can access CACHE\_ONLY data in several ways.

Access using the Alluxio CLI:

```shell
bin/alluxio fs ls /cache-only
```

Access using the Alluxio FUSE interface:

```shell
cd ${fuse_mount_path}/cache-only
echo '123' > test.txt
cat test.txt
```
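Because the FUSE mount behaves like a local directory, ordinary tools can write checkpoints to CACHE\_ONLY storage. The sketch below copies a file into a directory under the FUSE mount and then reads it back with `cmp` to confirm the copy is intact. `save_checkpoint` and the paths in the usage line are hypothetical; only standard POSIX utilities are used.

```shell
# Hypothetical helper: copy SRC into DEST_DIR (e.g. a directory under the
# CACHE_ONLY FUSE mount) and verify the copy byte for byte.
# Usage: save_checkpoint SRC DEST_DIR
save_checkpoint() {
  local src="$1" dest_dir="$2"
  local dest="$dest_dir/$(basename "$src")"
  mkdir -p "$dest_dir" || return 1
  cp "$src" "$dest" || return 1
  # cmp -s is silent and exits 0 only if both files are identical.
  if ! cmp -s "$src" "$dest"; then
    echo "verification failed: $dest" >&2
    return 1
  fi
}
```

For example, `save_checkpoint ./step_1000.ckpt "${fuse_mount_path}/cache-only/checkpoints"` writes the checkpoint through FUSE and returns a non-zero status if the read-back does not match.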

## Advanced Configurations

### Enabling Multi-Replica

CACHE\_ONLY supports writing multiple replicas of each file. To enable this feature, set `alluxio.gemini.user.file.replication` under `spec.properties` in the deployment file to the desired number of replicas:

```yaml
apiVersion: k8s-operator.alluxio.com/v1
kind: AlluxioCluster
metadata:
  name: alluxio
spec:
  properties:
    "alluxio.gemini.user.file.replication": "2"
```
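Assuming each replica is a full copy of the file, replication multiplies the disk space consumed on CACHE\_ONLY workers. For rough capacity planning, the sketch below estimates per-worker disk usage. It additionally assumes that data is spread evenly across CACHE\_ONLY workers; neither assumption is stated on this page, so treat the result as an approximation. `estimate_worker_disk_gib` is a hypothetical helper.

```shell
# Hypothetical helper: rough per-worker disk estimate for CACHE_ONLY data.
# ASSUMES every replica is a full copy and data is spread evenly across workers.
# Usage: estimate_worker_disk_gib DATA_GIB REPLICATION WORKERS
estimate_worker_disk_gib() {
  local data_gib="$1" replication="$2" workers="$3"
  local total_gib=$(( data_gib * replication ))
  # Integer ceiling division: round up to whole GiB per worker.
  echo $(( (total_gib + workers - 1) / workers ))
}
```

For 15 GiB of checkpoints with a replication factor of 2 across 4 workers, `estimate_worker_disk_gib 15 2 4` prints `8`, which can be compared with the per-worker `quota` under `cacheOnly.tieredstore`.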

### Enabling Multipart Upload

To improve write performance, the Alluxio client can buffer data in memory and upload it to the CACHE\_ONLY cluster in the background using multipart uploads. To enable this feature, add the following properties:

```yaml
apiVersion: k8s-operator.alluxio.com/v1
kind: AlluxioCluster
metadata:
  name: alluxio
spec:
  properties:
    "alluxio.gemini.user.file.cache.only.multipart.upload.enabled": "true"
    "alluxio.gemini.user.file.cache.only.multipart.upload.threads": "16"
    "alluxio.gemini.user.file.cache.only.multipart.upload.buffer.number": "16"
```

| Configuration Item                                                 | Default | Description                                    |
| ------------------------------------------------------------------ | ------- | ---------------------------------------------- |
| alluxio.gemini.user.file.cache.only.multipart.upload.enabled       | false   | Enables the multipart upload feature           |
| alluxio.gemini.user.file.cache.only.multipart.upload.threads       | 16      | Maximum number of threads for multipart upload |
| alluxio.gemini.user.file.cache.only.multipart.upload.buffer.number | 16      | Number of memory buffers for multipart upload  |

**Note:** Enabling multipart upload significantly increases the memory usage of the Alluxio client. The memory used by the upload buffers is:

```
${alluxio.gemini.user.file.cache.only.multipart.upload.buffer.number} * 64MB
```
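The formula can be turned into a small check when choosing `buffer.number`. The sketch below is hypothetical (`multipart_buffer_mem_mb` is not an Alluxio command): it prints the buffer memory in MB and, if given a memory budget, fails when the buffers would exceed it. The budget is whatever headroom you decide to leave for the client; this page does not specify one.

```shell
# Hypothetical helper: memory used by multipart upload buffers, following
# buffer.number * 64MB. Optionally checks the result against BUDGET_MB.
# Usage: multipart_buffer_mem_mb BUFFER_NUMBER [BUDGET_MB]
multipart_buffer_mem_mb() {
  local buffers="$1" budget_mb="${2:-}"
  local mem_mb=$(( buffers * 64 ))
  echo "$mem_mb"
  if [ -n "$budget_mb" ] && [ "$mem_mb" -gt "$budget_mb" ]; then
    echo "buffers need ${mem_mb}MB, above budget ${budget_mb}MB" >&2
    return 1
  fi
}
```

With the default of 16 buffers, `multipart_buffer_mem_mb 16` prints `1024`, that is, 1024MB of client memory for the upload buffers.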

### Cache Eviction

Files stored in CACHE\_ONLY storage are not evicted automatically. When capacity is nearly full, free up space by deleting files manually, either through Alluxio FUSE with `rm ${file_path}` or with the Alluxio CLI command `bin/alluxio fs rm ${file_path}`.
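Since eviction is manual, a periodic cleanup job can remove stale temporary files. The sketch below deletes regular files under a directory (for example, a CACHE\_ONLY directory on the FUSE mount) whose modification time is older than a given number of days, with a dry-run mode to preview the deletions. `evict_older_than` is a hypothetical helper, the age-based policy is only an example, and it assumes the FUSE mount reports modification times for CACHE\_ONLY files.

```shell
# Hypothetical helper: remove files under DIR not modified for more than DAYS
# days. find's -mtime +N matches files whose age, in whole 24-hour periods
# (fractions discarded), is greater than N.
# Usage: evict_older_than DIR DAYS [--dry-run]
evict_older_than() {
  local dir="$1" days="$2" mode="${3:-}"
  if [ "$mode" = "--dry-run" ]; then
    # Only list what would be deleted.
    find "$dir" -type f -mtime +"$days" -print
  else
    # Print each matching path, then delete the matches in batches.
    find "$dir" -type f -mtime +"$days" -print -exec rm -f {} +
  fi
}
```

For example, run `evict_older_than "${fuse_mount_path}/cache-only/checkpoints" 7 --dry-run` first to review the list, then rerun without `--dry-run` to delete the files.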


