
# Cache Preloading

Distributed load lets users load data from the UFS into the Alluxio cluster efficiently. It can be used to warm up the cluster so that it serves cached data immediately when workloads start running on top of Alluxio. For example, distributed load can prefetch data for machine learning jobs, which speeds up training. Distributed load can use [file segmentation](/ee-ai-en/ai-3.3/feature/file-segmentation.md) and [multi-replication](/ee-ai-en/ai-3.3/feature/io-resiliency.md) to improve file distribution in scenarios with highly concurrent data access.

## Usage

There are two recommended ways to trigger distributed load:

### job load CLI

The `job load` command can be used to load data from UFS (Under File System) to the Alluxio cluster. The CLI sends a load request to the Alluxio coordinator, which subsequently distributes the load operation to all worker nodes.

```shell
bin/alluxio job load [flags] <path>

# Example output
Progress for loading path '/path':
        Settings:       bandwidth: unlimited    verify: false
        Job State: SUCCEEDED
        Files Processed: 1000
        Bytes Loaded: 125.00MB
        Throughput: 2509.80KB/s
        Block load failure rate: 0.00%
        Files Failed: 0
```

For detailed usage of the CLI, please refer to the [job load](/ee-ai-en/ai-3.3/reference/user-cli.md) documentation.
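
When scripting around `job load`, it can be useful to pull the job state out of the progress text. The following is a minimal sketch that assumes the output keeps the shape of the example output above. `extract_job_state` is a hypothetical helper, not part of the Alluxio CLI, and the sample text stands in for a live cluster.

```shell
# Hypothetical helper (not part of the Alluxio CLI): read progress text on
# stdin and print the value of the first "Job State:" line.
extract_job_state() {
  sed -n 's/^[[:space:]]*Job State:[[:space:]]*//p' | head -n 1
}

# Sample progress text copied from the example output; used instead of a
# live cluster so the sketch runs anywhere.
sample_output="Progress for loading path '/path':
        Settings:       bandwidth: unlimited    verify: false
        Job State: SUCCEEDED
        Files Processed: 1000"

state="$(printf '%s\n' "$sample_output" | extract_job_state)"
echo "$state"
```

In a real script, the progress output of the CLI would be piped into `extract_job_state` in place of the sample text.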

### REST API

Similar to the CLI, the REST API can also be used to load data. Requests are sent directly to the coordinator.

```shell
curl -H "Content-Type: application/json"  -v -X POST http://coordinator_host:19999/api/v1/master/submit_job/load -d '{
    "path": "s3://alluxiow/testm/dir-1/",
    "options": {
         "replicas":"2",
         "batchSize": "300",
         "partialListing": "true",
         "loadMetadataOnly": "true",
         "skipIfExists": "true"
    }
}'
```
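
For scripted submissions, the request body can be assembled from variables before it is passed to `curl`. The sketch below is illustrative only: `build_load_payload` is a hypothetical helper, it fills in just the `path` field and the `replicas` option from the example above, and it does no JSON escaping, so it assumes the path contains no quotes or backslashes.

```shell
# Hypothetical helper: build a JSON body for submit_job/load.
# $1: UFS path to load, $2: value for the "replicas" option.
# Assumption: neither value contains characters that need JSON escaping.
build_load_payload() {
  path="$1"
  replicas="$2"
  printf '{"path": "%s", "options": {"replicas": "%s"}}' "$path" "$replicas"
}

payload="$(build_load_payload "s3://alluxiow/testm/dir-1/" "2")"
echo "$payload"

# Submitting to a real coordinator (placeholder host) would then look like:
# curl -H "Content-Type: application/json" -X POST \
#   "http://coordinator_host:19999/api/v1/master/submit_job/load" -d "$payload"
```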

Progress can be checked by sending a GET request with the same path.

```shell
curl -H "Content-Type: application/json"  -v -X GET http://coordinator_host:19999/api/v1/master/progress_job/load -d '{
  "path or indexFile": "s3://bucket/dir-1/",
  "format": "TEXT[default] | JSON",
  "verbose": "true"
}'
```
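
A script that waits for a load to finish might poll the progress endpoint until the job stops changing. The sketch below shows one way to structure such a loop. It makes several assumptions: the helper names are hypothetical, the state names are taken from the `job-state` filter of the `list_job` endpoint, `SUCCEEDED`, `FAILED`, and `STOPPED` are treated as final, and fetching the current state is left to a caller-supplied command because the response format of `progress_job/load` is not shown on this page.

```shell
# Hypothetical helper: return 0 if the given state is treated as final.
# Assumption: SUCCEEDED, FAILED and STOPPED are final; anything else is not.
is_terminal_state() {
  case "$1" in
    SUCCEEDED|FAILED|STOPPED) return 0 ;;
    *) return 1 ;;
  esac
}

# Poll until the job reaches a final state or the poll budget runs out.
# $1: command that prints the current job state
# $2: seconds to sleep between polls
# $3: maximum number of polls
wait_for_load() {
  fetch_cmd="$1"
  interval="$2"
  max_polls="$3"
  polls=0
  while [ "$polls" -lt "$max_polls" ]; do
    current="$($fetch_cmd)"
    if is_terminal_state "$current"; then
      echo "$current"
      return 0
    fi
    polls=$((polls + 1))
    sleep "$interval"
  done
  return 1
}

# Stand-in for a real status fetch (for example, curl plus response parsing).
fake_fetch() { echo "SUCCEEDED"; }
wait_for_load fake_fetch 0 3
```

Bounding the number of polls keeps the script from hanging indefinitely if the job never leaves a non-final state.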

The load operation can be terminated by sending a POST request.

```shell
curl -H "Content-Type: application/json"  -v -X POST http://coordinator_host:19999/api/v1/master/stop_job/load -d '{
  "path or indexFile": "s3://alluxiow/testm/dir-1/"
}'
```

Load jobs can be listed by sending a GET request. The `job-type` and `job-state` query parameters are optional filters.

```shell
curl "http://coordinator_host:19999/api/v1/master/list_job?[job-type=LOAD[&job-state=[RUNNING|VERIFYING|STOPPED|SUCCEEDED|FAILED|ALL]]]"
```
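
Because the filters are optional, a concrete request URL depends on which ones are supplied. The following sketch assembles the URL from the pieces shown above. `build_list_url` is a hypothetical helper, `coordinator_host:19999` is the same placeholder used in the earlier examples, and the state filter is only appended when a job type is given, mirroring the nesting in the syntax line.

```shell
# Hypothetical helper: assemble a list_job URL from optional filters.
# $1: base URL, $2: job type (may be empty), $3: job state (may be empty).
build_list_url() {
  base="$1"
  job_type="$2"
  job_state="$3"
  url="$base/api/v1/master/list_job"
  if [ -n "$job_type" ]; then
    url="$url?job-type=$job_type"
    # The state filter is only added together with a job type.
    if [ -n "$job_state" ]; then
      url="$url&job-state=$job_state"
    fi
  fi
  echo "$url"
}

list_url="$(build_list_url "http://coordinator_host:19999" "LOAD" "RUNNING")"
echo "$list_url"

# Against a real coordinator, quote the URL so the shell does not
# interpret "?" and "&":
# curl "$list_url"
```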
