# Cache Preloading

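Distributed load lets users efficiently load data from the UFS into an Alluxio cluster. It can be used to initialize a cluster so that cached data is available immediately when workloads run on Alluxio. For example, distributed load can prefetch data for a machine learning job to speed up training. Distributed load can also take advantage of [file segmentation](/ee-ai-cn/ai-3.4/feature/file-segmentation.md) and [multiple replication](/ee-ai-cn/ai-3.4/feature/file-replication.md) to improve file distribution in highly concurrent data access scenarios.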

## Usage

There are two recommended ways to trigger a distributed load:

### Job Load CLI

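The `job load` command loads data from the UFS (under file system) into the Alluxio cluster. The CLI sends a load request to the Alluxio coordinator, which then distributes the load operation across all worker nodes.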

```shell
bin/alluxio job load [flags] <path>

# Example output
Progress for loading path '/path':
        Settings:       bandwidth: unlimited    verify: false
        Job State: SUCCEEDED
        Files Processed: 1000
        Bytes Loaded: 125.00MB
        Throughput: 2509.80KB/s
        Block load failure rate: 0.00%
        Files Failed: 0
```

For detailed CLI usage, see the [job load](/ee-ai-cn/ai-3.4/reference/user-cli.md) documentation.
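
When the command runs inside a script, the textual report can be reduced to a pass/fail signal. The following is a minimal sketch, assuming the report keeps the `Job State:` and `Files Failed:` lines shown in the example output; the helper name `load_succeeded` is hypothetical and not part of the Alluxio CLI.

```shell
# load_succeeded: read a `job load` progress report on stdin and return 0
# only if the job state is SUCCEEDED and no files failed.
# Assumes the report layout from the example output; not an Alluxio command.
load_succeeded() {
  report="$(cat)"
  # Pull the value that follows each label, ignoring leading indentation.
  state="$(printf '%s\n' "$report" | sed -n 's/^[[:space:]]*Job State:[[:space:]]*//p' | head -n 1)"
  failed="$(printf '%s\n' "$report" | sed -n 's/^[[:space:]]*Files Failed:[[:space:]]*//p' | head -n 1)"
  [ "$state" = "SUCCEEDED" ] && [ "$failed" = "0" ]
}
```

Piping the report into `load_succeeded` yields an exit status that a calling script can branch on, for example to delay a training job until the data is cached.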

### REST API

Like the CLI, the REST API can also be used to load data. Requests are sent directly to the coordinator.

```shell
curl -H "Content-Type: application/json"  -v -X POST http://coordinator_host:19999/api/v1/master/submit_job/load -d '{
    "path": "s3://alluxiow/testm/dir-1/",
    "options": {
         "replicas":"2",
         "batchSize": "300",
         "partialListing": "true",
         "loadMetadataOnly": "true",
         "skipIfExists": "true"
    }
}'
```

To check progress, send a GET request with the same path:

```shell
curl -H "Content-Type: application/json"  -v -X GET http://coordinator_host:19999/api/v1/master/progress_job/load -d '{
  "path or indexFile": "s3://bucket/dir-1/",
  "format": "TEXT[default] | JSON",
  "verbose": "true"
}'
```
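
Because a load runs asynchronously, a script typically polls the progress endpoint until the job finishes. The following is a rough sketch under assumptions this page does not confirm: that the request key is `path`, that the `TEXT` response contains a `Job State:` line like the CLI output, and that `SUCCEEDED`, `FAILED`, and `STOPPED` are the terminal states (names taken from the job-state filter of the job list endpoint). `fetch_progress` and `wait_for_load` are hypothetical helper names.

```shell
# fetch_progress PATH: print the progress report for PATH.
# Mirrors the curl call above; coordinator_host is a placeholder.
fetch_progress() {
  curl -s -H "Content-Type: application/json" -X GET \
    "http://coordinator_host:19999/api/v1/master/progress_job/load" \
    -d "{\"path\": \"$1\", \"format\": \"TEXT\"}"
}

# wait_for_load PATH [INTERVAL_SECONDS] [MAX_ATTEMPTS]
# Returns 0 on SUCCEEDED, 1 on FAILED or STOPPED, 2 if attempts run out.
wait_for_load() {
  attempts=0
  while [ "$attempts" -lt "${3:-120}" ]; do
    # Assumed report layout: a line of the form "Job State: <STATE>".
    state="$(fetch_progress "$1" | sed -n 's/^[[:space:]]*Job State:[[:space:]]*//p' | head -n 1)"
    case "$state" in
      SUCCEEDED) return 0 ;;
      FAILED|STOPPED) return 1 ;;
    esac
    attempts=$((attempts + 1))
    sleep "${2:-5}"
  done
  return 2
}
```

Calling `wait_for_load "s3://bucket/dir-1/" 10` would poll every 10 seconds. The attempt cap keeps the loop from running indefinitely if the response format differs from what is assumed here.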

To stop a load job, send a POST request with the same path:

```shell
curl -H "Content-Type: application/json"  -v -X POST http://coordinator_host:19999/api/v1/master/stop_job/load -d '{
  "path || indexFile": "s3://alluxiow/testm/dir-1/"
}'
```

To list load jobs:

```shell
curl http://coordinator_host:19999/api/v1/master/list_job?[job-type=LOAD[&job-state=[RUNNING|VERIFYING|STOPPED|SUCCEEDED|FAILED|ALL]]]
```
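
The square brackets in the command above mark optional query parameters and are not typed literally. The sketch below builds a concrete URL from them; `list_load_jobs_url` is a hypothetical helper, and passing `ALL` when no state is given is a choice made for this example rather than documented server behavior.

```shell
# list_load_jobs_url BASE_URL [STATE]: print the list_job URL for load jobs.
# STATE is one of RUNNING, VERIFYING, STOPPED, SUCCEEDED, FAILED, ALL.
# Hypothetical helper; it only assembles the query string shown above.
list_load_jobs_url() {
  printf '%s/api/v1/master/list_job?job-type=LOAD&job-state=%s\n' "$1" "${2:-ALL}"
}
```

A call such as `curl "$(list_load_jobs_url "$COORDINATOR_URL" RUNNING)"` would then list running load jobs, where `COORDINATOR_URL` is an assumed variable holding the coordinator's base URL. The double quotes matter: an unquoted `&` makes the shell run `curl` in the background and drop the rest of the query string, and an unquoted `?` can be expanded as a glob pattern.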

