Cache Preloading
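Distributed load lets users efficiently load data from the UFS into an Alluxio cluster. It can be used to warm up an Alluxio cluster so that cached data is available as soon as workloads start running on Alluxio. For example, distributed load can prefetch the data for a machine learning job, which speeds up training.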
Usage
There are two recommended ways to trigger a distributed load:
Job Load CLI
The job load command loads data from the UFS (under file system) into the Alluxio cluster. The CLI sends a load request to the Alluxio coordinator, which then distributes the load operation to all worker nodes.
bin/alluxio job load [flags] <path>
# Example output
Progress for loading path '/path':
Settings: bandwidth: unlimited verify: false
Job State: SUCCEEDED
Files Processed: 1000
Bytes Loaded: 125.00MB
Throughput: 2509.80KB/s
Block load failure rate: 0.00%
Files Failed: 0
For detailed CLI usage, see the job load documentation.
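The CLI prints a plain-text report like the sample output above, and a script that triggers a load often needs to react to it, for example by failing when Files Failed is non-zero. The following is a minimal POSIX shell sketch, assuming the report consists of "Field: value" lines as in the sample; report_field is a hypothetical helper, not part of Alluxio.

```shell
#!/bin/sh
# Minimal sketch: read a job load progress report on stdin and print the value
# of one field. Field names such as "Job State" and "Files Failed" come from
# the sample output; other Alluxio versions may print different fields.
#
# report_field FIELD   (FIELD must appear at the start of a line, before ':')
report_field() {
  awk -v f="$1" '
    {
      line = $0
      sub(/^[ \t]+/, "", line)            # ignore leading indentation
      if (index(line, f ":") == 1) {      # line starts with "FIELD:"
        value = substr(line, length(f) + 2)
        sub(/^[ \t]+/, "", value)         # trim whitespace after the colon
        print value
        exit
      }
    }'
}
```

For instance, with the sample report saved to a file named load.log (a hypothetical name), report_field "Job State" < load.log prints SUCCEEDED and report_field "Files Failed" < load.log prints 0.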
REST API
Like the CLI, the REST API can be used to load data. A request can be sent to any worker node; the worker forwards it to the Alluxio coordinator, which distributes the load to all other worker nodes.
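To submit a job, send a POST request to the submit_job/load endpoint. The JSON body must contain the directory path to load and can carry an options object: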
curl -v -H "Content-Type: application/json" \
http://coordinator_host:19999/api/v1/master/submit_job/load \
-d '{ "path": "s3://bucket/ufs/",
"options": {
"replicas":"1",
"batchSize": "200",
"partialListing": "true",
"loadMetadataOnly": "false",
"skipIfExists": "false"
}
}'
Example request and response:
curl -v -H 'Content-Type: application/json' \
http://coordinator_host:19999/api/v1/master/submit_job/load \
-d '{ "path": "s3://bucket/ufs/",
"options": {
"replicas":"1",
"batchSize": "200",
"partialListing": "true",
"loadMetadataOnly": "false",
"skipIfExists": "false"
}
}'
* Host coordinator_host:19999 was resolved.
* IPv6: (none)
* IPv4: 52.26.153.198, 35.162.171.204, 44.236.69.116, 35.80.181.135
* Trying 52.26.153.198:19999...
* Connected to coordinator_host (52.26.153.198) port 19999
> POST /api/v1/master/submit_job/load HTTP/1.1
> Host: coordinator_host:19999
> User-Agent: curl/8.7.1
> Accept: */*
> Content-Type: application/json
> Content-Length: 229
>
* upload completely sent off: 229 bytes
< HTTP/1.1 200 OK
< Date: Mon, 27 Jan 2025 08:08:04 GMT
< Content-Type: application/json
< Content-Length: 4
< Server: Jetty(9.4.53.v20231009)
<
* Connection #0 to host coordinator_host left intact
true
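Because the endpoint answers with a bare true on success, a submission is easy to wrap for use in scripts. The sketch below is a hypothetical helper (submit_load is not an Alluxio command) that reuses the option values from the example request. It assumes that any body other than true means the request was not accepted, and that the path needs no JSON escaping; for arbitrary paths, building the body with a JSON tool such as jq would be safer.

```shell
#!/bin/sh
# Minimal sketch of a wrapper around the submit_job/load endpoint.
# Assumptions: COORDINATOR is host:port (e.g. coordinator_host:19999), the
# path contains no '"' or '\', and a body of "true" means the job was accepted.
#
# submit_load COORDINATOR UFS_PATH
submit_load() {
  local coordinator="$1" ufs_path="$2" payload body
  # Same option values as the example request above.
  payload=$(printf '{ "path": "%s", "options": { "replicas": "1", "batchSize": "200", "partialListing": "true", "loadMetadataOnly": "false", "skipIfExists": "false" } }' "$ufs_path")
  # -f: fail on HTTP errors, -sS: no progress meter but still report errors.
  body=$(curl -fsS -H 'Content-Type: application/json' \
    "http://${coordinator}/api/v1/master/submit_job/load" -d "$payload") || return 1
  printf '%s\n' "$body"
  [ "$body" = "true" ]
}
```

A call such as submit_load coordinator_host:19999 s3://bucket/ufs/ prints the response body and returns a non-zero status when the coordinator does not answer true, so it can be used directly in an if statement or with set -e.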
To check progress, send a GET request with the same path to the progress_job/load endpoint:
curl -v -H 'Content-Type: application/json' -X GET http://coordinator_host:19999/api/v1/master/progress_job/load -d '{ "path": "s3://bucket/ufs/", "verbose": "true", "format": "TEXT" }'
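Submitting a job returns immediately, so a pipeline step that must wait for the cache to be warm has to poll this endpoint. With "format": "TEXT", the endpoint returns the report as a JSON string containing \t and \n escapes. The sketch below uses hypothetical helpers (decode_progress, job_state, wait_for_load) to decode that string and read the Job State line. It assumes that RUNNING and VERIFYING are the only non-final states and that the text contains only simple escapes; neither is confirmed on this page, and jq -r . would decode the JSON string more robustly.

```shell
#!/bin/sh
# Minimal sketch, not an official client. Assumptions: the response body is a
# JSON string whose only escapes are simple ones such as \t and \n, and every
# Job State other than RUNNING or VERIFYING is final.

# decode_progress: turn the quoted, escaped response body on stdin into plain text.
decode_progress() {
  local raw
  raw=$(cat)
  raw=${raw#\"}          # drop the opening quote
  raw=${raw%\"}          # drop the closing quote
  printf '%b\n' "$raw"   # expand \t and \n escapes
}

# job_state: print the value of the first "Job State:" line of plain text on stdin.
job_state() {
  sed -n 's/^[[:space:]]*Job State:[[:space:]]*\([A-Z_]*\).*/\1/p' | head -n 1
}

# wait_for_load COORDINATOR UFS_PATH [INTERVAL_SECONDS]
# Polls progress_job/load, prints the final state, returns 0 only on SUCCEEDED.
wait_for_load() {
  local coordinator="$1" ufs_path="$2" interval="${3:-10}" body state
  while :; do
    body=$(curl -fsS -X GET -H 'Content-Type: application/json' \
      "http://${coordinator}/api/v1/master/progress_job/load" \
      -d "{ \"path\": \"${ufs_path}\", \"verbose\": \"true\", \"format\": \"TEXT\" }") || return 1
    state=$(printf '%s' "$body" | decode_progress | job_state)
    case "$state" in
      RUNNING|VERIFYING) sleep "$interval" ;;
      "") echo "wait_for_load: no Job State in response" >&2; return 1 ;;
      *) printf '%s\n' "$state"; [ "$state" = "SUCCEEDED" ]; return ;;
    esac
  done
}
```

Stopping on an unparseable response, rather than retrying forever, is a deliberate choice: if the report format changes between versions, the script fails loudly instead of hanging.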
Example request and response:
curl -v -H 'Content-Type: application/json' -X GET http://coordinator_host:19999/api/v1/master/progress_job/load \
-d '{ "path": "s3://bucket/ufs/", "verbose": "true", "format": "TEXT" }'
* Host coordinator_host:19999 was resolved.
* IPv6: (none)
* IPv4: 52.26.153.198, 44.236.69.116, 35.162.171.204, 35.80.181.135
* Trying 52.26.153.198:19999...
* Connected to coordinator_host (52.26.153.198) port 19999
> GET /api/v1/master/progress_job/load HTTP/1.1
> Host: coordinator_host:19999
> User-Agent: curl/8.7.1
> Accept: */*
> Content-Type: application/json
> Content-Length: 73
>
* upload completely sent off: 73 bytes
< HTTP/1.1 200 OK
< Date: Mon, 27 Jan 2025 18:20:38 GMT
< Content-Type: application/json
< Content-Length: 462
< Server: Jetty(9.4.53.v20231009)
<
* Connection #0 to host coordinator_host left intact
"\tSettings:\tbandwidth: unlimited\tverify: false\tmetadata-only: false \tquota-check:false\n\tTime Elapsed: 00:00:08\n\tJob State: SUCCEEDED\n\tInodes Scanned: 62\n\tNon Empty File Copies Loaded: 54\n\tBytes Loaded: 2168.14MB\n\tThroughput: 271.02MB/s\n\tFile Failure rate: 0.00%\n\tSubtask Failure rate: 0.00%\n\tFiles Failed: 0\n\tRecent failed subtasks: \n\tRecent retrying subtasks: \n\tSubtask Retry rate: 0.00%\n\tSubtasks on Retry Dead Letter Queue: 0\
To stop a load, send a POST request with the same path to the stop_job/load endpoint:
curl -v -H 'Content-Type: application/json' http://coordinator_host:19999/api/v1/master/stop_job/load \
-d '{ "path": "s3://bucket/ufs/" }'
Example request and response:
curl -v -H 'Content-Type: application/json' http://coordinator_host:19999/api/v1/master/stop_job/load \
-d '{ "path": "s3://bucket/ufs/" }'
* Host coordinator_host:19999 was resolved.
* IPv6: (none)
* IPv4: 35.80.181.135, 44.236.69.116, 52.26.153.198, 35.162.171.204
* Trying 35.80.181.135:19999...
* Connected to coordinator_host (35.80.181.135) port 19999
> POST /api/v1/master/stop_job/load HTTP/1.1
> Host: coordinator_host:19999
> User-Agent: curl/8.7.1
> Accept: */*
> Content-Type: application/json
> Content-Length: 35
>
* upload completely sent off: 35 bytes
< HTTP/1.1 200 OK
< Date: Mon, 27 Jan 2025 18:45:02 GMT
< Content-Type: application/json
< Content-Length: 5
< Server: Jetty(9.4.53.v20231009)
<
* Connection #0 to host coordinator_host left intact
true
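A long-running preload can be cancelled from a script in the same way. The sketch below wraps the stop request in a hypothetical stop_load helper, assuming that a true body means the stop request was accepted and that the path needs no JSON escaping. The trap line shows how a wrapper script could stop the job when it is itself interrupted; COORDINATOR and LOAD_PATH there are placeholder variables.

```shell
#!/bin/sh
# Minimal sketch: stop a load job via the stop_job/load endpoint.
# Assumptions: a body of "true" means the stop request was accepted, and the
# path needs no JSON escaping.
#
# stop_load COORDINATOR UFS_PATH
stop_load() {
  local coordinator="$1" ufs_path="$2" body
  body=$(curl -fsS -H 'Content-Type: application/json' \
    "http://${coordinator}/api/v1/master/stop_job/load" \
    -d "{ \"path\": \"${ufs_path}\" }") || return 1
  [ "$body" = "true" ]
}

# Hypothetical usage in a wrapper script that waits for a load to finish:
# if the wrapper is interrupted or terminated, stop the job as well.
#   trap 'stop_load "$COORDINATOR" "$LOAD_PATH"; exit 1' INT TERM
```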