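Distributed load lets users efficiently load data from the UFS into an Alluxio cluster. It can be used to warm up an Alluxio cluster so that cached data is available as soon as workloads start running on Alluxio. For example, distributed load can prefetch data for machine learning jobs and thereby speed up training. Distributed load can also use file splitting and multiple replicas to improve file distribution in scenarios with highly concurrent data access.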
bin/alluxio job load [flags] <path>
# Example output
Progress for loading path '/path':
Settings: bandwidth: unlimited verify: false
Job State: SUCCEEDED
Files Processed: 1000
Bytes Loaded: 125.00MB
Throughput: 2509.80KB/s
Block load failure rate: 0.00%
Files Failed: 0
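When a script needs to act on the result, the job state can be pulled out of this text report. The following is a minimal sketch, assuming the report keeps the `Job State: <STATE>` line format shown in the example output; `job_state` is a hypothetical helper name and not part of the Alluxio CLI.

```shell
# Hypothetical helper: reads a progress report on stdin and prints the job state.
# Assumes a line of the form "Job State: <STATE>", as in the example output.
job_state() {
  sed -n 's/^[[:space:]]*Job State:[[:space:]]*//p'
}

# Sample report copied from the example output (shortened).
report="Progress for loading path '/path':
Settings: bandwidth: unlimited verify: false
Job State: SUCCEEDED
Files Processed: 1000"

printf '%s\n' "$report" | job_state
```

In practice the report would come from the progress query rather than a variable, and a wrapper could loop until the printed state is no longer RUNNING.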
curl -H "Content-Type: application/json" -v -X POST http://coordinator_host:19999/api/v1/master/submit_job/load -d '{
"path": "s3://alluxiow/testm/dir-1/",
"options": {
"replicas":"2",
"batchSize": "300",
"partialListing": "true",
"loadMetadataOnly": "true",
"skipIfExists": "true"
}
}'
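For scripted submissions, the request body can be generated instead of typed by hand. This is a sketch only: `build_load_payload` and `submit_load_job` are hypothetical helper names, the endpoint and the `replicas` option are taken from the request above, and the remaining options are left out for brevity.

```shell
# Hypothetical helper: prints the JSON body for a load job.
# $1 = UFS path, $2 = replica count. No JSON escaping is done, so the
# path must not contain double quotes or backslashes.
build_load_payload() {
  printf '{"path": "%s", "options": {"replicas": "%s"}}\n' "$1" "$2"
}

# Hypothetical wrapper around the submit endpoint shown above.
# $1 = coordinator host, $2 = UFS path, $3 = replica count.
submit_load_job() {
  curl -sS -H "Content-Type: application/json" -X POST \
    "http://$1:19999/api/v1/master/submit_job/load" \
    -d "$(build_load_payload "$2" "$3")"
}

# Print the payload without sending it.
build_load_payload "s3://bucket/dir-1/" 2
```

Calling `submit_load_job coordinator_host "s3://bucket/dir-1/" 2` would send the same body to the coordinator; the host and bucket names here are placeholders.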
# Use either "path" or "indexFile" as the key; "format" accepts TEXT (default) or JSON.
curl -H "Content-Type: application/json" -v -X GET http://coordinator_host:19999/api/v1/master/progress_job/load -d '{
"path": "s3://bucket/dir-1/",
"format": "TEXT",
"verbose": "true"
}'
# Use either "path" or "indexFile" as the key.
curl -H "Content-Type: application/json" -v -X POST http://coordinator_host:19999/api/v1/master/stop_job/load -d '{
"path": "s3://alluxiow/testm/dir-1/"
}'
curl "http://ip:19999/api/v1/master/list_job?[job-type=LOAD[&job-state=[RUNNING|VERIFYING|STOPPED|SUCCEEDED|FAILED|ALL]]]"
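The bracketed synopsis above describes optional query parameters, not a literal URL. As an illustration, a concrete request URL could be assembled as follows; `list_job_url` is a hypothetical helper, and the base URL passed to it is a placeholder for the actual coordinator address and port.

```shell
# Hypothetical helper: prints a concrete list_job URL.
# $1 = base URL (scheme, host, port), $2 = job type, $3 = job state.
list_job_url() {
  printf '%s/api/v1/master/list_job?job-type=%s&job-state=%s\n' "$1" "$2" "$3"
}

# Print the URL for running load jobs.
list_job_url "http://coordinator_host:19999" LOAD RUNNING
```

When passing the result to curl, keep it quoted, for example `curl "$(list_job_url "http://coordinator_host:19999" LOAD RUNNING)"`, so the shell does not interpret `?` or `&`.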