Distributed Copy
Distributed copy allows users to efficiently copy data from one UFS to another UFS. This can be used for data relocation and synchronization between two UFS.
Usage
There are two recommended ways to trigger a distributed copy:
job copy CLI
The CLI sends a copy request to the Alluxio coordinator, which subsequently distributes the copy operation to all worker nodes.
bin/alluxio job copy [flags]
To submit a copy task, check its progress, or stop it:
$ {ALLUXIO_HOME}/bin/alluxio job copy --src s3://bucket/src --dst s3://bucket/dst --[submit|progress|stop]
$ {ALLUXIO_HOME}/bin/alluxio job copy --index-file /indexfile --[submit|progress|stop]
Example output
Copy Progress for jobId 1c849041-ef26-4ed7-a932-2af5549754d7 copying path '/src' to '/dst':
Settings: "check-content: false"
Job Submitted: 2023-06-30 12:30:45.0
Job Id: 111111
Job State: RUNNING
Files qualified so far: 1, 826.38MB
Files Failed: 0
Files Skipped: 0
Files Succeeded: 1
Bytes Copied: 826.38MB
Throughput: 1621.09KB/s
Files failure rate: 0.00%
For detailed usage of the CLI, please refer to the job copy documentation.
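When scripting around the CLI, it can help to pull the job state out of the progress text. The following is a minimal sketch, assuming the progress output keeps the "Job State: ..." line format shown in the example above. The function name job_state and the polling loop are hypothetical and not part of Alluxio.

```shell
# Hypothetical helper: read progress text on stdin and print the value
# of the "Job State:" line (assumes the format of the example output).
job_state() {
  awk -F': ' '/^[[:space:]]*Job State:/ { print $2; exit }'
}

# Sketch of a polling loop (commented out; it needs a running cluster).
# The 10-second interval is an arbitrary choice.
# while :; do
#   state="$("${ALLUXIO_HOME}/bin/alluxio" job copy \
#     --src s3://bucket/src --dst s3://bucket/dst --progress | job_state)"
#   case "$state" in
#     RUNNING|VERIFYING) sleep 10 ;;
#     *) break ;;
#   esac
# done
```

For example, piping the sample output above into job_state prints the value of its Job State line.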
REST API
Similar to the CLI, the REST API can also be used to copy data. Requests are sent directly to the coordinator.
curl -H "Content-Type: application/json" -v -X POST http://coordinator_host:19999/api/v1/master/submit_job/copy -d '{
"src": "s3://alluxiow/testm/dir-1/dir1",
"dst": "s3://alluxiow/testm/dir-1/dir2",
"options": {
"batchSize": "300",
"check_content": "true"
}
}'
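If the request body is built in a script rather than typed by hand, a small helper keeps the quoting manageable. This is only a sketch: copy_payload is a hypothetical function, it sets only the check_content option from the request above, its default of "false" is an assumption of this example, and it does no JSON escaping.

```shell
# Hypothetical helper: print a JSON body for a copy request.
# $1 = src, $2 = dst, $3 = check_content (defaults to "false" here).
# No escaping is done, so src and dst must not contain quotes or backslashes.
copy_payload() {
  printf '{"src": "%s", "dst": "%s", "options": {"check_content": "%s"}}' \
    "$1" "$2" "${3:-false}"
}

# Usage (commented out; requires a reachable coordinator):
# curl -H "Content-Type: application/json" -X POST \
#   "http://coordinator_host:19999/api/v1/master/submit_job/copy" \
#   -d "$(copy_payload s3://alluxiow/testm/dir-1/dir1 s3://alluxiow/testm/dir-1/dir2 true)"
```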
Progress can be checked by sending a GET request for the same source and destination paths.
curl -H "Content-Type: application/json" -v -X GET http://coordinator_host:19999/api/v1/master/progress_job/copy -d '{
"src": "s3://alluxiow/testm/dir-1/dir1",
"dst": "s3://alluxiow/testm/dir-1/dir2",
"format": "TEXT[default] | JSON",
"verbose": "true"
}'
The copy operation can be terminated by sending a POST request.
curl -H "Content-Type: application/json" -v -X POST http://coordinator_host:19999/api/v1/master/stop_job/copy -d '{
"src": "s3://alluxiow/testm/dir-1/dir1",
"dst": "s3://alluxiow/testm/dir-1/dir2"
}'
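The submit, progress, and stop requests differ only in HTTP method and path, so a script can derive both from the action name. The sketch below uses the methods and paths from the curl examples above. The function job_request is hypothetical, and port 19999 is simply the one used in those examples.

```shell
# Hypothetical helper: print "<METHOD> <URL>" for a copy job action.
# $1 = coordinator host, $2 = submit | progress | stop
job_request() {
  base="http://$1:19999/api/v1/master"
  case "$2" in
    submit)   printf 'POST %s/submit_job/copy\n' "$base" ;;
    progress) printf 'GET %s/progress_job/copy\n' "$base" ;;
    stop)     printf 'POST %s/stop_job/copy\n' "$base" ;;
    *)        echo "unknown action: $2" >&2; return 1 ;;
  esac
}
```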
Copy jobs can be listed by sending a GET request. The results include only copy jobs from the past seven days. The retention time of historical jobs can be configured through alluxio.job.retention.time.
curl http://coordinator_host:19999/api/v1/master/list_job?[job-type=COPY[&job-state=[RUNNING|VERIFYING|STOPPED|SUCCEEDED|FAILED|ALL]]]
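The bracketed query parameters above are optional. As a rough sketch, a script might build the list URL as follows and reject states that are not in the list above. The function name list_jobs_url is hypothetical. The resulting URL should be quoted when passed to curl, since an unquoted & is interpreted by the shell.

```shell
# Hypothetical helper: print the list_job URL for copy jobs.
# $1 = coordinator host, $2 = optional job state filter
list_jobs_url() {
  url="http://$1:19999/api/v1/master/list_job?job-type=COPY"
  case "${2:-}" in
    "") ;;  # no state filter requested
    RUNNING|VERIFYING|STOPPED|SUCCEEDED|FAILED|ALL)
      url="${url}&job-state=$2" ;;
    *) echo "unknown job state: $2" >&2; return 1 ;;
  esac
  printf '%s\n' "$url"
}

# Usage (commented out; requires a reachable coordinator):
# curl "$(list_jobs_url coordinator_host RUNNING)"
```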