# File Copy

## Distributed Copy

Distributed copy allows users to efficiently copy data from one UFS to another. It can be used for data relocation and for synchronization between two UFSes.

## Usage

There are two recommended ways to trigger a distributed copy:

### job copy CLI

The CLI sends a copy request to the Alluxio coordinator, which subsequently distributes the copy operation to all worker nodes.

```shell
bin/alluxio job copy [flags]
```

Submit, check the progress of, or stop a copy task:

```shell
$ {ALLUXIO_HOME}/bin/alluxio job copy --src s3://bucket/src --dst s3://bucket/dst --[submit|progress|stop]

$ {ALLUXIO_HOME}/bin/alluxio job copy --index-file /indexfile --[submit|progress|stop]
```

Example output:

```shell
Progress for jobId 1c849041-ef26-4ed7-a932-2af5549754d7 copying path '/src' to '/dst':
        Settings: "check-content: false"
        Job Submitted: 2023-06-30 12:30:45.0
        Job Id: 111111
        Job State: RUNNING
        Files qualified so far: 1, 826.38MB
        Files Failed: 0
        Files Skipped: 0
        Files Succeeded: 1
        Bytes Copied: 826.38MB
        Throughput: 1621.09KB/s
        Files failure rate: 0.00%
```
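Putting the flags together, a typical job lifecycle from the CLI might look like the following sketch; the bucket paths are placeholders:

```shell
# Submit the copy job (example paths)
$ {ALLUXIO_HOME}/bin/alluxio job copy --src s3://bucket/src --dst s3://bucket/dst --submit

# Check its progress while it runs
$ {ALLUXIO_HOME}/bin/alluxio job copy --src s3://bucket/src --dst s3://bucket/dst --progress

# Stop the job before completion if needed
$ {ALLUXIO_HOME}/bin/alluxio job copy --src s3://bucket/src --dst s3://bucket/dst --stop
```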


For detailed usage of the CLI, please refer to the [job copy](../reference/User-CLI.md) documentation.

### REST API

Similar to the CLI, the REST API can also be used to copy data.
Requests are sent directly to the coordinator.

```shell
curl -H "Content-Type: application/json"  -v -X POST http://coordinator_host:19999/api/v1/master/submit_job/copy -d '{
  "src": "s3://alluxiow/testm/dir-1/dir1",
  "dst": "s3://alluxiow/testm/dir-1/dir2",
  "options": {
    "batchSize": "300",
    "check_content": "true"
  }
}'
```

Progress can be checked by sending a GET request with the same `src` and `dst` to the progress endpoint.

```shell
curl -H "Content-Type: application/json"  -v -X GET http://coordinator_host:19999/api/v1/master/progress_job/copy -d '{
  "src": "s3://alluxiow/testm/dir-1/dir1",
  "dst": "s3://alluxiow/testm/dir-1/dir2",
  "format": "TEXT[default] | JSON",
  "verbose": "true"
}'
```
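For scripted monitoring, the progress request can be polled in a loop. A minimal sketch, reusing the example coordinator address and paths from above and requesting the JSON output format:

```shell
# Sketch: poll copy progress every 10 seconds until interrupted.
# coordinator_host and the s3:// paths are the example values from above.
while true; do
  curl -s -H "Content-Type: application/json" -X GET \
    http://coordinator_host:19999/api/v1/master/progress_job/copy -d '{
    "src": "s3://alluxiow/testm/dir-1/dir1",
    "dst": "s3://alluxiow/testm/dir-1/dir2",
    "format": "JSON"
  }'
  sleep 10
done
```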

The copy operation can be terminated by sending a POST request.

```shell
curl -H "Content-Type: application/json"  -v -X POST http://coordinator_host:19999/api/v1/master/stop_job/copy -d '{
  "src": "s3://alluxiow/testm/dir-1/dir1",
  "dst": "s3://alluxiow/testm/dir-1/dir2"
}'
```

Copy jobs can be listed by sending a GET request. The results only include copy jobs from the last seven days; the retention time of historical jobs can be configured through `alluxio.job.retention.time`.

```shell
curl http://coordinator_host:19999/api/v1/master/list_job[?job-type=COPY[&job-state=RUNNING|VERIFYING|STOPPED|SUCCEEDED|FAILED|ALL]]
```
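For example, to list all running copy jobs (quoting the URL so the shell does not interpret `&`):

```shell
curl "http://coordinator_host:19999/api/v1/master/list_job?job-type=COPY&job-state=RUNNING"
```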
