Cache Preloading
Last updated
Distributed load allows users to efficiently load data from the UFS into the Alluxio cluster. It can be used to warm up the cluster before workloads run on top of Alluxio, so that cached data can be served immediately. Distributed load can utilize and to enhance file distribution in scenarios with highly concurrent data access.
There are two recommended ways to trigger distributed load: the CLI and the REST API.
The job load command can be used to load data from the UFS (Under File System) into the Alluxio cluster.
The CLI sends a load request to the Alluxio coordinator, which subsequently distributes the load operation to all worker nodes.
For detailed usage of the CLI, please refer to the documentation.
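The Python sketch below wraps the CLI call with subprocess, for example to trigger a preload from a scheduling script. It assumes the command has the form bin/alluxio job load --path <ufs-path> followed by one of --submit, --progress or --stop; the binary location and these flags are assumptions to verify against the CLI documentation for your release.

```python
import shlex
import subprocess

# Assumption: the CLI lives at bin/alluxio under the Alluxio install directory
# and accepts `job load --path <ufs-path>` plus one of --submit/--progress/--stop.
ALLUXIO_BIN = "bin/alluxio"
ACTIONS = ("submit", "progress", "stop")


def build_load_command(ufs_path: str, action: str = "submit") -> list[str]:
    """Build the argv for a distributed load CLI call (not executed here)."""
    if action not in ACTIONS:
        raise ValueError(f"unsupported action: {action!r}")
    return [ALLUXIO_BIN, "job", "load", "--path", ufs_path, f"--{action}"]


def run_load(ufs_path: str, action: str = "submit") -> str:
    """Run the CLI; the coordinator then distributes the load to the workers."""
    result = subprocess.run(
        build_load_command(ufs_path, action),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError on a non-zero exit code
    )
    return result.stdout


if __name__ == "__main__":
    # Print the command instead of running it, so this works without Alluxio.
    print(shlex.join(build_load_command("s3://my-bucket/dataset/")))
```

Building the argument list separately from running it keeps the command easy to log and to test without a live cluster.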
Like the CLI, the REST API can be used to load data. Requests are sent directly to the coordinator.
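A minimal Python sketch of a submit request, using only the standard library. The coordinator address, port and endpoint path below are hypothetical placeholders, and the JSON body shape ({"path": ...}) is an assumption; take the real values from the REST API reference for your release.

```python
import json
import urllib.request

# Hypothetical values: replace with your coordinator address and the actual
# submit endpoint documented for your Alluxio release.
COORDINATOR = "http://coordinator-host:19999"
SUBMIT_ENDPOINT = "/api/v1/master/submit_job/load"


def build_submit_request(ufs_path: str) -> urllib.request.Request:
    """Build (but do not send) a POST asking the coordinator to load ufs_path."""
    body = json.dumps({"path": ufs_path}).encode("utf-8")  # assumed body shape
    return urllib.request.Request(
        COORDINATOR + SUBMIT_ENDPOINT,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(req: urllib.request.Request, timeout: float = 30.0) -> str:
    """Send the request and return the response body as text."""
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.read().decode("utf-8")


if __name__ == "__main__":
    req = build_submit_request("s3://my-bucket/dataset/")
    print(req.get_method(), req.full_url)  # inspect before calling send(req)
```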
Progress can be checked by sending a GET request with the same path.
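Because a load runs asynchronously, a script usually polls for progress until the job finishes. The sketch below reads "the same path" as the UFS path that was submitted and assumes it is sent as a JSON body on the GET request; the endpoint path is hypothetical. Since the response format is not described here, the caller supplies the check that decides when the job is done.

```python
import json
import time
import urllib.request
from typing import Callable

COORDINATOR = "http://coordinator-host:19999"            # hypothetical address
PROGRESS_ENDPOINT = "/api/v1/master/progress_job/load"  # hypothetical path


def build_progress_request(ufs_path: str) -> urllib.request.Request:
    """GET request carrying the same UFS path that was submitted for loading."""
    body = json.dumps({"path": ufs_path}).encode("utf-8")  # assumed body shape
    return urllib.request.Request(
        COORDINATOR + PROGRESS_ENDPOINT,
        data=body,
        headers={"Content-Type": "application/json"},
        method="GET",
    )


def fetch_progress(ufs_path: str) -> str:
    """Send the progress request and return the raw report text."""
    with urllib.request.urlopen(build_progress_request(ufs_path), timeout=30) as resp:
        return resp.read().decode("utf-8")


def poll_until(fetch: Callable[[], str], is_done: Callable[[str], bool],
               interval_s: float = 10.0, max_polls: int = 360) -> str:
    """Call fetch() until is_done(report) is true, then return that report."""
    for _ in range(max_polls):
        report = fetch()
        if is_done(report):
            return report
        time.sleep(interval_s)  # wait before asking the coordinator again
    raise TimeoutError(f"load not finished after {max_polls} polls")
```

For example, poll_until(lambda: fetch_progress("s3://my-bucket/dataset/"), lambda r: "SUCCEEDED" in r) would wait for a report containing "SUCCEEDED"; that marker is a hypothetical placeholder for whatever completion status your coordinator actually reports.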
The load operation can be terminated by sending a POST request.
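Stopping a load follows the same pattern as submitting one. In this sketch the endpoint path is hypothetical, and identifying the job by its UFS path in a JSON body is an assumption carried over from the submit request.

```python
import json
import urllib.request

COORDINATOR = "http://coordinator-host:19999"    # hypothetical address
STOP_ENDPOINT = "/api/v1/master/stop_job/load"  # hypothetical path


def build_stop_request(ufs_path: str) -> urllib.request.Request:
    """Build (but do not send) a POST that stops the load job for ufs_path."""
    body = json.dumps({"path": ufs_path}).encode("utf-8")  # assumed body shape
    return urllib.request.Request(
        COORDINATOR + STOP_ENDPOINT,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    req = build_stop_request("s3://my-bucket/dataset/")
    print(req.get_method(), req.full_url)  # pass to urllib.request.urlopen to send
```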
Load jobs can be listed by sending a POST request. The results only include load jobs from the past seven days; this retention time for historical jobs can be configured through alluxio.job.retention.time.
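A sketch of the list request. The endpoint path is hypothetical, and sending an empty JSON object assumes the coordinator then returns all retained load jobs; the real API may expect filter fields in the body.

```python
import json
import urllib.request

COORDINATOR = "http://coordinator-host:19999"  # hypothetical address
LIST_ENDPOINT = "/api/v1/master/list_job"      # hypothetical path


def build_list_request() -> urllib.request.Request:
    """Build (but do not send) a POST that lists load jobs still retained."""
    body = json.dumps({}).encode("utf-8")  # assumption: no filters needed
    return urllib.request.Request(
        COORDINATOR + LIST_ENDPOINT,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    req = build_list_request()
    print(req.get_method(), req.full_url)  # pass to urllib.request.urlopen to send
```

Jobs older than the retention window are not returned, so a script that audits preloads over a longer period has to record job results itself or raise alluxio.job.retention.time on the server side.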