Cache Loading

Distributed load allows users to load data from the UFS into the Alluxio cluster efficiently. It can be used to warm up the Alluxio cluster so that cached data can be served immediately when workloads run on top of Alluxio. For example, distributed load can prefetch data for machine learning jobs, speeding up the training process. It can also use file segmentation and multi-replication to improve file distribution in scenarios with highly concurrent data access.

Usage

There are two recommended ways to trigger distributed load:

job load CLI

The job load command loads data from the UFS (Under File System) into the Alluxio cluster. The CLI sends a load request to the Alluxio coordinator, which then distributes the load operation across all worker nodes.

bin/alluxio job load [flags] <path>

# Example output
Progress for loading path '/path':
        Settings:       bandwidth: unlimited    verify: false
        Job State: SUCCEEDED
        Files Processed: 1000
        Bytes Loaded: 125.00MB
        Throughput: 2509.80KB/s
        Block load failure rate: 0.00%
        Files Failed: 0
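
The Settings line in the output reflects per-job options. As a hedged sketch (the flag names and value format below are assumed from the settings shown above rather than confirmed; check the job load documentation or the command's built-in help for the exact flags), a load with a bandwidth cap and post-load verification might look like:

# Illustrative only: flag names and value formats are assumptions based on the
# Settings shown in the example output above
bin/alluxio job load --bandwidth 100MB --verify /path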

For detailed usage of the CLI, please refer to the job load documentation.

REST API

Similar to the CLI, the REST API can also be used to load data.

Please refer to the API reference page for more details.
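
As a rough illustration only (the endpoint path, port, and query parameters below are hypothetical placeholders, not the documented interface; use the API reference page for the real ones), submitting a load job and then polling its progress over HTTP could look like:

# Hypothetical endpoint and parameters -- consult the API reference for the actual ones
curl -X POST "http://<coordinator-host>:<port>/api/v1/load?path=s3://bucket/dataset&opType=submit"

# Poll the progress of the same load job (again, illustrative only)
curl -X GET "http://<coordinator-host>:<port>/api/v1/load?path=s3://bucket/dataset&opType=progress"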

Note that the listed results only include load tasks from the last seven days. The retention time of historical tasks can be configured through alluxio.job.retention.time.
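
For example, to retain historical job records for 30 days, the property could be set in alluxio-site.properties (the value is illustrative and assumes Alluxio's standard duration syntax):

# Retain finished load job history for 30 days (illustrative value)
alluxio.job.retention.time=30d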
