MLPerf Storage Benchmark

MLPerf Storage Benchmark Overview

MLPerf Storage is a benchmark suite designed to characterize the performance of storage systems supporting machine learning workloads. This document describes how to conduct end-to-end testing of Alluxio using MLPerf Storage.

Results Summary

The following results are based on MLPerf Storage v0.5, using A100 GPUs as the accelerator.

Model  | # of Accelerators (GPUs) | Dataset | AU      | Throughput (MB/sec) | Throughput (samples/sec)
bert   | 1                        | 1.3 TB  | 99%     | 0.1                 | 49.3
unet3d | 1                        | 719 GB  | 99%     | 409.5               | 2.9
bert   | 128                      | 2.4 TB  | 98%     | 14.8                | 6217
unet3d | 20                       | 3.8 TB  | 97%-99% | 7911.4              | 56.59

The test results were obtained on an Alluxio cluster configured as follows, with all server instances running on AWS:

  • Alluxio Cluster: One Alluxio Fuse node and two Alluxio Worker nodes.

  • Alluxio Worker Instance: i3en.metal (96c + 768GB memory + 100Gb network + 8 NVMe drives)

  • Alluxio Fuse Instance: c6in.metal (128c + 256GB memory + 200Gb network)

Preparing the Test Environment

Operating System Image: Ubuntu 22.04

Preparing MLPerf Storage Test Tools

# Install the MPI runtime required by the benchmark scripts
sudo apt-get install mpich
# Clone MLPerf Storage v0.5 along with its dlio_benchmark submodule
git clone -b v0.5 --recurse-submodules https://github.com/mlcommons/storage.git
cd storage
# Install the Python dependencies of dlio_benchmark
pip3 install -r dlio_benchmark/requirements.txt
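
As a quick optional sanity check (not part of the official MLPerf steps), you can verify that the MPI runtime is installed and that the dlio_benchmark submodule was checked out:

# Verify the MPI launcher installed by mpich is on the PATH
mpirun --version
# The dlio_benchmark submodule should be present inside the cloned repository
ls dlio_benchmark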

Generating the Dataset

We recommend generating the dataset locally and then uploading it to remote storage. Determine the data size to generate:

# Don't forget to replace the parameters with your own.
./benchmark.sh datasize --workload unet3d --num-accelerators 4 --host-memory-in-gb 32
  • workload: Options are unet3d and bert.

  • num-accelerators: The number of simulated GPUs. A larger number runs more simulated training processes on a single machine, so a dataset of the same size is trained through in less time; however, this also increases the demands on storage I/O.

  • host-memory-in-gb: The simulated memory size, which can be freely specified, even exceeding the actual memory of your machine. Larger memory sizes generate larger datasets and require longer training times.

After running this command, you will see output similar to:

./benchmark.sh datasize --workload unet3d --num-accelerators 4 --host-memory-in-gb 32
The benchmark will run for approx 11 minutes (best case)
Minimum 1600 files are required, which will consume 218 GB of storage
----------------------------------------------
Set --param dataset.num_files_train=1600 with ./benchmark.sh datagen/run commands

Next, you can generate the corresponding dataset with the following command:

./benchmark.sh datagen --workload unet3d --num-parallel 8 --param dataset.num_files_train=1600 --param dataset.data_folder=${dataset.data_folder}

where num-parallel sets the number of parallel threads used to generate the dataset.

After generating the dataset locally, upload it to the UFS (under file system) that Alluxio mounts.
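
For example, if the UFS is an S3 bucket, the dataset can be uploaded with the AWS CLI; the local path and bucket/prefix below are placeholders to replace with your own:

# Sync the locally generated dataset to the S3 bucket that Alluxio mounts as its UFS
aws s3 sync /path/to/local/unet3d-dataset s3://your-bucket/mlperf/unet3d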

Configuring Alluxio

We recommend using Alluxio version 3.1 or above for MLPerf testing, and setting the following configurations in alluxio-site.properties for optimal read performance:

alluxio.user.position.reader.streaming.async.prefetch.enable=true
alluxio.user.position.reader.streaming.async.prefetch.thread=256
alluxio.user.position.reader.streaming.async.prefetch.part.length=4MB
alluxio.user.position.reader.streaming.async.prefetch.max.part.number=4
  • You can configure one or more Alluxio Workers as a cache cluster.

  • Additionally, each MLPerf test node needs to start the Alluxio Fuse process to read data.

  • Ensure that the dataset has been completely loaded into the Alluxio cache from the UFS (see the example below).

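The following is a minimal sketch of preloading the dataset, assuming an S3 UFS path (placeholder bucket/prefix); the exact load command and options depend on your Alluxio version, so refer to the Cache Loading section:

# Submit a distributed load job for the dataset path
bin/alluxio job load --path s3://your-bucket/mlperf/unet3d --submit
# Poll the job until it reports that the load has completed
bin/alluxio job load --path s3://your-bucket/mlperf/unet3d --progress
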
Running the Test

./benchmark.sh run --workload unet3d --num-accelerators 4 --results-dir ${results-dir} --param dataset.data_folder=${dataset.data_folder} --param dataset.num_files_train=${dataset.num_files_train}

After completing the test, you can find the summary.json file in the results-dir, similar to:

{
  "model": "unet3d",
  "start": "2024-05-27T14:46:24.458325",
  "num_accelerators": 20,
  "hostname": "ip-172-31-24-47",
  "metric": {
    "train_au_percentage": [
      99.18125818824699,
      99.01649117920554,
      98.95473494676878,
      98.31108303926722,
      98.2658474647346
    ],
    "train_au_mean_percentage": 98.74588296364462,
    "train_au_meet_expectation": "success",
    "train_au_stdev_percentage": 0.38102089124716115,
    "train_throughput_samples_per_second": [
      57.07382805038776,
      57.1334916113455,
      56.93601336110315,
      56.72469392071424,
      56.64526420320678
    ],
    "train_throughput_mean_samples_per_second": 56.90265822935148,
    "train_throughput_stdev_samples_per_second": 0.19058788132211907,
    "train_io_mean_MB_per_second": 7955.518180172248,
    "train_io_stdev_MB_per_second": 26.64594945050442
  },
  "num_files_train": 28125,
  "num_files_eval": 0,
  "num_samples_per_file": 1,
  "epochs": 5,
  "end": "2024-05-27T15:27:39.203932"
}

The train_au_percentage attribute represents the accelerator utilization (AU), i.e., the percentage of time the simulated GPUs spend computing rather than waiting on I/O.

Additionally, you can run the test multiple times and save the results in the following format:

sample-results
    |---run-1
    |       |---host-1
    |       |       |---summary.json
    |       |---host-2
    |       |       |---summary.json
    |       |       ....
    |       |---host-n
    |               |---summary.json
    |---run-2
    |       |---host-1
    |       |       |---summary.json
    |       |---host-2
    |       |       |---summary.json
    |       |       ....
    |       |---host-n
    |               |---summary.json
    |   .....
    |---run-5
            |---host-1
            |       |---summary.json
            |---host-2
            |       |---summary.json
            |       ....
            |---host-n
                    |---summary.json

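A small helper script along the following lines (hostnames, run number, and the remote results path are placeholders) can collect the per-host summary.json files into this layout:

# Gather summary.json from each test host into sample-results/<run>/<host>/
RUN=run-1
for HOST in host-1 host-2 host-n; do
  mkdir -p sample-results/${RUN}/${HOST}
  scp ${HOST}:/path/to/results-dir/summary.json sample-results/${RUN}/${HOST}/
done
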
Then, use the following command to aggregate the results of multiple tests:

./benchmark.sh reportgen --results-dir sample-results

The final aggregated result will look like this:

{
    "overall": {
        "model": "unet3d",
        "num_client_hosts": 1,
        "num_benchmark_runs": 5,
        "train_num_accelerators": "20",
        "num_files_train": 28125,
        "num_samples_per_file": 1,
        "train_throughput_mean_samples_per_second": 56.587322998616344,
        "train_throughput_stdev_samples_per_second": 0.3842685544298719,
        "train_throughput_mean_MB_per_second": 7911.431396900177,
        "train_throughput_stdev_MB_per_second": 53.72429981238494
    },
    "runs": {
        "run-5": {
            "train_throughput_samples_per_second": 57.06105089062497,
            "train_throughput_MB_per_second": 7977.662939935283,
            "train_num_accelerators": "20",
            "model": "unet3d",
            "num_files_train": 28125,
            "num_samples_per_file": 1
        },
        "run-2": {
            "train_throughput_samples_per_second": 56.18386238258097,
            "train_throughput_MB_per_second": 7855.023869277903,
            "train_num_accelerators": "20",
            "model": "unet3d",
            "num_files_train": 28125,
            "num_samples_per_file": 1
        },
        "run-1": {
            "train_throughput_samples_per_second": 56.90265822935148,
            "train_throughput_MB_per_second": 7955.518180172248,
            "train_num_accelerators": "20",
            "model": "unet3d",
            "num_files_train": 28125,
            "num_samples_per_file": 1
        },
        "run-3": {
            "train_throughput_samples_per_second": 56.69229017116294,
            "train_throughput_MB_per_second": 7926.10677895614,
            "train_num_accelerators": "20",
            "model": "unet3d",
            "num_files_train": 28125,
            "num_samples_per_file": 1
        },
        "run-4": {
            "train_throughput_samples_per_second": 56.09675331936137,
            "train_throughput_MB_per_second": 7842.845216159307,
            "train_num_accelerators": "20",
            "model": "unet3d",
            "num_files_train": 28125,
            "num_samples_per_file": 1
        }
    }
}

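To pull out just the headline throughput figures from the aggregated report, a tool such as jq can be used; the report file name below is a placeholder for wherever reportgen wrote its output:

# Print the mean throughput (samples/sec and MB/sec) from the aggregated report
jq '.overall | {train_throughput_mean_samples_per_second, train_throughput_mean_MB_per_second}' aggregated-report.json
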
For other Alluxio-related configurations, refer to the Fio (POSIX) Benchmark section.
