Benchmarking ML Training Performance with MLPerf
The MLPerf™ Storage benchmark is a suite designed by MLCommons® to measure how well a storage system performs for realistic machine learning (ML) training workloads. It simulates the I/O patterns of models like BERT and U-Net3D to evaluate storage throughput and I/O efficiency.
This guide explains how to use the MLPerf Storage benchmark to test the performance of an Alluxio cluster.
Benchmark Highlights
The following results were achieved using the MLPerf Storage v0.5 benchmark, with the dataset fully cached in Alluxio and simulated NVIDIA A100 GPUs as the training accelerators. The "Accelerator Utilization (AU)" metric indicates how effectively the storage system kept the simulated GPUs busy.
| Model | Simulated Accelerators | Dataset Size | Accelerator Utilization (AU) | Throughput (MB/s) | Throughput (samples/s) |
| --- | --- | --- | --- | --- | --- |
| BERT | 1 | 1.3 TB | 99% | 0.1 | 49.3 |
| BERT | 128 | 2.4 TB | 98% | 14.8 | 6,217 |
| U-Net3D | 1 | 719 GB | 99% | 409.5 | 2.9 |
| U-Net3D | 20 | 3.8 TB | 97%-99% | 7,911.4 | 56.59 |
Test Environment
The benchmark results were generated using the following environment, with all instances deployed in the same AWS availability zone.
Alluxio Cluster:
- 2 Worker Nodes (i3en.metal: 96 cores, 768 GB RAM, 8 NVMe SSDs each)
- 1 FUSE Client Node (c6in.metal: 128 cores, 256 GB RAM)
Operating System: Ubuntu 22.04
Setup and Configuration
1. Install MLPerf Storage Tools
On the client node where you will run the benchmark:
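The exact installation steps depend on the benchmark release; the following is a minimal sketch assuming MLPerf Storage v0.5 from the mlcommons/storage GitHub repository (the requirements file location and branch name may differ between versions, so treat the repository README as authoritative).

```bash
# Clone the MLPerf Storage benchmark suite (v0.5) and install its Python dependencies
git clone -b v0.5 https://github.com/mlcommons/storage.git
cd storage

# The requirements file path may differ between releases; check the repository README
pip3 install -r dlio_benchmark/requirements.txt
```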
2. Configure Alluxio
For optimal read performance during ML training, we recommend setting the following properties in your conf/alluxio-site.properties file on your Alluxio cluster nodes.
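A starting point might look like the sketch below. These property names exist in Alluxio 2.x, but the exact set and values to tune depend on your Alluxio version and workload, so treat this as an illustrative example rather than a definitive recommendation.

```properties
# Cache file metadata on the client/FUSE side to avoid repeated RPCs to the master
alluxio.user.metadata.cache.enabled=true
alluxio.user.metadata.cache.max.size=2000000
alluxio.user.metadata.cache.expiration.time=2day

# Read from the Alluxio cache (the dataset is preloaded before the benchmark runs)
alluxio.user.file.readtype.default=CACHE
```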
Before running the benchmark, ensure that:
- The Alluxio FUSE process is running on the client node.
- The training dataset has been fully loaded into the Alluxio cache.

A quick sanity check from the client node is sketched below.
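In the check below, the /mnt/alluxio-fuse mount point and the unet3d_data path are assumptions for illustration; adjust them to your deployment.

```bash
# Confirm the Alluxio FUSE mount is active
mount | grep alluxio-fuse

# Confirm the dataset is visible through the mount point
# (cache residency can be confirmed from the Alluxio CLI or web UI)
ls /mnt/alluxio-fuse/unet3d_data | head
```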
Running the Benchmark
The benchmark process involves generating a synthetic dataset and then running the training simulation against it.
Step 1: Generate the Dataset
First, determine the required size of the dataset based on your simulated hardware.
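The datasize command estimates this from the number of simulated accelerators and the client host memory. The flags shown below follow the v0.5 usage and may differ in other releases; consult the repository README for the exact syntax.

```bash
# Estimate the minimum number of files for 20 simulated accelerators
# on a client host with 256 GB of memory
./benchmark.sh datasize --workload unet3d \
  --num-accelerators 20 \
  --host-memory-in-gb 256
```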
This command will output the number of files needed. Use this value to generate the actual data files.
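A datagen invocation might then look like the following, where <NUM_FILES> is the count reported by datasize and dataset.data_folder is the directory the synthetic files are written to (again, flags follow the v0.5 usage):

```bash
# Generate <NUM_FILES> synthetic U-Net3D training files under ./unet3d_data
./benchmark.sh datagen --workload unet3d \
  --num-parallel 8 \
  --param dataset.num_files_train=<NUM_FILES> \
  --param dataset.data_folder=unet3d_data
```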
After generating the dataset, upload it to your UFS and ensure it is loaded into Alluxio.
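For example, with an S3 under store and an Alluxio 2.x cluster, the upload and load steps might look like this (the bucket name and Alluxio paths are placeholders):

```bash
# Copy the generated files to the under file system (S3 in this example)
aws s3 cp --recursive ./unet3d_data s3://<your-bucket>/mlperf/unet3d_data

# Load the dataset into the Alluxio worker caches (Alluxio 2.x syntax;
# newer releases may use a different load command, see the Alluxio docs)
bin/alluxio fs distributedLoad /mlperf/unet3d_data
```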
Step 2: Run the Benchmark
Execute the benchmark using the run command. The data_folder parameter should point to the dataset within the Alluxio FUSE mount.
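For example, with the dataset reachable at /mnt/alluxio-fuse/unet3d_data (the mount point and <NUM_FILES> are placeholders; flags follow the v0.5 usage):

```bash
# Simulate training for 20 accelerators, reading the dataset through the Alluxio FUSE mount
./benchmark.sh run --workload unet3d \
  --num-accelerators 20 \
  --results-dir results/unet3d \
  --param dataset.num_files_train=<NUM_FILES> \
  --param dataset.data_folder=/mnt/alluxio-fuse/unet3d_data
```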
Step 3: Review and Aggregate Results
After a run completes, a summary.json file is created in your results directory. This file contains detailed metrics, including GPU utilization (train_au_percentage) and throughput.
Example summary.json
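The structure below is a trimmed, hypothetical illustration based on the DLIO benchmark's output format; the values are placeholders, not measured results.

```json
{
  "model": "unet3d",
  "num_accelerators": 20,
  "metric": {
    "train_au_percentage": [98.2, 97.9, 98.4],
    "train_au_mean_percentage": 98.2,
    "train_throughput_samples_per_second": [56.1, 55.8, 56.4],
    "train_throughput_mean_samples_per_second": 56.1
  }
}
```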
To get a final result, the benchmark should be run multiple times (e.g., 5 times). Organize the output directories from each run and use the reportgen command to produce an aggregated summary.
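Assuming the per-run outputs are collected under a single results directory, the aggregation step might look like this:

```bash
# Aggregate the summary.json files from all runs into a single report
./benchmark.sh reportgen --results-dir results/unet3d
```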
This will generate a final JSON output with the overall mean and standard deviation for throughput and other key metrics across all runs.