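# MLPerf Storage Benchmark

## MLPerf Storage Benchmark Overview

[MLPerf Storage](https://github.com/mlcommons/storage) is a benchmark suite built to measure storage system performance for machine learning workloads.

This document describes how to run end-to-end tests against Alluxio with MLPerf Storage.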

## Results Summary

The following results were obtained with MLPerf Storage v0.5, using A100 GPUs as the accelerator type.

| Model  | Accelerators (GPUs) | Dataset size | Accelerator utilization (AU) | Throughput (MB/s) | Throughput (samples/s) |
| ------ | ------------------- | ------------ | ---------------------------- | ----------------- | ---------------------- |
| bert   | 1                   | 1.3 TB       | 99%                          | 0.1               | 49.3                   |
| unet3d | 1                   | 719 GB       | 99%                          | 409.5             | 2.9                    |
| bert   | 128                 | 2.4 TB       | 98%                          | 14.8              | 6217                   |
| unet3d | 20                  | 3.8 TB       | 97%-99%                      | 7911.4            | 56.59                  |

The results were measured on an Alluxio cluster with the following configuration. All instance types are available on AWS:

* **Alluxio cluster:** one Alluxio Fuse node and two Alluxio Worker nodes.
* **Alluxio Worker instance:** [i3en.metal](https://aws.amazon.com/cn/ec2/instance-types/i3en/): 96 cores + 768 GB memory + 100 Gbps network + 8 NVMe SSDs
* **Alluxio Fuse instance:** [c6in.metal](https://aws.amazon.com/ec2/instance-types/c6i/): 128 cores + 256 GB memory + 200 Gbps network

## Preparing the Test Environment

Operating system image: Ubuntu 22.04

### Preparing the MLPerf Storage Tool

```bash
sudo apt-get install mpich
git clone -b v0.5 --recurse-submodules https://github.com/mlcommons/storage.git
cd storage
pip3 install -r dlio_benchmark/requirements.txt
```

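### Generating the Dataset

We recommend generating the dataset locally and then uploading it to remote storage. First, determine how much data to generate: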

```bash
./benchmark.sh datasize --workload unet3d --num-accelerators 4 --host-memory-in-gb 32
```

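* **workload:** the workload to run. The options are `unet3d` and `bert`.
* **num-accelerators:** the number of emulated GPUs. A larger number runs more processes on a single machine, which shortens the training time for a dataset of the same size but increases the demand on storage I/O.
* **host-memory-in-gb:** the emulated host memory size. It can be set freely and may even exceed the machine's physical memory. The larger the value, the larger the generated dataset and the longer the training time.

Running this command produces output like the following: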

```bash
./benchmark.sh datasize --workload unet3d --num-accelerators 4 --host-memory-in-gb 32
The benchmark will run for approx 11 minutes (best case)
Minimum 1600 files are required, which will consume 218 GB of storage
----------------------------------------------
Set --param dataset.num_files_train=1600 with ./benchmark.sh datagen/run commands
```
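When the `datagen` step is scripted, the file count and storage estimate can be read out of the `datasize` output instead of being copied by hand. The following is a minimal sketch, not part of MLPerf Storage: `parse_datasize` is a hypothetical helper, and it assumes the output wording matches the sample shown above.

```python
import re

# Sample text copied from the datasize output shown above.
SAMPLE_OUTPUT = """\
The benchmark will run for approx 11 minutes (best case)
Minimum 1600 files are required, which will consume 218 GB of storage
----------------------------------------------
Set --param dataset.num_files_train=1600 with ./benchmark.sh datagen/run commands
"""


def parse_datasize(output: str) -> dict:
    """Extract the required file count and storage size (hypothetical helper)."""
    match = re.search(
        r"Minimum (\d+) files are required, which will consume (\d+) GB", output
    )
    if match is None:
        raise ValueError("datasize output did not match the expected wording")
    num_files, storage_gb = int(match.group(1)), int(match.group(2))
    return {
        "num_files_train": num_files,
        "storage_gb": storage_gb,
        # Average size of one generated file in MB, taking 1 GB as 1024 MB.
        "mb_per_file": storage_gb * 1024 / num_files,
    }


if __name__ == "__main__":
    info = parse_datasize(SAMPLE_OUTPUT)
    print(info["num_files_train"], info["storage_gb"], round(info["mb_per_file"], 1))
```

The per-file size is a rough planning figure for sizing the local disk and the Alluxio cache. It is derived only from the two numbers printed by `datasize`.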

Next, generate the dataset with the following command:

```bash
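# NUM_PARALLEL (number of parallel generation jobs) and DATA_FOLDER (output directory) are placeholders; set them first.
./benchmark.sh datagen --workload unet3d --num-parallel ${NUM_PARALLEL} --param dataset.num_files_train=1600 --param dataset.data_folder=${DATA_FOLDER}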
```

After generating the dataset locally, upload it to the UFS.

### Configuring Alluxio

We recommend Alluxio 3.1 or later for MLPerf tests. For the best read performance, we also recommend setting the following properties in `alluxio-site.properties`:

```properties
alluxio.user.position.reader.streaming.async.prefetch.enable=true
alluxio.user.position.reader.streaming.async.prefetch.thread=256
alluxio.user.position.reader.streaming.async.prefetch.part.length=4MB
alluxio.user.position.reader.streaming.async.prefetch.max.part.number=4
```

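For other Alluxio configuration, see the [Fio Tests](/ee-ai-cn/ai-3.4/performance/fio-tests.md) section.

* One or more Alluxio Workers can be configured as the cache cluster.
* An Alluxio Fuse process must be started on every MLPerf test node to read the data.
* Make sure the dataset has been fully loaded from the UFS into the Alluxio cache.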

### Running the Test

```bash
# WORKLOAD, NUM_ACCELERATORS, RESULTS_DIR, DATA_FOLDER and NUM_FILES_TRAIN are placeholders; set them first.
./benchmark.sh run --workload ${WORKLOAD} --num-accelerators ${NUM_ACCELERATORS} --results-dir ${RESULTS_DIR} --param dataset.data_folder=${DATA_FOLDER} --param dataset.num_files_train=${NUM_FILES_TRAIN}
```

After the test completes, the directory passed to `--results-dir` contains a `summary.json` file like the following:

```json
{
  "model": "unet3d",
  "start": "2024-05-27T14:46:24.458325",
  "num_accelerators": 20,
  "hostname": "ip-172-31-24-47",
  "metric": {
    "train_au_percentage": [
      99.18125818824699,
      99.01649117920554,
      98.95473494676878,
      98.31108303926722,
      98.2658474647346
    ],
    "train_au_mean_percentage": 98.74588296364462,
    "train_au_meet_expectation": "success",
    "train_au_stdev_percentage": 0.38102089124716115,
    "train_throughput_samples_per_second": [
      57.07382805038776,
      57.1334916113455,
      56.93601336110315,
      56.72469392071424,
      56.64526420320678
    ],
    "train_throughput_mean_samples_per_second": 56.90265822935148,
    "train_throughput_stdev_samples_per_second": 0.19058788132211907,
    "train_io_mean_MB_per_second": 7955.518180172248,
    "train_io_stdev_MB_per_second": 26.64594945050442
  },
  "num_files_train": 28125,
  "num_files_eval": 0,
  "num_samples_per_file": 1,
  "epochs": 5,
  "end": "2024-05-27T15:27:39.203932"
}
```

The `train_au_percentage` field reports accelerator (GPU) utilization, with one value per epoch.
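The benchmark already records its own verdict in `train_au_meet_expectation`, but it can be useful to re-check the utilization numbers when comparing runs. The sketch below recomputes the mean from the per-epoch values in a trimmed copy of the file above. The 90% threshold is an assumption chosen for illustration; use the value required by the rules of the benchmark version you run. `check_au` is a hypothetical helper name.

```python
import json
from statistics import fmean
from typing import Tuple

# Assumption: 90% is the accelerator-utilization bar for a passing run.
AU_THRESHOLD = 90.0


def check_au(summary: dict, threshold: float = AU_THRESHOLD) -> Tuple[float, bool]:
    """Return (mean AU over all epochs, whether the mean meets the threshold)."""
    per_epoch = summary["metric"]["train_au_percentage"]
    mean_au = fmean(per_epoch)
    return mean_au, mean_au >= threshold


# Trimmed copy of the summary.json shown above.
SUMMARY_JSON = """
{
  "model": "unet3d",
  "metric": {
    "train_au_percentage": [
      99.18125818824699, 99.01649117920554, 98.95473494676878,
      98.31108303926722, 98.2658474647346
    ],
    "train_au_mean_percentage": 98.74588296364462
  }
}
"""

if __name__ == "__main__":
    mean_au, ok = check_au(json.loads(SUMMARY_JSON))
    print(f"mean AU = {mean_au:.2f}%, meets threshold: {ok}")
```

In practice, replace `SUMMARY_JSON` with the contents of the real `summary.json` from the results directory.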

You can also run the test several times and save the results of each run in the following layout:

```
sample-results
    |---run-1
           |---host-1
                    |---summary.json
           |---host-2
                    |---summary.json
              ....
           |---host-n
                    |---summary.json
    |---run-2
           |---host-1
                    |---summary.json
           |---host-2
                    |---summary.json
              ....
           |---host-n
                    |---summary.json
        .....
    |---run-5
           |---host-1
                    |---summary.json
           |---host-2
                    |---summary.json
              ....
           |---host-n
                    |---summary.json
```
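Copying each `summary.json` into this layout by hand is error-prone once several hosts and runs are involved. The following sketch shows one way to automate it. `collect_summary` is a hypothetical helper, the directory names simply follow the layout above, and the demo uses a temporary directory with a stand-in file rather than real benchmark output.

```python
import shutil
import tempfile
from pathlib import Path


def collect_summary(src: Path, root: Path, run: int, host: int) -> Path:
    """Copy one summary.json to root/run-<run>/host-<host>/summary.json."""
    dest_dir = root / f"run-{run}" / f"host-{host}"
    dest_dir.mkdir(parents=True, exist_ok=True)
    return Path(shutil.copy2(src, dest_dir / "summary.json"))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        tmp_path = Path(tmp)
        # Stand-in for a real summary.json produced by a benchmark run.
        src = tmp_path / "summary.json"
        src.write_text('{"model": "unet3d"}')
        root = tmp_path / "sample-results"
        for run in range(1, 6):  # five runs, as in the layout above
            collect_summary(src, root, run, host=1)
        for path in sorted(root.rglob("summary.json")):
            print(path.relative_to(root).as_posix())
```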

Then aggregate the results of the runs with the following command:

```bash
./benchmark.sh reportgen --results-dir sample-results
```

The aggregated report looks like this:

```json
{
    "overall": {
        "model": "unet3d",
        "num_client_hosts": 1,
        "num_benchmark_runs": 5,
        "train_num_accelerators": "20",
        "num_files_train": 28125,
        "num_samples_per_file": 1,
        "train_throughput_mean_samples_per_second": 56.587322998616344,
        "train_throughput_stdev_samples_per_second": 0.3842685544298719,
        "train_throughput_mean_MB_per_second": 7911.431396900177,
        "train_throughput_stdev_MB_per_second": 53.72429981238494
    },
    "runs": {
        "run-5": {
            "train_throughput_samples_per_second": 57.06105089062497,
            "train_throughput_MB_per_second": 7977.662939935283,
            "train_num_accelerators": "20",
            "model": "unet3d",
            "num_files_train": 28125,
            "num_samples_per_file": 1
        },
        "run-2": {
            "train_throughput_samples_per_second": 56.18386238258097,
            "train_throughput_MB_per_second": 7855.023869277903,
            "train_num_accelerators": "20",
            "model": "unet3d",
            "num_files_train": 28125,
            "num_samples_per_file": 1
        },
        "run-1": {
            "train_throughput_samples_per_second": 56.90265822935148,
            "train_throughput_MB_per_second": 7955.518180172248,
            "train_num_accelerators": "20",
            "model": "unet3d",
            "num_files_train": 28125,
            "num_samples_per_file": 1
        },
        "run-3": {
            "train_throughput_samples_per_second": 56.69229017116294,
            "train_throughput_MB_per_second": 7926.10677895614,
            "train_num_accelerators": "20",
            "model": "unet3d",
            "num_files_train": 28125,
            "num_samples_per_file": 1
        },
        "run-4": {
            "train_throughput_samples_per_second": 56.09675331936137,
            "train_throughput_MB_per_second": 7842.845216159307,
            "train_num_accelerators": "20",
            "model": "unet3d",
            "num_files_train": 28125,
            "num_samples_per_file": 1
        }
    }
}
```
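The `overall` figures can be cross-checked against the per-run values. The sketch below recomputes them from the numbers in the report above. For this sample, the reported standard deviations agree with the population form (dividing by N rather than N-1), which is why `statistics.pstdev` is used. `aggregate` is a hypothetical helper, not part of the benchmark tooling.

```python
from statistics import fmean, pstdev

# (samples/s, MB/s) per run, copied from the "runs" section of the report above.
RUNS = {
    "run-1": (56.90265822935148, 7955.518180172248),
    "run-2": (56.18386238258097, 7855.023869277903),
    "run-3": (56.69229017116294, 7926.10677895614),
    "run-4": (56.09675331936137, 7842.845216159307),
    "run-5": (57.06105089062497, 7977.662939935283),
}


def aggregate(runs: dict) -> dict:
    """Recompute the overall throughput statistics from per-run values."""
    samples = [s for s, _ in runs.values()]
    mb = [m for _, m in runs.values()]
    return {
        "mean_samples_per_second": fmean(samples),
        # pstdev divides by N (population form), matching the report above.
        "stdev_samples_per_second": pstdev(samples),
        "mean_MB_per_second": fmean(mb),
        "stdev_MB_per_second": pstdev(mb),
        # Average data volume read per training sample.
        "MB_per_sample": fmean(mb) / fmean(samples),
    }


if __name__ == "__main__":
    for key, value in aggregate(RUNS).items():
        print(f"{key}: {value:.4f}")
```

The derived `MB_per_sample` value gives a quick sanity check that the two throughput columns describe the same workload.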


