# httpbench Benchmark

`httpbench` is a purpose-built Go tool of roughly 50 lines that measures read throughput of the Alluxio S3 API. Use it when you need **single-worker isolation** (for example, CPU profiling one worker or measuring a single NIC), or when the cluster runs in **redirect mode**, where [Warp](/ee-ai-cn/ai-3.8-15.1.x-cn/benchmark/s3-api/warp.md) cannot be used at all. Compared with Warp, httpbench follows HTTP 307 transparently (Go's default), accepts an explicit URL list instead of enumerating a bucket, does no SigV4 signing, and sends no chunked SHA-256 payload.

For guidance on choosing httpbench over [COSBench](/ee-ai-cn/ai-3.8-15.1.x-cn/benchmark/s3-api/cosbench.md) or [Warp](/ee-ai-cn/ai-3.8-15.1.x-cn/benchmark/s3-api/warp.md), see [Benchmark tool selection](/ee-ai-cn/ai-3.8-15.1.x-cn/benchmark/s3-api.md#ji-zhun-ce-shi-gong-ju-xuan-xing). For reference throughput on a 6 × c5n.18xlarge cluster, see [6-node httpbench (AWS)](/ee-ai-cn/ai-3.8-15.1.x-cn/benchmark/s3-api.md#id-6-jie-dian-httpbench-aws).

## Prerequisites

* Go 1.21+ installed on the client host.
* The client can reach the S3 API port of the Alluxio workers over the network (default 29998).
* The test data is fully cached. Preload it with `bin/alluxio job load --path --submit` and verify with `bin/alluxio fs check-cached `.

## Installation

Save the tool source as `httpbench.go`:

```go
package main

import (
	"flag"
	"fmt"
	"io"
	"net/http"
	"sync"
	"sync/atomic"
	"time"
)

func main() {
	conc := flag.Int("c", 16, "concurrency (parallel workers)")
	dur := flag.Duration("d", 30*time.Second, "duration")
	flag.Parse()
	urls := flag.Args()
	if len(urls) == 0 {
		fmt.Println("usage: httpbench -c CONC -d DUR URL1 URL2 ...")
		return
	}

	// Size the connection pool to the concurrency so keep-alive connections
	// are reused; compression and HTTP/2 are off to measure raw HTTP/1.1 reads.
	tr := &http.Transport{
		MaxIdleConns:        *conc * 2,
		MaxIdleConnsPerHost: *conc * 2,
		MaxConnsPerHost:     *conc * 2,
		IdleConnTimeout:     60 * time.Second,
		DisableCompression:  true,
		ForceAttemptHTTP2:   false,
	}
	// The default client policy follows redirects, including 307.
	client := &http.Client{Transport: tr, Timeout: 5 * time.Minute}

	var totalBytes, totalReqs int64
	var wg sync.WaitGroup
	deadline := time.Now().Add(*dur)
	t0 := time.Now()

	for i := 0; i < *conc; i++ {
		wg.Add(1)
		go func(gid int) {
			defer wg.Done()
			// Each goroutine starts at its own offset and walks the URL list
			// round-robin.
			j := gid
			// Requests already in flight at the deadline run to completion,
			// so the reported time can exceed -d.
			for time.Now().Before(deadline) {
				url := urls[j%len(urls)]
				j++
				resp, err := client.Get(url)
				if err != nil {
					continue // failed requests are skipped and not counted
				}
				n, _ := io.Copy(io.Discard, resp.Body)
				resp.Body.Close()
				// Only 2xx responses count toward the totals.
				if resp.StatusCode >= 200 && resp.StatusCode < 300 {
					atomic.AddInt64(&totalBytes, n)
					atomic.AddInt64(&totalReqs, 1)
				}
			}
		}(i)
	}
	wg.Wait()

	elapsed := time.Since(t0).Seconds()
	fmt.Printf("Reqs: %d Bytes: %.2f GB Time: %.2fs\n",
		totalReqs, float64(totalBytes)/1e9, elapsed)
	fmt.Printf("→ %.2f GB/s (%.1f Gbps)\n",
		float64(totalBytes)/elapsed/1e9, float64(totalBytes)*8/elapsed/1e9)
}
```

Build it once:

```shell
go build -o httpbench httpbench.go
```

Build the `file → owning worker` map. All three scenarios below reuse this table, and it needs to be built only once per bucket. Send a 1-byte Range GET to any worker: an object the worker owns locally returns `206`; otherwise the worker returns `307` with `Location` pointing at the real owner:

```shell
for f in $(kubectl -n alx-ns exec -i alluxio-cluster-coordinator-0 -- \
    alluxio fs ls /mybucket | awk '{print $NF}'); do
  name=$(basename "$f")
  resp=$(curl -s --range 0-0 -o /dev/null \
    -w "%{http_code}|%{redirect_url}" \
    "http://any-worker:29998/mybucket/$name")
  code=${resp%%|*}
  redir=${resp#*|}
  if [ "$code" = "206" ]; then
    echo "any-worker|$name"
  else
    owner=$(echo "$redir" | sed 's|http://||' | cut -d: -f1)
    echo "$owner|$name"
  fi
done > file_owners.txt
```

## Usage

### Scenario: Mode 1 (single worker, local keys only)

This mode isolates one worker's S3 API throughput from cross-worker routing. It suits CPU profiling a single worker or measuring a single NIC. Send the worker only requests for keys it owns locally:

```shell
# Extract the URLs that belong to worker-1
grep "^worker-1|" file_owners.txt \
  | awk -F'|' '{print "http://worker-1:29998/mybucket/"$2}' \
  > worker-1-urls.txt

# Run the benchmark
./httpbench -c 32 -d 30s $(cat worker-1-urls.txt)
```

Sample output:

```console
Reqs: 224 Bytes: 390.86 GB Time: 35.03s
→ 11.16 GB/s (89.3 Gbps)
```

### Scenario: Mode 2 (single client, whole bucket via redirect)

A single client sends reads for the whole bucket to the same entry worker. Keys that the entry worker does not own trigger a 307; Go's client follows it transparently and reuses keep-alive connections to the real owner.

```shell
# Every URL points at the same entry worker; Alluxio redirects
# automatically for content that worker does not own
for name in $(awk -F'|' '{print $2}' file_owners.txt); do
  echo "http://worker-1:29998/mybucket/$name"
done > all-via-worker-1.txt

./httpbench -c 64 -d 30s $(cat all-via-worker-1.txt)
```

The bottleneck is the client NIC, not the Alluxio cluster: about 11 GB/s on a 100 Gbps NIC, on par with Mode 1. For behavior that depends on object size, see [307 redirect overhead: large vs. small objects](/ee-ai-cn/ai-3.8-15.1.x-cn/benchmark/s3-api.md#id-307-zhong-ding-xiang-kai-xiao-da-dui-xiang-vs-xiao-dui-xiang).

### Scenario: Mode 3 (N clients × N workers, paired aggregation)

Each client reads only the keys owned locally by the worker it is paired with. This mode mirrors the real-world case of a distributed inference cluster loading model shards from Alluxio. Deploy `httpbench` and the matching worker's URL list to N clients, then start all N clients at the same time:

```shell
# Agree on a common start time
START=$(date -d '+60 seconds' +%s)
for i in 1 2 3 4 5 6; do
  ssh client-$i "
    while [ \$(date +%s) -lt $START ]; do sleep 0.1; done
    ./httpbench -c 32 -d 30s \$(cat worker-$i-urls.txt)
  " > client-$i.out 2>&1 &
done
wait

# Aggregate
awk '/GB\/s/ { sum += $2 } END { print sum, "GB/s aggregate" }' client-*.out
```

Expected aggregate throughput on 6 × c5n.18xlarge (100 Gbps NIC, all data cached): **about 68 GB/s**, which is near-linear scaling from 11.4 GB/s per pair. Compare with [6-node httpbench (AWS)](/ee-ai-cn/ai-3.8-15.1.x-cn/benchmark/s3-api.md#id-6-jie-dian-httpbench-aws).

## Troubleshooting

* **Mode 2 and Mode 1 throughput are nearly identical (large objects)**: this is expected for objects of 1 GiB and larger, where the handshake overhead amortizes to almost zero. See [307 redirect overhead: large vs. small objects](/ee-ai-cn/ai-3.8-15.1.x-cn/benchmark/s3-api.md#id-307-zhong-ding-xiang-kai-xiao-da-dui-xiang-vs-xiao-dui-xiang).
* **Throughput is far below 95% of the iperf3 ceiling**: the most likely causes are data that is not fully cached, or the AWS ENA per-TCP-flow limit (on ENA, C=1 is capped at about 5 Gbps regardless of NIC size). Raise `-c` to 32 or higher and verify the cache with `bin/alluxio fs check-cached`. For troubleshooting that applies across tools (kernel tuning, health-check overhead, tail latency), see [Performance tuning and troubleshooting](/ee-ai-cn/ai-3.8-15.1.x-cn/benchmark/s3-api.md#xing-neng-diao-you-yu-gu-zhang-pai-chu) on the hub page.

## See also

* [S3 API benchmarks](/ee-ai-cn/ai-3.8-15.1.x-cn/benchmark/s3-api.md): overview, reference baselines, tool selection, cross-tool troubleshooting
* [COSBench benchmark](/ee-ai-cn/ai-3.8-15.1.x-cn/benchmark/s3-api/cosbench.md): suited to complex multi-stage workloads
* [Warp benchmark](/ee-ai-cn/ai-3.8-15.1.x-cn/benchmark/s3-api/warp.md): suited to quick bucket-level tests, but incompatible with redirect mode
* [S3 API setup and configuration](/ee-ai-cn/ai-3.8-15.1.x-cn/data-access/s3-api.md): deployment modes, redirect behavior, load balancer configuration
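
As a supplement to Modes 1 and 3: both need one URL list per worker, while the grep/awk pipeline shown for Mode 1 extracts a single worker at a time. The following is a minimal Go sketch, not part of httpbench itself, that splits the `owner|name` lines of `file_owners.txt` into per-worker URL lists in one pass. The function name `groupURLs` and the sample object names are hypothetical; the bucket `mybucket` and port 29998 simply follow the examples on this page.

```go
package main

import (
	"bufio"
	"fmt"
	"sort"
	"strings"
)

// groupURLs turns "owner|name" lines (the file_owners.txt format) into
// per-worker URL lists. Hypothetical helper; bucket and port are supplied
// by the caller.
func groupURLs(lines []string, bucket string, port int) map[string][]string {
	out := make(map[string][]string)
	for _, line := range lines {
		owner, name, ok := strings.Cut(strings.TrimSpace(line), "|")
		if !ok || owner == "" || name == "" {
			continue // skip blank or malformed lines
		}
		url := fmt.Sprintf("http://%s:%d/%s/%s", owner, port, bucket, name)
		out[owner] = append(out[owner], url)
	}
	return out
}

func main() {
	// Made-up sample input; real input would be read from file_owners.txt.
	sample := "worker-1|a.bin\nworker-2|b.bin\nworker-1|c.bin\n"
	var lines []string
	sc := bufio.NewScanner(strings.NewReader(sample))
	for sc.Scan() {
		lines = append(lines, sc.Text())
	}

	groups := groupURLs(lines, "mybucket", 29998)

	// Sort owners so the summary is printed in a stable order.
	owners := make([]string, 0, len(groups))
	for o := range groups {
		owners = append(owners, o)
	}
	sort.Strings(owners)
	for _, o := range owners {
		fmt.Printf("%s: %d URLs\n", o, len(groups[o]))
	}
}
```

Writing each group to its own `<owner>-urls.txt` file would then give every client in Mode 3 the list for its paired worker. Keeping the grouping in a pure function makes it easy to check against a handful of known lines before pointing it at a real bucket listing.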