# S3 API Benchmarks

## Scope

TL;DR

* Three benchmark tools covered by dedicated pages: [**COSBench**](/ee-ai-en/benchmark/s3-api/cosbench.md) (complex mixed workloads), [**Warp**](/ee-ai-en/benchmark/s3-api/warp.md) (quick bucket-wide reads), and [**httpbench**](/ee-ai-en/benchmark/s3-api/httpbench.md) (a \~50-line Go tool for per-worker measurement on redirect-mode clusters).
* Three reference performance baselines on different hardware (AWS 4-node COSBench, OCI 6-node Warp, AWS 6-node httpbench).
* **Object size matters.** For 1+ GiB objects (typical AI model shards), Alluxio's HTTP 307 redirect costs essentially zero throughput once keep-alive connections are warm. For small objects (<100 KiB), per-request overhead dominates and the two deployment patterns diverge: proxy mode can drop to roughly half the throughput of redirect mode.
* Key potential performance bottlenecks: network bandwidth, TCP connection reuse, HTTP redirect cost (small objects only), and untuned kernel parameters.

For how Alluxio's S3 API works (request flow, consistent hashing, redirects), see [How It Works](/ee-ai-en/data-access/s3-api.md#how-it-works).

## Choosing a Benchmark Tool

|                               | [COSBench](/ee-ai-en/benchmark/s3-api/cosbench.md)              | [Warp](/ee-ai-en/benchmark/s3-api/warp.md) | [httpbench](/ee-ai-en/benchmark/s3-api/httpbench.md) |
| ----------------------------- | --------------------------------------------------------------- | ------------------------------------------ | ---------------------------------------------------- |
| **Best for**                  | Complex, multi-stage workloads                                  | Quick bucket-wide single-operation test    | Per-worker throughput in redirect-mode clusters      |
| **Setup**                     | Controller + driver nodes                                       | Single binary                              | \~50 lines of Go, build on client                    |
| **Workload definition**       | XML config files                                                | CLI flags                                  | URL list on command line                             |
| **Follows HTTP 307**          | Yes (via SDK) — needs `alluxio.worker.s3.redirect.enabled=true` | **No** — incompatible with redirect mode   | Yes (Go default) — works with either pattern         |
| **Multi-client coordination** | Built-in driver model                                           | `--syncstart`                              | ssh + timestamp coordination                         |
| **Results UI**                | Web dashboard                                                   | Terminal output                            | Terminal output                                      |

**Rule of thumb**:

* **Redirect enabled + per-worker isolation needed** (e.g. CPU-profiling one worker, measuring a single NIC): use [**httpbench**](/ee-ai-en/benchmark/s3-api/httpbench.md).
* **Redirect disabled + want a quick bucket-wide number**: use [**Warp**](/ee-ai-en/benchmark/s3-api/warp.md).
* **Mixed read/write, multiple drivers, long-running, or complex staged workloads**: use [**COSBench**](/ee-ai-en/benchmark/s3-api/cosbench.md).
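
The "Follows HTTP 307" row in the table above is easy to check empirically. Go's default `net/http` client follows 307s transparently (which is why httpbench works against redirect-mode clusters out of the box). Below is a minimal sketch that surfaces the raw 307 instead of following it; the endpoint URL, port, bucket, and key are hypothetical placeholders, not values from this documentation:

```go
package main

import (
	"fmt"
	"net/http"
)

func main() {
	// Hypothetical endpoint, bucket, and key — substitute your own.
	// The port is illustrative; it varies by release and configuration.
	url := "http://alluxio-worker.example.com:29998/mybucket/model-00001.safetensors"

	// The default http.Client follows 307 redirects automatically.
	// To inspect the redirect itself, stop the client at the first response:
	client := &http.Client{
		CheckRedirect: func(req *http.Request, via []*http.Request) error {
			return http.ErrUseLastResponse // hand back the 307 untouched
		},
	}
	resp, err := client.Get(url)
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()

	// With alluxio.worker.s3.redirect.enabled=true and an object owned by a
	// different worker, expect a 307 status plus a Location header.
	fmt.Println(resp.Status, resp.Header.Get("Location"))
}
```

Pointing the same program at a proxy-mode cluster should print a plain `200 OK` with an empty `Location`, confirming which deployment pattern you are actually benchmarking.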

## Reference Performance Baselines

All throughput numbers below assume data is **fully cached in Alluxio**. If data is served from the underlying UFS, throughput will be significantly lower. Verify with `bin/alluxio fs check-cached /path` before testing.

### 4 Node COSBench on AWS

The following results were achieved using a 4-driver COSBench cluster testing a 4-worker Alluxio cluster.

| Component           | Configuration                       |
| ------------------- | ----------------------------------- |
| COSBench Controller | 1 × `c5n.metal`                     |
| COSBench Drivers    | 4 × `c5n.metal`                     |
| Alluxio Coordinator | 1 node                              |
| Alluxio Workers     | 4 × `i3en.metal` (8 NVMe SSDs each) |
| Load Balancer       | AWS ELB across 4 workers            |

**Large file read throughput (1 GB files)** — bandwidth-bound, scales with concurrency until network saturation:

| Concurrency per Driver | Total Throughput |
| ---------------------- | ---------------- |
| 1 thread               | 2.35 GB/s        |
| 16 threads             | 20.44 GB/s       |
| 128 threads            | 36.94 GB/s       |

**Small file read IOPS (100 KB files)** — IOPS-bound, scales with concurrency until CPU saturation:

| Concurrency per Driver | Total Throughput | Total Operations/sec |
| ---------------------- | ---------------- | -------------------- |
| 1 thread               | 50.26 MB/s       | 502 op/s             |
| 16 threads             | 1.10 GB/s        | 11,302 op/s          |
| 128 threads            | 4.69 GB/s        | 46,757 op/s          |

### 6 Node Warp on OCI

In Warp GET tests on OCI `BM.DenseIO.E5.128` nodes (100 Gbps networking, 12 × NVMe in RAID 0), Alluxio achieved 11.2 GiB/s on a single node (0.3 ms avg latency, 0.4 ms P99) and 33.3 GiB/s on 6 nodes (0.6 ms avg, 0.9 ms P99). Note that Warp does not follow HTTP 307 redirects to non-AWS endpoints, so these numbers reflect [Pattern B: Load Balancer + Proxy Mode](/ee-ai-en/data-access/s3-api.md#pattern-b-load-balancer-proxy-mode) (proxy mode via load balancer). See [Alluxio on OCI](https://blogs.oracle.com/cloud-infrastructure/post/alluxio-on-oci-submillisecond-latency-for-ai) for full results.

### 6 Node httpbench on AWS

Run on 6 × `c5n.18xlarge` workers (72 vCPU, 192 GiB RAM, 100 Gbps NIC, 80 GiB tmpfs page store) + 6 × `c5n.18xlarge` clients, serving 82 safetensor files (\~137 GB, 1–3 GiB per file) fully cached in Alluxio. Tool: [httpbench](/ee-ai-en/benchmark/s3-api/httpbench.md).

**Single worker, single client (1:1), 1–3 GiB objects**:

| Concurrency | Throughput                 | vs iperf3 ceiling         |
| ----------- | -------------------------- | ------------------------- |
| 1           | 0.62 GB/s (5.0 Gbps)       | AWS ENA per-flow cap      |
| 16          | 8.66 GB/s (69.3 Gbps)      | 87%                       |
| **32**      | **11.35 GB/s (90.8 Gbps)** | **95%**                   |
| 64          | 11.15 GB/s (89.2 Gbps)     | saturated                 |
| 128         | 10.46 GB/s (83.7 Gbps)     | connection-count overhead |

A single worker's S3 API can deliver within \~5% of TCP line rate to a single client when reads are local.

**6 clients × 6 workers paired aggregate (C=32 per client, 30s)**:

| Metric                                    | Value                         |
| ----------------------------------------- | ----------------------------- |
| Per-pair average                          | 11.43 GB/s (91.4 Gbps)        |
| **Aggregate throughput**                  | **68.55 GB/s (548 Gbps)**     |
| Worker CPU avg (72-vCPU, `mpstat -P ALL`) | 4.1% ≈ **3 cores avg**        |
| Worker CPU peak                           | 7.2–8.2% ≈ **5–6 cores peak** |

Throughput scales near-linearly from 11.4 GB/s per pair to 68.5 GB/s across 6 pairs, while worker CPUs stay nearly idle: for large-object reads, Alluxio is NIC-bound, not CPU-bound.
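
For context on method, a measurement loop of this kind can be as small as the sketch below. This is illustrative only, not the actual httpbench source; the flag names and URL-list format are assumptions:

```go
// Sketch of an httpbench-style loop: C goroutines repeatedly GET URLs from a
// list and report aggregate bytes/sec. Illustrative; not the actual tool.
package main

import (
	"flag"
	"fmt"
	"io"
	"net/http"
	"sync/atomic"
	"time"
)

func main() {
	c := flag.Int("c", 32, "concurrent workers")
	d := flag.Duration("d", 30*time.Second, "test duration")
	flag.Parse()
	urls := flag.Args() // object URLs, one per argument
	if len(urls) == 0 {
		fmt.Println("usage: bench [-c N] [-d 30s] URL...")
		return
	}

	var total int64 // bytes read across all workers
	deadline := time.Now().Add(*d)
	done := make(chan struct{})
	for i := 0; i < *c; i++ {
		go func(i int) {
			defer func() { done <- struct{}{} }()
			for time.Now().Before(deadline) {
				resp, err := http.Get(urls[i%len(urls)]) // follows 307s by default
				if err != nil {
					continue
				}
				// Drain the body fully so keep-alive can reuse the connection.
				n, _ := io.Copy(io.Discard, resp.Body)
				resp.Body.Close()
				atomic.AddInt64(&total, n)
			}
		}(i)
	}
	for i := 0; i < *c; i++ {
		<-done
	}
	gb := float64(total) / 1e9
	fmt.Printf("%.2f GB in %v = %.2f GB/s\n", gb, *d, gb/d.Seconds())
}
```

Note the full drain of each response body: Go's transport only reuses a keep-alive connection once the body has been read to EOF and closed, which matters at the concurrency levels tested above.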

### Network Ceiling (iperf3 Baseline)

Raw TCP between a client and a worker on the same test bed, for reference:

| TCP streams | Throughput                                            |
| ----------- | ----------------------------------------------------- |
| 1           | 4.97 Gbps (0.62 GB/s) — AWS ENA single-flow cap       |
| 8           | 37.9 Gbps                                             |
| 32          | **95.6 Gbps (11.95 GB/s)** — \~100 Gbps NIC line rate |

Any per-client S3 API number above \~95 Gbps on this hardware is impossible regardless of cluster size — the NIC is the ceiling. Always establish this ceiling with `iperf3 -c <worker> -P 32` before interpreting S3 API numbers.

## 307 Redirect Cost: Large vs Small Objects

The "Pattern B (proxy-mode) ≈ 50% of Pattern A (redirect)" guidance in the [S3 API documentation](/ee-ai-en/data-access/s3-api.md#deployment-patterns) applies to **small-object workloads** where the 307 handshake cost dominates each request. For large-object sequential reads (1+ GiB shards, the common case for AI model loading), the handshake happens once and amortises to near-zero — Pattern A and Pattern B throughput were within \~2% of each other in our tests (Pattern B 11.28 GB/s vs Pattern A 11.02 GB/s at C=64 for 1–3 GiB objects; Pattern B 10.80 GB/s vs Pattern A 10.89 GB/s at C=128).

Rule of thumb: if the transfer time `avg_object_bytes / NIC_bytes_per_sec` greatly exceeds the 307 redirect round-trip time (typically \~1 ms in-AZ), redirect cost is noise. On a 100 Gbps NIC, this means the redirect is essentially free for 100 MB+ objects and negligible (<2%) for 1 GB+ objects; for objects under 1 MB, per-request overhead dominates and Pattern B throughput collapses to roughly 50% of Pattern A.
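
Worked example: a 100 Gbps NIC moves 12.5 GB/s, so a 1 GiB object transfers in roughly 86 ms and a \~1 ms redirect round trip adds about 1%. A 100 KiB object transfers in about 8 µs, so the same \~1 ms of per-request overhead dwarfs the transfer itself.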

## Performance Tuning and Troubleshooting

For suggested Alluxio configuration parameters and Linux kernel tuning, see [S3 API — Performance](/ee-ai-en/data-access/s3-api.md#performance). For tool-specific issues, see each benchmark page's Troubleshooting section.

Cross-tool symptoms:

* **Small-object throughput \~50% lower than expected** — for workloads with <100 KiB objects, redirects are likely disabled (`alluxio.worker.s3.redirect.enabled=false`, the default), so cross-worker reads are proxied through an intermediate worker. To get full throughput, use Pattern A: set `alluxio.worker.s3.redirect.enabled=true` with a redirect-capable client. See [Deployment Patterns](/ee-ai-en/data-access/s3-api.md#deployment-patterns). For large objects (1+ GiB), Pattern A and Pattern B throughput are near-identical — redirect cost is negligible, see [307 Redirect Cost: Large vs Small Objects](#id-307-redirect-cost-large-vs-small-objects).
* **Throughput far below baselines** — most likely data is not fully cached. Verify with `bin/alluxio fs check-cached` that files show as cached in Alluxio before testing.
* **Low throughput despite high concurrency** — network bottleneck or unbalanced load balancer. Verify 100 Gbps connectivity, same-AZ deployment, and that the load balancer correctly distributes requests evenly across all Alluxio workers. Establish the NIC ceiling with `iperf3` first — see [Network Ceiling (iperf3 Baseline)](#network-ceiling-iperf3-baseline).
* **No scaling with added concurrency** — CPU or connection pool bottleneck. Check worker CPU utilization and ensure `alluxio.worker.s3.connection.keep.alive.enabled` is set to `true`.
* **High tail latency** — TCP port exhaustion. Apply [kernel tuning](/ee-ai-en/data-access/s3-api.md#linux-kernel-parameters) (`tcp_tw_reuse`, `tcp_fin_timeout`).
* **Throughput plateaus at low level** — health check overhead. Disable `alluxio.worker.s3.redirect.health.check.enabled` for benchmarks.
* **Inconsistent or highly variable results across runs** — data not fully cached, or noisy environment (cross-AZ traffic, shared network). Pre-load data and re-run in a dedicated, same-AZ setup.
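
For reference, the worker-side properties named in the list above, as they might appear in `conf/alluxio-site.properties` for a benchmark run (a sketch; confirm exact property names and defaults against your Alluxio release):

```properties
# Pattern A: answer cross-worker reads with a 307 redirect instead of
# proxying through an intermediate worker (disabled by default).
alluxio.worker.s3.redirect.enabled=true

# Reuse TCP connections across requests; without this, added concurrency
# may not scale.
alluxio.worker.s3.connection.keep.alive.enabled=true

# Health checks add per-request overhead; disable for benchmark runs only.
alluxio.worker.s3.redirect.health.check.enabled=false
```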

## See Also

* [COSBench Benchmarks](/ee-ai-en/benchmark/s3-api/cosbench.md) — complex, multi-stage workloads
* [Warp Benchmarks](/ee-ai-en/benchmark/s3-api/warp.md) — quick single-binary tests on redirect-disabled clusters
* [httpbench Benchmarks](/ee-ai-en/benchmark/s3-api/httpbench.md) — per-worker, redirect-aware
* [S3 API Setup and Configuration](/ee-ai-en/data-access/s3-api.md) — deployment patterns, endpoint setup, load balancer configuration, and client examples
* [S3 UFS Integration](/ee-ai-en/ufs/s3.md) — multipart upload tuning, high concurrency settings, and S3 region configuration

