RDMA Networking
Alluxio supports several high-speed network technologies commonly deployed in AI and HPC clusters. This page covers configuration and performance guidance for each supported option.
| Technology | Status | Description |
| --- | --- | --- |
| IPoIB (IP over InfiniBand) | ✅ Supported | Standard TCP/IP over IB hardware, zero code changes |
| RoCE (RDMA over Converged Ethernet) | Planned | Low-latency RDMA over Ethernet fabric |
| Native IB API (Verbs) | Planned | Ultra-low latency RDMA over InfiniBand fabric |
IPoIB
Overview
InfiniBand (IB) is a high-bandwidth, low-latency interconnect commonly deployed in AI training clusters. Alluxio supports IP over InfiniBand (IPoIB), which runs the standard TCP/IP stack over IB hardware. Because Alluxio communicates over standard TCP/IP sockets, no code changes or special drivers are required — you only need to load the IPoIB kernel module and bind Alluxio services to the IB network interface.
Applies to: NICs configured with the InfiniBand link layer (verify with ibstat | grep "Link layer"). If your ConnectX adapter is running in Ethernet link layer mode, it operates as a standard high-speed Ethernet NIC, and Alluxio works with it natively with no IPoIB configuration needed.
IPoIB vs. Native RDMA
| | IPoIB | Native RDMA |
| --- | --- | --- |
| Protocol | TCP/IP over IB hardware | Bypass kernel, direct memory access |
| Alluxio support | ✅ Fully supported | Not supported in 3.8 |
| Configuration | Bind to IB network interface | Requires RDMA-aware application code |
| Typical throughput | 100–400 Gbps (hardware-dependent) | Lower latency, similar peak bandwidth |
Prerequisites
Hardware
Mellanox/NVIDIA ConnectX-6 or ConnectX-7 network adapter (or equivalent)
InfiniBand switch fabric
Software
Load the IPoIB kernel module and verify that the IB drivers and interfaces are active:
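A minimal sketch of this check, assuming the interface is named ib0 and the kernel module is ib_ipoib. The helper name module_loaded is illustrative and not part of Alluxio. Commands that need root or IB hardware are shown as comments so the helper itself can be tried on any machine.

```shell
# module_loaded NAME: succeed if NAME appears in the first column of
# lsmod-style output read from stdin. (Hypothetical helper.)
module_loaded() {
  awk -v mod="$1" '$1 == mod { found = 1 } END { exit (found ? 0 : 1) }'
}

# On an IB-equipped host (requires root):
#   sudo modprobe ib_ipoib                          # load the IPoIB module
#   lsmod | module_loaded ib_ipoib && echo "ib_ipoib loaded"
#   ibstat | grep "Link layer"                      # expect: InfiniBand
#   ip addr show ib0                                # interface exists and has an IP
```

If ip addr show ib0 reports that the device does not exist, check the interface name with ip addr show before continuing.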
MTU Configuration
IPoIB operates in one of two transport modes that determine the maximum supported MTU:
| Mode | Maximum MTU | Typical environment |
| --- | --- | --- |
| Datagram (UD) | 2,044 bytes | Cloud-managed IB (Azure HPC, AWS EFA) |
| Connected (RC) | 65,520 bytes | On-premises InfiniBand fabrics |
Check the current mode before setting MTU:
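One way to read the mode, sketched under the assumption that the interface is named ib0. The ipoib_mode helper and its optional second argument are illustrative; the argument exists only so the function can be exercised against a fake sysfs tree.

```shell
# ipoib_mode IFACE [SYSFS_ROOT]: print the IPoIB transport mode of IFACE.
# SYSFS_ROOT defaults to /sys/class/net and is overridable for testing only.
ipoib_mode() {
  iface="$1"
  root="${2:-/sys/class/net}"
  cat "$root/$iface/mode"
}

# Usage on an IB host:
#   ipoib_mode ib0      # prints "datagram" or "connected"
```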
If the mode is datagram (common on cloud IPoIB), the hardware limit is 2,044 bytes. Setting the MTU to 9000 fails with RTNETLINK answers: Invalid argument. This is expected behavior in datagram mode and does not indicate a misconfiguration. Alluxio works correctly at MTU 2,044.
If mode is connected (typical on-premises), set MTU to 9000 for maximum throughput:
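The mode-dependent choice can be sketched as follows, assuming the interface is ib0. The target_mtu helper is illustrative; it only encodes the values given on this page, and the command that changes the interface is left as a comment because it requires root.

```shell
# target_mtu MODE: print the MTU this page recommends for an IPoIB mode.
target_mtu() {
  case "$1" in
    connected) echo 9000 ;;   # recommended here; connected mode allows up to 65520
    datagram)  echo 2044 ;;   # hardware limit in datagram mode
    *) echo "unknown IPoIB mode: $1" >&2; return 1 ;;
  esac
}

# On an IB host (requires root):
#   mode=$(cat /sys/class/net/ib0/mode)
#   sudo ip link set ib0 mtu "$(target_mtu "$mode")"
#   ip link show ib0 | grep -o 'mtu [0-9]*'     # confirm the new value
```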
To persist the MTU setting across reboots, add it to your network configuration (e.g., /etc/network/interfaces or a systemd-networkd unit file).
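As a sketch of the systemd-networkd option, the snippet below writes a unit that sets the MTU for ib0. The file name 10-ib0.network and the NETWORKD_DIR variable are assumptions made for illustration. If ib0 is already described by an existing .network file, adding the [Link] section to that file is probably the safer choice.

```shell
# Directory is overridable so the sketch can be tried without root.
NETWORKD_DIR="${NETWORKD_DIR:-./networkd-example}"
mkdir -p "$NETWORKD_DIR"

# [Match] selects the interface; [Link] MTUBytes= sets its MTU.
cat > "$NETWORKD_DIR/10-ib0.network" <<'EOF'
[Match]
Name=ib0

[Link]
MTUBytes=9000
EOF

# On a real host, write the file under /etc/systemd/network and reload:
#   sudo networkctl reload
```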
Binding Alluxio to the IB Interface
The hot data path in Alluxio runs between workers and FUSE / client nodes, which is where IB bandwidth matters. The coordinator handles metadata operations and background jobs and is not on the data-serving critical path, so it does not need to run on IB-equipped hardware.
For general NIC binding configuration, see Cluster Management. The steps below extend that guidance specifically for IPoIB deployments.
On Kubernetes, IPoIB can be exposed to pods through:
NVIDIA Network Operator: Automates MLNX_OFED driver deployment and SR-IOV device plugin configuration
Multus CNI: Attaches a secondary IB network interface to Alluxio pods
SR-IOV Device Plugin: Exposes IB Virtual Functions (VFs) as pod resources
Refer to the NVIDIA Network Operator documentation and Multus CNI for setup instructions. Once the IB interface is available inside the pod, apply the same alluxio-site.properties settings as for bare-metal deployments.
Worker Configuration
Add the following to alluxio-site.properties on each worker node. Replace ib0 with your actual IB interface name (check with ip addr show):
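A minimal sketch of the worker settings, using the two bind properties named in the Troubleshooting section of this page. The SITE_PROPS path and the IB_IFACE variable are assumptions for illustration; point SITE_PROPS at the real alluxio-site.properties of your installation.

```shell
# Path and interface name are assumptions; override them for your deployment.
SITE_PROPS="${SITE_PROPS:-./alluxio-site.properties}"
IB_IFACE="${IB_IFACE:-ib0}"

# Append the worker bind settings to the properties file.
cat >> "$SITE_PROPS" <<EOF
# Bind worker RPC and data traffic to the IB interface
alluxio.worker.rpc.bind.device=${IB_IFACE}
alluxio.worker.data.bind.device=${IB_IFACE}
EOF
```

Running the snippet twice appends the lines twice, so check the file for existing entries first.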
Verify after starting the worker:
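One possible check, sketched with standard Linux tools, is to list the listening TCP sockets bound to the IB address. The helper names ib_ipv4 and listeners_on are illustrative, and no specific Alluxio port numbers are assumed.

```shell
# ib_ipv4 IFACE: print the first IPv4 address of IFACE from
# `ip -o -4 addr show` output read on stdin.
ib_ipv4() {
  awk -v dev="$1" '$2 == dev && $3 == "inet" { split($4, a, "/"); print a[1]; exit }'
}

# listeners_on IP: filter `ss -tln` output on stdin to sockets bound to IP.
# Matches plain IPv4 and the IPv4-mapped form that JVM sockets may show.
listeners_on() {
  awk -v ip="$1" 'index($4, ip ":") == 1 || index($4, "[::ffff:" ip "]:") == 1'
}

# On a worker node:
#   ip=$(ip -o -4 addr show | ib_ipv4 ib0)
#   ss -tln | listeners_on "$ip"      # expect the worker's listening ports
```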
FUSE / Client Configuration
For nodes running Alluxio FUSE or direct client access, bind the data channel to the IB interface:
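A minimal sketch of the client-side setting, using the property named in the Troubleshooting section of this page. As before, the SITE_PROPS path and IB_IFACE variable are assumptions for illustration.

```shell
# Path and interface name are assumptions; override them for your deployment.
SITE_PROPS="${SITE_PROPS:-./alluxio-site.properties}"
IB_IFACE="${IB_IFACE:-ib0}"

# Append the client data-channel bind setting to the properties file.
cat >> "$SITE_PROPS" <<EOF
# Bind the FUSE / client data channel to the IB interface
alluxio.user.network.data.bind.device=${IB_IFACE}
EOF
```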
For FUSE mount options and prerequisites (including allow_other configuration), see POSIX API (FUSE).
Coordinator Configuration
The coordinator does not need to run on IB-equipped hardware. Set alluxio.coordinator.hostname to the coordinator node's reachable IP address (typically its Ethernet interface):
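The sketch below sets the property idempotently, so rerunning it replaces the old value instead of appending a duplicate. The set_prop helper is hypothetical, not an Alluxio tool, and 192.0.2.10 is a placeholder address from the documentation range; substitute the coordinator's real IP.

```shell
# set_prop FILE KEY VALUE: set KEY=VALUE in a properties file, replacing any
# existing line that starts with KEY=. (Hypothetical helper.)
set_prop() {
  file="$1"; key="$2"; value="$3"
  touch "$file"
  # Keep every line that does not start with "KEY=", then append the new one.
  awk -v k="$key=" 'index($0, k) != 1' "$file" > "$file.tmp"
  printf '%s=%s\n' "$key" "$value" >> "$file.tmp"
  mv "$file.tmp" "$file"
}

# Path and address are assumptions; override them for your deployment.
SITE_PROPS="${SITE_PROPS:-./alluxio-site.properties}"
COORDINATOR_IP="${COORDINATOR_IP:-192.0.2.10}"
set_prop "$SITE_PROPS" alluxio.coordinator.hostname "$COORDINATOR_IP"
```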
Verify End-to-End Connectivity
After starting all services, confirm that worker–client data traffic flows over the IB interface:
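One way to check this, sketched with the kernel's per-interface byte counters, is to sample rx_bytes on a client node while a read workload runs and convert the delta to a rate. The helper names are illustrative, and the interface is assumed to be ib0.

```shell
# iface_bytes IFACE COUNTER [SYSFS_ROOT]: print a byte counter
# (rx_bytes or tx_bytes). SYSFS_ROOT is overridable for testing only.
iface_bytes() {
  cat "${3:-/sys/class/net}/$1/statistics/$2"
}

# bytes_to_mbps BYTES SECONDS: convert a byte delta over an interval to Mbit/s.
bytes_to_mbps() {
  awk -v b="$1" -v s="$2" 'BEGIN { printf "%.1f\n", b * 8 / s / 1e6 }'
}

# On a client node, while a read workload is running:
#   before=$(iface_bytes ib0 rx_bytes); sleep 5
#   after=$(iface_bytes ib0 rx_bytes)
#   bytes_to_mbps $((after - before)) 5
```

A rate on ib0 that tracks the workload, with no matching increase on the Ethernet interface, suggests the data path is using IB. Listing connections with ss -tnp | grep <ib0 IP> gives a complementary view.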
Reference Performance
The following results are from an example test environment using IPoIB with Alluxio running on bare metal.
Test Environment
| Component | Specification |
| --- | --- |
| Network | 2 × 200 Gbps IPoIB (bonded), measured throughput: 360 Gbps |
| NIC | Mellanox ConnectX-7 (IB link layer, 200 Gbps) |
| Cache disk | RAID0, 2 × NVMe, read/write: ~12 GB/s |
| UFS | Object storage via 100 Gbps dedicated line |
| Deployment | Bare metal, FUSE and worker co-located |
Network Layer (iperf3)
| Configuration | Throughput |
| --- | --- |
| Single IB port | 180 Gbps |
| Bonded (2 × 200 Gbps IPoIB) | 360 Gbps |
Alluxio Read Throughput (Hot Read, Large Files, 32 Concurrent)
| Configuration | Read throughput |
| --- | --- |
| 1 FUSE + 1 worker, 1 × NVMe | 6.3 GB/s |
| 1 FUSE + 1 worker, RAID0 2 × NVMe | 12.5 GB/s |
| 3 FUSE + 3 workers, RAID0 2 × NVMe | 36.6 GB/s |
Observation: With 3 workers and RAID0 NVMe cache, Alluxio hot read throughput reaches the raw disk bandwidth ceiling (36.6 GB/s measured vs. roughly 36 GB/s aggregate, i.e. 3 × ~12 GB/s per RAID0 array), which indicates that the disks, not the IPoIB network, are the limiting factor at this scale.
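The headroom can be checked with a little arithmetic. The sketch below assumes the load is spread evenly across the three workers and, as an upper bound, that every byte crosses the network; both are simplifying assumptions, not measurements from the test.

```shell
# gbytes_to_gbits GBPS: convert GB/s to Gbit/s (1 byte = 8 bits).
gbytes_to_gbits() {
  awk -v g="$1" 'BEGIN { printf "%.1f\n", g * 8 }'
}

per_node=$(awk 'BEGIN { printf "%.1f", 36.6 / 3 }')    # GB/s per worker
per_node_gbit=$(gbytes_to_gbits "$per_node")           # Gbit/s per worker
# Share of the 360 Gbps measured bonded link used by one worker.
share=$(awk -v n="$per_node_gbit" 'BEGIN { printf "%.0f", n / 360 * 100 }')

echo "per worker: ${per_node} GB/s = ${per_node_gbit} Gbps (${share}% of 360 Gbps)"
# prints: per worker: 12.2 GB/s = 97.6 Gbps (27% of 360 Gbps)
```

Under these assumptions each node uses roughly a quarter of its measured link capacity, which is consistent with the disks saturating first.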
Troubleshooting
Worker not binding to IB interface
Run ip addr show ib0 to confirm the interface has an IP address assigned.
Verify that alluxio.worker.rpc.bind.device matches the exact interface name (case-sensitive).
Check alluxio-worker.log for bind errors.
Workers serve data over Ethernet instead of IB
Verify that alluxio.worker.data.bind.device=ib0 is set on each worker node and that the worker process was restarted after the change.
Verify that alluxio.user.network.data.bind.device=ib0 is set on the FUSE / client node.
ip link set ib0 mtu 9000 fails with RTNETLINK answers: Invalid argument
Your IPoIB interface is in datagram mode, which caps MTU at 2,044 bytes. This is common on cloud-managed InfiniBand (Azure HPC, AWS EFA). Alluxio works correctly at the default MTU — no action needed. See MTU Configuration.
Low throughput despite IPoIB
Run iperf3 -c <other node ib0 IP> between nodes to establish a network-layer baseline.
Check cat /sys/class/net/ib0/mode. Datagram mode (MTU 2,044) will limit peak throughput compared to connected mode (MTU 9,000).
Confirm all services are communicating over the IB interface: ss -tnp | grep <ib0 IP>.
IB interface missing after reboot
MTU and bonding settings may not have been persisted. Add them to the system network configuration.
Verify MLNX_OFED drivers load on boot:
lsmod | grep ib_core