NICs and DPUs for GPU Servers

Illia Kasian

A NIC (Network Interface Card) connects a server to the network. A DPU (Data Processing Unit) is a specialized NIC that adds its own CPU, memory, and operating system on the card.

Every byte of training data and every gradient sync between GPU servers passes through a NIC. A DPU goes further by handling networking, security, and storage tasks on-card, so the server's CPU stays focused on feeding the GPUs.

Why NICs and DPUs matter for AI

NICs have existed for decades, but AI training changed what we need from them. A single GPU server can train small models on its own. Larger models require dozens or hundreds of servers working together, exchanging gradients and loading training data over the network at hundreds of gigabits per second. At those speeds, the NIC determines how fast your GPUs can communicate and whether your CPU has any headroom left.
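To make the scale concrete, here is a back-of-envelope sketch of per-step gradient traffic in data-parallel training. The model size, precision, and server count are illustrative assumptions, not figures from this article; the 2(N-1)/N factor is the standard transfer volume of a ring all-reduce.

```python
# Back-of-envelope gradient traffic for data-parallel training with a
# ring all-reduce. Model size, precision, and server count are
# illustrative assumptions.

def allreduce_bytes_per_server(param_count, bytes_per_param, num_servers):
    """Ring all-reduce moves 2*(N-1)/N times the gradient payload per server."""
    payload = param_count * bytes_per_param
    return 2 * (num_servers - 1) / num_servers * payload

# 70B parameters, fp16 gradients (2 bytes each), 64 servers
traffic = allreduce_bytes_per_server(70e9, 2, 64)
print(f"{traffic / 1e9:.0f} GB per server per step")          # ~276 GB

# At 3.2 Tb/s of aggregate NIC bandwidth, that transfer alone takes:
link_bytes_per_s = 3.2e12 / 8
print(f"{traffic / link_bytes_per_s:.2f} s of network time")  # ~0.69 s
```

Hundreds of gigabytes per server, every step, is why the NIC fabric, not the GPUs, can end up setting the pace of training.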

DPUs are a more recent development. Cloud providers and neoclouds (GPU cloud platforms like CoreWeave) found that running firewalls, encryption, tenant isolation, and storage protocols on the host CPU was eating into compute their customers were paying for. A DPU moves all of that work onto a separate processor built into the card itself.

Both matter for training and inference, but the pressure is different. Training clusters exchange large gradient payloads between servers constantly, demanding the highest bandwidth and lowest latency. Inference clusters serve many smaller requests and care more about consistent per-request latency, but still benefit from fast NICs when serving at scale.

When the CPU becomes the bottleneck

A modern high-speed NIC can move data at wire speed using RDMA (Remote Direct Memory Access), which bypasses the CPU entirely for data transfers. But the server's CPU is still responsible for connection setup, firewalls, encryption, and storage protocols. In a simple single-team cluster with local NVMe storage, most of those features go unused (no complex firewall rules, no network storage protocols), so CPU overhead from networking stays minimal.

In multi-tenant environments like neoclouds, the situation is different. Multiple customers share physical servers, so every packet needs to be checked against firewall and routing rules to keep one tenant's traffic separated from another's. Doing that inspection at 400 Gb/s line rates consumes real CPU cycles. A DPU solves this by enforcing isolation rules and encrypting traffic on its own hardware, independent of the server's OS, providing both performance offload and a trust boundary even if the host is compromised.
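A rough sketch of why per-packet inspection on the host CPU gets expensive at these speeds. The MTU, per-packet cycle cost, and clock speed below are assumed round numbers for illustration, not measurements:

```python
# Rough packet-rate math for host-CPU firewall inspection at 400 Gb/s.
# MTU, cycles-per-packet, and clock speed are illustrative assumptions.

LINK_BPS = 400e9          # 400 Gb/s line rate
MTU_BYTES = 1500          # standard Ethernet MTU
CYCLES_PER_PACKET = 500   # assumed cost of rule lookup + routing decision
CPU_HZ = 3e9              # a 3 GHz core

packets_per_s = LINK_BPS / 8 / MTU_BYTES
cores_needed = packets_per_s * CYCLES_PER_PACKET / CPU_HZ
print(f"{packets_per_s / 1e6:.1f} Mpps at line rate")      # ~33.3 Mpps
print(f"~{cores_needed:.1f} cores for per-packet inspection")
```

Even under these generous assumptions, several cores are permanently consumed per NIC; with smaller packets or richer rule sets the number climbs further, which is exactly the work a DPU absorbs.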

Storage is another major case. Network-attached storage (NAS) lets multiple servers read from a shared pool of fast drives over the network instead of relying only on drives installed in each server. A common protocol for this is NVMe-over-Fabrics, which extends the NVMe interface that local SSDs use to work across a network fabric. Reading from NAS at the speeds 8 GPUs demand creates significant CPU load. A DPU offloads this storage protocol processing to its own cores, so the server's CPU stays entirely focused on orchestrating the GPUs.
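As a hedged illustration of the storage load, assume each GPU streams a couple of GB/s of training data and that one host core can sustain a few GB/s through a TCP-based storage stack. Both figures are assumptions for the sketch; real numbers vary widely by workload and software stack:

```python
# Illustrative storage-network load for 8 GPUs reading from NAS over
# NVMe-oF. Per-GPU read rate and per-core throughput are assumptions.

GPUS = 8
READ_GBPS_PER_GPU = 2.0   # assumed GB/s of training data per GPU
CORE_GBPS = 3.0           # assumed GB/s one core sustains over TCP storage

total = GPUS * READ_GBPS_PER_GPU        # 16 GB/s aggregate reads
cores_without_dpu = total / CORE_GBPS
print(f"{total:.0f} GB/s of storage reads")
print(f"~{cores_without_dpu:.1f} host cores consumed without DPU offload")
```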

How GPU servers use NICs

GPU servers have two kinds of network connections. Low-speed Ethernet (1 or 10 Gb/s) handles management tasks like SSH and monitoring. High-speed NICs (400 Gb/s and up) handle the traffic that matters for training: GPU-to-GPU gradient exchange and data loading.

These high-speed NICs use one of two protocols. InfiniBand is a dedicated low-latency fabric that dominated AI clusters through 2024. [6] Ethernet with RDMA (specifically RoCE, or RDMA over Converged Ethernet) is the alternative. Both let GPUs exchange data without the CPU touching each packet. The NIC you choose determines which protocol your cluster speaks, and by extension which switches and cables you buy.

As of early 2026, Ethernet has overtaken InfiniBand by switch sales revenue in AI back-end networks. [7] While NVIDIA's reference designs still default to InfiniBand, hyperscalers increasingly use Ethernet-based fabrics (like RoCE or Ultra Ethernet) because Ethernet is significantly cheaper per port at scale.

An 8-GPU server typically ships with 8 high-speed NICs, one per GPU, so each GPU gets its own dedicated network path instead of sharing bandwidth. These NICs plug into matching switches in the same rack, forming a fabric. For example, 8 NDR (400 Gb/s) InfiniBand NICs paired with NDR switches give a server 3.2 Tb/s of aggregate bandwidth, a common reference target for keeping 8 GPUs busy during distributed training.
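The aggregate-bandwidth arithmetic, spelled out (ignoring protocol and encoding overhead):

```python
# Aggregate NIC bandwidth for a typical 8-GPU server with one
# 400 Gb/s NDR NIC per GPU. Overheads are ignored for simplicity.

NICS_PER_SERVER = 8
GBPS_PER_NIC = 400                               # NDR InfiniBand

aggregate_gbps = NICS_PER_SERVER * GBPS_PER_NIC  # 3200 Gb/s
print(f"{aggregate_gbps / 1000} Tb/s aggregate")               # 3.2 Tb/s
print(f"{aggregate_gbps / 8} GB/s of raw payload capacity")    # 400.0 GB/s
```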

ConnectX and BlueField

Nearly every GPU server ships with NVIDIA networking hardware. This dominance traces back to NVIDIA's $6.9 billion acquisition of Mellanox Technologies in April 2020. [5] Mellanox had spent decades building the InfiniBand and high-speed Ethernet adapters that data centers relied on. The acquisition gave NVIDIA end-to-end control of GPU, NIC, and switch silicon, and two product families that now appear in almost every AI cluster: ConnectX and BlueField.

ConnectX is a high-speed NIC supporting both InfiniBand and Ethernet. As of Q1 2026, the 400 Gb/s ConnectX-7 [1] is widely deployed, while the 800 Gb/s ConnectX-8 SuperNIC [2] is shipping in new builds.

BlueField is a DPU. It includes everything ConnectX has plus its own ARM CPU cores, RAM, and OS. This extra hardware lets BlueField run networking, storage, and security functions on-card (infrastructure offload). The practical result is that the host CPU stops processing packets and stays focused on feeding GPUs. As of Q1 2026, BlueField-3 [3] is the current standard, with the much faster BlueField-4 [4] arriving soon.

| | ConnectX (NIC) | BlueField (DPU) |
| --- | --- | --- |
| What it is | A NIC | A NIC with its own computer on board |
| Current gen (deployed) | ConnectX-7 (400 Gb/s, NDR) | BlueField-3 (400 Gb/s, NDR) |
| Next gen (shipping) | ConnectX-8 SuperNIC (800 Gb/s, XDR) | BlueField-4 (800 Gb/s, XDR) |
| Protocol | InfiniBand and Ethernet | InfiniBand and Ethernet |
| On-card compute | None | ARM CPU cores, dedicated RAM, own OS |
| Who handles firewalling, encryption, storage protocols | Your server's main CPU | The DPU's on-card ARM cores |
| Per-card price (approximate, ConnectX-7 / BF-3) | $1,000-1,500 | $2,500-4,000 |

When each one makes sense

Most GPU servers use both. A common configuration is eight ConnectX NICs for the training fabric (one per GPU) and one BlueField DPU on a separate path for storage and management. The real question is not whether to pick a NIC or a DPU, but whether the CPU overhead in your cluster justifies the added cost of a DPU.

| Scenario | What to use | Why |
| --- | --- | --- |
| GPU-to-GPU traffic (training) | ConnectX NICs (one per GPU) | RDMA at wire speed, no CPU involvement. This is always NICs. |
| Storage network on GPU/CPU servers | BlueField DPU | Offloads storage protocol processing so the CPU stays free for feeding GPUs. |
| Multi-tenant cluster (neocloud) | BlueField DPU | Each tenant needs an isolated network. The DPU enforces boundaries without taxing the host CPU. |
| Security and compliance requirements | BlueField DPU | Encryption and firewall rules enforced on separate hardware, independent of the host OS. |
| Simple cluster, no network storage, one team | ConnectX NICs only | No storage or security work to offload. CPU overhead from networking is minimal. |

Neoclouds like CoreWeave use BlueField DPUs across their fleet because tenant isolation is part of their product. A small private team might add a single DPU per server to handle the storage path and keep the rest simple with ConnectX. The cheapest option, ConnectX-only with no DPU, works when your training data lives on local NVMe drives and the cluster does not share resources with anyone.

References

  1. NVIDIA ConnectX-7 Adapter Cards User Manual. https://docs.nvidia.com/networking/display/connectx7vpi
  2. NVIDIA ConnectX-8 SuperNIC User Manual. https://docs.nvidia.com/networking/display/connectx8supernic
  3. NVIDIA BlueField-3 DPU. https://www.nvidia.com/en-us/networking/products/data-processing-unit/
  4. NVIDIA Launches BlueField-4 (2025). https://blogs.nvidia.com/blog/bluefield-4-ai-factory/
  5. NVIDIA Completes Acquisition of Mellanox (2020). https://nvidianews.nvidia.com/news/nvidia-completes-acquisition-of-mellanox-creating-major-force-driving-next-gen-data-centers
  6. Ethernet is Winning the War Against InfiniBand in AI Back-End Networks, Dell’Oro Group (July 2025). https://www.delloro.com/news/ethernet-is-winning-the-war-against-infiniband-in-ai-back-end-networks/
  7. AI Back-End Networks Continue Their Shift to Ethernet, Dell’Oro Group (December 2025). https://www.delloro.com/news/ai-back-end-networks-continue-their-shift-to-ethernet-now-accounting-for-over-two-thirds-of-3q-2025-switch-sales-in-ai-clusters/

Frequently Asked Questions

What is the difference between a NIC and a DPU?

A NIC (Network Interface Card) connects a server to the network. A DPU (Data Processing Unit) is a NIC with its own CPU, memory, and operating system built onto the card. The DPU handles networking, security, and storage tasks on-card, so the server's CPU stays focused on feeding the GPUs.

Why do GPU servers need 8 NICs?

An 8-GPU server typically ships with 8 high-speed NICs, one per GPU, so each GPU gets its own dedicated network path instead of sharing bandwidth. For example, 8 NDR InfiniBand NICs at 400 Gb/s each give a server 3.2 Tb/s of aggregate bandwidth, a common reference target for keeping 8 GPUs busy during distributed training.

When do you need a DPU instead of a regular NIC?

DPUs make sense in multi-tenant environments (neoclouds like CoreWeave use them across their fleet for tenant isolation), when using network-attached storage, or when security and compliance require encryption enforced on separate hardware. A simple single-team cluster with local NVMe storage can run ConnectX NICs only.

How much do ConnectX NICs and BlueField DPUs cost?

ConnectX-7 NICs (400 Gb/s) run $1,000-1,500 per card. BlueField-3 DPUs (400 Gb/s) run $2,500-4,000. The next generation (ConnectX-8 SuperNIC and BlueField-4, both 800 Gb/s) is shipping in new builds as of Q1 2026.
