NVLink and NVSwitch
NVLink is NVIDIA's high-speed interconnect between GPUs. NVSwitch is the chip that routes traffic so every GPU can talk to every other GPU at full speed. As of early 2026, fifth-generation NVLink on Blackwell delivers 1,800 GB/s of bandwidth per GPU, about 14x that of PCIe 5.0 x16, the standard expansion bus in servers [1]. Most training workloads, and a growing share of inference workloads, span multiple GPUs, so that bandwidth matters.
What NVLink does
Any workload that splits across multiple GPUs needs those GPUs to exchange data constantly. Training, large-scale inference, and fine-tuning all require it. If that exchange is slow, GPUs spend more time waiting than computing. NVLink replaces the default PCIe bus with a dedicated high-bandwidth path between GPUs.
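To see why the bus matters, here is a back-of-the-envelope sketch comparing how long one gradient sync would take over PCIe 5.0 x16 versus fifth-generation NVLink. The bandwidth figures are the peak numbers cited above, and the 14 GB payload is an illustrative assumption (roughly a 7B-parameter model's gradients in FP16); real transfers achieve less than peak.

```python
# Idealized transfer-time comparison: PCIe 5.0 x16 vs fifth-gen NVLink.
# Peak/advertised bandwidths; effective rates in practice are lower.

PCIE5_X16_GBPS = 128     # PCIe 5.0 x16, total bidirectional, GB/s
NVLINK5_GBPS = 1_800     # fifth-generation NVLink per GPU, GB/s

def transfer_seconds(size_gb: float, bandwidth_gbps: float) -> float:
    """Idealized transfer time: payload size divided by link bandwidth."""
    return size_gb / bandwidth_gbps

# Assumed payload: ~14 GB of FP16 gradients for a 7B-parameter model.
grads_gb = 14.0
pcie_s = transfer_seconds(grads_gb, PCIE5_X16_GBPS)
nvlink_s = transfer_seconds(grads_gb, NVLINK5_GBPS)
print(f"PCIe 5.0 x16: {pcie_s * 1000:.1f} ms")   # ~109 ms
print(f"NVLink 5:     {nvlink_s * 1000:.1f} ms") # ~7.8 ms
```

At one sync per training step, that difference compounds over millions of steps, which is where "GPUs waiting instead of computing" comes from.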
Traditionally, NVLink connects GPUs within a single server, on the same baseboard. Traffic between servers uses a separate network, usually InfiniBand or Ethernet, which is slower.
Newer architectures like the GB200 NVL72 extend NVLink beyond a single server so that GPUs across an entire rack communicate at NVLink speeds instead of dropping down to the network.
What NVSwitch does
NVSwitch is the switch chip on the HGX baseboard, the board that carries the GPUs and their NVLink connections. It lets any GPU send data to any other GPU at full NVLink bandwidth simultaneously, so no GPU has to relay data through a neighbor. An HGX A100 board uses six NVSwitch chips [2], while an HGX H100 uses four [3].
Diagram: point-to-point wiring vs NVSwitch topology
All-reduce, one of the most common distributed training operations, requires every GPU to exchange data with every other GPU simultaneously. Without NVSwitch, that traffic bottlenecks on the few direct links between neighboring GPUs.
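The bandwidth cost of all-reduce is easy to quantify. In the standard bandwidth-optimal ring algorithm (a generic textbook result, not specific to NVIDIA hardware), each GPU sends 2(N−1)/N times the payload size, regardless of how many GPUs are in the ring:

```python
# Bytes each GPU sends in a bandwidth-optimal ring all-reduce:
# a reduce-scatter phase plus an all-gather phase, each moving
# (N - 1) / N of the payload, for 2 * (N - 1) / N total.

def ring_allreduce_bytes_per_gpu(payload_bytes: int, n_gpus: int) -> float:
    """Bytes sent per GPU across both phases of a ring all-reduce."""
    return 2 * (n_gpus - 1) / n_gpus * payload_bytes

# Synchronizing 14 GB of gradients across 8 GPUs (illustrative payload):
sent = ring_allreduce_bytes_per_gpu(14 * 10**9, 8)
print(f"{sent / 1e9:.2f} GB per GPU")  # 24.50 GB
```

The per-GPU volume barely grows with GPU count, but every GPU must sustain that traffic at once, which is exactly the all-to-all pattern NVSwitch is built for.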
NVLink generations
NVIDIA has announced six NVLink generations since 2016. Five have shipped in products. The sixth, Rubin, was unveiled at CES in January 2026 [1][4].
Chart: NVLink bandwidth by generation
NVSwitch arrived with the second NVLink generation. NVIDIA introduced it in HGX-2 and used it in DGX-2 to fully connect 16 V100 GPUs [5]. Before that, Pascal and early Volta systems relied on direct NVLink wiring, which limited how many GPUs could all communicate at full speed in one system.
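A quick combinatorial sketch shows why direct wiring runs out of room: a full mesh of N GPUs needs N(N−1)/2 point-to-point connections, while each GPU exposes only a handful of NVLink ports (six on V100).

```python
# Point-to-point links needed to fully connect N GPUs without a switch.
# Each GPU has a fixed NVLink port count (6 on V100), so a full mesh
# stops being possible well before 16 GPUs.

def full_mesh_links(n_gpus: int) -> int:
    """Pairwise connections in a full mesh: N choose 2."""
    return n_gpus * (n_gpus - 1) // 2

for n in (4, 8, 16):
    print(n, full_mesh_links(n))
# 16 GPUs would need 120 pairwise connections, and each GPU would need
# 15 ports instead of 6 -- which is why DGX-2 needed NVSwitch.
```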
NVLink at rack scale
Rack-scale NVLink did not start with Blackwell. H100 systems with NVLink Network could already stretch one NVLink network across multiple servers through external switch boxes, reaching up to 256 connected GPUs [3].
Blackwell made that design easier to buy as one system. GB200 NVL72 packages 72 Blackwell GPUs and 36 Grace CPUs into a single rack, with 130 TB/s of total GPU-to-GPU bandwidth inside that rack [6].
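The aggregate figure checks out arithmetically: 72 GPUs at 1,800 GB/s each sum to roughly the quoted 130 TB/s. A minimal sanity check:

```python
# Sanity-check the quoted 130 TB/s aggregate NVLink bandwidth for
# GB200 NVL72: per-GPU bandwidth times GPU count.

gpus = 72
per_gpu_gbps = 1_800          # fifth-gen NVLink, GB/s per GPU
total_tbps = gpus * per_gpu_gbps / 1_000
print(f"{total_tbps:.1f} TB/s")  # 129.6 TB/s, quoted as 130 TB/s
```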
| | H100 NVLink Network | GB200 NVL72 |
|---|---|---|
| How it works | External switch boxes link multiple servers [3] | NVLink switches built into one rack [6] |
| Max GPUs | 256 | 72 |
| Form factor | Multi-server pod | Single rack |
The NVL72 also uses a related link called NVLink-C2C (Chip-to-Chip). Where regular NVLink connects GPUs to each other, C2C connects each Grace CPU to its paired Blackwell GPUs, replacing the usual PCIe bus with a faster path [7][8].
When NVLink matters and when it doesn't
Whether NVLink bandwidth affects your workload depends on how many GPUs are involved and how they communicate.
| Workload | NVLink | Why |
|---|---|---|
| Multi-GPU training | ✓ Yes | All-reduce syncs every GPU after each training step. More bandwidth means less time waiting. |
| Single-GPU inference | ✗ No | No GPU-to-GPU traffic when the model fits on one GPU. |
| Multi-GPU inference | ✓ Yes | Model split across GPUs exchanges data every forward pass. MoE models add more cross-GPU traffic. |
| Fine-tuning | ~ Depends | Full fine-tune across GPUs looks like training. LoRA (Low-Rank Adaptation) often fits on one GPU. |
| HPC / simulations | ✓ Yes | Multi-GPU simulations in molecular dynamics, climate modeling, etc. benefit from NVLink bandwidth. |
References
1. NVIDIA, "NVLink and NVSwitch" (accessed March 2026). https://www.nvidia.com/en-us/data-center/nvlink/
2. NVIDIA, "Introducing NVIDIA HGX A100," Technical Blog (2020). https://developer.nvidia.com/blog/introducing-hgx-a100-most-powerful-accelerated-server-platform-for-ai-hpc/
3. NVIDIA, "Introducing NVIDIA HGX H100," Technical Blog (2022). https://developer.nvidia.com/blog/introducing-nvidia-hgx-h100-an-accelerated-server-platform-for-ai-and-high-performance-computing/
4. NVIDIA, "Vera Rubin Platform," Newsroom (2026). https://nvidianews.nvidia.com/news/nvidia-vera-rubin-platform
5. NVIDIA, "NVIDIA Introduces HGX-2, Fusing HPC and AI Computing into Unified Architecture" (2018). https://nvidianews.nvidia.com/news/nvidia-introduces-hgx-2-fusing-hpc-and-ai-computing-into-unified-architecture-6696445
6. NVIDIA, "GB200 NVL72" (accessed March 2026). https://www.nvidia.com/en-us/data-center/gb200-nvl72/
7. NVIDIA, "The NVIDIA Grace Blackwell Superchip," GB200 NVL Multi-Node Tuning Guide (2025). https://docs.nvidia.com/multi-node-nvlink-systems/multi-node-tuning-guide/overview.html
8. NVIDIA, "NVLink-C2C" (accessed March 2026). https://www.nvidia.com/en-us/data-center/nvlink-c2c/
Frequently Asked Questions
What is NVLink and what is NVSwitch?
NVLink is NVIDIA's high-speed connection between GPUs. NVSwitch is the chip that routes traffic so every GPU can talk to every other GPU at full speed.
How fast is fifth-generation NVLink on Blackwell compared to PCIe 5.0?
Fifth-generation NVLink on Blackwell delivers 1,800 GB/s per GPU, about 14x the bandwidth of PCIe 5.0 x16, the standard expansion bus in servers.
What does NVSwitch do on an HGX baseboard?
NVSwitch is the switch chip on the HGX baseboard, the board that carries the GPUs and their NVLink connections. It lets any GPU send data to any other GPU at full NVLink bandwidth simultaneously, so no GPU has to relay data through a neighbor. An HGX A100 board uses six NVSwitch chips while an HGX H100 uses four.
What is GB200 NVL72?
GB200 NVL72 packages 72 Blackwell GPUs and 36 Grace CPUs into a single rack, with 130 TB/s of total GPU-to-GPU bandwidth inside that rack.