NVLink and NVSwitch
NVLink is NVIDIA's high-speed interconnect between GPUs. NVSwitch is the chip that routes traffic so every GPU can talk to every other GPU at full speed. As of early 2026, fifth-generation NVLink on Blackwell delivers 1,800 GB/s of bandwidth per GPU, about 14x that of PCIe 5.0 x16, the standard expansion bus in servers [1]. Most training workloads, and a growing share of inference workloads, span multiple GPUs, so that bandwidth matters.
What NVLink does
Any workload that splits across multiple GPUs needs those GPUs to exchange data constantly. Training, large-scale inference, and fine-tuning all require it. If that exchange is slow, GPUs spend more time waiting than computing. NVLink replaces the default PCIe bus with a dedicated high-bandwidth path between GPUs.
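To see why the bus matters, here is a back-of-the-envelope sketch comparing how long one gradient sync would take over PCIe 5.0 x16 versus fifth-generation NVLink. The bandwidth figures are the peak numbers cited above, and the 14 GB payload is an illustrative assumption (roughly a 7B-parameter model's gradients in FP16); real transfers achieve less than peak.

```python
# Idealized transfer-time comparison: PCIe 5.0 x16 vs fifth-gen NVLink.
# Peak/advertised bandwidths; effective rates in practice are lower.

PCIE5_X16_GBPS = 128     # PCIe 5.0 x16, total bidirectional, GB/s
NVLINK5_GBPS = 1_800     # fifth-generation NVLink per GPU, GB/s

def transfer_seconds(size_gb: float, bandwidth_gbps: float) -> float:
    """Idealized transfer time: payload size divided by link bandwidth."""
    return size_gb / bandwidth_gbps

# Assumed payload: ~14 GB of FP16 gradients for a 7B-parameter model.
grads_gb = 14.0
pcie_s = transfer_seconds(grads_gb, PCIE5_X16_GBPS)
nvlink_s = transfer_seconds(grads_gb, NVLINK5_GBPS)
print(f"PCIe 5.0 x16: {pcie_s * 1000:.1f} ms")   # ~109 ms
print(f"NVLink 5:     {nvlink_s * 1000:.1f} ms") # ~7.8 ms
```

At one sync per training step, that difference compounds over millions of steps, which is where "GPUs waiting instead of computing" comes from.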
Traditionally, NVLink connects GPUs within a single server, on the same baseboard. Traffic between servers uses a separate network, usually InfiniBand or Ethernet, which is slower.
Newer architectures like the GB200 NVL72 extend NVLink beyond a single server so that GPUs across an entire rack communicate at NVLink speeds instead of dropping down to the network.
What NVSwitch does
NVSwitch is the switch chip on the HGX baseboard, the board that carries the GPUs and their NVLink connections. It lets any GPU send data to any other GPU at full NVLink bandwidth simultaneously, so no GPU has to relay data through a neighbor. An HGX A100 board uses six NVSwitch chips [2], while an HGX H100 uses four [3].
Diagram: point-to-point wiring vs NVSwitch topology
All-reduce, one of the most common distributed training operations, requires every GPU to exchange data with every other GPU simultaneously. Without NVSwitch, that traffic bottlenecks on the few direct links between neighboring GPUs.
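The bandwidth cost of all-reduce is easy to quantify. In the standard bandwidth-optimal ring algorithm (a generic textbook result, not specific to NVIDIA hardware), each GPU sends 2(N−1)/N times the payload size, regardless of how many GPUs are in the ring:

```python
# Bytes each GPU sends in a bandwidth-optimal ring all-reduce:
# a reduce-scatter phase plus an all-gather phase, each moving
# (N - 1) / N of the payload, for 2 * (N - 1) / N total.

def ring_allreduce_bytes_per_gpu(payload_bytes: int, n_gpus: int) -> float:
    """Bytes sent per GPU across both phases of a ring all-reduce."""
    return 2 * (n_gpus - 1) / n_gpus * payload_bytes

# Synchronizing 14 GB of gradients across 8 GPUs (illustrative payload):
sent = ring_allreduce_bytes_per_gpu(14 * 10**9, 8)
print(f"{sent / 1e9:.2f} GB per GPU")  # 24.50 GB
```

The per-GPU volume barely grows with GPU count, but every GPU must sustain that traffic at once, which is exactly the all-to-all pattern NVSwitch is built for.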
NVLink generations
NVIDIA has announced six NVLink generations since 2016. Five have shipped in products. The sixth, Rubin, was unveiled at CES in January 2026 [1][4].
Chart: NVLink bandwidth by generation
NVSwitch arrived with the second NVLink generation. NVIDIA introduced it in HGX-2 and used it in DGX-2 to fully connect 16 V100 GPUs [5]. Before that, Pascal and early Volta systems relied on direct NVLink wiring, which limited how many GPUs could all communicate at full speed in one system.
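A quick combinatorial sketch shows why direct wiring runs out of room: a full mesh of N GPUs needs N(N−1)/2 point-to-point connections, while each GPU exposes only a handful of NVLink ports (six on V100).

```python
# Point-to-point links needed to fully connect N GPUs without a switch.
# Each GPU has a fixed NVLink port count (6 on V100), so a full mesh
# stops being possible well before 16 GPUs.

def full_mesh_links(n_gpus: int) -> int:
    """Pairwise connections in a full mesh: N choose 2."""
    return n_gpus * (n_gpus - 1) // 2

for n in (4, 8, 16):
    print(n, full_mesh_links(n))
# 16 GPUs would need 120 pairwise connections, and each GPU would need
# 15 ports instead of 6 -- which is why DGX-2 needed NVSwitch.
```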
NVLink at rack scale
Rack-scale NVLink did not start with Blackwell. H100 systems with NVLink Network could already stretch one NVLink network across multiple servers through external switch boxes, reaching up to 256 connected GPUs [3].
Blackwell made that design easier to buy as one system. GB200 NVL72 packages 72 Blackwell GPUs and 36 Grace CPUs into a single rack, with 130 TB/s of total GPU-to-GPU bandwidth inside that rack [6].
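The aggregate figure checks out arithmetically: 72 GPUs at 1,800 GB/s each sum to roughly the quoted 130 TB/s. A minimal sanity check:

```python
# Sanity-check the quoted 130 TB/s aggregate NVLink bandwidth for
# GB200 NVL72: per-GPU bandwidth times GPU count.

gpus = 72
per_gpu_gbps = 1_800          # fifth-gen NVLink, GB/s per GPU
total_tbps = gpus * per_gpu_gbps / 1_000
print(f"{total_tbps:.1f} TB/s")  # 129.6 TB/s, quoted as 130 TB/s
```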
| | H100 NVLink Network | GB200 NVL72 |
|---|---|---|
| How it works | External switch boxes link multiple servers [3] | NVLink switches built into one rack [6] |
| Max GPUs | 256 | 72 |
| Form factor | Multi-server pod | Single rack |
The NVL72 also uses a related link called NVLink-C2C (Chip-to-Chip). Where regular NVLink connects GPUs to each other, C2C connects each Grace CPU to its paired Blackwell GPUs, replacing the usual PCIe bus with a faster path [7][8].
When NVLink matters and when it doesn't
Whether NVLink bandwidth affects your workload depends on how many GPUs are involved and how they communicate.
| Workload | NVLink | Why |
|---|---|---|
| Multi-GPU training | ✓ Yes | All-reduce syncs every GPU after each training step. More bandwidth means less time waiting. |
| Single-GPU inference | ✗ No | No GPU-to-GPU traffic when the model fits on one GPU. |
| Multi-GPU inference | ✓ Yes | Model split across GPUs exchanges data every forward pass. MoE models add more cross-GPU traffic. |
| Fine-tuning | ~ Depends | Full fine-tune across GPUs looks like training. LoRA (Low-Rank Adaptation) often fits on one GPU. |
| HPC / simulations | ✓ Yes | Multi-GPU simulations in molecular dynamics, climate modeling, etc. benefit from NVLink bandwidth. |
References
1. NVIDIA, "NVLink and NVSwitch" (accessed March 2026). https://www.nvidia.com/en-us/data-center/nvlink/
2. NVIDIA, "Introducing NVIDIA HGX A100," Technical Blog (2020). https://developer.nvidia.com/blog/introducing-hgx-a100-most-powerful-accelerated-server-platform-for-ai-hpc/
3. NVIDIA, "Introducing NVIDIA HGX H100," Technical Blog (2022). https://developer.nvidia.com/blog/introducing-nvidia-hgx-h100-an-accelerated-server-platform-for-ai-and-high-performance-computing/
4. NVIDIA, "Vera Rubin Platform," Newsroom (2026). https://nvidianews.nvidia.com/news/nvidia-vera-rubin-platform
5. NVIDIA, "NVIDIA Introduces HGX-2, Fusing HPC and AI Computing into Unified Architecture" (2018). https://nvidianews.nvidia.com/news/nvidia-introduces-hgx-2-fusing-hpc-and-ai-computing-into-unified-architecture-6696445
6. NVIDIA, "GB200 NVL72" (accessed March 2026). https://www.nvidia.com/en-us/data-center/gb200-nvl72/
7. NVIDIA, "The NVIDIA Grace Blackwell Superchip," GB200 NVL Multi-Node Tuning Guide (2025). https://docs.nvidia.com/multi-node-nvlink-systems/multi-node-tuning-guide/overview.html
8. NVIDIA, "NVLink-C2C" (accessed March 2026). https://www.nvidia.com/en-us/data-center/nvlink-c2c/
Frequently Asked Questions
What is NVLink and what is NVSwitch?
NVLink is NVIDIA's high-speed connection between GPUs. NVSwitch is the chip that routes traffic so every GPU can talk to every other GPU at full speed.
How fast is fifth-generation NVLink on Blackwell compared to PCIe 5.0?
Fifth-generation NVLink on Blackwell delivers 1,800 GB/s per GPU, about 14x the bandwidth of PCIe 5.0 x16, the standard expansion bus in servers.
What does NVSwitch do on an HGX baseboard?
NVSwitch is the switch chip on the HGX baseboard, the board that carries the GPUs and their NVLink connections. It lets any GPU send data to any other GPU at full NVLink bandwidth simultaneously, so no GPU has to relay data through a neighbor. An HGX A100 board uses six NVSwitch chips while an HGX H100 uses four.
What is GB200 NVL72?
GB200 NVL72 packages 72 Blackwell GPUs and 36 Grace CPUs into a single rack, with 130 TB/s of total GPU-to-GPU bandwidth inside that rack.