NVLink and NVSwitch

Brenden Reeves

NVLink is NVIDIA's high-speed connection between GPUs. NVSwitch is the chip that routes traffic so every GPU can talk to every other GPU at full speed. As of early 2026, fifth-generation NVLink on Blackwell delivers 1,800 GB/s per GPU, about 14x the bandwidth of PCIe 5.0 x16 (the standard expansion bus in servers). [1] Most training workloads and an increasing share of inference workloads use multiple GPUs, so that bandwidth matters.

Any workload that splits across multiple GPUs needs those GPUs to exchange data constantly. Training, large-scale inference, and fine-tuning all require it. If that exchange is slow, GPUs spend more time waiting than computing. NVLink replaces the default PCIe bus with a dedicated high-bandwidth path between GPUs.
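The effect of that waiting can be put in rough numbers. The sketch below estimates the ideal-case time for one gradient all-reduce at PCIe versus NVLink bandwidth; the model size, GPU count, and the ring all-reduce traffic formula are illustrative assumptions, not measurements:

```python
# Rough estimate of per-step gradient all-reduce time at different
# interconnect bandwidths. A ring all-reduce moves about 2*(N-1)/N
# times the gradient size per GPU. All figures here are illustrative,
# ignoring latency, overlap with compute, and protocol overhead.

def allreduce_seconds(grad_bytes: float, num_gpus: int, bw_bytes_per_s: float) -> float:
    """Ideal-case ring all-reduce time for one training step."""
    traffic = 2 * (num_gpus - 1) / num_gpus * grad_bytes
    return traffic / bw_bytes_per_s

grads = 7e9 * 2  # hypothetical 7B-parameter model, fp16 gradients (14 GB)
gpus = 8

pcie = allreduce_seconds(grads, gpus, 128e9)     # ~128 GB/s, PCIe 5.0 x16
nvlink = allreduce_seconds(grads, gpus, 1.8e12)  # 1.8 TB/s, NVLink 5

print(f"PCIe 5.0: {pcie * 1e3:.0f} ms per step")
print(f"NVLink 5: {nvlink * 1e3:.1f} ms per step")
```

Under these assumptions the gap per step is the full 14x bandwidth ratio, which compounds over millions of training steps.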

Traditionally, NVLink connects GPUs within a single server, on the same baseboard. Traffic between servers uses a separate network, usually InfiniBand or Ethernet, which is slower.

Newer architectures like the GB200 NVL72 extend NVLink beyond a single server so that GPUs across an entire rack communicate at NVLink speeds instead of dropping down to the network.

What NVSwitch does

NVSwitch is the switch chip on the HGX baseboard, the board that carries the GPUs and their NVLink connections. It lets any GPU send data to any other GPU at full NVLink bandwidth simultaneously, so no GPU has to relay data through a neighbor. An HGX A100 board uses six NVSwitch chips, [2] while an HGX H100 uses four. [3]
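Quick arithmetic shows why a switch helps: fully meshing N GPUs directly needs N(N-1)/2 links, and each GPU's fixed link budget gets split across N-1 peers. (The 18-link, 900 GB/s figures below are the H100's NVLink numbers; the even-split assumption is an illustration, not how any specific board is wired.)

```python
# Without a switch, a GPU's links must be divided among its peers,
# so each GPU pair sees only a fraction of the total NVLink bandwidth.

def full_mesh_links(n: int) -> int:
    """Direct links needed to fully mesh n GPUs."""
    return n * (n - 1) // 2

def per_pair_bandwidth(total_gb_s: float, links: int, peers: int) -> float:
    """Per-pair bandwidth if a GPU splits its links evenly across peers."""
    return total_gb_s * (links // peers) / links

print(full_mesh_links(8))              # links to fully mesh 8 GPUs
print(per_pair_bandwidth(900, 18, 7))  # H100: 18 links split across 7 peers
```

With a switch fabric in between, each pair can instead use the GPU's full link budget for whichever transfer is active.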

[Diagram: point-to-point vs NVSwitch topologies for eight GPUs. Without NVSwitch, GPU 0 has direct paths to only 3 of its 7 peers; with NVSwitch, GPU 0 reaches all 7 peers through the switch fabric. Simplified view; actual NVSwitch count varies by generation (HGX A100 uses 6, HGX H100 uses 4).]

All-reduce, one of the most common distributed training operations, requires every GPU to exchange data with every other GPU simultaneously. Without NVSwitch, that traffic bottlenecks on the few direct links between neighboring GPUs.
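The all-reduce pattern can be sketched in plain Python. This is a toy simulation of the classic ring all-reduce (a reduce-scatter phase followed by an all-gather phase), not NCCL's actual implementation; real libraries run these collectives over the hardware links directly:

```python
# Toy ring all-reduce: N "GPUs" each start with their own vector and
# all end with the element-wise sum. Data is split into N chunks; at
# each step, every rank passes one chunk to its ring neighbor.

def ring_allreduce(buffers):
    """Return copies of `buffers` where every rank holds the full sum."""
    n = len(buffers)
    size = len(buffers[0])
    chunks = [list(b) for b in buffers]  # working copies, one per rank
    bounds = [(i * size // n, (i + 1) * size // n) for i in range(n)]

    def send(rank, chunk_idx, accumulate):
        """Rank sends one chunk to its ring neighbor, who reduces or copies it."""
        dst = (rank + 1) % n
        lo, hi = bounds[chunk_idx % n]
        for i in range(lo, hi):
            if accumulate:
                chunks[dst][i] += chunks[rank][i]
            else:
                chunks[dst][i] = chunks[rank][i]

    # Phase 1: reduce-scatter. After n-1 steps, each rank holds the
    # fully reduced values for exactly one chunk.
    for step in range(n - 1):
        for rank in range(n):
            send(rank, rank - step, accumulate=True)

    # Phase 2: all-gather. The reduced chunks circulate until every
    # rank holds all of them.
    for step in range(n - 1):
        for rank in range(n):
            send(rank, rank + 1 - step, accumulate=False)

    return chunks

# 4 ranks, each holding its own "gradient" vector of 8 elements
grads = [[rank + i for i in range(8)] for rank in range(4)]
result = ring_allreduce(grads)
expected = [sum(col) for col in zip(*grads)]
assert all(buf == expected for buf in result)
```

Each rank only ever talks to its ring neighbor, but every chunk still crosses every link, which is why the aggregate bandwidth between GPUs sets the speed of the whole operation.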

NVIDIA has announced six NVLink generations since 2016. Five have shipped in products. Rubin is the sixth; NVIDIA unveiled it at CES in January 2026. [1] [4]

NVLink bandwidth by generation

Generation   Architecture     Per-GPU bandwidth
1.0          Pascal           160 GB/s
2.0          Volta            300 GB/s
3.0          Ampere           600 GB/s
4.0          Hopper           900 GB/s
5.0          Blackwell        1.8 TB/s
6.0          Rubin platform   3.6 TB/s

For comparison, PCIe 5.0 x16 is roughly 128 GB/s.

NVSwitch arrived with the second NVLink generation. NVIDIA introduced it in HGX-2 and used it in DGX-2 to fully connect 16 V100 GPUs. [5] Before that, Pascal and early Volta systems relied on direct NVLink wiring, which limited how many GPUs could all communicate at full speed in one system.

Rack-scale NVLink did not start with Blackwell. H100 systems with NVLink Network could already stretch one NVLink network across multiple servers through external switch boxes, reaching up to 256 connected GPUs. [3]

Blackwell made that design easier to buy as one system. GB200 NVL72 packages 72 Blackwell GPUs and 36 Grace CPUs into a single rack, with 130 TB/s of total GPU-to-GPU bandwidth inside that rack. [6]
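The rack-level figure is consistent with the per-GPU number quoted earlier, as a quick check shows (a simple sanity calculation, assuming the 1.8 TB/s per-GPU figure from above):

```python
# 72 Blackwell GPUs at 1.8 TB/s of NVLink bandwidth each:
per_gpu_tb_s = 1.8
num_gpus = 72
aggregate_tb_s = per_gpu_tb_s * num_gpus
print(aggregate_tb_s)  # ~129.6, matching the quoted ~130 TB/s rack figure
```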

                 H100 NVLink Network                       GB200 NVL72
How it works     External switch boxes link servers [3]    NVLink switches built into one rack [6]
Max GPUs         256                                       72
Form factor      Multi-server pod                          Single rack

The NVL72 also uses a related link called NVLink-C2C (Chip-to-Chip). Where regular NVLink connects GPUs to each other, C2C connects each Grace CPU to its paired Blackwell GPUs, replacing the usual PCIe bus with a faster path. [7] [8]

Whether NVLink bandwidth affects your workload depends on how many GPUs are involved and how they communicate.

Workload               NVLink?   Why
Multi-GPU training     Yes       All-reduce syncs every GPU after each training step. More bandwidth means less time waiting.
Single-GPU inference   No        No GPU-to-GPU traffic when the model fits on one GPU.
Multi-GPU inference    Yes       A model split across GPUs exchanges data every forward pass. MoE models add more cross-GPU traffic.
Fine-tuning            Depends   A full fine-tune across GPUs looks like training. LoRA (Low-Rank Adaptation) often fits on one GPU.
HPC / simulations      Yes       Multi-GPU simulations in molecular dynamics, climate modeling, etc. benefit from NVLink bandwidth.
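To make the multi-GPU inference row concrete: in Megatron-style tensor parallelism, each transformer layer typically triggers two all-reduces over the activations (one after attention, one after the MLP). The estimate below is a rough sketch with illustrative model dimensions; real traffic also depends on batch size, the parallelism degree, and the exact sharding scheme:

```python
# Rough activation traffic per forward pass under tensor parallelism,
# assuming two all-reduces per transformer layer and batch size 1.
# Model dimensions below are hypothetical, for illustration only.

def tp_traffic_bytes(layers: int, seq_len: int, hidden: int,
                     dtype_bytes: int = 2, allreduces_per_layer: int = 2) -> int:
    """Approximate bytes of activations reduced per forward pass."""
    return allreduces_per_layer * layers * seq_len * hidden * dtype_bytes

# Hypothetical 80-layer model, 4096-token sequence, 8192 hidden dim, fp16:
traffic = tp_traffic_bytes(layers=80, seq_len=4096, hidden=8192)
print(f"{traffic / 1e9:.1f} GB per forward pass")
```

Roughly ten gigabytes of activation traffic per forward pass, repeated for every request, is why sharded inference lands in the "yes" column.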

References

  1. NVIDIA, "NVLink and NVSwitch" (accessed March 2026). https://www.nvidia.com/en-us/data-center/nvlink/
  2. NVIDIA, "Introducing NVIDIA HGX A100," Technical Blog (2020). https://developer.nvidia.com/blog/introducing-hgx-a100-most-powerful-accelerated-server-platform-for-ai-hpc/
  3. NVIDIA, "Introducing NVIDIA HGX H100," Technical Blog (2022). https://developer.nvidia.com/blog/introducing-nvidia-hgx-h100-an-accelerated-server-platform-for-ai-and-high-performance-computing/
  4. NVIDIA, "Vera Rubin Platform," Newsroom (2026). https://nvidianews.nvidia.com/news/nvidia-vera-rubin-platform
  5. NVIDIA, "NVIDIA Introduces HGX-2, Fusing HPC and AI Computing into Unified Architecture" (2018). https://nvidianews.nvidia.com/news/nvidia-introduces-hgx-2-fusing-hpc-and-ai-computing-into-unified-architecture-6696445
  6. NVIDIA, "GB200 NVL72" (accessed March 2026). https://www.nvidia.com/en-us/data-center/gb200-nvl72/
  7. NVIDIA, "The NVIDIA Grace Blackwell Superchip," GB200 NVL Multi-Node Tuning Guide (2025). https://docs.nvidia.com/multi-node-nvlink-systems/multi-node-tuning-guide/overview.html
  8. NVIDIA, "NVLink-C2C" (accessed March 2026). https://www.nvidia.com/en-us/data-center/nvlink-c2c/

Frequently Asked Questions

What is NVLink and what is NVSwitch?

NVLink is NVIDIA's high-speed connection between GPUs. NVSwitch is the chip that routes traffic so every GPU can talk to every other GPU at full speed.

How fast is fifth-generation NVLink on Blackwell compared to PCIe 5.0?

Fifth-generation NVLink on Blackwell delivers 1,800 GB/s per GPU, about 14x the bandwidth of PCIe 5.0 x16, the standard expansion bus in servers.

What does NVSwitch do on an HGX baseboard?

NVSwitch is the switch chip on the HGX baseboard, the board that carries the GPUs and their NVLink connections. It lets any GPU send data to any other GPU at full NVLink bandwidth simultaneously, so no GPU has to relay data through a neighbor. An HGX A100 board uses six NVSwitch chips while an HGX H100 uses four.

What is GB200 NVL72?

GB200 NVL72 packages 72 Blackwell GPUs and 36 Grace CPUs into a single rack, with 130 TB/s of total GPU-to-GPU bandwidth inside that rack.
