Liquid Cooling vs Air Cooling for GPU Servers
Air cooling blows air over heatsinks (metal fins that conduct heat away from the GPU). Liquid cooling pumps coolant through cold plates bolted to the GPU. Both methods remove heat, but they set different ceilings on what a data center can support.
Liquid cooling fits more GPUs per rack: fewer racks, less floor space, fewer network switches, and shorter cable runs between GPUs, which reduces communication latency for training workloads. Most B200 systems ship air-cooled, but the GB200 NVL72 rack is liquid-only, and NVIDIA's next-generation Rubin GPUs (1,800-2,300W TDP) will require liquid cooling across the board. Most data centers do not support liquid cooling yet. [6]
Figure: Normalized 42U rack using 10U air and 4U liquid nodes.
Figure: Illustrative facility PUE by rack density.
Why GPU servers need so much cooling
Every watt a GPU consumes becomes heat. TDP (Thermal Design Power) is the maximum sustained heat the chip generates under load. If the cooling system cannot remove that heat fast enough, the GPU throttles its clock speed to protect itself, and performance drops.
| GPU | Architecture | Year | TDP (SXM form factor) |
|---|---|---|---|
| A100 | Ampere | 2020 | 400W |
| H100 | Hopper | 2022 | 700W |
| H200 | Hopper | 2024 | 700W |
| B200 | Blackwell | 2024 | 1,000W |
Sources: NVIDIA A100 datasheet [1], H100 datasheet [4], H200 datasheet [5], DGX B200 User Guide [2]
An 8-GPU B200 server consumes 8 kW in GPU power alone. CPUs, memory, network cards, fans, and power supply losses add significantly to that. [2]
NVIDIA's GB200 NVL72, a 72-GPU rack-scale Blackwell system, consumes 120-132 kW total. [3][9] As of early 2026, most data centers run 10-30 kW per rack, and few exceed 30 kW. [6] A single NVL72 rack draws 4-12x that.
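The power arithmetic above can be sketched in a few lines. The TDP figures come from the table; the 1.4x system overhead factor is an illustrative assumption for CPUs, memory, NICs, fans, and PSU losses, not a vendor specification.

```python
# GPU TDPs from the table above (SXM form factor), in watts.
GPU_TDP_W = {"A100": 400, "H100": 700, "H200": 700, "B200": 1000}

def gpu_power_kw(gpu: str, n_gpus: int = 8) -> float:
    """Heat from the GPUs alone, in kW."""
    return GPU_TDP_W[gpu] * n_gpus / 1000

def server_power_kw(gpu: str, n_gpus: int = 8, overhead: float = 1.4) -> float:
    """Rough total server draw: GPU power scaled by an assumed overhead factor."""
    return gpu_power_kw(gpu, n_gpus) * overhead

print(gpu_power_kw("B200"))     # 8.0 kW of GPU heat per 8-GPU server
print(server_power_kw("B200"))  # ~11.2 kW with the assumed 1.4x overhead
```

Even before overhead, eight B200s alone exceed what many legacy racks were provisioned to deliver.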
How air cooling works
Fans push ambient air across heatsinks attached to the GPU and CPU packages. Cool air enters from the front of the server, absorbs heat as it passes over the fins, and exits hot from the rear.
Data centers organize this into hot aisle/cold aisle containment. Cold air from CRAC (Computer Room Air Conditioning) or CRAH (Computer Room Air Handler) units feeds the cold aisle. Servers draw it in, heat it, and exhaust it into the hot aisle, which routes back to the cooling units. Physical barriers between the aisles keep hot exhaust from recirculating into the intake.
ASHRAE TC 9.9, the technical committee that sets thermal guidelines for data center equipment, recommends an inlet air temperature of 18-27°C for server hardware. [7] Operating within that range extends equipment life and keeps energy costs predictable.
The physics set the limit. Air has a low specific heat capacity (the energy needed to raise one kilogram by one degree): about 1 kJ per kg per °C. Because air is also roughly 800x less dense than water, water's volumetric heat capacity is about 3,400x higher, so a far smaller volume of coolant moves the same heat. Fan power scales with the cube of fan speed, so a 10% increase in airflow demands about 33% more fan energy. [8]
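That capacity gap can be made concrete with a back-of-envelope flow calculation. Fluid properties are textbook round numbers, the 10°C temperature rise is a typical design assumption, and the 132 kW load is the NVL72 worst case cited above.

```python
# Volumetric flow needed to move q watts at a given temperature rise:
# q = rho * flow * c_p * dT  =>  flow = q / (rho * c_p * dT)
def flow_m3_per_s(q_w: float, rho_kg_m3: float, cp_j_kg_k: float, dt_k: float) -> float:
    return q_w / (rho_kg_m3 * cp_j_kg_k * dt_k)

q = 132_000  # W, one GB200 NVL72 rack at the top of its range
air = flow_m3_per_s(q, rho_kg_m3=1.2, cp_j_kg_k=1005, dt_k=10)
water = flow_m3_per_s(q, rho_kg_m3=997, cp_j_kg_k=4186, dt_k=10)

print(f"air:   {air:.1f} m^3/s")         # on the order of 11 m^3/s
print(f"water: {water * 1000:.1f} L/s")  # on the order of 3 L/s
print(f"ratio: {air / water:.0f}x")      # ~3,400-3,500x, matching the figure above
```

Eleven cubic meters of air per second for a single rack is a hurricane through a server room; three liters of water per second is a garden hose.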
- Up to 20 kW/rack: standard air cooling with hot/cold aisle containment
- 20-40 kW/rack: high-speed fans, rear-door heat exchangers, or in-row cooling
- Above 40 kW/rack: rear-door heat exchangers or liquid cooling [8]
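Those tiers can be captured in a tiny lookup helper. The thresholds come straight from the list above; the function name and return strings are illustrative.

```python
def cooling_tier(kw_per_rack: float) -> str:
    """Map rack power density to the cooling approach it typically demands."""
    if kw_per_rack <= 20:
        return "standard air with hot/cold aisle containment"
    if kw_per_rack <= 40:
        return "high-speed fans, rear-door heat exchangers, or in-row cooling"
    return "rear-door heat exchangers or liquid cooling"

print(cooling_tier(15))   # comfortable air-cooling territory
print(cooling_tier(132))  # GB200 NVL72 territory: liquid
```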
The Uptime Institute's 2025 Global Data Center Survey found that 67% of existing data centers cannot support modern GPU power densities. [6]
How liquid cooling works
In direct-to-chip (D2C) cooling, a cold plate (a metal block with internal channels) mounts directly on the GPU. Coolant flows through those channels, absorbs heat at the source, and carries it to a CDU (Coolant Distribution Unit, the heat exchanger that sits outside or beside the rack). D2C handles the hottest components (GPUs and CPUs) with liquid while fans still cool lower-power parts like memory modules and storage drives.
The CDU operates two loops. A secondary loop circulates filtered coolant between the CDU and the cold plates inside the servers. A primary loop connects to the facility's chilled water supply or dry coolers outside the building. The CDU transfers heat from the server loop to the facility loop, then sends cooled fluid back to the cold plates.
CDUs come in two types. Liquid-to-liquid (L2L) CDUs connect to a facility chilled water plant, the standard for large deployments. Liquid-to-air (L2A) CDUs reject heat to air through built-in fans, useful for smaller installations or sites without chilled water infrastructure.
Figure: Direct-to-chip cooling: two-loop architecture.
A second approach, immersion cooling, submerges the entire server in dielectric fluid (a non-conductive liquid).
| | Direct-to-chip (D2C) | Immersion |
|---|---|---|
| How it works | Cold plates on GPUs/CPUs, coolant loops to CDU | Server submerged in dielectric fluid. Single-phase keeps the fluid liquid throughout. Two-phase lets the fluid evaporate at the hot surface and condense on a cooler surface above, transferring more heat per cycle. |
| Heat captured | Up to 98% of system heat through liquid [9] | 100% (all components submerged) |
| PUE (Power Usage Effectiveness, total facility power divided by IT equipment power) | ~1.15 [8] | 1.03-1.08 [10] |
| Server compatibility | Standard chassis with cold plate retrofit | Purpose-built tanks and enclosures |
| Maturity | Production standard for Blackwell [9] | Niche; scaling expected 2026-2027 [10] |
Direct-to-chip is the production standard for Blackwell-class hardware. Supermicro, Dell, and NVIDIA's own GB200 NVL72 all use D2C. [9][11][3] IDTechEx projects two-phase immersion will begin scaling in 2026-2027 as GPU TDPs push past the limits of single-phase systems. [10]
Air vs liquid tradeoffs
| | Air cooling | Liquid cooling (D2C) |
|---|---|---|
| Max rack density | 25-40 kW | 80-250+ kW |
| Facility requirements | Raised floors or containment, CRAC/CRAH units | CDU per rack or row, piping, chilled water or dry coolers, leak detection |
| Maintenance | Low: replace fans, clean filters | Higher: trained technicians, coolant management, pump servicing |
| Upfront cost | Lower | Higher (CDUs, piping, plumbing) |
| Energy cost at density | Higher (fans scale with cube of speed) | Lower (PUE advantage compounds over time) |
| GPU thermal headroom | Limited at high TDP | Better: lower junction temps, sustained boost clocks |
Sources: Uptime Institute (2025) [6], DCPulse [8], Supermicro [9]
The cost trade-off depends on scale. A single 8-GPU server with air cooling is cheaper to deploy: no plumbing, no CDU, no coolant management. At rack scale or higher, liquid cooling's density advantage changes the math. In Supermicro's published HGX B200 examples, the liquid-cooled design fits 8 systems and 64 GPUs in a 42U rack, while the air-cooled design fits 4 systems and 32 GPUs in a 42U rack. [9] The exact mix depends on the chassis and rack, but the pattern is consistent: liquid cooling buys density.
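At cluster scale, that density difference compounds into floor space. A hypothetical sizing helper, using Supermicro's published HGX B200 densities (the 1,024-GPU cluster size is chosen purely for illustration):

```python
def racks_needed(total_gpus: int, gpus_per_rack: int) -> int:
    """Ceiling division: racks required to house a given GPU count."""
    return -(-total_gpus // gpus_per_rack)

# Published HGX B200 densities: 32 GPUs/rack air-cooled, 64 liquid-cooled.
print(racks_needed(1024, 32))  # 32 racks air-cooled
print(racks_needed(1024, 64))  # 16 racks liquid-cooled
```

Halving the rack count also halves the inter-rack cabling and the number of leaf switches a given cluster needs, which is where the latency and networking savings mentioned earlier come from.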
PUE is a multiplier on every watt of IT load. In a 10 MW IT deployment, PUE 1.8 means 18 MW total facility draw, 8 MW of it just cooling and power distribution. PUE 1.15 drops that overhead to 1.5 MW. The 6.5 MW difference costs about $4 million per year at $0.07/kWh. [8]
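The overhead arithmetic above reproduces directly: facility overhead is IT load times (PUE - 1), and the delta priced at the quoted electricity rate.

```python
IT_MW = 10            # IT load from the example above
PRICE_PER_KWH = 0.07  # electricity rate from the example above
HOURS_PER_YEAR = 8760

def overhead_mw(pue: float) -> float:
    """Non-IT facility draw (cooling, power distribution) implied by a PUE."""
    return IT_MW * (pue - 1)

delta_mw = overhead_mw(1.8) - overhead_mw(1.15)         # 8.0 - 1.5 = 6.5 MW
annual_cost = delta_mw * 1000 * HOURS_PER_YEAR * PRICE_PER_KWH
print(f"{delta_mw:.1f} MW overhead delta -> ${annual_cost:,.0f}/yr")  # ~$4.0M/yr
```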
What the hardware dictates
The cooling method is not always a free choice. NVIDIA and OEM (Original Equipment Manufacturer) server designs sometimes make it for you. Server chassis are measured in rack units (U), where 1U equals 1.75 inches of vertical space. A standard rack is 42U tall.
| System | Cooling | Form factor | Practical takeaway |
|---|---|---|---|
| GB200 NVL72 | Liquid only | Full rack | Requires facility liquid cooling infrastructure |
| HGX B200 (liquid) | Liquid | 4U per node | 2x GPU density vs air-cooled HGX B200 |
| HGX B200 (air) | Air | 10U per node | No plumbing or CDU needed |
| DGX B200 | Air | 10U per node | NVIDIA turnkey; no OEM customization |
| Dell XE9680 (H100/H200) | Air | 6U per node | Fits Hopper-era TDP with air only |
| Dell XE9680L | Liquid | 4U per node | Same baseboard as XE9680 in less space |
Sources: NVIDIA [2][3], Supermicro [9], Dell [11][12]
The form factor difference comes from the cooling hardware itself. An air-cooled server needs tall heatsinks and rows of high-speed fans to force enough airflow across the GPUs. Those components take physical space, which is why Supermicro's air-cooled HGX B200 is a 10U chassis. Replace the heatsinks and fans with compact cold plates and manifold tubing, and the same baseboard fits in 4U. [9]
Within the same HGX B200 product family, cooling changes the density ceiling. Supermicro's published examples show 32 GPUs in its air-cooled rack design and 64 GPUs in its liquid-cooled rack design. [9]
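The U-space arithmetic behind those densities is simple floor division. Note this counts vertical space alone; the published liquid-cooled rack stops at 8 nodes rather than 10, presumably reserving space for power shelves and manifolds.

```python
RACK_U = 42  # standard rack height in rack units (1U = 1.75 in)

def nodes_by_space(node_u: int, rack_u: int = RACK_U) -> int:
    """Upper bound on nodes per rack from vertical space alone."""
    return rack_u // node_u

print(nodes_by_space(10))  # 10U air-cooled HGX B200 -> 4 nodes (32 GPUs)
print(nodes_by_space(4))   # 4U liquid-cooled -> 10 by space; published racks use 8
```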
For GB200 NVL72 deployments, the data center must have liquid cooling infrastructure in place before the hardware arrives. Retrofitting an air-cooled facility means adding CDUs, running pipes, and potentially upgrading the building's chilled water capacity. That work takes months.
References
- [1] NVIDIA, "A100 Tensor Core GPU Datasheet" (2020). https://www.nvidia.com/content/dam/en-zz/Solutions/Data-Center/a100/pdf/nvidia-a100-datasheet-nvidia-us-2188504-web.pdf
- [2] NVIDIA, "DGX B200 User Guide" (2025). https://docs.nvidia.com/dgx/dgxb200-user-guide/introduction-to-dgxb200.html
- [3] NVIDIA, "GB200 NVL72" (accessed March 2026). https://www.nvidia.com/en-us/data-center/gb200-nvl72/
- [4] NVIDIA, "H100 Tensor Core GPU Datasheet" (2022). https://resources.nvidia.com/en-us-tensor-core/nvidia-tensor-core-gpu-datasheet
- [5] NVIDIA, "H200 Tensor Core GPU Datasheet" (2024). https://www.nvidia.com/en-us/data-center/h200/
- [6] Uptime Institute, "Global Data Center Survey 2025" (2025). https://intelligence.uptimeinstitute.com/resource/uptime-institute-global-data-center-survey-2025
- [7] ASHRAE TC 9.9, "Thermal Guidelines for Data Processing Environments, 5th Edition" (2021). https://www.ashrae.org/technical-resources/bookstore/datacom-series
- [8] DCPulse, "How Rack Power Impacts PUE in AI Data Centers" (2025). https://dcpulse.com/article/the-density-dividend-how-rack-power-impacts-pue-efficiency
- [9] Supermicro, "Supermicro Ramps Full Production of NVIDIA Blackwell Rack-Scale Solutions" (2025). https://www.supermicro.com/en/pressreleases/supermicro-ramps-full-nvidia-blackwell-rack-scale-solutions-nvidia-hgx-b200
- [10] IDTechEx, "Two-Phase Cold Plate Cooling Will Take Off as Early as 2026-2027" (2025). https://www.idtechex.com/en/research-article/two-phase-cold-plate-cooling-will-take-off-as-early-as-2026-2027/34068
- [11] Dell Technologies, "PowerEdge XE9680L Spec Sheet" (2025). https://www.delltechnologies.com/asset/en-us/products/servers/technical-support/poweredge-xe9680l-spec-sheet.pdf
- [12] Dell Technologies, "PowerEdge XE9680 Spec Sheet" (2024). https://www.dell.com/en-us/shop/ipovw/poweredge-xe9680
Frequently Asked Questions
What is TDP and what happens if GPU cooling cannot keep up?
Every watt a GPU consumes becomes heat. TDP (Thermal Design Power) is the maximum sustained heat the chip generates under load. If the cooling system cannot remove that heat fast enough, the GPU throttles its clock speed to protect itself, and performance drops.
How much power does a GB200 NVL72 rack draw compared to typical data center racks?
The GB200 NVL72 packs 72 Blackwell GPUs into one rack at 120-132 kW total. As of early 2026, most data centers run 10-30 kW per rack, and few exceed 30 kW. A single NVL72 rack draws 4-12x that.
What is direct-to-chip cooling versus immersion cooling?
Direct-to-chip (D2C) cooling handles the hottest components, GPUs and CPUs, with liquid while fans still cool lower-power parts like memory DIMMs and NVMe drives. Immersion cooling submerges the entire server in dielectric fluid, a non-conductive liquid.
How does liquid cooling change rack density for HGX B200?
In Supermicro's published HGX B200 examples, the liquid-cooled design fits 8 systems and 64 GPUs in a 42U rack, while the air-cooled design fits 4 systems and 32 GPUs. The pattern is consistent: liquid cooling doubles density.