Liquid Cooling vs Air Cooling for GPU Servers
Air cooling blows air over heatsinks (metal fins that conduct heat away from the GPU). Liquid cooling pumps coolant through cold plates bolted to the GPU. Both methods remove heat, but they set different ceilings on what a data center can support.
Liquid cooling fits more GPUs per rack: fewer racks, less floor space, fewer network switches, and shorter cable runs between GPUs, which reduces communication latency for training workloads. Most B200 systems ship air-cooled, but the GB200 NVL72 rack is liquid-only, and NVIDIA's next-generation Rubin GPUs (1,800-2,300W TDP) will require liquid cooling across the board. Most data centers do not support liquid cooling yet. [6]
Figure: Normalized 42U rack using 10U air and 4U liquid nodes.
Figure: Illustrative facility PUE by rack density.
Why GPU servers need so much cooling
Every watt a GPU consumes becomes heat. TDP (Thermal Design Power) is the maximum sustained heat the chip generates under load. If the cooling system cannot remove that heat fast enough, the GPU throttles its clock speed to protect itself, and performance drops.
| GPU | Architecture | Year | TDP (SXM form factor) |
|---|---|---|---|
| A100 | Ampere | 2020 | 400W |
| H100 | Hopper | 2022 | 700W |
| H200 | Hopper | 2024 | 700W |
| B200 | Blackwell | 2024 | 1,000W |
Sources: NVIDIA A100 datasheet [1], H100 datasheet [4], H200 datasheet [5], DGX B200 User Guide [2]
An 8-GPU B200 server consumes 8 kW in GPU power alone. CPUs, memory, network cards, fans, and power supply losses add significantly to that. [2]
NVIDIA's GB200 NVL72, a 72-GPU rack-scale Blackwell system, consumes 120-132 kW total. [3][9] As of early 2026, most data centers run 10-30 kW per rack, and few exceed 30 kW. [6] A single NVL72 rack draws 4-12x that.
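The power arithmetic above can be sketched in a few lines. The TDP figures come from the table; the 1.4x system overhead factor is an illustrative assumption for CPUs, memory, NICs, fans, and PSU losses, not a vendor specification.

```python
# GPU TDPs from the table above (SXM form factor), in watts.
GPU_TDP_W = {"A100": 400, "H100": 700, "H200": 700, "B200": 1000}

def gpu_power_kw(gpu: str, n_gpus: int = 8) -> float:
    """Heat from the GPUs alone, in kW."""
    return GPU_TDP_W[gpu] * n_gpus / 1000

def server_power_kw(gpu: str, n_gpus: int = 8, overhead: float = 1.4) -> float:
    """Rough total server draw: GPU power scaled by an assumed overhead factor."""
    return gpu_power_kw(gpu, n_gpus) * overhead

print(gpu_power_kw("B200"))     # 8.0 kW of GPU heat per 8-GPU server
print(server_power_kw("B200"))  # ~11.2 kW with the assumed 1.4x overhead
```

Even before overhead, eight B200s alone exceed what many legacy racks were provisioned to deliver.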
How air cooling works
Fans push ambient air across heatsinks attached to the GPU and CPU packages. Cool air enters from the front of the server, absorbs heat as it passes over the fins, and exits hot from the rear.
Data centers organize this into hot aisle/cold aisle containment. Cold air from CRAC (Computer Room Air Conditioning) or CRAH (Computer Room Air Handler) units feeds the cold aisle. Servers draw it in, heat it, and exhaust it into the hot aisle, which routes back to the cooling units. Physical barriers between the aisles keep hot exhaust from recirculating into the intake.
ASHRAE TC 9.9, the technical committee that sets thermal guidelines for data center equipment, recommends an inlet air temperature of 18-27°C for server hardware. [7] Operating within that range extends equipment life and keeps energy costs predictable.
The physics set the limit. Air has a low specific heat capacity (the energy needed to raise one kilogram by one degree): about 1 kJ per kg per °C. Because air is also roughly 800x less dense than water, water's volumetric heat capacity is about 3,400x higher, so a far smaller volume of coolant moves the same heat. Fan power scales with the cube of fan speed, so a 10% increase in airflow demands about 33% more fan energy. [8]
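That capacity gap can be made concrete with a back-of-envelope flow calculation. Fluid properties are textbook round numbers, the 10°C temperature rise is a typical design assumption, and the 132 kW load is the NVL72 worst case cited above.

```python
# Volumetric flow needed to move q watts at a given temperature rise:
# q = rho * flow * c_p * dT  =>  flow = q / (rho * c_p * dT)
def flow_m3_per_s(q_w: float, rho_kg_m3: float, cp_j_kg_k: float, dt_k: float) -> float:
    return q_w / (rho_kg_m3 * cp_j_kg_k * dt_k)

q = 132_000  # W, one GB200 NVL72 rack at the top of its range
air = flow_m3_per_s(q, rho_kg_m3=1.2, cp_j_kg_k=1005, dt_k=10)
water = flow_m3_per_s(q, rho_kg_m3=997, cp_j_kg_k=4186, dt_k=10)

print(f"air:   {air:.1f} m^3/s")         # on the order of 11 m^3/s
print(f"water: {water * 1000:.1f} L/s")  # on the order of 3 L/s
print(f"ratio: {air / water:.0f}x")      # ~3,400-3,500x, matching the figure above
```

Eleven cubic meters of air per second for a single rack is a hurricane through a server room; three liters of water per second is a garden hose.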
- Up to 20 kW/rack: standard air cooling with hot/cold aisle containment
- 20-40 kW/rack: high-speed fans, rear-door heat exchangers, or in-row cooling
- Above 40 kW/rack: rear-door heat exchangers or liquid cooling [8]
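Those tiers can be captured in a tiny lookup helper. The thresholds come straight from the list above; the function name and return strings are illustrative.

```python
def cooling_tier(kw_per_rack: float) -> str:
    """Map rack power density to the cooling approach it typically demands."""
    if kw_per_rack <= 20:
        return "standard air with hot/cold aisle containment"
    if kw_per_rack <= 40:
        return "high-speed fans, rear-door heat exchangers, or in-row cooling"
    return "rear-door heat exchangers or liquid cooling"

print(cooling_tier(15))   # comfortable air-cooling territory
print(cooling_tier(132))  # GB200 NVL72 territory: liquid
```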
The Uptime Institute's 2025 Global Data Center Survey found that 67% of existing data centers cannot support modern GPU power densities. [6]
How liquid cooling works
In direct-to-chip (D2C) cooling, a cold plate (a metal block with internal channels) mounts directly on the GPU. Coolant flows through those channels, absorbs heat at the source, and carries it to a CDU (Coolant Distribution Unit, the heat exchanger that sits outside or beside the rack). D2C handles the hottest components (GPUs and CPUs) with liquid while fans still cool lower-power parts like memory modules and storage drives.
The CDU operates two loops. A secondary loop circulates filtered coolant between the CDU and the cold plates inside the servers. A primary loop connects to the facility's chilled water supply or dry coolers outside the building. The CDU transfers heat from the server loop to the facility loop, then sends cooled fluid back to the cold plates.
CDUs come in two types. Liquid-to-liquid (L2L) CDUs connect to a facility chilled water plant, the standard for large deployments. Liquid-to-air (L2A) CDUs reject heat to air through built-in fans, useful for smaller installations or sites without chilled water infrastructure.
Figure: Direct-to-chip cooling: two-loop architecture.
A second approach, immersion cooling, submerges the entire server in dielectric fluid (a non-conductive liquid).
| | Direct-to-chip (D2C) | Immersion |
|---|---|---|
| How it works | Cold plates on GPUs/CPUs, coolant loops to CDU | Server submerged in dielectric fluid. Single-phase keeps the fluid liquid throughout. Two-phase lets the fluid evaporate at the hot surface and condense on a cooler surface above, transferring more heat per cycle. |
| Heat captured | Up to 98% of system heat through liquid [9] | 100% (all components submerged) |
| PUE (Power Usage Effectiveness, total facility power divided by IT equipment power) | ~1.15 [8] | 1.03-1.08 [10] |
| Server compatibility | Standard chassis with cold plate retrofit | Purpose-built tanks and enclosures |
| Maturity | Production standard for Blackwell [9] | Niche; scaling expected 2026-2027 [10] |
Direct-to-chip is the production standard for Blackwell-class hardware. Supermicro, Dell, and NVIDIA's own GB200 NVL72 all use D2C. [9][11][3] IDTechEx projects two-phase immersion will begin scaling in 2026-2027 as GPU TDPs push past the limits of single-phase systems. [10]
Air vs liquid tradeoffs
| | Air cooling | Liquid cooling (D2C) |
|---|---|---|
| Max rack density | 25-40 kW | 80-250+ kW |
| Facility requirements | Raised floors or containment, CRAC/CRAH units | CDU per rack or row, piping, chilled water or dry coolers, leak detection |
| Maintenance | Low: replace fans, clean filters | Higher: trained technicians, coolant management, pump servicing |
| Upfront cost | Lower | Higher (CDUs, piping, plumbing) |
| Energy cost at density | Higher (fans scale with cube of speed) | Lower (PUE advantage compounds over time) |
| GPU thermal headroom | Limited at high TDP | Better: lower junction temps, sustained boost clocks |
Sources: Uptime Institute (2025) [6], DCPulse [8], Supermicro [9]
The cost trade-off depends on scale. A single 8-GPU server with air cooling is cheaper to deploy: no plumbing, no CDU, no coolant management. At rack scale or higher, liquid cooling's density advantage changes the math. In Supermicro's published HGX B200 examples, the liquid-cooled design fits 8 systems and 64 GPUs in a 42U rack, while the air-cooled design fits 4 systems and 32 GPUs in a 42U rack. [9] The exact mix depends on the chassis and rack, but the pattern is consistent: liquid cooling buys density.
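At cluster scale, that density difference compounds into floor space. A hypothetical sizing helper, using Supermicro's published HGX B200 densities (the 1,024-GPU cluster size is chosen purely for illustration):

```python
def racks_needed(total_gpus: int, gpus_per_rack: int) -> int:
    """Ceiling division: racks required to house a given GPU count."""
    return -(-total_gpus // gpus_per_rack)

# Published HGX B200 densities: 32 GPUs/rack air-cooled, 64 liquid-cooled.
print(racks_needed(1024, 32))  # 32 racks air-cooled
print(racks_needed(1024, 64))  # 16 racks liquid-cooled
```

Halving the rack count also halves the inter-rack cabling and the number of leaf switches a given cluster needs, which is where the latency and networking savings mentioned earlier come from.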
PUE is a multiplier on every watt of IT load. In a 10 MW IT deployment, PUE 1.8 means 18 MW total facility draw, 8 MW of it just cooling and power distribution. PUE 1.15 drops that overhead to 1.5 MW. The 6.5 MW difference costs about $4 million per year at $0.07/kWh. [8]
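The overhead arithmetic above reproduces directly: facility overhead is IT load times (PUE - 1), and the delta priced at the quoted electricity rate.

```python
IT_MW = 10            # IT load from the example above
PRICE_PER_KWH = 0.07  # electricity rate from the example above
HOURS_PER_YEAR = 8760

def overhead_mw(pue: float) -> float:
    """Non-IT facility draw (cooling, power distribution) implied by a PUE."""
    return IT_MW * (pue - 1)

delta_mw = overhead_mw(1.8) - overhead_mw(1.15)         # 8.0 - 1.5 = 6.5 MW
annual_cost = delta_mw * 1000 * HOURS_PER_YEAR * PRICE_PER_KWH
print(f"{delta_mw:.1f} MW overhead delta -> ${annual_cost:,.0f}/yr")  # ~$4.0M/yr
```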
What the hardware dictates
The cooling method is not always a free choice. NVIDIA and OEM (Original Equipment Manufacturer) server designs sometimes make it for you. Server chassis are measured in rack units (U), where 1U equals 1.75 inches of vertical space. A standard rack is 42U tall.
| System | Cooling | Form factor | Practical takeaway |
|---|---|---|---|
| GB200 NVL72 | Liquid only | Full rack | Requires facility liquid cooling infrastructure |
| HGX B200 (liquid) | Liquid | 4U per node | 2x GPU density vs air-cooled HGX B200 |
| HGX B200 (air) | Air | 10U per node | No plumbing or CDU needed |
| DGX B200 | Air | 10U per node | NVIDIA turnkey; no OEM customization |
| Dell XE9680 (H100/H200) | Air | 6U per node | Fits Hopper-era TDP with air only |
| Dell XE9680L | Liquid | 4U per node | Same baseboard as XE9680 in less space |
Sources: NVIDIA [2][3], Supermicro [9], Dell [11][12]
The form factor difference comes from the cooling hardware itself. An air-cooled server needs tall heatsinks and rows of high-speed fans to force enough airflow across the GPUs. Those components take physical space, which is why Supermicro's air-cooled HGX B200 is a 10U chassis. Replace the heatsinks and fans with compact cold plates and manifold tubing, and the same baseboard fits in 4U. [9]
Within the same HGX B200 product family, cooling changes the density ceiling. Supermicro's published examples show 32 GPUs in its air-cooled rack design and 64 GPUs in its liquid-cooled rack design. [9]
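The U-space arithmetic behind those densities is simple floor division. Note this counts vertical space alone; the published liquid-cooled rack stops at 8 nodes rather than 10, presumably reserving space for power shelves and manifolds.

```python
RACK_U = 42  # standard rack height in rack units (1U = 1.75 in)

def nodes_by_space(node_u: int, rack_u: int = RACK_U) -> int:
    """Upper bound on nodes per rack from vertical space alone."""
    return rack_u // node_u

print(nodes_by_space(10))  # 10U air-cooled HGX B200 -> 4 nodes (32 GPUs)
print(nodes_by_space(4))   # 4U liquid-cooled -> 10 by space; published racks use 8
```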
For GB200 NVL72 deployments, the data center must have liquid cooling infrastructure in place before the hardware arrives. Retrofitting an air-cooled facility means adding CDUs, running pipes, and potentially upgrading the building's chilled water capacity. That work takes months.
References
- [1] NVIDIA, "A100 Tensor Core GPU Datasheet" (2020). https://www.nvidia.com/content/dam/en-zz/Solutions/Data-Center/a100/pdf/nvidia-a100-datasheet-nvidia-us-2188504-web.pdf
- [2] NVIDIA, "DGX B200 User Guide" (2025). https://docs.nvidia.com/dgx/dgxb200-user-guide/introduction-to-dgxb200.html
- [3] NVIDIA, "GB200 NVL72" (accessed March 2026). https://www.nvidia.com/en-us/data-center/gb200-nvl72/
- [4] NVIDIA, "H100 Tensor Core GPU Datasheet" (2022). https://resources.nvidia.com/en-us-tensor-core/nvidia-tensor-core-gpu-datasheet
- [5] NVIDIA, "H200 Tensor Core GPU Datasheet" (2024). https://www.nvidia.com/en-us/data-center/h200/
- [6] Uptime Institute, "Global Data Center Survey 2025" (2025). https://intelligence.uptimeinstitute.com/resource/uptime-institute-global-data-center-survey-2025
- [7] ASHRAE TC 9.9, "Thermal Guidelines for Data Processing Environments, 5th Edition" (2021). https://www.ashrae.org/technical-resources/bookstore/datacom-series
- [8] DCPulse, "How Rack Power Impacts PUE in AI Data Centers" (2025). https://dcpulse.com/article/the-density-dividend-how-rack-power-impacts-pue-efficiency
- [9] Supermicro, "Supermicro Ramps Full Production of NVIDIA Blackwell Rack-Scale Solutions" (2025). https://www.supermicro.com/en/pressreleases/supermicro-ramps-full-nvidia-blackwell-rack-scale-solutions-nvidia-hgx-b200
- [10] IDTechEx, "Two-Phase Cold Plate Cooling Will Take Off as Early as 2026-2027" (2025). https://www.idtechex.com/en/research-article/two-phase-cold-plate-cooling-will-take-off-as-early-as-2026-2027/34068
- [11] Dell Technologies, "PowerEdge XE9680L Spec Sheet" (2025). https://www.delltechnologies.com/asset/en-us/products/servers/technical-support/poweredge-xe9680l-spec-sheet.pdf
- [12] Dell Technologies, "PowerEdge XE9680 Spec Sheet" (2024). https://www.dell.com/en-us/shop/ipovw/poweredge-xe9680
Frequently Asked Questions
What is TDP and what happens if GPU cooling cannot keep up?
Every watt a GPU consumes becomes heat. TDP (Thermal Design Power) is the maximum sustained heat the chip generates under load. If the cooling system cannot remove that heat fast enough, the GPU throttles its clock speed to protect itself, and performance drops.
How much power does a GB200 NVL72 rack draw compared to typical data center racks?
The GB200 NVL72 packs 72 Blackwell GPUs into one rack at 120-132 kW total. As of early 2026, most data centers run 10-30 kW per rack, and few exceed 30 kW. A single NVL72 rack draws 4-12x that.
What is direct-to-chip cooling versus immersion cooling?
Direct-to-chip (D2C) cooling handles the hottest components, GPUs and CPUs, with liquid while fans still cool lower-power parts like memory DIMMs and NVMe drives. Immersion cooling submerges the entire server in dielectric fluid, a non-conductive liquid.
How does liquid cooling change rack density for HGX B200?
In Supermicro's published HGX B200 examples, the liquid-cooled design fits 8 systems and 64 GPUs in a 42U rack, while the air-cooled design fits 4 systems and 32 GPUs. The pattern is consistent: liquid cooling doubles density.