The Power Budget of an AI Data Center

Brenden Reeves

A watt is a unit of power: how much energy something uses each second. A phone charger uses about 20 watts, a microwave about 1,000. A single AI training GPU draws 700 to over 2,000 watts depending on the generation, and instead of running for two minutes like a microwave, it runs 24/7 for weeks. A facility housing 100,000 of them needs roughly 200 megawatts (enough to power a small city) from the grid[1], but a large share of that power never reaches the GPUs. It goes to supporting hardware, cooling, and power delivery.

What a single GPU draws

GPU power is measured in TDP (thermal design power), the maximum sustained wattage the chip draws when working as hard as it can. Training runs hold GPUs at 70-90% of their TDP for weeks or months at a time. Most traditional servers average only 20-50% of their rated power.[2]

NVIDIA releases a new generation of data center GPUs roughly every two years, and each generation draws significantly more power. The A100 (2020) drew 400W (watts). The H100 (2022) draws 700W. The B200 (2024) draws 1,000W in its air-cooled configuration and up to 1,200W when liquid-cooled.[3][4] The B300 draws 1,400W.[5] NVIDIA's next architecture, Vera Rubin, targets 2,300W per GPU.[6][7]

GPU | Year | TDP | Increase over prior generation
A100 | 2020 | 400W | (baseline)
H100 | 2022 | 700W | ×1.8
B200 | 2024 | 1,000W | ×1.4
B300 | 2025 | 1,400W | ×1.4
Vera Rubin | 2026 (est.) | 2,300W | ×1.6

NVIDIA training GPU TDP across generations [3][5][6][7]. Vera Rubin is a roadmap estimate.

AMD's competing Instinct line follows the same curve. The MI300X draws 750W.[8] The MI325X draws 1,000W.[9] The MI355X draws 1,400W.[10]

This matters for facilities, not just chips. A facility sized for A100 power loads may not have the cooling or electrical capacity for B200s without upgrades. The power and cooling infrastructure needs to handle roughly 2.5x the load per GPU slot. Many facilities built before 2023 require retrofit or replacement to support current-generation hardware.

What a server adds on top

A GPU does not run by itself. It sits inside a server alongside CPUs, system memory, network adapters, and power supplies. All of these draw power.

The NVIDIA DGX H100 is a purpose-built AI server that holds eight H100 GPUs. Eight H100 GPUs at 700W each draw 5,600W. The full server draws 10,200W.[11] The remaining 4,600W, nearly half the total, goes to everything else: CPUs, memory, chips that handle GPU-to-GPU and network communication, signal-boosting components, and the server's own fans and power supply conversion losses.[1]

SemiAnalysis calculated the per-slot cost: each GPU slot in a DGX H100 consumes about 1,275W total, 700W for the GPU and 575W for the supporting infrastructure around it.[1]

System | GPUs | GPU power | Total system power | Non-GPU overhead
DGX A100 | 8x A100 | 3,200W | ~6,500W [12] | ~51%
DGX H100 | 8x H100 | 5,600W | 10,200W [11] | ~45%
DGX B200 | 8x B200 | 8,000W | ~14,300W [13] | ~44%
GB200 NVL72 (rack) | 72x B200 | 72,000W | 120,000W [14] | ~40%
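
Those overhead percentages fall straight out of the published numbers. A minimal sketch, recomputing each system's non-GPU share from GPU count, per-GPU TDP, and total system power (the variable names are mine, not NVIDIA's):

```python
# Non-GPU overhead share for each system in the table above.
# Figures are the published TDPs and system power ratings cited in the text.
systems = {
    # name: (gpu_count, gpu_tdp_watts, total_system_watts)
    "DGX A100":           (8,    400,   6_500),
    "DGX H100":           (8,    700,  10_200),
    "DGX B200":           (8,  1_000,  14_300),
    "GB200 NVL72 (rack)": (72, 1_000, 120_000),
}

for name, (count, tdp, total) in systems.items():
    gpu_w = count * tdp
    overhead_w = total - gpu_w
    print(f"{name:20s} GPUs {gpu_w/1000:6.1f} kW, "
          f"overhead {overhead_w/1000:5.1f} kW ({overhead_w/total:.0%} of total)")
```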

The GB200 NVL72 breaks the server-level pattern. Instead of individual servers mounted in a rack, the entire rack is one integrated system: 72 Blackwell GPUs and 36 Grace CPUs (NVIDIA's own energy-efficient processor, replacing the Intel Xeon CPUs used in earlier DGX servers) spread across 18 trays, rated at 120 kW per rack.[14]

At 1,000W per GPU (base TDP), about 40% of the rack's power goes to non-GPU components. That overhead percentage is lower than the DGX H100's 45% because the Grace CPUs use less power than the Xeon CPUs they replace, and the rack-scale design shares power delivery and cooling infrastructure instead of duplicating it across eight independent servers.

From server to cluster

A typical single-unit server running web or database workloads draws 300-500 watts. The AI systems in the previous section draw 20x to over 200x that.

Enterprise 1U server | 0.4 kW
DGX H100 (8U) | 10.2 kW
DGX B200 (8U) | 14.3 kW
GB200 NVL72 (rack) | 120 kW

Power per system, from a single enterprise server to the GB200 NVL72 rack-scale system [14]. The NVL72 is an integrated rack, not a collection of discrete servers.

Industry analysts project next-generation racks at roughly 600 kW, with future generations potentially reaching 1 MW per rack.[15]

Not all of the facility's power reaches the servers. Cooling systems, power converters, and other facility infrastructure add overhead on top of the IT load. The industry measures this overhead as PUE (power usage effectiveness): the ratio of total facility power to IT equipment power. A PUE of 1.5 means the facility draws 50% more than the IT equipment alone, with the extra going to cooling and power delivery. A PUE of 1.0 would mean zero overhead, the ideal. Lower is better.

At cluster scale, the numbers get large quickly. 12,500 DGX H100 servers at 10.2 kW each is 127.5 MW of server power alone. Add networking equipment (the switches and cabling that connect racks together, plus management systems) at roughly 10-15% of the compute load[1], and IT equipment power reaches about 145 MW. The facility's cooling and power delivery infrastructure then add another 30-40% on top, bringing the total to 190-200 MW from the grid.[1]
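
The cluster math is worth working through once. This sketch brackets the 190-200 MW figure using the overhead ranges quoted above; the exact fractions vary by facility, so treat it as an estimate rather than a measurement:

```python
# Grid power estimate for a 100,000-GPU H100 cluster (12,500 DGX H100 servers).
servers = 12_500
server_kw = 10.2                                 # DGX H100 maximum system power

server_mw = servers * server_kw / 1_000          # 127.5 MW of server power
for net_frac, fac_frac in [(0.10, 0.30), (0.15, 0.40)]:
    it_mw = server_mw * (1 + net_frac)           # add networking equipment
    grid_mw = it_mw * (1 + fac_frac)             # add cooling and power delivery
    pue = grid_mw / it_mw                        # implied facility PUE
    print(f"networking +{net_frac:.0%}, facility +{fac_frac:.0%}: "
          f"IT {it_mw:.0f} MW, grid {grid_mw:.0f} MW, PUE {pue:.2f}")
```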

This tracks against real deployments. xAI's Colossus cluster in Memphis launched with 100,000 H100 GPUs at roughly 150 MW, then doubled to 200,000 GPUs within months.[16][17][18] The facilities table below shows how Colossus and other recent builds compare at scale.

Two components that barely register in the power budget: storage and lighting. Training clusters keep their working data in memory chips on the GPUs and on the server's motherboard. Storage drives (5-25W each) are a rounding error against the compute load. Lighting, security systems, and office space together account for less than 2-3% of facility power.[19]

Where the overhead goes

Everything between the GPU and the utility meter is overhead. It falls into two main categories: cooling and power delivery.

Cooling

In a traditional air-cooled data center, cooling consumes 25-40% of total facility power.[20] The physics work against density: fan power scales with the cube of airflow speed, so a 10% increase in airflow requires roughly 33% more fan energy. At 10 kW per rack, air cooling delivers a PUE of about 1.5. At 60 kW per rack, that can degrade to 1.7-2.0 depending on facility design. Above 100 kW per rack, air cooling alone becomes impractical.[21]
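
The cube relationship is easy to check numerically. A small sketch of the fan affinity law (real fan curves vary with the specific equipment, so this is illustrative):

```python
# Fan affinity law: fan power scales roughly with the cube of airflow.
def extra_fan_power(airflow_increase: float) -> float:
    """Fractional increase in fan power for a fractional increase in airflow."""
    return (1 + airflow_increase) ** 3 - 1

for bump in (0.10, 0.25, 0.50):
    print(f"+{bump:.0%} airflow -> +{extra_fan_power(bump):.0%} fan power")
# +10% airflow -> +33% fan power, matching the figure above.
```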

Liquid cooling breaks this curve. Direct-to-chip liquid cooling holds a PUE of about 1.15 regardless of rack density, from 10 kW to 120 kW per rack.[21] It works by running coolant through cold plates (metal blocks pressed against heat-generating chips) that capture the vast majority of system heat through the liquid loop.

The GB200 NVL72 rack has no air-cooled option. At 120 kW per rack, air cooling is physically impossible. The rack requires chilled water at 25-45 degrees C flowing through its cold plates.[14] Not all Blackwell GPUs require liquid cooling (the B200 has an air-cooled option at 1,000W), but any facility deploying the NVL72 rack form factor needs a liquid cooling plant.

Immersion cooling, where servers are submerged in a non-conductive liquid called a dielectric fluid that can safely contact electronics, pushes PUE even lower (1.02-1.03 for single-phase immersion)[22] but remains a niche deployment. Most operators choose direct-to-chip liquid cooling for its lower complexity and compatibility with standard rack sizes.

Air-cooled: 60 kW IT equipment + 48 kW overhead = 108 kW total (PUE 1.80, 44% overhead)
Liquid-cooled: 60 kW IT equipment + 9 kW overhead = 69 kW total (PUE 1.15, 13% overhead)

Both configurations assume the same 60 kW IT load. PUE values from Uptime Institute [21].

Power delivery

Power passes through six stages between the utility grid and the GPU chip. Most convert voltage; all lose some energy as heat, though the conversion stages account for nearly all the loss.[23]

Stage | What it does | Loss
Transformer | Takes the high-voltage power line from the electric company and reduces it to a lower voltage (480V) the building can use. | 0.5-2%
Switchgear | Acts like a smart circuit breaker panel. Routes power and automatically switches to backup if the main feed fails. | <0.1%
UPS | Uninterruptible power supply. Converts AC to DC to charge batteries, then back to AC to feed servers, keeping them running if power drops before generators start. That double conversion makes it the biggest facility-level loss. Some operators use eco-mode, which bypasses the double conversion during normal operation and only switches to battery when needed (99% efficient, but with slightly less protection). | 4-8% [23]
PDU | Power distribution unit. Splits power across individual server racks, like a power strip for an entire row of equipment. | 1-3%
PSU | Power supply unit. Converts AC to DC inside each server. Rated by 80 Plus tiers: Bronze (85%) was standard a decade ago, AI servers now use Titanium (95.4%), and a Ruby tier (96.5%) launched in 2025. [24] | 4-6%
VRM | Voltage regulator module. Final step on the motherboard, dropping voltage to the ~0.8V each GPU chip needs. Its losses count as IT load, not facility overhead, so they are easy to overlook. For a 1,000W GPU, the VRM dissipates 80-150W as heat. [25] | 8-15%
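
To see how the per-stage losses compound, the sketch below multiplies through the chain using midpoints of the ranges in the table. It is an illustrative end-to-end estimate, not a measurement of any specific facility:

```python
# Compound the per-stage conversion losses (midpoints of the quoted ranges).
stage_loss = {
    "transformer": 0.0125,   # 0.5-2%
    "switchgear":  0.001,    # <0.1%
    "UPS":         0.06,     # 4-8% (double conversion)
    "PDU":         0.02,     # 1-3%
    "PSU":         0.05,     # 4-6%
    "VRM":         0.115,    # 8-15% (counted as IT load, not facility overhead)
}

remaining = 1.0
for stage, loss in stage_loss.items():
    remaining *= 1 - loss
    print(f"after {stage:12s} {remaining:.1%} of grid power remains")

print(f"roughly {1 - remaining:.0%} lost between the utility meter and the chip")
```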

AC (alternating current) is what the power grid delivers. DC (direct current) is what chips run on. Much of the power delivery chain is converting from one to the other.

The GB200 NVL72 takes this further. Instead of the traditional AC distribution chain, each rack converts 480V three-phase AC (the standard form of high-power electricity delivery) directly to ~50V DC at 97% efficiency, then distributes DC to the compute trays.[25] This consolidates multiple conversion stages into fewer, more efficient ones. The total power conversion waste heat on a 120 kW rack is about 3.6 kW, or 3% of the rack's draw.

Redundancy

Mission-critical data centers duplicate their power infrastructure so that no single failure causes downtime. This redundancy has a direct cost in power overhead because backup equipment still draws energy even when idle.

Three redundancy levels are common. N+1 adds one spare component beyond the minimum needed (e.g. three UPS units when two would suffice). 2N fully duplicates the entire power path: two independent chains of transformers, UPS units, and PDUs, each capable of carrying the full load alone. 2(N+1) provides two independent paths with an extra spare in each, the highest level of resilience.

Data center tiers, defined by the Uptime Institute's Tier Classification, map to these levels.[26] Tier I has no redundancy. Tier II adds N+1 for power components. Tier III requires N+1 with a redundant distribution path so any component can be maintained without taking the facility offline. Tier IV requires 2N or 2(N+1) with two simultaneously active distribution paths and automatic fault tolerance.

Redundancy affects efficiency because UPS units are less efficient at partial load. In an N+1 design, each UPS runs at roughly 80% capacity, near its efficiency peak. In a 2N design, each unit runs at about 50% capacity, where double-conversion efficiency drops by 2-4 percentage points. Hyperscalers (the largest operators, like Google, Amazon, Meta, and Microsoft) minimize this penalty by sizing UPS capacity close to actual load and using modular UPS systems that add capacity incrementally rather than installing large units that run half-empty.
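
The partial-load effect is just the ratio of required to installed capacity. The unit counts below are illustrative examples, not figures from the article:

```python
# Per-unit UPS load fraction under different redundancy schemes,
# assuming the load is shared evenly across all installed units.
def load_per_unit(units_required: int, units_installed: int) -> float:
    return units_required / units_installed

# N+1: a load that needs four units, with five installed -> each runs near 80%.
print(f"N+1 (4 needed, 5 installed): {load_per_unit(4, 5):.0%} of rated capacity")
# 2N: two independent paths, each sized for the full load -> each runs at 50%.
print(f"2N  (1 needed, 2 installed): {load_per_unit(1, 2):.0%} of rated capacity")
```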

UPS technologies

Not all UPS systems work the same way. Three technologies dominate, each with different tradeoffs in runtime, efficiency, and maintenance.

Technology | How it works | Runtime | Efficiency
Battery (lead-acid / Li-ion) | Batteries store energy and release it when grid power fails | 5-15 minutes | 92-96%
Flywheel | Spinning rotor stores kinetic energy, drives generator on failure | 15-30 seconds | 97-98%
DRUPS | Diesel engine + flywheel + motor-generator in one unit | Unlimited (fuel) | 97-98%

Battery UPS is the most common. Traditional valve-regulated lead-acid (VRLA) batteries require replacement every 3-5 years. Lithium-ion cells last 10-15 years and take up roughly a quarter of the space, but cost 2-3x more upfront. Google has deployed over 100 million lithium-ion cells across its fleet, using distributed battery backup at the rack level rather than a centralized facility UPS.[27]

Flywheel UPS stores energy in a spinning rotor instead of batteries. Its 15-30 seconds of runtime is only enough to bridge the gap until diesel generators start. The tradeoff: higher efficiency and no battery replacements, but very short standalone runtime.

DRUPS (diesel rotary UPS) combines a diesel engine, flywheel, and motor-generator into a single unit with no batteries at all. The flywheel provides instant bridging power while the diesel engine spins up. Efficiency reaches 97-98%, and runtime is limited only by fuel supply.[28] DRUPS is predominantly deployed in European data centers; US operators mostly use battery-based systems.

Measuring efficiency

PUE: power usage effectiveness

The industry average PUE in 2025 was 1.54, according to the Uptime Institute's annual global survey. That number has barely moved in a decade. It dropped sharply from 2.5 in 2007 to about 1.65 by 2014, driven by basic improvements like separating hot and cold airflows, running cooling at higher temperatures, and packing more workloads onto fewer servers. Since then, improvement has stalled. The most recent six survey years all land in the 1.54-1.59 band.[29] The average is skewed by aging facilities; half the data centers in Uptime's survey are over 11 years old.[2] New hyperscale builds operate far below the average.

Operator | PUE (latest) | Source
Google | 1.09 (fleet average), 1.04 (best campus, Ohio) [30] | Google Data Centers (2024)
Meta | 1.08 [31] | Meta 2024 Sustainability Report
AWS | 1.15 [32] | Amazon 2024 Sustainability Report
Microsoft | 1.16 [33] | Microsoft 2024 Environmental Sustainability Report
Industry average | 1.54 [29] | Uptime Institute Global Survey (2025)

Google's fleet-wide PUE of 1.09 means that for every 100W of IT power, only 9W goes to overhead.[30] At the industry average of 1.54, that overhead jumps to 54W. On a campus with 100 MW of IT load, that gap (9 MW vs. 54 MW of overhead) adds up to roughly $28 million per year in wasted electricity.
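
That $28 million figure is straightforward arithmetic once you pick an electricity price. The sketch below assumes roughly $0.07 per kWh, a typical US industrial rate; the article does not state the rate it used, so this illustrates the calculation rather than reproducing it exactly:

```python
# Annual cost of the overhead gap between PUE 1.09 and the 1.54 industry average,
# for a campus with 100 MW of IT load. Electricity price is an assumed industrial rate.
it_load_mw = 100
price_per_kwh = 0.07           # assumed ~$0.07/kWh
hours_per_year = 8_760

def overhead_mw(pue: float) -> float:
    return it_load_mw * (pue - 1)

gap_mw = overhead_mw(1.54) - overhead_mw(1.09)   # 54 MW - 9 MW = 45 MW
annual_cost = gap_mw * 1_000 * hours_per_year * price_per_kwh
print(f"overhead gap {gap_mw:.0f} MW -> about ${annual_cost / 1e6:.0f}M per year")
```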

PUE is becoming less useful as a metric for AI facilities. As rack power climbs from 10 kW to 120 kW, the IT load becomes so dominant that PUE naturally drops toward 1.0, even without real efficiency improvements in cooling or power delivery. A facility with poor cooling engineering but extremely high-density racks can still post a low PUE. NVIDIA has publicly stated that PUE is an ineffective metric for AI workloads and has called for a replacement.[34]

PUE also says nothing about water consumption or carbon emissions. The Green Grid, the organization that created PUE, introduced two companion metrics to fill those gaps.

WUE: water usage effectiveness

WUE measures the liters of water a facility consumes per kilowatt-hour of IT energy. A WUE of 1.8 means 1.8 liters of water used for every kilowatt-hour of computing.[35] Cooling towers, which spray water and let it evaporate to carry away heat, are the biggest reason facilities use water. The Green Grid estimated an industry-average WUE of 1.8 L/kWh in 2011, the most recent broad estimate available. Hyperscalers have since driven that number down significantly. At 1.8 L/kWh, a 100 MW facility would consume over 4 million liters of water per day, roughly 1.5 Olympic swimming pools.
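
The daily water figure is the IT energy consumed per day multiplied by the WUE. A minimal sketch, assuming the 1.8 L/kWh industry-average WUE and 100 MW of IT load running around the clock:

```python
# Daily water consumption at a given WUE and IT load.
it_load_mw = 100
wue_l_per_kwh = 1.8            # 2011 industry-average estimate [35]

it_kwh_per_day = it_load_mw * 1_000 * 24
liters_per_day = it_kwh_per_day * wue_l_per_kwh
print(f"{liters_per_day / 1e6:.1f} million liters per day")   # ~4.3 million liters
```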

WUE and PUE often pull in opposite directions. Evaporating water to cool uses less electricity (better PUE) but consumes more water (worse WUE). Switching to refrigeration-based or direct liquid cooling reduces water use but may increase electricity consumption. This is why PUE alone is insufficient: a facility can post a PUE of 1.1 while consuming millions of liters of water daily.

Operator | WUE (L/kWh) | Year
AWS | 0.15 [36] | 2024
Meta | 0.18 [31] | 2023 data
Microsoft | 0.30 [37] | FY2024
Industry average | ~1.8 (likely lower today) [35] | 2011 est.

Microsoft has announced that its next-generation data centers will consume zero water for cooling, avoiding over 125 million liters per year per facility.[37] Google, Meta, Amazon, and Microsoft have all committed to becoming “water positive” (replenishing more water than they consume) by 2030. As of 2024 reporting, all four have active replenishment programs, but none have reached net-positive status yet.[30][31][36][37]

CUE: carbon usage effectiveness

CUE measures kilograms of carbon emissions per kilowatt-hour of IT energy.[38] Unlike PUE and WUE, which reflect facility design choices, CUE depends almost entirely on where the facility gets its electricity. Multiply the grid's carbon intensity (the grams of CO₂ emitted per kilowatt-hour of electricity generated) by PUE and you get CUE. A facility running on nuclear power (very low carbon) with a PUE of 1.3 scores a CUE of 0.016. The same facility on the US average grid scores 0.48, roughly 30x worse.[39]
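
The CUE arithmetic from that example, using approximate carbon intensities (around 12 g CO₂/kWh for nuclear generation and around 0.37 kg CO₂/kWh for the US average grid; both are ballpark figures, not the exact inputs behind the numbers above):

```python
# CUE = grid carbon intensity (kg CO2 per kWh of electricity) x PUE.
def cue(grid_kg_per_kwh: float, pue: float) -> float:
    return grid_kg_per_kwh * pue

pue = 1.3
nuclear = cue(0.012, pue)      # ~12 g CO2/kWh for nuclear generation
us_grid = cue(0.37, pue)       # ~0.37 kg CO2/kWh for the US average grid [39]
print(f"nuclear-powered facility: {nuclear:.3f} kg CO2 per IT kWh")
print(f"US average grid:          {us_grid:.2f} kg CO2 per IT kWh "
      f"({us_grid / nuclear:.0f}x higher)")
```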

Most hyperscalers report carbon using “market-based” accounting, where they buy renewable energy credits (certificates proving that clean energy was generated somewhere on their behalf) to offset their usage. Under this method, a company can claim near-zero emissions even if the local grid still runs on fossil fuels. “Location-based” accounting, which measures the actual carbon intensity of the regional grid, tells a different story: Google, Microsoft, and Amazon have all seen location-based emissions rise in recent years as AI workloads have grown faster than new clean energy supply.[33]

The full picture

Epoch AI modeled the full power chain from GPU to facility wall using NVIDIA's GB200 NVL72 server design and Uptime Institute PUE data.[40] Their model finds that GPUs consume about 40% of total facility power at peak operation. For every watt a GPU draws, the facility draws about 2.44 watts total: a 1.53x multiplier covers the rest of the server (CPUs, memory, networking chips, fans, and power conversion losses), a further 1.14x covers shared IT equipment (network switches, management servers, storage), and a further 1.4x covers facility overhead (cooling, power delivery, and lighting); 1.53 × 1.14 × 1.4 ≈ 2.44.[40]

Concretely, applied to 70 MW of GPU power (roughly 100,000 H100-class GPUs at 700W each):

GPU power: 70 MW (running total 70 MW)
Server overhead: +37 MW (107 MW)
IT infrastructure: +15 MW (122 MW)
Facility overhead: +49 MW (171 MW)
Total from the grid: 171 MW

Illustrative power multiplier stack for 70 MW of GPU power. Multipliers from Epoch AI [40], derived from the GB200 NVL72 reference architecture.
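
The same stack as a calculation, using the Epoch AI multipliers [40]. This is a sketch of the model's headline arithmetic, not a reimplementation of their full analysis:

```python
# Epoch AI-style power multiplier stack: GPU -> server -> IT -> facility.
gpu_mw = 70.0                # ~100,000 H100-class GPUs at 700W each

server_mw = gpu_mw * 1.53    # rest of the server: CPUs, memory, fans, conversion losses
it_mw = server_mw * 1.14     # shared IT: network switches, management, storage
facility_mw = it_mw * 1.40   # facility overhead: cooling, power delivery, lighting

print(f"GPU power       {gpu_mw:6.1f} MW")
print(f"Server level    {server_mw:6.1f} MW")
print(f"IT equipment    {it_mw:6.1f} MW")
print(f"Facility total  {facility_mw:6.1f} MW")
print(f"GPU share of facility power: {gpu_mw / facility_mw:.0%}")
```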

For comparison, several facilities under construction in early 2026:

Facility | Power | Status
xAI Colossus, Memphis | 150-250 MW (100K→200K GPUs) [16][17][18] | Operating
Stargate (OpenAI/SoftBank/Oracle) | Up to 10 GW across multiple sites [41] | Announced January 2025
Meta total fleet (2023) | ~1.7 GW continuous (14,975 GWh/yr) [31] | Operating

US data centers consumed 176 TWh of electricity in 2023, about 4.4% of total US electricity consumption.[42] The Lawrence Berkeley National Laboratory projects that figure could reach 6.7-12% by 2028, driven largely by AI workloads.[42]

References

  1. SemiAnalysis, "100,000 H100 Clusters: Power, Network, and Mail" (2024)
  2. Uptime Institute, "Global Data Center Survey Results 2024" (2024)
  3. NVIDIA, "B200 GPU Datasheet" (2024)
  4. NVIDIA Newsroom, "NVIDIA Blackwell Platform Arrives to Power a New Era of Computing" (2024)
  5. Tom's Hardware, "NVIDIA's next-gen B300 GPUs have 1,400W TDP" (2025)
  6. Tom's Hardware, "NVIDIA's Vera Rubin platform in depth" (2025)
  7. Tom's Hardware, "Nvidia reportedly boosts Vera Rubin performance to 2,300 watts" (2026)
  8. Tom's Hardware, "AMD MI300X guzzles power, rated for 750 watts" (2024)
  9. AMD, "Instinct MI325X Datasheet" (2024)
  10. Tom's Hardware, "AMD's Instinct MI355X accelerator will consume 1,400 watts" (2025)
  11. NVIDIA, "DGX H100 User Guide" (accessed March 2026)
  12. NVIDIA, "DGX A100 Datasheet" (2020)
  13. NVIDIA, "DGX B200 Datasheet" (2024)
  14. NVIDIA, "GB200 NVL72" (accessed March 2026)
  15. Goldman Sachs, "Rising Power Density Disrupts AI Infrastructure" (2025)
  16. HPCwire, "xAI Colossus: The Elon Project" (2024)
  17. Data Center Frontier, "The Colossus AI Supercomputer" (2024)
  18. SemiAnalysis, "xAI's Colossus 2: First Gigawatt Datacenter" (2025)
  19. IEA, "Energy and AI: Energy demand from AI" (2025)
  20. DOE/NREL, "Best Practices Guide for Energy-Efficient Data Center Design" (revised July 2024)
  21. Uptime Institute, "Performance expectations of liquid cooling need a reality check" (2024)
  22. Schneider Electric, "Navigating Liquid Cooling Architectures for Data Centers with AI Workloads" White Paper 133 (2024)
  23. ENERGY STAR, "Reduce Energy Losses in Power Distribution Units" (accessed March 2026)
  24. CLEAResult, "80 PLUS Ruby: New Data Center Energy Efficiency Standard" (2025)
  25. NVIDIA, "DGX GB200 User Guide: Hardware" (accessed March 2026)
  26. Uptime Institute, "Explaining Uptime Institute's Tier Classification System" (2021)
  27. Google Cloud, "100 million lithium-ion cells in Google data centers" (2025)
  28. Schneider Electric, "Battle of the UPSs: DRUPS vs. Static UPS" (2016)
  29. Uptime Institute, "Global Data Center Survey Results 2025" (2025)
  30. Google, "Data Centers: Efficiency" (2024)
  31. Meta, "2024 Sustainability Report" (2024)
  32. Amazon, "2024 Amazon Sustainability Report" (2024)
  33. Microsoft, "2024 Environmental Sustainability Report" (2024)
  34. Data Center Dynamics, "NVIDIA: PUE is an ineffective efficiency metric for AI workloads" (2025)
  35. The Green Grid, "WUE: A Green Grid Data Center Sustainability Metric" White Paper #35 (2011)
  36. Amazon, "Water Stewardship" (accessed March 2026)
  37. Microsoft, "Sustainable by Design: Next-generation datacenters consume zero water for cooling" (2024)
  38. The Green Grid, "CUE: A Green Grid Data Center Sustainability Metric" White Paper #32 (2010)
  39. EIA, "How much carbon dioxide is produced per kilowatt-hour of U.S. electricity generation?" (accessed March 2026)
  40. Epoch AI, "GPUs account for about 40% of power usage in AI data centers" (2025)
  41. OpenAI, "Announcing The Stargate Project" (2025)
  42. Lawrence Berkeley National Laboratory, "2024 United States Data Center Energy Usage Report" (2024)

Frequently Asked Questions

How much power does a single AI GPU draw?

GPU power is measured in TDP (thermal design power), the maximum watts a chip draws under full load. The NVIDIA H100 draws 700W, the B200 draws 1,000W, and the B300 draws 1,400W. AI training keeps GPUs near peak utilization for weeks or months, so TDP is the number that matters for planning power infrastructure.

How much power does a 100,000 GPU data center need?

About 171 MW from the grid. The GPUs themselves draw 70 MW, but server overhead (CPUs, memory, fans) adds 37 MW, networking adds 15 MW, and facility overhead (cooling and power delivery) adds another 49 MW. GPUs account for only about 40% of total facility power.

What percentage of data center power goes to cooling?

In traditional air-cooled data centers, cooling consumes 25-40% of total facility power. Liquid cooling, which runs coolant directly through metal plates on each chip, reduces this to below 15%. At AI rack densities of 120 kW per rack, air cooling cannot remove heat fast enough, which is why NVIDIA GB200 NVL72 racks ship liquid-cooled only.

What is PUE and what is the industry average?

PUE (power usage effectiveness) is the ratio of total facility power to IT equipment power. A PUE of 1.0 would mean all power reaches IT equipment with zero overhead. The industry average PUE in 2025 was 1.54 (Uptime Institute). Hyperscalers do much better: Google at 1.09, Meta at 1.08, AWS at 1.15, and Microsoft at 1.16.
