Bare Metal for AI Compute
Bare metal means renting servers with no extra software between your workload and the hardware. For AI workloads, it's becoming the default. The managed-cloud premiums that made sense for web apps on AWS, Azure, and Google Cloud are hard to justify when the workload is GPU-heavy, and AI coding tools are lowering the barrier to self-managing, so paying for pre-built cloud services matters less.
What bare metal means for AI compute
Bare metal is a physical server you rent with no software layer in between. You get the machine and install whatever software you want.
Cloud instances work differently. When you launch an instance on AWS or Azure, you typically get a slice of a physical machine shared with other tenants. A "hypervisor" sits between your code and the hardware, allocating resources between the tenants.
Bare metal has been common in high-performance computing for decades. Financial trading firms, scientific research labs, and game server operators all run on bare metal because their costs are mostly fixed over long time frames, and they need predictable performance.
GPU workloads fit the same pattern. A training job runs at close to 80% GPU utilization for days or weeks. An inference cluster for a large AI platform runs around 50% continuously. You don't need to spin up 50 GPUs for only 10 minutes and turn them off.
Elastic scaling, the core value of cloud, matters less for AI training workloads. Of course, cloud providers do offer GPU instances. Amazon has the p5 line, Azure has ND, and Google has A3. These give you access to the GPUs using GPU passthrough. But they carry the cloud pricing premium: [1][2][3][4]
| Provider (Q1 2026) | H100 on-demand (per GPU-hour) | Type |
|---|---|---|
| AWS | $3.93 | Hyperscaler cloud |
| Lambda | $3.44 | On-demand neocloud |
| Bare Metal Providers | $1.80-$2.00 | Bare metal |
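To make the pricing gap concrete, here is a minimal sketch of the monthly bill per GPU at the rates above. The 730-hour month and the $1.90 bare metal midpoint are our assumptions; on-demand instances bill for wall-clock hours, so utilization doesn't shrink the bill once the GPU is rented.

```python
# Monthly on-demand cost per GPU at the Q1 2026 rates in the table.
# Assumes a 730-hour month (8,760 hours / 12). On-demand billing is
# wall-clock, so the bill is the same whether utilization is 50% or 80%.
HOURS_PER_MONTH = 730

rates = {
    "AWS (hyperscaler)": 3.93,
    "Lambda (neocloud)": 3.44,
    "Bare metal (midpoint)": 1.90,
}

for provider, rate in rates.items():
    print(f"{provider}: ${rate * HOURS_PER_MONTH:,.0f}/month per GPU")

# AWS (hyperscaler): $2,869/month per GPU
# Lambda (neocloud): $2,511/month per GPU
# Bare metal (midpoint): $1,387/month per GPU
```

On an 8-GPU H100 node, that AWS-to-bare-metal spread works out to roughly $11,900 a month.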
How hosting became cloud
In the 1990s and early 2000s, running a website meant shared hosting. Multiple sites lived on one machine. One site's traffic spike could crash other sites on the machine.
Virtual private servers (VPS) fixed this by dividing a physical machine into separate virtual machines (VMs). You still rented over long time frames but were isolated from other tenants.
AWS launched EC2 in 2006 and changed the model. Rent "elastically" by the hour, instead of by the year. Scale up or down in minutes. On top of EC2, AWS made a killing with the services that followed: S3 for storage (2006), RDS for managed databases (2009), Lambda for serverless functions (2014). Each removed another thing you'd otherwise need an engineer to build and maintain.
Google Cloud Platform launched in 2008, and Microsoft Azure in 2010. As of Q4 2025, the three hold about 63% of a cloud infrastructure market that generated $119 billion in a single quarter: AWS at 28%, Azure at 21%, Google Cloud at 14%. [5]
Why this won't play out like AWS
AWS won on brand and developer experience, both built on the back of 200+ services shipped over 18 years. Each service doubled as a switching cost: the more you used, the more expensive it became to leave, because every service you left behind had to be rebuilt.
But it's different for AI, because GPU workloads are portable. The same PyTorch training script runs on any provider: AWS, CoreWeave, or self-hosted bare metal. Install NVIDIA's CUDA, do some setup, and the code runs (a minimal sketch follows the list below). GPU users need only 5-10 services, and free open-source alternatives exist for all of them:
- Orchestration: Kubernetes
- Monitoring: Prometheus + Grafana
- Object storage: MinIO
- Job scheduling: Slurm, Ray
- Container registry: Harbor
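To make the portability claim concrete, here is a minimal, device-agnostic PyTorch training loop; the model and synthetic batch are arbitrary choices. Nothing in it names a provider: it runs wherever CUDA and a driver are installed, and falls back to CPU otherwise.

```python
import torch
import torch.nn as nn

# Device-agnostic: picks up whatever GPU the provider exposes,
# and falls back to CPU if none is present.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

model = nn.Linear(512, 10).to(device)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

for step in range(100):
    # Synthetic batch stands in for a real dataloader.
    x = torch.randn(64, 512, device=device)
    y = torch.randint(0, 10, (64,), device=device)

    optimizer.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()
    optimizer.step()
```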
AI coding tools change the build-vs-buy math
The strongest argument for the cloud premium was engineering cost. Building your own monitoring stack, deployment pipeline, and orchestration layer used to take a team of 3-4 engineers several months. Maintenance and upgrades were also a headache.
AI coding has compressed that. As of early 2026, a single AI-assisted engineer can build a custom deployment pipeline or monitoring dashboard in 1-2 weeks. As more teams build on bare metal, open-source tooling improves, which makes it easier for the next team.
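As a hedged illustration of the kind of tool a single engineer can now stand up in days: a bare-bones GPU utilization exporter for Prometheus, built from the open-source prometheus_client library plus nvidia-smi. This is a sketch under those assumptions, not a production exporter; most real deployments would reach for NVIDIA's DCGM exporter instead.

```python
import subprocess
import time

from prometheus_client import Gauge, start_http_server

# Sketch of a minimal GPU-utilization exporter for Prometheus.
# Assumes nvidia-smi is on PATH.
GPU_UTIL = Gauge("gpu_utilization_percent", "GPU utilization", ["gpu"])

def scrape() -> None:
    out = subprocess.check_output(
        ["nvidia-smi", "--query-gpu=index,utilization.gpu",
         "--format=csv,noheader,nounits"],
        text=True,
    )
    for line in out.strip().splitlines():
        index, util = (field.strip() for field in line.split(","))
        GPU_UTIL.labels(gpu=index).set(float(util))

if __name__ == "__main__":
    start_http_server(9100)  # Prometheus scrapes http://host:9100/metrics
    while True:
        scrape()
        time.sleep(15)
```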
Only low GPU utilization makes bare metal impractical
Web workloads are bursty. Cloud lets you scale up for peak traffic and scale down when it passes. Cloud providers charge a premium partly to absorb idle-time risk.
This doesn't apply to AI training, which runs at much higher and more stable utilization, closer to 80%. But AI inference workloads are still bursty, much like web traffic. [6][7]
For small AI companies, an on-demand neocloud is likely the best option. For medium-sized AI companies, the optimal path is likely a mix: bare metal for the steady core of inference traffic, with an on-demand neocloud absorbing the heavier periods. For large AI companies, the optimal path is likely mostly bare metal, or even your own data center if you have the resources.
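Here is a sketch of that hybrid policy for a medium-sized company: fill the fixed bare metal fleet first, and spill the burst to on-demand. The pool names, capacities, and rates below are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Pool:
    name: str
    capacity_rps: float      # requests/sec the pool can absorb
    cost_per_gpu_hour: float

# Hypothetical pools: a fixed bare metal fleet plus on-demand overflow.
BARE_METAL = Pool("bare-metal", capacity_rps=900.0, cost_per_gpu_hour=1.90)
NEOCLOUD = Pool("neocloud-on-demand", capacity_rps=float("inf"),
                cost_per_gpu_hour=3.44)

def route(load_rps: float) -> dict[str, float]:
    """Fill the cheap fixed pool first; spill the burst to on-demand."""
    base = min(load_rps, BARE_METAL.capacity_rps)
    overflow = max(0.0, load_rps - base)
    return {BARE_METAL.name: base, NEOCLOUD.name: overflow}

print(route(600.0))   # {'bare-metal': 600.0, 'neocloud-on-demand': 0.0}
print(route(1400.0))  # {'bare-metal': 900.0, 'neocloud-on-demand': 500.0}
```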
GPU compute is commoditizing toward bare metal
When the product is less about services and more about hardware, the lowest-cost provider wins. And bare metal is the lowest-cost form factor.
The market was already following this pattern before GPU compute. Cloudflare unbundled CDN from AWS CloudFront. Backblaze B2 unbundled object storage from S3. Each time a cloud feature could stand on its own, someone extracted it and sold it cheaper.
GPU compute is going the same way: CoreWeave, Lambda, and Crusoe offer GPU instances without the full cloud services stack, and their pricing reflects the thinner margin. And because most buyers are building GPU workloads from scratch today, there's nothing to migrate and therefore no switching cost.
Despite being a "commodity", not all bare metal is interchangeable
Providers differ in OEM (server manufacturer), cooling, power, inter-node networking, and physical proximity, and each of these affects workload performance.
For example, a bare metal provider using 8x H100 SXM systems from Dell has different motherboards, power supplies, and cooling than the equivalent from Supermicro. A provider's choice of OEM will lead to different reliability and thermal standards.
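These differences are measurable. A minimal throughput probe like the sketch below, run on identically specced nodes from two providers, surfaces cooling and power differences as a gap in sustained TFLOPS; the matrix size and iteration counts are arbitrary choices.

```python
import time
import torch

# Minimal matmul throughput probe: run it on identically specced
# nodes from two providers and compare sustained TFLOPS.
N, ITERS = 8192, 200
a = torch.randn(N, N, device="cuda", dtype=torch.float16)
b = torch.randn(N, N, device="cuda", dtype=torch.float16)

for _ in range(10):          # warm-up so clocks settle
    a @ b
torch.cuda.synchronize()

start = time.perf_counter()
for _ in range(ITERS):
    a @ b
torch.cuda.synchronize()
elapsed = time.perf_counter() - start

tflops = 2 * N**3 * ITERS / elapsed / 1e12  # 2*N^3 FLOPs per matmul
print(f"sustained: {tflops:.1f} TFLOP/s")
```

Sustained numbers, not peak, are where OEM choices show up: a node that thermally throttles after a few minutes will look identical to a well-cooled one on a short benchmark.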
The market will be more fragmented
GPU compute will be more fragmented than traditional cloud. The market is splitting into segments, each serving different buyer needs:
| Segment | Examples | Buyers | Trade-off |
|---|---|---|---|
| Hyperscaler cloud | AWS, Azure, GCP | Enterprises with compliance requirements, existing cloud contracts | Pay premium for managed everything |
| Neocloud | CoreWeave, Lambda, Crusoe, Together | AI companies wanting GPU-optimized infrastructure | Lower cost, fewer managed services |
| Bare metal | Equinix Metal, Vultr, Hetzner, OVHcloud | Teams with infrastructure skills or using AI tools to self-manage | Lowest cost per GPU-hour, more operational responsibility |
| Self-operated | OpenAI, Meta, Google DeepMind, xAI | Frontier labs at massive scale | Full control, large capital commitment |
References
- [1] AWS, "EC2 On-Demand Instance Pricing" (accessed March 18, 2026). https://aws.amazon.com/ec2/pricing/on-demand/
- [2] IntuitionLabs, "H100 Rental Prices Compared Across 15+ Cloud Providers" (2026). https://intuitionlabs.ai/articles/h100-rental-prices-cloud-comparison
- [3] Lambda, "AI Cloud Pricing" (accessed March 18, 2026). https://lambda.ai/service/gpu-cloud/pricing
- [4] Vast.ai, "GPU Pricing — Live Marketplace Rates" (accessed March 18, 2026). https://vast.ai/pricing
- [5] Synergy Research Group, "GenAI Helps Drive Quarterly Cloud Revenues to $119 Billion" (Q4 2025). https://www.srgresearch.com/articles/genai-helps-drive-quarterly-cloud-revenues-to-119-billion-as-growth-rate-jumped-yet-again-in-q4
- [6] Reiss, C. et al., "Heterogeneity and Dynamicity of Clouds at Scale: Google Trace Analysis," SoCC '12 (2012). https://www.pdl.cmu.edu/PDL-FTP/CloudComputing/googletrace-socc2012_abs.shtml
- [7] Weng, Q. et al., "MLaaS in the Wild: Workload Analysis and Scheduling in Large-Scale Heterogeneous GPU Clusters," NSDI '22 (2022). https://www.usenix.org/conference/nsdi22/presentation/weng
Frequently Asked Questions
What does bare metal mean for AI compute?
Bare metal means renting servers with no extra software between your workload and the hardware. You get the machine and install whatever software you want. For AI workloads, it's becoming the default.
How much does an H100 cost per GPU-hour on AWS versus bare metal?
In Q1 2026, AWS charges $3.93 per GPU-hour on demand for an H100. Lambda charges $3.44. Bare metal providers charge $1.80-$2.00 per GPU-hour.
Why won't AI GPU compute play out like AWS?
AWS won on brand and developer experience, both built on the back of 200+ services shipped over 18 years. But GPU workloads are portable. The same PyTorch training script runs on any provider: AWS, CoreWeave, or self-hosted bare metal.
How do AI coding tools change build versus buy for the cloud?
The strongest argument for the cloud premium was engineering cost. Building your own monitoring stack, deployment pipeline, and orchestration layer used to take a team of 3-4 engineers several months. As of early 2026, a single AI-assisted engineer can build a custom deployment pipeline or monitoring dashboard in 1-2 weeks.