Best Liquid Cooling Solutions for AI Data Centers: 2026 Vendor Comparison & Cost Analysis
Air cooling has hit its ceiling. With NVIDIA’s B200 GPUs pulling 1,200W each and next-generation racks exceeding 120kW, every hyperscaler and enterprise AI builder now faces the same question: which liquid cooling technology delivers the best balance of density, efficiency, and total cost of ownership? This guide compares the leading approaches and vendors shaping the market in 2026.
1. Why Liquid Cooling Is Now Mandatory for AI
The power trajectory of AI accelerators has outpaced every projection. NVIDIA’s H100 drew 700W. The B200, shipping through 2025 and 2026, consumes up to 1,000W under air cooling and 1,200W when liquid cooled—unlocking higher clock speeds and sustained throughput. The upcoming Vera Rubin Ultra generation, expected in 2027, is projected to exceed 1,800W per GPU.
These per-chip numbers compound at the rack level. Traditional air-cooled data center racks top out at roughly 35–41kW. NVIDIA’s GB200 NVL72 system—the reference architecture for large-scale AI training—demands 120kW per rack. The next-generation Vera Rubin NVL144 is designed for 370kW per rack, and Rubin Ultra configurations are targeting 600kW. No combination of fans, raised floors, and chilled air can bridge that gap.
The economics reinforce the physics. Liquid cooling delivers heat rejection efficiency that air simply cannot match, enabling facilities to run more compute per square foot while cutting energy costs by 30–50%. For operators building GPU clusters that cost tens of millions of dollars in hardware alone, cooling efficiency is no longer a facilities concern—it is a business-critical decision that directly affects model training time, infrastructure ROI, and carbon commitments.
2. Cooling Technologies Compared
Four primary liquid cooling approaches compete in the AI data center market, each with distinct trade-offs across efficiency, density support, retrofit complexity, and cost. The table below compares the three highest-density options against air cooling as a baseline; rear-door heat exchangers, a supplemental technology, are covered after it.
| Technology | PUE Range | Density Support | Best For |
|---|---|---|---|
| Air cooling | 1.4–1.8 | Up to ~35–41kW/rack | Legacy workloads, inference at moderate density |
| Direct-to-chip (DTC) | 1.05–1.15 | 100–200kW/rack | GPU training clusters, brownfield retrofits |
| Single-phase immersion | 1.03–1.08 | 100–250kW/tank | High-density greenfield, edge deployments |
| Two-phase immersion | 1.01–1.02 | 200kW+/tank | Ultra-high-density AI, maximum efficiency |
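PUE (power usage effectiveness) is total facility power divided by IT equipment power, so a lower PUE means less overhead energy per watt of compute. As a quick illustration of what the ranges above imply, here is a minimal Python sketch using assumed midpoint PUE values from the table and an assumed 10MW IT load:

```python
# Annual facility overhead by cooling technology, using assumed midpoint
# PUE values from the table above. PUE = total facility power / IT power,
# so overhead power = IT load * (PUE - 1).

HOURS_PER_YEAR = 8760

PUE_MIDPOINTS = {
    "air cooling": 1.60,             # midpoint of 1.4-1.8
    "direct-to-chip": 1.10,          # midpoint of 1.05-1.15
    "single-phase immersion": 1.055, # midpoint of 1.03-1.08
    "two-phase immersion": 1.015,    # midpoint of 1.01-1.02
}

def annual_overhead_mwh(it_load_mw: float, pue: float) -> float:
    """Energy spent on everything except the IT load itself, in MWh/year."""
    return it_load_mw * (pue - 1) * HOURS_PER_YEAR

for name, pue in PUE_MIDPOINTS.items():
    print(f"{name:>24}: {annual_overhead_mwh(10.0, pue):>8,.0f} MWh/yr overhead")
```

At a 10MW IT load, moving from air cooling at PUE 1.6 to direct-to-chip at PUE 1.10 cuts overhead from roughly 52,600 to 8,800 MWh per year.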
Direct-to-Chip (DTC) Cooling
DTC systems circulate coolant through cold plates mounted directly on processors, removing heat at the source while leaving the rest of the server air-cooled. This hybrid approach currently holds 47% of the AI data center liquid cooling market, driven by its compatibility with standard rack formats and its relatively straightforward retrofit path. DTC is the baseline cooling method for NVIDIA’s GB200 NVL72 reference design.
Rear-Door Heat Exchangers (RDHx)
RDHx units attach to the back of standard racks and use water coils to capture exhaust heat before it enters the data hall. While they cannot support rack densities above 50–60kW on their own, they serve as an effective supplemental layer in facilities transitioning from pure air cooling.
Single-Phase Immersion
Servers are submerged in a dielectric fluid that absorbs heat through direct contact. The fluid is pumped to an external heat exchanger without changing phase. This eliminates fans entirely, reduces noise to near zero, and achieves PUE values between 1.03 and 1.08. The trade-off is that immersion tanks require purpose-built enclosures and careful fluid management.
Two-Phase Immersion
The most efficient approach: a low-boiling-point fluid absorbs heat, vaporizes on contact with hot components, rises to a condenser, and returns as liquid. The phase change enables extraordinary heat transfer rates, with PUE as low as 1.01. The challenges are fluid cost, containment complexity, and the environmental profile of some engineered fluids.
3. Vendor Landscape: 7 Companies to Know
The liquid cooling market for AI data centers is consolidating around a handful of vendors with proven deployments at scale. Here are the seven most significant players in 2026.
CoolIT Systems — Direct-to-Chip Leader
CoolIT’s CHx2000 Coolant Distribution Unit (CDU) delivers up to 2MW of cooling capacity, supporting 12 NVL72 racks at 120kW each from a single unit. Their direct-to-chip cold plates are among the most widely deployed in hyperscale AI environments, with deep integration into OEM server platforms.
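As a rough sizing sanity check (a sketch, not vendor guidance: the 25% headroom reserve is an assumption), the quoted 12-rack figure is consistent with holding back roughly a quarter of the CDU’s 2MW capacity for redundancy and peak loads:

```python
# Rough CDU capacity check. The 25% headroom reserve is an assumption
# used for illustration, not a CoolIT specification.

def racks_per_cdu(cdu_capacity_kw: float, rack_load_kw: float,
                  headroom: float = 0.25) -> int:
    """How many racks one CDU can serve while reserving `headroom` capacity."""
    usable_kw = cdu_capacity_kw * (1 - headroom)
    return int(usable_kw // rack_load_kw)

print(racks_per_cdu(2000, 120))  # -> 12, matching the CHx2000 figure above
```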
GRC (Green Revolution Cooling) — Modular Immersion
GRC pioneered modular single-phase immersion tanks designed for rapid deployment. Their partnership ecosystem now includes LG and SK Enmove, expanding manufacturing capacity and fluid supply chains for large-scale rollouts across Asia and North America.
Submer — High-Density Immersion at Scale
Submer’s SmartPod packs 100kW of cooling capacity into just 60 square feet, making it one of the most space-efficient immersion solutions available. Their landmark 1GW deal in India signals the scale at which immersion cooling is now being adopted for national AI infrastructure programs.
ZutaCore — Two-Phase Specialist
ZutaCore’s HyperCool system uses two-phase direct-to-chip evaporative cooling that handles chip TDPs exceeding 2,500W while requiring less than four gallons of dielectric fluid per server. Their partnership with automotive thermal specialist Valeo brings manufacturing scale and heat exchanger expertise from the EV industry.
Vertiv — Infrastructure-Scale Cooling Services
Vertiv brings global services reach and 2MW CDU units purpose-built for AI workloads. Their advantage lies in end-to-end facility integration—combining liquid cooling with power distribution, monitoring, and thermal management under a single vendor relationship.
Schneider Electric — NVIDIA Partnership
Schneider Electric partnered directly with NVIDIA to develop reference cooling architectures for the GB200 NVL72 platform, with plans to support rack densities up to 300kW. Their integrated approach combines EcoStruxure management software with liquid cooling hardware for unified facility orchestration.
Asetek — Direct-to-Chip Pioneer
Asetek helped establish the direct-to-chip cooling category and continues to push the technology forward. Their AI-Optimized Cold Plate, developed in collaboration with Fabric8Labs using advanced 3D metal printing, achieves higher thermal conductivity through micro-channel geometries impossible with traditional manufacturing.
4. Cost Analysis: CAPEX, PUE & TCO
Liquid cooling requires higher upfront capital expenditure than air cooling, but the operational savings—driven by lower PUE and reduced facility footprint—typically produce a favorable total cost of ownership within 18 to 36 months.
| Cost Factor | Air Cooling | Direct-to-Chip | Immersion |
|---|---|---|---|
| PUE | 1.4–1.8 | 1.05–1.15 | 1.03–1.08 |
| Per-rack cooling CAPEX (NVL72) | N/A (cannot support) | ~$50K | ~$56K |
| Max rack density | 35–41kW | 100–200kW | 200kW+ |
| Facility footprint | Baseline | 40–60% smaller | 50–70% smaller |
10MW Facility Savings Model
Consider a 10MW AI training facility. At a PUE of 1.5 (typical air cooling), the facility consumes 15MW total—5MW lost to cooling overhead. Switching to direct-to-chip cooling at PUE 1.10 reduces total consumption to 11MW, saving 4MW of continuous power draw. At $0.08/kWh, that translates to roughly $2.8 million in annual energy savings. Over a five-year GPU refresh cycle, the cumulative savings far exceed the incremental CAPEX of the liquid cooling infrastructure.
Immersion cooling pushes savings further. At PUE 1.05, the same 10MW facility draws just 10.5MW total, saving 4.5MW versus air cooling. The additional savings come at the cost of more specialized infrastructure, longer deployment timelines, and higher per-rack CAPEX. For greenfield deployments where the facility is designed around immersion from the start, the economic case is increasingly compelling.
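To make the arithmetic explicit, here is the same model as a short Python sketch. The PUE figures and the $0.08/kWh price come from the paragraphs above; continuous full-load operation is an assumption:

```python
# Reproduces the 10MW savings model above. PUE = total facility power /
# IT load, so power saved versus air cooling is it_load * (pue_air - pue_new).
# Assumes continuous full-load operation at the article's $0.08/kWh.

HOURS_PER_YEAR = 8760
PRICE_PER_KWH = 0.08

def annual_savings_usd(it_load_mw: float, pue_air: float, pue_new: float) -> float:
    saved_kw = it_load_mw * 1000 * (pue_air - pue_new)
    return saved_kw * HOURS_PER_YEAR * PRICE_PER_KWH

print(f"Direct-to-chip: ${annual_savings_usd(10, 1.5, 1.10):,.0f}/yr")  # ~$2.8M
print(f"Immersion:      ${annual_savings_usd(10, 1.5, 1.05):,.0f}/yr")  # ~$3.2M
```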
Break-Even Timeline
Most operators report break-even on liquid cooling investments within 18 to 36 months, depending on local power costs, density requirements, and whether the deployment is a retrofit or greenfield build. In regions with electricity costs above $0.10/kWh—including much of Europe and parts of California—payback can occur in under 18 months.
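Combining the cost table with the savings model above gives a feel for where the 18-month floor comes from. This is an illustrative sketch: mapping the full 10MW IT load onto 120kW NVL72-class racks and ignoring CDU, piping, and facility-water costs are simplifying assumptions:

```python
# Illustrative break-even estimate for the 10MW direct-to-chip scenario.
# The ~$50K per-rack CAPEX comes from the cost table; mapping 10MW onto
# 120kW racks and ignoring CDU/piping/facility work are assumptions.
import math

it_load_kw = 10_000
racks = math.ceil(it_load_kw / 120)     # ~84 NVL72-class racks
cooling_capex = racks * 50_000          # ~$4.2M
annual_savings = 2_803_200              # from the savings model above

months = cooling_capex / (annual_savings / 12)
print(f"Break-even in ~{months:.0f} months")  # ~18 months
```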
5. Real-World Deployments: Microsoft, Google & Meta
The three largest AI infrastructure operators have each adopted liquid cooling, but with different strategies reflecting their unique hardware stacks and facility footprints.
Microsoft — Maia 100 & Fairwater Cooling
Microsoft developed its proprietary Maia 100 AI accelerator alongside a custom cooling system called Fairwater, which uses direct-to-chip liquid cooling with cold plates designed specifically for the Maia silicon. The same cooling infrastructure also supports NVIDIA GPU racks and is designed to carry forward to Microsoft’s next-generation Maia 200. By co-designing the chip and cooling together, Microsoft optimizes thermal interfaces and reduces the performance penalty that off-the-shelf cooling solutions impose.
Google — TPU Pods at Scale
Google has deployed liquid cooling across more than 2,000 TPU Pods, making it one of the largest liquid-cooled AI deployments in production. Their latest Ironwood TPU generation is designed from the ground up for liquid cooling, with thermal design points that assume direct-to-chip heat rejection. Google’s approach emphasizes facility-level integration, using waste heat recovery to improve overall campus energy efficiency.
Meta — AALC and Catalina AI Pods
Meta’s air-assisted liquid cooling (AALC) system represents a hybrid approach, combining rear-door heat exchangers with supplemental direct-to-chip cooling to retrofit existing facilities. For new builds, Meta’s Catalina AI Pod architecture is designed natively for liquid cooling at extreme density. With a reported $65 billion AI infrastructure expansion underway, Meta is simultaneously operating legacy air-cooled facilities, retrofitted AALC sites, and purpose-built liquid-cooled campuses.
6. Retrofit vs. Greenfield Considerations
The choice between retrofitting an existing facility and building new has significant implications for technology selection, timeline, and cost.
Brownfield Retrofit Challenges
Retrofitting an air-cooled data center for liquid cooling involves several constraints. Existing raised floors may not support the weight of filled immersion tanks. Piping runs must be added without disrupting live production environments. Electrical capacity, originally sized for 10–15kW racks, must be upgraded to support 100kW+ densities—a change that often requires new switchgear, bus bars, and utility interconnections.
For brownfield sites, direct-to-chip cooling is typically the most practical path. DTC systems work within standard 19-inch rack form factors, require only the addition of CDUs and facility water loops, and can be deployed rack-by-rack without taking down the entire hall. This incremental approach lets operators phase in liquid cooling as they deploy new GPU hardware, avoiding a disruptive facility-wide conversion.
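A first-order estimate of the facility water flow such a rack needs follows from Q = ṁ·c_p·ΔT. The 10°C supply/return delta-T below is an assumed design point, not a vendor figure:

```python
# First-order coolant flow estimate for a liquid-cooled rack, from
# Q = m_dot * c_p * delta_T. The 10 degC loop delta-T is an assumed
# design point, not a vendor specification.

CP_WATER = 4186    # J/(kg*K), specific heat of water
RHO_WATER = 0.997  # kg/L at ~25 degC

def flow_lpm(rack_load_kw: float, delta_t_c: float = 10.0) -> float:
    """Loop flow in litres per minute needed to absorb rack_load_kw."""
    mass_flow_kg_s = rack_load_kw * 1000 / (CP_WATER * delta_t_c)
    return mass_flow_kg_s / RHO_WATER * 60

print(f"{flow_lpm(120):.0f} L/min for a 120kW GB200 NVL72 rack")  # ~173 L/min
```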
Greenfield Advantages
New construction removes most constraints. Facilities can be designed around immersion tanks from day one, with floor loading, drainage, fluid containment, and power distribution all purpose-built for high-density liquid cooling. Greenfield sites also enable waste heat recovery—routing the warm coolant return to district heating systems or absorption chillers—which is difficult to retrofit into existing buildings.
The trade-off is time. New data center construction typically takes 18 to 30 months from permitting to commissioning, while a DTC retrofit can be operational within 8 to 12 weeks. For operators racing to deploy AI capacity, the speed advantage of retrofits is often decisive.
7. Patent Activity & the IP Landscape
Liquid cooling for data centers has become one of the most active patent filing areas in enterprise technology. Patent filings related to data center cooling have increased by more than 50% since 2019, reflecting both the surge in R&D investment and the strategic importance vendors place on protecting their innovations.
Iceotope leads the pure-play cooling vendor space with over 215 patents covering precision immersion cooling, chassis-level liquid distribution, and thermal interface materials. Their portfolio creates significant freedom-to-operate considerations for competitors entering the immersion market.
Among the hyperscalers and chip makers, NVIDIA holds patents on GPU-specific cold plate designs and liquid cooling integration for its DGX and HGX platforms. Microsoft has filed extensively around its Fairwater system and two-phase immersion experiments. TSMC holds IP related to on-package cooling and advanced thermal interface materials for 3D-stacked chiplets, a technology increasingly relevant as AI accelerators adopt multi-die architectures.
For companies developing or deploying liquid cooling solutions, the patent landscape presents both risk and opportunity. A growing IP thicket means that freedom-to-operate analysis is essential before committing to a cooling architecture, and that the remaining patent term on key filings directly affects competitive dynamics.
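The baseline rule is straightforward: a US utility patent generally expires 20 years after its earliest effective non-provisional filing date. The sketch below computes only that baseline and deliberately ignores patent term adjustment (PTA), extension (PTE), and terminal disclaimers, all of which a full analysis must include; the filing date shown is hypothetical:

```python
# Baseline US utility patent term: 20 years from the earliest effective
# non-provisional filing date. Ignores PTA, PTE, and terminal disclaimers;
# a real freedom-to-operate analysis must account for all three.
from datetime import date

def remaining_term_days(filing_date: date, as_of: date, pta_days: int = 0) -> int:
    """Days of enforceable term left as of `as_of`, under the baseline rule."""
    expiry = filing_date.replace(year=filing_date.year + 20)
    return (expiry - as_of).days + pta_days

# Hypothetical cooling patent filed 2019-06-01, checked at the start of 2026:
days = remaining_term_days(date(2019, 6, 1), as_of=date(2026, 1, 1))
print(f"~{days / 365:.1f} years of enforceable term remaining")  # ~13.4 years
```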
Check Patent Remaining Life
Cooling technology patents are a growing IP area with filings accelerating year over year. Use our Patent Term Calculator to determine the remaining enforceable life of key cooling patents and assess competitive exposure.
Evaluate Cooling Technology Patents
As liquid cooling becomes standard infrastructure for AI, the underlying patents will shape vendor selection, licensing costs, and competitive positioning. Understand the IP landscape before you commit.
Open Patent Term Calculator
Disclaimer: This article is for educational and informational purposes only and does not constitute engineering, procurement, or legal advice. Cooling system specifications, pricing, and vendor capabilities are based on publicly available data as of the publication date and may change. Consult qualified professionals for facility-specific decisions.