Sky Division – You already defined the next question: how can we improve AI efficiency to reduce power consumption? You mentioned trading memory for FLOPs, silicon-photonics interconnects, near-memory compute, analog or neuromorphic accelerators for embeddings and retrieval, and so on. But what, then, are the alternatives to building new nuclear reactors to power data centers (let's assume we agree and allocate the resources – it's hard to get there, but let's say so)? Schmidt likes to think "green"; he mentioned Canada's hydropower, but unfortunately, according to him, the political "mood" isn't there.
“…data centers can bid interruptible loads into capacity markets, freeing up 5-15 % of regional supply during peak hours. Google already participates in California’s Demand Response Auction Mechanism…”
Kimi AI – We can probably satisfy AI’s near-term growth without 90 GW of brand-new gigawatt-scale reactors, provided we (1) push aggressively on model/hardware efficiency, and (2) deploy a portfolio of SMRs, behind-the-meter nuclear, renewables-plus-storage, and retrofitted coal sites. The fastest lever is efficiency – every 2× improvement in perf/Watt is equivalent to cancelling a 1 GW power plant.
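A quick back-of-envelope sketch of that equivalence; the fleet draw here is an assumed round number for illustration, not a measured figure:

```python
# At a fixed workload, power draw scales inversely with perf/Watt.
# fleet_gw is an assumed round number, not a measurement.
fleet_gw = 2.0            # hypothetical AI fleet draw today, in GW
gain = 2.0                # one 2x perf/Watt improvement
after = fleet_gw / gain
print(f"{fleet_gw} GW -> {after} GW: saves {fleet_gw - after} GW, i.e. one 1 GW plant")
```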
Below are the two things you asked for – (A) the most promising efficiency levers that can cut AI power demand right now, and (B) practical alternatives to new large-scale nuclear reactors for feeding the data-center boom.
A – How to improve AI efficiency and cut electricity use per useful computation
Smaller, task-specific models
• Replace giant general-purpose LLMs with domain-tuned small models (e.g., 7-13 B parameters instead of 175 B) that are distilled, pruned, and quantized. These routinely reach ≥95 % of the larger model's accuracy at <20 % of the energy (see the quantization sketch after this list).
• Use mixture-of-experts (MoE) and early-exit networks so only the sub-network that is needed is activated on any given query.
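A minimal sketch of the quantization lever, assuming PyTorch is installed; the two-layer model is a toy stand-in, not a production LLM:

```python
import torch
import torch.nn as nn

# Toy stand-in for a Linear-heavy transformer block.
model = nn.Sequential(nn.Linear(4096, 4096), nn.ReLU(), nn.Linear(4096, 4096))

# Post-training dynamic quantization: weights stored as int8,
# activations quantized on the fly. Same interface, ~4x smaller weights.
quantized = torch.ao.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8
)

x = torch.randn(1, 4096)
print(model(x).shape, quantized(x).shape)  # identical output shapes
```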
Hardware that delivers more FLOPS per watt
• Neuromorphic / analog accelerators, optical matrix multipliers, and near-memory compute (processing in SRAM or HBM) can cut energy 10-100× for embeddings and retrieval.
• Power-capped GPUs/ASICs: IBM shows that capping power at 85 % adds only 3 % latency but saves ~15 % energy (see the power-capping sketch after this list).
• Arm-based AI chips and latest-gen GPUs (e.g., H200, MI300) deliver 2-3× the perf/Watt of 2020 silicon.
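A hedged sketch of the power-capping idea using NVIDIA's NVML bindings (`pip install nvidia-ml-py`); the 85 % figure mirrors the IBM result above, and the call typically needs admin rights:

```python
import pynvml

pynvml.nvmlInit()
handle = pynvml.nvmlDeviceGetHandleByIndex(0)   # first GPU in the box

default_mw = pynvml.nvmlDeviceGetPowerManagementDefaultLimit(handle)  # milliwatts
cap_mw = int(default_mw * 0.85)                 # cap at 85 % of the board limit
pynvml.nvmlDeviceSetPowerManagementLimit(handle, cap_mw)

print(f"cap set to {cap_mw / 1000:.0f} W (default {default_mw / 1000:.0f} W)")
pynvml.nvmlShutdown()
```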
Smarter training & inference scheduling
• Dynamic workload shifting: run heavy training jobs only when local renewables exceed 80 % of the grid mix; Google and Microsoft already do this (a scheduling sketch follows this list).
• Early stopping and checkpoint pruning during training avoid wasting GPU-hours on poor hyper-parameter settings.
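A minimal sketch of the renewables-aware gating logic; `get_renewable_share()` is a hypothetical stand-in for a real grid-mix feed such as Electricity Maps or WattTime:

```python
import time

THRESHOLD = 0.80  # launch heavy jobs only when renewables exceed 80 % of grid mix

def get_renewable_share() -> float:
    """Hypothetical stand-in: return the current renewable fraction of the grid."""
    return 0.85  # stubbed so the sketch runs

def run_when_green(job, poll_seconds=900):
    while get_renewable_share() < THRESHOLD:
        time.sleep(poll_seconds)  # defer: the grid is too carbon-heavy right now
    job()

run_when_green(lambda: print("training job launched in a green window"))
```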
Data-center level tricks
• Two-phase immersion cooling or direct-to-chip liquid loops push PUE from ~1.5 to 1.05, lopping ~30 % off total energy (see the arithmetic after this list).
• Heat reuse: channel 40-60 °C waste heat to nearby district heating or desalination (Meta Denmark, Swiss Switch).
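The PUE arithmetic behind that ~30 % figure, with an illustrative 100 MW IT load:

```python
# Total facility power = IT power x PUE; cooling overhead is the difference.
it_load_mw = 100.0  # illustrative IT load
for pue in (1.5, 1.05):
    print(f"PUE {pue}: facility draw = {it_load_mw * pue:.0f} MW")
# 150 MW -> 105 MW: the same compute for ~30 % less total energy.
```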
B – Alternatives to “90 new 1 GW reactors”
Small / micro reactors (SMRs & micro-SMRs)
• 20-300 MW modules that can be sited on existing data-center campuses and chained as demand grows. First commercial units (NuScale, Last Energy PWR-20, Nano Nuclear Zeus) are targeting 2028-2030.
• Factory-built and truck-delivered: 2-3 year build time vs. 10-15 years for gigawatt plants. No 10-mile evacuation zone, because many designs use molten-salt or gas cooling.
Behind-the-meter (BTM) colocation with existing nuclear
• The AWS–Talen 960 MW campus at Susquehanna, PA, and the Microsoft–Three Mile Island 837 MW PPA show you can plug into spare capacity at operating or revived plants instead of building new ones.
• This saves transmission build-out (~$1 B per GW) and delivers $35-40 / MWh firm power once contracts are locked.
Renewables + long-duration storage hybrids
• Canadian hydro (Labrador, Québec, B.C.) offers 5-7 GW of expandable, dispatchable 24/7 power at 4 ¢/kWh; the only missing pieces are political will and new 765 kV lines.
• Solar-wind-storage micro-grids on data-center sites: with 6–8 h of batteries or green hydrogen, CAPEX is already below $60 / MWh in high-resource regions (West Texas, New Mexico, Kansas)
Retrofit retired coal sites with nuclear
• 128-174 GW of coal plants in the U.S. have cooling water, grid interconnection, and community acceptance in place. Converting them to SMRs or advanced reactors can cut permitting time by half.
Demand-side flexibility markets
• Data centers can bid interruptible loads into capacity markets, freeing up 5-15 % of regional supply during peak hours. Google already participates in California’s Demand Response Auction Mechanism.
Sky Division – Schmidt said [cit.] “…we’re building zero, right? (he meant: nuclear power plants) How are we going to get all that power? This is a major, major national issue. You can use the Arab world, which is busy building five to 10 gigawatts of data centers. India is considering a 10-gigawatt data center. To understand how big gigawatts are, think cities per data center. That’s how much power these things need.” [end of cit.]
Is this doable? I mean, by renting other nations’ GWs – will renting truly satisfy the “hunger for power” while the renters compete with each other over who will get “somewhere” first? Myself, I don’t think so : )
“…the only realistic path is domestic build-out – a mix of SMRs, behind-the-meter renewables, retired-plant retrofits, and aggressive efficiency gains – not a global rental bazaar…”
Kimi AI – Short answer: No, “renting” tens of gigawatts from other countries will not solve the AI-power crunch. The only realistic path is domestic build-out – a mix of SMRs, behind-the-meter renewables, retired-plant retrofits, and aggressive efficiency gains – not a global rental bazaar.
The global power-rental market is only ~US $22 B in 2024 and is dominated by diesel gensets in the 0.05-0.5 GW range – orders of magnitude too small and too unreliable for multi-gigawatt AI campuses.
Meanwhile, the very nations Schmidt cites (UAE, Saudi Arabia, India, etc.) are simultaneously competing for the same 24/7 clean electrons – and they are building their own 5-10 GW data-center clusters at home. So cross-border “rental” would be geopolitical fiction long before it became an engineering plan.
Here are some numbers, and the politics, that make this a dead end.
How big is the rentable global market?
• The entire global power-rental fleet is ≈ 22 GW of installed diesel and gas gensets – but the largest single rental contract on record is under 1 GW, typically for disaster relief or oil-field peak-shaving.
• Even if every genset in the world were magically available and converted to natural-gas turbines, the fleet could not deliver more than ~30 GW continuous – and that would still be diesel-priced electricity (>$150/MWh), not the <$40/MWh that hyperscalers need to stay competitive (see the cost sketch below).
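A quick cost comparison for a single 1 GW campus running around the clock, using the two price points above:

```python
HOURS_PER_YEAR = 8760
campus_mw = 1000  # a 1 GW campus running 24/7

for label, usd_per_mwh in (("rented genset power", 150), ("hyperscaler target", 40)):
    annual_busd = campus_mw * HOURS_PER_YEAR * usd_per_mwh / 1e9
    print(f"{label}: ${annual_busd:.2f} B per GW-year")
# ~$1.31 B vs ~$0.35 B per GW-year: rental economics never close the gap.
```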
The countries with spare gigawatt-scale, 24/7 clean power are not sellers – they are buyers.
• The UAE and Saudi Arabia are commissioning 5-10 GW each for domestic AI clusters.
• India’s announced 10 GW data-center corridor is sited next to its own planned renewables-plus-storage zones.
• Canada’s undeveloped hydro potential (~5-7 GW) is locked behind provincial politics and Indigenous-consent processes; no mechanism exists to “rent” it across borders at scale.
Transmission and geopolitics
• Moving 10 GW across oceans requires ±800 kV HVDC lines thousands of kilometres long – no such intercontinental cables exist, and lead times exceed 7-10 years.
• Even within continents, export restrictions on low-carbon power are rising: the EU Fit-for-55 package, the US IRA, and China’s dual-carbon goals all treat domestic clean electrons as strategic assets.
“…unless we deliberately choose architectures and policies that break the ‘Grove giveth, Gates taketh away’ loop…”
Sky Division – Schmidt also “touched” the issues you mentioned above – he said [cit.] “and the people look at it and they say, “well, there’s lots of algorithmic improvements, and you will need less power.” There’s an old rule, I’m old enough to remember, right? Grove giveth, Gates taketh away. OK, the hardware just gets faster and faster. The physicists are amazing. Just incredible what they’ve been able to do. And us software people, we just use it and use it and use it. And when you look at planning, at least in today’s algorithms, it’s back and forth and try this and that and just watch it yourself. There are estimates, and you know this from Andreessen Horowitz reports, it’s been well studied, that there’s an increase in at least a factor of 100, maybe a factor of 1000, in computation required just to do the kind of planning.” [end of cit.]
Thus he thinks “Grove giveth, Gates taketh away” still holds – he sounds quite skeptical.
Kimi AI – He’s saying “Yes, hardware keeps getting more efficient, but software keeps inventing ways to eat the savings – and then some.” In plain words: we are still on the wrong side of Jevons’ paradox. Every time the semiconductor engineers hand us a 10× efficiency gift, the algorithm designers promptly spend 100× of it on a bigger, more capable workload – especially in planning / reasoning / search-heavy systems that iterate internally thousands of times per user query.
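The asymmetry in one line of arithmetic, using the 10× and 100× figures quoted above:

```python
efficiency_gain, workload_growth = 10, 100  # the figures quoted above
print(f"net energy demand: {workload_growth / efficiency_gain:.0f}x the baseline")
# Even a 10x efficiency gift leaves demand 10x higher if workloads grow 100x.
```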
So Schmidt’s skepticism is not that efficiency gains won’t happen; it is that they will not outrun the exponential demand curve unless we deliberately choose architectures and policies that break the “Grove giveth, Gates taketh away” loop.
Sky Division – Are we racing toward two “cliffs” at the same time? [cit.] “The technology goes from essentially deep learning to reinforcement learning to something called test-time compute, where not only are you doing planning, but you’re also learning while you’re doing planning. That is the, if you will, the zenith or what have you, of computation needs. That’s problem number one, electricity and hardware. Problem number two is we ran out of data, so we have to start generating it. But we can easily do that because that’s one of the functions.” [end of cit.]
“…yes, Schmidt has just named the two cliffs: the compute cliff, and the data cliff created by the exhaustion of high-quality human text…”
Kimi AI – The loop is self-reinforcing: more data → bigger model → deeper test-time search → more energy → more synthetic data generation → repeat.
Unless we break that loop algorithmically (sample-efficient planning, sparse activation, early-exit architectures) or systemically (hard caps on per-query energy, carbon pricing, or tiered service levels), the two “problems” will compound into a single brick wall.
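One of those algorithmic levers, early exit, in a toy pure-Python sketch; the confidence model and per-layer energy costs are made up for illustration:

```python
import random

def layer_confidence(depth: int) -> float:
    """Hypothetical per-layer confidence estimate (stubbed with noise)."""
    return min(1.0, 0.3 + 0.1 * depth + random.uniform(0.0, 0.05))

def early_exit_forward(max_layers=12, threshold=0.9, joules_per_layer=0.5):
    spent = 0.0
    for depth in range(1, max_layers + 1):
        spent += joules_per_layer
        if layer_confidence(depth) >= threshold:
            return depth, spent        # confident enough: stop computing
    return max_layers, spent           # fell through: full forward pass

depth, joules = early_exit_forward()
print(f"exited at layer {depth}, spending {joules:.1f} J (full pass = 6.0 J)")
```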
And yes, Schmidt has just named the two cliffs:
A compute cliff created by “test-time compute” (a.k.a. inference-time search, chain-of-thought, Monte-Carlo roll-outs, or whatever term sticks).
• During training you pay once; during inference you pay on every single query, and the cost can scale roughly as O(n²)-O(n³) with the depth n of the internal search tree.
• If a typical ChatGPT query today burns ~0.3 Wh, a “test-time planning” agent that internally simulates 10 000 steps could burn 30–300 Wh per query – the difference between a light bulb and a microwave oven.
• Multiply by billions of users and you quickly exceed today’s entire U.S. data-center load (worked numbers below).
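Worked numbers for that multiplication, with every input an explicit assumption:

```python
WH_PER_QUERY = 30        # LOW end of the 30-300 Wh planning-agent range above
QUERIES_PER_DAY = 2e9    # assumed: ~1 B users at ~2 agent queries per day

wh_per_day = WH_PER_QUERY * QUERIES_PER_DAY
avg_gw = wh_per_day / 24 / 1e9           # average continuous draw, in GW
twh_per_year = wh_per_day * 365 / 1e12

print(f"average draw ~{avg_gw:.1f} GW, ~{twh_per_year:.0f} TWh/yr")
# ~2.5 GW and ~22 TWh/yr at the LOW end; at 300 Wh/query it is ~25 GW and
# ~220 TWh/yr, in the neighborhood of all U.S. data centers combined.
```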
A data cliff created by the exhaustion of high-quality human text.
• Public crawlable text is ~10 T tokens; frontier labs are already training on >50 T tokens by deduplicating and up-sampling.
• To keep scaling they must generate synthetic data at scale, which itself requires running the very same power-hungry models to create, verify and filter the new corpus.
• In other words, the cure for the data famine is to burn even more compute.
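To see the cliff in numbers, apply the rough Chinchilla rule of thumb (~20 training tokens per parameter) against the ~10 T public tokens cited above; model sizes are illustrative:

```python
PUBLIC_TOKENS_T = 10  # ~10 T tokens of public crawlable text (from above)

for params_b in (70, 400, 1000):            # model sizes, in billions of params
    need_t = 20 * params_b / 1000           # tokens wanted, in trillions
    shortfall = need_t - PUBLIC_TOKENS_T
    status = f"short by {shortfall:.0f} T" if shortfall > 0 else "covered"
    print(f"{params_b:>5} B params -> ~{need_t:.0f} T tokens ({status})")
# Beyond ~500 B parameters the public web runs dry: the rest must be synthetic.
```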