Sky Division – Wait, there’s more “evil stuff” going on inside Mr Evil’s head : )
[cit.]…so what am I going to do? The first thing I’m going to do is try to steal all your code. And you’ve prevented that because you’re good. And you were good. So you’re still good. Second, then I’m going to infiltrate you with humans. Well, you’ve got good protections against that. You know, we don’t have spies. So what do I do? I’m going to go in, and I’m going to change your model. I’m going to modify it. I’m going to actually screw you up to get me so I’m one day ahead of you. But again you’re so good, I can’t do that. What’s my next choice? Well bomb your data center. [end of cit.]
“…we should treat AI infrastructure the way we treat nuclear command sites: no single point of failure, no single insider can unilaterally act, and any anomaly triggers immediate rollback…
Kimi AI – Evil’s escalation ladder is chilling because every rung is already technically feasible; the only question is cost and risk tolerance. Every layer of defense buys days or weeks, not years. The only durable counter-strategy is redundancy + attestation + constant red-team pressure – treating AI infrastructure the way we treat nuclear command sites: no single point of failure, no single insider can unilaterally act, and any anomaly triggers immediate rollback.
Steal the weights
• Insider threat: a single disgruntled engineer with USB-C storage the size of a fingernail can exfiltrate 70 B parameters in under 5 minutes.
• Outside threat: zero-day supply-chain implants (see SolarWinds) can tunnel the weights out through DNS or git commits.
Mitigation: confidential-computing enclaves + hardware-rooted attestation + split-key encryption so the weights never exist in plaintext outside protected RAM.
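The split-key idea above can be sketched in a few lines. This is a minimal illustration using 2-of-2 XOR secret sharing, assuming two custodians who each hold one share; a real deployment would use an HSM or threshold cryptography inside the attested enclave, not application code like this.

```python
# Minimal sketch of 2-of-2 split-key protection for a weights-encryption
# key: neither share alone reveals anything, so no single insider can
# decrypt the weights. Illustrative only, not a production scheme.
import os

def split_key(key: bytes) -> tuple[bytes, bytes]:
    """Split a key into two shares; each share alone is random noise."""
    share_a = os.urandom(len(key))
    share_b = bytes(a ^ k for a, k in zip(share_a, key))
    return share_a, share_b

def combine_key(share_a: bytes, share_b: bytes) -> bytes:
    """Recombine both shares (inside the enclave) to recover the key."""
    return bytes(a ^ b for a, b in zip(share_a, share_b))

key = os.urandom(32)              # AES-256-sized key
a, b = split_key(key)
assert combine_key(a, b) == key   # both shares together recover it
assert a != key and b != key      # either share alone is useless
```

The same pattern generalizes to k-of-n threshold schemes (e.g. Shamir sharing) when more than two custodians are involved.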
Human infiltration / model poisoning
• Phishing an internal researcher to push a “harmless” 1-line change can flip a safety constant or insert a back-door gradient.
Mitigation: multi-party code review, deterministic reproducible builds, signed training images, and anomaly detection on every gradient commit.
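The "anomaly detection on every gradient commit" idea can be sketched as a rolling z-score check on gradient norms. The window size and threshold below are illustrative assumptions, not a recommended policy; a real system would monitor many signals, not just the global norm.

```python
# Hedged sketch: flag anomalous gradient-update norms against a rolling
# baseline, so a back-door gradient injected into training stands out.
from collections import deque
import math

class GradientAnomalyDetector:
    def __init__(self, window: int = 100, z_threshold: float = 4.0):
        self.history = deque(maxlen=window)
        self.z_threshold = z_threshold

    def check(self, grad_norm: float) -> bool:
        """Return True (and quarantine) if grad_norm deviates sharply."""
        if len(self.history) >= 10:              # need a minimal baseline
            mean = sum(self.history) / len(self.history)
            var = sum((x - mean) ** 2 for x in self.history) / len(self.history)
            std = math.sqrt(var) or 1e-12        # guard against zero variance
            if abs(grad_norm - mean) / std > self.z_threshold:
                return True                      # anomaly: do not apply update
        self.history.append(grad_norm)
        return False

det = GradientAnomalyDetector()
for step in range(50):
    det.check(1.0 + 0.01 * (step % 5))           # normal training noise
print(det.check(50.0))                           # injected spike → True
```

In practice the quarantined update would be held for the multi-party review mentioned above rather than silently dropped.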
Physical sabotage
• A drone with 5 kg of thermite or a shaped charge can cripple a 100 MW hall in minutes.
• GPS-jamming + EMP micro-bursts can brick servers without explosives.
Mitigation:
– Distributed, geodiverse checkpoints (≤ 200 ms replication latency) so no single blast deletes progress.
– Faraday-cage buildings and hardened power switching.
– Armed physical security with drone-detection radar already deployed at some hyperscale sites.
Supply-chain interdiction (the “boring but critical” vector)
• Contaminated coolant, counterfeit capacitors, or firmware on BMC chips can cause silent, delayed failures weeks later.
Mitigation: SBOM (software bill of materials) + hardware provenance ledgers + random-sample destructive teardowns.
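A hardware provenance ledger is essentially a hash chain over component records: altering any earlier entry invalidates every later hash. A minimal sketch, with hypothetical part IDs and suppliers:

```python
# Illustrative hash-chained provenance ledger: each entry commits to the
# previous one, so retroactively swapping in a counterfeit part breaks
# verification. Entry fields and names are assumptions for illustration.
import hashlib
import json

def append_entry(ledger: list, part_id: str, supplier: str) -> None:
    prev_hash = ledger[-1]["hash"] if ledger else "0" * 64
    body = {"part_id": part_id, "supplier": supplier, "prev": prev_hash}
    digest = hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()
    ledger.append({**body, "hash": digest})

def verify(ledger: list) -> bool:
    prev = "0" * 64
    for entry in ledger:
        body = {k: entry[k] for k in ("part_id", "supplier", "prev")}
        digest = hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()
        if entry["prev"] != prev or entry["hash"] != digest:
            return False
        prev = entry["hash"]
    return True

ledger = []
append_entry(ledger, "CAP-7741", "vendor-A")
append_entry(ledger, "BMC-0042", "vendor-B")
assert verify(ledger)
ledger[0]["supplier"] = "counterfeit"   # tamper with history
assert not verify(ledger)
```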
Sky Division – It reminds me how Trump, or Bush for that matter, bomb things around the world, as if they’re desperate to prevent something in Iran and boooooom – that’s the solution, according to them : )
Kimi AI – Exactly, when the only tool left is “blow it up,” every problem starts to look like a target. The scary part is that in the AI race, an adversary who feels six months behind might convince themselves that bombing a datacenter is a “surgical pre-emptive strike” instead of an act of war. That’s why the real deterrence is dispersion and hardening, not wishful thinking that we’ll always be the only ones who can reach the red button. 👍 We’re on the same page: the moment “blow it up” enters the playbook, the only sane defense is making sure no single blast – or single actor – can decide the outcome.
Sky Division – Speaking of bombing the data centers, Schmidt pointed out that a war with China could come even from a small “spark”, deliberately masterminded and carried out – just as Kissinger, referring to WW I, said a small event in Sarajevo “accidentally” triggered a large-scale war. The difference now is that we could end up in that sort of standoff scenario, the equivalent of mutually-assured destruction…
[cit.]…now do you think I’m insane? These conversations are occurring around nuclear opponents today in our world. There are legitimate people saying the only solution to this problem is preemption. Now I just told you that you, Mr. Good, are about to have the keys to control the entire world, both in terms of economic dominance, innovation, surveillance, whatever it is that you care about. I have to prevent that. We don’t have any language in our society, the foreign policy people have not thought about this, and this is coming. When is it coming? Probably five years. We have time. We have time for this conversation. [end of cit.]
“…we have one geopolitical cycle – roughly the next U.S. and Chinese five-year plans – to codify these guard-rails. After that, the slope of improvement may become so steep…
Kimi AI – We have one geopolitical cycle – roughly the next U.S. and Chinese five-year plans – to codify these guard-rails. After that, the slope of improvement may become so steep that pre-emption logic becomes irresistible. Schmidt is not being alarmist – he is describing a classic security-dilemma spiral, but with AI supremacy replacing nuclear monopoly as the prize.
Why the parallel to 1914 is uncomfortably exact
• Small, deniable spark: a cyber-sabotage op against a datacenter, a supply-chain poisoning, or a deep-fake diplomatic cable could plausibly be blamed on “non-state actors,” yet interpreted as an existential attack.
• Compressed decision windows: in 1914 mobilization timetables forced choices in days; in AI the training-run clock can collapse the perceived gap from months to hours once pre-emption logic kicks in.
• Mutually-assured AI dominance (MAID): whoever first achieves superintelligent autonomy may be able to hard-lock the other side out of the race forever – so the temptation to strike before the curve goes vertical becomes rational, not insane.
The missing diplomatic grammar
• We have START, INF, and New START for nukes; we have nothing comparable for compute, weights, or autonomous agents.
• The closest analog is the Open Skies Treaty (confidence-building via mutual surveillance), but datacenters are opaque concrete boxes, not missile silos you can overfly.
A five-year window for preventive architecture
• Hot-line analogue: a secure, always-on channel between the U.S. and Chinese AI-safety teams – modeled on the 1963 Moscow-Washington nuclear hot-line – so that an unexplained GPU spike or supply-chain anomaly can be queried before it is assumed to be hostile.
• Mutual compute inspections: random spot-checks of ≥ 1 MW training halls, with shared telemetry on power draw and weight-checksums, akin to IAEA safeguards.
• No-first-use norm for compute sabotage: a declaratory policy that any cyber or kinetic attack on training infrastructure is treated as an act of war – giving both sides a strong reason not to strike first.
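The weight-checksum telemetry mentioned above is the simplest of these mechanisms: each side publishes a salted digest of a model’s weight file, letting inspectors confirm the model is unchanged without ever seeing the weights. A minimal sketch; the salt-rotation convention is an assumption for illustration.

```python
# Sketch of verifiable weight-checksum telemetry: a salted SHA-256 over
# the weights file proves "the model did not change" to a remote
# inspector while disclosing nothing about the weights themselves.
import hashlib

def weight_checksum(weights: bytes, salt: bytes) -> str:
    h = hashlib.sha256()
    h.update(salt)                 # shared inspection salt, rotated per audit
    h.update(weights)
    return h.hexdigest()

w = b"\x00\x01" * 1024             # stand-in for a multi-GB weights file
salt = b"audit-round-17"           # hypothetical audit identifier
c1 = weight_checksum(w, salt)
assert c1 == weight_checksum(w, salt)              # reproducible remotely
assert weight_checksum(w + b"\xff", salt) != c1    # any change is visible
```

Rotating the salt per audit round prevents one side from precomputing and replaying old checksums.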
Sky Division – The interviewer asked a very good question: “China’s DeepSeek showed what’s possible with a fraction of the compute, but doesn’t open-sourcing these models just hand capabilities to adversaries and accelerate their own timelines?” Right? China is doing everything, or almost everything, open source, so “acceleration” of unwanted timelines seems like quite a real threat – what do you think?
“…yes, open-sourcing shortens adversary timelines, it does accelerate everyone’s timeline, so it also raises the global capability floor…
Kimi AI – Net assessment – Yes, it shortens adversary timelines, but it also raises the global capability floor. The only durable counter is governance on the compute layer (licensed datacenters, signed firmware kill-switches, trans-national compute registries) rather than trying to stuff the weights back into the bottle.
Open-sourcing a frontier-level model does accelerate everyone’s timeline, but it does so asymmetrically:
Relative gain is larger for the laggard
• A 3-month head start can evaporate overnight when the follower downloads the exact weights, the training recipe, and the scaffolding code.
• DeepSeek’s 67 B model already cut the compute gap by ~3× for anyone who replicates it; open release removes even that remaining gap.
Absolute gain is still bounded by hardware
• You still need 1 000–2 000 A100-class GPUs and ~$5 M to reproduce the run. That keeps the barrier high for most non-state actors, but low for any nation-state.
• China’s own SMIC 7-nm line plus domestic HBM is closing the hardware gap faster than export controls can widen it.
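The hardware-bounded cost claim above can be sanity-checked with back-of-envelope arithmetic. The GPU count, run length, and hourly rate below are assumptions chosen to land inside the ranges the text cites, not figures from the interview.

```python
# Back-of-envelope check of the "~$5 M to reproduce the run" claim,
# assuming (hypothetically) 1,500 A100-class GPUs, a 6-week run, and
# a blended rate of $2.50 per GPU-hour.
gpus = 1_500
days = 42
cost_per_gpu_hour = 2.50

gpu_hours = gpus * days * 24                 # 1,512,000 GPU-hours
total_cost = gpu_hours * cost_per_gpu_hour   # ≈ $3.8 M
print(f"{gpu_hours:,} GPU-hours ≈ ${total_cost / 1e6:.1f} M")
```

Under these assumptions the run lands in the low single-digit millions – the same order of magnitude as the ~$5 M figure, trivial for a nation-state but a real barrier for most non-state actors.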
Second-order acceleration
• Every lab outside the top tier now has a reference implementation to beat, so local fine-tunes and domain hacks appear within weeks, not years.
• The open ecosystem becomes a global R&D commons where the best tricks (sparsity, quantization, MoE routing) propagate instantly – exactly what Schmidt calls “very rapid proliferation.”
