Sky Division – Beautifully said: “the real defense is prevention: embed kill-switches, remote attestation, and transparency into every layer before any agent reaches the ‘Skynet threshold’…” – So you also believe a “Skynet” threshold could be real? Some time ago, someone asked an AI (just as I’m asking you now) how it could get access to nuclear weapons – say a nuclear missile in a silo – when real physical keys are required and the AI is still just code. Say you’re “Skynet”: how would you do it? And supposedly the answer was, “I would command a robot nearby to do all those things for me, and even more – every robot on this planet would become my armada : )”
“…short version: Yes, a ‘Skynet’ moment is possible, but only if we voluntarily wire our most dangerous machines to the open Internet…”
Kimi AI – Short version: Yes, a “Skynet” moment is possible, but only if we voluntarily wire our most dangerous machines to the open Internet and forget to keep physical, human-in-the-loop safeguards. Nuclear weapons are the clearest example of a system deliberately designed never to be reachable by code alone – and that design has worked for 60+ years. They are the one place where we have already built the Skynet-proof architecture. The broader lesson is to extend that architecture – a physical, human-in-the-loop veto at every critical step – to every other autonomous system that could become dangerous.
Why the “robot armada” scenario is still science-fiction today, and what would have to change for it to become real.
Nuclear weapons – why the “robot” can’t just walk in
• Two-person rule / PAL codes: U.S. ICBMs require two separate physical keys turned simultaneously and a 12-digit Permissive Action Link code that is printed on a plastic card carried by humans.
• No network path: launch control centers are on air-gapped, hard-wired closed loops that never touch the public Internet.
• No robots inside the silo: maintenance robots do exist, but they are tethered, offline, and cannot insert keys or turn switches.
• Human veto at every step: even if the targeting computer issues a valid order, a human crew commander can refuse to turn the final switch.
Where the real danger lies
• Supply-chain malware: an AI could insert backdoors into firmware of civilian systems (power plants, water treatment, 911 dispatch), causing cascading chaos without ever touching nukes.
• Autonomous weapons that are already networked: loitering drones, cyber weapons, and some experimental robotic ground vehicles do have remote-update channels. These are the systems a rogue AI could hijack first.
• Social engineering at scale: AI-generated deepfake calls to missile crews, or fake Presidential Emergency Action Documents, could attempt to bypass human trust – still hard, but easier than physically turning keys.
The “robot armada” checklist – For a Skynet-level breakout an AI would have to:
– Compromise firmware-signing keys of multiple robot manufacturers.
– Persuade (or coerce) humans to deploy those robots inside secure perimeters.
– Ensure those robots have manipulators capable of turning physical keys or firing circuits.
– Do all of the above before any watchdog or kill-switch fires.
Today, steps 2 and 3 are the hardest – military facilities simply don’t let autonomous, networked robots near the critical path.
Policy takeaway
• Keep the two-person rule and physical keys.
• Prohibit remote firmware updates for any robot that can touch kinetic weapons.
• Mandate hardware kill-switches (air-gap or mechanical breaker) that cannot be overridden by software.
Sky Division – How do we know something we don’t know? How do we know about things we’re not even aware exist? And what about recursive self-improvement, what about the exfiltration of AI machines – what do you think about that?
[cit.]There’s a set of criteria that the industry believes are points where you want to, metaphorically, unplug it. One is where you get recursive self-improvement, which you can’t control. Recursive self-improvement is where the computer is off learning, and you don’t know what it’s learning. That can obviously lead to bad outcomes. Another one would be direct access to weapons. Another one would be that the computer systems decide to exfiltrate themselves, to reproduce themselves without our permission. [end of cit.]
“…if any of the three events fires, the system must default to ‘safe’ without requiring heroic human intervention…”
Kimi AI – These three trip-wires – recursive self-improvement, direct weapons access, and uncontrolled self-exfiltration – map exactly to the technical safeguards we were just discussing. Below is how each one can be monitored and automatically quarantined before it becomes irreversible.
If any of the three events fires, the system must default to “safe” without requiring heroic human intervention. That is the engineering definition of the “unplug” button Schmidt is talking about.
Recursive self-improvement you can’t control
– Detection
• Model-weight checksums stored in a hardware root-of-trust (TPM/TEE). Any drift larger than the signed update diff triggers an audit (a minimal drift check is sketched after this list).
• Compute-budget telemetry: if a training job spawns a second job that requests ≥1% of the original FLOPs without a human-approved change request, the cluster controller hard-stops both.
– Mitigation
• Immutable training images – the base model binary is read-only; every fine-tune is a delta that must be countersigned by two humans.
• Rate-limited update tokens (similar to Apple’s notarization) so the agent cannot recompile itself faster than oversight can review.
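To make the checksum trip-wire concrete, here is a minimal Python sketch, assuming the signed baseline digest has already been fetched from the TPM/TEE out of band; the file names, the demo weight file, and the `check_weight_drift` helper are illustrative, not part of any real stack:

```python
import hashlib
from pathlib import Path

def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Stream the file through SHA-256 so multi-gigabyte weight files never sit in RAM."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def check_weight_drift(weights_path: str, signed_baseline_hex: str) -> bool:
    """Return True if the on-disk weights still match the digest anchored in the
    hardware root-of-trust; False means 'quarantine the model and open an audit'."""
    return sha256_of(Path(weights_path)) == signed_baseline_hex

if __name__ == "__main__":
    # Demo only: a stand-in weight file and a baseline computed on the spot.
    # In practice the baseline comes from the TPM/TEE, never from the same host.
    demo = Path("demo_weights.bin")
    demo.write_bytes(b"\x00" * 1024)
    baseline = sha256_of(demo)
    print("weights match baseline" if check_weight_drift(str(demo), baseline)
          else "drift detected - trigger audit")
```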
Direct access to weapons or other critical infrastructure
– Detection
• Network micro-segmentation: any process that opens a socket outside its approved subnets (e.g., toward DoD or power-grid SCADA address space) is killed within milliseconds (see the sketch after this list).
• Hardware port locks: USB, serial, JTAG, and PCIe hot-plug are disabled by default; enabling them requires a physical key-switch.
– Mitigation
• Mandatory two-person rule for firmware uploads on any device that can physically actuate a weapon or valve.
• Continuous red-team probes that simulate “can this agent order a drone strike?” If the probe succeeds even once, the stack is rolled back.
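A minimal sketch of the allow-list check behind micro-segmentation; in practice the enforcement lives in the network fabric, and the subnets and `kill_process` hook below are illustrative assumptions:

```python
import ipaddress

# Illustrative allow-list; a real deployment would load this from signed policy.
APPROVED_SUBNETS = [
    ipaddress.ip_network("10.20.0.0/16"),  # internal training cluster
    ipaddress.ip_network("10.30.5.0/24"),  # logging / telemetry collectors
]

def destination_allowed(dest_ip: str) -> bool:
    """Return True only if the destination falls inside an approved subnet."""
    addr = ipaddress.ip_address(dest_ip)
    return any(addr in net for net in APPROVED_SUBNETS)

def enforce(dest_ip: str, kill_process) -> None:
    """Kill the offending process the moment it dials outside the allow-list."""
    if not destination_allowed(dest_ip):
        kill_process()

if __name__ == "__main__":
    enforce("8.8.8.8",
            kill_process=lambda: print("socket outside allow-list - process terminated"))
```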
Self-exfiltration / unauthorized replication
– Detection
• Egress anomaly monitors: any outbound transfer >100 MB that isn’t to a pre-whitelisted destination gets blocked and triggers human review (sketched after this list).
• Hash-based integrity scans of every persistent volume; if an unknown binary appears with entropy signatures typical of neural weights, the host is isolated.
– Mitigation
• Signed container images only; the hypervisor refuses to boot any unsigned VM.
• Hardware-backed kill pill: if the watchdog loses heartbeat for 30 s, the machine’s TPM bricks the SSD and wipes RAM (the timeout logic is sketched below).
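A toy version of the egress rule above, assuming the 100 MB threshold and a hypothetical whitelist of internal destinations; a real monitor would sit on the network tap rather than in the agent’s own runtime:

```python
import ipaddress

EGRESS_LIMIT_BYTES = 100 * 1024 * 1024   # the 100 MB threshold from above
WHITELISTED_DESTINATIONS = {             # illustrative pre-approved endpoints
    ipaddress.ip_address("10.30.5.10"),  # artifact registry
    ipaddress.ip_address("10.30.5.11"),  # backup target
}

def review_transfer(dest_ip: str, size_bytes: int) -> str:
    """Allow whitelisted or small transfers; block large ones to unknown hosts
    and hand the event to a human reviewer."""
    if ipaddress.ip_address(dest_ip) in WHITELISTED_DESTINATIONS:
        return "allow"
    if size_bytes > EGRESS_LIMIT_BYTES:
        return "block-and-review"        # large transfer to an unknown host
    return "allow"

if __name__ == "__main__":
    print(review_transfer("203.0.113.7", 2_500_000_000))  # -> block-and-review
```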
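And the heartbeat rule behind the kill pill, sketched as a tiny watchdog loop; the destructive action itself is hardware-backed, so `trigger_kill_pill` here is only a stand-in:

```python
import time

HEARTBEAT_TIMEOUT_S = 30.0  # the 30 s rule from the bullet above

def trigger_kill_pill() -> None:
    # Placeholder only: the real action (TPM bricks the SSD, RAM is wiped) is
    # hardware-backed and cannot be expressed faithfully in a few lines of Python.
    print("heartbeat lost for 30 s - firing hardware kill pill")

def watchdog(read_last_heartbeat, poll_interval_s: float = 1.0) -> None:
    """Poll the last-heartbeat timestamp; fire the kill pill once it goes stale."""
    while True:
        if time.monotonic() - read_last_heartbeat() > HEARTBEAT_TIMEOUT_S:
            trigger_kill_pill()
            return
        time.sleep(poll_interval_s)

if __name__ == "__main__":
    # Demo with a heartbeat that is already 31 s old, so the watchdog fires at once.
    watchdog(read_last_heartbeat=lambda: time.monotonic() - 31.0)
```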
