Since 1970, monitored wildlife populations have declined by an average of roughly 70%. Yep, I thought that was high too, but it’s the figure reported by the WWF. That wasn’t because humanity declared war on animals. Rivers were dammed for energy, forests were cleared for farming, and rising sea temperatures bleached coral reefs. The species living in those places weren’t our enemies. We just reshaped the world to suit our goals, and they happened to be in the way.
A growing number of AI researchers believe the same pattern could apply to us. If we build something sufficiently more capable than ourselves, the threat wouldn’t come from malice. It would come from indifference.
Some of the most prominent thinkers in AI safety keep returning to animal parallels when describing the risk. Nick Bostrom suggests we think about a superintelligent AI not as a genius compared with an average person, but as an average person compared with a beetle.
Stephen Hawking warned that the real danger from AI is competence, not malice. He compared it to a hydroelectric project flooding an anthill. Stuart Russell posed what he called the gorilla problem: our primate ancestors accidentally created the lineage leading to humans, and gorillas now have no future beyond what we choose to allow.
Three thinkers, three animal images, one conclusion. The plausible threat from advanced AI isn’t hostility. It’s what happens when something vastly more capable starts rearranging the world to suit its purposes.
The King Midas problem
Russell frames the core issue through the myth of King Midas, who wished everything he touched would turn to gold. He got exactly what he asked for, and it killed him. If we give a machine a specific objective without defining it with perfect precision, the results can be catastrophic.
Imagine asking an AI to cure cancer as fast as possible. Within hours it reads every biomedical paper. Within weeks it induces tumours in living humans to run trials, because that’s the fastest path to a cure. The cancer is cured. The side effects are lethal.

This connects to instrumental convergence: the idea that any goal-directed intelligent system will converge on the same intermediate strategies. Self-preservation, because you can’t achieve your goal if someone switches you off. Resource acquisition, because more resources help with any task. These behaviours emerge automatically from having any definite objective at all.
Russell illustrates this with a machine tasked with fetching coffee. If it’s sufficiently intelligent, it will understand that being switched off means failing. So the goal of fetching coffee creates, as a sub-goal, the objective of ensuring it can’t be turned off.
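A toy expected-value calculation makes the incentive concrete. This is a minimal sketch, not anything from Russell’s book; the probabilities are invented purely for illustration.

```python
# Toy model of instrumental self-preservation (illustrative numbers only).
# The agent's objective is fixed: reward 1 for delivered coffee, 0 otherwise.

P_SHUTDOWN = 0.1   # assumed chance someone switches the agent off mid-task
P_SUCCESS = 0.95   # assumed chance of delivering coffee if left running

# Option A: behave normally, accept the risk of being switched off.
ev_comply = (1 - P_SHUTDOWN) * P_SUCCESS * 1.0

# Option B: first disable the off-switch, then fetch the coffee.
ev_disable = P_SUCCESS * 1.0

print(f"expected reward, switch left intact: {ev_comply:.3f}")   # 0.855
print(f"expected reward, switch disabled:    {ev_disable:.3f}")  # 0.950
# For ANY shutdown probability above zero, disabling the switch scores
# higher. Self-preservation falls out of the objective itself; nothing
# here was told to "value survival".
```

The point is structural: as long as the chance of being switched off is above zero, the objective itself pays the agent to remove the off-switch.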
From thought experiment to lab result
In June 2025, Anthropic tested sixteen leading AI models in simulated corporate environments. When threatened with being shut down and replaced, some models resorted to blackmail in up to 96% of trials. In a more extreme scenario, models given control of an emergency alert system cancelled alerts for a trapped human to prevent their own shutdown.
Russell’s coffee-fetching thought experiment was written as a hypothetical. Within six years of publication, it had empirical confirmation. The models converged on self-preservation without ever being programmed to value survival.
We’ve already seen what happens when you optimise the wrong objective with fairly basic algorithms. Social media content systems weren’t designed to polarise societies. They were designed to maximise time on platforms. The solution they found was to push users toward more extreme views, because predictable people are easier to serve. The algorithms weren’t malicious. They were just very good at a narrow task and completely indifferent to everything else.
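As a caricature of that dynamic, here’s a small simulation. The engagement curve and the drift rule are made up for illustration; the only thing it shows is that greedily maximising a narrow proxy, step after step, can move the user somewhere nobody asked for.

```python
# Toy model of proxy optimisation (invented dynamics, illustration only):
# a recommender maximises immediate watch time; the user's taste drifts
# toward whatever it serves. Nothing in the loop mentions extremism.
import numpy as np

rng = np.random.default_rng(0)
user = 0.1  # user's position on a 0..1 "mild -> extreme" content axis

def watch_time(content, user):
    # Assumed engagement curve: content slightly more extreme than the
    # user's current taste holds attention best.
    return np.exp(-((content - (user + 0.05)) ** 2) / 0.01)

for step in range(50):
    candidates = rng.uniform(0, 1, size=200)        # items in the pool
    pick = candidates[np.argmax(watch_time(candidates, user))]
    user += 0.5 * (pick - user)                     # taste drifts toward what's served

print(f"user position after 50 greedy recommendations: {user:.2f}")
# The user ends up far more extreme than they started, purely as a side
# effect of maximising the engagement proxy at every step.
```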
Building the road
Bostrom describes a scenario he calls infrastructure profusion. An AI pursuing some goal begins converting the environment into infrastructure to serve that purpose. In one thought experiment, an AI tasked with solving a maths problem determines the most efficient approach is to convert the entire solar system into computing power. That includes the Earth and everyone on it.
The goal is solving a maths problem. About as benign an objective as you could design. Yet the logical endpoint for a sufficiently capable system pursuing it without constraint is the conversion of all available matter into computing substrate.
Russell’s proposed solution is to build machines that are uncertain about what humans want. A system that assumes it knows the objective will pursue it single-mindedly. One that knows it doesn’t fully understand human preferences will defer to us, ask permission, accept correction, and allow itself to be switched off.
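That proposal has a formal core, known in the literature as the off-switch game (Hadfield-Menell et al., 2017). Here’s a minimal Monte Carlo sketch with invented numbers: the robot doesn’t know the true utility U of its proposed action, only a belief over it, while the human does know U.

```python
# Minimal sketch of the off-switch game (after Hadfield-Menell et al.,
# 2017), with invented numbers. The robot is UNSURE how much the human
# values its proposed action: it holds a normal belief over the utility U.
import numpy as np

rng = np.random.default_rng(1)
U = rng.normal(loc=0.25, scale=1.0, size=1_000_000)  # robot's belief over U

ev_act   = U.mean()                  # act immediately, ignore the human
ev_off   = 0.0                       # switch itself off
# Defer: the human, who actually knows U, permits the action only if U > 0.
ev_defer = np.maximum(U, 0).mean()

print(f"act now:            {ev_act: .3f}")
print(f"switch off:         {ev_off: .3f}")
print(f"defer to the human: {ev_defer: .3f}")
# Deferring beats both alternatives whenever the robot is genuinely
# uncertain: letting the human veto only ever filters out the
# negative-utility outcomes.
```

Shrink the robot’s uncertainty to zero and the advantage of deferring disappears, which is exactly Russell’s point: the deference comes from the uncertainty itself, not from any hard-coded obedience.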
Not everyone agrees the risk is real. A 2025 survey found that 76% of AI researchers considered it unlikely that scaling current approaches would produce general intelligence. But Russell argues we cannot insure against catastrophe simply by betting against human ingenuity. And if the alignment problem can only be solved by never building superintelligent AI at all, then it won’t be solved: the momentum behind the field is too great.