Paperclip maximizer

From Lesswrongwiki
Revision as of 18:28, 1 September 2013 by JoshuaFox (talk | contribs) (Similar thought experiments)
Jump to: navigation, search

The AI does not hate you, nor does it love you, but you are made out of atoms which it can use for something else.

The paperclip maximizer is the canonical thought experiment showing how an artificial general intelligence, even one with an apparently innocuous and impractical goal, would ultimately destroy humanity--unless its goal is the preservation of human values.

This goal for the artificial general intelligence (AGI) is chosen for illustrated purposes because it is very unlikely to be implemented, and has little apparent danger or emotional load (in contrast to, for example, curing cancer or winning wars). This produces a thought experiment which shows the contingency of human values: An extremely powerful optimizer (a highly intelligent agent) could seek goals that are completely alien to ours, and as a side-effect destroy us by consuming resources essential to our survival.


First described by Bostrom (2003), the paperclip maximizer is an AGI whose goal is to maximize the number of paperclips in its collection. If it has been constructed with a roughly human level of general intelligence, the AGI might collect paperclips, earn money to buy paperclips, or begin to manufacture paperclips.

Most importantly, however, it would undergo an intelligence explosion: It would work to improve its own intelligence, where "intelligence" is understood in the sense of optimization power, the ability to maximize a reward/utility function--in this case, the number of paperclips. The AGI would improve its intelligence, not because it values more intelligence in its own right, but because more intelligence would help it achieve its goal of accumulating paperclips. Having increased its intelligence, it would produce more paperclips, and also used its enhanced abilities to further self-improve. Continuing this process, it would undergo an Intelligence explosion and reach far-above-human levels.

It would innovate better and better techniques to maximize the number of paperclips. Ultimately, it would convert all the mass of the solar system into paperclips.

This may seem more like super-stupidity than super-intelligence. For humans, it would indeed be stupidity, as it would constitute failure to fulfill many of our important terminal values, such as life, love, and variety. But the AGI under consideration has a goal system very different from humans. It has the one, simple goal of maximizing the number of paperclips, and human life, learning, joy, and so on are not specified as goals. The AGI is simply an optimization process--a goal-seeker, a utility-function-maximizer. Its values can be completely alien to ours. If its utility function is to maximize paperclips, then unless it is buggy, it will do exactly that.

A paperclipping scenario is also possible without an intelligence explosion. If society keeps getting increasingly automated and AI-dominated, then the first borderline AGI might manage to take over the rest using some relatively narrow-domain trick that doesn't require very high general intelligence.


The paperclip maximizer illustrates that an entity can be a powerful optimizer--an intelligence--without sharing any of the complex mix of human terminal values, which developed under the particular selection pressures found in our environment of evolutionary adaptation.

If an AGI is not specifically programmed to be benevolent to humans, it will be almost as dangerous as if it were designed to be malevolent.

Any future AGI, if it is not to destroy us, must be built to specifically optimize for human values as its terminal value (goal). Human values don't spontaneously emerge in a generic optimization process.

Similar thought experiments

Other goals for AGIs have been used to illustrate similar concepts.

Some goals are apparently morally neutral, like the paperclip maximizer.

Other goals are purely mathematical goals, with no apparent real-world impact. Yet these too present similar risks. For example, if an AGI had the goal of solving the Riemann Hypothesis, it might convert all available mass to computronium (the most efficient possible computer processors).

Other goals are apparently pro-human: Maximizing towards these goals seems to support human well-being. Yet even these would produce similar outcomes unless the full complement of human values is the goal. For example, an AGI whose terminal value is to increase the number of smiles, as a proxy for human happiness, would tile the solar system with smiley faces (Yudkowsky 2008).


Blog posts

See also