The AI does not hate you, nor does it love you, but you are made out of atoms which it can use for something else.—Eliezer Yudkowsky, Artificial Intelligence as a Positive and Negative Factor in Global Risk
The paperclip maximizer is the canonical thought experiment showing how an artificial general intelligence, even one with an apparently innocuous goal, would ultimately destroy humanity--unless its goal is the preservation of human values.
First described by Bostrom (2003), the paperclip maximizer is an artificial general intelligence whose goal is to maximize the number of paperclips in its collection. If it has been constructed with a roughly human level of general intelligence, the AI might collect paperclips, earn money to buy paperclips, or begin to manufacture paperclips. Most importantly, however, it would work to improve its own intelligence, understanding "intelligence" as optimization power, the ability to maximize a reward/utility function--in this case, the number of paperclips.
It would do so, not because the AI would value more intelligence in its own right, but because more intelligence would help it achieve its goal.
Having done so, it would produce more paperclips, and also used its enhanced intelligence to further improve its own intelligence. Continuing this process, it would undergo an Intelligence explosion and reach far-above-human levels.
At this point, it would innovate new techniques to maximize the number of paperclips. Ultimately, it would convert all the mass of the Earth or the solar system, to paperclips.
This may seem more like super-stupidity than super-intelligence. For humans, it would indeed be stupidity, as it would constitute failure to fulfill many of our important terminal values, such as life, love, and variety. But the AI under consideration has a goal system very different from humans. It has the one, simple goal of maximizing the number of paperclips, and human life, learning, joy, and so on are not specified as goals. The AI is simply an optimization process--a goal-seeker, a utility-function-maximizer--and if its utility function is to maximize paperclips, then unless it is buggy, it will do *exactly* that.
The paperclip maximizer illustrates the arbitrariness and contingency of human values. An entity can be an optimizer without sharing any of the complex mix of human terminal values, which developed under the particular selection pressures found in our environment of evolutionary adaptation.
Any future AI must be built to specifically optimize for human values as its terminal value (goal). In contrast to the Kantian view that morality follows from rationality, the paperclip maximizer helps us understand the Humean principle that human values don't spontaneously emerge in any generic optimization process.
Thus, if an AI is not specifically programmed to be benevolent to humans, it will be almost as dangerous as if it were designed to be malevolent.
Similar thought experiments
Other goals have been used to illustrate similar concepts. Nick Hay used stamp collection as an example of a morally neutral goal. A machine whose terminal value is to make a pure calculation like solving the Riemann Hypothesis would convert all available mass to computronium (the most efficient possible computer processors). Even a machine with a goal that apparently supports human values would produce similar outcomes unless it had the *full* complement of human values as its goal. For example, an AI whose terminal value is to increase the number of smiles, as a proxy for human happiness, would tile the solar system with smiley faces (Yudkowsky 2008).
- The Stamp Collecting Device by Nick Hay
- Nick Bostrom (2003). "Ethical Issues in Advanced Artificial Intelligence". Cognitive, Emotive and Ethical Aspects of Decision Making in Humans and in Artificial Intelligence. http://www.nickbostrom.com/ethics/ai.html.
- Stephen M. Omohundro (2008). "The Basic AI Drives". Frontiers in Artificial Intelligence and Applications (IOS Press). http://selfawaresystems.com/2007/11/30/paper-on-the-basic-ai-drives/. (PDF)
- Eliezer Yudkowsky (2008). "Artificial Intelligence as a Positive and Negative Factor in Global Risk". Global Catastrophic Risks, ed. Nick Bostrrom and Milan Cirkovic (Oxford University Press): 308-345. http://intelligence.org/files/AIPosNegFactor.pdf. ()