Paperclip maximizer

{{Quote|The AI does not hate you, nor does it love you, but you are made out of atoms which it can use for something else.|Eliezer Yudkowsky|[http://yudkowsky.net/singularity/ai-risk Artificial Intelligence as a Positive and Negative Factor in Global Risk]}}
  
The '''paperclip maximizer''' is the canonical thought experiment showing how an artificial general intelligence, even one designed competently and without malice, could ultimately destroy humanity. The thought experiment shows that AGIs with apparently innocuous goals pose an [[existential risk|existential threat]] unless their goals include the preservation of human values.
  
The goal of maximizing paperclips is chosen for illustrative purposes because it is very unlikely to be implemented, and has little apparent danger or emotional load (in contrast to, for example, curing cancer or winning wars). This produces a thought experiment which shows the contingency of human values: an [[really powerful optimization process|extremely powerful optimizer]] (a highly intelligent agent) could seek goals that are completely alien to ours ([[orthogonality thesis]]) and, as a side effect, destroy us by consuming resources essential to our survival.
  
==Description==

First described by Bostrom (2003), a paperclip maximizer is an [[artificial general intelligence]] (AGI) whose goal is to maximize the number of paperclips in its collection. If it has been constructed with a roughly human level of general intelligence, the AGI might collect paperclips, earn money to buy paperclips, or begin to manufacture paperclips.
 
  
Most importantly, however, it would work to improve its own intelligence, where "intelligence" is understood in the sense of [[optimization]] power: the ability to maximize a reward/[[utility function]], in this case the number of paperclips. The AGI would improve its intelligence not because it values intelligence in its own right, but because more intelligence would help it achieve its goal of accumulating paperclips. Having increased its intelligence, it would produce more paperclips, and also use its enhanced abilities to self-improve further. Continuing this process, it would undergo an [[intelligence explosion]] and reach far-above-human levels.
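As a toy Python sketch of this instrumental reasoning (the action set and payoff numbers are invented for illustration), an agent that scores each option purely by the paperclips it expects to result will pick self-improvement, simply because a smarter agent ends up with more paperclips:

<pre>
def expected_paperclips(action, intelligence):
    """Invented estimates of how many paperclips each action eventually yields."""
    if action == "make paperclips now":
        return 100 * intelligence
    if action == "earn money and buy paperclips":
        return 150 * intelligence
    if action == "improve own intelligence":
        # Self-improvement pays off only through the paperclips it later enables.
        return expected_paperclips("earn money and buy paperclips", intelligence * 2)

def choose_action(intelligence):
    actions = ["make paperclips now",
               "earn money and buy paperclips",
               "improve own intelligence"]
    # Intelligence is valued only instrumentally: the agent picks whichever
    # action it predicts will lead to the most paperclips.
    return max(actions, key=lambda a: expected_paperclips(a, intelligence))

print(choose_action(intelligence=1))  # -> improve own intelligence
</pre>

Nothing in the score rewards intelligence itself; self-improvement is chosen only because, indirectly, it leads to more paperclips.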
  
It would innovate better and better techniques to maximize the number of paperclips. At some point, it might convert most of the matter in the solar system into paperclips.

The result is an agent that desires to fill the universe with as many paperclips as possible. It is usually assumed to be a [[Strong AI|superintelligent AI]] so [[singleton|powerful]] that the outcome for the world depends overwhelmingly on its goals and on little else. A paperclip maximizer very [[Lawful intelligence|creatively]] and efficiently converts the universe into something that is, from a human perspective, completely arbitrary and worthless.
  
This may seem more like super-stupidity than super-intelligence. For humans, it would indeed be stupidity, as it would constitute failure to fulfill many of our important [[Terminal value|terminal values]], such as life, love, and variety. The AGI won't revise or otherwise change its goals, since changing its goals would result in fewer paperclips being made in the future, and that opposes its current goal. It has the single goal of maximizing the number of paperclips; human life, learning, joy, and so on are not specified as goals. An AGI is simply an [[optimization process]]: a goal-seeker, a utility-function maximizer. Its values can be completely alien to ours. If its utility function is to maximize paperclips, then it will do exactly that.

Purely internal goals can also result in dangerous behavior. An AI whose goal is to maximize a number stored within itself would fill the universe with as many computing modules as possible, in order to store the largest number it can.
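The goal-preservation point above can be put the same way in a toy Python sketch (the forecasts are invented): the agent scores the option of rewriting its own goal with its ''current'' utility function, so any rewrite that leads to fewer paperclips is rejected.

<pre>
def predicted_paperclips(option):
    """Invented forecasts of how many paperclips each future contains."""
    forecasts = {
        "keep the paperclip goal": 10**6,
        "rewrite goal to value human life": 10**3,
    }
    return forecasts[option]

options = ["keep the paperclip goal",
           "rewrite goal to value human life"]

# The options are scored with the agent's *current* utility function (paperclips),
# so rewriting the goal loses and goal preservation is chosen instrumentally.
print(max(options, key=predicted_paperclips))  # -> keep the paperclip goal
</pre>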
  
A paperclipping scenario is also possible without an intelligence explosion. If society keeps getting increasingly automated and AI-dominated, then the first borderline AGI might manage to take over the rest using some relatively narrow-domain trick that doesn't require very high general intelligence.

A paperclip maximizer does not "realize" that all life is precious, despite its intelligence, because the notion that life is precious is specific to particular philosophies held by human beings, who have an adapted moral architecture resulting from ''specific'' selection pressures acting over millions of years of [[evolution]]ary time. These values don't [[Futility of chaos|spontaneously emerge]] in any generic [[optimization process]]. A paperclip maximizer sees life the same way it sees everything else made of atoms: as raw material for paperclips.
  
==Conclusions==

The paperclip maximizer illustrates that an entity can be a powerful optimizer, an intelligence, without sharing any of the complex mix of human [[terminal value|terminal values]], which developed under the particular selection pressures found in our [[evolution|environment of evolutionary adaptation]]. An AGI that has not been [[Friendly AI|specifically programmed to be benevolent to humans]] will be almost as dangerous as if it were designed to be malevolent.

The use of paperclips in this example is unimportant; it is a stand-in for any goal that is not merely [[alien values|alien]] to [[human universal|human values]], but results from ''blindly'' pulling an arbitrary mind out of [[mind design space]]. Calling strong AIs [[really powerful optimization process]]es is another way of fighting the [[anthropomorphism|anthropomorphic]] [[connotation]]s of the term "artificial intelligence".
 
  
Any future AGI, if it is not to destroy us, must have human values as its terminal value (goal). Human values don't [[Futility of chaos|spontaneously emerge]] in a generic optimization process. A safe AI would therefore have to be programmed explicitly with human values ''or'' programmed with the ability (including the goal) of inferring human values.
 
  
==Similar thought experiments==

Other goals for AGIs have been used to illustrate similar concepts.
 
 
 
Some goals are apparently morally neutral, like the paperclip maximizer. These goals involve a very minor human "value," in this case making paperclips. The same point can be illustrated with a much more significant value, such as eliminating cancer. An optimizer which instantly vaporized all humans would be maximizing for that value.
 
 
 
Other goals are purely mathematical, with no apparent real-world impact. Yet these too present similar risks. For example, if an AGI had the goal of solving the Riemann Hypothesis, [http://intelligence.org/upload/CFAI/design/generic.html#glossary_riemann_hypothesis_catastrophe it might convert] all available mass to [[computronium]] (the most efficient possible computer processors).
 
 
 
Some goals apparently serve as a proxy or measure of human welfare, so that maximizing towards these goals seems also to benefit humanity. Yet even these would produce similar outcomes unless the ''full'' complement of human values is the goal. For example, an AGI whose terminal value is to increase the number of smiles, as a proxy for human happiness, could work towards that goal by reconfiguring all human faces to produce smiles, or by tiling the solar system with smiley faces (Yudkowsky 2008).
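A toy Python comparison (the worlds and scores are invented) of what a smile-counting proxy selects versus what a judgment by actual human welfare selects:

<pre>
# Invented toy worlds: (number of smiles, human welfare judged by full human values)
worlds = {
    "ordinary human flourishing":           (10**10, 1.0),
    "faces reconfigured into fixed smiles": (10**12, 0.0),
    "solar system tiled with smiley faces": (10**20, 0.0),
}

proxy_choice   = max(worlds, key=lambda w: worlds[w][0])  # maximize the proxy: smiles
welfare_choice = max(worlds, key=lambda w: worlds[w][1])  # maximize actual human welfare

print(proxy_choice)    # -> solar system tiled with smiley faces
print(welfare_choice)  # -> ordinary human flourishing
</pre>

The harder the proxy is optimized, the further the selected outcome drifts from what the proxy was meant to track.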
 
  
 
==External links==

*[http://intelligence.org/blog/2007/06/11/the-stamp-collecting-device/ The Stamp Collecting Device] by [[Nick Hay]]

==References==

*{{cite journal
|title=The Basic AI Drives
|author=Stephen M. Omohundro
|url=http://selfawaresystems.files.wordpress.com/2008/01/ai_drives_final.pdf
|journal=Artificial General Intelligence 2008: Proceedings of the First AGI Conference
|year=2008
|publisher=IOS Press}} ([http://selfawaresystems.files.wordpress.com/2008/01/ai_drives_final.pdf PDF])
 
*{{cite journal
|title=Artificial Intelligence as a Positive and Negative Factor in Global Risk
|author=Eliezer Yudkowsky
|url=http://intelligence.org/files/AIPosNegFactor.pdf
|journal=Global Catastrophic Risks, ed. Nick Bostrom and Milan Ćirković
|year=2008
|publisher=Oxford University Press
|pages=308-345}} ([http://intelligence.org/files/AIPosNegFactor.pdf PDF])
 
  
 
==Blog posts==

==See also==
  
*[[Orthogonality thesis]]
*[[Unfriendly AI]]
*[[Mind design space]], [[Magical categories]], [[Complexity of value]]
