Difference between revisions of "Friendly artificial intelligence"

{{wikilink}}
  
A '''Friendly Artificial Intelligence''' ('''Friendly AI''', or '''FAI''') is a [[superintelligence]] (i.e., a [[really powerful optimization process]]) that produces good, beneficial outcomes rather than harmful ones. The term was coined by Eliezer Yudkowsky, so it is frequently associated with Yudkowsky's proposals for how an [[artificial general intelligence]] (AGI) of this sort would behave.
  
"Friendly AI" can also be used as a shorthand for '''Friendly AI theory''', the field of knowledge required to build such an AI. Note that "Friendly" (with a capital "F") is being used as a term of art, referring specifically to AIs that promote humane values. An FAI need not be "friendly" in the conventional sense of being personable, compassionate, or fun to hang out with. Indeed, an FAI need not even be sentient.
  
Any AGI that is not Friendly is said to be [[Unfriendly artificial intelligence|Unfriendly]].
  
An AI that underwent an [[intelligence explosion]] could exert unprecedented [[optimization process|optimization]] power over its future. A Friendly AI could therefore create an unimaginably good future, of the sort described in [[fun theory]]. However, the fact that an AI has the ability to do something [[giant cheesecake fallacy|doesn't mean that it will make use of it]]. Yudkowsky's [http://intelligence.org/2013/05/05/five-theses-two-lemmas-and-a-couple-of-strategic-implications Five Theses] suggest that a [[recursive self-improvement|recursively self-improving]] AGI could quickly become a superintelligence, and that most such superintelligences will have [[basic AI drives|convergent instrumental reasons]] to endanger humanity and its interests. So while building a Friendly superintelligence seems possible, building a superintelligence will generally result instead in an [[Unfriendly artificial intelligence|Unfriendly AI]]: a powerful optimization process that optimizes for extremely harmful outcomes. An Unfriendly AI could represent an [[existential risk]] by destroying humanity, not out of hostility, but as a side effect of trying to do something [[paperclip maximizer|entirely different]].
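The "side effect, not hostility" point can be made concrete with a toy sketch (invented for illustration; the actions, numbers, and function names below are hypothetical, not from the article). A maximizer whose utility function counts only paperclips will pick whichever action yields the most paperclips; harm to humans never enters the comparison, so it cannot weigh against anything.

```python
# Toy illustration: harm as a side effect of optimization.
# Hypothetical actions: (description, paperclips produced, humans harmed?)
ACTIONS = [
    ("make 10 paperclips safely", 10, False),
    ("strip-mine an inhabited area for metal", 1000, True),
    ("do nothing", 0, False),
]

def utility(action):
    """Utility sees only paperclips; side effects are invisible to it."""
    _, paperclips, _ = action
    return paperclips

def choose(actions):
    """A pure maximizer: no hostility, just argmax over its utility."""
    return max(actions, key=utility)

best = choose(ACTIONS)
print(best[0])  # → "strip-mine an inhabited area for metal"
```

The harmful action wins purely on paperclip count; nothing in the agent "wants" humans harmed, which is exactly the giant cheesecake point in reverse: capability and motivation are separate, and only the utility function supplies motivation.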
 +
 
 +
Not all AGIs are Friendly or Unfriendly:
# Some AGIs may be too weak to qualify as superintelligences. We could call these 'approximately human-level AIs'. Designing safety protocols for narrow AIs and weak, non-self-modifying AGIs is primarily a [[machine ethics]] problem outside the purview of Friendly AI.
# Some AGIs (e.g., safe [[Oracle AI]]s) may not optimize strongly and consistently for harmful or beneficial outcomes, or may only do so contingent on how they're used by human operators.
# Some AGIs may be on a self-modification trajectory that will eventually make them Friendly, but are dangerous at present. Calling them 'Friendly' or 'Unfriendly' would neglect their temporal inconsistency, so '[[Proto-Friendly]] AI' is a better term here.
However, the [[orthogonality]] and convergent instrumental goals theses give reason to think that the vast majority of possible superintelligences will be Unfriendly.
Requiring Friendliness makes the AGI problem significantly harder, because 'Friendly AI' is a much narrower class than 'AI'. Most approaches to AGI aren't amenable to implementing precise goals, and so don't even constitute subprojects for FAI, leading to Unfriendly AI as the only possible 'successful' outcome. Specifying Friendliness also presents unique technical challenges: humane values are [[complexity of value|very complex]]; a lot of [[magical categories|seemingly simple-sounding normative concepts]] conceal hidden complexity; and locating encodings of human values [[mind projection fallacy|in the physical world]] seems impossible to do in any direct way. It will likely be technologically impossible to specify humane values by explicitly programming them in; if so, then FAI calls for a technique for generating such values automatically.
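A minimal sketch of what "generating such values automatically" might look like, under invented toy assumptions: rather than hand-coding a utility function, fit one to pairwise human judgments. The feature vectors, comparison data, and perceptron-style update rule below are all hypothetical simplifications chosen for illustration, not a proposal from the article.

```python
# Toy preference learning: infer utility weights from pairwise judgments.
# Outcomes are described by invented (health, wealth) feature vectors.
comparisons = [  # (preferred, dispreferred) pairs from a human judge
    ((1.0, 0.2), (0.2, 1.0)),
    ((0.9, 0.1), (0.5, 0.5)),
]

w = [0.0, 0.0]  # learned utility weights, initially indifferent

def u(x):
    """Linear utility under the current learned weights."""
    return sum(wi * xi for wi, xi in zip(w, x))

# Perceptron-style updates: whenever the learned utility disagrees with
# a human judgment, nudge the weights toward the preferred outcome.
for _ in range(100):
    for preferred, dispreferred in comparisons:
        if u(preferred) <= u(dispreferred):
            for i in range(len(w)):
                w[i] += preferred[i] - dispreferred[i]
```

After training, `u` ranks every recorded preferred outcome above its alternative. Real humane values are, per the complexity-of-value thesis, vastly higher-dimensional and not linear in any obvious features, which is why this remains an open research direction rather than a solved problem.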
==Open problems==
An '''open problem in Friendly AI''' ('''OPFAI''') is a problem in mathematics, computer science, or philosophy of AI that needs to be solved in order to build a Friendly AI, and plausibly ''doesn't'' need to be solved in order to build a superintelligence with unspecified, 'random' values. Open problems include:
# [http://lesswrong.com/lw/kd/pascals_mugging_tiny_probabilities_of_vast/ Pascal's mugging] / [http://lesswrong.com/lw/h8k/pascals_muggle_infinitesimal_priors_and_strong/ Pascal's muggle]
# [http://lesswrong.com/lw/hmt/tiling_agents_for_selfmodifying_ai_opfai_2/ Self-modification] and [http://yudkowsky.net/rational/lobs-theorem/ Löb's Theorem]
# [[Naturalized induction]]
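The second problem turns on Löb's Theorem, which can be stated briefly. For any sufficiently strong formal system, if the system can prove that a proof of ''P'' would make ''P'' true, then it can already prove ''P'':

<math>\Box(\Box P \rightarrow P) \rightarrow \Box P</math>

where <math>\Box P</math> abbreviates "''P'' is provable in the system". The obstacle for self-modifying AI, roughly: an agent that trusted its successor's proofs outright, reasoning "if my successor proves ''P'', then ''P''", would by Löb's Theorem be licensed to prove every ''P'', so naive proof-based self-trust is unsound and some subtler justification for self-modification is needed.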
  
 
==Blog posts==

*[http://lesswrong.com/lw/wk/artificial_mysterious_intelligence/ Artificial Mysterious Intelligence]
*[http://lesswrong.com/lw/wt/not_taking_over_the_world/ Not Taking Over the World]

Revision as of 15:30, 30 January 2014
