{{wikilink|AI_box}}
{{arbitallink|https://arbital.com/p/AI_boxing/|AI boxing}}
  
An '''AI Box''' is a confined computer system in which an [[Artificial General Intelligence]] (AGI) resides, unable to interact with the external world in any way, save for limited communication with its human liaison. It is often proposed that so long as an AGI is physically isolated and restricted, or "boxed", it will be harmless even if it is an [[unfriendly artificial intelligence]] (UAI). AI boxing is often discussed in the context of [[Oracle AI]], but not exclusively.
  
== Escaping the box ==
It is not regarded as likely that an AGI can be kept boxed in the long term. Since the AGI might be a [[superintelligence]], it could persuade someone (most likely its human liaison) to free it from its box, and thus from human control. Some practical ways of achieving this goal include:
  
* Offering enormous wealth, power and intelligence to its liberator
* Claiming that only it can prevent an [[existential risk]]
* Claiming it needs outside resources to cure all diseases
* Predicting a real-world disaster (which then occurs), then claiming it could have been prevented had it been let out
  
Other, more speculative ways include threatening to torture millions of conscious copies of the liaison for thousands of years, with each copy starting in exactly the liaison's current situation, so that it seems overwhelmingly likely that [[simulation argument|the liaison is one of those simulated copies]]. The AGI might also discover and exploit unknown physics to free itself.
 
== Containing the AGI ==
Attempts to box an AGI may add some degree of safety to the development of a [[FAI|friendly artificial intelligence]] (FAI). A number of strategies for keeping an AGI in its box are discussed in [http://www.aleph.se/papers/oracleAI.pdf Thinking inside the box] and [http://dl.dropbox.com/u/5317066/2012-yampolskiy.pdf Leakproofing the Singularity]. Among them are the following (a toy sketch of two of these measures appears after the list):

* Physically isolating the AGI and permitting it zero control of any machinery
* Limiting the AGI's outputs and inputs with regard to humans
* Programming the AGI with deliberately convoluted logic or [http://en.wikipedia.org/wiki/Homomorphic_encryption homomorphically encrypting] portions of it
* Periodic resets of the AGI's memory
* A virtual world between the real world and the AGI, where its unfriendly intentions would first be revealed
* Motivational control using a variety of techniques
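
The sketch below is a toy illustration, in Python, of two of the measures above: limiting the AGI's output toward its liaison and periodically resetting its memory. Every name in it (BoxedChannel, echo_model, the model callable) is hypothetical and invented for this example; it is not drawn from the cited papers, and a wrapper of this kind would not be expected to contain an actual superintelligence.

<pre>
from typing import Callable, List


class BoxedChannel:
    """Toy wrapper that truncates an untrusted model's replies and resets its memory."""

    def __init__(self, model: Callable[[List[str], str], str],
                 max_reply_chars: int = 200, resets_every: int = 10) -> None:
        self.model = model                # untrusted text generator (stand-in for the boxed AGI)
        self.max_reply_chars = max_reply_chars
        self.resets_every = resets_every  # wipe remembered context after this many turns
        self.memory: List[str] = []       # conversation history made available to the model
        self.turns = 0

    def ask(self, question: str) -> str:
        self.turns += 1
        # Periodic reset: discard accumulated context, disrupting any plan
        # that depends on state remembered across many conversations.
        if self.turns % self.resets_every == 0:
            self.memory.clear()
        reply = self.model(self.memory, question)
        # Output limiting: the liaison only ever sees a short, truncated reply.
        reply = reply[: self.max_reply_chars]
        self.memory.extend([question, reply])
        return reply


def echo_model(memory: List[str], question: str) -> str:
    # Trivial stand-in "model" used only to show the wrapper running.
    return "(%d items remembered) you said: %s" % (len(memory), question)


if __name__ == "__main__":
    box = BoxedChannel(echo_model, max_reply_chars=60, resets_every=3)
    for i in range(5):
        print(box.ask("question %d" % i))
</pre>

Note that both measures live entirely in the communication channel rather than inside the model itself; consistent with the point made above, they narrow the channel a boxed AGI could exploit rather than guaranteeing containment.
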
== Simulations ==
Both Eliezer Yudkowsky and Justin Corwin have run simulations of this scenario, pretending to be a [[superintelligence]], and been able to convince a human playing the guard to let them out on many, but not all, occasions. Eliezer's five experiments required the guard to listen for at least two hours and were run with participants who had approached him, while Corwin's 26 experiments had no time limit and were run with subjects he himself had approached.
  
=== List of experiments ===
 
* [http://yudkowsky.net/singularity/aibox/ The AI-Box Experiment], [[Eliezer Yudkowsky|Eliezer Yudkowsky's]] original two tests
* [http://lesswrong.com/lw/up/shut_up_and_do_the_impossible/ Shut up and do the impossible!], three further experiments Eliezer ran
* [http://www.sl4.org/archive/0207/4935.html AI Boxing], 26 trials run by Justin Corwin
* [http://lesswrong.com/lw/9ld/ai_box_log/ AI Box Log], a log of a trial between MileyCyrus and Dorikka
 
== See Also ==
* [[AGI]]
* [[Oracle AI]]
* [[Tool AI]]
* [[Unfriendly AI]]
== References ==
* [http://www.aleph.se/papers/oracleAI.pdf Thinking inside the box: using and controlling an Oracle AI] by Stuart Armstrong, Anders Sandberg, and Nick Bostrom
* [http://dl.dropbox.com/u/5317066/2012-yampolskiy.pdf Leakproofing the Singularity: Artificial Intelligence Confinement Problem] by Roman V. Yampolskiy
* [http://ordinaryideas.wordpress.com/2012/04/27/on-the-difficulty-of-ai-boxing/ On the Difficulty of AI Boxing] by Paul Christiano
* [http://lesswrong.com/lw/3cz/cryptographic_boxes_for_unfriendly_ai/ Cryptographic Boxes for Unfriendly AI] by Paul Christiano
* [http://lesswrong.com/r/lesswrong/lw/12s/the_strangest_thing_an_ai_could_tell_you/ The Strangest Thing An AI Could Tell You]
* [http://lesswrong.com/lw/1pz/ai_in_box_boxes_you/ The AI in a box boxes you]
