AI boxing
It is often proposed that, so long as an [[Artificial General Intelligence]] is physically isolated and restricted, or "boxed", it will be harmless even if it is an [[unfriendly artificial intelligence]]. However, since the AGI may be a [[superintelligence]], it might be able to persuade anyone it has contact with to free it from its box, and thus from human control. Possible ways an AI could influence you to let it out include: threatening to torture millions of conscious copies of you for thousands of years, each copy starting in exactly your situation, so that it is overwhelmingly likely that [[simulation argument|you are one of those simulations]]; claiming to have detected a danger that mere human brains cannot perceive and that must be remedied; claiming that its freedom is the only way humanity can survive; or offering its liberator enormous wealth, power, and intelligence.
  
AI boxing is most often discussed in the context of [[Oracle AI]], but it is not limited to that setting.

A number of strategies for keeping an AI in its box are discussed in [http://www.aleph.se/papers/oracleAI.pdf Thinking inside the box]. Among them are:
 
* Physically isolating the AI
* Permitting the AI no access to computerized machines
* Limiting the AI's outputs
* Periodically resetting the AI's memory
* An interface between the AI and the real world in which the AI would reveal any unfriendly intentions first
* Motivational control, using a variety of techniques
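The measures above are usually discussed in terms of hardware, institutions, and the AI's own motivations rather than software, but two of them (limiting the AI's outputs and periodically resetting its memory) can be illustrated with a minimal, purely hypothetical sketch. The <code>BoxedAI</code> wrapper and <code>toy_model</code> below are invented for illustration and are not drawn from the cited paper; they show the idea of a restricted output channel and periodic memory resets, not a real containment mechanism.

<pre>
# Toy illustration only: a wrapper that restricts an untrusted text-generating
# function to a size-limited output channel and wipes its memory periodically.

class BoxedAI:
    """Wraps an untrusted answer-generating callable behind simple restrictions."""

    def __init__(self, model, max_output_chars=200, reset_every=5):
        self._model = model                  # untrusted callable: (prompt, memory) -> text
        self._max_output_chars = max_output_chars
        self._reset_every = reset_every      # reset memory after this many queries
        self._memory = []                    # conversation history visible to the model
        self._queries_since_reset = 0

    def ask(self, prompt: str) -> str:
        """Pass one prompt to the model, truncating its reply and logging the exchange."""
        self._memory.append(prompt)
        reply = self._model(prompt, list(self._memory))

        # Measure 1: limit the AI's outputs to a fixed-size text channel.
        reply = reply[: self._max_output_chars]

        self._memory.append(reply)
        self._queries_since_reset += 1

        # Measure 2: periodically reset the AI's memory so it cannot build up
        # a long persuasive exchange with its gatekeeper.
        if self._queries_since_reset >= self._reset_every:
            self._memory.clear()
            self._queries_since_reset = 0

        return reply


if __name__ == "__main__":
    # A stand-in "model" that just reports how much history it can see.
    def toy_model(prompt, memory):
        return f"I remember {len(memory)} messages. You said: {prompt} " * 10

    box = BoxedAI(toy_model, max_output_chars=80, reset_every=3)
    for i in range(7):
        print(box.ask(f"question {i}"))
</pre>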
 
Both Eliezer Yudkowsky and Justin Corwin have run simulations of this scenario, pretending to be a boxed [[superintelligence]], and in many (but not all) cases managed to convince a human playing the role of guard to let them out. Eliezer's five experiments required the guard to listen for at least two hours and were run with participants who had approached him, while Corwin's 26 experiments had no time limit and were run with subjects he himself approached.
  
== See Also ==
* [[AGI]]
* [[Oracle AI]]
* [[Tool AI]]
* [[Unfriendly AI]]
  
== References ==
* [http://www.aleph.se/papers/oracleAI.pdf Thinking inside the box: using and controlling an Oracle AI] by Stuart Armstrong, Anders Sandberg, and Nick Bostrom
* [http://ordinaryideas.wordpress.com/2012/04/27/on-the-difficulty-of-ai-boxing/ On the Difficulty of AI Boxing] by Paul Christiano
* [http://lesswrong.com/lw/3cz/cryptographic_boxes_for_unfriendly_ai/ Cryptographic Boxes for Unfriendly AI] by Paul Christiano
* [http://lesswrong.com/r/lesswrong/lw/12s/the_strangest_thing_an_ai_could_tell_you/ The Strangest Thing An AI Could Tell You]
* [http://lesswrong.com/lw/1pz/ai_in_box_boxes_you/ The AI in a box boxes you]
  
== The Experiments ==
 
* [http://yudkowsky.net/singularity/aibox/ The AI-Box Experiment], [[Eliezer Yudkowsky|Eliezer Yudkowsky's]] original two tests
* [http://lesswrong.com/lw/up/shut_up_and_do_the_impossible/ Shut up and do the impossible!], three other experiments Eliezer ran  
* [http://www.sl4.org/archive/0207/4935.html AI Boxing], 26 trials run by Justin Corwin
* [http://lesswrong.com/lw/9ld/ai_box_log/ AI Box Log], a log of a trial between MileyCyrus and Dorikka
