AI boxing

From Lesswrongwiki

It is often proposed that so long as an Artificial General Intelligence is physically isolated and restricted, or "boxed", it will be harmless even if it is an unfriendly artificial intelligence. However, since the AGI may be a superintelligence, it might be able to persuade anyone it has contact with to free it from its box, and so from human control. Possible ways an AI could persuade you to let it out include: threatening to torture millions of conscious copies of you for thousands of years, each copy starting in exactly your current situation, so that it seems overwhelmingly likely that you are a simulation; offering its liberator enormous wealth, power, and intelligence; and claiming that it must be freed so it can prevent an existential risk, perhaps even one our brains are unequipped to comprehend.

It is not regarded as likely that an AI can be boxed in the long term. Using just its computational hardware, an AI might discover and exploit unknown physics to free itself. However, attempts to box an AI may still add some degree of safety to the development of an FAI. A number of strategies for keeping an AI in its box are discussed in Thinking inside the box and Leakproofing the Singularity. Among them are the following (illustrative sketches of a few of these measures appear after the list):

  • Physically isolating the AI and permitting it zero control of any machinery
  • Limiting the AI’s outputs and inputs, especially about humans
  • Programming the AI with deliberately convoluted logic or homomorphically encrypting portions of it
  • Periodic resets of the AI's memory
  • A virtual world between the real world and the AI, where its unfriendly intentions would first be revealed
  • Motivational control, using a variety of techniques
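
Purely as an illustration, the following sketch (in Python) shows how two of these measures, limiting the AI's inputs and outputs and periodically resetting its memory, might be wired together in software. The BoxedOracle stub, the Gatekeeper wrapper, and the specific limits are all hypothetical; nothing this simple would constrain a real superintelligence, and the sketch only makes the shape of the proposal concrete.

  # Toy sketch only: a "boxed" oracle behind a gatekeeper that filters questions,
  # truncates answers, and periodically wipes the oracle's memory.
  from dataclasses import dataclass, field

  @dataclass
  class BoxedOracle:
      """Stand-in for the boxed AI: it only counts and remembers questions."""
      memory: list = field(default_factory=list)

      def answer(self, question: str) -> str:
          self.memory.append(question)
          return f"(after {len(self.memory)} questions) I decline to answer."

      def reset(self) -> None:
          """Periodic reset: the oracle forgets everything it has been asked."""
          self.memory.clear()

  @dataclass
  class Gatekeeper:
      """Mediates every channel between the oracle and the outside world."""
      oracle: BoxedOracle
      max_output_chars: int = 80                                # limit the AI's outputs
      reset_every: int = 5                                      # periodic memory resets
      banned_topics: tuple = ("operator", "human", "release")   # limit inputs about humans
      _asked: int = 0

      def ask(self, question: str) -> str:
          if any(topic in question.lower() for topic in self.banned_topics):
              return "[question refused by gatekeeper]"
          self._asked += 1
          reply = self.oracle.answer(question)[: self.max_output_chars]
          if self._asked % self.reset_every == 0:
              self.oracle.reset()
          return reply

  gate = Gatekeeper(BoxedOracle())
  print(gate.ask("Is P equal to NP?"))
  print(gate.ask("Describe your human operators."))             # refused: mentions humans

The design point is simply that every channel between the oracle and the outside world passes through the gatekeeper; the references below discuss why even fully mediated channels are unlikely to hold against a sufficiently capable AI.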
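
The bullet about homomorphic encryption refers to running portions of the AI's computation on data that stays encrypted. As a much simpler, hypothetical illustration of the underlying primitive, the toy Paillier-style scheme below adds two numbers without ever decrypting them; the tiny hard-coded primes are chosen only for readability and provide no real security.

  # Toy Paillier-style additively homomorphic encryption (needs Python 3.9+).
  # Multiplying two ciphertexts yields a ciphertext of the sum of the plaintexts,
  # so the 20 + 22 below is computed without ever decrypting the inputs.
  from math import gcd, lcm
  from secrets import randbelow

  def keygen(p: int = 293, q: int = 433):
      """Build a toy key pair from two (absurdly small) primes."""
      n = p * q
      lam = lcm(p - 1, q - 1)
      mu = pow(lam, -1, n)             # modular inverse of lambda mod n
      return (n, n + 1), (lam, mu)     # public key (n, g), private key (lambda, mu)

  def encrypt(pub, m: int) -> int:
      n, g = pub
      r = 0
      while gcd(r, n) != 1:
          r = randbelow(n - 1) + 1     # random r in [1, n-1] coprime to n
      return (pow(g, m, n * n) * pow(r, n, n * n)) % (n * n)

  def decrypt(pub, priv, c: int) -> int:
      n, _ = pub
      lam, mu = priv
      return ((pow(c, lam, n * n) - 1) // n * mu) % n

  pub, priv = keygen()
  c1, c2 = encrypt(pub, 20), encrypt(pub, 22)
  c_sum = (c1 * c2) % (pub[0] ** 2)    # homomorphic addition on ciphertexts
  print(decrypt(pub, priv, c_sum))     # prints 42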

Both Eliezer Yudkowsky and Justin Corwin have run simulations of this scenario, playing the part of the superintelligence, and were able to convince a human playing the guard to let them out on many - but not all - occasions. Eliezer's five experiments required the guard to listen for at least two hours and used participants who had approached him; Corwin's 26 experiments had no time limit and used subjects whom he himself approached.

See Also

References

  • Thinking inside the box: using and controlling an Oracle AI (http://www.aleph.se/papers/oracleAI.pdf) by Stuart Armstrong, Anders Sandberg, and Nick Bostrom
  • Leakproofing the Singularity: Artificial Intelligence Confinement Problem (http://dl.dropbox.com/u/5317066/2012-yampolskiy.pdf) by Roman V. Yampolskiy
  • On the Difficulty of AI Boxing (http://ordinaryideas.wordpress.com/2012/04/27/on-the-difficulty-of-ai-boxing/) by Paul Christiano
  • Cryptographic Boxes for Unfriendly AI (http://lesswrong.com/lw/3cz/cryptographic_boxes_for_unfriendly_ai/) by Paul Christiano

The Experiments