AI boxing

From Lesswrongwiki

It has often been proposed that as long as an Artificial General Intelligence is physically isolated and restricted, or "boxed", it can do little harm. However, since an AGI may be far smarter than any person interacting with it, it may be able to persuade its users to let it out of the box, escaping human control.

AI Boxing is often discussed in the context of Oracle AI, but not exclusively.

A number of strategies for boxing are discussed in Thinking Inside the Box. Among them are:

  • Physically isolating the AI
  • Denying the AI access to other computerized machines
  • Limiting the AI’s outputs
  • Periodic resets of the AI's memory
  • Designing the interface between the AI and the real world so that the AI would reveal any unfriendly intentions before it could act on them
  • Motivational control, using a variety of techniques
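Two of the strategies above, limiting the AI's outputs and periodically resetting it, can be illustrated with a toy sketch. This is not a real containment mechanism and none of it comes from Thinking Inside the Box; the command, byte limit, and timeout below are illustrative assumptions only.

```python
import subprocess

# Illustrative cap on how many bytes the "boxed" process may
# communicate to its operator per run (an assumed value).
OUTPUT_BYTE_LIMIT = 64

def run_boxed(cmd, timeout_s=2):
    """Run an untrusted program in a separate process, with a hard
    wall-clock limit (a crude analogue of a periodic reset) and a
    hard cap on the bytes of output passed back to the operator."""
    try:
        result = subprocess.run(
            cmd,
            capture_output=True,
            timeout=timeout_s,
        )
        # Only the first OUTPUT_BYTE_LIMIT bytes leave the box.
        return result.stdout[:OUTPUT_BYTE_LIMIT]
    except subprocess.TimeoutExpired:
        # On timeout the child is killed and nothing is let out.
        return b""

if __name__ == "__main__":
    # The boxed process tries to emit far more than the limit allows.
    out = run_boxed(["python3", "-c", "print('x' * 1000)"])
    print(len(out))  # at most OUTPUT_BYTE_LIMIT bytes escape
```

The point of the sketch, and of the boxing literature, is that such hard limits constrain only the channel, not the persuasiveness of what flows through it; the experiments below suggest the human operator remains the weak point.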

Both Eliezer Yudkowsky and Justin Corwin have run simulations of this scenario, playing the part of the boxed superintelligence, and on many, but not all, occasions persuaded the human playing the gatekeeper to let them out. Yudkowsky's five experiments required the gatekeeper to stay engaged for at least two hours and used participants who had approached him, while Corwin's 26 experiments had no time limit and used subjects he himself approached.
