'''Reflective decision theory''' is a term occasionally used to refer to a decision theory that would allow an agent to take actions in a way that does not trigger regret. Under [[Causal Decision Theory]], this regret is conceptualized as a [[Reflective inconsistency]]: a divergence between the agent who took the action and the ''same'' agent reflecting upon it afterward.

==The Newcomb's Problem example==
This problem represents the best example of what [[Eliezer Yudkowsky]] calls the [http://lesswrong.com/lw/nc/newcombs_problem_and_regret_of_rationality/ regret of rationality]. Simply put, consider an alien superintelligence that comes to you and wants to play a simple game:

:He sets two boxes in front of you - Box A and Box B.
:Box A is transparent and contains 1,000 dollars. Box B is opaque and contains either 1,000,000 dollars or nothing.
:You can choose to take ''both'' boxes or to take only Box B.
:The catch is: this superintelligence is a Predictor (whose predictions have so far all been ''correct''), and it put the 1,000,000 dollars in Box B if, and only if, it predicted that you would take only Box B.
:By the time you decide, the alien has already made its prediction and left the scene; you are simply faced with the choice. Both boxes, or Box B alone?
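
The tension can be made precise with a quick expected-value check (a minimal sketch, assuming the agent cares only about expected dollars, where <math>p</math> is the Predictor's accuracy, which the thought experiment takes to be essentially 1):

:<math>E_{\text{one-box}} = 1000000\,p</math>
:<math>E_{\text{two-box}} = 1000 + 1000000\,(1 - p)</math>

Taking only Box B yields the higher expected payoff whenever <math>p > 0.5005</math>, even though, once the boxes are on the table, taking both strictly dominates taking only Box B.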

The dominant view in the literature regards choosing ''both'' boxes as the more rational decision, although the alien actually rewards the supposedly irrational agents who take only Box B. When considering thought experiments such as this, it has been suggested that a sufficiently powerful [[AGI]] would solve them by accessing its own source code and self-modifying. This would allow it to alter its own behavior and decision process, beating the paradox by ''precommitting'' to a certain choice in such situations. The toy simulation below illustrates the payoff gap.
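
As an illustration (a hedged sketch, not drawn from the original discussion: the 95% accuracy figure and all names here are assumptions), the following Python snippet simulates repeated Newcomb games and compares an agent precommitted to one-boxing against a habitual two-boxer:

<pre>
import random

ACCURACY = 0.95  # assumed Predictor accuracy; the thought experiment implies ~1.0

def play_round(agent_one_boxes):
    """Play one Newcomb game and return the agent's payoff in dollars."""
    # The Predictor guesses the agent's choice before the boxes are filled.
    prediction_correct = random.random() < ACCURACY
    predicted_one_box = agent_one_boxes if prediction_correct else not agent_one_boxes
    box_b = 1_000_000 if predicted_one_box else 0
    return box_b if agent_one_boxes else 1_000 + box_b

def average_payoff(agent_one_boxes, rounds=100_000):
    return sum(play_round(agent_one_boxes) for _ in range(rounds)) / rounds

print("precommitted one-boxer:", average_payoff(True))   # approx. 950,000
print("habitual two-boxer:", average_payoff(False))      # approx.  51,000
</pre>

The precommitted agent reliably walks away richer, which is exactly the sense in which the Predictor rewards "irrationality" from the two-boxer's standpoint.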

In order to understand the AGI's behavior in this and other situations, and to be able to implement it, we will have to create a reflectively consistent decision theory. In particular, reflective consistency would be needed to ensure that the AGI preserved a [[Friendly Artificial Intelligence|Friendly]] value system throughout its self-modifications.
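
One way to picture the reflective-consistency requirement is as a guard on self-modification (a deliberately simplified sketch; the Agent class, approves_successor, and the scenario-scoring scheme are all invented here for illustration):

<pre>
class Agent:
    """Minimal stand-in for an agent: a policy plus a utility function."""
    def __init__(self, policy, utility):
        self.policy = policy      # maps a situation to an action
        self.utility = utility    # maps an action to a score

    def act(self, situation):
        return self.policy(situation)

def approves_successor(current, successor, scenarios):
    """Adopt a rewrite only if it scores at least as well as the current
    agent when judged by the CURRENT utility function, so that the
    agent's values survive the modification."""
    def score(agent):
        return sum(current.utility(agent.act(s)) for s in scenarios)
    return score(successor) >= score(current)
</pre>

A reflectively consistent agent endorses its own decision procedure under this kind of self-inspection; an agent that would veto its present self in favor of a modified one is, in that sense, reflectively inconsistent.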

Eliezer Yudkowsky has proposed a theoretical solution to the reflective decision theory problem in his [[Timeless Decision Theory]].

==Further Reading & References==
*[http://intelligence.org/files/TDT.pdf Timeless Decision Theory] by Eliezer Yudkowsky
*[http://johncarlosbaez.wordpress.com/2011/03/07/this-weeks-finds-week-311/ Interview] of Eliezer Yudkowsky by John Baez, March 7th, 2011
 
==See also==
*[[Timeless Decision Theory]]
*[[Complexity of value]]
 
[[Category:Concepts]]
