Reflective decision theory

From Lesswrongwiki
Revision as of 11:14, 7 October 2012 by Pedrochaves (talk | contribs)

Reflective decision theory is a term occasionally used to refer to a decision theory that would allow an agent to take actions in a way that does not trigger regret. This regret is conceptualized, according to Causal Decision Theory, as a reflective inconsistency: a divergence between the agent who took the action and the same agent reflecting upon it afterwards.

When considering thought experiments such as Newcomb's Problem, it has been suggested that a sufficiently powerful AGI would be able to access its own source code and self-modify. This would allow the AGI to alter its own behavior and decision process, beating the paradox by precommitting to a certain choice. To understand the AGI's behavior in this and other situations, and to be able to implement it, we will have to create a reflectively consistent decision theory. In particular, reflective consistency would be needed to ensure that an AGI preserves a friendly value system throughout its self-modifications.
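The role of precommitment can be illustrated with a small sketch. Everything specific in it is an assumption made for illustration and does not come from this article: the conventional payoffs of 1,000,000 and 1,000, a hypothetical predictor that reads the agent's committed policy (standing in for its source code) with a given accuracy, and a deliberately simple model of regret as "the other policy would have had a higher expected payoff".

```python
# Conventional Newcomb payoffs (assumed; the article does not state them).
OPAQUE_PRIZE = 1_000_000    # placed in the opaque box only if one-boxing is predicted
TRANSPARENT_PRIZE = 1_000   # always present in the transparent box

POLICIES = ("one-box", "two-box")


def payoff(policy: str, predictor_accuracy: float = 1.0) -> float:
    """Expected payoff for an agent precommitted to `policy`.

    The hypothetical predictor inspects the committed policy and fills the
    opaque box only when it predicts one-boxing. It predicts correctly with
    probability `predictor_accuracy`.
    """
    if policy not in POLICIES:
        raise ValueError(f"unknown policy: {policy!r}")
    if policy == "one-box":
        # The opaque box is full whenever the prediction is correct.
        return predictor_accuracy * OPAQUE_PRIZE
    # A two-boxer finds the opaque box full only when the predictor errs.
    return (1.0 - predictor_accuracy) * OPAQUE_PRIZE + TRANSPARENT_PRIZE


def regrets(policy: str, predictor_accuracy: float = 1.0) -> bool:
    """Simplified regret: would the other policy have paid more in expectation?"""
    other = "two-box" if policy == "one-box" else "one-box"
    return payoff(other, predictor_accuracy) > payoff(policy, predictor_accuracy)


if __name__ == "__main__":
    for policy in POLICIES:
        print(policy, payoff(policy), "regrets:", regrets(policy))
```

Under these assumptions, an agent that has committed to one-boxing against an accurate predictor has no reason to wish it had committed otherwise, while a committed two-boxer does. This is only a toy model of the reflective consistency discussed above, not a decision theory in its own right.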

Eliezer Yudkowsky has proposed a theoretical solution to the problem in his Timeless Decision Theory.
