Reflective decision theory

From Lesswrongwiki
Revision as of 10:15, 7 October 2012 by Pedrochaves

Reflective decision theory is a term occasionally used to refer to a decision theory that would allow an agent to take actions in a way that does not trigger regret. In the context of Causal Decision Theory, this regret is conceptualized as a reflective inconsistency: a divergence between the agent who took the action and the same agent reflecting on it afterward.

When considering thought experiments such as Newcomb’s Problem, it has been suggested that a sufficiently powerful AGI would be able to access its own source code and self-modify. This would allow the AGI to alter its own behavior and decision process, resolving the paradox by precommitting to a particular choice in such situations. To understand the AGI's behavior in this and other situations, and to be able to implement it, we will need a reflectively consistent decision theory. In particular, reflective consistency would be needed to ensure that an AGI preserves a Friendly value system throughout its self-modifications.
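The inconsistency that precommitment addresses can be illustrated with a toy model of Newcomb’s Problem. This is a minimal sketch, not an implementation of any particular reflective decision theory. It assumes the conventional payoffs ($1,000 in the transparent box, $1,000,000 in the opaque box if one-boxing was predicted) and a hypothetical predictor accuracy parameter; all function names are illustrative. An agent reasoning causally at the moment of choice treats the boxes as already filled and takes both, while the same agent choosing its policy before the prediction is made prefers to be a one-boxer. That gap is the reflective inconsistency a self-modifying agent would want to remove.

```python
# Toy model of Newcomb's Problem (illustrative sketch; names are hypothetical).
SMALL = 1_000        # transparent box: always contains $1,000 (conventional amount)
LARGE = 1_000_000    # opaque box: $1,000,000 iff the predictor predicted one-boxing
ACTIONS = ("one-box", "two-box")


def payoff(action: str, predicted_one_box: bool) -> int:
    """Money received, given the action taken and the predictor's earlier prediction."""
    opaque = LARGE if predicted_one_box else 0
    return opaque if action == "one-box" else opaque + SMALL


def causal_choice(predicted_one_box: bool) -> str:
    """Choice at decision time under causal reasoning: the prediction is already
    fixed, so actions are compared with the box contents held constant."""
    return max(ACTIONS, key=lambda a: payoff(a, predicted_one_box))


def policy_value(policy: str, accuracy: float) -> float:
    """Expected payoff of committing to a policy before the prediction is made,
    assuming (hypothetically) the predictor reads the policy correctly with
    probability `accuracy`."""
    p_one_box_predicted = accuracy if policy == "one-box" else 1 - accuracy
    return (p_one_box_predicted * payoff(policy, True)
            + (1 - p_one_box_predicted) * payoff(policy, False))


def best_precommitment(accuracy: float) -> str:
    """Policy the agent would choose if it could fix its behavior in advance."""
    return max(ACTIONS, key=lambda pol: policy_value(pol, accuracy))


def reflectively_inconsistent(accuracy: float) -> bool:
    """True if the causally chosen action differs from the preferred precommitment,
    whatever the prediction turned out to be."""
    preferred = best_precommitment(accuracy)
    return any(causal_choice(pred) != preferred for pred in (True, False))


if __name__ == "__main__":
    accuracy = 0.99  # assumed predictor accuracy, for illustration only
    print(causal_choice(True), causal_choice(False))  # two-box two-box
    print(best_precommitment(accuracy))               # one-box
    print(reflectively_inconsistent(accuracy))        # True
```

With an accuracy of 0.99, committing to one-boxing is worth about $990,000 in expectation against about $11,000 for two-boxing, yet causal reasoning at the moment of choice still selects two-boxing. In this toy model the preference for one-boxing only appears when the predictor's accuracy exceeds 0.5005; below that, the two views agree.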

Eliezer Yudkowsky has proposed a theoretical solution to the problem in his Timeless Decision Theory.
