AIXI

From Lesswrongwiki
Revision as of 23:31, 23 August 2012 by JoshuaFox (talk | contribs)

AIXI is a mathematical formalism for a hypothetical maximally intelligent agent, developed by Marcus Hutter (2005, 2007). Because AIXI is not computable (though computable variants exist), it does not serve as a design for a real-world AI. It is nonetheless valuable as a theoretical model of intelligence, since it abstracts away the resource limitations that constrain real-world AI and complicate its analysis. It works by simulating all possible sequences of actions into the future to find the best one, treating simpler hypotheses about how the world works as more likely.

Hutter (2007) describes AIXI as a combination of decision theory and algorithmic information theory: "Decision theory formally solves the problem of rational agents in uncertain worlds if the true environmental prior probability distribution is known. Solomonoff’s theory of universal induction formally solves the problem of sequence prediction for unknown prior distribution. We combine both ideas and get a parameterless theory of universal Artificial Intelligence."

AIXI operates within the following agent model: There is an *agent* and an *environment*, which is a computable function unknown to the agent. (The agent therefore needs a probability distribution over the range of possible environments.) On each clock tick, the agent receives an *observation* (a bitstring, which can be read as a number) from the environment, together with a *reward* (another number). The agent then outputs an *action* (another number). On each iteration, the environment produces the observation and reward as a function of the full history of the interaction, and the agent likewise chooses its action as a function of the full history. The agent's intelligence is defined as its expected reward across all environments, with each environment weighted by its complexity so that simpler environments count for more.
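The interaction loop described above can be sketched in a few lines of Python. This is only an illustration of the agent model, not of AIXI itself: the names `run_interaction`, `alternating_env`, `echo_agent` and `constant_agent` are hypothetical, the toy environment is invented for the example, and the choice to let the agent act first on an empty history is an assumption made to keep the loop simple.

```python
from typing import Callable, List, Tuple

# One interaction step: (action, observation, reward).
Step = Tuple[int, int, float]
History = List[Step]
Agent = Callable[[History], int]
Environment = Callable[[History, int], Tuple[int, float]]


def run_interaction(agent: Agent, environment: Environment, ticks: int):
    """Run the agent/environment loop and return (history, total reward)."""
    history: History = []
    total_reward = 0.0
    for _ in range(ticks):
        # The agent sees only the full history so far.
        action = agent(history)
        # The environment also responds as a function of the full history.
        observation, reward = environment(history, action)
        history.append((action, observation, reward))
        total_reward += reward
    return history, total_reward


def alternating_env(history: History, action: int) -> Tuple[int, float]:
    """Hypothetical toy environment: rewards repeating the previous
    observation (0 on the first tick) and emits alternating bits."""
    target = history[-1][1] if history else 0
    reward = 1.0 if action == target else 0.0
    observation = (len(history) + 1) % 2
    return observation, reward


def echo_agent(history: History) -> int:
    """Repeats the most recent observation."""
    return history[-1][1] if history else 0


def constant_agent(history: History) -> int:
    """Ignores the history and always outputs 0."""
    return 0
```

In this toy setting, `run_interaction(echo_agent, alternating_env, 4)` collects the reward on every tick, while `constant_agent` collects it only on every other tick. An intelligence measure in the sense of the paragraph above would average such totals over many environments, weighted toward the simpler ones.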

AIXI guesses at a probability distribution for its environment using Solomonoff induction, a formalization of Occam's razor: simpler environments are considered more likely than more complex ones. It then calculates the expected reward of each action it might choose, weighting the possible environments by this likelihood, and chooses the best action. It performs this calculation by extrapolating its actions into the future recursively, assuming that at each future step it will again choose the best possible action using the same procedure.
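A heavily simplified sketch of this procedure follows. It is not AIXI: it assumes a small, hand-written list of deterministic candidate environments in place of all computable ones, a hand-assigned description length in bits for each (standing in for true program length), and a finite lookahead horizon. All names (`MODELS`, `consistent`, `expectimax`, `choose_action`) are hypothetical. Each candidate gets prior weight 2^-bits, candidates that contradict the history are discarded, and the agent picks the action with the highest expected reward, assuming it will choose the same way at every later step.

```python
from collections import defaultdict

ACTIONS = (0, 1)

# Hypothetical candidate environments: (description length in bits, function).
# Each function maps (history, action) to (observation, reward).
MODELS = [
    (2, lambda history, action: (0, 1.0 if action == 0 else 0.0)),  # "simple"
    (5, lambda history, action: (0, 1.0 if action == 1 else 0.0)),  # "complex"
]


def consistent(env, history):
    """True if a deterministic environment reproduces every past percept."""
    return all(env(history[:i], a) == (o, r)
               for i, (a, o, r) in enumerate(history))


def expectimax(history, hypotheses, depth):
    """Return (value, action) maximizing expected reward over `depth` steps.

    `hypotheses` is a list of (weight, env) pairs still consistent with history.
    """
    if depth == 0 or not hypotheses:
        return 0.0, None
    total = sum(w for w, _ in hypotheses)
    best_value, best_action = float("-inf"), None
    for action in ACTIONS:
        # Group hypotheses by the percept they predict for this action.
        groups = defaultdict(list)
        for w, env in hypotheses:
            groups[env(history, action)].append((w, env))
        value = 0.0
        for (obs, reward), group in groups.items():
            prob = sum(w for w, _ in group) / total
            # Recurse, assuming the best action is chosen at the next step too.
            future, _ = expectimax(history + [(action, obs, reward)],
                                   group, depth - 1)
            value += prob * (reward + future)
        if value > best_value:
            best_value, best_action = value, action
    return best_value, best_action


def choose_action(history, models, horizon):
    """Weight each consistent model by 2^-bits and pick the best action."""
    hypotheses = [(2.0 ** -bits, env) for bits, env in models
                  if consistent(env, history)]
    return expectimax(history, hypotheses, horizon)[1]
```

With an empty history the simpler model dominates, so `choose_action([], MODELS, 2)` favors action 0. After a history such as `[(0, 0, 0.0)]`, in which action 0 earned no reward, the simple model is ruled out and the choice switches to action 1. The full formalism replaces the two-element list with a mixture over all programs, which is the step that makes it uncomputable.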

AIXI is provably the most intelligent unbiased agent in the sense defined above. It is not a feasible AI, however, because Solomonoff induction is not computable and because it evaluates expected value over an infinite set of possible actions on each iteration. It has nonetheless inspired a computable variant, AIXItl, which is restricted to a computation time t and a program length l. AIXItl is provably more intelligent than any other agent bounded by the same time and length. AIXItl is still intractable, but implementable variants such as the Monte Carlo approximation by Veness et al. (2011) have shown promising results on simple general-intelligence test problems.

Eliezer Yudkowsky has pointed out a limitation in AIXI called the "Anvil problem": AIXI lacks a self-model. It extrapolates its own actions into the future indefinitely, on the assumption that it will keep working in the same way in the future. Though AIXI is an abstraction, any real AI would have a physical embodiment that could be damaged, and an implementation which could change its behavior due to bugs; and the AIXI formalism completely ignores these possibilities.

