Back to LessWrong

Solomonoff induction

From Lesswrongwiki

Jump to: navigation, search
Wikipedia has an article about

Ray Solomonoff defined an inference system that will learn to correctly predict any computable sequence with only the absolute minimum amount of data. This system, in a certain sense, is the perfect universal prediction algorithm. To summarize it very informally, Solomonoff induction works by:

  • Starting with all possible hypotheses (sequences) as represented by computer programs (that generate those sequences), weighted by their simplicity (2-n, where n is the program length);
  • Discarding those hypotheses that are inconsistent with the data.

Weighting hypotheses by simplicity, the system automatically incorporates a form of Occam's razor, which is why it has been playfully referred to as Solomonoff's lightsaber.

Solomonoff induction gets off the ground with a solution to the "problem of the priors". Suppose that you stand before a universal prefix Turing machine U. You are interested in a certain finite output string y_{0}. In particular, you want to know the probability that U will produce the output y_{0} given a random input tape. This probability is the Solomonoff a priori probability of y_{0}.

More precisely, suppose that a particular infinite input string x_{0} is about to be fed into U. However, you know nothing about x_{0} other than that each term of the string is either 0 or 1. As far as your state of knowledge is concerned, the ith digit of x_{0} is as likely to be 0 as it is to be 1, for all i = 1, 2, \ldots. You want to find the a priori probability m(y_{0}) of the following proposition:

(*) If U takes in x_{0} as input, then U will produce output y_{0} and then halt.

Unfortunately, computing the exact value of m(y_{0}) would require solving the halting problem, which is undecidable. Nonetheless, it is easy to derive an expression for m(y_{0}). If U halts on an infinite input string x, then U must read only a finite initial segment of x, after which U immediately halts. We call a finite string p a self-delimiting program if and only if there exists an infinite input string x beginning with p such that U halts on x immediately after reading to the end of p. The set \mathcal{P} of self-delimiting programs is the prefix code for U. It is the determination of the elements of \mathcal{P} that requires a solution to the halting problem.

Given p \in \mathcal{P}, we write "\operatorname{prog}(x_{0}) = p" to express the proposition that x_{0} begins with p, and we write "U(p) = y_{0}" to express the proposition that U produces output y_{0}, and then halts, when fed any input beginning with p. Proposition (*) is then equivalent to the exclusive disjunction

\bigvee_{p \in \mathcal{P} \colon U(p) = y_{0}} (\operatorname{prog}(x_{0}) = p).

Since x_{0} was chosen at random from \{0, 1\}^{\omega}, we take the probability of \operatorname{prog}(x_{0}) = p to be 2^{-\ell(p)}, where \ell(p) is the length of p as a bit string. Hence, the probability of (*) is

m(y_{0}) := \sum_{p \in \mathcal{P} \colon U(p) = y_{0}} 2^{-\ell(p)}.

Blog posts

See also