Computations are actions, chosen by their expected utilities.
The utility of a computation derives from its expected effects:
the passage of time (a cost),
a possible revision of the agent's action (the potential benefit).
First insight:
Computation is only worthwhile if it might change our mind about what to do, and if the available options differ in actual utility.
Expected Utility of a Computation
Need to model the outcomes of deterministic computations.
Problem: Bayesian coherence holds only when we condition on "everything" we know, and a coherent agent would already know the outcome of any deterministic computation.
Similar problems arise in Bayesian optimization or quadrature of 'known' functions (Diaconis 2012).
If we treat the outcomes of computations as uncertain, we can be coherent again.
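A minimal sketch of that move, under toy assumptions of ours (none of the names or numbers below are from the talk): put a Gaussian-process-style prior on the outputs of a deterministic but expensive function, and condition on the few evaluations actually performed. The values we haven't computed stay genuinely uncertain to us, so ordinary Bayesian updating applies.

    import numpy as np

    def rbf(xa, xb, ell=1.0):
        # Squared-exponential covariance between two vectors of inputs.
        return np.exp(-0.5 * (xa[:, None] - xb[None, :]) ** 2 / ell ** 2)

    def expensive_f(x):
        return np.sin(3.0 * x)  # stand-in for a costly deterministic computation

    x_seen = np.array([-1.0, 0.2, 0.9])  # inputs where we paid to compute f
    y_seen = expensive_f(x_seen)
    x_new = np.linspace(-2.0, 2.0, 5)    # outputs here are still unknown to us

    K = rbf(x_seen, x_seen) + 1e-8 * np.eye(len(x_seen))  # jitter for stability
    k_star = rbf(x_new, x_seen)
    mean = k_star @ np.linalg.solve(K, y_seen)            # posterior mean of f(x_new)
    var = 1.0 - np.sum(k_star * np.linalg.solve(K, k_star.T).T, axis=1)

    for x, m, v in zip(x_new, mean, var):
        print(f"f({x:+.2f}) ~ N({m:+.2f}, {v:.3f})")  # beliefs about deterministic values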
The expected utility of a computation is just the difference between
a) the expected utility of the default action (acting now), and
b) the expected utility of the action you would choose after making the computation,
minus the cost of the computation itself (the time it takes).
Computing b) is hard, because you don't know yet which action you will choose!
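To make a) and b) concrete, here is a minimal Monte-Carlo sketch under toy assumptions of ours (Gaussian beliefs over two actions' utilities, and a "computation" that returns one noisy observation of each): we average over simulated computation outcomes, choosing the best-looking action under each outcome.

    import numpy as np

    rng = np.random.default_rng(0)

    # Toy assumptions (ours, not from the talk): Gaussian beliefs about the
    # utilities of two actions; the computation yields a noisy look at each.
    prior_mean = np.array([1.0, 0.9])   # current expected utility of each action
    prior_var = np.array([1.0, 1.0])
    noise_var = 0.5

    def expected_utility_of_computation(n_sim=100_000):
        # a) Expected utility of the default action: act now on current beliefs.
        eu_default = prior_mean.max()

        # b) Expected utility of the action chosen *after* the computation:
        #    simulate outcomes, do the conjugate Bayesian update for each,
        #    and pick whichever action then looks best.
        true_u = rng.normal(prior_mean, np.sqrt(prior_var), size=(n_sim, 2))
        obs = rng.normal(true_u, np.sqrt(noise_var))
        gain = prior_var / (prior_var + noise_var)       # Kalman-style weight
        post_mean = prior_mean + gain * (obs - prior_mean)
        chosen = post_mean.argmax(axis=1)
        eu_after = true_u[np.arange(n_sim), chosen].mean()

        return eu_after - eu_default  # subtract any computation cost separately

    print(expected_utility_of_computation())  # small but positive: worth thinking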
Notation gets hairy:
we need to track whose simulation of whose simulation of whose simulation is being used to evaluate each expectation,
e.g. "If I think about moving my bishop, then what will I want to think about next?"
We expect the expected utility of each individual action not to change after a computation,
but we expect the expected value of the best action to increase!
“All research areas look vaguely ok to me now, but if I learn more, I bet that some will seem a lot better than others.”
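In symbols (a standard tower-property and Jensen argument; the notation o for the computation's outcome and U(a) for the utility of action a is ours, not from the slides):

    \mathbb{E}_o\big[\mathbb{E}[U(a) \mid o]\big] = \mathbb{E}[U(a)]
    \quad \text{(tower property: each action's expected utility is a martingale)}

    \mathbb{E}_o\Big[\max_a \mathbb{E}[U(a) \mid o]\Big]
    \;\ge\; \max_a \mathbb{E}_o\big[\mathbb{E}[U(a) \mid o]\big]
    \;=\; \max_a \mathbb{E}[U(a)]
    \quad \text{(Jensen's inequality: the max is convex)}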
Prior Work: Old
(Courtesy of Nick Hay)
Decision-theoretic approach to the metalevel decision problem (Howard, 1966; Matheson, 1968; Good, 1968)
Explored further in AI as interest in probabilistic and decision-theoretic methods grew (Dean and Boddy, 1988; Doyle, 1988; Fehling and Breese, 1988; Horvitz, 1987; Russell and Wefald, 1988, 1991, Do the Right Thing)
Russell and Wefald applied an explicit model of the results of computational actions to the control of game-playing in Othello, with encouraging results
Newer Related Work
UCT for control of Monte-Carlo tree search: simulates randomized sequences of actions from a leaf of the current tree to a terminal state (Kocsis and Szepesvári, 2006); a sketch of the bandit rule inside UCT follows this list.
Practical Bayesian Optimization of Machine Learning Algorithms (Snoek, Larochelle, and Adams, 2012)
Adaptive MCMC in general
Selecting Computations: Theory and Applications (Hay, Russell, Tolpin, and Shimony, UAI 2012)
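For concreteness, here is the bandit rule at the heart of UCT: the standard UCB1 selection formula applied at each tree node. This is a generic sketch of the well-known algorithm, not code from any of the papers above.

    import math

    def ucb1_pick(counts, means, c=math.sqrt(2)):
        # UCB1: pick the child maximizing mean rollout reward plus an
        # exploration bonus that shrinks as a child is visited more often.
        total = sum(counts)
        def score(i):
            if counts[i] == 0:
                return float('inf')  # visit every child at least once
            return means[i] + c * math.sqrt(math.log(total) / counts[i])
        return max(range(len(counts)), key=score)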
Selecting Computations: Theory and Applications (Hay, Russell, Tolpin, Shimony 2012)
The number of computations performed by the optimal policy is bounded.
Also note that UCT and Monte-Carlo tree search are bandit algorithms:
they are biased against trying possibly bad actions, even though simulating a bad action can still be informative,
and they don't take the stopping criterion into account.
The authors show that the meta-greedy policy dominates the bandit algorithms on a toy problem.
They put Hoeffding bounds on the value of information, use those bounds to select the first step of UCT, and then play better Go than simple UCT; a rough sketch of the idea follows.
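An illustrative simplification of that flavor of result (NOT the actual bound from Hay et al., 2012): Hoeffding's inequality bounds the probability that further sampling would overturn the current ranking of actions, which in turn bounds the value of information of another sample; we keep sampling only while that bound exceeds the cost of a sample. Rewards are assumed to lie in [0, 1], so utilities differ by at most 1.

    import math

    def voi_upper_bound(counts, means, arm):
        # Illustrative Hoeffding-style bound, not the exact bound of Hay et al.:
        # assumes rewards in [0, 1] and at least two arms. Another sample of
        # `arm` only matters if its true mean lies on the far side of the gap
        # to its closest competitor; Hoeffding bounds that probability.
        best = max(range(len(means)), key=lambda i: means[i])
        if arm == best:
            other = max((i for i in range(len(means)) if i != best),
                        key=lambda i: means[i])
        else:
            other = best
        gap = abs(means[arm] - means[other])
        return math.exp(-2.0 * counts[arm] * gap * gap)  # tail bound; gain <= 1

    def keep_sampling(counts, means, cost_per_sample):
        # Stop once no arm's VOI bound exceeds the cost of one more sample.
        return any(voi_upper_bound(counts, means, a) > cost_per_sample
                   for a in range(len(counts)))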
Application to Research (From Roger Grosse)
Research requires evaluating ideas that have a small chance of a large payoff.
How do you calibrate your expectation of success, and choose worthwhile projects?
One strategy (that he actually tried):
List project ideas you think have at least a 1% chance of working.
Spend 1 hour on each: think about what's required, do a Google Scholar search, etc.
Choose the best 20%.
Spend 5 hours on each, working through the details.
Choose the best 20%.
Spend 25 hours trying to shoot each one down, and choose the one that survives.
Roger: "I noticed I was vastly overconfident about ideas outside my areas of expertise, and underconfident about ideas inside them."
Conclusion
Meta-reasoning formalizes a currently ad-hoc part of inference.
Early days for the field! Most existing work has been:
Initial formalisms (the '50s and '80s)
Some heuristics for game-tree search (Russell and Wefald, 1990)
Decisions based on uniform bounds (Hay et al., 2012)
Meta-learning (genetic-algorithm work from the '90s)
Cost-sensitive active learning
Time-sensitive model-based optimization (Snoek, Larochelle, and Adams, 2012)
Might be a big opening for model-based meta-reasoning.