Abstract
Levin Tree Search (LTS) is a tree/graph search algorithm guided by a policy: a (conditional and contextual) probability distribution over action sequences. It comes with strong theoretical guarantees on the search effort required before finding a solution, and with its own loss function (i.e., objective function) with which a parameterized policy can be learnt to minimize that search effort.
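To make the guarantee concrete: LTS expands nodes in increasing order of the cost d(n)/π(n), where d(n) is the depth of node n and π(n) is the policy's probability of the action sequence leading to n, and the search effort before a solution node is expanded is bounded in terms of that cost. The Python sketch below illustrates only this expansion ordering under stated assumptions; the callbacks `expand`, `is_goal`, and `policy_prob` are hypothetical placeholders supplied by the search domain, and this is not the paper's implementation (the paper's contribution is the context-model policy, not the search loop itself).

```python
import heapq
from itertools import count

def levin_tree_search(root, expand, is_goal, policy_prob):
    """Minimal sketch of the LTS expansion ordering.

    Nodes are expanded in increasing order of d(n) / pi(n), where d(n) is the
    depth of node n and pi(n) is the probability the policy assigns to the
    action sequence leading to n.
    """
    tie = count()  # tie-breaker so heapq never compares nodes directly
    # Frontier entries: (Levin cost, tie-breaker, node, path probability, depth).
    frontier = [(0.0, next(tie), root, 1.0, 0)]
    expansions = 0
    while frontier:
        _, _, node, pi, depth = heapq.heappop(frontier)
        expansions += 1
        if is_goal(node):
            return node, expansions
        for action, child in expand(node):
            child_pi = pi * policy_prob(node, action)  # probability of the path to child
            if child_pi > 0.0:
                cost = (depth + 1) / child_pi  # Levin cost d(child) / pi(child)
                heapq.heappush(frontier, (cost, next(tie), child, child_pi, depth + 1))
    return None, expansions
```

Under this ordering, nodes reached by short, high-probability action sequences are expanded first, which is what the loss function mentioned above trains the policy to encourage.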
In previous work (NeurIPS 2018, AAAI 2021) we used neural networks to learn the policy, with good results, but neural networks make it hard to derive convergence guarantees.
In this work we devise an alternative parameterized model, based on recent tools from the online compression literature, that keeps the resulting search-effort loss function convex while remaining expressive enough to tackle challenging search problems. Convexity brings several important benefits; in particular, it allows us to prove regret guarantees: given a stream of solution trajectories, the policy quickly converges to the optimal policy in the parameter space. The new model can combine the predictions of both general and specific models and, somewhat like neural networks, can be overparameterized without losing generalization capabilities.
We tackle four classic search domains: Sokoban (Boxoban), The Witness, the 24-puzzle, and the Rubik's Cube. All of these domains are combinatorial, and almost no instance can be solved by random walks. Most known machine learning algorithms fail on at least one of these domains.
We show that our new model, combined with LTS, matches and sometimes surpasses previously published results that use a neural network instead, and that it learns a low-search-effort (i.e., fast) search policy for solving new instances from a test set.
Authors
Laurent Orseau, Marcus Hutter, Levi H. S. Lelis*
Venue
IJCAI 2023