Bayesian Algorithm Execution (BAX)


Read the paper.

Get the code.


Extending Bayesian optimization from estimating global optima to estimating other function properties defined by the output of algorithms

Bayesian Algorithm Execution: Estimating Computable Properties of Black-box Functions Using Mutual Information
Willie Neiswanger, Ke Alexander Wang, Stefano Ermon
International Conference on Machine Learning (ICML) 2021

Project Overview

Suppose we have an expensive, possibly noisy black-box function f.


We can query (evaluate) f at an input x, but we don't know the function's form. In many fields, a common task is to find a global optimum of f via a sequence of queries, and a popular approach to this task is known as Bayesian optimization.

However, suppose we want to estimate some other property of f. Here are some example function properties of interest:

[Figures: examples of function properties of interest]

And suppose this function property of interest is computable. This means there exists some algorithm A that can be run on f and returns the property as output (regardless of how many queries A requires).

For example, if our property is a local minimum of f, there are a variety of local optimization algorithms (e.g. gradient descent, evolutionary algorithms, the Nelder-Mead method) that return a local minimum of f.
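As a toy illustration, here is one such algorithm A in action: a Nelder-Mead run (via scipy.optimize) whose output, a local minimum, is the function property. The function f below is a hypothetical stand-in, not one from the paper.

    import numpy as np
    from scipy.optimize import minimize

    # A stand-in black-box function; in practice, each evaluation is expensive.
    def f(x):
        return np.sin(3 * x[0]) + 0.5 * x[0] ** 2

    # Algorithm A: Nelder-Mead, whose output (a local minimum) is the property
    # we care about. Every call to f during the run counts as one expensive query.
    result = minimize(f, x0=np.array([1.0]), method="Nelder-Mead")
    print("property (local minimum):", result.x, "| queries used:", result.nfev)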


There are many computable function properties, such as optima (global, local, top-k), Pareto frontiers, sub/super-level sets, phase boundaries, roots, integrals, expectations, graph properties, and more.

Bayesian algorithm execution (BAX) is the task of efficiently estimating a computable function property via a probabilistic model, extending Bayesian optimization to function properties beyond global optima. Equivalently, we can view BAX as the task of estimating the output of the algorithm A using as few function queries as possible (regardless of how many queries A itself requires to execute).

We've developed an information-based method for BAX called InfoBAX, which sequentially chooses queries that maximize mutual information with respect to the algorithm's output:

[Figure: the InfoBAX algorithm]
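To make the loop concrete, here is a minimal sketch of the InfoBAX procedure, assuming a hypothetical probabilistic-model interface (posterior_sample and condition are illustrative names, not the released API); the expected_info_gain helper is sketched after the two-stage list below.

    def infobax(f, model, run_algorithm, candidates, n_iters, n_path_samples=30):
        # f: the expensive black-box function.
        # run_algorithm: executes algorithm A on a (sampled) function and
        #   returns the execution path of that run.
        data = []
        for _ in range(n_iters):
            # Stage 1: run A on posterior function samples to get a cached
            # set of execution path samples.
            paths = [run_algorithm(model.posterior_sample())
                     for _ in range(n_path_samples)]
            # Stage 2 (line 2 of the algorithm): query the input that
            # maximizes the expected information gain about A's output.
            x_next = max(candidates,
                         key=lambda x: expected_info_gain(model, x, paths))
            y_next = f(x_next)                       # one expensive query
            data.append((x_next, y_next))
            model = model.condition(x_next, y_next)  # posterior update
        return model, data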

To optimize the acquisition function (line 2 of the algorithm), we proceed in two stages:

  1. Run the algorithm A on posterior function samples to get a set of execution path samples.
  2. Using this cached set of execution path samples, approximate the expected information gain (EIG) at any input x (see the paper for details, and the sketch after this list for the rough idea).
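For a Gaussian-process model, stage 2 can be approximated in closed form, since the entropy of a Gaussian depends only on its variance. The sketch below assumes hypothetical predictive_variance and condition_on methods on the model; it illustrates the EIG computation, not the paper's exact implementation.

    import numpy as np

    def gaussian_entropy(var):
        # Differential entropy of a one-dimensional Gaussian.
        return 0.5 * np.log(2 * np.pi * np.e * var)

    def expected_info_gain(model, x, exe_paths):
        # EIG(x) = H[p(y_x | D)] - (1/J) * sum_j H[p(y_x | D and e_j)],
        # where each e_j is a cached execution path sample: the points that
        # algorithm A visited on the j-th posterior function sample, paired
        # with the sampled values there.
        prior_entropy = gaussian_entropy(model.predictive_variance(x))
        post_entropies = [
            gaussian_entropy(model.condition_on(path).predictive_variance(x))
            for path in exe_paths
        ]
        return prior_entropy - float(np.mean(post_entropies))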

Here is an illustration of these two stages:

[Figures: illustration of stage 1 and stage 2]

This is similar to information-based methods for Bayesian optimization, such as predictive entropy search, and it allows the acquisition function to be optimized efficiently!

After running InfoBAX, we have a posterior estimate of the computable function property (or, equivalently, a posterior estimate of the output of algorithm A).

Some Experimental Results

Estimating shortest paths in graphs

Suppose we have a graph with edge weights defined by a black-box function. We can view the shortest path between two nodes in this graph as a property of this function. To compute the shortest path, we could use a classical graph traversal algorithm such as Dijkstra's algorithm, which iteratively accesses edge weights during its execution.

Suppose, however, that accessing edge weights is expensive. For example, in real-world transportation networks (such as road, air, or shipping networks), accessing each edge weight might involve a costly measurement (e.g. via satellite) or a price negotiation. In these cases, a naive deployment of Dijkstra's algorithm may use many more edge queries than our budget allows. Instead, we aim to use InfoBAX to estimate the output of Dijkstra's algorithm using as few edge queries as possible.
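For reference, here is a plain-Python sketch of Dijkstra's algorithm that counts edge-weight queries explicitly; the neighbors and edge_weight callables are illustrative stand-ins for the graph interface.

    import heapq

    def dijkstra(neighbors, edge_weight, start, goal):
        # neighbors(u) yields the nodes adjacent to u; edge_weight(u, v) is
        # the expensive black-box query whose calls we want to minimize.
        dist = {start: 0.0}
        prev = {}
        visited = set()
        heap = [(0.0, start)]
        queries = 0
        while heap:
            d, u = heapq.heappop(heap)
            if u in visited:
                continue
            visited.add(u)
            if u == goal:
                break
            for v in neighbors(u):
                w = edge_weight(u, v)  # one expensive edge-weight query
                queries += 1
                if d + w < dist.get(v, float("inf")):
                    dist[v] = d + w
                    prev[v] = u
                    heapq.heappush(heap, (d + w, v))
        # Reconstruct the algorithm's output: the shortest path itself.
        path, node = [goal], goal
        while node != start:
            node = prev[node]
            path.append(node)
        return path[::-1], queries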

As a concrete example, suppose we have a grid-shaped graph, where a black-box function over edge weights is shown in green. We can plot the set of edges accessed by Dijkstra's algorithm (black dots) and the shortest path output of the algorithm (dashed black line) between the start and goal nodes (purple and gold boxes).

[Figure: the black-box edge-weight function (green), the edges accessed by Dijkstra's algorithm (black dots), and its shortest-path output (dashed black line)]

Here, Dijkstra's algorithm requires querying over 300 edge weights. To reduce this number, we could instead try the following: query a smaller set of edge weights and infer the shortest path using a probabilistic model. In this setting, InfoBAX chooses the edge queries that most efficiently pin down the shortest path.

We compare different sampling strategies for estimating the output of Dijkstra's algorithm: random search (left), uncertainty sampling (middle), and InfoBAX (right). For each method, the distribution over inferred shortest paths is shown in blue.

[Figures: inferred shortest-path distributions (blue) for random search (left), uncertainty sampling (middle), and InfoBAX (right)]

Using InfoBAX, we can infer the shortest path with high accuracy using a much smaller set of edge queries.
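One way to picture those blue distributions: run Dijkstra's algorithm on posterior samples of the edge-weight function, reusing the dijkstra sketch above (posterior_sample is again a hypothetical method that returns a deterministic sampled function over edges).

    def sample_shortest_paths(model, neighbors, start, goal, n_samples=20):
        # Each posterior function sample induces one candidate shortest path;
        # together they form an empirical posterior over the shortest path.
        paths = []
        for _ in range(n_samples):
            f_sample = model.posterior_sample()  # a deterministic function
            path, _ = dijkstra(neighbors, f_sample, start, goal)
            paths.append(path)
        return paths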

Here's the full story in one figure:

[Figure: summary of the shortest-path experiments]