One-sentence summary:
Extending Bayesian optimization from estimating global optima to estimating other function properties defined by the output of algorithms.
Abstract
In many real world problems, we want to infer some property of an expensive
black-box function \(f\), given a budget of \(T\) function evaluations. One
example is budget constrained global optimization of \(f\), for which Bayesian
optimization is a popular method. Other properties of interest include local
optima, level sets, integrals, or graph-structured information induced by
\(f\). Often, we can find an algorithm \(\mathcal{A}\) to compute the desired
property, but it may require far more than \(T\) queries to execute. Given
such an \(\mathcal{A}\), and a prior distribution over \(f\), we refer to the
problem of inferring the output of \(\mathcal{A}\) using \(T\) evaluations as
Bayesian Algorithm Execution (BAX). To tackle this problem, we present a
procedure, InfoBAX, that sequentially chooses queries that maximize mutual
information with respect to the algorithm's output. Applying this to
Dijkstra's algorithm, for instance, we infer shortest paths in synthetic and
real-world graphs with black-box edge costs. Using evolution strategies, we
yield variants of Bayesian optimization that target local, rather than global,
optima. On these problems, InfoBAX uses up to 500 times fewer queries to \(f\)
than required by the original algorithm. Our method is closely connected to
other Bayesian optimal experimental design procedures such as entropy search
methods and optimal sensor placement using Gaussian processes.
Example
This video shows InfoBAX with Dijkstra's algorithm to estimate
shortest paths in graphs with expensive edge-weight queries. The goal is
to estimate the shortest path between the purple and gold squares (where the
dashed line is the true shortest path), using a limited budget of
expensive edge queries (black dots). See details below.
Project Overview
Suppose we have an expensive and possibly-noisy black-box function \(f\).
We can query (evaluate) \(f\) at an input \(x\), but we don't know the function. In
many fields, a common task is to find a global optimum of \(f\) via a sequence of
queries—and a popular approach is known as Bayesian optimization.
However, suppose we want to estimate some other property of \(f\). Here are some
example function properties of interest (see also the image at the top of this page):
Now suppose this function property of interest is computable.
This means there exists some algorithm \(\mathcal{A}\) that can be run on \(f\) and
returns the property as output (regardless of how many queries \(\mathcal{A}\) requires).
For example, if our property is a local minimum of \(f\), there exists a
variety of local optimization algorithms (e.g. gradient descent, evolutionary
algorithms, the Nelder-Mead method) that return a local minimum of \(f\).
There are many computable function properties, such as optima variants (global,
local, top-\(k\)), Pareto frontiers, sub/super-level sets, phase boundaries, roots,
integrals, expectations, graph properties, and more.
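To make this concrete, here is a minimal Python sketch of what "computable" means here: the property is simply whatever an algorithm \(\mathcal{A}\) returns after interacting with \(f\) only through queries. In this illustrative example the property is a root of \(f\) and \(\mathcal{A}\) is Brent's method from SciPy; the function \(f\) and the bracketing interval are made-up stand-ins for an expensive black box.

```python
# A minimal sketch: a "computable property" of f is the output of an algorithm
# A that only interacts with f through queries.  Here the property is a root
# of f and A is Brent's method.  The function f and the bracketing interval
# are illustrative stand-ins for an expensive black-box function.
import math
from scipy.optimize import brentq

query_count = 0

def f(x):
    """Stand-in for an expensive black-box function."""
    global query_count
    query_count += 1
    return math.cos(2.0 * x) - 0.3 * x

# Running A directly on f computes the property, at the cost of many queries.
root = brentq(f, a=0.0, b=2.0)
print("root of f (the property):", root)
print("queries to f used by A:", query_count)
```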
Bayesian algorithm execution (BAX) is the task
of efficiently estimating a computable function property, via a probabilistic model
(i.e. extending Bayesian optimization to function properties beyond global optima).
Equivalently, we can view BAX as the task of estimating the output of the algorithm
\(\mathcal{A}\) using as few function queries as possible (regardless of how many
queries \(\mathcal{A}\) requires to execute).
We've developed an information-based method for BAX called InfoBAX,
which sequentially chooses queries that maximize mutual information with respect to the
algorithm's output:
To optimize the acquisition function (line 2), there are two stages:
1. Run the algorithm \(\mathcal{A}\) on posterior function samples to get a set of execution path samples.
2. Using the cached set of execution path samples, approximate the expected information gain (EIG) for any input \(x\).
See the paper for details.
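To give a sense of how these two stages fit together, here is a small self-contained Python sketch (NumPy only), assuming a GP model over a one-dimensional \(f\) with an RBF kernel and Gaussian noise. The stand-in algorithm \(\mathcal{A}\), kernel, hyperparameters, and data below are purely illustrative, not the implementation used in the paper.

```python
# A minimal sketch of the two acquisition stages, assuming a GP model over a
# 1-D function with an RBF kernel and Gaussian noise.  The "algorithm" below
# is a stand-in (it returns its top-5 grid points as an execution path); in
# InfoBAX it would be e.g. Dijkstra's algorithm or an evolution strategy.
import numpy as np

rng = np.random.default_rng(0)
noise = 1e-3  # observation noise variance (illustrative)

def rbf_kernel(A, B, lengthscale=0.3):
    d2 = (A[:, None, 0] - B[None, :, 0]) ** 2
    return np.exp(-0.5 * d2 / lengthscale**2)

def gp_posterior(X_train, y_train, X_test):
    """GP posterior mean and covariance at X_test given (X_train, y_train)."""
    K = rbf_kernel(X_train, X_train) + noise * np.eye(len(X_train))
    Ks = rbf_kernel(X_train, X_test)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))
    v = np.linalg.solve(L, Ks)
    return Ks.T @ alpha, rbf_kernel(X_test, X_test) - v.T @ v

def run_algorithm(f_values, X_grid):
    """Stand-in algorithm A; its execution path is the (x, y) pairs it queries."""
    idx = np.argsort(-f_values)[:5]
    return X_grid[idx], f_values[idx]

# Current data D_t and a grid of candidate query locations.
X_grid = np.linspace(0, 1, 50)[:, None]
X_obs = np.array([[0.1], [0.5], [0.9]])
y_obs = np.sin(6 * X_obs[:, 0])
mean, cov = gp_posterior(X_obs, y_obs, X_grid)

# Stage 1: run A on posterior function samples to get execution path samples.
paths = []
for _ in range(20):
    f_sample = rng.multivariate_normal(mean, cov + 1e-8 * np.eye(len(X_grid)))
    paths.append(run_algorithm(f_sample, X_grid))

# Stage 2: approximate EIG(x) = H(y_x | D_t) - E_e[ H(y_x | D_t, e) ] at every
# candidate x, where each term is the entropy of a Gaussian predictive.
def gauss_entropy(var):
    return 0.5 * np.log(2 * np.pi * np.e * (var + noise))

eig = gauss_entropy(np.diag(cov))
for X_e, y_e in paths:
    _, cov_e = gp_posterior(np.vstack([X_obs, X_e]),
                            np.concatenate([y_obs, y_e]), X_grid)
    eig = eig - gauss_entropy(np.diag(cov_e)) / len(paths)

x_next = X_grid[np.argmax(eig)]  # next query: the EIG maximizer over the grid
print("next query to f:", x_next)
```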
Here is an illustration of these two stages:
This is similar to information-based methods for Bayesian optimization, such as
predictive entropy search, and it allows
the acquisition function to be optimized efficiently!
After running InfoBAX, we have a posterior estimate of the computable function
property (or, equivalently, a posterior estimate of the output of algorithm
\(\mathcal{A}\)).
Here's the full story in one figure:
Some Experimental Results
Estimating shortest paths in graphs
Suppose we have a graph with edge weights defined by a black-box function. We can
view the shortest path between two nodes in this graph as a property of this function.
To compute the shortest path, we could use a classical graph traversal algorithm such as
Dijkstra's algorithm,
which iteratively accesses edge weights during its execution.
Suppose, however, that accessing edge weights is expensive. For example, in
real-world transportation networks—such as road, air, or shipping networks—accessing
each edge weight might involve a costly measurement (e.g. via satellite) or price
negotiation. In these cases, a naive deployment of Dijkstra's algorithm may use many
more edge queries than our budget allows. Instead, we aim to use InfoBAX to
estimate the output of Dijkstra's algorithm using as few edge queries as possible.
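For intuition about the baseline cost, here is a small Python sketch using NetworkX's Dijkstra implementation with a weight callback that counts every edge-weight lookup as an expensive query. The grid graph, the synthetic edge-cost function, and the start/goal nodes are illustrative stand-ins, not the graphs from our experiments.

```python
# A minimal sketch of how a naive run of Dijkstra's algorithm can exhaust a
# query budget when every edge-weight lookup is expensive.  The grid graph,
# the synthetic edge-cost function, and the start/goal nodes are illustrative.
import math
import networkx as nx

G = nx.grid_2d_graph(10, 10)  # 10x10 grid graph with nodes (i, j)
query_count = 0

def edge_cost(u, v, attrs):
    """Black-box edge weight; every call counts as one expensive query
    (for simplicity, repeated lookups of the same edge are not cached)."""
    global query_count
    query_count += 1
    (x1, y1), (x2, y2) = u, v
    return 1.0 + math.sin(0.5 * (x1 + x2)) ** 2 + 0.1 * (y1 + y2)

start, goal = (0, 0), (9, 9)
path = nx.dijkstra_path(G, start, goal, weight=edge_cost)
print("shortest path length (in nodes):", len(path))
print("edge-weight queries used by Dijkstra:", query_count)
```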
As a concrete example, suppose we have a grid-shaped graph, where a black-box
function over edge weights is shown in green. We can plot the set of edges accessed by
Dijkstra's algorithm (black dots) and the shortest path output of the algorithm (dashed
black line) between the start and goal nodes (purple and gold boxes).
Here, Dijkstra's algorithm requires querying over 300 edge weights. To reduce this
number, we could instead try the following: query a smaller set of edge weights and
infer the shortest path using a probabilistic model. In this setting, InfoBAX is
designed to choose edge queries to most efficiently infer the shortest path.
We compare different sampling strategies for estimating the output of Dijkstra's
algorithm: random search (left), uncertainty sampling (middle), and
InfoBAX (right). For each method, the distribution over inferred shortest paths
is shown in blue.
Using InfoBAX, we can infer the shortest path with high accuracy using a much
smaller set of edge queries.
Here is an animation showing InfoBAX with Dijkstra's algorithm on a larger grid graph:
Bayesian local optimization
While Bayesian optimization is a popular method for global optimization,
there also exist many local optimization algorithms such as
evolution strategies,
the Nelder-Mead method, and
finite-difference gradient descent,
which may show strong performance in certain settings—such as in high dimensions—when function
evaluations are cheap and many queries can be made.
We instead propose to run InfoBAX with a local optimization algorithm. This yields a
variant of Bayesian optimization for expensive black-box functions that targets local,
rather than global, optima. The main idea is that we approach Bayesian optimization as
the task of inferring the output of a local optimization algorithm run on \(f\), rather
than inferring a global optimum of \(f\).
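As a rough sketch of the kind of local optimization algorithm \(\mathcal{A}\) we have in mind, here is a simple (1+1)-style evolution strategy in Python. The test function, mutation scale, and iteration count are illustrative; a real run would typically use far more queries than the budget \(T\) allows, which is exactly why we estimate its output instead of running it directly.

```python
# A minimal sketch of a local optimization algorithm A (a simple (1+1)-style
# evolution strategy) whose output, a local optimum of f, is the property we
# want InfoBAX to infer.  The test function, step size, and iteration count
# are illustrative stand-ins.
import numpy as np

rng = np.random.default_rng(0)
query_count = 0

def f(x):
    """Stand-in for an expensive 2-D black-box function with several optima."""
    global query_count
    query_count += 1
    return np.sin(3 * x[0]) * np.cos(3 * x[1]) + 0.1 * np.sum(x ** 2)

def evolution_strategy(f, x0, sigma=0.2, iters=100):
    """(1+1)-ES: keep a single parent, accept mutations that improve f."""
    x, fx = x0, f(x0)
    for _ in range(iters):
        cand = x + sigma * rng.standard_normal(x.shape)
        fcand = f(cand)
        if fcand < fx:        # minimize f
            x, fx = cand, fcand
    return x                  # algorithm output: a local minimum

x_local = evolution_strategy(f, x0=np.array([1.0, 1.0]))
print("local optimum found by A:", x_local)
print("function queries used by A:", query_count)
```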
As a concrete example, suppose we have a two-dimensional black-box function \(f\),
shown in green, with three optima (yellow squares). We can plot the set of queries made
by an evolution strategy algorithm (black dots), and the algorithm output (pink star).
Here, the evolution strategy makes over 200 function queries. To reduce this
number, we can apply InfoBAX and choose queries that efficiently estimate the
output of the evolution strategy.
We show the sampling pattern after 18 queries chosen by random search (left),
max-value entropy search (middle; a Bayesian global optimization algorithm),
and InfoBAX (right).
Here is an animation showing InfoBAX with an evolution strategy algorithm. Posterior
samples of the algorithm execution path and output are respectively shown as blue and
purple dots, and the next query (i.e. acquisition function optimizer) is shown as a pink
dot.
Top-\(k\) estimation
Given a finite set of points \(X\), we may wish to estimate the subset of \(k\)
points that have the highest value under a black-box function \(f\). This generalizes
standard Bayesian optimization (i.e. top-1 estimation) to top-\(k\) estimation. For
example, this type of task appears in materials discovery
and interactive learning problems.
If we had an unlimited budget, we could simply deploy a top-\(k\) algorithm: scan
through each point \(x \in X\), query its value \(f(x)\), and afterwards sort the points
in decreasing order by their values and return the first \(k\) points. However, this
requires us to query each point in \(X\).
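The exhaustive procedure just described is easy to write down. Here is a minimal Python sketch of it, which makes the \(|X|\)-query cost explicit; the set \(X\), the stand-in function \(f\), and the value of \(k\) are all illustrative.

```python
# A minimal sketch of the top-k algorithm A described above: query f at every
# point in X, sort by value, and return the k highest-valued points.  The set
# X and the function f are illustrative; running A directly costs |X| queries.
import numpy as np

rng = np.random.default_rng(0)
query_count = 0

def f(x):
    """Stand-in for an expensive black-box function."""
    global query_count
    query_count += 1
    return float(np.sin(4 * x[0]) + np.cos(2 * x[1]))

def top_k(f, X, k):
    """Algorithm A: exhaustively query f on X and return the top-k points."""
    values = [f(x) for x in X]               # one query per point in X
    order = np.argsort(values)[::-1][:k]     # sort in decreasing order
    return [X[i] for i in order]

X = [rng.uniform(-1, 1, size=2) for _ in range(200)]
best = top_k(f, X, k=5)
print("top-k points (the property):", best)
print("queries used by A:", query_count)     # equals |X| = 200
```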
Instead, we run InfoBAX with this top-\(k\) algorithm. In the following
animation, each point in the set \(X\) is shown as a grey dot, and the (ground truth)
top-\(k\) points in \(X\) are shown as gold stars. At each iteration, posterior samples
of the top-\(k\) points are shown by blue squares, and the next query (i.e. acquisition
function optimizer) is shown as a pink dot. Intuitively, we want the blue squares to
concentrate their mass on the gold stars in as few queries as possible.
BibTeX
@inproceedings{neiswanger2021bayesian,
title = {Bayesian Algorithm Execution: Estimating Computable Properties of Black-box Functions Using Mutual Information},
author = {Neiswanger, Willie and Wang, Ke Alexander and Ermon, Stefano},
booktitle = {International Conference on Machine Learning},
year = {2021},
organization = {PMLR}
}