Generalizing Bayesian Optimization with Decision-theoretic Entropies

Stanford University
*The first two authors contributed equally to this work.

One-sentence summary: We develop a Bayesian optimization procedure based on a decision-theoretic generalization of entropy, which can be tailored to custom optimization and other sequential decision-making tasks.

Abstract

Bayesian optimization (BO) is a popular method for efficiently inferring optima of an expensive black-box function via a sequence of queries. Existing information-theoretic BO procedures aim to make queries that most reduce the uncertainty about optima, where the uncertainty is captured by Shannon entropy. However, an optimal measure of uncertainty would, ideally, factor in how we intend to use the inferred quantity in some downstream procedure. In this paper, we instead consider a generalization of Shannon entropy from work in statistical decision theory (DeGroot 1962, Rao 1984), which contains a broad class of uncertainty measures parameterized by a problem-specific loss function corresponding to a downstream task. We first show that special cases of this entropy lead to popular acquisition functions used in BO procedures such as knowledge gradient, expected improvement, and entropy search. We then show how alternative choices for the loss yield a flexible family of acquisition functions that can be customized for use in novel optimization settings. Additionally, we develop gradient-based methods to efficiently optimize our proposed family of acquisition functions, and demonstrate strong empirical performance on a diverse set of sequential decision making tasks, including variants of top-\(k\) optimization, multi-level set estimation, and sequence search.

Example

The following figures illustrate a few example acquisition functions as special cases of our framework, with their corresponding Bayes actions \(a^*\) visualized. For each, we write the associated action set \(\mathcal{A}\) and loss function \(\ell\) below the plot. In each plot, the true function is a solid black line, the posterior mean is a red dashed line, the observed data are black dots, and the Bayes action is shown in gold.
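To make the framework concrete, the decision-theoretic entropy of DeGroot and Rao is the expected loss achieved by the Bayes action: \(H_\ell(p) = \inf_{a \in \mathcal{A}} \mathbb{E}_{f \sim p}[\ell(f, a)]\). The sketch below estimates this quantity by Monte Carlo over posterior samples; the function and variable names (`decision_entropy`, `posterior_samples`, and the toy regret loss) are illustrative choices for this page, not identifiers from the paper's codebase.

```python
import numpy as np

def decision_entropy(posterior_samples, action_set, loss):
    """Monte Carlo estimate of H_ell(p) = inf_a E_{f~p}[loss(f, a)].

    Returns the expected loss of the best (Bayes) action over a finite
    action set, along with that action.
    """
    expected_losses = [
        np.mean([loss(f, a) for f in posterior_samples]) for a in action_set
    ]
    best = int(np.argmin(expected_losses))
    return expected_losses[best], action_set[best]

# Toy example (hypothetical setup): actions are candidate maximizer
# locations on a grid, and the loss is the regret max_x f(x) - f(a) of
# guessing location a. The "posterior samples" here are synthetic.
rng = np.random.default_rng(0)
grid = np.linspace(0, 1, 50)
samples = [np.sin(6 * grid + rng.normal(0, 0.3)) for _ in range(100)]
regret = lambda f, a: f.max() - f[a]

entropy, a_star = decision_entropy(samples, list(range(len(grid))), regret)
```

Swapping in a different action set \(\mathcal{A}\) and loss \(\ell\) (e.g. a set of \(k\) locations with a top-\(k\) value loss) recovers the other special cases shown below.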


BibTeX

@article{neiswanger2022generalizing,
  title         = {Generalizing Bayesian Optimization with Decision-theoretic Entropies},
  author        = {Neiswanger, Willie and Yu, Lantao and Zhao, Shengjia and Meng, Chenlin and Ermon, Stefano},
  journal       = {Advances in Neural Information Processing Systems},
  year          = {2022}
}