## Overview

Computing infrastructure is increasing in complexity, and many software systems have hundreds or thousands of configuration parameters. For example,

- Applications: data stores and processing tools can have hundreds of configuration knobs.
- Runtimes: modern JVMs have more than 700 tunable parameters.
- Environments: containers, OS kernels, and VMs have dozens of settings.

To tune these configurations, performance engineers or DevOps teams must
manually study how different knob settings affect desired performance
criteria, such as latency, throughput, and cost. This is especially challenging
because the performance of these systems tends to depend on the hardware and
the workload, whose characteristics may be entirely or partly unknown. As
systems grow more complex and the number deployed in an organization explodes,
this manual approach does not scale. See *Figure 1* for examples of performance
criteria to optimize and knobs to tune at various infrastructure layers.

**Figure 1.** Examples of performance criteria to optimize and knobs to tune at
various infrastructure layers.

## Methods

Our approach is to run efficient experiments with different configuration parameters in a staging or pre-production environment and choose the settings that optimize the desired criteria. The tuned systems can then be deployed to production. Because each experiment can be quite expensive, our methods are designed to find optimal parameters in as few experiments as possible.
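To make the setup concrete, the outer workflow can be pictured as a harness that deploys a candidate configuration to staging, runs the workload, and records the criterion. This is a minimal sketch under assumed names: `run_benchmark` stands in for an expensive staging experiment, and the naive enumeration over candidates is exactly what the adaptive algorithms described next are designed to avoid.

```python
def tune(candidate_configs, run_benchmark):
    """Try each candidate in staging and return the best one.

    `run_benchmark` deploys a configuration and returns the measured
    criterion (e.g., p99 latency); here it is a hypothetical stand-in
    for a real, expensive experiment.
    """
    results = {}
    for cfg in candidate_configs:
        results[cfg] = run_benchmark(cfg)  # one staging run per call
    best = min(results, key=results.get)   # lowest latency wins
    return best, results

# Toy stand-in: latency as a function of a single "parallelism" knob.
best, results = tune(
    candidate_configs=(2, 4, 8, 16),
    run_benchmark=lambda p: abs(p - 8) + 1.0,
)
```

Each call to `run_benchmark` is costly, which is why exhaustively sweeping candidates as above quickly becomes infeasible and experiment selection must be done intelligently.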

To choose these experiments, we are developing Bayesian optimization (also
known as bandits with Bayesian models) algorithms for efficient optimization
of real-world systems. The algorithms maintain a statistical model that
predicts outcomes for unexplored configurations and quantifies uncertainty
about those predictions. This model feeds into our design recommendation
algorithm, which uses the predictions and uncertainty estimates to suggest
designs to test that yield the most insight for the given criteria. When an
experiment completes, the model is updated with the results, enabling the
algorithm to suggest better designs in subsequent iterations. This
optimization loop is illustrated in *Figure 2*.

**Figure 2.** Optimization loop for configuration tuning using our machine learning system.
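The loop in *Figure 2* can be sketched end to end with a Gaussian-process surrogate and an expected-improvement acquisition function. Everything below is a toy illustration, not our production system: `latency` is a hypothetical stand-in for an expensive staging experiment, and the single normalized knob, kernel choice, and candidate grid are all assumptions.

```python
import numpy as np
from math import erf, sqrt, pi

def latency(x):
    """Hypothetical stand-in for an expensive staging experiment:
    measured latency as a function of one knob normalized to [0, 1]."""
    return (x - 0.7) ** 2 + 0.1 * np.sin(8 * x)

def rbf_kernel(a, b, length_scale=0.2):
    """Squared-exponential kernel between two 1-D point sets."""
    d = a.reshape(-1, 1) - b.reshape(1, -1)
    return np.exp(-0.5 * (d / length_scale) ** 2)

def gp_posterior(x_obs, y_obs, x_query, noise=1e-4):
    """Gaussian-process posterior mean and std-dev at query points."""
    k = rbf_kernel(x_obs, x_obs) + noise * np.eye(len(x_obs))
    k_s = rbf_kernel(x_obs, x_query)
    k_inv = np.linalg.inv(k)
    mu = k_s.T @ k_inv @ y_obs
    var = 1.0 - np.sum(k_s * (k_inv @ k_s), axis=0)
    return mu, np.sqrt(np.maximum(var, 1e-12))

def expected_improvement(mu, sigma, best):
    """EI for minimization: expected drop below the current best outcome."""
    z = (best - mu) / sigma
    cdf = 0.5 * (1.0 + np.vectorize(erf)(z / sqrt(2.0)))
    pdf = np.exp(-0.5 * z ** 2) / sqrt(2.0 * pi)
    return (best - mu) * cdf + sigma * pdf

rng = np.random.default_rng(0)
candidates = np.linspace(0.0, 1.0, 201)

# Bootstrap the model with a few random experiments.
x_obs = rng.uniform(0.0, 1.0, size=3)
y_obs = latency(x_obs)
initial_best = y_obs.min()

# Optimization loop: model -> acquisition -> experiment -> model update.
for _ in range(10):
    mu, sigma = gp_posterior(x_obs, y_obs, candidates)
    ei = expected_improvement(mu, sigma, y_obs.min())
    x_next = candidates[np.argmax(ei)]           # most promising design
    x_obs = np.append(x_obs, x_next)
    y_obs = np.append(y_obs, latency(x_next))    # "run" the experiment

best_y = y_obs.min()
```

The acquisition step is what makes the loop sample-efficient: expected improvement trades off exploiting configurations the model already predicts to be fast against exploring regions where its uncertainty is high.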

## Example: Stream Processing

Preliminary results show promising performance when jointly tuning the
configurations of multiple applications. *Figure 3* shows results for a
self-optimizing stream processing system that auto-tunes its configurations to
reduce latency. We evaluate on the Yahoo Streaming Benchmark (YSB), where we
deploy a data pipeline using Spark, Kafka, and Redis. We jointly tune over 41
configuration knobs from all three systems. In one hour of tuning, using a
distributed implementation on a dozen AWS instances, we reduce the 99th
percentile latency of YSB by 77 percent.

**Figure 3.** A self-optimizing data stream processing pipeline (left), and
configuration tuning results (right) on the Yahoo Streaming Benchmark.