The goal of the demand-side platform (DSP) is to find and buy the best ad inventory in the competitive market. While doing so the DSP should observe several important constraints. First, the DSP should evenly spend the budget with the time, without much of the overspending and underspending at every given time period. Second, the DSP must ensure that at each given time it tries to buy the best possible inventory available on the market. These constraints must be fulfilled under an extreme load, our DSP processes millions of bid requests for second in multiple locations distributed around the world.

The incoming bid requests are evaluated by our machine learning (ML) engine, which is trained to select the best opportunities for a particular advertiser in terms of conversion probability. Each bid request is assigned a score from 0 to 1, the higher the score the higher chances the shown impression will result in a conversion action by a user. After scoring, we need to make a decision if this inventory is worth buying. On one side of the equation, we want to buy only the best inventory to increase the expectation of conversion and decrease the cost per action. On the other side, we must buy enough inventory to spend the assigned budget. If we are too picky in the inventory we underspend the budget. If we are not picky enough, we buy the inventory that we should have passed achieving suboptimal performance. We cannot significantly overspend the budget - the bidding just stops if we do so - and we miss on good opportunities.

To make the problem worse, the supply of the inventory is highly unpredictable. Although it demonstrates some daily and weekly seasonality, there are some rapid random variations in supply that are hard to predict. Another problem comes from the nature of the real-time bidding system. We must make a decision on each bid request independently because global system synchronizations are prohibitively expensive. For the same reason, our computational solutions cannot be computationally extensive.

This problem looks like an ideal use case for a control theory application. Formally, we have a control - the pickiness level, a score which we use as a lower threshold for the bought inventory, and the outcome - the pace. The pace is the number of dollars spent per time period relative to the budget for this period. If pace > 1 (overpacing) we are spending more money than we should, contrary, if pace < 1 we not spending the whole budget. A known solution to this problem is a PID controller. Just types of automations are used from spaceships for course correction to flow control in pluming systems. This controller constantly evaluates the error in the outcome (pace) and tunes the control (pickness level) to minimize the error on the next step. PID here stands for "proportional", "integral", "derivative". What this really means, that to make a decision on the next pickiness level the controller evaluates three factors: (proportional) how far are we from the target right now, (integral) how well we are doing historically, and (derivative) are we moving in the right direction.

The algorithm for this controller is very simple and can be implemented within a few lines of code in any programming language:

- store previous_error = 0 and integral_error = 0
- calculate the current in pace error = (1-pace)/(1+pace) we use a symmetrical formula to equate the errors in over- and underpacing. Other loss functions can be used depending on priorities.
- calculate derivative error error_d = (error - previous_error)/dt and integral error error_i = integral_error + error*dt, where dt is the time between evaluations
- calculate the new_pickiness_level = pickiness_level + (K * error + K_i * error_i + K_d * error_d)
- store previous_error and integral_error
- wait dt
- go to 2

To demonstrate how this algorithm works we apply it to one of the flights (advertising line item running on Roku DSP). Here is how the supply looks for this flight:

[Image]

Suppose we want to spend $20 every 5 minutes of this flight. Here is how the spend looks like with PID controller evaluating the pickiness level every 5 minutes

[image]

While the noise in the supply contributes to some overspend and underspend at each given period, the long-term spend stays on track and the relative budget error stays within 0.5%

[image]

In conclusion, we demonstrate that simple application of the control theory can improve performance of the real-time bidding systems looking for the best inventory under ever-changing conditions.