Budget optimization is a familiar concept to most performance marketers: algorithms automatically shift spend between different ads or ad sets depending on results. These algorithms are so deeply ingrained in many ad platforms that it can be difficult to imagine a world without them.
For all their ubiquity, though, budget optimization algorithms are often misunderstood. Shifts in spend aren’t always immediately apparent, and even when they are, they can seem contrary to logic. Today’s topic is: How do budget optimization algorithms work?
Leaving marketing aside for a minute, let’s talk about something called the multi-armed bandit (MAB) problem. This goofy-sounding (but important) problem from probability theory is the mathematical basis for budget optimization.
Here’s how the problem works in a nutshell:
Imagine you’re in a casino with 10 different slot machines. Every time you pull the arm of a slot machine, you receive a cash reward.
Three critical factors are at work:
- Each slot machine has a reward distribution. A distribution is simply a set of probabilities that describe how likely the slot machine is to give different rewards.
- The reward distribution for each slot machine is unique. That is, every slot machine has a unique chance of giving you certain rewards. Some machines may typically return high rewards, while some will typically return smaller rewards.
- You don’t know any of the reward distributions in advance.
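To make the setup concrete, here is a minimal Python sketch of the casino. The `SlotMachine` class, the `make_casino` helper, the Gaussian reward shape, and the range of hidden means are all assumptions chosen for illustration; the problem itself does not prescribe any particular distribution.

```python
import random

class SlotMachine:
    """One arm: pays a reward drawn from a hidden distribution (assumed Gaussian)."""

    def __init__(self, mean, spread, rng):
        self._mean = mean      # hidden from the player
        self._spread = spread
        self._rng = rng

    def pull(self):
        # Clip at zero so a pull never returns a negative reward.
        return max(0.0, self._rng.gauss(self._mean, self._spread))

def make_casino(n_machines=10, seed=0):
    """Build machines whose hidden mean rewards differ from one another."""
    rng = random.Random(seed)
    return [SlotMachine(rng.uniform(1.0, 10.0), 2.0, rng) for _ in range(n_machines)]

casino = make_casino()
first_reward = casino[0].pull()  # the player only ever sees rewards, never the means
```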
Now, let’s say you’re then told that you can pull the slot machine arms a total of 100 times. How do you maximize your reward?
Taking a Naïve Gamble
A naïve approach to the problem might suggest pulling each slot machine’s arm the same number of times: Since there are 10 machines, in this scenario you’d pull each slot machine’s arm 10 times.
The reasoning behind this method is that distributing pulls equally between the slot machines gives you the best chance to compare the rewards from each slot machine at the end of your 100 pulls.
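The equal-split approach can be sketched as follows. The `pull` and `equal_split` functions and the Gaussian rewards are hypothetical stand-ins for the slot machines, not part of any real library.

```python
import random

def pull(true_mean, rng):
    # Hypothetical reward model: Gaussian noise around a hidden mean, never negative.
    return max(0.0, rng.gauss(true_mean, 2.0))

def equal_split(true_means, total_pulls, rng):
    """Pull every machine the same number of times and report what was observed."""
    pulls_each = total_pulls // len(true_means)
    rewards = [[pull(m, rng) for _ in range(pulls_each)] for m in true_means]
    averages = [sum(r) / len(r) for r in rewards]
    total = sum(sum(r) for r in rewards)
    return averages, total

rng = random.Random(42)
true_means = [rng.uniform(1.0, 10.0) for _ in range(10)]  # unknown to the player
averages, total = equal_split(true_means, 100, rng)
best_guess = max(range(10), key=lambda i: averages[i])
```

Note that `best_guess` only becomes available after all 100 pulls are spent, which is the weakness the next paragraphs address.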
Even before you get to your full 100 pulls, however, you’ll probably notice that some slot machines appear to be rewarding you more than others.
Intuitively, if you notice this after you start, then you should switch strategies and start pulling the arms of the higher-rewarding slot machines more often than the low-rewarding ones. This is where we need to think about exploration and exploitation.
Exploration vs. Exploitation
The mathematical solutions to MAB problems are often framed in terms of exploration and exploitation.
In your first few pulls, you have no idea which slot machines are going to generate the best rewards. Therefore, you must explore the different slot machines, and use your pulls simply to get a good understanding of which slot machines appear the most profitable.
Once you start to get an indication that some slot machines are more profitable, you can move into exploitation, where you focus your pulls on the slot machines that appear to be the most lucrative.
Exploration and exploitation happen at the same time. You might start off with all of your pulls being exploratory, then gradually shift more of them to exploitative pulls as you become more confident in which slot machines provide the greatest reward.
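One common way to express this gradual shift is an epsilon-greedy strategy with a decaying exploration rate. The sketch below is one possible version, not the only solution to the problem; the decay schedule in `exploration_rate` and the Gaussian reward model are assumptions made for illustration.

```python
import random

def exploration_rate(t, n_machines):
    # Assumed schedule: starts at 100% exploration and decays toward 0%.
    return 1.0 / (1.0 + t / n_machines)

def decaying_epsilon_greedy(true_means, total_pulls, rng):
    n = len(true_means)
    counts = [0] * n        # pulls per machine
    averages = [0.0] * n    # running average reward per machine
    exploratory = []        # True where a pull was an exploration pull
    total = 0.0
    for t in range(total_pulls):
        explore = rng.random() < exploration_rate(t, n)
        if explore:
            arm = rng.randrange(n)                           # try any machine
        else:
            arm = max(range(n), key=lambda i: averages[i])   # best so far
        reward = max(0.0, rng.gauss(true_means[arm], 2.0))
        counts[arm] += 1
        averages[arm] += (reward - averages[arm]) / counts[arm]
        total += reward
        exploratory.append(explore)
    return counts, total, exploratory

rng = random.Random(7)
true_means = [rng.uniform(1.0, 10.0) for _ in range(10)]
counts, total, exploratory = decaying_epsilon_greedy(true_means, 100, rng)
```

The very first pull is always exploratory, and by pull 90 the exploration rate under this schedule has dropped to 10%.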
Introducing a Time Component
Assuming that the reward distribution of each slot machine doesn’t change over time, given enough pulls you’ll eventually reach a point where your pulls are effectively 0% exploration and 100% exploitation. In other words, you’ll become confident about which slot machine is best and pull it nearly every time.
If we introduce a time component that allows the reward distribution of each slot machine to change over time, the situation becomes more complex. In this scenario, you’ll never know for sure that one slot machine is the best, as the reward distributions could change subtly with every pull.
To account for this, you always need to reserve some of your pulls for exploration. You must constantly explore the less profitable slot machines in case they become more profitable over time.
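A minimal sketch of this idea keeps a fixed share of pulls exploratory and weights recent rewards more heavily, so that old observations fade. The constants `EXPLORE_FLOOR` and `STEP`, the two-machine setup, and the midway swap in `play_drifting` are all assumptions picked to keep the example small.

```python
import random

EXPLORE_FLOOR = 0.1  # assumed: 10% of pulls always stay exploratory
STEP = 0.2           # assumed: weight given to the newest reward

def recency_weighted(estimate, reward, step=STEP):
    # Move the estimate a fixed fraction toward the latest reward,
    # so older rewards gradually lose influence.
    return estimate + step * (reward - estimate)

def play_drifting(total_pulls, rng):
    means = [8.0, 3.0]       # machine 0 starts out best
    estimates = [0.0, 0.0]
    counts = [0, 0]
    for t in range(total_pulls):
        if t == total_pulls // 2:
            means = [3.0, 8.0]   # halfway through, the machines swap
        if rng.random() < EXPLORE_FLOOR:
            arm = rng.randrange(2)
        else:
            arm = max(range(2), key=lambda i: estimates[i])
        reward = max(0.0, rng.gauss(means[arm], 1.0))
        estimates[arm] = recency_weighted(estimates[arm], reward)
        counts[arm] += 1
    return estimates, counts

estimates, counts = play_drifting(400, random.Random(3))
```

Without the exploration floor, the player would have little reason to revisit machine 1 after the swap; the occasional exploratory pulls are what allow the estimates to catch up.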
What MAB Means to Marketers
In a marketing context, the slot machines become ads, ad sets, or other creative you’re trying to optimize. Instead of pulling the slot machines’ arms, we’re looking at where we should spend our next dollar of advertising budget.
In the same way that different slot machines have different reward distributions, different ads have different conversion distributions. We don’t know what they are when we run ads for the first time, so it makes sense for optimization algorithms to spend our budget in a highly exploratory way, spreading the budget across multiple ad sets to gauge which generates the best outcomes.
We know in marketing that no ads or ad sets are going to see static performance over time. Market conditions change and users get ad fatigue when they see creative repeatedly.
This is why the solution to budget optimization isn’t to determine which ad set is the best and give that 100% of the budget. The optimal solution is to continuously explore, and dedicate at least some spend toward lower-performing ad sets so that we can react in the event that their performance improves.
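As a rough illustration of "mostly exploit, always explore" applied to budgets, the sketch below reserves a share of the budget to spread across every ad set and gives the rest to the current best performer. The `allocate_budget` function, the 20% `explore_share`, and the sample numbers are hypothetical; this is not how any particular ad platform allocates spend.

```python
def allocate_budget(conversions, spend, budget, explore_share=0.2):
    """Split a budget: a small floor for every ad set, the rest to the best one."""
    n = len(conversions)
    # Conversions per dollar observed so far; ad sets with no spend count as 0.
    rates = [c / s if s > 0 else 0.0 for c, s in zip(conversions, spend)]
    best = max(range(n), key=lambda i: rates[i])
    floor = budget * explore_share / n          # exploration money per ad set
    allocation = [floor] * n
    allocation[best] += budget * (1.0 - explore_share)  # exploitation money
    return allocation

# Hypothetical history for three ad sets, each given $100 so far.
plan = allocate_budget(
    conversions=[30, 12, 5],
    spend=[100.0, 100.0, 100.0],
    budget=1000.0,
)
```

Here the first ad set receives most of the budget, but the other two still get a non-zero share, so a later improvement in their performance would show up in the data.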
The Best Course of Action
Understanding the logic behind budget optimization algorithms is crucial to understanding how they act. While certain aspects of their behavior, such as allocating traffic to your worst ads, might seem odd or counterintuitive, remember that they are playing out a solution to a well-studied mathematical problem.
Allowing the algorithms to continue exploring all available options is the key to long-term efficiency in budget allocation.