(No, MVT is not just advanced A/B testing)
There’s a lot to learn about split testing (A/B testing) and multivariate testing (MVT), and there’s a lot of industry debate about whether and when to use each. But don’t fret! This post seeks to clarify issues surrounding A/B testing and MVT by providing information about the following:
- Understanding A/B and MVT
- How to choose
- Which comes first?
- Results interpretation
- Managing stakeholder expectations
- Conclusion (tl;dr)
- Resources & Interesting Articles
Understanding A/B and MVT
Split Testing (aka A/B or A/B/n) traditionally focuses on evaluating the impact of changing one element at a time. It splits traffic across two choices (or more, when A/B/n testing is applied to several choices) and is a tool for deciding between A and B. Want to know whether one version of an element outperforms another at improving a page’s conversion rate? This is the tool to use.
Multivariate testing (MVT) is a way of testing that allows you to evaluate multiple element options simultaneously and measure their interactions with each other, while also identifying which elements have the greatest impact on your desired KPI. MVT is a tool for determining profitable associations (AB or AC or BC). You want to know which combination of elements work best, given your hypothesis? This is the tool to use.
There are two kinds of MVT: In a Full Factorial MVT, all possible variants are created and tested across equal shares of traffic. In a Partial (or Fractional) Factorial MVT, only a selection of the possible variants is created and tested. Those results are less precise, but they require less traffic.
Each platform and each analyst uses different nomenclature, but all MVTs have these core elements:
- Element – the element on the page you want to change (headline, hero image, CTA, offer, price, etc.)
- Version – each element will have 2 or more versions you want to test
- Variant – a unique combination of element-versions to make a recipe
An MVT Example for our Palates instead of our Screens:
Say we wanted to make toast, because toast is delicious. Why couldn’t we perform an MVT toast-test at a party, provided we had enough friends? Say we have enough friends (traffic). Say we made sourdough toast with butter and honey for a previous party (the control), and it was a big hit, but we want to see if we can do better, so we’ll introduce new versions of our elements.
The elements of our MVT toast-test are bread, something creamy, and something sweet. The versions are sourdough or grainy wheat bread, butter or a soft, melty cheese, and honey or cherry compote. But we can’t just test wheat, cheese, and cherry against our control! If our second recipe won or lost against the control, we wouldn’t know why (maybe the cherry compote was too sour? Maybe the grainy wheat was too dense?). Plus, if we only compared the two toast recipes, we wouldn’t know whether a previously unconsidered combination would actually win (honey and cherries??).
So, if we design a full-factorial MVT toast-test, the total number of variants is the number of versions of element 1 × the number of versions of element 2 × the number of versions of element 3, etc. With two versions of each of our three elements, that’s 2 × 2 × 2 = 8 variants: the control plus 7 new combinations.
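Under the hood, building a full factorial is just taking the Cartesian product of the element versions. Here is a minimal Python sketch of the toast example (the names match the example above; nothing here is a real testing-tool API):

```python
from itertools import product

# Elements of the toast MVT, each with two versions.
# The first version of each element is the control version.
elements = {
    "bread": ["sourdough", "grainy wheat"],
    "creamy": ["butter", "melty cheese"],
    "sweet": ["honey", "cherry compote"],
}

# Full factorial: one variant per combination of element versions.
variants = list(product(*elements.values()))

print(len(variants))  # prints 8 (2 x 2 x 2: the control + 7 variations)
for bread, creamy, sweet in variants:
    print(f"{bread} + {creamy} + {sweet}")
```

The first combination printed (sourdough + butter + honey) is the control recipe; the other seven are the variations our party guests would need to taste.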
While it is certainly possible to use split-testing methodology when making multiple changes to a page, you sacrifice insight and the ability to scale what you learn. If you change the heading, the hero image, and the Call to Action (CTA) all at once and outperform the control, you won’t know whether the heading, the hero, or the CTA drove the improvement. Nor will you know whether an interaction between any two (or all three) of the elements amplified the result, or whether the improvement would have been larger with only one or two of the three changes.
In contrast, MVT allows you to learn all of that—and be able to scale that learning for future testing. MVT is specifically designed to measure the impact of changes to each element as well as the interaction of each element to the others. In that way, MVT allows you to test multiple elements at the same time without sacrificing the insights and scalability.
How to Choose
Choosing whether to use split testing or MVT is about as mysterious as choosing whether to use a hammer or a screwdriver: They’re each good tools for different tasks. While each tool has its pros and cons, the determining factor for which tool to use should always be the task at hand.
Each tool requires a testable hypothesis for it to be effective. To their peril, analysts sometimes skip this step when they’re planning MVT experiments. Further, there’s a misconception that an MVT plan will allow analysts to do work that split testing keeps them from doing. Keep in mind that MVT will allow for incredible insights, but only if an analyst has done upfront analysis and hypothesis formation to ensure the test runs on the most advantageous elements and with the most applicable versions of those elements.
In theory, nearly every split test could be run as an MVT. In practice, though, it makes sense for only about 1 in 10 tests to be an MVT, for two reasons. First, MVT requires more resources to set up and analyze. Second, MVT results are more difficult to interpret and communicate, and that makes it harder to drive decision-making, which should be the primary goal of your testing program.
Below, I’ve highlighted the pros and cons of split testing and MVT and given frequent use scenarios for each.
Pros/Cons of each
Split Testing Pros:
- Easy to design, interpret, and communicate
- Faster results for decision-making (less traffic required)
- Lower design/development cost
Split Testing Cons:
- Frequently misunderstood/poorly designed tests with limited learning obtained
- Requirement to test only one element per test limits the number of answers you can get at any one time
MVT Pros:
- Helps you identify which elements are most impactful to your Key Performance Indicator (KPI)
- Measures the interaction of each element on the others
- Ability to combine multiple tests into single MVT to evaluate interactions
MVT Cons:
- Significant increase in design, development, QA, and analysis time required
- Results can be difficult to interpret and even more troublesome to communicate effectively, reducing your ability to derive insights and drive decision-making
- Considerable increase in traffic / test runtime required for readout
- Frequently used as a “test everything at the same time” approach without solid evidence supporting rationale for all element versions
- Propensity for “nonsense” variants to be created
- Frequent need to follow an MVT with split test to confirm winning “variant” due to MVTs having increased likelihood of false positives
Use cases (when to use / what problems)
- Newly designed layout or page and want to figure out where to focus optimization efforts – MVT
- Spending a lot of time and resources designing multiple versions of hero images and want evidence to support the importance of hero images – MVT
- Limited available data to support hypothesis for why page is underperforming due to lack of tags or age of page – MVT
- Looking for insights that you can scale throughout the site and across other digital properties – MVT
- Already know which element is the most important on the page and want to find the right version – A/B
- Clear hypothesis supported by data recommending specific changes to single element – A/B
Which Comes First?
It’s important to remember that some jobs may require both split testing and MVT, just as some jobs may require both a hammer and a screwdriver. Based on test outcomes and the goals of your tests, you might need to perform a split test after an MVT to validate the MVT results or its winning variant. We do this for two really good reasons. First, MVTs have a higher false-positive rate than A/B tests, simply because the likelihood of a false positive grows with the number of variants. Second, Partial/Fractional MVTs can produce a winning variant that the tool identified from modeled interaction measurements but that was never actually shown to live traffic.
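The false-positive math is easy to sketch: if each variant-vs-control comparison runs at significance level alpha, the chance of at least one false positive across k independent comparisons is 1 - (1 - alpha)^k. A quick Python illustration (the alpha = 0.05 level and the independence assumption are mine, not the post’s):

```python
def familywise_error_rate(alpha: float, comparisons: int) -> float:
    """Chance of at least one false positive across independent comparisons."""
    return 1 - (1 - alpha) ** comparisons

# A simple A/B test: one comparison against the control.
print(round(familywise_error_rate(0.05, 1), 3))  # prints 0.05

# A full-factorial MVT with 7 variations vs. the control.
print(round(familywise_error_rate(0.05, 7), 3))  # prints 0.302
```

Roughly a 30% chance of at least one spurious “winner” in the 8-variant toast test, which is exactly why a confirming split test after an MVT is good practice.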
You might also need to perform an MVT after a split test, to incrementally improve elements of the winning split-test design: a split test may select a new page or layout, and an MVT can then be used to incrementally improve that page or layout across multiple variants. An MVT can also help identify which elements matter most to page performance.
Results Interpretation
One of the most important (and most frequently disregarded) elements of any optimization test is the ability to interpret the results.
Split test results are fairly easy to interpret. Even so, they run the risk of misstating the impact if the analyst does not understand, or cannot adequately explain, confidence intervals.
MVT results can be very challenging to interpret, as they give more than just recipe B lift over recipe A with X confidence. MVTs also include interaction effects and element contribution. Interaction effects are the measure of how each element interacts with other elements on the page to increase (or decrease) the KPI impact. Element contribution speaks to the importance of a given element on the KPI outcomes.
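As a concrete illustration of an interaction effect, here is the textbook difference-in-differences calculation for a 2×2 slice of the toast test. The conversion rates below are entirely made up for the example:

```python
# Hypothetical conversion rates for four toast variants
# (bread version x sweet version).
rates = {
    ("sourdough", "honey"): 0.050,   # the control recipe
    ("sourdough", "cherry"): 0.055,
    ("wheat", "honey"): 0.052,
    ("wheat", "cherry"): 0.070,
}

# Effect of switching honey -> cherry on each bread:
cherry_on_sourdough = rates[("sourdough", "cherry")] - rates[("sourdough", "honey")]
cherry_on_wheat = rates[("wheat", "cherry")] - rates[("wheat", "honey")]

# Interaction: does the cherry effect depend on the bread?
interaction = cherry_on_wheat - cherry_on_sourdough
print(round(interaction, 3))  # prints 0.013
```

A nonzero interaction (here, cherry compote helps far more on wheat than on sourdough) is exactly the kind of relationship a one-element-at-a-time split test can never surface.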
Confidence intervals & margin of error: it’s important to report the “lifts” measured in both split and multivariate testing as ranges that reflect the confidence intervals and margin of error, not as single point estimates.
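One common way to put a margin of error on a conversion rate is the normal-approximation interval, p ± z·sqrt(p(1-p)/n). The visitor and conversion counts below are hypothetical, chosen only to show how overlapping ranges soften a headline “lift”:

```python
import math

def conversion_ci(conversions: int, visitors: int, z: float = 1.96):
    """95% confidence interval for a conversion rate (normal approximation)."""
    p = conversions / visitors
    margin = z * math.sqrt(p * (1 - p) / visitors)
    return p - margin, p + margin

# Hypothetical results: control converts 500/10,000; variant converts 560/10,000.
lo_a, hi_a = conversion_ci(500, 10_000)
lo_b, hi_b = conversion_ci(560, 10_000)

print(f"Control: {lo_a:.4f} to {hi_a:.4f}")  # roughly 0.0457 to 0.0543
print(f"Variant: {lo_b:.4f} to {hi_b:.4f}")  # roughly 0.0515 to 0.0605
```

The point estimates suggest a 12% relative lift, but the overlapping intervals show why the ranges, not just the headline number, belong in the readout.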
Managing Stakeholder Expectations
Being able to communicate results is an important part of A/B and MVT testing. However, it is equally important to communicate your expectations for the trajectory of the entire testing project up front. You’ll need to communicate your plan and expectations, not just your results.
The most important part of an effective optimization program is that program’s ability to drive change and help the organization make better decisions. To do this, the optimization program owners must be able to effectively communicate both the results and their recommended actions.
MVT results can be more difficult to comprehend and explain, so it’s important not to overwhelm stakeholders with data. Stick to a few core insights, for example these:
- Most impactful element—the element that appears to matter most in moving visitors to where you want them to be.
- Least impactful element—the element where you can feel free to run lower-confidence tests or targeting going forward without risking conversion.
- Element interactions—where any two elements seem to impact each other, that relationship must be explained and recommendations provided for how to consider future tests or site updates.
- Most successful variant—create a visual clearly showing the most successful combination of element versions. If numbers exist (the most successful variant is not always actually tested when partial/fractional factorial methods are used), share its lift and confidence over the control. If the winning variant was not tested, recommend a follow-up A/B test of the existing control vs. the winning variant.
MVT recommendations might include:
- Most important element to consider when creating new page layouts / designs for tests or live site
- Element relationships to consider when creating new page layouts / designs for tests or live site
- Recommended roll-out of winning variant OR follow-up A/B with winning variant
TL;DR (Too Long; Didn’t Read for those of us older than millennials…)
Hopefully this article has debunked the faulty ideas that MVT is somehow superior to split testing because it is more sophisticated, or that MVT is an “advanced version” of split testing that should be used for everything. Neither of these ideas is true: Each method is simply another tool in the toolbox. When deciding whether to use either of these methods, don’t start with the tool; start with the problem. If the problem is a screw, you need a screwdriver. If the problem is a nail, you need a hammer. Communication is the clearest way to discern the problems—and the solutions—at hand.
Resources & Interesting Articles