A/B Testing Can Solve Your Marketing Debates
If you’re a marketer, you have likely been in a meeting room at some point in your career (if not at some point this week) when your colleagues did not agree on the creative direction of a campaign. The strategist will vote for version A because they believe the concept fits best into the brand’s long-term roadmap. The designer wants to go with version B because it has the most visual impact. The analyst recommends version C because of some numbers they’ve seen on this audience before. Then, finally, there’s version D, which is in the running because it’s the CEO’s favorite.
If you’ve ever been in this situation, A/B testing is the answer to getting your campaigns on the best possible track, scientifically. The world’s best marketers succeed largely because they are emotionally in-tune with what customers want, but at the end of the day, success is measured quantitatively by how much revenue is generated. The objective use of numbers will help you make the best marketing decisions for your unique products and customers.
What is A/B testing?
A/B testing, or split testing (it works with more than two versions), is a form of scientific testing that identifies the best-performing version, measured by conversions, when only one variable is changed (e.g., copy, color, or send time).
A/B testing is not new, and you’ve likely heard of it, yet the situation laid out above is still happening in conference rooms around the world. Perhaps marketers are intimidated by the math involved or reluctant to invest their resources in a testing plan and execution. Whatever the case may be for you and your organization, proper A/B testing is virtually guaranteed to provide a return on your investment.
Why A/B test?
If you know how to harness the overwhelming amount of information available today, existing data can be a great way to solve problems. Here are a few ways DEG uses existing data:
- Web Analytics – using visitor path analysis to help guide user experience design.
- Descriptive Segmentation – creating an RFM-e/RF scoring to target specific segments of a customer base.
- Goal Projections – relying on past performance to set realistic expectations for the future.
That said, there are several pitfalls that come when historical data is used exclusively:
- The self-fulfilling prophecy – If you determine that your Twitter audience has traditionally replied to your tweets most often at 2 p.m., and you begin to tweet regularly at 2 p.m., performance during this time will likely continue to outperform times you are not as focused on.
- Correlation does not imply causation – If you have ever read the book Freakonomics, you know that correlations can be a funny thing. Oftentimes, marketers get the causation backwards. Do your Facebook fans really spend more money on your website, or do the people spending more money on your website become Facebook fans?
- Past data is not always predictive of future behavior – Descriptive segmentations that use past behavior to score customers do not account for life changes. You may end up targeting a previously high-spending customer as a VIP, while they may have just decided to save every last penny to buy a new home and will no longer be a customer.
A/B testing takes the guesswork out of the data by changing only one variable at a time, so you know exactly what is causing lifts (or drops) in performance. Let’s take a look at A/B testing in action by answering the age-old question: when is the best time to send email? This is a great question to answer through A/B testing because the answer is likely different for every type of business. Our partner, ExactTarget, explored this question last month in a new video segment featuring Jay Baer. In the video, Jay mentions that there is no one best time that works for everybody. A/B testing is how you find the time that works best for your company.
We have a great example to share because DEG was recently challenged to determine the best time to send email for a global client that sends emails to fifteen different countries and in eight different languages. Would their subscribers behave similarly across borders, languages, and cultures? We started answering these questions by plotting the data that we already had to determine whether this case was worth investing in. If we saw that clicks were steady regardless of the email send time, we probably would not have elected to move forward with an A/B test. Here is what we found in North America:
As you can see, some obvious trends emerge, most notably a spike during the 1:00 p.m. hour. We determined that this spike was driven by the aforementioned self-fulfilling prophecy effect. Our email team was getting to the office at eight or nine in the morning, building emails, and by the time they were sent…well, you can guess what time it was: one o’clock in the afternoon. Because we saw several interesting patterns in most of the countries we reviewed, we decided to scientifically A/B test send times to see if we could formally identify the best time to email our customers.
To do this, we randomly split our subscriber list into six even buckets and sent each segment the exact same email, on the same day, at 12:30 a.m., 4:30 a.m., 8:30 a.m., 12:30 p.m., 4:30 p.m., and 8:30 p.m., respectively. When A/B testing, it is key to ensure that only one variable changes; otherwise you can’t be certain which variable is impacting your results. In our case, the only variable that changed was the send time. Here are our results in North America when the tested data is added:
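To make the mechanics concrete, here is a minimal Python sketch of that kind of split. The subscriber IDs, list size, and helper name are hypothetical illustrations, not DEG’s actual tooling:

```python
import random

def split_into_buckets(subscribers, n_buckets=6, seed=42):
    """Randomly split a subscriber list into n roughly even buckets.

    Hypothetical helper for illustration: shuffling with a fixed seed
    keeps the split reproducible while still being random.
    """
    shuffled = subscribers[:]  # copy so the original list is untouched
    random.Random(seed).shuffle(shuffled)
    # Deal subscribers out round-robin so bucket sizes differ by at most one
    return [shuffled[i::n_buckets] for i in range(n_buckets)]

# Illustrative numbers only: 60,000 subscribers across six send times
send_times = ["12:30 a.m.", "4:30 a.m.", "8:30 a.m.",
              "12:30 p.m.", "4:30 p.m.", "8:30 p.m."]
subscribers = [f"user_{i}" for i in range(60000)]
buckets = split_into_buckets(subscribers, n_buckets=len(send_times))
for time, bucket in zip(send_times, buckets):
    print(time, len(bucket))  # each bucket gets 10,000 subscribers
```

Each bucket then receives the identical email at its assigned time, so send time is the only variable in play.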
Notice the large shift in peak click time. A/B testing revealed that our audience was more active during the 9:00 a.m. hour than during the 1:00 p.m. hour, and that we had previously been sending emails at a suboptimal time – potentially leaving dollars on the table!
Lastly, we tested these results for statistical significance. If you do not have a background in statistics or an analytics team, there are plenty of free, reliable calculators on the web that can help you do this. We determined that sending emails during the 8:00 p.m. hour was statistically worse at 90% confidence. If you are seeing trends, but they are not backed by statistical significance, move forward with the winning result that passes the eyeball test, but beware that you may be committing a Type I error by believing a relationship exists when it actually doesn’t.
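If you would rather run the check yourself than use an online calculator, a two-proportion z-test is one standard way to compare click rates between two send-time buckets. The sketch below uses only Python’s standard library; the click and send counts are illustrative placeholders, not the campaign’s real numbers:

```python
from math import sqrt, erf

def two_proportion_z_test(clicks_a, sends_a, clicks_b, sends_b):
    """Two-proportion z-test comparing click rates of two buckets.

    Returns (z statistic, two-sided p-value).
    """
    p_a, p_b = clicks_a / sends_a, clicks_b / sends_b
    # Pooled click rate under the null hypothesis of no difference
    p_pool = (clicks_a + clicks_b) / (sends_a + sends_b)
    se = sqrt(p_pool * (1 - p_pool) * (1 / sends_a + 1 / sends_b))
    z = (p_a - p_b) / se
    # Two-sided p-value from the standard normal CDF
    p_value = 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))
    return z, p_value

# Hypothetical counts: 9:00 a.m. bucket vs. 8:00 p.m. bucket
z, p = two_proportion_z_test(520, 10000, 430, 10000)
print(f"z = {z:.2f}, p = {p:.3f}")
# At 90% confidence, the difference is significant when p < 0.10
```

If the p-value clears your confidence threshold, you can act on the winning send time knowing the lift is unlikely to be noise.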
The principles and approach applied to this type of email send-time study work for any variable. So the next time you are in a meeting arguing over creative, cut to the chase and figure out who is right with A/B testing. Need some inspiration? Here is a compiled list of 101 things you could be testing.