AB testing is interesting because it promises objectivity. You run a test, one variation performs better, there is a winner and a loser, people’s opinions no longer matter, life is simple.

But then comes the AA test…

To set up an AA test, you run a regular test but with the exact same website in both conditions. AA tests are painful because they force you to acknowledge just how random and meaningless AB test results can be.
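You can see this randomness without touching a live site. The sketch below (a simulation, not real data — the 5% conversion rate and visitor counts are made-up assumptions) runs a thousand simulated AA tests where both "variations" are identical, and counts how often a standard significance test still declares a winner:

```python
import math
import random

def two_proportion_p_value(conv_a, n_a, conv_b, n_b):
    """Two-sided z-test for a difference between two conversion rates."""
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    if se == 0:
        return 1.0
    z = (conv_a / n_a - conv_b / n_b) / se
    # Two-sided p-value via the normal CDF (math.erf)
    return 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))

random.seed(42)
TRUE_RATE = 0.05   # assumed conversion rate -- same "website" in both arms
VISITORS = 2000    # assumed visitors per arm, per test
RUNS = 1000

false_positives = 0
for _ in range(RUNS):
    a = sum(random.random() < TRUE_RATE for _ in range(VISITORS))
    b = sum(random.random() < TRUE_RATE for _ in range(VISITORS))
    if two_proportion_p_value(a, VISITORS, b, VISITORS) < 0.05:
        false_positives += 1

print(f"'Winners' found in identical AA tests: {false_positives}/{RUNS}")
```

With a 5% significance threshold you should expect roughly 5% of these identical-vs-identical tests to produce a "winner" — which is exactly the uncomfortable lesson the AA test teaches.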

To distinguish a real result from chance, you need statistical significance and statistical power, and for both you need large numbers. Large numbers can mean either lots of traffic or a large improvement. The larger the improvement, the less traffic you need to detect it.

Large improvements mean staying away from AB testing things like button colours, font size and grammatical changes. That’s not to say that these micro-changes are not important, only that they’re unlikely to register when you are dealing with small amounts of traffic.

Grouping important changes together backfires when you’re working with large amounts of traffic because an improvement in one area might lead to problems elsewhere. Not being able to isolate effects is risky when the cost of a mistake is high.

On the other hand, a prudent one-change-at-a-time approach could take decades with small amounts of traffic. When improving conversion for a small business with low traffic, the solution is to batch groups of important changes together.

Focus on understanding what people want and give it to them. Leap, don’t tip-toe. Rather than using AB testing to fine-tune, use it to make sure that you are leaping in the right direction.

💬 Comments, questions and feedback all welcome.