Significant or not? Check the validity of your A/B test!

One of the questions I am asked most often about A/B testing is how many users a test needs. Naturally, as with 99% of questions, the answer is the same: it depends!

Sample size – what does it depend on?

Fundamentally, there are 3 things to consider:

  1. Your baseline conversion rate (%)
  2. The minimum relative change you expect from the test (%)
  3. The statistical significance you expect from the test (~95%)

Once you have these, plug them into the Optimizely Sample Size Calculator and you will instantly see the magic number:

Optimizely – Sample Size Calculator

As the example shows, with a 3% baseline conversion rate, a 20% minimum relative change and 95% statistical significance, you need 10,170 people per version.

So if you have 10,000 visitors each week, a two-version A/B test needs 2 × 10,170 = 20,340 visitors in total, which takes roughly two weeks.
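
If you want to see where such a number comes from, here is a minimal sketch of the classic fixed-horizon sample-size formula for comparing two conversion rates. It is not Optimizely's method, so it will not reproduce the 10,170 figure: it also needs a statistical power, which the post does not mention, and the 80% used here is an assumption. With that power it asks for noticeably more visitors than Optimizely's calculator does. The function names are hypothetical.

```python
import math
from statistics import NormalDist


def sample_size_per_variant(baseline, relative_mde, alpha=0.05, power=0.80):
    """Visitors needed in EACH variant for a two-sided two-proportion z-test.

    baseline:     current conversion rate, e.g. 0.03 for 3%
    relative_mde: minimum relative change to detect, e.g. 0.20 for +20%
    alpha:        1 - statistical significance (0.05 means 95%)
    power:        assumed 80%; the post does not specify a power
    """
    p1 = baseline
    p2 = baseline * (1 + relative_mde)   # the rate we want to be able to detect
    p_bar = (p1 + p2) / 2                # average rate if there is no difference
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    numerator = (z_alpha * math.sqrt(2 * p_bar * (1 - p_bar))
                 + z_beta * math.sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2
    return math.ceil(numerator / (p2 - p1) ** 2)


def weeks_to_run(n_per_variant, variants, weekly_visitors):
    """Weeks needed if all traffic is split evenly across the variants."""
    return n_per_variant * variants / weekly_visitors


n = sample_size_per_variant(0.03, 0.20)
print(n, "visitors per variant with the textbook formula")
# Duration using Optimizely's own figure from the example above
print(round(weeks_to_run(10_170, 2, 10_000), 2), "weeks")
```

Whatever calculator you use, the duration step is the same: multiply the per-version sample size by the number of versions and divide by your weekly traffic.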

3 other methods for deciding whether a result is significant or not:

You should know that Optimizely's engine runs its own strict calculation of whether the results are significant. That is fine as it is, but to be on the safe side, I use 3 other tools to check whether the reported results are valid. To be honest, if all 3 of these methods give a positive result, I often don't wait for Optimizely's strict verdict. See below:

1. T-test:

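This is the most classic way to verify an A/B test, and there's a user-friendly, fill-in version available online (e.g. VWO's A/B split test significance calculator). It's a dry science: a p-value < 0.05 means that, if the two versions actually performed the same, a difference at least as large as the one you see would show up by pure chance less than 5% of the time. That is good evidence that the version currently winning really is the better one. But this on its own is not enough.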

VWO, AB split test significance calculator
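
For conversion data, the usual form of this test is the two-proportion z-test, which for samples this large gives practically the same answer as a t-test. Below is a minimal sketch of it, assuming you only have conversions and visitors per version. The function name and the numbers in the example are hypothetical, and online calculators may differ in details such as one- versus two-sided testing.

```python
import math
from statistics import NormalDist


def two_proportion_z_test(conv_a, visitors_a, conv_b, visitors_b):
    """Two-sided z-test for the difference between two conversion rates.

    Returns (z, p_value). A p-value below 0.05 means that, if A and B truly
    converted equally, a gap at least this large would appear by chance
    less than 5% of the time.
    """
    rate_a = conv_a / visitors_a
    rate_b = conv_b / visitors_b
    # Pooled rate: best estimate of the common rate if there is no real difference
    pooled = (conv_a + conv_b) / (visitors_a + visitors_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / visitors_a + 1 / visitors_b))
    z = (rate_b - rate_a) / std_err
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Hypothetical results: 3.0% vs 3.6% conversion with 10,170 visitors per version
z, p = two_proportion_z_test(305, 10_170, 366, 10_170)
print(f"z = {z:.2f}, p = {p:.4f}")
```

With these made-up numbers the p-value comes out below 0.05, so B's lead would pass the test.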

2. Trend charts

Optimizely also shows how the results evolve over time. This is not magic: if you see the same result for two weeks in a row and your t-test gives a good result too, then you can be fairly certain that you have a winner.

Optimizely: AB test trend
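
Optimizely draws this chart for you; the sketch below only shows what such a chart computes, assuming you have daily visitors and conversions per version. It builds the cumulative conversion rates day by day and checks whether the same version led on every one of the last 14 days. The two-week window comes from the text above; requiring the same leader on every single day is a simplifying assumption, and all names and numbers are hypothetical.

```python
from itertools import accumulate


def cumulative_trend(daily_a, daily_b):
    """Cumulative conversion rates per day, like the curves on a trend chart.

    daily_a, daily_b: lists of (visitors, conversions), one tuple per day.
    Returns a list of (day, rate_a, rate_b, rate_b - rate_a).
    """
    vis_a = accumulate(v for v, _ in daily_a)
    conv_a = accumulate(c for _, c in daily_a)
    vis_b = accumulate(v for v, _ in daily_b)
    conv_b = accumulate(c for _, c in daily_b)
    rows = []
    for day, (va, ca, vb, cb) in enumerate(zip(vis_a, conv_a, vis_b, conv_b), start=1):
        rate_a, rate_b = ca / va, cb / vb
        rows.append((day, rate_a, rate_b, rate_b - rate_a))
    return rows


def same_leader(rows, last_n_days=14):
    """True if the same version led on every one of the last `last_n_days` days."""
    diffs = [diff for *_, diff in rows[-last_n_days:]]
    return all(d > 0 for d in diffs) or all(d < 0 for d in diffs)


# Hypothetical two weeks of data: B converts slightly better every day
daily_a = [(700, 21)] * 14
daily_b = [(700, 25)] * 14
rows = cumulative_trend(daily_a, daily_b)
for day, rate_a, rate_b, diff in rows:
    print(f"day {day:2d}: A {rate_a:.2%}  B {rate_b:.2%}  diff {diff:+.2%}")
print("Same leader for 14 days:", same_leader(rows))
```

A stable leader is only half of the picture; it is meant to be read together with the significance test, not instead of it.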

3. AAB(B) test:

This is an expert trick! 😉

Before starting the experiment, it's advisable to set up a second, unedited copy of the original version as well. This way you get two A versions, or even two B versions. Since the two copies are identical, any gap between them is pure noise, so it shows you how big a difference chance alone produces. If there is no real difference between the two identical versions, that's a good sign! Combine that with the trend chart and the t-test, and you can kick the question of significant versus non-significant in the ass!

AABB test – conversions
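
Here is one way to put the AAB check into code, a minimal sketch assuming you have (conversions, visitors) for each of the three versions. It first tests the two identical A copies against each other, where we hope to see no significant difference, and then tests the new version against the original. Pooling the two A copies into one control group for the second comparison is an assumption of this sketch, not necessarily how Optimizely or the author combines them; the function names and numbers are hypothetical.

```python
import math
from statistics import NormalDist


def p_value(conv_1, visitors_1, conv_2, visitors_2):
    """Two-sided p-value of a two-proportion z-test."""
    pooled = (conv_1 + conv_2) / (visitors_1 + visitors_2)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / visitors_1 + 1 / visitors_2))
    z = (conv_2 / visitors_2 - conv_1 / visitors_1) / std_err
    return 2 * (1 - NormalDist().cdf(abs(z)))


def aab_check(a1, a2, b, alpha=0.05):
    """a1, a2: two identical copies of the original; b: the new version.
    Each argument is a (conversions, visitors) tuple."""
    # 1) A vs A: the versions are identical, so any gap here is pure noise
    aa_p = p_value(*a1, *a2)
    # 2) A vs B: pool both A copies into one control group (assumption of this sketch)
    control = (a1[0] + a2[0], a1[1] + a2[1])
    ab_p = p_value(*control, *b)
    return {
        "aa_p": aa_p,
        "aa_consistent": aa_p >= alpha,  # True means the two A copies agree
        "ab_p": ab_p,
        "ab_significant": ab_p < alpha,
    }


# Hypothetical numbers: (conversions, visitors)
result = aab_check(a1=(150, 5_000), a2=(156, 5_000), b=(195, 5_000))
print(result)
```

If `aa_consistent` comes back False, the identical copies already disagree, which points to a setup problem or too small a sample rather than a real winner.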

I think with these 3 quick-and-dirty methods you have everything you need to verify the results of your A/B test.

Also, when you run your first experiment, make sure you follow the 5+1 rules of A/B testing.

If you want to be the first to know about new content on the data36 blog (articles, videos, handbooks, etc.), sign up for the Newsletter!

Tomi Mester

  1. Hello! Regarding AABs, could I get your thoughts on this? This article on Twitter warns against the practice.

    • hey Jose,

      thanks for the link! Very nice article, I enjoyed its pragmatic approach!
      And yep, it's fair reasoning and I fully agree with it! Although, in my experience, many of the risks they draw the readers' attention to do not come up in practice; it's more of an academic debate.

      Anyway, I wrote this article 3 years ago, and since then I have changed many things in my processes, too. In the near future, I will write a new article on how I do A/B testing and significance tests nowadays!

      Thanks and cheers,
