Find out if your A/B test results are statistically significant. No signup required.
Statistical significance tells you whether the difference between your control and variation is real or just due to random chance. When a result is statistically significant, you can be confident that the variation actually performs differently from the control.
In A/B testing, you're comparing two versions of something (a webpage, email, button, etc.) to see which performs better. Without statistical significance, you might make decisions based on random fluctuations in your data.
A 95% confidence level means that if there were truly no difference between the versions, a result at least this extreme would appear less than 5% of the time. This is the industry standard for most business decisions.
Z = (p₂ - p₁) / √[p(1 - p)(1/n₁ + 1/n₂)]
Where:
p₁ and p₂ are the conversion rates of the control and the variation
n₁ and n₂ are the number of visitors in each group
p is the pooled conversion rate: (conversions₁ + conversions₂) / (n₁ + n₂)
It depends on your baseline conversion rate and the size of the difference you want to detect. Generally, you need hundreds to thousands of visitors per variation. Smaller differences require larger sample sizes to detect reliably.
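As a rough guide, the required sample size per variation can be estimated with the standard normal-approximation formula for comparing two proportions. The sketch below is illustrative; the function name, parameters, and defaults (95% confidence, 80% power) are our own choices, not part of this calculator:

```python
from math import ceil, sqrt
from statistics import NormalDist

def sample_size(base_rate, mde_rel, alpha=0.05, power=0.80):
    """Approximate visitors needed per variation to detect a relative
    lift of mde_rel over base_rate (two-tailed normal approximation)."""
    p1 = base_rate
    p2 = base_rate * (1 + mde_rel)              # minimum detectable effect
    z_a = NormalDist().inv_cdf(1 - alpha / 2)   # ≈ 1.96 for alpha = 0.05
    z_b = NormalDist().inv_cdf(power)           # ≈ 0.84 for 80% power
    p_bar = (p1 + p2) / 2                       # average of the two rates
    n = ((z_a * sqrt(2 * p_bar * (1 - p_bar))
          + z_b * sqrt(p1 * (1 - p1) + p2 * (1 - p2))) / (p2 - p1)) ** 2
    return ceil(n)

# Example: 2% baseline, aiming to detect a 20% relative lift
# needs roughly 21,000 visitors per variation.
n = sample_size(0.02, 0.20)
print(n)
```

Note how the numbers bear out the text above: halving the detectable difference roughly quadruples the required sample size, which is why small lifts need so much traffic.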
The p-value is the probability that you'd see a difference this large (or larger) if there was actually no real difference between the groups. A p-value below 0.05 (for 95% confidence) means the result is statistically significant.
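The pooled two-proportion z-test described above can be sketched in a few lines of Python; the function name `ab_test` and the example numbers are hypothetical:

```python
import math

def ab_test(conv_a, n_a, conv_b, n_b, confidence=0.95):
    """Pooled two-proportion z-test: returns (z, p_value, significant)."""
    p1, p2 = conv_a / n_a, conv_b / n_b
    p = (conv_a + conv_b) / (n_a + n_b)           # pooled conversion rate
    se = math.sqrt(p * (1 - p) * (1 / n_a + 1 / n_b))
    z = (p2 - p1) / se
    # Two-tailed p-value via the standard normal CDF (Phi), using math.erf.
    phi = 0.5 * (1 + math.erf(abs(z) / math.sqrt(2)))
    p_value = 2 * (1 - phi)
    return z, p_value, p_value < (1 - confidence)

# Hypothetical test: 200/10,000 conversions vs 260/10,000.
z, p_value, significant = ab_test(200, 10_000, 260, 10_000)
# z ≈ 2.83, p_value ≈ 0.005 → significant at 95% confidence.
```

Because the p-value (≈ 0.005) is below 0.05, this hypothetical result would clear the 95% confidence bar.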
Common reasons include: not enough traffic yet, the real difference is too small to detect, or there genuinely is no difference. Try running the test longer or focusing on larger changes that might produce bigger effects.
No — this is called "peeking" and can lead to false positives. Decide on your sample size before the test starts and run it to completion, or use sequential testing methods designed for continuous monitoring.
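A quick Monte Carlo sketch shows why peeking is dangerous: in an A/A test (both groups have the same true rate, so every "significant" result is a false positive), checking after every batch of traffic inflates the false-positive rate well above the nominal 5%. All parameters below are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
TRIALS, PEEKS, STEP, RATE = 2000, 10, 100, 0.10  # A/A test: no true difference

# Cumulative conversions after each peek, for both groups.
conv_a = rng.binomial(STEP, RATE, size=(TRIALS, PEEKS)).cumsum(axis=1)
conv_b = rng.binomial(STEP, RATE, size=(TRIALS, PEEKS)).cumsum(axis=1)
n = STEP * np.arange(1, PEEKS + 1)       # visitors per group at each peek

p1, p2 = conv_a / n, conv_b / n
pooled = (conv_a + conv_b) / (2 * n)
se = np.sqrt(pooled * (1 - pooled) * (2 / n))
z = np.where(se > 0, (p2 - p1) / np.where(se == 0, 1, se), 0.0)

sig = np.abs(z) > 1.96                   # "significant" at 95% confidence
peek_rate = sig.any(axis=1).mean()       # stop as soon as any peek looks significant
final_rate = sig[:, -1].mean()           # look only once, at the end

print(f"false-positive rate with peeking: {peek_rate:.1%}")
print(f"false-positive rate, single final look: {final_rate:.1%}")
```

With ten peeks, the chance of at least one spurious "win" is typically in the 15–20% range, versus roughly 5% for a single look at the pre-planned sample size.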
This calculator uses a two-tailed test, which checks if there's a difference in either direction (better or worse). One-tailed tests only check one direction. Two-tailed is more conservative and generally recommended.
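The difference is easy to see numerically. For a hypothetical z-score of 1.8, the one-tailed p-value slips under the 0.05 bar while the two-tailed one does not, which is exactly why two-tailed is the more conservative choice:

```python
from statistics import NormalDist

z = 1.8  # hypothetical z-score from an A/B test
phi = NormalDist().cdf
p_one = 1 - phi(z)             # one-tailed: "is B better than A?"
p_two = 2 * (1 - phi(abs(z)))  # two-tailed: "is B different from A?"
print(f"one-tailed p = {p_one:.3f}, two-tailed p = {p_two:.3f}")
# one-tailed p = 0.036, two-tailed p = 0.072
```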