How do I know if my A/B test is statistically significant?

Enter the visitors and conversions for both your control and variant. The calculator runs a two-proportion z-test and reports a p-value. If the p-value is below your chosen threshold (0.05 for 95% confidence), the difference is statistically significant — unlikely to be down to chance — and you can act on it.

What is a good confidence level for an A/B test?

95% is the standard for most marketing and product experiments. It means that if there is truly no difference between the versions, the test will still flag one about 5% of the time (a false positive). Use 99% when the change is costly or risky to reverse, and 90% only for low-stakes, fast iteration where you can tolerate more false alarms.

What does a p-value of 0.05 mean in A/B testing?

A p-value of 0.05 means that if there were truly no difference between the two versions, you'd see a gap this large or larger about 5% of the time by random chance alone. A p-value below 0.05 is the usual line for declaring a result significant at 95% confidence.
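The "5% of the time" reading can be checked with a small simulation. The sketch below is illustrative only: it runs repeated A/A tests (both arms share the same true rate) and counts how often a two-tailed test at alpha = 0.05 flags a difference anyway. The function names, the 10% rate, the sample sizes, and the seed are assumptions chosen for the example, and it uses Python's built-in math.erf rather than the calculator's own approximation.

```python
import math
import random


def two_tailed_p(c_a: int, n_a: int, c_b: int, n_b: int) -> float:
    """Two-tailed p-value from a pooled two-proportion z-test."""
    pooled = (c_a + c_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        return 1.0  # no conversions (or all conversions): nothing to test
    z = (c_b / n_b - c_a / n_a) / se
    phi = 0.5 * (1 + math.erf(abs(z) / math.sqrt(2)))
    return 2 * (1 - phi)


def false_positive_rate(true_rate: float = 0.10, n: int = 500,
                        trials: int = 1000, alpha: float = 0.05,
                        seed: int = 42) -> float:
    """Share of A/A tests (no real difference) that come out 'significant'."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        # Both arms draw conversions from the same true rate.
        c_a = sum(rng.random() < true_rate for _ in range(n))
        c_b = sum(rng.random() < true_rate for _ in range(n))
        if two_tailed_p(c_a, n, c_b, n) < alpha:
            hits += 1
    return hits / trials


if __name__ == "__main__":
    print(f"False-positive rate in A/A tests: {false_positive_rate():.3f}")
```

The printed share should land near 0.05, give or take simulation noise, which is exactly what the threshold promises when there is no real effect.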

How many conversions do I need for a significant A/B test?

There's no fixed number — it depends on your baseline conversion rate and the size of the effect. Small effects on low base rates can need tens of thousands of visitors per variant. Plan the test up front with a sample-size calculator, then use this tool to read the result once data is in.
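For a rough sense of scale, here is a minimal sketch of a common normal-approximation sample-size formula for comparing two proportions. It is not this calculator's code: the function name, the 80% power default, and the two-tailed alpha of 0.05 are assumptions made for illustration, and a dedicated sample-size calculator may use a slightly different formula.

```python
import math
from statistics import NormalDist


def visitors_per_variant(baseline: float, relative_uplift: float,
                         alpha: float = 0.05, power: float = 0.80) -> int:
    """Approximate visitors needed in EACH arm to detect the given uplift."""
    p1 = baseline
    p2 = baseline * (1 + relative_uplift)
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-tailed critical value
    z_beta = NormalDist().inv_cdf(power)
    p_bar = (p1 + p2) / 2
    numerator = (z_alpha * math.sqrt(2 * p_bar * (1 - p_bar))
                 + z_beta * math.sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2
    return math.ceil(numerator / (p2 - p1) ** 2)


if __name__ == "__main__":
    # A large effect on a healthy base rate vs. a small effect on a low one.
    print(visitors_per_variant(0.10, 0.30))
    print(visitors_per_variant(0.02, 0.05))
```

The second call, a 5% relative uplift on a 2% base rate, needs far more traffic than the first, which is the point made above about small effects on low base rates.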

What is the difference between statistical and practical significance?

Statistical significance says the difference is unlikely to be noise alone. Practical significance asks whether it's big enough to matter. With a very large sample, a 0.1% uplift can be statistically significant yet too small to justify shipping. Read the confidence interval to judge the size of the effect, not just the verdict.

Should I use a one-tailed or two-tailed test?

Use two-tailed (the default). It tests whether anything changed in either direction and is the safe, honest choice. A one-tailed test only checks whether the variant beats the control. For the same data its p-value is half the two-tailed one, so it clears the bar with less evidence, but it cannot flag a variant that actually performs worse, and picking the direction after seeing the data inflates false positives. Reserve it for cases where a drop is impossible or irrelevant and the direction is fixed before the test starts.

Why does the calculator show a chi-square check?

Pearson's chi-square test of independence on the same 2×2 outcome table is mathematically equivalent to the two-proportion z-test, where χ² equals z². The calculator computes χ² from a separate formula and confirms it matches z², so you can trust the number isn't a coding slip.

Is my data sent to a server?

No. The whole calculation runs in your browser with plain arithmetic — no API call, no upload, nothing stored. You can disconnect from the internet and it still works. That makes it safe to use with a client's confidential conversion numbers.


A/B Test Statistical Significance Calculator

Enter the visitors and conversions for your control and your variant. This tool runs a two-proportion z-test and tells you the p-value, the confidence level, and — in plain English — whether the difference is a real winner or just noise. No signup, runs entirely in your browser.

By Induwara Ashinsana, Executive Director, Ryzera Technologies · Updated Jun 11, 2026

Test your A/B result · two-proportion z-test

NIST method · χ²-verified


Confidence threshold

The bar the test must clear to call a winner. 95% is the industry default.

Hypothesis

Two-tailed is the safe default. One-tailed needs less evidence but only tests one direction.


Significant — variant wins

At 96.45% confidence (≥ your 95% bar), the +30.0% relative uplift for the variant is unlikely to be chance. You can ship it.

Confidence

96.45%

1 − p-value (p = 0.0355)

z-statistic

2.1027

χ² cross-check: passed

Relative uplift

+30.0%

Abs. diff: 3 percentage points

95% CI on the gap

0.21% … 5.79%

Interval excludes 0 — a real difference

Conversion rates

Control (A): 100 / 1,000 · 10%

Variant (B): 130 / 1,000 · 13%

Sources cited: two-proportion z-test and normal tables from the NIST/SEMATECH e-Handbook §7.2.4; normal CDF via Abramowitz & Stegun 7.1.26. Every result is cross-checked against Pearson's chi-square test (χ² = z²).

How it works

When version B converts better than version A, the gap could be a genuine improvement or it could be random luck. A significance test puts a number on that doubt. This calculator uses the two-proportion z-test with a pooled standard error, exactly as defined in the NIST/SEMATECH e-Handbook of Statistical Methods (§7.2.4). Every step runs client-side on plain arithmetic.

Conversion rates. Control rate p_a = c_a / n_a and variant rate p_b = c_b / n_b, where c is conversions and n is visitors.
Relative uplift. (p_b − p_a) / p_a — the headline "X% better" figure.
Pooled proportion. p = (c_a + c_b) / (n_a + n_b). The test assumes, for argument's sake, that both versions share this rate.
Pooled standard error. SE = √( p·(1 − p)·(1/n_a + 1/n_b) ).
Test statistic. z = (p_b − p_a) / SE — how many standard errors apart the two rates are.
p-value. The normal CDF Φ(z) = 0.5·(1 + erf(z/√2)) is computed with the Abramowitz & Stegun 7.1.26 approximation of the error function (accurate to 1.5×10⁻⁷). A two-tailed p-value is 2·(1 − Φ(|z|)); a one-tailed p-value is 1 − Φ(|z|), taken in the observed direction.
Confidence and verdict. Confidence is 1 − p. The result is significant when the p-value falls below your alpha (1 − threshold): 0.10, 0.05, or 0.01.
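Confidence interval on the gap. Using the unpooled standard error (NIST §1.3.5.2), (p_b − p_a) ± z*·√( p_a(1−p_a)/n_a + p_b(1−p_b)/n_b ), with z* = 1.645 / 1.960 / 2.576 for 90 / 95 / 99%. If the interval excludes zero, the difference is statistically significant at that level. Because the interval uses the unpooled standard error while the test uses the pooled one, the two can disagree slightly right at the threshold.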
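The steps above can be sketched in a few lines of Python. This is an illustrative reimplementation, not the calculator's actual source: the names erf_as, phi, and ab_test, the dictionary layout, and the zero-variance guard are assumptions. The erf coefficients are the published Abramowitz & Stegun 7.1.26 constants.

```python
import math

Z_STAR = {90: 1.645, 95: 1.960, 99: 2.576}  # critical values per threshold (%)


def erf_as(x: float) -> float:
    """erf(x) via the Abramowitz & Stegun 7.1.26 polynomial approximation."""
    sign = -1.0 if x < 0 else 1.0
    x = abs(x)
    t = 1.0 / (1.0 + 0.3275911 * x)
    poly = t * (0.254829592 + t * (-0.284496736 + t * (1.421413741
                + t * (-1.453152027 + t * 1.061405429))))
    return sign * (1.0 - poly * math.exp(-x * x))


def phi(z: float) -> float:
    """Standard normal CDF."""
    return 0.5 * (1.0 + erf_as(z / math.sqrt(2.0)))


def ab_test(c_a: int, n_a: int, c_b: int, n_b: int,
            threshold_pct: int = 95, two_tailed: bool = True) -> dict:
    """Pooled two-proportion z-test plus an unpooled CI on the gap."""
    p_a, p_b = c_a / n_a, c_b / n_b
    pooled = (c_a + c_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        raise ValueError("pooled rate is 0 or 1; the z-test is undefined")
    z = (p_b - p_a) / se
    tail = 1.0 - phi(abs(z))
    p_value = 2 * tail if two_tailed else tail
    alpha = 1 - threshold_pct / 100
    # The interval uses the unpooled standard error, as in the last step above.
    se_unpooled = math.sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)
    half_width = Z_STAR[threshold_pct] * se_unpooled
    return {
        "relative_uplift": (p_b - p_a) / p_a,
        "z": z,
        "p_value": p_value,
        "confidence": 1 - p_value,
        "significant": p_value < alpha,
        "ci": (p_b - p_a - half_width, p_b - p_a + half_width),
    }


if __name__ == "__main__":
    result = ab_test(100, 1000, 130, 1000)
    print({k: v for k, v in result.items()})
```

Calling ab_test(100, 1000, 130, 1000) should reproduce the figures shown in the result panel (z close to 2.1027, p close to 0.0355), up to rounding.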

To guard against arithmetic mistakes, each result is cross-checked against Pearson's chi-square test of independence on the same 2×2 table. The two are algebraically identical for a single comparison, so χ² must equal z² — the calculator confirms this on every run before showing you a verdict.
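A minimal sketch of that cross-check, assuming Pearson's statistic is computed without a continuity correction (the function names and the tolerance are illustrative, not taken from the calculator's code):

```python
import math


def z_statistic(c_a: int, n_a: int, c_b: int, n_b: int) -> float:
    """Pooled two-proportion z statistic."""
    pooled = (c_a + c_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    return (c_b / n_b - c_a / n_a) / se


def pearson_chi_square(c_a: int, n_a: int, c_b: int, n_b: int) -> float:
    """Pearson chi-square on the 2x2 table of converted / not converted."""
    observed = [[c_a, n_a - c_a], [c_b, n_b - c_b]]
    row_totals = [n_a, n_b]
    col_totals = [c_a + c_b, (n_a - c_a) + (n_b - c_b)]
    total = n_a + n_b
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / total
            chi2 += (observed[i][j] - expected) ** 2 / expected
    return chi2


def cross_check(c_a: int, n_a: int, c_b: int, n_b: int) -> bool:
    """True when chi-square agrees with z squared up to rounding error."""
    z = z_statistic(c_a, n_a, c_b, n_b)
    chi2 = pearson_chi_square(c_a, n_a, c_b, n_b)
    return math.isclose(chi2, z * z, rel_tol=1e-9)


if __name__ == "__main__":
    print(cross_check(100, 1000, 130, 1000))
```

Because the two statistics come from different formulas (one from rates and a standard error, the other from observed and expected cell counts), agreement between them is a useful guard against a slip in either.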

Worked examples

A clear winner

Control 100/1000 vs Variant 130/1000 · two-tailed · 95%

Rates: p_a = 100/1000 = 10.00%, p_b = 130/1000 = 13.00%
Pooled p = 230/2000 = 0.115
SE = √(0.115·0.885·(1/1000 + 1/1000)) = 0.0142671
z = (0.13 − 0.10) / 0.0142671 = 2.1027
p = 2·(1 − Φ(2.1027)) = 0.0355 → confidence 96.45%
96.45% ≥ 95% ⇒ Significant. Relative uplift = +30.0%. Ship B.

Too early to call

Control 50/500 vs Variant 60/500 · two-tailed · 95%

Rates: p_a = 50/500 = 10.00%, p_b = 60/500 = 12.00%
Pooled p = 110/1000 = 0.11
SE = √(0.11·0.89·(1/500 + 1/500)) = 0.0197889
z = (0.12 − 0.10) / 0.0197889 = 1.0107
p = 2·(1 − Φ(1.0107)) = 0.3122 → confidence 68.78%
68.78% < 95% ⇒ Not significant. The +20% relative uplift is within noise at this sample size, so keep testing.

Sample size decides it (edge case)

Same 10% vs 13% rates, but Control 100M/1B vs Variant 130M/1B · two-tailed · 95%

Rates unchanged: p_a = 10.00%, p_b = 13.00%
Pooled p = 0.115, SE = √(0.115·0.885·(2/1,000,000,000)) = 0.0000143
z = 0.03 / 0.0000143 = 2102.7
p ≈ 0 → confidence ≈ 100.00%
The gap that only just cleared the 95% bar in example 1 is overwhelming once the sample is huge.
Lesson: significance depends on sample size as much as the gap itself.
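To see the lesson numerically, the sketch below holds the 10% and 13% rates fixed and varies only the visitors per arm. With equal group sizes the pooled standard error shrinks as 1/√n, so z grows as √n. The function name and the list of sample sizes are illustrative assumptions.

```python
import math


def z_for_equal_groups(rate_a: float, rate_b: float, n: int) -> float:
    """z statistic when both arms have n visitors and the given rates."""
    pooled = (rate_a + rate_b) / 2  # equal n, so the pooled rate is the mean
    se = math.sqrt(pooled * (1 - pooled) * (2 / n))
    return (rate_b - rate_a) / se


if __name__ == "__main__":
    for n in (500, 1_000, 10_000, 1_000_000_000):
        z = z_for_equal_groups(0.10, 0.13, n)
        print(f"n = {n:>13,} per arm  ->  z = {z:.4f}")
```

Going from 1,000 to 1,000,000,000 visitors per arm multiplies n by one million and therefore z by one thousand, which is how the same gap moves from z near 2.1 to z near 2,100.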

Frequently asked questions

Sources & references

The formulas on this page were last cross-checked against the NIST e-Handbook on 2026-06-11, and every calculation is verified at runtime against Pearson's chi-square test (χ² = z²).

