induwara.lk

A/B Test Statistical Significance Calculator

Enter the visitors and conversions for your control and your variant. This tool runs a two-proportion z-test and tells you the p-value, the confidence level, and — in plain English — whether the difference is a real winner or just noise. No signup, runs entirely in your browser.

By Induwara Ashinsana · Updated Jun 11, 2026
Test your A/B result · two-proportion z-test
NIST method · χ²-verified

Confidence threshold: the bar the test must clear to call a winner. 95% is the industry default.

Test direction: two-tailed is the safe default. One-tailed needs less evidence but only tests one direction.

Significant — variant wins

At 96.45% confidence (≥ your 95% bar), the +30.0% relative uplift for the variant is unlikely to be chance. You can ship it.

Confidence: 96.45% (1 − p-value, p = 0.0355)
z-statistic: 2.1027 (χ² cross-check: passed)
Relative uplift: +30.0% (absolute difference: 3 percentage points)
95% CI on the gap: 0.21% … 5.79% (interval excludes 0, so the difference is significant at this level)

Conversion rates

Control (A): 100 / 1,000 · 10%
Variant (B): 130 / 1,000 · 13%

Sources cited: two-proportion z-test and normal tables from the NIST/SEMATECH e-Handbook §7.2.4; normal CDF via Abramowitz & Stegun 7.1.26. Every result is cross-checked against Pearson's chi-square test (χ² = z²).

How it works

When version B converts better than version A, the gap could be a genuine improvement or it could be random luck. A significance test puts a number on that doubt. This calculator uses the two-proportion z-test with a pooled standard error, exactly as defined in the NIST/SEMATECH e-Handbook of Statistical Methods (§7.2.4). Every step runs client-side on plain arithmetic.

  1. Conversion rates. Control rate p_a = c_a / n_a and variant rate p_b = c_b / n_b, where c is conversions and n is visitors.
  2. Relative uplift. (p_b − p_a) / p_a — the headline "X% better" figure.
  3. Pooled proportion. p = (c_a + c_b) / (n_a + n_b). The test assumes, for argument's sake, that both versions share this rate.
  4. Pooled standard error. SE = √( p·(1 − p)·(1/n_a + 1/n_b) ).
  5. Test statistic. z = (p_b − p_a) / SE — how many standard errors apart the two rates are.
  6. p-value. The normal CDF Φ(z) comes from the Abramowitz & Stegun 7.1.26 approximation of the error function (accurate to 1.5×10⁻⁷): Φ(z) = 0.5·(1 + erf(z/√2)). A two-tailed p-value is 2·(1 − Φ(|z|)); a one-tailed p-value is 1 − Φ(|z|), testing only the observed direction.
  7. Confidence and verdict. Confidence is 1 − p. The result is significant when the p-value falls below your alpha (1 − threshold): 0.10, 0.05, or 0.01.
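  8. Confidence interval on the gap. Using the unpooled standard error (NIST §1.3.5.2), the interval is (p_b − p_a) ± z*·√( p_a(1−p_a)/n_a + p_b(1−p_b)/n_b ), with z* = 1.645 / 1.960 / 2.576 for 90 / 95 / 99%. If the interval excludes zero, the difference is statistically significant at that level. Because the interval uses the unpooled standard error while the test uses the pooled one, the two can disagree in borderline cases.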
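
The eight steps above can be sketched in a few lines of Python. This is an illustrative reconstruction from the formulas on this page, not the calculator's actual source: the names erf_as, phi and ab_test are hypothetical, the coefficients are the standard Abramowitz & Stegun 7.1.26 constants, and there is no input validation (it assumes n_a, n_b > 0 and a non-zero control rate).

```python
import math


def erf_as(x: float) -> float:
    """erf(x) via Abramowitz & Stegun 7.1.26 (absolute error <= 1.5e-7)."""
    sign = -1.0 if x < 0 else 1.0
    x = abs(x)
    t = 1.0 / (1.0 + 0.3275911 * x)
    # Horner evaluation of a1*t + a2*t^2 + ... + a5*t^5
    poly = t * (0.254829592 + t * (-0.284496736 + t * (1.421413741
               + t * (-1.453152027 + t * 1.061405429))))
    return sign * (1.0 - poly * math.exp(-x * x))


def phi(z: float) -> float:
    """Standard normal CDF: 0.5 * (1 + erf(z / sqrt(2)))."""
    return 0.5 * (1.0 + erf_as(z / math.sqrt(2.0)))


def ab_test(c_a, n_a, c_b, n_b, tails=2, z_star=1.960):
    """Two-proportion z-test (pooled SE) plus an unpooled CI on the gap."""
    p_a, p_b = c_a / n_a, c_b / n_b                        # step 1
    uplift = (p_b - p_a) / p_a                             # step 2
    pooled = (c_a + c_b) / (n_a + n_b)                     # step 3
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))  # step 4
    z = (p_b - p_a) / se                                   # step 5
    tail = 1.0 - phi(abs(z))                               # step 6
    p_value = 2.0 * tail if tails == 2 else tail
    confidence = 1.0 - p_value                             # step 7
    # Step 8: the interval uses the unpooled standard error.
    se_unpooled = math.sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)
    diff = p_b - p_a
    ci = (diff - z_star * se_unpooled, diff + z_star * se_unpooled)
    return {"p_a": p_a, "p_b": p_b, "uplift": uplift, "z": z,
            "p_value": p_value, "confidence": confidence, "ci": ci}


result = ab_test(100, 1000, 130, 1000)
print(result)
```

Called with the default inputs shown on this page (100/1,000 vs 130/1,000, two-tailed, 95%), the sketch should agree with the displayed z-statistic, p-value and interval to the precision shown.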

To guard against arithmetic mistakes, each result is cross-checked against Pearson's chi-square test of independence on the same 2×2 table. For a single comparison, the chi-square statistic without a continuity correction is algebraically identical to the square of the pooled z-statistic, so χ² must equal z². The calculator confirms this on every run before showing you a verdict.
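
A minimal sketch of that cross-check, assuming the usual Pearson statistic (sum of (observed − expected)² / expected over the four cells, no continuity correction). The function names pooled_z and pearson_chi2 are hypothetical, chosen for illustration only.

```python
import math


def pooled_z(c_a, n_a, c_b, n_b):
    """z-statistic of the two-proportion test with a pooled standard error."""
    pooled = (c_a + c_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    return (c_b / n_b - c_a / n_a) / se


def pearson_chi2(c_a, n_a, c_b, n_b):
    """Pearson chi-square on the 2x2 table (rows: A, B; cols: converted, not)."""
    observed = [[c_a, n_a - c_a], [c_b, n_b - c_b]]
    total = n_a + n_b
    row_totals = [n_a, n_b]
    col_totals = [c_a + c_b, total - (c_a + c_b)]
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / total
            chi2 += (observed[i][j] - expected) ** 2 / expected
    return chi2


z = pooled_z(100, 1000, 130, 1000)
chi2 = pearson_chi2(100, 1000, 130, 1000)
# The two routes share no intermediate values, so agreement is a real check.
print(z * z, chi2, math.isclose(z * z, chi2, rel_tol=1e-9))
```

Because the two statistics are computed from different intermediate quantities, a mismatch beyond floating-point tolerance would point to an arithmetic bug in one of them.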

Worked examples

A clear winner

Control 100/1000 vs Variant 130/1000 · two-tailed · 95%

  1. Rates: p_a = 100/1000 = 10.00%, p_b = 130/1000 = 13.00%
  2. Pooled p = 230/2000 = 0.115
  3. SE = √(0.115·0.885·(1/1000 + 1/1000)) = 0.0142671
  4. z = (0.13 − 0.10) / 0.0142671 = 2.1027
  5. p = 2·(1 − Φ(2.1027)) = 0.0355 → confidence 96.45%
  6. 96.45% ≥ 95% ⇒ Significant. Relative uplift = +30.0%. Ship B.

Too early to call

Control 50/500 vs Variant 60/500 · two-tailed · 95%

  1. Rates: p_a = 50/500 = 10.00%, p_b = 60/500 = 12.00%
  2. Pooled p = 110/1000 = 0.11
  3. SE = √(0.11·0.89·(1/500 + 1/500)) = 0.0197889
  4. z = (0.12 − 0.10) / 0.0197889 = 1.0107
  5. p = 2·(1 − Φ(1.0107)) = 0.3122 → confidence 68.78%
  6. 68.78% < 95% ⇒ Not significant. The +20.0% relative uplift is within noise at this sample size, so keep testing.

Sample size decides it (edge case)

Same 10% vs 13% gap, but Control 100M/1B vs Variant 130M/1B · two-tailed · 95%

  1. Rates unchanged: p_a = 10.00%, p_b = 13.00%
  2. Pooled p = 0.115, SE = √(0.115·0.885·(2/1,000,000,000)) = 0.0000143
  3. z = 0.03 / 0.0000143 = 2102.7
  4. p ≈ 0 → confidence ≈ 100.00%
  5. The borderline gap from example 1 is overwhelming once the sample is huge.
  6. Lesson: significance depends on sample size as much as the gap itself.
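
The sample-size effect can be seen directly by holding the two rates fixed and varying the visitors per arm. The sketch below is illustrative only: z_and_p is a hypothetical helper that assumes equal-sized arms, and it swaps in Python's standard-library math.erfc for the tail probability instead of the Abramowitz & Stegun approximation the calculator describes. The identity used is 2·(1 − Φ(|z|)) = erfc(|z|/√2).

```python
import math


def z_and_p(p_a: float, p_b: float, n_per_arm: int):
    """z and two-tailed p-value for fixed rates and equal-sized arms."""
    pooled = (p_a + p_b) / 2           # equal arms: pooled rate is the mean
    se = math.sqrt(pooled * (1 - pooled) * (2 / n_per_arm))
    z = (p_b - p_a) / se
    # erfc computes the tail directly; for very large z it underflows to 0.0
    p_two_tailed = math.erfc(abs(z) / math.sqrt(2))
    return z, p_two_tailed


for n in (500, 1_000, 10_000, 1_000_000_000):
    z, p = z_and_p(0.10, 0.13, n)
    print(f"n per arm = {n:>13,}  z = {z:10.4f}  p = {p:.3g}")
```

With the rates fixed, z grows in proportion to the square root of the sample size, which is why multiplying the sample by one million multiplies z by one thousand.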
