How does the statistical significance calculator work?

Enter the number of respondents in each group and how many of them chose the answer (or converted). The calculator runs a two-proportion z-test and tells you whether the gap between the two rates is statistically significant at the standard 95% confidence level, along with the exact p-value. Everything runs in your browser — no signup, and your data never leaves your device.

What does the p-value mean in plain language?

The p-value is the probability that a difference at least as large as yours would show up by random chance if the two groups were actually identical. A p-value of 0.03 means a gap this big would appear by luck only 3% of the time, so the difference is probably real. The common convention is to call a result "statistically significant" when the p-value is below 0.05.

How many respondents do I need for a significant result?

It depends on the size of the difference you want to detect: small gaps need large samples, while big gaps can reach significance with a few hundred respondents per group. As a rough guide, this test becomes unreliable below about 30 respondents per group. Use our sample size calculator before you start collecting responses to know how many you'll need.

Can I use this calculator for A/B test results?

Yes — an A/B test is exactly what this test was built for. Enter each variant's visitors as respondents and its conversions as the "chose this answer" count, and you'll see whether the winning variant's lead is statistically significant. Just decide your sample size before the test starts and evaluate once it's reached, rather than stopping the moment the result looks significant.

Is Fomr free to use?

Yes, Fomr has a free plan that includes unlimited forms, unlimited responses, unlimited team members, 25+ form components, design customization, email notifications, and more. The Pro plan adds features like custom domains, removal of Fomr branding, and SEO controls.

Statistical Significance Calculator — Compare Two Survey Results

When to use a statistical significance calculator

Any time you compare two percentages and want to act on the difference, you're making a bet: is the gap real, or did it just happen by chance? This calculator settles that bet for the most common cases marketers and researchers run into:

Two segments in a survey. 31% of your enterprise customers picked "pricing" as their top frustration, but only 22% of small businesses did. Is that a genuine difference between segments, or sampling luck?
Two variants of a form or question. You ran two versions of a signup form or reworded a question, and version B converted better. Before you declare a winner, check the math — our guide to form conversion optimization covers what to test in the first place.
Before and after a change. Satisfaction was 68% last quarter and 74% this quarter. Significant improvement, or a wobble you'll be embarrassed to have celebrated?

In every case the inputs are the same: how many people were in each group, and how many of them gave the answer (or took the action) you're measuring.

How to read the result

The calculator runs a two-proportion z-test and returns a p-value. In plain language, the p-value answers one question: if there were truly no difference between the two groups, how often would random chance alone produce a gap at least this big?

A p-value of 0.20 means a gap like yours would show up about 20% of the time by pure luck, which is far too often to trust. A p-value of 0.01 means it would appear only 1% of the time, which is strong evidence that the difference is real.

The conventional cutoff is 0.05: below it, the result is called "statistically significant," meaning a gap this large would show up less than 5% of the time if the two groups were truly identical. That threshold is a convention, not a law of nature: a p-value of 0.06 isn't meaningless and 0.04 isn't gospel. But 0.05 is the standard your stakeholders will recognize, and it's what this calculator uses for its verdict.

One thing "significant" does not mean: important. It only means the difference is unlikely to be random. Whether a real 2-point lift matters to your business is a judgment call the math can't make for you.
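For readers who want the arithmetic behind the verdict, the pooled two-proportion z-test can be written out as follows. The symbols are ours, not the calculator's: x_A and x_B are the counts who chose the answer, n_A and n_B are the group sizes, and Φ is the standard normal cumulative distribution function. The p-value is assumed to be two-sided, which matches the worked example on this page.

```latex
\hat{p}_A = \frac{x_A}{n_A}, \qquad
\hat{p}_B = \frac{x_B}{n_B}, \qquad
\hat{p} = \frac{x_A + x_B}{n_A + n_B}

z = \frac{\hat{p}_B - \hat{p}_A}
         {\sqrt{\hat{p}\,(1 - \hat{p})\left(\frac{1}{n_A} + \frac{1}{n_B}\right)}},
\qquad
p\text{-value} = 2\,\bigl(1 - \Phi(|z|)\bigr)
```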

A worked example

Say you tested two versions of a lead form. Group A saw the original: 200 visitors, 48 completed it — a 24% conversion rate. Group B saw the redesign: 210 visitors, 69 completed it — about 33%.

The pooled rate across both groups is 117 out of 410, or 28.5%. The z-test compares the 9-point gap against the variation you'd expect from samples this size and returns a z-score of roughly 1.99, which works out to a p-value of about 0.047. That's under 0.05, so the redesign's win is statistically significant, but only barely. If the two forms truly performed the same, a difference this large would appear by chance only about 5 times in 100, so you can roll out version B with reasonable confidence. If the same rates had come from 50 visitors per group, the p-value would be about 0.33 and the honest answer would be "we can't tell yet."
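If you'd like to check the numbers above by hand, here is a minimal Python sketch of a pooled two-proportion z-test. The function name and structure are our own illustration, not the calculator's actual source code, and it assumes a two-sided p-value.

```python
import math

def two_proportion_z_test(successes_a, n_a, successes_b, n_b):
    """Pooled two-proportion z-test. Returns (z, two-sided p-value)."""
    rate_a = successes_a / n_a
    rate_b = successes_b / n_b
    # Pooled rate: treat both groups as one sample, as the null hypothesis assumes.
    pooled = (successes_a + successes_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (rate_b - rate_a) / std_err
    # Two-sided tail area of the standard normal: P(|Z| >= |z|).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

# The lead form example: 48 of 200 (original) vs 69 of 210 (redesign).
z, p = two_proportion_z_test(48, 200, 69, 210)
print(f"z = {z:.2f}, p = {p:.3f}")  # roughly z = 1.99, p = 0.047
```

Swapping in your own counts is enough to reproduce the calculator's verdict for any pair of groups, as long as both groups have at least one respondent and the pooled rate is not exactly 0% or 100%.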

Why sample size matters more than the gap

That last point trips people up constantly: the same percentage gap can be rock solid or meaningless depending on how many people are behind it. A 10-point difference between two groups of 1,000 is overwhelming evidence. The same 10-point difference between two groups of 30 is a coin flip — small samples bounce around so much that big gaps appear and vanish on their own.
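To see the effect in numbers, the sketch below runs the same pooled z-test on one 10-point gap at two sample sizes. The rates of 50% and 60% are hypothetical, chosen only for illustration, and the helper assumes both groups are the same size.

```python
import math

def p_value_for_gap(rate_a, rate_b, n_per_group):
    """Two-sided p-value of a pooled two-proportion z-test, equal group sizes."""
    # With equal groups, the pooled rate is the plain average of the two rates.
    pooled = (rate_a + rate_b) / 2
    std_err = math.sqrt(pooled * (1 - pooled) * 2 / n_per_group)
    z = abs(rate_b - rate_a) / std_err
    return math.erfc(z / math.sqrt(2))

# Hypothetical rates: 50% vs 60%, the same 10-point gap each time.
small = p_value_for_gap(0.50, 0.60, 30)
large = p_value_for_gap(0.50, 0.60, 1000)
print(f"30 per group:    p = {small:.2f}")
print(f"1,000 per group: p = {large:.6f}")
```

With 30 per group the p-value lands well above 0.05, while with 1,000 per group it is far below it, even though the gap itself never changed.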

This is the same phenomenon as the margin of error on a poll: fewer respondents means each percentage comes with a wider band of uncertainty, and two wide bands overlap easily. You can see exactly how wide your bands are with our margin of error calculator. And if you're planning a test rather than analyzing one, work backwards: decide the smallest difference you'd care about, then use our sample size calculator to figure out how many respondents you need before you start. Collecting data first and hoping for significance later is how tests end in frustration.
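As a rough illustration of working backwards, the sketch below uses a common textbook approximation for the respondents needed per group to detect a given gap. It assumes a two-sided test at the 0.05 level and 80% power. The power figure is our assumption, not something this page specifies, and the linked sample size calculator may use a different method.

```python
import math
from statistics import NormalDist

def respondents_per_group(rate_a, rate_b, alpha=0.05, power=0.80):
    """Approximate sample size per group for a two-sided two-proportion test.

    Assumes equal group sizes; alpha and power defaults are illustrative.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    z_power = NormalDist().inv_cdf(power)          # about 0.84 for 80% power
    variance = rate_a * (1 - rate_a) + rate_b * (1 - rate_b)
    n = (z_alpha + z_power) ** 2 * variance / (rate_a - rate_b) ** 2
    return math.ceil(n)

big_gap = respondents_per_group(0.24, 0.33)    # a 9-point gap
small_gap = respondents_per_group(0.24, 0.26)  # a 2-point gap
print(big_gap, small_gap)
```

The 9-point gap needs a few hundred respondents per group, while the 2-point gap needs several thousand, which is why the smallest difference you care about should be decided before you collect anything.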

Common pitfalls

Peeking early and stopping when you hit significance. If you check the p-value every day and stop the moment it dips under 0.05, you'll "win" far more often than you should — random wobbles cross the line all the time on their way to nowhere. Decide your sample size up front and judge the result once, when you reach it.
Testing many segments and reporting the one that won. Slice your survey by age, plan, industry, and region, and one slice may well look significant by luck alone: run twenty comparisons at the 0.05 level and you should expect about one false positive on average. Treat surprise wins in sub-segments as hypotheses to re-test, not findings to announce.
Confusing statistical significance with practical importance. With a huge sample, a 0.4-point difference can be highly significant and still not worth a single meeting. Always ask two questions: is the difference real, and is it big enough to act on?
Comparing groups that differ in more than one way. If version B ran a week later, to a different audience, or during a promotion, the test can't tell you which change caused the gap. Keep everything but the thing you're testing constant.
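The multiple-segments pitfall above is easy to quantify. The sketch below computes how many false positives to expect, and the chance of seeing at least one, under the simplifying assumptions that the comparisons are independent and that no real difference exists in any of them. The function name is our own.

```python
def false_positive_risk(num_comparisons, alpha=0.05):
    """Expected false positives and the chance of at least one.

    Assumes independent comparisons with no real difference in any of them.
    """
    expected = num_comparisons * alpha
    at_least_one = 1 - (1 - alpha) ** num_comparisons
    return expected, at_least_one

expected, at_least_one = false_positive_risk(20)
print(f"expected false positives: {expected:.1f}")     # 1.0
print(f"chance of at least one:   {at_least_one:.0%}")  # 64%
```

So twenty segment comparisons with nothing real going on will still hand you at least one "significant" result roughly two times out of three.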

The honest fine print

This tool runs a two-proportion z-test with pooled variance — the standard method for comparing two percentages. It's an approximation, and it's least reliable when samples are small (roughly under 30 respondents per group) or when rates sit very close to 0% or 100%. In those edge cases treat the verdict as a rough guide, and when the decision is high-stakes, collect more responses rather than squinting at a borderline p-value.
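A quick sanity check for those edge cases might look like the sketch below. The 30-respondent floor comes from the rule of thumb above; the second check, requiring at least 5 people on each side of the answer, is a common textbook heuristic that we are assuming here as a stand-in for "very close to 0% or 100%". Neither threshold is taken from the calculator itself.

```python
def approximation_warnings(successes, n, min_group_size=30, min_count=5):
    """Return reasons the z-test approximation may be shaky for one group.

    Both thresholds are rules of thumb, not hard limits.
    """
    warnings = []
    if n < min_group_size:
        warnings.append(f"only {n} respondents (under {min_group_size})")
    failures = n - successes
    if min(successes, failures) < min_count:
        warnings.append("rate is very close to 0% or 100% for this sample size")
    return warnings

print(approximation_warnings(48, 200))  # [] : no concerns
print(approximation_warnings(2, 25))    # small group and a rate near 0%
```

Run it once per group; if either group returns a warning, treat the calculator's verdict as a rough guide.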

Significance testing also can't rescue a biased sample: if only your happiest customers answered, no p-value will fix that. Getting trustworthy inputs — enough responses, from the right people — is the real work, and our form analytics guide covers how to measure and improve it. When you're ready to run the test itself, build both variants as free Fomr surveys with unlimited responses, split your audience, and bring the counts back here.
