Statistical Significance Explained — What p < 0.05 Actually Means

By CalcMulti Editorial Team · 10 min read

Statistical significance is one of the most widely used — and most widely misunderstood — concepts in science, medicine, and data analysis. A result is called "statistically significant" when the probability of observing data at least as extreme as yours, assuming the null hypothesis is true (the p-value), falls below a chosen threshold α (typically 0.05).

But p < 0.05 does not mean the result is important, large, or practically meaningful. It does not mean there is a 95% probability that the effect is real. Understanding what significance does and does not tell you is essential for interpreting research and making good decisions.

What a P-Value Actually Is

The p-value is the probability of observing a test statistic as extreme as (or more extreme than) the one calculated from your data, assuming the null hypothesis is true.

If p = 0.03: assuming H₀ is true, there is a 3% chance of seeing data this extreme by random chance. This seems unlikely enough that we "reject H₀" — but 3% is not 0%. Random coincidences do happen, and with many studies being run, some 3% events will occur.
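The definition above can be made concrete with a small sketch. The measurements below are hypothetical, and the null hypothesis is that the population mean is 100; `scipy.stats.ttest_1samp` returns the test statistic and the two-sided p-value.

```python
# Sketch: a two-sided p-value from a one-sample t-test.
# The sample values are made up for illustration; H0: population mean = 100.
from scipy import stats

sample = [102.1, 99.8, 103.5, 101.2, 100.9, 104.0, 98.7, 102.8]
t_stat, p_value = stats.ttest_1samp(sample, popmean=100)

print(f"t = {t_stat:.3f}, p = {p_value:.4f}")
if p_value < 0.05:
    # "Significant" means only: data this extreme is rare if H0 holds.
    print("Reject H0 at alpha = 0.05")
```

Here p comes out near 0.04: unlikely under H₀, but, as the text stresses, not impossible.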

What a p-value is NOT: It is not the probability that H₀ is true. It is not the probability that your result occurred by chance. It is not the probability that your finding will replicate. It is not a measure of effect size.

| p-value | Common interpretation | What it actually means |
| --- | --- | --- |
| < 0.001 | Highly significant | Very rare under H₀, but says nothing about effect size |
| 0.001 – 0.01 | Strongly significant | Rare under H₀ |
| 0.01 – 0.05 | Statistically significant | Below the conventional threshold |
| 0.05 – 0.10 | Marginal / trend | Borderline; often reported cautiously |
| > 0.10 | Not significant | H₀ cannot be rejected at conventional levels |

Why 0.05? The Arbitrary Nature of Alpha

The α = 0.05 threshold was popularised by Ronald Fisher in the 1920s as a convenient round number. It means: "I am willing to incorrectly reject H₀ 5% of the time (when it is actually true)." This is the Type I error rate.

The 0.05 threshold is arbitrary. Physics uses α = 0.0000003 (5 sigma) for major discoveries. Genomics uses α = 5 × 10⁻⁸ to account for hundreds of thousands of simultaneous tests. Preliminary research often uses α = 0.10. The threshold should match the cost of a false positive in your domain.

A major problem: with thousands of studies being published, even if every null hypothesis were true, 5% of studies would falsely "find" a significant result at α = 0.05. This is part of why the replication crisis exists in science — many significant results are false positives.
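This "5% of true nulls come out significant" claim is easy to verify by simulation. The sketch below runs many two-sample t-tests where both groups are drawn from the same distribution, so H₀ is true in every single "study"; the numbers of studies and subjects are arbitrary choices for the illustration.

```python
# Sketch: when H0 is true for every study, roughly alpha of the
# studies still produce p < alpha. All parameters are illustrative.
import random
from scipy import stats

random.seed(42)
n_studies, alpha = 2000, 0.05
false_positives = 0
for _ in range(n_studies):
    # Both groups come from the same N(0, 1) population: no real effect.
    a = [random.gauss(0, 1) for _ in range(30)]
    b = [random.gauss(0, 1) for _ in range(30)]
    _, p = stats.ttest_ind(a, b)
    if p < alpha:
        false_positives += 1

print(f"False-positive rate: {false_positives / n_studies:.3f}")
```

The observed rate lands near 0.05, exactly as the Type I error rate predicts.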

Statistical Significance vs Practical Significance

Statistical significance tells you whether an observed effect is unlikely to be explained by chance alone. Practical significance tells you whether the effect is large enough to matter. These are completely separate questions.

A drug lowers blood pressure by an average of 1 mmHg (95% CI: 0.5–1.5 mmHg). This is statistically significant (p < 0.001, based on a large trial). But a 1 mmHg reduction is clinically trivial — the drug is practically meaningless despite being "significant."

Conversely: A teaching intervention improves test scores by 15 points on a 100-point scale (p = 0.08). This is not statistically significant at α = 0.05 — but a 15-point improvement would be educationally important. With more students, this effect might reach significance.

Always report effect sizes (Cohen's d, η², R²) alongside p-values to communicate both statistical and practical significance.
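Computing an effect size takes only a few extra lines. The sketch below reports Cohen's d (mean difference divided by the pooled standard deviation) next to the p-value for two hypothetical groups; the scores are invented for illustration.

```python
# Sketch: report Cohen's d alongside the p-value.
# Group scores are hypothetical illustration data.
import math
from scipy import stats

group_a = [72, 75, 78, 71, 74, 77, 73, 76, 79, 70]
group_b = [68, 70, 73, 66, 69, 72, 67, 71, 74, 65]

t_stat, p_value = stats.ttest_ind(group_a, group_b)

def cohens_d(x, y):
    """Mean difference divided by the pooled standard deviation."""
    nx, ny = len(x), len(y)
    mx, my = sum(x) / nx, sum(y) / ny
    vx = sum((v - mx) ** 2 for v in x) / (nx - 1)
    vy = sum((v - my) ** 2 for v in y) / (ny - 1)
    pooled_sd = math.sqrt(((nx - 1) * vx + (ny - 1) * vy) / (nx + ny - 2))
    return (mx - my) / pooled_sd

d = cohens_d(group_a, group_b)
print(f"p = {p_value:.4f}, Cohen's d = {d:.2f}")
```

Here the result is both significant (p < 0.01) and large (d well above Cohen's "large" benchmark of 0.8), so the two kinds of significance happen to agree.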

| Scenario | Statistical significance | Practical significance | Conclusion |
| --- | --- | --- | --- |
| Drug: −1 mmHg blood pressure (n = 10,000, p = 0.001) | Yes (p < 0.05) | No (trivially small) | Significant but useless |
| Teaching: +15 pts test score (n = 20, p = 0.08) | No (p > 0.05) | Yes (large effect) | Possibly important; needs more data |
| Ad: +0.3% click rate (n = 1M, p < 0.001) | Yes | Context-dependent (revenue?) | Report the effect size |
| Therapy: −8 pts depression scale (n = 50, p = 0.02) | Yes | Yes (clinically meaningful) | Good evidence of benefit |

Type I and Type II Errors

Type I error (false positive): Rejecting H₀ when it is actually true. Probability = α (typically 0.05). You conclude there is an effect when there is none.

Type II error (false negative): Failing to reject H₀ when it is actually false. Probability = β (often 0.20). You miss a real effect.

Statistical power = 1 − β = the probability of detecting a real effect when it exists. A power of 0.80 means you have an 80% chance of finding a significant result when the true effect matches your assumption.

The trade-off: reducing α (stricter threshold) reduces Type I errors but increases Type II errors. The only way to reduce both simultaneously is to increase sample size. Power analysis before a study determines the sample size needed to detect a specified effect size with adequate power.
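A minimal power analysis can be sketched with the standard normal approximation for a two-sided two-sample test: power ≈ Φ(d·√(n/2) − z₁₋α/₂), where d is the standardized effect size and n is the per-group sample size. (This approximation ignores the small t-vs-z correction, so real software will give a slightly larger n.)

```python
# Sketch: sample size needed per group for 80% power, using the
# normal approximation to the two-sided two-sample test.
import math
from scipy.stats import norm

def power_two_sample(d, n_per_group, alpha=0.05):
    """Approximate power to detect standardized effect size d."""
    z_crit = norm.ppf(1 - alpha / 2)
    return norm.cdf(d * math.sqrt(n_per_group / 2) - z_crit)

# Medium effect (d = 0.5): smallest n per group reaching 80% power.
n = 2
while power_two_sample(0.5, n) < 0.80:
    n += 1
print(f"n per group = {n}, power = {power_two_sample(0.5, n):.3f}")
```

This lands near the textbook answer of roughly 64 per group for a medium effect at α = 0.05, and it shows the trade-off directly: halving d roughly quadruples the required n.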

| | H₀ true (no effect) | H₀ false (effect exists) |
| --- | --- | --- |
| Reject H₀ (significant) | Type I error (α) | Correct (power = 1 − β) |
| Fail to reject H₀ | Correct (1 − α) | Type II error (β) |

Going Beyond Significance — Better Practices

1. Report effect sizes. Cohen's d for t-tests, η² for ANOVA, R² for regression. These tell you how large the effect is, not just whether it exists.

2. Report confidence intervals. A 95% CI gives the range of plausible effect sizes. "The mean difference is 4.2 (95% CI: 1.8–6.6)" is far more informative than "p = 0.003."

3. Consider power. An underpowered study (n too small) will miss real effects. A result of p = 0.12 from n = 20 does not mean no effect — it may mean insufficient power. Calculate and report your study's power.

4. Pre-register hypotheses. Specifying your hypothesis and analysis plan before collecting data prevents p-hacking (testing many hypotheses until one is significant by chance).

5. Replicate. A single significant result (p < 0.05) is evidence, not proof. The replication crisis has shown that many published findings do not replicate. Significance from a well-powered, pre-registered, replicated study is far more meaningful.
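Practice 2 above is easy to put into code. The sketch below computes a 95% confidence interval for a mean difference from the t distribution; the paired differences are hypothetical illustration data.

```python
# Sketch: 95% CI for a mean difference from the t distribution.
# The paired differences below are invented for illustration.
import math
from scipy import stats

diff = [3.1, 5.0, 4.4, 2.8, 6.1, 3.9, 4.7, 5.5, 3.3, 4.2]
n = len(diff)
mean = sum(diff) / n
sd = math.sqrt(sum((x - mean) ** 2 for x in diff) / (n - 1))
se = sd / math.sqrt(n)
t_crit = stats.t.ppf(0.975, df=n - 1)  # two-sided 95% critical value
lo, hi = mean - t_crit * se, mean + t_crit * se

print(f"mean difference = {mean:.2f}, 95% CI: ({lo:.2f}, {hi:.2f})")
```

Reporting the interval, not just "p < 0.05", tells the reader both that the effect is distinguishable from zero and roughly how large it plausibly is.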

Frequently Asked Questions

Educational use only. Content is based on publicly documented mathematical formulas and reviewed for accuracy by the CalcMulti Editorial Team. Last updated: February 2026.