P-Value Explained
By CalcMulti Editorial Team · 9 min read
The p-value is the most widely reported — and most widely misunderstood — number in statistics. It appears in every scientific paper, clinical trial, and A/B test result, yet surveys consistently show that even researchers who use it daily cannot correctly define it.
This guide explains exactly what a p-value is, what "p < 0.05" actually means, what it does NOT mean, and how to use it correctly alongside effect sizes and confidence intervals.
The Exact Definition of a P-Value
A p-value is the probability of obtaining a test statistic as extreme as — or more extreme than — the one observed, assuming the null hypothesis is true.
Worked example: You test whether a coin is fair. You flip it 100 times and get 60 heads. Under H₀ (fair coin, probability of heads = 0.5), what is the probability of getting 60 or more heads? That one-tailed probability is the p-value. If p = 0.028, it means: "If the coin were truly fair, there is only a 2.8% chance of seeing 60+ heads in 100 flips."
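The coin example can be checked directly: under H₀ the number of heads follows a Binomial(100, 0.5) distribution, so the one-tailed p-value is just a sum of binomial probabilities. A minimal stdlib-only sketch (`binom_tail` is our own helper, not a library function):

```python
from math import comb

def binom_tail(k, n, p=0.5):
    """P(X >= k) for X ~ Binomial(n, p): the one-tailed p-value."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

# Probability of 60 or more heads in 100 flips of a fair coin
p_value = binom_tail(60, 100)
print(round(p_value, 4))  # ≈ 0.0284 — the 2.8% figure from the example
```

For a two-sided test ("60 or more heads, or 40 or fewer") you would double this tail, since the Binomial(100, 0.5) distribution is symmetric.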
Critically: the p-value is a probability about the data given H₀ — it is not a probability about H₀ given the data. This distinction is the source of most p-value misinterpretations.
What "p < 0.05" Means (and Does Not Mean)
What it means: If the null hypothesis were true, data as extreme as yours would occur less than 5% of the time by chance. You reject H₀ because this is surprising enough under H₀.
What it does NOT mean — the 6 most common misconceptions:
❌ "p < 0.05 means there is a 95% chance the result is real." Wrong. The p-value says nothing about the probability that H₀ is true or false.
❌ "p = 0.04 is a stronger result than p = 0.049." Both are below 0.05. The difference is noise, not signal.
❌ "p > 0.05 means no effect exists." It means insufficient evidence to reject H₀ — absence of evidence is not evidence of absence.
❌ "A small p-value means a large effect." A tiny effect with a huge sample (n = 100,000) can produce p < 0.001. Always report effect size alongside p-value.
❌ "p-value is the probability my hypothesis is wrong." The p-value is computed assuming H₀ is true — it says nothing about the probability of your hypothesis.
❌ "The 0.05 threshold is a law of nature." It is an arbitrary convention established by Ronald Fisher in 1925. Many journals now require p < 0.005 or reporting the exact p-value.
How Is the P-Value Calculated?
The calculation depends on which statistical test you run. The general process is: (1) compute a test statistic from your data, (2) find the probability, under the null distribution, of a test statistic at least as extreme as the one observed.
For a one-sample t-test: t = (x̄ − μ₀) / (s / √n). Then find P(|T| ≥ |t|) where T follows a t-distribution with df = n − 1.
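The t statistic itself needs nothing beyond the sample mean and standard deviation. A sketch with hypothetical readings (the p-value step would then use a t-distribution CDF with df = n − 1, e.g. `scipy.stats.t.sf`, which the stdlib does not provide):

```python
from math import sqrt
from statistics import mean, stdev

def one_sample_t(data, mu0):
    """t = (x̄ − μ₀) / (s / √n) for a one-sample t-test."""
    n = len(data)
    return (mean(data) - mu0) / (stdev(data) / sqrt(n))

# Hypothetical measurements tested against μ₀ = 5.0
t = one_sample_t([5.1, 4.9, 5.2, 5.0, 4.8, 5.3], 5.0)
print(round(t, 3))  # ≈ 0.655 — nowhere near significant with df = 5
```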
For a chi-square test: χ² = Σ(O − E)²/E. Then find P(χ² ≥ χ²_observed) under the chi-square distribution with the appropriate degrees of freedom.
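The chi-square statistic is a plain sum over categories, and for the special case df = 1 the p-value has a closed form via the normal tail (χ² with 1 degree of freedom is the square of a standard normal). A sketch using the 60-heads coin data as a goodness-of-fit test:

```python
from math import erfc, sqrt

def chi_square(observed, expected):
    """χ² = Σ(O − E)²/E over all categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# 60 heads / 40 tails vs. the 50/50 expected under a fair coin
x2 = chi_square([60, 40], [50, 50])
print(x2)  # 4.0

# df = 1: P(χ² ≥ x) = P(|Z| ≥ √x) = erfc(√(x/2))
p = erfc(sqrt(x2 / 2))
print(round(p, 4))  # ≈ 0.0455
```

For higher degrees of freedom the survival function has no such elementary form, which is where a statistics library (or this site's calculator) comes in.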
Modern calculators (including this site's p-value calculator) compute this automatically. The key is choosing the correct test for your data type and research question.
| Test | Data type | Null hypothesis | Test statistic |
|---|---|---|---|
| One-sample t-test | Continuous, one group | Population mean = μ₀ | t = (x̄ − μ₀)/(s/√n) |
| Two-sample t-test | Continuous, two groups | Means are equal | t = (x̄₁ − x̄₂)/SE_diff |
| Chi-square test | Categorical | Variables are independent | χ² = Σ(O−E)²/E |
| Correlation test | Two continuous variables | Correlation ρ = 0 | t = r√(n−2)/√(1−r²) |
| ANOVA F-test | Continuous, 3+ groups | All group means equal | F = MS_between/MS_within |
Significance Levels (α) — The Threshold Choice
The significance level α is the threshold below which you reject H₀. You set α before collecting data — never after seeing the results.
α = 0.05 (5%): the standard in social sciences, biology, and most A/B testing. It means you accept a 5% false positive (Type I error) rate when H₀ is true.
α = 0.01 (1%): stricter, used in medical trials, genomics, and high-stakes decisions. Reduces false positives but increases false negatives (Type II errors).
α = 0.001 (0.1%): very strict, used in particle physics ("5-sigma rule" ≈ p < 0.0000003) and genome-wide association studies.
The choice of α is a decision about how much Type I error risk you are willing to accept. It should reflect the cost of false positives in your field — not the 0.05 tradition.
| α level | Interpretation | Common use |
|---|---|---|
| 0.10 | 10% false positive rate | Exploratory research, pilot studies |
| 0.05 | 5% false positive rate | Social sciences, A/B testing, most clinical research |
| 0.01 | 1% false positive rate | Medical trials, psychology replication |
| 0.001 | 0.1% false positive rate | Genomics, physics, high-stakes decisions |
| < 3×10⁻⁷ | "5-sigma" ≈ 0.0000003 | Particle physics (Higgs boson standard) |
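The sigma-based thresholds in the last row convert to p-values through the normal tail area. A small sketch (one-tailed convention, which is how the ≈ 3×10⁻⁷ figure for 5-sigma is usually quoted):

```python
from math import erfc, sqrt

def sigma_to_p(sigma, two_tailed=False):
    """Normal tail area beyond `sigma` standard deviations."""
    tail = erfc(sigma / sqrt(2)) / 2   # one-tailed: 1 − Φ(sigma)
    return 2 * tail if two_tailed else tail

for s in (2, 3, 5):
    print(f"{s}-sigma: p ≈ {sigma_to_p(s):.2e}")
# 2-sigma is roughly the familiar 0.023 (one-tailed);
# 5-sigma lands near 2.9e-7, the particle-physics standard.
```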
Beyond the P-Value — What Else You Need
A p-value alone is never enough. Modern statistics and major journal guidelines require reporting three things together: the p-value, the effect size, and the confidence interval.
Effect size answers "how large is the effect?" not just "is there an effect?" By Cohen's benchmarks, d ≈ 0.2 is small, 0.5 is medium, and 0.8 is large. A drug that reduces blood pressure by 0.1 mmHg can produce p < 0.001 in a large trial — but 0.1 mmHg is clinically meaningless.
Confidence interval shows the range of plausible values for the true effect. A 95% CI of [0.02, 0.04] is very different from [−0.5, 5.0] even if both have p < 0.05. Wide intervals indicate uncertainty; narrow intervals indicate precision.
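Cohen's d for two groups is the mean difference divided by the pooled standard deviation. A sketch on hypothetical group scores (the data here are made up for illustration):

```python
from math import sqrt
from statistics import mean, stdev

def cohens_d(a, b):
    """Mean difference in units of the pooled standard deviation."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * stdev(a) ** 2 + (nb - 1) * stdev(b) ** 2) / (na + nb - 2)
    return (mean(a) - mean(b)) / sqrt(pooled_var)

treatment = [12, 14, 13, 15, 14, 16]   # hypothetical scores
control   = [11, 13, 12, 12, 14, 13]
print(round(cohens_d(treatment, control), 2))  # 1.2 — large by Cohen's benchmarks
```

Reporting d alongside the p-value answers the "how large?" question that p alone never can.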
The replication crisis in psychology and medicine (2010s onward) was largely caused by over-reliance on p-values without effect sizes or power calculations. Many effects with p < 0.05 failed to replicate because the sample sizes were small and the effects were smaller than originally reported.
P-Hacking and How to Avoid It
P-hacking is the practice of running multiple analyses and reporting only the one with p < 0.05. If you run 20 independent tests at α = 0.05 when every null is true, you expect 1 false positive on average (and the chance of at least one is 1 − 0.95²⁰ ≈ 64%), yet that single "significant" result looks like a real finding.
Common p-hacking patterns: adding more subjects until p < 0.05; trying different outcome variables until one is significant; removing outliers selectively; testing multiple subgroups and reporting only the one that "worked"; switching from two-tailed to one-tailed after seeing results.
Defenses: pre-register your analysis plan before collecting data (AsPredicted.org, OSF). Apply Bonferroni correction or Benjamini-Hochberg correction for multiple comparisons. Report all tests run, not just significant ones. Calculate required sample size in advance using power analysis.
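Both corrections mentioned above are short enough to sketch directly. Bonferroni simply tightens the threshold to α/m; Benjamini–Hochberg is a step-up procedure that controls the false discovery rate and typically rejects more hypotheses (the p-values below are illustrative, not from any real study):

```python
def bonferroni(p_values, alpha=0.05):
    """Reject H0 only where p ≤ α / m (m = number of tests)."""
    m = len(p_values)
    return [p <= alpha / m for p in p_values]

def benjamini_hochberg(p_values, alpha=0.05):
    """BH step-up: find the largest rank k with p_(k) ≤ (k/m)·α,
    then reject every hypothesis with p-value ≤ p_(k)."""
    m = len(p_values)
    indexed = sorted(enumerate(p_values), key=lambda t: t[1])
    cutoff = -1.0
    for rank, (_, p) in enumerate(indexed, start=1):
        if p <= rank / m * alpha:
            cutoff = p
    return [p <= cutoff for p in p_values]

ps = [0.010, 0.013, 0.014, 0.190, 0.350]
print(bonferroni(ps))          # [True, False, False, False, False]
print(benjamini_hochberg(ps))  # [True, True, True, False, False]
```

On the same five p-values, Bonferroni keeps one discovery while BH keeps three: the usual trade-off between strict family-wise error control and the less conservative false-discovery-rate control.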
Related Calculators
P-Value Calculator: Calculate p-values from z, t, or chi-square statistics
Hypothesis Testing Basics: The 5-step hypothesis testing process
T-Test Calculator: One- and two-sample t-tests
Effect Size Calculator: Cohen's d and other effect size measures
Type I vs Type II Error: False positives and false negatives explained
Statistics Hub: All statistics calculators and guides
Educational use only. Content is based on publicly documented mathematical formulas and reviewed for accuracy by the CalcMulti Editorial Team. Last updated: March 2026.