Effect Size Explained

By CalcMulti Editorial Team · 8 min read

A p-value tells you whether an observed effect is unlikely to be due to chance. Effect size tells you how large that effect is. Without effect size, statistical significance is nearly meaningless — a drug that lowers blood pressure by 0.1 mmHg can produce p < 0.001 in a study of 100,000 patients while being clinically worthless.

This guide covers the most widely used effect size measures — Cohen's d for comparing means, Pearson r for correlations, eta-squared (η²) for ANOVA, and odds ratio for binary outcomes — with benchmarks for interpreting small, medium, and large effects.

Why Effect Size Matters More Than p-Value Alone

Statistical significance depends on both effect size AND sample size. With a large enough sample, even a trivially small effect will be statistically significant. With a small sample, even a large effect may not reach significance. Effect size is independent of sample size — it measures magnitude, not detectability.

Example: Study A (n=50) finds that a teaching intervention improves test scores by 15 points (sd=10): d=1.5, p < 0.001. Study B (n=100,000) finds the same intervention improves scores by 0.2 points (sd=10): d=0.02, p ≈ 0.002. Both are "statistically significant," but Study A's effect is 75 times larger and practically meaningful; Study B's effect is negligible despite its impressive-looking p-value.
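The sample-size dependence can be sketched numerically. For two equal-size groups, the two-sample t statistic implied by an effect size d is t = d·√(n/2), with n per group; the minimal pure-Python illustration below uses a normal approximation for the two-sided p-value and shows that a trivial d = 0.02 only reaches "significance" once n grows into the tens of thousands per group:

```python
import math

def t_from_d(d, n_per_group):
    """Two-sample t statistic implied by d for two equal-size groups."""
    return d * math.sqrt(n_per_group / 2)

def two_sided_p(t):
    """Two-sided p-value via the normal approximation to the t distribution."""
    return math.erfc(abs(t) / math.sqrt(2))

# d is a property of the effect, not of the study size...
d = 0.02
# ...but t (and hence p) grows with sqrt(n):
for n_per_group in (5_000, 50_000, 500_000):
    t = t_from_d(d, n_per_group)
    print(n_per_group, round(t, 2), round(two_sided_p(t), 4))
```

With 5,000 per group the tiny effect is nowhere near significance (p ≈ 0.32); with 50,000 per group it crosses p < 0.01; with 500,000 it is "highly significant" — while d never changes.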

The American Psychological Association (APA), British Medical Journal, and most major journals now require effect sizes to be reported alongside p-values for exactly this reason.

Cohen's d — Comparing Two Means

Cohen's d is the standardized mean difference between two groups. It expresses the difference in units of standard deviations, making it comparable across different studies and measurement scales.

d = (μ₁ − μ₂) / σ_pooled, where σ_pooled = √[(s₁² + s₂²)/2] for equal-size groups.

Worked example: Group 1 (treatment): mean=72, sd=10. Group 2 (control): mean=64, sd=10. d = (72−64)/10 = 0.8. The groups differ by 0.8 standard deviations — a large effect.

For one-sample tests (comparing a mean to a known value μ₀): d = (x̄ − μ₀)/s.
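Both formulas fit in a short helper. A minimal sketch in Python, using the df-weighted pooled SD (which reduces to √[(s₁² + s₂²)/2] when the two groups are the same size):

```python
import math

def cohens_d(m1, s1, n1, m2, s2, n2):
    """Cohen's d for two independent groups, df-weighted pooled SD."""
    pooled = math.sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2))
    return (m1 - m2) / pooled

def cohens_d_one_sample(mean, mu0, s):
    """One-sample d: distance of the sample mean from a known value mu0."""
    return (mean - mu0) / s

# Worked example from the text: treatment 72 (sd 10, n 25), control 64 (sd 10, n 25)
print(cohens_d(72, 10, 25, 64, 10, 25))  # 0.8 -> large effect
```

With equal SDs of 10, the pooled SD is 10 and d = (72 − 64)/10 = 0.8, matching the worked example above.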

| Cohen's d | Interpretation | Real-world analogy |
| --- | --- | --- |
| 0.0 – 0.2 | Negligible / very small | IQ difference of ~3 points |
| 0.2 – 0.5 | Small | Height difference between 15- and 16-year-old boys |
| 0.5 – 0.8 | Medium | IQ difference between PhD holders and general population |
| 0.8 – 1.2 | Large | Effect of coaching on SAT scores |
| > 1.2 | Very large | Difference between professional and amateur athletes |

Pearson r — Effect Size for Correlations

Pearson r is already an effect size in its own right — it measures the strength of the linear relationship between two variables on a −1 to +1 scale.

r² (the square of r) is the "coefficient of determination" — it tells you what proportion of variance in one variable is explained by the other. r = 0.5 means r² = 0.25 = 25% of variance explained.

r can also be converted to Cohen's d: d = 2r / √(1−r²), and vice versa. This lets you compare correlation-based and mean-based effect sizes.
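The conversion in each direction is a one-liner. A minimal sketch (the d-to-r inverse shown here assumes equal group sizes, matching the formula above):

```python
import math

def r_to_d(r):
    """Convert a correlation r to Cohen's d (equal group sizes assumed)."""
    return 2 * r / math.sqrt(1 - r**2)

def d_to_r(d):
    """Inverse conversion: Cohen's d back to r."""
    return d / math.sqrt(d**2 + 4)

r = 0.5
d = r_to_d(r)
print(round(d, 3), round(d_to_r(d), 3))  # d ~ 1.155; round-trips back to 0.5
print(r**2)                              # 0.25 -> 25% of variance explained
```

Note that a "medium-to-large" correlation of r = 0.5 corresponds to a d above 1.1 — the two scales are not interchangeable without conversion.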

| r (absolute value) | Interpretation | r² (variance explained) |
| --- | --- | --- |
| 0.00 – 0.10 | Negligible | < 1% |
| 0.10 – 0.30 | Small | 1% – 9% |
| 0.30 – 0.50 | Medium | 9% – 25% |
| 0.50 – 0.70 | Large | 25% – 49% |
| > 0.70 | Very large | > 49% |

Eta-Squared (η²) — Effect Size for ANOVA

Eta-squared measures the proportion of total variance in the dependent variable explained by the independent variable (group membership). It is the ANOVA equivalent of r².

η² = SS_between / SS_total, where SS = sum of squares. Range: 0 to 1.

Partial eta-squared (η²_p) is preferred in factorial ANOVA — it measures the proportion of variance explained by a factor after accounting for other factors in the model. Most statistical software reports partial η² by default.

Benchmark: η² = 0.01 (small), 0.06 (medium), 0.14 (large). These are Cohen's conventions for ANOVA, equivalent to d ≈ 0.2, 0.5, 0.8.
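The SS_between/SS_total definition can be computed directly from raw scores. A minimal sketch with hypothetical data, where each inner list holds the scores for one group (one level of the factor):

```python
def eta_squared(groups):
    """eta^2 = SS_between / SS_total, from one list of raw scores per group."""
    scores = [x for g in groups for x in g]
    grand = sum(scores) / len(scores)
    ss_total = sum((x - grand) ** 2 for x in scores)
    group_means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand) ** 2
                     for g, m in zip(groups, group_means))
    return ss_between / ss_total

# Three hypothetical groups of scores
print(round(eta_squared([[1, 2, 3], [2, 3, 4], [4, 5, 6]]), 3))  # 0.7 -> large
```

Here SS_between = 14 and SS_total = 20, so group membership explains 70% of the variance — far above the 0.14 "large" benchmark.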

Other Effect Size Measures

Odds ratio (OR): used in logistic regression and 2×2 contingency tables. OR = (a/b)/(c/d), where a, b are the outcome/no-outcome counts in the exposed group and c, d the same counts in the unexposed group. OR=1 means no effect; OR=2 means twice the odds; OR < 1 means lower odds. Used extensively in medical research.

Relative risk (RR): RR = P(outcome|exposed) / P(outcome|unexposed). More intuitive than OR for interpreting treatment effects. Note: OR ≈ RR when the outcome is rare (< 10% prevalence).
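Both measures fall out of the same 2×2 table. A minimal sketch with hypothetical cohort counts, chosen so the outcome is rare — which is exactly why OR and RR nearly coincide here:

```python
def odds_ratio(a, b, c, d):
    """OR for a 2x2 table: a, b = exposed yes/no; c, d = unexposed yes/no."""
    return (a / b) / (c / d)

def relative_risk(a, b, c, d):
    """RR = P(outcome | exposed) / P(outcome | unexposed)."""
    return (a / (a + b)) / (c / (c + d))

# Hypothetical cohort: 10/1000 exposed vs 5/1000 unexposed develop the outcome
print(round(odds_ratio(10, 990, 5, 995), 3))    # 2.01
print(round(relative_risk(10, 990, 5, 995), 3)) # 2.0
```

With a 1% outcome prevalence the two answers agree to two decimal places; with a common outcome (say 50% prevalence) the OR would overstate the RR substantially.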

Hedges' g: Cohen's d multiplied by a small-sample correction factor, J ≈ 1 − 3/(4(n₁ + n₂) − 9), which removes the slight upward bias of d in small samples. Preferred over d when total n < 20.

Glass's Δ: uses only the control group SD in the denominator, appropriate when groups have different variances or when you want to express the effect relative to the natural variability of the control condition.
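Both variants are one-line modifications of d. A minimal sketch, using the common Hedges & Olkin approximation J ≈ 1 − 3/(4(n₁ + n₂) − 9) for the small-sample correction:

```python
import math

def hedges_g(m1, s1, n1, m2, s2, n2):
    """Cohen's d times the small-sample bias-correction factor J."""
    pooled = math.sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2))
    d = (m1 - m2) / pooled
    j = 1 - 3 / (4 * (n1 + n2) - 9)
    return d * j

def glass_delta(m1, m2, s_control):
    """Glass's delta: mean difference scaled by the control-group SD only."""
    return (m1 - m2) / s_control

# Same 8-point difference as the earlier worked example, but only 10 per group
print(round(hedges_g(72, 10, 10, 64, 10, 10), 3))  # 0.766 -- slightly below d = 0.8
print(glass_delta(72, 64, 10))                     # 0.8
```

At n = 10 per group the correction shrinks d = 0.8 to g ≈ 0.77; as samples grow, J approaches 1 and g converges to d.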

| Measure | Used for | Range |
| --- | --- | --- |
| Cohen's d | Comparing two means | 0 to ∞ (absolute value) |
| Pearson r | Correlations | −1 to +1 |
| Eta-squared (η²) | ANOVA (proportion of variance) | 0 to 1 |
| Odds ratio | Binary outcomes (logistic regression) | 0 to ∞ (1 = no effect) |
| Relative risk | Prospective studies, RCTs | 0 to ∞ (1 = no effect) |
| Hedges' g | Small-sample comparisons of means | 0 to ∞ (absolute value) |

How to Report Effect Sizes

APA format: "The treatment group scored significantly higher than the control group, t(48) = 3.21, p = .002, d = 0.91, 95% CI [0.37, 1.44]."

Always report: (1) the test statistic with df, (2) the exact p-value, (3) the effect size with its name, (4) a 95% confidence interval for the effect size when possible.

Confidence intervals for effect sizes are especially valuable — they show the range of plausible true effect sizes. A CI that includes 0 (for d) or 1 (for OR) is consistent with no effect, regardless of the p-value.
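A widely used large-sample approximation for the standard error of d is SE ≈ √((n₁+n₂)/(n₁n₂) + d²/(2(n₁+n₂))). A minimal sketch built on that approximation (exact intervals use the noncentral t distribution, so software-reported CIs, like the one in the APA example above, will differ slightly):

```python
import math

def d_confidence_interval(d, n1, n2, z=1.96):
    """Approximate 95% CI for Cohen's d (large-sample normal approximation)."""
    se = math.sqrt((n1 + n2) / (n1 * n2) + d**2 / (2 * (n1 + n2)))
    return d - z * se, d + z * se

lo, hi = d_confidence_interval(0.91, 25, 25)
print(round(lo, 2), round(hi, 2))  # interval excludes 0 -> consistent with a real effect
```

A quick check that the interval excludes 0 (for d) is exactly the "consistent with no effect" test described above, read off the CI instead of the p-value.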


Educational use only. Content is based on publicly documented mathematical formulas and reviewed for accuracy by the CalcMulti Editorial Team. Last updated: March 2026.