Population vs Sample Variance — Why n vs n−1?
By CalcMulti Editorial Team · 9 min read
Every statistics student eventually asks: why does sample variance divide by n−1 instead of n? The answer involves one of the most elegant ideas in statistics — the concept of bias in estimation. Getting this wrong produces variance estimates that are systematically too small, which in turn makes standard deviations too small, confidence intervals too narrow, and hypothesis tests too eager to declare significance.
This guide explains the two formulas, where the difference comes from, why it matters, and the mathematical intuition behind Bessel's correction — without requiring calculus.
Formula
Population: σ² = Σ(xᵢ − μ)² / n | Sample: s² = Σ(xᵢ − x̄)² / (n − 1)
Quick Answer: n or n−1?
Use n (population variance, σ²) when your dataset contains every member of the group you are studying — all employees, all items in a batch, every student in one class. Use n−1 (sample variance, s²) when your data is a subset drawn from a larger population and you want to estimate the true variance of that larger group. When in doubt, use n−1.
One-line rule: Did you measure the whole group, or only part of it? Whole group → n. Part of it → n−1.
| Scenario | Full Population or Sample? | Formula | Denominator |
|---|---|---|---|
| Scores for all 30 students in one class | Full population | σ² | n = 30 |
| Scores for 30 of 1,200 students in a school | Sample | s² | n−1 = 29 |
| Heights of all 50 employees in a startup | Full population | σ² | n = 50 |
| Heights of 50 adults sampled from a city | Sample | s² | n−1 = 49 |
| Blood pressure of 20 patients in a clinical trial | Sample | s² | n−1 = 19 |
| Defect rate across all 500 products in a batch | Full population | σ² | n = 500 |
| Quality check on 30 of 10,000 manufactured parts | Sample | s² | n−1 = 29 |
| Monthly returns of a stock over 12 months | Sample (of possible returns) | s² | n−1 = 11 |
The Two Formulas Side by Side
Both formulas measure the average squared deviation from the mean. The only difference is the denominator: n for population variance, n−1 for sample variance.
Population variance (σ²) divides by n. It is used when your dataset IS the entire population — every single member of the group you are studying. Example: you have the exact score of every student in one specific class of 30. You want to describe the spread of that class, not estimate the spread of a larger group. Divide by n = 30.
Sample variance (s²) divides by n−1. It is used when your dataset is a SAMPLE drawn from a larger population, and you want to estimate the true population variance. Example: you surveyed 30 randomly selected students from a school of 1,200 and want to estimate the variance for the whole school. Divide by n−1 = 29.
Worked example with the same data, different intent: Dataset {4, 7, 13, 2}. n = 4. Mean = (4+7+13+2)/4 = 6.5. Deviations: −2.5, +0.5, +6.5, −4.5. Squared deviations: 6.25, 0.25, 42.25, 20.25. Sum = 69.0. Population variance: 69.0/4 = 17.25. Sample variance: 69.0/3 = 23.0.
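The worked example above can be checked in a few lines of plain Python (no libraries needed):

```python
# Worked example from the text: dataset {4, 7, 13, 2}
data = [4, 7, 13, 2]
n = len(data)

mean = sum(data) / n                      # (4+7+13+2)/4 = 6.5
sq_dev = [(x - mean) ** 2 for x in data]  # 6.25, 0.25, 42.25, 20.25
total = sum(sq_dev)                       # 69.0

pop_var = total / n        # population variance: 69.0 / 4 = 17.25
samp_var = total / (n - 1)  # sample variance:    69.0 / 3 = 23.0

print(pop_var, samp_var)    # 17.25 23.0
```

The only place the two calculations diverge is the final division, which is why the sample variance is always the larger of the two.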
| Property | Population Variance (σ²) | Sample Variance (s²) |
|---|---|---|
| Symbol | σ² (sigma squared) | s² |
| Denominator | n | n − 1 |
| When to use | Data IS the full population | Data is a sample from a larger population |
| Result | Exact population spread | Unbiased estimate of population spread |
| Bias | None — it is the exact parameter, not an estimate | Unbiased estimator of σ² |
| Example | All 30 students in one class | 30 students sampled from 1,200 |
The Bias Problem — Why Dividing by n Is Wrong for Samples
When you compute variance from a sample, you use the sample mean x̄ — not the true population mean μ. This creates a subtle but systematic problem: x̄ is the value that minimises the sum of squared deviations from your specific sample. No other value would produce a smaller sum of squared deviations for your particular data.
Because x̄ is optimised for your sample, the deviations (xᵢ − x̄) are systematically smaller than the deviations from the true population mean (xᵢ − μ). When you sum the squared deviations and divide by n, you are dividing a sum that is already biased downward. The result — the biased variance — consistently underestimates the true population variance σ².
A simple demonstration: suppose the population is {1, 5, 9} with μ = 5 and σ² = (16+0+16)/3 = 10.67. Now take the sample {1, 5}. Sample mean x̄ = 3. Biased estimate (÷n): [(1−3)²+(5−3)²]/2 = [4+4]/2 = 4. Unbiased estimate (÷n−1): [4+4]/1 = 8. True population variance is 10.67. Neither sample estimate is exactly right (that is sampling variation), but averaged over all possible samples of size 2 drawn with replacement, dividing by n−1 hits the true value, while dividing by n is consistently too low.
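Because the population {1, 5, 9} is tiny, the "on average" claim can be verified by brute force rather than taken on faith. The sketch below enumerates every ordered sample of size 2 drawn with replacement (the independent-sampling setting in which the unbiasedness result holds) and uses exact `Fraction` arithmetic so there is no rounding:

```python
from fractions import Fraction
from itertools import product

population = [1, 5, 9]
N = len(population)
mu = Fraction(sum(population), N)                            # 5
sigma2 = sum((Fraction(x) - mu) ** 2 for x in population) / N  # 32/3 ≈ 10.67

biased, unbiased = [], []
# Every ordered sample of size n = 2, drawn with replacement: 9 in total.
for x1, x2 in product(population, repeat=2):
    xbar = Fraction(x1 + x2, 2)
    ss = (x1 - xbar) ** 2 + (x2 - xbar) ** 2
    biased.append(ss / 2)    # divide by n
    unbiased.append(ss / 1)  # divide by n - 1

avg_biased = sum(biased) / len(biased)      # 16/3 = sigma2 * (n-1)/n
avg_unbiased = sum(unbiased) / len(unbiased)  # 32/3 = sigma2 exactly
```

Averaged over all nine samples, the n−1 estimator lands exactly on σ² = 32/3, while the ÷n estimator averages to σ² × 1/2, precisely the (n−1)/n shrinkage factor discussed next.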
The mathematical proof shows that E[biased estimator] = σ² × (n−1)/n. Multiplying by n/(n−1) corrects this: E[s²] = σ². This is exactly what dividing by n−1 achieves.
Bessel's Correction — The Intuition
Friedrich Bessel (1784–1846) was a German mathematician and astronomer who formalised this correction while working on astronomical measurement errors. The correction that bears his name answers: by what factor should we scale up the biased variance to remove the downward bias?
The answer is n/(n−1). Multiplying the biased variance (÷n) by n/(n−1) gives the unbiased variance (÷n−1). These two operations are equivalent: σ²_biased × n/(n−1) = Σ(xᵢ−x̄)²/n × n/(n−1) = Σ(xᵢ−x̄)²/(n−1) = s².
The intuition: when you draw a sample, you have n data points but only n−1 of them are 'free' to vary. Once you know n−1 values and the sample mean, the last value is completely determined (it must make the mean come out right). This is what statisticians call degrees of freedom. You have n observations but spend 1 degree of freedom computing x̄, leaving n−1 free degrees. Dividing by n−1 accounts for this constraint.
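The degrees-of-freedom constraint is easy to see concretely: once the sample mean is fixed, any n−1 of the observations determine the remaining one. A tiny sketch, reusing the {4, 7, 13, 2} dataset from earlier:

```python
# Given n-1 of the values and the sample mean, the nth value is forced:
# it must satisfy sum(values) = n * xbar.
known = [4, 7, 13]            # first n-1 observations
xbar = 6.5                    # sample mean of all n = 4 values
n = len(known) + 1

last = n * xbar - sum(known)  # the only value consistent with that mean
print(last)                   # 2.0 — the "missing" observation is determined
```

Only n−1 of the deviations carry independent information, which is exactly what the n−1 denominator reflects.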
The effect of the correction diminishes as n grows: for n = 5, dividing by 4 vs 5 is a 25% difference. For n = 100, it is only 1%. For n = 1,000, it is 0.1%. This is why the distinction matters most in small samples (n < 30) and is negligible for large datasets.
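The shrinking size of the correction is just the factor n/(n−1) approaching 1, which a short loop makes explicit:

```python
# Bessel's correction factor n/(n-1), expressed as a percentage difference,
# shrinks quickly as the sample size grows.
pct_diff = {n: (n / (n - 1) - 1) * 100 for n in (5, 100, 1000)}

for n, pct in pct_diff.items():
    print(f"n = {n:>4}: dividing by n-1 vs n differs by {pct:.2f}%")
# n = 5: 25.00%, n = 100: 1.01%, n = 1000: 0.10%
```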
Practical Decision Guide — Which Formula to Use
Use population variance (÷n) when: you measured every single member of the group you care about; you have no interest in generalising to a wider group; you are describing a closed system (all employees in a specific company, all products in a finished batch, all students in a single class during a specific term).
Use sample variance (÷n−1) when: you collected data from a subset of a larger group; you want to make inferences or predictions about the wider population; your data comes from a survey, experiment, clinical trial, or any process where you could in principle collect more data. This is the correct choice in the vast majority of real-world statistical work.
When you are not sure: default to sample variance (n−1). Most statistical software uses the sample formula by default: Excel's VAR(), R's var(), and pandas' var() all divide by n−1. Excel also has explicit functions: VAR.S() for sample and VAR.P() for population. The notable exception is NumPy, where np.var() defaults to the population formula (ddof=0); pass ddof=1 to get the sample version.
It rarely matters for large samples. For n ≥ 100, the difference between n and n−1 in the denominator is less than 1%. The distinction is most consequential when n is small (2–30), which is common in pilot studies, quality control sampling, and experimental research.
Excel, Python, and R — Which Formula Does Each Use?
Different tools have different defaults, which causes errors when switching between them. The table below shows exactly which function to call for each formula in the most common software environments.
| Software | Sample Variance (n−1) | Population Variance (n) | Sample Std Dev (n−1) | Population Std Dev (n) |
|---|---|---|---|---|
| Excel | VAR.S() or VAR() | VAR.P() | STDEV.S() or STDEV() | STDEV.P() |
| Python numpy | np.var(a, ddof=1) | np.var(a) or np.var(a, ddof=0) | np.std(a, ddof=1) | np.std(a) |
| Python pandas | df.var() or df[col].var() | df.var(ddof=0) | df.std() | df.std(ddof=0) |
| R | var(x) | var(x) × (n−1)/n — no built-in population function; scale manually | sd(x) | sd(x) × √((n−1)/n) — scale manually |
| SPSS | Variance in Descriptives | Not default — compute manually | Std Deviation | Not default |
| Google Sheets | VAR() or VAR.S() | VARP() or VAR.P() | STDEV() | STDEVP() |
| MATLAB | var(x) — n−1 by default | var(x,1) — normalisation flag 1 divides by n | std(x) | std(x,1) |
| SQL (PostgreSQL) | VAR_SAMP(col) | VAR_POP(col) | STDDEV_SAMP(col) | STDDEV_POP(col) |
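Python's standard-library statistics module is a useful reference point because, unlike NumPy, it names the two formulas explicitly rather than hiding the choice behind a default. With the same {4, 7, 13, 2} dataset used earlier:

```python
import statistics

data = [4, 7, 13, 2]

s2 = statistics.variance(data)       # sample variance, divides by n-1 -> 23
sigma2 = statistics.pvariance(data)  # population variance, divides by n -> 17.25

s = statistics.stdev(data)      # sample std dev     = sqrt(s2)
sigma = statistics.pstdev(data)  # population std dev = sqrt(sigma2)

print(s2, sigma2)
```

The "p" prefix (pvariance, pstdev) always marks the population formula, which makes code that mixes the two much easier to audit than ddof flags.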
Common Mistakes and How to Avoid Them
Mistake 1: Using population variance for sample data. The most frequent error. If you surveyed 40 people from a city of 500,000 and used σ² (÷40), your variance estimate is ~2.5% too low (factor of 39/40). For small samples this error compounds: n=10 gives 10% underestimation, n=5 gives 20%. Always ask: "Is this my full group or a subset?"
Mistake 2: Wrong Excel function. Excel's VAR() changed to VAR.S() in Excel 2010. Both use n−1. VAR.P() is the population function. If you learned statistics before 2010 and still type VAR(), you are getting the sample version (n−1) — which is usually correct, but verify. The STDEV vs STDEVP confusion causes the same problem for standard deviation.
Mistake 3: numpy defaulting to population. Python's numpy.var() uses population variance (n) by default — opposite to most other tools. This catches many data scientists off guard. If you are estimating population variance from a sample, always write np.var(data, ddof=1). The same applies to np.std().
Mistake 4: Applying correction to the whole dataset. If you have census data (data on everyone), using n−1 actually makes your variance estimate worse — you are introducing unnecessary inflation. Government statisticians and actuaries often work with true population data and correctly use n (population variance).
Mistake 5: Forgetting that SD correction differs from variance correction. While s² is an unbiased estimator of σ², the sample standard deviation s = √s² is NOT an unbiased estimator of σ. The square root introduces a small downward bias. For most practical purposes this is ignored, but in precision quality control (manufacturing, metrology) the c₄ correction factor is applied to obtain an unbiased σ estimate.
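For readers who need it, the c₄ factor mentioned above has a closed form for normally distributed samples: c₄(n) = √(2/(n−1)) · Γ(n/2) / Γ((n−1)/2), and an unbiased estimate of σ is s / c₄(n). A minimal sketch using only the standard library (the function name c4 is our own):

```python
from math import gamma, sqrt

def c4(n: int) -> float:
    """Unbiasing constant for the sample SD of a normal sample of size n.

    E[s] = c4(n) * sigma, so s / c4(n) is an unbiased estimator of sigma.
    """
    return sqrt(2 / (n - 1)) * gamma(n / 2) / gamma((n - 1) / 2)

# The downward bias of s is largest for small n and vanishes as n grows:
print(round(c4(5), 4))   # 0.94 — matches the standard tabulated value 0.9400
print(round(c4(100), 4))  # close to 1: the bias is negligible by n = 100
```

This is why the correction only matters in precision contexts such as control charts, where n is small and the bias in s would otherwise propagate into control limits.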
Related Calculators
- Compute σ² and s² with full deviation table
- Standard Deviation Calculator: square root of variance, same units as data
- Variance vs Standard Deviation: which to report and why
- Mean Calculator: required first step before variance
- Coefficient of Variation Calculator: relative spread as % of mean
- Statistics Hub: all statistics calculators & guides
Educational use only. Content is based on publicly documented mathematical formulas and reviewed for accuracy by the CalcMulti Editorial Team. Last updated: February 2026.