Population vs Sample Variance — Why n vs n−1?
By CalcMulti Editorial Team · 9 min read
Every statistics student eventually asks: why does sample variance divide by n−1 instead of n? The answer involves one of the most elegant ideas in statistics — the concept of bias in estimation. Getting this wrong produces variance estimates that are systematically too small, which in turn makes standard deviations too small, confidence intervals too narrow, and hypothesis tests too eager to declare significance.
This guide explains the two formulas, where the difference comes from, why it matters, and the mathematical intuition behind Bessel's correction — without requiring calculus.
Formula
Population: σ² = Σ(xᵢ − μ)² / n | Sample: s² = Σ(xᵢ − x̄)² / (n − 1)
Quick Answer: n or n−1?
Use n (population variance, σ²) when your dataset contains every member of the group you are studying — all employees, all items in a batch, every student in one class. Use n−1 (sample variance, s²) when your data is a subset drawn from a larger population and you want to estimate the true variance of that larger group. When in doubt, use n−1.
One-line rule: Did you measure the whole group, or only part of it? Whole group → n. Part of it → n−1.
| Scenario | Full Population or Sample? | Formula | Denominator |
|---|---|---|---|
| Scores for all 30 students in one class | Full population | σ² | n = 30 |
| Scores for 30 of 1,200 students in a school | Sample | s² | n−1 = 29 |
| Heights of all 50 employees in a startup | Full population | σ² | n = 50 |
| Heights of 50 adults sampled from a city | Sample | s² | n−1 = 49 |
| Blood pressure of 20 patients in a clinical trial | Sample | s² | n−1 = 19 |
| Defect rate across all 500 products in a batch | Full population | σ² | n = 500 |
| Quality check on 30 of 10,000 manufactured parts | Sample | s² | n−1 = 29 |
| Monthly returns of a stock over 12 months | Sample (of possible returns) | s² | n−1 = 11 |
The Two Formulas Side by Side
Both formulas measure the average squared deviation from the mean. The only difference is the denominator: n for population variance, n−1 for sample variance.
Population variance (σ²) divides by n. It is used when your dataset IS the entire population — every single member of the group you are studying. Example: you have the exact score of every student in one specific class of 30. You want to describe the spread of that class, not estimate the spread of a larger group. Divide by n = 30.
Sample variance (s²) divides by n−1. It is used when your dataset is a SAMPLE drawn from a larger population, and you want to estimate the true population variance. Example: you surveyed 30 randomly selected students from a school of 1,200 and want to estimate the variance for the whole school. Divide by n−1 = 29.
Worked example with the same data, different intent: Dataset {4, 7, 13, 2}. n = 4. Mean = (4+7+13+2)/4 = 6.5. Deviations: −2.5, +0.5, +6.5, −4.5. Squared deviations: 6.25, 0.25, 42.25, 20.25. Sum = 69.0. Population variance: 69.0/4 = 17.25. Sample variance: 69.0/3 = 23.0.
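The worked example above can be checked in a few lines of plain Python (no libraries needed):

```python
# Worked example from the text: dataset {4, 7, 13, 2}
data = [4, 7, 13, 2]
n = len(data)

mean = sum(data) / n                      # (4+7+13+2)/4 = 6.5
sq_dev = [(x - mean) ** 2 for x in data]  # 6.25, 0.25, 42.25, 20.25
total = sum(sq_dev)                       # 69.0

pop_var = total / n        # population variance: 69.0 / 4 = 17.25
samp_var = total / (n - 1)  # sample variance:    69.0 / 3 = 23.0

print(pop_var, samp_var)    # 17.25 23.0
```

The only place the two calculations diverge is the final division, which is why the sample variance is always the larger of the two.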
| Property | Population Variance (σ²) | Sample Variance (s²) |
|---|---|---|
| Symbol | σ² (sigma squared) | s² |
| Denominator | n | n − 1 |
| When to use | Data IS the full population | Data is a sample from a larger population |
| Result | Exact population spread | Unbiased estimate of population spread |
| Bias | None — it is the exact parameter, not an estimate | Unbiased estimator of σ² |
| Example | All 30 students in one class | 30 students sampled from 1,200 |
The Bias Problem — Why Dividing by n Is Wrong for Samples
When you compute variance from a sample, you use the sample mean x̄ — not the true population mean μ. This creates a subtle but systematic problem: x̄ is the value that minimises the sum of squared deviations from your specific sample. No other value would produce a smaller sum of squared deviations for your particular data.
Because x̄ is optimised for your sample, the deviations (xᵢ − x̄) are systematically smaller than the deviations from the true population mean (xᵢ − μ). When you sum the squared deviations and divide by n, you are dividing a sum that is already biased downward. The result — the biased variance — consistently underestimates the true population variance σ².
A simple demonstration: suppose the population is {1, 5, 9} with μ = 5 and σ² = (16+0+16)/3 = 10.67. Now take the sample {1, 5}. Sample mean x̄ = 3. Biased estimate (÷n): [(1−3)²+(5−3)²]/2 = [4+4]/2 = 4. Unbiased estimate (÷n−1): [4+4]/1 = 8. True population variance is 10.67. Neither sample estimate is exactly right (that is sampling variation), but averaged over all possible samples of size 2 drawn with replacement, dividing by n−1 hits the true value, while dividing by n is consistently too low.
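Because the population {1, 5, 9} is tiny, the "on average" claim can be verified by brute force rather than taken on faith. The sketch below enumerates every ordered sample of size 2 drawn with replacement (the independent-sampling setting in which the unbiasedness result holds) and uses exact `Fraction` arithmetic so there is no rounding:

```python
from fractions import Fraction
from itertools import product

population = [1, 5, 9]
N = len(population)
mu = Fraction(sum(population), N)                            # 5
sigma2 = sum((Fraction(x) - mu) ** 2 for x in population) / N  # 32/3 ≈ 10.67

biased, unbiased = [], []
# Every ordered sample of size n = 2, drawn with replacement: 9 in total.
for x1, x2 in product(population, repeat=2):
    xbar = Fraction(x1 + x2, 2)
    ss = (x1 - xbar) ** 2 + (x2 - xbar) ** 2
    biased.append(ss / 2)    # divide by n
    unbiased.append(ss / 1)  # divide by n - 1

avg_biased = sum(biased) / len(biased)      # 16/3 = sigma2 * (n-1)/n
avg_unbiased = sum(unbiased) / len(unbiased)  # 32/3 = sigma2 exactly
```

Averaged over all nine samples, the n−1 estimator lands exactly on σ² = 32/3, while the ÷n estimator averages to σ² × 1/2, precisely the (n−1)/n shrinkage factor discussed next.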
The mathematical proof shows that E[biased estimator] = σ² × (n−1)/n. Multiplying by n/(n−1) corrects this: E[s²] = σ². This is exactly what dividing by n−1 achieves.
Bessel's Correction — The Intuition
Friedrich Bessel (1784–1846) was a German mathematician and astronomer who formalised this correction while working on astronomical measurement errors. The correction that bears his name answers: by what factor should we scale up the biased variance to remove the downward bias?
The answer is n/(n−1). Multiplying the biased variance (÷n) by n/(n−1) gives the unbiased variance (÷n−1). These two operations are equivalent: σ²_biased × n/(n−1) = Σ(xᵢ−x̄)²/n × n/(n−1) = Σ(xᵢ−x̄)²/(n−1) = s².
The intuition: when you draw a sample, you have n data points but only n−1 of them are 'free' to vary. Once you know n−1 values and the sample mean, the last value is completely determined (it must make the mean come out right). This is what statisticians call degrees of freedom. You have n observations but spend 1 degree of freedom computing x̄, leaving n−1 free degrees. Dividing by n−1 accounts for this constraint.
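The degrees-of-freedom constraint is easy to see concretely: once the sample mean is fixed, any n−1 of the observations determine the remaining one. A tiny sketch, reusing the {4, 7, 13, 2} dataset from earlier:

```python
# Given n-1 of the values and the sample mean, the nth value is forced:
# it must satisfy sum(values) = n * xbar.
known = [4, 7, 13]            # first n-1 observations
xbar = 6.5                    # sample mean of all n = 4 values
n = len(known) + 1

last = n * xbar - sum(known)  # the only value consistent with that mean
print(last)                   # 2.0 — the "missing" observation is determined
```

Only n−1 of the deviations carry independent information, which is exactly what the n−1 denominator reflects.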
The effect of the correction diminishes as n grows: for n = 5, dividing by 4 vs 5 is a 25% difference. For n = 100, it is only 1%. For n = 1,000, it is 0.1%. This is why the distinction matters most in small samples (n < 30) and is negligible for large datasets.
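The shrinking size of the correction is just the factor n/(n−1) approaching 1, which a short loop makes explicit:

```python
# Bessel's correction factor n/(n-1), expressed as a percentage difference,
# shrinks quickly as the sample size grows.
pct_diff = {n: (n / (n - 1) - 1) * 100 for n in (5, 100, 1000)}

for n, pct in pct_diff.items():
    print(f"n = {n:>4}: dividing by n-1 vs n differs by {pct:.2f}%")
# n = 5: 25.00%, n = 100: 1.01%, n = 1000: 0.10%
```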
Practical Decision Guide — Which Formula to Use
Use population variance (÷n) when: you measured every single member of the group you care about; you have no interest in generalising to a wider group; you are describing a closed system (all employees in a specific company, all products in a finished batch, all students in a single class during a specific term).
Use sample variance (÷n−1) when: you collected data from a subset of a larger group; you want to make inferences or predictions about the wider population; your data comes from a survey, experiment, clinical trial, or any process where you could in principle collect more data. This is the correct choice in the vast majority of real-world statistical work.
When you are not sure: default to sample variance (n−1). Most statistical software uses the sample formula by default: Excel's VAR(), R's var(), and pandas' var() all divide by n−1. Excel also has explicit functions: VAR.S() for sample and VAR.P() for population. The notable exception is NumPy, where np.var() defaults to the population formula (ddof=0); pass ddof=1 to get the sample version.
It rarely matters for large samples. For n ≥ 100, the difference between n and n−1 in the denominator is less than 1%. The distinction is most consequential when n is small (2–30), which is common in pilot studies, quality control sampling, and experimental research.
Excel, Python, and R — Which Formula Does Each Use?
Different tools have different defaults, which causes errors when switching between them. The table below shows exactly which function to call for each formula in the most common software environments.
| Software | Sample Variance (n−1) | Population Variance (n) | Sample Std Dev (n−1) | Population Std Dev (n) |
|---|---|---|---|---|
| Excel | VAR.S() or VAR() | VAR.P() | STDEV.S() or STDEV() | STDEV.P() |
| Python numpy | np.var(a, ddof=1) | np.var(a) or np.var(a, ddof=0) | np.std(a, ddof=1) | np.std(a) |
| Python pandas | df.var() or df[col].var() | df.var(ddof=0) | df.std() | df.std(ddof=0) |
| R | var(x) | var(x) × (n−1)/n — no built-in population function; scale manually | sd(x) | sd(x) × √((n−1)/n) — scale manually |
| SPSS | Variance in Descriptives | Not default — compute manually | Std Deviation | Not default |
| Google Sheets | VAR() or VAR.S() | VARP() or VAR.P() | STDEV() | STDEVP() |
| MATLAB | var(x) — n−1 by default | var(x,1) — normalisation flag 1 divides by n | std(x) | std(x,1) |
| SQL (PostgreSQL) | VAR_SAMP(col) | VAR_POP(col) | STDDEV_SAMP(col) | STDDEV_POP(col) |
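Python's standard-library statistics module is a useful reference point because, unlike NumPy, it names the two formulas explicitly rather than hiding the choice behind a default. With the same {4, 7, 13, 2} dataset used earlier:

```python
import statistics

data = [4, 7, 13, 2]

s2 = statistics.variance(data)       # sample variance, divides by n-1 -> 23
sigma2 = statistics.pvariance(data)  # population variance, divides by n -> 17.25

s = statistics.stdev(data)      # sample std dev     = sqrt(s2)
sigma = statistics.pstdev(data)  # population std dev = sqrt(sigma2)

print(s2, sigma2)
```

The "p" prefix (pvariance, pstdev) always marks the population formula, which makes code that mixes the two much easier to audit than ddof flags.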
Common Mistakes and How to Avoid Them
Mistake 1: Using population variance for sample data. The most frequent error. If you surveyed 40 people from a city of 500,000 and used σ² (÷40), your variance estimate is ~2.5% too low (factor of 39/40). For small samples this error compounds: n=10 gives 10% underestimation, n=5 gives 20%. Always ask: "Is this my full group or a subset?"
Mistake 2: Wrong Excel function. Excel's VAR() changed to VAR.S() in Excel 2010. Both use n−1. VAR.P() is the population function. If you learned statistics before 2010 and still type VAR(), you are getting the sample version (n−1) — which is usually correct, but verify. The STDEV vs STDEVP confusion causes the same problem for standard deviation.
Mistake 3: numpy defaulting to population. Python's numpy.var() uses population variance (n) by default — opposite to most other tools. This catches many data scientists off guard. If you are estimating population variance from a sample, always write np.var(data, ddof=1). The same applies to np.std().
Mistake 4: Applying correction to the whole dataset. If you have census data (data on everyone), using n−1 actually makes your variance estimate worse — you are introducing unnecessary inflation. Government statisticians and actuaries often work with true population data and correctly use n (population variance).
Mistake 5: Forgetting that SD correction differs from variance correction. While s² is an unbiased estimator of σ², the sample standard deviation s = √s² is NOT an unbiased estimator of σ. The square root introduces a small downward bias. For most practical purposes this is ignored, but in precision quality control (manufacturing, metrology) the c₄ correction factor is applied to obtain an unbiased σ estimate.
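For readers who need it, the c₄ factor mentioned above has a closed form for normally distributed samples: c₄(n) = √(2/(n−1)) · Γ(n/2) / Γ((n−1)/2), and an unbiased estimate of σ is s / c₄(n). A minimal sketch using only the standard library (the function name c4 is our own):

```python
from math import gamma, sqrt

def c4(n: int) -> float:
    """Unbiasing constant for the sample SD of a normal sample of size n.

    E[s] = c4(n) * sigma, so s / c4(n) is an unbiased estimator of sigma.
    """
    return sqrt(2 / (n - 1)) * gamma(n / 2) / gamma((n - 1) / 2)

# The downward bias of s is largest for small n and vanishes as n grows:
print(round(c4(5), 4))   # 0.94 — matches the standard tabulated value 0.9400
print(round(c4(100), 4))  # close to 1: the bias is negligible by n = 100
```

This is why the correction only matters in precision contexts such as control charts, where n is small and the bias in s would otherwise propagate into control limits.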
Related Calculators
- Compute σ² and s² with full deviation table
- Standard Deviation Calculator: square root of variance, same units as data
- Variance vs Standard Deviation: which to report and why
- Mean Calculator: required first step before variance
- Coefficient of Variation Calculator: relative spread as % of mean
- Statistics Hub: all statistics calculators & guides
Educational use only. Content is based on publicly documented mathematical formulas and reviewed for accuracy by the CalcMulti Editorial Team. Last updated: February 2026.