Sample vs Population Statistics — Parameters, Statistics & Formulas
By CalcMulti Editorial Team · 7 min read
In statistics, a population is the complete set of individuals or observations you want to study. A sample is a subset of that population. This distinction matters because the formulas you use — and the conclusions you can draw — depend critically on whether your data is the full population or just a sample of it.
The most consequential difference is in the variance formula: population variance divides by N, while sample variance divides by n−1 (Bessel's correction). Using the wrong divisor produces a biased estimate of spread that can undermine hypothesis tests, confidence intervals, and all downstream statistics.
Side-by-Side Comparison
| Property | Population | Sample |
|---|---|---|
| Definition | Every member of the group of interest | A subset selected from the population |
| Size notation | N (capital) | n (lowercase) |
| Mean notation | μ (mu) — a parameter | x̄ (x-bar) — a statistic |
| Variance notation | σ² (sigma-squared) — a parameter | s² — a statistic |
| Mean formula | μ = Σx / N | x̄ = Σx / n (identical form) |
| Variance formula | σ² = Σ(x − μ)² / N | s² = Σ(x − x̄)² / (n − 1) |
| SD formula | σ = √[Σ(x − μ)² / N] | s = √[Σ(x − x̄)² / (n − 1)] |
| Why different variance? | N is all the data — no estimation needed | n−1 corrects for bias from estimating μ with x̄ |
| Uncertainty | None — parameters are exact | Sampling error — statistics vary between samples |
| Goal | Describe the population exactly | Estimate population parameters from limited data |
Parameters vs Statistics — The Core Distinction
A parameter is a fixed numerical characteristic of a population — it has one true value (even if unknown). Examples: the true mean height of all adult humans on Earth (μ), the true proportion of defective chips produced by a factory (π), the true standard deviation of blood pressure readings across all hypertensive patients (σ).
A statistic is a numerical characteristic computed from a sample — it varies from sample to sample and serves as an estimate of the corresponding parameter. Examples: the mean height of 200 adults sampled from a population (x̄), the proportion of defective chips in a batch of 500 (p̂), the standard deviation of blood pressure in a clinical trial of 150 patients (s).
Greek letters (μ, σ, π) denote parameters. Latin letters (x̄, s, p̂) denote statistics. This convention is widely used in statistics textbooks, although some texts write the population proportion as p rather than π. Inferential statistics is the science of using statistics (from samples) to estimate parameters (of populations), while quantifying how uncertain those estimates are.
Why Sample Variance Uses n−1 (Bessel's Correction)
When computing sample variance, we use the sample mean x̄ as an estimate of the true population mean μ. Because x̄ is calculated from the same data, it sits closer to the data points than the true μ typically does: x̄ is the value that minimizes the sum of squared deviations, so Σ(xᵢ − x̄)² is never larger than Σ(xᵢ − μ)² and is smaller on average. Dividing by n instead of n−1 would therefore systematically underestimate the true population variance.
Bessel's correction (dividing by n−1 instead of n) adjusts for this bias. The key result, which can be proved for independent observations, is that the expected value of Σ(xᵢ − x̄)² / (n−1) equals σ², so the sample variance with this correction is an unbiased estimator of the population variance. Without the correction, the expected value is σ² × (n−1)/n, which is always smaller than σ².
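The expected-value claim can be checked with a short derivation. The sketch below assumes the observations x₁, …, xₙ are independent draws from a population with mean μ and variance σ² (an assumption added here for illustration; it is the standard setting for this result).

```latex
\begin{align*}
\sum_{i=1}^{n} (x_i - \bar{x})^2
  &= \sum_{i=1}^{n} (x_i - \mu)^2 - n(\bar{x} - \mu)^2 \\
E\!\left[\sum_{i=1}^{n} (x_i - \mu)^2\right]
  &= n\sigma^2,
  \qquad
  E\!\left[n(\bar{x} - \mu)^2\right] = n \cdot \frac{\sigma^2}{n} = \sigma^2 \\
E\!\left[\sum_{i=1}^{n} (x_i - \bar{x})^2\right]
  &= n\sigma^2 - \sigma^2 = (n-1)\sigma^2 \\
E\!\left[\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n-1}\right]
  &= \sigma^2,
  \qquad
  E\!\left[\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n}\right] = \sigma^2 \cdot \frac{n-1}{n}
\end{align*}
```

The second line uses the fact that the variance of x̄ is σ²/n for independent observations, which is where the "lost" σ² comes from.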
Worked example: dataset {4, 7, 13, 16}. Mean = 40/4 = 10. Deviations: −6, −3, +3, +6. Squared deviations: 36, 9, 9, 36. Sum = 90. Population variance (if this were the full population): σ² = 90/4 = 22.5. Sample variance (if this is a sample): s² = 90/3 = 30. The sample variance is larger — this is the correction in action.
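The same arithmetic can be reproduced in a few lines of Python. This is a minimal sketch using only the standard library; the helper name `sum_sq_dev` is made up for illustration, while `statistics.pvariance` and `statistics.variance` are the standard library's population and sample variance functions.

```python
import statistics

data = [4.0, 7.0, 13.0, 16.0]  # the worked-example dataset


def sum_sq_dev(values):
    """Sum of squared deviations from the mean of `values` (illustrative helper)."""
    m = sum(values) / len(values)
    return sum((x - m) ** 2 for x in values)


ss = sum_sq_dev(data)               # 36 + 9 + 9 + 36 = 90.0
pop_var = ss / len(data)            # divide by N     -> 22.5
sample_var = ss / (len(data) - 1)   # divide by n - 1 -> 30.0

# The standard library functions should agree with the hand calculation.
assert abs(statistics.pvariance(data) - pop_var) < 1e-12
assert abs(statistics.variance(data) - sample_var) < 1e-12

print(pop_var, sample_var)  # 22.5 30.0
```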
When does the difference matter? For small samples (n < 30), the difference between n and n−1 is substantial: at n=5, n/(n−1) = 1.25 — a 25% difference. For large samples (n > 100), n and n−1 are nearly identical. For n = 1000, the correction is only 0.1% — negligible in practice.
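A small simulation can make both the bias and its shrinking size visible. The sketch below is illustrative only: it assumes a normal population with a true standard deviation of 2 (so σ² = 4), a fixed random seed, and 5,000 simulated samples per sample size, none of which come from the original text. The function name `mean_variance_estimates` is hypothetical.

```python
import random


def mean_variance_estimates(n, trials, sigma, rng):
    """Average the divide-by-n and divide-by-(n-1) variance estimates
    over many simulated samples of size n drawn from Normal(0, sigma)."""
    total_biased = 0.0
    total_unbiased = 0.0
    for _ in range(trials):
        sample = [rng.gauss(0.0, sigma) for _ in range(n)]
        xbar = sum(sample) / n
        ss = sum((x - xbar) ** 2 for x in sample)
        total_biased += ss / n          # no correction
        total_unbiased += ss / (n - 1)  # Bessel's correction
    return total_biased / trials, total_unbiased / trials


rng = random.Random(0)  # fixed seed so the run is repeatable
sigma = 2.0             # assumed true SD, so the true variance is 4.0
results = {}
for n in (5, 30, 100):
    results[n] = mean_variance_estimates(n, trials=5000, sigma=sigma, rng=rng)
    biased, unbiased = results[n]
    # Columns: n, biased/true, corrected/true, theoretical (n-1)/n
    print(n, round(biased / sigma**2, 3), round(unbiased / sigma**2, 3),
          round((n - 1) / n, 3))
```

In a run like this, the uncorrected ratio should land near (n−1)/n (about 0.8 at n = 5 and 0.99 at n = 100), while the corrected ratio should stay near 1 at every sample size.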
Which Formula Should You Use?
The practical rule: use the population formula (÷N) only when you have data on every single member of the population you care about. This is rare — it applies to a census, a complete factory batch, a historical record with no missing entries, or a closed dataset where every member is included.
Use the sample formula (÷n−1) in virtually all other situations: surveys (you interviewed 500 people, not all people), experiments (you treated 50 mice, not all possible mice), business analytics (you have last month's data, not all possible future months), quality control (you inspected 200 units, not the entire production run).
A useful test: could more data exist for the group you care about, beyond what you already have? If yes, you have a sample, and n−1 is correct. If no, because the dataset is the complete universe you care about (e.g., all employees in one company, all students in one class) and you only want to describe that group, use N.
A common mistake is picking the wrong standard deviation function in Excel. STDEV() and its newer equivalent STDEV.S() use n−1 (sample); STDEVP() and its newer equivalent STDEV.P() use N (population). The sample version is almost always what you want.
Sampling Error and Standard Error
Because sample statistics (x̄, s, p̂) are computed from only part of the population, they vary from sample to sample. If you drew 100 different samples of size 50 from the same population, you would get 100 different x̄ values. This variation is called sampling error — not a mistake, but the natural uncertainty of working with a subset.
The standard error of the mean (SEM = s/√n) quantifies this variation: it estimates how much the sample mean x̄ would vary if you repeated the sampling process. Larger n → smaller SEM → more precise estimate of μ. This is why increasing sample size reduces uncertainty.
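As a rough illustration, the SEM formula can be written as a short Python function. The sketch reuses the four-value dataset from the worked example purely for convenience; the function names `standard_error` and `sem_for` are hypothetical, and the values s = 10 with n = 100 and n = 400 are made-up inputs chosen to show the effect of sample size.

```python
import math
import statistics


def standard_error(sample):
    """Estimated standard error of the mean: s / sqrt(n), where s uses n - 1."""
    return statistics.stdev(sample) / math.sqrt(len(sample))


def sem_for(s, n):
    """SEM for a given sample standard deviation s and sample size n."""
    return s / math.sqrt(n)


data = [4.0, 7.0, 13.0, 16.0]
sem = standard_error(data)  # sqrt(30) / sqrt(4), roughly 2.74

# With s held fixed, quadrupling n halves the SEM.
sem_100 = sem_for(10.0, 100)  # 1.0
sem_400 = sem_for(10.0, 400)  # 0.5

print(round(sem, 2), sem_100, sem_400)
```

Because n sits under a square root, precision improves slowly: halving the SEM takes four times as many observations.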
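Confidence intervals and hypothesis tests are built on this principle. A 95% confidence interval x̄ ± 1.96 × SEM means: if you repeated the study many times, about 95% of the resulting intervals would contain the true population mean μ. The multiplier 1.96 comes from the normal distribution and suits large samples; for small samples, a larger critical value from the t-distribution with n−1 degrees of freedom is used instead. The 95% describes the long-run behavior of the procedure, not the probability that any one computed interval contains μ.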
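The "repeat the study many times" interpretation can be sketched as a simulation. Everything specific here is an assumption chosen for illustration: a normal population with μ = 100 and σ = 15, samples of size 50, 2,000 repetitions, and a fixed seed. The function names `ci95` and `coverage` are hypothetical.

```python
import math
import random
import statistics


def ci95(sample):
    """Approximate 95% interval for the mean: xbar +/- 1.96 * SEM."""
    xbar = statistics.fmean(sample)
    sem = statistics.stdev(sample) / math.sqrt(len(sample))
    return xbar - 1.96 * sem, xbar + 1.96 * sem


def coverage(mu, sigma, n, trials, rng):
    """Fraction of simulated intervals that contain the true mean mu."""
    hits = 0
    for _ in range(trials):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        lo, hi = ci95(sample)
        if lo <= mu <= hi:
            hits += 1
    return hits / trials


rng = random.Random(1)  # fixed seed so the run is repeatable
rate = coverage(mu=100.0, sigma=15.0, n=50, trials=2000, rng=rng)
print(rate)  # expected to be close to, and typically slightly below, 0.95
```

The observed rate tends to fall a little short of 0.95 because 1.96 is a normal-distribution multiplier applied to an estimated s; the shortfall shrinks as n grows.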
Summary
If you have the full population, use the population formula (÷N). If you have a sample from a larger population — which is almost always the case — use the sample formula (÷n−1) and accompany your results with standard errors and confidence intervals to quantify uncertainty.
- Population (÷N): census data, complete historical records, closed datasets where every member is included and you only want to describe that group
- Sample (÷n−1): surveys, experiments, business samples, quality control checks, virtually all real-world datasets where the data is a subset of a larger universe
- Excel reminder: STDEV() or STDEV.S() = sample (n−1), STDEVP() or STDEV.P() = population (N). The sample version is almost always correct
- Large samples (n > 100): the difference between n and n−1 is <1% — in practice, both formulas give nearly identical results
Educational use only. Content is based on publicly documented mathematical formulas and reviewed for accuracy by the CalcMulti Editorial Team. Last updated: February 2026.