Statistics Formulas — Complete Reference Guide

By CalcMulti Editorial Team · 10 min read

This reference guide collects the core formulas used in descriptive statistics, probability, and inferential statistics. Each formula is listed with its notation and a short note on when to use it or how to interpret it. Bookmark this page as your go-to statistics formula sheet.

Formulas are organised by topic: start with descriptive statistics for summarising data, move to probability for modelling uncertainty, and finish with inferential statistics for drawing conclusions from samples.

Central Tendency

Central tendency measures describe where the centre of a dataset falls. The three main measures are the arithmetic mean, median, and mode — each appropriate for different data types and distributions.

Measure | Formula | Use when | Sensitive to outliers?
Arithmetic Mean | x̄ = Σxᵢ / n | Symmetric data, no extreme outliers | Yes, strongly
Weighted Mean | x̄w = Σ(wᵢxᵢ) / Σwᵢ | Values have different importance/frequency | Yes
Geometric Mean | GM = (x₁×x₂×…×xₙ)^(1/n) | Multiplicative data: growth rates, ratios | Less than the arithmetic mean
Harmonic Mean | HM = n / Σ(1/xᵢ) | Rates and speeds (distance/time) | Yes, to very small values
Median | Middle value when sorted | Skewed data, ordinal data, outliers present | No, robust
Mode | Most frequent value(s) | Categorical data, multimodal distributions | No
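As a rough illustration of the measures above, the following sketch computes each one with Python's standard library. The data and weights are hypothetical values chosen only to make the arithmetic easy to follow.

```python
import math
import statistics

# Hypothetical sample and weights, for illustration only.
data = [2, 4, 4, 5, 10]
weights = [1, 2, 2, 1, 1]  # one weight per value

mean = sum(data) / len(data)                               # Σxᵢ / n
weighted_mean = sum(w * x for w, x in zip(weights, data)) / sum(weights)
geometric_mean = math.prod(data) ** (1 / len(data))        # needs positive values
harmonic_mean = len(data) / sum(1 / x for x in data)       # needs non-zero values
median = statistics.median(data)                           # middle value when sorted
modes = statistics.multimode(data)                         # list of all most frequent values
```

The single large value (10) pulls the arithmetic mean above the median, which is the outlier sensitivity noted in the last column of the table.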

Measures of Spread (Variability)

Spread measures quantify how dispersed values are around the centre. A low spread means values cluster tightly; a high spread means they are widely scattered. The choice of spread measure depends on data type and whether outliers are present.

Measure | Formula | Units | Notes
Range | max − min | Same as data | Simple but heavily influenced by extremes
IQR | Q3 − Q1 | Same as data | Middle 50%; robust to outliers
Population Variance | σ² = Σ(xᵢ − μ)² / n | Squared units | Use when the data is the full population
Sample Variance | s² = Σ(xᵢ − x̄)² / (n−1) | Squared units | Use when data is a sample; Bessel's correction
Population SD | σ = √σ² | Same as data | Interpret directly in original units
Sample SD | s = √s² | Same as data | Most commonly reported measure of spread
Coefficient of Variation | CV = (s / x̄) × 100% | % | Relative spread; compare datasets with different units
Standard Error | SE = s / √n | Same as data | Precision of the sample mean estimate
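The sketch below works through the variance-based measures on a small hypothetical sample. It is meant only to show how the population (divide by n) and sample (divide by n−1) versions differ on the same data.

```python
import math
import statistics

data = [2, 4, 4, 5, 10]   # hypothetical sample, for illustration only
n = len(data)
mean = sum(data) / n

data_range = max(data) - min(data)
ss = sum((x - mean) ** 2 for x in data)   # sum of squared deviations from the mean

pop_var = ss / n              # σ², treats the data as the whole population
sample_var = ss / (n - 1)     # s², Bessel's correction
pop_sd = math.sqrt(pop_var)
sample_sd = math.sqrt(sample_var)
cv_percent = sample_sd / mean * 100        # coefficient of variation, in %
standard_error = sample_sd / math.sqrt(n)  # SE of the sample mean
```

For this sample the squared deviations sum to 36, so the sample variance is 36 / 4 = 9 and the population variance is 36 / 5 = 7.2.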

Position and Standardisation

Position measures locate a specific value within a distribution. Z-scores standardise values to a common scale regardless of original units, enabling direct comparison across different datasets.

Measure | Formula | Interpretation
Z-Score (population) | z = (x − μ) / σ | Standard deviations above/below the population mean
Z-Score (sample) | z = (x − x̄) / s | Standard deviations above/below the sample mean
Percentile rank | PR = (# values < x) / n × 100 | Percentage of data below value x
Quartile Q1 | Median of lower half | 25th percentile: 25% of data lies below
Quartile Q3 | Median of upper half | 75th percentile: 75% of data lies below
Tukey Outlier Fence | Q1 − 1.5×IQR and Q3 + 1.5×IQR | Values outside are potential outliers
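A minimal sketch of these position measures follows, using the median-of-halves quartile definition from the table. The data and the helper names (`z_score`, `percentile_rank`) are hypothetical. Other quartile conventions exist and can give slightly different Q1 and Q3 values.

```python
import statistics

data = sorted([2, 4, 4, 5, 7, 8, 9, 30])   # hypothetical sample
n = len(data)

def z_score(x, centre, sd):
    """Number of standard deviations x lies from the centre."""
    return (x - centre) / sd

def percentile_rank(x, values):
    """Percentage of values strictly below x."""
    return sum(v < x for v in values) / len(values) * 100

# Quartiles as medians of the lower and upper halves
# (the middle value is excluded from both halves when n is odd).
half = n // 2
q1 = statistics.median(data[:half])
q3 = statistics.median(data[n - half:])
iqr = q3 - q1

# Tukey fences: values outside are flagged as potential outliers.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = [x for x in data if x < lower_fence or x > upper_fence]
```

Here Q1 = 4 and Q3 = 8.5, so the fences sit at −2.75 and 15.25 and only the value 30 is flagged.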

Probability Rules

Probability quantifies uncertainty. All probabilities must satisfy 0 ≤ P(A) ≤ 1, and the probabilities of all possible outcomes must sum to 1. The rules below cover the building blocks of most probability calculations.

Rule | Formula | When to use
Complement rule | P(Aᶜ) = 1 − P(A) | Finding "at least one" or "not A" scenarios
Addition rule (mutually exclusive) | P(A ∪ B) = P(A) + P(B) | Events that cannot both occur simultaneously
Addition rule (general) | P(A ∪ B) = P(A) + P(B) − P(A ∩ B) | Events that can overlap
Multiplication rule (independent) | P(A ∩ B) = P(A) × P(B) | Events where one does not affect the other
Multiplication rule (dependent) | P(A ∩ B) = P(A) × P(B|A) | Events where one affects the probability of the other
Conditional probability | P(B|A) = P(A ∩ B) / P(A), with P(A) > 0 | Probability of B given A has already occurred
Bayes' theorem | P(A|B) = P(B|A) × P(A) / P(B) | Updating probability when new evidence arrives
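To make Bayes' theorem concrete, here is a small sketch for a hypothetical screening test. The function name and the three input probabilities are assumptions made for illustration. P(B) is obtained from the law of total probability, which is not listed in the table above but follows from the addition and multiplication rules.

```python
def bayes_posterior(prior, likelihood, false_positive_rate):
    """Return P(A|B) given P(A), P(B|A) and P(B|not A)."""
    # P(B) = P(B|A)·P(A) + P(B|not A)·P(not A); P(not A) uses the complement rule.
    evidence = likelihood * prior + false_positive_rate * (1 - prior)
    return likelihood * prior / evidence

# Hypothetical numbers: 1% prevalence, 95% sensitivity, 5% false positive rate.
posterior = bayes_posterior(prior=0.01, likelihood=0.95, false_positive_rate=0.05)
```

With these inputs the posterior is 0.0095 / 0.059, roughly 0.16: even after a positive result, the condition remains unlikely because the prior is so low.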

Key Probability Distributions

Probability distributions describe how values are spread across possible outcomes. Choosing the right distribution is the foundation of correct statistical modelling.

Distribution | Formula | Mean | Variance | Use for
Normal | f(x) = 1/(σ√(2π)) × e^(−½((x−μ)/σ)²) | μ | σ² | Continuous symmetric data; CLT approximation
Binomial | P(X=k) = C(n,k) × pᵏ × (1−p)^(n−k) | np | np(1−p) | Fixed n trials, success/failure outcomes
Poisson | P(X=k) = λᵏ × e^(−λ) / k! | λ | λ | Count of rare events in an interval
t-distribution | Heavy-tailed; df parameter | 0 (df > 1) | df/(df−2) (df > 2) | Small samples with unknown σ
Chi-square | Sum of df squared standard normals | df | 2×df | Categorical data tests; variance tests
F-distribution | Ratio of two chi-square variates, each divided by its df | df₂/(df₂−2) (df₂ > 2) | 2df₂²(df₁+df₂−2) / (df₁(df₂−2)²(df₂−4)) (df₂ > 4) | ANOVA; comparing variances
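The closed-form entries for the normal, binomial, and Poisson distributions can be sketched directly with the standard library. The function names and the parameters n = 10, p = 0.3 are hypothetical. The last lines check numerically that the binomial mean and variance match np and np(1−p).

```python
import math

def normal_pdf(x, mu, sigma):
    """Density of the normal distribution at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))

def binomial_pmf(k, n, p):
    """P(X = k) for n independent trials with success probability p."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

def poisson_pmf(k, lam):
    """P(X = k) for a Poisson count with rate lam."""
    return lam**k * math.exp(-lam) / math.factorial(k)

# Hypothetical parameters: recover the mean and variance from the pmf itself.
n, p = 10, 0.3
pmf = [binomial_pmf(k, n, p) for k in range(n + 1)]
binom_mean = sum(k * pk for k, pk in enumerate(pmf))                    # ≈ n·p
binom_var = sum((k - binom_mean) ** 2 * pk for k, pk in enumerate(pmf)) # ≈ n·p·(1−p)
```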

Inferential Statistics Formulas

Inferential statistics uses sample data to draw conclusions about a population. The core tools are hypothesis tests (which produce a p-value) and confidence intervals (which produce a plausible range for the parameter).

Test | Formula | df | Use for
One-sample z-test | z = (x̄ − μ₀) / (σ / √n) | n/a | Mean vs hypothesised μ₀, σ known
One-sample t-test | t = (x̄ − μ₀) / (s / √n) | n−1 | Mean vs hypothesised μ₀, σ unknown
Two-sample Welch t | t = (x̄₁ − x̄₂) / √(s₁²/n₁ + s₂²/n₂) | Welch–Satterthwaite approx. | Comparing two means, unequal variances
Chi-square GoF | χ² = Σ((O − E)² / E) | k − 1 | Observed vs expected frequencies
Chi-square independence | χ² = Σ((O − E)² / E) | (r−1)(c−1) | Association between two categorical variables
CI for mean (t) | x̄ ± t* × s/√n | n−1 | Unknown σ; uses t critical value
CI for proportion (z) | p̂ ± z* × √(p̂(1−p̂)/n) | n/a | Binary outcome; uses z critical value
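The sketch below computes three of the quantities above with the standard library: the one-sample t statistic, the Welch t statistic with its Welch–Satterthwaite degrees of freedom, and a z-based confidence interval for a proportion. Function names and data are hypothetical. Turning a t statistic into a p-value needs the t-distribution's CDF, which the standard library does not provide, so that step is left out.

```python
import math
import statistics

def one_sample_t(sample, mu0):
    """Return (t, df) for H0: population mean equals mu0."""
    n = len(sample)
    se = statistics.stdev(sample) / math.sqrt(n)     # s / √n
    return (statistics.mean(sample) - mu0) / se, n - 1

def welch_t(a, b):
    """Return (t, df) using the Welch–Satterthwaite approximation for df."""
    va = statistics.variance(a) / len(a)             # s₁² / n₁
    vb = statistics.variance(b) / len(b)             # s₂² / n₂
    t = (statistics.mean(a) - statistics.mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va**2 / (len(a) - 1) + vb**2 / (len(b) - 1))
    return t, df

def proportion_ci(successes, n, confidence=0.95):
    """z-based (Wald) confidence interval for a proportion."""
    p_hat = successes / n
    z_star = statistics.NormalDist().inv_cdf(0.5 + confidence / 2)
    margin = z_star * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin

# Hypothetical data, for illustration only.
group_a = [2, 4, 4, 5, 10]
group_b = [1, 2, 3, 4, 5]
t_one, df_one = one_sample_t(group_a, mu0=3)
t_welch, df_welch = welch_t(group_a, group_b)
ci_low, ci_high = proportion_ci(successes=40, n=100)
```

The Welch degrees of freedom are generally not an integer; here they come out at about 6.06, between min(n₁, n₂) − 1 = 4 and n₁ + n₂ − 2 = 8.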

Regression Formulas

Simple linear regression models the linear relationship between one predictor variable x and one outcome variable y. The goal is to find the line y = b₀ + b₁x that minimises the sum of squared residuals (the ordinary least squares criterion).

The slope b₁ tells you how much y changes on average for each unit increase in x. The intercept b₀ is the predicted value of y when x = 0. R² (the coefficient of determination) tells you what proportion of the variance in y is explained by x, ranging from 0 (no linear relationship) to 1 (perfect linear relationship).

Quantity | Formula | Interpretation
Slope | b₁ = Σ(xᵢ−x̄)(yᵢ−ȳ) / Σ(xᵢ−x̄)² | Change in y per unit increase in x
Intercept | b₀ = ȳ − b₁x̄ | Predicted y when x = 0
Pearson r | r = Σ(xᵢ−x̄)(yᵢ−ȳ) / ((n−1)·sₓ·sᵧ) | Linear correlation strength: −1 to +1
R-squared | R² = r² | Proportion of variance in y explained by x
Residual | eᵢ = yᵢ − ŷᵢ | Difference between observed and predicted y
RMSE | √(Σeᵢ² / (n−2)) | Typical prediction error in original units; with the n−2 divisor this is also called the residual standard error
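As an illustration of how these quantities fit together, the following sketch fits a least-squares line to five hypothetical points using only the formulas in the table. The variable names are assumptions, not part of any library.

```python
import math

x = [1, 2, 3, 4, 5]   # hypothetical predictor values
y = [2, 4, 5, 4, 5]   # hypothetical outcome values
n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n

sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
sxx = sum((xi - x_bar) ** 2 for xi in x)
syy = sum((yi - y_bar) ** 2 for yi in y)

b1 = sxy / sxx                    # slope
b0 = y_bar - b1 * x_bar           # intercept
r = sxy / math.sqrt(sxx * syy)    # same value as Σ(xᵢ−x̄)(yᵢ−ȳ) / ((n−1)·sₓ·sᵧ)
r_squared = r ** 2

residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
rmse = math.sqrt(sum(e ** 2 for e in residuals) / (n - 2))   # n−2 divisor, as in the table
```

For these points the fitted line is ŷ = 2.2 + 0.6x with R² = 0.6, and the residuals sum to zero, as they always do for a least-squares fit that includes an intercept.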


Educational use only. Content is based on publicly documented mathematical formulas and reviewed for accuracy by the CalcMulti Editorial Team. Last updated: February 2026.