Regression Analysis Explained — How Linear Regression Works
By CalcMulti Editorial Team · 10 min read
Linear regression models the relationship between an outcome variable (y) and one or more predictor variables (x). Simple linear regression uses a single predictor; multiple linear regression uses two or more. The goal: find the straight line y = b₀ + b₁x that best describes the data and allows prediction.
Regression is one of the most widely used statistical methods — for predicting sales from advertising spend, estimating salary from years of experience, or modelling the effect of a drug dose on recovery time. Understanding how regression works helps you interpret results correctly and avoid common pitfalls.
Formula
ŷ = b₀ + b₁x where b₁ = Σ(xᵢ−x̄)(yᵢ−ȳ) / Σ(xᵢ−x̄)² and b₀ = ȳ − b₁x̄
What Regression Analysis Does
Regression finds the "line of best fit" through your data — the line that minimises the total squared distance between the observed y values and the predicted ŷ values. This is the Ordinary Least Squares (OLS) criterion.
The slope (b₁) tells you: for each one-unit increase in x, the predicted y changes by b₁ units. If b₁ = 3.5 (hours of study vs exam score), then each additional hour of study is associated with 3.5 more points on the exam.
The intercept (b₀) tells you: the predicted value of y when x = 0. Sometimes the intercept is meaningful (e.g., baseline cost when quantity = 0); sometimes it is not interpretable (e.g., predicted salary when years of experience = 0 might be negative or outside the data range). Do not over-interpret the intercept if x = 0 is outside your data.
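The slope and intercept formulas above translate directly into code. The sketch below is a minimal pure-Python illustration (the function name `ols_fit` is ours, not a standard library call):

```python
def ols_fit(x, y):
    """Ordinary least squares for simple linear regression.

    Returns (b0, b1) for the fitted line y-hat = b0 + b1 * x.
    """
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # b1 = sum((xi - x_bar)(yi - y_bar)) / sum((xi - x_bar)^2)
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) \
        / sum((xi - x_bar) ** 2 for xi in x)
    b0 = y_bar - b1 * x_bar
    return b0, b1

# Using the advertising data from the worked example that follows:
b0, b1 = ols_fit([1, 2, 3, 4, 5], [14, 17, 22, 23, 28])
# b1 ≈ 3.4, b0 ≈ 10.6
```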
Worked Example — Advertising Spend vs Sales
Data: 5 weeks of TV advertising spend (x, $000s) and product sales (y, units): (1, 14), (2, 17), (3, 22), (4, 23), (5, 28).
x̄ = 3, ȳ = 20.8.
Slope b₁ = [(1−3)(14−20.8) + (2−3)(17−20.8) + (3−3)(22−20.8) + (4−3)(23−20.8) + (5−3)(28−20.8)] / [(1−3)² + (2−3)² + (3−3)² + (4−3)² + (5−3)²]
= [(−2)(−6.8) + (−1)(−3.8) + (0)(1.2) + (1)(2.2) + (2)(7.2)] / [4 + 1 + 0 + 1 + 4]
= [13.6 + 3.8 + 0 + 2.2 + 14.4] / 10 = 34 / 10 = 3.4.
Intercept b₀ = ȳ − b₁x̄ = 20.8 − 3.4 × 3 = 20.8 − 10.2 = 10.6.
Regression equation: ŷ = 10.6 + 3.4x.
Interpretation: Each additional $1,000 in TV advertising is associated with 3.4 more units sold. Predicted sales with $3,000 spend: ŷ = 10.6 + 3.4 × 3 = 20.8 units.
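The hand calculation above can be cross-checked with NumPy: a degree-1 `np.polyfit` performs the same least-squares fit and returns the slope first (a sketch, assuming NumPy is installed):

```python
import numpy as np

x = np.array([1, 2, 3, 4, 5], dtype=float)       # TV spend ($000s)
y = np.array([14, 17, 22, 23, 28], dtype=float)  # sales (units)

b1, b0 = np.polyfit(x, y, 1)  # degree-1 fit returns [slope, intercept]
print(round(b1, 4), round(b0, 4))  # 3.4 10.6

predicted = b0 + b1 * 3.0  # predicted sales at $3,000 spend
print(round(predicted, 1))  # 20.8
```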
Understanding R² (Coefficient of Determination)
R² measures the proportion of variance in y that is explained by x. It ranges from 0 to 1.
R² = 0: the regression line is no better than simply predicting the mean ȳ for all observations.
R² = 1: the regression line perfectly predicts every y value — all points lie exactly on the line.
R² = 0.75 means the model explains 75% of the variance in y. The remaining 25% is unexplained variability ("residual" or "error").
For simple linear regression, R² is the square of the Pearson correlation coefficient: R² = r². (In multiple regression, R² is instead the squared correlation between the observed and fitted values.)
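Both routes to R² — one minus the residual share of variance, and the squared Pearson r — give the same number for the advertising data. A pure-Python sketch (the variable names are ours):

```python
import math

x = [1, 2, 3, 4, 5]
y = [14, 17, 22, 23, 28]
y_hat = [10.6 + 3.4 * xi for xi in x]  # fitted values from the worked example
y_bar = sum(y) / len(y)

# Route 1: explained variance, R^2 = 1 - SS_res / SS_tot
ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))
ss_tot = sum((yi - y_bar) ** 2 for yi in y)
r_squared = 1 - ss_res / ss_tot

# Route 2: square of the Pearson correlation coefficient
x_bar = sum(x) / len(x)
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
sxx = sum((xi - x_bar) ** 2 for xi in x)
syy = sum((yi - y_bar) ** 2 for yi in y)
r = sxy / math.sqrt(sxx * syy)

print(round(r_squared, 4), round(r ** 2, 4))  # the two values agree: 0.9731 0.9731
```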
| R² Value | Interpretation | Example Domain |
|---|---|---|
| 0.90 – 1.00 | Excellent fit | Physics, engineering (controlled experiments) |
| 0.70 – 0.89 | Good fit | Business, applied science |
| 0.50 – 0.69 | Moderate fit | Social sciences, economics |
| 0.30 – 0.49 | Weak fit | Behavioural research |
| < 0.30 | Poor fit | Complex human behaviour, noisy data |
Regression Assumptions You Must Check
1. Linearity: the relationship between x and y is linear. Check with a scatterplot before running regression. If the relationship is curved, transform variables (e.g., log x) or use polynomial regression.
2. Independence: observations are independent of each other. Time series data typically violates this assumption — use time series regression methods instead.
3. Homoscedasticity: the spread of residuals is roughly constant across all values of x (equal variance). Check with a residuals vs. fitted values plot — the spread should be roughly uniform (no fan shape).
4. Normality of residuals: the residuals (y − ŷ) should be approximately normally distributed. Check with a Q-Q plot or histogram of residuals. This assumption matters mainly for small samples — large samples are robust via CLT.
5. No multicollinearity (multiple regression): predictor variables should not be highly correlated with each other. Check using Variance Inflation Factor (VIF > 5 is problematic).
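The multicollinearity check (point 5) can be sketched with plain NumPy by regressing each predictor on the others and converting the resulting R² into a VIF. The data below is synthetic and the helper name `vif` is ours:

```python
import numpy as np

def vif(X):
    """Variance Inflation Factor for each column of predictor matrix X.

    VIF_j = 1 / (1 - R^2_j), where R^2_j comes from regressing
    column j on the remaining predictors (with an intercept).
    """
    n, k = X.shape
    out = []
    for j in range(k):
        target = X[:, j]
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        coef, *_ = np.linalg.lstsq(others, target, rcond=None)
        resid = target - others @ coef
        r2 = 1 - (resid @ resid) / ((target - target.mean()) ** 2).sum()
        out.append(1.0 / (1.0 - r2))
    return out

# Hypothetical predictors: x2 is nearly a copy of x1, so both get large VIFs
rng = np.random.default_rng(0)
x1 = rng.normal(size=100)
x2 = x1 + rng.normal(scale=0.1, size=100)  # highly collinear with x1
x3 = rng.normal(size=100)                  # independent of the others
vifs = vif(np.column_stack([x1, x2, x3]))
print(vifs)  # first two VIFs well above 5; third close to 1
```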
Regression vs Correlation — Key Differences
Correlation (r) measures the strength and direction of a linear relationship — it is symmetric. The correlation between x and y equals the correlation between y and x.
Regression predicts one variable from another — it is asymmetric. The regression of y on x is different from the regression of x on y. Regression requires you to designate a predictor (x) and an outcome (y) based on your research question.
When to use regression: you want to predict y from x, or quantify how much y changes per unit of x (the slope). When to use correlation: you want to know how strongly two variables are associated without implying a directional prediction.
| Aspect | Correlation (r) | Regression (b₁) |
|---|---|---|
| What it measures | Association strength and direction | Rate of change (slope) |
| Symmetric? | Yes — r(x,y) = r(y,x) | No — different equations for x→y and y→x |
| Units | Dimensionless (−1 to +1) | Units of y per unit of x |
| Purpose | Describe relationship | Predict y from x |
| Influenced by SD? | No (standardised) | Yes (raw units) |
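The asymmetry in the table is easy to demonstrate on the advertising data: regressing y on x and x on y gives two different slopes, and their product equals r² (a sketch using NumPy):

```python
import numpy as np

x = np.array([1, 2, 3, 4, 5], dtype=float)
y = np.array([14, 17, 22, 23, 28], dtype=float)

b_yx = np.polyfit(x, y, 1)[0]  # slope of y regressed on x
b_xy = np.polyfit(y, x, 1)[0]  # slope of x regressed on y — a different line
r = np.corrcoef(x, y)[0, 1]

print(round(b_yx, 4), round(b_xy, 4))         # different slopes: 3.4 vs ~0.286
print(round(b_yx * b_xy, 4), round(r**2, 4))  # their product equals r^2
```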
Educational use only. Content is based on publicly documented mathematical formulas and reviewed for accuracy by the CalcMulti Editorial Team. Last updated: February 2026.