Linear Regression Calculator

Reviewed by CalcMulti Editorial Team

Linear regression finds the best-fit straight line through a set of data points by minimising the sum of squared residuals (least squares method). The result is an equation y = mx + b that lets you describe the relationship and make predictions.

This calculator computes the slope (m), y-intercept (b), correlation coefficient (r), R², and allows you to predict Y for any X value.

Formula

y = mx + b
m = Σ(x − x̄)(y − ȳ) / Σ(x − x̄)²
b = ȳ − m·x̄

m: slope, the change in y per unit increase in x
b: y-intercept, the value of y when x = 0
x̄, ȳ: means of the X and Y datasets
R²: coefficient of determination, the proportion of variance explained
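The formulas above can be sketched in a few lines of Python. This is a minimal illustration, not the calculator's actual implementation: the function name `linear_regression` and the sample data are made up for the example, and r is computed with the standard Pearson formula, which this page does not spell out.

```python
from math import sqrt

def linear_regression(xs, ys):
    """Least-squares fit of y = m*x + b. Returns (m, b, r, r_squared)."""
    if len(xs) != len(ys):
        raise ValueError("X and Y must have the same number of values")
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two points")

    x_bar = sum(xs) / n
    y_bar = sum(ys) / n

    # Sums of cross-deviations and squared deviations from the means
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    if sxx == 0:
        raise ValueError("all X values are identical; slope is undefined")

    m = sxy / sxx              # slope
    b = y_bar - m * x_bar      # y-intercept
    # Pearson r is undefined when Y is constant; returning 0.0 is a choice made here
    r = sxy / sqrt(sxx * syy) if syy > 0 else 0.0
    return m, b, r, r * r      # R² = r² for simple linear regression

# Hypothetical sample data
m, b, r, r2 = linear_regression([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print(f"y = {m:.2f}x + {b:.2f}, r = {r:.3f}, R² = {r2:.3f}")
# y = 0.60x + 2.20, r = 0.775, R² = 0.600
```

To predict Y for a new X, evaluate `m * x + b` with the returned values.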

Enter paired X and Y values, separated by commas, spaces, or newlines. The two lists must contain the same number of values.

R² Goodness-of-Fit Reference

| R² range | Model fit | Typical field |
|---|---|---|
| 0.90 – 1.00 | Excellent | Physics, engineering, hard sciences |
| 0.70 – 0.90 | Good | Finance, chemistry, controlled experiments |
| 0.50 – 0.70 | Moderate | Biology, economics, education research |
| 0.30 – 0.50 | Fair | Psychology, sociology, field studies |
| 0.00 – 0.30 | Weak | Complex human behaviour, noisy systems |

Common Mistakes

Extrapolating beyond the data range

The regression equation is only reliable within the range of observed X values. Predictions far outside that range assume the linear trend continues — which may not be true. Always state the valid prediction range.
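One way to make the valid range explicit is to return a flag alongside each prediction. The sketch below is illustrative only: the helper name `predict_with_range_check` and the numbers in the usage lines are hypothetical.

```python
def predict_with_range_check(x, m, b, x_min, x_max):
    """Return (predicted_y, in_range) for the fitted line y = m*x + b.

    in_range is False when x lies outside the observed X values,
    i.e. when the prediction is an extrapolation.
    """
    in_range = x_min <= x <= x_max
    return m * x + b, in_range

# Hypothetical fit y = 0.6x + 2.2, estimated from X values between 1 and 5
y_inside, ok_inside = predict_with_range_check(3, 0.6, 2.2, 1, 5)
y_outside, ok_outside = predict_with_range_check(20, 0.6, 2.2, 1, 5)

print(ok_inside)    # True: interpolation within the observed range
print(ok_outside)   # False: extrapolation, treat the prediction with caution
```

Returning the flag rather than raising an error leaves the decision to the caller, who may still want the extrapolated number with a caveat attached.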

Ignoring non-linearity

Linear regression assumes a straight-line relationship. Data with r ≈ 0 can still follow a strong curved pattern, such as a quadratic. Always plot the data first. If the residuals show a curved pattern, consider polynomial or logarithmic regression.
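A small constructed example shows how r can be zero even though Y is completely determined by X. The data here are hypothetical (y = x² on a symmetric range), chosen so the effect is easy to verify by hand.

```python
xs = [-3, -2, -1, 0, 1, 2, 3]
ys = [x * x for x in xs]   # perfect quadratic relationship, no noise

n = len(xs)
x_bar = sum(xs) / n
y_bar = sum(ys) / n

sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
sxx = sum((x - x_bar) ** 2 for x in xs)
syy = sum((y - y_bar) ** 2 for y in ys)

r = sxy / (sxx * syy) ** 0.5   # Pearson correlation coefficient
m = sxy / sxx                  # least-squares slope
b = y_bar - m * x_bar          # least-squares intercept

# Residuals of the straight-line fit: observed y minus fitted y
residuals = [y - (m * x + b) for x, y in zip(xs, ys)]

print(r)          # 0.0
print(residuals)  # [5.0, 0.0, -3.0, -4.0, -3.0, 0.0, 5.0]
```

The residuals are positive at both ends and negative in the middle. That U shape is the curved pattern to look for in a residual plot.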

Confusing r with R²

r = 0.7 sounds impressive, but in simple linear regression R² = r² = 0.49: the model explains only 49% of the variance, and the majority is unexplained. Always report R², not just r, when describing model performance.
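The gap between r and R² is easy to tabulate. The snippet below relies on the identity R² = r², which holds for simple linear regression with an intercept; the helper name `explained_share` is made up for the example.

```python
def explained_share(r):
    """Proportion of variance explained (R²) for a given correlation r."""
    return r * r

for r in (0.3, 0.5, 0.7, 0.9):
    print(f"r = {r:.1f}  ->  R² = {explained_share(r):.2f} "
          f"({explained_share(r):.0%} of variance explained)")
# r = 0.3  ->  R² = 0.09 (9% of variance explained)
# r = 0.5  ->  R² = 0.25 (25% of variance explained)
# r = 0.7  ->  R² = 0.49 (49% of variance explained)
# r = 0.9  ->  R² = 0.81 (81% of variance explained)
```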

Linear Regression vs Alternatives — Which Model to Use?

| Situation | Simple Linear | Logistic Regression | Polynomial / Non-linear |
|---|---|---|---|
| Continuous Y, linear scatter plot | ✓ Preferred | | |
| Binary outcome (yes/no, 0/1) | | | |
| Curved scatter plot, R² < 0.4 | | | |
| Multiple predictor variables | ✓ (multiple regression) | | |
| Residuals show curved pattern | | | Switch |
| Predict category membership | | | |

Case Study: Predicting Weekly Sales for a New Grocery Store Location

An operations analyst at a grocery chain used linear regression across 18 existing stores to model the relationship between store floor area (X, in sq ft) and weekly sales (Y, in USD). The result: y = 2.34x + 8,400, with R² = 0.79. That is a good fit by the reference table above: floor area explained 79% of the variance in weekly sales across the sample.

When a prospective 4,200 sq ft location came up for approval, the model predicted weekly sales of 2.34 × 4,200 + 8,400 = $18,228. The site was within the observed data range (2,800–5,100 sq ft), so the interpolation was reliable. The team approved the lease.

After 8 weeks of operation, actual average weekly sales were $17,890, within 1.9% of the model prediction. The analyst noted one residual outlier in the training data (a store with unusually high footfall due to a co-located pharmacy) and flagged pharmacy co-location as a candidate predictor variable for a future version of the model.
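The arithmetic in this case study can be reproduced directly from the figures quoted above. The variable and function names are made up for the sketch, and the percentage error is taken relative to the predicted value, which is an assumption since the text does not say which denominator was used.

```python
# Fitted equation and observed range, as quoted in the case study
SLOPE = 2.34          # USD of weekly sales per sq ft
INTERCEPT = 8400      # USD
AREA_MIN, AREA_MAX = 2800, 5100   # observed floor areas, sq ft

def predict_weekly_sales(area_sqft):
    """Predicted weekly sales in USD for a given floor area in sq ft."""
    return SLOPE * area_sqft + INTERCEPT

area = 4200
predicted = predict_weekly_sales(area)
within_range = AREA_MIN <= area <= AREA_MAX

actual = 17890
# Relative error, measured against the prediction (an assumption)
pct_error = abs(actual - predicted) / predicted * 100

print(within_range)          # True
print(round(predicted))      # 18228
print(round(pct_error, 1))   # 1.9
```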

Disclaimer

This calculator computes simple (single-variable) linear regression. Results assume a linear relationship between X and Y. Always inspect your data visually and check residuals before drawing conclusions.

Frequently Asked Questions