Linear Regression Calculator
Reviewed by CalcMulti Editorial Team
Linear regression finds the best-fit straight line through a set of data points by minimising the sum of squared residuals (least squares method). The result is an equation y = mx + b that lets you describe the relationship and make predictions.
This calculator computes the slope (m), y-intercept (b), correlation coefficient (r), R², and allows you to predict Y for any X value.
Formula
y = mx + b
m = Σ(x−x̄)(y−ȳ) / Σ(x−x̄)²
b = ȳ − m·x̄
- m: slope, the change in y per unit increase in x
- b: y-intercept, the value of y when x = 0
- x̄, ȳ: means of the X and Y datasets
- R²: coefficient of determination, the proportion of variance in y explained by the model
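The formulas above can be sketched in a few lines of plain Python. This is a minimal illustration, not the calculator's actual implementation: the function name `linear_regression` and the sample data are assumptions made up for the example.

```python
from math import sqrt

def linear_regression(xs, ys):
    """Return (m, b, r, r_squared) for paired samples xs and ys."""
    if len(xs) != len(ys):
        raise ValueError("X and Y must have the same number of values")
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two data points")
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Sums of cross-products and squared deviations from the means
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    if sxx == 0:
        raise ValueError("all X values are identical; slope is undefined")
    m = sxy / sxx            # slope
    b = y_bar - m * x_bar    # y-intercept
    # r is undefined when Y has no variance; returning 0.0 is a simplification
    r = sxy / sqrt(sxx * syy) if syy > 0 else 0.0
    return m, b, r, r * r

# Illustrative data (hypothetical, not from the page)
m, b, r, r2 = linear_regression([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print(f"y = {m:.2f}x + {b:.2f}, r = {r:.3f}, R² = {r2:.3f}")
# prints: y = 0.60x + 2.20, r = 0.775, R² = 0.600
```

Computing R² as r² is valid for simple (single-predictor) least-squares regression with an intercept; it does not carry over to other model types.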
Enter paired X and Y values, separated by commas, spaces, or newlines. The X and Y lists must contain the same number of values.
R² Goodness-of-Fit Reference
| R² range | Model fit | Typical field |
|---|---|---|
| 0.90 – 1.00 | Excellent | Physics, engineering, hard sciences |
| 0.70 – 0.90 | Good | Finance, chemistry, controlled experiments |
| 0.50 – 0.70 | Moderate | Biology, economics, education research |
| 0.30 – 0.50 | Fair | Psychology, sociology, field studies |
| 0.00 – 0.30 | Weak | Complex human behaviour, noisy systems |
Common Mistakes
Extrapolating beyond the data range
The regression equation is only reliable within the range of observed X values. Predictions far outside that range assume the linear trend continues — which may not be true. Always state the valid prediction range.
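One way to make the valid prediction range explicit is to return a flag alongside every prediction. The sketch below assumes a hypothetical helper `predict_in_range` and uses illustrative coefficients and data; it only checks the observed X range and says nothing about how good the fit is inside it.

```python
def predict_in_range(m, b, xs, x_new):
    """Predict y = m*x_new + b and report whether x_new lies in the observed X range."""
    lo, hi = min(xs), max(xs)
    y_hat = m * x_new + b
    return y_hat, lo <= x_new <= hi

observed_x = [1, 2, 3, 4, 5]   # illustrative X values
for x_new in (3.5, 12):
    y_hat, ok = predict_in_range(0.6, 2.2, observed_x, x_new)
    note = "within observed range" if ok else "extrapolation, treat with caution"
    print(f"x = {x_new}: predicted y = {y_hat:.2f} ({note})")
```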
Ignoring non-linearity
Linear regression assumes a straight-line relationship. A scatter plot with r ≈ 0 can still hide a strong quadratic pattern, because r measures only linear association. Always plot the data first. If the residuals show a curved pattern, consider polynomial or logarithmic regression.
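A small constructed example shows how misleading r can be on curved data. Here Y is exactly X² on X values symmetric about zero, so the relationship is perfectly deterministic, yet the Pearson correlation is zero. The `pearson_r` helper and the data are assumptions chosen for illustration.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient of paired samples."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

xs = [-3, -2, -1, 0, 1, 2, 3]
ys = [x ** 2 for x in xs]    # exact quadratic relationship
print(pearson_r(xs, ys))     # 0.0: no linear trend despite a perfect curve
```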
Confusing r with R²
r = 0.7 sounds impressive, but the corresponding R² = 0.49 means the model explains only 49% of the variance, so slightly more than half remains unexplained. Always report R², not just r, when describing model performance.
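For simple linear regression, R² computed from the residuals equals the square of r. The sketch below checks this numerically on illustrative data (the numbers are made up for the example), using the variance-explained definition R² = 1 − SS_res / SS_tot.

```python
from math import sqrt

xs = [1, 2, 3, 4, 5]   # illustrative data
ys = [2, 4, 5, 4, 5]
n = len(xs)
x_bar, y_bar = sum(xs) / n, sum(ys) / n
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
sxx = sum((x - x_bar) ** 2 for x in xs)
syy = sum((y - y_bar) ** 2 for y in ys)   # total sum of squares (SS_tot)

m = sxy / sxx
b = y_bar - m * x_bar
r = sxy / sqrt(sxx * syy)

# Residual sum of squares around the fitted line
ss_res = sum((y - (m * x + b)) ** 2 for x, y in zip(xs, ys))
r_squared = 1 - ss_res / syy

print(f"r = {r:.3f}, r^2 = {r * r:.3f}, R^2 = {r_squared:.3f}")
# prints: r = 0.775, r^2 = 0.600, R^2 = 0.600
```

An r of about 0.77 here corresponds to only 60% of variance explained, which is why quoting r alone tends to overstate the fit.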
Linear Regression vs Alternatives — Which Model to Use?
| Situation | Simple Linear | Logistic Regression | Polynomial / Non-linear |
|---|---|---|---|
| Continuous Y, linear scatter plot | ✓ Preferred | — | — |
| Binary outcome (yes/no, 0/1) | — | ✓ | — |
| Curved scatter plot, R² < 0.4 | — | — | ✓ |
| Multiple predictor variables | ✓ (multiple regression) | ✓ | — |
| Residuals show curved pattern | Switch | — | ✓ |
| Predict category membership | — | ✓ | — |
Case Study: Predicting Weekly Sales for a New Grocery Store Location
An operations analyst at a grocery chain used linear regression across 18 existing stores to model the relationship between store floor area (X, in sq ft) and weekly sales (Y, in USD). The result: y = 2.34x + 8,400, with R² = 0.79, a good fit by the reference table above: floor area explained 79% of the variance in weekly sales across the sample.
When a prospective 4,200 sq ft location came up for approval, the model predicted weekly sales of 2.34 × 4,200 + 8,400 = $18,228. The site was within the observed data range (2,800–5,100 sq ft), so the interpolation was reliable. The team approved the lease.
After 8 weeks of operation, actual average weekly sales were $17,890 — within 1.9% of the model prediction. The analyst noted one residual outlier in the training data (a store with unusually high footfall due to a co-located pharmacy) and flagged it as a future predictor variable for model improvement.
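The arithmetic in this case study can be reproduced directly. The sketch below uses only the figures quoted above; the variable names are illustrative, and the error is measured relative to the prediction, which is one reasonable convention (measuring it relative to the actual figure also rounds to 1.9%).

```python
m, b = 2.34, 8_400             # fitted line from the case study
x_min, x_max = 2_800, 5_100    # observed floor-area range (sq ft)
area = 4_200                   # prospective store (sq ft)
actual = 17_890                # average weekly sales after 8 weeks (USD)

predicted = m * area + b
error_pct = abs(actual - predicted) / predicted * 100

print(f"in observed range: {x_min <= area <= x_max}")   # True
print(f"predicted weekly sales: ${predicted:,.0f}")      # $18,228
print(f"error vs prediction: {error_pct:.1f}%")          # 1.9%
```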
Related Calculators
Pearson r: strength of linear relationship
Normal Distribution Calculator: distribution of regression residuals
Mean Calculator: means used in regression formula
Variance Calculator: variance of X and Y datasets
Standard Error Calculator: SE of the regression slope
Statistics Hub: all statistics calculators
Disclaimer
This calculator computes simple (single-variable) linear regression. Results assume a linear relationship between X and Y. Always inspect your data visually and check residuals before drawing conclusions.