Question 1

What does the slope tell you in linear regression?

Accepted Answer

The slope (m) tells you how much the predicted Y changes for each one-unit increase in X. For example, m = 2.5 means Y increases by 2.5 units, on average, for every 1-unit increase in X. A negative slope means Y tends to decrease as X increases, and m = 0 means there is no linear association between X and Y. The slope is the core interpretation of any linear regression, so always explain it in the units of your data.
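
As a minimal sketch of where the slope comes from, the snippet below fits a least-squares line by hand. The function name `fit_line` and the hours/scores data are made up for illustration; the data were chosen so the slope comes out to exactly 2.5.

```python
def fit_line(xs, ys):
    """Return (m, b) for the least-squares line y = m*x + b."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = co-variation of X and Y divided by variation of X.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    m = sxy / sxx
    # The least-squares line always passes through (mean_x, mean_y).
    b = mean_y - m * mean_x
    return m, b


# Hypothetical data: hours studied (X) and exam score (Y).
hours = [1, 2, 3, 4, 5]
scores = [52.5, 55.0, 57.5, 60.0, 62.5]

m, b = fit_line(hours, scores)
print(f"Each extra hour studied is associated with {m:.1f} more points")
```

Stating the result as "2.5 points per hour" rather than just "m = 2.5" is what interpreting the slope in the units of the data looks like.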

Question 2

What is R² and how do I interpret it?

Accepted Answer

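R² (the coefficient of determination) measures how well the regression line fits the data: it is the proportion of the variation in Y that the model explains. R² = 0.85 means 85% of the variation in Y is explained by the linear relationship with X; the remaining 15% is due to other factors or random variation. For a least-squares line with an intercept, R² ranges from 0 (no fit) to 1 (perfect fit), and in simple linear regression it equals the square of the correlation coefficient r. A "good" R² depends on context: R² = 0.9 is expected in physics; R² = 0.3 may be impressive in social sciences.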
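
One way to see what "variation explained" means is to compute R² directly as 1 − SS_res / SS_tot. The sketch below uses a small invented data set; the values m = 0.6 and b = 2.2 are the least-squares fit for that particular data, and the name `r_squared` is just illustrative.

```python
def r_squared(ys, predictions):
    """R^2 = 1 - (unexplained variation) / (total variation)."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - p) ** 2 for y, p in zip(ys, predictions))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1 - ss_res / ss_tot


# Hypothetical data set.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

# Least-squares line for this data (assumed already fitted).
m, b = 0.6, 2.2
predictions = [m * x + b for x in xs]

r2 = r_squared(ys, predictions)
print(f"R^2 = {r2:.2f}")
```

Here the line explains about 60% of the variation in Y, and the other 40% is left in the residuals.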

Question 3

What is the difference between correlation and regression?

Accepted Answer

Correlation (r) measures the strength and direction of the linear relationship — it is symmetric (swapping X and Y gives the same r). Regression produces an equation to predict Y from X — it is not symmetric (regressing Y on X gives a different equation than X on Y). Regression also quantifies the magnitude of the effect (slope), while correlation only gives direction and strength.
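
The symmetry difference is easy to check numerically. This sketch (helper names `pearson_r` and `slope` and the data are assumptions for illustration) computes r both ways round and then fits both regressions.

```python
import math


def _centered_sums(xs, ys):
    """Return (Sxy, Sxx, Syy): sums of products of deviations from the means."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy, sxx, syy


def pearson_r(xs, ys):
    """Correlation coefficient: symmetric in its arguments."""
    sxy, sxx, syy = _centered_sums(xs, ys)
    return sxy / math.sqrt(sxx * syy)


def slope(xs, ys):
    """Least-squares slope for predicting ys from xs: not symmetric."""
    sxy, sxx, _ = _centered_sums(xs, ys)
    return sxy / sxx


# Hypothetical data set.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

print("r(X, Y) =", round(pearson_r(xs, ys), 4))
print("r(Y, X) =", round(pearson_r(ys, xs), 4))       # same value
print("slope of Y on X =", round(slope(xs, ys), 4))
print("slope of X on Y =", round(slope(ys, xs), 4))   # different value
```

The two slopes differ, but their product equals r², which ties the two measures together.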

Question 4

What are residuals in linear regression?

Accepted Answer

A residual is the difference between the observed Y and the predicted Y (ŷ = mx + b): residual = y − ŷ. The least squares method minimises the sum of squared residuals (Σ(y − ŷ)²). Large residuals indicate data points that the model fits poorly. Plotting residuals vs X is a key diagnostic — residuals should be randomly scattered around zero with no pattern.
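
A short sketch of the residual calculation, using an invented data set and its least-squares line (m = 0.6, b = 2.2 for this data). The helper names are hypothetical. It also shows that nudging the slope away from the fitted value makes the sum of squared residuals larger.

```python
def residuals(xs, ys, m, b):
    """Observed minus predicted: y - (m*x + b) for each point."""
    return [y - (m * x + b) for x, y in zip(xs, ys)]


def sum_squared_residuals(xs, ys, m, b):
    """The quantity that least squares minimises."""
    return sum(r ** 2 for r in residuals(xs, ys, m, b))


# Hypothetical data set and its least-squares line.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
m, b = 0.6, 2.2

res = residuals(xs, ys, m, b)
print("residuals:", [round(r, 2) for r in res])

# For a least-squares line with an intercept, residuals sum to (about) zero.
print("sum of residuals:", round(sum(res), 10))

# Any other slope gives a larger sum of squared residuals.
best = sum_squared_residuals(xs, ys, m, b)
worse = sum_squared_residuals(xs, ys, 0.7, b)
print(f"SSR at fitted slope: {best:.2f}, at slope 0.7: {worse:.2f}")
```

Plotting `res` against `xs` gives the diagnostic plot described above.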

Question 5

When does linear regression not apply?

Accepted Answer

Linear regression assumes: (1) a linear relationship between X and Y, (2) independence of observations, (3) constant variance of residuals (homoscedasticity), and (4) normally distributed residuals. It becomes unreliable when the relationship is curved (use polynomial or log regression instead), when there are influential outliers, or when the X values are measured with substantial error. Always check a scatter plot before applying linear regression.

Question 6

How do I use the regression equation for prediction?

Accepted Answer

Substitute your X value into y = mx + b. Example: if m = 0.5, b = 10, then for X = 20: y = 0.5 × 20 + 10 = 20. Important: only predict within the range of your original X data (interpolation). Predicting far outside that range (extrapolation) is unreliable because the linear relationship may not hold.
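
The prediction step and the interpolation warning can be combined in one small helper. This is only a sketch: the function name `predict` and the assumed X range of 5 to 30 are illustrative and not part of the example above.

```python
def predict(x, m, b, x_min, x_max):
    """Return (predicted y, True if x lies outside the fitted X range)."""
    y = m * x + b
    is_extrapolation = x < x_min or x > x_max
    return y, is_extrapolation


# Line from the worked example: m = 0.5, b = 10.
m, b = 0.5, 10

# Assumed range of the original X data (hypothetical).
x_min, x_max = 5, 30

for x in (20, 100):
    y, outside = predict(x, m, b, x_min, x_max)
    note = "extrapolation, treat with caution" if outside else "interpolation"
    print(f"X = {x}: predicted Y = {y} ({note})")
```

For X = 20 this reproduces the value 20 from the worked example; X = 100 still returns a number, but the flag marks it as outside the data the line was fitted to.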

R² Goodness-of-Fit Reference

R² range	Model fit	Typical field
0.90 – 1.00	Excellent	Physics, engineering, hard sciences
0.70 – 0.90	Good	Finance, chemistry, controlled experiments
0.50 – 0.70	Moderate	Biology, economics, education research
0.30 – 0.50	Fair	Psychology, sociology, field studies
0.00 – 0.30	Weak	Complex human behaviour, noisy systems

Linear Regression vs Alternatives: Which Model to Use?

Situation	Simple Linear	Logistic Regression	Polynomial / Non-linear
Continuous Y, linear scatter plot	✓ Preferred	—	—
Binary outcome (yes/no, 0/1)	—	✓	—
Curved scatter plot, R² < 0.4	—	—	✓
Multiple predictor variables	✓ (multiple regression)	✓	—
Residuals show curved pattern	Switch to non-linear	—	✓
Predict category membership	—	✓	—
