Correlation Explained — What It Means and How to Interpret It
By CalcMulti Editorial Team
Correlation measures the strength and direction of the linear relationship between two variables. The most commonly used measure is Pearson's correlation coefficient (r), which ranges from −1 to +1. A value of +1 means a perfect positive linear relationship; −1 means a perfect negative linear relationship; 0 means no linear relationship.
Understanding correlation is one of the most important and most misused skills in data analysis. This guide explains what correlation measures, how to calculate it, how to interpret r values, and — critically — why correlation does not imply causation.
Formula
r = Σ[(xᵢ − x̄)(yᵢ − ȳ)] / [(n−1) · sₓ · sᵧ]
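The formula translates almost term for term into code. What follows is a minimal Python sketch using only the standard library; the function name `pearson_r` and the toy data are illustrative choices, not something this guide defines.

```python
from statistics import mean, stdev


def pearson_r(xs, ys):
    """Pearson r = sum((x - x_bar)(y - y_bar)) / ((n - 1) * s_x * s_y)."""
    if len(xs) != len(ys):
        raise ValueError("xs and ys must have the same length")
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two paired observations")
    x_bar, y_bar = mean(xs), mean(ys)
    # stdev() is the sample standard deviation (it divides by n - 1).
    s_x, s_y = stdev(xs), stdev(ys)
    if s_x == 0 or s_y == 0:
        raise ValueError("r is undefined when a variable is constant")
    cross = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    return cross / ((n - 1) * s_x * s_y)


# Perfectly linear toy data (y = 2x + 1), so r should be 1 up to rounding.
r_line = pearson_r([1, 2, 3, 4], [3, 5, 7, 9])
```

The explicit check for a zero standard deviation is there because the formula divides by sₓ · sᵧ, so r is undefined when either variable never varies.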
What Correlation Actually Measures
Pearson r measures how well the relationship between two variables can be described by a straight line. When both variables tend to increase together, r is positive. When one tends to increase as the other decreases, r is negative. When there is no linear pattern, r is near 0.
Importantly, r measures only linear relationships. Two variables can have a strong curved (non-linear) relationship and still produce r ≈ 0. For example, the relationship between stress and performance forms an inverted-U curve (Yerkes-Dodson law) — at very low and very high stress, performance is poor; at moderate stress, performance peaks. Pearson r would be near 0 for this data even though the relationship is strong and predictable.
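r is also dimensionless: it does not change if you shift either variable by a constant or rescale it by a positive constant (multiplying one variable by a negative constant flips the sign of r but not its magnitude). The correlation between height in cm and weight in kg is the same as the correlation between height in inches and weight in pounds.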
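Both properties described in this section can be checked numerically. The sketch below uses made-up data (a symmetric inverted-U curve and five invented height/weight pairs) and a compact helper named `pearson_r`; all names and numbers are assumptions for illustration only.

```python
import math


def pearson_r(xs, ys):
    """Pearson r as sum of cross-products over the root of the squared spreads."""
    x_bar = sum(xs) / len(xs)
    y_bar = sum(ys) / len(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical inverted-U data: performance peaks at moderate stress (level 5).
stress = list(range(11))                        # 0, 1, ..., 10
performance = [25 - (s - 5) ** 2 for s in stress]
r_curved = pearson_r(stress, performance)       # symmetric curve, so r is 0

# Invented heights (cm) and weights (kg), then the same data in inches and pounds.
height_cm = [150, 160, 170, 180, 190]
weight_kg = [52, 58, 69, 77, 88]
height_in = [h / 2.54 for h in height_cm]
weight_lb = [w * 2.20462 for w in weight_kg]
r_metric = pearson_r(height_cm, weight_kg)
r_imperial = pearson_r(height_in, weight_lb)    # matches r_metric up to rounding

# Multiplying one variable by a negative constant flips the sign of r.
r_flipped = pearson_r([-h for h in height_cm], weight_kg)
```

The curved data gives r = 0 even though performance is fully determined by stress, which is why a scatterplot is worth checking before trusting r.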
How to Interpret Correlation Coefficients
The sign of r indicates direction: positive r means variables increase together; negative r means one increases as the other decreases.
The magnitude of |r| indicates strength. Common rule of thumb (varies by field):
| |r| Range | Strength | Real-World Example |
|---|---|---|
| 0.00 – 0.19 | Negligible / no relationship | Shoe size and intelligence |
| 0.20 – 0.39 | Weak | Hours of TV watched and GPA (r ≈ −0.25) |
| 0.40 – 0.59 | Moderate | Sleep duration and productivity (r ≈ 0.45) |
| 0.60 – 0.79 | Strong | Height and weight (r ≈ 0.70) |
| 0.80 – 1.00 | Very strong | Air temperature and ice cream sales (r ≈ 0.85) |
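As a rough sketch, the rule-of-thumb bands in the table can be encoded as a small lookup. The function names `strength_label` and `direction` are hypothetical, and the cut-offs are the table's conventions rather than universal thresholds.

```python
def strength_label(r):
    """Map r to the rule-of-thumb strength bands from the table."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must lie between -1 and +1")
    magnitude = abs(r)   # strength depends on |r|, not on the sign
    if magnitude < 0.20:
        return "negligible"
    if magnitude < 0.40:
        return "weak"
    if magnitude < 0.60:
        return "moderate"
    if magnitude < 0.80:
        return "strong"
    return "very strong"


def direction(r):
    """The sign of r gives the direction of the relationship."""
    if r > 0:
        return "positive"
    if r < 0:
        return "negative"
    return "none"
```

For instance, `strength_label(-0.25)` returns `"weak"` and `direction(-0.25)` returns `"negative"`, matching the TV-and-GPA row.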
Worked Example — Study Hours vs Exam Score
Five students — study hours (x): 2, 4, 6, 8, 10; exam scores (y): 55, 65, 70, 80, 90.
x̄ = 6, ȳ = 72. sₓ = 3.16, sᵧ = 13.5.
Deviations: (2−6)(55−72) = (−4)(−17) = 68 | (4−6)(65−72) = (−2)(−7) = 14 | (6−6)(70−72) = (0)(−2) = 0 | (8−6)(80−72) = (2)(8) = 16 | (10−6)(90−72) = (4)(18) = 72.
Sum of cross-products = 68 + 14 + 0 + 16 + 72 = 170.
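r = 170 / [(5−1) × 3.162 × 13.509] = 170 / [4 × 42.72] = 170 / 170.88 ≈ 0.995. (The standard deviations are carried to three decimals here; using the two-decimal values inflates the result slightly in the third decimal place.)
Interpretation: r ≈ 0.995 indicates a near-perfect positive linear relationship between study hours and exam score. The least-squares slope, Σ[(xᵢ − x̄)(yᵢ − ȳ)] / Σ(xᵢ − x̄)² = 170 / 40 = 4.25, means each additional study hour is associated with about 4.25 more exam points on average.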
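The worked example can be reproduced step by step in a few lines of Python. This sketch keeps the standard deviations unrounded, so the third decimal place of r may differ slightly from a hand calculation that uses rounded inputs; the variable names are illustrative.

```python
from statistics import mean, stdev

hours = [2, 4, 6, 8, 10]
scores = [55, 65, 70, 80, 90]

n = len(hours)
x_bar, y_bar = mean(hours), mean(scores)    # 6 and 72
s_x, s_y = stdev(hours), stdev(scores)      # sample standard deviations

# Sum of cross-products of deviations from the means.
cross = sum((x - x_bar) * (y - y_bar) for x, y in zip(hours, scores))

r = cross / ((n - 1) * s_x * s_y)

# Least-squares slope: cross-products over the sum of squared x deviations.
sum_sq_x = sum((x - x_bar) ** 2 for x in hours)
slope = cross / sum_sq_x

print(f"sum of cross-products = {cross}")
print(f"r = {r:.3f}")
print(f"slope = {slope:.2f} points per study hour")
```

Note that the slope comes from the regression line, not from r itself: r describes how tightly the points follow a line, while the slope describes how steep that line is.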
Types of Correlation Coefficients
Pearson r is the most common but is not always appropriate. Here are the main correlation measures and when to use each.
| Correlation Type | Use When | Data Type | Key Property |
|---|---|---|---|
| Pearson r | Both variables are continuous and approximately linearly related | Continuous | Sensitive to outliers; significance tests assume approximate bivariate normality |
| Spearman's ρ (rho) | Ordinal data or non-linear but monotonic relationship | Ordinal or continuous | Based on ranks; robust to outliers |
| Kendall's τ (tau) | Small sample or many tied ranks | Ordinal | More conservative than Spearman; better for small n |
| Point-biserial | One binary variable, one continuous | Binary + continuous | Special case of Pearson r |
| Phi coefficient | Both variables are binary (0/1) | Binary | Special case of Pearson r for 2×2 tables |
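To make the "based on ranks" property concrete, here is a minimal sketch of Spearman's ρ computed as Pearson r on average ranks. The helper names (`pearson_r`, `average_ranks`, `spearman_rho`) and the cubic toy data are assumptions for illustration, not part of any particular library.

```python
import math


def pearson_r(xs, ys):
    """Pearson r as sum of cross-products over the root of the squared spreads."""
    x_bar = sum(xs) / len(xs)
    y_bar = sum(ys) / len(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def average_ranks(values):
    """Rank values from 1 to n, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the value at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1    # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman_rho(xs, ys):
    """Spearman's rho: Pearson r computed on the ranks."""
    return pearson_r(average_ranks(xs), average_ranks(ys))


# Hypothetical monotonic but curved data: y = x cubed.
xs = [1, 2, 3, 4, 5, 6]
ys = [x ** 3 for x in xs]
r_linear = pearson_r(xs, ys)    # below 1 because the points do not lie on a line
rho = spearman_rho(xs, ys)      # 1 because the ordering is perfectly preserved
```

Because point-biserial and phi are special cases of Pearson r, passing 0/1-coded lists to the same `pearson_r` helper yields those coefficients as well.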
Common Mistakes in Correlation Analysis
1. Confusing correlation with causation. A strong correlation between ice cream sales and drowning deaths (say, r = 0.85) does not mean ice cream causes drowning; both rise in hot weather (a confounding variable). Always consider alternative explanations before inferring causation from correlation.
2. Ignoring non-linear relationships. r = 0 does not mean no relationship — it means no linear relationship. Always plot your data (scatterplot) before reporting r. A curved or U-shaped pattern can have r ≈ 0.
3. Mistaking statistical significance for practical significance. With n = 1,000, a correlation of r = 0.10 is statistically significant (two-sided p ≈ 0.002) but explains only 1% of the variance (r² = 0.01). Always report r² alongside r for practical context.
4. Restricting the range. If you only study students with high GPA, the correlation between study hours and GPA will be smaller (possibly near 0) than in the full student population. Restricting the range of a variable attenuates the observed correlation.
5. Assuming linearity without checking. Pearson r assumes the underlying relationship is linear. Use a scatterplot to confirm linearity before computing r.
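Items 3 and 4 in the list above lend themselves to a quick numerical check. The sketch below is built on stated assumptions: the p-value uses a normal approximation to the t distribution (reasonable only for large n), the range-restriction data are simulated with a fixed seed, and every function and variable name is hypothetical.

```python
import math
import random


def pearson_r(xs, ys):
    """Pearson r as sum of cross-products over the root of the squared spreads."""
    x_bar = sum(xs) / len(xs)
    y_bar = sum(ys) / len(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def t_statistic(r, n):
    """t = r * sqrt(n - 2) / sqrt(1 - r^2), the usual statistic for testing rho = 0."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r * r)


def approx_two_sided_p(t):
    """Normal approximation to the two-sided p-value (assumes large n)."""
    return math.erfc(abs(t) / math.sqrt(2))


# Item 3: a small r can be statistically significant in a large sample.
t = t_statistic(0.10, 1000)
p = approx_two_sided_p(t)
variance_explained = 0.10 ** 2      # r squared

# Item 4: simulated data where y depends on x plus noise.
rng = random.Random(42)             # fixed seed so the run is repeatable
x = [rng.gauss(0, 1) for _ in range(5000)]
y = [xi + rng.gauss(0, 1) for xi in x]
r_full = pearson_r(x, y)

# Keep only cases with high x, mimicking a sample limited to top performers.
kept = [(xi, yi) for xi, yi in zip(x, y) if xi > 1.0]
r_restricted = pearson_r([k[0] for k in kept], [k[1] for k in kept])
```

Under the normal approximation the p-value for r = 0.10 with n = 1,000 falls between 0.001 and 0.002, while r² stays at 0.01. In the simulation, `r_restricted` comes out well below `r_full` because trimming the range of x removes most of the variation that the correlation relies on.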
Educational use only. Content is based on publicly documented mathematical formulas and reviewed for accuracy by the CalcMulti Editorial Team. Last updated: February 2026.