Correlation vs Causation — Why the Difference Matters

By CalcMulti Editorial Team · 8 min read

Correlation measures whether two variables tend to change together. Causation means one variable directly produces a change in another. These are fundamentally different claims — and confusing them is one of the most common and consequential errors in data analysis, research, and everyday reasoning.

Ice cream sales and drowning deaths are positively correlated; hot weather causes both. A person who takes vitamins while recovering from a cold may credit the vitamins, but colds resolve on their own, with or without supplements. Establishing true causation requires much more than an association in the data.


Side-by-Side Comparison

Property | Correlation | Causation
Definition | Two variables tend to vary together (positively or negatively) | One variable directly produces a change in another
Symmetry | Symmetric: r(X,Y) = r(Y,X) | Directional: X→Y differs from Y→X
How detected | Statistical analysis of observational data | Randomised experiments, causal analysis
Does r = 0 exclude causation? | No: non-linear causal effects can give r ≈ 0 | Causation can be present even when linear correlation is zero
Third-variable problem | Cannot distinguish a direct effect from a shared cause | Requires controlling for all confounders
Strength needed | Any non-zero r indicates correlation | Requires precedence, correlation, and elimination of alternatives
Common mistake | Treating a correlational finding as proof of causation | Ignoring that the direction might be reversed (reverse causality)
Research design needed | Cross-sectional or longitudinal observation | Randomised controlled trial (gold standard)
Established from observational data? | Yes, easily | Very difficult; requires causal inference methods
Example | Shoe size and reading ability correlate in children | Learning to read increases vocabulary
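
Two of the rows above, symmetry and the zero-r question, are easy to verify numerically. The sketch below (plain Python with a hand-rolled Pearson helper and simulated data; the sample sizes and seed are arbitrary) shows that r(X,Y) and r(Y,X) are identical, and that a perfect but non-linear causal link (Y = X²) can still produce r ≈ 0:

```python
import random

def pearson_r(xs, ys):
    """Sample Pearson correlation coefficient (plain-Python helper)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sx = sum((a - mx) ** 2 for a in xs) ** 0.5
    sy = sum((b - my) ** 2 for b in ys) ** 0.5
    return cov / (sx * sy)

random.seed(4)
x = [random.gauss(0, 1) for _ in range(10_000)]
y = [xi + random.gauss(0, 1) for xi in x]   # linear link plus noise

r_xy = pearson_r(x, y)
r_yx = pearson_r(y, x)                      # symmetry: identical value either way

y_sq = [xi ** 2 for xi in x]                # perfect causal link, but non-linear
r_nonlinear = pearson_r(x, y_sq)            # comes out near zero
```

Because the formula treats the two arguments interchangeably, the symmetry holds exactly, not just approximately; the non-linear case lands near zero because positive and negative X values pull Y = X² in the same direction.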

Three Explanations for Any Non-Zero Correlation

Every time you observe a correlation between X and Y, at least three explanations are on the table, and they are not mutually exclusive:

1. X causes Y: Smoking causes lung cancer. Advertising spend causes more sales.

2. Y causes X (reverse causality): A study finds that hospitalised patients have worse health outcomes. Does hospitalisation cause poor health? No — people are hospitalised because their health is already poor.

3. Z causes both X and Y (confounding): Ice cream sales and drowning deaths correlate positively. Hot weather (Z) increases both ice cream consumption (X) and swimming (→ more drowning risk, Y). Hot weather is the confounder. Remove the summer months and the correlation disappears.

Observational data alone cannot distinguish these three scenarios without additional analysis (design-based or statistical).
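
The third scenario is easy to demonstrate with simulated data. In the minimal sketch below (Python, made-up parameters; here the theoretical correlation is 0.5), X and Y have no direct link at all; both are driven by a shared cause Z, yet their measured correlation comes out strongly positive:

```python
import random

def pearson_r(xs, ys):
    """Sample Pearson correlation coefficient (plain-Python helper)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sx = sum((a - mx) ** 2 for a in xs) ** 0.5
    sy = sum((b - my) ** 2 for b in ys) ** 0.5
    return cov / (sx * sy)

random.seed(0)
n = 5_000
z = [random.gauss(0, 1) for _ in range(n)]     # confounder, e.g. temperature
x = [zi + random.gauss(0, 1) for zi in z]      # driven only by Z, e.g. ice cream sales
y = [zi + random.gauss(0, 1) for zi in z]      # driven only by Z, e.g. drownings

r_confounded = pearson_r(x, y)                 # strong, despite no X -> Y link
```

Nothing in the simulation lets X influence Y; the correlation exists purely because both share the common driver Z.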

Spurious Correlations — Real Examples

Spurious correlations are statistically real associations with no causal connection. They arise from confounders, coincidence in small datasets, or data mining (searching through thousands of variables until something correlates).

Classic examples: (1) Per capita cheese consumption correlates almost perfectly with deaths by becoming tangled in bedsheets (US data, 2000–2009). This is a coincidence: both simply trended upward over the decade. (2) Nicolas Cage film releases correlate with pool drownings (r ≈ 0.67). Both peaked in the same years, again coincidence. (3) Country-level chocolate consumption correlates with Nobel laureates per capita. A plausible confounder: wealthy countries have more researchers and also consume more chocolate.

The lesson: with enough variables and time periods, spurious high correlations are inevitable. Statistical significance (p < 0.05) does not distinguish real associations from chance findings when many tests are run.
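
This "scan enough variables and something will stick" effect can be reproduced in a few lines. The sketch below (Python; the seed, the 50 series, and the 10-point length are arbitrary choices) generates completely unrelated random series and mines all 1,225 pairs for the largest |r|, which almost always looks impressive despite zero real association:

```python
import random

def pearson_r(xs, ys):
    """Sample Pearson correlation coefficient (plain-Python helper)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sx = sum((a - mx) ** 2 for a in xs) ** 0.5
    sy = sum((b - my) ** 2 for b in ys) ** 0.5
    return cov / (sx * sy)

random.seed(0)
# 50 unrelated random series, each only 10 observations long
series = [[random.gauss(0, 1) for _ in range(10)] for _ in range(50)]

# Data mining: scan all 1,225 pairs and keep the most "impressive" one
best = max(
    abs(pearson_r(series[i], series[j]))
    for i in range(50)
    for j in range(i + 1, 50)
)
```

Short series and many comparisons are exactly the conditions behind the cheese/bedsheets and Nicolas Cage examples above.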

How to Establish Causation

The gold standard: Randomised Controlled Trial (RCT). Randomly assign participants to treatment vs control groups. If the only systematic difference between groups is the treatment, and outcomes differ, the treatment caused the difference. Randomisation eliminates confounding because both groups have the same distribution of all other variables on average.
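
A toy simulation illustrates why randomisation matters. In the sketch below (Python, hypothetical numbers: a true treatment effect of 1.0 with baseline health as the confounder), the same naive difference-in-means estimator is badly biased when sicker people select into treatment, but recovers the true effect under coin-flip assignment:

```python
import random
import statistics

random.seed(1)
N = 20_000
TRUE_EFFECT = 1.0   # hypothetical treatment effect

def naive_estimate(assign):
    """Naive difference in mean outcomes between treated and untreated."""
    treated, control = [], []
    for _ in range(N):
        health = random.gauss(0, 1)                       # baseline health (confounder)
        t = assign(health)
        outcome = health + TRUE_EFFECT * t + random.gauss(0, 0.5)
        (treated if t else control).append(outcome)
    return statistics.mean(treated) - statistics.mean(control)

# Observational: the sicker you are, the more likely you are to be treated
observational = naive_estimate(lambda h: 1 if h < 0 else 0)

# RCT: treatment assigned by coin flip, independent of health
randomised = naive_estimate(lambda h: random.randint(0, 1))
```

Under self-selection the estimate even comes out negative (treatment looks harmful), while randomisation lands near the true effect of 1.0, because the coin flip makes the two groups comparable on average.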

Bradford Hill criteria (for epidemiology): (1) Strength of association (large r), (2) Consistency (replicated across studies), (3) Specificity (association unique to the cause-effect pair), (4) Temporality (cause precedes effect), (5) Biological gradient (dose-response relationship), (6) Plausibility (biological mechanism exists), (7) Coherence (consistent with biological knowledge), (8) Experiment (effect disappears when cause is removed), (9) Analogy (similar relationships exist for analogous exposures).

Causal inference methods for observational data: (1) Instrumental Variables (IV): use a variable that affects X but only affects Y through X, (2) Difference-in-Differences: compare change over time in treated vs untreated groups, (3) Regression Discontinuity: exploit arbitrary cutoffs that determine treatment, (4) Propensity Score Matching: match treated and control units on all measured confounders.
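
Difference-in-differences is the easiest of these to sketch. The toy example below (Python, made-up trend and effect sizes) builds the standard 2×2 group-by-period design; subtracting the control group's change removes the shared time trend and leaves only the treatment effect:

```python
import random
import statistics

random.seed(2)
N = 5_000
TREND, EFFECT = 2.0, 1.5    # common time trend and true effect (made-up numbers)

def group_mean(baseline, post, treated):
    """Mean outcome for one cell of the 2x2 (group x period) design."""
    return statistics.mean(
        baseline + TREND * post + EFFECT * post * treated + random.gauss(0, 1)
        for _ in range(N)
    )

treated_pre,  treated_post = group_mean(5.0, 0, 1), group_mean(5.0, 1, 1)
control_pre,  control_post = group_mean(3.0, 0, 0), group_mean(3.0, 1, 0)

# Subtracting the control group's change cancels the shared trend
did = (treated_post - treated_pre) - (control_post - control_pre)
```

Note the groups start at different baselines (5.0 vs 3.0); the estimator tolerates that, since it compares changes rather than levels, but it does rely on the parallel-trends assumption baked into the simulation.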

Confounding Variables — The Biggest Culprit

A confounding variable (confounder) is a third variable Z that causes both X and Y, creating a spurious correlation between them. Failing to control for confounders leads to incorrect causal conclusions.

Example: Studies found that coffee drinkers have higher rates of lung cancer than non-drinkers. Does coffee cause lung cancer? No — smokers drink more coffee than non-smokers, and smoking causes lung cancer. Smoking is the confounder. When researchers controlled for smoking, the coffee-lung cancer association disappeared.

How to handle confounders: (1) Randomisation (RCT): distributes confounders equally between groups, (2) Multivariable regression: statistically control for measured confounders, (3) Matching: pair treatment and control units with similar confounder values, (4) Stratification: analyse within subgroups defined by the confounder.
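
Stratification (method 4) is simple enough to demonstrate directly. The sketch below (Python, invented probabilities mimicking the coffee/smoking example) shows a crude coffee-cancer risk difference that vanishes once the comparison is made within smoking strata:

```python
import random

random.seed(3)
rows = []
for _ in range(50_000):
    smoker = random.random() < 0.3
    coffee = random.random() < (0.8 if smoker else 0.4)    # smokers drink more coffee
    cancer = random.random() < (0.10 if smoker else 0.01)  # risk depends on smoking only
    rows.append((smoker, coffee, cancer))

def cancer_rate(subset, coffee):
    group = [r for r in subset if r[1] == coffee]
    return sum(r[2] for r in group) / len(group)

# Crude comparison: coffee drinkers look at higher risk (smoking is the confounder)
crude_diff = cancer_rate(rows, True) - cancer_rate(rows, False)

# Stratified comparison: within each smoking stratum, coffee adds essentially nothing
def stratum_diff(smoker):
    stratum = [r for r in rows if r[0] == smoker]
    return cancer_rate(stratum, True) - cancer_rate(stratum, False)
```

The crude comparison mixes the strata and inherits smoking's effect; conditioning on the confounder removes the spurious association, exactly as in the studies described above.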

Unmeasured confounders remain the fundamental challenge of observational causal inference — you can control for what you measure, but not for what you do not know.

Summary

Correlation shows that two variables are statistically related. Causation means one variable produces the other. Never conclude causation from correlation alone — always consider reverse causality, confounding, and coincidence.

  • Correlation can be established from observational data using standard statistics
  • Causation requires either: a randomised experiment, or careful causal analysis with strong assumptions
  • Three explanations always exist for non-zero correlation: X→Y, Y→X, or Z→both X and Y
  • Spurious correlations are common — especially when many variables are analysed
  • Confounding variables are the most frequent reason correlation misleads about causation


Educational use only. Content is based on publicly documented mathematical formulas and reviewed for accuracy by the CalcMulti Editorial Team. Last updated: February 2026.