
Bayes' Theorem: Complete Guide with Examples

Bayes' theorem tells you how to update a probability estimate when you receive new evidence. It is one of the most practically useful results in all of mathematics.

Named after Reverend Thomas Bayes (1701–1761), the theorem was published posthumously in 1763. Today it underpins machine learning, medical diagnostics, spam filtering, search engines, and scientific reasoning.

Formula

P(H|E) = P(E|H) × P(H) / P(E)

P(H) = prior — probability of hypothesis before evidence
P(E|H) = likelihood — how probable the evidence is if H is true
P(E) = marginal likelihood — total probability of the evidence
P(H|E) = posterior — updated probability of H given evidence E
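The formula can be sketched as a small Python function. This is an illustrative sketch: it assumes the common two-hypothesis case, where P(E) is expanded over H and not-H, and the parameter names are our own.

```python
def bayes_posterior(prior, p_e_given_h, p_e_given_not_h):
    """P(H|E) via Bayes' theorem, expanding P(E) over H and not-H."""
    # P(E) = P(E|H)P(H) + P(E|¬H)P(¬H)
    marginal = p_e_given_h * prior + p_e_given_not_h * (1 - prior)
    return p_e_given_h * prior / marginal
```

Plugging in the medical-test numbers from the next section, `bayes_posterior(0.01, 0.95, 0.10)` returns roughly 0.0876.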

Medical Testing: The Classic Example

Disease X affects 1% of the population. A test has 95% sensitivity (correctly identifies disease 95% of the time) and 90% specificity (correctly rules out disease 90% of the time). You test positive. Should you panic?

P(disease) = 0.01 (prior). P(positive|disease) = 0.95. P(positive|no disease) = 0.10.

P(positive) = P(pos|disease)×P(disease) + P(pos|no disease)×P(no disease) = 0.95×0.01 + 0.10×0.99 = 0.0095 + 0.099 = 0.1085.

P(disease|positive) = 0.0095 / 0.1085 ≈ 8.76%. Despite the positive test, there is only an ~9% chance of disease because the condition is rare.
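The same calculation, written out in Python with the numbers from this example (variable names are our own):

```python
# Numbers from the medical-test example above.
p_disease = 0.01              # prior: 1% prevalence
p_pos_given_disease = 0.95    # sensitivity
p_pos_given_healthy = 0.10    # 1 - specificity (false-positive rate)

# Law of total probability: P(positive)
p_positive = (p_pos_given_disease * p_disease
              + p_pos_given_healthy * (1 - p_disease))

# Bayes' theorem: P(disease | positive)
p_disease_given_pos = p_pos_given_disease * p_disease / p_positive
print(f"{p_disease_given_pos:.4f}")  # prints 0.0876
```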

This counterintuitive result is the base rate fallacy in action. In a low-prevalence population, most positive tests are false positives, which is why doctors order confirmatory tests before making a diagnosis.

Spam Filtering: Naive Bayes

Email spam filters use a Bayesian approach. Prior P(spam) ≈ 50% for a typical inbox. P("free"|spam) = 0.80 (80% of spam contains "free"). P("free"|legitimate) = 0.05 (5% of real email contains "free").

After seeing "free": P(spam|"free") = 0.80×0.50 / (0.80×0.50 + 0.05×0.50) = 0.40/0.425 ≈ 94.1%. The email is very likely spam.

"Naive" Bayes treats each word independently: P(spam|word1, word2, …) ∝ P(spam) × P(word1|spam) × P(word2|spam) × … This is an approximation but works remarkably well in practice.

Sequential Bayesian Updating

One of Bayes' greatest powers: the posterior from one observation becomes the prior for the next. Each new piece of evidence refines your estimate.

Example: You're testing whether a coin is fair (H: p=0.5) vs. biased toward heads (H_alt: p=0.7). Start with equal priors: P(fair) = P(biased) = 0.5.

After observing 3 heads in 3 flips, update after each flip, carrying the posterior forward as the new prior. The likelihood of 3 heads is 0.5³ = 0.125 under the fair hypothesis and 0.7³ = 0.343 under the biased one, so P(biased|3 heads) = 0.343 / (0.125 + 0.343) ≈ 73.3%. This sequential updating is the mathematical basis for Bayesian hypothesis testing in science.
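The flip-by-flip update can be sketched as a short loop (a sketch with our own variable names, reusing the coin example above):

```python
p_heads_fair, p_heads_biased = 0.5, 0.7
p_fair, p_biased = 0.5, 0.5  # equal priors

for flip in ["H", "H", "H"]:
    # Likelihood of this flip under each hypothesis.
    lf = p_heads_fair if flip == "H" else 1 - p_heads_fair
    lb = p_heads_biased if flip == "H" else 1 - p_heads_biased
    # Normalize: this flip's posterior is the next flip's prior.
    total = lf * p_fair + lb * p_biased
    p_fair, p_biased = lf * p_fair / total, lb * p_biased / total

print(f"{p_biased:.3f}")  # prints 0.733
```

Note that the order of the updates doesn't matter here: three sequential updates give the same posterior as a single update on the combined likelihood of all three flips.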

Common Mistakes with Bayes' Theorem

Confusing P(A|B) with P(B|A): "The probability of disease given a positive test" ≠ "the probability of a positive test given disease." This confusion of inverse conditional probabilities underlies the prosecutor's fallacy.

Ignoring the base rate: a test with 99% sensitivity and 99% specificity for a disease with 0.1% prevalence still produces mostly false positives; the posterior probability of disease given a positive result is only about 9%. Always incorporate the prior.

Using the wrong reference class: Your prior must match the relevant population. A 45-year-old with symptoms has a different prior than the general 1% population prevalence.

Frequently Asked Questions