
Bayes' Theorem: Complete Guide with Examples

Bayes' theorem tells you how to update a probability estimate when you receive new evidence. It is one of the most practically useful results in all of mathematics.

Named after Reverend Thomas Bayes (1701–1761), the theorem was published posthumously in 1763. Today it underpins machine learning, medical diagnostics, spam filtering, search engines, and scientific reasoning.

Formula

P(H|E) = P(E|H) × P(H) / P(E)

P(H) = prior — probability of hypothesis before evidence
P(E|H) = likelihood — how probable the evidence is if H is true
P(E) = marginal likelihood — total probability of the evidence
P(H|E) = posterior — updated probability of H given evidence E
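The formula can be sketched as a small Python function. This is an illustrative sketch: it assumes the common two-hypothesis case, where P(E) is expanded over H and not-H, and the parameter names are our own.

```python
def bayes_posterior(prior, p_e_given_h, p_e_given_not_h):
    """P(H|E) via Bayes' theorem, expanding P(E) over H and not-H."""
    # P(E) = P(E|H)P(H) + P(E|¬H)P(¬H)
    marginal = p_e_given_h * prior + p_e_given_not_h * (1 - prior)
    return p_e_given_h * prior / marginal
```

Plugging in the medical-test numbers from the next section, `bayes_posterior(0.01, 0.95, 0.10)` returns roughly 0.0876.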

Medical Testing: The Classic Example

Disease X affects 1% of the population. A test has 95% sensitivity (correctly identifies disease 95% of the time) and 90% specificity (correctly rules out disease 90% of the time). You test positive. Should you panic?

P(disease) = 0.01 (prior). P(positive|disease) = 0.95. P(positive|no disease) = 0.10.

P(positive) = P(pos|disease)×P(disease) + P(pos|no disease)×P(no disease) = 0.95×0.01 + 0.10×0.99 = 0.0095 + 0.099 = 0.1085.

P(disease|positive) = 0.0095 / 0.1085 ≈ 8.76%. Despite the positive test, there is only an ~9% chance of disease because the condition is rare.
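The same calculation, written out in Python with the numbers from this example (variable names are our own):

```python
# Numbers from the medical-test example above.
p_disease = 0.01              # prior: 1% prevalence
p_pos_given_disease = 0.95    # sensitivity
p_pos_given_healthy = 0.10    # 1 - specificity (false-positive rate)

# Law of total probability: P(positive)
p_positive = (p_pos_given_disease * p_disease
              + p_pos_given_healthy * (1 - p_disease))

# Bayes' theorem: P(disease | positive)
p_disease_given_pos = p_pos_given_disease * p_disease / p_positive
print(f"{p_disease_given_pos:.4f}")  # prints 0.0876
```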

This counterintuitive result is the base rate fallacy in action. In a low-prevalence population, most positive tests are false positives, which is why doctors order confirmatory tests before making a diagnosis.

Spam Filtering: Naive Bayes

Email spam filters use a Bayesian approach. Prior P(spam) ≈ 50% for a typical inbox. P("free"|spam) = 0.80 (80% of spam contains "free"). P("free"|legitimate) = 0.05 (5% of real email contains "free").

After seeing "free": P(spam|"free") = 0.80×0.50 / (0.80×0.50 + 0.05×0.50) = 0.40/0.425 ≈ 94.1%. The email is very likely spam.

"Naive" Bayes treats each word independently: P(spam|word1, word2, …) ∝ P(spam) × P(word1|spam) × P(word2|spam) × … This is an approximation but works remarkably well in practice.

Sequential Bayesian Updating

One of Bayes' greatest powers: the posterior from one observation becomes the prior for the next. Each new piece of evidence refines your estimate.

Example: You're testing whether a coin is fair (H: p=0.5) vs. biased toward heads (H_alt: p=0.7). Start with equal priors: P(fair) = P(biased) = 0.5.

After observing 3 heads in 3 flips, update after each flip, carrying the posterior forward as the new prior. The likelihood of 3 heads is 0.5³ = 0.125 under the fair hypothesis and 0.7³ = 0.343 under the biased one, so P(biased|3 heads) = 0.343 / (0.125 + 0.343) ≈ 73.3%. This sequential updating is the mathematical basis for Bayesian hypothesis testing in science.
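The flip-by-flip update can be sketched as a short loop (a sketch with our own variable names, reusing the coin example above):

```python
p_heads_fair, p_heads_biased = 0.5, 0.7
p_fair, p_biased = 0.5, 0.5  # equal priors

for flip in ["H", "H", "H"]:
    # Likelihood of this flip under each hypothesis.
    lf = p_heads_fair if flip == "H" else 1 - p_heads_fair
    lb = p_heads_biased if flip == "H" else 1 - p_heads_biased
    # Normalize: this flip's posterior is the next flip's prior.
    total = lf * p_fair + lb * p_biased
    p_fair, p_biased = lf * p_fair / total, lb * p_biased / total

print(f"{p_biased:.3f}")  # prints 0.733
```

Note that the order of the updates doesn't matter here: three sequential updates give the same posterior as a single update on the combined likelihood of all three flips.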

Common Mistakes with Bayes' Theorem

Confusing P(A|B) with P(B|A): "The probability of disease given a positive test" ≠ "the probability of a positive test given disease." This confusion of inverse conditional probabilities underlies the prosecutor's fallacy.

Ignoring the base rate: a test with 99% sensitivity and 99% specificity for a disease with 0.1% prevalence still produces mostly false positives; the posterior probability of disease given a positive result is only about 9%. Always incorporate the prior.

Using the wrong reference class: Your prior must match the relevant population. A 45-year-old with symptoms has a different prior than the general 1% population prevalence.

Frequently Asked Questions