Type I vs Type II Error — False Positives vs False Negatives

By CalcMulti Editorial Team · 8 min read

In hypothesis testing, two types of errors can occur. A Type I error (false positive) means you reject the null hypothesis when it is actually true — you conclude an effect exists when it does not. A Type II error (false negative) means you fail to reject the null hypothesis when it is actually false — you miss a real effect.

Understanding these errors is essential for designing studies, setting significance thresholds, and interpreting results. The probability of each error type can be controlled, but reducing one tends to increase the other — the fundamental trade-off of statistical inference.


Side-by-Side Comparison

| Property | Type I Error (α) | Type II Error (β) |
| --- | --- | --- |
| Also called | False positive | False negative |
| What it means | Reject H₀ when H₀ is true | Fail to reject H₀ when H₀ is false |
| Probability | α (significance level, typically 0.05) | β (typically set at 0.10–0.20) |
| Statistical power | Lower α → harder to reject H₀ → lower power | Power = 1 − β (probability of detecting a real effect) |
| Controlled by | Choosing the significance level (α) | Sample size and effect size assumptions |
| Effect of larger n | Does not change (α is fixed by choice) | β decreases (power increases) with larger n |
| Consequence in medicine | Approving an ineffective drug | Missing an effective treatment |
| Consequence in testing | Failing a passing student | Passing a failing student |
| Stricter threshold (lower α) | Reduces Type I errors | Increases Type II errors |
| Preferred balance | Depends on costs of each error in context | High power (1 − β ≥ 0.80) is the convention |

Type I Error (False Positive) — Explained

A Type I error occurs when you reject the null hypothesis (H₀) when it is actually true. You conclude there is a real effect, relationship, or difference when in reality there is none — your finding is a false alarm.

Probability: The probability of a Type I error is α, the significance level. At α = 0.05, there is a 5% chance of a false positive when H₀ is true — meaning if you ran 100 tests under true null conditions, about 5 would falsely reach significance.
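The "5 in 100 tests" claim can be checked directly by simulation. The sketch below (illustrative code, not part of the original article) draws two samples from the same distribution — so H₀ is true by construction — and counts how often a t-test falsely reaches significance:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
alpha = 0.05
n_sims, n = 10_000, 30

false_positives = 0
for _ in range(n_sims):
    # Both samples come from the SAME distribution, so H0 is true.
    a = rng.normal(0, 1, n)
    b = rng.normal(0, 1, n)
    _, p = stats.ttest_ind(a, b)
    if p < alpha:
        false_positives += 1

print(false_positives / n_sims)  # close to alpha = 0.05
```

Over many simulated experiments, the false-positive rate converges to α, regardless of the sample size used in each test.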

Real consequences: (1) Medicine — a drug is approved as effective when it is not; patients are exposed to side effects with no benefit. (2) Business — a new marketing strategy is implemented because it appeared to improve sales, but the improvement was noise. (3) Research — a false finding is published, wasting subsequent research effort trying to replicate or build on it.

How to reduce Type I errors: lower α (e.g., use 0.01 instead of 0.05); use pre-registration to prevent p-hacking; correct for multiple comparisons (Bonferroni correction when testing many hypotheses).

Type II Error (False Negative) — Explained

A Type II error occurs when you fail to reject the null hypothesis when it is actually false. A real effect exists, but your test did not detect it — you missed the signal.

Probability: β. By convention, β ≤ 0.20 (80% power) is the standard in most fields. This means: if the true effect size equals your assumption, you have an 80% chance of detecting it. The remaining 20% chance is the Type II error rate.

Real consequences: (1) Medicine — an effective drug is rejected in a trial because the sample was too small to detect the benefit; patients are denied a helpful treatment. (2) Safety testing — a product defect is not detected because the inspection sample was too small. (3) Research — a real phenomenon is dismissed as "no effect found," suppressing a valid line of inquiry.

How to reduce Type II errors: increase sample size (the most direct method); increase α (accept more false positives); choose a more sensitive measurement instrument (reduce noise); use a one-tailed test if the direction is pre-specified.
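The effect of sample size on Type II errors can be seen in a small Monte Carlo sketch (the effect size 0.5 and group sizes below are illustrative assumptions, not figures from the article). Here H₀ is false by construction, so the detection rate estimates power = 1 − β:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
alpha, effect, n_sims = 0.05, 0.5, 5_000

def simulated_power(n):
    """Fraction of simulated trials that detect a real effect of size `effect`."""
    hits = 0
    for _ in range(n_sims):
        control = rng.normal(0.0, 1.0, n)
        treated = rng.normal(effect, 1.0, n)  # H0 is false by construction
        _, p = stats.ttest_ind(control, treated)
        hits += p < alpha
    return hits / n_sims

power_small = simulated_power(20)   # n = 20 per group
power_large = simulated_power(80)   # n = 80 per group
print(power_small, power_large)     # larger n -> higher power, lower beta
```

With 20 per group, roughly two-thirds of real effects of this size are missed; quadrupling the sample size raises power to near 0.9.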

The Alpha–Beta Trade-Off

For a fixed sample size, there is a direct trade-off between α (Type I error rate) and β (Type II error rate). Making the significance threshold stricter (lower α) means it takes stronger evidence to reject H₀ — but this also means more real effects will fail to reach significance (higher β).

Example: A drug trial uses α = 0.05. The drug has a small but real effect. With n = 50, power = 65% — a 35% Type II error rate. If you tighten to α = 0.01, power drops to 45% — now 55% of real effects are missed. The only way to maintain power while tightening α is to increase sample size.
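The same pattern can be reproduced analytically. The sketch below uses a two-sample z-test (a simplification assuming known σ = 1; the effect size 0.4 and n = 50 per group are illustrative assumptions) to show power falling as α is tightened:

```python
from scipy.stats import norm

def z_test_power(effect, n, alpha):
    """Power of a two-sided two-sample z-test with sigma = 1 and n per group."""
    ncp = effect * (n / 2) ** 0.5        # noncentrality for two equal groups
    z_crit = norm.ppf(1 - alpha / 2)
    # Probability the test statistic lands beyond either critical value
    return norm.sf(z_crit - ncp) + norm.cdf(-z_crit - ncp)

for a in (0.05, 0.01):
    p = z_test_power(effect=0.4, n=50, alpha=a)
    print(f"alpha={a}: power={p:.2f}, beta={1 - p:.2f}")
```

Tightening α from 0.05 to 0.01 nearly halves the power here — exactly the trade-off described above.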

The right balance depends on the relative costs. In drug safety trials, a false positive (approving a harmful drug) is catastrophic — use very small α (0.01 or 0.001). In preliminary screening for promising research leads, missing a real effect (high β) is costly — accept α = 0.10 with awareness that further confirmation is needed.

Statistical Power = 1 − β

Statistical power is the probability of correctly rejecting H₀ when it is false — detecting a real effect. Power = 1 − β. A power of 0.80 means an 80% chance of getting a significant result when the true effect equals your assumed value.

Factors that increase power: (1) Larger sample size — the most controllable factor. (2) Larger true effect size — you cannot control reality, but you can focus on populations where effects are large. (3) Higher α — accepting more false positives buys more power. (4) Lower measurement noise (smaller σ) — better instruments or more controlled conditions. (5) One-tailed instead of two-tailed test — if you have strong directional priors.

Power analysis: Before running a study, choose your target power (usually 0.80), specify the minimum effect size you want to detect, and set α. The required sample size follows from these inputs. Under-powered studies waste resources — they cannot reliably detect effects even when they exist. An underpowered non-significant result is uninformative.
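The sample-size step of a power analysis can be sketched with the standard closed-form formula for a two-sample z-test, n per group = 2((z₁₋α/₂ + z₁₋β)/d)², where d is the standardised effect size (a simplification assuming known σ; exact t-based calculations give slightly larger n):

```python
from math import ceil
from scipy.stats import norm

def n_per_group(effect, alpha=0.05, power=0.80):
    """Required sample size per group for a two-sided two-sample z-test."""
    z_a = norm.ppf(1 - alpha / 2)   # critical value for the chosen alpha
    z_b = norm.ppf(power)           # quantile for the target power
    return ceil(2 * ((z_a + z_b) / effect) ** 2)

print(n_per_group(0.5))   # medium effect: ~63 per group
print(n_per_group(0.2))   # small effect: ~393 per group
```

Note how halving the detectable effect size roughly quadruples the required n — small effects are expensive to detect.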

| Power (1−β) | β (Type II error rate) | Interpretation |
| --- | --- | --- |
| 0.99 | 0.01 | Excellent — 99% chance of detecting the effect |
| 0.90 | 0.10 | Very good — used in high-stakes research |
| 0.80 | 0.20 | Conventional standard — adequate for most research |
| 0.70 | 0.30 | Low — misses 30% of real effects |
| 0.50 | 0.50 | Coin flip — underpowered study |
| < 0.50 | > 0.50 | Poorly powered — likely to miss real effects |

Multiple Comparisons and Type I Error Inflation

When you run multiple hypothesis tests simultaneously, the probability of at least one Type I error grows rapidly. If you run 20 tests at α = 0.05, and all null hypotheses are true, you expect 1 false positive on average — and there is a 64% chance of at least one. This is the multiple comparisons problem.
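Both figures follow from basic probability: with m independent tests under true nulls, P(at least one false positive) = 1 − (1 − α)ᵐ, and the expected count is αm.

```python
alpha, m = 0.05, 20

fwer = 1 - (1 - alpha) ** m        # P(at least one false positive)
expected = alpha * m               # expected number of false positives

print(f"P(>=1 false positive) = {fwer:.2f}")   # ~0.64
print(f"Expected false positives = {expected:.1f}")  # 1.0
```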

Bonferroni correction: divide α by the number of tests. For 20 tests at α = 0.05, use α* = 0.05/20 = 0.0025 for each test. This controls the family-wise error rate (FWER) — the probability of any false positive — at 0.05.

False Discovery Rate (FDR) correction (Benjamini-Hochberg): controls the expected proportion of rejected hypotheses that are false positives. Less conservative than Bonferroni — used when some discoveries are expected (e.g., genomics, where many genes are being tested simultaneously and some real associations are expected).
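The Benjamini–Hochberg procedure is simple enough to sketch directly: sort the p-values, find the largest rank i with p₍ᵢ₎ ≤ q·i/m, and reject everything up to that rank (the p-values below are made-up illustrative data):

```python
import numpy as np

def benjamini_hochberg(pvals, q=0.05):
    """Return a boolean mask of hypotheses rejected at FDR level q."""
    p = np.asarray(pvals)
    m = len(p)
    order = np.argsort(p)
    thresholds = q * np.arange(1, m + 1) / m     # q*i/m for i = 1..m
    below = p[order] <= thresholds
    reject = np.zeros(m, dtype=bool)
    if below.any():
        k = np.max(np.where(below)[0])           # largest i with p_(i) <= q*i/m
        reject[order[: k + 1]] = True            # reject all ranks up to k
    return reject

pvals = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.459]
print(benjamini_hochberg(pvals))  # only the two smallest p-values survive
```

For comparison, a Bonferroni threshold of 0.05/10 = 0.005 would reject only the single smallest p-value here — illustrating why BH is the less conservative choice.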

Summary

Type I error (false positive, α) and Type II error (false negative, β) represent opposite failure modes of hypothesis testing. The right balance depends on the relative cost of each error in your specific context.

  • Type I error rate (α) is directly set by the significance threshold — choose α based on the cost of false positives
  • Type II error rate (β) is controlled through sample size — larger n gives more power (lower β)
  • The 80% power / α = 0.05 convention is a reasonable default, but adjust based on context
  • Multiple comparisons inflate Type I error — use Bonferroni or FDR correction when testing many hypotheses
  • When the cost of missing a real effect is very high (medicine, safety), prioritise high power (small β)


Educational use only. Content is based on publicly documented mathematical formulas and reviewed for accuracy by the CalcMulti Editorial Team. Last updated: February 2026.