The Research on Cognitive Bias in Planning: What the Evidence Actually Shows

A research digest covering the foundational studies, key researchers, and contested findings in the cognitive bias literature—with an honest account of what replicates, what doesn't, and what the implications are for planners.

The cognitive bias literature is one of the most widely cited bodies of research in popular productivity and business writing. It is also one of the most misrepresented.

Some findings are highly robust, replicated across cultures and methodologies, and have direct implications for how planners should work. Others are contested, have failed to replicate at scale, or have been overstated beyond what the original data supports.

This research digest covers the key studies and researchers behind the ten biases most relevant to planning, with an honest account of what the evidence actually supports.


Kahneman and Tversky: The Foundation

The modern cognitive bias literature traces to a series of papers by Daniel Kahneman and Amos Tversky published between 1971 and 1986. Their program of research—later termed “Judgment Under Uncertainty”—documented systematic, predictable errors in human probabilistic reasoning.

Their 1974 Science paper, “Judgment Under Uncertainty: Heuristics and Biases,” introduced three heuristics—representativeness, availability, and anchoring—that produce reliable judgment errors. This was genuinely novel: previous economic models assumed rational agents who processed probabilities accurately. Kahneman and Tversky demonstrated that actual human judgment deviated from rational predictions in consistent, mappable ways.

Their 1979 Econometrica paper, “Prospect Theory: An Analysis of Decision Under Risk,” proposed a formal model of how people actually make decisions under uncertainty. Key finding: losses are psychologically weighted roughly twice as heavily as equivalent gains. This asymmetry—loss aversion—explains a range of planning behaviors, including the sunk cost fallacy and status quo bias.
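The asymmetry can be made concrete with prospect theory’s value function. Here is a minimal Python sketch using the median parameter estimates (α = β = 0.88, λ = 2.25) that Tversky and Kahneman reported in their 1992 follow-up work; the function name and dollar amounts are illustrative:

```python
def prospect_value(x, alpha=0.88, beta=0.88, lam=2.25):
    """Prospect theory value function: gains valued as x^alpha,
    losses as -lam * (-x)^beta, so a loss of a given size weighs
    roughly twice as much as an equal gain."""
    if x >= 0:
        return x ** alpha
    return -lam * (-x) ** beta

gain = prospect_value(100)    # subjective value of a $100 gain, ~57.5
loss = prospect_value(-100)   # subjective value of a $100 loss, ~-129.5
print(abs(loss) / gain)       # ~2.25: the loss looms larger
```

With these parameters, a $100 loss is felt about 2.25 times as strongly as a $100 gain—the source of the “roughly twice” rule of thumb in the paragraph above.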

Kahneman’s 2011 book, Thinking, Fast and Slow, synthesized this work and introduced the System 1 / System 2 framework to a general audience. The book is accessible and, for the core judgment-and-decision research, reliable; Kahneman is generally careful to flag contested findings. One caveat: the chapter on social priming leans on studies that later failed to replicate, a weakness Kahneman himself acknowledged after the replication crisis.

Replication status: The core Kahneman-Tversky findings—framing effects, loss aversion, anchoring, the planning fallacy—have strong replication records. The specific magnitude of effects varies across studies and populations, but the directional findings are robust.


The Planning Fallacy: Kahneman, Lovallo, and Flyvbjerg

The planning fallacy was introduced by Kahneman and Tversky in a 1979 paper and developed more extensively by Kahneman and Dan Lovallo in a 1993 Management Science paper. Their analysis identified the “inside view” versus “outside view” distinction: planners who focus on the specific features of their own plan consistently underestimate costs and timelines, while planners who use base rates from comparable past projects are more accurate.

Bent Flyvbjerg extended this work into applied settings in a series of papers from the early 2000s. His research on 258 large infrastructure projects found cost overruns in 9 out of 10 projects studied. His 2002 paper in the Journal of the American Planning Association is the most-cited applied work on the planning fallacy.

Reference class forecasting—Flyvbjerg’s proposed remedy—has been validated in multiple subsequent studies and has been adopted by government agencies in the UK, Denmark, and Australia for infrastructure project appraisal. This applied validation is an important signal: findings that have been tested in high-stakes real-world contexts with outcome data are more credible than findings from laboratory tasks alone.
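To make the inside-view/outside-view distinction concrete, here is a minimal Python sketch of a reference-class adjustment. The overrun ratios are invented for illustration, and the single-percentile uplift is a simplification—Flyvbjerg’s actual method works from the full cumulative distribution of overruns for a chosen reference class and risk tolerance:

```python
# Hypothetical reference class: actual/estimated cost ratios from
# comparable past projects (invented numbers for illustration).
past_ratios = [1.05, 1.20, 1.45, 0.95, 1.30, 1.60, 1.10, 1.25, 1.80, 1.15]

def outside_view(inside_estimate, ratios, percentile=0.8):
    """Reference-class forecast: scale the inside-view estimate by the
    overrun ratio at the chosen percentile of the reference class,
    rather than trusting the plan's own optimistic numbers."""
    ranked = sorted(ratios)
    idx = min(int(percentile * len(ranked)), len(ranked) - 1)
    return inside_estimate * ranked[idx]

# The plan's own (inside-view) estimate says $10M; the reference
# class says budget for the 80th-percentile overrun instead.
budget = outside_view(10_000_000, past_ratios)  # 16,000,000
```

The point of the sketch is the shape of the method: the adjustment comes entirely from the history of comparable projects, not from scrutinizing the plan itself.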

Replication status: High. The planning fallacy is among the most robustly replicated findings in the behavioral economics literature. Flyvbjerg’s applied research has been replicated in multiple geographic contexts and project types.


Confirmation Bias: Wason and Subsequent Work

Peter Wason’s 1960 paper demonstrating that people test hypotheses by seeking confirming rather than disconfirming evidence—the “2-4-6 task”—established the basic confirmation bias finding. The effect has been replicated extensively.

However, the mechanism is more complex than the popular “we filter information to confirm beliefs” framing suggests. Joshua Klayman and Young-Won Ha argued in a 1987 Psychological Review paper that confirmation bias in many settings is better understood as a “positive test strategy”—a tendency to test the most available hypothesis by looking for cases where it should be true—rather than as motivated reasoning in service of a conclusion you are attached to.

The distinction matters practically: motivated reasoning (defending a cherished belief) requires emotional investment; positive test strategy can operate even without any particular attachment to the outcome. This means confirmation bias can affect planning even when you are not emotionally committed to your plan—it is a default cognitive strategy, not just a defensive response.
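A toy Python simulation of Wason’s 2-4-6 task shows why a positive test strategy fails even with no emotional stake. The rules and test triples below are illustrative, but they mirror the task’s structure: subjects’ typical hypothesis is narrower than the experimenter’s actual rule.

```python
# Wason's 2-4-6 task: the experimenter's true rule is broad
# ("any strictly ascending triple"), but subjects typically form a
# narrower hypothesis ("ascending by 2") and test only triples that
# fit it -- a positive test strategy.
true_rule = lambda t: t[0] < t[1] < t[2]
hypothesis = lambda t: t[1] - t[0] == 2 and t[2] - t[1] == 2

positive_tests = [(1, 3, 5), (10, 12, 14), (100, 102, 104)]
negative_test = (1, 2, 3)  # fits the true rule but NOT the hypothesis

# Every positive test "confirms": both rules say yes, so the
# narrower hypothesis is never falsified no matter how many you run.
assert all(true_rule(t) and hypothesis(t) for t in positive_tests)

# Only a triple the hypothesis forbids can expose the mismatch.
assert true_rule(negative_test) and not hypothesis(negative_test)
```

No motivated reasoning is involved anywhere in the loop: testing only where your hypothesis predicts success is sufficient, on its own, to keep a wrong belief alive.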

Replication status: High for the basic effect. The specific mechanism remains debated in the research literature, but the practical implication—that people need structured adversarial processes to reliably surface disconfirming evidence—is supported across accounts.


Hindsight Bias: Fischhoff’s Foundational Work

Baruch Fischhoff’s 1975 paper, “Hindsight ≠ Foresight: The Effect of Outcome Knowledge on Judgment Under Uncertainty,” established that knowing an outcome causes people to overestimate the probability they would have assigned to it beforehand. He called this “creeping determinism”—outcomes feel inevitable in retrospect.

Fischhoff’s subsequent work on debiasing is equally important and more sobering. His 1982 chapter “Debiasing,” in the Kahneman, Slovic, and Tversky volume Judgment Under Uncertainty: Heuristics and Biases, analyzed debiasing attempts and found that simply informing people about hindsight bias reduced but did not eliminate the effect. This is the foundation for the argument that awareness is an insufficient debiasing tool.

Philip Tetlock’s Superforecasting (2015), co-authored with Dan Gardner, is the most comprehensive treatment of the practical implications of hindsight bias for forecasting. Tetlock’s Good Judgment Project—which tracked thousands of forecasters making specific predictions over years—found that the best forecasters kept detailed records of their predictions and reviewed their accuracy systematically rather than relying on memory. Written prediction records are the most reliable defense available against hindsight bias distorting your sense of your own track record.
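A minimal Python sketch of what such a record enables: scoring logged probabilities against outcomes with the Brier score, the accuracy measure Tetlock’s project used. The log entries here are invented:

```python
def brier(forecasts):
    """Mean Brier score over (probability, outcome) pairs: the average
    squared gap between the stated probability and what happened
    (0 or 1). Lower is better; always guessing 0.5 scores 0.25."""
    return sum((p - o) ** 2 for p, o in forecasts) / len(forecasts)

# A written log makes hindsight-proof review possible: the score is
# computed from recorded probabilities, not remembered confidence.
log = [(0.9, 1), (0.7, 0), (0.2, 0), (0.6, 1)]
print(brier(log))  # 0.175
```

Because the score is a function of what you wrote down at the time, “I knew it all along” has nothing to grab onto.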

Replication status: High. Hindsight bias replicates consistently across cultures and domains.


Overconfidence and the Dunning-Kruger Effect

The overconfidence literature is large and well-replicated at the general level: people consistently state confidence intervals that are too narrow, so true outcomes fall outside those intervals far more often than the stated confidence level implies.
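This calibration claim is easy to check against your own records. A minimal Python sketch, with invented numbers:

```python
def hit_rate(intervals, outcomes):
    """Fraction of true outcomes that fall inside the stated intervals.
    For well-calibrated 90% intervals this should be about 0.9; the
    overconfidence literature typically finds much lower hit rates."""
    hits = sum(lo <= x <= hi for (lo, hi), x in zip(intervals, outcomes))
    return hits / len(outcomes)

# Hypothetical log of "90% confident" ranges vs. what actually happened.
ranges = [(4, 6), (10, 14), (2, 3), (50, 60), (7, 9)]
actual = [5, 18, 2.5, 45, 8]
print(hit_rate(ranges, actual))  # 0.6 -- far below the claimed 0.9
```

The typical experimental pattern looks like this toy example: intervals labeled “90%” capture the truth well under 90% of the time.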

The Dunning-Kruger effect specifically—Kruger and Dunning’s 1999 paper “Unskilled and Unaware of It”—is more complicated. The paper found that people in the bottom quartile of performance on logic and grammar tasks significantly overestimated their performance, while people in the top quartile underestimated it slightly.

Reanalyses by Nuhfer and colleagues, along with subsequent statistical critiques, found that a portion of the Dunning-Kruger pattern is a mathematical artifact: the correlation pattern Kruger and Dunning identified partially reflects regression to the mean rather than a genuine psychological effect. This does not mean the effect is fake, but it is smaller and more statistically subtle than popular accounts suggest.

What remains well-supported is the more general finding: self-assessment of competence in domains where you lack expertise is unreliable. This is a meaningful practical claim even if it does not justify the dramatic “peak of Mount Stupid” infographic that the effect is typically illustrated with.

Replication status: Moderate. The general overconfidence finding is robust. The specific Dunning-Kruger pattern is real but partly artifactual; the effect is smaller and more qualified than popular representations.


Sunk Cost and Loss Aversion

The sunk cost fallacy rests on loss aversion, which rests on Kahneman and Tversky’s prospect theory. Loss aversion is among the most replicated findings in behavioral economics.

However, David Gal and Derek Rucker published a 2018 paper arguing that loss aversion is often better explained by inertia—a preference for the current option over change—rather than by asymmetric weighting of losses and gains per se. Their argument does not undermine the practical finding that people continue bad plans longer than is rational; it proposes a different causal mechanism.

For planners, the mechanism is less important than the intervention: pre-committing to decision criteria before you are invested in a plan’s continuation is effective regardless of whether the underlying driver is loss aversion or inertia.

Replication status: High for loss aversion as a phenomenon. The specific weighting function from prospect theory (losses weighted 2x gains) is an approximation that varies across individuals and contexts.


Present Bias and Hyperbolic Discounting

Present bias—the tendency to overweight immediate rewards relative to future ones—has a robust experimental literature. Hyperbolic discounting, the formal model behind present bias, was developed by George Ainslie and Richard Herrnstein and has been replicated across species as well as human populations.
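The model’s signature prediction—preference reversal as a reward gets close—falls out of the basic hyperbolic formula V = A / (1 + kD), where A is the amount, D the delay, and k a discount rate. A Python sketch with an illustrative (deliberately steep) k:

```python
def hyperbolic(amount, delay, k=1.0):
    """Ainslie-style hyperbolic discounting: value = amount / (1 + k * delay).
    k = 1.0 per day is chosen only to make the reversal easy to see."""
    return amount / (1 + k * delay)

# Choice: $50 in 1 day vs. $100 in 10 days.
# Viewed from 30 days out, both delays grow by 30 and the larger
# reward wins; at the choice point, the smaller-sooner one overtakes it.
far = (hyperbolic(50, 31), hyperbolic(100, 40))  # both seen from afar
near = (hyperbolic(50, 1), hyperbolic(100, 10))  # imminent choice

print(far[0] < far[1])    # True: prefers $100 when both are distant
print(near[0] > near[1])  # True: prefers $50 when it is imminent
```

Exponential discounting cannot produce this flip—under it, whichever option wins at a distance still wins up close—which is why the hyperbolic shape is the formal signature of present bias.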

Richard Thaler and Shlomo Benartzi’s research on savings behavior demonstrates the severity of present bias in high-stakes real decisions: people systematically under-save even when they agree that saving matters, because immediate consumption feels more valuable than future security.

The planning implication is direct: any plan that depends on sustained attention to a long-term goal—without structural mechanisms to protect that attention from short-term urgency—is vulnerable to present bias. The intervention is not willpower but structural protection of strategic time.

Replication status: High. Hyperbolic discounting is one of the most replicated findings in behavioral economics.


What to Take Away From the Literature

A few summary points for planners reading this research:

The directional findings are more reliable than the specific magnitudes. When a study reports that the planning fallacy causes 50% cost overruns, the number is less reliable than the directional claim that planners systematically underestimate. Treat quantitative claims from single studies with skepticism.

Applied validation matters. Reference class forecasting has been validated in real infrastructure projects by government agencies. This is stronger evidence than laboratory studies alone. When a debiasing technique has been tested in high-stakes real-world contexts, that is significant.

Replication crises have affected some adjacent areas. Ego depletion—Roy Baumeister’s finding that willpower is a limited resource—has faced serious replication challenges since 2016. This finding is widely cited in productivity writing. The planning bias findings have held up better, but the general lesson is that popular representations of psychological research often overstate certainty.

Structural interventions outperform awareness interventions. This is consistent across the debiasing literature. The evidence for reference class forecasting and pre-mortems is substantially stronger than the evidence for “think more carefully about your biases.”


Dig deeper: The most accessible primary source is Kahneman’s Thinking, Fast and Slow, which represents the core research accurately and flags many contested findings. Flyvbjerg’s academic papers are freely available and directly applicable to project planning. Tetlock’s Superforecasting is the best treatment of calibration and prediction accuracy.

Related reading: Why Awareness of Bias Doesn’t Fix Bias · 5 Debiasing Techniques Compared · Cognitive Bias in Planning FAQ

Tags: research-digest, cognitive-bias, kahneman, behavioral-economics, planning-fallacy

Frequently Asked Questions

  • Which cognitive bias findings are most robustly replicated?

    The planning fallacy, anchoring effects, loss aversion, and framing effects have strong replication records. Reference class forecasting as a planning fallacy intervention has been replicated in applied settings. The availability heuristic's basic mechanism replicates well; specific quantitative claims about its effects are more variable.
  • What happened to ego depletion research?

    Roy Baumeister's ego depletion effect—the idea that willpower is a limited resource that depletes with use—has faced serious replication challenges since 2016. A large multi-lab replication found minimal effects. Some researchers argue the original effects were real but smaller and more context-dependent than originally claimed. This does not directly affect the planning bias literature, but it is a useful reminder that even well-known findings require scrutiny.
  • Is the Dunning-Kruger effect real?

    The basic finding—that people with less competence in a domain tend to overestimate their performance more than competent people do—is real but often overstated. Some of the original effect is a statistical artifact of regression to the mean. The more modest, accurate claim is that self-assessment of competence in unfamiliar domains is unreliable. This still has practical implications but is less dramatic than the popular framing.