In 1974, the Soviet Union set production quotas for nails measured in tons.
Factories responded rationally. They produced a small number of enormous, completely useless nails. Problem solved. Quota met.
Moscow reconsidered and switched to quotas measured in number of nails.
Factories responded rationally. They produced millions of tiny, flimsy nails—too small to hold anything together.
The Soviet nail industry was, by any measurable standard, thriving. It was also producing almost nothing of value. [1] [2]
This is Goodhart’s Law in its purest form. The economist Charles Goodhart first articulated it in 1975 in the context of monetary policy, but the observation has turned out to be one of the most universally applicable principles in organizational behavior: when a measure becomes a target, it ceases to be a good measure.
And if you’re running a data strategy today—whether you’re optimizing support teams, tracking product engagement, or measuring engineering productivity—there’s a good chance you’re manufacturing useless nails at scale.
The Mechanism Is Embarrassingly Simple #
Here’s the thing: Goodhart’s Law isn’t about people being dishonest or malicious. It’s about people being entirely predictable.
When you attach consequences to a number—bonuses, promotions, public dashboards, quarterly reviews—you are communicating a priority. Not what you say matters. What actually matters. And people are extraordinarily good at optimizing for what actually matters.
The problem is that every metric is a proxy. It’s a pointer to some underlying behavior or outcome you care about. Support ticket closure rate points to customer satisfaction. Active users points to product value. Features shipped points to engineering productivity. But the proxy is not the thing. And when you optimize hard enough for the proxy, you often destroy the underlying thing you actually cared about.
Marilyn Strathern, who turned Goodhart’s observation into something approaching a universal law, put it this way in 1997: “When a measure becomes a target, it ceases to be a good measure.” [3] Simple. Devastating.
Let me show you what this looks like when it plays out in practice, because I’ve seen each of these patterns in organizations across industries.
The Graveyard of Goodharts #
Support tickets closed. You set it as the metric because you want fast, effective support. What you get is agents closing tickets the moment they can plausibly argue the issue was resolved. Tickets get closed, reopened, closed again. Your time-to-close looks excellent. Your customers are having increasingly terrible experiences. And your ticket volume is growing, not shrinking—because nothing is actually getting fixed.
Active users. Product teams call this one the original sin of growth metrics. You need to show investors that people are engaging with your product. So “active” gets defined as “logged in during the past 30 days.” Teams run login reminder campaigns. They add friction to logging out. They build streaks and notifications. The active-user curve goes up and to the right. The question nobody asks: are these users getting value? Turns out, often not. They logged in. Then they left.
Velocity in software development. Story points per sprint becomes the target. Teams respond by inflating estimates. Complex work gets broken into small, easy-to-complete tasks. The velocity numbers look healthy. Technical debt quietly compounds. Then, one day, the team grinds to a halt because the codebase has become a nightmare to work in—and nobody saw it coming because the velocity metric never captured it. [4] [5]
Net Promoter Score. You ask customers how likely they are to recommend your product. But you ask them immediately after a support interaction that went well, or right after a successful onboarding call, or at the precise moment they’ve just received a discount. You survey selectively. You train support agents on how to “set up” the survey interaction. Your NPS climbs. Your actual promoter rate stays flat or drops. The number was measuring something real. What it’s measuring now is something else entirely. [6] [7]
Hospital waiting times. This one is well-documented and uncomfortable. After the UK NHS introduced a four-hour emergency department target, some hospitals began a practice known as “corridor care”—patients admitted to the hospital but kept in corridors rather than beds, because the clock officially stopped when they were “admitted.” The target was met. Patients were sometimes worse off. This is not a story about corrupt administrators. It’s a story about rational people responding to the incentive structure they were given. [8] [9]
Notice the pattern? The moment you put real consequences behind a metric, you’ve launched a competition between measuring what matters and making the number move. And making the number move is almost always easier.
Why “Better Metrics” Is the Wrong Answer #
The instinct, when you realize your metrics are being gamed, is to find better metrics. More granular ones. Composite scores. Harder-to-fake measurements.
This is understandable. It’s also largely futile.
Every metric is gameable given sufficient incentive and time. Not because people are bad, but because they’re adaptive and intelligent. If you put enough pressure on any number, someone will find a creative way to move it that has nothing to do with the underlying outcome you care about.
The arms race between metric designers and metric optimizers is one that the metric designers reliably lose. You design a composite score. Teams learn which sub-components are easiest to move and focus there. You add weight to the harder sub-components. Teams find new shortcuts. You add still more components. Eventually you have a metric so complex that nobody understands what it’s measuring anymore—which creates its own problems. [10]
This is the trap that most data strategy conversations fall into. “Our customer satisfaction metric isn’t working, let’s replace it with Customer Effort Score.” “Our velocity metric is being gamed, let’s switch to cycle time.” Maybe. But if you’re replacing metrics without changing the incentive structure and without deeply interrogating the behavior you actually want to drive, you’re just buying time before the new metric gets gamed too.
The problem isn’t the metric. It’s the architecture of your measurement system.
The Question That Changes Everything #
I want to suggest a different starting point.
Before you deploy any metric as a target—before you put it on a dashboard, attach it to a bonus, announce it in a quarterly business review—ask one question:
“If we optimize for this, what breaks?”
Not “what could theoretically go wrong.” Not a box-checking exercise. A serious, uncomfortable, cross-functional conversation about what rational people will do when this number becomes the thing they’re measured on. Bring in the people who will be measured. They will tell you, often immediately, exactly what the shortcuts are. They know. They’re just waiting to be asked.
This question is a close cousin of the pre-mortem technique from Gary Klein’s research on decision-making: imagine your project has already failed spectacularly, and then work backwards to figure out how. It forces you to think adversarially about your own system before someone else does. [11] [12]
If you can’t answer “what breaks,” you’re not ready to deploy the metric. You’re about to learn the hard way.
What Good Measurement Architecture Actually Looks Like #
The goal is to build measurement systems that are resistant to gaming—not because the metrics are too clever to game, but because the system is designed to make gaming obvious and counterproductive.
A few principles that actually help:
Measure at multiple levels of abstraction. Track leading indicators and lagging indicators. Track both the proxy metric and the underlying outcome. If your support ticket closure rate is improving but your customer satisfaction score is flat or declining, you have a signal that something is wrong. A single metric creates a single point of failure. A cluster of related metrics creates triangulation.
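To make the triangulation idea concrete, here is a minimal sketch of a divergence check between a proxy metric and the outcome it stands for. The metric names, deltas, and the 10% threshold are illustrative assumptions, not from any particular system:

```python
from dataclasses import dataclass

@dataclass
class MetricPair:
    """A proxy metric paired with the underlying outcome it stands for."""
    name: str
    proxy_delta: float    # period-over-period change in the proxy, e.g. +0.25 = +25%
    outcome_delta: float  # change in the underlying outcome over the same period

def divergence_flags(pairs, threshold=0.10):
    """Flag pairs where the proxy improves while the outcome stagnates or declines.

    A proxy moving up while its outcome is flat or falling is the classic
    Goodhart signature: someone is optimizing the number, not the thing.
    """
    return [
        p.name
        for p in pairs
        if p.proxy_delta >= threshold and p.outcome_delta <= 0
    ]

pairs = [
    # Ticket closures up 25%, customer satisfaction down 5% -> suspicious.
    MetricPair("support", proxy_delta=0.25, outcome_delta=-0.05),
    # Active users up 15%, measured value delivery up 12% -> consistent.
    MetricPair("growth", proxy_delta=0.15, outcome_delta=0.12),
]
print(divergence_flags(pairs))  # → ['support']
```

The point is not the code; it is that pairing each proxy with its outcome makes the Goodhart divergence a query you can run, rather than a suspicion someone has to raise in a meeting.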
Separate measurement from incentives. Not everything should have a bonus attached to it. Some metrics should be observational—used to understand what’s happening, not to drive behavior directly. The moment you attach consequences, you change the system. Use that power deliberately and sparingly.
Talk to the people being measured. This sounds obvious. It almost never happens. Before you lock in a metric framework, sit down with the teams who will live under it and ask them directly: how would you game this? What would rational behavior look like if this were the number that determined your performance review? You will learn things that no amount of metric design cleverness would have surfaced.
Build in qualitative checkpoints. Numbers don’t catch everything. Build explicit mechanisms for qualitative review alongside quantitative measurement. If the numbers say everything is fine but the people doing the work say something is wrong, trust the people.
Make the underlying outcome explicit. Write it down. Not “increase active users” but “ensure that users who log in are getting measurable value from the product, as evidenced by X.” The gap between the proxy metric and the underlying outcome you care about should be visible, explicit, and regularly interrogated.
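One lightweight way to keep that gap visible is to record the proxy, the underlying outcome, and the known gaming vectors side by side in the metric definition itself. A sketch, assuming a simple in-house schema (the field names and example values are my own, not a standard):

```python
from dataclasses import dataclass, field

@dataclass
class MetricDefinition:
    proxy: str                 # the number on the dashboard
    underlying_outcome: str    # the thing the proxy is supposed to stand for
    evidence: str              # how we'd know the outcome is actually improving
    known_gaming_vectors: list[str] = field(default_factory=list)

active_users = MetricDefinition(
    proxy="30-day logged-in users",
    underlying_outcome="users getting measurable value from the product",
    evidence="repeat, unprompted use of core features",
    known_gaming_vectors=[
        "login reminder campaigns",
        "friction added to logging out",
        "streaks and notifications that drive logins, not value",
    ],
)

# The definition forces the question: if the proxy moves but the
# evidence doesn't, which one are we actually optimizing?
print(active_users.proxy)  # → 30-day logged-in users
```

Writing the gaming vectors down at definition time is the “what breaks?” conversation, captured where the next person to pick up the metric will actually see it.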
The Data Strategy Implication #
Here’s where this connects back to something I’ve been writing about for a while now: data initiatives tend to fail not because they lack data, and not because they lack dashboards, but because they lack clear thinking about what behavior they actually want to drive.
Most data strategies I’ve encountered are metric-first. Organizations decide what they want to measure, build the infrastructure to measure it, publish dashboards, and then declare themselves data-driven. The question of what behavior those metrics are supposed to incentivize—and what they might inadvertently incentivize instead—gets skipped entirely.
Goodhart’s Law is one of the oldest and most reliable critiques of this approach. And it’s one that the data community has largely managed to avoid confronting directly, even as the evidence accumulates.
The fix is not more data. It’s not more sophisticated metrics. It’s clearer thinking about what you actually want people to do, followed by honest interrogation of whether your measurement system is driving that behavior or some functional-looking approximation of it.
You probably have some version of this problem in your organization right now. A metric that everyone reports on, that looks fine on the dashboard, where something about the underlying reality doesn’t match the number.
Find it. Ask the people living under it what they’d do if their job depended on moving it. Listen carefully to the answer.
Because I promise you: they already know.
Join the Conversation #
Have you seen Goodhart’s Law in action in your organization? I’d be especially curious about cases where the gaming was subtle enough that it took a long time to surface. Find me on LinkedIn or Bluesky.
References #
Soviet quota gaming
[1] Alienation and the Soviet Economy - P.C. Roberts
[2] The Soviet Economic System - Alec Nove
Goodhart’s Law and Strathern’s formulation
[3] "‘Improving ratings’: audit in the British University system" - Marilyn Strathern
[10] Categorizing Variants of Goodhart’s Law - David Manheim & Scott Garrabrant
Velocity and agile metrics
[4] Accelerate: The Science of Lean Software and DevOps - Nicole Forsgren, Jez Humble & Gene Kim
NPS and its limitations
[6] The One Number You Need to Grow - Fred Reichheld, Harvard Business Review
[7] A Longitudinal Examination of Net Promoter and Firm Revenue Growth - Timothy Keiningham et al., Journal of Marketing
NHS four-hour waiting time targets
[9] Emergency admissions to hospital: managing the demand - UK National Audit Office
Pre-mortems and adversarial thinking
[11] Performing a Project Premortem - Gary Klein, Harvard Business Review
[12] Thinking, Fast and Slow - Daniel Kahneman
Photo by Anne Kruse: https://www.pexels.com/photo/3-flat-head-nails-close-up-photography-190101/