Metrics Can Kill Innovation
The Hidden Cost of Measuring Everything the Same Way
One Takeaway: It’s easy for organizations to overinvest in measurable activities while underinvesting in valuable uncertainty. Understanding why this happens explains how “what gets measured gets managed” so often becomes “what can’t be measured gets eliminated.” That dynamic kills the work organizations need for long-term success.
In Growth Isn’t One Sided, we saw that creators need different measurement approaches than operators and refiners. In What Headquarters Can’t See we noted how the knowledge needed for good decisions often resists centralization. But there’s a related problem: that same knowledge often resists measurement. And when something can’t be measured, it becomes invisible to management systems. This means it often gets pushed aside.
This isn’t a problem of bad metrics or poor implementation. Measurement systems create biases no matter how well-designed the metrics are.
Two Teams, Same Metrics, Different Outcomes
A fast-growing fintech company launched two new initiatives with two different teams. Both teams reported to the same executive. Both were measured using the same framework that had made the company data-driven and successful.
The Payments Team. Mission: reduce payment processing costs. Key Metrics: monthly cost per transaction, processing success rate, quarterly cost savings. All clear, quantifiable, and attributable to their work.
The team performed well. A/B tests on processing algorithms showed a 3% cost reduction. Other changes produced a statistically significant 0.2% improvement in success rates. Infrastructure changes saved $400K per quarter. Every experiment had clear success metrics and rapid feedback.
After 18 months: $4M in measured, attributed cost savings. The team expanded. The manager got promoted. Leadership held the team up as a model of how innovation should work.
The New Market Team. Mission: identify new market opportunities for financial services. Key Metrics: customer acquisition cost, market penetration, quarterly revenue from new initiatives.
Quarter 1: Explored financial services for the healthcare market. No revenue. High customer acquisition costs from experimental pricing. Zero market penetration. Metrics: all red.
Quarter 2: Pivoted to SMB lending based on partnership feedback. Minimal revenue from a break-even pilot. Acquisition costs looked terrible. Market penetration unmeasurable because the market itself was still being defined. Metrics: still red.
Quarter 3: Discovered an opportunity in contractor payroll. Partnership conversations were promising, but no contracts were signed. No revenue to report.
Quarter 4: Team disbanded. Resources reallocated to “proven” optimization work like the Payments Team.
One year later: a competitor launched a contractor payroll service that became a $500M revenue line. The opportunity the New Market Team had identified in Quarter 3 was real. The measurement system killed it before value could materialize.
What happened? The Payments Team was doing (important) refiner work. They were improving existing systems where outcomes are measurable and attribution is clear. The metrics captured their value perfectly.
The New Market Team was doing creator work. They were exploring uncertainty, where outcomes take time and attribution can be ambiguous. The same types of metrics made their valuable work look like failure.
The metrics weren’t bad. Revenue, acquisition cost, and penetration are perfectly reasonable things to track. The problem is more fundamental. Valuable exploration generates unmeasurable or negative metrics in the short term. Measurable work tends to be optimization of things you already understand. As a result, traditional measurement systems can’t distinguish between “failing” and “learning.”
Why Measurement Systems Break Down
Economist Charles Goodhart identified a problem that affects all measurement systems, now known as Goodhart’s Law: when a measure becomes a target, it ceases to be a good measure.
The management pattern is predictable. You identify a metric that correlates with something valuable. You set it as a target and reward people for improving it. Eventually, people find ways to improve the metric that don’t improve the underlying value. The metric stops measuring what it was supposed to measure.
This isn’t about bad people gaming systems. It’s about rational behavior under constraints. When a customer satisfaction target is set at 4.5 out of 5, support teams learn to survey only happy customers. They resolve tickets quickly without solving problems. They can even coach customers on how to respond. The score goes up. Actual satisfaction, as measured by retention and referrals, goes down. This is often called “metric hacking.”
When you measure engineers on lines of code written, they write duplicative code. When you measure sales teams on quarterly revenue, they may close deals with unsustainable discounts. When you measure teams on number of experiments, trivial changes get labeled “experiments.” In each case, the metric improves while the thing you intended to measure gets worse.
Psychologist Donald Campbell identified a related effect, now known as Campbell’s Law: the more important a metric becomes for decisions, the faster it corrupts. High-stakes metrics create strong incentives for manipulation. The definition gets negotiated. The measurement gets gamed. The metric becomes meaningless while appearing objective. This is why measurement systems can degrade over time. The very act of using them for high-stakes decisions creates pressure to corrupt them.
Management scholar Jerry Muller argues in his book, The Tyranny of Metrics, that this reveals a fundamental error. Too often, leaders come to believe that measurement replaces judgment. In reality, measurement demands more judgment, not less. Judgment about what to measure. Judgment about how to interpret what you find. Judgment about when the numbers are being gamed. Organizations that eliminate judgment in favor of pure measurement make systematically worse decisions, because they remove the interpretation layer that makes measurements meaningful.
These effects combine into what you might call the “Weight of the Measurable.” Some activities produce clear, quantifiable metrics quickly (like operator and refiner work). Other activities produce ambiguous, delayed, or unmeasurable outcomes (like creator work). Budget and headcount flow toward measurable activities as if pulled by gravity. They’re easier to evaluate and justify. Unmeasurable but valuable activities get starved as a consequence.
This happens because measurable work has lower transaction and tracking costs. It’s easier to evaluate performance objectively. Feedback is faster. Attribution is clearer. There’s less political negotiation over resources. Managers favor it because it’s easier to justify even if it’s not as valuable.
Innovation is most often unmeasurable in the short term. Exploration doesn’t produce revenue yet. Early experiments often fail. Attribution can be ambiguous. Outcomes are delayed by quarters or years, not months.
Optimization is measurable. Improvements produce clear metrics. Tests show results quickly. Attribution is direct. Outcomes appear this month.
The result: measurement-driven organizations underinvest in innovation while overinvesting in optimization. Not because managers don’t value innovation. Because measurement systems make optimization visible and innovation invisible.
Obsessed with the Observable
Our world has developed what I can only describe as “a fetish for more data.” Data is helpful, but it is not all-knowing. You cannot allow your company to operate under the delusion that numbers remove the need for interpretation and judgment. You cannot become obsessed with the observable.
This is a version of what economist Friedrich Hayek called “scientism”: the belief that methods which work well in engineering work equally well for understanding human systems. In What Headquarters Can’t See we explored how local, tacit knowledge resists centralization. The measurement version of the same problem is that local, tacit knowledge also resists quantification. The customer success manager’s sense that an account is at risk, the operator’s judgment about a failing process, the creator’s instinct about an emerging market don’t become more real when you put a number on them. They simply become more convincing. Often the attempt to quantify them strips out the context that made them valuable in the first place.
Valuable knowledge that leads to action often resists centralization as well as quantification. If you make all your decisions based on dashboards, but some of the most valuable information can never be contained in a dashboard, then you’re missing the most vital knowledge for your business’s success.
The “Stat Sig” Trap
Economists Deirdre McCloskey and Stephen Ziliak argue in The Cult of Statistical Significance that confusing statistical significance with economic (or business) significance is one of the most expensive errors in modern decision-making. The same confusion plays out inside organizations every day.
Statistical significance (“stat sig”) means the observed difference is unlikely to be due to random chance. Business significance means the difference matters enough to change your decision. These are completely different questions. Organizations often treat them as identical.
An A/B test with a million users finds that changing a button color increases conversion by 0.02%. Stat sig at p < 0.001. Business significance: the improvement costs more to implement than it generates. Statistical standards say ship it. Business judgment says ignore it.
A pilot program in three cities shows a 25% increase in customer lifetime value. Not stat sig at p = 0.15, likely because the sample is small. Business significance: if real, a 25% LTV increase is transformational. Statistical standards say kill it. Business judgment says expand the test.
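To make the distinction concrete, here is a minimal sketch of the two scenarios in code. Every number in it (the traffic volume, margin per conversion, implementation cost, and pilot size) is an assumption invented for illustration, not data from the examples above.

```python
# Statistical vs. business significance: two hypothetical scenarios.
# All numbers below are illustrative assumptions.

def annual_value(users_per_year, conversion_lift, value_per_conversion):
    """Expected incremental revenue from a conversion-rate lift."""
    return users_per_year * conversion_lift * value_per_conversion

# Scenario 1: huge sample, tiny effect. Statistically significant
# (p < 0.001), but the payoff may not cover the cost of shipping it.
button_value = annual_value(
    users_per_year=12_000_000,   # assumed traffic
    conversion_lift=0.0002,      # the 0.02% lift from the A/B test
    value_per_conversion=8.00,   # assumed margin per conversion
)
implementation_cost = 40_000     # assumed engineering + maintenance cost
print(f"Button test: ${button_value:,.0f}/yr vs ${implementation_cost:,} to ship")
# Roughly $19,200/yr against $40,000: significant statistically, not commercially.

# Scenario 2: tiny sample, huge effect. p = 0.15 fails the conventional
# 0.05 bar, but if the 25% LTV lift is real, the upside dwarfs the test cost.
customers, baseline_ltv, lift = 3_000, 400.00, 0.25
pilot_upside = customers * baseline_ltv * lift
print(f"Pilot: ${pilot_upside:,.0f} upside in three cities alone if the lift is real")
# The decision-relevant question is expected value, not the p-value.
```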
Organizations trained to worship stat sig findings can kill valuable innovations while focusing on trivial optimizations. The most valuable innovations often start with weak signals in small samples: conversations with a handful of customers, experiments with a limited set of users. They begin as outliers rather than patterns, exactly the conditions where statistically significant results are hardest to find. Mandating statistical significance for all decisions eliminates the exploration needed for innovation.
Type 1 vs Type 2 Errors
This connects to a deeper bias that deserves its own attention. Organizations tend to be better at avoiding visible mistakes than avoiding invisible ones. In statistical terms, there are two kinds of errors. A false positive (type 1 error): funding something that fails. A false negative (type 2 error): killing something that would have succeeded.
These errors are not treated equally. False positives are visible. Everyone knows you funded that failed project. There’s a name attached. There’s a post-mortem. False negatives are invisible. Nobody knows the initiative you killed would have been a $500M revenue line. There’s no post-mortem for the road not taken.
Innovation requires accepting some false positives to avoid false negatives. You have to fund experiments that don’t pan out in order to avoid killing the experiments that would have transformed the business. But measurement-driven organizations set high bars for funding. They’re quick to kill a project for underperformance. They require strong evidence before scaling. Every one of these rules optimizes against visible failure. None of them protects against invisible missed opportunity.
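A small simulation makes the asymmetry concrete. The funding bar, the base rate of genuine winners, and the noise level below are all assumptions chosen for illustration; the only point is that raising the evidence bar trades invisible misses for visible failures.

```python
import random

random.seed(7)

# Hypothetical portfolio: 5% of projects are genuine winners. Each project
# shows a noisy "early signal"; funding requires the signal to clear a bar.
# All parameters are illustrative assumptions.
def simulate(funding_bar, n=10_000, winner_rate=0.05, noise=1.0):
    false_pos = false_neg = 0
    for _ in range(n):
        is_winner = random.random() < winner_rate
        true_quality = 1.0 if is_winner else 0.0
        signal = true_quality + random.gauss(0, noise)  # noisy early evidence
        funded = signal > funding_bar
        if funded and not is_winner:
            false_pos += 1   # visible: everyone sees the failed project
        if not funded and is_winner:
            false_neg += 1   # invisible: nobody sees the missed opportunity
    return false_pos, false_neg

for bar in (0.0, 1.0, 2.0):
    fp, fn = simulate(bar)
    print(f"bar={bar:.1f}  visible failures={fp:5d}  invisible misses={fn:4d}")
# Raising the bar shrinks the visible column and grows the invisible one.
```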
The contractor payroll team from our opening example was a false negative. The measurement system couldn’t distinguish between “this isn’t working” and “this hasn’t worked yet.” So it killed a real opportunity to avoid a visible failure.
Different Functions Need Different Measurement
Management scholar Steven Kerr identified the core dysfunction in “On the Folly of Rewarding A, While Hoping for B.” He found that organizations hope for long-term growth, innovation, and strategic positioning. But they reward quarterly results, measurable efficiency, and short-term wins. The unmeasurable things they hope for get neglected while the measurable things they can track get optimized.
This plays out differently across the three functions.
Measuring operator work. Operator work can be measured, but the wrong metrics destroy its value. Measuring pure efficiency (such as volume divided by time) drives out quality and judgment. Measuring short-term costs can drive out reliability investments. Measuring individual output can drive out collaborative problem-solving.
Better operator metrics focus on system reliability, customer outcomes, and problem resolution rather than speed. They use longer time horizons such as quarterly and annual rather than daily and weekly. They measure team performance rather than just individual output. The goal is enough measurement for accountability without so much that it drives out the context that makes operator work valuable.
Measuring refiner work. Refiner work is partially measurable, but measuring only immediate efficiency gains drives out capability building. Measuring only cost reduction drives out quality improvements. Measuring only successful experiments drives out the necessary failures that generate learning.
Better refiner metrics focus on rate of improvement, knowledge creation, and process capability over time. Better metrics take a portfolio view. They measure suites of improvements rather than individual projects. They value learning even from experiments that didn’t produce the expected result. The goal is measuring improvement and learning while avoiding the pressure for immediate gains that stifles experimentation.
Measuring creator work. Creator work resists measurement almost entirely in the short term. Measuring short-term revenue stops exploration before it can generate revenue. Measuring the success rate of experiments drives out necessary risk-taking. Requiring statistical significance eliminates small-sample learning.
Better creator metrics focus on rate of experimentation, quality of learning, and the value of new options that get created. What future opportunities did this enable that didn’t exist before? These better metrics look at the entire portfolio rather than individual experiments. They use long time horizons. They explicitly accept that most individual experiments will fail. But the portfolio can succeed even when most of its components don’t.
Consider what this means in practice. If a creator team runs ten experiments in a year, and eight fail, one produces modest results, and one opens a new market worth $50M, that’s an extraordinarily successful year. But a measurement system that evaluates experiments individually reports an 80% failure rate. The team looks terrible on paper while creating enormous value. Portfolio evaluation sees the $50M opportunity. Individual measurement sees eight failures.
The critical point: for creator work, measuring the success or failure of individual experiments can be actively harmful. The right question isn’t “did this experiment work?” It’s “is our portfolio of experiments generating knowledge and creating options worth more than the portfolio costs?”
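Here is a sketch of that portfolio arithmetic, using the hypothetical ten-experiment year above. The per-experiment cost is an assumption added for illustration.

```python
# Portfolio vs. individual evaluation for the hypothetical creator year above.
# Experiment costs are illustrative assumptions; outcomes mirror the prose:
# eight failures, one modest result, one $50M market opening.
experiments = [
    {"name": f"experiment-{i}", "cost": 250_000, "option_value": 0}
    for i in range(1, 9)                       # the eight "failures"
] + [
    {"name": "experiment-9",  "cost": 250_000, "option_value": 1_000_000},
    {"name": "experiment-10", "cost": 250_000, "option_value": 50_000_000},
]

# Individual evaluation: each experiment judged on its own success/failure.
failure_rate = sum(e["option_value"] == 0 for e in experiments) / len(experiments)
print(f"Individual view: {failure_rate:.0%} failure rate")   # looks terrible

# Portfolio evaluation: did the options created outweigh the total cost?
total_cost = sum(e["cost"] for e in experiments)
total_value = sum(e["option_value"] for e in experiments)
print(f"Portfolio view: ${total_value:,} created for ${total_cost:,} spent")
# Same ten experiments, opposite conclusions.
```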
Harmonizer Thinking as Measurement Translation
To review: in The Cost of Working Together we described harmonizer thinking as building new rules and systems inside the organization. In What Headquarters Can’t See we described it as knowledge brokering. It translates between local and central understanding. Now we can see a third dimension of its value. Harmonizer thinking protects valuable work from measurement systems that would kill it.
This isn’t a separate function. It’s the same way of thinking applied to the measurement problem. The person who designs shared metrics across functions also needs to ensure those metrics don’t destroy creator work. The person who translates local knowledge for central decision-makers also needs to advocate for that knowledge when the dashboard tells a different story.
In practice, this means several things.
Arguing for different metrics for different work. When leadership wants a uniform scorecard across all teams, harmonizer thinking makes the case that applying revenue targets to an exploration team is like grading a research lab on manufacturing output. It doesn’t argue against measurement. It argues for measurement that matches the work: reliability metrics for operators, learning metrics for refiners, portfolio metrics for creators.
Translating unmeasurable value into language leadership can act on. One team may have spent three quarters “failing” by every metric on the dashboard. But they may have built relationships, identified a market, and developed knowledge that no competitor has. That value is real but invisible to the measurement system. Harmonizer thinking makes it visible. It doesn’t invent metrics to justify the work. It connects the dots on what the team has learned and what options that learning creates.
Protecting experimentation from premature judgment. Measurement systems want to evaluate quickly. Innovation needs time to develop. Harmonizer thinking creates space between these two pressures. It advocates for longer evaluation windows. It builds portfolio-level assessments. It helps create patience to distinguish between “this isn’t working” and “this hasn’t worked yet.”
This is perhaps the most immediately valuable thing harmonizer thinking does. Building systems and brokering knowledge are important, but they address chronic problems. Protecting valuable work from measurement bias addresses an acute one. Somewhere in your organization right now, a team or an individual contributor is doing work that could transform the business. Your measurement system could be telling you they’re failing. Without someone who can see the difference, you’ll make the rational decision to cut them. And when you do, you may never know what you lost.
Measurement and the Knowledge Problem
Valuable knowledge is often local, tacit, and context-specific. Now we can see how the measurement problem compounds this.
The knowledge that matters most for good decisions often resists measurement. The customer success manager’s sense that an account is at risk. The operator’s judgment that a process is about to fail. The creator’s insight about an emerging opportunity. These are exactly the kinds of knowledge that drive good decisions. They’re also exactly the kinds that measurement systems can’t capture.
Organizations build decision-making systems around measurable, systematic knowledge while ignoring valuable local knowledge. Aggregated customer data is measurable, so it gets favored. A customer success manager’s tacit sense that something is wrong is unmeasurable, so it gets ignored, even when that sense is more accurate than the dashboard.
This is why uniform measurement across all functions can be damaging. Management wants “objective” metrics for everyone. But operator value often lives in reliability and judgment that resist quantification. Refiner value lives in systematic improvement that’s partially measurable but delayed. Creator value lives in option creation that’s largely unmeasurable. A uniform system measures what it can. But that means it measures most refiner and operator work well, and creator work not at all.
Looking Ahead: The Rules Underneath the Metrics
Understanding measurement economics explains why metrics can kill innovation. But measurement is only one part of a larger system that shapes behavior inside your organization.
Metrics are rules. But so are decision rights, budget processes, career paths, and evaluation frameworks. All these shape what people do independently of what leadership says it values. When any of these rules make cooperation a sacrifice rather than a rational choice, no amount of measurement redesign will fix the coordination failure.
Next, we’ll explore why persistent cross-functional problems aren’t motivation problems; they’re design problems. We’ll see why economist James Buchanan’s insights about institutional rules apply directly inside organizations. Why hoping for good employees is not a sustainable strategy. And why designing systems where cooperation is individually rational produces better outcomes than any speech from leadership ever could.
The Bottom Line
Measurement economics explains a systematic problem that affects every growing organization. What gets measured gets managed. What can’t be measured gets eliminated. Even when the unmeasurable may be more valuable than the measurable.
Goodhart’s Law means metrics stop being good measures when they become targets. Campbell’s Law means the more important a metric, the faster it corrupts. The obsession with the observable means the weight of easily quantified work crowds out valuable work that resists quantification. And the fear of visible failure exceeds the fear of invisible missed opportunity.
These biases combine to kill innovation in your business.
The answer isn’t eliminating measurement. It’s matching measurement to work type. Operators need reliability and quality metrics, not just efficiency. Refiners need learning and improvement metrics, not just cost reduction. Creators need exploration and option value metrics, not revenue targets. And for uncertain work, measure portfolios rather than individual experiments. Focus on judging learning value rather than success rates.
The competitive advantage goes to organizations that can measure what matters without measuring everything. That can hold people accountable without killing valuable uncertainty. That can use data to guide decisions without worshipping statistical significance at the expense of business judgment.

