Proxy Metrics: The Danger of Overfitting

What’s a proxy metric

When we can’t measure the thing we really care about, we measure something adjacent to it instead. For example, success is hard to quantify, so we tend to talk about wealth, which is very easy to measure. Quality in almost every domain is nearly impossible to measure directly, so we opt to measure quantity instead. We’d rather fly on a plane that had 100 safety checks beforehand than on one that had 10, even though 99 of those checks could be on tray-table hinges.

These are proxy metrics: measurements of something that approximates the thing we actually care about.

If the thing we care about is a complex curve, the proxy metric is like a tangent to that curve. It follows the curve for a while, perhaps for a long while, but at some point the two separate. So the measure is useful, but it gives us neither the full picture nor a perfectly reliable guide.

What’s overfitting

When we rely on proxy metrics as a north star, we risk overfitting. Overfitting is a term most commonly used in machine learning, where a model is trained on a dataset. The model can learn to respond perfectly to that dataset, but the dataset is only a small sample of the real world. The patterns in it are unlikely to hold for all of reality, and when the model is presented with a new case, it responds in a peculiar way.
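To make the machine-learning sense concrete, here is a toy sketch in Python. Everything in it is a hypothetical illustration rather than anything from a real system: the underlying rule is assumed to be y = x, the five training points carry made-up noise, and the names `overfit_model` and `simple_model` exist only for this example. The over-flexible model is a Lagrange interpolating polynomial, chosen because it passes through every training point exactly.

```python
# Hypothetical toy data: the assumed "real world" rule is y = x, but each
# training sample carries a little made-up measurement noise.
train_x = [0.0, 1.0, 2.0, 3.0, 4.0]
train_y = [0.0, 1.5, 1.5, 3.5, 4.0]


def overfit_model(x):
    """Lagrange polynomial through every training point: it reproduces
    the training data exactly, noise included."""
    total = 0.0
    for i, (xi, yi) in enumerate(zip(train_x, train_y)):
        term = yi
        for j, xj in enumerate(train_x):
            if j != i:
                term *= (x - xj) / (xi - xj)
        total += term
    return total


def simple_model(x):
    """Least-squares straight line: it misses some training points but
    captures the general trend."""
    n = len(train_x)
    mean_x = sum(train_x) / n
    mean_y = sum(train_y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(train_x, train_y))
    sxx = sum((a - mean_x) ** 2 for a in train_x)
    slope = sxy / sxx
    return mean_y + slope * (x - mean_x)


def training_error(model):
    """Largest absolute miss on the data the model was built from."""
    return max(abs(model(x) - y) for x, y in zip(train_x, train_y))


if __name__ == "__main__":
    print("training error, overfit:", training_error(overfit_model))
    print("training error, simple: ", training_error(simple_model))
    # A new case outside the training data; under the assumed rule the answer is 8.
    print("overfit model at x=8:", overfit_model(8.0))
    print("simple model at x=8: ", simple_model(8.0))
```

Run as written, the polynomial reproduces its training data with essentially zero error, while the straight line is off by up to 0.6. Asked about the new case x = 8, the straight line gives about 8.1 and the polynomial gives about -324. Its perfect fit to the data it was trained on is precisely what makes it fail on the new case.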

The same thing happens to humans. We are pattern-matching machines: we learn from a narrow set of experiences, then try to apply the same lessons to new experiences, sometimes with bizarre results.

Give a child a sweet each time they cough, and they’ll cough all day long. They’ll cough around every adult, expecting the reward. They’re fitting the training data you’ve given them, through operant conditioning.

Think only children fall prey to this? I’ve seen people praised by their boss for doing something once go on to repeat that task again and again, to ridiculous excess. We make bold predictions from tiny datasets. That’s overfitting.

Another stark example comes from FBI training, in which students practise disarming an assailant. In the drill, the student takes the gun and then hands it back to the instructor so the exercise can be repeated. There have been multiple instances during real events where FBI agents have been recorded disarming an actual criminal and then dutifully handing the gun straight back. That’s overfitting with potentially horrendous consequences.

Overfitting bends away from intention

As with the coughing child and the praised employee, every proxy metric is a tangent that touches the curve we really care about, sometimes overlapping it for a long stretch, sometimes for only a moment. The danger comes when we follow the tangent a long way from the curve.

If regulation says airlines should carry out more safety checks, they’ll find the easiest way to add more checks. They’ll check belt buckles and overhead storage, and nothing gets safer. The number of checks is a very poor proxy for their quality.

At some point, measuring quantity actually starts to pervert the intention. If the staff are doing 1,000 belt-buckle checks, they have less time and attention for the vital engine and wing checks. Now our proxy metric is actively detracting from the real aim: we’re getting less safe because of that metric.

If this sounds far-fetched, consider money. Would you define meaning and success in your career, or in your life, purely in terms of money? Probably not. Yet how often do you make choices that give up money in order to maximise fulfilment, meaning, or success as you actually define it? This can be a hard decision, because money is a tangible, measurable, demonstrable, external signal of success: a very powerful proxy metric. Yet choosing the money can pull you further away from the things you say you care about more.

The tangent pulls away from the curve. You are forced to choose between pursuing what you intended and pursuing the proxy metric. Most of us make this decision continually, every day.

Metrics are incentives

We measure something because we want to optimise for it, and once it is measured, it becomes a driving force. Numbers lend themselves very neatly to objective goals.

Metrics are powerful, so be careful which ones you choose.

By all means, track money or quantity or whatever, but when there’s a decision to be made, will you be prepared to give up the number for the meaning? When the tangent splits from the curve, you’ll have a choice to make.
