Codecraft

software = science + art + people

Metrics, Plumb Lines, and System Thinking

2012-11-12

Friday morning I was at a seminar taught by Jason Taylor, CTO at Allegiance. We were discussing how dev team velocity and product quality can compete for our attention; sometimes we trade one for the other. Jason mentioned that he’s a fan of competing metrics, and some neurons connected in my brain.

Plumb line suspended from the center point of multiple balancing legs. Photo credit: suttonhoo (Flickr)

I’m a big believer in measurement. As the old adage goes, you can’t improve what you don’t measure. Next time someone urges you to change a behavior, or tells you she’s going to, ask what measurement of change is being proposed. If you get an unsatisfying answer, I predict you’ll also get an unsatisfying outcome.

I’m also a big believer in balance, as I’ve written about before. Good software balances many considerations.

Besides these existing predispositions, I’d recently read a blog post by Seth Godin cautioning that we need to choose what we measure wisely. And I’ve been digesting The Fifth Discipline, by Peter Senge, which advocates holistic, systemic thinking, where we recognize interrelationships that go well beyond simplistic, direct cause-and-effect.

All of these mental ingredients crystallized when Jason made his comment about competing metrics.

I realized that when we have a system that’s out of balance and we pull a lever to correct it, measuring progress with a single metric sets us up for skew, overcorrection, or puzzlement. I think I’ve made this mistake several times in my career. If we want to shift balance efficiently and avoid pendulum behavior, we have to measure each factor that contributes to the overall system dynamic.

If you can directly measure balanced alignment to an ideal (i.e., you have a single metric that takes all competing factors into account), then you have the best of all worlds. I believe this is why net promoter score is so powerful. These types of measurements are least susceptible to the false proxy trap Godin warns about; we can’t “game” the system. But in many cases, the best we can do is measure multiple contributing factors.

If your organization is stuck in a binary tradeoff between quality and velocity, then a simple competing metric pair will suffice. Measure how quickly your team adds features (e.g., with story points), and measure how much your quality suffers (e.g., with a variant of containment rate), and you’ll have a good idea whether you should keep pushing on one side of the scale or the other.
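Here’s a minimal sketch in Python of what tracking such a pair might look like. The metric names and numbers are illustrative assumptions, not figures from any real team; the point is simply that velocity and quality get recorded and reported side by side.

    # Sketch of a competing metric pair: velocity vs. quality.
    # All names and numbers below are illustrative, not real project data.
    from dataclasses import dataclass

    @dataclass
    class SprintStats:
        story_points_done: int          # velocity signal
        defects_found_internally: int   # caught before release
        defects_escaped_to_field: int   # found by customers

        @property
        def containment_rate(self) -> float:
            """Fraction of defects caught before release (quality signal)."""
            total = self.defects_found_internally + self.defects_escaped_to_field
            return 1.0 if total == 0 else self.defects_found_internally / total

    def report(history):
        # Print both metrics for every sprint, so neither can be read in isolation.
        for i, s in enumerate(history, start=1):
            print(f"sprint {i}: velocity={s.story_points_done}, "
                  f"containment={s.containment_rate:.0%}")

    report([
        SprintStats(34, 18, 2),   # fast, quality holding
        SprintStats(41, 15, 6),   # faster, but more escapes
    ])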

However, a lot of systems are more complex, and it may be helpful to think of metrics as complementary rather than competing. Think of the legs of a tripod (or teepee), with a plumb line suspended at the center. Adjust any leg, and the plumb line shifts. If your org sometimes trades velocity for quality, but also sometimes releases pressure by adding resources or by reducing scope, then you need to be measuring more than just quality and velocity to have a realistic idea of what’s happening. You also need to measure how often, and by how much, your scope changes and your resources shift.
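As a rough sketch of those “tripod legs,” here is one way the factors could be captured together per release, so that no single number gets read in isolation. The field names are hypothetical; swap in whatever your team actually tracks.

    # Sketch of complementary "tripod leg" metrics for one release.
    # Field names are illustrative assumptions, not a prescribed schema.
    from dataclasses import dataclass

    @dataclass
    class ReleaseSnapshot:
        story_points_done: int         # velocity
        containment_rate: float        # quality
        points_added_midstream: int    # scope growth
        points_deferred: int           # scope cut by emergencies
        engineer_weeks_diverted: int   # resources pulled elsewhere

        def summary(self) -> str:
            # One line that shows every leg, so a shift in any of them is visible.
            return (f"velocity={self.story_points_done}, "
                    f"quality={self.containment_rate:.0%}, "
                    f"scope +{self.points_added_midstream}/-{self.points_deferred}, "
                    f"diverted={self.engineer_weeks_diverted} eng-weeks")

    print(ReleaseSnapshot(120, 0.92, 30, 12, 6).summary())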

In my younger years, I grumbled a few times about how feature creep impacts quality, without providing any useful metrics that made that tradeoff real to product management or executives. I’ve gradually learned to be better, but now I realize I still have room for improvement. I don’t think I’ve ever measured how many story points get deferred when an emergency drags resources away, or how many story points get done on a 4-month release versus an 8-month release.

I’m going to paint more complete pictures with my metrics, and see where it gets me.

Action Item

Think of a team problem where you'd like a different balance. How can you measure each factor that plays into the overall dynamics of the situation?


Comments

  • dougbert, 2012-11-13:

    I am not a pilot but I understand "instrument flying" operation. It is possible to "fly the plane" by chasing the "artificial horizon", constantly trying to keep the horizon 'level'. Yet by solely following that type of flying, it is very easy to lose track of the overall objective of actually going somewhere desired. One can fly the plane correctly and safely, yet never get anywhere. Tracking bugs fixed is good, but does that tracking increase or decrease the entropy of the code? A metric is there and can be used for reports, but what metric is used to measure "better code", "cleaner code", or "code that properly reflects the model of the problem being solved"? As always, some great insight.

  • Daniel, 2012-11-13:

    Doug: I hadn't considered the analogy to flight, but I think it's a very insightful one. The problem of optimizing a particular number on the instruments, as opposed to seeking the overall best flying experience, is exactly the sort of problem that Seth Godin talked about with his caution about false proxies. We get enamored of a number and forget that it's only a means to an end. That said, I'd rather have two or three useful numbers than just a vague intention. This is why it was so smart of you to pick a specific target (e.g., "no modules > 10k lines") and work to hit it in your moab work.