Grumpy Old Men, Opacity, and Optimizers

Today I’m channeling my inner grumpy old man. And these guys are helping. (I am not old enough to pull off such a face by myself, although life is rapidly helping me get there. ;-)

Grumpy old men.

The reason I’m feeling grumpy is that I’ve had another in a long, long line of conversations about how to write faster code.

It’s not that optimization experts are dumb. Far from it. They are invariably smart, and in general, they are better informed than I am about how pipeline burst cache and GPUs and RAM prefetch algorithms work. I generally learn a lot when I talk to guys like this.

I applaud their passion, even if I think they sometimes get carried away.

No. What’s making me grumpy is that after decades of hard work, we still have compilers that encourage a culture of black magic and superstition around this topic. I thought I signed up for computer science, not voodoo.

To show you what I mean, let’s talk about the humble inline keyword in C and C++. The amount of FUD and nonsense around it is really unfortunate. How many of the following have you heard?

  • Inlines are always faster than normal function calls.
  • Inlines are sometimes slower than normal function calls, if they bloat the binary to the point where the data segment causes page faults.
  • Compilers treat inline as a hint only, and ignore it whenever they feel like it.
  • Ordinary programmers might not be smarter than the compiler, but I am. That’s why I use the compiler extension that always inlines, instead of just the ordinary inline keyword that’s useless.
  • If you declare a static in a function, it can’t be inlined.
  • If you throw an exception in a function, it can’t be inlined.
  • If you call a virtual in a function, it can’t be inlined.
  • If you call goto in a function, it can’t be inlined.
  • If you use thread-local storage in a function, it can’t be inlined.
  • Only code in a header can be inlined. Using inline in a .cpp is either an error or is totally ignored by the compiler.
  • Since templates are implemented in headers, they are automatically inline candidates.
  • static and inline are mutually contradictory.

I’m not going to argue the stuff in the above list, one way or another. There’s enough falsehood in that list to steer people wrong, and enough truth in it to be dangerous. Importantly, all of the items in the list are stated as timeless, universal truths, which is a prima facie reason to treat the list as a Bad Idea™ regardless. Remember: it depends. :-)

My point today is not about inlines, though. It’s not even about performance dogma. Rather, it’s about opacity.

The optimization choices that a compiler makes about inlining and sundry other issues are opaque to most coders. And I claim that it is this fact, not irrational zealots, that lies at the heart of a lot of holy wars, debates, and FUD about optimization. The classic paper by Meyers and Alexandrescu about how compiler optimization defeats the intent of the double-checked locking pattern provides eloquent examples of this opacity. If you haven’t read it, I encourage you to do so.
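For anyone who hasn’t met the pattern, here is a rough sketch of the problem in modern C++. It is my own illustration, not code from the paper, and the names (Widget, get_naive, get_safe) are made up for the example. The first version reads fine in source order, yet nothing in it stops the compiler or CPU from reordering the store. The second is one possible repair using C++11 atomics, which postdate the paper.

```cpp
#include <atomic>
#include <mutex>

// Hypothetical payload type, used only for illustration.
struct Widget { int value = 42; };

// Naive double-checked locking. It looks correct in source order, but the
// plain pointer gives the compiler and CPU freedom to make naive_instance
// visible before the Widget is fully constructed, and the unsynchronized
// read is a data race. Shown for contrast only.
Widget* naive_instance = nullptr;
std::mutex naive_mutex;

Widget* get_naive() {
    if (naive_instance == nullptr) {                   // unsynchronized read
        std::lock_guard<std::mutex> lock(naive_mutex);
        if (naive_instance == nullptr) {
            naive_instance = new Widget;               // store may be reordered
        }
    }
    return naive_instance;
}

// One possible repair: an atomic pointer with acquire/release ordering, so
// the constructed Widget is published before the pointer becomes visible.
std::atomic<Widget*> safe_instance{nullptr};
std::mutex safe_mutex;

Widget* get_safe() {
    Widget* p = safe_instance.load(std::memory_order_acquire);
    if (p == nullptr) {
        std::lock_guard<std::mutex> lock(safe_mutex);
        p = safe_instance.load(std::memory_order_relaxed);  // re-check under lock
        if (p == nullptr) {
            p = new Widget;
            safe_instance.store(p, std::memory_order_release);
        }
    }
    return p;
}
```

The point is not the fix. The point is that nothing in the source of get_naive tells you what the optimizer is allowed to do with it.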

We should fix this.

Compiler makers, I hereby request a feature. Please add the ability to generate an “optimization plan” for a function, analogous to the “explain query plan” feature that DB admins have used to tune their work for decades.

I can imagine this working as a compiler switch, similar to -E, which dumps preprocessor output to stdout. If I add --explain-optimizations to the command line, I would like a report that tells me:

  • What sorts of loop unwinding, reordering, and other shortcuts will be used. Please tie them back to the specific switches that are active.
  • How optimizations were constrained by block, function, and translation unit scope–and how optimizations might change naive assumptions about scope that a programmer would form by looking at the high-level representation of the code.
  • What additional optimizations might be possible if additional switches were added or removed.
  • What guesses were made about likely versus unlikely branches in conditionals.
  • What additional optimizations might be possible if not for a certain characteristic of the code. Please be specific: “I could not optimize out the extra assignment to foo, because codepath X requires it.”
  • How micro optimization decisions conflict with macro ones, and what assumptions and priorities were used to resolve these conflicts.
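To make the wish list a bit more concrete, here is a small sketch of the kind of code I would want explained. The function names are hypothetical, and the comments are questions I’d like the report to answer, not claims about what any particular compiler actually does.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: is this really inlined at every call site, and if
// not, which heuristic or switch vetoed it?
inline int clamp_to_byte(int x) {
    if (x < 0) return 0;       // which branch was guessed as the likely one?
    if (x > 255) return 255;
    return x;
}

// Hypothetical hot loop: was it unrolled, vectorized, or left alone, and
// which active switches drove that choice?
long sum_clamped(const std::vector<int>& values) {
    long total = 0;
    for (std::size_t i = 0; i < values.size(); ++i) {
        total += clamp_to_byte(values[i]);
    }
    return total;
}
```

Today, answering those questions for even a toy like this usually means reading the generated assembly, which is exactly the sort of opacity I’m grumbling about.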

I realize I am not asking for something easy. But I believe explaining optimization choices cannot be harder than making those choices in the first place–and the problem must be somewhat tractable, since the SQL crowd has an analogous tool.

Let’s shine some light on this black magic, and turn performance tradeoffs into a science based on common, abundant knowledge. I think it could improve the whole industry.

Image credit: Neil. Moralee (Flickr)
