Codecraft

software = science + art + people

How to make a const-correct codebase in 4300 easy steps

2014-03-25

One of the codebases that I work on is theoretically C++, but if you peer under the hood, it looks more like 1990-vintage C. It’s 500 KLOC of almost purely procedural code, with lots of structs and few true objects. More dusty and brittle than I’d like.

Image credit: Tim Gillin (Flickr)

I am not a C++ bigot; I first began to do serious, professional coding in C, not long after this codebase got its start. I understand the C-isms pretty well. And although I think Linus got carried away in his rant about the ugliness of C++, I can appreciate the ways that lean C sometimes makes its descendant look ugly and inefficient. (Though C++11 and 14 are making this less true…)

This means that I don’t consider the C-like style of this particular codebase a fatal flaw, in and of itself.

However, since my early coding adventures I’ve been converted to the advantages of OOP for complex projects and large teams, and I’ve also accumulated a lot of battle scars around multithreading. Our codebase needs OOP to solve some encapsulation antipatterns, and it needs RAII and C++11-style mutexing in the worst way. Its old, single-threaded mindset makes far too many things far too slow and error-prone.

A few months ago, we decided to make an investment to solve these problems.

To do it right, I had the team add a story to our scrum backlog about making the codebase const-correct. And therein lies a tale worth recounting…

The naive approach

At first, the team assumed that I wanted a human being to manually inspect every parameter to all ~4300 functions in the whole codebase, adding “const” wherever it belonged.

They weren’t thrilled. Although most were too polite to say so, they probably questioned the value of the exercise, privately wondering some of the same things that are articulated in this StackOverflow post. They complained that as soon as they tried to fix const in one place, it had a domino effect that caused endless ripples elsewhere.

I convinced them to give it a shot with two arguments: const-correctness would lay the groundwork for the encapsulation and thread-safety work we needed, and nobody would have to do the drudgery by hand, because I believed I could automate it.

The clever approach

My idea for automated const-fixing was simple, I thought. I would write a Python script that embodied the following algorithm (sketched in code after the list):

  1. Use doxygen to generate call graphs for the whole codebase.
  2. Given a whole-codebase call graph, identify all "leaf" functions (functions that are called by other code but do not themselves call anything else in the codebase).
  3. For each leaf, find all function prototypes, including the one associated with the implementation.
  4. For each prototype on a given leaf function, iterate over its parameters. On parameters where "const" might make sense but where it is not present, add "const" and see if the code still compiles and all unit tests still pass.
  5. Prune all leaf functions from the call graph and repeat until no functions remain.
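
In rough terms, the heart of that loop looks something like this. This is a minimal sketch, not the real script: call_graph and try_const_variants() are hypothetical stand-ins for the doxygen-derived graph and the edit/compile/test/rollback machinery described in step 4.

    # Minimal sketch of the pruning loop. call_graph maps each function name
    # to the set of functions it calls; try_const_variants() is assumed to do
    # the add-const / compile / test / rollback dance.
    def const_fix_pass(call_graph):
        while call_graph:
            leaves = {f for f, callees in call_graph.items() if not callees}
            if not leaves:
                break  # only cycles remain; more on that later
            for leaf in leaves:
                try_const_variants(leaf)
                del call_graph[leaf]
            for callees in call_graph.values():
                callees -= leaves  # prune processed leaves from callers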

Turtles all the way down

I was hoping that after I described this straightforward algorithm, an enthused team member would volunteer. Nobody spoke up.

“Okay, I’ll do it,” I said. “I ought to be able to get that done in a few days.”

Famous last words…

At first, I made steady progress. The skeleton of the script was done after an hour or two of coding.

Adding reliable “roll back a change that turned out to be invalid” logic took me another hour or so to code and test. And adding parameter fuzzing (so that a parameter declared Foo * foo in one place, Foo *myfoo in another, and ::Foo * in another, would all be seen as semantic equivalents) took another hour.
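
To give a flavor of that fuzzing, the normalization boiled down to something like the following. This is a simplified sketch; real C++ declarators (multi-word types, function pointers, and worse) need far more care.

    import re

    def normalize_param(decl):
        # Reduce a parameter declaration to a canonical form so that
        # 'Foo * foo', 'Foo *myfoo', and '::Foo *' all compare equal.
        decl = decl.strip().lstrip(':')              # drop global-scope ::
        decl = re.sub(r'\s*([*&])\s*', r'\1', decl)  # canonical pointer spacing
        decl = re.sub(r'\s+', ' ', decl)             # collapse whitespace
        return re.sub(r'(?<=[*&])[A-Za-z_]\w*$', '', decl)  # drop param name

    assert normalize_param('Foo * foo') == normalize_param('::Foo *') == 'Foo*'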

Then I started cautious experiments to see how well my script was doing.

Ugh.

I already had regex-based python code that scanned code for function prototypes. Having done this sort of thing before, I knew that I was going to miss a few subtleties. I knew I’d have to handle inline /* ... */ and // ... comments, and maybe an #ifdef here or there. I knew there were a handful of macros in the codebase, but I was pretty sure there was nothing truly insidious. I figured I’d get at least 95-98% accuracy pretty quickly, because the formatting was very regular. Because I could rely on the compiler to tell me if something broke, and because I had a robust rollback mechanism, the worst that could happen if I mis-parsed something was that I’d fail to add a “const” that might have been theoretically possible.
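
The scanning pattern was a beefier cousin of the simplified sketch below. Note how the naive comma split over the parameter list already hints at the trouble ahead.

    import re

    # Simplified sketch of a prototype matcher; the real regex also had to
    # survive comments, linewraps, and macros.
    FUNC_DECL = re.compile(
        r'^\s*(?P<ret>[\w:<>*&]+(?:\s+[\w:<>*&]+)*)\s+'  # return type
        r'(?P<name>[A-Za-z_]\w*)\s*'                     # function name
        r'\((?P<params>[^)]*)\)\s*[;{]',                 # params, then ; or {
        re.MULTILINE)

    def params_of(match):
        # naive: a comma split breaks on function-pointer parameters
        return [p.strip() for p in match.group('params').split(',') if p.strip()]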

Here are a few of the gotchas that I didn’t anticipate, but that in hindsight should have been obvious:

The imperative to minimize diffs
When I rewrote code, my first attempt reformatted the function prototype slightly. I dropped extra whitespace, and I didn't keep the original linewraps and indents. I figured this would not be a problem. However, on closer inspection, I realized that a lot of prototypes looked like this: https://gist.github.com/dhh1128/9753554 The problem was that if I did a dumb rewrite of the function, I'd end up dropping the helpful comments. Also, my naive implementation drastically increased the noise in the diff, which was a big problem since we maintain half a dozen active branches of code for old maintenance releases, and depend on automatic merges to make bug fixes flow with minimal human intervention. It took me another hour or so to rewrite my "prototype rewrite" code so that it made the minimal possible change to alter constness.
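The trick was to treat the rewrite as surgery on the original text: compute the exact character offset where constness changes, splice in the one token, and leave every other byte alone. Something like this hypothetical helper:

    # Splice 'const ' into the source at a parameter's exact offset, leaving
    # the original whitespace, linewraps, and comments untouched; param_start
    # is assumed to come from the scanning regex's match offsets.
    def add_const_at(source_text, param_start):
        return source_text[:param_start] + 'const ' + source_text[param_start:]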
Function pointers
Some function pointer declarations (as parameters, or as typedefs) gave my parser fits. I had to beef up my "function_decl" regex several times before I stopped getting false positives.
Typedefs
Sometimes a function was declared with one datatype, but defined using a typedef'ed alias. This caused my parameter correlation between the two instances of the prototype to fail. I ended up hard-coding my way around one particularly nasty instance of this. (Why not just fix the code, you ask? Excellent question. Remember my mandate to minimize diffs? Darn inconvenient...)
Function overloads
What if the same function name had more than one possible set of parameters? I couldn't just change the constness of the third parameter to CalculateIdealWidgetSize() blindly; I had to recognize clumps of functions that had a common parameter profile.
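Recognizing those clumps amounted to keying every declaration on its name plus its normalized parameter profile, roughly like this (a sketch, reusing the hypothetical normalize_param() from earlier; decls stands in for the output of the scanning pass):

    from collections import defaultdict

    groups = defaultdict(list)
    for decl in decls:
        profile = (decl.name, tuple(normalize_param(p) for p in decl.params))
        groups[profile].append(decl)
    # every declaration in a group must receive the same const change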
Mocks and stubs
This codebase includes a bunch of google test and gmock stuff. Each procedural function has its own testrunner, which links in a library of stubs for all the other functions in the codebase. This means that any given function in the codebase typically has 2 implementations (one real, one mocked), and often several different declarations. The mock declarations are in a totally different style from normal declarations. Some examples of mock declarations will give you a flavor: https://gist.github.com/dhh1128/9754282 I had to find and update not just ordinary function prototypes, but all their mocked equivalents as well, in whatever mocking variation they might appear. And I still needed to be able to correlate all the variants so I could make a coherent change across all instances of a particular function.
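Correlating the mocks meant a second family of patterns alongside the ordinary-prototype regex. Assuming the classic MOCK_METHODn(name, ret(args)) macro form, a sketch looks like:

    import re

    # Sketch of a matcher for gmock's MOCK_METHODn / MOCK_CONST_METHODn
    # declarations, so each mock can be tied back to its real prototype.
    MOCK_DECL = re.compile(
        r'MOCK_(?:CONST_)?METHOD\d+\s*\(\s*(?P<name>\w+)\s*,\s*'
        r'(?P<ret>[^(]+?)\s*\((?P<params>[^)]*)\)')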
External APIs
At first, I was attempting to change every function prototype that I saw. However, I realized after a while that some externed function declarations referred to functions exposed by dynamically linked libraries. The code would compile when I changed the externed declaration, and tests might even pass — but at run-time in production, I might fail to link the function I wanted. I modified my script to only update prototypes where I could find at least one definition; if all I saw was declarations, I left the functions alone.
Doxygen bugs
A week or so into the project, I realized that doxygen's call graph was inaccurate. I found one doxygen setting that I could tweak, to increase the depth of call tree analysis — but even after bumping up that limit, some of the function relationships that I needed to understand were just missing. I chalk this up to subtleties like call-through-function-pointer, macros that obscured some function calls, and so forth.
Recursion and mutual dependencies
My tidy algorithm assumed that the overall call tree for the application was a directed acyclic graph. In fact, later analysis showed that I had 59 functions that were directly recursive, and probably a lot of others where the recursion was indirect. I was able to detect this recursion, and artificially break it, in many cases. However, about 20 iterations into my algorithm, I ran out of "leaf" functions with about 1000 functions still remaining to analyze. These functions all called some other function in the remaining graph, due to dense, unfortunate coupling. I brute-forced my way through these, perhaps sacrificing some const fixups that might have been possible if I'd been willing to analyze changes to a clump of functions at a time, instead of a single function only.
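Detecting that recursion is a textbook strongly-connected-components problem. The script did its own bookkeeping, but expressed with networkx (an assumption, purely to illustrate), it would look like:

    import networkx as nx

    # Any strongly connected component with more than one function, or any
    # function with a self-edge, is directly or mutually recursive.
    g = nx.DiGraph((caller, callee)
                   for caller, callees in call_graph.items()
                   for callee in callees)
    recursive = [scc for scc in nx.strongly_connected_components(g)
                 if len(scc) > 1 or any(g.has_edge(f, f) for f in scc)]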

The detour

Partway through the project, a colleague suggested that I abandon the idea of parsing C/C++ with regular expressions, and switch to using a true grammar. I told him I had considered it. I knew of ply, which is a Python implementation of lex/yacc. I’d noticed, last time I checked, that it shipped with a C grammar. We also discussed the possibility of using a compiler plugin for GCC or Clang. Both of these let you hook into the compiler after an abstract syntax tree has been generated, to inspect and react to what the compiler understands from the code.

I ended up discarding this approach. Working from an AST makes “reading” the code easy, but it doesn’t help write modifications. And the seemingly simple “function declarations” in the mocking layer actually resolve to very complex classes by the time google test/gmock’s macros are expanded; making heads or tails of them is not for the faint of heart. However, my colleague successfully built a GCC plugin for a slightly different code analysis task, and it seems promising for certain use cases. Not having to parse code yourself is definitely an attractive option, if it applies.

The results

So fast-forward a couple calendar weeks. It didn’t take me that much coding time to get this system working, but I was multitasking, trying to be a manager and a point of contact for the press, and interfacing with execs and product management… My coding time was pretty fragmented.

I finally got the code working pretty well, and I let it run…

… and run …

… and run …

Since I was validating my changes by doing a full compile and test cycle between each one (and an extra cycle at the end of each rollback, to guarantee that the code was copacetic), it took quite a while to crunch through this codebase. I let it run for probably a total of about 200 hours, during which time I analyzed all 4300 functions in the codebase, one by one.

I am proud of the outcome. Altogether, I think this script found about 2000 functions where one or more parameters should have been const but were not. About 4500 parameters were changed in those 2000 functions, generating about 7000 lines of changes. The day after I pushed my massive change set, the behavior of the codebase on our regression suite was unchanged.

It felt pretty awesome to check back periodically, and see the results of hours (and hours and hours) of analysis that no human being had to wade through. As Hannibal would say, “I love it when a plan comes together.”

The script wasn’t perfect, though. I estimate that maybe 50 functions were missed due to parsing subtleties. A few prototypes didn’t match my regular expressions, and some false matches occurred as well. I also ended up ignoring the constness of member functions; there are only a handful in the whole codebase, and analyzing those would have required entirely different codepaths that I didn’t have time to write. I suspect there are other fixes lurking, if I could untie the Gordian knot of dependencies that makes some functions impossible to analyze independently.

The code for this “const_fix” python script is available on github, in case it’s useful to anybody. It’s not very well genericized; I make assumptions about the organization of the files and folders within the codebase, and it’s hard-coded to email me if the script crashes. But you might be able to adapt it in an hour or two, if you have similar codebases that are scons/make/autotools-centric. I have since used the script on a second codebase, and the adaptation took me about an hour.

The moral(s) to the story

Stepping back from the details of this experience, I draw a few general conclusions:

Never get involved in a land war in Asia, as Vizzini would say. (Or in other words, don’t invade unknown territory in your code, unless you’re assuming some blood, sweat, and tears. :-)

On the other hand, courage counts. A festering problem finally got fixed, because I was crazy enough to try.

It is a crying shame (and a glaring irony) that it’s so hard to code my way to a productive manipulation of … code. Programmers spend their whole careers validating the proposition that it’s worthwhile to enable computers to do grunt work, so humans can focus on more interesting tasks. Yet the artifact that programmers produce is so plagued by inconsistencies, so susceptible to bugs, and so complex to process that it takes a major investment to automate working with it. I am going to fix this in the programming ecosystem that I create.

An imperfect solution still made a huge difference. Sometimes perfect is the enemy of good enough.

Coding something right is cheaper and a whole lot less hassle than fixing an antipattern once it’s firmly entrenched. We’ll never get away from refactoring (nor would we want to) — but it pays to establish good habits of code hygiene early, instead of after you write 500,000 lines of gunk.

Compilers ought to offer a “this should be const” warning, so this work would have been unnecessary. I googled; there isn’t any such option that I could find. Yet my need is not that unusual. Why isn’t this problem already solved?


Comments

  • dougbert, 2014-03-25:

    Oh the memories....the memories of code bases past....... dougbert

  • Jason Ivey, 2014-03-27:

    That is awesome. I believe I had the exact same idea at a previous company where we were both employed. It sounds like you were much more relentless than myself though. After discovering only a few of these parsing problems I decided it was a problem that I didn't want to solve with a regular expression. At the time I also looked at clang and gcc but came to a similar conclusion that once you were into the AST it was very difficult to do a replace on the original text. I'm excited to see what you had to do to solve the problem for so many functions. Jason

  • Daniel Hardman, 2014-03-27:

    Yes, I smiled when I read your comment, because we both know other codebases where this would have been quite helpful. :-)

  • Daniel Hardman, 2014-03-27:

    Hopefully they're not the kind of memories that wake you up at night, screaming... :-) We're making progress. Thanks for all the work you did to point us in a good direction.

  • λ, 2014-04-26:

You're very right about compilers in the last statement. I think that's a very good idea and I'd like to hear some compiler guys' opinions.

  • λ, 2014-04-27:

    After some research I found this thread: http://gcc.gnu.org/ml/gcc-patches/2010-04/msg01465.html And it turns out that Wsuggest-attribute is already implemented in gcc (http://gcc.gnu.org/onlinedocs/gcc/Warning-Options.html). Sadly, it's only about functions but not their arguments. Speaking of clang, there is a proposal http://lists.cs.uiuc.edu/pipermail/cfe-dev/2013-February/027816.html but I can't find any results of it.

  • Daniel Hardman, 2014-04-27:

    Hey, thanks for the tip about Wsuggest-attribute! That is awesome, and I didn't know about it at all. I'm glad compilers are getting better.