One of the codebases that I work on is theoretically C++, but if you peer under the hood, it looks more like 1990-vintage C. It’s 500 KLOC of almost purely procedural code, with lots of structs and few true objects. More dusty and brittle than I’d like.
I am not a C++ bigot; I first began to do serious, professional coding in C, not long after this codebase got its start. I understand the C-isms pretty well. And although I think Linus got carried away in his rant about the ugliness of C++, I can appreciate the ways that lean C sometimes makes its descendant look ugly and inefficient. (Though C++11 and 14 are making this less true…)
This means that I don’t consider the C-like style of this particular codebase a fatal flaw, in and of itself.
However, since my early coding adventures I’ve been converted to the advantages of OOP for complex projects and large teams, and I’ve also accumulated a lot of battle scars around multithreading. Our codebase needs OOP to solve some encapsulation antipatterns, and it needs RAII and C++11-style mutexing in the worst way. Its old, single-threaded mindset makes far too many things far too slow and error-prone.
A few months ago, we decided to make an investment to solve these problems.
To do it right, I had the team add a story to our scrum backlog about making the codebase const-correct. And therein lies a tale worth recounting…
The naive approach
At first, the team assumed that I wanted a human being to manually inspect every parameter to all ~4300 functions in the whole codebase, adding “const” wherever it belonged.
They weren’t thrilled. Although most were too polite to say so, they probably questioned the value of the exercise, privately wondering some of the same things that are articulated in this StackOverflow post. They complained that as soon as they tried to fix const in one place, it had a domino effect that caused endless ripples elsewhere.
I convinced them to give it a shot with two arguments:
- The code could not be robustly multithreaded until we knew which functions modified which state–and thus, where mutexes were needed. The best way to codify that knowledge is to use const, because it’s terse, standard, and enforced by the compiler.
- Perhaps more importantly, I knew a way to upgrade the whole codebase, with near 100% accuracy, with very little effort. And I could solve the ripple effect.
The clever approach
My idea for automated const-fixing was simple, I thought. I would write a python script that embodied the following algorithm:
- Use doxygen to generate call graphs for the whole codebase.
- Given a whole-codebase call graph, identify all “leaf” functions (functions that may be called by others, but that do not themselves call anything else in the codebase).
- For each leaf, find all function prototypes, including the one associated with the implementation.
- For each prototype on a given leaf function, iterate over its parameters. On parameters where “const” might make sense but where it is not present, add “const” and see if the code still compiles and all unit tests still pass.
- Prune all leaf functions from the call graph and repeat until no functions remain.
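The loop at the heart of this plan can be sketched in a few lines of python (the call-graph input is a hypothetical dict mapping each function name to the set of functions it calls; the function names below are invented):

```python
def const_fix_order(calls):
    """Yield functions leaves-first, given {function: set of callees}.

    Calls to functions outside the codebase are ignored, since we
    can't change their prototypes anyway.
    """
    remaining = {f: {c for c in callees if c in calls}
                 for f, callees in calls.items()}
    while remaining:
        # A "leaf" calls nothing that still awaits analysis.
        leaves = [f for f, callees in remaining.items() if not callees]
        if not leaves:
            raise RuntimeError('cycle: %d functions left' % len(remaining))
        for leaf in leaves:
            yield leaf  # here: try adding const, recompile, re-run tests
            del remaining[leaf]
        # Prune the processed leaves from everyone's callee sets.
        for callees in remaining.values():
            callees.difference_update(leaves)
```

(The no-leaves case turns out to matter more than it looks; more on that later.)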
Turtles all the way down
I was hoping that after I described this straightforward algorithm, an enthused team member would volunteer. Nobody spoke up.
“Okay, I’ll do it,” I said. “I ought to be able to get that done in a few days.”
Famous last words…
At first, I made steady progress. The skeleton of the script was done after an hour or two of coding.
Adding reliable “roll back a change that turned out to be invalid” logic took me another hour or so to code and test. And adding parameter fuzzing (so that a parameter declared `Foo * foo` in one place, `Foo *myfoo` in another, and `::Foo *` in another, would all be seen as semantic equivalents) took another hour.
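The fuzzing amounts to normalizing each declaration to a canonical form before comparing. Something like this sketch (the regexes here are illustrative, not the actual script’s):

```python
import re

def normalize_param(decl):
    """Reduce a parameter declaration to a canonical type string, so
    'Foo * foo', 'Foo *myfoo', and '::Foo *' all compare equal.
    A rough heuristic: multi-word types like 'unsigned int' would
    need more care than this sketch gives them."""
    decl = decl.strip()
    decl = re.sub(r'^::', '', decl)        # drop the global-scope qualifier
    decl = re.sub(r'\s*\*\s*', '*', decl)  # normalize spacing around '*'
    decl = re.sub(r'(?<=[\w*&>])\s+\w+$', '', decl)  # drop trailing parameter name
    decl = re.sub(r'\*\w+$', '*', decl)    # drop a name glued to the star
    return decl
```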
Then I started cautious experiments to see how well my script was doing.
I already had regex-based python code that scanned code for function prototypes. Having done this sort of thing before, I knew that I was going to miss a few subtleties. I knew I’d have to handle inline `/* ... */` and `// ...` comments, and maybe an `#ifdef` here or there. I knew there were a handful of macros in the codebase, but I was pretty sure there was nothing truly insidious. I figured I’d get at least 95-98% accuracy pretty quickly, because the formatting was very regular. Because I could rely on the compiler to tell me if something broke, and because I had a robust rollback mechanism, the worst that could happen if I mis-parsed something was that I’d fail to add a “const” that might have been theoretically possible.
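Comment handling is the easy part: blank the comments out (preserving offsets) before the prototype regex ever sees the text. A sketch, ignoring the corner case of comment markers inside string literals:

```python
import re

def strip_comments(src):
    """Blank out /* ... */ and // ... comments so a prototype regex
    can't be fooled by code-like text inside them. Characters are
    replaced with spaces, and newlines are kept, so line numbers and
    offsets into the original file stay valid."""
    def blank(match):
        return ''.join(c if c == '\n' else ' ' for c in match.group(0))
    pattern = re.compile(r'/\*.*?\*/|//[^\n]*', re.S)
    return pattern.sub(blank, src)
```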
Here are a few of the gotchas that I didn’t anticipate, but that in hindsight should have been obvious:
- The imperative to minimize diffs
- When I rewrote code, my first attempt reformatted the function prototype slightly. I dropped extra whitespace, and I didn’t keep the original linewraps and indents. I figured this would not be a problem. However, on closer inspection, I realized that a lot of prototypes were wrapped across several lines, with a hand-aligned comment after each parameter.
The problem was that if I did a dumb rewrite of the function, I’d end up dropping the helpful comments. Also, my naive implementation drastically increased the noise in the diff, which was a big problem since we maintain half a dozen active branches of code for old maintenance releases, and depend on automatic merges to make bug fixes flow with minimal human intervention. It took me another hour or so to rewrite my “prototype rewrite” code so that it made the minimal possible change to alter constness.
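To keep diffs minimal, the rewrite has to splice “const” into the original bytes instead of regenerating the prototype. A minimal sketch (the prototype text, function name, and offset-finding below are all invented for illustration; the real script derives offsets from its parsing pass):

```python
def add_const(proto, param_start):
    """Insert 'const ' immediately before the parameter text beginning
    at offset param_start, leaving every other byte of the prototype
    (linewraps, alignment, comments) untouched."""
    return proto[:param_start] + 'const ' + proto[param_start:]

# A hypothetical prototype in the comment-per-parameter style:
proto = ('void ResizeWidget(Widget * widget,   /* widget to update */\n'
         '                  int      newWidth  /* in pixels */\n'
         '                  );')
fixed = add_const(proto, proto.index('Widget * widget'))
```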
- Function pointers
- Some function pointer declarations (as parameters, or as typedefs) gave my parser fits. I had to beef up my “function_decl” regex several times before I stopped getting false positives.
- Sometimes a function was declared with one datatype, but defined using a typedef’ed alias. This caused my parameter correlation between the two instances of the prototype to fail. I ended up hard-coding my way around one particularly nasty instance of this. (Why not just fix the code, you ask? Excellent question. Remember my mandate to minimize diffs? Darn inconvenient…)
- Function overrides
- What if the same function name had more than one possible set of parameters? I couldn’t just change the constness of the third parameter to `CalculateIdealWidgetSize()` blindly; I had to recognize clumps of functions that had a common parameter profile.
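Recognizing those clumps amounts to keying on the name plus a normalized parameter profile, not the name alone. A sketch (the tuple representation is my invention, not the script’s actual data structure):

```python
from collections import defaultdict

def group_overloads(prototypes):
    """Cluster prototypes so each overload of a name gets edited
    consistently across all its declarations, but is never mixed up
    with a sibling overload. `prototypes` is a list of
    (name, [normalized parameter types]) tuples."""
    groups = defaultdict(list)
    for name, params in prototypes:
        # Name plus parameter profile: Foo(int) and Foo(int, Bar*)
        # land in different clumps.
        groups[(name, tuple(params))].append((name, params))
    return groups
```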
- Mocks and stubs
- This codebase includes a bunch of google test and gmock stuff. Each procedural function has its own testrunner, which links in a library of stubs for all the other functions in the codebase. This means that any given function in the codebase typically has 2 implementations (one real, one mocked), and often several different declarations. The mock declarations are in a totally different style from normal declarations.
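To give a flavor of that style: gmock declares mocks through its MOCK_METHODn macros, so a mocked function looks nothing like its plain prototype. A sketch of the kind of matching involved (the class and function names are invented; only the MOCK_METHODn shape comes from gmock):

```python
import re

# Classic gmock style: MOCK_METHODn(Name, ReturnType(ArgTypes...));
mock_decl = '''
class WidgetMock {
 public:
  MOCK_METHOD2(ResizeWidget, void(Widget * widget, int newWidth));
  MOCK_METHOD0(GetWidgetCount, int());
};
'''

mock_re = re.compile(
    r'MOCK_(?:CONST_)?METHOD(\d+)\s*\(\s*(\w+)\s*,'  # macro arity, function name
    r'\s*([\w:<>\s*&]+?)\s*\(([^)]*)\)\s*\)'         # return type, parameter list
)

# Map mocked function name -> its parameter list text.
mocks = {m.group(2): m.group(4) for m in mock_re.finditer(mock_decl)}
```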
I had to not just find and update ordinary function prototypes, but all their mocked equivalents as well, in whatever mocking variation they might appear. And I still needed to be able to correlate all the variants so I could make a coherent change across all instances of a particular function.
- External APIs
- At first, I was attempting to change every function prototype that I saw. However, I realized after a while that some `extern`ed function declarations referred to functions exposed by dynamically linked libraries. The code would compile when I changed the `extern`ed declaration, and tests might even pass–but at run-time in production, I might fail to link the function I wanted. I modified my script to only update prototypes where I could find at least one definition; if all I saw was declarations, I left the functions alone.
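That rule is easy to sketch (the occurrence map here is hypothetical):

```python
def updatable_functions(occurrences):
    """`occurrences` maps function name -> list of 'decl'/'def'
    markers, one per place the prototype was found. Only names with
    at least one definition (a body we compile) are safe to
    re-prototype; declaration-only names are assumed to live in an
    external, dynamically linked library and are left alone."""
    return {name for name, kinds in occurrences.items() if 'def' in kinds}
```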
- Doxygen bugs
- A week or so into the project, I realized that doxygen’s call graph was inaccurate. I found one doxygen setting that I could tweak, to increase the depth of call tree analysis–but even after bumping up that limit, some of the function relationships that I needed to understand were just missing. I chalk this up to subtleties like call-through-function-pointer, macros that obscured some function calls, and so forth.
- Recursion and mutual dependencies
- My tidy algorithm assumed that the overall call tree for the application was a directed acyclic graph. In fact, later analysis showed that I had 59 functions that were directly recursive, and probably a lot of others where recursion was indirect. I was able to detect this recursion, and artificially break it, in many cases. However, I found that about 20 iterations into my algorithm, I ran out of “leaf” functions with about 1000 functions still remaining to analyze. These functions all called some other function in the remaining graph, due to dense, unfortunate coupling. I brute-forced my way through these, perhaps sacrificing some const fixups that might have been possible if I’d been willing to analyze changes to a whole clump of functions at once, instead of a single function at a time.
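Breaking the direct recursion, at least, is mechanical. A sketch, with the call graph as a hypothetical dict from function name to callee set:

```python
def break_direct_recursion(calls):
    """Remove self-edges from {function: set of callees}, so directly
    recursive functions can still become 'leaves' and be analyzed.
    Returns the set of functions that called themselves."""
    recursive = {f for f, callees in calls.items() if f in callees}
    for f in recursive:
        calls[f].discard(f)
    return recursive
```

Indirect recursion is the harder half: whatever survives leaf-pruning is, by construction, tangled in a cycle or a densely coupled clump.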
Partway through the project, a colleague suggested that I abandon the idea of parsing C/C++ with regular expressions, and switch to using a true grammar. I told him I had considered it. I knew of ply, which is a python implementation of lex/yacc. I’d noticed, last time I checked, that it shipped with a C grammar. We also discussed the possibility of using a compiler plugin for GCC or Clang. Both of these let you hook into the compiler after an abstract syntax tree has been generated, to inspect and react to what the compiler understands from the code.
I ended up discarding this approach. Working from an AST makes “reading” the code easy, but it doesn’t help write modifications. And the seemingly simple “function declarations” in the mocking layer actually resolve to very complex classes by the time google test/gmock’s macros are expanded; making heads or tails of them is not for the faint of heart. However, my colleague successfully built a GCC plugin for a slightly different code analysis task, and it seems promising for certain use cases. Not having to parse code yourself is definitely an attractive option, if it applies.
So fast-forward a couple calendar weeks. It didn’t take me that much coding time to get this system working, but I was multitasking, trying to be a manager and a point of contact for the press, and interfacing with execs and product management… My coding time was pretty fragmented.
I finally got the code working pretty well, and I let it run…
… and run …
… and run …
Since I was validating my changes by doing a full compile and test cycle between each one (and an extra cycle at the end of each rollback, to guarantee that the code was copacetic), it took quite a while to crunch through this codebase. I let it run for probably a total of about 200 hours, during which time I analyzed all 4300 functions in the codebase, one by one.
I am proud of the outcome. Altogether, I think this script found about 2000 functions where one or more parameters should have been const but were not. About 4500 parameters were changed in those 2000 functions, generating about 7000 lines of changes. The day after I pushed my massive change set, the behavior of the codebase on our regression suite was unchanged.
It felt pretty awesome to check back periodically, and see the results of hours (and hours and hours) of analysis that no human being had to wade through. As Hannibal would say, “I love it when a plan comes together.”
The script wasn’t perfect, though. I estimate that maybe 50 functions were missed due to parsing subtleties. A few prototypes didn’t match my regular expressions, and some false matches occurred as well. I also ended up ignoring the constness of member functions; there are only a handful in the whole codebase, and analyzing those would have required entirely different codepaths that I didn’t have time to write. I suspect there are other fixes lurking, if I could untie the Gordian knot of dependencies that makes some functions impossible to analyze independently.
The code for this “const_fix” python script is available on github, in case it’s useful to anybody. It’s not very well genericized; I make assumptions about the organization of the files and folders within the codebase, and it’s hard-coded to email me if the script crashes. But you might be able to adapt it in an hour or two, if you have similar codebases that are scons/make/autotools-centric. I have since used the script on a second codebase, and the adaptation took me about an hour.
The moral(s) to the story
Stepping back from the details of this experience, I draw a few general conclusions:
Never get involved in a land war in Asia, as Vizzini would say. (Or in other words, don’t invade unknown territory in your code, unless you’re assuming some blood, sweat, and tears. :-)
On the other hand, courage counts. A festering problem finally got fixed, because I was crazy enough to try.
It is a crying shame (and a glaring irony) that it’s so hard to code my way to a productive manipulation of … code. Programmers spend their whole careers validating the proposition that it’s worthwhile to enable computers to do grunt work, so humans can focus on more interesting tasks. Yet the artifact that programmers produce is so plagued by inconsistencies, so susceptible to bugs, and so complex to process that it takes a major investment to automate working with it. I am going to fix this in the programming ecosystem that I create.
An imperfect solution still made a huge difference. Sometimes perfect is the enemy of good enough.
Coding something right is cheaper and a whole lot less hassle than fixing an antipattern once it’s firmly entrenched. We’ll never get away from refactoring (nor would we want to)–but it pays to establish good habits of code hygiene, early, instead of after you write 500,000 lines of gunk.
Compilers ought to offer a “this should be const” warning, so this work would have been unnecessary. I googled; there isn’t any such option that I could find. Yet my need is not that unusual. Why isn’t this problem already solved?