Variadic macros tricks

Have you ever wanted to write a “for each” loop over all the args of a variadic macro? Or have you ever wanted to overload a macro on the number of arguments? (If you’re saying to yourself, “good grief, why?” — I’ll describe a use case at the bottom of this post.)

I learned how to do this today, and I wanted to blog about it to cement the technique in my own mind.

What happened when I decided to learn this technique. I’m trying to spare you… :-) Image credit: xkcd.com

Simple variadic macros

The first piece of magic you need to do something like this is __VA_ARGS__. This allows you to write macros that take an arbitrary number of arguments, using ... to represent the macro’s parameters:

Nice. __VA_ARGS__ is a standard feature of C99, and I’ve known about it for a long time. I’ve also known about GCC (and Clang’s) extension, which attaches special meaning to ##__VA_ARGS__ if it’s preceded by a comma–it removes the comma if ##__VA_ARGS__ expands to nothing. If I change my macro definition to:

…I can now call eprintf("hello, world"); without a complaint from the compiler.

But it’s not enough

That doesn’t let me do a “for each” loop, though. All the args that I pass are expanded, but I can’t do anything with them, individually. I have no names for my macro’s parameters–just the anonymous .

I went poking around, not expecting to find a solution, but I was pleasantly surprised.

Continue reading

How to make a const-correct codebase in 4300 easy steps

One of the codebases that I work on is theoretically C++, but if you peer under the hood, it looks more like 1990-vintage C. It’s 500 KLOC of almost purely procedural code, with lots of structs and few true objects. More dusty and brittle than I’d like.

Image credit: Tim Gillin (Flickr)

I am not a C++ bigot; I first began to do serious, professional coding in C, not long after this codebase got its start. I understand the C-isms pretty well. And although I think Linus got carried away in his rant about the ugliness of C++, I can appreciate the ways that lean C sometimes makes its descendant look ugly and inefficient. (Though C++11 and 14 are making this less true…)

This means that I don’t consider the C-like style of this particular codebase a fatal flaw, in and of itself.

However, since my early coding adventures I’ve been converted to the advantages of OOP for complex projects and large teams, and I’ve also accumulated a lot of battlescars around multithreading. Our codebase needs OOP to solve some encapsulation antipatterns, and it needs RAII and C++11-style mutexing in the worst way. Its old, single-threaded mindset makes far too many things far too slow and error-prone.

A few months ago, we decided to make an investment to solve these problems.

To do it right, I had the team add a story to our scrum backlog about making the codebase const-correct. And therein lies a tale worth recounting…

Continue reading

Small Files Are Your Friends

Yesterday I was discussing refactoring priorities with a colleague who’s a brilliant engineer, and I happened to mention my strong desire for smaller files in our codebase. I told him that I thought .h and .cpp (or .py or .java or .whatever) files with thousands of lines were a problem.

He asked me why.

He told me that he wasn’t opposed to the idea, but he always felt like it was more of a stylistic choice than a true imperative for good code. And he was curious to see if I could convince him differently.

After I pondered his question for a while, I realized that some of my opinion really is traceable to prejudice. I usually use IDEs instead of vim/emacs, and I think that promotes click-back-and-forth-and-hyperlink-in-many-little-files instead of open-a-big-file-and-scroll. My compatriots that are more console-centric are just as smart and effective–maybe more. So I’ll write that part off.

However, I also found some arguments for the small-file principle that feel more substantive. Small files are your friends.

More small friends. Photo credit: miguelandresen (Flickr)

Named scopes and cognitive complexity

The case for small functions is more discussed than the case for small files, and it has been made by almost every luminary in computer science. My colleague immediately conceded it, and I won’t repeat it here–but I will claim that many of the same arguments apply to files as well, because files as well as functions are an important named scope in software development. This in turn suggests some constraints on files with respect to cognitive complexity.

Studies of memory and human attention consistently demonstrate that we think best about small sets. This fact is reflected by the amount of detail visible within any given named scope, both in programming and in other thought tasks. How many top-level menus in the average application? Colors in most cultures’ divisions of the rainbow? Parameters in an easy-to-understand function? Sections in the average book store? Steps in easy-to-follow driving directions? (There’s a whole field called cognitive ergonomics that explores why these questions always have similar answers.)

How many functions should we put in a reasonable file?

For me, 2 or 5 or 10 feels tractable. 50 feels excessive.

If a “good function” also respects the cognitive complexity constraints of the human brain–not being too big to read in a screen or two, for example–then you end up with a reasonable upper boundary on file sizes of, maybe, 500 or 1000 lines. (See Steve Yegge’s insightful rant about code size being an engineer’s worst enemy. He focuses on codebase size, but much of what he says applies just as well at the next level down.)

I suppose that this argument is weakened by the features of some IDEs, which collapse tangential code blocks, display treeviews of functions, and support lots of hypertext-style navigation. But not all programmers use the same IDEs, and not all interactions with code are IDE-driven; file size remains relevant. There’s a reason why C# created partial classes to improve on java’s lump-it-all-in-a-single-file constraint…

When humans try to remember more than their brains can fit, stuff falls out. Big files mean that coders have to mentally model relationships between stuff that’s separated by way too much screen real estate. This is a recipe for bugs. It is also a serious impediment to learnability.

Loose coupling and encapsulation

Files are a natural unit of coupling. In most programming languages, you can declare a construct (a variable, an internal function, or class) within a file, and have that construct be invisible to the outside world. This means there is a built-in temptation for functions and classes to bind more tightly when they’re in the same file, because they have access to common but private knowledge. By breaking large files apart, you remove the temptation, break unnecessary dependencies, and promote looser coupling.

Another way to say this is that file boundaries are an encapsulation barrier. Use them to hide data. (See my recent post about encapsulation as a simplicity strategy.)

Code reuse and testability

A consequence of files hiding data is that when you have a function that might be useful in a dozen different modules, but the function is buried in a large file with lots of dependencies extraneous to that function, reuse and testability are both frustrated. If the function is in a file of its own, it’s more discoverable, and it’s reusable and testable without extra baggage.

Link optimization

A C/C++ corollary to the file boundary issue has to do with linkers and binary sizes. In many cases, linkers remove unused functions at compilation unit level, rather than at the individual function level. A .c or .cpp file is either in or out, as a unit. This means that if you have a .cpp file with 50 functions in it, and you call only 1 of them, all 50 get linked into the final binary. The result is bloated binaries. So: smaller .cpp files ==> smaller binaries. (Before you flame me about linker optimizations, I will admit that some linkers get more granular, depending on which switches you use. But it’s surprising how hard it is to do better than what I’ve described. Experiment and comment with your results.)

Counter Argument

I suppose you could argue that by making lots of small files, you’re creating more complexity in directories, in makefiles or projects, and so forth. Is 250 files in a folder worse than 15? Doesn’t that violate the “cognitive complexity” guideline above?

My comeback is: use packages or subdirectories or libraries (another level of management). You can’t subdivide forever, but you don’t need to.

The bottom line for me is experiential, not theoretical. I nearly always have cruddy experiences in code bases where large files are common. Small files don’t guarantee pleasant and productive work, but big ones seem to go hand-in-hand with other problems. I find it telling that codebases with big files are also codebases where people lament the lack of comments the most, for example. Over the years, I’ve become convinced that a simple rule of thumb about keeping files small will pay off more handsomely than almost any other coding best practice.

Action Item

Leave a comment to tell me what you think. Am I making a mountain out of a molehill? Or do you feel strongly about small file sizes as well? Have I omitted any important pros and cons from the discussion?

Don’t forget the circuit breakers

Recently I’ve been pondering an interesting book called Release It!, by Michael Nygard. It’s full of anecdotes from someone who has spent a major portion of his career troubleshooting high-profile crashes of some of the most complex production software systems in the world–airline reservations, financial institutions, leading online retailers, and so forth.

circuit breaker

A circuit breaker. Photo credit: Wikimedia Commons.

One design pattern that Nygard recommends was new to me, but it rang true as soon as I saw its description. Like many classic patterns, I’ve implemented variations on it without knowing the terminology. I like Nygard’s formulation, so I thought I’d summarize it here; as I’ve said before, good code plans for problems.

The pattern is called circuit breaker, and its purpose is to prevent runaway failures.

In systems without circuit breakers, failures in an external call may cause an exception on the caller’s side; this can cause the caller to log, retry, and/or execute other specialized logic. Since errors are supposed to be the corner case, the blocks of code that handle them are often expensive to execute. The very slowness of the error-handling codepath can be the source of further failures, because locks are held longer than normal, or because we poll until a connection is restored, overwhelming a system that’s already limping.

Or, to borrow an old idiom, “it never rains but it pours.”

In the circuit breaker pattern, on the other hand, the caller assigns each “circuit” (a codepath that invokes an external entity) to one of three possible states: closed, open, or half-open. Continue reading

Put Your Const Foot Forward

Here are two C++ style habits that I recommend. Neither is earth-shattering, but both have a benefit that I find useful. Both relate to the order in which constness shows up in your syntax.

1. When you write a conditional, consider putting the constant (unchangeable) value first:

if (0 == i)

… instead of:

if (i == 0)

The reason is simple. If you forget/mis-type and accidentally write a single = instead of two, making the expression into an assignment, you’ll get a compile error, instead of subtle and difficult-to-find misbehavior. (Thanks to my friend Doug for reminding me about this one not long ago.)

Ah, the joys of pointers… Image credit: xkcd.

2. With any data types that involve pointers, prefer putting the const keyword after the item that it modifies:

char const * VERSION = "2.5";

… instead of:

const char * VERSION = "2.5";

This rule is simple to follow, and it makes semantics about constness crystal clear. It lets you read data types backwards (from right to left) to get their semantics in plain English, which helps uncover careless errors. In either of the declarations of VERSION given above, the coder probably intends to create a constant, but that’s not what his code says. The semantics of the two are identical, as far as a compiler is concerned, but the first variant makes the mistake obvious. Reading right-to-left, the data type of VERSION is “pointer to const char” — so VERSION could be incremented or reassigned.

Use the right-to-left rule in reverse to solve the problem. If we want a “const pointer to const char”, then we want:

char const * const VERSION = "2.5";

That is a true string literal constant in C++. (Thanks to my friend Julie for teaching me this one.)

This might seem like nit-picky stuff, but if you ever get into const_iterator classes and STL containers, this particular habit helps you write or use templates with much greater comfort. It also helps if you have pointers to pointers and the like. (For consistency, I prefer to follow the same convention for reference variables as well. However, this is not especially important, since references are inherently immutable and therefore never bind to const.)

Action Item

Share a tip of your own, or tell me why you prefer different conventions.