I claim that by eliminating the C/C++-style dichotomy between headers and implementation, most modern programming languages have thrown out the baby with the bathwater.
If that sounds crazy, just hang with me for a minute.
I know my claim runs counter to popular wisdom; have a look at this thread on stackoverflow.com. Designers of languages like Python, Go, D, Ruby, and Java consider it a feature that developers don’t have two redundant pictures of the same functionality. This comment from the C# 5.0 specification is typical:
“Because an assembly is a self-describing unit of functionality containing both code and metadata, there is no need for #include directives and header files in C#. The public types and members contained in a particular assembly are made available in a C# program simply by referencing that assembly when compiling the program” (p 3).
Bad headers are a royal pain
It can be onerous to maintain the parallelism between a .h and a .cpp. And most C/C++ headers are managed so poorly that the benefits you might claim for them are theoretical rather than real. Three common antipatterns that I particularly detest:
1. Putting everything in one monster header.
This couples all details of the system together in a single giant hairball. It may be fine for a project with 2 or 3 classes, but for dozens or hundreds of classes, it’s a major problem, and it violates the small file rule.
2. Writing function prototypes without any parameter names, because it’s less typing:
int do_something(Foo *, bar *, char const *);
This is wrong-headed. It means that the only way you can understand how to call the implementation is to study the implementation itself, instead of just reading the header.
3. Mismanaging #includes.
When you #include stuff that a consumer of a header doesn’t really need, just for “convenience”, what you’re really doing is artificially coupling the system (see problem #1), making your build more fragile, subverting incremental builds, and making compile time longer.
When you leave out #includes because the .cpp files already #include what they need, you are making it harder to trace dependencies. If you ever hear that header X must be #included before header Y, you’re suffering the consequences of this antipattern.
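To make antipatterns 2 and 3 concrete, here is a minimal sketch of a header that avoids both. The names (widget_store.h, Widget, rename_widget) are hypothetical, invented for illustration, and the header and implementation are shown in one file only so the sketch compiles on its own.

```cpp
// ---- widget_store.h (hypothetical header) ----
#include <string>  // the header includes what its own declarations use, nothing more

class Widget;      // forward declaration: the interface only needs a pointer to Widget

// Antipattern 2 would read:
//   int rename_widget(Widget *, std::string const &, std::string const &);
// which leaves the caller guessing which string is which.

// With names, the contract is readable without opening the .cpp.
// Returns 0 on success, -1 if `widget` is null or `new_name` is empty.
int rename_widget(Widget *widget, std::string const &new_name,
                  std::string const &audit_note);

// ---- widget_store.cpp (hypothetical implementation) ----
class Widget {
public:
    std::string name;
    std::string last_note;
};

int rename_widget(Widget *widget, std::string const &new_name,
                  std::string const &audit_note) {
    if (widget == nullptr || new_name.empty()) return -1;
    widget->name = new_name;
    widget->last_note = audit_note;
    return 0;
}
```

Because the header pulls in <string> itself and forward-declares Widget, it can be #included first in any file, in any order, without dragging in the full class definition.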
What headers could be good for
One of the insights often attributed to Guido van Rossum (inventor of Python) is that code is meant to be read, by humans. This insight is enshrined in the Zen of Python (by Tim Peters) with the pithy statement that “Readability counts.” It reflects the same sentiment articulated by Martin Fowler, who said, “Any fool can write code that a computer can understand. Good programmers write code that humans can understand.” And it echoes the observation of the venerable C.A.R. Hoare: “The readability of programs is immeasurably more important than their writeability” (Hints on Programming Language Design, 1973).
These are not dumb guys.
So let me ask: Which is easier for a human to read and understand: a 50-line header, or an 800-line implementation?
This is the first baby that’s being thrown out with the bathwater. Think progressive disclosure: headers could dramatically simplify what a consumer of code has to wade through. If they worked right. If.
You might say that modern IDEs make this a non-issue. When you open an 800-line .java file, you get a treeview with all the methods in the class, and you can sort and filter them in any way you like.
I don’t buy it. The default views don’t hide details that are irrelevant for the consumer of the code, because they’re assuming you want to work on the implementation. That’s the file you opened, after all. You see all the private methods and members, all the nested inner classes, all the gobbledygook.
But let me ask another question: In Java or D or C#, if you make an innocuous change (say, renaming a private member or fixing a typo in a comment), who has to know?
The answer is: everybody!
Consumers of your code are bound to the code’s implementation, not its interface. Because those are not separate constructs, distributed builds are impractical, and SDKs for these languages must test the consumer’s compliance against compiled binaries.
You might say that since these languages compile so much faster than C++, recompiling has become painless. Fair enough. You might say that with JIT-compilation or interpreted languages, you can defer this problem until it goes away. But neither solution helps you if you have IP you must protect. For a C++ codebase, it’s possible to provide headers to a third party without giving away patents and trade secrets. In languages without headers, you’re up a creek. The best you can do is obfuscate, which may not satisfy the paranoid or the government regulator worried about export controls.
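One common way to ship a header without shipping the secrets is the opaque-pointer (pimpl) idiom. The sketch below assumes a hypothetical LicenseChecker class with a placeholder rule standing in for the proprietary logic; both halves appear in one file only so the example compiles by itself.

```cpp
#include <memory>
#include <string>

// ---- license_check.h: the only source a third party would receive (hypothetical) ----
class LicenseChecker {
public:
    explicit LicenseChecker(std::string const &product_id);
    ~LicenseChecker();
    // Returns true if `key` is valid for the product given at construction.
    bool is_valid(std::string const &key) const;

private:
    struct Impl;                  // declared here, defined only in the .cpp
    std::unique_ptr<Impl> impl_;  // consumers see a pointer, not the data layout
};

// ---- license_check.cpp: would ship only as a compiled binary ----
struct LicenseChecker::Impl {
    std::string product_id;
};

LicenseChecker::LicenseChecker(std::string const &product_id)
    : impl_(new Impl{product_id}) {}

// Defined here, where Impl is a complete type, so unique_ptr can delete it.
LicenseChecker::~LicenseChecker() = default;

bool LicenseChecker::is_valid(std::string const &key) const {
    // Placeholder rule; the real algorithm is what you would be protecting.
    return key == impl_->product_id + "-OK";
}
```

The consumer compiles against the first half alone. Private members can be renamed or rearranged inside Impl without touching the header at all.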
How headers ought to work
I can think of a way to have the best of both worlds: let implementers stop worrying about headers, and let consumers stop worrying about implementation:
Generate the headers.
Every time a compiler processes code, have it generate from the implementation a pure, simple interface that consumers can read. This is the basic idea behind Lazy C++, but if I were writing my own programming language, I’d take it much further:
- Have the compiler produce an “etag” or version stamp that unambiguously hashes the relevant header content, so consumers can identify a version to which they are bound. This etag should depend only on important details, not on comments or parameter names or other stylistic variations.
- Before replacing the old version of the header, have the compiler compare function/class signatures in old and new to see if compatibility has been broken. Distinguish between incremental additions (new functions don’t break compatibility with old clients) and changes (renaming a function or removing a parameter). Besides writing out an etag for the header, have the compiler write out an incremented version number, plus the oldest version number of the header that existing consumers would still be compatible with.
- Compile the headers (not just the impl) into the final binaries to facilitate semantic versioning.
- Make sure the language can identify preconditions, postconditions, and invariants unambiguously, so these can be documented (automatically) in the header.
- Distinguish between public and private comments (using something a little slicker than javadoc), so that comments from the implementation can be carried across to the header as well.
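The etag and compatibility ideas in the list above can be sketched in a few lines. This is only an illustration under loud assumptions: signatures are taken to be already normalized into strings (comments and parameter names stripped), an FNV-1a-style hash stands in for whatever a real compiler would use, and all names (SignatureSet, interface_stamp, classify, next_version) are hypothetical.

```cpp
#include <cstdint>
#include <set>
#include <string>

// Normalized signatures, e.g. "int do_something(Foo*,bar*,char const*)".
// Assumption: comments and parameter names were stripped before this point.
using SignatureSet = std::set<std::string>;

// FNV-1a-style hash over the signatures. std::set iterates in sorted order,
// so the stamp does not depend on declaration order in the source.
std::uint64_t interface_stamp(SignatureSet const &signatures) {
    std::uint64_t hash = 14695981039346656037ULL;  // FNV-1a 64-bit offset basis
    for (std::string const &sig : signatures) {
        for (unsigned char c : sig) {
            hash ^= c;
            hash *= 1099511628211ULL;  // FNV-1a 64-bit prime
        }
        hash ^= 0xFF;  // separator byte so {"ab","c"} and {"a","bc"} differ
        hash *= 1099511628211ULL;
    }
    return hash;
}

enum class Change { None, Additive, Breaking };

// Any old signature missing from the new set counts as breaking;
// extra signatures alone count as an incremental addition.
Change classify(SignatureSet const &old_api, SignatureSet const &new_api) {
    for (std::string const &sig : old_api) {
        if (new_api.count(sig) == 0) return Change::Breaking;
    }
    return new_api.size() > old_api.size() ? Change::Additive : Change::None;
}

struct HeaderVersion {
    unsigned version;            // incremented whenever the interface changes
    unsigned oldest_compatible;  // oldest version existing consumers can still use
};

HeaderVersion next_version(HeaderVersion current, Change change) {
    switch (change) {
        case Change::None:     return current;
        case Change::Additive: return {current.version + 1, current.oldest_compatible};
        case Change::Breaking: return {current.version + 1, current.version + 1};
    }
    return current;
}
```

A real compiler would have to be smarter than this: in C++, for instance, adding an overload can change which function an existing call resolves to, so "purely additive" is not always harmless. The sketch only shows the bookkeeping.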
I’m not sure headers like this would be worth creating for Python, which often has code that’s so simple it doesn’t need a summary. But for large, complex codebases, I think this would be a real boon. If you’ve ever had to wrestle your way through half a million lines of C or Java or C# without good headers, I’m guessing you know what I mean.