Good Code Is Named Right

(Another post in my “What is ‘Good Code’?” series…)

A rose by any other name may smell as sweet, but in software, the names you choose have consequences.

Rosa berberifolia. Photo credit: I Believe I Can Fry (Flickr).

Names can confuse or cohere. In The Mythical Man-Month, Fred Brooks emphasizes the need for code to have “conceptual integrity.” He means that code should embody a unifying and consistent vision, with minimal distraction or dissonance. Names of classes, functions, applications, interfaces, resources in RESTful URLs: all reflect the code’s cohesiveness or its chaos.

I once worked with an engineer who liked to pull variable names out of the random hopper at the top of his brain: “apple”, “banana”, “ick”… Although his code provoked an occasional snort of amusement, it didn’t do much to guide later readers into a productive mindset.

One way I can distinguish a mediocre engineer from a great one is by the quality of their language, particularly the names they choose. Mediocre engineers are sloppy and inconsistent in their names, because they undervalue the way their code communicates to human beings. Mediocre engineers think that comments are for humans, and code is for computers. Code, like Java or C++ or Ruby, doesn’t communicate to computers at all, folks; it has to be turned into op-codes and 1s and 0s before a computer can use it! Code is human language. Comments are like parenthetical asides in normal human speech: needed occasionally, but annoying if they restate the obvious and distract from flow.

Good engineers understand this. It bothers them if something is called a “Controller” in the code, but it fails to implement IController. It bothers them if .ReadLine() doesn’t always read a line of text from a file; when they run across such a function, they are prone to rename it ReadUpToAFullLine() so the function’s semantics are obvious. If they implement a method that calculates a standard deviation, they are likely to name it something like calcStandardDeviation() instead of stdv() or calc(). (This is not about naming conventions, BTW. I don’t have a problem with short forms or whatever casing convention you prefer; I’m just emphasizing clarity.) Code from great engineers says what they mean, and means what they say. Notice how Martin Fowler (a great engineer) takes this for granted as he discusses an appropriate name for a class in Refactoring:

Does the price class represent an algorithm for calculating the price (in which case I prefer to call it Pricer or PricingStrategy), or does it represent a state of the movie (Star Trek X is a new release). At this stage the choice of pattern (and name) reflects how you want to think about the structure. At the moment I’m thinking about this as a state of a movie. If I later decide a strategy communicates my intention better, I will refactor to do this by changing the names.

Somewhere (maybe Scott Meyers?) I remember reading an expert’s lament about people naming classes FooManager, BarManager, etc. His point was that “Manager” says little or nothing about the class’s responsibilities. I agree (although I must admit I’ve written a few XManager classes in my time :-).
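To make the ReadLine() point concrete, here is a minimal Python sketch. The function name mirrors the rename suggested above; the max_chars parameter and the sample text are my own assumptions, chosen only for illustration.

```python
import io


def read_up_to_a_full_line(stream, max_chars):
    """Return at most max_chars characters, stopping after the first newline.

    The name admits what a bare read_line() would hide: the caller may get
    only part of a line back. (Hypothetical helper for illustration.)
    """
    return stream.readline(max_chars)


stream = io.StringIO("hello world\nsecond line\n")
first_chunk = read_up_to_a_full_line(stream, 5)     # stops at the size limit
second_chunk = read_up_to_a_full_line(stream, 100)  # stops at the newline
```

A caller who sees this name knows to check whether the result ends in a newline before treating it as a complete line.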

Truly great engineers take the language insight of good engineers one step further. Not only do they want clear and consistent names–they want their code to resonate to a unifying metaphor.

In the early days of ecommerce (I was writing CC processing stuff in about 1996), nobody talked about “shopping carts.” You just wrote code that accepted credit cards, and you kept track of what the user wanted to buy until they were ready to pay. You accumulated customer state in your session, or maybe your db, in whatever way you could cobble together. Messy. Once the shopping cart metaphor was introduced, it was easy to see how you could let a customer change quantities at the last minute, handle partial payments with different cards, apply discounts and coupons, and so forth.
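Here is a rough Python sketch of what the metaphor buys you. The class, its method names, and the prices are all hypothetical; the point is only that once the state has a name, operations like changing a quantity or applying a coupon have an obvious home.

```python
from dataclasses import dataclass, field


@dataclass
class ShoppingCart:
    """Hypothetical cart: one named home for what the customer wants to buy."""
    quantities: dict = field(default_factory=dict)    # item name -> quantity
    prices_cents: dict = field(default_factory=dict)  # item name -> unit price
    discount_cents: int = 0

    def add(self, item, unit_price_cents, quantity=1):
        self.prices_cents[item] = unit_price_cents
        self.quantities[item] = self.quantities.get(item, 0) + quantity

    def change_quantity(self, item, quantity):
        if item not in self.quantities:
            raise KeyError(item)
        if quantity <= 0:
            del self.quantities[item]  # zero or less means "take it out"
        else:
            self.quantities[item] = quantity

    def apply_coupon(self, discount_cents):
        self.discount_cents += discount_cents

    def total_cents(self):
        subtotal = sum(self.prices_cents[item] * quantity
                       for item, quantity in self.quantities.items())
        return max(subtotal - self.discount_cents, 0)


cart = ShoppingCart()
cart.add("book", 1500, quantity=2)
cart.add("pen", 200, quantity=3)
cart.change_quantity("book", 1)  # last-minute change of mind
cart.apply_coupon(500)
total = cart.total_cents()
```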

The power of metaphor in code is so pervasive that it may be invisible unless you’re looking for it. Good metaphor leaks from coders to their managers and marketers and support staff and tech writers–and because it explains so much, so clearly and concisely, the audience gloms onto it immediately. From there it leaks out to customers and the blogosphere, and we start taking it for granted. Which says more to you: “a software application that lets you pretend to be running a full OS with simulated hardware” or “virtual machine”? How about “self-replicating program that subverts the normal purpose of software” or “virus”?

Action Item

Find a place in code where comments are compensating for a class, function, or variable with a less-than-ideal name, and fix it.
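As a small illustration of the kind of fix I mean, here is a made-up before and after in Python. Both functions and their parameters are invented for this example.

```python
# Before: the comment does the work the names should be doing.
def proc(d, t):
    # d is a list of order totals in cents; t is the tax rate as a fraction
    return sum(d) * (1 + t)


# After: the names carry the meaning, so the comment can go.
def total_with_tax(order_totals_cents, tax_rate):
    return sum(order_totals_cents) * (1 + tax_rate)
```

The behavior is identical; only the reader's experience changes.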

Extra Credit

Find a place in code where you have a weak or inconsistent metaphor. List implications of that metaphor problem. Brainstorm improvements; if one of the improvements seems particularly helpful, implement it.

Decoupling Interfaces as Versions Evolve, Part 2

This is part 2 of a series. You can read part 1 and part 3 as well.

Alternative Approaches to Interface Versioning

Lublinsky wrote a great article about interface versioning a while back (see page 38 of this issue of Microsoft’s Architecture Journal). This describes the state-of-the-art thinking about interface versioning in the web services world. Essentially he recommends versioning each method in an interface separately. (Sounds a lot like Win32’s approach of adding …Ex to every function when the original behavior no longer sufficed…) This approach is based on the insight that many parts of an interface will be stable for long periods of time, and that the most common kind of change to an interface is an addition. By increasing the granularity of the versioning, incompatibilities are less likely to arise for spurious reasons. This solves the classic problem where a .wsdl describes a dozen classes, a client uses only the first three, and yet the client breaks when something in the fourth class changes. However, it proliferates .wsdls and points of presence.
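To show what per-method versioning might look like, here is a loose Python sketch. It is my own reading of the idea, not code from the article: the registry, the decorator, and the operation names are all hypothetical.

```python
# Each operation is registered and versioned on its own, so a change to
# get_price never forces a new version of get_stock.
OPERATIONS = {}


def operation(name, version):
    """Register a function under (name, version). Hypothetical helper."""
    def register(func):
        OPERATIONS[(name, version)] = func
        return func
    return register


@operation("get_price", 1)
def get_price_v1(sku):
    return 100


@operation("get_price", 2)
def get_price_v2(sku, currency="USD"):
    return (100, currency)


@operation("get_stock", 1)
def get_stock_v1(sku):
    return 7


def call(name, version, *args, **kwargs):
    """Look up exactly the operation version the client was built against."""
    return OPERATIONS[(name, version)](*args, **kwargs)
```

A client bound to get_stock version 1 is untouched when get_price gains a second version, but the registry now has three entry points to publish and maintain instead of one interface.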

Another important discussion of this issue is “A SOA Versioning Covenant”, by Rocky Lhotka. This is an excellent review of the problem. (Note that the Lublinsky article, which is newer, discusses the covenant idea briefly.) Essentially Lhotka recommends that all objects accept messages (parameter lists to functions, recast as documents or self-contained packages of information); since each logical function will always have the signature DoSomething(message), the need to version interfaces goes away as long as changes just involve new message types. Instead, the messages are versioned using schema capabilities. Lhotka further recommends changing from contract-oriented thinking (X is required) to a covenant (If you do X, I will do Y). This approach has some of the same benefits as the solution I propose in part 3, but it still relies on versioning a full interface rather than the subset someone wishes to use, and the difficulty of managing versions of messages is ignored.
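A minimal Python sketch of the message idea follows. The message classes and their fields are invented for illustration and are not taken from Lhotka's article; what matters is that the function signature stays fixed while the message types evolve.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GreetV1:
    name: str


@dataclass(frozen=True)
class GreetV2:
    # A newer message type adds a field; the function signature is untouched.
    name: str
    shout: bool = False


def do_something(message):
    """Covenant style: if you send a message I understand, I will handle it."""
    if isinstance(message, GreetV2):
        text = f"Hello, {message.name}"
        return text.upper() if message.shout else text
    if isinstance(message, GreetV1):
        return f"Hello, {message.name}"
    raise TypeError(f"unsupported message type: {type(message).__name__}")
```

Notice that the versioning problem has not disappeared. It has moved into the message types, and someone still has to manage GreetV1, GreetV2, and whatever comes next.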

Although both of these treatments (and the sources they cite in their own reviews of the problem) are nifty, they leave me unsatisfied. The bottom line is that I want to evolve interfaces whenever it makes sense, without worrying about breaking people — and I also want people who use my interface to be able to do so with confidence.

Tune in to part 3 of this series for my proposed solution.

Decoupling Interfaces as Versions Evolve, Part 1

This is part 1 of a series. You can read part 2 and part 3 as well.

The Goal

Software interfaces were invented to promote encapsulation and loose coupling. In theory this enables developing and deploying without undue interdependence, which is a very good thing.

“Why the ‘in theory’ caveat?”, I hear you saying. “Surely interfaces deliver on their promise…”

Well, yes and no. Interfaces certainly provide a nifty mechanism for information hiding if your scope of concern is a tidy programming problem over the horizon of one implementation. That’s just the sort of scenario that CS academics love to use to teach their acolytes.

But most commercial software development is done in a messier world. Versioning interfaces can cause enough headaches to water down their benefits considerably, and mainstream software development tools have not done enough to address the issue.

Immutability and Versioning

Current thinking on interface versioning calls for an interface to be immutable; each change to its semantics (as manifest in an .idl, a .h, or a .wsdl, for example) should cause a change to the interface number/name/guid. Consumers of an interface bind to a specific interface version to allow compile-time validation of interface usage. Modern IDEs typically leverage early binding to provide extra goodies like autocomplete, UML class diagrams, and doc comment generation.

This immutability is less than perfect. In non-ivory-tower development, it is common to alter the semantics of an interface dozens or hundreds of times during a given dev cycle as a team converges on the final implementation. Bob adds the DoNothing() and DoSomething() functions to IWidget on day 1, then realizes a week later that he also needs DoSomethingElse() for a corner case he hadn’t fully explored. In week 23, he decides to collapse both DoSomething functions into DoSomethingEx() because by then the differences between them feel like they should be generalized.

If all code were written by Bob as part of a single cohesive deliverable, this evolution would be uninteresting. But suppose that on week 15, Sally gets a snapshot of Bob’s .idl, and begins to build a new component to interact with IWidget. It is critical that Sally’s expectations about IWidget semantics line up with Bob’s.

What makes this ugly is that in today’s highly distributed, highly outsourced, complex projects, Bob may not actually know that Sally is using his .idl. He may think it’s okay to keep cheating on interface immutability. Either Bob has to be obsessive about versioning his interface with each change (ending up with IWidget497 by the end of the project), or else Sally is forced to communicate with Bob that she is using his interface and needs it to be stable. Neither alternative is very attractive.

Evolution Isn’t Always Forward

Best practice is usually to require that IWidget5 be a strict superset of IWidget4. Despite enthusiastic lip service, practical considerations force us to cheat here as well. A security vulnerability forces us to start encrypting the string we return from a function. A change to the underlying OS forces us to throw an exception on a function that used to be exceptionless. Over time the assumptions about semantics attached to an interface accumulate enough drift that it is impractical to ever treat an IWidget9 as an instance of IWidget2. How does Sally know when that threshold has been passed by Bob?
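Here is a small Python sketch of that kind of drift. Everything in it is hypothetical, and base64 is only a stand-in for real encryption. IWidget5 is a strict superset of IWidget4 as far as signatures go, yet a consumer written against IWidget4 quietly gets the wrong answer.

```python
import base64
from abc import ABC, abstractmethod


class IWidget4(ABC):
    @abstractmethod
    def get_label(self) -> str: ...


class IWidget5(IWidget4):
    # Only adds a method, so on paper it is a strict superset of IWidget4.
    @abstractmethod
    def get_color(self) -> str: ...


class OldWidget(IWidget4):
    def get_label(self):
        return "valve-7"


class NewWidget(IWidget5):
    def get_label(self):
        # Same signature, new semantics: the string is now encoded.
        # (base64 stands in for the encryption forced by a security fix.)
        return base64.b64encode(b"valve-7").decode("ascii")

    def get_color(self):
        return "red"


def show_label(widget: IWidget4) -> str:
    """Consumer written against IWidget4; it assumes a plain-text label."""
    return "Label: " + widget.get_label()
```

The type check passes for NewWidget, and nothing fails loudly. The label shown to the user is simply no longer readable.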

And What About Deployment and Upgrade?

If you want to tease out mistakes in interface versioning, just poke at the deployment and upgrade scenarios you’re going to support. Do you require that a central manager be at least as new as all the components it’s managing? Or worse, do you require the whole system to be at the same revision level? In theory, this should be unnecessary; producers (managees) are free to expose functionality in new interfaces that older consumers (managers) don’t know about, and consumers can progressively downcast until they find a mutually supported interface, so it ought to be possible to have free variation in versions. However, in practice in rich, interdependent fabrics of services, the same actor may simultaneously provide one interface while consuming another, and the intermingled dependencies often cause ISVs to force broader upgrades than a customer would like. My favorite recent, real-world example of deployment problems is the infamous IE7 dwmapi.dll problem (see also this useful discussion of the problem).
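The "progressively downcast" idea can be sketched in a few lines of Python. The interface classes and the helper are hypothetical; isinstance plays the role that an interface query would play in a COM-style system.

```python
class IWidget1:
    def do_nothing(self):
        raise NotImplementedError


class IWidget2(IWidget1):
    def do_something(self):
        raise NotImplementedError


class IWidget3(IWidget2):
    def do_something_else(self):
        raise NotImplementedError


class NewWidget(IWidget3):
    """A producer that is newer than the manager talking to it."""
    def do_nothing(self):
        return None

    def do_something(self):
        return "something"

    def do_something_else(self):
        return "something else"


def newest_supported(component, known_interfaces):
    """Walk the consumer's known interfaces, newest first, until one fits."""
    for interface in known_interfaces:
        if isinstance(component, interface):
            return interface
    return None


# An older manager that was built before IWidget3 existed.
MANAGER_KNOWS = [IWidget2, IWidget1]
chosen = newest_supported(NewWidget(), MANAGER_KNOWS)
```

This works nicely for one producer and one consumer. The trouble described above starts when the same component is a producer on one interface and a consumer on another.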

Traditional Approach – Pros and Cons

What Can Be Done?

So if interfaces don’t provide as much separation of concerns as we wish, how do we cope?

Well, one alternative to traditional interface versioning is to do “late binding”. Only the most general characteristics of language syntax are validated when code is written; whether a particular object has a particular property of a particular data type is not tested until code actually executes. This is how interpreted languages like Python, PHP, and JavaScript work. It provides tremendous flexibility, and it is often the solution of choice in the free-wheeling, ad hoc universe of general web apps. I am a big fan, in many cases. I love the way RESTful interfaces support ad-hoc connections, for example.
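A tiny Python example shows both the flexibility and the catch. The classes here are made up for illustration.

```python
class Circle:
    def __init__(self, radius):
        self.radius = radius


class Square:
    def __init__(self, side):
        self.side = side


def diameter(shape):
    # Nothing checks that shape has a .radius until this line actually runs.
    return shape.radius * 2


circle_diameter = diameter(Circle(2))

try:
    diameter(Square(2))
    failure = None
except AttributeError as error:
    # The mismatch surfaces only at run time, on the code path that hits it.
    failure = error
```

Any object with a radius works, whether or not its author ever heard of diameter(). Any object without one fails, but only when that call is actually made.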

But late binding is not a panacea. For one thing, late binding typically means that development tools can’t help you validate your usage very much. You end up writing and maintaining a lot of manual glue. For another, QA teams often push back against late-bound solutions because they increase the testing burden. Where a compiler could effectively validate millions of potential code paths at compile time for early-bound code, testers struggle to achieve similar coverage. Result: bugs discovered later in the process. There is also a cost in performance and robustness that typically deters ISVs building standard enterprise or consumer applications.

There are subtler costs as well. When you late bind, you still have to use the interface you ultimately invoke, and the knowledge about how to use it has to be baked into the code ahead of time. It may not be baked in in the same way — maybe you use reflection or GetProcAddress to find the DoSomething function you’re after — but to late bind an interface, you have to early bind all the logic that handles the cases where GetProcAddress fails.
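In Python, getattr is a reasonable analog of GetProcAddress, and a short sketch makes the point. The function and class names below are hypothetical.

```python
def call_do_something(component, argument):
    """Late bind to do_something, with the failure handling baked in early."""
    # Rough analog of GetProcAddress: look the function up by name at run time.
    do_something = getattr(component, "do_something", None)
    if callable(do_something):
        return do_something(argument)
    # This branch had to be designed and written ahead of time, even though
    # the binding itself is late.
    return f"unsupported: {argument}"


class ModernWidget:
    def do_something(self, argument):
        return f"did {argument}"


class LegacyWidget:
    pass
```

The lookup is deferred, but the name "do_something", its calling convention, and the fallback behavior are all fixed in the caller's source.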

Another disadvantage of late binding is that you introduce a new dependency — this time on the supporting infrastructure. Maybe you’re using a great SOAP toolkit for PHP and that toolkit makes it easy to late bind to a web service. But now you depend on your SOAP toolkit. What if another actor in your system doesn’t have the same version of the toolkit?

What we’d like is a mechanism that combines the predictability and robust tool support of the traditional approach to interface versioning with the flexibility of late binding to get the best of both worlds. In part 2 of this series, I’ll look at some approaches to that goal, and discuss why they still leave me unsatisfied. In part 3, I’ll offer my own solution.