Lacunas Everywhere

I’m told that in Czech, the word “prozvonit” means “to call a mobile phone and let it ring once so that the other person will call back, saving the first caller money.”

Image credit: AstridWestvang (Flickr)

How would you translate this word for someone in New Guinea who has never experienced electricity, let alone a telephone or a bill from Verizon? You wouldn’t. This is an example of a “lacuna”–a translation problem caused by semantic gaps in a target language. Lacunas occur in programming languages, too. You might know a few; maybe you wish C++ had Python-style generators–or that Java had Haskell’s notion of pure functions–or that C supported PHP-style string interpolation. But what if I told you that the semantic misalignment between any pair of programming languages amounts to minor details? What if I claimed that all the programming languages I’ve used have numerous, pernicious, and expensive semantic gaps? That we don’t see these gaps for the same reasons that a stone-age hunter-gatherer fails to notice his inability to discuss patterns of cell phone usage? Would you think I’m crazy?
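
To make one of those gaps concrete, here is a minimal sketch: the kind of lazy sequence Python expresses in a three-line generator, approximated by hand in pre-coroutine C++. The class name CountUp is mine, purely for illustration, and newer C++ standards add coroutines that narrow this particular gap.

    // In Python, a lazy infinite counter is three lines:
    //
    //   def count_up(start):
    //       while True:
    //           yield start
    //           start += 1
    //
    // Classic C++ has no direct equivalent; you carry the suspended state yourself.
    #include <cstdint>
    #include <iostream>

    class CountUp {
    public:
        explicit CountUp(std::int64_t start) : next_(start) {}
        std::int64_t operator()() { return next_++; }  // each call "yields" one value
    private:
        std::int64_t next_;  // the state a real generator would keep for you
    };

    int main() {
        CountUp gen(10);
        for (int i = 0; i < 3; ++i)
            std::cout << gen() << '\n';  // prints 10, 11, 12
    }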

Encapsulation isn’t just for code

When computer science folks talk about encapsulation, they are usually thinking of how the principle applies to objects and functions inside a codebase. Best practice calls for a separation of concerns–each object responsible for one type of work, hiding all details from its neighbors.

That’s great. But it’s not the only way encapsulation ought to show up in software.

In actual deployment, software packages often manifest anti-patterns in the way that they are configured. A web server has to know all about three different database servers that contribute data for its pages; HA failover scripts must know the identity and responsibility of every actor in the system, as well as many particulars about how these entities use resources to accomplish their tasks.

No wonder our deployments are fragile and high-maintenance…

The cloud computing wave is raising the bar for encapsulation in the way applications–not just objects–discover and interact with one another. In this week’s installment of my series of posts about how to “cloudify”, I discuss how role-based interactions insulate components from details they don’t need to know. It’s encapsulation all over again. And this encapsulation pattern manifests itself in unlikely places–like the order queue at McDonald’s…

What can McDonald’s teach a developer of cloud-friendly software? Photo credit: phogel (Flickr)
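
To give “role-based” a little more shape before the next installment, here is a minimal C++ sketch of the difference between knowing machines and knowing roles. The names PageDataSource and discover_by_role are hypothetical, chosen just to show the idea; a real system would back the discovery call with a registry, DNS, or a configuration service.

    // Brittle: the web tier hard-codes the identity of every backing store.
    //   db_products  = pgsql://products-db-1:5432
    //   db_sessions  = pgsql://sessions-db-1:5432
    //   db_analytics = mysql://analytics-db-1:3306
    //
    // Role-based: the web tier asks for whoever currently fills a role and
    // knows nothing else about them.
    #include <memory>
    #include <string>
    #include <vector>

    struct PageDataSource {                        // the "role" the web tier cares about
        virtual ~PageDataSource() = default;
        virtual std::string fetch(const std::string& key) = 0;
    };

    // Stub for illustration; a real deployment would consult its registry,
    // DNS, or configuration service here.
    std::vector<std::shared_ptr<PageDataSource>> discover_by_role(const std::string& role) {
        (void)role;
        return {};
    }

    void render_page(const std::string& key) {
        for (const auto& source : discover_by_role("page-data")) {
            // The web tier neither knows nor cares whether this is Postgres,
            // MySQL, or a cache sitting in front of either.
            std::string fragment = source->fetch(key);
            (void)fragment;  // ... compose the page ...
        }
    }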

Stay tuned for further installments of this series each Friday. As I said in Part 1, I believe that a competence with cloud–cloud-oriented programming, if you will–will be a checkbox on future tech resumes.

2 Surprising Truths About The Iron Triangle

Project management 101 teaches that, when managing outcomes, you cannot alter scope, schedule, or cost (resources) without affecting at least one of the other dimensions. This interrelationship is known colloquially as the “Iron Triangle.” Sometimes we put “quality” in the middle to show how it is unavoidably shaped by choices on the other constraints:

Image credit: John M. Kennedy T (Wikimedia Commons)

Lots of Dilbert cartoons derive their humor from the unwillingness of the Pointy Haired Boss (PHB) to acknowledge this relationship. These cartoons are funny because they are so eerily similar to conversations we’ve all had, where someone wants us to deliver ultra-high quality, on a limited budget, in an aggressive timeframe, with a boatload of features.

It ain’t gonna happen, folks. We engineers are clever, but we’re not magicians. Triangles don’t work that way.

You’ve learned some good principles when you can articulate this geometry lesson.

But there’s more.

Truth 1: Scope is a trickster

Many well-meaning managers and executives understand this trilemma, and they distance themselves from Dilbert’s PHB by acknowledging that something has to give. “I pick scope,” they’ll say. “We absolutely must have the product before the summer doldrums, and we only have X dollars to spend, but I’m willing to sacrifice a few features.”

This can give product management heartburn–feature sets sometimes hang together in ways that make slicing and dicing dangerous. An airplane that’s good at takeoffs but that can’t land is unlikely to be a commercial success. Good product managers will point this out, and they’ll be right.


Good fences make good neighbors

In Robert Frost’s poem, “Mending Wall”, two farmers meet each spring to rebuild the rock wall between their properties. One farmer is the narrator. He notes that the unseen forces of winter and weather always cause some decay (“something there is that doesn’t love a wall”), and he wonders why the wall is necessary. There’s an apple orchard on one side, and a pine forest on the other–it’s not as if something will be kept in or out. The other farmer answers with the repeated aphorism “good fences make good neighbors.”

photo credit: DragonWoman (Flickr)

This poem could be a treatise for the principle of encapsulation in software. In software as in life:

  • Something there is that doesn’t love a wall.
  • Good fences make good neighbors.

What doesn’t love a wall?

Subroutines, formal interfaces, data hiding, class hierarchies, the pimpl idiom, and similar mechanisms all create barriers in software between consumers and providers of functionality. These techniques are well known, but we still have codebases littered with protected data members, unnecessary class declarations in headers, goto, and other suboptimal choices.
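
As a refresher on what one of those barriers buys you, here is a minimal sketch of the pimpl idiom (header and source are shown together for brevity, and Widget is a placeholder name): consumers compile against a small public surface, while the implementation details can churn behind it without touching them.

    // widget.h: all that consumers ever see.
    #include <memory>

    class Widget {
    public:
        Widget();
        ~Widget();                    // defined below, where Impl is complete
        void draw();
    private:
        struct Impl;                  // details live behind the wall
        std::unique_ptr<Impl> impl_;
    };

    // widget.cpp: private details can change freely without recompiling consumers.
    struct Widget::Impl {
        int cached_width = 0;         // implementation state nobody else sees
    };

    Widget::Widget() : impl_(std::make_unique<Impl>()) {}
    Widget::~Widget() = default;
    void Widget::draw() { ++impl_->cached_width; /* ...real drawing here... */ }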

Why?

Small Files Are Your Friends

Yesterday I was discussing refactoring priorities with a colleague who’s a brilliant engineer, and I happened to mention my strong desire for smaller files in our codebase. I told him that I thought .h and .cpp (or .py or .java or .whatever) files with thousands of lines were a problem.

He asked me why.

He told me that he wasn’t opposed to the idea, but he always felt like it was more of a stylistic choice than a true imperative for good code. And he was curious to see if I could convince him differently.

After I pondered his question for a while, I realized that some of my opinion really is traceable to prejudice. I usually use IDEs instead of vim/emacs, and I think that promotes click-back-and-forth-and-hyperlink-in-many-little-files instead of open-a-big-file-and-scroll. My compatriots who are more console-centric are just as smart and effective–maybe more. So I’ll write that part off.

However, I also found some arguments for the small-file principle that feel more substantive. Small files are your friends.

More small friends. Photo credit: miguelandresen (Flickr)

Named scopes and cognitive complexity

The case for small functions is more discussed than the case for small files, and it has been made by almost every luminary in computer science. My colleague immediately conceded it, and I won’t repeat it here–but I will claim that many of the same arguments apply to files as well, because files, like functions, are important named scopes in software development. This in turn suggests some constraints on files with respect to cognitive complexity.

Studies of memory and human attention consistently demonstrate that we think best about small sets. This fact is reflected by the amount of detail visible within any given named scope, both in programming and in other thought tasks. How many top-level menus in the average application? Colors in most cultures’ divisions of the rainbow? Parameters in an easy-to-understand function? Sections in the average book store? Steps in easy-to-follow driving directions? (There’s a whole field called cognitive ergonomics that explores why these questions always have similar answers.)

How many functions should we put in a reasonable file?

For me, 2 or 5 or 10 feels tractable. 50 feels excessive.

If a “good function” also respects the cognitive complexity constraints of the human brain–not being too big to read in a screen or two, for example–then you end up with a reasonable upper boundary on file sizes of, maybe, 500 or 1000 lines (call it ten functions of 50 to 100 lines apiece). (See Steve Yegge’s insightful rant about code size being an engineer’s worst enemy. He focuses on codebase size, but much of what he says applies just as well at the next level down.)

I suppose that this argument is weakened by the features of some IDEs, which collapse tangential code blocks, display treeviews of functions, and support lots of hypertext-style navigation. But not all programmers use the same IDEs, and not all interactions with code are IDE-driven; file size remains relevant. There’s a reason why C# introduced partial classes to relax Java’s whole-class-in-a-single-file constraint…

When humans try to remember more than their brains can fit, stuff falls out. Big files mean that coders have to mentally model relationships between stuff that’s separated by way too much screen real estate. This is a recipe for bugs. It is also a serious impediment to learnability.

Loose coupling and encapsulation

Files are a natural unit of coupling. In most programming languages, you can declare a construct (a variable, an internal function, or a class) within a file and have that construct be invisible to the outside world. This means there is a built-in temptation for functions and classes to bind more tightly when they’re in the same file, because they have access to common but private knowledge. By breaking large files apart, you remove the temptation, break unnecessary dependencies, and promote looser coupling.
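
A small C++ illustration of the temptation (the file and function names are invented): two functions in the same .cpp can quietly share file-private state that nothing outside can see, and that shared secret is what binds them together.

    // parser.cpp (hypothetical): both functions lean on the same hidden state.
    #include <cstddef>
    #include <string>
    #include <vector>

    namespace {                                      // file-local: invisible to other files
        std::vector<std::string> g_token_cache;      // the shared, private knowledge
    }

    void tokenize(const std::string& line) {
        g_token_cache.push_back(line);               // ...real tokenizing elided...
    }

    std::size_t tokens_seen() {
        return g_token_cache.size();                 // silently depends on tokenize()
    }

    // Move either function to its own file and the hidden dependency has to
    // become an explicit interface, which is exactly the point.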

Another way to say this is that file boundaries are an encapsulation barrier. Use them to hide data. (See my recent post about encapsulation as a simplicity strategy.)

Code reuse and testability

A consequence of files hiding data is that when you have a function that might be useful in a dozen different modules, but the function is buried in a large file with lots of dependencies extraneous to that function, reuse and testability are both frustrated. If the function is in a file of its own, it’s more discoverable, and it’s reusable and testable without extra baggage.
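
Here is a tiny sketch of the difference (parse_duration and the file names are invented): when a function lives in its own small file, a test can pull it in with nothing extra to build, link, or mock.

    // duration.h
    #include <string>
    int parse_duration(const std::string& text);     // e.g. "90s" -> 90

    // duration.cpp (trivial body, just for the sketch)
    int parse_duration(const std::string& text) { return std::stoi(text); }

    // duration_test.cpp
    #include <cassert>
    int main() {
        assert(parse_duration("90s") == 90);         // no extra baggage comes along
        return 0;
    }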

Link optimization

A C/C++ corollary to the file boundary issue has to do with linkers and binary sizes. In many cases, linkers remove unused functions at the compilation-unit level, rather than at the level of individual functions. A .c or .cpp file is either in or out, as a unit. This means that if you have a .cpp file with 50 functions in it and you call only 1 of them, all 50 get linked into the final binary. The result is bloated binaries. So: smaller .cpp files ==> smaller binaries. (Before you flame me about linker optimizations, I will admit that some linkers get more granular, depending on which switches you use. But it’s surprising how hard it is to do better than what I’ve described. Experiment and comment with your results.)
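
For anyone who wants to experiment, this is roughly what the “more granular, depending on which switches you use” path looks like with GCC or Clang; kitchen_sink.cpp and libsink.a are made-up names for the sketch.

    # Suppose kitchen_sink.cpp holds 50 functions and main() calls one of them.
    g++ -c main.cpp kitchen_sink.cpp
    ar rcs libsink.a kitchen_sink.o
    g++ -o app main.o libsink.a                      # the whole member (all 50) comes along

    # Opting in to finer granularity: one section per function, then let the
    # linker discard the sections nothing references.
    g++ -ffunction-sections -fdata-sections -c kitchen_sink.cpp
    ar rcs libsink.a kitchen_sink.o
    g++ -Wl,--gc-sections -o app main.o libsink.a    # unreferenced functions dropped

    # Splitting the big file gets a similar effect with no special switches:
    # when code is packaged as a static library, each small object file is
    # pulled in only if something in it is actually referenced.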

Counter Argument

I suppose you could argue that by making lots of small files, you’re creating more complexity in directories, in makefiles or projects, and so forth. Is 250 files in a folder worse than 15? Doesn’t that violate the “cognitive complexity” guideline above?

My comeback is: use packages or subdirectories or libraries (another level of management). You can’t subdivide forever, but you don’t need to.

The bottom line for me is experiential, not theoretical. I nearly always have cruddy experiences in code bases where large files are common. Small files don’t guarantee pleasant and productive work, but big ones seem to go hand-in-hand with other problems. I find it telling that codebases with big files are also codebases where people lament the lack of comments the most, for example. Over the years, I’ve become convinced that a simple rule of thumb about keeping files small will pay off more handsomely than almost any other coding best practice.

Action Item

Leave a comment to tell me what you think. Am I making a mountain out of a molehill? Or do you feel strongly about small file sizes as well? Have I omitted any important pros and cons from the discussion?