software = science + art + people
2013-09-19
It’s the start of another school year, and my seventh-grade son is learning algebra. As I sat beside him to coach him through some homework the other night, I shared my favorite bit of wisdom about how to make math problems—even complex ones—simple and error-free:
Write the progression from known to unknown, one step at a time.
In my experience, the surest recipe for disaster is to short-circuit this rule. Collapse a few steps in your head in the name of efficiency, and you’ll forget a minus sign, or you’ll group incorrectly, or you’ll lose track of an exponent or an absolute value, and you’ll end up with a mess. You’ll have to debug your solution by slogging back through the problem from the beginning until you figure out where you went wrong.
It’s interesting, and maybe profound, how nicely this piece of advice maps onto the design principle of progressive disclosure. The human mind is simply wired to perceive in broad outlines, and then to gradually clarify, a few details at a time.
Don’t believe me? Try a short experiment: draw this fractal.
I’ll bet that instead of laying down every pixel, like a printer, you immediately produce a simplification that captures the general shape as lines, with a lot of detail suppressed. You did this as a kid, when you drew stick figures and triangle+half-circle sailboats.
Artists sometimes squint to blur out what they don’t want to see, leaving only general patterns and colors. But coders never do, because we don’t expect code to work that way.
Drowning in details
I think this is one of the flaws in most programming languages I know: they immediately plunge you waist-deep into implementation details that hide the forest among the trees. To see what I mean, try this experiment, which I did a couple weekends ago:
I did this with the Julia programming language, but you could download Mozilla or Apache or Hadoop or anything else that sounds interesting, and I think you’d have similar results: I predict that, like me, you’ll quickly be overwhelmed with questions. Which of three likely entry points is the true top of the call graph? What do all these #ifdefs mean, and which ones are going to be active in my build? Which parts of the code have dependencies I need to understand?
This is the equivalent of trying to solve the entire algebra problem by holding all the transformations of an equation in your head at the same time. It can be paralyzing. None of the techniques that are popular in programming communities today give satisfactory answers to this problem: not BDD or TDD, not golden threads, not design docs, not javadoc/doxygen, not aggressive commenting, not ER or UML diagrams, not architecture description languages.
This need for context, for a high-level picture, for a sketch that gives you a useful skeleton of a mental model, is the reason why any new hire into a team with a complex codebase gets a whiteboard-ish orientation from smart teammates. We intuitively know we need it, and we’ll never get it from the code itself. On a codebase that I currently own, it is a major reason why the team hates the code and tells horror stories about its learning curve.
There’s just no easy entrée.
What’s in a book?
You could claim that all is for the best in this best of all possible worlds (nod to Pangloss). After all, teams do provide overviews and walkthroughs and design docs, and sooner or later, we get through the learning curve. But I think code could do a much better job of communicating, if we raised the bar.
Think about books for a minute. Hopefully you bought a print book recently enough to remember what it was like to pick it up and consider reading. How did you decide? If you’re like me, you probably read a blurb on the front or back cover. Did you glance at the foreword or preface? Did you read the first page to see if it felt interesting? Did you scan a table of contents? Did you flip to an index to see everywhere that your favorite subtopic was referenced?
Code has only weak parallels for these broad-brushstroke mechanisms. In some sense, the main() routine is like chapter 1, but what comes after that may quickly become indecipherable, especially if you’re doing OOP or AOP or event-driven programming. You might liken Java packages to a rough structure, but I think that in practice, they only deliver mediocre value because they tend to group by topic, not by code flow or by structural role. Headers insulate you from some details of an implementation, but they don’t give you a gestalt. Tests as a form of documentation are helpful, but they make it even harder to distinguish major themes from trivia. IDEs give you tree views, but trivia is intermixed with overarching concepts. Nowhere in a codebase do you typically find an explicit discussion about which constructs matter at install time, or which ones are important during startup but not during the later lifetime of the program.
Toward utopia
I don’t think this lack-of-a-big-picture problem can be solved with a single silver bullet. But here are a few ways that a better programming language/environment/ecosystem might make things better:
    // normal function, not stubbed
    int x = do_something();

    // stub declared, assigned a placeholder return value,
    // and expected to compile with a working unit test
    // without me ever leaving my flow
    int y = do_something_else(a, b) stubbed with return 25;
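The inline stub above is a wished-for language feature, not something that exists. As a rough approximation in a language that does exist, here is a minimal Python sketch that uses a decorator to attach a placeholder return value to an unwritten function. The names `stubbed`, `STUBS`, `is_stub`, and `outline` are invented for this illustration, and the return value of 17 for `do_something` is arbitrary; only `do_something`, `do_something_else`, and the placeholder 25 come from the snippet above.

```python
import functools

# Hypothetical registry of unfinished pieces, so stubs stay visible
# instead of getting lost in the codebase (an assumption of this sketch).
STUBS = {}


def stubbed(returns):
    """Mark a function as a stub that always returns a placeholder value."""
    def decorate(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            # The real body is never called; the placeholder stands in for it.
            return returns
        STUBS[func.__name__] = returns
        wrapper.is_stub = True
        return wrapper
    return decorate


def do_something():
    # Normal function, not stubbed. The value 17 is arbitrary.
    return 17


@stubbed(returns=25)
def do_something_else(a, b):
    # Details deliberately postponed; the decorator supplies the result.
    raise NotImplementedError


def outline(a, b):
    # The high-level flow can be written and run before the details exist.
    x = do_something()
    y = do_something_else(a, b)
    return x + y
```

Calling `outline(1, 2)` runs the whole flow even though `do_something_else` has no real body yet, and `STUBS` lists what is still a placeholder. This covers only part of the wish: the stub lives at the function definition rather than at the call site, and nothing here generates a unit test.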
I have a few other ideas about how progressive disclosure might work in a better programming language, but I think I’ll stop there. I’m very curious to see if other smart people out there have good suggestions of their own.
Action Item
Tell me what you think would make it easier to perceive the rough behavior and structure of a big, complicated codebase in more efficient ways.
Comments
trevharmon, 2013-09-19:
Daniel, I know you and I discussed this at some length earlier, but I've come across another thought, and it deals with the different approaches we use when coding. Naturally, I can only use myself as an example. I wonder if some of this behavior comes from doing some form of bottom-up programming. What I mean is this: after the brainstorming that goes on when planning a project, do programmers naturally start by building the objects or by building the flow?
Let's compare writing a paper to writing a computer program. If I'm writing something small (e.g., for my blog), I'm likely to just do stream-of-consciousness writing. Whatever comes into my head as the next logical step gets written down without much thought. I do the same thing with quick-and-dirty scripts.
Now, when I'm writing something larger, my behavior changes. Even though the brainstorming may have resulted in many, many individual pieces, I start the actual writing process by defining a clear flow of ideas (usually through defining chapter/section/sub-section headings). The details are then filled in using the brainstormed material.
When coding a large project, I tend to do something different. Instead of starting with the overall flow, I go to creating the underlying objects. I build all of these infrastructure-providing pieces first. Later, I go back and add the logical flow. While both tasks are completed, it's clear my main focus is on the building blocks, not the overall flow.
Interestingly enough, on medium-sized programming projects, I do sometimes follow the pattern of doing flow first. That's seen in the progressive disclosure MAIN: section we were discussing. I have a section of code at the beginning of an obvious file that consists of the following:
- Initialization
- Functions and flow control
- Termination
In this case it's always clear what the flow is and everything else is built around it.
If I do this first and use good function names, those functions (in their call order) are then copied to later in the file to be used as the skeleton for the rest of the code. Obviously, these are not large projects and are heavily skewed to procedural programming paradigms. As stated, in larger code bases my behavior appears to be different. So, that's my long-winded way of agreeing with you that program flow and progressive disclosure are often not given the level of attention they deserve.
Daniel Hardman, 2013-09-19:
Interesting analysis, Trev. You've got me wondering if I do the same thing. I know I jump right into flow on small stuff, but I'm not sure what I do on large stuff; next time I embark, I'll turn my radar on. I wonder if the "create the objects first" behavior comes from a subconsciously recognized need to have useful tools at your disposal before exploring flow. Maybe we can't think about flow until we have the object-level constructs to work with...
Vladimir Starostenkov, 2013-09-20:
Daniel, I thought the post was mostly about projects with an existing code base. It looks like Trev starts from scratch or from a POC. That's a huge difference. Consider adding a few lines to the Linux kernel without being a professional in it. No way. You have to pick up a book that has this built-in “granularity slider”. You mentioned Hadoop. That's one more case. It's easy to read a couple of papers on the Map-Reduce paradigm, look through a book on Hadoop, and write your project on top of it. But you don't need to go through the Hadoop code at all. That is why Hadoop is popular. The same is true of Qt for a C++ developer. You don't have to know how it works to use it properly. You have to accept the paradigm. If you want to learn the core without the “granularity slider” - you'd better start from scratch. Qt has its audience not only because of its architectural integrity, but also because of its detailed documentation. One more interesting example is the SPARK project by the Berkeley AMP Lab. It gives a Scala programmer the ability to work with distributed data structures just the same way he does with local ones. That goes against "Artists sometimes squint to blur out what they don’t want to see, leaving only general patterns and colors. But coders never do, because we don’t expect code to work that way." Sorry :)
Daniel Hardman, 2013-09-20:
Vladimir: I had not thought about how codebases are often associated with books, and how those books provide the sort of "granularity slider" I was wishing for. That's a good insight! Thanks. I am somewhat familiar with SPARK, but I need to study it a little more. I think that it provides one kind of progressive disclosure, but not many of the other ones I want. Thanks for the thoughtful comment!