Lacunas Everywhere

I’m told that in Czech, the word “prozvonit” means “to call a mobile phone and let it ring once so that the other person will call back, saving the first caller money.”

Image credit: AstridWestvang (Flickr)

How would you translate this word for someone in New Guinea who has never experienced electricity, let alone a telephone or a bill from Verizon? You wouldn’t. This is an example of a “lacuna”–a translation problem caused by semantic gaps in a target language. Lacunas occur in programming languages, too. You might know a few; maybe you wish C++ had Python-style generators–or that Java had Haskell’s notion of pure functions–or that C supported PHP-style string interpolation. But what if I told you that the semantic misalignment between any pair of programming languages is a minor detail? What if I claimed that all programming languages I’ve used have numerous, pernicious, and expensive semantic gaps? That we don’t see these gaps for the same reasons that a stone-age hunter-gatherer fails to notice his inability to discuss patterns of cell phone usage? Would you think I’m crazy?

Symptoms

Well, how many of the following scenarios sound familiar? (If this list gets too long, just read a few–but I wanted to show you how pervasive the problem is…)

  • You’ve just written the definitive implementation of CIDR parsing (or printer detection, or timezone handling, or whatever), and you worry about somebody else re-inventing the wheel. (Imagine if you could assert that no functions having the same semantics and intent as yours get introduced in the future, without triggering a warning message and a judicious human override…)
  • You shipped prototype code (or a stub, or an ugly kludge) that was okay once upon a time, but should never have seen the light of day. (Imagine if you could tell a compiler that certain code is iffy, and run a test before release to guarantee that none of that existed in shipping code paths…)
  • You maintain design docs on a wiki, a network share, a CMS, or similar–and the relationship between these artifacts and the source code that embodies/implements them is stored nowhere except the heads of the dev team and maybe an occasional comment. As a result, coders refer to them only occasionally; they decay over time and may not have high ROI. (Imagine if your IDE could knit together all these sources, because your programming language could directly express the idea of attachments and hyperlinks… Imagine if you could implement a business rule such as “all dialogs in the UI must be linked to a usability eval plan” — and have tests fail where such relationships don’t exist…)
  • For security purposes, you need to know which public functions are callable by external entities, which contextual constraints would make them safe, and which parts of your code execute with which elevated privileges. You spend hours building a spreadsheet, but it’s out-of-date almost immediately. (Imagine that security semantics were directly declarable in code in such a way that the compiler and other tools could inspect them, warn about them, and report them on demand…)
  • You have methods that are only callable at certain phases in the lifecycle of an object. Your only mechanism for enforcing conformity is documentation, plus throwing exceptions/SIGABRT if they’re ignored. (Imagine if temporal semantics were declarable and enforceable…)
  • You write code that validates parameters at the top of a function, and then you regurgitate those same semantics in redundant javadoc-style comments, so human consumers of the function learn about the constraints without needing to see the impl. (Imagine if preconditions were treated as an essential characteristic of a function’s interface, making them as visible as parameter names. No doc necessary…)
  • You waste time writing boilerplate code that does nothing more than assign args to member variables in constructors: this.size = size; this.color = color. (Imagine if you could simply note a “copy args” intention and have the compiler generate the rest–automatically updating the assignments as code evolves…)
  • You write comments that look like this:
    // IF YOU EVER MODIFY THIS ARRAY, MAKE SURE YOU ALSO
    // MODIFY THE HANDLING ROUTINES IN xyz AND abc !!!

    (Imagine if semantic coupling were directly expressible to a compiler…)

  • You write code that forwards parameters (static factory method takes args A, B, and C, then calls constructor with args A, B, and C). The names and constraints on parameters are identical in both contexts, but you have to repeat precondition code and javadoc for them as many times as they occur. (Imagine if you could simply note a correspondence and have the compiler forward and document semantics for you automatically…)
  • You’d like to enforce coding standards–formatting and naming conventions, maybe, but also trickier stuff, like “we strictly obey the Liskov Substitution Principle.” (Imagine if any given file or folder could hyperlink to formally defined conventions, and then the compiler would enforce them…)
  • You receive an edict from on high that, on all shipping projects, you can’t use any code with a GPL license–or you are asked to evaluate the IP of an M&A candidate for all open source usage–or you want to know what attribution you should put in your product’s about box. You need to know which components and libraries have which licenses. You grep the code and hope your report is comprehensive and accurate, but you aren’t totally confident. (Imagine if the license for a piece of code could be directly expressed in the syntax of your programming language… Imagine if you could tell a compiler to refuse to use code with a license you don’t like…)
  • You’d like to identify code paths that are invoked by admins, and diff those against code paths that are invoked by less privileged users. Or you’d like to find code paths used by customer X. Or you’d like to guarantee that the latest round of testing exercises every function that’s been changed in the past 3 weeks. (Imagine if people, use cases, security profiles, and other “business concerns” could be associated with places in code… Imagine if, at compile time, you could generate unions and intersections of call trees based on arbitrary criteria that your team dreamed up…)
  • You endlessly fiddle with “signed/unsigned comparison” warnings, even though the two numbers you’re comparing invariably have ranges that are small and positive. (Imagine if the range of the operands in a comparison, rather than just their types, were known to compilers…)
  • You want a particular class to be threadsafe, and you go to considerable trouble to make it so, but you worry that later maintainers won’t understand key subtleties. (Imagine if you could assert thread safety, and the compiler would enforce it forever…)
  • For performance or scale reasons, you need to guarantee that a particular function, throughout its maintenance lifetime, never triggers file I/O, never uses the network, always runs faster than a reference impl, performs as O(n) with respect to a given parameter, etc. (Imagine that all standard library calls “knew” their resource usage characteristics. Imagine that a compiler could validate rich assertions about the call tree of any compilation target…)
  • You want to make a variable “read-only” or “final” — but partway through a function, rather than at declaration time. (Imagine if semantics could be attached anywhere, not just in declarations…)
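To make one of these scenarios concrete, here is a rough sketch, in Python, of how the lifecycle problem from the list above usually ends up looking today. The Connection class and its method names are hypothetical, invented purely for illustration. The point is that the "only callable while open" rule exists solely as a docstring and a runtime check; nothing in the method's signature expresses it.

```python
class Connection:
    """Hypothetical class: send() is only valid between open() and close()."""

    def __init__(self):
        self._state = "new"
        self.sent = []

    def open(self):
        if self._state != "new":
            raise RuntimeError(f"open() not allowed in state {self._state!r}")
        self._state = "open"

    def send(self, payload):
        # The lifecycle rule lives here as a runtime check, and again in the
        # class docstring. No tool can see it without executing the code.
        if self._state != "open":
            raise RuntimeError(f"send() not allowed in state {self._state!r}")
        self.sent.append(payload)

    def close(self):
        if self._state != "open":
            raise RuntimeError(f"close() not allowed in state {self._state!r}")
        self._state = "closed"


def misuse_is_caught_only_at_runtime():
    """Call send() in the wrong phase and report whether it was rejected."""
    conn = Connection()
    try:
        conn.send("hello")  # wrong phase: open() was never called
    except RuntimeError:
        return True
    return False
```

The misuse is caught, but only when the offending line actually runs. A caller reading the signature of send() learns nothing about the constraint, which is exactly the gap described above.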

I could go on. All day long, every day, developers around the world wrestle to codify constructs that really don’t map well onto the semantic space that their chosen language provides. Their language may be Turing-complete, but that doesn’t mean it’s semantically rich. The problem, I think, is caused by our industry undervaluing the human dimension of software development. We are taught to analyze and create context-free grammars. That’s a hard task, and perhaps we can be forgiven for thinking that once we get there, with a fast and robust compiler, a nice runtime, documentation, and other tools, we’ve mostly achieved the mandate of a programming language. Everything else goes in the comments. But programming languages have a dual audience (humans as well as compilers)–and the thinking, messy half of it gets neglected. We have a lacuna humana.

Workarounds Don’t Cut It

Perhaps you’re saying to yourself: “Language X has a way to solve problem Y.” At the micro level, I don’t necessarily disagree. I have written unit tests that (sort of) proved thread-safety in a codebase. I’ve created scripts that proved copyright/license compliance. I have found clever ways to enforce one or two high-value coding standards. I know about Ada’s numeric range types. I’ve decorated Python code in such a way that prototype code was discoverable, so we wouldn’t ship it. I’ve used @Override in Java. The const and constexpr keywords in C++ tell you something about thread safety. But collectively, I claim that today’s programming languages do a poor job addressing any needs that are not tied pretty directly to deciding what machine code gets put in a binary–even though the discipline of software development subsumes many other concerns. If you’re hoping to solve human problems, your coding tools are crippled by the narrow scope of the language they support. How much wasted time is attributable to issues like what I’ve listed?
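For readers who haven't seen this kind of workaround, here is a minimal sketch of what tagging prototype code with a decorator might look like. It is not the code referred to above: the prototype decorator, the registry, and the example functions are all hypothetical names chosen for illustration.

```python
# Registry of everything tagged as not fit to ship (a hypothetical convention).
_PROTOTYPE_REGISTRY = []


def prototype(reason):
    """Tag a function as prototype-quality; `reason` records why."""
    def decorate(func):
        func.__prototype_reason__ = reason
        _PROTOTYPE_REGISTRY.append(func)
        return func  # behavior is unchanged; only metadata is added
    return decorate


def find_prototypes():
    """Return (qualified name, reason) for every tagged function."""
    return [
        (f"{func.__module__}.{func.__qualname__}", func.__prototype_reason__)
        for func in _PROTOTYPE_REGISTRY
    ]


@prototype("hard-coded rate; replace before release")
def estimate_tax(amount):
    return amount * 0.07


def add(a, b):
    # Untagged code never appears in the registry.
    return a + b
```

A release script could call find_prototypes() and fail the build if the list is non-empty. Note the limits of the trick, though: it only sees modules that have been imported, and it says nothing about whether a tagged function is reachable from a shipping code path.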

Passing the Buck Doesn’t Cut It

Perhaps you’re saying to yourself: “Use the right tool for the right job. A programming language shouldn’t have the fuzzy jobs in the examples above.” Really? Almost every piece of code on the planet ends up having some kind of copyright/license associated with it. The license can be described in text, and it directly impacts how the software is produced and consumed–but the language of the software should only concern itself with classes and functions, and ignore this issue? A language shouldn’t be indicted for creating useless redundancy that undermines encapsulation and the accuracy of docs? A language is well designed, even if it generates tons of useless warnings, displayed redundantly, for all time, to all coders who work on a codebase? I’m not claiming that tech writers should create content in the same programming language as developers, or that graphic artists should start coding instead of photoshopping their icons. We don’t need to compile Gantt charts. I’m just saying that even within the domain of problems that software engineers usually own, our languages are too semantically barren to solve lots of real-world problems. This costs us real time and money.

Plugging the Gaps

It doesn’t have to be this way. If you’re following my blog, you know that I’ve been designing a new programming language. One of its most important innovations offers a quantum leap in semantic density, without lots of noise or bother. I’ll be explaining this feature, “marks,” in a series of follow-on posts. I hope you’ll subscribe or check back to see where I’m headed.

11 thoughts on “Lacunas Everywhere”

  1. David H says:

    Hi Daniel –
    You’ve definitely got me curious!
    Can you give us your reasoning why a whole new language vs. extending an existing language to solve your use cases? Because the problems you cite sound more like a wish list for additional features to a language, not complaints about features you want removed from any given language.
    Also, what you’ve written so far reminds me of Donald Knuth and his “literate programming”, which was intended to address your concerns about the human aspect of programming. If I understand correctly, in Knuth’s system you write a document whose primary language is English (or Spanish or whatever your human language is) and the compiler extracts embedded code out of it and creates the program. That way the code and documentation are never far apart. Comments on that?
    David H

    • David: the problem I’m highlighting doesn’t require a new programming language; it could easily be solved by extending existing ones. In fact, I even toyed with the idea of writing up a proposal for C++17… However, “extending” doesn’t just mean adding a library or package into the core runtime; it would require a change in assumptions about what we believe is valid content for code. For that reason, I suspect that existing languages won’t glom onto it.

      In my next couple posts, I’ll describe the solution. I’ve been slow to post, lately, but I’ll try to write them quickly so you’re not left hanging for too long. :-) I’m not trying to be mysterious; I’m just having a dickens of a time pulling half a dozen mental threads into a coherent tapestry.

      Your connection with Knuth is an interesting one. I’ve heard of literate programming, and I’ve read some stuff by him before, but I am not very familiar with the specific idea you describe, so now I’ve got a new homework assignment.
