Codecraft

software = science + art + people

Mountains, Molehills, and Markedness

2014-07-28

In my previous three posts, I discussed the semantics of programming languages, introduced “marks” (a feature of the intent programming language I’m creating), and gave you a taste of how they work.

In this post, I’m going to offer more examples, so you can see the breadth of their application.

Aside

Before I do, however, I can’t resist commenting a bit on the rationale for the name “marks”.

In linguistics, markedness is the idea that some values in a language’s conceptual or structural systems should be assumed, while others must be denoted explicitly through morphology, prosodics, structural adjustments, and so forth. Choices about markedness are inseparable from worldview and from imputed meaning. Two quick examples:

  1. Chinese generally doesn't inflect tense/aspect, but when necessary, it marks utterances in the past or future using extra particles. Contrast "我吃" ("I eat" or "I am eating" or "I habitually eat", or even "I will eat" or "I ate", if the speaker considers tense/aspect irrelevant or knowable from context) :: "我吃了" ("I ate" [explicitly in the past]).
  2. In languages that have a grammatical gender, nouns are often marked to indicate a category that the linguistic community deems more semantically rich/interesting than unknown/neuter. Contrast English "I saw some lions" :: "I saw some lionesses"; Spanish "Vi algunos leones" :: "Vi algunas leonas"; and German "Ich sah einige Löwen" :: "Ich sah einige Löwinnen". In each case, the first form doesn't make any particular claim about gender, whereas the marked form does.

In all human language, meaning is powerfully influenced by patterns of markedness. We pay attention to marks. Whether we’re raising our eyebrows, selecting words with care during a debate, or straining to understand a friend on an iffy cell phone connection, we key off of their presence or absence. We do it intuitively and constantly.

Yet for all their power, marks are unobtrusive and cheap to use.

That’s a happy combination.

Markedness in programming languages

Of course, markedness already manifests in programming languages, even if you’re not using my “marks”. Depending on whether you’re in Java, C++, or Python, the default visibility of class members is package, private, or public; all other visibilities must be marked. Constness is marked in C++. Alignment of data structures, casting, partial template specialization, scope of closure variables, and many other features all embody markedness rules in one way or another.
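As a small illustration of the Python case, here is a sketch using a hypothetical `Account` class (the class and its fields are made up for this example). An unmarked attribute is public; a leading underscore marks "private by convention"; a double leading underscore is a stronger mark that triggers name mangling.

```python
class Account:
    def __init__(self, owner, balance):
        self.owner = owner        # unmarked: public by default
        self._notes = []          # single underscore: private by convention only
        self.__balance = balance  # double underscore: stored as _Account__balance

    def balance(self):
        # Inside the class body, the mangled name is resolved automatically.
        return self.__balance


acct = Account("Ada", 100)
print(acct.owner)                  # the unmarked attribute is freely accessible
print(hasattr(acct, "__balance"))  # the marked one is not visible under its plain name
print(acct.balance())
```

Note that Python enforces none of this strictly; the marks mostly communicate intent to human readers, which is rather the point.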

Unfortunately, the literalness of programming languages, and the fundamental assumption that the purpose of a language is exactly and only to embody instructions that get translated to machine code, has caused markedness to be mismanaged. I’ve already written at length about the semantic gap between code and human software development activities — the lacuna humana. That arises partly because of markedness problems; go back and read my blueprint for marks and see how markedness can’t propagate or evaluate without the infrastructure I describe.

Another consequence of markedness mismanagement is clumsiness and verbosity. Human languages are parsimonious; default cases tend not to be the marked ones. Even when marks do appear, they propagate meaning without ad nauseam repetition. But programming languages have historical baggage that flips markedness on its head: the threadsafe, bounds-checking, non-blocking, const-correct versions of features that we should use by default all require extra marks. Think sprintf_s, rand_r, the std:: namespace… Think smart pointers versus raw pointers. Think Hoare’s billion-dollar mistake. How many explicit assertions and preconditions have you written over the years, to sanity-check stuff that should always be true (if (myArg == null) throw Exception("Can't be null.")…), instead of writing code to allow a few corner cases?
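To make the null-check complaint concrete, here is a minimal Python sketch of the flipped arrangement: "can't be null" becomes the unmarked default, and only the corner cases that allow `None` carry a mark (here, an `Optional` annotation). The decorator name `non_null_by_default` and the `greet` function are hypothetical, and this is a runtime approximation, not how marks in intent work.

```python
import functools
import inspect
import typing


def non_null_by_default(func):
    """Reject None for every parameter unless its annotation admits None."""
    signature = inspect.signature(func)
    hints = typing.get_type_hints(func)

    # Parameters annotated Optional[...] are the explicitly marked corner cases.
    nullable = {
        name
        for name, hint in hints.items()
        if type(None) in typing.get_args(hint)
    }

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        bound = signature.bind(*args, **kwargs)
        bound.apply_defaults()
        for name, value in bound.arguments.items():
            if value is None and name not in nullable:
                raise ValueError(f"{func.__name__}: '{name}' can't be None")
        return func(*args, **kwargs)

    return wrapper


@non_null_by_default
def greet(name: str, title: typing.Optional[str] = None) -> str:
    # No hand-written null check for name; only title is marked as nullable.
    return f"Hello, {title + ' ' if title else ''}{name}"
```

With this in place, `greet("Ada")` and `greet("Ada", title="Dr.")` work, while `greet(None)` raises a `ValueError` without any check appearing in the function body.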

More Examples

Hopefully I’ve convinced you that markedness matters. I think it’s a mountain, rather than a molehill.

A mole hill with a glorious horizon behind it. Are those mountains, or just trees? :-) Image credit: Strep72 (Flickr).

But just in case I haven’t, here are more scenarios to think about. As you read these, keep in mind what you already know about marks: they have full access to the code DOM at compile time; they propagate in sophisticated ways; they can generate code; and they can attach to constructs that traditional code ignores, such as requirements, human teams, and so forth.

Final Thoughts

This post would be incomplete if I didn’t acknowledge the limitations of marks. My friend Trev Harmon (@trev_harmon) was asking me the other day how much I thought my ideas overlapped with the goals of the semantic web. Marks are not nearly that ambitious. Although they expand the scope of semantics in programming languages in important ways, they can’t turn code into a fitting conveyance for all human communication. They work well within the domain-specific language of software development.

Another of my friends, David Handy, pointed out that propagation of marks through a call graph gets problematic across closure boundaries and function pointers. That’s quite true, and I’m not sure how surmountable it is.
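A toy Python sketch shows where the trouble comes from. It assumes a mark spreads from a function to everything that calls it, and it models the program as a plain dictionary of direct calls; the function names and the call graph are hypothetical, and a real implementation would work on the compiler's own data structures.

```python
def propagate_mark(call_graph, marked):
    """Return every function that carries the mark, directly or via a callee.

    call_graph maps each caller to the set of functions it calls directly.
    """
    result = set(marked)
    changed = True
    while changed:  # repeat until no new function acquires the mark
        changed = False
        for caller, callees in call_graph.items():
            if caller not in result and callees & result:
                result.add(caller)
                changed = True
    return result


# Hypothetical program; only direct, statically visible calls are recorded.
call_graph = {
    "main": {"load_config", "run_job"},
    "load_config": {"read_file"},
    "run_job": {"apply"},       # passes callback to apply() as an argument
    "apply": set(),             # invokes its argument; no direct callee is visible
    "callback": {"read_file"},  # reached only through a function pointer
    "read_file": set(),
}

carriers = propagate_mark(call_graph, {"read_file"})
print(sorted(carriers))
```

The mark reaches `load_config`, `callback`, and `main`, but not `run_job` or `apply`, even though at run time `run_job` ends up in `read_file` by way of the callback. Closing that gap requires knowing which functions a pointer or closure can refer to, which is a much harder analysis.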

So marks can’t butter your toast, or write poetry. :-)

Still, I think they’re a useful innovation. I’m hoping that smarter minds than mine can pick up on the kernel of the idea and take it to cool new places I haven’t yet imagined. My friend David also pointed out some cool ways that marks could be used to gather statistics, which I had not considered. What else will you dream up? If you’re interested in collaborating, let me know. Also, I would appreciate you sharing this series of posts with people who don’t read my blog; I’m interested in broadening the conversation as much as possible.


Comments

  • Dennis, 2014-07-30:

    Just one little correction to the German example in section "Aside". You use the word "Löwin" as a genderless expression, but this is the female lion, whereas "Löwinnen" is the plural of "Löwin". If you want to correct it, it would be "Ich sah einige Löwen" :: "Ich sah einige Löwinnen". But nevertheless thank you for sharing your insights!

  • Daniel Hardman, 2014-07-30:

    Thank you so much for the catch, Dennis! I've updated the post.

  • David H, 2014-08-07:

    The additional use cases for marks that you have described here look much more challenging to implement than the ones you gave previously. The earlier use cases could have been implemented by examining a call graph and a DOM of the compiled code. But the use cases involving temporal boundaries, wouldn't those require more sophisticated flow analysis? So, if difficulty of implementation is not to be considered at all, then I can add some more use cases to your list. :) How about marking variables that contain direct user input? Or that contain data POSTed from the internet, etc? And then have the compiler make sure that you never use untrusted input e.g. to form SQL expressions, nor pass it to an exec() function call, etc. How about marking functions that should only be performed by users with admin privileges? How about marking functions that should only execute within a security sandbox? Or, conversely, marking those functions trusted to make changes outside of the sandbox, making everything else restricted? There are probably a lot more security-related rules that, if your marks system was fully functional as you described, could finally be enforced.

  • Daniel Hardman, 2014-08-07:

    Excellent notes, David. The one about direct user input is a great use case, and a piece of cake to implement — and I think it would even be possible to take most of the burden off of the coder, because functions that receive direct user input can be painted with a mark that propagates to anything that calls them in an assignment. This makes it so the very act of calling something like sscanf() can automatically cause the variables that get set to acquire the "direct user input" mark, without the coder lifting a finger. You are right that temporal propagation is harder to implement than some of the other ones. Maybe I'll have to defer that one if it proves too challenging — although I have some ideas about how it would work, and they seem feasible in my mind. I guess we'll see when I get there. At the moment I'm still in the early stages of lexing/parsing... The whole security angle is huge, and I'm hoping someone who's got a deep background in vulnerability analysis can chime in with wisdom.
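The "direct user input" idea from this thread can be approximated at run time in Python. This is only a sketch under stated assumptions: `Tainted`, `read_user_input`, and `run_query` are hypothetical names, and the check happens when the code runs, whereas the discussion above envisions the compiler doing it.

```python
class Tainted(str):
    """A string carrying a hypothetical 'direct user input' mark."""

    def __add__(self, other):
        # Concatenating onto marked text keeps the mark.
        return Tainted(str(self) + str(other))

    def __radd__(self, other):
        # Plain text followed by marked text also ends up marked.
        return Tainted(str(other) + str(self))


def read_user_input(raw):
    """Stand-in for an input source; whatever it returns acquires the mark."""
    return Tainted(raw)


def run_query(sql):
    """Stand-in for a SQL sink that refuses marked strings."""
    if isinstance(sql, Tainted):
        raise ValueError("untrusted input reached a SQL sink")
    return f"executed: {sql}"


name = read_user_input("x'; DROP TABLE users; --")
query = "SELECT * FROM users WHERE name = '" + name + "'"
print(isinstance(query, Tainted))  # the mark survived both concatenations
```

Calling `run_query(query)` raises, while `run_query("SELECT 1")` goes through. The coder never marked `name` or `query` by hand; the mark came from the input function. The sketch also shows the limits of doing this in a library: other string operations (formatting, joining, slicing) return plain strings and silently drop the mark, which is the kind of hole that language-level propagation would be meant to close.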