% fortune -ae paul murphy

Literate programming

The interview between Andrew Binstock and Donald Knuth that I mentioned last week included this bit about literate programming:

Andrew: One of the few projects of yours that hasn't been embraced by a widespread community is literate programming. What are your thoughts about why literate programming didn't catch on? And is there anything you'd have done differently in retrospect regarding literate programming?

Donald: Literate programming is a very personal thing. I think it's terrific, but that might well be because I'm a very strange person. It has tens of thousands of fans, but not millions.

In my experience, software created with literate programming has turned out to be significantly better than software developed in more traditional ways. Yet ordinary software is usually okay - I'd give it a grade of C (or maybe C++), but not F; hence, the traditional methods stay with us. Since they're understood by a vast community of programmers, most people have no big incentive to change, just as I'm not motivated to learn Esperanto even though it might be preferable to English and German and French and Russian (if everybody switched).

Jon Bentley probably hit the nail on the head when he once was asked why literate programming hasn't taken the whole world by storm. He observed that a small percentage of the world's population is good at programming, and a small percentage is good at writing; apparently I am asking everybody to be in both subsets.

Yet to me, literate programming is certainly the most important thing that came out of the TeX project. Not only has it enabled me to write and maintain programs faster and more reliably than ever before, and been one of my greatest sources of joy since the 1980s - it has actually been indispensable at times. Some of my major programs, such as the MMIX meta-simulator, could not have been written with any other methodology that I've ever heard of. The complexity was simply too daunting for my limited brain to handle; without literate programming, the whole enterprise would have flopped miserably.

If people do discover nice ways to use the newfangled multithreaded machines, I would expect the discovery to come from people who routinely use literate programming. Literate programming is what you need to rise above the ordinary level of achievement. But I don't believe in forcing ideas on anybody. If literate programming isn't your style, please forget it and do what you like. If nobody likes it but me, let it die.

On a positive note, I've been pleased to discover that the conventions of CWEB are already standard equipment within preinstalled software such as Makefiles, when I get off-the-shelf Linux these days.

Here's the front page summary from the literate programming main web site:

I believe that the time is ripe for significantly better documentation of programs, and that we can best achieve this by considering programs to be works of literature. Hence, my title: "Literate Programming."

Let us change our traditional attitude to the construction of programs: Instead of imagining that our main task is to instruct a computer what to do, let us concentrate rather on explaining to human beings what we want a computer to do.

The practitioner of literate programming can be regarded as an essayist, whose main concern is with exposition and excellence of style. Such an author, with thesaurus in hand, chooses the names of variables carefully and explains what each variable means. He or she strives for a program that is comprehensible because its concepts have been introduced in an order that is best for human understanding, using a mixture of formal and informal methods that reinforce each other.

(from: Donald Knuth. "Literate Programming (1984)" in Literate Programming. CSLI, 1992, pg. 99.)

And, from Daniel Mall, quoted a bit later on the same web page:

The key features of literate programming are the organization of source code into small sections and the production of a book quality program listing. Literate programming is an excellent method for documenting the internals of software products especially applications with complex features. Literate programming is useful for programs of all sizes. Literate programming encourages meaningful documentation and the inclusion of details that are usually omitted in source code such as the description of algorithms, design decisions, and implementation strategy.

Literate programming increases product quality by requiring software developers to examine and explain their code. The architecture and design is explained at a conceptual level. Modeling diagrams are included (UML). Long procedures are restructured by folding portions of the code into sections. Innovative ideas, critical technical knowledge, algorithmic solutions, and unusual coding constructions are clearly documented.

Literate programs are written to be read by other software developers. Program comprehension is a key activity during corrective and perfective maintenance. High quality documentation facilitates program modification with fewer conceptual errors and resultant defects. The clarity of literate programs enables team members to reuse existing code and to provide constructive feedback during code reviews.

Organization of source code into small sections. The style of literate programming combines source code and documentation into a single source file. Literate programs utilize sections which enable the developer to describe blocks of code in a convenient manner. Functions are decomposed into several sections. Sections are presented in the order which is best for program comprehension. Code sections improve on verbose commenting by providing the ability to write descriptive paragraphs while avoiding cluttering the source code.

Production of a book quality program listing. Literate programming languages (CWEB) utilize a combination of typesetting language (TeX) and programming language (C++). The typesetting language enables all of the comprehension aids available in books such as pictures, diagrams, figures, tables, formatted equations, bibliographic references, table of contents, and index. The typographic processing of literate programs produces code listings with elegantly formatted documentation and source code. Listings generated in PDF format include hypertext links.
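The "small sections" idea in the passage above can be made concrete with a minimal sketch of the extraction step that literate programming tools call tangling. It is an illustration only, not CWEB itself: it assumes a noweb-style notation in which `<<name>>=` opens a named code section, a line holding only `@` closes it, and `<<name>>` inside code refers to another section. The function names and the sample document are hypothetical.

```python
import re

# Assumed noweb-style markers; CWEB uses a different notation.
CHUNK_DEF = re.compile(r"^<<(.+)>>=\s*$")      # opens a named code section
CHUNK_REF = re.compile(r"^(\s*)<<(.+)>>\s*$")  # refers to another section


def parse_sections(source):
    """Collect the code lines of every named section, skipping the prose."""
    sections = {}
    current = None  # None while we are reading prose
    for line in source.splitlines():
        definition = CHUNK_DEF.match(line)
        if definition:
            current = definition.group(1)
            sections.setdefault(current, [])
        elif line.strip() == "@":
            current = None  # "@" on its own closes a code section
        elif current is not None:
            sections[current].append(line)
    return sections


def tangle(sections, name, indent=""):
    """Expand section `name` recursively into plain source lines."""
    output = []
    for line in sections[name]:
        reference = CHUNK_REF.match(line)
        if reference:
            # Keep the indentation of the referring line for the expanded code.
            inner_indent = indent + reference.group(1)
            output.extend(tangle(sections, reference.group(2), inner_indent))
        else:
            output.append(indent + line)
    return output


# A tiny literate document: the explanation comes first, the scaffolding last.
DOCUMENT = """\
The program reports the mean of a list of vat temperatures.
We explain the arithmetic first, because that is what a reader cares about.

<<compute the mean>>=
mean = sum(readings) / len(readings)
@

Only then do we show the scaffolding that holds the pieces together.

<<report>>=
def report(readings):
    <<compute the mean>>
    return "mean temperature: %.1f" % mean
@
"""

if __name__ == "__main__":
    print("\n".join(tangle(parse_sections(DOCUMENT), "report")))
```

Tangling the `report` section yields an ordinary three-line Python function, even though the document introduces the arithmetic before the function that contains it. The companion step, weaving the same file into a typeset listing, is left out of this sketch.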

The approach these passages describe, it seems to me, points at part of what has to be the long-term right answer for development project management.

First, I've worked with an awful lot of people in development roles, and I've yet to meet someone who's both a competent programmer and less than fully literate in his or her first language. Indeed, the very best programmers I've met have uniformly been fluent in English and competent in at least one other language. As a result I've come to believe that someone who can't write a literate, multi-clause sentence isn't going to write good code either. Thus the idea of tying the two together, first writing the narrative and then embedding code in the action sections of the narrative, makes perfect, intuitive sense to me.

Second, I think this whole business of intermixing a human readable language with computer transformable language addresses only the code development and maintenance elements in the overall business process fired off when someone decides development is the right answer. Specifically, this idea needs to be extended to cover requirements specification, both prior to and during initial development and during the subsequent maintenance period.

As I said yesterday, my experience with hundreds of mid-range projects and a few big ones is that the normal requirements and specifications processes get in the way of producing effective applications - and that for interactive business applications the right answer is continuous prototyping with no hard break in the process once the thing goes into production.

The generalization of this to almost all applications is to write the manual first, the code second.

In a continuous prototyping environment the screens and actions you define interactively with users describe the business process - and so making the formal manual is a matter of printing out the screens and adding detail to the FYI. (Vision attaches a For Your Information summary line to every user-enterable field and shows it in a standard location as the mouse enters the field. On the screen included in yesterday's blog (click on it for the bigger picture) the FYI is at the bottom, accompanied by a "next screen" identifying the next step in the work process.)

If you're building code to control temperature in a brewing vat, writing the manual first means documenting the interfaces, documenting the hardware assumptions, and describing the API you're building to that hardware in terms of those interfaces to your plant control application - and, of course, the differences between the manual's description of what something does and the traditional specification's description of what it's supposed to do are merely matters of voice and tense.
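One way to picture "manual first" for the brewing vat case is the sketch below, in which the manual text is written as docstrings before any control logic exists and the printed manual is generated from those same docstrings. Everything in it is an assumption made for illustration: the `VatHardware` interface, the `hold_temperature` rule, the half-degree tolerance, and the `FakeVat` stand-in are hypothetical and describe no real plant control product.

```python
import inspect


class VatHardware:
    """Hardware assumptions: one temperature probe and one heater relay per vat."""

    def read_celsius(self):
        """Returns the vat temperature in degrees Celsius."""
        raise NotImplementedError

    def set_heater(self, on):
        """Switches the heater relay on (True) or off (False)."""
        raise NotImplementedError


def hold_temperature(hardware, target_celsius, tolerance=0.5):
    """Turns the heater on when the vat is more than `tolerance` degrees
    below `target_celsius`, and off otherwise. Returns the heater state."""
    heater_on = hardware.read_celsius() < target_celsius - tolerance
    hardware.set_heater(heater_on)
    return heater_on


def manual(*items):
    """Builds the manual from the docstrings, so narrative and code share one source."""
    return "\n\n".join(
        f"{item.__qualname__}\n{inspect.getdoc(item)}" for item in items
    )


class FakeVat(VatHardware):
    """Hypothetical stand-in for real hardware, used only to exercise the rule."""

    def __init__(self, celsius):
        self.celsius = celsius
        self.heater = None

    def read_celsius(self):
        return self.celsius

    def set_heater(self, on):
        self.heater = on


if __name__ == "__main__":
    print(manual(VatHardware, VatHardware.read_celsius,
                 VatHardware.set_heater, hold_temperature))
    print(hold_temperature(FakeVat(17.0), target_celsius=20.0))
```

The docstrings are phrased in the manual's voice ("returns", "switches"); a specification would say "shall return" and "shall switch", which is the voice-and-tense difference noted above.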

And that, I think, is the bottom line on literate programming: establishing an across the board equivalence between the usage narrative and the code is an idea that applies about equally well to continuous prototyping for interactive business applications and batch or control operations.


Paul Murphy wrote and published The Unix Guide to Defenestration. Murphy is a 25-year veteran of the I.T. consulting industry, specializing in Unix and Unix-related management issues.