On the Construction of the Foucault Literate Programming System

Huub de Beer

Version 0.1 — 2014/04/21

Bootstrapping Foucault

Introduction

Foucault extracts the code from literate programs written in markdown. There are three ways to add code to a markdown document:

indented code blocks: a sequence of lines indented with four spaces or a tab on top of the current indentation and enclosed by blank lines is treated as code. For example:
```
while true do
    puts "I am a code block"
end
```
inline code: text enclosed by backticks (`) is treated as a piece of code. For example, x = x + 1 if x < 100.
fenced code blocks: all lines between two 'fences' are treated as code. A fence is a line starting with at least three tildes (~~~) or backticks (```). The top fence is preceded by a blank line and the bottom fence is followed by a blank line. Furthermore, the bottom fence should use the same type of fence character as the top fence and be at least as long. For example:

start = '~~~'
while line =~ /^~~~.*/ do
    puts "line is a code line"
    line = next line
end

Foucault extracts only code inside fenced code blocks. This allows the literate programmer to write about code without adding it to the program, by using indented code blocks or, for a very short piece of code, inline code.

Foucault is written as a literate program; this document is the foucault program. There is one problem, however: how can foucault's code be extracted from this document with foucault? It does not yet exist!

The solution is called bootstrapping. We create a simple executable program that can extract code from this document that, itself, is not a literate program. Of course, once that program is created, it can be added to this document as well, allowing it to extract itself. In this section the bootstrapping process is described in three phases.

Phase 0: Simple Fenced Code Blocks

The simplest way to add a fenced code block is to use fences of exactly three tildes.

~~~
# say, like this.
~~~

To run a literate program, then, is to find a way to get the code out of the document and feed it to the compiler or interpreter. A simple way to go about that could be to inspect each line in the document. If the line is part of a code block, keep it; if not, discard it.

Let us interpret a document as a sequence of lines. We use a simple state machine with two states, in code block and out of code block, starting in out of code block. If we encounter a code block delimiter as defined above, i.e., a line starting with three tildes, we change state to in code block. In that state, every next line will be collected into the runnable program, until we encounter the next code block delimiter, which changes the state back to out of code block. The delimiter lines themselves are always discarded. Furthermore, all lines outside a code block will be discarded as well.

CODE_BLOCK_DELIMITER = /^~~~.*$/

def program_collector(document)
    program = []
    incode = false
    while not document.empty? do
        line = document.shift
        if line =~ CODE_BLOCK_DELIMITER then
            incode = ! incode
            next
        end

        program.push line if incode
    end

    program.join
end
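To make the behaviour of this two-state collector concrete, here is a minimal sketch that runs it on a small in-memory document instead of a file. The sample lines are made up for illustration, and the loop uses each rather than shift so that the sample array is left intact; otherwise the logic follows the description above.

```ruby
# The two-state collector, repeated so that this sketch runs on its own.
CODE_BLOCK_DELIMITER = /^~~~.*$/

def program_collector(document)
  program = []
  incode = false
  document.each do |line|
    if line =~ CODE_BLOCK_DELIMITER
      incode = !incode   # a delimiter toggles the state and is itself discarded
      next
    end
    program.push line if incode
  end
  program.join
end

# Hypothetical sample document: an array of lines, as File.readlines returns.
SAMPLE = [
  "Some prose.\n",
  "\n",
  "~~~\n",
  "puts 'hello'\n",
  "~~~\n",
  "\n",
  "More prose.\n"
]

puts program_collector(SAMPLE)   # prints: puts 'hello'
```

Only the line between the two delimiters survives; the prose and the delimiters themselves are dropped.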

Now, to test this simple program, we run it on this document. Running the resulting program on this document, in turn, should give the same result.

document = File.readlines ARGV[0]
program = program_collector document
puts program

Of course, this being the first document, the program to create a runnable program from it does not yet exist. To that end, we bootstrap foucault by copying all code in this document, by hand, to a separate file and running that.

Phase 1: sizing start and end

According to the pandoc manual, fenced code blocks are a bit more complicated than just lines of code enclosed by two lines starting with three tildes. For one thing, a fence may consist of more than three tildes. If so, however, the ending line should start with at least as many tildes, although more are allowed.

As code blocks cannot be nested (that just does not make sense), we can adapt the first phase bootstrapping program simply by remembering and testing the number of tildes of the starting line. We introduce two functions to determine if a line is a fence and, if so, what length that fence is.

DELIMITER = /^(?<fence>(?<prefix>~~~)~*).*$/

def is_fence?(line)
    DELIMITER.match(line)
end

def fence_size(line)
    DELIMITER.match(line)[:fence].size
end
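As a quick check of these two helpers, the following sketch applies them to a few sample lines. The sample lines are assumptions chosen for illustration; the helpers are repeated so the block runs on its own.

```ruby
DELIMITER = /^(?<fence>(?<prefix>~~~)~*).*$/

def is_fence?(line)
  DELIMITER.match(line)
end

def fence_size(line)
  DELIMITER.match(line)[:fence].size
end

# Hypothetical sample lines: two fences and two non-fences.
["~~~\n", "~~~~~ ruby\n", "~~ too short\n", "plain text\n"].each do |line|
  if is_fence?(line)
    puts "#{line.inspect}: fence of size #{fence_size(line)}"
  else
    puts "#{line.inspect}: not a fence"
  end
end
```

Note that fence_size should only be called on lines for which is_fence? holds, since match returns nil for any other line.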

Now, we adapt the program a bit.

def program_collector(document)
    program = []
    incode = false
    current_fence_size = 0

    while not document.empty? do

        line = document.shift

        if incode 

            if is_fence? line and fence_size(line) >= current_fence_size
                incode = false
                # End of the code block: ignore this line
                next
            else
                # Still in a code block: collect this line for the program
                program.push line
            end

        else
            # We're outside a code block

            if is_fence? line then
                incode = true
                current_fence_size = fence_size line
            end

            # the line is not part of the code, ignore it and go to the next one
            next

        end
    end

    program.join
end

document = File.readlines ARGV[0]
program = program_collector document
puts program

Observe how the program is getting harder to read and we have not incorporated all the fence related rules yet.
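The effect of remembering the fence size can be illustrated with a condensed sketch. The names collect_sized and TILDE_FENCE are hypothetical and the control flow is a compressed variant of the program above, not a copy of it. The example shows that a block opened with four tildes may contain lines of three tildes, which are then collected as ordinary code lines.

```ruby
TILDE_FENCE = /^(?<fence>~{3,}).*$/

def collect_sized(document)
  program = []
  open_size = nil                # nil means: outside a code block
  document.each do |line|
    match = TILDE_FENCE.match(line)
    size = match ? match[:fence].size : 0
    if open_size.nil?
      open_size = size if match  # opening fence: remember its length
    elsif match && size >= open_size
      open_size = nil            # closing fence: at least as long as the opening one
    else
      program.push line          # ordinary line, or a fence too short to close the block
    end
  end
  program.join
end

# Hypothetical document: a four-tilde block holding a three-tilde block.
NESTED = [
  "~~~~\n",
  "~~~\n",
  "x = 1\n",
  "~~~\n",
  "~~~~\n",
  "prose\n"
]

puts collect_sized(NESTED)
```

Here the three collected lines are the inner fences and the assignment between them; the outer fences and the prose are discarded.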

Phase 2: two kinds of fences

Tildes are not the only characters that can make up a fence: a line starting with at least three backticks is a fence as well. This usage of the backtick to denote code is different from denoting code inline with a single backtick.

So, the delimiter of a code block can consist of three or more tildes or backticks. However, a block should end with a fence made of the same symbol as the fence it started with. Besides keeping track of the size of a fence, we should also keep track of the type of fence.

We could extend the fence functions from the previous phase and adapt the original program, but the code will only get more complex. Instead, think about determining code blocks differently, not least because there is another way to denote code blocks in markdown altogether. What we want to know, given the current state and the current line, is this: do we collect this line as part of the program or not?

One way to go about it is to extend the state machine and add in what kind of code block we're dealing with. So, the initial state still is out of code, but to get into a code block, we've got two options: a backtick fence or a tilde fence. Besides keeping the state, we'll also keep track of the size of the fence. Let us build that state machine.

class CodeBlockDeterminator

    def initialize()
        to_start_state
    end

    def collect_line?(line)
        case @state
        when :fenced_block
            if is_fence?(line) and fence_size(line) >= @size and fence_type(line) == @type then
                 # recognized the end of this code block
                 to_start_state 
            else
                # We're still in a code block: collect this line
                return true
            end            
        when :start
            if is_fence?(line)
                # Start of a new code block
                @state = :fenced_block
                @type = fence_type line
                @size = fence_size line
            end
        end
        
        # not in a code block: don't collect this line
        return false
    end

    private

    def to_start_state()
        @state = :start
        @size = 0
        @type = :none
    end

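    # A fence is three or more of the same fence character (~ or `);
    # the backreference rejects mixed fences such as ~`~
    FENCE = /^(?<fence>(?<type>~|`)\k<type>{2,}).*$/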

    def is_fence?(line)
        FENCE.match line
    end

    def fence_size(line)
        FENCE.match(line)[:fence].size
    end

    def fence_type(line)
        FENCE.match(line)[:type]
    end

end

Using that state machine, the program becomes

def program_collector(document)
    program = []
    state_machine = CodeBlockDeterminator.new

    while not document.empty? do

        line = document.shift

        program.push line if state_machine.collect_line? line

    end

    program.join
end

document = File.readlines ARGV[0]
program = program_collector document
puts program
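To see the interplay of fence type and fence size, consider the following condensed sketch. The class name FenceTracker and its internal representation are assumptions made for illustration; it is a compact variant of the state machine, not the class above. In the sample, a block opened with four tildes is closed neither by backtick fences nor by a shorter tilde fence.

```ruby
class FenceTracker   # hypothetical name; condensed variant of the state machine
  # three or more of the same fence character, either ~ or `
  FENCE = /^(?<fence>(?<type>[~`])\k<type>{2,}).*$/

  def initialize
    @open = nil      # nil while outside a code block, else [type, size]
  end

  def collect_line?(line)
    match = FENCE.match(line)
    fence = match && [match[:type], match[:fence].size]
    if @open.nil?
      @open = fence            # stays nil unless this line opens a block
      false
    elsif fence && fence[0] == @open[0] && fence[1] >= @open[1]
      @open = nil              # closing fence: same type, at least as long
      false
    else
      true                     # inside a block: collect the line
    end
  end
end

doc = ["~~~~\n", "```\n", "puts 1\n", "```\n", "~~~\n", "~~~~\n", "after\n"]
tracker = FenceTracker.new
puts doc.select { |line| tracker.collect_line?(line) }.join
```

The four lines between the outer four-tilde fences are collected, including the backtick fences and the three-tilde line; the line after the closing fence is not.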

State machine to determine lines




class CodeBlockDeterminator

    def initialize()
        to_start_state
    end

    def collect_line?(line) 

        case @state

        when :fenced_block
            if is_fence?(line) and fence_size(line) >= @size and fence_type(line) == @type then
                 # recognized the end of this code block
                 to_start_state 
            else
                # We're still in a code block: collect this line
                return true
            end            

        when :start
            if is_fence?(line)
                # Start of a new code block
                @state = :fenced_block
                @type = fence_type line
                @size = fence_size line
            end
        end
        
        # not in a code block: don't collect this line
        return false
    end

    private

    def to_start_state()
        @state = :start
        @size = 0
        @type = :none
    end

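    # A fence is three or more of the same fence character (~ or `);
    # the backreference rejects mixed fences such as ~`~
    FENCE = /^(?<fence>(?<type>~|`)\k<type>{2,}).*$/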

    def is_fence?(line)
        FENCE.match line
    end

    def fence_size(line)
        FENCE.match(line)[:fence].size
    end

    def fence_type(line)
        FENCE.match(line)[:type]
    end

end

Making Foucault Functional

Up till now, the bootstrapping process focused on getting the foucault system up and running. Now that it does, it is time to make a "real" program out of it. One of the characteristics of the real program that foucault tries to be is that it is a command line tool like any other in the UNIX userland: a program with default behavior, options to change that default behavior, and reasonable error reporting. In this section, we discuss these aspects.

Making a real program out of foucault

We want foucault to behave like any other command line tool. To that end, we add command line argument handling. Ruby's optparse library, with its OptionParser class, is made for this purpose. Following the documentation page of OptionParser we adapt foucault, adding options. These options will be discussed in detail in the rest of this section.

#!/usr/bin/env ruby
require 'ostruct'
require 'optparse'

# Set all default options
options = OpenStruct.new
options.mirror = false
options.output_dir = ''
options.output = ''
options.debug = false

OptionParser.new do |opts|
    
    opts.banner = "Literate programming with Foucault — taking a narrative turn"
    opts.separator ""
    opts.separator "Usage: foucault [options] input files"
    opts.separator ""
    opts.separator "Options:"

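    # PATH is a required argument: an optional "[PATH]" would yield nil when
    # omitted, and the later options.output.empty? check would then fail
    opts.on("-o", 
            "--output PATH", 
            "Path to output collected program text to") do |path|
        options.output = path
    end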

    opts.on("-d", "--debug", "Generate program texts in debug mode") do |d|
        options.debug = true
    end

    opts.on_tail("-h",
                 "--help",
                 "Show this message") do
        puts opts
        # stop here: after showing the help there is nothing to collect
        exit
    end

end.parse!



document = []
ARGV.each do |input_file|
    document.concat File.readlines input_file
end

The OpenStruct options now contains all the options set by the user.
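How the parsed options relate to the remaining input files can be shown with a minimal sketch that parses an explicit argument list rather than ARGV. The helper name parse_arguments and the file names are assumptions made for illustration, and only the -o and -d options are included.

```ruby
require 'ostruct'
require 'optparse'

# Hypothetical helper: parse an explicit argument list instead of the global ARGV.
def parse_arguments(argv)
  options = OpenStruct.new(output: '', debug: false)
  parser = OptionParser.new do |opts|
    opts.on("-o", "--output PATH", "Path to output collected program text to") do |path|
      options.output = path
    end
    opts.on("-d", "--debug", "Generate program texts in debug mode") do
      options.debug = true
    end
  end
  # parse! removes the recognised options from argv; what is left are the input files
  parser.parse!(argv)
  [options, argv]
end

options, input_files = parse_arguments(%w[-d -o out.rb chapter1.md chapter2.md])
puts options.debug
puts options.output
puts input_files.inspect
```

Because parse! consumes the options it recognises, iterating over the remaining arguments afterwards visits only the input files.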

Enabling debug mode

Foucault can collect code from fenced code blocks. It ignores all other lines in the document. This is the default way of operating and what we want when we generate source code from the document.

However, while programming one often makes errors. Although literate programming is a style of programming that focusses on thinking before programming, errors will still be a regular occurrence. To find and resolve these errors, the program code will be compiled, interpreted, or otherwise analysed, and any errors found will be reported with the approximate place in the program text where they occurred.

The line numbers in a program text generated by foucault, however, generally do not match the line numbers of the same lines in the original document. To make debugging easier, therefore, foucault should have a debug mode that keeps the line numbers of the code in the document and in the corresponding generated program text the same.

The simple solution would be to turn all the non-code block lines in the document into comment lines. That, however, makes looking through the generated program code harder as the "live" code is hidden among comments. As we want to adapt the program code in the document, not in the generated program text, it makes more sense to just translate all non-code lines into blank lines, making it easy to spot the code. Ideally, we would like to have tooling support that maps errors reported in the generated program text back to the corresponding places in the document.

def program_collector(document, debug = false)
    program = []

    require_relative 'lib/code_block_determinator.rb'

    line_determinator = CodeBlockDeterminator.new

    while not document.empty? do

        line = document.shift

        if line_determinator.collect_line? line
            program.push line
        elsif debug
            program.push "\n"
        else
            # ignore this line
        end

    end

    program.join
end
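The effect of debug mode on line numbers can be checked with a small sketch. To keep it self-contained, it uses a hypothetical function collect that only recognises three-tilde fences, as a stand-in for CodeBlockDeterminator; the sample document is also made up.

```ruby
# Hypothetical stand-in for the collector above, restricted to three-tilde fences.
def collect(document, debug = false)
  incode = false
  program = document.map do |line|
    if line.start_with?("~~~")
      incode = !incode
      debug ? "\n" : nil         # fence lines are never code
    elsif incode
      line                       # code line: always kept
    else
      debug ? "\n" : nil         # prose: blank line in debug mode, dropped otherwise
    end
  end
  program.compact.join
end

DOC = ["Prose.\n", "\n", "~~~\n", "raise 'boom'\n", "~~~\n"]

normal = collect(DOC).lines
debug  = collect(DOC, true).lines
puts normal.index("raise 'boom'\n") + 1   # → 1, line number in the default output
puts debug.index("raise 'boom'\n") + 1    # → 4, the same line number as in DOC
```

In debug mode an error reported for line 4 of the generated program text points at line 4 of the document as well.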

Setting the output file

Foucault will output the collected program to standard out. Sometimes, however, we want to write the program text to a file. With the -o or --output option, the user can specify the (relative) path to that file. This file will be overwritten if it exists and created if it does not.

program = program_collector document, options.debug

if not options.output.empty? then
    # try to write collected program text to file specified in options.output
    File.open(options.output, "w") do |file|
        file.puts program
    end
else
    # No output file specified: use STDOUT
    puts program
end
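The overwrite-or-create behaviour follows from opening the file in mode "w". The following sketch wraps the output logic in a hypothetical helper, write_program, and exercises it in a temporary directory; the helper name, its stdout parameter, and the file name are assumptions made for illustration.

```ruby
require 'tmpdir'

# Hypothetical helper wrapping the output logic; an empty path means standard out.
def write_program(program, output_path, stdout = $stdout)
  if output_path.empty?
    stdout.puts program
  else
    # mode "w" truncates an existing file and creates a missing one
    File.open(output_path, "w") { |file| file.puts program }
  end
end

Dir.mktmpdir do |dir|
  path = File.join(dir, "program.rb")
  write_program("puts 1\n", path)    # creates the file
  write_program("puts 2\n", path)    # overwrites the previous contents
  puts File.read(path)               # prints: puts 2
end
```

Passing the output stream as a parameter is a design choice of this sketch only; it makes the standard out branch easy to check without capturing the real $stdout.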