Foucault extracts the code from literate programs written in markdown. There are three ways to add code to a markdown document:
- Indented code blocks: a sequence of lines indented with four spaces or a tab on top of the current indentation and enclosed by blank lines is treated as code. For example:

      while true do
        puts "I am a code block"
      end

- Inline code: text enclosed by backticks is treated as a piece of code. For example, `x = x + 1 if x < 100`.
- Fenced code blocks: all lines in between two 'fences' are treated as code. A fence is a line starting with at least three tildes (~~~) or backticks (```). The top fence is preceded by a blank line and the bottom fence is followed by a blank line. Furthermore, the bottom fence should start with at least the same number and the same type of fence characters. For example:

      start = '~~~'
      line = gets
      while line =~ /^~~~.*/ do
        puts "line is a code line"
        line = gets
      end

Foucault extracts only code inside fenced code blocks. This allows the literate programmer to write about code without adding it to the program, by using indented code blocks or, for a very short piece of code, inline code.
Foucault is written as a literate program; this document is the foucault program. There is one problem, however: how can foucault's code be extracted from this document with foucault? It does not yet exist!
The solution is called bootstrapping. We create a simple executable program that is not itself a literate program but that can extract the code from this document. Of course, once that program is created, it can be added to this document as well, allowing it to extract itself. In this section the bootstrapping process is described in three phases.
The simplest way to add a fenced code block is to use fences of exactly three tildes.
~~~
# say, like this.
~~~
To run a literate program, then, is to find a way to get the code out of the document and feed it to the compiler or interpreter. A simple way to go about that could be to inspect each line in the document: if the line is part of a code block, keep it; if not, discard it.
Let us interpret a document as a sequence of lines. We use a simple state machine with two states, in code block and out of code block, starting in out of code block. If we encounter a code block delimiter as defined above, i.e., a line starting with three tildes, we switch to the other state. While in code block, every subsequent line is collected into the runnable program. The delimiter lines themselves are discarded, as are all lines outside a code block.
CODE_BLOCK_DELIMITER = /^~~~.*$/
def program_collector(document)
  program = []
  incode = false
  while not document.empty? do
    line = document.shift
    if line =~ CODE_BLOCK_DELIMITER then
      incode = ! incode
      next
    end
    program.push line if incode
  end
  program.join
end
Now, to test this simple program, we run it on this document. The output is itself a program; running that program on this document should give the same result.
document = File.readlines ARGV[0]
program = program_collector document
puts program
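As a quick check that needs no file on disk, the same two-state idea can be tried on a few in-memory lines. This is only a sketch: the names `SAMPLE_DELIMITER` and `collect_tilde_blocks` and the sample lines are assumptions made for illustration, not part of foucault.

```ruby
# Hypothetical stand-in for the first-phase collector, working on an array of lines.
SAMPLE_DELIMITER = /^~~~.*$/

def collect_tilde_blocks(lines)
  in_code = false
  collected = lines.each_with_object([]) do |line, program|
    if line =~ SAMPLE_DELIMITER
      in_code = !in_code       # a fence toggles the state and is itself discarded
    elsif in_code
      program << line          # inside a code block: keep the line
    end
  end
  collected.join
end

sample = [
  "Some prose.\n",
  "~~~\n",
  "puts 'hello'\n",
  "~~~\n",
  "More prose.\n"
]

extracted = collect_tilde_blocks(sample)
puts extracted
```

Only the line between the two fences survives; the prose and the fences themselves are dropped.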
Of course, this being the first document, the program to create a runnable program from it does not yet exist. To that end, we bootstrap foucault by copying all code in this document, by hand, to a separate file and running that.
According to the pandoc manual, fenced code blocks are a bit more complicated than just lines of code enclosed by two lines starting with three tildes. For one thing, the opening fence may have more than three tildes. If so, the closing fence should start with at least as many tildes, although more are allowed.
As code blocks cannot be nested (that just does not make sense), we can adapt the first-phase bootstrapping program simply by remembering and testing the number of tildes of the starting line. We introduce two functions to determine if a line is a fence and, if so, what length that fence is.
DELIMITER = /^(?<fence>(?<prefix>~~~)~*).*$/
def is_fence?(line)
  DELIMITER.match(line)
end
def fence_size(line)
  DELIMITER.match(line)[:fence].size
end
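To see what such a pattern recognizes, the sketch below applies it to a few sample lines. The names `SKETCH_DELIMITER` and `sketch_fence_size` are hypothetical, and unlike `fence_size` above this variant returns 0 for a line that is not a fence instead of failing on it; that is an assumption made to keep the example short.

```ruby
# Same pattern as DELIMITER above, under a hypothetical name.
SKETCH_DELIMITER = /^(?<fence>(?<prefix>~~~)~*).*$/

# Returns the number of leading tildes of a fence line, or 0 for any other line.
def sketch_fence_size(line)
  match = SKETCH_DELIMITER.match(line)
  match ? match[:fence].size : 0
end

sample = ["~~~\n", "~~~~~ ruby\n", "~~ not a fence\n", "prose\n"]
sizes = sample.map { |line| sketch_fence_size(line) }
p sizes
```

A line with only two tildes is not a fence, and text after the tildes (such as a language name) does not count towards the fence size.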
Now, we adapt the program a bit.
def program_collector(document)
  program = []
  incode = false
  current_fence_size = 0
  while not document.empty? do
    line = document.shift
    if incode
      if is_fence? line and fence_size(line) >= current_fence_size
        incode = false
        # End of the code block: ignore this line
        next
      else
        # Still in a code block: collect this line for the program
        program.push line
      end
    else
      # We're outside a code block
      if is_fence? line then
        incode = true
        current_fence_size = fence_size line
      end
      # the line is not part of the code, ignore it and go to the next one
      next
    end
  end
  program.join
end
document = File.readlines ARGV[0]
program = program_collector document
puts program
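The practical effect of remembering the fence size is that a block opened with a longer fence can contain shorter tilde lines as ordinary code. The following sketch illustrates that under stated assumptions: `SIZED_FENCE`, `collect_sized_blocks`, and the sample lines are hypothetical names and data, and the logic is a compact restatement rather than the program above.

```ruby
SIZED_FENCE = /^(?<fence>~{3,}).*$/

def collect_sized_blocks(lines)
  program = []
  open_size = nil                      # nil means "outside a code block"
  lines.each do |line|
    match = SIZED_FENCE.match(line)
    size = match ? match[:fence].size : 0
    if open_size.nil?
      open_size = size if size >= 3    # opening fence: remember its size
    elsif size >= open_size
      open_size = nil                  # closing fence: at least as long as the opener
    else
      program << line                  # shorter fences and plain lines are code
    end
  end
  program.join
end

sample = ["~~~~\n", "~~~\n", "puts 1\n", "~~~\n", "~~~~\n", "prose\n"]
result = collect_sized_blocks(sample)
puts result
```

The three-tilde lines inside the four-tilde block are collected as code, because they are too short to close the block.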
Observe how the program is getting harder to read, and we have not yet incorporated all the fence-related rules.
Tildes are not the only characters that can make up a fence: a line starting with at least three backticks is a fence as well. This use of the backtick to denote code is different from denoting inline code with a single backtick.
So, the delimiter of a code block can consist of three or more tildes or backticks. However, a block should end with a fence made of the same symbol as the fence it started with. Besides keeping track of the size of a fence, we should also keep track of the type of fence.
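Functions for this bookkeeping could look roughly like the sketch below. The names `TYPED_FENCE`, `sketch_fence_type`, and `sketch_closes?` are assumptions for illustration; the pattern uses a backreference so that a fence consists of one repeated character only.

```ruby
# Three or more of the same fence character, either tildes or backticks.
TYPED_FENCE = /^(?<fence>(?<type>[~`])\k<type>{2,}).*$/

# Returns "~" or "`" for a fence line, nil for any other line.
def sketch_fence_type(line)
  match = TYPED_FENCE.match(line)
  match && match[:type]
end

# A candidate closes a block if it is a fence of the same type as the opening
# fence and at least as long.
def sketch_closes?(opening, candidate)
  open_match = TYPED_FENCE.match(opening)
  close_match = TYPED_FENCE.match(candidate)
  return false unless open_match && close_match
  close_match[:type] == open_match[:type] &&
    close_match[:fence].size >= open_match[:fence].size
end

p sketch_fence_type("```ruby\n")
p sketch_closes?("~~~\n", "```\n")
p sketch_closes?("~~~\n", "~~~~\n")
```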
We could use these functions and adapt the original program, but the code would only get more complex. Instead, let us think about determining code blocks differently, not least because there is another way to denote code blocks in markdown altogether. What we want to know, given the current state and the current line, is: do we collect this line as part of the program or not?
One way to go about it is to extend the state machine and add what kind of code block we're dealing with. So, the initial state is still out of code, but to get into a code block we've got two options: a backtick fence or a tilde fence. Besides keeping the state, we'll also keep track of the size of the fence. Let us build that state machine.
class CodeBlockDeterminator
  def initialize()
    to_start_state
  end
  def collect_line?(line)
    case @state
    when :fenced_block
      if is_fence?(line) and fence_size(line) >= @size and fence_type(line) == @type then
        # recognized the end of this code block
        to_start_state
      else
        # We're still in a code block: collect this line
        return true
      end
    when :start
      if is_fence?(line)
        # Start of a new code block
        @state = :fenced_block
        @type = fence_type line
        @size = fence_size line
      end
    end
    # not in a code block: don't collect this line
    return false
  end
  private
  def to_start_state()
    @state = :start
    @size = 0
    @type = :none
  end
  # A fence is three or more of the same character: all tildes or all backticks.
  # The backreference \k<type> keeps mixed sequences such as ~`~ from matching.
  FENCE = /^(?<fence>(?<type>~|`)\k<type>{2,}).*$/
  def is_fence?(line)
    FENCE.match line
  end
  def fence_size(line)
    FENCE.match(line)[:fence].size
  end
  def fence_type(line)
    FENCE.match(line)[:type]
  end
end
Using that state machine, the program becomes
def program_collector(document)
  program = []
  state_machine = CodeBlockDeterminator.new
  while not document.empty? do
    line = document.shift
    program.push line if state_machine.collect_line? line
  end
  program.join
end
document = File.readlines ARGV[0]
program = program_collector document
puts program
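A small experiment shows why tracking the fence type matters: inside a tilde block, a backtick fence is just another line of code. The class below is a compact, hypothetical restatement (`SketchFenceTracker` is not part of foucault) written only so the example is self-contained.

```ruby
class SketchFenceTracker
  FENCE = /^(?<fence>(?<type>[~`])\k<type>{2,}).*$/

  def initialize
    @open = nil   # MatchData of the opening fence, or nil when outside a block
  end

  def collect_line?(line)
    match = FENCE.match(line)
    if @open.nil?
      @open = match          # a fence opens a block; the fence itself is skipped
      false
    elsif match && match[:type] == @open[:type] && match[:fence].size >= @open[:fence].size
      @open = nil            # a matching fence closes the block and is skipped
      false
    else
      true                   # anything else inside a block is code
    end
  end
end

tracker = SketchFenceTracker.new
sample = ["prose\n", "~~~\n", "```\n", "x = 1\n", "```\n", "~~~\n", "prose\n"]
collected = sample.select { |line| tracker.collect_line?(line) }
puts collected.join
```

The backtick lines end up in the collected program, while the enclosing tilde fences and the prose do not.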
Up till now, the bootstrapping process focused on getting the foucault system up and running. Now that it is, it is time to make a "real" program out of it. One characteristic of such a real program is that it is a command line tool like any other in the UNIX userland: a program with default behavior, options to change that default behavior, and reasonable error reporting. In this section, we discuss these aspects.
We want foucault to behave like any other command line tool. To that end, we add command line argument handling. Ruby's `OptionParser` library (`optparse`) is made for this purpose. Following the documentation page of `OptionParser`, we adapt foucault, adding options. These options will be discussed in detail in the rest of this section.
#!/usr/bin/env ruby
require 'ostruct'
require 'optparse'
# Set all default options
options = OpenStruct.new
options.mirror = false
options.output_dir = ''
options.output = ''
options.debug = false
OptionParser.new do |opts|
  opts.banner = "Literate programming with Foucault — taking a narrative turn"
  opts.separator ""
  opts.separator "Usage: foucault [options] input files"
  opts.separator ""
  opts.separator "Options:"
  # PATH is a required argument: with an optional [PATH], a bare -o would set
  # options.output to nil and break the later options.output.empty? check
  opts.on("-o",
          "--output PATH",
          "Path to output collected program text to") do |path|
    options.output = path
  end
  opts.on("-d", "--debug", "Generate program texts in debug mode") do
    options.debug = true
  end
  opts.on_tail("-h",
               "--help",
               "Show this message") do
    puts opts
    # Stop after showing the help text instead of going on to collect code
    exit
  end
end.parse!
# parse! removes the options from ARGV; what is left are the input files
document = []
ARGV.each do |input_file|
  document.concat File.readlines input_file
end
The `OpenStruct` `options` now contains all the options set by the user.
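Option handling is easier to experiment with when the parser works on an explicit argument list instead of `ARGV`. The sketch below does that; `sketch_parse` is a hypothetical helper, it covers only the two options discussed here, and it stores them in a plain Hash rather than an `OpenStruct` to keep the example small.

```ruby
require 'optparse'

# Parses an explicit argument list and returns the options and the input files.
def sketch_parse(argv)
  options = { output: '', debug: false }
  parser = OptionParser.new do |opts|
    opts.on("-o", "--output PATH", "Path to write the program text to") do |path|
      options[:output] = path
    end
    opts.on("-d", "--debug", "Keep line numbers in the program text") do
      options[:debug] = true
    end
  end
  input_files = parser.parse(argv)   # non-destructive: returns the non-option arguments
  [options, input_files]
end

options, input_files = sketch_parse(%w[-d -o out.rb index.md])
p options
p input_files
```

Everything that is not an option or an option argument is left over as an input file.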
Foucault can collect code from fenced code blocks. It ignores all other lines in the document. This is the default way of operating and what we want when we generate source code from the document.
However, while programming one often makes errors. Although literate programming is a style of programming that focuses on thinking before programming, errors will still be a regular occurrence. To find and resolve them, the program code is compiled, interpreted, or otherwise analysed, and any errors found are reported together with the approximate place in the program text where they occurred.
The line numbers in a program text generated by foucault, however, generally do not match the line numbers of those lines in the original document. To make debugging easier, therefore, foucault should have a debug mode that keeps the line numbers of the code in the document and in the corresponding generated program text the same.
The simple solution would be to turn all the non-code lines in the document into comment lines. That, however, makes looking through the generated program code harder, as the "live" code is hidden between comments. As we want to adapt the program code in the document, not in the generated program text, it makes more sense to just translate all non-code lines into blank lines, making it easy to spot the code. Ideally, we would like to have tooling support that, for each error reported in the generated program text, points to the corresponding place in the document.
require_relative 'lib/code_block_determinator.rb'
def program_collector(document, debug = false)
  program = []
  line_determinator = CodeBlockDeterminator.new
  while not document.empty? do
    line = document.shift
    if line_determinator.collect_line? line
      program.push line
    elsif debug
      program.push "\n"
    else
      # ignore this line
    end
  end
  program.join
end
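The effect of debug mode on line numbers can be illustrated with a reduced example. The helper `sketch_collect` below is a hypothetical simplification that only knows three-tilde fences; it is meant to show the blank-line substitution, not to replace the collector above.

```ruby
# Collects fenced lines; in debug mode every other line becomes a blank line.
def sketch_collect(lines, debug = false)
  in_code = false
  program = []
  lines.each do |line|
    if line.start_with?("~~~")
      in_code = !in_code
      program << "\n" if debug   # fences are replaced by blank lines too
    elsif in_code
      program << line
    elsif debug
      program << "\n"
    end
  end
  program.join
end

sample = ["Prose.\n", "\n", "~~~\n", "raise 'boom'\n", "~~~\n"]
normal = sketch_collect(sample)
debug  = sketch_collect(sample, true)

# 1-based line number of the code line in each generated program text
normal_line = normal.lines.index("raise 'boom'\n") + 1
debug_line  = debug.lines.index("raise 'boom'\n") + 1
puts "normal: line #{normal_line}, debug: line #{debug_line}"
```

In the sample document the code sits on line 4. The debug output keeps it on line 4, whereas the normal output moves it to line 1, so an error message from the interpreter would point at the wrong place in the document.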
Foucault will output the collected program to standard out. Sometimes, however, we want to write the program text to a file. With the `-o` or `--output` option, the user can specify the (relative) path to that file. This file will be overwritten if it exists and created if it does not.
program = program_collector document, options.debug
if not options.output.empty? then
  # try to write collected program text to file specified in options.output
  File.open(options.output, "w") do |file|
    file.puts program
  end
else
  # No output file specified: use STDOUT
  puts program
end
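The create-or-overwrite behavior follows from opening the file in mode `"w"`. A minimal sketch of that, using a temporary directory so nothing is left behind; `sketch_emit` and the file name `program.rb` are hypothetical:

```ruby
require 'tmpdir'

# Hypothetical helper mirroring the output step: an empty path means STDOUT.
def sketch_emit(program, output_path)
  if output_path.empty?
    puts program
  else
    # Mode "w" truncates an existing file and creates a missing one.
    File.open(output_path, "w") { |file| file.puts program }
  end
end

contents = Dir.mktmpdir do |dir|
  path = File.join(dir, "program.rb")
  sketch_emit("puts 'first'\n", path)    # the file is created
  sketch_emit("puts 'second'\n", path)   # the file is overwritten, not appended to
  File.read(path)
end
puts contents
```

After the second call only the second program text is left in the file.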