# RubyLit - This README is a program!

The output of this README.md file is a _program_ that turns documents into:

* <a href="html/README.rb.html">README.rb</a> - a Ruby program
* <a href="html/README.html.html">README.html</a> - a (hideously) formatted HTML document

It's a
<a href="https://en.wikipedia.org/wiki/Literate_programming">literate program</a>
(wikipedia.org)
and the concept comes from Donald Knuth.
(Update: I've written about what it _felt like_ to make this little program:
<a href="http://ratfactor.com/cards/literate-programming">literate-programming</a>.)
    
In order to get the process rolling, there's a non-literate `stage1.rb` script
that does the initial "tangling". You'll see what _that_ means in a moment.

To make this literate programming stuff work, all source is indented. That not
only makes it easy to read in source form, but by choosing that method,
I've also made this document a valid Markdown file, so the README will be
properly formatted when you view it on the Web as HTML output via tools such as
<a href="http://ratfactor.com/repos/reporat/">RepoRat</a>.
    
Here's how it starts. The program takes the root filename as a command
line argument:

    fname = ARGV[0]

I initially wanted to use the file extension ".lit", but the real magic happened
when I realized I could use ".md" and have the README itself be the program:

    fname_input = "#{fname}.md"
    
Next, I "tangle" and "weave" the input document:

    <<Tangle>>
    <<Weave>>

"Tangle" creates the program and "Weave" creates the documentation.
Those little `<<bracket things>>` are literate programming macros that
include source code from other sections of the document (identified by
Markdown subheadings).
    
## Tangle

What's neat about literate programming is that the source can be presented
in any order, so you can explain it however you like. The tangling process
puts it back into order so it can actually run.

The _full_ literate programming concept as imagined by Knuth and implemented
in his initial 'WEB' system not only allows you to include bits of code,
but even lets you define parametric macros, so the literate document is
actually a **meta-language** on top of the underlying programming language!

I've just implemented a crude "include" macro for this demonstration, but that
alone gives me a ton of flexibility!
    
Here's how I've made that work. I've got a `segments` hash that will store
all of the lines of a "segment" in the literate program. I have the program
begin in a segment identified with the `:start` symbol:

    myseg = :start
    segments = {}
    segments[myseg] = []

Then I loop through all of the lines in the source document and handle
just two special cases. (All other lines are treated as the "document"
part of the literate program and are completely ignored here!)

    File.open(fname_input).each do |line|
        <<Handle segments>>
        <<Handle code lines>>
    end

And after gathering the lines into segments, I recursively follow
the includes to write out the final program to a file:

    <<Write the program>>
    
## Handle segments

When I see a "## " at the beginning of the line, I know it's a Markdown
level 2 heading, which I'm using to indicate new code segments. They
don't _have_ to include code and even if they do, they don't _have_ to
be used.

Here you can see that I'm setting the current segment to the name of
the heading and initializing a new array to store the code lines:

      if(line.start_with?('## '))
        myseg = line[3..].chomp
        segments[myseg] = []
      end
    
## Handle code lines

When I see a space at the beginning of the line, I know it's indented source
code. This is extremely strict and not flexible and is just one example of the
non-industrial nature of this demonstration program. :-)

      if(line.start_with?(' '))
        segments[myseg].push(line)
      end
   
## Write the program

Here's the fun part! I've got this recursive method called `put_lines` that
takes an open destination file, the hash of named code segments (each segment
being an array of lines), and a target segment to print.

I'm looking at each line to see if it's a `<<macro thingy>>` (include request).
If it is, I recurse into the requested segment. Otherwise, I just output the
current line to the file:

    def put_lines(file, segments, sname)
      segments[sname].each do |line|
        if(m = /^\s*<<([^>]+)>>\s*$/.match(line))
          put_lines(file, segments, m[1])
        else
          file.puts line
        end
      end
    end
   
To start the above recursive process, I open the ".rb" output file and request
the `:start` segment:

    File.open("#{fname}.rb", 'w') do |out|
      put_lines(out, segments, :start)
    end

That's it!
   
## Weave

The "weave" part of the application is the documentation creation portion.

Since my scheme for this literate program is to encode it as pure Markdown,
I could just rely on an external tool to create the HTML (in fact, that's
probably how you're reading this README right now).

But since Ruby comes with a Markdown parser as part of its standard library,
I figured I might as well include it. The parser and generator are part of
the RDoc (Ruby documentation) module:

    require 'rdoc'
   
The Markdown source is the literate program document (yeah, we read it in
the Tangle process and we'll read it again for Weave):

    data = File.read(fname_input)

Then some boilerplate. RDoc is like a mini-Pandoc in that it can take input
and produce output in a bunch of different formats, and we pay for that
flexibility with some complexity:

    formatter = RDoc::Markup::ToHtml.new(RDoc::Options.new, nil)
    html = RDoc::Markdown.parse(data).accept(formatter)
   
And then I just write that out to a ".html" file, bookended by start and
end document tags:

    File.open("#{fname}.html", 'w') do |out|
      out.puts("<html><body>")
      out.print(html)
      out.puts("</body></html>")
    end

And that's it!
   
## Running it!

Turning this README into a program starts with "stage1", which only includes
the "tangle" part of the process (no documentation output):

    $ ruby stage1.rb README
    Found segment 'Tangle'
    Found segment 'Handle segments'
    Found segment 'Handle code lines'
    Found segment 'Write the program'
    Found segment 'Weave'
    Found segment 'Comparing this document with its output'
    Fetching segment 'start'
    Fetching segment 'Tangle'
    Fetching segment 'Handle segments'
    Fetching segment 'Handle code lines'
    Fetching segment 'Write the program'
    Fetching segment 'Weave'
   
(As you can see, I also gave it some output to help me debug segment names.)

That produces `README.rb`, which is now the Ruby program we've described
above, which can be used to process itself again:

    ruby README.rb README

And running that _again_ proves that we're **fully bootstrapped**. We're
running the output of the README against the README:

    ruby README.rb README

(Note that this part of the document you're reading right now has indented
"code" blocks to show the command line and output. Those are not valid Ruby, so
why is that okay? That's okay because they're never explicitly included in the
program! The Ruby interpreter never sees them.)
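
To see both ideas in miniature (the debug messages and the "never included"
rule), here's a small sketch. It is _not_ the real `stage1.rb`, which isn't shown
in this document. It just assumes the same rules described above. The names
`tangle_with_log` and `doc`, and the little demo document itself, are made up for
illustration. (This example is safe to keep here for the same reason as the
command lines above: this section is never included in the program.)

```ruby
# A miniature tangle that works on a string and collects debug messages.
# Hypothetical sketch: tangle_with_log and the demo document are invented here.
def tangle_with_log(doc)
  log = []
  myseg = :start
  segments = { myseg => [] }

  doc.each_line do |line|
    if line.start_with?('## ')
      # A level 2 heading starts a new named segment.
      myseg = line[3..].chomp
      segments[myseg] = []
      log << "Found segment '#{myseg}'"
    elsif line.start_with?(' ')
      # Any line starting with a space is treated as code.
      segments[myseg].push(line)
    end
  end

  out = []
  # A lambda (rather than a method) so it can see `segments`, `log`, and `out`.
  put_lines = lambda do |sname|
    log << "Fetching segment '#{sname}'"
    segments.fetch(sname).each do |line|
      if (m = /^\s*<<([^>]+)>>\s*$/.match(line))
        put_lines.call(m[1]) # follow the include
      else
        out << line
      end
    end
  end
  put_lines.call(:start)

  [out.join, log]
end

# The quoted heredoc keeps the text literal; <<~ strips the two-space margin.
doc = <<~'DOC'
  # Demo

      puts "one"
      <<Two>>

  ## Two

      puts "two"

  ## Unused

      $ this is not Ruby
DOC

program, log = tangle_with_log(doc)
puts log
puts program
```

The log reports that the `Unused` segment was found, but it is never fetched, so
its not-Ruby line never reaches the tangled program.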
   
## Bootstrapping

This repo includes two simple literate test programs (Markdown files) I used to
get the initial "stage1" program working:

* `hello.md`
* `hello-segments.md`

When stage1 was done, I copied it to use as the basis for the final
document/program you're reading now. Then I followed the "Running it!" process
exactly as shown above and it worked! :-)
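
One way to double-check a bootstrap like this is to compare two generations of
the tangled output. The sketch below is only an illustration: `same_contents?`
is a made-up helper, the files are throwaway stand-ins, and it assumes both
generations are expected to write their lines identically, which may not hold
if `stage1.rb` formats its output differently.

```ruby
require 'tmpdir'

# Hypothetical helper: true when two files are byte-for-byte identical.
def same_contents?(path_a, path_b)
  File.binread(path_a) == File.binread(path_b)
end

# Throwaway files standing in for two generations of a tangled program.
Dir.mktmpdir do |dir|
  gen1 = File.join(dir, 'gen1.rb')
  gen2 = File.join(dir, 'gen2.rb')

  File.write(gen1, "puts 'hi'\n")
  File.write(gen2, "puts 'hi'\n")
  puts same_contents?(gen1, gen2) # identical generations

  File.write(gen2, "puts 'bye'\n")
  puts same_contents?(gen1, gen2) # the second generation drifted
end
```

With the real files, that might mean keeping a copy of `README.rb` after the
stage1 run (under any name you like), running `ruby README.rb README`, and then
comparing the copy against the fresh `README.rb`.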
   
## A batch processing extension

Parker Glynn-Adey has extended this program to support the exporting of
segments of a literate source file as separate output files...which, in turn,
can be literate programs in their own right! Whoa.

See `README-BATCH.md` in this repository for that example.