# RubyLit - This README is a program!

The output of this README.md file is a _program_ that turns documents into:

* <a href="html/README.rb.html">README.rb</a> - a Ruby program
* <a href="html/README.html.html">README.html</a> - a (hideously) formatted HTML document

It's a
<a href="https://en.wikipedia.org/wiki/Literate_programming">literate program</a>
(wikipedia.org)
and the concept comes from Donald Knuth.
(Update: I've written about what it _felt like_ to make this little program:
<a href="http://ratfactor.com/cards/literate-programming">literate-programming</a>.)

In order to get the process rolling, there's a non-literate `stage1.rb` script
that does the initial "tangling". You'll see what _that_ means in a moment.

To make this literate programming stuff work, all source is indented. That not
only makes it easy to read in source form, but by choosing that method,
I've also made this document a valid Markdown file, so the README will be
properly formatted when you view it on the Web as HTML output via tools such as
<a href="http://ratfactor.com/repos/reporat/">RepoRat</a>.

Here's how it starts. The program takes the root filename as a command
line argument:

    fname = ARGV[0]

I initially wanted to use the file extension ".lit", but the real magic happened
when I realized I could use ".md" and have the README itself be the program:

    fname_input = "#{fname}.md"

Next, I "tangle" and "weave" the input document:

    <<Tangle>>
    <<Weave>>

"Tangle" creates the program and "Weave" creates the documentation.
Those little `<<bracket things>>` are literate programming macros that
include source code from other sections of the document (identified by
Markdown subheadings).


## Tangle

What's neat about literate programming is that the source can be presented
in any order, so you can explain it however you like. The tangling process
puts it back into order so it can actually run.

The _full_ literate programming concept as imagined by Knuth and implemented
in his initial 'WEB' system not only allows you to include bits of code,
but even lets you define parametric macros, so the literate document is
actually a **meta-language** on top of the underlying programming language!

I've just implemented a crude "include" macro for this demonstration, but that
alone gives me a ton of flexibility!

Here's how I've made that work. I've got a `segments` hash that will store
all of the lines of a "segment" in the literate program. I have the program
begin in a segment identified with the `:start` symbol:

    myseg = :start
    segments = {}
    segments[myseg] = []

Then I loop through all of the lines in the source document and handle
just two special cases. (All other lines are treated as the "document"
part of the literate program and are completely ignored here!)

    File.open(fname_input).each do |line|
        <<Handle segments>>
        <<Handle code lines>>
    end

And after gathering the lines into segments, I recursively follow
the includes to write out the final program to a file:

    <<Write the program>>

## Handle segments

When I see a "## " at the beginning of the line, I know it's a Markdown
level 2 heading, which I'm using to indicate new code segments. They
don't _have_ to include code and even if they do, they don't _have_ to
be used.

Here you can see that I'm setting the current segment to the name of
the heading and initializing a new array to store the code lines:

    if(line.start_with?('## '))
        myseg = line[3..].chomp
        segments[myseg] = []
    end

## Handle code lines

When I see a space at the beginning of the line, I know it's indented source
code. This is extremely strict and not flexible and is just one example of the
non-industrial nature of this demonstration program. :-)

    if(line.start_with?(' '))
        segments[myseg].push(line)
    end

## Write the program

Here's the fun part! I've got this recursive method called `put_lines` that
takes an open destination file, the hash of named code segments (each segment
being an array of lines), and a target segment to print.

I'm looking at each line to see if it's a `<<macro thingy>>` (include request).
If it is, I recurse into the requested segment. Otherwise, I just output the
current line to the file:

    def put_lines(file, segments, sname)
        segments[sname].each do |line|

            if(m = /^\s*<<([^>]+)>>\s*$/.match(line))

                put_lines(file, segments, m[1])

            else

                file.puts line

            end

        end
    end

To start the above recursive process, I open the ".rb" output file and request
the `:start` segment:

    File.open("#{fname}.rb", 'w') do |out|
        put_lines(out, segments, :start)
    end

That's it!


## Weave

The "weave" part of the application is the documentation creation portion.

Since my scheme for this literate program is to encode it as pure Markdown,
I could just rely on an external tool to create the HTML (in fact, that's
probably how you're reading this README right now).

But since Ruby comes with a Markdown parser as part of its Standard Library,
I figured I might as well include it. The parser and generator are part of
the RDoc (Ruby documentation) module:

    require 'rdoc'

The Markdown source is the literate program document (yeah, we read it in
the Tangle process and we'll read it again for Weave):

    data = File.read(fname_input)

Then some boilerplate. RDoc is like a mini-Pandoc in that it can take input
and produce output in a bunch of different formats, and we pay for that
flexibility with some complexity:

    formatter = RDoc::Markup::ToHtml.new(RDoc::Options.new, nil)
    html = RDoc::Markdown.parse(data).accept(formatter)

And then I just write that out to a ".html" file, bookended by start and
end document tags:

    File.open("#{fname}.html", 'w') do |out|
        out.puts("<html><body>")
        out.print(html)
        out.puts("</body></html>")
    end

And that's it!

## Running it!

Turning this README into a program starts with "stage1", which only includes
the "tangle" part of the process (no documentation output):

    $ ruby stage1.rb README
    Found segment 'Tangle'
    Found segment 'Handle segments'
    Found segment 'Handle code lines'
    Found segment 'Write the program'
    Found segment 'Weave'
    Found segment 'Comparing this document with its output'
    Fetching segment 'start'
    Fetching segment 'Tangle'
    Fetching segment 'Handle segments'
    Fetching segment 'Handle code lines'
    Fetching segment 'Write the program'
    Fetching segment 'Weave'

(As you can see, I also gave it some output to help me debug segment names.)

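The actual `stage1.rb` is not reproduced in this README, so the following is only a sketch of how that kind of "Fetching segment" output could be produced. It assumes the same `put_lines` shape described under "Write the program" and adds a hypothetical `log:` keyword argument plus a missing-segment check; neither is claimed to match the real script.

```ruby
require 'stringio'

# Hypothetical variant of put_lines that reports each segment it fetches.
# The log: keyword and the KeyError are assumptions, not part of stage1.rb.
def put_lines(out, segments, sname, log: $stderr)
  log.puts "Fetching segment '#{sname}'"
  # Fail with a readable message instead of calling .each on nil.
  lines = segments.fetch(sname) { raise KeyError, "no segment named '#{sname}'" }
  lines.each do |line|
    if (m = /^\s*<<([^>]+)>>\s*$/.match(line))
      put_lines(out, segments, m[1], log: log)
    else
      out.puts line
    end
  end
end

# Tiny in-memory example: :start includes the "Greet" segment.
segments = {
  :start  => ["    puts 'hello'\n", "    <<Greet>>\n"],
  'Greet' => ["    puts 'world'\n"]
}
out = StringIO.new
log = StringIO.new
put_lines(out, segments, :start, log: log)
print log.string
```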
Running stage1 produces `README.rb`, which is now the Ruby program we've
described above. It can be used to process its own source:

    ruby README.rb README

And running that _again_ proves that we're **fully bootstrapped**. We're
running the output of the README against the README:

    ruby README.rb README

(Note that this part of the document you're reading right now has indented
"code" blocks to show the command line and output. Those are not valid Ruby, so
why is that okay? That's okay because they're never explicitly included in the
program! The Ruby interpreter never sees them.)

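To see why unreferenced segments are harmless, here is a minimal in-memory sketch of the tangle rules described above. The method names `collect_segments` and `tangle` and the sample document are made up for illustration; the real program reads from a file and writes to one instead.

```ruby
# Collect indented lines into segments keyed by "## " headings,
# following the same two rules as the README but reading from a string.
def collect_segments(text)
  current = :start
  segments = { current => [] }
  text.each_line do |line|
    if line.start_with?('## ')
      current = line[3..].chomp
      segments[current] = []
    elsif line.start_with?(' ')
      segments[current].push(line)
    end
  end
  segments
end

# Expand include macros recursively and return the tangled lines.
def tangle(segments, sname = :start)
  segments[sname].flat_map do |line|
    m = /^\s*<<([^>]+)>>\s*$/.match(line)
    m ? tangle(segments, m[1]) : [line]
  end
end

# Hypothetical sample document; the last segment is never included.
doc = <<~'MD'
  Intro prose is ignored.

      puts "hello"
      <<Greet>>

  ## Greet

      puts "world"

  ## Running it!

      $ ruby hello.rb hello
MD

segments = collect_segments(doc)
# The shell line is collected into its segment but never reaches the output.
puts tangle(segments).join
```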
## Bootstrapping

This repo includes two simple literate test programs (Markdown files) I used to
get the initial "stage1" program working:

* `hello.md`
* `hello-segments.md`

When stage1 was done, I copied it to use as the basis for the final
document/program you're reading now. Then I followed the "Running it!" process
exactly as shown above and it worked! :-)

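One way to double-check that "it worked" is to confirm that a later run reproduces the same program byte for byte. The sketch below assumes you saved a copy of the stage1-generated `README.rb` before re-running. The `same_program?` helper and the `README.stage1.rb` filename are hypothetical, and identical output is an expectation about this setup rather than something the steps above guarantee.

```ruby
require 'fileutils'

# True when the two generated programs have identical contents.
def same_program?(path_a, path_b)
  FileUtils.compare_file(path_a, path_b)
end

# Hypothetical filenames: a saved stage1 output and the self-generated one.
saved = 'README.stage1.rb'
current = 'README.rb'
if File.exist?(saved) && File.exist?(current)
  puts same_program?(saved, current) ? 'identical: bootstrapped' : 'outputs differ'
end
```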
## A batch processing extension

Parker Glynn-Adey has extended this program to support the exporting of
segments of a literate source file as separate output files...which, in turn,
can be literate programs in their own right! Woah.

See `README-BATCH.md` in this repository for that example.