rubylit/README.md

# RubyLit - This README is a program!

The output of this README.md file is a _program_ that turns documents into:

* <a href="html/README.rb.html">README.rb</a> - a Ruby program
* <a href="html/README.html.html">README.html</a> - a (hideously) formatted HTML document

It's a
<a href="https://en.wikipedia.org/wiki/Literate_programming">literate program</a>
(wikipedia.org)
and the concept comes from Donald Knuth.
(Update: I've written about what it _felt like_ to make this little program:
<a href="http://ratfactor.com/cards/literate-programming">literate-programming</a>.)

In order to get the process rolling, there's a non-literate `stage1.rb` script
that does the initial "tangling". You'll see what _that_ means in a moment.

To make this literate programming stuff work, all source is indented. That not
only makes it easy to read in source form, but by choosing that method,
I've also made this document a valid Markdown file, so the README will be
properly formatted when you view it on the Web as HTML output via tools such as
<a href="http://ratfactor.com/repos/reporat/">RepoRat</a>.

Here's how it starts. The program takes the root filename as a command
line argument:

    fname = ARGV[0]

I initially wanted to use the file extension ".lit", but the real magic happened
when I realized I could use ".md" and have the README itself be the program:

    fname_input = "#{fname}.md"

Next, I "tangle" and "weave" the input document:

    <<Tangle>>
    <<Weave>>

"Tangle" creates the program and "Weave" creates the documentation.
Those little `<<bracket things>>` are literate programming macros that
include source code from other sections of the document (identified by
Markdown subheadings).


## Tangle

What's neat about literate programming is that the source can be presented
in any order, so you can explain it however you like.
The tangling process puts it back into order so it can actually run.

The _full_ literate programming concept as imagined by Knuth and implemented
in his initial 'WEB' system not only allows you to include bits of code,
but even lets you define parametric macros, so the literate document is
actually a **meta-language** on top of the underlying programming language!

I've just implemented a crude "include" macro for this demonstration, but that
alone gives me a ton of flexibility!

Here's how I've made that work. I've got a `segments` hash that will store
all of the lines of a "segment" in the literate program. I have the program
begin in a segment identified with the `:start` symbol:

    myseg = :start
    segments = {}
    segments[myseg] = []

Then I loop through all of the lines in the source document and handle
just two special cases. (All other lines are treated as the "document"
part of the literate program and are completely ignored here!)

    File.open(fname_input).each do |line|
      <<Handle segments>>
      <<Handle code lines>>
    end

And after gathering the lines into segments, I recursively follow
the includes to write out the final program to a file:

    <<Write the program>>

## Handle segments

When I see a "## " at the beginning of the line, I know it's a Markdown
level 2 heading, which I'm using to indicate new code segments. They
don't _have_ to include code and even if they do, they don't _have_ to
be used.

Here you can see that I'm setting the current segment to the name of
the heading and initializing a new array to store the code lines:

    if(line.start_with?('## '))
      myseg = line[3..].chomp
      segments[myseg] = []
    end

## Handle code lines

When I see a space at the beginning of the line, I know it's indented source
code.
This is extremely strict and not flexible and is just one example of the
non-industrial nature of this demonstration program. :-)

    if(line.start_with?(' '))
      segments[myseg].push(line)
    end

## Write the program

Here's the fun part! I've got this recursive method called `put_lines` that
takes an open destination file, the hash of named code segments (each segment
being an array of lines), and a target segment to print.

I'm looking at each line to see if it's a `<<macro thingy>>` (include request).
If it is, I recurse into the requested segment. Otherwise, I just output the
current line to the file:

    def put_lines(file, segments, sname)
      segments[sname].each do |line|

        if(m = /^\s*<<([^>]+)>>\s*$/.match(line))

          put_lines(file, segments, m[1])

        else

          file.puts line

        end

      end
    end

To start the above recursive process, I open the ".rb" output file and request
the `:start` segment:

    File.open("#{fname}.rb", 'w') do |out|
      put_lines(out, segments, :start)
    end

That's it!


## Weave

The "weave" part of the application is the documentation creation portion.

Since my scheme for this literate program is to encode it as pure Markdown,
I could just rely on an external tool to create the HTML (in fact, that's
probably how you're reading this README right now).

But since Ruby comes with a Markdown parser as part of its Standard Library,
I figured I might as well include it. The parser and generator are part of
the RDoc (Ruby documentation) module:

    require 'rdoc'

The markdown source is the literate program document (yeah, we read it in
the Tangle process and we'll read it again for Weave):

    data = File.read(fname_input)

Then some boilerplate.
RDoc is like a mini-Pandoc in that it can take input
and produce output in a bunch of different formats, and we pay for that
flexibility with some complexity:

    formatter = RDoc::Markup::ToHtml.new(RDoc::Options.new, nil)
    html = RDoc::Markdown.parse(data).accept(formatter)

And then I just write that out to a ".html" file, bookended by start and
end document tags:

    File.open("#{fname}.html", 'w') do |out|
      out.puts("<html><body>")
      out.print(html)
      out.puts("</body></html>")
    end

And that's it!

## Running it!

Turning this README into a program starts with "stage1", which only includes
the "tangle" part of the process (no documentation output):

    $ ruby stage1.rb README
    Found segment 'Tangle'
    Found segment 'Handle segments'
    Found segment 'Handle code lines'
    Found segment 'Write the program'
    Found segment 'Weave'
    Found segment 'Comparing this document with its output'
    Fetching segment 'start'
    Fetching segment 'Tangle'
    Fetching segment 'Handle segments'
    Fetching segment 'Handle code lines'
    Fetching segment 'Write the program'
    Fetching segment 'Weave'

(As you can see, I also gave it some output to help me debug segment names.)

That produces `README.rb`, which is now the Ruby program we've described
above, which can be used to process itself again:

    ruby README.rb README

And running that _again_ proves that we're **fully bootstrapped**. We're
running the output of the README against the README:

    ruby README.rb README

(Note that this part of the document you're reading right now has indented
"code" blocks to show the command line and output. Those are not valid Ruby, so
why is that okay? That's okay because they're never explicitly included in the
program! The Ruby interpreter never sees them.)
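
To see that last point in isolation, here is a minimal sketch that applies the
same tangle rules to a tiny document held in a string. The names `tangle`,
`expand`, and `SAMPLE_DOC` are made up for this illustration and are not part
of RubyLit. The sketch also raises a `KeyError` for an unknown segment name,
which is an assumption of the sketch rather than what the program above does.
(Like the command line blocks above, the indented lines in this example land in
a segment that's never included, so they stay out of `README.rb`.)

```ruby
# Hypothetical helper (not part of RubyLit): tangle a literate document
# held in a string, following the rules described in this README.
def tangle(doc)
  current = :start
  segments = { current => [] }

  doc.each_line do |line|
    if line.start_with?('## ')
      # A level 2 heading opens a new named segment.
      current = line[3..].chomp
      segments[current] = []
    elsif line.start_with?(' ')
      # Indented lines are code and belong to the current segment.
      segments[current] << line
    end
    # Everything else is documentation and is ignored.
  end

  expand(segments, :start).join
end

# Hypothetical helper: recursively replace each <<Name>> line with the
# lines of the segment called Name. Unknown names raise KeyError.
def expand(segments, name)
  segments.fetch(name).flat_map do |line|
    m = /^\s*<<([^>]+)>>\s*$/.match(line)
    m ? expand(segments, m[1]) : [line]
  end
end

# A made-up three-part document: the start segment includes "Greet",
# while "Running it!" holds shell text that nothing ever includes.
SAMPLE_DOC = <<~MARKDOWN
  Intro prose is ignored.

      puts "start"
      <<Greet>>

  ## Greet

      puts "hello"

  ## Running it!

      $ ruby hello.rb
MARKDOWN

puts tangle(SAMPLE_DOC)
```

Running this prints only the two `puts` lines. The `$ ruby hello.rb` line is
collected into the "Running it!" segment, but since no `<<Running it!>>`
include exists, it never reaches the output.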

## Bootstrapping

This repo includes two simple literate test programs (Markdown files) I used to
get the initial "stage1" program working:

* `hello.md`
* `hello-segments.md`

When stage1 was done, I copied it to use as the basis for the final
document/program you're reading now. Then I followed the "Running it!" process
exactly as shown above and it worked! :-)

## A batch processing extension

Parker Glynn-Adey has extended this program to support the exporting of
segments of a literate source file as separate output files...which, in turn,
can be literate programs in their own right! Whoa.

See `README-BATCH.md` in this repository for that example.