# RubyLit - This README is a program!

*Note:* This literate program was extended by [Parker Glynn-Adey](https://pgadey.ca).
The interesting aspects of the extension are explained in the sections:

* [Usage](#label-Usage)
* [Batch Generation](#label-Batch+Generation)

<hr>

The output of this README.md file is a _program_ that turns documents into:

* <a href="html/README.rb.html">README.rb</a> - a Ruby program
* <a href="html/README.html.html">README.html</a> - a (hideously) formatted HTML document

It's a
[literate program](https://en.wikipedia.org/wiki/Literate_programming)
(wikipedia.org)
and the concept comes from Donald Knuth.
(Update: I've written about what it _felt like_ to make this little program:
[literate-program](http://ratfactor.com/cards/literate-programming)
.)

In order to get the process rolling, there's a non-literate `stage1.rb` script
that does the initial "tangling". You'll see what _that_ means in a moment.

To make this literate programming stuff work, all source is indented. That not
only makes it easy to read in source form, but by choosing that method,
I've also made this document a valid Markdown file, so the README will be
properly formatted when you view it on the Web as HTML output via tools such as
[RepoRat](http://ratfactor.com/repos/reporat/).

The program is executed in three steps:

    <<Parse Arguments>>
    <<Tangle>>
    <<Weave>>

"Parse Arguments" is the boring stuff, to make sure that we comply with [Usage](#label-Usage).
"Tangle" extracts a program (or some other text file) and "Weave" creates the documentation.
Those little `<<bracket things>>` are literate programming macros that
include source code from other sections of the document (identified by
Markdown subheadings).
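
To make the macro syntax concrete, here is a small sketch of how an include line can be told apart from an ordinary code line. The regular expression is the one `put_lines` uses further down; the two sample lines and the names `include_pattern`, `samples`, and `names` are made up for illustration. The block is fenced and flush-left on purpose: the tangler described below only collects lines that begin with a space, so none of this ends up in the generated program.

```ruby
# Same pattern as in put_lines: optional whitespace, <<segment name>>, optional whitespace.
include_pattern = /^\s*<<([^>]+)>>\s*$/

# Two made-up lines as the tangler would see them: one include, one plain code line.
samples = ["    <<Tangle>>\n", "    require 'rdoc'\n"]

# For each line, keep the captured segment name, or nil if the line is not an include.
names = samples.map { |l| (m = include_pattern.match(l)) ? m[1] : nil }
p names
```

Running it prints `["Tangle", nil]`: the first line names a segment to pull in, and the second would be written out unchanged.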

## Usage

    puts "usage: ruby rubylit.rb INPUT-LIT [OUTPUT] [START]"

RubyLit accepts three arguments: `rubylit.rb INPUT-LIT [OUTPUT] [START]`.
The `INPUT-LIT` argument is mandatory and specifies the literate program that RubyLit should process.
It is specified as a literate Markdown file without the extension, for example `README` in place of `README.md`.

The arguments `[OUTPUT]` and `[START]` are optional.
If `OUTPUT` is present, then RubyLit will tangle the literate program into the file `OUTPUT`.
If `OUTPUT` is absent, then RubyLit will produce `INPUT-LIT.rb`.

If the argument `START` is present, then RubyLit will tangle the literate program into the file `OUTPUT` starting from the segment `START`.
This allows for the possibility of [Batch Generation](#label-Batch+Generation) of files.
The default behaviour, when `START` is absent, is to tangle the entire literate program `INPUT-LIT`.

## Tangle

What's neat about literate programming is that the source can be presented
in any order, so you can explain it however you like. The tangling process
puts it back into order so it can actually run.

The _full_ literate programming concept as imagined by Knuth and implemented
in his initial 'WEB' system not only allows you to include bits of code,
but even lets you define parametric macros, so the literate document is
actually a **meta-language** on top of the underlying programming language!

I've just implemented a crude "include" macro for this demonstration, but that
alone gives me a ton of flexibility!

Here's how I've made that work. I've got a `segments` hash that will store
all of the lines of a "segment" in the literate program. I have the program
begin in a segment identified with the `:start` symbol:

    myseg = :start
    segments = {}
    segments[myseg] = []

Then I loop through all of the lines in the source document and handle
just two special cases.
(All other lines are treated as the "document"
part of the literate program and are completely ignored here!)

    File.open(fname_input).each do |line|
        <<Handle segments>>
        <<Handle code lines>>
    end

And after gathering the lines into segments, I recursively follow
the includes to write out the final program to a file:

    <<Write the program>>

## Handle segments

When I see a "## " at the beginning of the line, I know it's a Markdown
level 2 heading, which I'm using to indicate new code segments. They
don't _have_ to include code and even if they do, they don't _have_ to
be used.

Here you can see that I'm setting the current segment to the name of
the heading and initializing a new array to store the code lines:

    if(line.start_with?('## '))
        myseg = line[3..].chomp
        segments[myseg] = []
    end

## Handle code lines

When I see a space at the beginning of the line, I know it's indented source
code. This is extremely strict and not flexible and is just one example of the
non-industrial nature of this demonstration program. :-)

    if(line.start_with?(' '))
        segments[myseg].push(line)
    end

## Write the program

Here's the fun part! I've got this recursive method called `put_lines` that
takes an open destination file, the hash of named code segments (each segment
being an array of lines), and a target segment to print.

I'm looking at each line to see if it's a `<<macro thingy>>` (include request).
If it is, I recurse into the requested segment.
Otherwise, I just output the
current line to the file:

    def put_lines(file, segments, sname)
        segments[sname].each do |line|

            if(m = /^\s*<<([^>]+)>>\s*$/.match(line))

                put_lines(file, segments, m[1])

            else

                file.puts line

            end

        end
    end

To start the above recursive process, I open the output file and request the appropriate segment.
If the `START` argument was present as `ARGV[2]` at the time of execution, then we start the recursive process from there.
Otherwise, we start tangling from the beginning of the literate program at the `:start` symbol.

    File.open(fname_output, 'w') do |out|
        if ARGV[2]
            puts "Tangling #{fname_input} to output #{fname_output} from \"#{ARGV[2]}\"."
            put_lines(out, segments, ARGV[2])
        else
            puts "Tangling #{fname_input} to output #{fname_output} from :start."
            put_lines(out, segments, :start)
        end
    end

That's it!

## Weave

The "weave" part of the application is the documentation creation portion.

Since my scheme for this literate program is to encode it as pure Markdown,
I could just rely on an external tool to create the HTML (in fact, that's
probably how you're reading this README right now).

But since Ruby comes with a Markdown parser as part of its Standard Library,
I figured I might as well include it. The parser and generator are part of
the RDoc (Ruby documentation) module:

    require 'rdoc'

The Markdown source is the literate program document (yeah, we read it in
the Tangle process and we'll read it again for Weave):

    data = File.read(fname_input)

Then some boilerplate.
RDoc is like a mini-Pandoc in that it can take input
and produce output in a bunch of different formats, and we pay for that
flexibility with some complexity:

    formatter = RDoc::Markup::ToHtml.new(RDoc::Options.new, nil)
    html = RDoc::Markdown.parse(data).accept(formatter)

And then I just write that out to a ".html" file, bookended by start and end document tags:

    File.open("#{fname}.html", 'w') do |out|
        puts "Weaving #{fname_input} to output #{fname}.html."
        out.puts("<html><body>")
        out.print(html)
        out.puts("</body></html>")
    end

And that's it!

## Running it!

Turning this README into a program starts with "stage1", which only includes
the "tangle" part of the process (no documentation output):

    $ ruby stage1.rb README

    Found segment 'Usage'
    Found segment 'Tangle'
    Found segment 'Handle segments'
    Found segment 'Handle code lines'
    Found segment 'Write the program'
    Found segment 'Weave'
    Found segment 'Running it!'
    Found segment 'Bootstrapping'
    Found segment 'Batch Generation'
    Found segment 'Parse Arguments'
    Fetching segment 'start'
    Fetching segment 'Parse Arguments'
    Fetching segment 'Usage'
    Fetching segment 'Tangle'
    Fetching segment 'Handle segments'
    Fetching segment 'Handle code lines'
    Fetching segment 'Write the program'
    Fetching segment 'Weave'

(As you can see, I also gave it some output to help me debug segment names.)

That produces `README.rb`, which is now the Ruby program we've described
above, which can be used to process itself again:

    ruby README.rb README

And running that _again_ proves that we're **fully bootstrapped**. We're
running the output of the README against the README:

    ruby README.rb README

(Note that this part of the document you're reading right now has indented
"code" blocks to show the command line and output.
Those are not valid Ruby, so
why is that okay? That's okay because they're never explicitly included in the
program! The Ruby interpreter never sees them.)

## Bootstrapping

This repo includes two simple literate test programs (Markdown files) I used to
get the initial "stage1" program working:

* `hello.md`
* `hello-segments.md`

When stage1 was done, I copied it to use as the basis for the final
document/program you're reading now. Then I followed the "Running it!" process
exactly as shown above and it worked! :-)

## Batch Generation

RubyLit allows for creating multiple files from a single literate Markdown file.
We call this process batch generation.
It was inspired by the LaTeX package [docstrip](https://ctan.org/pkg/docstrip).
For example, one can extract a `batch-generation.sh` from this `README`.
The following script first creates the usual `README.rb` and `README.html`, and then produces the file `batch-generation.sh`.

    #!/bin/bash

    ruby README.rb README
    ruby README.rb README rubylit.rb

    ruby rubylit.rb README batch-generation.sh "Batch Generation"

Once you can extract arbitrary segments as output files, the sky is the limit.
You can have all sorts of weird recursive things going on.
For example, you can extract other literate programs as Markdown files.
One issue with embedding Markdown files in a literate program is that Markdown cares about line-initial whitespace.
The batch generation system is _just_ a bash script, so we can clean up initial whitespace with a bit of `sed`.
For example, we can do the following.

    ruby rubylit.rb README markdown-example.md "Markdown Example"
    sed --in-place 's/^    //g' markdown-example.md # clean-up initial whitespace
    ruby rubylit.rb markdown-example hello-world.sh


## Markdown Example

We leave finding a use for this weird recursive literate programming as an exercise for the reader.

    # An Embedded Literary Program.

    This is itself a literary program!
    *Wow* it's so meta!

        #!/bin/bash
        echo "Hello, world!"

## Parse Arguments

This is the most boring and hacky part of the program.
Ideally, it would be replaced with something using [OptionParser](https://ruby-doc.org/stdlib-2.4.2/libdoc/optparse/rdoc/OptionParser.html).

    if ARGV[0]
        #puts "ARGV[0] is present: adopting value for fname"
        fname = ARGV[0]
        fname_input = "#{fname}.md"
        if ARGV[1]
            #puts "ARGV[1] is present: adopting value for fname_output"
            fname_output = ARGV[1]
            if ARGV[2]
                #puts "ARGV[2] is present: adopting value for initial_segment"
                initial_segment=ARGV[2]
            else
                #puts "No ARGV[2] is present: assuming default value :start"
                initial_segment=:start
            end
        else
            #puts "No ARGV[1] is present: default value"
            fname_output = "#{fname}.rb"
        end
    else
        <<Usage>>
        exit 1;
    end
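
For comparison, the defaulting done by the nested `if`s above can be written more compactly. The sketch below is only an illustration and not part of RubyLit: `resolve` is a hypothetical lambda that takes an argument array and returns the four values named above (`fname`, `fname_input`, `fname_output`, `initial_segment`). It assumes `INPUT-LIT` is always given, so the usage check is left out, and unlike the code above it sets `initial_segment` to `:start` even when `OUTPUT` is missing. Like any fenced, flush-left block in this file, it is skipped by the tangler, which only collects lines beginning with a space.

```ruby
# Hypothetical helper (not in RubyLit): fill in defaults for the optional arguments.
# Returns [fname, fname_input, fname_output, initial_segment].
resolve = ->(argv) { [argv[0], "#{argv[0]}.md", argv[1] || "#{argv[0]}.rb", argv[2] || :start] }

# Only INPUT-LIT given: OUTPUT and START fall back to their defaults.
p resolve.call(["README"])

# All three arguments given: nothing is defaulted.
p resolve.call(["README", "batch-generation.sh", "Batch Generation"])
```

The first call prints `["README", "README.md", "README.rb", :start]` and the second prints `["README", "README.md", "batch-generation.sh", "Batch Generation"]`.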