    The last update was very exciting. Now I'm actually
    reading a single word's definition from a string,
    inlining all of the code into memory, and executing it.

    To put it in concrete terms, this 'meow5' definition:

        "meow meow meow meow meow exit"

    Was turned into this in memory:

        <meow word machine code>
        <meow word machine code>
        <meow word machine code>
        <meow word machine code>
        <meow word machine code>
        <exit word machine code>

    The 'exit' word even pops the exit status code from the
    stack. Between that and all of the meowing, we're
    getting extremely "conCATenative" here. Sorry.

    So I need to figure out what step comes next. I need to:

        1. Get user input from STDIN
        2. Figure out how immediate mode will work
           (currently, I start in compile mode and when
           that's done, I execute whatever was compiled!)
        3. Create the colon ':' and semicolon ';' words to
           toggle compile mode (and create word definitions!)

    I would also like to have introspection and diagnostics
    and visualizations as early in this project as possible!
    But for now, I'm gonna stay the course towards an
    absolutely minimal proof of concept. I want to be able
    to type this:

        : meow5 meow meow meow meow meow exit ;
        meow5

    And see (something like) this:

        Meow.
        Meow.
        Meow.
        Meow.
        Meow.
        BYE!
        $

    So how about #2 and/or #3 from the list above - how
    simple can the colon command be?

    So I've updated the input string:

        db 'meow  : meow5 meow meow meow meow meow ;
            meow5 exit', 0

    (ignore the newline)
    Which reads as:

        1. call meow right now in "immediate" mode
        2. : switches to compile mode and
        3. store "meow5" as name
        4. inline 5 meow words
        5. ; writes tail (including saved name) and
        6. switches back to immediate mode
        7. call new meow5 word
        8. exit

    I have also created a mode var and added imm/comp flags
    to tails. Todo:

    [ ] colon word store name somewhere
    [ ] find should also match mode flag (use &)
    [ ] semicolon should write tail
    [ ] immediate mode should find and exec words...somehow

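    A rough sketch of the "use &" idea from the second TODO,
    as a stand-alone toy rather than the real 'find'. The
    flag values are an assumption (one bit per mode), and
    'word_flags' is a made-up stand-in for a tail's flags:

        ; flagmatch.asm - sketch only, flag values are assumed
        ; build: nasm -f elf32 flagmatch.asm -o flagmatch.o
        ;        ld -m elf_i386 flagmatch.o -o flagmatch
        %define IMMEDIATE 00000001b
        %define COMPILE   00000010b

        section .data
        mode:       dd COMPILE   ; pretend we're compiling
        word_flags: dd IMMEDIATE ; hypothetical tail flags

        section .text
        global _start
        _start:
            mov eax, [word_flags]
            and eax, [mode]  ; the '&': keep only shared bits
            jz .no_match     ; zero: word is for the other mode
            mov ebx, 0       ; status 0: word matches the mode
            jmp .done
        .no_match:
            mov ebx, 1       ; status 1: keep searching
        .done:
            mov eax, 1       ; sys_exit
            int 0x80

    With these values the bits don't overlap, so the program
    exits with status 1. A word usable in both modes could
    have both bits set.
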
    Next two nights: Hmm...okay, so adding more words that
    will execute as they're entered ("immediate" words) is
    forcing me to deal with how they should return execution
    to whatever called them.

    To recap:

        * Compiled code in meow5 will be concatenated
          together, so there is no such thing as "return"
          _within_ a compiled word - execution truly just
          flows from the end of one word to the beginning of
          the next.

        * Many words (':' or 'colon' is an example) must be
          able to operate outside of a compiled word because
          they are needed to do the compiling!

        * Some words can execute _both_ ways in a single
          definition. 'exit' is my only example currently -
          it's simple because no part of the program needs
          to execute after it's done, of course.

        * A select few words will even need to be executed
          from within the meow5 binary itself (in assembly)
          to make the initial functionality of the
          interpreter available. 'find' and 'inline' are two
          such fundamental words.

        * I've slowly been converting all of the traditional
          procedure calls in this prototype into simple
          jumps and manually keeping track of a single level
          of return address.

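    The single level of return address in that last bullet
    could look something like this sketch. The CALLWORD body
    and the 'answer' word here are guesses for illustration,
    not the actual meow5 source:

        ; return1.asm - sketch of a one-deep "return"
        ; build: nasm -f elf32 return1.asm -o return1.o
        ;        ld -m elf_i386 return1.o -o return1
        %macro CALLWORD 1        ; assumed definition
            mov dword [return_addr], %%back
            jmp %1
        %%back:
        %endmacro

        section .bss
        return_addr: resd 1      ; room for one address only

        section .text
        global _start
        answer:
            mov ebx, 42          ; the word's body
            jmp [return_addr]    ; the "return"
        _start:
            CALLWORD answer
            mov eax, 1           ; sys_exit, status from ebx
            int 0x80

    This exits with status 42. If 'answer' used CALLWORD
    itself, it would overwrite return_addr and lose its way
    back, which is the catch with keeping only one level.
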
    Now the ':' command forces me to implement a return
    stack for immediate execution, at the very least,
    because it will need to call, for instance, 'get_token',
    to get the name of the word being defined:

        : meow5 ... ;

    Here 'meow5' is the name of the new word.

    Anyway, after sleeping on it, I think I'll solve this by
    having macros to start and end a word in assembly. In
    addition to taking care of the housekeeping duties of
    creating the tail metadata, they'll also set up return
    jumping and stack poppin'. The length of the word in the
    tail will NOT include the return stuff so it won't be
    included when the word is inlined.

    Anyway, it makes sense in my head.

    The basic word-making macros are easy enough:

        %macro DEFWORD 1 ; takes name of word to make
            %1:
        %endmacro

        %macro ENDWORD 3 ; takes label, name string, flags
            end_%1:
            ; todo: immediate "return" goes here
            tail_%1:
                dd LAST_WORD_TAIL ; linked list
                %define LAST_WORD_TAIL tail_%1
                dd (tail_%1 - %1) ; length of word
                dd %3             ; flags
                db %2, 0          ; name as string
        %endmacro

    I tested this and I'll spare you the GDB walkthrough. It
    works and I was able to execute this word from my input
    string.

        DEFWORD foo
            mov eax, 42
        ENDWORD foo, "foo", IMMEDIATE

    So I'll test a call/return action with this foo, then
    convert them all.

    It worked. Now converting...

    Worked out some bugs.

    Silly little mistakes.

    Here's the thing: it's getting pretty annoying to have
    to bust out GDB, guess where to set a break point, step
    through the code, try to remember the C-dominated syntax
    to print stuff, etc., only to find out that I forgot to
    add a line or I put the wrong thing in a string data
    declaration.

    Don't get me wrong, I'm grateful for GDB. It's been a
    good tool and I know I should probably re-learn some of
    its customization options.

    But what I really want is better debugging in my program
    itself.

    So I've added "word not found" handling in the main
    routine, so it goes like this:

        get_next_token:
            CALLWORD get_token
                if all done, jump to .run_it
            CALLWORD find
                if not found, jump to .token_not_found
            CALLWORD inline
            jmp get_next_token

        .run_it:
            jmp data_segment

        .token_not_found:
            print first part of error message
            print token name
            print last part of error message

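    The three prints in .token_not_found could be sketched
    like this. The PRINT macro, the label names, and the
    fixed 'honk' token are all made up for the example; the
    real code would print whatever token 'find' failed on:

        ; notfound.asm - sketch of a three-part error message
        ; build: nasm -f elf32 notfound.asm -o notfound.o
        ;        ld -m elf_i386 notfound.o -o notfound
        %macro PRINT 2           ; hypothetical: address, length
            mov eax, 4           ; sys_write
            mov ebx, 1           ; stdout
            mov ecx, %1
            mov edx, %2
            int 0x80
        %endmacro

        section .data
        err_a: db 'Could not find word "'
        err_a_len equ $ - err_a
        token: db 'honk'         ; stand-in for the real token
        token_len equ $ - token
        err_b: db '"', 10        ; closing quote and newline
        err_b_len equ $ - err_b

        section .text
        global _start
        _start:
            PRINT err_a, err_a_len
            PRINT token, token_len
            PRINT err_b, err_b_len
            mov ebx, 1           ; non-zero exit status
            mov eax, 1           ; sys_exit
            int 0x80

    Splitting the message in three means the token can be
    printed straight from wherever it already lives, with no
    need to build the whole string in a buffer first.
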
    I'll test it out:

        input_buffer_start:
            db 'honk meow meow meow meow meow exit', 0

$ mr
Could not find word "honk"

    Excellent, that'll save me untold minutes of debugging
    right there.

    Now let's see if I've converted everything to my new
    macros DEFWORD ... ENDWORD properly:

$ mr
Meow!
Meow!
Meow!
Meow!
Meow!
Meow!
Meow!
...

    Oh no! I've got an infinite loop somehow.

    Even though I'm putting in some of the "infrastructure"
    for it, I'm not doing any immediate mode execution yet,
    so it's nothing like that.

    Nothing for it but to debug with GDB...

(gdb) break get_next_token.run_it
Breakpoint 1 at 0x80491c2: file meow5.asm, line 272.
...
273	    jmp data_segment ; jump to the "compiled" program
0x0804a054 in data_segment ()
(gdb)
Single stepping until exit from function data_segment,
which has no line number information.

    Oh, right. There's no debugger info for the machine code
    I've inlined into memory and executed.

    All the more reason to have debugging tools built into
    my program itself. But I don't have those yet, so at
    least GDB can give me a disassembly:

(gdb) disas &data_segment,&here
Dump of assembler code from 0x804a054 to 0x804a454:
   0x0804a054 <data_segment+0>:	mov    $0x1,%ebx
=> 0x0804a059:	mov    $0x804a006,%ecx
   0x0804a05e:	mov    $0x6,%edx
   0x0804a063:	mov    $0x4,%eax
   0x0804a068:	int    $0x80
   0x0804a06a:	jmp    *0x804a459
   0x0804a070:	mov    $0x1,%ebx
   0x0804a075:	mov    $0x804a006,%ecx
   0x0804a07a:	mov    $0x6,%edx
   0x0804a07f:	mov    $0x4,%eax
   0x0804a084:	int    $0x80
   0x0804a086:	jmp    *0x804a459

   ... repeats three more times...

   0x0804a0e0:	pop    %ebx
   0x0804a0e1:	mov    $0x1,%eax
   0x0804a0e6:	int    $0x80
   0x0804a0e8:	jmp    *0x804a459
   0x0804a0ee:	add    %al,(%eax)
   0x0804a0f0:	add    %al,(%eax)

    So the nice thing about 5 "meows" in a row is that the
    repetition is really easy to spot.

    The weird thing is that they all end with a jump back to
    the exact same place near the beginning (but not exactly
    at the beginning) of the inlined code.

    Where is that jump coming from?

    Oh, ha ha, I found it almost immediately. It's the
    "return" that I put in my ENDWORD macro. That's not
    supposed to be inlined with the "compiled" version of
    words and it's due to a silly mistake.

    The last line here:

        end_%1:
            jmp [return_addr]
        tail_%1:
            dd LAST_WORD_TAIL
            dd (tail_%1 - %1)

    Should be:

            dd (end_%1 - %1)

    So the jmp [return_addr] doesn't get inlined!

    I'll fix that.

    And now?

(gdb) disas  &data_segment,&here
Dump of assembler code from 0x804a054 to 0x804a454:
   0x0804a054 <data_segment+0>:	push   %es
   0x0804a055:	mov    0x6ba0804,%al
   0x0804a05a:	add    %al,(%eax)
   0x0804a05c:	add    %bh,0x4(%eax)
   0x0804a062:	int    $0x80
   0x0804a064:	jmp    *0x804a459
   0x0804a06a:	push   %es
   0x0804a06b:	mov    0x6ba0804,%al
   0x0804a070:	add    %al,(%eax)
   0x0804a072:	add    %bh,0x4(%eax)
   0x0804a078:	int    $0x80
   0x0804a07a:	jmp    *0x804a459
   0x0804a080:	push   %es
    ...

    What on earth? That ain't right.

    Next night: ohhhh...crud. Yeah, the problem is due to
    the "return" code at the end of each word. My
    dirt-simple inline is going to need an additional
    length: there's a distance from the tail to the
    beginning of the machine code and a separate length of
    the machine code.  (They used to be the same thing.)

    The DEFWORD and ENDWORD macros produce this for "meow":

        meow:
            ...
        end_meow:
            jmp [return_addr]
        tail_meow:
            ...
            dd (end_meow - meow)
            dd (tail_meow - meow) <-- need to add this

    And any other code that reads the tail (I guess that's
    just 'find' right now?) will also need to be updated. I
    wonder if I should be storing these "tail offsets" in
    NASM macros as constants so I don't have to hunt them
    down if they change in the future?

    Yeah, I'll do that too. In addition to making changes
    painless, it will make my intent clearer in the code
    than bare offset numbers and a comment ever could.

        ; Memory offsets for each item in tail:
        %define T_CODE_LEN    4
        %define T_CODE_OFFSET 8
        %define T_FLAGS       12
        %define T_NAME        16

    Inline is re-worked to use the length and offset of the
    machine code in relation to the tail address:

        DEFWORD inline
            pop esi ; param1: tail of word to inline
            mov edi, [here]    ; destination
            mov eax, [esi + T_CODE_LEN]    ; get len of code
            mov ebx, [esi + T_CODE_OFFSET] ; get offset back to code
            sub esi, ebx    ; set start of code for movsb
            mov ecx, eax    ; set len of code for movsb
            rep movsb       ; copy [esi]...[esi+ecx] into [edi]
            add [here], eax ; save current position
        ENDWORD inline, "inline", (IMMEDIATE)

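    To see the two tail fields at work outside of 'inline',
    here is a small self-checking sketch. The 'foo' word is
    hand-expanded rather than built with the macros, and its
    tail leaves out the flags and name to keep things short
    (both are simplifications for the example):

        ; tailmath.asm - sketch: find a word's code from its tail
        ; build: nasm -f elf32 tailmath.asm -o tailmath.o
        ;        ld -m elf_i386 tailmath.o -o tailmath
        %define T_CODE_LEN    4
        %define T_CODE_OFFSET 8

        section .bss
        return_addr: resd 1

        section .text
        global _start
        foo:
            mov eax, 42
        end_foo:
            jmp [return_addr]        ; must not be inlined
        tail_foo:
            dd 0                     ; link (no previous word)
            dd (end_foo - foo)       ; at T_CODE_LEN
            dd (tail_foo - foo)      ; at T_CODE_OFFSET
        _start:
            mov esi, tail_foo
            mov ecx, [esi + T_CODE_LEN]    ; bytes to copy
            sub esi, [esi + T_CODE_OFFSET] ; esi = start of code
            mov ebx, 1               ; assume failure
            cmp esi, foo             ; did we land on the code?
            jne .done
            add esi, ecx             ; one past the last byte
            cmp esi, end_foo         ; stops before the jmp?
            jne .done
            xor ebx, ebx             ; status 0: both check out
        .done:
            mov eax, 1               ; sys_exit
            int 0x80

    It exits with status 0 when the offset leads back to the
    start of foo and the length stops just short of the
    return jump, which is exactly the range inline should copy.
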
    Crossing fingers...

$ mr
Meow.
Meow.
Meow.
Meow.
Meow.

    Yay, working again!

    Now I can try to do something _new_ with these changes:
    find immediate mode and compile mode words.

    And to _really_ do this right, I'll use the FORTH colon
    word ':' as my immediate/compile mode separator.

    Here's my new "input buffer" string:

        db 'meow meow : meow meow meow exit', 0

    For now the definition of ':' will _just_ set the mode:

        DEFWORD colon
            mov dword [mode], COMPILE
        ENDWORD colon, ":", (IMMEDIATE)

    And I've got two different definitions of 'meow' all
    ready to go. They're both called "meow" in the
    dictionary, but one of them has an IMMEDIATE flag and
    the other has the COMPILE flag to specify which mode
    they should match. The only difference is that they
    print different strings.

    If all goes well, the "input buffer" string I set above
    should print two immediate meows and then compile three
    compile meows and an exit and then run that...

$ mr
Immediate Meow!
Immediate Meow!
Meow.
Meow.
Meow.

    Wow!

    So I guess I've done two of the four TODOs I set at the
    start of this log above:

    [ ] colon word store name somewhere
    [x] find should also match mode flag (use &)
    [ ] semicolon should write tail
    [x] immediate mode should find and exec words...somehow

    The colon word isn't storing the word name and there's
    no semicolon yet, so I'm not adding the new words to the
    dictionary yet, but I also made progress in other areas.

    I'll start a new log now with the other two TODOs.

    See you in log05.txt!