    First log session to test what I've got so far. GNU Debugger recorded in
    GNU screen for the full GNU experience.
    I'll clean up a lot of the gdb prompts and stuff for clarity.

Reading symbols from nasmjf...
Breakpoint 1 at 0x804900e: file nasmjf.asm, line 80.
Breakpoint 1, _start () at nasmjf.asm:80
80          cld    ; Clear the "direction flag" which means the string instructions (such
82          mov [var_S0], esp ; save the regular stack pointer (used for data) in FORTH var S0!
84          mov ebp, return_stack_top ; Initialise the return stack pointer
    
    Trying a user-defined command (a "function") in GDB to cut down on the
    typing. I always have to cast the NASM labels to (int) since the debugging
    info has no way of telling GDB what I'm storing there. "int" in this case
    just means I've got a 4-byte (32-bit) value. GDB has a strong C heritage.
        p - displays the VALUE of the label, which is an address
        x - displays the memory at the address
        p/x and x/x display as hexadecimal
        *(int) uses the address stored AT the memory referenced by the label
            (again, strong C heritage in this syntax)
    Not all three of these will always be relevant, but it saves a lot of typing.
    
(gdb) define foo
Type commands for definition of "foo".
End with a line saying just "end".
>p/x (int)$arg0
>x/x (int)$arg0
>x/x *(int)$arg0
>end
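    A rough Python model of what those three commands display, using a
    dict as stand-in memory. The addresses and values are the ones that
    show up later in this log for "foo $eax"; the Python names (MEMORY,
    foo) are illustrative only and are not part of nasmjf or GDB.

```python
# Toy model of the three displays made by the gdb "foo" command.
# MEMORY is a stand-in for process memory: address -> 32-bit value.
MEMORY = {
    0x804a010: 0x08049000,  # cell holding the address of docol
    0x8049000: 0x89fc6d8d,  # first four bytes of docol's machine code
}

def foo(value):
    """Return what the three lines of the gdb command would show."""
    printed = value                # p/x (int)$arg0  -> the value itself
    at_value = MEMORY[value]       # x/x (int)$arg0  -> memory at that address
    at_pointer = MEMORY[at_value]  # x/x *(int)$arg0 -> memory at the address stored there
    return printed, at_value, at_pointer

for shown in foo(0x804a010):
    print(hex(shown))
```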
    
    Initial nonsense over. Now we use the main mechanism that drives the Forth
    instructions: the NEXT macro is inlined at the end of every word and here
    to bootstrap the action. cold_start contains the address of the "QUIT" word.
    (quit is a silly name - it doesn't quit Forth, it "quits" TO the interpreter)
    (side note: I'd like everything to be lowercase except assembly macros. But
    after 'quit' and 'docol', I haven't been good about converting them. Will
    probably do a couple rounds of cleanup at some point...)

    NEXT loads the address of the next word into eax and jumps to the address
    stored there, executing the machine code at that location.
    
_start () at nasmjf.asm:88
88          mov esi, cold_start       ; give next forth word to execute
27          lodsd     ; NEXT: Load from memory into eax, inc esi to point to next word.
28          jmp [eax] ; Jump to whatever code we're now pointing at.
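    A minimal Python sketch of what those two NEXT instructions do, with
    a dict standing in for memory. COLD_START is a made-up address (the
    log never shows the real one); the other two values are taken from
    the "foo $eax" output further down.

```python
CELL = 4  # bytes per 32-bit cell

def next_word(memory, esi):
    """Model of NEXT: lodsd, then jmp [eax]."""
    eax = memory[esi]     # lodsd: load the cell esi points at...
    esi += CELL           # ...and advance esi to the following cell
    target = memory[eax]  # jmp [eax]: the cell at eax holds the code address
    return eax, esi, target

COLD_START = 0x1000  # hypothetical address, not from the log
memory = {
    COLD_START: 0x804a010,  # cold_start holds the address of QUIT's first cell
    0x804a010: 0x08049000,  # QUIT's first cell holds the address of docol
}

eax, esi, target = next_word(memory, COLD_START)
print(hex(eax), hex(esi), hex(target))
```

    The double lookup is the point: esi walks a list of addresses, and
    each of those addresses leads to a cell that says where the machine
    code lives.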
    
    Since QUIT is defined with the DEFWORD macro, its first cell points to
    'DOCOL' - which, in essence, sets up the rest of the Forth word
    to be executed (QUIT, in this case) for another pass through NEXT.
    
docol () at nasmjf.asm:40
40          lea ebp, [ebp-4]   ; "load effective address" of next stack position
41          mov [ebp], %1      ; "push" the register value to the address at ebp
70          add eax, 4      ; eax points to docol (me!) in word definition. Go to next.
    
    Here I use that 'foo' function to see if that's true about the eax register.
    Note that the add 4 instruction has NOT yet executed. GDB always shows the
    instruction that is about to run, not the one that just ran!
    
(gdb) foo $eax
$9 = 0x804a010
0x804a010:      0x08049000
0x8049000 <docol>:      0x89fc6d8d

    Yup! It points to DOCOL all right. Now we step and add 4 to eax:

(gdb) s
71          mov esi, eax    ; Put the next word pointer into esi
(gdb) foo $eax
$10 = 0x804a014
0x804a014:      0x0804a12c
0x804a12c:      0x08049218
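    The four DOCOL instructions stepped through above can be modeled
    roughly like this. Assumptions: the register pushed by the %1 macro
    parameter is esi (the log only shows "%1"), and the starting esi and
    ebp values are invented for the example. The eax values match the
    two "foo $eax" displays.

```python
CELL = 4  # bytes per 32-bit cell

def docol(memory, eax, esi, ebp):
    """Model of DOCOL: save the caller's position, then enter the word."""
    ebp -= CELL        # lea ebp, [ebp-4]: make room on the return stack
    memory[ebp] = esi  # mov [ebp], %1: push the register (assumed to be esi)
    eax += CELL        # add eax, 4: skip past the cell that points to docol
    esi = eax          # mov esi, eax: NEXT will now read the word's body
    return eax, esi, ebp

memory = {}
eax, esi, ebp = docol(memory, eax=0x804a010, esi=0x1004, ebp=0x2000)
print(hex(eax), hex(esi), hex(ebp), hex(memory[ebp]))
```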
    
    Every single Forth word ends with NEXT, which executes the next word.
    In this case, it's happening at the end of DOCOL (and DOCOL's job is
    to get everything set up to have NEXT execute the rest of the word...)

(gdb) s
27          lodsd     ; NEXT: Load from memory into eax, inc esi to point to next word.
28          jmp [eax] ; Jump to whatever code we're now pointing at.

    Double-checking that the instructions in QUIT are what we'll be running
    now...

(gdb) foo $eax
$12 = 0x804a12c
0x804a12c:      0x08049218
0x8049218 <code_R0>:    0x04c30868
    
    Yes! The 'R0' constant is the first thing we run in QUIT! It's really wild
    how constants in Forth are actually words with a single instruction that
    pushes a value onto the stack! In this case, R0 is the top of the return
    stack.

    The push %5 line is from the DEFCONST macro, which, in turn, calls the
    DEFCODE macro because consts are words. Then the NEXT macro continues to
    the next word in QUIT...

code_R0 () at nasmjf.asm:568
568             push %5
code_R0 () at nasmjf.asm:27
27          lodsd     ; NEXT: Load from memory into eax, inc esi to point to next word.
28          jmp [eax] ; Jump to whatever code we're now pointing at.
   
    ...which happens to be RSPSTORE, which pops that value off the data
    stack into ebp, the return stack pointer.
   
code_RSPSTORE () at nasmjf.asm:201
201             pop ebp
code_RSPSTORE () at nasmjf.asm:27
27          lodsd     ; NEXT: Load from memory into eax, inc esi to point to next word.
28          jmp [eax] ; Jump to whatever code we're now pointing at.
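    The R0 / RSPSTORE pair is small enough to model in a few lines of
    Python. This is only a sketch: RETURN_STACK_TOP and the starting ebp
    are placeholder numbers, and a Python list stands in for the data
    stack that push and pop operate on.

```python
RETURN_STACK_TOP = 0x2000  # hypothetical address, not from the log

def r0(data_stack):
    """Model of the R0 constant word: one push of a fixed value."""
    data_stack.append(RETURN_STACK_TOP)  # push %5

def rspstore(data_stack, regs):
    """Model of RSPSTORE: the popped value becomes the return stack pointer."""
    regs["ebp"] = data_stack.pop()       # pop ebp

data_stack = []
regs = {"ebp": 0x1ffc}  # hypothetical: one cell already pushed by DOCOL

r0(data_stack)
rspstore(data_stack, regs)
print(hex(regs["ebp"]), data_stack)
```

    Run back to back, the two words reset the return stack pointer to
    its top and leave the data stack as they found it.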
   
    ...and then QUIT runs INTERPRET, which takes words on STDIN and then
        ...calls _WORD to get a word from input which
            ...calls _KEY to get a character ("key") of input

code_INTERPRET () at nasmjf.asm:209
209             call _WORD              ; Returns %ecx = length, %edi = pointer to word.
_WORD.skip_non_words () at nasmjf.asm:310
310             call _KEY               ; get next key, returned in %eax
_KEY () at nasmjf.asm:351
   
    _KEY first checks to see if it needs input (currkey has reached
    bufftop). On first run, they're both zero, so yeah, we need more
    input.

    Aside: again, "key" isn't how we would normally describe this in
    a modern environment - it's the next "character" (and even that's
    becoming a thing of the past now that Unicode is pretty much standard
    everywhere...).

    Anyway, comparing currkey (ebx = 0) and bufftop (0) sets the Zero
    Flag (ZF) because the two values are equal (their difference is
    zero), so jge takes the jump. We can see ZF in the 'info reg'
    display below:
   
351             mov ebx, [currkey]
352             cmp ebx, [bufftop]
353             jge .get_more_input
(gdb) info reg
...
ebx            0x0                 0
eflags         0x246               [ PF ZF IF ]
...
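    To double-check that eflags value by hand, here is a small Python
    sketch that decodes it using the standard x86 EFLAGS bit positions.
    The helper names are made up for this example. jge jumps when the
    Sign Flag equals the Overflow Flag, which is the case here (both
    are clear).

```python
# Standard x86 EFLAGS bit positions for the commonly displayed flags.
FLAG_BITS = {
    "CF": 0, "PF": 2, "AF": 4, "ZF": 6, "SF": 7,
    "TF": 8, "IF": 9, "DF": 10, "OF": 11,
}

def decode_eflags(value):
    """List the names of the flags that are set, lowest bit first."""
    return [name for name, bit in FLAG_BITS.items() if (value >> bit) & 1]

def jge_taken(value):
    """jge ("jump if greater or equal", signed) jumps when SF == OF."""
    sf = (value >> FLAG_BITS["SF"]) & 1
    of = (value >> FLAG_BITS["OF"]) & 1
    return sf == of

flags = decode_eflags(0x246)
print(flags, jge_taken(0x246))
```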
   
    We get more input by telling Linux to give us input from
    STDIN into a fixed-size buffer:

_KEY.get_more_input () at nasmjf.asm:361
361             xor ebx,ebx             ; 1st param: stdin
362             mov ecx,buffer          ; 2nd param: buffer
363             mov [currkey],ecx
364             mov edx,buffer_size     ; 3rd param: max length
365             mov eax,__NR_read       ; syscall: read
366             int 0x80                ; syscall!

    Now I type "foo<enter>":

foo
   
    We check to make sure the input isn't zero-length.
    For a typed line it never would be - the <enter> key always
    gives us at least '\n' - but read returns 0 at end of input
    (Ctrl-D on an empty line, or piped input running out).
   
367             test eax,eax            ; If %eax <= 0, then exit.
368             jbe .eof
369             add ecx,eax             ; buffer+%eax = bufftop
370             mov [bufftop],ecx

    We can see how long the input string is. Yup, 4 bytes is
    right: "foo\n".

(gdb) foo $eax
$15 = 0x4

    Now we're back to _KEY, having gathered some input.
    We repeat the check...

371             jmp _KEY
_KEY () at nasmjf.asm:351
351             mov ebx, [currkey]
352             cmp ebx, [bufftop]
353             jge .get_more_input
   
    This time we have input (and bufftop is at a higher
    address than currkey), so we continue on by grabbing
    the current "key" (character):

354             xor eax, eax
355             mov al, [ebx]           ; get next key from input buffer

    If that worked, the al register now has the first
    character of "foo\n". Yup, there's the "f"! (p/c means
    print as a character. We can also p/s to print a C-style
    string.)

(gdb) p/c $al
$19 = 102 'f'

    Now we set currkey to the next character and return...

356             inc ebx
357             mov [currkey], ebx        ; increment currkey
358             ret
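    The whole _KEY walk above boils down to a refill-on-demand buffer.
    A rough Python model follows. Assumptions: indices stand in for the
    real currkey/bufftop pointers, the 4096-byte size is a guess at
    buffer_size, and raising EOFError stands in for whatever .eof does
    (the log doesn't show it). KeyReader is a made-up name.

```python
import io

class KeyReader:
    """Toy model of _KEY: hand out one byte at a time, refilling when empty."""

    def __init__(self, stream, buffer_size=4096):  # size is an assumption
        self.stream = stream
        self.buffer_size = buffer_size
        self.buffer = b""
        self.currkey = 0  # index of the next unread byte
        self.bufftop = 0  # index one past the last valid byte

    def key(self):
        while self.currkey >= self.bufftop:            # cmp / jge .get_more_input
            data = self.stream.read(self.buffer_size)  # the read syscall
            if not data:                               # zero bytes: end of input
                raise EOFError("no more input")
            self.buffer = data
            self.currkey = 0                           # mov [currkey], ecx
            self.bufftop = len(data)                   # buffer + eax = bufftop
        ch = self.buffer[self.currkey]                 # mov al, [ebx]
        self.currkey += 1                              # inc ebx / mov [currkey], ebx
        return ch

reader = KeyReader(io.BytesIO(b"foo\n"))
first = reader.key()
print(first, chr(first), reader.bufftop)
```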
   
    Back at _WORD, we check to see if we've hit a character
    to skip. Forth is so syntactically simple, I just love it.

    NOTE that the jbe instruction is "jump if below or equal"
    (an unsigned less-than-or-equal), so an ASCII space (0x20) or
    any character below it will cause us to keep seeking in the
    .skip_non_words loop. This is a clever way to skip spaces,
    tabs, newlines, returns, form feeds, etc. I'll improve the
    comments for these instructions in the actual program now.
   
_WORD.skip_non_words () at nasmjf.asm:311
311             cmp al,'\'              ; start of a comment?
312             je .skip_comment        ; if so, skip the comment
313             cmp al,' '              ; space?
314             jbe .skip_non_words     ; if so, keep looking
   
    Nope, character looks good. So we add it to word_buffer
    in memory. The stosb instruction implicitly copies what's
    in the al register (the 'b' is for byte) to memory at
    the address stored in the edi register.

    Then edi is incremented so that the next time this happens,
    the next byte will go to the next position, and so forth.
    It turns out, this is what we guaranteed when we cleared the
    direction flag at the very beginning: with the flag clear,
    string instructions like stosb count upward.
   
317             mov edi,word_buffer     ; put addr to word return buffer in edi

    Now that we've established that we're past any whitespace
    and are gathering the actual input, we're in .collect_word.
    I'll snip the stepping through _KEY for 'o', 'o', and '\n'.

_WORD.collect_word () at nasmjf.asm:319
319             stosb                   ; add character to return buffer
320             call _KEY               ; get next key, returned in %al
   
    After every call to _KEY, we check to see if we're done
    collecting the word. The ja instruction is "jump if above"
    (an unsigned greater-than), which is the
    exact opposite of the jbe check above.
    To put it straight: before we were looping WHILE the
    character was whitespace, now we loop UNTIL the character
    is whitespace.
   
321             cmp al,' '              ; is blank?
322             ja .collect_word        ; if not, keep looping

    Now _WORD returns the length and address of the collected word.

325             sub edi, word_buffer    ; hmm, the len?
326             mov ecx, edi            ; return it
327             mov edi, word_buffer    ; return address of the word
328             ret
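    Put together, the two _WORD loops look roughly like this in Python.
    It's a sketch, not a translation: read_word and next_key are made-up
    names, a bytearray stands in for word_buffer, and the comment
    handling (discard everything up to the newline) is an assumption,
    since .skip_comment is never stepped through in this log.

```python
SPACE = ord(" ")
BACKSLASH = ord("\\")
NEWLINE = ord("\n")

def read_word(next_key):
    """next_key is any callable that returns the next input byte as an int."""
    # .skip_non_words: loop WHILE the character is a comment start or <= space
    while True:
        ch = next_key()
        if ch == BACKSLASH:
            while next_key() != NEWLINE:  # assumed .skip_comment behaviour
                pass
            continue
        if ch > SPACE:                    # jbe .skip_non_words not taken
            break
    # .collect_word: loop UNTIL the character is <= space
    word_buffer = bytearray()
    while ch > SPACE:                     # ja .collect_word
        word_buffer.append(ch)            # stosb: store al, advance edi
        ch = next_key()
    return len(word_buffer), bytes(word_buffer)

keys = iter(b"  \t\nfoo\n")
length, word = read_word(lambda: next(keys))
print(length, word)
```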
   
    Then we return to INTERPRET from _WORD:
   
code_INTERPRET () at nasmjf.asm:212
212             xor eax,eax             ; back from _WORD...zero eax
...

    Let's check the return values now:

(gdb) p $ecx
$1 = 3
(gdb) x/3c $edi
0x804a068 <word_buffer>:        102 'f' 111 'o' 111 'o'

    Yay! There's the "foo" string that was input.
    Even though I've got some of the _FIND word that tries to
    match the input word, I think this has been quite enough
    for one log. :-)