    Tonight, we'll see how much of the FIND word works. FIND looks for
    words in the "dictionary" of defined Forth words by walking a linked
    list. The interpreter uses it to look up the addresses of word
    implementations so it can "compile" them into new word definitions.
     
Reading symbols from nasmjf...
Breakpoint 1 at 0x804900e: file nasmjf.asm, line 80.
     
    Now that I'm using GNU Screen with windows for Vim and GDB that
    will close if either application exits, I need to reload the
    file in GDB when I make changes (previously I was just restarting
    GDB).
    
(gdb) file nasmjf
Reading symbols from nasmjf...
    
    No need to step through everything until this point because I
    already know it works: WORD collects a word entered through STDIN.
    
    So I break when we enter the implementation for FIND and then
    continue (run the program). The "foo" below is where nasmjf is
    asking for input and I type "foo" and hit enter.
    
(gdb) break _FIND
Breakpoint 2 at 0x804920b: file nasmjf.asm, line 485.
(gdb) c
Continuing.
foo
    
    Our breakpoint triggers. Now we're in FIND. It checks if we've
    run out of entries.
    
Breakpoint 2, _FIND () at nasmjf.asm:485
485         push esi                ; _FIND! Save esi, we'll use this reg for string comparison
488         mov edx,[var_LATEST]    ; LATEST points to name header of the latest word in the dictionary
_FIND.test_word () at nasmjf.asm:490
490             test edx,edx            ; NULL pointer?  (end of the linked list)
491         je .not_found
    
        And then I think this is clever: instead of immediately
        checking if the name strings match, it checks the precalculated
        and stored length of the name first. Much more efficient.
    
496         xor eax,eax
497         mov al, [edx+4]           ; al = flags+length field
498         and al,(F_HIDDEN|F_LENMASK) ; al = name length
499         cmp cl,al        ; Length is the same?
500         jne .prev_word          ; nope, try prev
    
        And that's what happens here: the length doesn't match, so
        we move to the previous word in the linked list. And it
        starts over at .test_word...
    
_FIND.prev_word () at nasmjf.asm:517
517         mov edx,[edx]           ; Move back through the link field to the previous word
518         jmp .test_word          ; loop, test prev word
_FIND.test_word () at nasmjf.asm:490
490             test edx,edx            ; NULL pointer?  (end of the linked list)
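
    The loop traced above can be sketched in Python. This is a
    minimal model, not the nasmjf code: Entry and find() are
    hypothetical names, and the flag values are assumed from the
    JonesForth convention rather than read from nasmjf.asm.

```python
F_HIDDEN = 0x20    # assumed value (JonesForth convention)
F_LENMASK = 0x1F   # assumed value: low 5 bits hold the name length


class Entry:
    """Hypothetical stand-in for one dictionary header."""

    def __init__(self, link, name, flags=0):
        self.link = link                    # previous entry, or None at the end
        self.flags_len = flags | len(name)  # flags and length share one byte
        self.name = name


def find(latest, name):
    """Walk back from the newest entry; return the match or None."""
    entry = latest
    while entry is not None:                # test edx,edx / je .not_found
        # Cheap check first: masked length byte against the input length.
        if (entry.flags_len & (F_HIDDEN | F_LENMASK)) == len(name):
            if entry.name == name:          # only now compare the strings
                return entry
        entry = entry.link                  # mov edx,[edx]
    return None


# A three-entry dictionary, newest last.
drop = Entry(None, "DROP")
dup = Entry(drop, "DUP")
secret = Entry(dup, "FOO", flags=F_HIDDEN)  # hidden word with a 3-letter name
latest = secret
```

    With these assumed values, keeping F_HIDDEN in the mask means a
    hidden entry's masked byte is always larger than any real name
    length, so find(latest, "FOO") returns None without a separate
    "is it hidden?" test.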
    
    So I set a new breakpoint back in INTERPRET right after FIND
    returns to see how a "not found" condition is handled.
    
(gdb) break 215
Breakpoint 3 at 0x8049043: file nasmjf.asm, line 215.
(gdb) c
Continuing.
Breakpoint 3, code_INTERPRET () at nasmjf.asm:215
215         test eax,eax            ; Found?
    
    If FIND fails, INTERPRET checks if the input is a numeric literal.
    
216         jz .try_literal
code_INTERPRET.try_literal () at nasmjf.asm:230
230         inc byte [interpret_is_lit] ; DID NOT MATCH a word, trying literal number
231         call _NUMBER            ; Returns the parsed number in %eax, %ecx > 0 if error
_NUMBER () at nasmjf.asm:407
407         xor eax,eax
408         xor ebx,ebx
410         test ecx,ecx            ; trying to parse a zero-length string is an error, but returns 0
411         jz .return
    
    It's neat how Forth supports numeric input in the base
    of your choice without any extra syntax. Just set BASE.
    
413         mov edx, [var_BASE]    ; get BASE (in dl)
416         mov bl,[edi]            ; bl = first character in string
417         inc edi
418         push eax                ; push 0 on stack
_NUMBER () at nasmjf.asm:419
419         cmp bl,'-'              ; negative number?
420         jnz .convert_char
_NUMBER.convert_char () at nasmjf.asm:435
435         sub bl,'0'              ; < '0'?
436         jb .negate
437         cmp bl,10        ; <= '9'?
438         jb .compare_base
439         sub bl,17              ; < 'A'? (17 is 'A'-'0')
440         jb .negate
441         add bl,10
_NUMBER.compare_base () at nasmjf.asm:444
444             cmp bl,dl               ; >= BASE?
445         jge .negate
_NUMBER.negate () at nasmjf.asm:453
453         pop ebx
_NUMBER.negate () at nasmjf.asm:454
454         test ebx,ebx
455         jz .return
_NUMBER.return () at nasmjf.asm:459
459         ret
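
    The digit conversion stepped through above can be modelled in
    Python. This is a rough sketch under assumptions: parse_number()
    is a hypothetical name, the second return value stands in for
    ecx as a count of characters left unparsed (modelled on
    JonesForth's NUMBER), and only ASCII input is considered.

```python
def parse_number(text, base=10):
    """Return (value, unparsed). unparsed > 0 signals a parse error."""
    value = 0
    remaining = len(text)
    if remaining == 0:
        return 0, 0                     # zero-length string: returns 0
    pos = 0
    negative = text[0] == "-"
    if negative:
        pos += 1
        remaining -= 1
        if remaining == 0:
            return 0, 1                 # the string was only "-"
    while remaining > 0:
        digit = ord(text[pos]) - ord("0")
        if digit < 0:                   # below '0'
            break
        if digit >= 10:                 # not '0'..'9', try letters
            digit -= 17                 # 17 is 'A' - '0'
            if digit < 0:               # between '9' and 'A'
                break
            digit += 10                 # 'A' becomes 10, 'B' 11, ...
        if digit >= base:               # not a digit in this base
            break
        value = value * base + digit
        pos += 1
        remaining -= 1
    return (-value if negative else value), remaining
```

    For "foo" in base ten, 'f' works out to 102 - 48 - 17 + 10 = 47,
    which is not below 10, so conversion stops on the first character
    and all three characters are left unparsed. Changing only the
    base argument is enough to accept "FF" as 255.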
   
    Coming back from NUMBER, a value > 0 in ecx indicates an error
    in trying to parse a numeric value.
   
code_INTERPRET.try_literal () at nasmjf.asm:232
232         test ecx,ecx
233         jnz .parse_error
   
    And sure enough, "foo" is not a valid number in base ten (the
    default), so we jump to the parse_error section. This should
    print an error message.
   
code_INTERPRET.parse_error () at nasmjf.asm:267
267         mov ebx,2               ; 1st param: stderr
268         mov ecx,errmsg          ; 2nd param: error message
269         mov edx,(errmsgend - errmsg) ; 3rd param: length of string
270         mov eax,[__NR_write]    ; write syscall
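
    For comparison, here is what those four instructions are meant
    to set up, expressed as a Python call. A sketch only: on 32-bit
    Linux the system call number goes in eax and the arguments in
    ebx, ecx and edx, so the intended call is write(2, errmsg,
    length). The message text is the one this log expects errmsg to
    hold; the variable names are illustrative.

```python
import os

STDERR_FD = 2                  # ebx: 1st parameter, file descriptor
errmsg = b"PARSE ERROR: "      # ecx: 2nd parameter, the buffer
length = len(errmsg)           # edx: 3rd parameter (errmsgend - errmsg)

# eax would select the write system call; os.write wraps the same call.
written = os.write(STDERR_FD, errmsg[:length])
```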
   
    But oops! Looks like I've got an error.
   
Program received signal SIGSEGV, Segmentation fault.
code_INTERPRET.parse_error () at nasmjf.asm:270
270         mov eax,[__NR_write]    ; write syscall
   
    The next evening, I load it up again to see what's going on...
   
Reading symbols from nasmjf...
(gdb) break code_INTERPRET.parse_error
Breakpoint 2 at 0x80490a6: file nasmjf.asm, line 267.
(gdb) cont
Continuing.
foo
   
Breakpoint 2, code_INTERPRET.parse_error () at nasmjf.asm:267
267         mov ebx,2               ; 1st param: stderr
268         mov ecx,errmsg          ; 2nd param: error message
269         mov edx,(errmsgend - errmsg) ; 3rd param: length of string
   
    First I try to print the value at errmsg as a string. It
    should be the string "PARSE ERROR: ".
   
(gdb) x/s $ecx
0x804a315 <errmsg>:     ""
   
    Weird. Let's look at the first 4 bytes:
   
(gdb) x/4x $ecx
0x804a315 <errmsg>:     0x00    0x00    0x00    0x53
   
    Weird! Let's check the symbol addresses and dump the bytes as
    characters...
   
(gdb) info addr errmsg
Symbol "errmsg" is at 0x804a315 in a file compiled without debugging.
(gdb) info addr errmsgend
Symbol "errmsgend" is at 0x804a322 in a file compiled without debugging.
(gdb) x/10c $ecx
0x804a315 <errmsg>:     0 '\000'        0 '\000'        0 '\000'        83 'S'  69 'E'  32 ' '  69 'E'      82 'R'
0x804a31d:      82 'R'  79 'O'
   
    Huh, so I've basically got "---SE ERROR: " (where '-' is NUL). Something
    is happening to the first three bytes of my string. Or is this some
    alignment issue? I'll see... To be continued.
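
    One way to double-check that reading is to redo the arithmetic
    on the values GDB printed. A small Python sketch; the addresses
    and byte values are copied from the output above, and the
    variable names are only illustrative.

```python
errmsg = 0x804A315       # from "info addr errmsg"
errmsgend = 0x804A322    # from "info addr errmsgend"
expected = b"PARSE ERROR: "

# The distance between the labels matches the intended string length.
assert errmsgend - errmsg == len(expected)

# First ten bytes as shown by x/10c.
observed = bytes([0, 0, 0, 83, 69, 32, 69, 82, 82, 79])

# Offsets where memory differs from the expected text.
damaged = [i for i, (got, want) in enumerate(zip(observed, expected))
           if got != want]
print(damaged)  # prints [0, 1, 2]
```

    The labels span the full 13 bytes and only offsets 0 to 2
    differ, which fits the "first three bytes" observation. It does
    not by itself say what changes them.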