    So the error in TCFA (which gets the code address for
    a dictionary entry) has been fixed.

    Now let's see why the COLON definition is crashing with
    a segfault.

    I type my usual test word ": FIVE 5 ;".

80          cld    ; Clear the "direction flag" which means the string
(gdb) c
Continuing.
: FIVE 5 ;

    (We'll skip everything that we now know works from log06.txt.)

    First, INTERPRET checks STATE to see if we're executing
    or compiling (we're executing).

code_INTERPRET.check_state () at nasmjf.asm:239
239         mov edx,[var_STATE]
240         test edx,edx
241         jz .execute             ; Jump if executing.

    Then we check to see if we are executing a literal value
    or a word (it's a word: ":", also known as COLON).

code_INTERPRET.execute () at nasmjf.asm:254
254         mov ecx,[interpret_is_lit] ; Literal?
255         test ecx,ecx               ; Literal?
256         jnz .do_literal
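
    To keep the two checks above straight in my head, here's a toy Python model of the choice INTERPRET is making. It's only a sketch, and it rests on my own assumptions: STATE is 0 while executing, a token that isn't in the dictionary is tried as a decimal number, and IMMEDIATE words (which run even while compiling) are left out. The names interpret_token, dictionary, stack, and compiled are made up for illustration and don't come from nasmjf.

```python
def interpret_token(token, state, dictionary, stack, compiled):
    """Handle one token roughly the way INTERPRET does (toy model).

    state == 0 means executing; anything else means compiling.
    IMMEDIATE words are deliberately not modeled.
    """
    if token in dictionary:                 # dictionary lookup comes first
        is_lit, word = False, dictionary[token]
    else:
        try:
            is_lit, number = True, int(token)
        except ValueError:
            raise ValueError(f"unknown word: {token!r}") from None

    if state == 0:                          # like "jz .execute"
        if is_lit:                          # like "jnz .do_literal"
            stack.append(number)
        else:
            word(stack)                     # like "jmp [eax]"
    elif is_lit:                            # compiling a number
        compiled.extend(["LIT", number])
    else:                                   # compiling a word
        compiled.append(token)


def dup(stack):
    """A one-word toy dictionary entry."""
    stack.append(stack[-1])


dictionary = {"DUP": dup}
stack, compiled = [], []
interpret_token("5", 0, dictionary, stack, compiled)
interpret_token("DUP", 0, dictionary, stack, compiled)
print(stack)     # [5, 5]
interpret_token("5", 1, dictionary, stack, compiled)
print(compiled)  # ['LIT', 5]
```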
    
    Now we jump to the code pointed to at the beginning
    of COLON, which is DOCOL. (Increasingly, as we define
    more words with other words rather than pure machine
    language, they'll start with DOCOL.)

    Here I double-check that we're about to jump through the
    first pointer in COLON and that it points to DOCOL.

260         jmp [eax]
(gdb) info symbol $eax
COLON in section .data of /home/dave/nasmjf/nasmjf
(gdb) info symbol *$eax
DOCOL in section .text of /home/dave/nasmjf/nasmjf

    Here's DOCOL. I may have stepped through this before,
    but it's worth looking at again since it's crucial
    to understanding this type of Forth implementation.

    Given the glacial pace at which I'm porting this, I need
    lots of reminders!

    This cheatsheet is currently in a comment at the top
    of my jonesforth.asm:

       esi - next forth word address to execute
       ebp - return stack for forth word addresses

    In the source, PUSHRSP and POPRSP usually handle the
    ebp register, which we're using for the return stack
    pointer (RSP).

    DOCOL's first two lines are both from the PUSHRSP macro
    (you can see that they have lower line numbers).

    That handles ebp: PUSHRSP moves ebp down one cell and
    stores the macro's argument there (%1, which is esi in
    DOCOL), saving the spot where the calling word should resume.

    Then DOCOL advances esi to the next word pointer
    (in COLON after DOCOL itself).

DOCOL () at nasmjf.asm:40
40          lea ebp, [ebp-4]   ; "load effective address" of next stack position
41          mov [ebp], %1      ; "push" the register value to the address at ebp
70          add eax, 4      ; eax points to DOCOL (me!) in word definition. Go to next.
71          mov esi, eax    ; Put the next word pointer into esi
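
    Here's DOCOL as a toy Python model, along with EXIT for contrast. The addresses are made up and each dict entry stands for one 4-byte cell. I'm assuming, as in JonesForth, that DOCOL passes esi to PUSHRSP and that EXIT pops it back with POPRSP. The function names docol and exit_word are just for illustration.

```python
CELL = 4  # bytes per cell on 32-bit x86


def docol(regs, memory):
    """Mimic DOCOL: PUSHRSP esi, then point esi just past the codeword."""
    regs["ebp"] -= CELL                  # lea ebp, [ebp-4]
    memory[regs["ebp"]] = regs["esi"]    # mov [ebp], %1   (%1 = esi, assumed)
    regs["eax"] += CELL                  # add eax, 4
    regs["esi"] = regs["eax"]            # mov esi, eax


def exit_word(regs, memory):
    """Mimic EXIT: POPRSP esi, so the caller picks up where it left off."""
    regs["esi"] = memory[regs["ebp"]]
    regs["ebp"] += CELL


# Made-up addresses: COLON's codeword at 0x1000, return stack top at 0x2000,
# and the caller's next cell (where it should resume) at 0x3008.
memory = {}
regs = {"eax": 0x1000, "esi": 0x3008, "ebp": 0x2000}
docol(regs, memory)
print(hex(regs["esi"]), hex(regs["ebp"]), hex(memory[regs["ebp"]]))
# 0x1004 0x1ffc 0x3008
exit_word(regs, memory)
print(hex(regs["esi"]), hex(regs["ebp"]))
# 0x3008 0x2000
```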
    
    Let's see if that's right. Here's the entire definition
    of COLON. We don't see DOCOL here because it's inserted
    by the DEFWORD macro, but it comes right before FWORD.
    (By the way, FWORD is just WORD, but I can't have a
    symbol called "WORD" in NASM because it's a reserved
    keyword.)

        DEFWORD ":",1,,COLON
        dd FWORD
        dd CREATE
        dd LIT, DOCOL, COMMA
        dd LATEST, FETCH, HIDDEN
        dd RBRAC
        dd EXIT
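
    And here's a toy model of COLON's cells, as a way to check the "esi should be COLON + 4, pointing at FWORD" reasoning. The base address is invented and each cell is assumed to be 4 bytes. The list is the dd lines above, with the DOCOL codeword from DEFWORD added at the front.

```python
CELL = 4           # bytes per cell
COLON = 0x804A000  # made-up address of COLON's codeword

# DEFWORD supplies the leading DOCOL codeword; the rest are the dd lines.
# The second DOCOL is data: LIT pushes it and COMMA stores it into the
# new word as that word's codeword.
COLON_CELLS = ["DOCOL", "FWORD", "CREATE", "LIT", "DOCOL", "COMMA",
               "LATEST", "FETCH", "HIDDEN", "RBRAC", "EXIT"]


def cell_at(address):
    """Return the name stored in the COLON cell at the given address."""
    offset = address - COLON
    if offset % CELL or not 0 <= offset < CELL * len(COLON_CELLS):
        raise ValueError(f"{address:#x} is not a cell inside COLON")
    return COLON_CELLS[offset // CELL]


for i, name in enumerate(COLON_CELLS):
    print(f"COLON + {i * CELL:2d}: {name}")
```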
    
    I think it's super-cool that Forth exposes all of the
    primitives needed to create (or replace!) the COLON
    compiler so you can use them in the interpreter
    yourself. Truly a no-holds-barred language.

    At any rate, the pointer in esi should be the next one
    in COLON, and it should point to WORD (well, FWORD).

(gdb) info symbol $esi
COLON + 4 in section .data of /home/dave/nasmjf/nasmjf
(gdb) info symbol *$esi
FWORD in section .data of /home/dave/nasmjf/nasmjf

    Great! And then the NEXT macro loads the address that esi
    points to into eax, increments esi to the next word pointer,
    and jumps to the address *pointed to* by the address now
    in eax. HAVE YOU GOT THAT???

    This is made even more confusing by the lodsd instruction.
    The mnemonic stands for "load string doubleword". The idea
    is that you can use it to load a "string" of values
    by repeatedly executing lodsd (or lodsb for bytes, etc.).
    What it actually does is load 4 bytes from the address in
    esi into eax and then increment esi by 4. (It only moves
    forward because the direction flag is clear, which is what
    that cld instruction near the top of this log is about;
    with the flag set, esi would be decremented instead.)
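
    NEXT is just lodsd followed by jmp [eax], so here's a byte-level Python sketch of those two steps. The tiny memory image and every address in it (including CODE_FWORD) are made up. Only two real rules are modeled. First, lodsd reads the little-endian 4-byte value at esi into eax and then moves esi by 4 (forward when the direction flag is clear). Second, jmp [eax] goes to the address stored *at* eax.

```python
import struct


def lodsd(regs, memory):
    """Load the little-endian dword at esi into eax, then step esi.

    esi moves forward 4 bytes when the direction flag is clear (after cld)
    and backward 4 bytes when it is set.
    """
    regs["eax"] = struct.unpack_from("<I", memory, regs["esi"])[0]
    regs["esi"] += -4 if regs["df"] else 4


def next_target(regs, memory):
    """NEXT = lodsd + jmp [eax]; return the address jmp [eax] would go to."""
    lodsd(regs, memory)
    return struct.unpack_from("<I", memory, regs["eax"])[0]


# Made-up memory image: FWORD's codeword cell at 0x10, COLON + 4 at 0x24.
memory = bytearray(64)
CODE_FWORD = 0x1234                               # pretend machine-code address
struct.pack_into("<I", memory, 0x10, CODE_FWORD)  # FWORD: dd code_FWORD
struct.pack_into("<I", memory, 0x24, 0x10)        # COLON + 4: dd FWORD

regs = {"eax": 0, "esi": 0x24, "df": False}       # esi as DOCOL left it
target = next_target(regs, memory)
print(hex(regs["eax"]), hex(regs["esi"]), hex(target))  # 0x10 0x28 0x1234
```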
   
    (By the way, I've come to _loathe_ the terms "word", "double",
    "long", etc. I'm okay with "byte" because it's come to
    mean "8 bits" pretty universally in the year 2022. If
    I were king, we would just use the byte count for these sizes,
    like:
        b  = 1 byte  = 8 bits
        b2 = 2 bytes = 16 bits
        b4 = 4 bytes = 32 bits
        b8 = 8 bytes = 64 bits
    and "lodsd" would become "lodsb4". Well, lods* would probably
    have a better mnemonic. But you get the idea. Anyway, harping
    on x86 is, like, a full-time job and it ain't gonna get this
    Forth port done.)
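
    For the record, here are the sizes behind those x86 names, checked against Python's struct standard sizes. In Intel's terminology a "word" is 16 bits, which is why a "doubleword" is 4 bytes. The king_name function is only the naming scheme from the aside above and isn't anything official.

```python
import struct

# Intel's x86 names for operand sizes (a "word" is 16 bits on x86).
X86_SIZES = {"byte": 1, "word": 2, "dword": 4, "qword": 8}

# lods* variant for each size; lodsq only exists in 64-bit mode.
LODS = {"byte": "lodsb", "word": "lodsw", "dword": "lodsd", "qword": "lodsq"}

# struct's standard-size unsigned codes, used as an independent check.
STRUCT_CODES = {"byte": "<B", "word": "<H", "dword": "<I", "qword": "<Q"}


def king_name(x86_name):
    """Byte-count name from the aside: 'b' for 1 byte, else 'b<count>'."""
    n = X86_SIZES[x86_name]
    return "b" if n == 1 else f"b{n}"


for name, size in X86_SIZES.items():
    checked = struct.calcsize(STRUCT_CODES[name])
    print(f"{LODS[name]:5}  {name:5}  {size} bytes (struct says {checked})"
          f"  -> {king_name(name)}")
```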
   
27          lodsd     ; NEXT: Load from memory into eax, inc esi to point to next word.

    And did it work?

28          jmp [eax] ; Jump to whatever code we're now pointing at.
(gdb) info symbol *$eax
code_FWORD in section .text of /home/dave/nasmjf/nasmjf

    Yup! It's jumping to WORD.

    (As another aside, it occurs to me that "WORD" is
    a really confusing name for this Forth word - it just
    tokenizes a space-delimited string from input. Otherwise,
    it doesn't have anything to do with Forth's concept
    of "words" as executable code stored in a "dictionary".)

code_FWORD () at nasmjf.asm:302
302         call _WORD

    Cool, so now I'll skip stepping through WORD/KEY as
    we gather the string "FIVE" (the name of the word I'm
    trying to define) from input.
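
    Roughly speaking, here's what that skipped WORD/KEY loop accomplishes, as a Python sketch. It makes three assumptions. Any byte <= 0x20 counts as a blank. The whole input line is already in memory, whereas the real thing pulls one character at a time with KEY. And I'm ignoring the backslash-comment skipping that I believe JonesForth's WORD also does. The function name forth_word is made up.

```python
def forth_word(source, pos):
    """Toy model of WORD: skip blanks, then collect one token.

    Any byte <= 0x20 counts as a blank (assumption). Returns (token, new_pos).
    The real WORD copies the token into word_buffer and pushes that
    buffer's address and length instead of returning bytes.
    """
    n = len(source)
    while pos < n and source[pos] <= 0x20:  # skip leading blanks
        pos += 1
    start = pos
    while pos < n and source[pos] > 0x20:   # collect non-blank bytes
        pos += 1
    return source[start:pos], pos


line = b": FIVE 5 ;\n"
pos, tokens = 0, []
while True:
    token, pos = forth_word(line, pos)
    if not token:
        break
    tokens.append(token)
print(tokens)  # [b':', b'FIVE', b'5', b';']
```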
   
_WORD.skip_non_words () at nasmjf.asm:309
309         call _KEY               ; get next key, returned in %eax
        ...
325         mov ecx, edi            ; return it

    I have to admit, I don't understand why I can't access
    the memory at word_buffer.

(gdb) x/s (int)word_buffer
0x45564946:     <error: Cannot access memory at address 0x45564946>

    Wait a dang second, 45 56 49 46 isn't an address, it's
    the actual characters 'E' 'V' 'I' 'F' (or "FIVE" stored
    little-endian)!
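
    Here's the byte order checked in Python. If you read the four bytes of "FIVE" as one little-endian 32-bit number, you get exactly the 0x45564946 that GDB complained about.

```python
# The first four bytes of word_buffer, in memory order.
raw = b"FIVE"

# As a little-endian 32-bit number, the first byte ('F' = 0x46) is the
# least significant, so it shows up last when printed in hex.
as_int = int.from_bytes(raw, "little")
print(hex(as_int))                      # 0x45564946

# Same number, bytes listed most-significant first: E V I F.
print(as_int.to_bytes(4, "big"))        # b'EVIF'
print(as_int.to_bytes(4, "little"))     # b'FIVE'
```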
   
    Grrrr... another gripe - the way GDB treats symbols
    just confuses me. I like how NASM does it: foo is
    always an address, [foo] is always the value AT that
    address. It's very consistent.

    Next night: gosh darn it! I remembered. You gotta put
    a '&' in front of "variables" to get the address...and
    that includes when you're trying to use the 'examine'
    ('x') command to format and view memory using the variable
    name. (I think what bit me: without debug info, GDB treats
    word_buffer as a variable of unknown type, so the bare name
    means the value stored *at* that address, and my (int) cast
    just turned the first four bytes into that number.)
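
    You can see the same address-versus-contents distinction from Python using ctypes. This is only an analogy. The buffer is a made-up 32-byte stand-in for word_buffer. ctypes.addressof plays the role of GDB's &, and reading the first four bytes as a little-endian int plays the role of the (int) cast.

```python
import ctypes
import struct

# Made-up stand-in for word_buffer: 32 bytes that start with "FIVE".
word_buffer = ctypes.create_string_buffer(b"FIVE", 32)

# Like GDB's "&word_buffer": where the bytes live.
address = ctypes.addressof(word_buffer)

# Like GDB's "(int)word_buffer": the first 4 bytes stored there, read as a
# little-endian 32-bit int. This is data, not an address.
contents = struct.unpack_from("<I", word_buffer.raw)[0]

# Like "x/4c &word_buffer": 4 bytes read starting at the address.
first_four = ctypes.string_at(address, 4)

print(hex(contents))  # 0x45564946
print(first_four)     # b'FIVE'
```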
   
(gdb) p &word_buffer
$1 = (<data variable, no debug info> *) 0x804a068 <word_buffer>
(gdb) x/4c &word_buffer
0x804a068 <word_buffer>:        70 'F'  73 'I'  86 'V'  69 'E'

    At any rate, looks good. WORD found "FIVE" and returns
    it as an address and a length on the stack:

code_FWORD () at nasmjf.asm:303
303         push edi                ; push base address
304         push ecx                ; push length

    And with any luck, now we'll be headed to the next word in
    the COLON definition, CREATE.

code_FWORD () at nasmjf.asm:27
27          lodsd     ; NEXT: Load from memory into eax, inc esi to point to next word.
28          jmp [eax] ; Jump to whatever code we're now pointing at.
(gdb) info symbol $eax
CREATE in section .data of /home/dave/nasmjf/nasmjf
(gdb) info symbol *$eax
code_CREATE in section .text of /home/dave/nasmjf/nasmjf

    Yay!

    Now CREATE makes the header (dictionary link, name, flags)
    portion of the word we're compiling.
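
    Before the next log, here's a Python sketch of the header bytes I'd expect CREATE to lay down for FIVE. It assumes nasmjf keeps JonesForth's layout: a 4-byte link, one length/flags byte, the name, then zero padding to a 4-byte boundary. It also assumes JonesForth's flag values (F_IMMED 0x80, F_HIDDEN 0x20, F_LENMASK 0x1f). The LATEST address is made up. toggle_hidden mimics what HIDDEN does to the flags byte, since COLON runs LATEST FETCH HIDDEN right after CREATE.

```python
import struct

# Assumed to match JonesForth's constants.
F_IMMED = 0x80
F_HIDDEN = 0x20
F_LENMASK = 0x1F
CELL = 4


def create_header(name, latest):
    """Bytes CREATE would write at HERE for a new word (toy model).

    Layout: 4-byte little-endian link to the previous word (LATEST),
    one length/flags byte, the name, then zero padding to a 4-byte
    boundary. No codeword: COLON adds DOCOL afterwards with LIT DOCOL COMMA.
    """
    if not 0 < len(name) <= F_LENMASK:
        raise ValueError("name length must fit in the length bits")
    header = struct.pack("<IB", latest, len(name)) + name
    padding = -len(header) % CELL
    return header + bytes(padding)


def toggle_hidden(header):
    """Like HIDDEN: flip the F_HIDDEN bit in the length/flags byte."""
    out = bytearray(header)
    out[CELL] ^= F_HIDDEN
    return bytes(out)


hdr = create_header(b"FIVE", latest=0x0804A100)  # made-up LATEST value
print(len(hdr))      # 12
print(hdr.hex(" "))  # 00 a1 04 08 04 46 49 56 45 00 00 00
```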
   
    In the next log, we'll see if CREATE works and then try to
    track down which word is causing a segfault when COLON runs.