    Warning, the examples with variables in this log are
    all wrong. This update explains:

        !!!!!!!!!!!!!!!!!!!! Update !!!!!!!!!!!!!!!!!!!!
        ! In log19.txt, I realize that my variable     !
        ! handling is wrong. Variables should leave    !
        ! their addresses on the stack, not their      !
        ! values! We need FETCH to get the value from  !
        ! the address!                                 !
        !!!!!!!!!!!!!!!!!!!! Update !!!!!!!!!!!!!!!!!!!!
    
    Two new words add to the word "compiler" abilities of
    the interpreter:

        IMMEDIATE  sets the latest word to be "immediate"
        HIDE       takes the next "word" of input, looks it up
                   in the dictionary, and then sets that word
                   to be hidden (via the word HIDDEN)

    HIDE seems the easiest to test, so we'll start with that:

: emit2 EMIT EMIT ;
66 65 emit2
AB
HIDE emit2
66 65 emit2
PARSE ERROR: 66 65 emit2

    That worked: we get the parse error because emit2 has been
    hidden and is no longer found in the dictionary.
    
    The HIDDEN word used by HIDE actually toggles the hidden state,
    so can we call HIDE again to unhide the word?

HIDE emit2

Program received signal SIGSEGV, Segmentation fault.
code_HIDDEN () at nasmjf.asm:635
635         xor [edi], word F_HIDDEN  ; Toggle the HIDDEN bit in place.

    Oh, ha ha, no, of course not. It's hidden, so HIDE can't
    find it (and since there's absolutely no error checking,
    we crash trying to toggle the bit in some random memory
    location).
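    As a rough illustration (Python, not part of nasmjf),
    here's a toy dictionary where lookups skip hidden
    entries. The Entry class, the find/hide helpers, and
    the 0x20 flag value are all made up for this sketch.
    It shows why the second HIDE can't find the word, and
    where a "not found" check could go:

```python
F_HIDDEN = 0x20  # assumed flag value, for illustration only


class Entry:
    """Hypothetical dictionary entry: a name plus a flags byte."""

    def __init__(self, name):
        self.name = name
        self.flags = 0


def find(dictionary, name):
    """Search newest-first, skipping hidden entries; None if not found."""
    for entry in reversed(dictionary):
        if entry.name == name and not (entry.flags & F_HIDDEN):
            return entry
    return None


def hide(dictionary, name):
    """Toggle the hidden bit, but refuse to act on a failed lookup."""
    entry = find(dictionary, name)
    if entry is None:
        return False  # an unchecked version would toggle a bit at a bad address
    entry.flags ^= F_HIDDEN
    return True


dictionary = [Entry("EMIT"), Entry("emit2")]
first = hide(dictionary, "emit2")   # found, so it becomes hidden
second = hide(dictionary, "emit2")  # invisible to find now, nothing to toggle
print(first, second)
```

    In this model the second call just reports failure
    instead of crashing, and the word stays hidden.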
    
    I guess we could use LATEST and HIDDEN to manually toggle
    it back, but I can't be bothered tonight.

    Onward to IMMEDIATE.

: ab 66 65 EMIT EMIT ;
IMMEDIATE
: foo 1000 . ;
foo
foo
55 EMIT
;

^C
Program received signal SIGINT, Interrupt.
_WORD.skip_non_words () at nasmjf.asm:339
339         call _KEY               ; get next key, returned in %eax

    Something went wrong. I had to Ctrl+C to end the
    program. It was merrily taking input, but nothing
    would execute, not even Ctrl+D to end the input
    and exit.
    
    Let's try that again and verify we're toggling the
    right word...

(gdb) r
Starting program: /home/dave/nasmjf/nasmjf
(gdb) c
Continuing.
LATEST 4 + C@ .
6
LATEST 5 + C@ EMIT
L

    Okay, just sanity checking LATEST - it points to
    a word whose name has 6 letters and starts with
    the letter "L" (it's LATEST itself).

    I'll define my 'ab' word again, try it out (it should
    print the letters "AB"), and check LATEST again...

: ab 66 65 EMIT EMIT ;
ab
AB
LATEST 4 + C@ .
2
LATEST 5 + C@ EMIT
a
IMMEDIATE
ab

    Drat! Then it locked up again. So IMMEDIATE is
    definitely not working right.
   
    Next night: okay, let's see what's going on...

(gdb) break code_IMMEDIATE
Breakpoint 2 at 0x80494ec: file nasmjf.asm, line 1097.
(gdb) c
Continuing.
: ab 66 65 EMIT EMIT ;
ab
AB
IMMEDIATE

Breakpoint 2, code_IMMEDIATE () at nasmjf.asm:1097
(gdb) p/x (int)var_LATEST
$1 = 0x804e000
(gdb) x/10c (int)var_LATEST
0x804e000:   ...  2 '\002' 97 'a'  98 'b' ...

    So that's right - LATEST points at word 'ab'...

1098        add edi, 4                ; Point to name/flags byte.
1099        xor byte [edi], F_IMMED   ; Toggle the IMMED bit.
(gdb) p/x $edi
$2 = 0x804a6b0

    That's a dead giveaway: the address in register
    edi should now be LATEST + 4.  But it's actually
    the _address_ of LATEST + 4!
   
(gdb) p/x (int)var_LATEST
$3 = 0x804e000

    It still takes me a bit before I see it...

(gdb) disass 1099
No function contains specified address.
(gdb) disass code_IMMEDIATE
Dump of assembler code for function code_IMMEDIATE:
   0x080494ec <+0>:     mov    edi,0x804a6ac   <--- should be PTR
   0x080494f1 <+5>:     add    edi,0x4
=> 0x080494f4 <+8>:     xor    BYTE PTR [edi],0x80
   0x080494f7 <+11>:    lods   eax,DWORD PTR ds:[esi]
   0x080494f8 <+12>:    jmp    DWORD PTR [eax]
End of assembler dump.

    I finally see it.

    I have

        mov edi, var_LATEST

    where I should have

        mov edi, [var_LATEST]

    (So of course it wasn't working after that. The xor
    toggled a bit 4 bytes past the LATEST variable itself
    instead of in the flags byte of 'ab', corrupting
    whatever lives there. No wonder all further
    interpretation failed to match!)
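    To spell out the difference, here's a tiny Python
    model (just a sketch, with a dict standing in for
    memory). The two addresses are the ones from the gdb
    session; the helper names are made up:

```python
VAR_LATEST = 0x804A6AC  # where the LATEST variable itself lives
WORD_AB = 0x804E000     # what LATEST contains: the header of 'ab'

# Toy memory: one dword stored at the variable's address.
memory = {VAR_LATEST: WORD_AB}


def mov_immediate(label):
    """mov edi, var_LATEST  -> edi gets the label's address."""
    return label


def mov_load(label):
    """mov edi, [var_LATEST] -> edi gets the dword stored at that address."""
    return memory[label]


buggy_edi = mov_immediate(VAR_LATEST) + 4  # add edi, 4
fixed_edi = mov_load(VAR_LATEST) + 4       # add edi, 4

print(hex(buggy_edi))  # 0x804a6b0, the value gdb showed in $edi
print(hex(fixed_edi))  # 0x804e004, the length/flags byte of 'ab'
```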
   
    With that fixed, it should work...

(gdb) load
(gdb) r
Starting program: /home/dave/nasmjf/nasmjf
: ab 66 65 EMIT EMIT ;
ab
AB
IMMEDIATE

    So now 'ab' should execute as soon as the interpreter
    sees it, even in compile mode:

: five 5 . ab ;
AB
five
5

    Yeah! The call to 'ab' executed at "compile time" rather
    than "run time" for the new word 'five'. Using this, we
    could add new language features to FORTH in FORTH.
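    Here's the same idea as a toy Python interpreter.
    This is only a sketch of the immediate-flag decision,
    not how nasmjf is structured: the words dict, define,
    interpret, and the print5 stand-in for "5 ." are all
    invented for the example, and ':' and ';' are handled
    as special cases to keep it short.

```python
F_IMMED = 0x80  # the same bit the xor in code_IMMEDIATE toggles

output = []  # stands in for the terminal
words = {}   # name -> [flags, action]; a hypothetical dictionary


def define(name, action, flags=0):
    words[name] = [flags, action]


def interpret(source):
    """Tiny outer interpreter: ':' starts compiling, ';' ends it."""
    tokens = iter(source.split())
    compiling = None  # name being defined, or None when executing
    body = []
    for token in tokens:
        if token == ":":
            compiling, body = next(tokens), []
        elif token == ";":
            compiled = list(body)
            define(compiling, lambda c=compiled: [words[n][1]() for n in c])
            compiling = None
        else:
            flags, action = words[token]
            if compiling is not None and not (flags & F_IMMED):
                body.append(token)  # compile mode: just record the word
            else:
                action()            # execute mode, or an immediate word


define("print5", lambda: output.append("5"))  # stand-in for `5 .`
define("ab", lambda: output.append("AB"), flags=F_IMMED)

interpret(": five print5 ab ;")
at_compile_time = list(output)  # 'ab' has already run by this point
interpret("five")
print(at_compile_time, output)
```

    As in the session above, "AB" shows up while 'five'
    is being compiled, and running 'five' only prints
    the 5.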
   
    Next, the TICK (single quote ') word gets the address
    of a word. The word is supplied after the ' so that it
    doesn't execute (this is the same trick LIT uses).

    This implementation can only work at compile time because
    the interpreter needs to turn the word that follows into
    a 4-byte address for ' to be able to read and then hop
    over that value. Just for fun, let's try to print the
    address of the EMIT word outside of the compile state:

' EMIT .

Program received signal SIGSEGV, Segmentation fault.

    See?

    Now let's use it the same way, but in a new compiled word:

: addrofemit ' EMIT . ;
addrofemit
134521260

    Looks like it worked, but is that address correct?

(gdb) info addr EMIT
Symbol "EMIT" is at 0x804a1ac in a file compiled without debugging.
(gdb) p/d 0x804a1ac
$1 = 134521260

    Yup!
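    The "read the next cell, then hop over it" trick can
    be sketched in Python like this. It's a simplified
    model: compiled cells are word names here instead of
    4-byte addresses, the run function is made up, and
    only EMIT's address is taken from the gdb output.

```python
# Address lookup for the sketch; only EMIT's value comes from gdb.
ADDRESS = {"EMIT": 0x804A1AC}


def run(code):
    """Walk compiled cells with an instruction pointer, like NEXT does."""
    stack, printed = [], []
    ip = 0
    while ip < len(code):
        word = code[ip]
        ip += 1
        if word == "'":
            stack.append(ADDRESS[code[ip]])  # treat the next cell as data
            ip += 1                          # hop over it so it never runs
        elif word == ".":
            printed.append(str(stack.pop()))
        else:
            raise ValueError("unexpected word: " + word)
    return printed


printed = run(["'", "EMIT", "."])
print(printed)  # ['134521260']
```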
   
    Then the next night, I've got a really exciting one,
    0BRANCH.

    But first, I'm trying to figure out how to even test
    BRANCH, let alone its conditional big brother!

    I even worked it out on paper the next morning, and I'm
    still not seeing why this doesn't work:

: foo 65 EMIT BRANCH -12 ;
foo
A
Program received signal SIGSEGV, Segmentation fault.
code_BRANCH () at nasmjf.asm:27
27          lodsd     ; NEXT: Load from memory into eax, inc esi to point to next word.

    By my calculations, esi starts off pointing at the
    offset number (-12), so we should be branching back
    to "65":

          0    "-12"
         -4    BRANCH
         -8    EMIT
        -12    "65"

    And I've had a bummer of a time trying to step through
    it by breaking on BRANCH because that word is used
    (correctly) as part of the interpreter loop.

    So I'm going to copy BRANCH with the silly name BRUNCH
    and see why it's not correct!
   
        DEFCODE "BRUNCH",6,,BRUNCH
        add esi, [esi]
        NEXT

    Should be pretty simple, right? It's just a one-liner!

(gdb) break code_BRUNCH
(gdb) c
Continuing.
: foo 65 EMIT BRUNCH -12 ;
foo
A
Breakpoint 2, code_BRUNCH () at nasmjf.asm:251
251         add esi, [esi]          ; add the offset to the instruction pointer

    Okay, now let's thoroughly examine this. We're going to
    add the negative number stored where esi points to
    esi itself. Where does esi point?

(gdb) p/x $esi
$3 = 0x804e01c
(gdb) x/x $esi
0x804e01c:      0x0804a0f0
(gdb) info sym *$esi
LIT in section .data of /home/dave/nasmjf/nasmjf
(gdb) x/b $esi+4
0x804e020:      -12 '\364'

    Yup, we can see that esi points to the address of LIT
    followed by the value -12. As expected.
   
(gdb) s
27          lodsd     ; NEXT: Load from memory into eax, inc esi to point to next word.

    Now that that's run, where does esi point?

(gdb) info sym *$esi
Cannot access memory at address 0x1009810c
(gdb) p/x $esi
$4 = 0x1009810c

    What? That address isn't right. It should be 12 less
    than before, not...oh wait...

(gdb) disass code_BRUNCH
Dump of assembler code for function code_BRUNCH:
   0x08049054 <+0>:     add    esi,DWORD PTR [esi]
=> 0x08049056 <+2>:     lods   eax,DWORD PTR ds:[esi]
   0x08049057 <+3>:     jmp    DWORD PTR [eax]
End of assembler dump.

    Now I see it. We added the address of LIT, not
    the -12 that follows it. No wonder I got a segfault.
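    The arithmetic checks out. Here's a quick Python
    sanity check using the values from the gdb session
    (the branch helper is just for this sketch, and it
    masks to 32 bits to mimic register wraparound):

```python
MASK32 = 0xFFFFFFFF

esi = 0x0804E01C          # where esi pointed at the breakpoint
lit_address = 0x0804A0F0  # the dword found there: the address of LIT


def branch(esi, cell):
    """Model `add esi, [esi]` with 32-bit wraparound."""
    return (esi + cell) & MASK32


wrong = branch(esi, lit_address)       # what actually happened
intended = branch(esi, -12 & MASK32)   # what a raw -12 cell would have done

print(hex(wrong))     # 0x1009810c, the bad address gdb reported
print(hex(intended))  # 0x804e010, 12 bytes back
```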
   
    So how do I get the value -12 right after BRANCH?

    Next night: okay, so I reviewed the ported words
    so far and I'm pretty sure COMMA (,) fits the bill.
    It "compiles" the value on the stack to the current
    position...

: foo 65 EMIT BRUNCH -12 , ;
foo
A
Breakpoint 3, code_BRUNCH () at nasmjf.asm:251
251         add esi, [esi]          ; add the offset to the instruction pointer
(gdb) x/x **$esi
0x8049228 <code_LIT>:   0xffad50ad

    ...no, dang it, that doesn't work either, and for
    the same reason. Sure, ',' will store whatever's on
    the stack, but we're still getting LIT -12 compiled
    first when we're compiling.

    So I really don't see any easy way to test BRANCH,
    let alone 0BRANCH with an arbitrary snippet of code
    at this point. :-(
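    For the record, here's a rough Python model of what
    the compiler seems to be doing with numbers. It
    assumes every number in a definition compiles to two
    cells (LIT, then the value) and every cell is 4 bytes;
    compile_body is a made-up helper, not nasmjf code.

```python
def compile_body(tokens):
    """Compile a word body: numbers become LIT + value, words stay as-is."""
    cells = []
    for token in tokens:
        try:
            value = int(token)
        except ValueError:
            cells.append(token)           # an ordinary word: one cell
        else:
            cells.extend(["LIT", value])  # a number: two cells
    return cells


cells = compile_body("65 EMIT BRUNCH -12".split())
offset_cell = cells.index("BRUNCH") + 1  # where esi points inside BRUNCH

# Byte offset of every cell relative to the one BRUNCH reads.
offsets = {(i - offset_cell) * 4: cell for i, cell in enumerate(cells)}

print(cells)    # ['LIT', 65, 'EMIT', 'BRUNCH', 'LIT', -12]
print(offsets)  # {-16: 'LIT', -12: 65, -8: 'EMIT', -4: 'BRUNCH', 0: 'LIT', 4: -12}
```

    If this model matches the port, it suggests a second
    wrinkle: the 65 takes two cells as well, so even a
    raw -12 would land on the value 65 rather than on its
    LIT, and -16 would be the offset that re-runs the
    literal.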
   
    Next evening: I've also just ported LITSTRING and TELL,
    two more primitives that appear hard to test because
    I'm not sure how to compile literal values into memory
    yet.

    So, this would be a pretty big let-down way to end a
    log file but...

    IT APPEARS THAT I'VE PORTED ALL OF THE ASSEMBLY!

    Yeah, so starting with the next log, I'm going to
    start feeding jonesforth.f, which is the second half
    of the language implementation (written in FORTH
    itself), into my port and fix the inevitable bugs.

    It's been about six months of slowly chipping away
    at this port nearly every single evening. I can
    barely believe this stage has arrived. This is so
    cool. 8-)