    The last log saw yet _another_ bug in my DEFVAR macro.
    Sheesh. Hopefully I have variables working correctly
    now?

    As I mentioned last night, the next words look like a
    real mixture of things. Let's jump into it:

    WITHIN is a conditional that tests if a number is
    between two other numbers. This highlights the advantage
    of such a flexible language (and the simplicity of the
    syntax). I'll format this slightly to make a bit of a
    truth table out of it to demonstrate how WITHIN works:

1 2 3 WITHIN .  0
2 3 1 WITHIN .  0
3 1 2 WITHIN .  0
2 1 3 WITHIN .  1  <-- 2 is within 1 and 3
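
    The table doesn't show what happens right at the
    boundaries. A small sketch, assuming WITHIN behaves the way
    it is written in jonesforth.f (lower bound inclusive, upper
    bound exclusive, true printed as 1). The expected values in
    the comments follow from that reading, not from a run on
    this port:

```forth
( c a b WITHIN is true when a <= c and c < b )
1 1 3 WITHIN .   ( lower bound itself: expect 1 )
3 1 3 WITHIN .   ( upper bound itself: expect 0 )
2 1 3 WITHIN .   ( strictly between:   expect 1 )
```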
    
    DEPTH gives us the depth of the stack:

1 2
.S
2 1
DEPTH .
8

    Huh? Oh, it's in bytes, not number of items!
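
    If an item count is what you want, dividing by the cell
    size gets you there. A minimal sketch, assuming 4-byte
    cells as in this 32-bit port; ITEMS is a made-up name, not
    a jonesforth word:

```forth
( ITEMS is a hypothetical helper, not part of jonesforth )
: ITEMS ( -- n ) DEPTH 4 / ;   ( byte depth divided by the 4-byte cell size )

1 2 ITEMS .   ( expect 2 if the stack held only the 1 and the 2 )
```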
    
    ALIGNED rounds a number up to the next multiple of 4,
    which aligns addresses on 32-bit (4 byte) computers.
    It's easy to test:

1 ALIGNED .
4
7 ALIGNED .
8
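
    The rounding itself is the usual add-then-mask trick. A
    sketch of how it can be written, under a hypothetical name
    (MY-ALIGNED) so the real ALIGNED is left alone. It assumes
    INVERT and AND are available as they are in jonesforth:

```forth
( MY-ALIGNED is a hypothetical name so the real ALIGNED stays untouched )
: MY-ALIGNED ( n -- n' ) 3 + 3 INVERT AND ;   ( add 3, then clear the low two bits )

4 MY-ALIGNED .   ( already a multiple of 4: expect 4 )
5 MY-ALIGNED .   ( expect 8 )
```

    Adding 3 first means a value that is already aligned stays
    put, while anything else is pushed past the next boundary
    before the mask truncates it.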
    
    ALIGN performs ALIGNED on HERE. Here's the definition.

        : ALIGN HERE @ ALIGNED HERE ! ;

    I have a sense for when this would be used while
    compiling, but don't really feel like trying to make up
    a test for it. I'm content to see how it's used later
    and test *that*.

    And now, the moment we've all been waiting for: strings!

    First, we need a new primitive to store the bytes of the
    string. The word ',' (COMMA) does this for word-sized
    (four bytes, the size of a 32-bit address) data. As with
    the other byte-sized words, the name is prepended with a
    letter 'C'.

    So let's compare 'C,':

        : C,
                HERE @ C!       ( store the character in the compiled image )
                1 HERE +!       ( increment HERE pointer by 1 byte )
        ;

    ...with the assembly definition of ',':

        mov edi, [var_HERE]
        stosd                  ; puts the value in eax at edi, increments edi
        mov [var_HERE], edi

    Same thing, only stosd increments edi (HERE) by 4 rather
    than 1.
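
    For what it's worth, here is one way C, could be poked at
    directly. This is a sketch, assuming BASE is decimal and
    that nothing else touches HERE in between. It appends three
    bytes, realigns HERE, and reads the bytes back:

```forth
HERE @              ( remember where the bytes will start )
65 C, 66 C, 67 C,   ( append the ASCII codes for A, B and C )
ALIGN               ( round HERE back up to a 4-byte boundary )
DUP C@ EMIT         ( first byte )
DUP 1+ C@ EMIT      ( second byte )
2 + C@ EMIT         ( third byte, consumes the saved address )
```

    If all goes well that should print ABC. The ALIGN matters
    because ',' and the compiler expect HERE to sit on a cell
    boundary afterwards.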
    
    I've learned to just wait and see these primitives in
    action. Trying to use _some_ of them on their own is
    challenging and surprisingly unrewarding.

    Next, S" is a word that stores the characters which
    follow as a string until it hits the end quote: ".

    Note that there has to be a space after S" or it
    wouldn't be matched as the correct word. However, the
    final quote is NOT a word, it's just the special
    character S" is looking for.

    Finally, S" pushes the address of the start of the
    string onto the stack, followed by the length of the
    string.

    Okay, I think I've got all of that:

S" Hello World"
.
11

    There we go. "Hello World" is 11 characters long.

DUP
HEX
.
804EADC
@ EMIT
H

    It has been stored in memory at a particular address,
    which means we can retrieve the string from there.
    (I've EMITted the 'H' from Hello.)
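
    Two related things are worth trying at this point. The
    sketch assumes TELL ( addr len -- ) and C@ exist in this
    port as they do in jonesforth. TELL prints the whole
    string, and C@ fetches a single byte, where @ fetches a
    full 4-byte cell (EMIT only uses the low byte, which is
    presumably why @ EMIT printed the 'H' on little-endian x86):

```forth
S" Hello World" TELL   ( TELL takes the address and length and prints the string )
S" Hello World" DROP   ( drop the length, keep the address )
DUP C@ EMIT            ( C@ fetches one byte: the first character )
4 + C@ EMIT            ( the fifth character, consumes the address )
```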
   
    In compile mode, it's stored in the word being compiled,
    along with the LITSTRING word we defined in assembly.

    But where in memory is this stored when in immediate
    mode? Jones explains that this implementation stores it
    in the same place where we compile words. So I guess we
    can do this:

HERE @ .
804EADC
HERE @ @ EMIT
H
HERE @ 1 + @ EMIT
e

    And since HERE clearly hasn't been moved to a point
    after the string, that means it is temporary. It will be
    overwritten as soon as we define a new word!
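
    The flip side is that a string compiled into a word should
    stick around. A sketch to check that, with made-up word
    names (GREETING, OTHER) and assuming TELL is available:

```forth
( GREETING and OTHER are hypothetical word names )
: GREETING ( -- addr len ) S" Hello World" ;

GREETING TELL   ( the string lives inside GREETING's definition )
: OTHER ;       ( defining another word moves HERE past GREETING )
GREETING TELL   ( so this should still print the same string )
```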
   
    A close relative of S" (in fact, it uses S" in compiling
    mode) is the ." word, which is Forth's print string
    word. I'm excited to have this:

." Hello World!"
Hello World!

    I've updated the README now that I can do this. :-)

    And the good stuff keeps coming. The next words allow us
    to define our own constants and variables. Not only
    that, there's a handy explanation for each which would
    have saved me some trouble before had I bothered to look
    ahead (or, you know, actually learned the language I was
    going to implement).

        10 CONSTANT TEN
        VARIABLE FOO

        When TEN is executed, it leaves the integer 10 on the stack
        When FOO is executed, it leaves the address of FOO on the stack

    Let's try a constant:

42 CONSTANT answer
." The answer is " answer .
The answer is 42

    And a variable:

VARIABLE foo
9000 foo !
foo @ .
9000

    The definition of CONSTANT is pretty easy. It's
    basically the same as (and functionally equivalent to)
    this:

: answer 42 ;
." The answer is " answer .
The answer is 42
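
    To make "basically the same" concrete, here is a sketch of
    a defining word that builds exactly that kind of colon
    word by hand. It follows the shape of the definition in
    jonesforth.f as I understand it, but under a hypothetical
    name (MY-CONSTANT), and it assumes WORD, CREATE, DOCOL and
    the compile-mode behaviour of ' all work in this port:

```forth
( MY-CONSTANT and answer2 are hypothetical names for this sketch )
: MY-CONSTANT ( n -- )
        WORD CREATE     ( read the next word and build its dictionary header )
        DOCOL ,         ( codeword: run it like any colon definition )
        ' LIT ,         ( compile LIT )
        ,               ( followed by the value taken from the stack )
        ' EXIT ,        ( then finish the word )
;

42 MY-CONSTANT answer2
answer2 .
```

    The compiled body is LIT, the value, EXIT, which is the
    same thing the colon compiler produces for : answer 42 ;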
   
    But a variable needs to allocate some memory and store
    its address. Two simple utilities aid in this:

        ALLOT - advances HERE by the amount on the stack and
                leaves the previous HERE on the stack

        CELLS - multiplies the number on the stack by the
                natural address size of the machine (4 bytes
                for our 32-bit implementation)

    What's neat about CELLS is it shows how you can build up
    words that read rather like a natural language:

HEX
HERE @ .
804EB30
5 CELLS ALLOT
.
804EB30
HERE @ .
804EB44

    And that looks like 20 bytes...hey, wait a second. Now
    that I can easily store values, I'll let Forth figure it
    out:
   
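HERE @ CONSTANT previous
5 CELLS ALLOT CONSTANT new
previous .
804EB78
new .
804EB98
new previous - .
20

    Careful, though: BASE is still HEX here, so that 20 is
    0x20 (32 decimal), and it isn't measuring the ALLOT at
    all. 'new' holds the HERE that ALLOT handed back, which
    was taken after CONSTANT had already compiled
    'previous'. So 0x20 is the size of the dictionary entry
    for 'previous', not the 5 cells. The 5 cells are the
    0x14 (20 decimal) bytes between 804EB30 and 804EB44 in
    the first test.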
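
    ALLOT and CELLS together are enough for a small array. A
    sketch with a made-up name (triple), assuming the ALLOT
    described above, which leaves the start of the reserved
    block on the stack for CONSTANT to capture:

```forth
( triple is a hypothetical name: room for three cells )
3 CELLS ALLOT CONSTANT triple   ( ALLOT leaves the start address for CONSTANT )

111 triple !                    ( store into cell 0 )
222 triple 1 CELLS + !          ( store into cell 1 )
333 triple 2 CELLS + !          ( store into cell 2 )

triple 1 CELLS + @ .            ( read cell 1 back: the 222 stored above )
```

    Note the order: the space is reserved first, so the three
    cells sit just before the dictionary entry for triple.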
   
    VALUE is like VARIABLE, except that the result is a word
    which leaves its value on the stack like a constant
    instead of its address. Unlike a constant, it can be
    updated by another word, TO.

        10 VALUE foo   create foo, set to 10
        20 TO foo      update foo to 20

    Sounds good:

10 VALUE foo
foo .
10
20 TO foo
PARSE ERROR: 20 TO

    Huh? Oh! Right between these two word definitions is
    where I have it stop reading jonesforth.f on load.

    Guess it's time to figure out the next bug in my port.

    I'll try reading all lines again:

        %assign __lines_of_jf_to_read 10000

PARSE ERROR:    ( look it up in the dictionary )
        >DFA
PARSE ERROR:    ( look it up in the dictionary )
        >DFA

Program received signal SIGSEGV, Segmentation fault.
_COMMA () at nasmjf.asm:688
688         stosd     ; puts the value in eax at edi, increments edi
(gdb)

    Okay, so the PARSE ERROR message prints out the word
    that caused the trouble, here ">DFA", after a buffer's
    worth of context.
   
    Ha ha, how silly. I simply missed that word in my port.
    It's a simple definition since we already have >CFA,
    which turns a word pointer into the address of its
    codeword. >DFA just has to advance 4 bytes to the "data"
    (so-called threaded word addresses) after the codeword.
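
    In Forth terms the whole thing is a one-liner. This is an
    equivalent sketch under a hypothetical name (MY>DFA), not
    the assembly that went into the port, and it assumes >CFA
    and 4+ are available:

```forth
( MY>DFA is a hypothetical name; the port defines the real >DFA in assembly )
: MY>DFA ( word-addr -- data-addr ) >CFA 4+ ;

LATEST @ MY>DFA     ( data field of the most recently defined word )
LATEST @ >CFA - .   ( distance from its codeword: expect 4 )
```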
   
    Okay, that's defined. Now can we run all of
    jonesforth.f?

Program received signal SIGSEGV, Segmentation fault.
_COMMA () at nasmjf.asm:697
697         stosd                  ; puts the value in eax at edi, increments edi
(gdb)

    Drat! Nope, still segfaulting. And in COMMA (',') again.

    I wonder if I've overflowed some memory limitation? How
    to check that...hmmm... Well, comma stores where HERE
    points, and that's in memory reserved with Linux's brk
    syscall. How about I bump that from 0x16000 to 0x64000
    bytes:

        add eax, 0x64000  ; add our desired number of bytes to break addr

    Nope, exact same error:

_COMMA () at nasmjf.asm:697

    And same thing if I add another zero to the number. So
    much for an easy answer. So I guess, ideally, I would
    break when COMMA is trying to stosd at an address in edi
    that is outside the reserved FORTH data area. But first
    I need to know what that area is.

    I'm going to add some custom FORTH variables to capture
    this so it'll be easy to examine. I'll test in gdb
    first:
   
Breakpoint 2, _start () at nasmjf.asm:103
103         xor ebx, ebx
104         mov eax, __NR_brk         ; syscall brk
105         int 0x80
106         mov [var_HERE], eax       ; eax has start addr of data segment
(gdb) p/x $eax
$4 = 0x804e000
107         mov [var_CSTART], eax     ; store info: start address of data segment
108         add eax, 0x16000          ; add our desired number of bytes to break addr
(gdb) p/x (int)var_HERE
$5 = 0x804e000
(gdb) p/x (int)var_CSTART
$6 = 0x804e000

    So far so good, CSTART contains the start address of the
    data area.

109         mov ebx, eax              ; reserve memory by setting this new break addr
(gdb) p/x $eax
$7 = 0x80b2000
110         mov [var_CEND], eax       ; store info: end address of data segment
111         mov eax, __NR_brk         ; syscall brk again
(gdb) p/x $eax
$8 = 0x80b2000
112         int 0x80
(gdb) p/x $eax
117         mov ecx, 0                ; LOADJF read only flag for open
(gdb) p/x $eax
$10 = 0x80b2000
(gdb) p/x (int)var_CEND
$11 = 0x80b2000

    That looks right. CEND contains the end address of the
    data segment. Did I get my requested 0x64000 bytes?

    Let's use the new FORTH vars to find out:

(gdb) c
Continuing.
HEX CSTART @ . CEND @ . CEND @ CSTART @ - .
804E000 80B2000 64000
   
    Looking good. Now to catch the bad address being used in
    COMMA. I added two compares and an ".oops" label:

        _COMMA:
        mov edi, [var_HERE]
        cmp edi, [var_CSTART]
        jl .oops
        cmp edi, [var_CEND]
        jg .oops
        stosd
        mov [var_HERE], edi
        ret
    .oops:
        nop

(gdb) break _COMMA.oops
Breakpoint 2 at 0x8049343: file nasmjf.asm, line 707.
(gdb) c
Continuing.

Breakpoint 2, _COMMA.oops () at nasmjf.asm:707
707         nop
(gdb) p var_HERE
'var_HERE' has unknown type; cast it to its declared type
(gdb) p (int)var_HERE
$1 = 61368
(gdb) p (int)var_CSTART
$2 = 134537216
(gdb) p (int)var_CEND
$3 = 134627328

    Okay, so HERE has been set to an invalid address
    somehow. I wish backtraces worked. Then I'd be able to
    see which word this came from.
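
    Lacking backtraces, the same range check could also be
    done from the Forth side, using WITHIN from earlier and
    the CSTART and CEND variables. A sketch only: HERE-OK? is
    a made-up helper, WITHIN compares as signed numbers (fine
    for addresses around 0x804E000, not for ones above
    0x7FFFFFFF), and the word has to be defined while HERE is
    still sane, since defining it goes through ',' too:

```forth
( HERE-OK? is a hypothetical helper built on the CSTART and CEND variables )
: HERE-OK? ( -- flag ) HERE @ CSTART @ CEND @ WITHIN ;

HERE-OK? .   ( 1 while HERE points inside the data area, 0 otherwise )
```

    Dropping a few HERE-OK? . lines between definitions in
    jonesforth.f might narrow down which word knocks HERE out
    of range.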
   
    I'm tempted to divide and conquer...and it looks like
    if I stop execution of jonesforth.f right before the
    definition of SEE, it doesn't segfault.

    So I'll continue testing 'til there and then tackle the
    problem head-on.

    Anyway, where was I? Oh yeah, VALUE!

    To quote myself:

    "VALUE is like VARIABLE, except that the result is a word
    which leaves its value on the stack like a constant
    instead of its address. Unlike a constant, it can be
    updated by another word, TO."

        10 VALUE foo   create foo, set to 10
        20 TO foo      update foo to 20

    Sounds good:

10 VALUE foo
foo .
10
20 TO foo
foo .
20
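
    TO should also work inside a definition, since jonesforth
    makes it an immediate word that compiles the store when
    used in compile mode. A sketch assuming that behaviour
    carries over to this port; counter and BUMP are made-up
    names:

```forth
( counter and BUMP are hypothetical names )
10 VALUE counter
: BUMP ( -- ) counter 1+ TO counter ;   ( read the value, add one, store it back )

BUMP BUMP
counter .   ( two bumps on top of the initial 10 )
```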
   
    That's better. I'll continue with the word testing in
    the next log.