    So log01.txt concluded with a nice little demonstration
    of programmatically inlining machine code at runtime to
    "compile" a program and run it.

    The next step is to start to turn this into an actual
    language by creating headers for words (I've decided
    I'll use the Forth term "word" to refer to the functions
    we create in this language).

        [x] Look up word length from header so it doesn't
            have to be manually created and sent to the
            inline function.

        [x] Look up word by stored ASCII name in header at
            runtime. That'll be exciting. I'll practically
            have a programming language at that point.

    I think I'll use a linked list of words like many
    traditional Forths, since that's what I learned how to
    implement in my JONESFORTH port, nasmjf.

    Note: I added design-notes.txt to this repo because I
          have been having some ongoing thoughts about how
          to implement this program as a whole, but they're
          not things I can act upon right away and I don't
          want to have to come back here searching in these
          logs to find them (or worse, forget about them
          entirely!)
    
    Okay, now I've got #1 from the list above working. Instead
    of a "header", I've got "tails" at the end of my words.
    Ha ha, cats have tails. So this just keeps getting
    better.

    I did it that way because then it becomes trivial to get
    the length of the machine code. Here's the definition of
    the exit word now, with its tail:

        exit:
            mov ebx, 0 ; exit with happy 0
            mov eax, SYS_EXIT
            int 0x80
        exit_tail:
            dd 0 ; null link is end of linked list
            dd (exit_tail - exit) ; len of machine code
            db "exit", 0 ; name, null-terminated

    So now I don't have to give the length of the word's
    machine code to inline anymore, just the tail address.
    inline gets the stored length and does all the rest!
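
    To make the tail idea concrete, here is a minimal standalone
    sketch (not the actual meow5 source). It starts with only the
    address of exit_tail, reads the stored length, and backs up to
    the first byte of the word's code. It assumes 32-bit Linux,
    NASM, and SYS_EXIT defined as 1; the file name and build
    command in the comment are assumptions too.

```nasm
; Sketch only. Assumed build (file name is hypothetical):
;   nasm -f elf32 tails.asm -o tails.o && ld -m elf_i386 tails.o -o tails
%define SYS_EXIT 1          ; assumption: Linux i386 exit syscall number

section .text
global _start

exit:
    mov ebx, 0              ; exit status 0
    mov eax, SYS_EXIT
    int 0x80
exit_tail:
    dd 0                    ; link: null marks the end of the list
    dd (exit_tail - exit)   ; length of the machine code above
    db "exit", 0            ; name, null-terminated

_start:
    mov esi, exit_tail      ; all we are given is the tail address
    mov ecx, [esi + 4]      ; stored length of the word's code
    sub esi, ecx            ; tail - length = first byte of the code
    jmp esi                 ; run the word in place
```

    Running it should exit with status 0, because the jump lands
    on the first instruction of exit.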
    
    Here's the new inline:

        ; inline function!
        ;   input: esi - tail of the word to inline
        inline:
            mov edi, [here]    ; destination
            mov ecx, [esi + 4] ; get len into ecx
            sub esi, ecx       ; sub len from esi (start of code)
            rep movsb          ; copy ecx bytes from [esi] to [edi]
            add edi, ecx       ; ecx is 0 now; movsb already moved edi
            mov [here], edi    ; store the updated here pointer
            ret
    
    Still not too complicated. And I think this might even
    be its final form?

    Let's see if this works...

Program received signal SIGSEGV, Segmentation fault.
inline () at meow5.asm:67
67	    rep movsb

    Darn it.

    Oh, wait! It was inlining the meows just fine, it was
    doing exit that failed. I simply hadn't updated it to
    point to the tail yet. Simple mistake:

        ; inline exit
        mov esi, exit   <---- oops!
        call inline

    needs to be:

        ; inline exit
        mov esi, exit_tail
        call inline

    How about now...

$ mrun
Meow.
Meow.
Meow.
Meow.
Meow.

    Awesome! Guess I can start making it find words by ASCII
    name in the tails, searching by linked list. Very
    exciting progress tonight!
   
    I've got two more todos:

        [x] Add tails to anything that should be a word

        [ ] Make all words take params from the stack, not
            from pre-defined registers. Yes, we're losing
            some speed by going to main memory, but I have
            a feeling the stack is surely in CPU cache most
            of the time? I should look that up someday...

    So I'm going to call my word that looks up other words
    by string name by searching through a linked list of
    words 'find', just like in Forth. (Well, except there
    it's FIND, of course.)
   
    Two nights later: I've written the 'find' word and added
    tails to all of my words so far. But I've got a
    segfault:

dave@cygnus~/meow5$ mr
./build.sh: line 33:  1966 Segmentation fault      ./$F

    So it's GDB time:

dave@cygnus~/meow5$ mb
Reading symbols from meow5...
    ...
143	    push temp_meow_name ; the name string to find
144	    call find           ; answer will be in eax
81	    pop eax ; first param from stack!
84	    mov ecx, [last]
86	    cmp ecx, 0  ; a null pointer (0) is end of list
87	    je .not_found
93	    lea edx, [ecx + 8] ; set dictionary name pointer
94	    mov ebx, eax      ; (re)set name to find pointer

    Okay, so here's where I'm comparing the search string to
    be found against the first (well, last) word's name in
    the linked list ("dictionary"). So let's see if I got
    the name from the dictionary entry's "tail" correctly.

    Oh, and here's my comment block from 'find' explaining
    the register use:

        ; input:
        ;   stack -> eax
        ; register use:
        ;   eax - start of null-terminated name to find
        ;   ebx - name to find byte pointer
        ;   ecx - dictionary list pointer
        ;   edx - dictionary name byte pointer
   
    The first thing in the tail should be a link to the next
    word in the dictionary. The ecx register should point to
    that link:

(gdb) x/a $ecx
0x804908d <find_tail>:	0x8049052 <inline_tail>

    Yup! That's right. The next word is 'inline'.

    The next thing is the length of the word's machine code:

(gdb) x/dw $ecx+4
0x8049091:	39

    39 bytes seems reasonable. Okay, the next thing should be
    the null-terminated string of the word name:

(gdb) x/s $ecx+8
0x8049095:	"find"

    Yes!

    And have I correctly pointed to the first byte of this
    string in the edx register?

(gdb) x/s $edx
0x8049095:	"find"

    Wow, also yes!

    Okay, so the next thing to confirm is that I have the
    address of the string to match in register eax:

(gdb) x/a $eax
0x80490c1 <inline_a_meow+10>:	0x74e8308b

    Oops! That's not right. That's an address somewhere in my
    loop that inlines meow five times...
   
    I see it now!

143	    push temp_meow_name ; the name string to find
144	    call find           ; answer will be in eax
81	    pop eax ; first param from stack!

    I forgot that 'call' will push the return address onto
    the stack. Which is why I can't just pop my parameter.

    I need to use the stack pointer and an offset to get the
    value...

    I use arrays as stacks all the time in higher-level
    languages, so PUSH and POP are second nature to me.
    But I must confess that in an assembly language context,
    I get super confused by terms like "top", "bottom" and
    "low" and "high".

    So I prefer to make all of this SUPER CONCRETE. Here's
    my own personal explanation:

        push eax ; containing 0xAAA
        push ebx ; containing 0xBBB
        push ecx ; containing 0xCCC
        push edx ; containing 0xDDD
        pop edx
        pop ecx

            The Stack:
            ----------
            0xAAA  <-- esp + 4
            0xBBB  <-- esp
            0xCCC  <-- esp - 4
            0xDDD  <-- esp - 8
   
    Heck, I'm gonna verify that for myself right now with
    all of you watching:

(gdb) s
125	mov eax, 0xAAA
126	mov ebx, 0xBBB
127	mov ecx, 0xCCC
128	mov edx, 0xDDD
129	push eax
130	push ebx
131	push ecx
132	push edx
133	pop edx
134	pop ecx
(gdb) x $esp + 4
0xffffd77c:	0x00000aaa
(gdb) x $esp
0xffffd778:	0x00000bbb
(gdb) x $esp - 4
0xffffd774:	0x00000ccc
(gdb) x $esp - 8
0xffffd770:	0x00000ddd

    Whew! At least I've got that much right. :-)

    So my fix is this:

        mov eax, [esp + 4] ; first param from stack!
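
    Here's a small standalone sketch of that fix (again, not the
    real meow5 code). The get_param label, the value 42, and the
    "add esp, 4" cleanup by the caller are assumptions made for
    the illustration. It assumes 32-bit Linux and NASM.

```nasm
; Sketch only: reading a stack parameter from inside a called routine.
%define SYS_EXIT 1          ; assumption: Linux i386 exit syscall number

section .text
global _start

; get_param is a hypothetical routine, not part of meow5.
get_param:
    ; On entry:
    ;   [esp]     = return address pushed by 'call'
    ;   [esp + 4] = the parameter pushed by the caller
    mov eax, [esp + 4]      ; read the parameter without popping anything
    ret                     ; return address is still on top, so ret works

_start:
    push 42                 ; the parameter
    call get_param          ; pushes the return address, then jumps
    add esp, 4              ; assumption: caller discards its own parameter
    mov ebx, eax            ; use the returned value as the exit status
    mov eax, SYS_EXIT
    int 0x80
```

    The exit status should be 42 (check with "echo $?"). Without
    the "add esp, 4", the pushed parameter would simply stay on
    the stack after the call returns.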
   
    And now let's see what we've got in eax:

(gdb) x/a $eax
0x804a006 <temp_meow_name>:	0x776f656d

    Perfect. And $ebx should be the same to begin with:

(gdb) x/a $ebx
0x804a006 <temp_meow_name>:	0x776f656d

    Yup. Good so far.

    ...wait. This next line isn't right:

96	    cmp edx, ebx

    What am I doing? I'm comparing the two addresses here,
    not the characters they point to. Even worse, I can't
    compare two pointed-to *values* at the same time. I need
    to actually store at least one of the two *values* to
    compare in a register!

    Sheesh. Lemme fix this up. Okay, so here's the new
    register use, which I'm trying to make as conventional
    as I know how...

        ; register use:
        ;   al  - to-find name character being checked
        ;   ebx - start of dict word's name string
        ;   ecx - byte offset counter (each string character)
        ;   edx - dictionary list pointer
        ;   ebp - start of to-find name string

    And the code has changed quite a bit, so I'm gonna step
    through it again:
   
(gdb) s
146	    push temp_meow_name ; the name string to find
147	    call find           ; answer will be in eax
find () at meow5.asm:80
80	    mov ebp, [esp + 4] ; first param from stack!
83	    mov edx, [last]
find.test_word () at meow5.asm:85
85	    cmp edx, 0  ; a null pointer (0) is end of list
86	    je .not_found
92	    lea ebx, [edx + 8] ; set dict. word name pointer
93	    mov ecx, 0         ; reset byte offset counter

    Okay, first the ebx register should now point to the
    current dictionary word's name that we're gonna test:

(gdb) x/s $ebx
0x804909f:	"find"

    Good.

    And the ebp register should point to the to-find name:

(gdb) x/s $ebp
0x804a006 <temp_meow_name>:	"meow"

    Good.

find.compare_names_loop () at meow5.asm:95
95		mov al, [ebp + ecx] ; get next to-find name byte
96	    cmp al, [ebx + ecx] ; compare with next dict word byte

    Now the character in byte register al should be the first
    one from the to-find name "meow":

(gdb) p/c $al
$2 = 109 'm'

    Good.

    And the character pointed to by ebx+ecx should be the
    first one from the dict word "find":

(gdb) x/c $ebx+$ecx
0x804909f:	102 'f'

    Good.

    And since these don't match, the jump should take us to
    the next word...
   
97	    jne .try_next_word  ; found a mismatch!
find.try_next_word () at meow5.asm:102
102	    mov ecx, [ecx]   ; follow the tail! (linked list)
Program received signal SIGSEGV, Segmentation fault.

    Oh, right. Silly me. I'm storing the dictionary word
    links in the edx register now, not ecx! I missed this
    one...

    Okay, how about now?

find.try_next_word () at meow5.asm:103
103	    mov edx, [edx]   ; follow the tail! (linked list)
(gdb) x/a $edx
0x8049097 <find_tail>:	0x8049052 <inline_tail>
(gdb) s
104	    jmp .test_word

    That's better. Let's see if we're testing "meow" vs
    "inline" now (well, 'm' vs 'i'):

(gdb) p/c $al
$1 = 109 'm'
(gdb) x/c $ebx+$ecx
0x804905a:	105 'i'

    Good!

    And the next word should be "meow", so 'm' vs 'm':

(gdb) p/c $al
$2 = 109 'm'
(gdb) x/c $ebx+$ecx
0x8049037:	109 'm'
98	    jne .try_next_word  ; found a mismatch!
99	    cmp al, 0           ; both hit 0 terminator at same time
100	    je .found_it
find.try_next_word () at meow5.asm:103
103	    mov edx, [edx]   ; follow the tail! (linked list)

    What?

    Oh. <facepalm> It just dropped through. I forgot the

        jmp .compare_names_loop

    at the end of my loop...

    I'll spare you the second go where I had an infinite loop
    because I had *also* forgotten to increment the ecx
    register to check the next letter in the strings...
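
    For reference, here is a standalone sketch of the whole
    lookup, pieced together from the lines gdb printed above plus
    the two fixes just described (the missing jmp and the missing
    increment). It is a reconstruction, not the actual meow5
    source. The two stand-in words with nop bodies, name_to_find,
    the position of "inc ecx", the body of .not_found, and the
    caller's "add esp, 4" are all assumptions. It assumes 32-bit
    Linux and NASM.

```nasm
; Sketch only: linked-list lookup by name, using tails.
%define SYS_EXIT 1              ; assumption: Linux i386 exit syscall number

section .data
name_to_find: db "meow", 0      ; hypothetical search string
last:         dd inline_tail    ; tail of the most recently defined word

section .text
global _start

meow:
    nop                         ; stand-in body (assumption)
meow_tail:
    dd 0                        ; null link: end of the list
    dd (meow_tail - meow)       ; length of machine code
    db "meow", 0

inline:
    nop                         ; stand-in body (assumption)
inline_tail:
    dd meow_tail                ; link to the previous word's tail
    dd (inline_tail - inline)
    db "inline", 0

; find: input on stack, result in eax (tail address, or 0 if not found)
find:
    mov ebp, [esp + 4]          ; start of to-find name string
    mov edx, [last]             ; dictionary list pointer
.test_word:
    cmp edx, 0                  ; a null pointer is the end of the list
    je .not_found
    lea ebx, [edx + 8]          ; start of this word's name string
    mov ecx, 0                  ; reset byte offset counter
.compare_names_loop:
    mov al, [ebp + ecx]         ; next to-find name byte
    cmp al, [ebx + ecx]         ; compare with next dict word byte
    jne .try_next_word          ; found a mismatch
    cmp al, 0                   ; both hit the terminator together
    je .found_it
    inc ecx                     ; move on to the next character
    jmp .compare_names_loop
.try_next_word:
    mov edx, [edx]              ; follow the link in the tail
    jmp .test_word
.not_found:
    mov eax, 0                  ; assumption: 0 means "no such word"
    ret
.found_it:
    mov eax, edx                ; pointer to tail of the matching word
    ret

_start:
    push name_to_find
    call find
    add esp, 4                  ; assumption: caller drops its parameter
    mov ebx, 1                  ; assume failure
    cmp eax, meow_tail
    jne .done
    mov ebx, 0                  ; found the expected tail
.done:
    mov eax, SYS_EXIT
    int 0x80
```

    With this two-word dictionary the search skips "inline" on
    the first character, matches "meow" byte for byte through the
    terminator, and the program should exit with status 0.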
   
    Okay, and now?

Reading symbols from meow5...
(gdb) break 97
Breakpoint 1 at 0x8049081: file meow5.asm, line 97.
1: /c $al = <error: No registers.>
(gdb) r
Starting program: /home/dave/meow5/meow5
Breakpoint 1, find.compare_names_loop () at meow5.asm:97
97	    cmp al, [ebx + ecx] ; compare with next dict word byte
(gdb) display /c *($ebx + $ecx)
(gdb) display /c $al
1: /c $al = 109 'm'
2: /c *($ebx + $ecx) = 102 'f'
(gdb) c
Continuing.
Breakpoint 1, find.compare_names_loop () at meow5.asm:97
97	    cmp al, [ebx + ecx] ; compare with next dict word byte
1: /c $al = 109 'm'
2: /c *($ebx + $ecx) = 105 'i'
...
1: /c $al = 109 'm'
2: /c *($ebx + $ecx) = 109 'm'
...
1: /c $al = 101 'e'
2: /c *($ebx + $ecx) = 101 'e'
...
1: /c $al = 111 'o'
2: /c *($ebx + $ecx) = 111 'o'
...
1: /c $al = 119 'w'
2: /c *($ebx + $ecx) = 119 'w'
...
1: /c $al = 0 '\000'
2: /c *($ebx + $ecx) = 0 '\000'
(gdb) c
Continuing.

Program received signal SIGSEGV, Segmentation fault.
inline_a_meow () at meow5.asm:152
152	    mov esi, [eax]      ; putting directly in reg for now

    Yay! (Not the segfault, but the apparently correct
    matching of the strings.)

    Now let's see what's happening once we get a match,
    because clearly eax is not getting returned with a valid
    word tail address...
   
(gdb) break find.found_it
...
Breakpoint 1, find.found_it () at meow5.asm:113
113	    mov eax, ecx  ; pointer to tail of dictionary word

    Gah! I see it. Another ecx that should be an edx. I
    could have sworn I searched for these...

Reading symbols from meow5...
(gdb) break find.found_it
Breakpoint 1 at 0x8049097: file meow5.asm, line 113.
(gdb) r
Starting program: /home/dave/meow5/meow5

Breakpoint 1, find.found_it () at meow5.asm:113
113	    mov eax, edx  ; pointer to tail of dictionary word
(gdb) p/a $edx
$1 = 0x804902f <meow_tail>

    That's better. So yeah, we definitely found the meow
    word by string. Very cool. Let's see what happens next...

(gdb) s
114	    ret           ; (using call/ret for now)
(gdb)
inline_a_meow () at meow5.asm:152
152	    mov esi, [eax]      ; putting directly in reg for now
(gdb)
153	    call inline
(gdb)
inline () at meow5.asm:62
62	    mov edi, [here]    ; destination

    Yes, very nice...

Breakpoint 1, find.found_it () at meow5.asm:113
Breakpoint 1, find.found_it () at meow5.asm:113
Breakpoint 1, find.found_it () at meow5.asm:113
Breakpoint 1, find.found_it () at meow5.asm:113

    That's four more 'meow's getting inlined...

Breakpoint 1, find.found_it () at meow5.asm:113

    That's the 'exit'...

113	    mov eax, edx  ; pointer to tail of dictionary word
(gdb) c
Continuing.

Program received signal SIGSEGV, Segmentation fault.
inline () at meow5.asm:63
63	    mov ecx, [esi + 4] ; get len into ecx

    Wait, how did esi get the wrong value?

    Oh jeez, I have these brackets around eax here:

        mov esi, [eax]      ; putting directly in reg for now

    But I want the address in eax, not the value it's pointing
    to. Yet another easy fix:

        mov esi, eax      ; putting directly in reg for now
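
    The difference the brackets make can be seen in a tiny
    standalone sketch (not from meow5; the value label and the
    number 7 are made up for the illustration, and it assumes
    32-bit Linux and NASM):

```nasm
; Sketch only: register copy versus memory load in NASM syntax.
%define SYS_EXIT 1          ; assumption: Linux i386 exit syscall number

section .data
value: dd 7                 ; hypothetical dword stored in memory

section .text
global _start

_start:
    mov eax, value          ; eax = the address of 'value'
    mov esi, eax            ; no brackets: esi gets that same address
    mov ebx, [esi]          ; brackets: ebx gets the dword stored there
    mov eax, SYS_EXIT
    int 0x80                ; exit status is whatever ebx holds
```

    The exit status should be 7, the dword loaded through the
    brackets. With "mov esi, [eax]" instead, esi would hold 7
    rather than an address, which is the same kind of mix-up
    that sent inline off to read its length from a bogus
    location.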
   
    You know what? I feel like this should be good now.

    Let's do this:

dave@cygnus~/meow5$ mr
Meow.
Meow.
Meow.
Meow.
Meow.

    Yes!

    I'm now able to find words by string name in the
    dictionary and "compile" them into memory and run them.

    The only TODO "checkbox" I didn't check in this log was
    this one:

        [ ] Make all words take params from the stack, not
            from pre-defined registers.

    Which should be no problem. That'll be a nice easy way
    to start the next log, so I'll see you in log03.txt
    with that!
   
534 	with that!