Re: 6502's and Symbol Tables




"David Schmenk" <dschmenk@xxxxxxxxxxxxxx> wrote in message news:mG3yk.19780$LG4.7379@xxxxxxxxxxxxxxxxxxxxxxx
mdj wrote:
Hi,

As a part of an ongoing interpreter project I had cause to develop a
symbol table in 6502 assembly. Since symbol lookup typically occurs
mostly during tokenisation and not runtime, I chose the following
implementation strategy:

- Symbols are stored sequentially in memory as Pascal String, with the
symbols address immediately following.
- At the oposing end of "symbol memory" a vector keeps track of the
start of each symbol and is "insertion sorted" when a new symbol is
created.
- Symbol lookup works as a binary search via the sorted vector

Being conscious of memory usage and management complexity I avoided
using a hashing algorithm for this project. Of course, as the symbol
table gets big, this approach becomes slower...

I'm interested to know (without having to do a lot of disassembly) how
others have handled this in their projects? Dave, how do you handle
symbols in vm02? Have others disassembled or written assemblers on the
Apple II or other small architectures?

Thanks,

Matt

Hi Matt-

I used a multi-tiered approach with VM02 based on what Java class files look like and Java bytecode executes. All symbol names in the Java class file are strings and collected along with other sorts of runtime data into something called the Constant Pool. Strings, strings, everywhere. So, I have a hash table for all the strings in the system. Along with a reference counted, handle based memory manager, I can quickly add a string into the table and return a handle to it's memory. If the same string is already resident in the table, I just return it's handle and increment the reference count. String comparisons for equality just become a 16 bit handle comparison.

Later in the runtime when the symbol is first referenced by a method call or field load/store I have to traverse some object's superclass hierarchy to resolve the handle to the code or data. I keep a full 32 bits per symbol reference in the Constant Pool so I can have 16 bits for the string handle and 16 bits for the resolved code/data.

This ends up being a lot of book-keeping, but the late dynamic binding nature of Java doesn't give you much option. VM02 also doesn't have a very large global namespace to maintain (especially when you have to fit everything in 64K!). There is the global class table that doesn't get very large for a given program so it can be searched linearly - but again, the results are saved in the symbol reference so it only happens once per symbol. My approach also chews up memory pretty fast as everything in the system is a 32 bit value. The upside is it runs pretty fast once the initial binding has occurred - i.e. no more lookups.

The only part of this that is probably interesting to you is the string hash table. My table is 256 entries of unordered linked lists. The hash function is simple enough too - just rotate and xor through all the characters in the string. I ran quite a few class files through the hashing function and got a nice smooth distribution.

I don't know what your project is targeting, so I'm not sure any of these ideas are useful. It always seems to end up being the memory vs speed trade off.

Dave...

Sorry if I sound like I complain alot, but, here's a few experienced programmers discussing a topic that could be interesting to a less-experienced programmer, but, due to you guys knowing what you mean, no examples are given. That is the purpose of these groups, to educate those who are less knowledgeable. That said, could you please post some samples of what you're discussing, or give us links to source code so we can see it for ourselves?

Thank you,

Bill Garber from GS-Electronics
http://www.garberstreet.com

.



Relevant Pages

  • Re: UDTs (as was vb6)
    ... Third an address string that is 30 chars long. ... you want to have some of these addresses in memory. ... a "reference". ... I haven't explained "member" yet: ...
    (microsoft.public.dotnet.languages.vb)
  • Re: 6502s and Symbol Tables
    ... Symbols are stored sequentially in memory as Pascal String, ... Symbol lookup works as a binary search via the sorted vector ... Along with a reference counted, handle based memory manager, I can quickly add a string into the table and return a handle to it's memory. ...
    (comp.sys.apple2.programmer)
  • Re: UDTs (as was vb6)
    ... Third an address string that is 30 chars long. ... you want to have some of these addresses in memory. ... We need a variable to store the reference to an object ... You write down the variable name, followed by a dot, followed by the member ...
    (microsoft.public.dotnet.languages.vb)
  • Re: MASD syntax to load the current address in C.A at runtime
    ... so my memory on A=PC is not as good. ... CODE object, the CRC on the CODE object will change. ... easier to store individual pieces as the time string was built. ...
    (comp.sys.hp48)
  • Re: 6502s and Symbol Tables
    ... Symbols are stored sequentially in memory as Pascal String, ... Symbol lookup works as a binary search via the sorted vector ... Along with a reference counted, handle based memory manager, I can quickly add a string into the table and return a handle to it's memory. ...
    (comp.sys.apple2.programmer)