Re: Am I parsing this correctly? (when do I build the symbol table)



On Thu, 17 May 2007 21:43:38 -0700, Ryan Dary <iecc@xxxxxxxxxxxx>
wrote:

Take a look at this source code, ...

Sub do_something_else( byRef i As Integer )
i = i * 10
End Sub

Function do_something( i As Integer ) As Integer
Dim a As Integer = i + 10
do_something_else i
Return a * 5
End Function

My understanding of the syntax tree is that I'm not supposed to be
worried about the "meaning" of the code, but rather the "structure" of
the code. So, I'm not building a symbol tree at this phase... the
problem with that seems to be that I'm unable to make heads or tails
of the lines of code within the function declaration. For instance,
as I parse the Dim statement (which is used to declare a variable), I
am able to parse the components "Dim a As Integer = <exp>" where the
<exp> (expression) seems to be impossible to really parse without
having a symbol table thus far in the parsing. I wouldn't know if "i"
is a variable or a function or a constant, because I don't have any
way of looking it up in a symbol table. So, should I be building the
symbol table as I'm parsing the syntax tree from the tokens?

Strictly speaking, it is never *necessary* to build a symbol table
while parsing - it can always be done by a separate pass (or passes)
through the syntax tree. However, as a practical matter, there are
languages like C for which parsing is very difficult to perform
without the symbol information - it can be done but the result is very
messy and the work to disambiguate everything is more trouble than it
is worth. Regardless of the language, constructing the symbol table
as early as possible saves work later.


That said ... a VB like syntax is easy enough that you really could
leave symbol processing until later if you wanted to. While parsing
you only need to know the context in which the identifier is used -
declaration, expression, function argument, etc. You structure your
syntax tree to record the context but you leave the identifier
references generic.

Assuming your parser recognizes keywords for built-in types - and
without going into too much detail - your syntax tree might initially
look something like:

(program
(proc
( (id "do_something_else")
(params (id "i" (type (ref integer)))) )
(code
(expr (op =
(expr (id "i"))
(expr (op * (id "i") (const 10))))) ))

(func
( (id "do_something")
(type integer)
(params (id "i" (type integer)) )
(vars
(id "a"
(type array
(type integer)
(dim (expr (op + (id "i") (const 10)))) )))
(code
(call (id "do_something_else")
(args (expr (id "i"))))
(return (expr (op * (id "a") (const 5)))) ))
:
)

Since user types like records are constructed from built-in types or
recursively from previously declared user types, it is sufficient for
the parser to recognize the built-ins.

Then you walk the syntax tree, constructing your type and symbol
tables(s) as you determine what the identifiers mean. During this
pass you can check the code for undeclared and duplicate identifiers,
unfinished forward declarations, etc. If the language syntax requires
all symbols be declared before use then you can also perform some
usage checking in this pass. If your language allows identifiers to
be referenced before they are declared (e.g., function names) then
you'll have to wait until the entire symbol table is constructed
before checking usage.

Once your types and symbols are defined, you can rewrite the code
parts of the tree so the generic ID references point directly to the
symbol table entries.

Hope this helps.
George
.



Relevant Pages

  • Re: Suggestion for dynamic grammar/parser - pls advise
    ... dynamic grammar and parsing schemes. ... parsing to implement yor task on the best way. ... The other problem was how to extend terminals' set? ... The problem also was how to build abstarct syntax tree from ...
    (comp.compilers)
  • Am I parsing this correctly? (when do I build the symbol table)
    ... and I want to know if I'm parsing this correctly. ... Right now, I am walking through the tokens, and building up a tree ... My understanding of the syntax tree is that I'm not supposed to be ... as I parse the Dim statement, ...
    (comp.compilers)