Re: Tokenizer theory and practice
- From: "Dmitry A. Kazakov" <mailbox@xxxxxxxxxxxxxxxxx>
- Date: Sun, 18 May 2008 10:29:59 +0200
On Sat, 17 May 2008 10:22:34 +0200, Hans-Peter Diettrich wrote:
> Dmitry A. Kazakov wrote:
>> When I do similar stuff, I do it in such a way that the parser returns
>> typed objects rather than copies of the source. The whole idea of
>> copying the source is bogus, IMO.
> Indeed, textual copies are of little use. Can you suggest a
> descriptive formalism for the objects returned by a lexer?
Not with a bottom-up approach. But when the parser works top-down, or
else starts somewhere in the middle, it knows well what to expect at the
cursor. At the top it knows the exact type, so parsing
either fails or yields a token. Below that it knows only some set of
types, i.e. in OO terms, a class of types. In this case the returned
token would be a polymorphic object from that class (or else a
failure). The class could be "infix operation", "literal", etc. In
fact, this is merely the abstract factory pattern: the parser acts as a
factory, and the parsed source at the cursor determines the concrete token
type and then its value.
I think this could be formalized. One premise is that the set of
tokens forms a tree/forest-like hierarchy, which is, I believe, almost
always the case.
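
To make the factory idea concrete, here is a minimal sketch in Python. It is
an illustration under stated assumptions, not a definitive implementation:
all names (Token, Literal, IntegerLiteral, StringLiteral, InfixOperation,
parse, RECOGNIZERS) are hypothetical, and the toy syntax (decimal integers,
double-quoted strings, the operators + - * /) is made up for the example.
The caller names the class of tokens it expects, the text at the cursor
selects the concrete type and its value, and anything else is a failure.

```python
from __future__ import annotations
from dataclasses import dataclass


class ParseError(Exception):
    """Nothing in the expected class of tokens matches at the cursor."""


# Hypothetical token hierarchy: a tree rooted at Token.
@dataclass
class Token:
    pass


@dataclass
class Literal(Token):
    pass


@dataclass
class IntegerLiteral(Literal):
    value: int


@dataclass
class StringLiteral(Literal):
    value: str


@dataclass
class InfixOperation(Token):
    symbol: str


def _integer(source: str, cursor: int):
    end = cursor
    while end < len(source) and source[end] in "0123456789":
        end += 1
    if end == cursor:
        return None  # no digits at the cursor
    return IntegerLiteral(int(source[cursor:end])), end


def _string(source: str, cursor: int):
    if cursor >= len(source) or source[cursor] != '"':
        return None
    end = source.find('"', cursor + 1)
    if end < 0:
        return None  # unterminated string
    return StringLiteral(source[cursor + 1:end]), end + 1


def _infix(source: str, cursor: int):
    if cursor < len(source) and source[cursor] in "+-*/":
        return InfixOperation(source[cursor]), cursor + 1
    return None


# Concrete (leaf) token types paired with their recognizers.
RECOGNIZERS = [
    (IntegerLiteral, _integer),
    (StringLiteral, _string),
    (InfixOperation, _infix),
]


def parse(expected: type[Token], source: str, cursor: int) -> tuple[Token, int]:
    """Return a token belonging to class `expected` and the new cursor.

    Only recognizers for concrete types inside the expected class are tried,
    so the result is a polymorphic object of that class, or a ParseError.
    """
    for concrete, recognizer in RECOGNIZERS:
        if issubclass(concrete, expected):
            result = recognizer(source, cursor)
            if result is not None:
                return result
    raise ParseError(f"expected {expected.__name__} at position {cursor}")
```

With this sketch, parse(Literal, "42+x", 0) yields an IntegerLiteral and the
cursor after it, while parse(IntegerLiteral, "+", 0) fails, because the exact
type was requested and the source does not match. Asking for Token, the root
of the tree, accepts any concrete type.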
--
Regards,
Dmitry A. Kazakov
http://www.dmitry-kazakov.de