Re: Tokenizer theory and practice
- From: Hans-Peter Diettrich <DrDiettrich1@xxxxxxx>
- Date: Sun, 18 May 2008 10:37:49 +0200
cr88192 schrieb:
I hope to separate the data structures, as syntactical elements, from
attributes etc. as a kind of semantic code. In the best case it should be
possible to derive both a serializer and a deserializer from a given
formal description.
ok.
so, one first describes all of the pieces, and then later how they are
assembled?...
ok, but it may be difficult to pull off...
You are right. Looking at some popular binary file formats, the
construction (writing) must be done sequentially, possibly patching
offsets in preceding tables afterwards. When an offset table is
written last, it must be read first, from the end of the file. I'm
not sure whether a formal description of such procedures is possible,
for use in both reading and writing such a file. A corresponding grammar
may become context sensitive, which makes the problem very
interesting from a scientific point of view, but it's unlikely to
result in a usable tool.
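To make the "written last, read first" point concrete, here is a minimal C sketch. The layout is purely hypothetical (it is not taken from any real format): the records come first, then one 32-bit little-endian offset per record, then the record count as the very last word. The names write_archive and find_record are made up for this illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical layout (an assumption, not a real format):
   [record 0]...[record n-1][u32 offset 0]...[u32 offset n-1][u32 n]
   All integers are little-endian; offsets are assumed to fit in 32 bits. */

static void put_u32(uint8_t *p, uint32_t v) {
    p[0] = (uint8_t)v;         p[1] = (uint8_t)(v >> 8);
    p[2] = (uint8_t)(v >> 16); p[3] = (uint8_t)(v >> 24);
}

static uint32_t get_u32(const uint8_t *p) {
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Writer: records go out sequentially, the offset table is appended last.
   Returns the total size, or 0 if `cap` is too small. */
size_t write_archive(uint8_t *out, size_t cap,
                     const char *const *recs, uint32_t n) {
    size_t pos = 0, off = 0;
    uint32_t i;
    assert(out != NULL && recs != NULL);
    for (i = 0; i < n; i++) {                 /* record data first */
        size_t len = strlen(recs[i]);
        if (len > cap - pos) return 0;
        memcpy(out + pos, recs[i], len);
        pos += len;
    }
    if ((cap - pos) / 4 <= n) return 0;       /* need n + 1 more words */
    for (i = 0; i < n; i++) {                 /* then the offset table */
        put_u32(out + pos, (uint32_t)off);
        off += strlen(recs[i]);
        pos += 4;
    }
    put_u32(out + pos, n);                    /* the count comes last */
    return pos + 4;
}

/* Reader: the count at the very end is read first, then the table, and
   only then can record `i` be located. Returns 0 on success, -1 on error. */
int find_record(const uint8_t *buf, size_t len, uint32_t i,
                size_t *start, size_t *size) {
    uint32_t n;
    size_t table, begin, end;
    assert(buf != NULL && start != NULL && size != NULL);
    if (len < 4) return -1;
    n = get_u32(buf + len - 4);
    if (i >= n || (len - 4) / 4 < n) return -1;
    table = len - 4 - 4 * (size_t)n;          /* where the table starts */
    begin = get_u32(buf + table + 4 * (size_t)i);
    end = (i + 1 < n) ? get_u32(buf + table + 4 * ((size_t)i + 1)) : table;
    if (begin > end || end > table) return -1;
    *start = begin;
    *size = end - begin;
    return 0;
}
```

Note that the reader's control flow is not the writer's run backwards: it jumps to the end, then to the table, then to the record. That asymmetry is what a single formal description would have to capture.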
as noted, I also said that there would be conditionals; that, or we could
also support BNF-based descriptions.
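u64 uvli() {
    byte v;     /* presumably: declaring a field reads it from the stream */
    /* bit 7 set: more bytes follow; 7-bit groups, least significant first */
    return((v&0x80)?((v&0x7f)|(uvli()<<7)):(v&0x7f));
}
s64 svli() {
    uvli v;     /* read an unsigned value, then unfold the sign from bit 0 */
    return((v&1)?(-((v+1)>>1)):(v>>1));
}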
of course, this does not make it clear how to encode these types...
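For what it is worth, the encoding direction can be written by hand easily enough. The following is only a sketch in plain C, under two assumptions that the notation above does not settle: the 7-bit groups are stored least significant first, and the sign is folded into bit 0 so that 0, -1, 1, -2 map to 0, 1, 2, 3 (which is what svli() undoes). The function names are invented for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Encode v as 7-bit groups, least significant first (an assumption);
   bit 7 of each byte means "more bytes follow".
   Returns the number of bytes written (at most 10). */
size_t uvli_encode(uint64_t v, uint8_t *out) {
    size_t n = 0;
    assert(out != NULL);
    while (v >= 0x80) {
        out[n++] = (uint8_t)((v & 0x7f) | 0x80);
        v >>= 7;
    }
    out[n++] = (uint8_t)v;
    return n;
}

/* Decode one value. Returns the number of bytes consumed, or 0 if the
   input is truncated or runs past 10 bytes. */
size_t uvli_decode(const uint8_t *in, size_t len, uint64_t *v) {
    uint64_t acc = 0;
    size_t n = 0;
    unsigned shift = 0;
    assert(in != NULL && v != NULL);
    while (n < len && shift < 64) {
        uint8_t b = in[n++];
        acc |= (uint64_t)(b & 0x7f) << shift;
        if (!(b & 0x80)) { *v = acc; return n; }
        shift += 7;
    }
    return 0;
}

/* Fold the sign into bit 0: 0, -1, 1, -2, ... become 0, 1, 2, 3, ... */
uint64_t svli_fold(int64_t n) {
    return n < 0 ? ((((uint64_t)(-(n + 1))) << 1) | 1)
                 : ((uint64_t)n << 1);
}

/* Inverse of svli_fold; written so that it cannot overflow. */
int64_t svli_unfold(uint64_t v) {
    return (v & 1) ? -(int64_t)(v >> 1) - 1 : (int64_t)(v >> 1);
}
```

A signed value would then be written as uvli_encode(svli_fold(n), buf) and read back with uvli_decode followed by svli_unfold. The encoder is a loop and the decoder an accumulator, so neither falls out of the other mechanically, which is rather the point of the remark above.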
A common example would be UTF-8 encoding/decoding, which is easily
described in a procedural way, and possibly also in a grammar, but
deriving the code from such a grammar seems to exceed my capabilities.
<sigh>
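For reference, the procedural description of UTF-8 looks roughly like the following C sketch. It handles one code point at a time and rejects surrogates, overlong forms and values above U+10FFFF; the function names are made up for this example and do not come from any library.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Encode one code point. Returns the number of bytes written (1..4),
   or 0 for surrogates and values above U+10FFFF. */
size_t utf8_encode(uint32_t cp, uint8_t out[4]) {
    assert(out != NULL);
    if (cp < 0x80) {
        out[0] = (uint8_t)cp;
        return 1;
    }
    if (cp < 0x800) {
        out[0] = (uint8_t)(0xC0 | (cp >> 6));
        out[1] = (uint8_t)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {
        if (cp >= 0xD800 && cp <= 0xDFFF) return 0;   /* surrogate */
        out[0] = (uint8_t)(0xE0 | (cp >> 12));
        out[1] = (uint8_t)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (uint8_t)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFF) {
        out[0] = (uint8_t)(0xF0 | (cp >> 18));
        out[1] = (uint8_t)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (uint8_t)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (uint8_t)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;
}

/* Decode one code point. Returns the number of bytes consumed (1..4),
   or 0 on truncated or malformed input. */
size_t utf8_decode(const uint8_t *in, size_t len, uint32_t *cp) {
    /* smallest code point that legitimately needs n bytes */
    static const uint32_t min_cp[5] = {0, 0, 0x80, 0x800, 0x10000};
    uint32_t c;
    size_t n, i;
    assert(in != NULL && cp != NULL);
    if (len == 0) return 0;
    if (in[0] < 0x80) { *cp = in[0]; return 1; }
    if ((in[0] & 0xE0) == 0xC0)      { n = 2; c = in[0] & 0x1F; }
    else if ((in[0] & 0xF0) == 0xE0) { n = 3; c = in[0] & 0x0F; }
    else if ((in[0] & 0xF8) == 0xF0) { n = 4; c = in[0] & 0x07; }
    else return 0;                    /* stray continuation or bad lead */
    if (len < n) return 0;
    for (i = 1; i < n; i++) {
        if ((in[i] & 0xC0) != 0x80) return 0;
        c = (c << 6) | (uint32_t)(in[i] & 0x3F);
    }
    if (c < min_cp[n] || c > 0x10FFFF || (c >= 0xD800 && c <= 0xDFFF))
        return 0;                     /* overlong, too large, surrogate */
    *cp = c;
    return n;
}
```

The range checks (overlong forms, surrogates) are exactly the kind of context-dependent conditions that are trivial in code but awkward to state in a grammar from which both directions could be derived.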
your intent is for reverse engineering or something?...
that is what this sounds like to me at least...
That's the background, how I came to parsers at all ;-)
usually for more complex or bulky formats, I write special tools...
After considering all the topics mentioned in this thread, I had better
leave the theory to other people, too. As you stated before:
>>
more likely, one can end up more with what would amount to a
format-design tool, than something that can actually reliably parse
existing formats.
<<
DoDi