Re: my ears are burning... ;)
- From: Eric Mahurin <eric.mahurin@xxxxxxxxx>
- Date: Sat, 27 Sep 2008 22:30:19 -0500
[Note: parts of this message were removed to make it a legal post.]
On Sat, Sep 27, 2008 at 1:04 PM, <parrt@xxxxxxxxx> wrote:
On Sep 26, 10:30 pm, Eric Mahurin <eric.mahu...@xxxxxxxxx> wrote:
On Fri, Sep 26, 2008 at 1:19 AM, <pa...@xxxxxxxxx> wrote:
As for packrat etc... Turn on backtrack=true and ANTLR throttles up
from LL(1) to LL(*) to a PEG as necessary. Memoization only occurs if
you say so, either on grammar or on rule. I have plans to have ANTLR
suggest where you should memoize.
This sounds very good. I'd like to see how you did some of this.
There is some really amazing jiggery-pokery in the grammar analyzer.
Sometimes I look at it and say, "holy crap! it actually works!" ;)
My parser
generator does LL(1) and LL(*) where directed.
Just to prevent confusion in the namespace, I assume you mean LL(k)
not LL(*). LL(k) does more than a single symbol of lookahead but with
a finite maximum determined at parser generation time. LL(*), on the
other hand, can generate cyclic DFA that can look arbitrarily ahead to
distinguish alternatives. As long as the lookahead language is
regular, LL(*) can handle it. If you have a left common prefix that is
recursive, naturally that means the lookahead language is not regular.
At that point, we must backtrack. That is where we get into predicated
LL(*), which knows how to backtrack and incorporate semantic
predicates its own. In fact I implement syntactic predicates,
backtracking, as a special form of semantic predicate. this has
advantages as you'll see next.
I do mean LL(*). I decided up front to make my parser mainly LL(1) for
speed and clarity but give a backdoor to accomplish LL(*) via backtacking.
When you set
"backtrack=true", does it generate backtracking code only where thegrammar
really needs it (a lookahead token/character could match twoalternatives?)
or for (almost) all alternatives? Now I'm curious to see what kind ofcode
ANTLR generates, especially with selective memoization.
Yes, ANTLR does the optimal thing. Imagine two alternatives that are
distinguishable with a single symbol of lookahead in 90% of the cases.
But, in 9% of the cases it requires three token lookahead and in 1% of
the cases it requires a full backtrack. ANTLRWill generate a parsing
decision that does exactly that. It will never, even in the same
parsing decision, default to using the maximum strength in all input
sequence cases. Without a lot of fancy pants grammar analysis and DFA
generation, this can't be done.
Very nice...
Pure packrat parsing is really easy. There is no analysis to be done
all. You just always backtrack. PEGs have a number of disadvantages.
Two of which are that it cannot handle infinite streams like socket
protocols. Also, if you are always backtracking, the error message is
essentially "no, Your input does not match the syntax". Also, action
execution is extremely difficult in the face of backtracking,
particularly continuously. Consequently, it is my opinion that pure
PEG based parser generators are not viable commercially unless you
don't care about invalid input, you don't care about infinite parsing
strings, and you don't care about action execution (you only need an
AST or parse tree from the parse). For some projects, actually make
sense, but for most commercial applications you'll need something like
ANTLR's approach where it tries to avoid backtracking at all cost.
I completely agree and came to the exact same conclusions a long time ago
(after starting down the auto-backtrack route).
a bitAs an ex-ANTLR guy, I've always found you to be nice. I learned quite
about parsing while using ANTLR that I wanted to go off and design myown.
I've counted a bunch of others that did the same. That says a lot. Iwas
much easier to wrap your head around parsing when you saw the recursive
descent LL code that ANTLR generated compared to the tables of *ACC.
Hooray! Yeah, I applaud everybody's attempt to build their own parser
generator. Besides being fun, it helps to push the state-of-the-art.
Bryan Ford's packrat parsing really helped me improve ANTLR. He and I
are scheduled to write an LL(*) paper sometime this year.
I finally got my parser generator setup for re-targeting. The codeof
generator is more generically a grammar tree traverser. There are a ton
possibilities for a given traverser: generate code in a specificlanguage,
parse while traversing (no code generation), generate a different type of
parser (LALR?), generate grammar diagrams, etc.
The the question in terms of retargetability is whether or not you can
change all of your internal data structures without affecting the code
generation very much. ;) Be careful you're not cutting and pasting
logic of code generation when you make a new target. If so, it's not
really retargetable. :)
Good point. This is something I struggled with for a while. I decided to
go with a fairly generic layer to hold the grammar tree and put most of the
complexity in the code generation engine. This gives a lot of flexibility
to the engines, but makes them more complex. But, the engines don't need to
handle higher level combinators built on lower level ones. For example,
"one-or-more", "zero-or-more", and repeat by a count/range are all built
with a recursion function that detects left and right recursion (and builds
loops).
But, point taken. I could think about refactoring some between the layers
to minimize any code duplication. Another option might be helper
methods/functions.
Thanks for ANTLR!
My pleasure. You know, I just realized it's been exactly 20 years
since I started building yucc, the predecessor of ANTLR! Wow, one of
these days, I had better get another interest! ;)
Ter
.
- Follow-Ups:
- Re: my ears are burning... ;)
- From: parrt
- Re: my ears are burning... ;)
- References:
- my ears are burning... ;)
- From: parrt
- Re: my ears are burning... ;)
- From: Eric Mahurin
- Re: my ears are burning... ;)
- From: parrt
- my ears are burning... ;)
- Prev by Date: Re: How to make Ruby _THE_ scripting language of choice, fold in SQLite
- Next by Date: Parsing a time span
- Previous by thread: Re: my ears are burning... ;)
- Next by thread: Re: my ears are burning... ;)
- Index(es):
Relevant Pages
|