Re: my ears are burning... ;)



[Note: parts of this message were removed to make it a legal post.]

On Sat, Sep 27, 2008 at 1:04 PM, <parrt@xxxxxxxxx> wrote:

On Sep 26, 10:30 pm, Eric Mahurin <eric.mahu...@xxxxxxxxx> wrote:
On Fri, Sep 26, 2008 at 1:19 AM, <pa...@xxxxxxxxx> wrote:
As for packrat etc... Turn on backtrack=true and ANTLR throttles up
from LL(1) to LL(*) to a PEG as necessary. Memoization only occurs if
you say so, either on grammar or on rule. I have plans to have ANTLR
suggest where you should memoize.

This sounds very good. I'd like to see how you did some of this.

There is some really amazing jiggery-pokery in the grammar analyzer.
Sometimes I look at it and say, "holy crap! it actually works!" ;)

My parser
generator does LL(1) and LL(*) where directed.

Just to prevent confusion in the namespace, I assume you mean LL(k)
not LL(*). LL(k) does more than a single symbol of lookahead but with
a finite maximum determined at parser generation time. LL(*), on the
other hand, can generate cyclic DFA that can look arbitrarily ahead to
distinguish alternatives. As long as the lookahead language is
regular, LL(*) can handle it. If you have a left common prefix that is
recursive, naturally that means the lookahead language is not regular.
At that point, we must backtrack. That is where we get into predicated
LL(*), which knows how to backtrack and incorporate semantic
predicates its own. In fact I implement syntactic predicates,
backtracking, as a special form of semantic predicate. this has
advantages as you'll see next.


I do mean LL(*). I decided up front to make my parser mainly LL(1) for
speed and clarity but give a backdoor to accomplish LL(*) via backtacking.



When you set
"backtrack=true", does it generate backtracking code only where the
grammar
really needs it (a lookahead token/character could match two
alternatives?)
or for (almost) all alternatives? Now I'm curious to see what kind of
code
ANTLR generates, especially with selective memoization.

Yes, ANTLR does the optimal thing. Imagine two alternatives that are
distinguishable with a single symbol of lookahead in 90% of the cases.
But, in 9% of the cases it requires three token lookahead and in 1% of
the cases it requires a full backtrack. ANTLRWill generate a parsing
decision that does exactly that. It will never, even in the same
parsing decision, default to using the maximum strength in all input
sequence cases. Without a lot of fancy pants grammar analysis and DFA
generation, this can't be done.


Very nice...

Pure packrat parsing is really easy. There is no analysis to be done
all. You just always backtrack. PEGs have a number of disadvantages.
Two of which are that it cannot handle infinite streams like socket
protocols. Also, if you are always backtracking, the error message is
essentially "no, Your input does not match the syntax". Also, action
execution is extremely difficult in the face of backtracking,
particularly continuously. Consequently, it is my opinion that pure
PEG based parser generators are not viable commercially unless you
don't care about invalid input, you don't care about infinite parsing
strings, and you don't care about action execution (you only need an
AST or parse tree from the parse). For some projects, actually make
sense, but for most commercial applications you'll need something like
ANTLR's approach where it tries to avoid backtracking at all cost.


I completely agree and came to the exact same conclusions a long time ago
(after starting down the auto-backtrack route).



As an ex-ANTLR guy, I've always found you to be nice. I learned quite
a bit
about parsing while using ANTLR that I wanted to go off and design my
own.
I've counted a bunch of others that did the same. That says a lot. I
was
much easier to wrap your head around parsing when you saw the recursive
descent LL code that ANTLR generated compared to the tables of *ACC.

Hooray! Yeah, I applaud everybody's attempt to build their own parser
generator. Besides being fun, it helps to push the state-of-the-art.
Bryan Ford's packrat parsing really helped me improve ANTLR. He and I
are scheduled to write an LL(*) paper sometime this year.

I finally got my parser generator setup for re-targeting. The code
generator is more generically a grammar tree traverser. There are a ton
of
possibilities for a given traverser: generate code in a specific
language,
parse while traversing (no code generation), generate a different type of
parser (LALR?), generate grammar diagrams, etc.

The the question in terms of retargetability is whether or not you can
change all of your internal data structures without affecting the code
generation very much. ;) Be careful you're not cutting and pasting
logic of code generation when you make a new target. If so, it's not
really retargetable. :)


Good point. This is something I struggled with for a while. I decided to
go with a fairly generic layer to hold the grammar tree and put most of the
complexity in the code generation engine. This gives a lot of flexibility
to the engines, but makes them more complex. But, the engines don't need to
handle higher level combinators built on lower level ones. For example,
"one-or-more", "zero-or-more", and repeat by a count/range are all built
with a recursion function that detects left and right recursion (and builds
loops).

But, point taken. I could think about refactoring some between the layers
to minimize any code duplication. Another option might be helper
methods/functions.

Thanks for ANTLR!

My pleasure. You know, I just realized it's been exactly 20 years
since I started building yucc, the predecessor of ANTLR! Wow, one of
these days, I had better get another interest! ;)

Ter



.



Relevant Pages

  • Re: Is pyparsing really a recursive descent parser?
    ... There are an enormous variety of parsing tools, ... For instance, the Earley parser ... Okay, in some contexts, an ambiguous grammar may be considered ... over the other through the context of the parse results? ...
    (comp.lang.python)
  • Re: ANTLR Target for Ruby
    ... is to figure out where backtracking might be needed and do it efficiently ... a large overhead over using a hand-crafted parser. ... If you also provided a sweet metagrammar (I find your earlier Grammar ... The real work is in the "engine" that does parser ...
    (comp.lang.ruby)
  • Re: Parser, BNF..
    ... >I have to do a 3D render, with a piece of code to parse a scene code ... Could anyone know a site where parsing and BNF is explained, ... depends in part on the kind of grammar required. ... Also there's the issue of how the parser ...
    (comp.graphics.algorithms)
  • Re: "Intro to Pyparsing" Article at ONLamp
    ... For unchanging datafile formats pyparsing seems to be OK. ... and then construct the corresponding grammar parser. ... With data-driven parsing, you are starting with data to be parsed, and you ...
    (comp.lang.python)
  • Re: "Intro to Pyparsing" Article at ONLamp
    ... and then construct the corresponding grammar parser. ... > With data-driven parsing, you are starting with data to be parsed, and you ... I'd like to add another parser type, lets call this a natural language ...
    (comp.lang.python)