Single production vs recursive grammar structure



If you look to a well known javacc grammar (see javacc-4.0\examples
\JavaGrammars\Java1.0.2LS.jj) you can note that the top-down approach
is heavily exploited (to keep simpler the grammar?).

For example, the production that checks equalities is the following
one

void EqualityExpression() :
{}
{
InstanceOfExpression() ( ( "==" | "!=" ) InstanceOfExpression() )*
}

The InstanceOfExpression() production and the other ones (in)directly
called are

void InstanceOfExpression() :
{}
{
RelationalExpression() [ ... ]
}
void RelationalExpression() :
{}
{
ShiftExpression() ( ( "<" | ">" | "<=" | ">=" ) ShiftExpression() )*
}

and so on.
Please note that the "operator" and the righthandside operands are
optional and this pattern is commonly uses throughout the whole
grammar, a sort of percolation pattern.
Well, I expanded the Gpl.jj grammar following the Java1.0.2LS.jj

void
EqualityExpression() :
{ }
{
RelationalExpression() [ ( <EQ_SIGN> | <NE_SIGN> )
RelationalExpression() ]
} // end Gpl.EqualityExpression()

void
RelationalExpression() :
{ }
{
AdditiveExpression() [ <RELATIONAL_OP> AdditiveExpression() ]
} // end Gpl.RelationalExpression()

and so on.

The following unite test assertion

public void testRelationalExpression()
{
GplProxy lGplProxy = GplProxyImpl.istanza();
String lMethodName = "RelationalExpression";

--> assertNotNull( lGplProxy.applyProduction( lMethodName, "5" ) );

fails with the following trace

Call: RelationalExpression
Call: AdditiveExpression
Call: MultiplicativeExpression
Call: UnaryExpression
Call: PrimaryExpression
Call: Literal
Consumed token: <<NUMBER_LITERAL>: "5" at line 1 column 1>
Return: Literal
Return: PrimaryExpression
Return: UnaryExpression
Return: MultiplicativeExpression
Return: AdditiveExpression
Return: RelationalExpression
Call: Eof
Consumed token: <<EOF> at line 1 column 1>
Return: Eof

The original test (reasonably) asserts that "5" is NOT a relational
(well formed) expression (in case of error applyProduction() should
return the diagnostic exception and instead it does not).

If you study the Java1.0.2LS.jj grammar structure and the top-down
recursive approach for its productions, most of them are of the form

lhsPart ( someSortOfOperator rhsPart )*

or

lhsPart [ someSortOfOperator rhsPart ]

Instead I would rather write the production for equality comparison
as

void
EqualityExpression() :
{ }
{
RelationalExpression() ( <EQ_SIGN> | <NE_SIGN> )
RelationalExpression()

} // end Gpl.EqualityExpression()

and

void
RelationalExpression() :
{ }
{
AdditiveExpression() <RELATIONAL_OP> AdditiveExpression()

} // end Gpl.RelationalExpression()

If I do it, "5" is finally rejected as bad formed relational
expression but that breaks a lot of other unit tests, as well.
Forcing the productions (some production) to the form

lhsPart someSortOfOperator rhsPart

(lhs, operator and rhs all three mandatory) breaks the whole structure
of the recursive algorithm.
For instance the simple conditional expression

if ( a; 1; 0 )

void
ConditionalExpression() :
{}
{
<IF> <PAR_OPEN> DisjunctiveExpression() ";" Expression() ";"
Expression() <PAR_CLOSED>
} // end Gpl.ConditionalExpression()

is rejected (not anymore accepted as legal): "a" is not considered
anymore a DisjunctiveExpression() (ConditionalOrExpression in terms
of Java1.0.2LS.jj).
If I run teh grammar voia cmd line testing a whole expression I get
the following trace:

Type in a expression followed by a "$" or ^D to quit:

if( a; 1; 0)$
Call: Expression
Call: AdditiveExpression
Call: MultiplicativeExpression
Call: UnaryExpression
Call: ConditionalExpression
Consumed token: <"if" at line 1 column 1>
Consumed token: <<PAR_OPEN>: "(" at line 1 column 3>
Call: DisjunctiveExpression
Call: ConjunctiveExpression
Call: EqualityExpression
Call: RelationalExpression
Call: AdditiveExpression
Call: MultiplicativeExpression
Call: UnaryExpression
Call: PrimaryExpression
Call: GroupId
Consumed token: <<GROUP_ID>: "a" at line
1 column
5>
Return: GroupId
Return: PrimaryExpression
Return: UnaryExpression
Return: MultiplicativeExpression
Return: AdditiveExpression
Return: RelationalExpression
Return: EqualityExpression
Return: ConjunctiveExpression
Return: DisjunctiveExpression
Return: ConditionalExpression
Return: UnaryExpression
Return: MultiplicativeExpression
Return: AdditiveExpression
Return: Expression
Return: ExpressionList
Exception in thread "main"
it.finmatica.gpj.ec.istruzioni.gpl.ParseException: En
countered ";" at line 1, column 6.
Was expecting one of:
<RELATIONAL_OP> ...
"+" ...
"-" ...
<MULT_ARITHMETIC_OP> ...

Then new syntactical form for RelationalExpression()

AdditiveExpression() <RELATIONAL_OP> AdditiveExpression()

now expects a <RELATIONAL_OP> after a" has been consumed. That is not
found, outer productions unsuccessfully try to consume a
multiplicative operator, then an additive operator before failing.

I have some doubts:

1)
I'm testing the grammar wrongly: generally it is not possible to
isolatedly test a single production

2)
there is some lack in the set of testing tools.
Currently

/**
* Applies a production (nonterminal) to the given input, no control
about how the input will be split in tokens
* @param pMethodName Name of the method of Gpl.jj to be applied
* @param pInput String to be checked
* @return GplProductionException diagnostic in case of error, null
otherwise
*
* @pre pMethodName != null
* @pre pInput != null
*/
public
Throwable
applyProduction
( String pMethodName
, String pInput
);

applyProduction does not control at all how the given input has been
token-split (actually checks only that the invoked production of Gpl
does not raise neither a TokenMgrException nor a ParserException. I
think that token splitting (and checking it) is not the solution.

3)
Perhaps i should rewrite throught the whole grammar by systematically
splitting the production
in two subproductions

lhsPart
| lhsPart someSortOfOperator rhsPart

but this is ambiguos and requires, as well, systematic use of lookahed

4)
The grammar is wrong


Any suggestion?
Thanks in advance

ciao
Cesare

The post is already pretty long, I prefer not to attache the Gpl.jj
grammar, for the time being, if not necessary.

.



Relevant Pages

  • Re: C# 2.0 language grammar
    ... While I agree that it probably boiled down to an arbitrary decision, ... namespace-type-name production is used is in the context of a using-directive ... particular productions are used, and according to the grammar it is, then I ... question the logic in constructing the namespace-name and namespace-type-name ...
    (microsoft.public.dotnet.languages.csharp)
  • "Z -> z versus Z -> z e" - sole thread for future discussion/postings
    ... When WE suggesting replacing the production rule Z -> z e ... with the rule Z -> e to terminate the derivation, ... Suppose a grammar G ... of a linear grammar that introduce a non-terminal, ...
    (sci.math)
  • "Z -> z versus Z -> z e" - sole thread for future discussion/postings
    ... When WE suggesting replacing the production rule Z -> z e ... with the rule Z -> e to terminate the derivation, ... Suppose a grammar G ... of a linear grammar that introduce a non-terminal, ...
    (sci.logic)
  • Re: flex+bison problem: syntax error that shouldnt be
    ... The "grammar" you have here is rather simple, ... NAME looks like it is supposed to be string, but AGE ... extern void yyerror(const char *msg); ...
    (comp.unix.programmer)
  • "Z -> z versus Z -> z e" - sole thread for future discussion/postings
    ... When WE suggesting replacing the production rule Z -> z e ... with the rule Z -> e to terminate the derivation, ... Suppose a grammar G ... of a linear grammar that introduce a non-terminal, ...
    (comp.theory)