How to solve such an ambiguity? #1812

4silvertooth · 2022-06-27T09:46:12Z


I don't know if I should call it an ambiguity, but let's say I want both of these expressions
10 * (2+2) and 10 (2+2)
to evaluate to the same value in the grammar calculator_embedded_actions.js.

That is, the * operator before an LParen is optional, and when it is omitted a multiplication is assumed.
I've tried doing it this way by changing

chevrotain/examples/grammars/calculator/calculator_embedded_actions.js, line 137 (commit db17ce3):

op = $.CONSUME(MultiplicationOperator)

to

op = $.OPTION(() => $.CONSUME(MultiplicationOperator));
$.OPTION2({
  GATE: () => !op && tokenMatcher($.LA(1), LParen),
  DEF: () => {
    $.OPTION3(() => $.CONSUME2(MultiplicationOperator));
    if (!op) op = Multi;
  }
});

It does give the correct results, but I don't think it should be done that way (also, the syntax diagram looks ugly as hell).

Since both cases are OPTIONs, op is sometimes just undefined in other use cases. How should this be implemented?

In short: $.CONSUME(MultiplicationOperator) should be optional only if an LParen follows, and in that case op should be Multi.

Answered by msujew


msujew · 2022-06-27T10:31:53Z

Collaborator

I believe there's an alternative to this solution that is a bit easier to read and doesn't require two CONSUME(MultiplicationOperator) calls. Putting your requirement into other words: the parser must read a * token unless the next token is a (; if it is a (, we know we don't have to parse a *:

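// Look at the next token once: is it "("?
const nextParenthesis = tokenMatcher($.LA(1), LParen);
$.OR([
  {
    // An explicit * or / is always allowed.
    ALT: () => $.CONSUME(MultiplicationOperator)
  },
  {
    // The operator may be omitted only when "(" follows.
    GATE: () => nextParenthesis,
    ALT: EMPTY_ALT()
  }
]);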


4silvertooth · 2022-06-27T10:41:59Z

Author

Thanks!
After reading your answer it hit me, rubber-duck-debugging style, that it could be just as simple as this:

$.MANY(() => {
  $.OR([
    {
      // Explicit operator: 10 * (2+2)
      ALT: () => {
        op = $.CONSUME(MultiplicationOperator)
        rhsVal = $.SUBRULE2($.atomicExpression)
        if (tokenMatcher(op, Multi)) {
          value *= rhsVal
        } else { // op instanceof Div
          value /= rhsVal
        }
      }
    },
    {
      // No operator, "(" follows: implicit multiplication, 10 (2+2)
      ALT: () => {
        rhsVal = $.SUBRULE($.parenthesisExpression)
        value *= rhsVal
      }
    }
  ])
})

(function calculatorExample() {
 // ----------------- lexer -----------------
 const createToken = chevrotain.createToken;
 const tokenMatcher = chevrotain.tokenMatcher;
 const Lexer = chevrotain.Lexer;
 const EmbeddedActionsParser = chevrotain.EmbeddedActionsParser;

 // using the NA pattern marks this Token class as 'irrelevant' for the Lexer.
 // AdditionOperator defines a Tokens hierarchy but only leaves in this hierarchy
 // define actual Tokens that can appear in the text
 const AdditionOperator = createToken({name: "AdditionOperator", pattern: Lexer.NA});
 const Plus = createToken({name: "Plus", pattern: /\+/, categories: AdditionOperator});
 const Minus = createToken({name: "Minus", pattern: /-/, categories: AdditionOperator});

 const MultiplicationOperator = createToken({name: "MultiplicationOperator", pattern: Lexer.NA});
 const Multi = createToken({name: "Multi", pattern: /\*/, categories: MultiplicationOperator});
 const Div = createToken({name: "Div", pattern: /\//, categories: MultiplicationOperator});

 const LParen = createToken({name: "LParen", pattern: /\(/});
 const RParen = createToken({name: "RParen", pattern: /\)/});
 const NumberLiteral = createToken({name: "NumberLiteral", pattern: /[1-9]\d*/});

 const PowerFunc = createToken({name: "PowerFunc", pattern: /power/});
 const Comma = createToken({name: "Comma", pattern: /,/});

 const WhiteSpace = createToken({
   name: "WhiteSpace",
   pattern: /\s+/,
   group: Lexer.SKIPPED
 });

 // whitespace is normally very common so it is placed first to speed up the lexer
 const allTokens = [WhiteSpace,
   Plus, Minus, Multi, Div, LParen, RParen,
   NumberLiteral, AdditionOperator, MultiplicationOperator,
   PowerFunc, Comma];
 const CalculatorLexer = new Lexer(allTokens);


 class Calculator extends EmbeddedActionsParser {
   constructor() {
     super(allTokens);

     const $ = this;

     $.RULE("expression", () => {
       // uncomment the debugger statement and open dev tools in chrome/firefox
       // to debug the parsing flow.
       // debugger;
       return $.SUBRULE($.additionExpression)
     });


     // Lowest precedence thus it is first in the rule chain
     // The precedence of binary expressions is determined by
     // how far down the Parse Tree the binary expression appears.
     $.RULE("additionExpression", () => {
       let value, op, rhsVal;

       // parsing part
       value = $.SUBRULE($.multiplicationExpression);
       $.MANY(() => {
         // consuming 'AdditionOperator' will consume
         // either Plus or Minus as they are subclasses of AdditionOperator
         op = $.CONSUME(AdditionOperator);
         //  the index "2" in SUBRULE2 is needed to identify the unique
         // position in the grammar during runtime
         rhsVal = $.SUBRULE2($.multiplicationExpression);

         // interpreter part
         // tokenMatcher acts as ECMAScript instanceof operator
         if (tokenMatcher(op, Plus)) {
           value += rhsVal
         } else { // op "instanceof" Minus
           value -= rhsVal
         }
       });

       return value
     });


     $.RULE("multiplicationExpression", () => {
       let value, op, rhsVal;

       // parsing part
       value = $.SUBRULE($.atomicExpression);
       $.MANY(() => {
         $.OR([
           {
             // Explicit operator: 10 * (2+2)
             ALT: () => {
               op = $.CONSUME(MultiplicationOperator)
               rhsVal = $.SUBRULE2($.atomicExpression)
               if (tokenMatcher(op, Multi)) {
                 value *= rhsVal
               } else { // op instanceof Div
                 value /= rhsVal
               }
             }
           },
           {
             // No operator, "(" follows: implicit multiplication, 10 (2+2)
             ALT: () => {
               rhsVal = $.SUBRULE($.parenthesisExpression)
               value *= rhsVal
             }
           }
         ])
       })

       return value
     });


     $.RULE("atomicExpression", () => $.OR([
       // parenthesisExpression has the highest precedence and thus it
       // appears in the "lowest" leaf in the expression ParseTree.
       {ALT: () => $.SUBRULE($.parenthesisExpression)},
       {ALT: () => parseInt($.CONSUME(NumberLiteral).image, 10)},
       {ALT: () => $.SUBRULE($.powerFunction)}
     ]));


     $.RULE("parenthesisExpression", () => {
       let expValue;

       $.CONSUME(LParen);
       expValue = $.SUBRULE($.expression);
       $.CONSUME(RParen);

       return expValue
     });

     $.RULE("powerFunction", () => {
       let base, exponent;

       $.CONSUME(PowerFunc);
       $.CONSUME(LParen);
       base = $.SUBRULE($.expression);
       $.CONSUME(Comma);
       exponent = $.SUBRULE2($.expression);
       $.CONSUME(RParen);

       return Math.pow(base, exponent)
     });

     // very important to call this after all the rules have been defined.
     // otherwise the parser may not work correctly as it will lack information
     // derived during the self analysis phase.
     this.performSelfAnalysis();
   }
 }

 // for the playground to work the returned object must contain these fields
 return {
   lexer: CalculatorLexer,
   parser: Calculator,
   defaultRule: "expression"
 };
}())
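For experimenting without Chevrotain, here is a minimal dependency-free sketch of the same idea as a hand-written recursive-descent evaluator. It is hypothetical and not Chevrotain's API: the function name `evaluateCalc`, the regex tokenizer and the use of `\d+` for numbers are assumptions, and `power(...)` is left out. The interesting part is the loop in `multiplicationExpression`: after an operand, an explicit `*` or `/` is handled like the first OR alternative, while a `(` with no operator in front is treated as implicit multiplication, like the second alternative. The explicit `peek()` checks stand in for the lookahead that Chevrotain computes automatically for `MANY` and `OR`.

```javascript
// Hypothetical dependency-free evaluator (not Chevrotain) that mirrors the
// calculator grammar, including the implicit multiplication before "(".
function evaluateCalc(text) {
  // Assumed tokenizer: integers, + - * / and parentheses only.
  const tokens = text.match(/\d+|[-+*\/()]/g) || [];
  if (text.replace(/\s+/g, "") !== tokens.join("")) {
    throw new Error("unsupported character in input");
  }
  let pos = 0;
  const peek = () => tokens[pos];
  const consume = (expected) => {
    const tok = tokens[pos];
    if (expected !== undefined && tok !== expected) {
      throw new Error(`expected ${expected} but found ${tok}`);
    }
    pos++;
    return tok;
  };

  function additionExpression() {
    let value = multiplicationExpression();
    while (peek() === "+" || peek() === "-") {
      const op = consume();
      const rhs = multiplicationExpression();
      value = op === "+" ? value + rhs : value - rhs;
    }
    return value;
  }

  function multiplicationExpression() {
    let value = atomicExpression();
    for (;;) {
      if (peek() === "*" || peek() === "/") {
        // Explicit operator (first OR alternative).
        const op = consume();
        const rhs = atomicExpression();
        value = op === "*" ? value * rhs : value / rhs;
      } else if (peek() === "(") {
        // No operator, but "(" follows: implicit multiplication
        // (second OR alternative).
        value *= parenthesisExpression();
      } else {
        return value;
      }
    }
  }

  function atomicExpression() {
    if (peek() === "(") return parenthesisExpression();
    const tok = consume();
    if (!/^\d+$/.test(tok || "")) throw new Error(`expected a number but found ${tok}`);
    return parseInt(tok, 10);
  }

  function parenthesisExpression() {
    consume("(");
    const value = additionExpression();
    consume(")");
    return value;
  }

  const result = additionExpression();
  if (pos !== tokens.length) throw new Error(`unexpected token ${peek()}`);
  return result;
}
```

With this sketch, `evaluateCalc("10 * (2+2)")` and `evaluateCalc("10 (2+2)")` both return 40, while an input such as `10 10` is still rejected, because implicit multiplication is only considered when the next token is `(`.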


msujew Jun 27, 2022
Collaborator

Great, that's even better, since it doesn't require performing any manual lookahead. Glad I could be of help :)

bd82 · 2022-06-27T11:16:37Z

Maintainer

Hello @4silvertooth

Not directly related, but you may be interested in an alternative approach to building
expression grammars that does not model precedence in the grammar itself.

https://docs.swift.org/swift-book/ReferenceManual/Expressions.html#//apple_ref/doc/uid/TP40014097-CH32-ID383
See the section titled "Infix Expressions".

The main problem with modeling precedence in the grammar is that it creates very deep parse trees (every operand, even a lone number,
has to descend through one rule per precedence level), and hence very deep call stacks, so performance is less than stellar.

On the other hand, removing precedence from the grammar would also require a separate "evaluation" step to compute the value.
That adds complexity and may be overkill for your use case.


4silvertooth Jun 27, 2022
Author

I am evaluating parser libraries for a project. I first tried ohmjs, but I hit performance issues.

My grammar has many kinds of expressions: prefix +/- signs on numbers, so that 1+-3=-2 or 1--1=2 can be evaluated,
and infix expressions with precedence, e.g.
var = 1+2*3 (evaluates to 7)
There are also infix expressions without precedence, evaluated strictly left to right, e.g.

1
+2
*3
=====
9 (and not 7)

There are also postfix percentage expressions for add, subtract, multiply and divide, e.g.

200
+5%
=====
=  210
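A minimal sketch of how such tape-style input could be evaluated with no precedence at all, assuming a hypothetical `evaluateTape` function that receives the tape as an array of lines. The percentage semantics are assumptions: `+5%` and `-5%` add or subtract 5% of the running total (which reproduces the 200 +5% = 210 example above), while `*5%` and `/5%` multiply or divide by 0.05.

```javascript
// Hypothetical tape evaluator: the first line is a number, every later line is
// "<op><number>" or "<op><number>%", applied to the running total strictly
// left to right, i.e. without operator precedence.
function evaluateTape(lines) {
  let total = Number(lines[0]);
  if (!Number.isFinite(total)) throw new Error(`bad first line: ${lines[0]}`);
  for (const line of lines.slice(1)) {
    const match = /^\s*([-+*\/])\s*(\d+(?:\.\d+)?)\s*(%?)\s*$/.exec(line);
    if (!match) throw new Error(`cannot parse tape line: ${line}`);
    const [, op, digits, percent] = match;
    const n = Number(digits);
    switch (op) {
      // Assumption: "+p%" / "-p%" add or subtract p% of the running total.
      case "+": total += percent ? (total * n) / 100 : n; break;
      case "-": total -= percent ? (total * n) / 100 : n; break;
      // Assumption: "*p%" and "/p%" use p% as the factor p / 100.
      case "*": total = percent ? (total * n) / 100 : total * n; break;
      case "/": total = percent ? (total * 100) / n : total / n; break;
    }
  }
  return total;
}
```

Applied to the two tapes above, `evaluateTape(["1", "+2", "*3"])` gives 9 and `evaluateTape(["200", "+5%"])` gives 210. A real implementation would still need prefix signs such as `+-3` and proper error reporting.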

So far I haven't faced performance issues unless it's a huge file. In the doc you posted I couldn't find anything explaining this part of your reply:

> removing the precedence would also mandate having a separate "evaluation" step for the value

bd82 Jun 27, 2022
Maintainer

> So far i haven't faced performance issue unless it's a huge file, from doc you've posted i couldn't find

I meant that if you parse infix expressions as a simple list, then your parser likely would not be able to calculate the
value of the expressions in "embedded actions", and you would need one or more post-parsing analysis steps to fully analyze your expressions.

That is often a good thing if you are building anything non-trivial (e.g. auto-complete, language services, or a compiler).
But these additional steps can add complexity if you just want to build a "simple" calculator.
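To make the "post-parsing step" idea concrete, here is a minimal sketch in plain JavaScript (no Chevrotain) that splits the work into two passes. `parseFlat` only collects operands and binary operators into two flat arrays, folding prefix signs such as the `-` in `1+-3` into the operand; `evaluateFlat` then applies precedence in a separate pass using precedence climbing, which is just one common technique for this. The function names, the precedence table and the regex tokenizer are all hypothetical.

```javascript
// Hypothetical precedence table (higher binds tighter); not taken from Chevrotain.
const PRECEDENCE = { "+": 1, "-": 1, "*": 2, "/": 2 };

function apply(op, a, b) {
  switch (op) {
    case "+": return a + b;
    case "-": return a - b;
    case "*": return a * b;
    case "/": return a / b;
    default: throw new Error(`unknown operator ${op}`);
  }
}

// Step 1: parse "operand (op operand)*" into two flat arrays, with no notion
// of precedence. Prefix signs (as in "1+-3") are folded into the operand.
function parseFlat(text) {
  const tokens = text.match(/\d+|[-+*\/]/g) || [];
  if (text.replace(/\s+/g, "") !== tokens.join("")) {
    throw new Error("unsupported character in input");
  }
  const operands = [];
  const operators = [];
  let i = 0;
  for (;;) {
    let sign = 1;
    while (tokens[i] === "+" || tokens[i] === "-") {
      if (tokens[i] === "-") sign = -sign;
      i++;
    }
    if (!/^\d+$/.test(tokens[i] || "")) throw new Error(`expected a number at token ${i}`);
    operands.push(sign * Number(tokens[i++]));
    if (i === tokens.length) return { operands, operators };
    if (!(tokens[i] in PRECEDENCE)) throw new Error(`expected an operator at token ${i}`);
    operators.push(tokens[i++]);
  }
}

// Step 2: a separate evaluation pass that applies precedence (precedence
// climbing; all operators are treated as left-associative).
// operators[k] sits between operands[k] and operands[k + 1].
function evaluateFlat({ operands, operators }) {
  let pos = 0;
  function climb(lhs, minPrec) {
    while (pos < operators.length && PRECEDENCE[operators[pos]] >= minPrec) {
      const op = operators[pos];
      let rhs = operands[pos + 1];
      pos++;
      // Let tighter-binding operators to the right claim rhs first.
      while (pos < operators.length && PRECEDENCE[operators[pos]] > PRECEDENCE[op]) {
        rhs = climb(rhs, PRECEDENCE[op] + 1);
      }
      lhs = apply(op, lhs, rhs);
    }
    return lhs;
  }
  return climb(operands[0], 1);
}

const evaluateInfix = (text) => evaluateFlat(parseFlat(text));
```

With this split, `1 + 2 * 3` is parsed into `[1, 2, 3]` and `["+", "*"]`, and only the second pass decides that it evaluates to 7; replacing `evaluateFlat` with a plain left-to-right loop over the same arrays would instead give the tape-style 9.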

4silvertooth Jun 27, 2022
Author

Alright, got it. I guess the CST generated for the visitor pattern is where this analysis happens: each $.MANY yields arrays of operators and operands, and a forEach loop will suffice to evaluate them.

4silvertooth · 2022-07-13T09:38:37Z

Author

Hi,
Check out what I built using Chevrotain:

https://github.com/4silvertooth/QwikTape

The parser is here:
https://github.com/4silvertooth/QwikTape/blob/main/src/parser/tape-embedded.js

It's not playground-friendly, as it depends on QuickJS for BigNum support.

