Built-in macros for common lexer patterns #9
Replies: 7 comments 3 replies
-
I think this would be a very good idea. Besides accelerating development for "pros", it would also lower the entry barrier for "newbies".
-
If you want built-in rules, then add them. But I don't see the need in grammars-v4, because all these grammars already describe the lexical structure of the language, including ints, ids, string literals, etc. We can rewrite the rules in these grammars via Trash to use the built-ins, but the rewriter would need to check the RHS of each rule and verify that it is the same as the built-in it is intended to replace. This is likely pretty easy to do (I already have a "fold" rewrite that does essentially the same thing). But then what? This shortens the grammars by a few rules. Is that valuable? I don't know. However, if the built-in rules offered some improvement in speed, then yes, this would be a good addition.

If I want to take a grammar that now uses the built-in lexer rules and port it to another parser generator (Trash can do this), then I need to get these definitions from somewhere, because I will need to implement them in the other parser generator.

The problem I have is how Antlr grammar composition works, especially for lexer rules. It does not follow the usual semantics of any implementation of EBNF. Most people write a grammar independently of other grammars. If you try to graft grammars together, the default should be that they work independently of each other. This is the hallmark of a good programming language: referential transparency. Its absence was the reason FORTRAN was such a disaster in the "old days". Every variable was global. You didn't know what subroutine was modifying what. Unfortunately, Antlr is really at this stage. Currently, the default-mode lexer rules of all imported grammars are pooled together in the default mode. You now have a lexer that works unpredictably because the recognition depends on import order.

I'm exploring how to add css and javascript into the html grammar. The first step is to rewrite the lexer grammars so they do not use the default mode.
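To make the RHS check concrete, here is a minimal Python sketch of a "fold"-style rewrite. It is not how Trash implements it: the `BUILTINS` table, the `#int` / `#id` names, and the `fold_builtins` function are all hypothetical, and comparing right-hand sides as whitespace-normalized text is a simplifying assumption (a real rewriter would more likely compare parse trees).

```python
# Hypothetical built-in definitions. The names and right-hand sides are
# illustrative assumptions, not part of ANTLR or Trash.
BUILTINS = {
    "#int": "[0-9]+",
    "#id": "[a-zA-Z_] [a-zA-Z_0-9]*",
}


def normalize(rhs):
    """Collapse runs of whitespace so layout differences do not block a match."""
    return " ".join(rhs.split())


def fold_builtins(rules, builtins=BUILTINS):
    """Replace a rule's RHS with a built-in name only if the text is identical.

    `rules` maps a lexer rule name to its RHS as a string. Rules whose RHS
    does not match any built-in definition are returned unchanged.
    """
    by_rhs = {normalize(rhs): name for name, rhs in builtins.items()}
    return {rule: by_rhs.get(normalize(rhs), rhs) for rule, rhs in rules.items()}


grammar = {
    "INT": "[0-9]+",
    "ID": "[a-zA-Z_]   [a-zA-Z_0-9]*",   # same definition, different spacing
    "HEX": "'0x' [0-9a-fA-F]+",          # no built-in for this one
}
folded = fold_builtins(grammar)
```

In this sketch `INT` and `ID` are folded because their definitions match, while `HEX` is left alone. The only saving is a couple of rule bodies, which is the point made above.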
-
You are right, which is what we're looking to tackle with includes. But this proposal does not affect rule order: the grammar author would still be responsible for putting the rules in the right sequence. I agree with Federico's comments. Just yesterday I was helping out a newbie on this precise topic.
-
Here is the issue:
antlr/antlr4#4498 (comment)
Numbers and strings cannot be recognized correctly · Issue #4498 · antlr/antlr4
It’s the typical starter issue re-inventing the wheel…
… On 22 Dec 2023, at 14:46, Ken Domino ***@***.***> wrote:
I agree with Federico's comments. Just yesterday I was helping out a newbie on this precise topic.
What was the problem? Please explain.
If you add built-ins, please tell me how I can get the definition so I can then use the definition in grammar rewrites. Trash can manipulate grammars really well, but it can't do that if it doesn't have a definition for the symbol.
This is the problem I have with Eclipse XText. Built-ins are not in a text file. In fact, it's wrapped up in the whole damn Eclipse IDE. I can't look into that very easily.
-
OK, I'm a little confused. First off, the fellow posts antlr/antlr4#4498 but never actually asks a question. (Beginners seem to do that a lot.) But let's assume he is asking: "Why doesn't this input parse with my grammar?" Clearly, he doesn't understand the "two golden rules" of how Antlr lexers work. Most beginners don't understand Antlr lexers because they keep thinking this is EBNF. No, Antlr lexer grammars don't work that way. They work independently of the parser. The lexer picks the rule that matches the longest input; if two or more rules match the same input, the first one "wins." This is a recurring problem we see on StackOverflow, at least one or two times a month. Bart Kiers has a lot of patience explaining that over and over.

Actually, what I had in mind was a requirement for a reusable set of rules, sort of a package library, like npm, or something really "built into" the runtime, for INT, STRING, etc., preloaded and available for use in a grammar without having to define them oneself. Or is this something analogous to generics or templates? Sorry if I'm lost.

Rereading your original comment, when I hear "macros", I think C preprocessor. Is that what you are thinking of? A macro feature, or a package library, or both? Or templates? All of these analogies could have value. How would

Note: Trash is a toolkit that sits on top of Antlr. trunfold inputs a grammar and outputs a grammar that unfolds one or more rules. I think we need to draw a distinction between a parser generator and a tool that refactors grammars.
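For anyone who wants to see the two golden rules in action, here is a small Python simulation. It is only a sketch: the rule names, the regexes, and the `tokenize` function are made up for illustration, and Python's `re` module stands in for a real Antlr lexer.

```python
import re


def tokenize(rules, text):
    """Tokenize `text` with an ordered list of (name, regex) rules.

    Rule 1: the rule matching the longest input at the current position wins.
    Rule 2: on a tie, the rule listed first wins.
    """
    compiled = [(name, re.compile(pattern)) for name, pattern in rules]
    tokens, pos = [], 0
    while pos < len(text):
        best_name, best_len = None, 0
        for name, regex in compiled:
            m = regex.match(text, pos)
            # Only a strictly longer match replaces the current best,
            # so an earlier rule keeps the win when lengths are equal.
            if m and len(m.group()) > best_len:
                best_name, best_len = name, len(m.group())
        if best_name is None:
            raise ValueError(f"no rule matches at offset {pos}")
        tokens.append((best_name, text[pos:pos + best_len]))
        pos += best_len
    return tokens


# Illustrative rules: the keyword is listed before the identifier rule.
RULES = [
    ("IF", r"if"),
    ("ID", r"[a-z]+"),
    ("INT", r"[0-9]+"),
    ("WS", r"\s+"),
]

# Same rules with ID moved in front of IF.
REORDERED = [RULES[1], RULES[0]] + RULES[2:]
```

With `RULES`, the input `if iffy 42` gives `IF` for `if` (a tie between `IF` and `ID`, and `IF` comes first) and `ID` for `iffy` (the longer match). With `REORDERED`, `if` comes out as `ID`, which is exactly the kind of surprise beginners run into.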
-
I'm beginning to think that this proposal is better addressed by includes.
-
Closing this topic in favour of includes.
-
A number of lexer patterns are valid in almost every programming language, including DSLs.
For example, most languages support similar forms of:
Writing lexer rules for these is an amusing learning exercise, but providing macros for them would be an accelerator.
A lexer rule using a macro could look as follows:
INTEGER_LITERAL: #all_integer_literals;
This would be much simpler and more readable than:
An IDE would need the ability to provide 'macro insights', i.e. whatever concrete rules constitute the macro.
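As a rough idea of what a 'macro insight' could compute, here is a Python sketch that expands macro references in grammar text. Only the macro name `#all_integer_literals` comes from this post. The expansion in `MACROS`, the `expand_macros` function, and the plain textual substitution are assumptions for illustration (the sketch does not skip `#` characters inside string literals or character sets, which a real tool would have to handle).

```python
import re

# Hypothetical macro table. The expansion below is invented for this sketch
# and is not a proposed definition of the macro.
MACROS = {
    "all_integer_literals": "[0-9]+ | '0' [xX] [0-9a-fA-F]+ | '0' [bB] [01]+",
}

# A macro reference is '#' followed by an identifier.
MACRO_REF = re.compile(r"#([A-Za-z_][A-Za-z_0-9]*)")


def expand_macros(grammar_text, macros=MACROS):
    """Return the grammar text with every known #macro replaced by its body."""
    def substitute(match):
        name = match.group(1)
        if name not in macros:
            raise KeyError(f"unknown macro: #{name}")
        # Parenthesize so the alternatives stay grouped in the host rule.
        return "(" + macros[name] + ")"
    return MACRO_REF.sub(substitute, grammar_text)


expanded = expand_macros("INTEGER_LITERAL: #all_integer_literals;")
```

An IDE could show `expanded` on hover, and the same expansion would give a grammar that no longer depends on the macros.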