-
Notifications
You must be signed in to change notification settings - Fork 629
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Elm: Rewritten using PackCC PEG parser #3312
Conversation
Thank you for contribution. |
You can run the test cases locally: make check.
Let me explain how to use misc/review script that may help you review the result of text execution.
It seems that all cases in Units are passed. So I think I don't have to write how to use misc/review for Units. |
You don't have to add elm.c and elm.h. Our build system may generate them. |
I said I would rewrite the Elm parser using mtable-regex. (Taken from "Elm" entry of Wikipedia)
Another idea is extracting two tags, one for the function, another is for the type.
or
. There are the fruits of my short study. |
@masatake many thanks for your feedback. Responses below... On I'm very sorry about those test failures - obviously hadn't read the documentation on contributing with sufficient care, as I didn't know those tests existed. I will review, fix, and update. Thank you for your guidance above - that will be helpful. On I am happy to delete them, and will do that. On rewriting the Elm parser using mtable-regex: I misunderstood your comment about that - I thought you were suggesting that as an idea for my work. I didn't realise you were planning to do it, and I didn't mean to derail your project. Apologies. However, I'm pleased you're happy with the idea for this parser. On the typeref and signature fields: I am not an expert on ctags, nor on language design. However, I will tell you what I know about Elm. Consider the following function: myFunc : A -> B -> C
myfunc a b = ... And suppose that x = myFunc a1 b1 -- Value x is of type C
y = myFunc a1 -- Value y is of type B -> C
z = myFunc -- Value z is of type A -> B -> C It's very common in Elm to pass around values such as So when thinking about the typeref of On the subject of signatures you gave this example: hypotenuse : Float -> Float -> Float
hypotenuse a b =
sqrt (a^2 + b^2) and said this:
In fact this is not true for Elm - and it surprised me when I was learning Elm! For example, look at the API documentation for the String type. You'll see all the signatures are just sequences of types with arrows - none of them have any names. As a very specific example, the source code for So in summary, I think the signature for I am also saying that I think in Elm the signature and the typeref are the same thing. I'm sure it is better to have one and not both. At the moment the parser uses signature and not typeref, but if you think it's important to have typeref and not signature, please say so. I hope that is all helpful. I welcome your thoughts. |
Thank you for the great explanation.
How do you call x derived from myFunc?
Oh, I see. Let’s focus on input.elm: 1: myFunc : A -> B -> C
...
9: myFunc a b = ... Consider 1 and 9 are the line numbers in the input file. With your idea, the tags extracted from the input are:
(tags0masatake) Understandable. However, this output lost some information like: Even if an Elm programmer specialized version of myFunc, storing the information about the original Thinking about X2, and X3. the following one may be better:
(tags1masatake) Thinking about X1,
(tags2masatake) Thinking about X4, making a tag for the typedecl may be better:
(tags3masatake)
"tags3mastake" records most of all raw information in the source code. However, the most of tools reading tags file expect the function kind tag has the typeref field.
(tags4masatake) As the first implementation, the following tags file has no conflict with tags4masatake:
(tags5masatake) Making tags5masatake more friendly is acceptable:
(tags6masatake) We can shrink more:
( I read Units/parser-elm.r/elm-signatures.d. As I wrote, for reserving the space for recording |
It seems that we are in the same time zone :-). Run |
With the following steps, you may be able to reproduce the failures observed in CI/CD environments.
(Edited for adding CFLAGS) |
|
It seems that the scope stack is popped though it is empty. |
It seems that the bug from the peg_common.h. I will fix it. |
Error reported is ``` ctags: main/numarray.c:178: intArrayRemoveLast: Assertion `current->count > 0' failed. ``` caused by unbalanced SET_SCOPE/POP_KIND. Thanks to Masaktake Yamato for [finding the cause](universal-ctags#3312 (review)).
Following an earlier comment, committed files generated by `make -BC win32`. Earlier comment is: universal-ctags#3312 (comment)
Codecov Report
@@ Coverage Diff @@
## master #3312 +/- ##
==========================================
+ Coverage 85.31% 85.41% +0.10%
==========================================
Files 212 216 +4
Lines 49987 50278 +291
==========================================
+ Hits 42645 42947 +302
+ Misses 7342 7331 -11
Continue to review full report at Codecov.
|
Thank you for your continued feedback and support, @masatake. I will need to spend some time writing a reply to your questions and suggestions. I hope that will be very soon. |
I found "X4. the line number (1) for the type declaration" doesn't make sense.
However, the language processor behind https://elm-lang.org/examples/hello reports an error:
A definition and its type decl must be in close. So ideal tags output may be:
(tags8masatake, derived from tags4mastake) Elm programmers may think calling "a b" "signature" is incorrect. |
Thank you for your continued feedback and support. That's appreciated.
These are good philosophies. I understand and support those.
That helps me understand your comments better, and I understand your comments X1 to X4. Before I comment on your feedback, I should explain some more about Elm. More about Elm(elm1nik) The following code shows a function definition, and just before it is the type annotation: hypotenuse : Float -> Float -> Float
hypotenuse a b =
sqrt (a^2 + b^2) As you have discovered for yourself, a type annotation has to come immediately before its corresponding function definition. Only whitespace and comments are allowed in between. Since there is never any reason to put a comment between a type annotation and its function definition, in practice you really only ever see a type annotation on the line immediately before its function definition. (elm2nik) Consider the following code: type MyType = Cons String This code creates a new type called The constructor Signatures and typerefsSo now to discuss signatures and typerefs (which are concepts in ctags) and the things we find in Elm - and we should try to link them together. One thing in Elm is the thing that describes a function's type, which looks like I agree that this is best described as a ctags typeref. Another thing in Elm is the thing that looks like
We agree that this is not perfect, but it's acceptable given what is already built into ctags and that I will explain it in the per-parser man page. The user of the function won't see these names, but given the motto of ctags (Extracting "raw information" and passing it to client tools) I agree it is useful to include these in the ideal parser. We should be aware that constructors will not have this kind of "signature" because of elm2nik above. However, they will have the typeref. You also say
I also agree with this. It is because of (elm1nik), which you have also found for yourself. It does not make sense for the type declaration be separate from the function or constructor tag. It should be part of those. At this point we agree on everything! Typeref fieldThe next thing I want to talk about is what kind of typeref we are using. You use the field I think the best words to use are any of these:
Please let me know what you think about these options. I will use Mistake in constructor signaturesIn
Given all the above, I need to change this the signature to a typeref. But also what follows it is wrong. The typeref for Tagging parametersYou give some examples of tagging individual parameters (such as SummaryNow I have these actions:
And from one of your earlier comments
|
The actions above have been done now. (I had a bit more time available than I expected.) |
I read your comment. I agree with you what you wrote. About typeref, I would like to adjust your understanding. As you know, the typeref field has two subfields. input.c: struct point { float x, y; };
struct point currentPoint(void)
{
return CURRENT;
}
int nextOf (int n)
{
return n + 1;
} tags output:
The developers of Geany IDE using ctags internally expects the 1st subfield holds the
I agreed with the parser maintainer, but I kept things as-is because 2-fields-typedef had no serious harm. These things are discussed in About Elm, you can put anything you want to the 1st subfield. You can put something important information in the field. While reading test cases, I got a question.
How does the ideal parser store the information about The current implementation emits:
Good. There is no issue here. But, how about the ideal parser? |
If "typename" is a placeholder, then I will use that. I will make a note to change the code and the man page.
I think the ideal parser would output something like this:
or in more detail, this
In the end, I think one influence on this is what IDE developers (or other people) would like to use it for. But those are my ideas. |
/* For a signature such as "a1 b2 c3" we want to transform it | ||
* to "a1 b2 c3" for the signature field. | ||
*/ | ||
static vString *collapseWhitespace (const char *sig) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(MY TODO) This can be moved to vstring.h in the future.
- (Format) typeref:description: -> typeref:typename: - (Refactor) Use isspace() for clarity. - (Man page) Man page passes `make man-test`. - (Comments) Remove incorrect task in comments. - (Comments) Add thanks in comments.
Thank you for those comment, @masatake. Those recommendations and changes have been made now. |
it's the closest concept available in ctags. | ||
Use "--fields=+S". | ||
|
||
.. code-block:: Elm |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The followig change is needed to test this example with "make man-test".
diff --git a/man/ctags-lang-elm.7.rst.in b/man/ctags-lang-elm.7.rst.in
index d9fbfa786..0d236aa04 100644
--- a/man/ctags-lang-elm.7.rst.in
+++ b/man/ctags-lang-elm.7.rst.in
@@ -123,18 +123,39 @@ signature field. They are not really function signatures, but
it's the closest concept available in ctags.
Use "--fields=+S".
+"input.elm"
+
.. code-block:: Elm
funcA a1 a2 =
a1 + a2
"output.tags"
-with "--sort=no --extras=+r --fields=+rS"
+with "--options=NONE -o - --sort=no --extras=+r --fields=+rS input.elm"
.. code-block:: tags
funcA input.elm /^funcA a1 a2 =$/;" f signature:a1 a2 roles:def
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll fix this.
The commits are well-tailored. However, this pull request is mostly proposing a new parser. |
I have found some areas we can improve the Elm parser. |
Thank you for your contribution. |
I made a pull request to improve the Elm parser a bit. See #3320. |
@niksilver, I have a question about the Elm language. A. As you assigned the "imported" role to the language objects that appeared in "import module Y exposing (language objects), I'm thinking about "exported" ( or "exposed" ) role. |
@masatake This is a good question. Currently the parser says In the original optlib parser the module
Would you like me to change that? It is not correct to say it is "exported", though - that is not a word used in Elm Edited to emphasise that "exported" is not a word used in Elm. |
YES! |
Howver, that will be incompatible change. https://www.bitterjug.com/2017/02/06/elm-support-compiled-into-universal-ctags/ What I wrote here is wrong. We are talking about |
I don't think the change is incompatible. The Elm optlib parser only marked the module as imported - the language objects exposed by the module were not tagged (and so had no roles). The proposal here is to keep the module as "imported" (which is still correct) but mark the language objects exposed by the module as "exposed". |
@niksilver You are correct. |
This is a rewrite of the current Elm parser - it has been switched to the PackCC PEG parser, from the optlib parser. The change brings a number of advantages:
Dec
inimport Json.Decode as Dec
) then the original module name will be listed in a moduleName field.It is compatible with the current optlib Elm parser, except for the corrections to scoping above.
There are some imperfections and limitations, mostly because of Elm's rules about whitespace sensitivity plus how the PEG parser works. Most notably, picking out tags inside let/in blocks will occasionally fail. I suspect the way to fix this is to use an entirely hand-written parser, but that's beyond my skills and desire.
Other limitations are listed in
peg/elm.peg
andman/ctags-lang-elm.7.rst.in
.To study the syntax of Elm, https://elm-lang.org/examples/hello is helpful (@masatake added).