Elm: Rewritten using PackCC PEG parser #3312

Merged (11 commits) on Mar 30, 2022

@niksilver
Contributor

niksilver commented Mar 21, 2022

This is a rewrite of the current Elm parser - it has been switched to the PackCC PEG parser, from the optlib parser. The change brings a number of advantages:

  • Much more forgiving of source code layout, and therefore more reliable. For example:
    • comments and commented-out code are properly ignored;
    • constructors do not have to each appear on a new line with a | or = ahead of them;
    • if a constructor has the same name as its type its tag is still generated;
    • key top level elements (such as the LHS of a function definition) can span multiple lines freely and still be recognised correctly.
  • Correct scoping. For example: fields, types, etc are scoped to their module; constructors are scoped to their type; functions defined in let/in blocks are scoped to the outer function.
  • Signatures are generated for functions and constructors.
  • If a namespace is used (for example Dec in import Json.Decode as Dec) then the original module name will be listed in a moduleName field.

It is compatible with the current optlib Elm parser, except for the corrections to scoping above.

There are some imperfections and limitations, mostly because of Elm's rules about whitespace sensitivity plus how the PEG parser works. Most notably, picking out tags inside let/in blocks will occasionally fail. I suspect the way to fix this is to use an entirely hand-written parser, but that's beyond my skills and desire.

Other limitations are listed in peg/elm.peg and man/ctags-lang-elm.7.rst.in.


To study the syntax of Elm, https://elm-lang.org/examples/hello is helpful (@masatake added).

@masatake
Member

Thank you for your contribution.
I will be busy for the next 3 weeks, so my review may be slow.
Please allow me to give you comments incrementally.

@masatake
Member

You can run the test cases locally: make check.
tmain and units are the major sub targets of check.

$ make tmain

Failed tests
============================================================
list-fields-with-prefix/stdout-compare
list-roles/stdout-compare
list-fields/stdout-compare

make: *** [Makefile:7207: tmain] Error 1
...

Let me explain how to use the misc/review script, which may help you review the results of test execution.

$ misc/review
misc/review
1) <n>ext
2) <S>hell
3) <R>un
4) <q>uit
[1/3 [ Tmain/list-fields.d ]]? S
S
review> ls
ls
exit-expected.txt  input.java  run.sh		    stdout-actual.txt  stdout-expected.txt
input.c		   input.sh    stderr-expected.txt  stdout-diff.txt
review> cat stdout-diff.txt
cat stdout-diff.txt
--- ./Tmain/list-fields.d/stdout-expected.txt	2022-03-02 03:06:45.182972975 +0900
+++ /home/yamato/var/ctags-github/Tmain/list-fields.d/stdout-actual.txt	2022-03-22 06:23:49.818317916 +0900
@@ -39,0 +40 @@
+-	moduleName	yes	Elm	s--	no	--	actual name of renamed module
review> mv stdout-actual.txt stdout-expected.txt # if the diff matches your expectation.
review> exit
[1/3 [ Tmain/list-fields.d ]]? R
R
REPOINFO   main/repoinfo.h
  CCLD     ctags
  RUN      tmain

Testing list-fields
------------------------------------------------------------
stdout                                                      passed
stderr                                                      passed
exit                                                        passed

[1/3 [ Tmain/list-fields.d ]]? n
1) <n>ext
2) <S>hell
3) <R>un
4) <q>uit
[2/3 [ Tmain/list-fields-with-prefix.d ]]? ...

It seems that all cases in Units passed, so I don't think I need to explain how to use misc/review for Units.

@masatake
Member

You don't have to add elm.c and elm.h. Our build system may generate them.

@masatake
Member

I said I would rewrite the Elm parser using mtable-regex.
However, the rewrite stalled. (So I'm happy with your PEG-based parser.)
I read a book about Elm written in Japanese. I wondered how to handle type spec(?).

(Taken from "Elm" entry of Wikipedia)

hypotenuse : Float -> Float -> Float
hypotenuse a b =
    sqrt (a^2 + b^2)

typeref field for hypotenuse may be typeref:typename:Float.
The question is signature field. I wonder how it should be.
We have to integrate "a b" and "Float -> Float" into a string and put it to signature field. I wonder how the string should be. What I know is C, so one idea is "a Float, b Float". However, it may look odd to Elm programmers.
This question may be applicable to Haskell and/or other parsers for functional programming languages.

Another idea is extracting two tags, one for the function, another is for the type.
Consider Elm parser has kinds: function and typedecl. With the two kinds, the parser can extract like:

hypotenuse ...;" kind:typedecl  signature:Float -> Float typeref:typename:Float...
hypotenuse ...;" kind:function signature:a b...

or

hypotenuse ...;" kind:typedecl  typeref:typename:Float->Float->Float ...
hypotenuse ...;" kind:function signature:a b...

These are the fruits of my short study.
So I wonder how you handle these items :-P.

@niksilver
Contributor Author

@masatake many thanks for your feedback. Responses below...

On make tmain:

I'm very sorry about those test failures - I obviously hadn't read the documentation on contributing with sufficient care, as I didn't know those tests existed. I will review, fix, and update. Thank you for your guidance above - that will be helpful.

On elm.c and elm.h:

I am happy to delete them, and will do that.

On rewriting the Elm parser using mtable-regex:

I misunderstood your comment about that - I thought you were suggesting that as an idea for my work. I didn't realise you were planning to do it, and I didn't mean to derail your project. Apologies. However, I'm pleased you're happy with the idea for this parser.

On the typeref and signature fields:

I am not an expert on ctags, nor on language design. However, I will tell you what I know about Elm.

Consider the following function:

myFunc : A -> B -> C
myFunc a b = ...

And suppose that a1 is of type A and b1 is of type B. We can do this:

x = myFunc a1 b1    -- Value x is of type C

y = myFunc a1       -- Value y is of type B -> C

z = myFunc          -- Value z is of type A -> B -> C

It's very common in Elm to pass around values such as y and z, as well as x. You can see there isn't the same distinction between the "input types" and the "return types" as in other languages. That is why all the types are separated by the same symbol (the arrow).

So when thinking about the typeref of myFunc it is not a function of type C, because it depends how many parameters you call it with. I don't think Elm programmers would find it useful to say it is of type C. I think it has typeref A -> B -> C.
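
As a rough analogy outside Elm, the behaviour of x, y and z above can be mimicked in Python by spelling the currying out with nested single-argument functions. This is only a sketch: `my_func` and the string stand-ins for values of types A, B and C are hypothetical names chosen for illustration, and Python does not curry functions the way Elm does.

```python
# Illustrative analogy only: Elm functions are curried by default,
# Python functions are not, so the nesting is written out by hand.

def my_func(a):
    """Plays the role of `myFunc : A -> B -> C`."""
    def takes_b(b):
        # Stand-in for "a value of type C".
        return ("C", a, b)
    return takes_b

x = my_func("a1")("b1")  # like `x = myFunc a1 b1`: a finished C value
y = my_func("a1")        # like `y = myFunc a1`: still a function, B -> C
z = my_func              # like `z = myFunc`: the whole function, A -> B -> C

print(x)
print(callable(y))   # y is still waiting for its B argument
print(y("b1") == x)  # supplying it later gives the same C value
```

All three values are ordinary things to pass around, which is why a single "return type" does not describe the function well.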

On the subject of signatures you gave this example:

hypotenuse : Float -> Float -> Float
hypotenuse a b =
    sqrt (a^2 + b^2)

and said this:

We have to integrate "a b" and "Float -> Float" into a string and put it to signature field.

In fact this is not true for Elm - and it surprised me when I was learning Elm!

For example, look at the API documentation for the String type. You'll see all the signatures are just sequences of types with arrows - none of them have any names.

As a very specific example, the source code for repeat takes two arguments called n and chunk. But the API documentation for repeat has no argument names - just types. The argument names don't appear in Elm and aren't meaningful when referencing the function.

So in summary, I think the signature for myFunc is the whole sequence A -> B -> C, and the signature for hypotenuse is Float -> Float -> Float - with no names at all. This is surprising to those of us with experience of some other languages, but this is what we see in Elm.

I am also saying that I think in Elm the signature and the typeref are the same thing. I'm sure it is better to have one and not both. At the moment the parser uses signature and not typeref, but if you think it's important to have typeref and not signature, please say so.

I hope that is all helpful. I welcome your thoughts.

@masatake
Member

masatake commented Mar 23, 2022

Thank you for the great explanation.

And suppose that a1 is of type A and b1 is of type B. We can do this:

x = myFunc a1 b1    -- Value x is of type C

What do you call x derived from myFunc?
Specializing/specialized? I'll use these words in this comment.

So in summary, I think the signature for myFunc is the whole sequence A -> B -> C, and the signature for hypotenuse is Float -> Float -> Float - with no names at all. This is surprising to those of us with experience of some other languages, but this is what we see in Elm.

Oh, I see. Let’s focus on myFunc as the target input to go deeper.
I’m talking about an ideal parser. Implementation doesn’t have to support all items here.
However, the implementation should not have a conflict with the ideal design.
What kind of output does the ideal parser make?
I will illustrate tags files incrementally.
All the example output I will show here have names like "tagsNmasatake".
So you can specify one of them explicitly in your comment.

input.elm:

1: myFunc : A -> B -> C
...
9: myFunc a b = ...

Consider 1 and 9 are the line numbers in the input file.

With your idea, the tags extracted from the input are:

myFunc ...;" kind:function signature:A->B->C line:9

(tags0masatake)

Understandable. However, this output lost some information like:
X1. the name of parameters a and b
X2. the relation between a and myFunc
X3. the relation between b and myFunc
X4. the line number (1) for the type declaration

Even if an Elm programmer uses a specialized version of myFunc, storing the information about the original myFunc still makes sense.

Thinking about X2, and X3. the following one may be better:

myFunc ...;" kind:function signature:a b typeref:typename:A->B->C line:9

(tags1masatake)

Thinking about X1, a and b can be tagged:

myFunc ...;" kind:function signature:a b typeref:typename:A->B->C line:9
a ...;" kind:parameter scope:function:myFunc nth:0 line:9
b ...;" kind:parameter scope:function:myFunc nth:1 line:9

(tags2masatake)

Thinking about X4, making a tag for the typedecl may be better:

myFunc ...;" kind:typedecl typeref:typename:A->B->C line:1
myFunc ...;" kind:function signature:a b line:9
a ...;" kind:parameter scope:function:myFunc nth:0 line:9
b ...;" kind:parameter scope:function:myFunc nth:1 line:9

(tags3masatake)

typename next to typeref is a placeholder.

"tags3masatake" records almost all of the raw information in the source code.
Extracting "raw information" and passing it to client tools is one of ctags mottos.

However, most tools that read tags files expect a function kind tag to have the typeref field.
So, if the parser is not only ideal but also friendly, the parser emits:

myFunc ...;" kind:typedecl typeref:typename:A->B->C line:1
myFunc ...;" kind:function signature:a b typeref:typename:A->B->C line:9
a ...;" kind:parameter scope:function:myFunc nth:0 line:9
b ...;" kind:parameter scope:function:myFunc nth:1 line:9

(tags4masatake)

As the first implementation, the following tags file has no conflict with tags4masatake:

myFunc ...;" kind:typedecl typeref:typename:A->B->C line:1
myFunc ...;" kind:function line:9

(tags5masatake)

Making tags5masatake more friendly is acceptable:

myFunc ...;" kind:typedecl typeref:typename:A->B->C line:1
myFunc ...;" kind:function typeref:typename:A->B->C line:9

(tags6masatake)

We can shrink more:

myFunc ...;" kind:function typeref:typename:A->B->C line:9

(tags7masatake) (edited)

I read Units/parser-elm.r/elm-signatures.d.
The difference between the expected.tags and tags6masatake is only the name of the field: typeref or signature.

As I wrote, to reserve room for recording a b of myFunc in a future implementation, I think using typeref is better.
However, I have not considered other parts of Elm yet.

@masatake
Member

It seems that we are in the same time zone :-).

Run make -BC win32 and see git diff. You will see a very minor difference. Please include it in your commit.

@masatake
Member

masatake commented Mar 23, 2022

With the following steps, you may be able to reproduce the failures observed in CI/CD environments.

make clean
./configure --enable-debugging
make CFLAGS='-g -O0'
make units LANGUAGES=Elm

(Edited for adding CFLAGS)

@masatake
Member

[jet@living]~/var/ctags-github% cat /tmp/input.elm    

type A = A
[jet@living]~/var/ctags-github% ./ctags /tmp/input.elm

ctags: main/numarray.c:178: intArrayRemoveLast: Assertion `current->count > 0' failed.
ctags: main/numarray.c:178: parsing /tmp/input.elm:1 as Elm
zsh: IOT instruction (core dumped)  ./ctags /tmp/input.elm

@masatake
Member

It seems that the scope stack is popped though it is empty.

@masatake
Member

It seems that the bug comes from peg_common.h. I will fix it.

Error reported is

```
ctags: main/numarray.c:178: intArrayRemoveLast: Assertion `current->count > 0' failed.
```

caused by unbalanced SET_SCOPE/POP_KIND. Thanks to Masatake Yamato for
[finding the cause](universal-ctags#3312 (review)).
Following an earlier comment, committed files generated
by `make -BC win32`.

Earlier comment is:
universal-ctags#3312 (comment)
@codecov

codecov bot commented Mar 24, 2022

Codecov Report

Merging #3312 (1f4c0e6) into master (4492555) will increase coverage by 0.10%.
The diff coverage is 100.00%.

@@            Coverage Diff             @@
##           master    #3312      +/-   ##
==========================================
+ Coverage   85.31%   85.41%   +0.10%     
==========================================
  Files         212      216       +4     
  Lines       49987    50278     +291     
==========================================
+ Hits        42645    42947     +302     
+ Misses       7342     7331      -11     
Impacted Files               Coverage Δ
peg/elm_post.h               100.00% <100.00%> (ø)
parsers/rspec.c              91.78% <0.00%> (-0.72%) ⬇️
main/read.c                  95.44% <0.00%> (-0.54%) ⬇️
optlib/pod.c                 0.00% <0.00%> (ø)
optlib/rdoc.c                100.00% <0.00%> (ø)
parsers/rake.c               93.87% <0.00%> (ø)
parsers/yamlfrontmatter.c    98.33% <0.00%> (ø)
parsers/frontmatter.c        94.73% <0.00%> (ø)
main/parse.c                 95.72% <0.00%> (+<0.01%) ⬆️
main/kind.c                  98.09% <0.00%> (+0.01%) ⬆️
... and 5 more

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 4492555...1f4c0e6. Read the comment docs.

@niksilver
Contributor Author

niksilver commented Mar 24, 2022

Thank you for your continued feedback and support, @masatake. I will need to spend some time writing a reply to your questions and suggestions. I hope that will be very soon.

@masatake
Member

I found "X4. the line number (1) for the type declaration" doesn't make sense.
I assumed the following Elm input is valid:

import Html exposing (text)

f: String
g: String

f = "Y"
g = "Z"

main =
  text ("Hello!" ++ f ++ g)

However, the language processor behind https://elm-lang.org/examples/hello reports an error:

NAME MISMATCH
[Jump to problem]()
I just saw a type annotation for `f`, but it is followed by a definition for
`g`:

3| f: String

4| g: String

   ^
These names do not match! Is there a typo?

    g -> f

A definition and its type declaration must be close together.

So ideal tags output may be:

myFunc ...;" kind:function signature:a b typeref:typename:A->B->C line:9
a ...;" kind:parameter scope:function:myFunc nth:0 line:9
b ...;" kind:parameter scope:function:myFunc nth:1 line:9

(tags8masatake, derived from tags4masatake)

Elm programmers may think calling "a b" "signature" is incorrect.
Maybe they call it "parameter". However, the "signature" field is statically allocated in "struct tagEntryInfo".
Reusing it is not so bad. However, such misusing words must be explained on the per-parser man page.
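
To make the tags8masatake shape concrete, here is a small Python sketch that derives the three entries from a function name, its parameter names and its type annotation. The helper `ideal_entries` and the dictionary layout are hypothetical and are not part of ctags; removing spaces from the annotation simply mirrors how the examples in this thread write A->B->C.

```python
def ideal_entries(name, params, annotation, line):
    """Build tags8masatake-style entries as plain dictionaries (hypothetical layout)."""
    # Written compactly, as in the examples in this thread: A->B->C.
    typeref = "typename:" + annotation.replace(" ", "")
    entries = [{
        "name": name,
        "kind": "function",
        "signature": " ".join(params),  # the parameter list, e.g. "a b"
        "typeref": typeref,
        "line": line,
    }]
    # One parameter entry per name, scoped to the function, numbered from 0.
    for nth, param in enumerate(params):
        entries.append({
            "name": param,
            "kind": "parameter",
            "scope": "function:" + name,
            "nth": nth,
            "line": line,
        })
    return entries

entries = ideal_entries("myFunc", ["a", "b"], "A -> B -> C", 9)
for entry in entries:
    print(entry)
```

A first implementation can emit only the function entry and stay compatible with this shape, since the parameter entries are purely additive.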

@niksilver
Contributor Author

niksilver commented Mar 26, 2022

Thank you for your continued feedback and support. That's appreciated.

I’m talking about an ideal parser. Implementation doesn’t have to support all items here.
However, the implementation should not have a conflict with the ideal design.

These are good philosophies. I understand and support those.

Extracting "raw information" and passing it to client tools is one of ctags mottos.

That helps me understand your comments better, and I understand your comments X1 to X4.

Before I comment on your feedback, I should explain some more about Elm.

More about Elm

(elm1nik) The following code shows a function definition, and just before it is the type annotation:

hypotenuse : Float -> Float -> Float
hypotenuse a b =
    sqrt (a^2 + b^2)

As you have discovered for yourself, a type annotation has to come immediately before its corresponding function definition. Only whitespace and comments are allowed in between. Since there is never any reason to put a comment between a type annotation and its function definition, in practice you really only ever see a type annotation on the line immediately before its function definition.
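
The adjacency rule can be approximated with a line-based check. The Python sketch below is an assumption-heavy illustration, not how the Elm compiler or the PEG parser works: the two regular expressions and the function `annotation_mismatches` are hypothetical, only top-level unindented lines are inspected, and `{- ... -}` block comments are not handled.

```python
import re

# Hypothetical, simplified patterns: a top-level name followed by ":" or by "=".
ANNOTATION = re.compile(r"^([a-z][A-Za-z0-9_]*)\s*:")
DEFINITION = re.compile(r"^([a-z][A-Za-z0-9_]*)\b[^=:]*=")

def annotation_mismatches(source):
    """Return (annotated_name, following_name) pairs that do not match."""
    mismatches = []
    pending = None  # name from the most recent unmatched type annotation
    for line in source.splitlines():
        stripped = line.strip()
        # Blank lines and "--" line comments may sit between the two.
        if not stripped or stripped.startswith("--"):
            continue
        # Indented lines are treated as continuations of the previous item.
        if line[0].isspace():
            continue
        annotation = ANNOTATION.match(line)
        definition = DEFINITION.match(line)
        found = annotation or definition
        if found and pending is not None and found.group(1) != pending:
            mismatches.append((pending, found.group(1)))
        pending = annotation.group(1) if annotation else None
    return mismatches

bad = """
import Html exposing (text)

f: String
g: String

f = "Y"
g = "Z"
"""

good = """
hypotenuse : Float -> Float -> Float
hypotenuse a b =
    sqrt (a^2 + b^2)
"""

print(annotation_mismatches(bad))
print(annotation_mismatches(good))
```

For the `bad` input the first reported pair is the annotation for f followed by g, which is the same mismatch the NAME MISMATCH error earlier in this thread points at.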

(elm2nik) Consider the following code:

type MyType = Cons String

This code creates a new type called MyType, and it has a constructor called Cons, which takes one String parameter. If we were to write x = Cons "abc" then the value x is of type MyType.

The constructor Cons is of course a special type of function, just like constructors in other languages. The type of Cons is String -> MyType. But unlike an ordinary function there are no named parameters.

Signatures and typerefs

So now to discuss signatures and typerefs (which are concepts in ctags) and the things we find in Elm - and we should try to link them together.

One thing in Elm is the thing that describes a function's type, which looks like A -> B -> C or String -> MyType. Or if we create a simple value by writing f = "Y" then the type is just String.

I agree that this is best described as a ctags typeref.

Another thing in Elm is the thing that looks like a b in the hypotenuse function above. These are the function's own names for its parameters. I am okay when you say this:

Elm programmers may think calling "a b" "signature" is incorrect. Maybe they call it "parameter". However, the "signature" field is statically allocated in "struct tagEntryInfo". Reusing it is not so bad. However, such misusing words must be explained on the per-parser man page.

We agree that this is not perfect, but it's acceptable given what is already built into ctags and that I will explain it in the per-parser man page.

The user of the function won't see these names, but given the motto of ctags (Extracting "raw information" and passing it to client tools) I agree it is useful to include these in the ideal parser.

We should be aware that constructors will not have this kind of "signature" because of elm2nik above. However, they will have the typeref.

You also say

I found "X4. the line number (1) for the type declaration" doesn't make sense.

I also agree with this. It is because of (elm1nik), which you have also found for yourself. It does not make sense for the type declaration to be separate from the function or constructor tag. It should be part of those.

At this point we agree on everything!

Typeref field

The next thing I want to talk about is what kind of typeref we are using.

You use the field typeref:typedecl (type declaration), for example typeref:typedecl:A -> B -> C. But I think we can use a better word than "declaration". This word implies announcing something new - for example, when defining a custom type. I don't think this is correct for this situation. Also, I don't think it is a type "name", because A -> B -> C is not really a name in the usual sense, even though A and B and C are names individually.

I think the best words to use are any of these:

  • typeref:description:A -> B -> C
  • typeref:desc:A -> B -> C
  • typeref:typedesc:A -> B -> C
  • typeref:type:A -> B -> C which is already used in the Go parser

Please let me know what you think about these options. I will use typeref:description because I think it is the most clear of the options and does not duplicate the word "type". But if you have a strong opinion otherwise, please let me know.

Mistake in constructor signatures

In Units/parser-elm.r/elm-constructor-signatures.d the expected tags file looks like this:

B	input.elm	/^type B$/;"	t	roles:def
B3Cons	input.elm	/^    | B3Cons$/;"	c	type:B	signature:	roles:def
B2Cons	input.elm	/^    | B2Cons String Integer$/;"	c	type:B	signature:String Integer	roles:def
B1Cons	input.elm	/^    = B1Cons$/;"	c	type:B	signature:{ x : Float , y : Float }	roles:def

Given all the above, I need to change the signature to a typeref. But what follows it is also wrong. The typeref for B2Cons appears as String Integer but it should be String -> Integer -> B because B2Cons is a constructor that takes a string and an integer as input and returns a B.
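
The intended rule can be sketched in a few lines of Python. This is an illustration only: `constructor_typeref` is a hypothetical helper, not code from the parser, and it assumes each argument type arrives as a ready-made string (an argument that is itself a function type would already need its own parentheses).

```python
def constructor_typeref(type_name, arg_types):
    """Return a constructor's type as an arrow-separated string."""
    # Every argument type is followed by an arrow; the constructed type comes last.
    return " -> ".join(list(arg_types) + [type_name])

print(constructor_typeref("B", ["String", "Integer"]))  # String -> Integer -> B
print(constructor_typeref("B", []))                      # B
print(constructor_typeref("MyType", ["String"]))         # String -> MyType
```

A constructor with no arguments, such as B3Cons, simply has the type B.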

Tagging parameters

You give some examples of tagging individual parameters (such as a and b above). I think that is good for the ideal parser, but I do not plan to implement it. It can be something for the future.

Summary

Now I have these actions:

  • Change signature to typeref. I will use typeref:description for now, unless you have a strong opinion otherwise.
  • Add a signature to functions, which is really the parameter list, like a b. As part of this I will explain it in the per-parser man page.
  • Fix the mistake in constructor signatures (which will also become typerefs). This will be more difficult and take me some time, so I may just remove the problematic field for now, because it's wrong, and add a correct field later.

And from one of your earlier comments

@niksilver
Contributor Author

The actions above have been done now. (I had a bit more time available than I expected.)

@masatake
Member

I read your comment. I agree with what you wrote.

About typeref, I would like to adjust your understanding.

As you know, the typeref field has two subfields.
Most parsers don't use the 1st field; it is just a placeholder.
C/C++ parsers use the 1st field for keeping compatibility.

input.c:

struct point { float x, y; };

struct point currentPoint(void)
{
  return CURRENT;
}

int nextOf (int n)
{
  return n + 1;
}

tags output:

point	/tmp/bar.c	/^struct point { float x, y; };$/;"	s	file:
x	/tmp/bar.c	/^struct point { float x, y; };$/;"	m	struct:point	typeref:typename:float	file:
y	/tmp/bar.c	/^struct point { float x, y; };$/;"	m	struct:point	typeref:typename:float	file:
currentPoint	/tmp/bar.c	/^struct point currentPoint(void)$/;"	f	typeref:struct:point
nextOf	/tmp/bar.c	/^int nextOf (int n)$/;"	f	typeref:typename:int

The developers of the Geany IDE, which uses ctags internally, expect the 1st subfield to hold
the namespace of the type, like struct in typeref:struct:point.
However, how should the subfield be filled for float x or int nextOf (...)?
The C/C++ parser introduced "typename" as the placeholder for such types that have no namespace.
"typename" is a C++ keyword, as far as I can remember.
The parser maintainer thought removing the 1st subfield would be better, like:

point	/tmp/bar.c	/^struct point { float x, y; };$/;"	s	file:
x	/tmp/bar.c	/^struct point { float x, y; };$/;"	m	struct:point	typeref:float	file:
y	/tmp/bar.c	/^struct point { float x, y; };$/;"	m	struct:point	typeref:float	file:
currentPoint	/tmp/bar.c	/^struct point currentPoint(void)$/;"	f	typeref:struct point
nextOf	/tmp/bar.c	/^int nextOf (int n)$/;"	f	typeref:int

I agreed with the parser maintainer, but I kept things as-is because the two-subfield typeref did no serious harm.
The cost is that we have to put a placeholder in the 1st subfield, so the tags file becomes larger.

These things are discussed in
#862 (comment)
#862

About Elm, you can put anything you want to the 1st subfield. You can put something important information in the field.
However, if you don't have important information, I recommend using the string "typename" as a placeholder.
If you have a plan to put something meaningful to the 1st subfield, I would like you to write it to the parser's man page.
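
From the point of view of a client tool, the two subfields can be read as sketched below in Python. This is a simplified reader written for illustration, not an official ctags API: `parse_extension_fields` and `split_typeref` are hypothetical names, and the code assumes tab-separated fields and that the search pattern itself does not contain the `;"` marker followed by a tab.

```python
def parse_extension_fields(tag_line):
    """Split the extension fields that follow the `;"` marker into a dict."""
    _, _, tail = tag_line.rstrip("\n").partition(';"\t')
    fields = {}
    for item in tail.split("\t"):
        if not item:
            continue
        key, sep, value = item.partition(":")
        if sep:
            fields[key] = value
        else:
            fields["kind"] = item  # a bare item such as "f" is the kind letter
    return fields

def split_typeref(typeref):
    """Return (namespace, type); namespace is None for the "typename" placeholder."""
    namespace, _, type_name = typeref.partition(":")
    return (None if namespace == "typename" else namespace), type_name

lines = [
    'currentPoint\t/tmp/bar.c\t/^struct point currentPoint(void)$/;"\tf\ttyperef:struct:point',
    'nextOf\t/tmp/bar.c\t/^int nextOf (int n)$/;"\tf\ttyperef:typename:int',
]
for line in lines:
    fields = parse_extension_fields(line)
    print(fields["kind"], split_typeref(fields["typeref"]))
```

Only the first colon separates the subfields, so a value such as typename:A -> B -> C keeps its arrows and spaces intact in the second subfield.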

While reading test cases, I got a question.
For the input

type Box a = Cardboard a | Wooden

How does the ideal parser store the information about a?

The current implementation emits:

Box	input.elm	/^type Box a = Cardboard a | Wooden$/;"	t	roles:def
Cardboard	input.elm	/^type Box a = Cardboard a | Wooden$/;"	c	type:Box	roles:def
Wooden	input.elm	/^type Box a = Cardboard a | Wooden$/;"	c	type:Box	roles:def

Good. There is no issue here. But, how about the ideal parser?

@niksilver
Contributor Author

About Elm, you can put anything you want to the 1st subfield. You can put something important information in the field.
However, if you don't have important information, I recommend using the string "typename" as a placeholder.
If you have a plan to put something meaningful to the 1st subfield, I would like you to write it to the parser's man page.

If "typename" is a placeholder, then I will use that. I will make a note to change the code and the man page.

How does the ideal parser store the information about a?

a is called a type variable. It is a placeholder for some type that will be chosen by the user. For example List a means a list of strings, ints, or anything else.

I think the ideal parser would output something like this:

a	input.elm	/^type Box a = Cardboard a | Wooden$/;"	v	type:Box	roles:bound

or in more detail, this

a	input.elm	/^type Box a = Cardboard a | Wooden$/;"	kind:typevar	scope:type:Box	roles:bound

v would be a (new) one-letter code for "type variable". I think its role might be "bound" which is already used in the XSLT parser. The role "def" (defined) does not feel right. Or perhaps "placeholder" would be a good word.

In the end, I think one influence on this is what IDE developers (or other people) would like to use it for. But those are my ideas.

@masatake masatake mentioned this pull request Mar 28, 2022
5 tasks
/* For a signature such as "a1   b2  c3" we want to transform it
* to "a1 b2 c3" for the signature field.
*/
static vString *collapseWhitespace (const char *sig)
Member

(MY TODO) This can be moved to vstring.h in the future.
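
For readers who want to see the intended transformation in isolation, here is a minimal Python sketch. It is not the C function under review: `collapse_whitespace` is a hypothetical name, and trimming leading and trailing whitespace is an assumption made here rather than something the snippet above confirms.

```python
def collapse_whitespace(sig):
    """Replace each run of whitespace with one space and trim both ends."""
    # str.split() with no arguments splits on any run of whitespace and drops
    # empty pieces, so joining with a single space collapses and trims at once.
    return " ".join(sig.split())

print(collapse_whitespace("a1   b2 \t c3"))  # a1 b2 c3
```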

- (Format) typeref:description: -> typeref:typename:
- (Refactor) Use isspace() for clarity.
- (Man page) Man page passes `make man-test`.
- (Comments) Remove incorrect task in comments.
- (Comments) Add thanks in comments.
@niksilver
Copy link
Contributor Author

Thank you for those comments, @masatake. Those recommendations and changes have been made now.

it's the closest concept available in ctags.
Use "--fields=+S".

.. code-block:: Elm
Member

The following change is needed to test this example with "make man-test".

diff --git a/man/ctags-lang-elm.7.rst.in b/man/ctags-lang-elm.7.rst.in
index d9fbfa786..0d236aa04 100644
--- a/man/ctags-lang-elm.7.rst.in
+++ b/man/ctags-lang-elm.7.rst.in
@@ -123,18 +123,39 @@ signature field. They are not really function signatures, but
 it's the closest concept available in ctags.
 Use "--fields=+S".
 
+"input.elm"
+
 .. code-block:: Elm
 
     funcA a1 a2 =
         a1 + a2
 
 "output.tags"
-with "--sort=no --extras=+r --fields=+rS"
+with "--options=NONE -o - --sort=no --extras=+r --fields=+rS input.elm"
 
 .. code-block:: tags
 
     funcA	input.elm	/^funcA a1 a2 =$/;"	f	signature:a1 a2	roles:def

Member

I'll fix this.

@masatake
Member

The commits are well-tailored. However, this pull request is mostly proposing a new parser.
So the history from the "optlib parser" to the "peg parser" is not important.
So I will use "squash and merge" in this case.

@masatake
Member

I have found some areas where we can improve the Elm parser.
@niksilver, I would like to consult you about them.
Other than bug reports and bug fixes, I would like to use #3319 for discussing the Elm parser.

@masatake masatake merged commit 94e964e into universal-ctags:master Mar 30, 2022
@masatake
Member

Thank you for your contribution.

@masatake
Member

I made a pull request to improve the Elm parser a bit. See #3320.

@masatake
Member

@niksilver, I have a question about the Elm language.
How do you call f in module X exposing (f) ?

A. f is exposed from X.
B. f is exported from X.
C. Either o.k.

As you assigned the "imported" role to the language objects that appear in "import module Y exposing (language objects)", I'm thinking about an "exported" (or "exposed") role.

@niksilver
Contributor Author

niksilver commented Mar 30, 2022

How do you call f in module X exposing (f) ?

@masatake This is a good question.

Currently the parser says f is "imported". But perhaps that is incorrect. I tried to follow the original optlib design, but perhaps I made a mistake when I extended that design.

In the original optlib parser the module X is "imported", and that is correct. But I also said that function f was imported, and perhaps that is incorrect. Perhaps it is more correct to say it is "exposed". That language is also used in the official Elm guide:

we can import a module and use its exposed values.

Would you like me to change that?

It is not correct to say it is "exported", though - that is not a word used in Elm.


Edited to emphasise that "exported" is not a word used in Elm.

@masatake
Member

Would you like me to change that?

YES!

@masatake
Member

masatake commented Mar 30, 2022

However, that will be an incompatible change.
I wonder whether the change will break client tools.
As far as I know, a client tool referring to the roles field may be...ctire only.

https://www.bitterjug.com/2017/02/06/elm-support-compiled-into-universal-ctags/


What I wrote here is wrong. We are talking about f, not about X.
f is newly tagged by the new PEG parser. So there is no incompatibility with the old optlib parser.

@niksilver
Contributor Author

However, that will be an incompatible change.

I don't think the change is incompatible. The Elm optlib parser only marked the module as imported - the language objects exposed by the module were not tagged (and so had no roles).

The proposal here is to keep the module as "imported" (which is still correct) but mark the language objects exposed by the module as "exposed".

@masatake
Member

@niksilver You are correct.
