escape_sequence node is omitting information #45

Open
jgomezb11 opened this issue Jan 26, 2023 · 7 comments

@jgomezb11

When trying to force a keyword or number into a string as in the following example:

foo: "*"
foo2: '&'
number: '10'

The parsed result omits everything inside the double or single quotes, which means a loss of information compared to the initial file. Everything inside the quotes should be considered as a string and parsed as such, not omitted.

Or am I missing something?

@char0n

char0n commented Jan 27, 2023

Using https://ikatyang.github.io/tree-sitter-yaml/ to reproduce, your fixture parses into a correct CST:

foo: `double_quote_scalar`
foo2: `single_quote_scalar`
number: `single_quote_scalar`

@jgomezb11
Author

You are right, it parses correctly... but it still omits information. Another example to make myself clear:

Using the same tool, https://ikatyang.github.io/tree-sitter-yaml/, if you try to parse:

foo: "\n"

It will generate a double_quote_scalar node that has a child named escape_sequence which refers to the information inside the quotes (in this case \n)...

That doesn't happen when trying to force a keyword into a string like my first example (foo: "*"). The parsed result has a double_quote_scalar without a child that refers to the content inside the quotes.

That's why I'm saying that the parser omits information.

@char0n

char0n commented Feb 3, 2023

> It will generate a double_quote_scalar node that has a child named escape_sequence which refers to the information inside the quotes (in this case \n)...

Exactly. The parser detected that the double_quote_scalar CST node has an escape_sequence child.

> That doesn't happen when trying to force a keyword into a string like my first example (foo: "*"). The parsed result has a double_quote_scalar without a child that refers to the content inside the quotes.

Because the double_quote_scalar CST node doesn't have an escape_sequence child, as the original source string doesn't contain escape sequences.

> That's why I'm saying that the parser omits information.

I don't see your point. It does not omit anything. It parses what it sees. If it encounters an escape sequence in a double quote scalar, it will parse it; if it doesn't see any escape sequences in a double quote scalar, it does not produce any child CST nodes.

@jgomezb11
Author

> If it encounters an escape sequence in a double quote scalar, it will parse it; if it doesn't see any escape sequences in a double quote scalar, it does not produce any child CST nodes.

That's exactly the problem.

You are right that the escape_sequence node only matches occurrences of an escape sequence, but then there should be another type of node that matches the contents of the quotes when they contain no escape sequences; otherwise, it is as if the source string did not exist.

Another example

foo: "foo \n"

In this case there is a child node that points to \n but there isn't a child node that refers to the first part of the string (foo ) resulting in a loss of information.

Graphical representation of the example:

image

As you can see there is a child node that points to a newline but the rest of the source string is nowhere to be found.
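
For anyone hitting the same thing: the plain-text parts are not lost as long as the double_quote_scalar node carries a start and end offset, because everything between the quotes that is not covered by an escape_sequence child is literal text. Below is a minimal TypeScript sketch of that idea. The `CstNode` interface and `splitScalar` helper are hypothetical names made up for this illustration (the field names are modeled on what tree-sitter bindings usually expose, but this is not the real binding API), and the offsets are assumed to index into the same JavaScript string as the source.

```typescript
// Hypothetical minimal node shape for illustration only; it mirrors the
// kind of fields a tree-sitter node exposes but is not the real binding.
interface CstNode {
  type: string;
  startIndex: number; // offset of the first character of the node
  endIndex: number;   // offset just past the last character of the node
  children: CstNode[];
}

interface Segment {
  kind: "text" | "escape";
  text: string;
}

// Split the content of a double_quote_scalar into plain-text runs and
// escape sequences, using only the parent's range and its children.
function splitScalar(source: string, scalar: CstNode): Segment[] {
  const segments: Segment[] = [];
  let cursor = scalar.startIndex + 1; // skip the opening quote
  const end = scalar.endIndex - 1;    // stop before the closing quote

  for (const child of scalar.children) {
    if (child.type !== "escape_sequence") continue;
    if (child.startIndex > cursor) {
      // Literal text sitting between the previous position and this escape.
      segments.push({ kind: "text", text: source.slice(cursor, child.startIndex) });
    }
    segments.push({ kind: "escape", text: source.slice(child.startIndex, child.endIndex) });
    cursor = child.endIndex;
  }
  if (cursor < end) {
    // Trailing literal text after the last escape (or the whole content).
    segments.push({ kind: "text", text: source.slice(cursor, end) });
  }
  return segments;
}

// Hand-built nodes for `foo: "foo \n"`; the offsets were counted by hand.
const source = 'foo: "foo \\n"';
const escapeNode: CstNode = { type: "escape_sequence", startIndex: 10, endIndex: 12, children: [] };
const scalarNode: CstNode = {
  type: "double_quote_scalar",
  startIndex: 5,
  endIndex: 13,
  children: [escapeNode],
};

// parts holds a "text" segment for `foo ` followed by an "escape" segment
// for the two characters backslash and n.
const parts = splitScalar(source, scalarNode);
console.log(parts);
```

With no escape_sequence children at all (as in foo: "*"), the loop body never runs and the whole content comes back as a single text segment.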

@char0n

char0n commented Feb 7, 2023

Right, I understand what you're saying now.

I'm not an author of this library, but I use this grammar to build a syntactic analyzer on top of the CST that it produces. In the case of foo: "foo \n", I take the content of the double_quote_scalar node and run an unraw operation on it.

I don't care whether the double_quote_scalar contains an escape_sequence. Can't you just ignore escape_sequence, as I'm doing?

@jgomezb11
Author

Oh, that's interesting... I'll look to see if I can implement something similar.
Thank you for replying to my issue. I hope a maintainer will look into this someday as well.

@char0n

char0n commented Feb 8, 2023

Np, just try to think of it as if double_quote_scalar had no children and escape_sequence didn't exist. I use this implementation of unraw in JavaScript: https://www.npmjs.com/package/unraw

I'm sure there are tools for other languages in their standard or vendor libraries.

There are actually more things that need to be done to get the value out of a double_quote_scalar. Here is an implementation I did some time ago: https://github.com/swagger-api/apidom/blob/main/packages/apidom-ast/src/yaml/schemas/canonical-format.ts#L142
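
To make the "take the node text and unescape it" approach concrete, here is a rough TypeScript sketch. It is not the unraw package and not the apidom implementation linked above; `decodeDoubleQuoted`, `decodeSingleQuoted`, and `SIMPLE_ESCAPES` are hypothetical names for this illustration. It assumes a single-line scalar, handles only a subset of the YAML double-quoted escapes, leaves unknown escapes untouched, and does no line folding.

```typescript
// Subset of the YAML double-quoted escapes (illustrative, not exhaustive).
const SIMPLE_ESCAPES: Record<string, string> = {
  "0": "\0", a: "\x07", b: "\b", t: "\t", n: "\n", v: "\v", f: "\f",
  r: "\r", e: "\x1b", " ": " ", '"': '"', "/": "/", "\\": "\\",
};

// Hypothetical helper: turn the raw text of a double_quote_scalar node
// (surrounding quotes included) into its string value.
function decodeDoubleQuoted(raw: string): string {
  const body = raw.slice(1, -1); // drop the surrounding double quotes
  return body.replace(
    /\\(?:x([0-9a-fA-F]{2})|u([0-9a-fA-F]{4})|([\s\S]))/g,
    (match: string, x?: string, u?: string, ch?: string) => {
      const hex = x ?? u;
      if (hex !== undefined) {
        // \xNN and \uNNNN: numeric escapes.
        return String.fromCharCode(parseInt(hex, 16));
      }
      const simple = ch !== undefined ? SIMPLE_ESCAPES[ch] : undefined;
      // Escapes outside the table are kept as written rather than rejected.
      return simple ?? match;
    },
  );
}

// Hypothetical helper for single_quote_scalar: the only escape inside
// single quotes is a doubled quote ('') standing for one quote.
function decodeSingleQuoted(raw: string): string {
  return raw.slice(1, -1).replace(/''/g, "'");
}

console.log(JSON.stringify(decodeDoubleQuoted('"foo \\n"')));
console.log(JSON.stringify(decodeDoubleQuoted('"*"')));
console.log(JSON.stringify(decodeSingleQuoted("'10'")));
```

The raw argument would be the slice of the source covered by the scalar node, so the presence or absence of escape_sequence children does not matter for this step.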
