Show capturing groups in --json mode #2325

Closed
pmkap opened this issue Oct 7, 2022 · 6 comments
Labels
enhancement An enhancement to the functionality of the software. question An issue that is lacking clarity on one or more points.

Comments

@pmkap

pmkap commented Oct 7, 2022

I couldn't find any way to show capturing groups in --json mode.

Motivation:
Sometimes I'm only interested in a capturing group. After parsing the JSON, I have to extract the group in an additional step, possibly by applying the same regex a second time. It would be nice if this were supported directly.

Example of what this feature could look like:

echo 'Hello! !World! !foo!' | rg --json '!(\w+?)!' | jq

"submatches": [
  {
    "match": {
      "text": "!World!"
    },
    "groups" : [
      {
        "1": "World"
      }
    ]
  },
  {
    "match": {
      "text": "!foo!"
    },
    "groups" : [
      {
        "1": "foo"
      }
    ]
  }
]
@BurntSushi
Owner

Related #1872

Could you please provide an end-to-end use case where you'd want this? My suspicion is that it isn't necessary. The other issue here is that resolving capturing groups can be slow, so it would need to be behind a flag, i.e., --json-captures.

@BurntSushi BurntSushi added enhancement An enhancement to the functionality of the software. question An issue that is lacking clarity on one or more points. labels Oct 7, 2022
@pmkap
Author

pmkap commented Oct 7, 2022

Thank you for your help!

Like you suspected, I actually did find a solution (without the --json flag).

My use case was the following:
I am using Logseq for taking notes. It has pages that can reference each other with either [[ref]] or #ref. The idea I'm playing around with is to search a directory for all those references with rg and hook the results into vim's completion.

The solution I now have is rg -o -e '\[\[(?P<g1>.+?)\]\]' -e '#(?P<g2>[^\s#]+)' -r '$g1$g2'
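For readers who want to see what that command extracts, here is a minimal Python sketch of the same idea. It assumes the two -e patterns behave like a single alternation and that a group that did not participate in a match expands to an empty string in the -r replacement. The names REF_PATTERN and extract_refs are made up for illustration, and Python's re module is not the same engine as ripgrep's, so treat this as an approximation.

```python
import re

# The two -e patterns from the rg command, joined as one alternation (assumption).
REF_PATTERN = re.compile(r"\[\[(?P<g1>.+?)\]\]|#(?P<g2>[^\s#]+)")


def extract_refs(text):
    """Return the reference names found in text, without the [[ ]] or # markers."""
    refs = []
    for m in REF_PATTERN.finditer(text):
        # Only one of the two named groups participates in any given match.
        refs.append(m.group("g1") or m.group("g2"))
    return refs
```

Calling extract_refs on a line such as "See [[Project Plan]] and #todo today" yields the two reference names, which is the shape of output the completion hook needs.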

I stumbled upon the --json flag in the manpage and thought it was a good idea to use...

@pmkap pmkap closed this as completed Oct 7, 2022
@acheronfail

acheronfail commented Jun 21, 2023

I'd like to consider re-opening this issue, since I have a use case for it!
I have built a tool that wraps ripgrep - it's called repgrep: https://github.com/acheronfail/repgrep/.

If ripgrep could provide capturing groups in the JSON output (I don't mind if it's gated behind a flag), repgrep could support capturing groups in replacements: users could match on something like foo (\w+) and then, when replacing with repgrep, use something like bar $1 as the replacement text.
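To make the intended behaviour concrete, here is a small Python sketch of expanding $1-style references in a replacement string. It is not how repgrep works today; replace_with_groups is a hypothetical helper, it only understands numeric references such as $1, and it uses Python's re module rather than ripgrep's regex engine.

```python
import re

# Matches numeric group references such as $1 or $12 in the replacement text.
_GROUP_REF = re.compile(r"\$(\d+)")


def replace_with_groups(line, pattern, replacement):
    """Replace each match of pattern in line, expanding $N from the match's groups."""
    def expand(match):
        # A group that did not participate in the match expands to "".
        return _GROUP_REF.sub(
            lambda ref: match.group(int(ref.group(1))) or "", replacement
        )
    return re.sub(pattern, expand, line)
```

A reference to a group number that the pattern does not define raises IndexError in this sketch; a real tool would probably want to report that to the user instead.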


I suppose the only way I could work around this is to use ripgrep as a library (which still seems to be in the experimental phase). That would require a significant refactor of my tool, though, and until libripgrep is stable, I don't think it's a good idea for repgrep.

@BurntSushi
Owner

@acheronfail Is it possible for you to just re-run the regex on the matched lines to get capture groups?
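A rough sketch of that workaround in Python, for illustration: read the rg --json output line by line, keep the "match" events, and run the same pattern over each matched line to recover the groups. The function name captures_from_rg_json is made up. The sketch assumes the pattern means the same thing in Python's re module as in ripgrep's engine, which is not true for every pattern, and it skips lines that ripgrep reports as base64 "bytes" instead of "text".

```python
import json
import re


def captures_from_rg_json(json_lines, pattern):
    """Yield (line_number, match_text, groups) for each match in rg --json output."""
    regex = re.compile(pattern)
    for raw in json_lines:
        raw = raw.strip()
        if not raw:
            continue
        event = json.loads(raw)
        # rg also emits "begin", "end", "context" and "summary" events.
        if event.get("type") != "match":
            continue
        data = event["data"]
        line = data["lines"].get("text")
        if line is None:
            continue  # non-UTF-8 line, delivered under "bytes"; ignored here
        # Re-run the regex on the whole line so anchors and \b see full context.
        for m in regex.finditer(line):
            yield data["line_number"], m.group(0), m.groups()
```

Here json_lines can be any iterable of strings, for instance sys.stdin when rg's output is piped into the script.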

@acheronfail

acheronfail commented Jun 22, 2023

@BurntSushi that is a possible workaround, yes.

In fact, it does seem like this is the strategy that VSCode uses:

My program doesn't depend on any regular expression functionality right now, so including a regex crate and using it to match on the lines would definitely solve the issue. I can't imagine it would be that hard; perhaps the only tricky part is detecting whether a regular expression with groups was passed, though I imagine a regex crate could tell me that in some way.
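As an illustration of that detection step, here is a sketch using Python's re module, where a compiled pattern exposes the number of groups and the mapping of group names to indices. The helper name describe_groups is hypothetical. Rust's regex crate offers comparable information through Regex::captures_len and Regex::capture_names, though captures_len also counts the implicit whole-match group.

```python
import re


def describe_groups(pattern):
    """Return (number_of_groups, {name: index}) for a pattern."""
    compiled = re.compile(pattern)
    # .groups excludes the implicit whole-match group 0.
    return compiled.groups, dict(compiled.groupindex)
```

A pattern passes the "has groups" check when the first element of the returned tuple is greater than zero.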

In terms of a performance trade off - I don't think it would be that bad. Since I only need to perform the regular expression matches on visible lines (repgrep is a terminal user interface) I can make it fast enough.

The main performance bottleneck in repgrep, though, is reading the JSON output itself, but that's only really fixed by using ripgrep as a lib.

Long story short: @BurntSushi I think it makes sense for repgrep just to re-run a regular expression on each matched line as needed! Forget about my comment, but I really appreciate you taking the time to humour me! ❤️

@CabalCrow

I would like to reopen the issue since I also have a use case for it.

I'm using nushell to parse strings into structured data, but I want to use rg for the actual regex. The parsing requires the capture group numbers & names, as well as what they actually capture, to structure the data (each capture group number/name represents a column & each row is a match). Having a --json-captures flag would help a lot with this, since I could just parse the JSON to create that structured data. Currently I'm trying to manually obtain the capture group names (via a regex checking the regex given to rg) & numbering to then programmatically create -or '1: $1\n...name:$name\n..' output for rg to then parse. This is obviously not the best solution; it would be more sensible to get the capture group names & what they contain directly from rg.
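The workaround described above could be sketched roughly as follows in Python. The helpers build_replacement and parse_row are hypothetical, and the key=value layout with a tab separator is an assumption chosen for the example, not something rg prescribes. Group names and numbers are taken from Python's re module, which may disagree with ripgrep's engine on some patterns, and the ${name} form is used so that a reference is never merged with the text that follows it.

```python
import re


def build_replacement(pattern, sep="\t"):
    """Build a replacement string such as 'year=${year}<TAB>2=${2}' for rg -o -r."""
    compiled = re.compile(pattern)
    # Map group index -> name for the named groups.
    names = {index: name for name, index in compiled.groupindex.items()}
    fields = []
    for index in range(1, compiled.groups + 1):
        key = names.get(index, str(index))
        fields.append(f"{key}=${{{key}}}")
    return sep.join(fields)


def parse_row(line, sep="\t"):
    """Turn one output line produced with that replacement into a dict (one row)."""
    row = {}
    for field in line.rstrip("\n").split(sep):
        key, _, value = field.partition("=")
        row[key] = value
    return row
```

Each output line then becomes one row keyed by group name or number. Captured text that itself contains the separator would break this parsing, which is one reason a structured --json-captures output would be more robust.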

Additionally --json-captures is going to be very useful for debugging purposes when using rg.
