Construct Tree Sitter queries only once #18
50s -> 12s is a big improvement, thank you. But even 12s still seems quite long. How much faster is ripgrep?
Nvm, I'll just run it and see for myself.
Running at the base of rust-lang/rust. Wow!
No, we just naively walk the directory at the moment. Should open some issues for
Regarding why it's so much slower, it's because tree-sitter parsing is slow. Here's a flamegraph of the latest version:
Since it's CPU-bound, we might get some speed-up from parallelising this. It should be straightforward with rayon. Maybe we can figure out a way to exclude files without annotations before they're parsed. Straight regex testing for
Edit: it seems that GitHub removes the interactive parts of the flamegraph. The stuff on the left that you can't read is
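To make the two ideas in this comment concrete, here is a minimal sketch of pre-filtering files with a cheap text check and then spreading the remaining work across threads. It makes several assumptions: the `@tag:` marker is borrowed from the sed command in the PR description, `parse_annotations` is a hypothetical stand-in for the real tree-sitter parse, and `std::thread::scope` is used in place of rayon so the example needs no external crates.

```rust
use std::thread;

/// Assumed annotation marker, borrowed from the sed command in the PR description.
const MARKER: &str = "@tag:";

/// Cheap pre-filter: a plain substring search, far cheaper than a full parse.
fn may_contain_annotations(source: &str) -> bool {
    source.contains(MARKER)
}

/// Hypothetical stand-in for the expensive tree-sitter parse.
/// It only collects the text that follows the marker on each line.
fn parse_annotations(source: &str) -> Vec<String> {
    source
        .lines()
        .filter_map(|line| line.split_once(MARKER))
        .map(|(_, rest)| rest.trim().to_string())
        .collect()
}

/// Skips sources that cannot contain annotations, then parses the rest
/// on up to `workers` scoped threads. Results keep the input order.
fn collect_annotations(sources: &[String], workers: usize) -> Vec<String> {
    let candidates: Vec<&String> = sources
        .iter()
        .filter(|s| may_contain_annotations(s))
        .collect();
    if candidates.is_empty() {
        return Vec::new();
    }

    let workers = workers.max(1);
    let chunk_size = (candidates.len() + workers - 1) / workers;

    thread::scope(|scope| {
        // One thread per chunk of candidate sources.
        let handles: Vec<_> = candidates
            .chunks(chunk_size)
            .map(|chunk| {
                scope.spawn(move || {
                    chunk
                        .iter()
                        .flat_map(|s| parse_annotations(s))
                        .collect::<Vec<String>>()
                })
            })
            .collect();

        // Join in spawn order so the output order matches the input order.
        handles
            .into_iter()
            .flat_map(|h| h.join().expect("worker thread panicked"))
            .collect()
    })
}
```

With rayon, the chunking and joining would likely collapse into a single parallel iterator chain. Whether the pre-filter pays off depends on how many files lack annotations entirely.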
I was wondering if
Firstly, this is one of those moments where LLMs amaze me. That took <5 minutes... Secondly, it makes me question what value our tool adds. Tree-sitter integration, potentially? Some other yet-unknown feature we could build?
This makes it easier to profile the program with a profiler (e.g. cargo flamegraph), even if the Rust main won't be distributed.
Creating the queries isn't free, and when we're parsing lots of files in a directory, it becomes a hotspot.
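As a rough illustration of the "construct once" idea, the sketch below caches a query behind `std::sync::OnceLock` and reuses it for every file. This is not the code from this PR: `CompiledQuery`, `comment_query`, `count_tags`, and the query pattern are hypothetical stand-ins, and the real change may simply build the query once and pass it by reference.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::OnceLock;

/// Hypothetical stand-in for a compiled tree-sitter query.
struct CompiledQuery {
    pattern: String,
}

/// Counts how many times the "expensive" construction actually runs.
static BUILD_COUNT: AtomicUsize = AtomicUsize::new(0);

impl CompiledQuery {
    /// Pretend this is costly, as compiling a real query would be.
    fn compile(pattern: &str) -> Self {
        BUILD_COUNT.fetch_add(1, Ordering::SeqCst);
        CompiledQuery {
            pattern: pattern.to_string(),
        }
    }
}

/// Returns the shared query, compiling it on first use only.
fn comment_query() -> &'static CompiledQuery {
    static QUERY: OnceLock<CompiledQuery> = OnceLock::new();
    // The pattern string is illustrative, not taken from the project.
    QUERY.get_or_init(|| CompiledQuery::compile("(line_comment) @comment"))
}

/// Hypothetical per-file step: a real implementation would run the query
/// against a parsed tree; here we only count marker occurrences per line.
fn count_tags(query: &CompiledQuery, source: &str) -> usize {
    let _ = &query.pattern; // the query is reused, never rebuilt
    source.lines().filter(|line| line.contains("@tag:")).count()
}

/// Processes many files while constructing the query a single time.
fn count_tags_in_files(files: &[&str]) -> usize {
    let query = comment_query();
    files.iter().map(|source| count_tags(query, source)).sum()
}
```

However many files are processed, `CompiledQuery::compile` runs only once. That matches the observation that the change matters for directories but not for a single file.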
This is the speed-up mentioned in #14 (comment). I figured it was easy enough to just do it. I can post more rigorous results, but here's why I did this:
I was benchmarking the tool to see if filtering afterwards made any difference. I noticed that when parsing multiple files, the same query was being re-created, and it wasn't free. My test case was taking https://github.com/rust-lang/rust, adding tags to the comments with sed [1], and running
anot .
in the repository root. Before this PR, it took ~50s. Now, it takes ~12s. It doesn't make any difference if it runs on a single file.
[1] git ls-files '*.rs' | xargs sed -i '/[^\/]\/\/ /s/\/\/ /\/\/ @tag: /'