perf(syntax): short-circuit if name matches language_id
#12407
base: master
Good find. This code will probably change quite a bit soon with the new TS implementation (and with the new config system I want to change language detection as well), but let's take the easy win for now; this will also remind us to be mindful and retest once it gets rewritten.
I wonder if we could add some kind of benchmarks for these kinds of things. Testing with flamegraphs is nice for discovery, but a benchmark suite to test for regressions would be nice (for example based on brunch).
I have started looking into setting up some benchmarking infrastructure to see what kind of approach is simplest. I was looking into divan, which is marketed as being simple to use and set up, as I had heard good things about it. This part is particularly enticing to me. Rather than having to move a bunch of code around to specific modules/paths, deal with private functions that might be important to bench and make wrappers for them, or deal with feature flags, like a

cargo tree for divan:
divan v0.1.17 (D:\source\divan)
├── cfg-if v1.0.0
├── clap v4.5.23
│   └── clap_builder v4.5.23
│       ├── anstyle v1.0.10
│       ├── clap_lex v0.7.4
│       └── terminal_size v0.4.1
│           └── windows-sys v0.59.0
│               └── windows-targets v0.52.6
│                   └── windows_x86_64_msvc v0.52.6
├── condtype v1.3.0
├── divan-macros v0.1.17 (proc-macro) (D:\source\divan\macros)
│   ├── proc-macro2 v1.0.92
│   │   └── unicode-ident v1.0.14
│   ├── quote v1.0.38
│   │   └── proc-macro2 v1.0.92 (*)
│   └── syn v2.0.94
│       ├── proc-macro2 v1.0.92 (*)
│       ├── quote v1.0.38 (*)
│       └── unicode-ident v1.0.14
└── regex-lite v0.1.6
[dev-dependencies]
└── mimalloc v0.1.43
    └── libmimalloc-sys v0.1.39
        └── libc v0.2.169
        [build-dependencies]
        └── cc v1.2.7
            └── shlex v1.3.0

I'll definitely look into this more and open an issue about a path forward, if I can find an obvious one that isn't going to be a hassle to maintain.
I love divan's API but unfortunately it doesn't do any statistical analysis at all, so I don't quite trust the results. Also, there is no tracking between runs. criterion is way too heavy, and I have had a preference for brunch so far (but divan definitely has the nicest API by far).
Ah, forgot to ask if this is intended to be used in CI or just locally when making changes? Or perhaps we start with the easiest option first, local, to dip our toes, and then get more advanced as needed?
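For the "easiest option first, local" route, here is a rough sketch of what a dependency-free timing check could look like. It uses only `std` and is not the API of divan, brunch, or criterion; the names `sample`, `median`, and `is_regression` are made up for illustration. It does no real statistical analysis beyond taking a median, so it has the same trust problem mentioned above.

```rust
use std::time::{Duration, Instant};

/// Runs `f` `samples` times and returns the per-run durations, sorted ascending.
/// (Hypothetical helper, not part of any benchmarking crate.)
fn sample<F: FnMut()>(samples: usize, mut f: F) -> Vec<Duration> {
    let mut times: Vec<Duration> = (0..samples)
        .map(|_| {
            let start = Instant::now();
            f();
            start.elapsed()
        })
        .collect();
    times.sort();
    times
}

/// Median of an already sorted, non-empty slice (upper median for even lengths).
fn median(sorted: &[Duration]) -> Duration {
    sorted[sorted.len() / 2]
}

/// Flags a regression when `current` exceeds `baseline` by more than
/// `tolerance` (for example 0.10 for 10%).
fn is_regression(baseline: Duration, current: Duration, tolerance: f64) -> bool {
    current.as_secs_f64() > baseline.as_secs_f64() * (1.0 + tolerance)
}
```

A local run would call `sample` on the code path of interest (wrapping inputs and results in `std::hint::black_box` so the work is not optimized away), take the `median`, and compare it to a number recorded from an earlier run with `is_regression`. Storing that baseline between runs is left out here.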
This is a behavior change, no? If a language doesn't have an `injection-regex` set then it currently won't be considered for injection, and the injection regex might not necessarily accept the language's name as we have it written in `languages.toml`.
I think we should separate how we look up languages when setting the `injection.language` property - i.e. `(#set! injection.language "bash")` - vs. capturing - i.e. `(info_string (language) @injection.language)`. For capturing we should probably use a regex (maybe with this change too, I'm not sure) but for the property we should only accept language names (no regex). Then for the common case (property) we use the fast lookup and only resort to the slower regex for `@injection.language` captures.
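A rough sketch of the split suggested here, assuming a simplified registry. `Language`, `Registry`, `for_property`, `for_capture`, and `matches_shell` are hypothetical names, not Helix's actual types, and a plain function pointer stands in for the compiled `injection-regex` so the example needs no external crates.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for one entry in `languages.toml`.
struct Language {
    name: &'static str,
    /// Stand-in for a compiled `injection-regex`; a real implementation
    /// would hold a regex here instead of a function pointer.
    injection_matcher: Option<fn(&str) -> bool>,
}

/// Hypothetical registry with an index for exact-name lookups.
struct Registry {
    languages: Vec<Language>,
    by_name: HashMap<&'static str, usize>,
}

impl Registry {
    fn new(languages: Vec<Language>) -> Self {
        let by_name = languages
            .iter()
            .enumerate()
            .map(|(index, language)| (language.name, index))
            .collect();
        Self { languages, by_name }
    }

    /// Property path, e.g. `(#set! injection.language "bash")`:
    /// exact language names only, no regex involved.
    fn for_property(&self, name: &str) -> Option<&Language> {
        self.by_name.get(name).map(|&index| &self.languages[index])
    }

    /// Capture path, e.g. `(info_string (language) @injection.language)`:
    /// only languages with a matcher are considered.
    fn for_capture(&self, text: &str) -> Option<&Language> {
        self.languages
            .iter()
            .find(|language| language.injection_matcher.map_or(false, |m| m(text)))
    }
}

/// Example matcher standing in for a shell `injection-regex`.
fn matches_shell(text: &str) -> bool {
    matches!(text, "sh" | "bash" | "shell")
}
```

With this shape, a language without an `injection-regex` is reachable through the property path by its exact name but is never picked for a capture, which keeps the two behaviors separate.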
So you mean like there is a false positive, where it can find based on |
Previously, there would always be a `Regex::find` that would take place, even if the name perfectly matches an id (`name` in the `languages.toml`). This PR introduces a short-circuit opportunity to check whether the name matches the id first, as a regex match is much more expensive and should only be done as a fallback, especially when the most common case appears to be an exact match.

The workload here was to go into `helix-term::commands::typed.rs`, and at the start of the file, enter insert mode and hold enter to insert newlines until it reaches line ~1000. This is artificial, but it shows the point well. In just this operation, it accounted for 16% of samples. After, 0.5%.

Before:
After:
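
For readers without the diff at hand, the short-circuit described in this PR can be sketched roughly as below. This is a simplified illustration, not the actual Helix code: `LanguageConfig`, `language_for_name`, and `is_shell_alias` are hypothetical, a function pointer stands in for the compiled injection regex, and the sketch returns the first match rather than ranking candidates.

```rust
/// Hypothetical, simplified language configuration.
struct LanguageConfig {
    language_id: String,
    /// Stand-in for the compiled injection regex; returns true on a match.
    injection_matcher: Option<fn(&str) -> bool>,
}

/// Returns the first configuration that matches `name`.
fn language_for_name<'a>(
    configs: &'a [LanguageConfig],
    name: &str,
) -> Option<&'a LanguageConfig> {
    configs.iter().find(|config| {
        // Cheap exact comparison first: the common case is a perfect match.
        if config.language_id == name {
            return true;
        }
        // Fall back to the (more expensive) matcher only when needed.
        config
            .injection_matcher
            .map_or(false, |matches| matches(name))
    })
}

/// Example matcher standing in for a shell injection regex.
fn is_shell_alias(name: &str) -> bool {
    matches!(name, "sh" | "shell")
}
```

Note that in this sketch a language with no matcher is still found by its exact id, which is the behavior change raised in review.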