implement stricter env filter parsing #1542
This is, obviously, a breaking behavior change as it stands. Also fun: no more
98403f2 to 7535e53
Thanks for working on this, this is great! The filter-parsing code is definitely my least favorite part of I'll try to give a more thorough review of the new code soon, but for now, here are some thoughts on the "immediate discussion items":
Quotes are currently used to delimit string field values. We should probably allow other reserved characters (e.g.
The current behavior of
As I mentioned above, we probably want to allow it inside of quoted string values. We should probably treat it as reserved if it's not inside a quoted string.
We currently have filtering semantics for spans with multiple field values, so we should continue to accept that. We don't have semantics for filters with multiple span names in them, currently, so the parser should reject that.
Hmm... targets generally won't have whitespace in them, as they're usually Rust module paths. However, they can be overridden with arbitrary user-provided strings, which could (theoretically) contain whitespace, including leading whitespace. This is kind of an edge case, though. Therefore, I think we should probably trim whitespace after
Regarding where the code should go,
I'd really prefer for the filter-parsing code to be part of the tracing repo, ideally in the However, since you wrote it (and certainly have a deeper understanding of it than I do), I'd love for you to be added to the Since you'd like to use the parser implementation separately in your own code, is it possible that there are additional API changes we could make in As a side note:
This sounds really cool! When your project is ready to release, we'd love it if you added it to the list of related crates, and maybe even wrote a blog post or something to show it off, if you're interested?
Sounds good to me.
Ideally, I'd be able to just use

More details about tracing-memory and tracing-egui

- tracing-memory: a layer that archives event/span data so the event history of the runtime can be inspected.
- tracing-egui: an [egui](http://lib.rs/egui) widget that displays the recorded event history in an egui context and allows filtering.

An additional wrinkle on top is that it'd need to use a
Those two combined have made me settle on just serializing the field data of spans. (I'm just holding
Definitely! tracing-egui definitely should live as a separate project, but I'd even be super happy to contribute tracing-memory into tracing proper as a shared in-memory archive. Though... I also need to check out how you're recording events in tokio-console as well, because there's a very good chance you're cleverer than I am here. Either way, the data console is capturing has significant overlap with what I'm doing, so it makes sense to see what implementation overlap there is and try to exploit that, at least from my side of the deal.

The "only" difference between what I'm trying out and console is that console is sending the observability data into a separate process, whereas I'm just trying to capture a nicely structured and searchable in-app debug log for my hobbyist game engine. (External tools are great, but with gamedev tight iteration cycles having a simple version in-app is often better, so long as it can be no-overhead for release.)
This is going to sound very patronizing, but... trying it out, it doesn't look like this is strictly the case, at least as I want to interpret what you're saying. e.g. I'm running the test case (with crates-io tracing) of

```rust
// Brings the `.with(...)` and `.init()` extension methods into scope.
use tracing_subscriber::prelude::*;

let filter =
    tracing_subscriber::EnvFilter::try_new(r#"playground[span{name=bob}]=warn"#).unwrap();
tracing_subscriber::registry()
    .with(tracing_subscriber::fmt::layer().pretty())
    .with(filter)
    .init();
let span = tracing::error_span!("span", name = "bob");
span.in_scope(|| {
    tracing::error!(name = "bob", "oh no");
});
```

which logs the event, as
whereas the filter directive of

```rust
// ...
    span("bob");
}

#[tracing::instrument]
fn span(name: &str) {
    tracing::error!(name = "bob", "oh no");
}
```

then the polarity is flipped; For the full array,

```rust
let name = "bob";
let span = tracing::error_span!("span", /* what goes here */);
```

and in all cases the log says `in playground::span with name: "bob"`

At least with this investigation I know I'm not seeing things; the handling of

Based on observed effects, it seems that a captured-by-
Do we want any way to escape quotes within the quoted directive field, or is just quote-to-quote enough? Current implementation: just passes the field string from

There's also the question of whether non-prefix-quotes should be allowed or considered syntax errors. I see a few reasonable paths forward, though I don't really like any of them:
I kinda lied when I said I don't like any of the options, or I figured out the last one while I was writing that, one of the two. The last option seems actually not bad, just a decent amount of work (not all of which is parsing input, which has accidentally become my specialty apparently). But... since unquoted strings are a thing at all, we need to make sure that filters which are trying to match
I think the most reasonable option here is:
(This would prohibit ever allowing e.g. Alternatively, we could always just bail on the current directive only and attempt to resume at the next
I... just want to make sure that everyone knows that yes, while multiple field values is theoretically supported by Otherwise yes, this seems fair; I'll hook up multiple field directives and give a syntax error on multiple span directives in a single directive. (It's worth noting that if this is recovered from, it will mean adding multiple span directives later would be a behavior-breaking change, thus I'm more in favor of just bailing on the rest of the directive if a syntax error happens not at the root level.)
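As a rough illustration of the rule agreed on here (several fields are fine, several span names are a syntax error), the sketch below handles only the text between `[` and `]`. It is not the code from this PR: `SpanSection` and `parse_span_section` are hypothetical names, fields are split naively on `,` with no support for quoted values, and the whitespace trimming is an assumption.

```rust
/// Hypothetical result type: an optional span name plus raw `key=value` fields.
#[derive(Debug, PartialEq)]
struct SpanSection {
    name: Option<String>,
    fields: Vec<String>,
}

/// Hypothetical helper, not part of tracing-subscriber.
/// `inner` is the text between `[` and `]`, e.g. `span{a=1, b=2}`.
fn parse_span_section(inner: &str) -> Result<SpanSection, String> {
    // Split off an optional `{...}` field list.
    let (name_part, fields_part) = match inner.find('{') {
        Some(open) => {
            let body = inner[open + 1..]
                .strip_suffix('}')
                .ok_or("field list must end with `}`")?;
            (&inner[..open], Some(body))
        }
        None => (inner, None),
    };
    // A comma outside the field list would mean a second span name.
    if name_part.contains(',') {
        return Err(format!("multiple span names are not supported: `{}`", name_part));
    }
    let name = name_part.trim();
    // Naive field splitting: quoted values containing `,` are NOT handled here.
    let fields: Vec<String> = match fields_part {
        Some(body) => body
            .split(',')
            .map(|field| field.trim().to_string())
            .filter(|field| !field.is_empty())
            .collect(),
        None => Vec::new(),
    };
    Ok(SpanSection {
        name: if name.is_empty() { None } else { Some(name.to_string()) },
        fields,
    })
}

fn main() {
    println!("{:?}", parse_span_section("span{field1=value1, field2=value2}"));
    println!("{:?}", parse_span_section("span1,span2"));
}
```

Returning an error for `span1,span2` instead of silently picking one name keeps the door open for giving multiple span names a meaning later.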
I agree with your analysis here, though I do just want to bring up the question of both
Just never allowing internal

Strangely enough, if we go with all of my suggested directions in this post (all of which I stand behind as the best options), I don't know how much, if any, of the code I've already written into this PR would be usable. Probably not much, but the knowledge gained will allow a new, stronger approach once we've ironed out the kinks.

Anyway, thanks for putting up with my perhaps overly pedantic approach and I'm glad I can help improve tracing even a little bit; it's a great system! (Also, the more code I sneak into rustc and rust-analyzer the better I feel about myself. Please go away, impostor syndrome.)
Just a note: I'm committed at this point to effectively rewiring this PR to more directly meet the desires of tracing-subscriber. The direction of the PR remains relevant; specific implementation much less so.
7535e53 to e8a9d2f
MSRV failure is not me, it's the http crate. That said, I may have used MSRV-bumping language features accidentally, though I don't know because of http 🙃

Anyway: rewrite is done, and full parsing support is here! I've just updated the OP with the new status of the PR, and marked it ready for review since all the tests are passing. Some minor improvements do still remain (see OP), but the meat is there. Maybe not my best-designed parser implementation, but it's successful at its goals.

By the way, the reason for phrasing the parsers as

Or I could use a custom
e8a9d2f to 5286ab6
Did a force push to retrigger CI, so now MSRV failure is my fault. Will get to that next week probably; otherwise the PR is good to review.
btw, MSRV is fixed on master, so updating this branch should work
GitHub CI actually runs on the merge commit. I was mistakenly using nested or patterns, which are recently stabilized. Force push was a zero change push just to trigger a CI build on a new provisional merge.
It looks like netlify published but never reported back.
Rebased. Also, note that my analysis of the breaking behavior change note has been changed by #1378 being published on 0.1.X, which effectively just did the "mitigation" follow-up suggested, and changed how
Update: I'm moving this effort into a separate effort over at https://github.com/CAD97/tracing-filter. |
Motivation

I'm working on a system that records tracing events in-process and displays them in a window, allowing filtering with `EnvFilter`-like syntax. As such I needed to parse the syntax, and the current parsing routine in tracing-subscriber is... subpar to say the least. Due to the way the regex are written/used, you can pretty trivially construct directive strings with very odd parses (e.g. `[[[span[[]` which parses as `[span]`, or `[{{{a}}}]` which parses as `[]`).

As such, I wrote a stricter parser for the format for my own use, and this PR polishes it up to the level to use it as the env filter parser for tracing.

Also, fixes #1584.
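
To make the kind of strictness described above concrete, here is a minimal sketch of a bracket check for a single directive. It is not the parser from this PR: `check_brackets` and the `State` enum are hypothetical names, quoting is ignored entirely, and the only assumption is that a directive may contain at most one `[...]` section with at most one `{...}` field list inside it.

```rust
/// Position within a single directive such as `target[span{field=1}]=warn`.
#[derive(Clone, Copy, Debug, PartialEq)]
enum State {
    Target,      // before any `[`
    Span,        // inside `[...]`, before any `{`
    Fields,      // inside `{...}`
    AfterFields, // after `}`, waiting for `]`
    AfterSpan,   // after `]`
}

/// Hypothetical helper, not part of tracing-subscriber.
fn check_brackets(directive: &str) -> Result<(), String> {
    let mut state = State::Target;
    for (i, c) in directive.char_indices() {
        state = match (state, c) {
            (State::Target, '[') => State::Span,
            (State::Span, '{') => State::Fields,
            (State::Span, ']') => State::AfterSpan,
            (State::Fields, '}') => State::AfterFields,
            (State::AfterFields, ']') => State::AfterSpan,
            // Any other bracket is out of place in the current state.
            (_, ch) if "[]{}".contains(ch) => {
                return Err(format!("unexpected `{}` at byte {}", ch, i));
            }
            // Between `}` and `]` nothing else is allowed.
            (State::AfterFields, ch) => {
                return Err(format!("expected `]` but found `{}` at byte {}", ch, i));
            }
            // Ordinary characters leave the state unchanged.
            (same, _) => same,
        };
    }
    match state {
        State::Target | State::AfterSpan => Ok(()),
        _ => Err("unclosed bracket or brace".to_string()),
    }
}

fn main() {
    for d in ["target[span{field=1}]=warn", "[[[span[[]", "[{{{a}}}]"] {
        println!("{:?} -> {:?}", d, check_brackets(d));
    }
}
```

Under these assumptions both odd inputs from the motivation are rejected at the first repeated bracket instead of being reinterpreted as a smaller directive.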
Solution
Implement a better parser and use it!
Important breaking behavior change note

Beyond the obvious changes in how things parse, there's a subtle one: `[{name="bob"}]`. Previously, this would parse as a directive to match field `name` against the regex `"bob"`. Now, as quotes are semantic, it parses as a directive to match field `name` against the regex `bob`. Also note that filter regexes are anchored on both sides: this is a big behavior change!

For how this corresponds to captures, as of master 0fa74b9:

- `tracing::*_span!("f", name="bob")` (captured by-`Value`) matches the regex `bob` and pretty displays as `name: "bob"`,
- `tracing::*_span!("f", name=%"bob")` (captured by-`Display`) matches the regex `bob` and pretty displays as `name: bob`,
- `tracing::*_span!("f", name=?"bob")` (captured by-`Debug`) matches the regex `"bob"` and pretty displays as `name: "bob"`, and
- `#[tracing::instrument] fn f(name: &str); f("bob")` matches the regex `bob`.

In all cases, the obnoxiously pretty formatter says `in f with name: "bob"` (i.e. when printed no visual difference is made between the methods of capture).

Since string arguments are already captured by-`Value` and matched without strings in master, this will make filters with `"bob"` go from matching spans capturing strings by-`Debug` (rare?) to capturing strings by-anything-else (common). If they want the current behavior, they would have to use a regex hex escape currently.

Plus, if I'm not mistaken, pre #1378, argument strings were captured by-`Debug`, not by-`Value`, so they would need to be matched as `"bob"`. This means #1378 broke behavior there, and this PR returns to `"bob"` matching an argument captured string.

Test changes
- `tracing_subscriber::filter::env::directive::parse_directives_with_special_characters_in_span_name`: `"/=[}` are now forbidden in (unquoted) span names.
- `tracing_subscriber::filter::env::callsite_enabled_includes_span_directive_multiple_fields`: previously, the directive didn't parse at all (EnvFilter cannot parse filters with multiple fields #1584) and thus the test was completely broken; now the directive parses and the interest is sometimes (which I think is correct).
- `tracing_subscriber::filter::env::roundtrip`: previously, the directive was `[span1{foo=1}]=error,[span2{bar=2 baz=false}],crate2[{quux="quuux"}]=debug`, which included the problematic directive `[{bar=2 baz=false}]`; this has been changed to `[{bar=2, baz=false}]`, with the alternative which would continue to parse successfully being `[{bar="2 baz=false"}]` or `[{"bar=2 baz"=false}]`.
- `tracing_subscriber::filter::env::parsing`
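
The quote sensitivity in the breaking behavior change note comes down to what a string looks like once it has been formatted. The sketch below uses only the standard library, so it does not touch tracing or the real filter: `as_display`, `as_debug` and `anchored_literal` are hypothetical helpers, and plain string equality stands in for a regex anchored on both sides (a fair stand-in only because `bob` and `"bob"` contain no regex metacharacters).

```rust
use std::fmt::{Debug, Display};

/// What a value looks like when formatted with `Display` (the `%` sigil).
fn as_display(value: impl Display) -> String {
    format!("{}", value)
}

/// What a value looks like when formatted with `Debug` (the `?` sigil).
fn as_debug(value: impl Debug) -> String {
    format!("{:?}", value)
}

/// Stand-in for a fully anchored regex without metacharacters:
/// the whole recorded text must equal the pattern.
fn anchored_literal(pattern: &str, recorded: &str) -> bool {
    pattern == recorded
}

fn main() {
    let shown = as_display("bob");
    let debugged = as_debug("bob");
    println!("Display output: {}", shown);
    println!("Debug output:   {}", debugged);

    // The unquoted pattern only matches the Display form ...
    println!("bob vs Display: {}", anchored_literal("bob", &shown));
    println!("bob vs Debug:   {}", anchored_literal("bob", &debugged));
    // ... and the quoted pattern only matches the Debug form.
    println!("\"bob\" vs Debug: {}", anchored_literal("\"bob\"", &debugged));
}
```

Because the match is anchored, the two forms never overlap: a pattern written for one capture style cannot accidentally match the other, which is why changing what the quotes mean flips which spans a filter selects.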
Remaining work / decision items

- `val="3"` is equivalent to `val=3`, and performs a structural match). Of note is that all structurally matched values can be spelled without reserved syntax, though this may potentially change with more structural captures.
- `"`, which is disallowed (ends the quoted field).
- `tracing_subscriber::filter::env::parsing`

Future work

- `/`? It's reserved currently as it is used by env_logger to specify a (global) regex filter for log messages.
- `#[tracing::instrument]` to mitigate the big behavior change, discussed above.
- `EnvFilter` to filter third-party things that "quack like `SpanRef<'_>`/`Event<'_>`". This is actually my motivating factor in writing a new parser, to implement env-filter like filtering of serialized events/spans.

Original PR message

`parse-env-filter` is essentially part of this PR (github); I can inline the implementation here if desired, but I'll also be using the implementation separately, so it'd be nicer for me if it can live in a dedicated crate, either within this repo or in a repo I own.

Remaining work
This isn't quite mergeable in its current state yet. Known deficiencies that should be fixed before merging:

- `,`. (This means a parsed env filter can't support multiple field bounds, even though `Directive` nominally supports it! Whoops...) While `parse-env-filter` properly handles comma-separated field directives (and span directives, which would be a new feature), this PR currently still does the top-level `split(',')` for convenience of implementation.
- `parse-env-filter` currently completely bails on a directive containing the `"` (or `/`) character. This is because I haven't (but plan to) implement some form of quoting for the fields of the directive, so that the fields may contain syntax characters within the quotes. The exact quoting syntax needs to be decided, and the implementation work needs to happen.
- `/` is reserved because of `env_logger`'s syntax; this one might be desirable to keep as an error because of that?
- `parse-env-filter` doesn't do any error recovery; it just bails upon bad syntax and calls the rest of the directive string invalid. The current impl can recover, as every `,` is treated as the start of a new directive. Allowing `,` inside a directive, mismatched brackets, and more, makes error recovery to continue parsing directives a much harder task, though.
- `parse-env-filter`'s behavior matches `try_new`'s, but `new`'s behavior of "best effort parse" is at odds with `parse-env-filter`'s goal of more strict, well-defined parsing.
- `parse-env-filter` deliberately does no whitespace trimming; whitespace is provided as part of the directives.
- `EnvFilter` needs to decide if it wants to trim whitespace, and do so if it wants to.

Known test failures
- `filter::env::directive::test::parse_directives_with_special_characters_in_span_name`: special characters include `"` (parser support todo) and `[`/`{`/`/`, which are deliberately disallowed. How to handle `"` quoting should be decided, I need to then implement `"` handling, and the forbidden nature of `[`/`{` is an ideological question: `parse-env-filter` aims to be strict and validate, whereas the current impl is more forgiving and loose with the filter syntax.
- `filter::env::tests::callsite_enabled_includes_span_directive_field`: test includes quoted field value (parser support todo)
- `filter::env::tests::roundtrip`: test includes quoted field value (parser support todo), and the formatting of the directive which is being checked to roundtrip includes the quotes as well, so this is more inherent a problem than the previous tests

r? @hawkw
Immediate discussion items:
"
is desired for env filter directivesEnvFilter
to fallEnvFilter::new
should do recovery along that axis/
target[span1,span2]=level
target[span{field1=value1, field2=value2}]
,
) should be done
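
Since `,` now has to do double duty (separating directives at the top level and separating fields inside `{...}`), a top-level `split(',')` is no longer enough. Below is a minimal sketch of a depth-aware split. It is not the implementation in this PR or in `parse-env-filter`: `split_directives` is a hypothetical helper, it assumes `"` toggles a quoted region with no escape syntax, and it performs no validation or whitespace trimming.

```rust
/// Hypothetical helper, not part of tracing-subscriber: splits a filter
/// string on commas that are outside `[...]`, `{...}` and `"..."`.
fn split_directives(input: &str) -> Vec<&str> {
    let mut parts = Vec::new();
    let mut depth = 0usize; // how many `[` / `{` are currently open
    let mut in_quotes = false; // assumption: no escaped quotes
    let mut start = 0;
    for (i, c) in input.char_indices() {
        match c {
            '"' => in_quotes = !in_quotes,
            '[' | '{' if !in_quotes => depth += 1,
            ']' | '}' if !in_quotes => depth = depth.saturating_sub(1),
            ',' if !in_quotes && depth == 0 => {
                // A top-level comma ends the current directive.
                parts.push(&input[start..i]);
                start = i + 1;
            }
            _ => {}
        }
    }
    parts.push(&input[start..]);
    parts
}

fn main() {
    let filter = r#"[span1{foo=1}]=error,[span2{bar=2, baz=false}],crate2[{quux="quuux"}]=debug"#;
    for directive in split_directives(filter) {
        println!("{}", directive);
    }
    // For comparison, a plain split cuts the second directive in two.
    println!("naive pieces: {}", filter.split(',').count());
}
```

With the roundtrip test's directive string, the comma inside `{bar=2, baz=false}` stays with its directive, whereas a plain `split(',')` would hand the field parser two broken halves.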