-
-
Notifications
You must be signed in to change notification settings - Fork 147
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
console-api,console-subscriber,tokio-console: implement stack traces for tasks #363
base: main
Are you sure you want to change the base?
Conversation
console-api/proto/common.proto
Outdated
@@ -78,6 +78,8 @@ message Span { | |||
repeated Field fields = 3; | |||
// Timestamp for the span. | |||
google.protobuf.Timestamp at = 4; | |||
// An Id that uniquely identifies this span's parent (if any). | |||
optional SpanId parent_id = 5; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
in proto3, every field (that is a Message) is optional by default, so you don't need the optional
keyword
|
||
message CloseDirect { | ||
Span span = 1; | ||
optional Span direct_cause = 2; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
in proto3, every field (that is a Message) is optional by default, so you don't need the optional
keyword
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Huh, that's unfortunate, because here it's semantically significant whether a direct_cause
is provided or not — some Span
s don't have parents.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that’s fine, it will be wrapped in Optional in the generated code, so only if the field is included explicitly will it show up as Some(…)
|
||
message CloseCyclic { | ||
Span span = 1; | ||
optional Span direct_cause = 2; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
in proto3, every field (that is a Message) is optional by default, so you don't need the optional
keyword
causality: vec![], | ||
}; | ||
updates_and_watchers.retain(|(updates, watch)| { | ||
let updates = crate::consequences::updates_into_proto(updates); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
tiny nit: you don't need crate::
here
let details = proto::tasks::TaskDetails { | ||
causality: updates, | ||
..details.clone() | ||
}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe make details
mutable so we can just update it and avoid the clone? or flip the order of the code so that we have the updates variable available before constructing details
use tracing_causality as causality; | ||
use tracing_subscriber::Registry; | ||
|
||
fn into_proto_span(span: causality::Span) -> proto::consequences::Span { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if you import all the proto stuff but alias them to prefix proto
, like protoSpan
, it could look a bit cleaner
// XXX(jswrenn): A new thread is spawned solely for the sake of | ||
// getting access to the global subscriber. | ||
let (trace, updates) = std::thread::spawn(move || { | ||
// XXX(jswrenn): Need to gracefully handle the case where the | ||
// task span is already closed. | ||
consequences::trace(&span, buffer) | ||
.expect("the subscriber isn't a Registry") | ||
.expect("id is already closed") | ||
}) | ||
.join() | ||
.unwrap(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd like to find a better alternative to this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I found an alternative to this! It's just blocked on tokio-rs/tracing#2247.
Extant extant = 1; | ||
OpenDirect open_direct = 2; | ||
NewIndirect new_indirect = 3; | ||
CloseDirect close_direct = 4; | ||
CloseIndirect close_indirect = 5; | ||
CloseCyclic close_cyclic = 6; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
let's add comments to these explaining what they mean, please? the terminology here is not immediately obvious to me, especially because it's not part of the terminology used in the tracing
documentation; it would be good to explain what we mean when we say "direct" and "indirect", for example.
@@ -49,6 +50,9 @@ message TaskDetails { | |||
|
|||
// HdrHistogram.rs `Histogram` serialized to binary in the V2 format | |||
optional bytes poll_times_histogram = 3; | |||
|
|||
// What the task is currently busy doing. | |||
repeated consequences.Causality causality = 4; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i don't love calling this field causality
--- that suggests to me that a task's causality
is that which caused the task to be spawned. i might call this something like spans
or trace
or something, since it's specifically a list of the span trees that the task is in, i believe?
console-subscriber/examples/deep.rs
Outdated
@@ -0,0 +1,44 @@ | |||
use tokio::task; | |||
|
|||
#[tokio::main] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
would be nice to have a doc comment somewhere in here explaining what this example demonstrates.
console-subscriber/examples/deep.rs
Outdated
|
||
let subscriber = tracing_subscriber::registry() | ||
.with(console_subscriber::spawn()) | ||
.with(tracing_subscriber::filter::filter_fn(|_| true)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this could alternatively be LevelFilter::TRACE
to enable everything explicitly; either is fine by me though.
let task_ref = self.tasks_state.task(task_id).unwrap(); | ||
let task_ref = task_ref.upgrade().unwrap(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i think if we get an update for a task whose ID we've never seen before, we probably shouldn't panic --- it means the server almost certainly did something wrong, but it would probably be better for the console CLI to print some kind of error and keep going, skipping that update, instead of crashing...
if let None = fresh.causality { | ||
fresh.causality = Some(tracing_causality::Trace::with_root( | ||
tracing_causality::Span { | ||
id: tracing::Id::from_u64(span_id), | ||
metadata: meta_id, | ||
}, | ||
)); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
style nit, take it or leave it: this is essentially a reimplementation of Option::get_or_insert_with
:
if let None = fresh.causality { | |
fresh.causality = Some(tracing_causality::Trace::with_root( | |
tracing_causality::Span { | |
id: tracing::Id::from_u64(span_id), | |
metadata: meta_id, | |
}, | |
)); | |
} | |
fresh.causality.get_or_insert_with(|| tracing_causality::Trace::with_root( | |
tracing_causality::Span { | |
id: tracing::Id::from_u64(span_id), | |
metadata: meta_id, | |
}, | |
)); |
if let Some(trace) = fresh.causality { | ||
fresh.causality = trace.apply(update); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit, take it or leave it: this could be:
if let Some(trace) = fresh.causality { | |
fresh.causality = trace.apply(update); | |
} | |
fresh.causality = fresh.causality.and_then(|trace| trace.apply(update)); |
impl Display for Location { | ||
fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result { | ||
if let Some(file) = &self.file { | ||
write!(f, "{file}")?; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the use of this syntax increases the tokio-console
crate's minimum Rust dependency, so we'll either have to update the MSRV in Cargo.toml and on CI, or use the old syntax.
|
||
/// Render the consequences as an ascii tree. | ||
fn make_stack_widget(&self, _styles: &view::Styles, state: &State) -> Text<'static> { | ||
// XXX(jswrenn): also implement an ASCII-only rendering mode |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i think it's fine to just not display the stack in ASCII-only mode, if it's difficult to implement an ASCII-only rendering for this. we already don't display the poll duration histogram in ASCII-only mode...
@@ -497,7 +571,7 @@ impl Attribute { | |||
} | |||
} | |||
|
|||
// === impl Ids === | |||
// === impl Ids ===a |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
?
// === impl Ids ===a | |
// === impl Ids === |
@@ -341,3 +377,53 @@ fn dur(styles: &view::Styles, dur: std::time::Duration) -> Span<'static> { | |||
// have to be given a string in order to do layout stuff? | |||
styles.time_units(format!("{:.prec$?}", dur, prec = DUR_PRECISION)) | |||
} | |||
|
|||
fn display<W: fmt::Write>( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we maybe give this a more descriptive name? this is specifically for displaying stack elements, right? so maybe something that indicates that?
alternatively, we could make it an inner function of the make_stack_widget
function, i suppose.
display(&mut buf, state, causality, &root, true, "").unwrap(); | ||
} else { | ||
buf.push_str( | ||
"An unexpected error prevents us from showing consequences for this task.", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
under what circumstances would this be the case?
move |(consequence, cause)| tracing_causality::Update::OpenDirect { | ||
cause: cause, | ||
consequence: span_from_pb(consequence), | ||
}, | ||
); | ||
let indirect_consequences = indirect_consequences.into_iter().zip(repeat(cause)).map( | ||
move |(consequence, cause)| tracing_causality::Update::NewIndirect { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what is moved into these closures? AFAICT the move
keyword is unneeded:
move |(consequence, cause)| tracing_causality::Update::OpenDirect { | |
cause: cause, | |
consequence: span_from_pb(consequence), | |
}, | |
); | |
let indirect_consequences = indirect_consequences.into_iter().zip(repeat(cause)).map( | |
move |(consequence, cause)| tracing_causality::Update::NewIndirect { | |
|(consequence, cause)| tracing_causality::Update::OpenDirect { | |
cause: cause, | |
consequence: span_from_pb(consequence), | |
}, | |
); | |
let indirect_consequences = indirect_consequences.into_iter().zip(repeat(cause)).map( | |
|(consequence, cause)| tracing_causality::Update::NewIndirect { |
i'm assuming this was because cause
was captured into the closures at one point before the zip(repeat(cause))
thing?
// XXX(jswrenn): A new thread is spawned solely for the sake of | ||
// getting access to the global subscriber. | ||
let (trace, updates) = std::thread::spawn(move || { | ||
// XXX(jswrenn): Need to gracefully handle the case where the | ||
// task span is already closed. | ||
consequences::trace(&span, buffer) | ||
.expect("the subscriber isn't a Registry") | ||
.expect("id is already closed") | ||
}) | ||
.join() | ||
.unwrap(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is quite unfortunate, i'd like to figure out a better solution for this. at the very least, we could probably do this a single time when the aggregator task is spawned, and clone the Dispatch
, so that we don't have to do this every subsequent time we want to get details for a task...
@@ -34,6 +34,7 @@ tonic = { version = "0.7", features = ["transport"] } | |||
futures = "0.3" | |||
tui = { version = "0.16.0", default-features = false, features = ["crossterm"] } | |||
tracing = "0.1" | |||
tracing-causality = { path = "../../tracing-causality" } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can this be changed to a Git dependency rather than a path dependency, at least? i'd really like to be able to actually try out this branch without having to check that out.
…for tasks The task detail view of console now shows the tree of extant consequences stemming from that task; i.e., which spans are open beneath the task. The leaves of the task, in a sense, reflect what the task is busy doing at that moment.
fn on_register_dispatch(&self, dispatch: &Dispatch) { | ||
let tracer = consequences::Tracer::from_dispatch::<S>(dispatch); | ||
if let Err(_) = self.shared.tracer.set(tracer) { | ||
// XXX(jswrenn): This layer has seemingly been supplied *multiple* | ||
// subscribers. Is that possible? I guess you can call | ||
// `Layer::with_subscriber` multiple times; what happens if you | ||
// install the resulting `Layered` as a subscriber? Is `Arc`ing a | ||
// `Layered<ConsoleLayer, S>` also a recipe for weirdness? | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The on_register_dispatch
method defined by tokio-rs/tracing#2269 is implemented here, to construct a Tracer
(i.e., a type that defines a trace
method that consumes a span ID and produces a trace of its consequences), which is then made available to the Aggregator
thread (the part of console_subscriber that actually handles the rpcs for reporting traces).
// XXX(jswrenn): This layer has seemingly been supplied *multiple* | ||
// subscribers. Is that possible? I guess you can call | ||
// `Layer::with_subscriber` multiple times; what happens if you | ||
// install the resulting `Layered` as a subscriber? Is `Arc`ing a | ||
// `Layered<ConsoleLayer, S>` also a recipe for weirdness? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think a Layer
implementation for Arc<L: Layer<_>>
is the only way this could be called twice. Fortunately, there is no such impl in the current release version
@jswrenn i believe the requisite upstream changes in |
The task detail view of console now shows the tree of extant consequences stemming from that task; i.e., which spans are open beneath the task. The leaves of the task, in a sense, reflect what the task is busy doing at that moment.
I'm opening this as a draft PR because several issues must be resolved before merging; most importantly:
tracing-causality
should not be a git dependencyPartialEq
,Eq
forMetadata
,FieldSet
tracing#2229 getting backported+released to tracing 0.1.x, or me writing an equality shim