-
Notifications
You must be signed in to change notification settings - Fork 344
Unclear lifecycle of spans #312
Comments
Even as it's current state, the spec indirectly forbids scoping a span where you don't have control over the lifecycle of the span. Because calls to a finished span are not allowed/undefined (=may blow up the application). But the API makes no restrictions for Which means that code like this is not legal: https://github.com/opentracing-contrib/java-concurrent/blob/83d9aaa6ea4a8764fefc3809cadff539366aabd8/src/main/java/io/opentracing/contrib/concurrent/TracedRunnable.java#L24 //cc @pavolloffay |
I have implemented a proof of concept for this: #319 |
Hi @felixbarny. I agree that object pooling (and low overhead/GC-avoidance in general) is a worthy goal to pursue. But I'm concerned about safety. Right now, the intention in the spec is to say that calls after For races occurring between threads, I think it's fine to say that ordering is undefined – the user should not be interacting with spans in multiple threads simultaneously, and definitely should not be finishing the same span in multiple threads. The pseudocode in the table you posted would be an example of this. But the fallout for poorly written instrumentation should be poor data, not an unsafe program. My concern is a little different from whether calling SpanContext after finish should be allowed, btw. I'm just wondering if object pooling and other usecases for making these spec changes would result in unsafe instrumentation code (as opposed to code that records bad data). |
Not sure how you define unsafe in this context but the worst that can happen when calling methods after finish on a pooled span object is that for example 5 min later (when the pool wraps) a span contains a tag which was supposed to be set on a different span.
ATM, I think this is not defined anywhere. Why is it allowed at all to call finish multiple times? |
Undefined != forbidden. |
No,
That is in contrast to other span methods, like I think it would be great to not only have the semanic_specification.md but also a separate file which explains the rationale behind the decisions.
Throwing an exception is arguably a valid thing if the behavior is undefined. But of course that would be a silly thing to do in a tracer implementation. Again - object pooling does not break the application in any way it could just lead to bad data. |
The way the ScopeManager is currently defined makes the lifecycle of Span objects unclear. This complicates span object recycling aka object pooling.
Recycling objects is a way to reduce the allocation overhead by maintaining a pool of pre-allocated Span instances. Instead of instantiating a new span every time a span is started, a span is taken out of the object pool. When the Span is finished and has been reported out of process, the span instance is reset and put back into the object pool. Using this technique, it is possible to add almost no allocation overhead to the monitored application which means less GC overhead caused by the tracer.
IMO, the lifecycle of a Span should be very straightforward. When finish() is called, no other interactions should be allowed so it is safe to recycle and reuse the whole span instance.
There are two APIs which break this assumption.
Span.context
Quote from The spec of Span.finish:
IMO, the SpanContext should be retrieved before the span is finished if it is intended to be used after the span is finished. The implementations are supposed to return an object which is not tied to the span’s lifecycle, for example making a copy of the internal span context.
ScopeManager
Even in the current version of the spec, it “forbids” calls after
finish()
:Scoping a span in a different thread than the one it is finished from, indirectly breaks this requirement. In the second thread, there is no way to know whether the
activeSpan()
has already finished, as it has no control over the lifecycle of that span. So callingtracer.activeSpan().addTag("foo", "bar")
is never legal if the active span is finished in a different thread. The Span reference returned bytracer.activeSpan()
might already be recycled and used to record an entirely different operation.Example scenario:
For Java, the damage is probably not as big unless the tracer implements object pooling. I don’t have a lot of experience with C++, but don’t you have to exactly know the lifecycle of spans in order to be able to deallocate the span’s memory?
Suggestion
Change
To
Scope activate(SpanContext spanContext)
toScopeManager
Scope.span()
returnsnull
in this case.SpanContext spanContext()
toScope
Scope#activate(Span)
has been called, acts as a shortcut forScope.span().context()
For SpanBuilders which did not call
SpanBuilder#ignoreActiveSpan
, an implicitchild_of
reference will be created forScopeManager.active().spanContext()
contains aSpanContext
. IfScopeManager.active().span()
is notnull
, this is the preferred way to set the reference.//cc @adriancole @raphw
The text was updated successfully, but these errors were encountered: