Add an extract_zipkin_attrs_from_headers() helper #143
Conversation
Thanks for doing this! Looks pretty good in general, I'd just get rid of the validation logic and simplify the tests by using pytest.mark.parametrize
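The suggestion above is to collapse the repeated test cases into a single parametrized test. A minimal sketch of that pattern (the `extract_sampled` helper here is a hypothetical stand-in, not py_zipkin's API):

```python
import pytest


def extract_sampled(headers):
    """Hypothetical stand-in for the helper under test."""
    if headers.get("X-B3-Flags") == "1":
        return True  # debug implies an accept decision
    return headers.get("X-B3-Sampled") == "1"


@pytest.mark.parametrize(
    "headers,expected",
    [
        ({"X-B3-Sampled": "1"}, True),
        ({"X-B3-Sampled": "0"}, False),
        ({"X-B3-Flags": "1"}, True),
        ({}, False),
    ],
)
def test_extract_sampled(headers, expected):
    assert extract_sampled(headers) is expected
```

Each tuple in the list becomes its own test case, so adding an edge case is one line rather than a new test function.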
    }, sample_rate=88.2)
    assert [] == mock_random.mock_calls

    # non-defer debug-trace, no parent span
there's no "d" in here
I went back and double-checked https://github.com/openzipkin/b3-propagation and the "d" thing is only for single-header AFAICT.
For multi-header, it says only (no mention of "d" for Sampling State nor Debug Flag):
Sampling State
An accept sampling decision is encoded as X-B3-Sampled: 1 and a deny as X-B3-Sampled: 0. Absent means defer the decision to the receiver of this header. For example, a Sampled header might look like: X-B3-Sampled: 1.
Note: Before this specification was written, some tracers propagated X-B3-Sampled as true or false as opposed to 1 or 0. While you shouldn't encode X-B3-Sampled as true or false, a lenient implementation may accept them.
Debug Flag
Debug is encoded as X-B3-Flags: 1. Debug implies an accept decision, so don't also send the X-B3-Sampled header.
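The multi-header rules quoted above can be restated as a small parser. This is an illustrative sketch, not py_zipkin's actual code: `X-B3-Sampled` carries the accept/deny decision (leniently accepting the legacy true/false encodings), `X-B3-Flags: 1` carries debug, and absence means "defer".

```python
def parse_b3_sampling_state(headers):
    """Return True (accept), False (deny), "debug", or None (defer)."""
    if headers.get("X-B3-Flags") == "1":
        # Debug implies accept; the spec says don't also send X-B3-Sampled.
        return "debug"
    sampled = headers.get("X-B3-Sampled")
    if sampled is None:
        return None  # absent: defer the decision to the receiver
    # Lenient: accept legacy true/false as well as the specified 1/0.
    if sampled in ("1", "true"):
        return True
    if sampled in ("0", "false"):
        return False
    return None
```

For example, `parse_b3_sampling_state({"X-B3-Flags": "1"})` yields `"debug"`, while an empty dict yields `None` (defer).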
tests/request_helpers_test.py
Outdated
def test_extract_zipkin_attrs_from_headers_single(mock_random):
    # b3={TraceId}-{SpanId}-{SamplingState}-{ParentSpanId}
    # where the last two fields are optional.
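The single-header format described in the test comment splits on `-` with the last two fields optional. A hypothetical sketch of that parse (not py_zipkin's real parser):

```python
def parse_b3_single_header(value):
    """Parse b3={TraceId}-{SpanId}-{SamplingState}-{ParentSpanId}.

    The last two fields are optional; anything else is malformed.
    """
    parts = value.split("-")
    if len(parts) < 2 or len(parts) > 4:
        raise ValueError("malformed b3 header: %r" % value)
    trace_id, span_id = parts[0], parts[1]
    sampling_state = parts[2] if len(parts) > 2 else None
    parent_span_id = parts[3] if len(parts) > 3 else None
    return trace_id, span_id, sampling_state, parent_span_id
```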
Do we actually need all these tests to cover all the cases? There are a lot of them and they seem to repeat themselves.
I think the repeating part stems from the fact that we're basically implementing the same logic twice. Could we refactor away the common logic in request_helpers.py so that we only have to test it once? The only difference between the two afaict is whether all the fields are in a single header or in multiple ones.
So we could just have the logic to parse the headers be separate, and then call the same underlying function to generate the ZipkinAttrs
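One way to picture the suggested refactor: each format gets its own small parser, and both feed one shared function that builds the attrs, so the shared logic only needs testing once. All names here are hypothetical, not py_zipkin's API:

```python
from collections import namedtuple

# Stand-in for ZipkinAttrs, for illustration only.
Attrs = namedtuple("Attrs", "trace_id span_id parent_span_id sampled")


def _parse_single_header(headers):
    fields = headers["b3"].split("-")
    trace_id, span_id = fields[0], fields[1]
    sampled = fields[2] if len(fields) > 2 else None
    parent = fields[3] if len(fields) > 3 else None
    return trace_id, span_id, parent, sampled


def _parse_multi_header(headers):
    return (
        headers.get("X-B3-TraceId"),
        headers.get("X-B3-SpanId"),
        headers.get("X-B3-ParentSpanId"),
        headers.get("X-B3-Sampled"),
    )


def extract(headers):
    # Only the parsing differs between the two formats; the
    # attrs-building step is shared, so it is tested once.
    parse = _parse_single_header if "b3" in headers else _parse_multi_header
    return Attrs(*parse(headers))
```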
The specification is a little loose and I kept running into wacky under-specified edge cases. I think the volume of tests is justified by that.
Personally, I'd rather have overly-exhaustive tests than untested edge cases.
Because of the spec, it's not quite apples to apples. The single-header form is newer and drops some backwards-compat, the "d" thing is only in single-header, "flags" isn't a thing in single-header.
I guess those impedance mismatches could be smoothed over, but because of the spec, you'd really want the tests like they are to make sure that the code is right and each side of the spec (multi-header/single-header) is correct.
Refactoring the implementation to have fewer lines seems reasonable for reasons like clarity, DRY, less bug surface area, etc., but I'm not sure having shorter tests is a great justification.
Let me see what the refactoring looks like...
@drolando That was a great idea, and I think the code cleaned up nicely. Please see latest commit.
I still think the tests are good, as they stand now, for the reasons I mentioned earlier.
P.S. only one flag value was ever defined, 1, so I'm not sure how much under-specifying we'd need to do to clarify that "d" is the same as the presence of that flag. Can we fix the spec somehow?
(I misspoke earlier and said "flags" isn't a thing in single-header; I meant "flags" isn't a thing in multi-header.)
I'm not sure I understand what you're saying. I think the spec and the code of this PR are clear in that:
- single-header b3 has a tri-state SamplingState segment with "1", "0", and "d" defined, where "d" means "debug" and is like a "1" with a bit of additional semantic import.
- multi-header has an ill-fated X-B3-Flags header where 1 means d (i.e. debug), and apparently no other flags are defined (not even 0 for "there are no flags" or "debug is not active"?)
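The tri-state SamplingState segment described above, restated as code (an illustrative mapping, not py_zipkin's implementation):

```python
def single_header_sampling(state):
    """Map the single-header b3 SamplingState segment to its meaning."""
    return {
        "1": "accept",
        "0": "deny",
        "d": "debug",   # debug exists only in the single-header form
        None: "defer",  # segment absent: defer to the receiver
    }[state]
```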
As far as fixing the spec, I can only assume other folks have already coded to it and the cat is out of the bag? Like the true/false thing: I decided to go ahead and be lenient in both encodings here wrt true/false values of the sampling decision; after the refactoring, either both encodings accept them or neither does. I decided that being lenient would be better.
I didn't look at the unit tests, as I think they'd change if the feedback I gave is accepted. Especially the sample rate being involved in a change like this seems like something to undo.
py_zipkin/request_helpers.py
Outdated
    return parsed


def extract_zipkin_attrs_from_headers(headers, sample_rate=100.0,
why does extracting use a sample rate? The result of extracting headers is just the data extracted. Turning the headers into a new context, sure, but why are the concerns conflated here?
Well, this is my call-site, which I'm pretty happy with:
# Note: this is a monkey-patch for eventlet.wsgi.HttpProtocol.handle_one_response
# Also api.sample_rate_pct is a global for the process set via config file
def _patched_handle_one_response(self):
    zipkin_attrs = api.extract_zipkin_attrs_from_headers(
        self.headers, sample_rate=api.sample_rate_pct, use_128bit_trace_id=True)
    ...
    with api.ezipkin_server_span(
        service_name=api.default_service_name(),
        span_name=self.command,
        zipkin_attrs=zipkin_attrs,
        sample_rate=None if zipkin_attrs else api.sample_rate_pct,
        ...
    ) as zipkin_span:
        ...
Is there something undesirable there that I'm missing?
In other codebases, we reuse code like you've written in scenarios that don't assume a specific sampling rate implementation. To make things more flexible, we compose the feature that extracts the headers with whatever it is that needs it (e.g. applying a rate). By keeping these things separate, we can also clarify the unit tests, which have little need to concern themselves with rate.
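A sketch of the composition being described: extraction returns only what the headers contain, and a rate-based decision is layered on separately by the caller. All names here are hypothetical, for illustration only:

```python
import random


def extract_raw(headers):
    """Pure extraction: no sampling decision is invented here."""
    return {
        "trace_id": headers.get("X-B3-TraceId"),
        "span_id": headers.get("X-B3-SpanId"),
        "sampled": headers.get("X-B3-Sampled"),  # may be None (defer)
    }


def apply_sample_rate(raw, sample_rate):
    """One of many possible sampling strategies, applied separately.

    sample_rate is a percentage in [0, 100]; only a deferred ("absent")
    decision triggers a fresh roll of the dice.
    """
    if raw["sampled"] is None:
        raw["sampled"] = "1" if random.random() * 100 < sample_rate else "0"
    return raw
```

A caller that wants rate-based sampling composes the two; a caller with a different strategy (e.g. driven by HTTP parameters) swaps in its own second step.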
Interesting.
Like you, I don't maintain this codebase either. But this is a new helper function. In the past, anyone using py_zipkin had to write their own functions to do this. So no existing users will be hampered by any lack of flexible composition here.
I think anyone needing that use-case could perform follow-on work here to add that complexity when it's necessary.
One of the things I liked about py_zipkin when looking for a python tracing library was how it did nearly all I wanted it to (subject to a couple open PRs, hehe), and not much more.
else:
    # sample flag missing; means "Defer" and we're responsible for
    # rolling fresh dice
    is_sampled = _should_sample(sample_rate)
I'd do this externally, especially as sampling like this is only one form of sampling. E.g. later it could be done via HTTP parameters. Having the sample rate code here is unnecessarily coupled.
Well, ZipkinAttrs is a namedtuple, which is a tuple, which is immutable. If a ZipkinAttrs gets out of here with an is_sampled field that just needs to get filled in immediately, a whole new ZipkinAttrs would need to be created.
That just seems a bit byzantine to me...
I'll refer you to my example call-site above in another comment and ask again how that could be better? I'm pretty happy with it.
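The immutability point, demonstrated: since a namedtuple can't be mutated, "filling in" is_sampled after the fact means building a whole new tuple via `_replace()`. The field names below follow py_zipkin's ZipkinAttrs, but this is a standalone illustration:

```python
from collections import namedtuple

# Mirrors py_zipkin's ZipkinAttrs shape for illustration.
ZipkinAttrs = namedtuple(
    "ZipkinAttrs", "trace_id span_id parent_span_id flags is_sampled"
)

attrs = ZipkinAttrs("abc", "def", None, "0", None)
# attrs.is_sampled = True           # would raise AttributeError: immutable
attrs = attrs._replace(is_sampled=True)  # a brand-new ZipkinAttrs instead
```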
I'll leave this point up to @drolando as it is more about maintenance than anything else, and I don't maintain this particular code.
@adriancole thanks for the prompt review, I think validating in the parsers, moving the logging there, and using exceptions for flow control will simplify things nicely! I'm working up a new commit for that now. 👍
thanks for the work here and good luck!
New commit up; I'll reiterate that based on how many edge cases got caught just in this clean-up, the level of testing here is quite appropriate.
It turns out that extracting B3 headers correctly is harder than it sounds. This helper function implements the standard as specified at: https://github.com/openzipkin/b3-propagation
You can pass a dict or dict-like object to the new py_zipkin.request_helpers.extract_zipkin_attrs_from_headers() function, and get back a ZipkinAttrs instance you can trust.
In some cases, a sample_rate float and a use_128bit_trace_id flag are required, so they appear in the function signature as well.
This should take care of single-header extraction, which is part of Issue #98.
This may help with Issue #73 (missing or empty X-B3-Sampled).
I think this is cleanly rebased against black-formatted