-
Notifications
You must be signed in to change notification settings - Fork 53
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: Add open telemetry metrics and traces #3404
Conversation
internal/db/db.go
Outdated
@@ -39,7 +39,8 @@ import ( | |||
) | |||
|
|||
var ( | |||
log = corelog.NewLogger("db") | |||
log = corelog.NewLogger("db") | |||
tracer = otel.Tracer("github.com/sourcenetwork/defradb/internal/db") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thought: I think I agree with John's suggestion of extracting this out to a thin package so we can avoid scattering references throughout the code base to the same 3rd party package.
If we ever swap it out we'll probably still need to change the func signatures (I wouldn't put much thought into designing them), but at least everything will be in the same place, and it makes adding/standardising any middleware much easier.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Update: the metrics were all moved to the internal metric
package and put behind a build flag telemetry
.
internal/db/db.go
Outdated
@@ -212,6 +213,9 @@ func (db *DB) AddPolicy( | |||
ctx context.Context, | |||
policy string, | |||
) (client.AddPolicyResult, error) { | |||
ctx, span := tracer.Start(ctx, "AddPolicy") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
suggestion: I do not know about performance/C-GO, but we can probably avoid having to manually specify "AddPolicy"
by using the runtime package and stuff like runtime.CallersFrames instead.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can look into that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Update: I was able to get this working. Thanks for the suggestion!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe I am missing something, but I still see the manual specification.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The preview above is outdated. Should be fixed in the actual code.
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## develop #3404 +/- ##
===========================================
+ Coverage 78.27% 78.34% +0.07%
===========================================
Files 393 395 +2
Lines 36113 36286 +173
===========================================
+ Hits 28266 28426 +160
- Misses 6187 6194 +7
- Partials 1660 1666 +6
Flags with carried forward coverage won't be shown. Click here to find out more.
... and 15 files with indirect coverage changes Continue to review full report in Codecov by Sentry.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
praise: This looks really good considering my natural aversion to it, good job at minimising it's impact on readability.
I have not really reviewed where you have added the tracer (i.e. all the tracer.Start
calls), as I would prefer to not have any by default and instead add them in when we really feel the need to (like debug logs) - I will let someone else review those.
todo: Please add a job to the CI test matrix so that we have one job that executes the tests with the tracer enabled.
@@ -46,7 +49,10 @@ func NewParser() (*parser, error) { | |||
return p, nil | |||
} | |||
|
|||
func (p *parser) BuildRequestAST(request string) (*ast.Document, error) { | |||
func (p *parser) BuildRequestAST(ctx context.Context, request string) (*ast.Document, error) { | |||
_, span := tracer.Start(ctx) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
question: Why are you not assigning the return value to ctx
here? It looks like a bug-in-waiting. Did the compiler/linter complain?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, it was a linter warning because the context is only used for creating the span. The reason for adding the context to the function signature is that it allows the creation of nested spans by using context values.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah okay, thanks for the answer.
I think I have a slight preference to assigning it with a //nolint:foo comment as that would be less risky in case we later use ctx
in this function, but it is a bit ugly so please don't feel any pressure to change it if you aren't certain in your agreement :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I do have a slightly stronger preference to leave as it was and not do the linter suppression. Sorry Keenan
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I ended up reverting the change since more of us preferred the original.
cli/start.go
Outdated
if !cfg.GetBool("no-telemetry") { | ||
err := metric.ConfigureTelemetry(cmd.Context()) | ||
if err != nil { | ||
log.ErrorE("failed to configure telemetry", err) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thought: Perhaps define in cli/errors.go
(thought because its a log).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good stuff, just left some comments.
Please do add the CI test with the tracer enabled (doesn't have to be a matrix) just one workflow is good. You can do something like the test-view
or test-encryption
job in .github/workflows/test-coverage.yml
.
@@ -21,6 +21,7 @@ defradb start [flags] | |||
--max-txn-retries int Specify the maximum number of retries per transaction (default 5) | |||
--no-encryption Skip generating an encryption key. Encryption at rest will be disabled. WARNING: This cannot be undone. | |||
--no-p2p Disable the peer-to-peer network synchronization system | |||
--no-telemetry telemetry Disables telemetry reporting. Telemetry is only enabled in builds that use the telemetry flag. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
question: Is telemetry
next to --no-telemetry
supposed to be there?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
good catch! should be fixed now
.github/workflows/test-coverage.yml
Outdated
- name: Test coverage & save coverage report in an artifact | ||
uses: ./.github/composites/test-coverage-with-artifact | ||
with: | ||
coverage-artifact-name: "coverage_encryption" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
issue: This typo is very problematic as it will fail all reports due to duplicate artifact name (or overwrite which was the old behavior). Please fix the name to something relevant.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nice catch! should be fixed now
internal/metric/otel_test.go
Outdated
|
||
//go:build telemetry | ||
|
||
package metric |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nitpick: Feel free to rename the package to telemetry at this point.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
internal/request/graphql/parser.go
Outdated
@@ -50,7 +50,7 @@ func NewParser() (*parser, error) { | |||
} | |||
|
|||
func (p *parser) BuildRequestAST(ctx context.Context, request string) (*ast.Document, error) { | |||
_, span := tracer.Start(ctx) | |||
ctx, span := tracer.Start(ctx) //nolint:ineffassign,staticcheck |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
todo: Why do you need to do this. 1) ctx
is not used anywhere and 2) it is ineffectual assignment.
I like what you had before.
Same for all other instances
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
He change it because Andy asked him to do so I think. I'm also not a fan of this change.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Didn't seem like a strong preference by him, I commented on the original thread now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, thanks Keenan!
Relevant issue(s)
Resolves #293
Resolves #74
Description
This PR adds OpenTelemetry metrics and tracing. Telemetry is only enabled if the
telemetry
build flag is set.Our default telemetry configuration uses the
http
exporter, but we can expand it togrpc
andconsole
if needed.The exporters are configured through the standardized OpenTelemetry environment variables found here:
https://pkg.go.dev/go.opentelemetry.io/otel/exporters/otlp/otlpmetric/otlpmetrichttp
https://pkg.go.dev/go.opentelemetry.io/otel/exporters/otlp/otlptrace/otlptracehttp
Tasks
How has this been tested?
Manually testing with Jaeger
Specify the platform(s) on which this was tested: