-
Notifications
You must be signed in to change notification settings - Fork 24
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
WIP: Add initial support for opentelemetry tracing in maestro server #253
base: main
Are you sure you want to change the base?
Conversation
Signed-off-by: Vu Dinh <[email protected]>
Signed-off-by: Vu Dinh <[email protected]>
@@ -49,3 +54,7 @@ func GetOperationID(ctx context.Context) string { | |||
} | |||
return "" | |||
} | |||
|
|||
func operationIDAttribute(id string) attribute.KeyValue { | |||
return attribute.String(string(OpIDKey), id) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The op ID here is "opID" which is different from the "opid" on CS side. I wonder if this is gonna be an issue here. @frzifus
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Y, i think i would wrap it with a toLower
to be on the safe side.
@@ -47,6 +49,10 @@ func (s *apiServer) routes() *mux.Router { | |||
mainRouter := mux.NewRouter() | |||
mainRouter.NotFoundHandler = http.HandlerFunc(api.SendNotFound) | |||
|
|||
if common.TracingEnabled() { | |||
mainRouter.Use(otelmux.Middleware("serve")) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Lets use proto + method here as spanname.
@@ -0,0 +1,99 @@ | |||
package common |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe we should place the initialization to a place where it can be reused by the frontend too. Wdyt?
E.g. here: https://github.com/Azure/ARO-HCP/tree/main/internal
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agreed but should it be somewhere else instead of "internal" package? Internal pkg usually means it is not exportable or should not be exportable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
github.com/openshift-online/ocm-common might be the best option since it's already imported by https://gitlab.cee.redhat.com/service/uhc-clusters-service/ too. And it should be ok to import it in https://github.com/Azure/ARO-HCP/ too.
@@ -0,0 +1,99 @@ | |||
package common |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
github.com/openshift-online/ocm-common might be the best option since it's already imported by https://gitlab.cee.redhat.com/service/uhc-clusters-service/ too. And it should be ok to import it in https://github.com/Azure/ARO-HCP/ too.
@@ -17,9 +20,11 @@ const OpIDHeader OperationIDKey = "X-Operation-ID" | |||
func OperationIDMiddleware(handler http.Handler) http.Handler { | |||
return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) { | |||
ctx := WithOpID(r.Context()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it looks like the middleware will always generate its own operation ID. Is there a reason why it shouldn't try to read the value from the X-Operation-ID
header to facilitate the correlation between CS and Maestro?
I'm also surprised that Maestro doesn't seem to log the operation ID. IIUC it's only reported to the client in the HTTP response when an error occurs:
maestro/pkg/handlers/framework.go
Lines 32 to 41 in adb7e90
func handleError(ctx context.Context, w http.ResponseWriter, err *errors.ServiceError) { | |
log := logger.NewOCMLogger(ctx) | |
operationID := logger.GetOperationID(ctx) | |
// If this is a 400 error, its the user's issue, log as info rather than error | |
if err.HttpCode >= 400 && err.HttpCode <= 499 { | |
log.Infof(err.Error()) | |
} else { | |
log.Error(err.Error()) | |
} | |
writeJSONResponse(w, err.HttpCode, err.AsOpenapiError(operationID)) |
if err != nil { | ||
log.Error(fmt.Sprintf("Can't initialize OpenTelemetry trace provider: %v", err)) | ||
os.Exit(1) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it seems duplicated code?
if err != nil { | |
log.Error(fmt.Sprintf("Can't initialize OpenTelemetry trace provider: %v", err)) | |
os.Exit(1) | |
} |
@@ -119,6 +121,16 @@ func NewAPIServer(eventBroadcaster *event.EventBroadcaster) Server { | |||
)(mainHandler) | |||
|
|||
mainHandler = removeTrailingSlash(mainHandler) | |||
mainHandler = traceAttributeMiddleware(mainHandler) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Like we did for CS, we can add more attributes to the span in a follow-up PR to keep things simple.
No description provided.