Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

WIP: Add initial support for opentelemetry tracing in maestro server #253

Draft
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

dinhxuanvu
Copy link

No description provided.

@dinhxuanvu dinhxuanvu changed the title Add initial support for opentelemetry tracing in maestro server WIP: Add initial support for opentelemetry tracing in maestro server Feb 12, 2025
@dinhxuanvu
Copy link
Author

/cc @frzifus @simonpasquier

@@ -49,3 +54,7 @@ func GetOperationID(ctx context.Context) string {
}
return ""
}

func operationIDAttribute(id string) attribute.KeyValue {
return attribute.String(string(OpIDKey), id)
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The op ID here is "opID" which is different from the "opid" on CS side. I wonder if this is gonna be an issue here. @frzifus

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Y, i think i would wrap it with a toLower to be on the safe side.

@@ -47,6 +49,10 @@ func (s *apiServer) routes() *mux.Router {
mainRouter := mux.NewRouter()
mainRouter.NotFoundHandler = http.HandlerFunc(api.SendNotFound)

if common.TracingEnabled() {
mainRouter.Use(otelmux.Middleware("serve"))
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lets use proto + method here as spanname.

@@ -0,0 +1,99 @@
package common
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we should place the initialization to a place where it can be reused by the frontend too. Wdyt?

E.g. here: https://github.com/Azure/ARO-HCP/tree/main/internal

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agreed but should it be somewhere else instead of "internal" package? Internal pkg usually means it is not exportable or should not be exportable.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

github.com/openshift-online/ocm-common might be the best option since it's already imported by https://gitlab.cee.redhat.com/service/uhc-clusters-service/ too. And it should be ok to import it in https://github.com/Azure/ARO-HCP/ too.

@@ -0,0 +1,99 @@
package common

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

github.com/openshift-online/ocm-common might be the best option since it's already imported by https://gitlab.cee.redhat.com/service/uhc-clusters-service/ too. And it should be ok to import it in https://github.com/Azure/ARO-HCP/ too.

@@ -17,9 +20,11 @@ const OpIDHeader OperationIDKey = "X-Operation-ID"
func OperationIDMiddleware(handler http.Handler) http.Handler {
return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
ctx := WithOpID(r.Context())

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it looks like the middleware will always generate its own operation ID. Is there a reason why it shouldn't try to read the value from the X-Operation-ID header to facilitate the correlation between CS and Maestro?

I'm also surprised that Maestro doesn't seem to log the operation ID. IIUC it's only reported to the client in the HTTP response when an error occurs:

func handleError(ctx context.Context, w http.ResponseWriter, err *errors.ServiceError) {
log := logger.NewOCMLogger(ctx)
operationID := logger.GetOperationID(ctx)
// If this is a 400 error, its the user's issue, log as info rather than error
if err.HttpCode >= 400 && err.HttpCode <= 499 {
log.Infof(err.Error())
} else {
log.Error(err.Error())
}
writeJSONResponse(w, err.HttpCode, err.AsOpenapiError(operationID))

Comment on lines +91 to +94
if err != nil {
log.Error(fmt.Sprintf("Can't initialize OpenTelemetry trace provider: %v", err))
os.Exit(1)
}

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it seems duplicated code?

Suggested change
if err != nil {
log.Error(fmt.Sprintf("Can't initialize OpenTelemetry trace provider: %v", err))
os.Exit(1)
}

@@ -119,6 +121,16 @@ func NewAPIServer(eventBroadcaster *event.EventBroadcaster) Server {
)(mainHandler)

mainHandler = removeTrailingSlash(mainHandler)
mainHandler = traceAttributeMiddleware(mainHandler)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Like we did for CS, we can add more attributes to the span in a follow-up PR to keep things simple.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants