Can it send job level metrics only? #521

Open
jduan-highnote opened this issue Nov 24, 2022 · 5 comments

Comments

@jduan-highnote

When the collect-job-metrics flag is set to true, both job-level and step-level metrics are sent to Datadog. This doesn't work well for a workflow with a lot of jobs and steps. For one large workflow we have, not every job-level and step-level metric reaches Datadog, I think because there's a limit.

Can this flag be split into two flags?

  • collect-job-metrics (collect job-level metrics only)
  • collect-step-metrics (collect step-level metrics only)

That way, people can choose what they want. Thanks!
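
To make the proposal concrete, here is a minimal sketch of how two independent flags could select which metrics are sent. It is not the action's actual code: the `Metric` and `Flags` types, the `level` field, and the `selectMetrics` function are hypothetical names introduced only for illustration.

```typescript
// Hypothetical types for illustration; the real action may model metrics differently.
type MetricLevel = "workflow" | "job" | "step";

interface Metric {
  name: string;
  level: MetricLevel;
  value: number;
}

interface Flags {
  collectJobMetrics: boolean;
  collectStepMetrics: boolean;
}

// Keep job-level and step-level metrics only when their flag is enabled.
// Metrics at any other level pass through unchanged.
function selectMetrics(metrics: Metric[], flags: Flags): Metric[] {
  return metrics.filter((m) => {
    if (m.level === "job") return flags.collectJobMetrics;
    if (m.level === "step") return flags.collectStepMetrics;
    return true;
  });
}
```

With `collectJobMetrics: true` and `collectStepMetrics: false`, a large workflow would send one set of metrics per job and nothing per step.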

@jduan-highnote
Author

BTW, I've also seen this error: Error: HTTP-Code: 413 Message: {"errors":["Payload too large"]}. It could be avoided with a more granular configuration of which metrics to send.

@int128
Owner

int128 commented Nov 26, 2022

It seems the Datadog metrics API has a 10MB limit.
open-telemetry/opentelemetry-collector-contrib#1925

I will add a collect-step-metrics flag. It should also help reduce the custom metrics cost in Datadog.

I think it is possible to send the metrics in multiple requests. I will try that as well.
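
Sending in multiple requests could look roughly like the sketch below: split the list into fixed-size batches and submit them one at a time. The `chunk` and `sendInBatches` helpers and the `send` callback are assumptions made for illustration, not functions of the action or of the Datadog client.

```typescript
// Split a list into consecutive batches of at most `size` items.
function chunk<T>(items: T[], size: number): T[][] {
  if (!Number.isInteger(size) || size <= 0) {
    throw new RangeError("size must be a positive integer");
  }
  const batches: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}

// Submit each batch sequentially through a caller-supplied `send` function
// (hypothetical; in practice it would wrap the metrics API call).
// Returns the number of requests made.
async function sendInBatches<T>(
  items: T[],
  size: number,
  send: (batch: T[]) => Promise<void>,
): Promise<number> {
  const batches = chunk(items, size);
  for (const batch of batches) {
    await send(batch);
  }
  return batches.length;
}
```

Sequential submission keeps the request rate low and stops at the first failing batch; whether that is preferable to parallel requests depends on how the action should handle partial failures.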

@int128
Owner

int128 commented Nov 26, 2022

According to the API doc https://datadoghq.dev/datadog-api-client-typescript/classes/v1.MetricsApi.html#submitMetrics, the maximum payload size is 3.2MB.
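
Given a byte limit rather than a count limit, batches could be sized by their serialized length. The following is a rough sketch under several assumptions: 3.2MB is read as 3,200,000 bytes, the size is estimated from the JSON of the series array alone (ignoring the surrounding request envelope and any compression), and `splitByPayloadSize` is a hypothetical helper, not part of any library.

```typescript
// Assumption: 3.2MB interpreted as decimal megabytes.
const MAX_PAYLOAD_BYTES = 3_200_000;

// UTF-8 byte length of a value's JSON representation.
function jsonByteLength(value: unknown): number {
  return new TextEncoder().encode(JSON.stringify(value)).length;
}

// Greedily pack items into batches whose JSON array stays within maxBytes.
// A single item larger than maxBytes still gets its own (oversized) batch.
function splitByPayloadSize<T>(series: T[], maxBytes: number): T[][] {
  const batches: T[][] = [];
  let current: T[] = [];
  let currentBytes = 2; // the enclosing "[" and "]"
  for (const item of series) {
    const itemBytes = jsonByteLength(item);
    const separator = current.length > 0 ? 1 : 0; // comma between items
    if (current.length > 0 && currentBytes + separator + itemBytes > maxBytes) {
      batches.push(current);
      current = [];
      currentBytes = 2;
    }
    currentBytes += itemBytes + (current.length > 0 ? 1 : 0);
    current.push(item);
  }
  if (current.length > 0) batches.push(current);
  return batches;
}
```

Calling `splitByPayloadSize(series, MAX_PAYLOAD_BYTES)` with some safety margin subtracted from the limit would leave room for the parts of the request body this estimate does not count.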

@jduan-highnote
Author

Thanks for fixing this so quickly!

@jduan-highnote
Author

@int128 quick follow-up: I have a very large workflow with many jobs (the number of jobs is actually dynamic). It seems that job-level metrics are capped at 399? I see this in the log: Sending 399 metrics to Datadog. Because of this limit, some of the job metrics aren't sent. Can all the job metrics be sent in batches?
