
Track idle_time and queue_time (Ecto pool) in AppSignal #3443

Closed
thbar opened this issue Sep 4, 2023 · 8 comments
Labels
ops Server and production management


thbar commented Sep 4, 2023

In:

I'm looking to track two metrics:

  • :idle_time - the time the connection spent waiting before being checked out for the query
  • :queue_time - the time spent waiting to check out a database connection

to help us diagnose how our Ecto pool is being used.
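For reference, these measurements come in through Ecto's standard query telemetry event and could be consumed roughly as follows (a minimal sketch: [:my_app, :repo, :query] is Ecto's default event name and the handler id is illustrative):

:telemetry.attach(
  "ecto-pool-metrics",
  [:my_app, :repo, :query],
  fn _event, measurements, _metadata, _config ->
    # Both values are reported in native time units and may be absent on some queries.
    pool_timings = Map.take(measurements, [:idle_time, :queue_time])
    # forward pool_timings to a metrics backend here
    IO.inspect(pool_timings, label: "pool timings")
  end,
  nil
)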

I've asked AppSignal how to do this, and whether it's doable at all.

thbar added the ops (Server and production management) label on Sep 4, 2023
thbar self-assigned this on Sep 4, 2023

thbar commented Sep 5, 2023

AppSignal's reply:

We currently do not extract those metrics out of the box but you can use our custom metrics feature to report those metrics to AppSignal and create a graph to keep an eye on them.
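That could look roughly like the following handler, reporting both timings as AppSignal distribution values (a sketch only; the module name and metric keys are made up, and the conversion to fractional milliseconds is an assumption about the desired unit, not the code that eventually shipped):

defmodule Transport.EctoPoolMetrics do
  # Sketch: report queue_time / idle_time from an Ecto query event as
  # AppSignal custom metrics. Keys absent from the measurements are skipped.
  def handle_event(_event, measurements, _metadata, _config) do
    for key <- [:queue_time, :idle_time], native = measurements[key] do
      # Ecto reports durations in native time units; convert before sending.
      ms = System.convert_time_unit(native, :native, :microsecond) / 1000
      Appsignal.add_distribution_value("ecto_#{key}", ms)
    end

    :ok
  end
end

A graph can then be built on those metric keys from an AppSignal dashboard, as suggested in the reply above.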


thbar commented Sep 5, 2023

I'm setting this aside for later; the current issue is more about memory.


thbar commented Sep 5, 2023


thbar commented Sep 6, 2023

See also these interesting discussions:


thbar commented Sep 12, 2023

While working on integrating this, I started by subscribing to 100% of the events on the repo and watching the logs below, to get a sense of the overall volume (a sketch of such a catch-all subscription follows the logs).

I've grouped them into blocks of temporally close events and added the timestamps, which makes them easier to read:

[info] 2023-09-12T06:54:22.972415Z telemetry %{idle_time: 1370451066, query_time: 334867, queue_time: 57666, total_time: 392533}
[info] 2023-09-12T06:54:22.973634Z telemetry %{decode_time: 6082, query_time: 391824, total_time: 397906}
[info] 2023-09-12T06:54:22.974263Z telemetry %{decode_time: 4417, query_time: 267993, total_time: 272410}
[info] 2023-09-12T06:54:22.975276Z telemetry %{decode_time: 5292, query_time: 318700, total_time: 323992}
[info] 2023-09-12T06:54:22.975747Z telemetry %{query_time: 260243, total_time: 260243}

[info] 2023-09-12T06:54:23.978205Z telemetry %{idle_time: 1376373049, query_time: 246536, queue_time: 32291, total_time: 278827}
[info] 2023-09-12T06:54:23.978726Z telemetry %{decode_time: 2791, query_time: 218329, total_time: 221120}
[info] 2023-09-12T06:54:23.979044Z telemetry %{decode_time: 2792, query_time: 160204, total_time: 162996}
[info] 2023-09-12T06:54:23.979620Z telemetry %{decode_time: 2334, query_time: 199036, total_time: 201370}
[info] 2023-09-12T06:54:23.979890Z telemetry %{query_time: 168746, total_time: 168746}

[info] 2023-09-12T06:54:24.981411Z telemetry %{idle_time: 1379373476, query_time: 207870, queue_time: 100748, total_time: 308618}
[info] 2023-09-12T06:54:24.981989Z telemetry %{decode_time: 3542, query_time: 229994, total_time: 233536}
[info] 2023-09-12T06:54:24.982313Z telemetry %{decode_time: 2542, query_time: 148412, total_time: 150954}
[info] 2023-09-12T06:54:24.982818Z telemetry %{decode_time: 2125, query_time: 170912, total_time: 173037}
[info] 2023-09-12T06:54:24.983086Z telemetry %{query_time: 161496, total_time: 161496}

[info] 2023-09-12T06:54:25.984262Z telemetry %{idle_time: 1382449362, query_time: 208703, queue_time: 39874, total_time: 248577}
[info] 2023-09-12T06:54:25.984948Z telemetry %{decode_time: 3959, query_time: 330367, total_time: 334326}
[info] 2023-09-12T06:54:25.985545Z telemetry %{decode_time: 3292, query_time: 428323, total_time: 431615}
[info] 2023-09-12T06:54:25.986302Z telemetry %{decode_time: 3334, query_time: 218370, total_time: 221704}
[info] 2023-09-12T06:54:25.986566Z telemetry %{query_time: 162496, total_time: 162496}
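Log lines of this shape could come from a catch-all subscription along these lines (a sketch; the handler id is made up and the event name assumes Ecto's default telemetry prefix):

require Logger

:telemetry.attach(
  "log-all-repo-query-events",
  [:my_app, :repo, :query],
  fn _event, measurements, _metadata, _config ->
    # Dump the raw measurements of every repo query to gauge overall volume.
    Logger.info("telemetry #{inspect(measurements)}")
  end,
  nil
)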

I'll refine this to figure out which events to take into account.


thbar commented Sep 12, 2023

I dug around a bit and it really looks like these events, which show up twice a second in batches of 5, are tied to Oban. I'll look into filtering them out; I've asked about it on the related Slack:

I am adding Ecto telemetry to our app (using Oban), and noticed that every 500ms or so, there are ~5 queries happening even if the app does nothing, which go away if I disable Oban (presumably some plugin work or just the general behaviour of Oban?). I am looking into filtering those specific events to reduce the volume of metrics generated, without using a separate pool.
Did anyone figure out how to filter those? Thanks !
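One possible way to filter them, shown purely as an illustration (it relies on the :source key that Ecto puts in the query metadata, so it would miss Oban queries issued as raw SQL, where source is nil; the module name is hypothetical):

defmodule Transport.FilteredEctoHandler do
  # Sketch: skip query events that target Oban's own tables before reporting.
  @oban_sources ~w(oban_jobs oban_peers)

  def handle_event(_event, measurements, metadata, _config) do
    unless metadata[:source] in @oban_sources do
      report(measurements)
    end
  end

  # Placeholder for the actual reporting (e.g. the AppSignal calls sketched earlier).
  defp report(_measurements), do: :ok
end

The other obvious route, a dedicated Oban repo with its own telemetry prefix, is exactly what the Slack question above tries to avoid.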


thbar commented Sep 12, 2023

I asked AppSignal how custom metrics are billed, and the answer is great:

Custom metrics do not count towards your plan. Only requests and logging. Custom metrics are included.

github-merge-queue bot pushed a commit that referenced this issue Sep 18, 2023
* Track Ecto pool pressure via AppSignal (#3443)

* Improve documentation

* Update doc

thbar commented Sep 21, 2023

It's now being tracked. The numbers look small to me, though, which might be a units issue, but we'll consider this good enough for now.
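On the units question: Ecto's telemetry durations are expressed in native time units, so one quick sanity check is the native-to-millisecond ratio on the running node (illustrative; the exact ratio depends on the host):

# How many native time units make up one millisecond on this node; values
# reported without conversion would be off by this factor.
System.convert_time_unit(1, :millisecond, :native)
#=> 1_000_000 on a typical Linux host, where the native unit is the nanosecond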

thbar closed this as completed on Sep 21, 2023