Skip to content

Commit

Permalink
Track Ecto pool pressure via AppSignal (#3443)
Browse files Browse the repository at this point in the history
  • Loading branch information
thbar committed Sep 12, 2023
1 parent 2ccb1ed commit 5a0a394
Show file tree
Hide file tree
Showing 2 changed files with 72 additions and 0 deletions.
1 change: 1 addition & 0 deletions apps/transport/lib/transport/application.ex
Original file line number Diff line number Diff line change
Expand Up @@ -50,6 +50,7 @@ defmodule Transport.Application do
:ok = Transport.Jobs.ObanLogger.setup()

:ok = Transport.Telemetry.setup()
:ok = Transport.EctoTelemetry.setup()

opts = [strategy: :one_for_one, name: Transport.Supervisor]
Supervisor.start_link(children, opts)
Expand Down
71 changes: 71 additions & 0 deletions apps/transport/lib/transport/ecto_telemetry.ex
Original file line number Diff line number Diff line change
@@ -0,0 +1,71 @@
defmodule Transport.EctoTelemetry do
require Logger

@moduledoc """
A module receiving Ecto events and forwarding them (currently) to AppSignal, so that we can
chart the pool occupancy.
Reference: https://hexdocs.pm/ecto/Ecto.Repo.html#module-telemetry-events
Inspiration: https://github.com/appsignal/appsignal-elixir/issues/318#issuecomment-841156779
A few notes extracted from the doc (with extra comments):
* :idle_time - the time the connection spent waiting before being checked out for the query.
the higher the better ; if this gets low (close to 0), the pool is over-used (not good).
* :queue_time - the time spent waiting to check out a database connection.
the lower the better. if this gets too high, the pool is over-used (not good).
* :query_time - the time spent executing the query.
We do not track the following metrics for now, because they are less useful for pool optimization:
* :decode_time - the time spent decoding the data received from the database
* :total_time - the sum of (queue_time, query_time, and decode_time)️
"""

def handle_event([:db, :repo, :query], measurements, _metadata, _config) do
#
# NOTE: at time of writing, at AppSignal, "Custom metrics do not count towards your plan.
# Only requests and logging. Custom metrics are included.".
#
# If we need to reduce the volume of metrics (for billing or ops reasons), it appears that
# Oban emits roughly 5 calls every 500ms in development at least (setting WORKER=0) removes those events.
# If this is a problem, we can probably filter a part of them via metadata[:options][:oban_conf] which
# appears to be there, but for now I believe this gives useful information too.
#

case measurements do
%{queue_time: queue_time} ->
Appsignal.add_distribution_value("ecto.queue_time", System.convert_time_unit(queue_time, :native, :millisecond))

_ ->
nil
end

case measurements do
%{idle_time: idle_time} ->
Appsignal.add_distribution_value("ecto.idle_time", System.convert_time_unit(idle_time, :native, :millisecond))

_ ->
nil
end

case measurements do
%{query_time: query_time} ->
Appsignal.add_distribution_value("ecto.query_time", System.convert_time_unit(query_time, :native, :millisecond))

_ ->
nil
end
end

def setup do
Logger.info("Setting up telemetry for AppSignal + Ecto")

:telemetry.attach(
"transport-ecto",
# NOTE: the first two params are I believe mapped to `DB.Repo`
[:db, :repo, :query],
&Transport.EctoTelemetry.handle_event/4,
nil
)
end
end

0 comments on commit 5a0a394

Please sign in to comment.