Best practices for collecting metrics and statistics #1542
-
Time series vs. accumulated metrics

Problem definition

Metrics, as Prometheus defines them, are time-series data: they record the value of something at specific points in time and persist that value. An easy-to-understand example is the temperature measured by a sensor. This value changes over time (it can go up and down, so it is best modeled by a gauge) and, most importantly, historical values don't affect future values.

Looking at our current dashboard, most of what we are interested in doesn't naturally fit this definition. For example, the total open position size is an accumulated value that changes with every CFD. Technically it fits the definition of a gauge. However, metrics live in memory and are reset to 0 on startup, so the accumulated value is lost on every restart.

Possible solutions
We can either query the database from within the
At least for counters, we can use the Unfortunately, this does not work for
This should work but feels hacky. Metrics are in-memory by design in Prometheus, so there should be a general way of handling these resets.
These two counters could use
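For counters specifically, Prometheus already treats a drop in value as a restart when it evaluates `rate()` or `increase()`. The sketch below illustrates that idea only; it is not Prometheus's actual implementation (which also extrapolates over the query window), and the function name and sample values are made up for illustration.

```python
def increase_with_resets(samples):
    """Total increase of a counter across samples, tolerating resets to 0.

    Hypothetical helper: a simplified illustration of reset-aware counting.
    """
    total = 0.0
    for prev, curr in zip(samples, samples[1:]):
        if curr >= prev:
            # Normal case: the counter only moved forward.
            total += curr - prev
        else:
            # The value dropped, so assume the process restarted and the
            # counter began again from 0; everything seen since is new.
            total += curr
    return total


# Made-up scrape values; the drop from 7 to 2 stands for a restart.
samples = [0, 3, 7, 2, 5]
print(increase_with_resets(samples))  # 3 + 4 + 2 + 3 = 12.0
```

This only works because a counter never decreases on its own. For a value that can legitimately go down, a drop is ambiguous, so the same trick cannot tell a restart from a real decrease.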
-
After watching https://www.youtube.com/watch?v=67Ulrq6DxwA, it became clear to me that this isn't really a viable way forward. The speaker stressed repeatedly that metrics are inaccurate by design and that anything that needs accuracy should use logs (or something else). In general, it seems that Prometheus by itself is not designed for reporting on aspects of the domain such as the closed position size. I wonder if there is something better that we can plug into Grafana? Otherwise, a daemon that separately queries a shared database (like Postgres) would be an option, and it would also let us reduce the load on the actual system.
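A rough sketch of what such a separate reporter could compute, assuming a hypothetical `cfds` table with `quantity` and `state` columns (the real schema is not shown in this thread). It uses Python's built-in `sqlite3` as a stand-in for a shared Postgres instance so that the example is self-contained.

```python
import sqlite3


def position_sizes(conn):
    """Return total open and closed position size, computed from the database.

    Because the totals are derived from persisted rows, they are exact and
    survive restarts of the reporting process.
    """
    rows = conn.execute(
        "SELECT state, SUM(quantity) FROM cfds GROUP BY state"
    ).fetchall()
    totals = {"open": 0, "closed": 0}
    totals.update(dict(rows))
    return totals


# Hypothetical schema and data, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE cfds (id INTEGER PRIMARY KEY, quantity INTEGER, state TEXT)"
)
conn.executemany(
    "INSERT INTO cfds (quantity, state) VALUES (?, ?)",
    [(100, "open"), (250, "open"), (400, "closed")],
)

print(position_sizes(conn))  # {'open': 350, 'closed': 400}
```

A daemon would run a query like this on a schedule, or a SQL-capable data source in Grafana could issue it directly, which keeps the reporting load off the main application.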
-
This discussion is meant to serve as a place where we can collect ideas around how to best collect metrics. I'll create a separate thread for each problem to not clutter things too much. I believe the current state of affairs is worth improving.
I think it will be beneficial for overall maintenance if we can push as much as possible into Prometheus instead of requiring another daemon to collect those metrics.