index lastUpdated timestamps are only stored on leaf nodes #714
Comments
Alternatively, at read time, could we not omit all branches that have no children (or only other empty branches)?
This could be considerably more expensive to do at read time. Think about something with ten nodes, each node with cardinality ten. Just thinking about doing a DFS on the initial

Now, with regard to the update path: I did implement a tree-based memory idx a couple of months back to try to reduce the memory footprint. The performance was comparable (if I recall correctly). I abandoned it because I couldn't get the memory improvement, since the entire series name was stored in the archive object anyway :(
What's a DFS? Note that this issue can be worked around by finding a pattern that includes all required nodes.
I just meant that doing a depth-first search (DFS) to find a leaf within the time boundaries could be very expensive for a large enough index.
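To make the read-time cost concrete, here is a minimal sketch of such a depth-first search. It assumes a hypothetical `node` type where only leaves carry `lastUpdated`; the names `node`, `hasFreshLeaf`, and `build` are illustrative and are not taken from the actual index code. The tree is kept small (depth 4, fanout 10) so that it runs quickly.

```go
package main

import "fmt"

// node is a hypothetical tree node; lastUpdated is only meaningful on leaves.
type node struct {
	children    map[string]*node
	lastUpdated int64
}

// hasFreshLeaf reports whether any leaf below n was updated at or after from.
// visited counts how many nodes the search had to touch.
func hasFreshLeaf(n *node, from int64, visited *int) bool {
	*visited++
	if len(n.children) == 0 {
		return n.lastUpdated >= from
	}
	for _, c := range n.children {
		if hasFreshLeaf(c, from, visited) {
			return true // stop early once one fresh leaf is found
		}
	}
	return false
}

// build creates a full tree of the given depth and fanout,
// with every leaf stamped with ts.
func build(depth, fanout int, ts int64) *node {
	n := &node{children: map[string]*node{}}
	if depth == 0 {
		n.lastUpdated = ts
		return n
	}
	for i := 0; i < fanout; i++ {
		n.children[fmt.Sprint(i)] = build(depth-1, fanout, ts)
	}
	return n
}

func main() {
	stale := build(4, 10, 100) // 10^4 leaves, all last updated at t=100
	visited := 0
	fmt.Println(hasFreshLeaf(stale, 200, &visited), visited) // false 11111
}
```

In this sketch a fully stale subtree is the worst case: every node has to be visited before the branch can be left out, whereas a subtree with a fresh leaf can stop at the first hit.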
So are you saying, basically, that if a subtree hasn't received an update in a long time, we will put too much effort into going through the entire subtree only to figure out at the end that it can be left out? Either way, it seems the tradeoff basically comes down to:
A typical workload has a much higher ingest rate than render request rate, so it seems to me we should do it at read time. Even with very high
I think this also affects pruning.
- this issue was discovered in discussions of issue grafana#714
- this also fixes grafana#797
When searching the index, we allow users to pass a "from" param. Only series that have been updated since the from timestamp are returned.
However, we only filter leaf nodes and not branch nodes.
So if you have a series
a.b.c.d.e
that has not been updated in 24 hours, you will still get results when querying for everything updated in the last hour with a query expression that matches a branch node, e.g.
metrics/find?from=-1h&query=a.b.c.*
will return "a.b.c.d", but
metrics/find?from=-1h&query=a.b.c.d.*
won't return anything, as a.b.c.d.e has not been updated for 24 hours.
To resolve this we will need to keep a lastUpdated attribute on branch nodes. This would not be hard to implement, but having to walk down the tree on every update (which happens every time a metric is received) could be quite expensive.
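As a rough illustration of keeping lastUpdated on branch nodes, the sketch below stamps every node on the path from the root to the leaf on each update, then filters branches at query time. It is a sketch under stated assumptions, not the actual index implementation: the `treeNode` type and the `update` and `find` methods are hypothetical, and `find` only handles a pattern of the form `prefix.*`.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// treeNode is a hypothetical index node. lastUpdated holds the newest
// update timestamp (unix seconds) seen anywhere in this node's subtree.
type treeNode struct {
	children    map[string]*treeNode
	lastUpdated int64
}

func newTreeNode() *treeNode {
	return &treeNode{children: make(map[string]*treeNode)}
}

// update records a point for a dotted series name at time ts and stamps
// every node on the path, which is the per-update walk discussed above.
func (root *treeNode) update(name string, ts int64) {
	cur := root
	for _, part := range strings.Split(name, ".") {
		child, ok := cur.children[part]
		if !ok {
			child = newTreeNode()
			cur.children[part] = child
		}
		if ts > child.lastUpdated {
			child.lastUpdated = ts
		}
		cur = child
	}
}

// find emulates query=prefix.* with a from filter: it returns the direct
// children of prefix, leaf or branch, that were updated at or after from.
func (root *treeNode) find(prefix string, from int64) []string {
	cur := root
	for _, part := range strings.Split(prefix, ".") {
		next, ok := cur.children[part]
		if !ok {
			return nil
		}
		cur = next
	}
	var out []string
	for name, child := range cur.children {
		if child.lastUpdated >= from {
			out = append(out, prefix+"."+name)
		}
	}
	sort.Strings(out)
	return out
}

func main() {
	root := newTreeNode()
	now := int64(100000)
	root.update("a.b.c.d.e", now-24*3600) // stale: 24 hours old
	root.update("a.b.c.x.y", now-60)      // fresh: one minute old
	fmt.Println(root.find("a.b.c", now-3600)) // [a.b.c.x]
}
```

With the branch timestamps in place, the stale branch a.b.c.d is no longer returned for the one-hour window. The cost is one timestamp comparison per path segment on every update, which is the overhead the issue is concerned about.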