This repository has been archived by the owner on Aug 23, 2023. It is now read-only.
#606 probably fixed the lowest hanging fruit.
this may not be such high prio anymore, but we should still at least benchmark these operations and see if/when it becomes worth tackling.
would it make sense to add a metric that takes the time right before the .RLock()/.Lock() calls and again right after them? Then we could record how long the code waited for the lock. Or do you think this would cause too much additional overhead, because those locks are acquired very often?
As a metric it would certainly be very interesting to see how long we're waiting to acquire the index lock
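A minimal sketch of that idea, for illustration only: `TimedRWMutex` and its methods are hypothetical names, not anything in metrictank. It wraps a `sync.RWMutex` and accumulates the time spent waiting in `Lock()`/`RLock()`. Note that it adds two clock reads and two atomic adds to every acquisition, which is exactly the overhead in question on a hot path.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
	"time"
)

// TimedRWMutex is a hypothetical wrapper (not part of metrictank) that
// records how long callers wait to acquire the underlying lock.
type TimedRWMutex struct {
	waitNanos    int64 // total time spent waiting, in nanoseconds
	acquisitions int64 // number of completed Lock/RLock calls
	mu           sync.RWMutex
}

// record adds the time elapsed since start to the running totals.
func (m *TimedRWMutex) record(start time.Time) {
	atomic.AddInt64(&m.waitNanos, int64(time.Since(start)))
	atomic.AddInt64(&m.acquisitions, 1)
}

// Lock acquires the write lock and records the wait time.
func (m *TimedRWMutex) Lock() {
	start := time.Now()
	m.mu.Lock()
	m.record(start)
}

// RLock acquires the read lock and records the wait time.
func (m *TimedRWMutex) RLock() {
	start := time.Now()
	m.mu.RLock()
	m.record(start)
}

func (m *TimedRWMutex) Unlock()  { m.mu.Unlock() }
func (m *TimedRWMutex) RUnlock() { m.mu.RUnlock() }

// TotalWait returns the accumulated wait time across all acquisitions.
func (m *TimedRWMutex) TotalWait() time.Duration {
	return time.Duration(atomic.LoadInt64(&m.waitNanos))
}

// Acquisitions returns how many times the lock has been acquired.
func (m *TimedRWMutex) Acquisitions() int64 {
	return atomic.LoadInt64(&m.acquisitions)
}

func main() {
	var idxLock TimedRWMutex

	// Simulate a long write-lock hold (e.g. a prune) delaying a reader.
	idxLock.Lock()
	go func() {
		time.Sleep(50 * time.Millisecond)
		idxLock.Unlock()
	}()

	idxLock.RLock() // blocks until the writer releases the lock
	idxLock.RUnlock()

	fmt.Printf("acquisitions=%d total wait=%s\n",
		idxLock.Acquisitions(), idxLock.TotalWait())
}
```

In a dev environment the totals could be printed or exported periodically; whether the cost is acceptable in production would have to be measured.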
we already measure how long index operations take. i think lock-hold times are too low-level to be consistently monitored at runtime. I think that's just something to look into on an as-needed basis, probably in a dev environment.
slow pruning is starting to affect some of our largest customers. groupByNode(consolidateBy(metrictank.stats.$environment.$instance.idx.*.prune.latency.{p90,max}.gauge32, 'max'), 8, 'maxSeries') shows >10s prune durations
this blocks ingestion and queries, so we need to optimize it.
when i got some stacktraces, the only active lines were lines 508 and 512 of idx/memory/memory.go (version 41d6eaa), which are the 2 log.Debug statements below
    if len(bNode.Children) > 1 {
        newChildren := make([]string, 0, len(bNode.Children)-1)
        for _, child := range bNode.Children {
            if child != nodes[i] {
                newChildren = append(newChildren, child)
            } else {
                log.Debug("memory-idx: %s removed from children list of branch %s", child, bNode.Path)
            }
        }
        bNode.Children = newChildren
        log.Debug("memory-idx: branch %s has other children. Leaving it in place", bNode.Path)
        // no need to delete any parents as they are needed by this node and its
        // remaining children
        break
    }
so i expect that optimizing those log.Debug statements should already make this several times faster
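One possible way to do that, sketched under assumptions: check a cheap debug flag before calling the logger, so that no arguments are packed and no formatting happens when debug output is off. `debugEnabled`, `debugf` and `removeChild` are hypothetical stand-ins here, not metrictank's logging API or index code.

```go
package main

import "fmt"

// debugEnabled is a hypothetical stand-in for the logger's level check.
var debugEnabled = false

// debugf formats and prints only when debug logging is enabled.
func debugf(format string, args ...interface{}) {
	if debugEnabled {
		fmt.Printf(format+"\n", args...)
	}
}

// removeChild returns children without target. The debug call sits behind
// an explicit flag check, so the hot loop pays nothing for it when debug
// logging is disabled.
func removeChild(children []string, target, branchPath string) []string {
	if len(children) == 0 {
		return children
	}
	kept := make([]string, 0, len(children)-1)
	for _, child := range children {
		if child != target {
			kept = append(kept, child)
		} else if debugEnabled {
			debugf("memory-idx: %s removed from children list of branch %s", child, branchPath)
		}
	}
	return kept
}

func main() {
	children := []string{"cpu", "mem", "disk"}
	fmt.Println(removeChild(children, "mem", "some.branch"))
}
```

Whether this helps as much as hoped depends on what the real log.Debug does when debug is disabled, so it is worth benchmarking before and after.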
certain operations have no upper bound on how long they can lock the index, which may stall ingest and delay responding to http requests:
see also #514