do not allow indefinite write lock during deletes #1897
Conversation
for _, node := range found {
	tl.Wait()
	lockStart := time.Now()
	bc = m.Lock()
Maybe I'm misunderstanding something here, but it seems like the write lock is being acquired and released in each iteration. Perhaps it should only be released when it has used up its time slice?
Repeatedly acquiring write locks is very expensive in the memory index.
The intent here is to hold write locks for as short a period as possible. Additionally, the duration reported to the TimeLimiter for how long the lock has been held includes the time spent waiting for all existing RLocks to be released. In general, threads will be blocked for MAX(RLock duration) + Lock duration.
If all RLocks are held for a very short amount of time, then this current approach will work well. But as @shanson7 points out, if the RLocks are held for long periods, then trying to acquire lots of Locks, no matter how fast each one is, will result in low throughput, because threads will spend all their time blocked.
I think we need both approaches here. We still need to rate limit how long we are blocking reads for, but we should also perform deletes in batches to reduce the number of locks needed. We don't want a single write lock to be held for the full 200ms, but holding it for 5ms or less would be fine. 5ms is a really long time, and most delete operations will complete within it.
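For illustration, a rough sketch of that hybrid, not the code in this PR: `found`, `del`, `maxHold` and the `limWait`/`limAdd` callbacks are placeholders standing in for the real index types and the TimeLimiter calls.

```go
package memory

import (
	"sync"
	"time"
)

// deleteInSlices is a sketch of the hybrid approach described above: each
// write lock is held for at most maxHold (e.g. 5ms), and limWait/limAdd
// stand in for the TimeLimiter calls that bound how much total time is
// spent blocking readers. found and del are simplified placeholders.
func deleteInSlices(m *sync.RWMutex, found []string, del func(string),
	maxHold time.Duration, limWait func(), limAdd func(time.Duration)) {

	i := 0
	for i < len(found) {
		limWait() // block if we have recently spent too much time holding the lock
		lockStart := time.Now()
		m.Lock()
		// delete as many nodes as fit into one bounded time slice
		for i < len(found) && time.Since(lockStart) < maxHold {
			del(found[i])
			i++
		}
		held := time.Since(lockStart)
		m.Unlock()
		limAdd(held) // account for how long readers were blocked
	}
}
```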
I am still experimenting with deleting by leaf. I don't think this current approach solves the entire problem set. If someone tries to delete toplevel.* when it has a lot of child nodes, it will still lock up the index, since the current method would recurse all the way down the tree.
That is correct, deleting * would be even worse.
We definitely need the delete call to find all leaf nodes to be deleted while only holding read locks. Then delete these in batches with write locks.
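Something along these lines, as a sketch only (`findLeaves`, `delLeaf`, `batchSize` and the limiter callbacks are placeholder names, not the actual index API):

```go
package memory

import (
	"sync"
	"time"
)

// deleteLeavesBatched is a sketch of the two-phase approach described above:
// first walk the tree under a read lock to collect every matching leaf, then
// delete them in fixed-size batches, each under a short write lock.
// findLeaves, delLeaf, batchSize, limWait and limAdd are placeholders.
func deleteLeavesBatched(m *sync.RWMutex, pattern string,
	findLeaves func(pattern string) []string, delLeaf func(id string),
	batchSize int, limWait func(), limAdd func(time.Duration)) {

	// phase 1: resolve the pattern to concrete leaf ids while readers can
	// still be served; only a read lock is held here
	m.RLock()
	leaves := findLeaves(pattern)
	m.RUnlock()

	// phase 2: delete in small batches so no single write lock is held long
	for start := 0; start < len(leaves); start += batchSize {
		end := start + batchSize
		if end > len(leaves) {
			end = len(leaves)
		}
		limWait() // throttle based on how long we have blocked readers so far
		lockStart := time.Now()
		m.Lock()
		for _, id := range leaves[start:end] {
			delLeaf(id)
		}
		held := time.Since(lockStart)
		m.Unlock()
		limAdd(held)
	}
}
```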
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
Changes the behavior of delete operations to prevent them from completely locking up the index for extended periods of time.
I initially tried getting all the leaves and deleting them one by one, but this proved to be far too slow. I have removed that code, but you can still see it in the commits.
Will this be enough to alleviate the issues of locking the index for too long though?
Fixes: #1885