[BUG]: Delete collection resource leak (single-node Chroma) #3297
Open: tazarov wants to merge 3 commits into main from trayan-12-13-fix_delete_collection_resource_leak
+78
−14
@@ -78,10 +78,7 @@ def prepare_segments_for_new_collection(

     @override
     def delete_segments(self, collection_id: UUID) -> Sequence[UUID]:
-        # TODO: this should be a pass, delete_collection is expected to delete segments in
-        # distributed
-        segments = self._sysdb.get_segments(collection=collection_id)
-        return [s["id"] for s in segments]
+        return []  # noop

     @trace_method(
         "DistributedSegmentManager.get_endpoint",

tazarov marked this conversation as resolved.

Comment on `return []  # noop`:
@HammadB, I talked with @rohitcpbot and he mentioned that this should be a noop. Is this fine, or should I revert to the older version with the distributed sysdb query?
@rohitcpbot, this is the actual change as we discussed. The rest is just black formatting changes.

Thanks @tazarov.
If possible, leave a note with the following comment or similar:
"""
This call will delete segment related data that is stored locally and cannot be part of the atomic SQL transaction.
It is a NoOp for the distributed sysdb implementation.
Omitting this call will lead to leak of segment related resources.
"""
Can you answer something for me: suppose the process crashes immediately after self._manager.delete_segments(collection_id=existing[0].id).
Then the actual entries in SQL are not deleted, which means the collection is not deleted.
If the user now issues a Get or Query, will the local manager work correctly?
If the fix to make the local manager work in the above failure scenario is non-trivial, we could leave a note here and take it up as a separate task. But it would be good to know the state of the DB with the above change.
The same scenario would have had to be thought through even with your earlier change of doing the local manager delete after the sysdb delete, where the SQL delete could have gone through but the local manager delete did not because of a crash, leading to a leak.
@rohitcpbot, the local manager has two segments for each collection:
- segments (chroma/chromadb/segment/impl/metadata/sqlite.py, line 595 in d50a942)
- segments dir

So here is a diagram to explain the point of failure:
(diagram not included)

The main problem as I see it in the current impl (with possible solutions):
I think the only foolproof way to remove everything is to wrap it all in a single transaction, all the way from the segment level. Then, if the physical dir removal fails, we roll back the whole SQLite transaction.
As a side note, on Windows deleting the segment dir right after closing file handles frequently fails.
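
A minimal sketch of the single-transaction idea, under stated assumptions: `delete_collection_atomically`, the `collections` table, and `demo` are hypothetical names for illustration and are not part of Chroma. It relies on Python's `sqlite3` opening an implicit transaction before a `DELETE` in its default mode, so the row removal is only committed once `shutil.rmtree` has succeeded.

```python
import shutil
import sqlite3
import tempfile
from pathlib import Path


def delete_collection_atomically(
    conn: sqlite3.Connection, segment_dir: Path, cid: str
) -> None:
    """Delete the row and the segment dir; roll back the row if dir removal fails."""
    try:
        # In sqlite3's default mode this DELETE opens an implicit transaction.
        conn.execute("DELETE FROM collections WHERE id = ?", (cid,))
        shutil.rmtree(segment_dir)  # raises OSError if the removal fails
        conn.commit()
    except Exception:
        conn.rollback()  # the row is restored; the files are NOT restored
        raise


def demo() -> tuple:
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE collections (id TEXT PRIMARY KEY)")
    conn.execute("INSERT INTO collections VALUES ('c1')")
    conn.commit()
    count = "SELECT COUNT(*) FROM collections"
    with tempfile.TemporaryDirectory() as tmp:
        seg = Path(tmp) / "c1"
        # Failure path: the dir does not exist yet, so rmtree raises
        # and the SQL delete is rolled back.
        try:
            delete_collection_atomically(conn, seg, "c1")
        except OSError:
            pass
        rows_after_failure = conn.execute(count).fetchone()[0]
        # Success path: the dir exists, so both deletes go through.
        seg.mkdir()
        delete_collection_atomically(conn, seg, "c1")
        rows_after_success = conn.execute(count).fetchone()[0]
        dir_gone = not seg.exists()
    conn.close()
    return rows_after_failure, rows_after_success, dir_gone
```

The rollback only covers the SQL side. A partially removed directory cannot be restored, and a hard crash between `rmtree` and `commit` would still leave the row without its files, so this narrows the failure window rather than closing it.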