mfs: clean cache on sync #751
Conversation
Force-pushed 320bf6d to 19419fb
Codecov Report. Attention: Patch coverage is
@@ Coverage Diff @@
## main #751 +/- ##
==========================================
+ Coverage 60.37% 60.45% +0.08%
==========================================
Files 245 245
Lines 31117 31101 -16
==========================================
+ Hits 18786 18802 +16
+ Misses 10658 10627 -31
+ Partials 1673 1672 -1
LGTM. One question.
This helps/solves a known problem, so this change does have positive value. Can we run this in cluster or somewhere to gain more confidence that there are not any negative behaviors with this change?
There have been multiple reports of people complaining that MFS slows down to a crawl and triggers OOM events when adding many items to a folder.

One underlying issue is that the MFS Directory has a cache which can grow unbounded. All items added to a directory are cached here even though they are already written to the underlying Unixfs folder on mfs writes.

The cache serves Child() requests only. The purpose is that a Directory can then give updated information about its children without being "up to date" on its own parent: when a new write happens to a folder, the parent folder needs to be notified about the new unixfs node of its child. With this cache, that notification can be delayed until a Flush() is requested, one of the children triggers a bubbling notification, or the parent folder is traversed.

When the parent folder is traversed, it will write the cached entries of its children to disk, including that of the current folder. In order to write the current folder, it triggers a GetNode() on it, which in turn writes all the cached nodes in the current folder to disk. This cascades. In principle, leaves do not have children and should not need to be rewritten to unixfs, but without caching the TestConcurrentWrites tests start failing. Caching provides a way for concurrent writes to access a single reference to a unixfs node, somehow making things work.

Apart from cosmetic changes, this commit allows resetting the cache when it has been synced to disk. Other approaches have been tested, like fully removing the cache, or resetting it when it reaches a certain size. All were unsatisfactory in that MFS breaks in one way or another (i.e. when setting maxCacheSize to 1). Setting the size to ~50 allows tests to pass, but only because no tests are testing with 50+1 items.

This change does not break any tests, but after what I've seen, I can assume that concurrent writes during a Flush event probably result in broken expectations, and I wouldn't be surprised if there is a way to concatenate writes, lookups and flushes on references to multiple folders in the tree and trigger errors where things just work now.

The main problem remains: the current setup essentially keeps all of mfs in memory. This commit could alleviate the fact that reading the directory object triggers as many unixfs.AddChild() calls as there are nodes in the directory. After this, AddChild() would be triggered only on nodes cached since the last time.

I don't find a reliable way of fixing MFS, and I have discovered multiple gotchas in what was going to be a quick fix.
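A minimal Go sketch of the approach described above, using hypothetical types rather than boxo's actual internals (`dir` and its `addChild` field are stand-ins for mfs.Directory and the unixfs persistence call): the directory serves Child() lookups from its cache, and the change is to drop cached entries once a sync has persisted them.

```go
package mfssketch

import (
	"context"
	"sync"

	ipld "github.com/ipfs/go-ipld-format"
)

// dir is a hypothetical stand-in for mfs.Directory. addChild stands in for
// the call that persists one entry into the underlying unixfs directory.
type dir struct {
	lk       sync.Mutex
	cache    map[string]ipld.Node // children written since the last sync
	addChild func(ctx context.Context, name string, nd ipld.Node) error
}

// Child serves lookups from the cache, which is why entries accumulated
// without bound before this change.
func (d *dir) Child(name string) (ipld.Node, bool) {
	d.lk.Lock()
	defer d.lk.Unlock()
	nd, ok := d.cache[name]
	return nd, ok
}

// sync writes cached entries to the unixfs folder and then resets the
// cache: persisted entries no longer need to be held in memory.
func (d *dir) sync(ctx context.Context) error {
	d.lk.Lock()
	defer d.lk.Unlock()
	for name, nd := range d.cache {
		if err := d.addChild(ctx, name, nd); err != nil {
			return err
		}
	}
	d.cache = make(map[string]ipld.Node) // the new part: free the entries
	return nil
}
```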
Flushing represents a sort of "I'm done" with the folder. It is less likely that old references to the folder are used after it has been flushed. As such, we take the chance to clean up the cache then.
Force-pushed b807e72 to fbe0229
Ok, I have been checking... as said, the cache deals with writes that have not been bubbled up to the mfs root and references which have not been renewed. The idea is that your cache holds a reference that was used to make changes, so even if those changes aren't reflected in your MFS dir, since you looked up a node using the cached version, things work. If the cache wasn't there, it would be more cumbersome to work with MFS (do something, bubble up, re-obtain up-to-date references). At the same time, the whole thing works because we are holding the entire MFS structure in memory.

The whole thing is very hairy. As said, things are written to unixfs, dag-service... persisted, but changes are not really distributed along the whole tree all the time. One could think that MFS changes could all stay in memory until Flush(), at which point the unixfs nodes are updated and committed to the DAGService etc., but this is not the case.

Anyways, what I've done now is to clear the directory cache on Flush(). I think this is relatively safe in that flushing hopefully indicates that you are generally done with what you were doing. This does not break any tests, but also note that there is no general "flushing" of the MFS tree. Flushing a file/folder notifies the parents so that they update references, but the parents don't "flush" themselves. I have tried clearing the cache on all parents, but this again introduces issues with the concurrent writes test, which writes on multiple goroutines and therefore holds references that become outdated unless cached up to the top of the tree.

In Kubo, we call FlushPath, which calls Flush() on a specific mfs node and not on the parent folder, so we have to discuss whether we somehow introduce a general MFS free-mem flush.
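A hedged usage sketch of the distinction above, assuming boxo's mfs.FlushPath (the /data paths are illustrative): flushing a file propagates new references upward to its parents, but only flushing a directory itself runs Flush() on that directory and, with this change, clears its cache.

```go
package flushexample

import (
	"context"
	"fmt"

	"github.com/ipfs/boxo/mfs"
)

func flushBoth(ctx context.Context, root *mfs.Root) error {
	// Flushing the file notifies its parents so they update references,
	// but it does not flush the parent directories themselves.
	if _, err := mfs.FlushPath(ctx, root, "/data/file-1"); err != nil {
		return err
	}
	// Flushing the directory directly is what clears its entry cache
	// and releases the memory it holds.
	nd, err := mfs.FlushPath(ctx, root, "/data")
	if err != nil {
		return err
	}
	fmt.Println("flushed /data:", nd.Cid())
	return nil
}
```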
The usual pattern of Kubo's MFS user is to mutate a directory with
@hsanjuan mind opening a draft PR in Kubo with boxo from this PR, to test if the legacy MFS tests are all passing there as well, just as a precaution (iirc
ipfs/kubo#10628: seems we're good on that front. I still have to test whether my computer blows up when adding files. Will report back.
What I mean is that we should flush the parent dir. I have verified that with only the changes in this PR, Kubo still leaks memory when running ~6000
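A sketch of what flushing the parent dir could look like during a bulk add, assuming boxo's mfs.PutNode and mfs.FlushPath; the /data path and the batch size of 1000 are made up for illustration.

```go
package flushparent

import (
	"context"
	"fmt"

	"github.com/ipfs/boxo/mfs"
	ipld "github.com/ipfs/go-ipld-format"
)

// addMany adds nodes under /data and periodically flushes the parent
// directory so that, with this PR applied, its entry cache is released
// instead of growing with every added item.
func addMany(ctx context.Context, root *mfs.Root, nodes []ipld.Node) error {
	for i, nd := range nodes {
		if err := mfs.PutNode(root, fmt.Sprintf("/data/file-%d", i), nd); err != nil {
			return err
		}
		if (i+1)%1000 == 0 {
			// Flush the parent dir, not the file: flushing a child only
			// notifies parents, it does not clear their caches.
			if _, err := mfs.FlushPath(ctx, root, "/data"); err != nil {
				return err
			}
		}
	}
	_, err := mfs.FlushPath(ctx, root, "/data") // final flush
	return err
}
```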