
mfs: clean cache on sync #751

Merged
merged 3 commits into from
Dec 17, 2024
Conversation

hsanjuan
Contributor

There have been multiple reports of people complaining that MFS slows down to a crawl and triggers OOM events when adding many items to a folder.

One underlying issue is that the MFS Directory has a cache which can grow unbounded. All items added to a directory are cached here even though they have already been written to the underlying unixfs folder by mfs writes.

The cache serves Child() requests only. The purpose is that a Directory can give updated information about its children without being "up to date" in its own parent: when a new write happens to a folder, the parent folder needs to be notified about the new unixfs node of its child. With this cache, that notification can be delayed until a Flush() is requested, one of the children triggers a bubbling notification, or the parent folder is traversed.
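The cache-first Child() lookup described above can be sketched as follows. This is a minimal illustration with hypothetical `dir`/`node` types and map-backed stores, not boxo's actual API:

```go
package main

import "fmt"

// node stands in for a unixfs node; only its CID matters here.
type node struct{ cid string }

// dir sketches an MFS directory: a cache of entries not yet reflected
// in the parent, plus a stand-in for the underlying unixfs directory.
type dir struct {
	cache  map[string]*node // recently written children, served first
	stored map[string]*node // stands in for the persisted unixfs folder
}

// Child returns the cached entry when present, so callers see fresh
// children even before the parent directory has been updated.
func (d *dir) Child(name string) (*node, error) {
	if n, ok := d.cache[name]; ok {
		return n, nil
	}
	if n, ok := d.stored[name]; ok {
		return n, nil
	}
	return nil, fmt.Errorf("no child named %q", name)
}

func main() {
	d := &dir{
		cache:  map[string]*node{"new.txt": {cid: "bafy-new"}},
		stored: map[string]*node{"old.txt": {cid: "bafy-old"}},
	}
	n, _ := d.Child("new.txt")
	fmt.Println(n.cid) // prints: bafy-new (cached entry wins)
	n, _ = d.Child("old.txt")
	fmt.Println(n.cid) // prints: bafy-old (falls back to the store)
}
```

Because every written child lands in `cache` and nothing removes entries, the map grows with the number of items added, which is the unbounded growth the PR addresses.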

When the parent folder is traversed, it writes the cached entries of its children to disk, including that of the current folder. Writing the current folder triggers a GetNode() on it, which in turn writes all the cached nodes in the current folder to disk. This cascades. In principle, leaves have no children and should not need to be rewritten to unixfs, but without caching the TestConcurrentWrites tests start failing. Caching provides a way for concurrent writes to access a single reference to a unixfs node, which somehow makes things work.

Apart from cosmetic changes, this commit allows resetting the cache once it has been synced to disk. Other approaches were tested, like fully removing the cache, or resetting it when it reaches a certain size. All were unsatisfactory in that MFS breaks in one way or another (i.e. when setting maxCacheSize to 1). Setting the size to ~50 allows tests to pass, but only because no tests work with 50+1 items.
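The approach the PR settles on ("reset the cache once it has been synced") can be sketched like this; again the `dir`/`node` types and the `sync` method are illustrative, not boxo's actual code:

```go
package main

import "fmt"

type node struct{ cid string }

type dir struct {
	cache  map[string]*node // grew unbounded before this change
	stored map[string]*node // stands in for the unixfs directory on disk
}

// sync flushes cached children into the store and then resets the
// cache, so memory use is bounded by the writes since the last sync.
func (d *dir) sync() {
	for name, n := range d.cache {
		d.stored[name] = n // stand-in for writing the entry to unixfs
	}
	d.cache = make(map[string]*node) // the key change: drop synced entries
}

func main() {
	d := &dir{cache: map[string]*node{}, stored: map[string]*node{}}
	for i := 0; i < 1000; i++ {
		d.cache[fmt.Sprintf("file-%d", i)] = &node{cid: "bafy..."}
	}
	d.sync()
	fmt.Println(len(d.cache), len(d.stored)) // prints: 0 1000
}
```

Entries already written to unixfs are safe to drop from the cache at this point; only children written after the sync need to be cached again.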

This change does not break any tests, but after what I've seen, I assume that concurrent writes during a Flush event probably break expectations, and I wouldn't be surprised if there is a way to combine writes, lookups and flushes on references to multiple folders in the tree and trigger errors where things just work now.

The main problem remains: the current setup essentially keeps all of mfs in memory. This commit could alleviate the fact that reading the directory object triggers as many unixfs.AddChild() calls as there are nodes in the directory. After this change, AddChild() is triggered only on nodes cached since the last sync.

I have not found a reliable way of fixing MFS, and I have discovered multiple gotchas in what was going to be a quick fix.

@hsanjuan hsanjuan requested a review from a team as a code owner December 12, 2024 15:26
@hsanjuan hsanjuan changed the title mfs: limit directory cache size mfs: clean cache on sync Dec 12, 2024
codecov bot commented Dec 12, 2024

Codecov Report

Attention: Patch coverage is 94.44444% with 1 line in your changes missing coverage. Please review.

Project coverage is 60.45%. Comparing base (a83de68) to head (cbe230a).
Report is 4 commits behind head on main.

Files with missing lines Patch % Lines
mfs/root.go 0.00% 1 Missing ⚠️


@@            Coverage Diff             @@
##             main     #751      +/-   ##
==========================================
+ Coverage   60.37%   60.45%   +0.08%     
==========================================
  Files         245      245              
  Lines       31117    31101      -16     
==========================================
+ Hits        18786    18802      +16     
+ Misses      10658    10627      -31     
+ Partials     1673     1672       -1     
Files with missing lines Coverage Δ
mfs/dir.go 53.79% <100.00%> (+0.89%) ⬆️
mfs/file.go 62.85% <100.00%> (-0.22%) ⬇️
mfs/root.go 41.53% <0.00%> (+4.55%) ⬆️

... and 12 files with indirect coverage changes

Contributor

@gammazero gammazero left a comment


LGTM. One question.

@gammazero
Contributor

This helps/solves a known problem, so this change does have positive value. Can we run this in Cluster or somewhere else to gain more confidence that there are no negative behaviors with this change?

Flushing represents a sort of "I'm done" with the folder.

It is less likely that old references to the folder are used after it has been
flushed. As such, we take the chance to clean up the cache then.
@hsanjuan
Contributor Author

Ok,

I have been checking... as said, the cache handles writes that have not been bubbled up to the mfs root, and references which have not been renewed. The idea is that your cache holds a reference that was used to make changes, so even if those changes aren't reflected in your MFS dir, since you looked up a node using the cached version, things work.

If the cache weren't there, it would be more cumbersome to work with MFS (do something, bubble up, re-obtain up-to-date references). At the same time, the whole thing works because we hold the entire MFS structure in memory. Only the root has a method (FlushMemFree) to clear cache entries in the root dir, with big warnings around it.

The whole thing is very hairy; as said, things are written to unixfs, dag-service... persisted, but changes are not really distributed along the whole tree all the time. One could think that MFS changes could all stay in memory until Flush(), at which point the unixfs nodes are updated and committed to the DAGService etc., but this is not the case.

Anyway, what I've done now is to clear the directory cache on Flush(). I think this is relatively safe in that flushing hopefully indicates that you are generally done with what you were doing. This does not break any tests, but also note that there is no general "flushing" of the MFS tree.

Flushing a file/folder notifies the parents so that they update references, but the parents don't "flush" themselves. I have tried clearing the cache on all parents, but this again introduces issues with the concurrent-writes test, which writes from multiple goroutines and therefore holds references that become outdated unless cached up to the top of the tree.

In Kubo, we call FlushPath, which calls Flush() on a specific mfs node and not on its parent folder, so we have to discuss whether we somehow introduce a general MFS free-mem flush.

@lidel
Member

lidel commented Dec 17, 2024

The usual pattern of a Kubo MFS user is to mutate a directory with files cp|rm|mv and then run ipfs files stat on it to learn the new CID, so flushing to disk / cleaning the cache of only the current dir and not its parents should be fine.

@hsanjuan mind opening a PR draft in Kubo with boxo from this PR to test if legacy MFS tests are all passing there as well, just a precaution (iirc test/sharness/t0250-files-api.sh, test/sharness/t0040-add-and-cat.sh are not run in this repo).

@hsanjuan
Contributor Author

@hsanjuan mind opening a PR draft in Kubo with boxo from this PR to test if legacy MFS tests are all passing there as well, just a precaution (iirc test/sharness/t0250-files-api.sh, test/sharness/t0040-add-and-cat.sh are not run in this repo).

ipfs/kubo#10628 seems we're good on that front.

I still have to test if my computer blows up when adding files. Will report back.

@hsanjuan
Contributor Author

so flushing to disk / cleaning cache of only current dir and not its parents should be fine.

What I mean is that we should flush the parent dir. i.e. ipfs files cp /ipfs/Qmxyz /mfs/myfile flushes /mfs/myfile by default, but this won't clear the cache of the /mfs folder.

I have verified that, with only the changes in this PR, Kubo still leaks memory when running ~6000 ipfs files cp operations, and then starts freezing. However, when calling Flush on the parent folder, memory consumption is greatly reduced and things no longer explode (there is a slowdown, which I think I can attribute to the HAMT growing, making inserts more and more expensive). So perhaps we can do that? Move ahead with this, and then ensure files cp/mv/write flush the parent folder?
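The workaround described here can be sketched from the CLI; a rough illustration assuming a running Kubo daemon, with illustrative paths and the placeholder CID from above:

```shell
# Reproduce the bulk-copy pattern, then flush the parent directory so
# its cache is cleared (illustrative; requires a running Kubo daemon).
ipfs files mkdir /bulk-test
for i in $(seq 1 6000); do
  ipfs files cp /ipfs/Qmxyz "/bulk-test/file-$i"
done
# Flushing the parent folder itself (not just the copied files)
# is what clears its directory cache and frees memory.
ipfs files flush /bulk-test
```

`ipfs files flush` takes the path to flush (defaulting to `/`), which is why flushing only the copied file leaves the parent's cache untouched.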

@hsanjuan hsanjuan enabled auto-merge December 17, 2024 17:12
@hsanjuan hsanjuan merged commit 6151dae into main Dec 17, 2024
15 checks passed
@gammazero gammazero deleted the fix/mfs-cache-3 branch December 17, 2024 20:45