
Remote repository degraded performance #93

Open

alexandru-bagu opened this issue Jan 16, 2025 · 4 comments

@alexandru-bagu
Hello,

Due to how borgstore::store works, there is a huge performance hit if the repository is not within the local network.

self.backend.store(self.find(name), value)

Basically, before we store the file, we do a find first to figure out the path, I believe.
Using a local cache for the find call would speed up the process, at the cost of uploading a file twice(?) or maybe to the wrong place(?). Honestly, I am not entirely sure what find has to do with store.
What would the worst case be if we used a local cache to check whether a blob already exists? We write it twice?
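
For illustration, a minimal sketch of the local-cache idea, assuming a dict kept in front of the backend. `CachingStore`, the `_nest` helper, and the `backend.exists`/`backend.store` methods are hypothetical names, not borgstore's actual API:

```python
# Sketch only: cache the name -> path mapping locally so store() can skip
# the remote find() round-trip for names we have already resolved.
class CachingStore:
    def __init__(self, backend, depth=2):
        self.backend = backend
        self.depth = depth       # assumed current nesting depth for new blobs
        self._paths = {}         # name -> path we last found/stored it at

    def _nest(self, name, depth):
        # hypothetical layout: depth=2 turns "abcdef..." into "ab/cd/abcdef..."
        parts = [name[2 * i:2 * i + 2] for i in range(depth)]
        return "/".join(parts + [name])

    def find(self, name):
        # consult the local cache first; only probe the remote backend
        # (one round-trip per candidate depth) on a cache miss
        if name in self._paths:
            return self._paths[name]
        for depth in (self.depth, self.depth - 1):
            path = self._nest(name, depth)
            if self.backend.exists(path):    # remote round-trip
                self._paths[name] = path
                return path
        return self._nest(name, self.depth)  # default location for new blobs

    def store(self, name, value):
        # worst case with a stale cache: the blob is written at the current
        # default depth while an old copy lingers at another depth, i.e. it
        # exists twice until a migration/cleanup removes the old one
        path = self._paths.get(name) or self._nest(name, self.depth)
        self.backend.store(path, value)
        self._paths[name] = path
```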

@ThomasWaldmann (Member)

I didn't look at the code just now, but IIRC the find call might be there to support overwriting already-present data while a nesting depth change is in progress (e.g. changing from nesting depth 1 to nesting depth 2).
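
To illustrate that point, a tiny sketch of the two candidate locations; the path layout shown is an assumption (two hex characters per nesting level), not necessarily borgstore's exact scheme:

```python
# During a depth-1 -> depth-2 migration the same blob may still live at
# either location, so computing the path is not enough: store() first has
# to find() where (or whether) the name currently exists.
name = "abcdef0123"
path_depth1 = f"{name[:2]}/{name}"              # "ab/abcdef0123"
path_depth2 = f"{name[:2]}/{name[2:4]}/{name}"  # "ab/cd/abcdef0123"
```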

@alexandru-bagu (Author) commented Jan 16, 2025

I think a solution needs to be found for this, because at the moment it prevents proper parallelization of the upload process. By bypassing self.find (since I know none of those blobs exist yet) and adding some parallelization code for the rclone backend, I am able to upload at upwards of 150 Mbps to a server with a latency of ~50 ms. When one considers that, before each upload of a 1 MiB chunk, a 50 ms pre-request has to be made to check a path, you end up with at most 10 chunks per second; factoring in compression and deduplication, the effective upload speed ends up well below 10 MiB/s.
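
For clarity, the arithmetic behind that ceiling as a small sketch; all numbers are the assumed values from the comment, not measurements:

```python
# Back-of-the-envelope throughput ceiling for a serial find-then-store loop.
rtt = 0.050                        # ~50 ms round-trip to the server (assumed)
chunk_mib = 1                      # 1 MiB chunks (assumed)

# one find() round-trip plus one store() round-trip per chunk, done serially:
chunks_per_sec = 1 / (2 * rtt)     # -> 10 chunks/s
print(chunks_per_sec * chunk_mib)  # -> 10.0 MiB/s ceiling, before
                                   #    compression/dedup overhead pushes
                                   #    the effective rate lower
```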

@alexandru-bagu (Author)

Can borgstore benefit somehow from the list of chunk hashes that each archive has?

Another issue that I am running into is that if the borg create/import-tar process is not completed, the chunk list is not known, and as such create/import-tar behaves as if none of the blobs exist. A possible solution could be uploading a temporary chunk list every n new chunks or every x minutes, to allow the process to continue from where it left off, or close to it.
With this temporary chunk list plus the existing lists of chunk hashes from completed archives, we could improve the processing speed a lot.
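
As a rough illustration of that checkpointing idea, a minimal sketch; the byte-string chunk IDs and the `store_temporary_chunklist` upload method are hypothetical, and none of this is borg's actual implementation:

```python
import time

# Sketch only: flush the set of newly-seen chunk IDs to the store every
# N chunks or every X seconds, so an interrupted create/import-tar can
# resume with knowledge of what was already uploaded.
class ChunkListCheckpointer:
    def __init__(self, store, every_n=10000, every_s=300):
        self.store = store
        self.every_n = every_n
        self.every_s = every_s
        self.pending = []                 # chunk IDs (bytes) not yet flushed
        self.last_flush = time.monotonic()

    def add(self, chunk_id):
        self.pending.append(chunk_id)
        if (len(self.pending) >= self.every_n
                or time.monotonic() - self.last_flush >= self.every_s):
            self.flush()

    def flush(self):
        if self.pending:
            # one small upload instead of re-discovering chunks on restart;
            # store_temporary_chunklist is a hypothetical method
            self.store.store_temporary_chunklist(b"\n".join(self.pending))
            self.pending.clear()
        self.last_flush = time.monotonic()
```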

@ThomasWaldmann (Member)

Avoid putting different issues into the same GitHub issue. :-)

About your "other issue": I guess you are using the latest beta. IIRC that is already fixed in the master branch (at least for borg create, not sure about import-tar) by uploading increments for the chunks cache.
