Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(dowloader): resume partial downloads #4537

Merged
merged 10 commits into from
Jan 9, 2025

Conversation

Saavrm26
Copy link
Contributor

@Saavrm26 Saavrm26 commented Jan 4, 2025

Description

This PR adds support for resuming partial downloads.

  • Resuming partial downloads
  • Verifying checksum
  • Reflecting download percentage on front end after resuming download

Notes for Reviewers
A few questions:

  1. Should we check if the server supports Range Header(used here for partial downloads), most modern web servers do.
  2. I wanted to have a discussion when a model in gallery has multiple files.

Related to: #2156

Signed commits

  • Yes, I signed my commits.

… transfers

- Adds support for resuming partially downloaded files
- Uses HTTP Range header to continue from last byte position
- Maintains download progress across interruptions
- Preserves partial downloads with .partial extension
- Validates SHA256 checksum after completion

Signed-off-by: Saarthak Verma <[email protected]>
Copy link

netlify bot commented Jan 4, 2025

Deploy Preview for localai ready!

Name Link
🔨 Latest commit a17c9e3
🔍 Latest deploy log https://app.netlify.com/sites/localai/deploys/677ebd96baafa10008f8477d
😎 Deploy Preview https://deploy-preview-4537--localai.netlify.app
📱 Preview on mobile
Toggle QR Code...

QR Code

Use your smartphone camera to open QR code link.

To edit notification comments on pull requests, go to your Netlify site configuration.

@Saavrm26 Saavrm26 marked this pull request as ready for review January 6, 2025 18:44
@Saavrm26
Copy link
Contributor Author

Saavrm26 commented Jan 6, 2025

Hi @mudler, can you have a look at this PR and the notes for reviewers section.
Thanks

@mudler
Copy link
Owner

mudler commented Jan 7, 2025

Hi @mudler, can you have a look at this PR and the notes for reviewers section.
Thanks

Sure! will try to review today, thanks for the PR.

@mudler
Copy link
Owner

mudler commented Jan 7, 2025

1. Should we check if the server supports Range Header(used here for partial downloads), most modern web servers do.

I think we probably should - mainly because otherwise if a web server does not support it we leave a dangling .partial file which can't be removed, and the model can't be installed anymore

2. I wanted to have a discussion when a model in gallery has multiple files.

In this case it should use the same underlying download function, so partial files would be recovered as well by this changeset, or am I missing anything?

the changeset looks good to me. My only open point is about detecting if a webserver doesn't support streaming files from a certain point, otherwise we should find another way to remove the .partials when we can't resume for some reason and start a fresh download instead (e.g. the partial data is invalid, or the webserver does not support it, or there are other relevant errors to the download)

@Saavrm26
Copy link
Contributor Author

Saavrm26 commented Jan 7, 2025

I think we probably should - mainly because otherwise if a web server does not support it we leave a dangling .partial file which can't be removed, and the model can't be installed anymore

the changeset looks good to me. My only open point is about detecting if a webserver doesn't support streaming files from a certain point, otherwise we should find another way to remove the .partials when we can't resume for some reason and start a fresh download instead (e.g. the partial data is invalid, or the webserver does not support it, or there are other relevant errors to the download)

Yes, there will be the issue with dangling .partial file, I was unsure weather we should delete the partial file automatically or let users delete them manually. The way I am thinking of implementing this is sending a HEAD request and checking if there is presence of Accept-Ranges header in the response.

In this case it should use the same underlying download function, so partial files would be recovered as well by this changeset, or am I missing anything?

There is a scenario that might pose a problem, though I do not know how common it would be.
Suppose we have a model in gallery that has 2 files(let's say, SomeSupportingBinary, ModelA) . And one of them(SomeSupportingBinary) got downloaded successfully, while the other one was a partial download(ModelA.partial). Now by the time we had resumed the download, the ModelA in the web server got updated. On a user's system when they resume the download, after completion, the downloaded ModelA will not match the corresponding SHA in _gallery_MODELA.yml and hence it will be deleted.

But here lies the issue that the SomeSupportingBinary will still be in the user's file system. This is one of the scenarios, there are a few others as well

@mudler
Copy link
Owner

mudler commented Jan 7, 2025

I think we probably should - mainly because otherwise if a web server does not support it we leave a dangling .partial file which can't be removed, and the model can't be installed anymore

the changeset looks good to me. My only open point is about detecting if a webserver doesn't support streaming files from a certain point, otherwise we should find another way to remove the .partials when we can't resume for some reason and start a fresh download instead (e.g. the partial data is invalid, or the webserver does not support it, or there are other relevant errors to the download)

Yes, there will be the issue with dangling .partial file, I was unsure weather we should delete the partial file automatically or let users delete them manually. The way I am thinking of implementing this is sending a HEAD request and checking if there is presence of Accept-Ranges header in the response.

Cool, that actually looks a sane approach

In this case it should use the same underlying download function, so partial files would be recovered as well by this changeset, or am I missing anything?

There is a scenario that might pose a problem, though I do not know how common it would be. Suppose we have a model in gallery that has 2 files(let's say, SomeSupportingBinary, ModelA) . And one of them(SomeSupportingBinary) got downloaded successfully, while the other one was a partial download(ModelA.partial). Now by the time we had resumed the download, the ModelA in the web server got updated. On a user's system when they resume the download, after completion, the downloaded ModelA will not match the corresponding SHA in _gallery_MODELA.yml and hence it will be deleted.

But here lies the issue that the SomeSupportingBinary will still be in the user's file system. This is one of the scenarios, there are a few others as well

Gotcha, however it would be similar to the case if there is another failure (e.g. the server returns a 500) and not strictly related to the changes introduced in this PR.

I don't think should be part of this PR as well but rather be a follow-up to fix these behaviors - in the worse case there would be a file dangling (with the correct checksums), which would pass checks if the user would try to re-install the model (and deleting it), but that was the case before your changes.

@Saavrm26
Copy link
Contributor Author

Saavrm26 commented Jan 8, 2025

Hi, thanks for your feedback and sorry for the delayed response.

I don't think should be part of this PR as well but rather be a follow-up to fix these behaviors - in the worse case there would be a file dangling (with the correct checksums), which would pass checks if the user would try to re-install the model (and deleting it), but that was the case before your changes.

Understood!

I'll work on implementing the server check.

Thanks!

@Saavrm26
Copy link
Contributor Author

Saavrm26 commented Jan 8, 2025

Hi I've implemented the server check and a test for the same.

Thanks!

Copy link
Owner

@mudler mudler left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking good here, thanks @Saavrm26 !

@mudler mudler merged commit 6765b17 into mudler:master Jan 9, 2025
29 checks passed
@mudler mudler mentioned this pull request Jan 9, 2025
21 tasks
@Saavrm26
Copy link
Contributor Author

Saavrm26 commented Jan 9, 2025

Thanks for the review, looking to contribute in other areas of the repository!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants