Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add uploading support to the library! #72

Draft
wants to merge 7 commits into
base: main
Choose a base branch
from

Conversation

arilotter
Copy link

@arilotter arilotter commented Oct 28, 2024

This is an absolutely massive set of changes, so while this PR is far from ready, I thought I'd share my work so far to begin the process of upstreaming & discussing changes early.

This adds ~2k lines of code, more than quadrupling the repo size.

This PR currently only has an implementation for the tokio / reqwests side of things, and does not touch the sync/ureq code.

It's also quite messy & has a bunch of copy-pasted comments from the Python huggingface_hub codebase.

It's a near-direct translation of the Python code, but with a whooole bunch of structs that I best-guess translated over, since I couldn't find any public documentation for these APIs.

implemented & works:

  • small files
  • git lfs files
  • git lfs multipart files
  • model repos

not yet implemented:

  • dataset repos, spaces
  • file copies
  • file deletions
  • folder uploads
  • repo management- create, delete, etc
  • much of anything else

supports:
- small files
- git lfs files
- git lfs multipart files
- model repos

does not yet support:
- dataset repos, spaces
- file copies
- file deletions
- folder uploads
- much of anything else
@Narsil
Copy link
Collaborator

Narsil commented Jan 7, 2025

We don't want to copy the python code here. This is Rust-land, the goal is to stick to the simplest possible thing.

For instance from_str is very unrusty. Having real enum makes everything simpler, user cannot even EXPRESS the wrong thing since the enum explicitly states what's valid, which is not the case of &str. Adding that layer is simply adding potential for errors. remove it.

Overall this PR is not really reviewable as it touches things EVERYWHERE, without focusing a making the simplest possible modification.

Also remove all the new dependencies, we simply do not want them, they are not necessary. Taking on dependency is a cost. Staying simple has strong benefits. Maybe the shas are necessary but then users should be able to opt-out of those.

Also remove all forms of environment dependency, we also explicitly removed every form of them, with the exception of HF_ENDPOINT and HF_HOME which are ONLY every read in from_env functions (again explicit user opting-in for those, reading environment in libraries is really a source of many issues)

You should check out this https://github.com/huggingface/hf_transfer/blob/main/src/lib.rs#L452 as it's implementing upload is only several hundred lines.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants