
Live/auto-fetching model info #55

Open
varunneal opened this issue Sep 2, 2024 · 7 comments

@varunneal
Collaborator

I noticed the # downloads and # likes in the model dropdown are hard-coded. I dug around in the Hugging Face Hub API and found that we can access this info via GET requests. Here's an example of doing it for gte-tiny:

const url = "https://huggingface.co/api/models/TaylorAI/gte-tiny";
const headers = {
    "user-agent": "unknown/None;",
    "Accept-Encoding": "gzip, deflate",
    "Accept": "*/*",
    "Connection": "keep-alive",
};

fetch(url, { method: 'GET', headers: headers })
    .then(response => response.json())  
    .then(data => console.log(data))
    .catch(error => console.error('Error:', error));

You can enter the above code in the JS console to verify it works. The downloads and likes entries directly get us those values, but the model size is a bit harder, especially because we are hard-coding the size of the ONNX model. I'm not sure if we can even get that value.
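
For example, pulling just those two counters out of the response (the downloads and likes field names are what the API returns today and could of course change on HF's side):

// Sketch: same request as above, but only log the two counters we need.
fetch("https://huggingface.co/api/models/TaylorAI/gte-tiny")
    .then(response => response.json())
    .then(data => console.log(`downloads: ${data.downloads}, likes: ${data.likes}`))
    .catch(error => console.error('Error:', error));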

Additionally, I want to ask @do-me what it means when a model has many different download sizes in the dropdown, e.g. snowflake-arctic-embed-xs.

@do-me
Owner

do-me commented Sep 4, 2024

Exactly, I was trying to find a more elegant processing pipeline to get the sizes but apparently there is no API for that.

The problem with the different sizes is the myriad of quantization variants. It's not easy to tell, just based on the naming, which ONNX file corresponds to which variant. Some repos have only one, others many different versions.

I documented all of my findings here: https://github.com/do-me/trending-huggingface-models and in the Jupyter notebook there. The repo also creates a ready-to-copy-and-paste HTML section for SemanticFinder.

As a side note, it also includes experimental models like https://huggingface.co/onnx-community/decision-transformer-gym-halfcheetah-expert/tree/main/onnx that do not work for semantic similarity.

For ease of use I decided to include all file sizes to give the user at least some idea of how heavy the model is, but that's certainly not the best way.

I was also considering saying goodbye to hard-coded models and using a free text field like here instead. But this way, it becomes harder for non-expert users...

Maybe something in between would be good:

  • a free text input field that can be used for any model
  • a dropdown for our "chef's selection" of good models. When selecting a model here, it would simply copy the value to the text input (see the sketch below)
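
Something like this rough sketch of the in-between idea (the element IDs are placeholders, not SemanticFinder's actual markup):

// Hypothetical markup, not the real SemanticFinder IDs:
// <input id="model-input" type="text" placeholder="any Hugging Face model id">
// <select id="model-preset"> ...options for the chef's selection... </select>

const presetSelect = document.getElementById("model-preset");
const modelInput = document.getElementById("model-input");

// Picking a preset only copies its value into the free text field;
// the model loading logic keeps reading from the text field either way.
presetSelect.addEventListener("change", () => {
    if (presetSelect.value) {
        modelInput.value = presetSelect.value;
    }
});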

We just need to keep an eye on the index file loading logic so that nothing breaks. Some people already contributed files and are (supposedly) actively using this logic.

What's your take on this?

@varunneal
Collaborator Author

Just checking, are the files that Jhnbsomersxkhi2 contributed live anywhere? I think I'll definitely start trying to contribute to that HF repo

@varunneal
Collaborator Author

For the GET API we can get the size of the default model in terms of the number of fp32 and fp16 parameters, and from there we could compute the model size, but as you said, I'm not sure we can get the size of the quantized models. That said, we could statically encode the model size (maybe both the default and the quantized one) and then dynamically fetch # likes and # downloads.

@do-me
Owner

do-me commented Sep 5, 2024

Just checking, are the files that Jhnbsomersxkhi2 contributed live anywhere? I think I'll definitely start trying to contribute to that HF repo

Sure, you can find all files either in the readme catalogue or directly in the files section. HF is pretty much like GitHub.

I would love some kind of functionality where users simply click a button "Publish to Huggingface" to open a PR with the index and correctly formatted metadata, similar to when you share your results here.

@do-me
Owner

do-me commented Sep 5, 2024

For the GET API we can get the size of the default model in terms of the number of fp32 and fp16 parameters, and from there we could compute the model size, but as you said, I'm not sure we can get the size of the quantized models. That said, we could statically encode the model size (maybe both the default and the quantized one) and then dynamically fetch # likes and # downloads.

So wait, we can get the regular size of the model via GET request, right? If I remember correctly, the quantized models' sizes then follow a pretty linear scheme, e.g. fp16 is always ~50% of the regular model's size, q4 is always ~25%, and so on. That seems like the easiest option to me to avoid hard-coding and maybe the most future-proof method.

The only issue might be that if we calculate all file sizes for all quantization methods on the fly, we might confuse users when a model e.g. does not actually have an fp16 version.

@varunneal
Collaborator Author

Yes, that's a great idea! What are the storage limits for Hugging Face? And yes, we can get the regular size of the model: check out the JS query above and look under safetensors.parameters.
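
Roughly like this (a sketch only: it assumes safetensors.parameters is a map of dtype to parameter count, and the ~50%/~25% ratios are just the heuristic from above, so a repo may not actually ship an fp16 or q4 file):

// Sketch: estimate download sizes from the parameter counts the API exposes
// under safetensors.parameters; quantized sizes use the rough ratios from above.
async function estimateModelSizes(modelId) {
    const resp = await fetch(`https://huggingface.co/api/models/${modelId}`);
    const data = await resp.json();

    // Bytes per parameter for the dtypes we expect; default to 4 if unknown.
    const bytesPerDtype = { F32: 4, F16: 2, BF16: 2 };
    let fullBytes = 0;
    for (const [dtype, count] of Object.entries(data.safetensors?.parameters ?? {})) {
        fullBytes += count * (bytesPerDtype[dtype] ?? 4);
    }

    const mb = bytes => (bytes / 1024 / 1024).toFixed(1) + " MB";
    return {
        full: mb(fullBytes),
        fp16: mb(fullBytes * 0.5),  // ~50% of the regular model
        q4: mb(fullBytes * 0.25),   // ~25% of the regular model
    };
}

// e.g. estimateModelSizes("TaylorAI/gte-tiny").then(console.log);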

@do-me
Owner

do-me commented Sep 10, 2024

Fyi, I'm imagining something like this for selecting a model: https://jsfiddle.net/vtkrqxgh/
