Live/auto-fetching model info #55
Exactly, I was trying to find a more elegant processing pipeline to get the sizes, but apparently there is no API for that. The problem with the different sizes is the myriad of quantization variants. It's not easy to tell, just based on the naming, which ONNX file corresponds to which variant. Some repos have only one, others have many different versions, e.g.:
I documented all of my findings here: https://github.com/do-me/trending-huggingface-models and in the Jupyter notebook there. The repo also creates a ready-to-copy-and-paste HTML section for SemanticFinder. As a side note, it also includes experimental models like https://huggingface.co/onnx-community/decision-transformer-gym-halfcheetah-expert/tree/main/onnx that do not work for semantic similarity. For ease of use I decided to include all file sizes to give the user at least some idea of how heavy the model is, but that's certainly not the best way. I was also considering saying goodbye to hard-coded models and using a free text field like here instead. But that way it becomes harder for non-expert users... Maybe something in between would be good:
We just need to keep an eye on the index file loading logic so that nothing breaks. Some people already contributed files and are (supposedly) actively using this logic. What's your take on this?
Just checking, are the files that Jhnbsomersxkhi2 contributed live anywhere? I think I'll definitely start trying to contribute to that HF repo.
For the GET API we can get the size of the default model in terms of the number of fp32 and fp16 parameters, and from there we could compute the model size, but as you said, I'm not sure we can get the size of the quantized models. That said, we could statically encode the size of the model (maybe both the default and quantized variants) and then dynamically fetch # likes and # downloads.
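A minimal sketch of that computation, assuming safetensors.parameters comes back as per-dtype parameter counts (the helper name and the dtype keys are illustrative, not from the codebase):

```js
// Illustrative helper (not from the codebase): estimate the default model size in MB
// from the per-dtype parameter counts returned under safetensors.parameters,
// e.g. { F32: 22713216 } or { F16: 33360000 }.
function estimateSizeMB(parameters) {
  const bytesPerParam = { F32: 4, F16: 2, BF16: 2 }; // assumed dtype keys
  let totalBytes = 0;
  for (const [dtype, count] of Object.entries(parameters)) {
    totalBytes += (bytesPerParam[dtype] ?? 4) * count; // fall back to 4 bytes if unknown
  }
  return totalBytes / (1024 * 1024);
}

// Example: a ~22.7M-parameter fp32 model comes out at roughly 87 MB.
console.log(estimateSizeMB({ F32: 22713216 }).toFixed(1));
```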
Sure, you can find all files either in the readme catalogue or directly in the files section; HF is pretty much like GitHub. I would love some kind of functionality where users simply click a "Publish to Hugging Face" button to open a PR with the index and correctly formatted metadata, similar to when you share your results here.
So wait, we can get the regular size of the model via a GET request, right? If I remember correctly, the quantized models' sizes then follow a pretty linear scheme, e.g. fp16 is always ~50% of the regular model's size, q4 always ~25%, and so on. That seems like the easiest option to me to avoid hard-coding and to have a future-proof method. The only issue might be that if we calculate all file sizes for all quantization methods on the fly, a model that e.g. does not have an fp16 version might confuse users.
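A rough sketch of that on-the-fly estimate; the ratios simply restate the rule of thumb from this comment and the function is purely illustrative, not verified against actual ONNX file sizes on the Hub:

```js
// Illustrative sketch: derive approximate quantized sizes from the full fp32 size,
// using the rule of thumb above (fp16 ≈ 50%, q4 ≈ 25% of the regular size).
// These ratios are assumptions and may not match the actual ONNX files on the Hub.
function estimateQuantizedSizesMB(fp32SizeMB) {
  const ratios = { fp32: 1.0, fp16: 0.5, q4: 0.25 };
  const sizes = {};
  for (const [variant, ratio] of Object.entries(ratios)) {
    sizes[variant] = Math.round(fp32SizeMB * ratio);
  }
  return sizes;
}

// e.g. estimateQuantizedSizesMB(87) -> { fp32: 87, fp16: 44, q4: 22 }
```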
Yes, that's a great idea! What are the storage limits for Hugging Face? And yes, we can get the regular size of the model: check out the JS query above and look under safetensors.parameters.
FYI, I'm imagining something like this for selecting a model: https://jsfiddle.net/vtkrqxgh/
I noticed the # downloads and # likes in the model dropdown are hard-coded. I dug around in the Hugging Face Hub API and found that we can access this info via GET requests. Here's an example of doing it for gte-mini:
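(The original snippet isn't preserved in this thread; the sketch below shows the kind of request meant here, against the public https://huggingface.co/api/models/<repo> endpoint. The repo id is a placeholder, not necessarily the actual gte-mini repo.)

```js
// Sketch only: query the public Hugging Face Hub API for a single model.
// "Xenova/gte-small" is a placeholder repo id.
const repoId = "Xenova/gte-small";

fetch(`https://huggingface.co/api/models/${repoId}`)
  .then((res) => res.json())
  .then((info) => {
    // Plain numbers in the response:
    console.log("downloads:", info.downloads);
    console.log("likes:", info.likes);
    // Per-dtype parameter counts (e.g. { F32: 33360000 }), when the repo ships safetensors:
    console.log("parameters:", info.safetensors?.parameters);
  });
```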
You can enter the code above in the JS console to verify it works. The downloads and likes entries directly give us those values, but the model size is a bit harder, especially because we are hard-coding the size of the ONNX model. I'm not sure if we can even get that value. Additionally, I want to ask @do-me what it means when a model has many different download sizes in the dropdown, e.g. snowflake-arctic-embed-xs.