House a number of examples that don't necessarily fit in the pure onnxruntime setting
LLaVA-NeXT with Mistral 7B, for instance, works really well on a single GPU with 4-bit quantization, Ray Serve, and Hugging Face; adding an example shortly. To be clear, "really well" means it fits on a single GPU. It's still a chunky model, so latency is high, at least on a consumer GPU.
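As a rough sanity check on the "fits on a single GPU" point, here is a back-of-the-envelope sketch of weight memory at different precisions. The 7e9 parameter count is an assumption taken from the "7B" in the model name, and the helper `weight_memory_gib` is purely illustrative. It counts weights only, ignoring activations, the KV cache, quantization metadata, and any components kept at higher precision, so real usage will be higher.

```python
def weight_memory_gib(num_params: float, bits_per_param: int) -> float:
    """Approximate memory needed for model weights alone, in GiB."""
    bytes_total = num_params * bits_per_param / 8
    return bytes_total / 2**30


# Assumption: roughly 7 billion parameters, going by the "7B" in the model name.
NUM_PARAMS = 7e9

if __name__ == "__main__":
    for bits in (16, 8, 4):
        gib = weight_memory_gib(NUM_PARAMS, bits)
        print(f"{bits}-bit weights: ~{gib:.1f} GiB")
```

Under these assumptions the weights drop from about 13 GiB at 16 bits to about 3.3 GiB at 4 bits, which is consistent with fitting on a single consumer GPU while saying nothing about latency.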