diff --git a/README.md b/README.md
index d98cfbb..0f2e4b7 100644
--- a/README.md
+++ b/README.md
@@ -363,9 +363,10 @@ is trivial through use of 🤗 Transformers!
 To reduce our models memory footprint, we load the model in 8bit, this means we quantize the model to use 1/4th precision (when compared to float32) with minimal loss to performance. To read more about how this works, head over [here](https://huggingface.co/blog/hf-bitsandbytes-integration).
 
 ```python
-from transformers import WhisperForConditionalGeneration
+from transformers import WhisperForConditionalGeneration, BitsAndBytesConfig
 
-model = WhisperForConditionalGeneration.from_pretrained(model_name_or_path, load_in_8bit=True, device_map="auto")
+quantization_config=BitsAndBytesConfig(load_in_8bit=True)
+model = WhisperForConditionalGeneration.from_pretrained(model_name_or_path, quantization_config=quantization_config, device_map="auto")
 ```
 
 ### Post-processing on the model
@@ -606,4 +607,4 @@ With PEFT, you can also go beyond Speech recognition and apply the same set of t
 
 Hungry to push this to the limits and test out more SoTA techniques? [Try Whisper with adalora!](https://github.com/huggingface/peft/blob/main/examples/int8_training/run_adalora_whisper_int8.sh)
 
-Don't forget to tweet your results and tag us! [@huggingface](https://twitter.com/huggingface) and [@reach_vb](https://twitter.com/reach_vb) ❤️
\ No newline at end of file
+Don't forget to tweet your results and tag us! [@huggingface](https://twitter.com/huggingface) and [@reach_vb](https://twitter.com/reach_vb) ❤️
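
For context, the change above migrates 8-bit loading from the bare `load_in_8bit=True` kwarg to the explicit `BitsAndBytesConfig` object. The snippet below is a minimal, self-contained sketch of that updated loading path; the checkpoint name `"openai/whisper-large-v2"` and the memory-footprint check are illustrative assumptions on top of the diff, not part of it.

```python
# Minimal sketch of the 8-bit loading path after this change.
# Assumes `transformers`, `accelerate`, and `bitsandbytes` are installed
# and a CUDA device is available for the quantized weights.
from transformers import WhisperForConditionalGeneration, BitsAndBytesConfig

# Assumption: any Whisper checkpoint works here; the README itself uses
# a `model_name_or_path` variable defined earlier in the guide.
model_name_or_path = "openai/whisper-large-v2"

# Quantization options now live in an explicit config object rather than
# being passed directly to from_pretrained as load_in_8bit=True.
quantization_config = BitsAndBytesConfig(load_in_8bit=True)

model = WhisperForConditionalGeneration.from_pretrained(
    model_name_or_path,
    quantization_config=quantization_config,
    device_map="auto",
)

# Rough sanity check: the 8-bit model should report roughly a quarter of the
# float32 memory footprint mentioned in the surrounding README text.
print(f"Memory footprint: {model.get_memory_footprint() / 1e9:.2f} GB")
```

Keeping the quantization settings in a config object also makes it easy to adjust related options (for example `llm_int8_threshold`) without touching the `from_pretrained` call itself.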