attempting to convert tiiuae/falcon-180B-chat #1472
Comments
This specific error may require small changes in the converter, but there are currently more general issues related to very large models. During conversion it will likely hit this other error #1324 which is a limitation in the current model serialization. Then during runtime, the model will require at least 180GB in int8 and would need to be split on multiple GPUs, which is currently not supported in CTranslate2 #1052.
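(For a rough sense of the numbers behind "at least 180GB in int8", here is a back-of-the-envelope sketch in Python. It counts only the weights and ignores KV cache, activations and framework overhead, so it is an illustration rather than a measurement.)

# Rough weight-only memory estimate for a ~180B-parameter model.
PARAMS = 180e9  # approximately 180 billion parameters
for name, bytes_per_param in [("float16", 2), ("int8", 1)]:
    gib = PARAMS * bytes_per_param / 1024**3
    print(f"{name:>8}: ~{gib:.0f} GiB for the weights alone")
# int8 comes out to roughly 168 GiB, which no single GPU holds,
# hence the need to shard the model across devices (see #1052).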
we are happy to supply you with a server you can test with if that would help.
non-issue now.
@silvacarl2 How did you get this to work? I still get the same error as you posted.
i didn't, we decided to try something else. there's ten zillion choices now.
Lol yeah that's true
sorry, i thought you tried to do falcon-180b. this may work for 40b, i don't remember:
ct2-transformers-converter --model tiiuae/falcon-40b-instruct --output_dir tiiuae-falcon-40b-instruct-int8_float16 --force --copy_files tokenizer.json README.md tokenizer_config.json generation_config.json special_tokens_map.json --quantization int8_float16 --trust_remote_code
it needs 88 GB of RAM
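(For readers following along: once a conversion like the one above finishes, the output directory is loaded with CTranslate2's Generator API. A minimal sketch, assuming the converted model sits in the directory named in the command and that the copied tokenizer files are enough for AutoTokenizer; the prompt is a placeholder.)

import ctranslate2
import transformers

# Directory produced by the ct2-transformers-converter command above.
model_dir = "tiiuae-falcon-40b-instruct-int8_float16"

generator = ctranslate2.Generator(model_dir, device="cuda")
tokenizer = transformers.AutoTokenizer.from_pretrained(model_dir)

prompt = "Hello, how are you?"
tokens = tokenizer.convert_ids_to_tokens(tokenizer.encode(prompt))

# Generate a continuation from the prompt tokens.
results = generator.generate_batch([tokens], max_length=64, sampling_topk=10)
print(tokenizer.decode(results[0].sequences_ids[0]))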
Thanks @silvacarl2
yes, just use int8 instead of int8_float16
we gave up on it, there are many better choices now.
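(In case it is useful to anyone landing on this thread: the same conversion can also be driven from CTranslate2's Python API instead of the CLI. A sketch of the plain-int8 variant suggested above, reusing the model name and file list from the earlier command; the output directory name is just an example.)

import ctranslate2

converter = ctranslate2.converters.TransformersConverter(
    "tiiuae/falcon-40b-instruct",
    copy_files=[
        "tokenizer.json", "tokenizer_config.json",
        "generation_config.json", "special_tokens_map.json",
    ],
    trust_remote_code=True,
)
# Plain int8 instead of int8_float16, as suggested in the comment above.
converter.convert("tiiuae-falcon-40b-instruct-int8", quantization="int8", force=True)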
@silvacarl2 Thanks for the response! I agree there certainly are much stronger models now!
got it. well, that's as far as we went with it. 8-(
Ah got it! Thanks for the help 😊
we are attempting to convert tiiuae/falcon-180B-chat to ct2 format.
this is the command:
ct2-transformers-converter --model tiiuae/falcon-180B-chat --output_dir tiiuae-falcon-180b-instruct-int8-float16 --force --copy_files tokenizer.json README.md tokenizer_config.json generation_config.json special_tokens_map.json --quantization int8_float16 --trust_remote_code
but we get this crash:
[2023-09-11 16:04:11,429] [INFO] [real_accelerator.py:158:get_accelerator] Setting ds_accelerator to cuda (auto detect)
Loading checkpoint shards: 100%|████████████████| 81/81 [02:28<00:00, 1.84s/it]
Traceback (most recent call last):
File "/home/silvacarl/.local/bin/ct2-transformers-converter", line 8, in
sys.exit(main())
File "/home/silvacarl/.local/lib/python3.8/site-packages/ctranslate2/converters/transformers.py", line 1719, in main
converter.convert_from_args(args)
File "/home/silvacarl/.local/lib/python3.8/site-packages/ctranslate2/converters/converter.py", line 50, in convert_from_args
return self.convert(
File "/home/silvacarl/.local/lib/python3.8/site-packages/ctranslate2/converters/converter.py", line 89, in convert
model_spec = self._load()
File "/home/silvacarl/.local/lib/python3.8/site-packages/ctranslate2/converters/transformers.py", line 140, in _load
spec = loader(model, tokenizer)
File "/home/silvacarl/.local/lib/python3.8/site-packages/ctranslate2/converters/transformers.py", line 192, in call
spec = self.get_model_spec(model)
File "/home/silvacarl/.local/lib/python3.8/site-packages/ctranslate2/converters/transformers.py", line 1331, in get_model_spec
self.set_decoder(spec.decoder, model.transformer)
File "/home/silvacarl/.local/lib/python3.8/site-packages/ctranslate2/converters/transformers.py", line 1359, in set_decoder
self.set_layer_norm(layer_spec.input_layer_norm, layer.ln_attn)
AttributeError: 'TransformerDecoderLayerSpec' object has no attribute 'input_layer_norm'
any ideas?
we are running it on an A40 with 576 GiB of RAM