attempting to convert tiiuae/falcon-180B-chat #1472

Closed · silvacarl2 opened this issue Sep 11, 2023 · 12 comments
Labels: enhancement (New feature or request)

Comments

silvacarl2 commented Sep 11, 2023

we are attempting to convert tiiuae/falcon-180B-chat to ct2 format.

this is the command:

ct2-transformers-converter --model tiiuae/falcon-180B-chat --output_dir tiiuae-falcon-180b-instruct-int8-float16 --force --copy_files tokenizer.json README.md tokenizer_config.json generation_config.json special_tokens_map.json --quantization int8_float16 --trust_remote_code

but we get this crash:

[2023-09-11 16:04:11,429] [INFO] [real_accelerator.py:158:get_accelerator] Setting ds_accelerator to cuda (auto detect)
Loading checkpoint shards: 100%|████████████████| 81/81 [02:28<00:00, 1.84s/it]
Traceback (most recent call last):
  File "/home/silvacarl/.local/bin/ct2-transformers-converter", line 8, in <module>
    sys.exit(main())
  File "/home/silvacarl/.local/lib/python3.8/site-packages/ctranslate2/converters/transformers.py", line 1719, in main
    converter.convert_from_args(args)
  File "/home/silvacarl/.local/lib/python3.8/site-packages/ctranslate2/converters/converter.py", line 50, in convert_from_args
    return self.convert(
  File "/home/silvacarl/.local/lib/python3.8/site-packages/ctranslate2/converters/converter.py", line 89, in convert
    model_spec = self._load()
  File "/home/silvacarl/.local/lib/python3.8/site-packages/ctranslate2/converters/transformers.py", line 140, in _load
    spec = loader(model, tokenizer)
  File "/home/silvacarl/.local/lib/python3.8/site-packages/ctranslate2/converters/transformers.py", line 192, in __call__
    spec = self.get_model_spec(model)
  File "/home/silvacarl/.local/lib/python3.8/site-packages/ctranslate2/converters/transformers.py", line 1331, in get_model_spec
    self.set_decoder(spec.decoder, model.transformer)
  File "/home/silvacarl/.local/lib/python3.8/site-packages/ctranslate2/converters/transformers.py", line 1359, in set_decoder
    self.set_layer_norm(layer_spec.input_layer_norm, layer.ln_attn)
AttributeError: 'TransformerDecoderLayerSpec' object has no attribute 'input_layer_norm'

any ideas?

we are running it on an A40 with 576 GiB of RAM.
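
For reference, the 40B/180B Falcon checkpoints declare a different decoder layout (parallel ln_attn/ln_mlp layer norms) than the 7B one, which is the code path the converter is walking when it fails above on layer.ln_attn. A minimal sketch to check which layout a checkpoint uses, assuming the Hugging Face Falcon config exposes a new_decoder_architecture flag:

# Sketch: inspect the checkpoint config to see which Falcon decoder layout it declares.
# Assumption: the config exposes `new_decoder_architecture` (True for the 40B/180B-style
# layers that use ln_attn/ln_mlp); older trust_remote_code configs may name it differently.
# Note: access to the 180B repo is gated; the same check works for tiiuae/falcon-40b.
from transformers import AutoConfig

config = AutoConfig.from_pretrained("tiiuae/falcon-180B-chat", trust_remote_code=True)
print(getattr(config, "new_decoder_architecture", None))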

guillaumekln (Collaborator) commented Sep 12, 2023

This specific error may require small changes in the converter, but there are currently more general issues related to very large models.

During conversion it will likely hit this other error (#1324), which is a limitation in the current model serialization. Then at runtime the model would require at least 180 GB in int8 and would need to be split across multiple GPUs, which is currently not supported in CTranslate2 (#1052).
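
As a rough back-of-envelope check on that figure (weights only; the KV cache, activations, and runtime overhead come on top):

# Rough weight-only memory estimate for a ~180B-parameter model.
# Real usage is higher once the KV cache, activations, and framework overhead are added.
params = 180e9
for dtype, bytes_per_param in [("float16", 2), ("int8", 1)]:
    print(f"{dtype}: ~{params * bytes_per_param / 1e9:.0f} GB")
# float16: ~360 GB, int8: ~180 GB -- far more than a single 48 GB A40, hence the need
# for multi-GPU model splitting (#1052).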

guillaumekln added the enhancement (New feature or request) label on Sep 12, 2023
silvacarl2 (Author):

we are happy to supply you with a server you can test with if that would help.

silvacarl2 (Author):

Non-issue for us now.

aflah02 commented Jan 16, 2024

@silvacarl2 How did you get this to work? I still get the same error as the one you posted, but for falcon-40b.

silvacarl2 (Author):

I didn't; we decided to try something else. There are ten zillion choices now.

aflah02 commented Jan 16, 2024

Lol yeah, that's true.
Could you share what worked best for you for inference on falcon-40b and other large models like it? I've had good success with ctranslate2 for smaller models so far, while trying to get logprobs for inputs.
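
For the logprob part, a minimal sketch of how per-token log-probabilities can be pulled from a converted model with CTranslate2's scoring API (the model directory name here is hypothetical and assumes a decoder-only model converted with the tokenizer files copied alongside it):

# Sketch: score a prompt with a converted CTranslate2 generator to get per-token log-probs.
# `model_dir` is a hypothetical output directory from ct2-transformers-converter.
import ctranslate2
from transformers import AutoTokenizer

model_dir = "tiiuae-falcon-40b-instruct-int8_float16"
tokenizer = AutoTokenizer.from_pretrained(model_dir)  # tokenizer files copied via --copy_files
generator = ctranslate2.Generator(model_dir, device="cuda", compute_type="int8_float16")

tokens = tokenizer.convert_ids_to_tokens(tokenizer.encode("The quick brown fox"))
result = generator.score_batch([tokens])[0]
for token, log_prob in zip(result.tokens, result.log_probs):
    print(token, log_prob)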

silvacarl2 (Author):

Sorry, I thought you were trying to do falcon-180b.

This may work for 40b; I don't remember:

ct2-transformers-converter --model tiiuae/falcon-40b-instruct --output_dir tiiuae-falcon-40b-instruct-int8_float16 --force --copy_files tokenizer.json README.md tokenizer_config.json generation_config.json special_tokens_map.json --quantization int8_float16 --trust_remote_code

It needs 88 GB of RAM.
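
Once the conversion succeeds, loading and running the converted model typically looks something like this (a minimal sketch assuming the output directory above, the copied tokenizer files, and a CUDA device with enough memory):

# Sketch: load the converted falcon-40b model and generate a short completion.
import ctranslate2
from transformers import AutoTokenizer

model_dir = "tiiuae-falcon-40b-instruct-int8_float16"
tokenizer = AutoTokenizer.from_pretrained(model_dir)
generator = ctranslate2.Generator(model_dir, device="cuda", compute_type="int8_float16")

prompt_tokens = tokenizer.convert_ids_to_tokens(tokenizer.encode("Write one sentence about llamas."))
results = generator.generate_batch(
    [prompt_tokens],
    max_length=64,
    sampling_topk=10,
    include_prompt_in_result=False,
)
print(tokenizer.decode(results[0].sequences_ids[0]))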

aflah02 commented Jan 16, 2024

Thanks @silvacarl2!
Is there a way to do this without the int8 quantization as well? I tried the same command without the quantization flag and got the same error you had originally posted for the 180B model. I'm also curious about the 180B model: which route did you go with to run it?

silvacarl2 (Author):

Is there a way to do this without the int8 quantization as well?

Yes, just use int8 instead of int8_float16.

I'm also curious about the 180B model: which route did you go with to run it?

We gave up on it; there are many better choices now.
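
For reference, dropping the --quantization flag entirely keeps the original weight type during conversion; a sketch of that variant (which, as noted below, still hit the same spec error at the time):

ct2-transformers-converter --model tiiuae/falcon-40b-instruct --output_dir tiiuae-falcon-40b-instruct --force --copy_files tokenizer.json README.md tokenizer_config.json generation_config.json special_tokens_map.json --trust_remote_code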

aflah02 commented Jan 16, 2024

@silvacarl2 Thanks for the response!
I think you might've misunderstood my first question: I want to do this without any quantization, so I did not use that flag, but it did not work and I got the error you mentioned in this issue.

I agree there certainly are much stronger models now!

silvacarl2 (Author):

Got it. Well, that's as far as we went with it. 8-(

aflah02 commented Jan 16, 2024

Ah got it! Thanks for the help 😊
