Adding PhiConfig #1568
Conversation
Using the same test script as elsewhere. To reproduce, run the converter and then the script below:
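(The converter invocation itself is not shown in the comment; a minimal sketch using CTranslate2's Python converter API, assuming the output path matches the one loaded by the test script:)

from ctranslate2.converters import TransformersConverter

# Sketch: convert the Hugging Face checkpoint into the directory
# that the test script loads below.
converter = TransformersConverter("microsoft/phi-1_5", trust_remote_code=True)
converter.convert("/home/michi/ct2_convert_phi")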
import ctranslate2
from transformers import AutoTokenizer, AutoModelForCausalLM
from timeit import default_timer as timer

model_ct2 = ctranslate2.Generator(
    "/home/michi/ct2_convert_phi",
    device="cpu",
    compute_type="float32",
)

name_hf = "microsoft/phi-1_5"
model_hf = AutoModelForCausalLM.from_pretrained(name_hf, trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained(name_hf, device="cpu", trust_remote_code=True)

text = "# this code print('hello', name) in python3 \n\ndef hello_name(name: "

def inference_ct2(text: str):
    # CTranslate2 expects string tokens, not token ids.
    tokens_in = tokenizer.convert_ids_to_tokens(tokenizer.encode(text))
    s = timer()
    tokens_out = model_ct2.generate_batch([tokens_in], max_length=32, min_length=32)
    total_ct2 = timer() - s
    text_out = tokenizer.decode(tokens_out[0].sequences_ids[0])
    return total_ct2, text_out

def inference_hf(text: str):
    inputs = tokenizer.encode(text, return_tensors="pt")
    s = timer()
    outputs = model_hf.generate(inputs, max_length=32, min_length=32)
    total_hf = timer() - s
    text_out2 = tokenizer.batch_decode(outputs)[0]
    return total_hf, text_out2

# Warm up both backends so the timed runs are comparable.
_, _ = inference_ct2("warm up")
_, _ = inference_hf("warm up")

total_ct2, text_out_ct2 = inference_ct2(text)
total_hf, text_out_hf = inference_hf(text)

print("hf\n", text_out_hf, "\nct2\n", text_out_ct2)
print(f"time for ct2={total_ct2} time for hf={total_hf}")
assert text_out_ct2[:len(text_out_hf)] == text_out_hf  # same output, ct2 generates +1 token compared to HF
print("done")

I get:

>>> inference_ct2("warm up")
(6.334434970000075, 'warm up before exercising.\n\n3. Why did John feel energized and ready to tackle the day after his workout?\nAnswer: John felt energized and')
>>> inference_hf("warm up")
(7.759652636999817, 'warm up before exercising.\n\n3. Why did John feel energized and ready to tackle the day after his workout?\nAnswer: John felt energized')
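As a quick check of the "+1 token" note in the assert, one could compare raw token counts instead of decoded strings (a sketch reusing the objects defined in the script above; the expected off-by-one is an assumption based on that comment, not verified here):

# Sketch: compare generated token counts directly, reusing model_ct2,
# model_hf, tokenizer, and text from the script above.
tokens_in = tokenizer.convert_ids_to_tokens(tokenizer.encode(text))
ids_ct2 = model_ct2.generate_batch([tokens_in], max_length=32, min_length=32)[0].sequences_ids[0]
ids_hf = model_hf.generate(tokenizer.encode(text, return_tensors="pt"),
                           max_length=32, min_length=32)[0].tolist()
print(len(ids_ct2), len(ids_hf))  # expected: ct2 returns one more token than HF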
Can you fix flake8?
Done!
Looking at the transformers library, I don't see MixFormerSequentialLoader used anywhere, so I'm wondering whether we could just get rid of it or if there is a real reason to keep this class.
Valid point, I forgot that. It was my starting point.
Sorry to be annoying, but shouldn't we remove the other one completely? Unless we want to support older versions of the transformers library for now.
No idea which models from Hugging Face use "MixFormerSequentialLoader"; I assume there are some. This PR is for supporting "PhiConfig". I initially thought the structure was very similar to MixFormerSequentialLoader, but that was not the case. I think it makes sense to keep it and open a separate class, right?
Let's keep both for now, but I really think this is the same model and they only renamed it. I see your changes vs. the older one, but I think the specs are equivalent. Obviously you tested it, so I'll merge it.
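(For context: loaders in CTranslate2's transformers converter are registered by the Hugging Face config class name, which is why the rename to PhiConfig needs its own entry even if the specs are equivalent. An illustrative skeleton of that registration pattern, not the PR's actual code; the method body is a placeholder:)

from ctranslate2.converters.transformers import ModelLoader, register_loader

@register_loader("PhiConfig")
class PhiLoader(ModelLoader):
    @property
    def architecture_name(self):
        # Phi checkpoints use remote code, so they load via AutoModelForCausalLM.
        return "AutoModelForCausalLM"

    def get_model_spec(self, model):
        # Placeholder: build a decoder spec and copy the HF weights into it.
        ...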
Work in progress to add one more model (microsoft/phi).
I am really not sure if my conversions are right, maybe give it a check. Heavily based on MixFormerSequentialLoader.
Closes #1559