Hi there!

Thank you for putting together this great work. I have a question and would appreciate some help.

I'm experimenting with `get_full_sentence_logprob` in `models.py`, and noticed that the id stored for `"<extra_id_0>"` in `ids_to_ignore` appears to differ from the ids produced for `"<extra_id_0>"` by `labels = self._tokenizer(output_str, return_tensors="pt").input_ids.to(self.device)`. This might be because `tokenizer.convert_tokens_to_ids()` maps the token to a single special-token id, whereas the tokenizer call treats `"<extra_id_0>"` as an ordinary word and splits it into sub-word pieces. As a result, `"<extra_id_0>"` is not filtered out by `mask = torch.BoolTensor([tok_id not in self.ids_to_ignore for tok_id in labels[0]])`.

Is this behavior expected? Or please let me know if I misunderstood anything. Thank you so much for your help in advance.
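To make the suspected mismatch concrete, here is a minimal, self-contained sketch (with toy ids standing in for real tokenizer output; `32099` is the usual T5 id for `<extra_id_0>`, but the split ids are purely hypothetical): if the sentinel survives as one special-token id the mask filters it, but if it is split into ordinary sub-word ids, none of the pieces match `ids_to_ignore` and nothing is masked.

```python
# Toy stand-ins for tokenizer output; real ids come from the actual tokenizer.
SENTINEL_ID = 32099            # e.g. convert_tokens_to_ids("<extra_id_0>") for T5
ids_to_ignore = {SENTINEL_ID}

# Case 1: the sentinel is kept as a single special-token id -> correctly masked.
labels_ok = [SENTINEL_ID, 100, 200]
mask_ok = [tok_id not in ids_to_ignore for tok_id in labels_ok]
assert mask_ok == [False, True, True]

# Case 2: the sentinel was split into ordinary sub-word ids
# (hypothetical pieces like "<", "extra", "_id_0>"), so no id matches
# and the sentinel leaks through the mask.
labels_split = [50, 51, 52, 100, 200]
mask_split = [tok_id not in ids_to_ignore for tok_id in labels_split]
assert mask_split == [True, True, True, True, True]
```

A quick way to check which case applies in practice is to compare `self._tokenizer.convert_tokens_to_ids("<extra_id_0>")` against the ids in `labels[0]` (or decode them with `convert_ids_to_tokens`) and see whether the sentinel id actually appears.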