
Question regarding tokens_to_ignore and ids_to_ignore #1

Open
yihan-zhou opened this issue Mar 23, 2024 · 0 comments

Comments

Hi there!

Thank you for putting together this great work. I have a question and wonder if I could get some help!

I'm experimenting with `get_full_sentence_logprob` in `models.py`, and noticed that the id for `"<extra_id_0>"` in `ids_to_ignore` appears to be different from the ids produced for `"<extra_id_0>"` by `labels = self._tokenizer(output_str, return_tensors="pt").input_ids.to(self.device)`. This might be because `tokenizer.convert_tokens_to_ids()` maps a single token to its id, whereas when encoding the full string, `"<extra_id_0>"` is tokenized as a regular word instead. As a result, `"<extra_id_0>"` is not excluded by `mask = torch.BoolTensor([tok_id not in self.ids_to_ignore for tok_id in labels[0]])`.
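To make the suspected mismatch concrete, here is a minimal, self-contained sketch with a hypothetical toy tokenizer (not the project's actual tokenizer): if `"<extra_id_0>"` is not registered as a single vocabulary entry, `convert_tokens_to_ids` falls back to the unk id, while encoding the full string splits the sentinel into subword pieces whose ids never appear in `ids_to_ignore`, so the mask keeps them all.

```python
# Toy illustration of the mismatch (hypothetical vocab, not the real tokenizer).

def convert_tokens_to_ids(token, vocab, unk_id=0):
    # Single-token lookup: returns unk_id when "<extra_id_0>" is not a
    # registered vocabulary entry.
    return vocab.get(token, unk_id)

def encode(text, vocab):
    # Greedy longest-match subword encoding (simplified).
    ids, i = [], 0
    while i < len(text):
        for j in range(len(text), i, -1):
            if text[i:j] in vocab:
                ids.append(vocab[text[i:j]])
                i = j
                break
        else:
            i += 1  # skip characters not in the vocab
    return ids

vocab = {"<": 1, "extra": 2, "_": 3, "id": 4, "0": 5, ">": 6, "hello": 7}

ids_to_ignore = {convert_tokens_to_ids("<extra_id_0>", vocab)}  # {0}: unk id
labels = encode("<extra_id_0>hello", vocab)  # sentinel split into pieces
mask = [tok_id not in ids_to_ignore for tok_id in labels]
print(mask)  # every position is True: the sentinel pieces are never masked out
```

Under this assumption, the fix would be to build `ids_to_ignore` from the ids the encoder actually emits for the sentinel, rather than from a single-token lookup.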

Is this behavior expected? Please let me know if I misunderstood anything. Thank you so much in advance for your help.
