
Question regarding tokens_to_ignore and ids_to_ignore #1

Open
yihan-zhou opened this issue Mar 23, 2024 · 0 comments

Comments

Hi there!

Thank you for putting together this great work. I have a question and wonder if I could get some help!

I'm experimenting with `get_full_sentence_logprob` in `models.py`, and noticed that the id for `"<extra_id_0>"` in `ids_to_ignore` appears to be different from the ids produced for `"<extra_id_0>"` by `labels = self._tokenizer(output_str, return_tensors="pt").input_ids.to(self.device)`. This might be because `tokenizer.convert_tokens_to_ids()` maps a single token to its id, whereas when encoding the full string, `"<extra_id_0>"` is tokenized as a regular word instead. As a result, `"<extra_id_0>"` is not excluded by `mask = torch.BoolTensor([tok_id not in self.ids_to_ignore for tok_id in labels[0]])`.
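To make the suspected mismatch concrete, here is a minimal, self-contained sketch with a hypothetical toy tokenizer (not the project's actual tokenizer): if `"<extra_id_0>"` is not registered as a single vocabulary entry, `convert_tokens_to_ids` falls back to the unk id, while encoding the full string splits the sentinel into subword pieces whose ids never appear in `ids_to_ignore`, so the mask keeps them all.

```python
# Toy illustration of the mismatch (hypothetical vocab, not the real tokenizer).

def convert_tokens_to_ids(token, vocab, unk_id=0):
    # Single-token lookup: returns unk_id when "<extra_id_0>" is not a
    # registered vocabulary entry.
    return vocab.get(token, unk_id)

def encode(text, vocab):
    # Greedy longest-match subword encoding (simplified).
    ids, i = [], 0
    while i < len(text):
        for j in range(len(text), i, -1):
            if text[i:j] in vocab:
                ids.append(vocab[text[i:j]])
                i = j
                break
        else:
            i += 1  # skip characters not in the vocab
    return ids

vocab = {"<": 1, "extra": 2, "_": 3, "id": 4, "0": 5, ">": 6, "hello": 7}

ids_to_ignore = {convert_tokens_to_ids("<extra_id_0>", vocab)}  # {0}: unk id
labels = encode("<extra_id_0>hello", vocab)  # sentinel split into pieces
mask = [tok_id not in ids_to_ignore for tok_id in labels]
print(mask)  # every position is True: the sentinel pieces are never masked out
```

Under this assumption, the fix would be to build `ids_to_ignore` from the ids the encoder actually emits for the sentinel, rather than from a single-token lookup.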

Is this behavior expected? Please let me know if I misunderstood anything. Thank you so much in advance for your help.
