MUR Task #17

Open
gyhou123 opened this issue Apr 26, 2023 · 1 comment

@gyhou123

Hello, I have a question about the following line of code.

_, mlm_tgt_encodings, *_ = self.utt_encoder.bert(context_mlm_targets[ctx_mlm_mask], context_utts_attn_mask[ctx_mlm_mask])

context_mlm_targets[ctx_mlm_mask] represents the tokenized utterances before [MASK] is applied.
context_utts_attn_mask[ctx_mlm_mask] represents the attention mask after [MASK] is applied.

They don't match.
Why not recalculate the attention mask?
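
For readers following along, here is a minimal sketch of the indexing being discussed; the tensor names come from the snippet above, while the shapes and values are assumed purely for illustration:

```python
import torch

# Assumed toy shapes: batch of 2 contexts, 4 utterances each, 10 tokens per utterance.
context_mlm_targets = torch.randint(0, 30522, (2, 4, 10))        # original (unmasked) token ids
context_utts_attn_mask = torch.ones(2, 4, 10, dtype=torch.long)  # 1 = real token position
ctx_mlm_mask = torch.tensor([[True, False, True, False],
                             [False, True, False, False]])       # which utterances were selected for masking

# Both tensors are indexed by the same utterance-level boolean mask, so the
# selected token ids and attention masks stay row-aligned:
tgt_ids  = context_mlm_targets[ctx_mlm_mask]      # shape (3, 10)
tgt_attn = context_utts_attn_mask[ctx_mlm_mask]   # shape (3, 10)
```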

@guxd
Owner

guxd commented May 24, 2023

By saying [MASK], do you mean masking utterances in contexts or masking words in utterances?
If the former, then 'context_utts_attn_mask' represents the attention mask before [MASK].
Please check Line 249 in data_loader.py: context_utts_attn_mask = [[1]*len(utt) for utt in context], which does not set masked positions to 0.
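
To make that point concrete, here is a hedged reconstruction of the cited line; the example context values are assumed:

```python
# Assumed toy input: one list of token ids per utterance in the context.
context = [[101, 2054, 2003, 102], [101, 2009, 2003, 1037, 3899, 102]]

# data_loader.py, Line 249: the attention mask is built from the *original*
# utterance lengths, so every real token position gets a 1.
context_utts_attn_mask = [[1] * len(utt) for utt in context]
print(context_utts_attn_mask)  # [[1, 1, 1, 1], [1, 1, 1, 1, 1, 1]]

# Masked positions are not zeroed out here, which (if the reading above is
# right) is why the same mask can be paired with the unmasked targets.
```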
