Hi!

In the `forward()` function of `model.py`, the text loss and the image loss are computed from shifted labels. Here `text[:, 1:]` should remove the first `[BOS]` text token label, so only 128 - 1 = 127 text-token labels remain. But in the CE loss, the text logits (with `seq_len=128`) and `labels[:, :self.text_seq_length]  # shape: (bs, 128)` are used to compute the text loss. I suspect that the very first image token, which follows the text tokens, is taken into the text loss computation by mistake.

Am I understanding the code correctly? Will the text token length used in the CE loss calculation affect the training process?
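For reference, here is a minimal standalone sketch of the slicing being described, not the repository's actual code; the tensor shapes and names such as `image_seq_length` and `vocab_size` are assumptions for illustration:

```python
import torch
import torch.nn.functional as F

# Assumed shapes for illustration: 128 text tokens followed by image tokens.
bs, text_seq_length, image_seq_length, vocab_size = 2, 128, 1024, 16384

logits = torch.randn(bs, text_seq_length + image_seq_length, vocab_size)
text = torch.randint(0, vocab_size, (bs, text_seq_length))          # starts with [BOS]
image_tokens = torch.randint(0, vocab_size, (bs, image_seq_length))

# Shifted labels: dropping [BOS] leaves 127 text labels, so the 128th
# label slot is already the first image token.
labels = torch.cat((text[:, 1:], image_tokens), dim=1)  # (bs, 127 + image_seq_length)

# Slicing with the full text_seq_length therefore pulls in one image token:
text_labels = labels[:, :text_seq_length]                  # (bs, 128): 127 text + 1 image
text_logits = logits[:, :text_seq_length].transpose(1, 2)  # (bs, vocab_size, 128)
loss_text = F.cross_entropy(text_logits, text_labels)

# Restricting the slice to text_seq_length - 1 would keep the text loss
# on text-token labels only:
loss_text_only = F.cross_entropy(
    logits[:, :text_seq_length - 1].transpose(1, 2),
    labels[:, :text_seq_length - 1],
)
```

In this sketch the mismatch is exactly one label position per sample, at the boundary between the text tokens and the first image token.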