#868 implements varying the data mixture over the course of training, with user-provided step counts specifying where each transition should happen. When the dataset is asked for a batch, it compares the requested indices to this threshold.
Unfortunately, I have now realized that the indices supplied to the dataset might not correspond exactly to the step of the training run. In a two-stage curriculum, this results in data mixture 2 starting slightly before the requested transition point. This can be observed by running two experiments with the exact same mixture for both stages but different transition points: the runs are completely identical except right before the transition point. Picture attached for two runs where the mixtures are the same but the transition point is either 80% or 90% of the way through training.
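To make the failure mode concrete, here is a minimal sketch of a threshold comparison like the one described above. It is not the code from #868: `stage_for_index`, `batch_indices`, and the `offset` parameter are hypothetical names, and the sketch assumes the threshold is derived as `step * batch_size`. It shows that if the requested indices run ahead of the training step by even a partial batch, part of a batch before the transition is already drawn from the next stage.

```python
from bisect import bisect_right

def stage_for_index(index, stage_start_steps, batch_size):
    """Map a dataset index to a stage (hypothetical helper, not the #868 code).

    Assumes each stage's start step is converted to an index threshold
    as step * batch_size.
    """
    starts = [step * batch_size for step in stage_start_steps]
    return bisect_right(starts, index) - 1

def batch_indices(step, batch_size, offset=0):
    """Indices requested for a step; `offset` models a hypothetical misalignment."""
    return [step * batch_size + j + offset for j in range(batch_size)]

batch_size = 4
stage_start_steps = [0, 8]  # stage 1 is supposed to start at step 8

# Aligned indices: step 7 stays entirely in stage 0.
aligned = [stage_for_index(i, stage_start_steps, batch_size)
           for i in batch_indices(7, batch_size)]

# Indices shifted by 2: half of step 7 already comes from stage 1.
shifted = [stage_for_index(i, stage_start_steps, batch_size)
           for i in batch_indices(7, batch_size, offset=2)]

print(aligned)
print(shifted)
```

With these toy numbers, the shifted run crosses the threshold one step early, which matches the symptom of mixture 2 starting slightly before the requested transition point.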
Two solutions come to mind:
1. Change the requested indices to be in the order of training
2. Look into which indices are being requested and manually account for this within the varying mixture dataset
In some initial debugging, I think I found that the indices requested by the trainer during a training run are not perfectly aligned with the actual step count. I should make a minimal script to share.
Is it possible that some intermediate steps are being thrown out, or used for eval, when the trainer iterates? If batch i, element j always requests index i * batch_size + j, then I can't understand why this bug would occur.
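One way to test this hypothesis would be to log the indices requested at each step and compare them against the expected mapping, assumed here to be `step * batch_size + j`. The sketch below is only an illustration: `expected_indices`, `misaligned_steps`, and the `log` dictionary are hypothetical and not part of the trainer; in practice the log would have to be collected by instrumenting the dataset.

```python
def expected_indices(step, batch_size):
    """Indices a step would request under the assumed contiguous mapping."""
    return [step * batch_size + j for j in range(batch_size)]

def misaligned_steps(requested_by_step, batch_size):
    """Return the steps whose logged indices differ from the assumed mapping."""
    return [
        step
        for step, requested in sorted(requested_by_step.items())
        if list(requested) != expected_indices(step, batch_size)
    ]

# Toy log: step 2 requests indices one batch ahead, as if a batch had been skipped.
log = {
    0: [0, 1, 2, 3],
    1: [4, 5, 6, 7],
    2: [12, 13, 14, 15],
}

bad_steps = misaligned_steps(log, batch_size=4)
print(bad_steps)
```

If a real log shows a gap like the one at step 2, that would point toward batches being consumed outside the training loop rather than toward the threshold logic itself.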