```
The parameter 289 has already been reduced. Gradient computed twice for this partition. Multiple gradient reduction is currently not supported
```
The exception is raised from DeepSpeed the second time `surrogate.backward()` is called in `CachedMultipleNegativesRankingLoss`:
```python
# Excerpt from sentence_transformers.losses.CachedMultipleNegativesRankingLoss
from __future__ import annotations

from collections.abc import Iterable

import torch
from torch import Tensor


def _backward_hook(
    grad_output: Tensor,
    sentence_features: Iterable[dict[str, Tensor]],
    loss_obj: CachedMultipleNegativesRankingLoss,
) -> None:
    """A backward hook to backpropagate the cached gradients mini-batch by mini-batch."""
    assert loss_obj.cache is not None
    assert loss_obj.random_states is not None
    with torch.enable_grad():
        for sentence_feature, grad, random_states in zip(sentence_features, loss_obj.cache, loss_obj.random_states):
            for (reps_mb, _), grad_mb in zip(
                loss_obj.embed_minibatch_iter(
                    sentence_feature=sentence_feature,
                    with_grad=True,
                    copy_random_state=False,
                    random_states=random_states,
                ),
                grad,
            ):
                surrogate = torch.dot(reps_mb.flatten(), grad_mb.flatten()) * grad_output
                surrogate.backward()  # exception raised here
```
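For intuition, here is a minimal plain-PyTorch sketch (no DeepSpeed; the toy layer, shapes, and data are invented) of what this hook does to the model parameters: every mini-batch surrogate backpropagates through the same encoder weights, so each parameter receives a gradient more than once per training step. Plain PyTorch simply accumulates these into `.grad`, while ZeRO's gradient partitioning reduces a parameter's gradient as soon as it is produced and rejects the second pass with the assertion above.

```python
import torch

# Toy stand-in for the encoder: one shared parameter set used by every mini-batch.
layer = torch.nn.Linear(4, 4)

for mini_batch in torch.randn(3, 8, 4):  # three mini-batches, same parameters
    reps = layer(mini_batch)
    surrogate = reps.sum()    # stand-in for torch.dot(reps_mb.flatten(), grad_mb.flatten()) * grad_output
    surrogate.backward()      # plain PyTorch: gradients accumulate into layer.weight.grad;
                              # under ZeRO gradient partitioning, the second call already fails

print(layer.weight.grad.abs().sum())  # accumulated over all three backward() calls
```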
Hypothesis-Z changed the title from "Deepspeed + CachedNMRL: Gradient computed twice for this partition. Multiple gradient reduction is currently not supported" to "Deepspeed + CachedMNRL: Gradient computed twice for this partition. Multiple gradient reduction is currently not supported" on Jan 15, 2025.
Hmm, that is bothersome. Indeed, CachedMNRL computes gradients twice. It's based on GradCache, where the same thing happens. I see that DeepSpeed is not expecting that, so there's an incompatibility there. I don't think there is a simple workaround at the moment, other than 1) not using DeepSpeed or 2) not using a Cached loss.
There are some more details on the GradCache project itself: luyug/GradCache#11
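For reference, a minimal sketch of workaround 2, i.e. replacing the cached loss with the plain `MultipleNegativesRankingLoss` (the model name below is just a placeholder):

```python
from sentence_transformers import SentenceTransformer
from sentence_transformers.losses import MultipleNegativesRankingLoss

model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")  # placeholder model

# The non-cached loss backpropagates only once per step, so it does not trigger
# DeepSpeed's "Gradient computed twice for this partition" assertion.
loss = MultipleNegativesRankingLoss(model)
```

The trade-off is that without the GradCache trick the in-batch negatives are limited by what fits in GPU memory, so the effective batch size is smaller than with the cached variant.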
@tomaarsen Thank you! I've upgraded accelerate/deepspeed/transformers to the latest version and set the ZeRO stage to 1. I'm not sure whether the model is well trained, but it works, even with MatryoshkaLoss (from the main branch)...
My script
My Deepspeed config
Error
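(The actual script and config are in the collapsed sections above.) As a rough sketch of what the reported fix amounts to, ZeRO stage 1 only partitions optimizer states and does not reduce gradients per partition, which is presumably why the repeated `backward()` calls go through. The config dict and `output_dir` below are assumptions for illustration, not the thread's real files:

```python
from sentence_transformers.training_args import SentenceTransformerTrainingArguments

# Minimal DeepSpeed config with ZeRO stage 1 (optimizer-state partitioning only).
ds_config = {
    "zero_optimization": {"stage": 1},
    "train_micro_batch_size_per_gpu": "auto",
    "gradient_accumulation_steps": "auto",
}

args = SentenceTransformerTrainingArguments(
    output_dir="output",    # placeholder
    deepspeed=ds_config,    # forwarded to the transformers/accelerate DeepSpeed integration
)
```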