Note: you can simulate 64 GPUs by using k GPUs and adding the command line parameters (before `--config-dir`) `distributed_training.distributed_world_size=k +optimization.update_freq='[x]'`, where x = 64/k.
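The arithmetic behind that note (each GPU accumulates gradients over 64/k steps before updating, so the effective batch matches a 64-GPU run) can be sketched in a few lines. The helper name below is hypothetical, not part of the fairseq API:

```python
def simulated_update_freq(target_gpus: int, available_gpus: int) -> int:
    """Return the gradient-accumulation factor (update_freq) needed so that
    available_gpus simulates target_gpus. Hypothetical helper for illustration."""
    if target_gpus % available_gpus != 0:
        raise ValueError("target_gpus must be divisible by available_gpus")
    return target_gpus // available_gpus

# e.g. with 8 GPUs available, simulating the 64-GPU setup:
print(simulated_update_freq(64, 8))  # -> 8
```

With the value computed this way, each GPU accumulates gradients over `update_freq` forward/backward passes before the optimizer step, matching the per-update batch size of the larger cluster.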
I was wondering why, in the finetune.py file, you've set update_freq to 24/NUM_GPU.
In the wav2vec README (https://github.com/pytorch/fairseq/blob/master/examples/wav2vec/README.md), they say the base model was trained on 64 V100 GPUs, and as I understand it, if we want to continue training the base model we should simulate the number of GPUs they used.
Have you found that setting update_freq to be 24/NUM_GPU is better for training or is it a bug?