Note: you can simulate 64 GPUs by using k GPUs and adding the command line parameters (before `--config-dir`) `distributed_training.distributed_world_size=k +optimization.update_freq='[x]'`, where x = 64/k.
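The arithmetic behind that note (each GPU accumulates gradients over 64/k steps before updating, so the effective batch matches a 64-GPU run) can be sketched in a few lines. The helper name below is hypothetical, not part of the fairseq API:

```python
def simulated_update_freq(target_gpus: int, available_gpus: int) -> int:
    """Return the gradient-accumulation factor (update_freq) needed so that
    available_gpus simulates target_gpus. Hypothetical helper for illustration."""
    if target_gpus % available_gpus != 0:
        raise ValueError("target_gpus must be divisible by available_gpus")
    return target_gpus // available_gpus

# e.g. with 8 GPUs available, simulating the 64-GPU setup:
print(simulated_update_freq(64, 8))  # -> 8
```

With the value computed this way, each GPU accumulates gradients over `update_freq` forward/backward passes before the optimizer step, matching the per-update batch size of the larger cluster.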
I was wondering why, in the finetune.py file, you've set update_freq to 24/NUM_GPU.
In the wav2vec README (https://github.com/pytorch/fairseq/blob/master/examples/wav2vec/README.md), they say the base model was trained on 64 V100 GPUs, and as I understand it, if we want to continue training the base model we should simulate the number of GPUs they used.
Have you found that setting update_freq to be 24/NUM_GPU is better for training or is it a bug?