How to run this program on multiple GPUs #22
Hi @sewellGUO, have you taken a look at these two guides written for TensorFlow 1.x? The guides written by the TensorFlow team at Google are certainly the best resources to start with. I would also take a look at the state of the art in distributed training before deciding which strategy is right for you. Please note that some of the models used in this project have been ported to TensorFlow 2 in the Taris repository. Do you find the two guides above useful for your use case?
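For context, the most common strategy those guides describe is synchronous data parallelism: each GPU receives a shard of the global batch, computes gradients on its shard, and the gradients are all-reduced (averaged) before a single synchronized weight update. In TensorFlow 2 this is what `tf.distribute.MirroredStrategy` does when you build and compile the model inside `strategy.scope()`. The sketch below is a hypothetical NumPy illustration of that averaging step (the function names and the squared-error loss are placeholders, not code from this project):

```python
import numpy as np

def replica_gradient(w, x_shard, y_shard):
    # Gradient of the mean squared error 0.5 * (x @ w - y)^2
    # computed on one replica's shard of the batch.
    err = x_shard @ w - y_shard
    return x_shard.T @ err / len(x_shard)

def mirrored_step(w, x, y, num_replicas, lr=0.1):
    # Split the global batch across replicas, compute per-replica
    # gradients, then all-reduce (average) them so every replica
    # applies the same synchronized update.
    x_shards = np.array_split(x, num_replicas)
    y_shards = np.array_split(y, num_replicas)
    grads = [replica_gradient(w, xs, ys)
             for xs, ys in zip(x_shards, y_shards)]
    g = np.mean(grads, axis=0)
    return w - lr * g
```

With equally sized shards, one step over two replicas produces exactly the same update as one step on a single device, which is why this strategy changes throughput rather than the training trajectory.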
Thanks for the reply, I will try it.
I tried reducing the learning rate and the batch size, but it didn't work. Have you ever had this problem?
This looks like a tricky one. I have never encountered it in over three years.
I remember having difficulties debugging my code in TF1, which is one reason I migrated to TF2 last year.
Thank you for your prompt reply. I tested each of your suggestions one by one and found that there were numbers in my output tokens. After regenerating the labels, the error no longer occurs.
Nice, really glad to see that it worked.
Hello @georgesterpu,
Thank you for the open-source code. I have a problem now.
When I run this program on multiple GPUs, only one GPU is fully utilized, and the remaining GPUs sit idle.
I am a newcomer to TensorFlow, and some of the methods provided by Google have not helped, so I would like to know how to modify the code to solve my problem.