I am training a Keras/TensorFlow ResNet50 model on my NVIDIA RTX 4090 GPU. I am using Python 3.10, TF 2.10 (on Windows), Keras 2.10, CUDA 12.5, cuDNN 8.9, and PyCharm as my IDE. The model usually takes around 2 minutes per epoch. However, I have observed that sometimes, without changing any hyperparameters and with exactly the same inputs, an epoch can take up to an hour. Moreover, this sometimes happens within the same run: the first epoch takes 15 minutes but the other epochs take 2 minutes, or the model runs at normal speed for three epochs and the fourth epoch takes over half an hour. At the beginning of the code I call `tf.config.experimental.set_memory_growth(device, True)` and `tf.keras.backend.clear_session()`, and I have checked the GPU usage: it is the same in both the fast and the slow cases. I am new to Machine Learning and Keras, so is there anything I'm missing?
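To pin down which epochs are the slow ones, the per-epoch times described above could be recorded with something like the following. This is only a minimal sketch: `EpochTimer` and `slow_factor` are hypothetical names, not part of Keras, and the "3x the median" threshold is an arbitrary assumption. The method names mirror the Keras callback hooks `on_epoch_begin` and `on_epoch_end`, so to pass it to `model.fit` the class would need to subclass `tf.keras.callbacks.Callback`.

```python
import time


class EpochTimer:
    """Records wall-clock seconds per epoch (hypothetical helper, not a Keras class)."""

    def __init__(self, slow_factor=3.0):
        # Assumed threshold: an epoch counts as "slow" if it takes more than
        # slow_factor times the median epoch duration.
        self.slow_factor = slow_factor
        self.durations = []
        self._start = None

    def on_epoch_begin(self, epoch, logs=None):
        # Same signature as the Keras callback hook of the same name.
        self._start = time.perf_counter()

    def on_epoch_end(self, epoch, logs=None):
        self.durations.append(time.perf_counter() - self._start)

    def slow_epochs(self):
        """Return the indices of epochs slower than slow_factor * median."""
        if not self.durations:
            return []
        ordered = sorted(self.durations)
        median = ordered[len(ordered) // 2]  # upper median for even lengths
        return [i for i, d in enumerate(self.durations)
                if d > self.slow_factor * median]
```

Logging these durations next to GPU and disk activity for the same epochs may help show whether the slowdown lines up with anything outside the training loop.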