error during training with opso 0.10.0 #893
Hi Samm, thank you for your help with the `decibel_limits` issue I experienced with the Spectrogram preprocessor. I am now having difficulty training my model in the new OpSo version. My code is below:

```python
from pathlib import Path

import wandb

model.logging_level = 2  # request lots of logged content
model.log_file = './multispecies_model_train_3/training_log_20102023.txt'  # specify a file to log output to
Path(model.log_file).parent.mkdir(parents=True, exist_ok=True)  # create the log folder if it doesn't exist
model.verbose = 1  # don't print anything to the screen during training

# learning rate schedule
model.lr_cooling_factor = 0.3
model.lr_update_interval = 4

# regularization: weight decay
model.optimizer_params['weight_decay'] = 0.001

# Train the model on the full training set and validate on the validation set. Save the model to a directory.
model.train(
    balanced_train_df,
    # train_df,
    valid_df,
    epochs=1,
    batch_size=64,
    log_interval=5,
    num_workers=4,  # if multiple processors are available, use them to speed up training
    # wandb_session=wandb_session,
    save_interval=10,
    save_path=checkpoint_folder,  # location to save checkpoints
)

# let WandB know that we finished training successfully
wandb.unwatch(model.network)
wandb.finish()
```

Running this throws the following error:
You’ll notice I have also commented out the `wandb_session=wandb_session` argument, as it also throws an error, related to a missing samples / training samples directory. I have tried various tweaks to prevent the DataLoader `RuntimeError`, but without success. Have you come across this problem before? Thanks very much,
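The learning-rate settings in the script above (`lr_cooling_factor = 0.3`, `lr_update_interval = 4`) describe a step decay. A minimal sketch of what that schedule does, assuming the rate is multiplied by `lr_cooling_factor` once every `lr_update_interval` epochs (the same behavior as PyTorch's `StepLR`); `lr_at_epoch` is a hypothetical helper, not part of OpenSoundscape:

```python
def lr_at_epoch(initial_lr: float, epoch: int,
                cooling_factor: float = 0.3, update_interval: int = 4) -> float:
    """Learning rate in effect during a 0-indexed epoch under step decay.

    Assumption: the rate is multiplied by cooling_factor after every
    update_interval completed epochs (like torch.optim.lr_scheduler.StepLR).
    """
    if epoch < 0 or update_interval <= 0:
        raise ValueError("epoch must be >= 0 and update_interval must be > 0")
    # number of decay steps that have happened before this epoch
    n_steps = epoch // update_interval
    return initial_lr * cooling_factor ** n_steps


if __name__ == "__main__":
    # hypothetical starting learning rate of 0.01
    for epoch in range(10):
        print(epoch, round(lr_at_epoch(0.01, epoch), 6))
```

Under this assumption, with `epochs=1` as in the script the decay never triggers, so the whole run uses the initial learning rate.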
Hi Lorenzo, it's hard to tell what's happening here, but I'm happy to try to help. The error message doesn't contain any information about opensoundscape, only that the "workers" failed. The "workers" in this context are the parallel CPU worker processes that the PyTorch DataLoader launches for preprocessing when you use num_workers > 0. I would try setting num_workers=0 so that only the main process is used, and check whether a more intelligible error is given.
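Setting `num_workers=0` helps because preprocessing then runs in your own process, so any exception is raised there with its full traceback instead of surfacing only as a failed worker. The same idea can be applied by hand: preprocess samples one at a time and stop at the first failure. A minimal, library-agnostic sketch; `find_first_failure` and the toy `preprocess` function are hypothetical stand-ins for whatever per-sample preprocessing your pipeline does:

```python
import traceback
from typing import Any, Callable, Iterable, Optional, Tuple


def find_first_failure(
    preprocess: Callable[[Any], Any], samples: Iterable[Any]
) -> Optional[Tuple[int, Any, Exception]]:
    """Run preprocess on each sample serially in the current process.

    Returns (index, sample, exception) for the first sample that raises,
    or None if every sample preprocesses cleanly.
    """
    for i, sample in enumerate(samples):
        try:
            preprocess(sample)
        except Exception as exc:  # keep the original exception, unwrapped
            traceback.print_exc()  # full traceback, printed in this process
            return i, sample, exc
    return None


if __name__ == "__main__":
    # Toy stand-in for a real preprocessor: fails on an empty clip.
    def preprocess(clip):
        if len(clip) == 0:
            raise ValueError("empty audio clip")
        return [x * 2 for x in clip]

    print(find_first_failure(preprocess, [[1, 2], [3], [], [4]]))
```

In a real project, `samples` might be the file paths in the training DataFrame; the point is only that nothing runs in a subprocess, so the first real error is visible.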
I managed to fix this error by setting `sample_shape=(224, 224, 1)` when creating my model with `model = CNN(...)`.
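For context, OpenSoundscape's `CNN` takes `sample_shape` as (height, width, channels), while PyTorch convolution layers consume batches shaped (batch, channels, height, width). One plausible reason the fix worked is that the channel count in `sample_shape` then matched the number of input channels the network's first layer expects; the thread does not confirm this, so treat it as an assumption. A small sketch of that bookkeeping, with hypothetical helper names:

```python
from typing import Tuple


def batch_shape(sample_shape: Tuple[int, int, int],
                batch_size: int) -> Tuple[int, int, int, int]:
    """Convert a (height, width, channels) sample shape into the
    (batch, channels, height, width) layout PyTorch conv layers use."""
    height, width, channels = sample_shape
    return (batch_size, channels, height, width)


def channels_match(sample_shape: Tuple[int, int, int], in_channels: int) -> bool:
    """True if the sample's channel count equals the number of input
    channels the network's first layer expects (in_channels is supplied by you)."""
    return sample_shape[2] == in_channels


if __name__ == "__main__":
    print(batch_shape((224, 224, 1), 64))    # (64, 1, 224, 224)
    print(channels_match((224, 224, 1), 1))  # True
    print(channels_match((224, 224, 3), 1))  # False
```

A single-channel spectrogram fits a network whose first layer takes one input channel; ImageNet-style architectures typically expect three.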