Training stops unexpectedly at iteration 3618 #2

Open
lajictw opened this issue Oct 23, 2023 · 2 comments

lajictw commented Oct 23, 2023

Hi. I really like this work and its use of the molecular fragmentation idea. When I try to reproduce it, however, training stops at iteration 3618: the data iterator reports that it has no more elements (`StopIteration`), and re-creating it then fails with `TypeError: cannot pickle 'Environment' object`. My only changes to your code were wrapping the training loop in tqdm to monitor progress, setting `batch_size` to 64, and setting `num_workers` to 8. The full error output is below. I suspect a problem in my setup, but I'm not sure. Please reply at your convenience. Thanks!

`[2023-10-23 01:37:14,919 :: train :: INFO] [Train] Iter 3617 | loss: 1.285635 | loss_pos: 1.181161 | loss_node: 0.091807 | loss_edge: 0.012668
Training: 3%|██████▏ | 3617/110000 [1:02:57<29:16:15, 1.01it/s][2023-10-23 01:37:15,320 :: train :: INFO] [Train] Iter 3618 | loss: 1.513661 | loss_pos: 1.436085 | loss_node: 0.067695 | loss_edge: 0.009880
Training: 3%|██████▏ | 3618/110000 [1:02:58<30:51:47, 1.04s/it]
Traceback (most recent call last):
  File "D:\PaperCode\MolDiff.\utils\train.py", line 51, in inf_iterator
    yield iterator.__next__()
  File "C:\Users\ctw31\anaconda3\envs\GNN1\lib\site-packages\torch\utils\data\dataloader.py", line 633, in __next__
    data = self._next_data()
  File "C:\Users\ctw31\anaconda3\envs\GNN1\lib\site-packages\torch\utils\data\dataloader.py", line 1318, in _next_data
    raise StopIteration
StopIteration

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "D:\PaperCode\MolDiff\scripts\train_drug3d.py", line 172, in <module>
    train(it)
  File "D:\PaperCode\MolDiff\scripts\train_drug3d.py", line 90, in train
    batch = next(train_iterator).to(args.device)
  File "D:\PaperCode\MolDiff.\utils\train.py", line 53, in inf_iterator
    iterator = iterable.__iter__()
  File "C:\Users\ctw31\anaconda3\envs\GNN1\lib\site-packages\torch\utils\data\dataloader.py", line 441, in __iter__
    return self._get_iterator()
  File "C:\Users\ctw31\anaconda3\envs\GNN1\lib\site-packages\torch\utils\data\dataloader.py", line 388, in _get_iterator
    return _MultiProcessingDataLoaderIter(self)
  File "C:\Users\ctw31\anaconda3\envs\GNN1\lib\site-packages\torch\utils\data\dataloader.py", line 1042, in __init__
    w.start()
  File "C:\Users\ctw31\anaconda3\envs\GNN1\lib\multiprocessing\process.py", line 121, in start
    self._popen = self._Popen(self)
  File "C:\Users\ctw31\anaconda3\envs\GNN1\lib\multiprocessing\context.py", line 224, in _Popen
    return _default_context.get_context().Process._Popen(process_obj)
  File "C:\Users\ctw31\anaconda3\envs\GNN1\lib\multiprocessing\context.py", line 327, in _Popen
    return Popen(process_obj)
  File "C:\Users\ctw31\anaconda3\envs\GNN1\lib\multiprocessing\popen_spawn_win32.py", line 93, in __init__
    reduction.dump(process_obj, to_child)
  File "C:\Users\ctw31\anaconda3\envs\GNN1\lib\multiprocessing\reduction.py", line 60, in dump
    ForkingPickler(file, protocol).dump(obj)
TypeError: cannot pickle 'Environment' object

(GNN1) D:\PaperCode\MolDiff>Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "C:\Users\ctw31\anaconda3\envs\GNN1\lib\multiprocessing\spawn.py", line 116, in spawn_main
    exitcode = _main(fd, parent_sentinel)
  File "C:\Users\ctw31\anaconda3\envs\GNN1\lib\multiprocessing\spawn.py", line 126, in _main
    self = reduction.pickle.load(from_parent)
EOFError: Ran out of input`
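
For context, the two `utils\train.py` frames in the traceback (lines 51 and 53) suggest that `inf_iterator` follows the usual restart-on-exhaustion pattern: yield from the loader until it raises `StopIteration`, then build a fresh iterator. The sketch below is reconstructed from the traceback, not copied from the repository, and uses a plain list as a stand-in for the `DataLoader`.

```python
def inf_iterator(iterable):
    """Yield items forever, restarting the iterable whenever it is exhausted."""
    iterator = iter(iterable)
    while True:
        try:
            yield next(iterator)
        except StopIteration:
            # End of one full pass (one epoch): build a fresh iterator.
            # For a DataLoader with num_workers > 0, this is the point
            # where a new set of worker processes is started.
            iterator = iter(iterable)


# A three-element list stands in for a DataLoader with three batches.
batches = ["a", "b", "c"]
stream = inf_iterator(batches)
first_five = [next(stream) for _ in range(5)]
print(first_five)  # ['a', 'b', 'c', 'a', 'b']
```

Read this way, the `StopIteration` is probably not the failure itself but the normal end of the first pass over the data. With `batch_size` 64, stopping after iteration 3618 would fit a training set of roughly 3618 × 64 = 231,552 samples, although the thread does not state the dataset size. The actual error comes from the restart: the second traceback shows Windows spawning new worker processes (`popen_spawn_win32.py`), which pickles the loader's state and fails on the `Environment` object.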

@pengxingang
Owner

Hello, it's hard to pinpoint the issue based on your description alone. Have you tested the functionality without any modifications to see if it works in its original state? Furthermore, have you attempted running the code without using the tqdm module? This might help in troubleshooting the problem.
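
Along the same lines, another check that might narrow this down is whether the dataset object can be pickled at the moment workers are (re)started. On Windows, worker processes are created with spawn, so everything handed to them has to be picklable. The sketch below is illustrative only: `EagerHandleDataset`, `LazyHandleDataset`, and `can_pickle` are hypothetical names, and `threading.Lock()` stands in for the unpicklable `Environment` object (possibly an LMDB environment handle, which the thread does not confirm).

```python
import pickle
import threading


class EagerHandleDataset:
    """Toy dataset that keeps an unpicklable handle as a plain attribute."""

    def __init__(self, path):
        self.path = path
        # threading.Lock() is a stand-in for an unpicklable resource handle.
        self._handle = threading.Lock()


class LazyHandleDataset:
    """Toy dataset that opens its handle lazily and never pickles it."""

    def __init__(self, path):
        self.path = path
        self._handle = None  # opened on first access, not in __init__

    def _open(self):
        if self._handle is None:
            self._handle = threading.Lock()  # stand-in handle
        return self._handle

    def __getitem__(self, index):
        self._open()
        return index

    def __getstate__(self):
        # Drop the handle before pickling; each process reopens its own copy.
        state = self.__dict__.copy()
        state["_handle"] = None
        return state


def can_pickle(obj):
    """Return True if obj survives pickle.dumps, False otherwise."""
    try:
        pickle.dumps(obj)
        return True
    except (TypeError, pickle.PicklingError):
        return False


eager = EagerHandleDataset("data.db")
lazy = LazyHandleDataset("data.db")
lazy[0]  # forces the handle open in this process

print(can_pickle(eager))  # False
print(can_pickle(lazy))   # True
```

If a check like `can_pickle` passes before training but fails after the first epoch, that would point to a handle being opened in the parent process somewhere in between. Setting `num_workers` to 0 avoids worker processes, and therefore the pickling step, entirely, at the cost of slower data loading.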

Author

lajictw commented Nov 16, 2023

Thanks for your reply! The issue is resolved, though I have no idea what triggered the error or how it got fixed.
