fairseq integration: h5py error when using multiple GPUs #3
When using multiple GPUs, fairseq uses PyTorch's multiprocessing to speed up batch creation and processing (i.e. I/O). The inter-process communication relies on pickle to exchange Python objects, and one of the objects exchanged is the dataset itself. This means it must be possible to pickle and unpickle the dataset. However, an open h5py file handle cannot be pickled. The solution is to instruct pickle not to serialize the file handle: the dataset implements __getstate__/__setstate__ so that the handle is dropped on serialization and the file is reopened lazily in the receiving process.
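The pattern above can be sketched as follows. This is a minimal illustration, not the actual Hdf5RecordReader (whose internals are not shown in this issue); a plain open() file handle stands in for an h5py.File, since both are unpicklable for the same reason (they wrap OS-level resources).

```python
import os
import pickle
import tempfile

class RecordReader:
    """Pickle-safe reader sketch. A real Hdf5RecordReader would hold an
    h5py.File here; a plain text file is used as a stand-in."""

    def __init__(self, path):
        self.path = path
        self._fh = None  # opened lazily, so a freshly unpickled copy works too

    @property
    def fh(self):
        if self._fh is None:
            # In the real case: h5py.File(self.path, 'r')
            self._fh = open(self.path)
        return self._fh

    def read(self):
        self.fh.seek(0)
        return self.fh.read()

    def __getstate__(self):
        # Instruct pickle to skip the live file handle.
        state = self.__dict__.copy()
        state['_fh'] = None
        return state

    def __setstate__(self, state):
        # The receiving process reopens the file on first access.
        self.__dict__.update(state)

# Round-trip through pickle, as PyTorch's multiprocessing would do:
with tempfile.NamedTemporaryFile('w', suffix='.txt', delete=False) as f:
    f.write('hello')
    path = f.name

reader = RecordReader(path)
assert reader.read() == 'hello'          # handle is now open
clone = pickle.loads(pickle.dumps(reader))  # would fail without __getstate__
print(clone.read())
os.unlink(path)
```

The key design point is that the file is never opened in __init__: every process, including unpickled copies in data-loader workers, gets its own handle on first use.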
There is a related problem with reading the same HDF5 file from multiple threads within the same process, caused by the compilation flags of the underlying HDF5 native library. You can see the details in this Stack Overflow question. As Soumith Chintala advised here, the solution is to have PyTorch use separate processes instead of threads for its data loaders. For this, add these lines before anything else is executed:

import torch.multiprocessing as mp
mp.set_start_method('spawn')

If the main module is not under your control and you cannot change it (e.g. fairseq), I suggest using Python's …
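A self-contained sketch of the two lines above, guarded so they are safe to run more than once. The standard-library multiprocessing module is used here because torch.multiprocessing is a wrapper that exposes the same API; force=True is an assumption for robustness, not part of the original snippet.

```python
import multiprocessing as mp  # torch.multiprocessing exposes the same API

# Must run before any worker process is created, hence at module import time.
# force=True avoids a RuntimeError if a start method was already chosen
# by some other import.
mp.set_start_method("spawn", force=True)

def _worker(q):
    # Each spawned worker is a fresh interpreter, so no HDF5 state
    # (or thread-unsafe native library state) is inherited from the parent.
    q.put(mp.get_start_method())

if __name__ == "__main__":
    q = mp.Queue()
    p = mp.Process(target=_worker, args=(q,))
    p.start()
    print(q.get())  # "spawn"
    p.join()
```

With "spawn", workers re-import the main module instead of forking it, which is why the start method must be set before any data loader is constructed.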
When integrating an Hdf5RecordReader in a custom implementation of a fairseq dataset, the following error pops up as soon as multiple GPUs are used. The same code runs perfectly fine when only one GPU is used.