Dependency Issues #4

Open
pepper-jk opened this issue Nov 1, 2021 · 1 comment
pepper-jk commented Nov 1, 2021

Hello,

I wanted to try out your code and came across an issue regarding PyTorch dependencies.

I installed all the requirements in a fresh conda environment with Python 3.7.11 via your requirements.txt.
I made sure the versions are at least the ones listed in the README.

I installed openmpi via: conda install openmpi

However, it appears the module torchpack.mtpack is missing.

I also tried downgrading from torch==1.9.1 to torch==1.5, but that changed nothing.

Hope you can help me.
Thanks in advance.

$ python train.py
Extension horovod.torch has not been built: /home/pepper-jk/.conda/envs/deep_comp/lib/python3.7/site-packages/horovod/torch/mpi_lib/_mpi_lib.cpython-37m-x86_64-linux-gnu.so not found
If this is not expected, reinstall Horovod with HOROVOD_WITH_PYTORCH=1 to debug the build error.
Warning! MPI libs are missing, but python applications are still avaiable.
Traceback (most recent call last):
  File "train.py", line 15, in <module>
    from torchpack.mtpack.utils.config import Config, configs
ModuleNotFoundError: No module named 'torchpack.mtpack'
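
The ModuleNotFoundError above can be narrowed down without running train.py. A minimal sketch, using only the standard library (the helper name `module_available` is made up for illustration): it reports whether a dotted module path can be located, and treats a missing parent package as "not available".

```python
import importlib.util


def module_available(dotted_name):
    """Return True if `dotted_name` can be located on the current sys.path."""
    try:
        # find_spec imports parent packages first, so a missing parent
        # raises ModuleNotFoundError instead of returning None.
        return importlib.util.find_spec(dotted_name) is not None
    except ImportError:
        return False


if __name__ == "__main__":
    for name in ("torchpack", "torchpack.mtpack", "horovod.torch"):
        print(name, "->", "found" if module_available(name) else "missing")
```

If `torchpack` is found but `torchpack.mtpack` is not, the package directory exists but lacks that subpackage, which points at the checkout rather than at pip.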

p.s. I will try this again tomorrow and update this issue if I find a solution.

pepper-jk (Author) commented

I figured it out.

I had to init the submodule https://github.com/synxlin/mini-torchpack.

$ git submodule init
$ git submodule update
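
Whether a submodule still needs initializing can be read from `git submodule status`, which prefixes uninitialized entries with `-`. A small sketch that parses that output; the function names and the sample text (including its hashes and paths) are hypothetical and only illustrate the format. Paths containing spaces are not handled.

```python
import subprocess


def uninitialized_submodules(status_output):
    """Return paths of submodules that `git submodule status` marks with '-'."""
    paths = []
    for line in status_output.splitlines():
        if line.startswith("-"):
            # Line format: "-<sha1> <path>"; the path is the second field.
            fields = line[1:].split()
            if len(fields) >= 2:
                paths.append(fields[1])
    return paths


def read_submodule_status(repo_dir="."):
    """Run `git submodule status` in repo_dir and return its stdout."""
    result = subprocess.run(
        ["git", "submodule", "status"],
        cwd=repo_dir, capture_output=True, text=True, check=True,
    )
    return result.stdout


# Made-up sample output: one uninitialized and one checked-out submodule.
sample = (
    "-1111111111111111111111111111111111111111 third_party/example_a\n"
    " 2222222222222222222222222222222222222222 third_party/example_b (heads/master)\n"
)
print(uninitialized_submodules(sample))  # ['third_party/example_a']
```

In a real checkout, `uninitialized_submodules(read_submodule_status())` gives the list for the current repository. If it is non-empty, `git submodule update --init` covers both commands above in one step.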

I also had to install the additional requirement torchvision>=0.4 for the submodule and the missing requirement six.

$ pip install torchvision
$ pip install six
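
To see at a glance which of these packages the environment actually contains, the installed versions can be listed from package metadata. A sketch, assuming the distribution names match the pip names used in this thread; `installed_versions` is a made-up helper, and on Python 3.7 it needs the `importlib_metadata` backport because `importlib.metadata` only arrived in 3.8.

```python
try:
    from importlib.metadata import PackageNotFoundError, version  # Python 3.8+
except ImportError:
    # Python 3.7: requires the `importlib_metadata` backport from PyPI.
    from importlib_metadata import PackageNotFoundError, version


def installed_versions(names):
    """Map each distribution name to its installed version, or None if absent."""
    found = {}
    for name in names:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


for name, ver in installed_versions(["torch", "torchvision", "six", "horovod"]).items():
    print(f"{name}: {ver or 'not installed'}")
```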

However, it still does not run.

There is some issue with Horovod. It seems OpenMPI needs to be installed before it. I uninstalled Horovod and reinstalled it with the parameters suggested in the log below, but it still produces the same error.
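
For reference, the two hints printed in the log (`HOROVOD_WITH_PYTORCH=1` and `pip install --no-cache-dir`) can be combined into a single reinstall. The following is only a sketch: it assumes the PyPI package name is `horovod`, the function names are invented here, and by default it just prints the commands instead of running them. It does not address the OpenMPI ordering problem.

```python
import os
import subprocess
import sys


def horovod_reinstall_plan(base_env=None):
    """Return (commands, env) for a forced rebuild of Horovod from source."""
    env = dict(os.environ if base_env is None else base_env)
    # Require the PyTorch extension, so a failed build is reported as an error.
    env["HOROVOD_WITH_PYTORCH"] = "1"
    pip = [sys.executable, "-m", "pip"]
    commands = [
        pip + ["uninstall", "-y", "horovod"],
        # --no-cache-dir keeps pip from reusing a previously built wheel.
        pip + ["install", "--no-cache-dir", "horovod"],
    ]
    return commands, env


def reinstall_horovod(run=False):
    """Print the planned commands; execute them only when run=True."""
    commands, env = horovod_reinstall_plan()
    for cmd in commands:
        print(" ".join(cmd))
        if run:
            subprocess.run(cmd, env=env, check=True)


reinstall_horovod(run=False)  # dry run: only prints the commands
```

Using `sys.executable -m pip` ties the reinstall to the interpreter of the active conda environment, which avoids building against a different Python than the one that runs train.py.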

I also got a PyTorch version mix-up. From what I can tell, the submodule requires a CUDA version. I'm on a machine without GPUs, though, so this might become a problem later.

I'll keep at it though and post my updates here.

$ python train.py --devices cpu
Extension horovod.torch has not been built: /home/pepper-jk/.conda/envs/deep_comp/lib/python3.7/site-packages/horovod/torch/mpi_lib/_mpi_lib.cpython-37m-x86_64-linux-gnu.so not found
If this is not expected, reinstall Horovod with HOROVOD_WITH_PYTORCH=1 to debug the build error.
Warning! MPI libs are missing, but python applications are still avaiable.
Traceback (most recent call last):
  File "/home/pepper-jk/.conda/envs/deep_comp/lib/python3.7/site-packages/horovod/torch/mpi_ops.py", line 33, in <module>
    from horovod.torch import mpi_lib_v2 as mpi_lib
ImportError: cannot import name 'mpi_lib_v2' from 'horovod.torch' (/home/pepper-jk/.conda/envs/deep_comp/lib/python3.7/site-packages/horovod/torch/__init__.py)

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "train.py", line 17, in <module>
    from dgc.horovod.optimizer import DistributedOptimizer
  File "/home/pepper-jk/code/deep-gradient-compression/dgc/horovod/__init__.py", line 2, in <module>
    from dgc.horovod.optimizer import DistributedOptimizer
  File "/home/pepper-jk/code/deep-gradient-compression/dgc/horovod/optimizer.py", line 24, in <module>
    from horovod.torch.mpi_ops import allreduce_async_
  File "/home/pepper-jk/.conda/envs/deep_comp/lib/python3.7/site-packages/horovod/torch/mpi_ops.py", line 35, in <module>
    check_installed_version('pytorch', torch.__version__, e)
  File "/home/pepper-jk/.conda/envs/deep_comp/lib/python3.7/site-packages/horovod/common/util.py", line 260, in check_installed_version
    raise HorovodVersionMismatchError(name, version, installed_version) from exception
horovod.common.exceptions.HorovodVersionMismatchError: Framework pytorch installed with version None but found version 1.10.0+cu102.
             This can result in unexpected behavior including runtime errors.
             Reinstall Horovod using `pip install --no-cache-dir` to build with the new version.
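
The version string in the error, 1.10.0+cu102, carries a build tag after the `+`. A small sketch for reading that tag, assuming the usual PyTorch wheel convention of `+cuNNN` for CUDA builds and `+cpu` for CPU-only builds (wheels without a tag also exist); both helper names are hypothetical.

```python
def split_torch_version(version_string):
    """Split '1.10.0+cu102' into ('1.10.0', 'cu102'); tag is None if absent."""
    release, _, tag = version_string.partition("+")
    return release, (tag or None)


def describe_build(version_string):
    """Give a rough, human-readable guess at the kind of PyTorch build."""
    _, tag = split_torch_version(version_string)
    if tag is None:
        return "no build tag"
    if tag.startswith("cu"):
        return "CUDA build (" + tag + ")"
    if tag == "cpu":
        return "CPU-only build"
    return "other build (" + tag + ")"


print(describe_build("1.10.0+cu102"))  # version string taken from the error above
```

With PyTorch installed, passing `torch.__version__` gives the same answer for the local environment, and `torch.cuda.is_available()` reports whether a GPU can actually be used.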

pepper-jk changed the title from "Missing Module torchpack.mtpack" to "Dependency Issues" on Nov 2, 2021