
./download_models.sh and run_experiments.py: Torch invalid memory size - maybe an overflow? #47

Open
blrtvs opened this issue Oct 1, 2021 · 5 comments


blrtvs commented Oct 1, 2021

Hi,

when I run ./download_models.sh, I get the following exception:

Building common vocab
Better speed can be achieved with apex installed from https://www.github.com/nvidia/apex.
Namespace(lm='transformerxl', transformerxl_model_dir='pre-trained_language_models/transformerxl/transfo-xl-wt103/')
Loading transformerxl model...
Loading Transformer XL model from pre-trained_language_models/transformerxl/transfo-xl-wt103/
Traceback (most recent call last):
  File "lama/vocab_intersection.py", line 158, in <module>
    main()
  File "lama/vocab_intersection.py", line 152, in main
    __vocab_intersection(CASED_MODELS, CASED_COMMON_VOCAB_FILENAME)
  File "lama/vocab_intersection.py", line 97, in __vocab_intersection
    model = build_model_by_name(args.lm, args)
  File "/LAMA/lama/modules/__init__.py", line 31, in build_model_by_name
    return MODEL_NAME_TO_CLASS[lm](args)
  File "/LAMA/lama/modules/transformerxl_connector.py", line 37, in __init__
    self.model = TransfoXLLMHeadModel.from_pretrained(model_name)
  File "/home/user123/anaconda3/envs/lama37/lib/python3.7/site-packages/pytorch_pretrained_bert/modeling_transfo_xl.py", line 939, in from_pretrained
    model = cls(config, *inputs, **kwargs)
  File "/home/user123/anaconda3/envs/lama37/lib/python3.7/site-packages/pytorch_pretrained_bert/modeling_transfo_xl.py", line 1312, in __init__
    self.transformer = TransfoXLModel(config)
  File "/home/user123/anaconda3/envs/lama37/lib/python3.7/site-packages/pytorch_pretrained_bert/modeling_transfo_xl.py", line 1033, in __init__
    div_val=config.div_val)
  File "/home/user123/anaconda3/envs/lama37/lib/python3.7/site-packages/pytorch_pretrained_bert/modeling_transfo_xl.py", line 780, in __init__
    self.emb_layers.append(nn.Embedding(r_idx-l_idx, d_emb_i))
  File "/home/user123/anaconda3/envs/lama37/lib/python3.7/site-packages/torch/nn/modules/sparse.py", line 100, in __init__
    self.weight = Parameter(torch.Tensor(num_embeddings, embedding_dim))
RuntimeError: $ Torch: invalid memory size -- maybe an overflow? at /pytorch/aten/src/TH/THGeneral.cpp:188

I tried different (newer) versions of torch, but that led to the exact same dimension error that JXZe reports in issue #32:

      RuntimeError: Trying to create tensor with negative dimension -200001: [-200001, 16]

But #32 contains no recommendation on how to fix this dimension error.

All the packages from requirements.txt are installed correctly, except that I have overrides==3.1.0 instead of overrides==6.1.0: the import "from allennlp.modules.elmo import _ElmoBiLm" in elmo_connector.py only worked after downgrading to 3.1.0. I also tried skipping the vocab-building step and using the common_vocab.txt files provided in the README instead, but the same "Torch: invalid memory size -- maybe an overflow?" error occurs when running run_experiments.py.

Does anybody have an idea how to fix this?
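For reference, the failure at the bottom of the traceback can be reproduced in isolation. This minimal sketch is not the LAMA code itself, just the symptom: nn.Embedding is being asked for a negative number of embeddings.

```python
import torch.nn as nn

# Minimal reproduction of the symptom only (not the LAMA code itself):
# modeling_transfo_xl.py builds one nn.Embedding(r_idx - l_idx, d_emb_i) per
# adaptive-softmax bucket, and a mismatched config can make r_idx - l_idx
# negative, which PyTorch rejects at tensor-allocation time.
try:
    nn.Embedding(-200001, 16)  # negative num_embeddings, as in issue #32
except RuntimeError as e:
    print("RuntimeError:", e)
```

On older torch versions this surfaces as the "invalid memory size -- maybe an overflow?" message; newer versions report the negative dimension directly.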


blrtvs commented Oct 6, 2021

I can work around this by updating pytorch-pretrained-bert to transformers, but that leads to some import errors, for example with allennlp. So I also updated allennlp, which worked until I tried to run the experiments: using transformers instead of pytorch-pretrained-bert produces many exceptions in the code due to slightly different syntax and so on, so it's really a lot of overhead. If somebody knows how to get LAMA working with the old pytorch-pretrained-bert package, let me know. I even tried changing the CUDA version, but still got the overflow error from above.

@Kickboxin
Okey

@Kickboxin Kickboxin mentioned this issue Nov 17, 2021

Zjh-819 commented Nov 17, 2021

Hi! @blrtvs
I got the solution:
The reason is that the configuration file for Transformer XL was updated in April 2020 and is not compatible with the packages in requirements.txt. Replace the config.json in transformerxl/transfo-xl-wt103 with this one, then it might work:
https://huggingface.co/transfo-xl-wt103/raw/50554b1a7e440d988096dbdf0b3a0edc73470d3d/config.json
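A plausible reading of where the -200001 in the traceback comes from (my own sketch, assuming pytorch-pretrained-bert's adaptive-embedding logic): the loader appends the vocabulary size n_token to the cutoffs list and allocates one embedding matrix per adjacent pair. If the updated config.json no longer provides the field under the name the old code expects, n_token can fall back to a default of -1, making the last bucket negative. The helper embedding_bucket_sizes below is hypothetical, for illustration only:

```python
# Hedged sketch of how the negative size can arise; embedding_bucket_sizes is
# a hypothetical helper, not part of LAMA or pytorch-pretrained-bert.

def embedding_bucket_sizes(n_token, cutoffs):
    """Per-bucket embedding sizes: the loader appends n_token to the cutoffs
    and allocates nn.Embedding(r_idx - l_idx, ...) for each adjacent pair."""
    bounds = [0] + list(cutoffs) + [n_token]
    return [r - l for l, r in zip(bounds[:-1], bounds[1:])]

WT103_CUTOFFS = [20000, 40000, 200000]

# With the wt103 vocabulary size from the old config, every bucket is positive:
print(embedding_bucket_sizes(267735, WT103_CUTOFFS))  # [20000, 20000, 160000, 67735]

# If the vocabulary size is missing and defaults to -1, the last bucket goes
# negative: -1 - 200000 = -200001, matching the traceback above.
print(embedding_bucket_sizes(-1, WT103_CUTOFFS))
```

If this reading is right, it also explains why swapping in the pre-April-2020 config.json fixes the crash: the old file carries the field names the pinned package version reads.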


blrtvs commented Nov 19, 2021

@Zjh-819 great! Thanks, I will try it. It would be awesome if it works :)

@laurinpaech
> Hi! @blrtvs
> I got the solution:
> The reason is that the configuration file for Transformer XL was updated in April 2020 and is not compatible with the packages in requirements.txt. Replace the config.json in transformerxl/transfo-xl-wt103 with this one, then it might work:
> https://huggingface.co/transfo-xl-wt103/raw/50554b1a7e440d988096dbdf0b3a0edc73470d3d/config.json

Worked for me. Good job!
