
./download_models.sh and run_experiments.py: Torch invalid memory size - maybe an overflow? #47

Open
blrtvs opened this issue Oct 1, 2021 · 5 comments


blrtvs commented Oct 1, 2021

Hi,

when I run ./download_models.sh, I get the following exception:

Building common vocab
Better speed can be achieved with apex installed from https://www.github.com/nvidia/apex.
Namespace(lm='transformerxl', transformerxl_model_dir='pre-trained_language_models/transformerxl/transfo-xl-wt103/')
Loading transformerxl model...
Loading Transformer XL model from pre-trained_language_models/transformerxl/transfo-xl-wt103/
Traceback (most recent call last):
  File "lama/vocab_intersection.py", line 158, in <module>
    main()
  File "lama/vocab_intersection.py", line 152, in main
    __vocab_intersection(CASED_MODELS, CASED_COMMON_VOCAB_FILENAME)
  File "lama/vocab_intersection.py", line 97, in __vocab_intersection
    model = build_model_by_name(args.lm, args)
  File "/LAMA/lama/modules/__init__.py", line 31, in build_model_by_name
    return MODEL_NAME_TO_CLASS[lm](args)
  File "/LAMA/lama/modules/transformerxl_connector.py", line 37, in __init__
    self.model = TransfoXLLMHeadModel.from_pretrained(model_name)
  File "/home/user123/anaconda3/envs/lama37/lib/python3.7/site-packages/pytorch_pretrained_bert/modeling_transfo_xl.py", line 939, in from_pretrained
    model = cls(config, *inputs, **kwargs)
  File "/home/user123/anaconda3/envs/lama37/lib/python3.7/site-packages/pytorch_pretrained_bert/modeling_transfo_xl.py", line 1312, in __init__
    self.transformer = TransfoXLModel(config)
  File "/home/user123/anaconda3/envs/lama37/lib/python3.7/site-packages/pytorch_pretrained_bert/modeling_transfo_xl.py", line 1033, in __init__
    div_val=config.div_val)
  File "/home/user123/anaconda3/envs/lama37/lib/python3.7/site-packages/pytorch_pretrained_bert/modeling_transfo_xl.py", line 780, in __init__
    self.emb_layers.append(nn.Embedding(r_idx-l_idx, d_emb_i))
  File "/home/user123/anaconda3/envs/lama37/lib/python3.7/site-packages/torch/nn/modules/sparse.py", line 100, in __init__
    self.weight = Parameter(torch.Tensor(num_embeddings, embedding_dim))
RuntimeError: $ Torch: invalid memory size -- maybe an overflow? at /pytorch/aten/src/TH/THGeneral.cpp:188

I tried different (newer) versions of torch, but that led to the exact same dimension error that JXZe reports in issue #32:

      RuntimeError: Trying to create tensor with negative dimension -200001: [-200001, 16]

But #32 contains no recommendation on how to fix this dimension error.

All the packages from requirements.txt are installed correctly, except that I have overrides==3.1.0 instead of overrides==6.1.0: the import "from allennlp.modules.elmo import _ElmoBiLm" in elmo_connector.py only worked after downgrading to 3.1.0. I also tried skipping the vocab-building step and using the common_vocab.txt files provided in the README instead, but the same "Torch: invalid memory size -- maybe an overflow?" error occurs when running run_experiments.py.

Does anybody have an idea how to fix this?
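For reference, the failure at the bottom of the traceback can be reproduced in isolation. This minimal sketch is not the LAMA code itself, just the symptom: nn.Embedding is being asked for a negative number of embeddings.

```python
import torch.nn as nn

# Minimal reproduction of the symptom only (not the LAMA code itself):
# modeling_transfo_xl.py builds one nn.Embedding(r_idx - l_idx, d_emb_i) per
# adaptive-softmax bucket, and a mismatched config can make r_idx - l_idx
# negative, which PyTorch rejects at tensor-allocation time.
try:
    nn.Embedding(-200001, 16)  # negative num_embeddings, as in issue #32
except RuntimeError as e:
    print("RuntimeError:", e)
```

On older torch versions this surfaces as the "invalid memory size -- maybe an overflow?" message; newer versions report the negative dimension directly.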


blrtvs commented Oct 6, 2021

I can work around this by updating pytorch-pretrained-bert to transformers, but that leads to some import errors, for example with allennlp. So I also updated allennlp, which worked until I tried to run the experiments: using transformers instead of pytorch-pretrained-bert produces many exceptions in the code due to slightly different syntax and so on, so it's really a lot of overhead. If somebody knows how to get LAMA working with the old pytorch-pretrained-bert package, let me know. I even tried changing the CUDA version, but still got the overflow error from above.

@Kickboxin
Okey

@Kickboxin Kickboxin mentioned this issue Nov 17, 2021

Zjh-819 commented Nov 17, 2021

Hi! @blrtvs
I got the solution:
The reason is that the configuration file for Transformer XL was updated in April 2020 and is not compatible with the packages in requirements.txt. Replace the config.json in transformerxl/transfo-xl-wt103 with this one, then it might work:
https://huggingface.co/transfo-xl-wt103/raw/50554b1a7e440d988096dbdf0b3a0edc73470d3d/config.json
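A plausible reading of where the -200001 in the traceback comes from (my own sketch, assuming pytorch-pretrained-bert's adaptive-embedding logic): the loader appends the vocabulary size n_token to the cutoffs list and allocates one embedding matrix per adjacent pair. If the updated config.json no longer provides the field under the name the old code expects, n_token can fall back to a default of -1, making the last bucket negative. The helper embedding_bucket_sizes below is hypothetical, for illustration only:

```python
# Hedged sketch of how the negative size can arise; embedding_bucket_sizes is
# a hypothetical helper, not part of LAMA or pytorch-pretrained-bert.

def embedding_bucket_sizes(n_token, cutoffs):
    """Per-bucket embedding sizes: the loader appends n_token to the cutoffs
    and allocates nn.Embedding(r_idx - l_idx, ...) for each adjacent pair."""
    bounds = [0] + list(cutoffs) + [n_token]
    return [r - l for l, r in zip(bounds[:-1], bounds[1:])]

WT103_CUTOFFS = [20000, 40000, 200000]

# With the wt103 vocabulary size from the old config, every bucket is positive:
print(embedding_bucket_sizes(267735, WT103_CUTOFFS))  # [20000, 20000, 160000, 67735]

# If the vocabulary size is missing and defaults to -1, the last bucket goes
# negative: -1 - 200000 = -200001, matching the traceback above.
print(embedding_bucket_sizes(-1, WT103_CUTOFFS))
```

If this reading is right, it also explains why swapping in the pre-April-2020 config.json fixes the crash: the old file carries the field names the pinned package version reads.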


blrtvs commented Nov 19, 2021

@Zjh-819 great! Thanks, I will try it. It would be awesome if it works :)

@laurinpaech
> Hi! @blrtvs
> I got the solution:
> The reason is that the configuration file for Transformer XL was updated in April 2020 and is not compatible with the packages in requirements.txt. Replace the config.json in transformerxl/transfo-xl-wt103 with this one, then it might work:
> https://huggingface.co/transfo-xl-wt103/raw/50554b1a7e440d988096dbdf0b3a0edc73470d3d/config.json

Worked for me. Good job!
