ValueError: cannot convert float NaN to integer #2

Open
Lucieno opened this issue Aug 10, 2019 · 4 comments

Lucieno commented Aug 10, 2019

I am trying to run the VGG16LP example on the CIFAR10 dataset, but the following error occurs:
ValueError: cannot convert float NaN to integer
Any help would be greatly appreciated. The details of the execution, the error, and the environment are shown below.

(swalp_cuda9) ➜  SWALP git:(master) ✗ seed=100                                      # Specify experiment seed.
bash exp/block_vgg_swa.sh CIFAR10 ${seed}     # SWALP training on VGG16 with Small-block BFP in CIFAR10

Checkpoint directory ./checkpoint/block-CIFAR10-VGG16LP/seed100
Tensorboard logging at runs/block-CIFAR10-VGG16LP/seed100_08_10_14_16
Prepare data loaders:
Loading dataset CIFAR10 from .
Files already downloaded and verified
Files already downloaded and verified
Files already downloaded and verified
Prepare quantizers:
Block rounding, W:8, A:8, G:8, E:8, Acc:8
lr init: 0.05
swa start: 200.0 swa lr: 0.01
Model: VGG16LP
Prepare SWA training
Traceback (most recent call last):
  File "train.py", line 189, in <module>
    quantize_momentum=args.quantize_momentum)
  File "/home/user/git/SWALP/utils.py", line 48, in train_batch
    output = model(input_var)
  File "/home/user/anaconda2/envs/swalp_cuda9/lib/python3.6/site-packages/torch/nn/modules/module.py", line 547, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/user/git/SWALP/models/vgg_low.py", line 69, in forward
    x = self.features(x)
  File "/home/user/anaconda2/envs/swalp_cuda9/lib/python3.6/site-packages/torch/nn/modules/module.py", line 547, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/user/anaconda2/envs/swalp_cuda9/lib/python3.6/site-packages/torch/nn/modules/container.py", line 92, in forward
    input = module(input)
  File "/home/user/anaconda2/envs/swalp_cuda9/lib/python3.6/site-packages/torch/nn/modules/module.py", line 547, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/user/git/SWALP/models/quantizer.py", line 103, in forward
    self.small_block, self.block_dim)
  File "/home/user/git/SWALP/models/quantizer.py", line 76, in forward
    return block_quantize(x, forward_bits, self.mode, small_block=self.small_block, block_dim=self.block_dim)
  File "/home/user/git/SWALP/models/quantizer.py", line 42, in block_quantize
    max_exponent = math.floor(math.log2(max_entry))
ValueError: cannot convert float NaN to integer

I tried printing max_entry right before it computes max_exponent:

max_entry: 2.3815226554870605
max_entry: 2.215369701385498
max_entry: 1.9265875816345215
max_entry: 1.8378633260726929
max_entry: 1.3576314449310303
max_entry: 1.2682085037231445
...
max_entry: 1.4677644968032837
max_entry: 1.256148099899292
max_entry: 1.4361257553100586
max_entry: 1.4105850458145142
max_entry: 0.8756170272827148
max_entry: 0.8310933113098145
max_entry: nan
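
For reference, the exception can be reproduced in isolation: math.log2 silently propagates NaN, and it is the math.floor call that raises. Below is a minimal sketch of this, plus a hypothetical fail-fast guard (assert_no_nan is a helper name I made up, not part of the SWALP code) that could be called on tensors entering block_quantize, so divergence is reported the moment NaNs first appear:

import math
import torch

# math.log2 propagates NaN without raising; math.floor is what fails:
try:
    math.floor(math.log2(float("nan")))
except ValueError as e:
    print(e)  # cannot convert float NaN to integer

# Hypothetical guard (name and call site are assumptions): fail with a
# clear message as soon as a tensor entering the quantizer contains NaN.
def assert_no_nan(x, where="block_quantize"):
    if torch.isnan(x).any():
        raise RuntimeError("NaN entering {}; training has likely "
                           "diverged upstream".format(where))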

Environment:

(swalp_cuda9) ➜  SWALP git:(master) conda list
# packages in environment at /home/user/anaconda2/envs/swalp_cuda9:
#
# Name                    Version                   Build  Channel
_libgcc_mutex             0.1                        main
blas                      1.0                         mkl
ca-certificates           2019.5.15                     1
certifi                   2019.6.16                py36_1
cffi                      1.12.3           py36h2e261b9_0
cuda90                    1.0                  h6433d27_0    pytorch
cudatoolkit               10.0.130                      0
freetype                  2.9.1                h8a8886c_1
intel-openmp              2019.4                      243
jpeg                      9b                   h024ee3a_2
libedit                   3.1.20181209         hc058e9b_0
libffi                    3.2.1                hd88cf55_4
libgcc-ng                 9.1.0                hdf63c60_0
libgfortran-ng            7.3.0                hdf63c60_0
libpng                    1.6.37               hbc83047_0
libstdcxx-ng              9.1.0                hdf63c60_0
libtiff                   4.0.10               h2733197_2
mkl                       2019.4                      243
mkl_fft                   1.0.12           py36ha843d7b_0
mkl_random                1.0.2            py36hd81dba3_0
ncurses                   6.1                  he6710b0_1
ninja                     1.9.0            py36hfd86e86_0
numpy                     1.16.4           py36h7e9f1db_0
numpy-base                1.16.4           py36hde5b4d6_0
olefile                   0.46                     py36_0
openssl                   1.1.1c               h7b6447c_1
pillow                    6.1.0            py36h34e0f95_0
pip                       19.1.1                   py36_0
protobuf                  3.9.1                    pypi_0    pypi
pycparser                 2.19                     py36_0
python                    3.6.9                h265db76_0
pytorch                   1.2.0           py3.6_cuda10.0.130_cudnn7.6.2_0    pytorch
readline                  7.0                  h7b6447c_5
setuptools                41.0.1                   py36_0
six                       1.12.0                   pypi_0    pypi
sqlite                    3.29.0               h7b6447c_0
tabulate                  0.8.3                    pypi_0    pypi
tensorboardx              1.8                      pypi_0    pypi
tk                        8.6.8                hbc83047_0
torchvision               0.4.0                py36_cu100    pytorch
wheel                     0.33.4                   py36_0
xz                        5.2.4                h14c3975_4
zlib                      1.2.11               h7b6447c_3
zstd                      1.3.7                h0b5b093_0

OS: Ubuntu 18.04
My GPU: GeForce® GTX 1060 3GB

jmluu commented Oct 15, 2019

@Lucieno @stevenygd I ran into the same problem after ten epochs. Does this indicate a loss blow-up or an overflow? (A NaN-localization sketch follows the traceback below.)

Checkpoint directory ./checkpoints/block-CIFAR10-PreResNet164LP/seed200
Tensorboard logging at checkpoints/block-CIFAR10-PreResNet164LP/seed200_10_15_18_42
Prepare data loaders:
Loading dataset CIFAR10 from ../data/
Files already downloaded and verified
Files already downloaded and verified
Files already downloaded and verified
Prepare quantizers:
Block rounding, W:8, A:8, G:8, E:8, Acc:8
lr init: 0.1
swa start: 150.0 swa lr: 0.01
Model: PreResNet164LP
Prepare SWA training
----  --------  ---------  --------  ---------  --------  ------------  --------
  ep        lr    tr_loss    tr_acc    te_loss    te_acc  swa_te_acc        time
----  --------  ---------  --------  ---------  --------  ------------  --------
   1    0.1000     1.6607   37.8340     1.3099   52.2000                440.3035
   2    0.1000     1.1418   58.9820     1.0026   64.3400                438.9215
   3    0.1000     0.9461   66.0900     0.9486   67.0200                442.0348
   4    0.1000     0.8427   70.4760     0.7507   74.1500                443.9436
   5    0.1000     0.7321   74.4340     0.7454   74.2900                461.8002
   6    0.1000     0.6698   76.7520     0.6293   78.5400                508.3268
   7    0.1000     0.6212   78.5380     0.5772   80.2200                507.5976
   8    0.1000     0.5911   79.5060     0.5712   80.1700                509.4609
   9    0.1000     0.5713   80.1780     0.6028   79.0300                508.7512
  10    0.1000     0.5492   81.0900     0.5643   80.5200                508.5259
  11    0.1000     0.5317   81.5480     0.5733   80.5200                504.3482
Traceback (most recent call last):
  File "swa_cifar.py", line 189, in <module>
    quantize_momentum=args.quantize_momentum)
  File "/home/jmlu/Worksapce/ConvNet_Fxp_2.0/utils.py", line 48, in train_batch
    output = model(input_var)
  File "/home/jmlu/anaconda3/envs/ML/lib/python3.6/site-packages/torch/nn/modules/module.py", line 493, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/jmlu/Worksapce/ConvNet_Fxp_2.0/models/preresnet_low.py", line 152, in forward
    x = self.layer3(x)  # 8x8
  File "/home/jmlu/anaconda3/envs/ML/lib/python3.6/site-packages/torch/nn/modules/module.py", line 493, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/jmlu/anaconda3/envs/ML/lib/python3.6/site-packages/torch/nn/modules/container.py", line 92, in forward
    input = module(input)
  File "/home/jmlu/anaconda3/envs/ML/lib/python3.6/site-packages/torch/nn/modules/module.py", line 493, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/jmlu/Worksapce/ConvNet_Fxp_2.0/models/preresnet_low.py", line 91, in forward
    out = self.quant(out)
  File "/home/jmlu/anaconda3/envs/ML/lib/python3.6/site-packages/torch/nn/modules/module.py", line 493, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/jmlu/Worksapce/ConvNet_Fxp_2.0/models/quantizer.py", line 103, in forward
    self.small_block, self.block_dim)
  File "/home/jmlu/Worksapce/ConvNet_Fxp_2.0/models/quantizer.py", line 76, in forward
    return block_quantize(x, forward_bits, self.mode, small_block=self.small_block, block_dim=self.block_dim)
  File "/home/jmlu/Worksapce/ConvNet_Fxp_2.0/models/quantizer.py", line 42, in block_quantize
    max_exponent = math.floor(math.log2(max_entry + 1e-32))
ValueError: cannot convert float NaN to integer
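
One way to answer the blow-up question is to find where the NaNs first appear. The sketch below is my own debugging code, not part of the repo; it assumes model is the already-constructed PreResNet164LP instance from the training script. It hooks every submodule so the first NaN output is reported by name, and enables autograd anomaly detection to do the same for the backward pass:

import torch

def install_nan_hooks(model):
    # Register a forward hook on every submodule; raise on the first
    # module whose output contains NaN so the culprit layer is named.
    def make_hook(name):
        def hook(module, inputs, output):
            if isinstance(output, torch.Tensor) and torch.isnan(output).any():
                raise RuntimeError("first NaN produced by module: " + name)
        return hook
    for name, module in model.named_modules():
        module.register_forward_hook(make_hook(name))

install_nan_hooks(model)  # assumes model from the training script

# Locates the backward op that first produces NaN gradients (slow; debug only).
torch.autograd.set_detect_anomaly(True)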

vmelement commented

+1 to this problem. I'm having the same issue when trying to run the code.

Nader-Merai commented

Install the correct dependencies; they are stated in the README. MAKE SURE to install PyTorch 1.0.1 and not 1.0.0.
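
(Worth noting: the conda list in the first comment shows pytorch 1.2.0, not 1.0.1.) A quick, repo-agnostic way to confirm which build is active in the environment:

import torch

print(torch.__version__)          # expect 1.0.1 per the comment above
print(torch.version.cuda)         # CUDA version the binary was built against
print(torch.cuda.is_available())  # confirms the GPU is visible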

smsskil commented Dec 20, 2020

Hello,
How did you solve this problem? Is there any way to work around it?
