
gpu out of memory #98

Open
luoyangen opened this issue May 16, 2016 · 8 comments

@luoyangen

I just ran the NMT example; the data was downloaded automatically by prepare_data.py.
Running on a GPU (GTX 850M, 2 GB of GPU memory),
it ran out of memory after only 5 iterations.
So I'm wondering how much GPU memory is needed to train the model,
or whether I made a mistake somewhere.

@nouiz

nouiz commented May 17, 2016

What does nvidia-smi tell you? Maybe the GUI is taking too much memory. From memory, 2 GB
was enough for the examples.


@orhanf
Contributor

orhanf commented May 17, 2016

2 GB might not be enough for the NMT example when the vocabulary size is large.
@luoyangen, can you try with a smaller vocabulary, batch size, and number of hidden units?

@nouiz

nouiz commented May 17, 2016

Sorry, I thought this comment was on another repository!

I don't know about the memory requirements of this example.


@luoyangen
Author

Yeah, the reason was insufficient memory. After I decreased the hidden layer size of both the encoder and decoder to 800 and the word embedding size to 500, it no longer threw that error.
Thanks, @nouiz @orhanf
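
For reference, a minimal sketch of the kind of settings that did the trick, assuming the example reads its hyperparameters from a plain config dict (the key names below are placeholders; the real ones are in the machine_translation example's configurations.py):

# Hypothetical overrides to fit the NMT example into ~2 GB of GPU memory;
# key names are assumptions, check configurations.py for the actual ones.
config = dict(
    enc_nhids=800, dec_nhids=800,                # encoder/decoder hidden layer sizes
    enc_embed=500, dec_embed=500,                # source/target word embedding sizes
    src_vocab_size=20000, trg_vocab_size=20000,  # smaller vocabularies, as orhanf suggested
    batch_size=40,                               # smaller batches also lower peak memory
)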

@andybug912

@luoyangen Hi, could you tell me how to run the code on the GPU? I tried THEANO_FLAGS=device=gpu,floatX=float32 python __main__.py
and kept getting this error:

TypeError: ('Bad input argument to theano function with name "/nfs/disk/work/users/zhangandy/.local/lib/python2.7/site-packages/blocks/algorithms/__init__.py:254" at index 0(0-based)', 'TensorType(float32, matrix) cannot store a value of dtype float64 without risking loss of precision. If you do not mind this loss, you can: 1) explicitly cast your data to float32, or 2) set "allow_input_downcast=True" when calling "function".')

I'm not sure what to do next to fix this problem.

Thanks!

@nouiz

nouiz commented Nov 4, 2016

Your dataset is in float64. Make sure to cast it to floatX.
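
In case it helps, a minimal sketch of what that cast looks like on the NumPy side (the array here is just a stand-in for whatever your stream produces):

import numpy
import theano

# NumPy builds float64 arrays by default; a float32 Theano input will reject them.
batch = numpy.random.rand(32, 100)            # dtype is float64
batch = batch.astype(theano.config.floatX)    # becomes float32 when floatX=float32

# The alternative named in the error message is to compile with
# theano.function(..., allow_input_downcast=True) and accept the precision loss.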


@andybug912

Sorry for replying so late.

Could you explain in more detail? I'm new to both Python and Theano.

@andybug912

@nouiz I added

masked_stream = Cast(data_stream=masked_stream, dtype='floatX')
dev_stream = Cast(data_stream=dev_stream, dtype='floatX')

just before the functions return masked_stream and dev_stream in stream.py, and got this error:

Traceback (most recent call last):
  File "__main__.py", line 41, in <module>
    get_dev_stream(**configuration), args.bokeh)
  File "/work3/zhangandy/machine_translation/__init__.py", line 145, in main
    every_n_batches=config['bleu_val_freq']))
  File "/work3/zhangandy/machine_translation/sampling.py", line 136, in __init__
    self.vocab = data_stream.dataset.dictionary
AttributeError: 'Cast' object has no attribute 'dataset'

so I think I may be casting the data in the wrong way. Could you tell me how to fix it so it runs on the GPU?
Thanks!
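
One possible workaround, sketched under the assumption that the stream you wrap still exposes the underlying dataset (check your stream.py; the attribute copy below is only a shim so downstream code that reads data_stream.dataset keeps working):

from fuel.transformers import Cast

base_dataset = dev_stream.dataset              # grab the dataset before wrapping
dev_stream = Cast(dev_stream, dtype='floatX')  # the cast added above
dev_stream.dataset = base_dataset              # re-expose it for sampling.py's lookup

Alternatively, pass base_dataset.dictionary to the sampler directly instead of letting it reach through the stream.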
