Problem with get_data() #29

Open
meaningfromdata opened this issue Nov 13, 2019 · 2 comments
meaningfromdata commented Nov 13, 2019

I'm trying to work through the CNN code on p. 232 of NLPIA, and the get_data() call hangs. The pip install of nlpia seemed to go fine.

Here's the offending line (changing the limit setting doesn't seem to change anything; I've gone as low as 5,000):
word_vectors = get_data('w2v', limit=50000)

I also see this output the first time I run it:
2019-11-13 14:09:23,227 WARNING:nlpia.constants:107: Starting logger in nlpia.constants...

I'm running Ubuntu 16.04 and using the Spyder IDE. Any suggestions?

hobson self-assigned this Dec 1, 2019
hobson added the question label Dec 1, 2019

hobson commented Dec 1, 2019

The warning is not a bug, just a bit too verbose. We've gotten rid of it in the latest release.
Unfortunately the word2vec file Google provides is compressed in a way that can't be partially downloaded, so the "hangup" is probably in the download from the Dropbox account where we host the w2v file. You'll need a machine with enough disk space and internet bandwidth to download the entire file.

The limit arg only reduces the amount of RAM consumed. It's implemented inside the gensim KeyedVectors class, where we just pass it through, so we can't control how it works or whether it effectively caps RAM use inside the gensim code. You may need a machine with more RAM to experiment with CNNs and NLP.
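If you want to see whether the download or the in-memory load is the slow part, one rough sketch (not part of nlpia; it assumes you've already downloaded GoogleNews-vectors-negative300.bin.gz yourself, and the path below is hypothetical) is to load the vectors directly with gensim, which is where the limit arg ends up:

from gensim.models import KeyedVectors

# Hypothetical local path; download the GoogleNews vectors manually first.
w2v_path = '/path/to/GoogleNews-vectors-negative300.bin.gz'

# limit caps how many vectors are read into RAM; it does not shrink the download.
word_vectors = KeyedVectors.load_word2vec_format(w2v_path, binary=True, limit=200000)

If that load finishes quickly, the hangup is almost certainly the download itself rather than RAM.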


hobson commented Dec 1, 2019

If you use Anaconda you'll be able to install nlpia in a Python 3.6 environment. nlpia hasn't been tested on Python 3.7, and that may be why it's hanging for you: in Python 3.7 the re package seems to have a problem with the regular expressions we use to rename the files during decompression. I'll check it and make sure there isn't a bug in get_data().
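For reference, a minimal conda setup along those lines would look something like this (the environment name "nlpia36" is arbitrary):

conda create -n nlpia36 python=3.6
conda activate nlpia36
pip install nlpia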
