-
Notifications
You must be signed in to change notification settings - Fork 258
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
memory leak #18
Comments
I notice this as well, huge memory leak that does not improve by invoking python's garbage collector |
Hi! I will look this up, but the TextRank algorithm is actually designed to summarise articles. I don't know if results with 20MB input can be trusted. |
Hi @fbarrios , |
Profiling with cProfile, when the issue occurs for me, execution seems to be stuck in graph.py:159:edge_weight(), with tons of calls. The relevant edge_weight call is in pagerank_weighted.py:50:build_adjacency_matrix(). My guess is that it is due to there being too many sentences and/or too many connections between sentences, and since the adjacency matrix runtime is O(n^2) relative to number of sentences, this can blow up if there's lots of sentences. A quick, dirty fix is to limit the number of sentences to some maximum in the keywords() method, though of course then you won't be analyzing the whole text anymore. Switching over to a sparse adjacency matrix and estimating likely candidates, as described on https://cran.r-project.org/web/packages/textrank/vignettes/textrank.html (In the MinHash section) might make it feasible for larger documents. But I don't know your codebase or the textrank algorithm well enough to implement it myself easily. |
when passing a long text, like 20MB long, the script will soon run out of the computer's memory. My machine has 16GB of RAM, it takes about 5 minutes to freeze.
The text was updated successfully, but these errors were encountered: