Use multiprocessing to parallely process PDF pages #20
Comments
Hi @vinayak-mehta, I had thought of implementing this too. My suggestions for asynchronous processing of pages are dramatiq or celery.
I'm doing this with dask, but I chose it out of habit.
Is there any improvement from this? I have a file with only one page, and the page has a table (25 rows x 13 columns).
@selcukusta I think this is more about running multiple pages at the same time than about speeding up the extraction of a single page. But does anyone have a solution for processing multiple pages in parallel?
Yes!
Using multiprocessing, we should be able to distribute multiple pages across all cores, processing them in parallel.
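A minimal sketch of that idea, not camelot's own implementation: split the page numbers into one contiguous range per core and let each worker process call camelot.read_pdf on its range. The helper names (page_ranges, extract_range, read_pdf_parallel) are hypothetical, and the sketch assumes the caller already knows the page count (for example from a PDF library) and that returning each table's DataFrame is enough.

```python
import os
from concurrent.futures import ProcessPoolExecutor


def page_ranges(n_pages, n_chunks):
    """Split pages 1..n_pages into contiguous range strings such as '1-4'."""
    if n_pages < 1:
        return []
    n_chunks = max(1, min(n_chunks, n_pages))
    size, extra = divmod(n_pages, n_chunks)
    ranges, start = [], 1
    for i in range(n_chunks):
        # The first `extra` chunks take one additional page each.
        end = start + size + (1 if i < extra else 0) - 1
        ranges.append(f"{start}-{end}" if end > start else str(start))
        start = end + 1
    return ranges


def extract_range(job):
    """Worker: extract the tables of one page range (hypothetical helper)."""
    path, pages = job
    import camelot  # imported inside the worker process

    tables = camelot.read_pdf(path, pages=pages)
    # DataFrames are picklable, so they can be sent back to the parent.
    return [table.df for table in tables]


def read_pdf_parallel(path, n_pages, workers=None):
    """Distribute page ranges over worker processes and flatten the results."""
    workers = workers or os.cpu_count() or 1
    jobs = [(path, pages) for pages in page_ranges(n_pages, workers)]
    with ProcessPoolExecutor(max_workers=workers) as pool:
        chunks = list(pool.map(extract_range, jobs))  # keeps page order
    return [df for chunk in chunks for df in chunk]
```

On platforms that start workers with spawn (Windows, macOS), the call to read_pdf_parallel should sit under an `if __name__ == "__main__":` guard. As noted above, this only helps with multi-page files; a single-page PDF still ends up in one worker.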
I get this though
Oh, what is the difference between https://github.com/atlanhq/camelot and https://github.com/camelot-dev/camelot? I hadn't noticed the two repos before now...
Yeah, I know. Actually it's related to that, but the issue was closed and referenced to it.
Does anyone have an update? I've tried inheriting from PageHandler and making page processing multithreaded / multicore, and also processing multiple PDFs with multithreading, but I'm running into a Ghostscript error (it seems like it's not thread-safe?).
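If the Ghostscript error really is a thread-safety problem (an assumption, it is not confirmed in this thread), one workaround for the multiple-PDF case is to use processes instead of threads, so that nothing is shared between files. A rough sketch with hypothetical helper names (extract_file, split_results, read_many); it also records per-file errors so that one failing PDF does not abort the whole batch:

```python
import multiprocessing as mp


def extract_file(path):
    """Worker: returns (path, dataframes, error) for one PDF."""
    try:
        import camelot  # imported inside the worker process

        tables = camelot.read_pdf(path, pages="all")
        return path, [table.df for table in tables], None
    except Exception as exc:
        # Report the failure to the parent instead of raising in the worker.
        return path, [], repr(exc)


def split_results(results):
    """Separate successful extractions from failed ones."""
    results = list(results)  # the pool hands back a one-shot iterator
    ok = {path: dfs for path, dfs, err in results if err is None}
    failed = {path: err for path, dfs, err in results if err is not None}
    return ok, failed


def read_many(paths, processes=None):
    """Process each PDF in its own short-lived worker process."""
    # maxtasksperchild=1 gives every file a fresh process.
    with mp.Pool(processes=processes, maxtasksperchild=1) as pool:
        return split_results(pool.imap_unordered(extract_file, paths))
```

Usage would look like `ok, failed = read_many(["a.pdf", "b.pdf"])` (file names are placeholders), called under an `if __name__ == "__main__":` guard on platforms that spawn workers.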
I did implement a multi-threading layer on top of camelot.read_pdf using the multiprocessing library.
@phoewass That would be awesome if you're still interested!
Can anyone tell me how to use multiprocessing in camelot? Or is this issue still in progress?
Hi all. Sorry it took me a while to publish the PR even though the code was already available.
👀
Any update on this? My PDFs are hundreds of pages long and I could really use this feature.
@phoewass @vinayak-mehta is this feature part of the library now? If not, is there any way I can use multiprocessing to read a multipage PDF?
Any updates on this feature?
Hey! As camelot is dead, we try to build a maintained fork at There is a discussion about this in:
We could try and use all cores present on the machine using multiprocessing. More ideas are welcome.