Spider: Chicago Southwest Home Equity Commission I #676
Comments
Hi, just cloned the repo. I'm going to make a branch for this spider |
@pjsier looks like most info besides the date and meeting type == BOARD is contained in a pdf. Let's discuss how to proceed but since this is my first, I think I'll start work on a more straightforward one. |
@mattpair that approach makes sense to me, let me know if you need any help finding a clearer spider to work on but we have a good amount available |
I'll resume work on this one |
If this issue is unclaimed, I would like to work on that. |
@haidtang sure! In general we like contributors to stick to one issue at a time, so I'll assign you to this one for now and not the other. Let me know if you'd like to switch that though |
Sure thing! |
Hey is this issue unclaimed? I'd be happy to work on this one if so. Also, it seems like it will involve reading pdfs; has that been done within this project before? |
It looks like it's been inactive for more than a month, so it's all yours if you're interested! |
Hi @pjsier, most of the information for these meetings is included in PDFs of the minutes and agenda for each meeting. Are there any other spiders that have downloaded / parsed PDFs in this project already? Just working on my own computer, I'm able to parse the files using the package My question though is do you want to introduce that new package into the project? And is the fact that it needs to download files going to be a problem for running the spider on different computers? |
@egfrank thanks for checking that out! We're currently using PyPDF2, but we've used city-scrapers/city_scrapers/spiders/chi_human_relations.py Lines 56 to 68 in a6a0ea8
For now let's see if the parsing will work in PyPDF2, but if you run into more issues I think it would be fine to add Related to the Let me know if you run into any issues with this, and thanks again for doing that research! |
Oh awesome I don't know why I missed that in the codebase! Sweet okay I'll look at PyPDF2 and BytesIO. |
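The PyPDF2 and BytesIO approach mentioned above can be sketched roughly as follows. This is a minimal illustration, not the code from `chi_human_relations.py`: the function names are hypothetical, it assumes the PDF has already been fetched as raw bytes (for example a Scrapy `response.body`), and it assumes PyPDF2 2.0 or later, where the reader class is `PdfReader`. Wrapping the bytes in `BytesIO` keeps everything in memory, so nothing is written to disk on whichever machine runs the spider.

```python
from io import BytesIO


def response_to_stream(body: bytes) -> BytesIO:
    """Wrap downloaded PDF bytes in an in-memory, seekable file-like object."""
    return BytesIO(body)


def extract_pdf_text(body: bytes) -> str:
    """Return the text of every page in a PDF supplied as raw bytes.

    Hypothetical helper: assumes PyPDF2 >= 2.0 is installed.
    """
    # Imported lazily so the in-memory wrapper above works without PyPDF2.
    from PyPDF2 import PdfReader

    reader = PdfReader(response_to_stream(body))
    # extract_text() can return an empty result for image-only pages,
    # so fall back to an empty string before joining.
    return "\n".join(page.extract_text() or "" for page in reader.pages)
```

In a spider, `extract_pdf_text(response.body)` would be called from the callback that handles the PDF link. Scanned minutes without a text layer would come back empty and need a different approach.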
I finally looked back at this and opened up a new PR! Sorry about the delay - once my branch got out of date it was difficult to get the checks to pass and it ended up being easier to start fresh. |
URL: https://swhomeequity.com/agenda-%26-minutes
Spider Name: chi_southwest_home_equity_i
Agency Name: Chicago Southwest Home Equity Commission I
See the contribution guide for information on how to get started