A personal repository to scrape and parse PhD projects from findaphd.com so that they can be sorted and accessed more easily.
- Clone the main branch of the project using `https://github.com/bitesizing/PhDScraper.git`
- [OPTIONAL] Create a virtual environment to run the project in.
- Install the dependencies listed in the `requirements.txt` file, e.g. using `pip install -r requirements.txt`
- Open `main.py`
- Manually fill in variables, e.g. `keywords`, `subjects`, etc.
- Change bools related to saving, depending on the output files you want.
- Run the file, which should generate the output data you need.
- PROCESSING OF OUTPUT FILES COMING SOON.
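
As a rough illustration of the "fill in variables" and "change bools" steps above, the editable settings near the top of `main.py` might look something like the sketch below. Only `keywords` and `subjects` are named in this README; the flag names, the example values, and the `enabled_outputs` helper are all hypothetical and may not match the actual file.

```python
# Hypothetical sketch of the user-editable settings in main.py.
# Only `keywords` and `subjects` come from this README; every other
# name and all example values are assumptions for illustration.

keywords = ["machine learning", "neuroscience"]  # example search terms
subjects = ["Computer Science", "Psychology"]    # example subject areas

# Hypothetical save flags: set each to True for the output files you want.
save_flags = {
    "raw_html": False,
    "parsed_csv": True,
    "parsed_json": False,
}


def enabled_outputs(flags):
    """Return the names of the flags that are switched on, in dict order."""
    return [name for name, is_on in flags.items() if is_on]


if __name__ == "__main__":
    # Quick sanity check of the settings before running the scraper.
    print("Keywords:", ", ".join(keywords))
    print("Subjects:", ", ".join(subjects))
    print("Outputs to save:", enabled_outputs(save_flags))
```

Printing the settings like this before a run is just one way to confirm that the save bools match the output files you expect.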