Use the Stack Exchange Data Explorer to get the Stack Overflow data: https://data.stackexchange.com/stackoverflow/query/new
Write the following query:
SELECT title, concat('https://stackoverflow.com/questions/', id), tags, score, creationDate
FROM Posts
WHERE body LIKE '%<code>%'
  AND tags LIKE '%dataframe%'
  AND tags LIKE '%python%'
  AND score > 9
  AND (body LIKE '%error%' OR body LIKE '%bug%' OR body LIKE '%not work%' OR body LIKE '%fail%'
       OR body LIKE '%performance%' OR body LIKE '%expect%' OR body LIKE '%crash%' OR body LIKE '%incorrect%')
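To sanity-check rows exported from the query, the same filter can be mirrored in Python. This is only a minimal sketch: the function names and the dict layout (keys body, tags, score) are hypothetical, and it assumes LIKE is case-insensitive on the Data Explorer, which depends on the database collation.

```python
# Keywords taken from the OR-group of the query above.
KEYWORDS = ("error", "bug", "not work", "fail", "performance",
            "expect", "crash", "incorrect")


def matches_query(post):
    """Return True if a post (hypothetical dict layout) passes the WHERE clause."""
    # Assumption: LIKE is case-insensitive, so compare in lowercase.
    body = post["body"].lower()
    tags = post["tags"].lower()
    return (
        "<code>" in body
        and "dataframe" in tags
        and "python" in tags
        and post["score"] > 9
        and any(keyword in body for keyword in KEYWORDS)
    )


def question_url(post_id):
    """Build the same link as the concat(...) column in the query."""
    return "https://stackoverflow.com/questions/" + str(post_id)
```

Note that, as in the SQL, the tag checks are substring matches, so a tag such as python-3.x also satisfies the python condition.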
- Run the Python script to get the JSON file of the GitHub repositories for a specific package from 2009 to 2020:
python get_GH.py
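The contents of get_GH.py are not shown here, so the following is only a sketch of how such a collection step could be set up, not the script itself. It assumes the GitHub REST search endpoint is used and that the 2009 to 2020 range is split into one request per year, which helps because the search API returns at most 1000 results per query. The function name and its parameters are hypothetical.

```python
from urllib.parse import urlencode

# GitHub REST API endpoint for repository search.
SEARCH_ENDPOINT = "https://api.github.com/search/repositories"


def build_search_urls(package, start_year=2009, end_year=2020, per_page=100):
    """Return one search URL per year for repositories matching the package name."""
    urls = []
    for year in range(start_year, end_year + 1):
        # The created: qualifier restricts results to repositories created in that year.
        query = f"{package} created:{year}-01-01..{year}-12-31"
        params = urlencode({"q": query, "per_page": per_page})
        urls.append(SEARCH_ENDPOINT + "?" + params)
    return urls
```

Each URL can then be fetched with any HTTP client and the JSON responses written to a file. Authenticated requests get a higher rate limit than anonymous ones.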
- Then, for each repository, get all the commits that contain the word "fix" by running the Python script:
python get_commits.py
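The filtering step can be sketched as follows. This is an illustration under assumptions, not the code of get_commits.py: it assumes the commits come from the GitHub REST "list commits" endpoint, where each item carries a sha and a nested commit.message, and that matching is case-insensitive. The function names are hypothetical.

```python
def is_fix_commit(message):
    """Return True if the commit message contains the word "fix" (any case)."""
    # Assumption: plain substring match, so "fixed" and "bugfix" also count,
    # as do unrelated words such as "prefix".
    return "fix" in message.lower()


def select_fix_commits(commits):
    """Keep only the commits whose message mentions "fix"."""
    selected = []
    for item in commits:
        message = item["commit"]["message"]
        if is_fix_commit(message):
            selected.append({"sha": item["sha"], "message": message})
    return selected
```

If false positives such as "prefix" matter, a word-boundary regular expression would be a stricter alternative.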
The GitHub and Stack Overflow datasets are provided separately in the R and Python folders.