Context

This tool is basically a two-step process. In step 1, we grab all of the activity in a repository (github-activity/github_activity/github_activity.py, lines 64 to 95 in c149ba0). In step 2, we parse the resulting data, munge it, and output markdown, statistics, etc.

However, the functionality in step 1 is somewhat hacky and messy, and hard to reason about.
I recently came across a tool recommended by @simonw that essentially replicates all of this functionality, but with a better-structured and more maintainable implementation: github-to-sqlite.

This is a Python library that grabs all of the issues, pull requests, and comments (among other things) from a repository and stores them in a local SQLite database so that you can do what you want with them. The tables are structured to work with datasette as well (though we may not have a use for that in this package, just FYI).
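Once the data is in SQLite, most of our step-2 munging could become plain SQL. A minimal sketch, using a tiny stand-in table rather than a real database (the real `issues` table that github-to-sqlite creates has many more columns, and the exact column names should be checked against the actual schema):

```python
import sqlite3

# Stand-in for a github-to-sqlite database: a simplified "issues" table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE issues (number INTEGER, title TEXT, state TEXT, closed_at TEXT)"
)
conn.executemany(
    "INSERT INTO issues VALUES (?, ?, ?, ?)",
    [
        (1, "Fix parser", "closed", "2020-08-01T12:00:00Z"),
        (2, "Add docs", "closed", "2020-09-15T09:30:00Z"),
        (3, "New feature", "open", None),
    ],
)

def issues_closed_since(conn, since):
    """Return (number, title) for issues closed on or after `since`.

    ISO 8601 timestamps sort lexicographically, so a plain string
    comparison is enough for the date filter.
    """
    return conn.execute(
        "SELECT number, title FROM issues "
        "WHERE state = 'closed' AND closed_at >= ? ORDER BY closed_at",
        (since,),
    ).fetchall()

print(issues_closed_since(conn, "2020-09-01"))  # → [(2, 'Add docs')]
```

This is the kind of "filter by date" logic referred to below; doing it in SQL against a local database would replace a chunk of our current API-handling code.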
Two questions that I have and am not sure how to answer:

- How to speed it up. I'm not sure whether github-to-sqlite does any caching or allows you to filter by date. If not, it might take quite a long time to run interactively.
- How to run it via a Python API. All of the examples use the CLI, and while that is probably fine, it would be nice if we could grab / update datasets as part of other scripts.
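On the second question, a workable interim answer is to shell out to the CLI from Python. A minimal sketch (the `fetch_repo_activity` helper and its `dry_run` flag are my own inventions, not part of github-to-sqlite; the subcommand names should be double-checked against the installed version):

```python
import subprocess

def fetch_repo_activity(db_path, repo, dry_run=False):
    """Build (and optionally run) github-to-sqlite CLI calls for one repo.

    `repo` is an "owner/name" slug. With dry_run=True the commands are
    only returned, not executed, which also makes this easy to test.
    """
    commands = [
        ["github-to-sqlite", "issues", db_path, repo],
        ["github-to-sqlite", "pull-requests", db_path, repo],
        ["github-to-sqlite", "issue-comments", db_path, repo],
    ]
    if not dry_run:
        for cmd in commands:
            subprocess.run(cmd, check=True)
    return commands

print(fetch_repo_activity("github.db", "owner/repo", dry_run=True))
```

It's not a real Python API, but wrapping the CLI this way would at least let other scripts drive the fetch step.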
Proposal
What do folks think about re-using github-to-sqlite for our "grab all of the activity in a repository" step, and focusing this repository on the munging / filtering by date / calculating statistics / generating markdown aspects?
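For the "generating markdown" side that this repository would keep, the shape of the code is roughly the following. A sketch only: the function name, the grouping heading, and the `org/github-activity` repo slug are placeholders, not this package's actual output format:

```python
def render_markdown_section(heading, items, repo):
    """Render a markdown bullet list of issues/PRs for a changelog section.

    `items` is a list of (number, title) pairs, e.g. pulled from the
    SQLite database; each entry links back to the repository.
    """
    lines = [f"## {heading}", ""]
    for number, title in items:
        lines.append(
            f"- {title} [#{number}](https://github.com/{repo}/issues/{number})"
        )
    return "\n".join(lines)

md = render_markdown_section(
    "Merged PRs",
    [(12, "Add CLI flag"), (15, "Refactor parser")],
    "org/github-activity",
)
print(md)
```

The point is that everything downstream of the database is simple, testable string munging, independent of how the data was fetched.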
I think this might be a nice way to reduce some unnecessary complexity here and to re-use code from others in the ecosystem. I also like the idea of becoming familiar with datasette structures, as it opens the possibility that we could expose this kind of data in the future for others in the community to munge and use.
At this point I'm just exploring the idea and curious what others think!
Tasks and updates
No response