-
-
Notifications
You must be signed in to change notification settings - Fork 35
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
perf: Fix performance issue when assigning previous version #78
Conversation
Assigning the previous version was checking commits multiple times, when ever a merge commit was encountered. This leads to severe performance issues on a commit graph with lots of merge commits. Instead of traversing the commit graph again, finding the previous version is now part of the commit to version grouping.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll trust you again on this one. What would be nice is that we add fuzzing to the test suite: a test that runs the code on a big temp repo, with hundreds of commits, dozens of branches, merge commits, etc., all randomly generated, just to see if the code finishes running in a small amount of time.
I added a "test" that generates profiling data with cProfile. It has no assertions, not sure if it should have any ... I did run the test on different versions of the code:
Note: the 2.4.0 version was not the "real" 2.4.0, but I just reverted the changes to build.py. And 2.4.1 was in reality the current Running the test with 20 merges was unbearable with current main (I did not wait until it finished and aborted it ...). My tests on a "real" git repo were successful with this PR applied. git-changelog was fast and it felt as if the response was instant. For reference, here the generated files: perf_stats_2.4.0_15.txt |
Amazing, thanks a lot! Since it's not really a test, I'd move this profiling code to a task, in duties.py. Let me do that tomorrow 🙂 |
Actually it's using Pytest fixtures so it could be hard to move that outside of a test run. So instead I think we can simply add a timeout to the test to make sure that it runs fast. WDYT? |
I rewrote it as a task. Lets merge it as is, and we can refine it later if needed. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks a lot for your hard work! 🚀
I still encounter severe performance problems on a commit graph that has a lot of merge commits (each feature is developed in a feature branch; a version can have more than 20 merge commits - sorry for not testing this as part of #76).
The problematic code path is now in
build._assign_previous_versions
. It has the same issue that was fixed with #76 for grouping commits into versions.The commit graph is traversed through all possible branches. Branching/merge commits result in the same commit being checked multiple times.
With large commit graphs, I think, it is best to optimize it, to only follow the commit graph once and gather all information necessary. This PR will find the previous version information during the grouping of the commits.
I will do some more tests tomorrow, but would appreciate, if you could take a look at the proposed solution.