fix "skipped files" count calculation #10141

Merged
merged 4 commits into from
Jan 11, 2025

Conversation

purajit
Contributor

@purajit purajit commented Dec 15, 2024

Type of Changes

Type
✓ 🐛 Bug fix

Description

Closes #10073. This bug has existed ever since the logic was introduced in #9122. This was
the initial issue that inspired that PR, so we can see that the "skipped" count should obviously
have nothing to do with modules without docstrings.
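To make the bug concrete, here is a minimal sketch, not pylint's actual code. It assumes, based on the description above, that the old summary took its "skipped" number from a counter of modules without docstrings. `ToyStats`, `old_summary`, and `new_summary` are hypothetical names invented for this illustration.

```python
from dataclasses import dataclass


@dataclass
class ToyStats:
    """Hypothetical stand-in for a linter statistics object."""

    checked_modules: int = 0
    undocumented_modules: int = 0  # modules that lack a docstring
    skipped: int = 0  # files or modules excluded by ignore rules


def old_summary(stats: ToyStats) -> str:
    # Assumed shape of the bug: the docstring counter is reported as "skipped".
    return (
        f"Checked {stats.checked_modules} files, "
        f"skipped {stats.undocumented_modules} files"
    )


def new_summary(stats: ToyStats) -> str:
    # Fixed shape: report a counter that only ignore rules increment.
    return (
        f"Checked {stats.checked_modules} files, "
        f"skipped {stats.skipped} files/modules"
    )


stats = ToyStats(checked_modules=100, undocumented_modules=90, skipped=3)
print(old_summary(stats))
print(new_summary(stats))
```

In a code base where most modules have no docstring, the first summary would report a large and unrelated "skipped" number, which matches the kind of symptom described under Testing.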

Also enables more reports/output about skipped files in the future, which has been asked for.
I haven't implemented that since it would probably be a larger discussion as to how verbose
--verbose should be.

Testing

Fleshed out the test that was written when this logic was introduced; it didn't actually test anything.

Ran this with several configs, including overlapping paths and modules. Additionally ran it on
our biggest repo with multiple packages, vendored code, etc, and got the expected results:

Before: Checked 10xxx files, skipped 9391 files (alarming!)
After: Checked 10xxx files, skipped 75 files/modules (whew, ok)

@purajit purajit requested a review from DanielNoord as a code owner December 15, 2024 00:48
@purajit purajit marked this pull request as draft December 15, 2024 00:48
@purajit purajit changed the title fix skipped count calculation [WIP] fix skipped count calculation Dec 15, 2024
@purajit purajit force-pushed the 20241214-skipped-files-count branch 4 times, most recently from feeb8f5 to b29b3b6 Compare December 15, 2024 02:25
@purajit purajit marked this pull request as ready for review December 15, 2024 02:27
@purajit purajit changed the title [WIP] fix skipped count calculation fix skipped count calculation Dec 15, 2024
@purajit purajit changed the title fix skipped count calculation fix "skipped files" count calculation Dec 15, 2024
pylint/lint/pylinter.py Outdated
@@ -51,6 +51,7 @@ class ModuleDescriptionDict(TypedDict):
     isarg: bool
     basepath: str
     basename: str
+    isignored: bool
Contributor Author

Was initially hoping to have all ignored files reporting go through this, but that
seems like it would be a larger refactor and out of scope.

Member

@Pierre-Sassoulas Pierre-Sassoulas left a comment

Thank you, this looks great. Agreed that there is a large refactor to do if we want a verbose mode that makes sense (displaying what will be skipped or not before the analysis, not after). However, I don't want to block bug fixes waiting for a refactor I don't have any bandwidth for.

@purajit
Contributor Author

purajit commented Dec 15, 2024

Resolved the tests.

I've decided not to walk the tree to find the precise number of skipped files since it can get
complicated (it can include virtualenvs and such; it also adds complexity to the function). I can
do that next in a separate PR; it just requires more thinking and can't be a plain os.walk.
So I've gone with showing the number of skipped files or modules, for which we can make a better
guarantee.
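The difference between a "files or modules" count and a precise file count can be sketched as follows. This is a simplified illustration that only uses the standard library; the helper names and the `vendored` directory are hypothetical, and the naive `os.walk` version is shown precisely because, as noted above, a real implementation would need more care (virtualenvs, nested ignore rules, and so on).

```python
import os
import tempfile


def count_ignored_entries(entries: list[str], ignored_names: set[str]) -> int:
    """Count top-level entries (files or packages) whose basename is ignored."""
    return sum(1 for entry in entries if os.path.basename(entry) in ignored_names)


def count_ignored_files(entries: list[str], ignored_names: set[str]) -> int:
    """Naively count every .py file under each ignored entry using os.walk."""
    total = 0
    for entry in entries:
        if os.path.basename(entry) not in ignored_names:
            continue
        if os.path.isfile(entry):
            total += 1
            continue
        for _root, _dirs, files in os.walk(entry):
            total += sum(1 for name in files if name.endswith(".py"))
    return total


with tempfile.TemporaryDirectory() as tmp:
    # Build a tiny tree: one ignored package with three files, one normal file.
    vendored = os.path.join(tmp, "vendored")
    os.makedirs(os.path.join(vendored, "sub"))
    for rel in ("__init__.py", "a.py", os.path.join("sub", "b.py")):
        with open(os.path.join(vendored, rel), "w", encoding="utf-8"):
            pass
    main = os.path.join(tmp, "main.py")
    with open(main, "w", encoding="utf-8"):
        pass

    entries = [vendored, main]
    entry_count = count_ignored_entries(entries, {"vendored"})
    file_count = count_ignored_files(entries, {"vendored"})
    print(entry_count, file_count)
```

The entry count treats the ignored package as a single skipped item, while the walk counts each file inside it, which is why the two numbers diverge as soon as a whole directory is ignored.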

Some of the failed tests as well as the ^ message above are both seemingly related to the os.walk
changes that are now removed.

@purajit
Contributor Author

purajit commented Dec 15, 2024

Ah, I think astroid's get_module_files would be the way to go to get the actual set of skipped files from the paths. I can work on that separately - let me know if you want it in this PR or not.

Collaborator

@DanielNoord DanielNoord left a comment

I like the fix. I have one comment about how we store the result (also to not have to worry about storing too many unneeded str in memory).

Tests themselves already look good so it is just about a little refactoring of the code :)

@@ -851,6 +852,7 @@ def _get_file_descr_from_stdin(self, filepath: str) -> Iterator[FileItem]:
             self.config.ignore_patterns,
             self.config.ignore_paths,
         ):
+            self.skipped_paths.add(filepath)
Collaborator

Can we instead store this as an attribute on LinterStats? Preferably just as an int, as we never actually need the file name. That would centralise all statistics counting.

Contributor Author

I had that originally! I had two concerns that made me switch:

  • the ignore logic happens at a few different layers. From the way it's structured, there should be no overlap, but I was wary of double-counting
  • in the next step (or potentially in this PR itself, depending on how you feel) I'd also like to use get_module_files to get an accurate file count rather than a "file or module" count; it would also enable pylint to report precisely what got skipped, though I guess that could be reported as we iterate and ignore them

If you're cool with the first, I can switch it back and leave the second point to a future discussion.

Collaborator

Yes, I think the first is fine. If it turns out we are double counting we would have an actual reproducer and know what the bug is. Let's not optimize for something that might not happen.
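The trade-off discussed in this thread can be sketched with two toy statistics holders. This is an assumption-laden illustration, not pylint's `LinterStats`: `CounterStats`, `PathStats`, `is_ignored`, and `run_layers` are hypothetical names, and the `layers` argument simply simulates the same path reaching more than one ignore check.

```python
from dataclasses import dataclass, field


@dataclass
class CounterStats:
    """Hypothetical stats object that keeps only an integer."""

    skipped: int = 0


@dataclass
class PathStats:
    """Hypothetical stats object that keeps every skipped path."""

    skipped_paths: set[str] = field(default_factory=set)


def is_ignored(path: str, ignored_prefixes: tuple[str, ...]) -> bool:
    # str.startswith accepts a tuple of candidate prefixes.
    return path.startswith(ignored_prefixes)


def run_layers(
    paths: list[str], ignored_prefixes: tuple[str, ...], layers: int
) -> tuple[int, int]:
    """Apply the ignore check `layers` times and return both counts."""
    counter, tracked = CounterStats(), PathStats()
    for _ in range(layers):
        for path in paths:
            if is_ignored(path, ignored_prefixes):
                counter.skipped += 1  # cheap, but counts repeats
                tracked.skipped_paths.add(path)  # deduplicates, stores strings
    return counter.skipped, len(tracked.skipped_paths)


paths = ["vendored/a.py", "src/b.py", "vendored/c.py"]
prefixes = ("vendored/",)
print(run_layers(paths, prefixes, layers=1))
print(run_layers(paths, prefixes, layers=2))
```

With a single layer the two approaches agree. They only diverge if the same path is seen by more than one ignore check, which is the double-counting scenario raised above and which would surface as a reproducible discrepancy.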

Just doing a quick confirmation with @Pierre-Sassoulas that we are fine with doing an initial PR where we just store this on LinterStats. I don't want you to change the code and then have another maintainer request the exact opposite.

As for the second change, I agree: let's do that in a follow up.

Contributor Author

Sorry, was waiting on Pierre's response, then got busy - pushed an update; ran all tests locally and reconfirmed that everything worked.

Member

Sorry for missing that. We can do it in linterstat; I don't want to block bug fixes for a refactor I don't have bandwidth for. (But this is going to be nuked if/when we refactor.)

Contributor Author

Yep, I've pushed the linterstat version!

@jacobtylerwalls jacobtylerwalls modified the milestones: 3.3.3, 3.3.4 Dec 21, 2024
@purajit purajit force-pushed the 20241214-skipped-files-count branch from 54aeecd to f08024b Compare January 10, 2025 18:57
Member

@Pierre-Sassoulas Pierre-Sassoulas left a comment

Thank you for being persistent :) It'll need a changelog there: https://github.com/pylint-dev/pylint/tree/main/doc/whatsnew/fragments

Create a news fragment with towncrier create <IssueNumber>.<type> which will be included in the changelog. <type> can be one of the types defined in ./towncrier.toml.

https://pylint.readthedocs.io/en/stable/development_guide/contributor_guide/contribute.html#creating-a-pull-request

codecov bot commented Jan 11, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 95.83%. Comparing base (14b242f) to head (d3cb8de).
Report is 1 commits behind head on main.

@@           Coverage Diff           @@
##             main   #10141   +/-   ##
=======================================
  Coverage   95.83%   95.83%           
=======================================
  Files         174      174           
  Lines       18995    19002    +7     
=======================================
+ Hits        18204    18211    +7     
  Misses        791      791           
Files with missing lines Coverage Δ
pylint/lint/expand_modules.py 95.40% <100.00%> (+0.10%) ⬆️
pylint/lint/pylinter.py 96.66% <100.00%> (+0.01%) ⬆️
pylint/typing.py 100.00% <100.00%> (ø)
pylint/utils/linterstats.py 98.80% <100.00%> (+0.01%) ⬆️

@DanielNoord DanielNoord enabled auto-merge (squash) January 11, 2025 12:55
@DanielNoord DanielNoord merged commit d94194b into pylint-dev:main Jan 11, 2025
44 checks passed
github-actions bot pushed a commit that referenced this pull request Jan 11, 2025
Contributor

🤖 According to the primer, this change has no effect on the checked open source code. 🤖🎉

This comment was generated for commit d3cb8de

DanielNoord pushed a commit that referenced this pull request Jan 11, 2025
@purajit purajit deleted the 20241214-skipped-files-count branch January 11, 2025 19:56
@Pierre-Sassoulas
Member

Congrats on becoming a pylint contributor! We're going to release this in 3.3.4.

Development

Successfully merging this pull request may close these issues.

Verbose summary shows wrong number of skipped files
4 participants