[tests] [unit] Fix the CLI test so it doesn't leave a bunch of temp files behind #832
base: main
Conversation
60a2b78 to 666f36d (Compare)
Two commits in the middle are awesome, two commits at the edges are questionable (please see my comments inline).
@@ -928,10 +928,6 @@ def test_a_user_sees_error_when_they_provide_a_non_cachi2_sbom_for_a_merge(
@pytest.mark.parametrize(
    "sbom_files_to_merge",
    [
I strongly suggest not skipping simpler tests. The idea behind having a progression of complexity is to help with detecting a breaking point. If some update breaks the two-SBOM case then it will likely break the three-SBOM case as well. If, however, something breaks the three-SBOM case but keeps the two-SBOM case intact, that immediately narrows things down quite a bit. It usually does not make sense to go beyond three, which is why I have never added more cases here.
I strongly suggest not skipping simpler tests. The idea behind having a progression of complexity is to help with detecting a breaking point. If some update breaks the two-SBOM case then it will likely break the three-SBOM case as well.
May be true in general, but common sense tells me otherwise in this particular case. If you look at the actual functional code:
cachi2/cachi2/interface/cli.py, lines 459 to 460 at 2235854:

    sbom = sum(sboms_to_merge[1:], start=start_sbom)
    sbom_json = sbom.model_dump_json(indent=2, by_alias=True, exclude_none=True)
We are really only doing a string concatenation of JSONs and calling into the JSON formatter exactly once. I don't think testing both 2- and 3-SBOM merges brings us any value because of this. Again, it might make sense elsewhere, but I don't think it does here. Additionally, if you look at our test suite, there are only ever going to be more tests, so we really need to be careful and pragmatic about the number of variants we test, so that the matrix doesn't get out of hand and the test suite doesn't lose its purpose: providing (ideally instant) local feedback, fast! This also goes in line with the recently filed issue #803, which is a very reasonable ask and a very real case; the tests are still taking ~17 minutes (give or take). Do they really need to? Or can we get the feedback faster and be more productive in the meantime?
So, if possible, I'd like to keep this particular change in, because IMO the parameter matrix as it is currently defined is not well justified.
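For illustration only, here is a minimal sketch of the fold that the cli.py snippet above relies on; the Merged class is a hypothetical stand-in for the real SBOM model, used purely to show that merging two, three, or N inputs goes through the exact same single code path:

```python
from dataclasses import dataclass, field


@dataclass
class Merged:
    # Hypothetical stand-in for an SBOM model; only here to illustrate the fold.
    parts: list[str] = field(default_factory=list)

    def __add__(self, other: "Merged") -> "Merged":
        return Merged(self.parts + other.parts)


sboms = [Merged(["a"]), Merged(["b"]), Merged(["c"])]
# Same shape as the cli.py snippet: fold the tail onto the first SBOM via __add__.
merged = sum(sboms[1:], start=sboms[0])
assert merged.parts == ["a", "b", "c"]
```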
If you look at the actual functional code
This is exactly how tests get coupled to implementations. What happens inside a unit is the unit's own business; a test writer should abstract that away.
I am sorry, I don't follow the IT (integration test) argument: this is a unit test. The unit tests run for ~10 seconds on my machine, which I'd say is pretty fast. The full nox suite runs for about a minute, which is a regression from tox, where the --parallel flag would make the entire test matrix finish in about 12 seconds.
I won't block on this one, but I still believe that small incremental tests (especially when each takes a tiny fraction of a second) are better than a lump of all features crammed into a single one. If tests are code, then lumped tests are the equivalent of an enterprise-grade god class, which is demonstrably bad.
I thought about this in the background a bit and realized that smaller, more compact test cases, followed by expanded tests that build on the previous ones, act as a form of induction-proof steps for me.
tests/unit/test_cli.py (outdated)
@@ -935,79 +935,51 @@ def test_a_user_sees_error_when_they_provide_a_non_cachi2_sbom_for_a_merge(
        ],
    ],
)
-def test_a_user_can_successfully_merge_several_cachi2_sboms(
+def test_merge_several_cachi2_sboms(
Please don't. It is totally fine to have duplications in test cases. Test cases are not code, test cases exist to state boring facts about the behavior of the program's units. Test case names are to tell a summary of what is going to happen in a test, to tell someone what to expect from this test. This summary falls apart when one has to go back and forth and assemble it from ids and test names. I insisted on not modifying pre-existing tests when SPDX was added and I insist on that now. Tests must not be touched unless they are non-valuable tests which do not actually test anything. Having multiple test cases gradually adding features to be tested could greatly help during refactoring.
I'll try to respond inline and let's see if we can meet somewhere opinion-wise.
It is totally fine to have duplications in test cases. Test cases are not code
I categorically disagree. Tests are still code (just not production code) and any duplication in any code simply smells. The impact on tests is that any refactoring/fixes to the actual code will result in many identical changes to several tests, which not only may (and by Murphy's law will) get in the way of any other productive work, but more importantly, one may forget to update ALL of the tests and reviewers might simply not catch it either.
Pytest is a widely used test framework, and the developers gave us a powerful tool to avoid code duplication (presumably because the market demanded it): parametrize. So while I am a big proponent of code readability, one should always approach things with common sense. In this particular case, the test cases are literally 3-4 lines of code calling into a subprocess; there really is no good reason why this could not be consolidated into a single test case IMO. When consolidation ends up generating a bunch of conditions modifying the code at runtime to satisfy all the parametrize variations, that's a different story! Then again, one should use common sense, and the contributor and the reviewer should find common ground that works for both parties, so as not to end up with completely unproductive work.
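As an illustration of what I mean, here is a minimal sketch of such a consolidated, parametrized test; the file names, the merge-sboms invocation, and the invoke_cli helper are assumptions made for the example, not the actual fixtures used in test_cli.py:

```python
import subprocess
from dataclasses import dataclass

import pytest


@dataclass
class Result:
    exit_code: int


def invoke_cli(args: list[str]) -> Result:
    # Hypothetical stand-in for the CLI runner used in test_cli.py;
    # it simply shells out to the cachi2 entry point (command name assumed).
    proc = subprocess.run(["cachi2", *args], capture_output=True)
    return Result(proc.returncode)


@pytest.mark.parametrize(
    "sbom_files_to_merge",
    [
        ["sbom_one.json", "sbom_two.json", "sbom_three.json"],
        ["sbom_one.spdx.json", "sbom_two.spdx.json", "sbom_three.spdx.json"],
    ],
)
def test_merge_several_cachi2_sboms(sbom_files_to_merge: list[str]) -> None:
    # The body stays the same 3-4 lines for every variant; only the inputs differ.
    result = invoke_cli(["merge-sboms", *sbom_files_to_merge])
    assert result.exit_code == 0
```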
Test case names are to tell a summary of what is going to happen in a test, to tell someone what to expect from this test.
I'm not objecting, and frankly I wouldn't mind keeping the test names as they are because they are descriptive (maybe even too much), but unfortunately, once you condense several tests into a single one, that action alone invalidates this premise.
This summary falls apart when one has to go back and forth and assemble it from ids and test names
How is this a problem? Consider the following:
FAILED tests/unit/test_cli.py::TestMergeSboms::test_a_user_sees_error_when_they_provide_a_non_json_file_for_a_merge[sbom_files_to_merge0-does not look like] - assert False
FAILED tests/unit/test_cli.py::TestMergeSboms::test_save_merged_sboms_to_file[merge_our_own_cyclonedx] - assert False
FAILED tests/unit/test_cli.py::TestMergeSboms::test_save_merged_sboms_to_file[merge_spdx_format] - assert False
I'd argue that you're not missing any important piece of information needed to debug the test. Quite the contrary: the fact that the original test (the top one) is defined with parametrize and yet doesn't provide an ID (a common antipattern in our old tests) makes it more confusing.
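As a small illustration (the test body here is a trivial placeholder and the parameter values are made up; only the ids are meant to match the failure lines above), the bracketed part of each FAILED line comes straight from pytest.param ids:

```python
import pytest


@pytest.mark.parametrize(
    "sbom_format",
    [
        pytest.param("cyclonedx", id="merge_our_own_cyclonedx"),
        pytest.param("spdx", id="merge_spdx_format"),
    ],
)
def test_save_merged_sboms_to_file(sbom_format: str) -> None:
    # A failure is reported as, e.g.,
    #   test_save_merged_sboms_to_file[merge_spdx_format]
    # rather than an opaque positional index such as [sbom_files_to_merge0].
    assert sbom_format in {"cyclonedx", "spdx"}
```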
I insisted on not modifying pre-existing tests when SPDX was added and I insist on that now. Tests must not be touched unless those are non-valuable tests which do not actually test anything.
That's fair, and so I won't pursue that change now and will drop it. However, when the time comes and we need to update the tests because of a code change, I will insist on making the change then and getting rid of this trivial code duplication.
Having multiple test cases gradually adding features to be tested could greatly help during refactoring.
How is this in direct contrast with parametrizing tests?
Oh, one more thing: using and not using parametrize at the same time is also confusing and very unusual IMO, which was another motivator for the change in question :).
Tests are still code
Treating them as regular code (and not a very special kind) implies that they need tests. This is a bit paradoxical.
any refactoring ... will result in many identical changes to several tests
I'd say this is an indication of brittle tests which are tightly coupled to an implementation. Brittleness aside, this usually indicates that the tests are, in fact, just a crude reimplementation of existing logic and exist solely to bump coverage numbers, which is something to avoid. Tests which test through a contract, on the other hand, don't need to be and must not be changed during refactoring at all. Tests are supposed to verify external behavior; how this behavior is implemented is immaterial.
The pytest part reads a bit like an argumentum ad populum. While parametrization is an awesome tool for one-dimensional substitutions and should be used for that, I think multidimensional expansion is a tool that has to be used sparingly. Parametrization and overgeneralization lead to tests like those for gomod. It is a bit hard to tell what is going on in some of those because of the multitude of parameters supplied.
The other thing which I find problematic is that here we are effectively lumping together two distinct features into one test. Merging different types of SBOMs represents distinct features that were introduced separately in separate user stories; that's why I think it is worth keeping their tests separate. The fact that they now share an implementation is, again, an implementation detail and should not affect how the tests are written.
when the time comes and we need to update the tests because of a code change
This means that our tests are tightly coupled to our implementation and that we are checking how we wrote some piece of code vs checking if it does what it is expected to do.
This all stems from the fact that we do not have many business rules on one hand and use a lot of external services on the other. This is somewhat similar to microservices, which are usually heavily tested with integration tests and only sporadically with unit and e2e tests. I believe we should revisit our approach to testing and, more importantly, to coverage calculation and interpretation.
This is definitely a bigger discussion than could (and should) fit into a cleanup PR thread.
There really is no difference between merging 2 or 3 SBOMs, so let's just test the more complex case of merging 3 and reduce the matrix. Signed-off-by: Erik Skultety <[email protected]>
Fixes: b4a413d Signed-off-by: Erik Skultety <[email protected]>
Cosmetic change, but if anyone ever needs to interactively debug this, it's going to be easier to identify the tmp files based on our tool name, the test domain, and the test case which generated them. Signed-off-by: Erik Skultety <[email protected]>
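For illustration, a minimal sketch of that naming idea; the exact prefix format, the fixture name, and the use of the request fixture are assumptions for this example, not necessarily what the commit does:

```python
import pytest


@pytest.fixture
def merged_sbom_output(tmp_path, request):
    # Hypothetical fixture: pytest cleans tmp_path up on its own, and the file name
    # encodes the tool name, the test domain, and the test case that generated it,
    # which makes any leftover files easy to attribute during interactive debugging.
    return tmp_path / f"cachi2-merge-sboms-{request.node.name}.json"
```

A test would then write its merged SBOM to merged_sbom_output instead of an anonymous tempfile path.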
666f36d to 1b1879e (Compare)
Maintainers will complete the following section
Note: if the contribution is external (not from an organization member), the CI
pipeline will not run automatically. After verifying that the CI is safe to run:
/ok-to-test
(as is the standard for Pipelines as Code)