[tests] [unit] Fix the CLI test so it doesn't leave a bunch of temp files behind #832

Open · wants to merge 3 commits into main
Conversation

eskultety (Member):
Maintainers will complete the following section

  • Commit messages are descriptive enough
  • Code coverage from testing does not decrease and new code is covered
  • Docs updated (if applicable)
  • Docs links in the code are still valid (if docs were updated)

Note: if the contribution is external (not from an organization member), the CI
pipeline will not run automatically. After verifying that the CI is safe to run:

@a-ovchinnikov (Contributor) left a comment:


Two commits in the middle are awesome, two commits at the edges are questionable (please see my comments inline).

tests/unit/test_cli.py (outdated)
@@ -928,10 +928,6 @@ def test_a_user_sees_error_when_they_provide_a_non_cachi2_sbom_for_a_merge(
@pytest.mark.parametrize(
"sbom_files_to_merge",
[
Contributor:

I strongly suggest not skipping the simpler tests. The idea behind having a progression of complexity is to help with detecting a breaking point. If some update breaks the two-SBOMs case then it will likely break the three-SBOMs case as well. If, however, something breaks the three-SBOMs case but keeps the two-SBOMs case intact, then that already narrows things down quite a bit. It usually does not make sense to go beyond three, which is why I have never added more cases here.

@eskultety (Member Author), Feb 12, 2025:

> I strongly suggest not skipping the simpler tests. The idea behind having a progression of complexity is to help with detecting a breaking point. If some update breaks the two-SBOMs case then it will likely break the three-SBOMs case as well

May be true in general, but common sense tells me otherwise in this particular case. If you look at the actual functional code:

sbom = sum(sboms_to_merge[1:], start=start_sbom)
sbom_json = sbom.model_dump_json(indent=2, by_alias=True, exclude_none=True)

We are really only doing a string concatenation of JSONs and calling into the JSON formatter exactly once. I don't think testing both 2- and 3-SBOM merges brings us any value because of this. Again, it might make sense elsewhere, but I don't think it does here. Additionally, if you look at our test suite, there are only ever going to be more tests, so we really need to be careful and pragmatic about the number of variants we test so that it doesn't get out of hand and the test suite doesn't lose its purpose - providing local feedback, (ideally) fast! This also goes in line with recently filed issue #803, which is a very reasonable ask and a very real case: the tests are still taking ~17 minutes (give or take) - well, do they need to? Or can we get the feedback faster and be more productive in the meantime?
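To illustrate the point with a stand-in class (a minimal sketch, not the real SBOM model): sum() folds the merge pairwise, so the three-SBOM case exercises exactly the same __add__ path as the two-SBOM case, just one more time.

from dataclasses import dataclass, field

@dataclass
class FakeSbom:
    # Stand-in for the real SBOM model; only the merge operator matters here.
    components: list = field(default_factory=list)

    def __add__(self, other: "FakeSbom") -> "FakeSbom":
        return FakeSbom(self.components + other.components)

sboms_to_merge = [FakeSbom(["a"]), FakeSbom(["b"]), FakeSbom(["c"])]
# sum() evaluates ((first + second) + third), i.e. merging three SBOMs just
# repeats the same pairwise merge that a two-SBOM merge performs, so dropping
# the two-SBOM variant leaves no separate code path untested.
merged = sum(sboms_to_merge[1:], start=sboms_to_merge[0])
assert merged.components == ["a", "b", "c"]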

So, if possible, I'd like to keep this particular change in, because I don't think the parameter matrix as it is currently defined is well justified.

Contributor:

> If you look at the actual functional code

This is exactly how tests get coupled to implementations. What is happening inside a unit is the unit's business; a test writer should at least abstract that away.

I am sorry, I don't follow the IT argument: this is a unit test. UTs run for ~10 seconds on my machine; I'd say that is pretty fast. The full nox suite runs for about a minute, which is a regression from tox, where the --parallel flag would make the entire test matrix finish in about 12 seconds.

I won't block on this one, but I still believe that small incremental tests (especially when each takes a tiny fraction of a second) are better than a lump of all features crammed into a single one. If tests are code, then lumped tests are the equivalent of an enterprise-grade god-class, which is demonstrably bad.

Contributor:

I thought about this in the background a bit and realized that smaller, more compact test cases followed by expanded tests that build on the previous ones act as a form of induction proof steps for me.

@@ -935,79 +935,51 @@ def test_a_user_sees_error_when_they_provide_a_non_cachi2_sbom_for_a_merge(
],
],
)
-def test_a_user_can_successfully_merge_several_cachi2_sboms(
+def test_merge_several_cachi2_sboms(
Contributor:

Please don't. It is totally fine to have duplication in test cases. Test cases are not code; test cases exist to state boring facts about the behavior of a program's units. Test case names are there to summarize what is going to happen in a test, to tell someone what to expect from it. This summary falls apart when one has to go back and forth and assemble it from ids and test names. I insisted on not modifying pre-existing tests when SPDX was added and I insist on it now. Tests must not be touched unless they are non-valuable tests which do not actually test anything. Having multiple test cases gradually adding features to be tested could greatly help during refactoring.

@eskultety (Member Author), Feb 12, 2025:

I'll try to respond inline and let's see if we can meet somewhere opinion-wise.

> It is totally fine to have duplication in test cases. Test cases are not code

I categorically disagree. Tests are still code (just not production code) and any duplication in any code simply smells. The impact on tests is that any refactoring/fixes on the actual code will result in many identical changes to several tests, which not only may (and by Murphy's law will) impede any other type of productive work, but, more importantly, one may forget to update ALL of the tests and reviewers might simply not catch it either.
Pytest is a widely used test framework and its developers gave us a powerful tool to avoid code duplication - parametrize - presumably because the market demanded it. While I am a big proponent of code readability, one should always approach things with common sense: in this particular case, the test cases are literally 3-4 lines of code calling into a subprocess, so there really is no good reason why this could not be consolidated into a single test case IMO. When such consolidation ends up generating a bunch of conditions modifying the code at runtime to satisfy all the parametrize variations, that's a different story! Then again, one should use common sense, and the contributor and the reviewer should find some common ground that works for both parties so as not to end up doing completely unproductive work.
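A rough sketch of the consolidation I have in mind (the file names and the CLI invocation below are placeholders, not the actual fixtures):

import subprocess
import pytest

def run_merge(sbom_files):
    # Placeholder invocation; the real test shells out to our CLI entry point.
    return subprocess.run(["our-cli", "merge", *sbom_files], capture_output=True, text=True)

@pytest.mark.parametrize(
    "sbom_files_to_merge",
    [
        ["cyclonedx.bom.json", "another.cyclonedx.bom.json"],
        ["spdx.bom.json", "another.spdx.bom.json"],
    ],
)
def test_merge_several_cachi2_sboms(sbom_files_to_merge):
    # One parametrized test replacing two near-identical 3-4 line test functions.
    result = run_merge(sbom_files_to_merge)
    assert result.returncode == 0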

> Test case names are there to summarize what is going to happen in a test, to tell someone what to expect from it.

Not objecting, and frankly I wouldn't mind keeping the test names as they are because they are descriptive (maybe even too much so), but unfortunately, once you condense several tests into a single one, that action alone invalidates this premise.

> This summary falls apart when one has to go back and forth and assemble it from ids and test names

How is this a problem? Consider the following:

FAILED tests/unit/test_cli.py::TestMergeSboms::test_a_user_sees_error_when_they_provide_a_non_json_file_for_a_merge[sbom_files_to_merge0-does not look like] - assert False
FAILED tests/unit/test_cli.py::TestMergeSboms::test_save_merged_sboms_to_file[merge_our_own_cyclonedx] - assert False
FAILED tests/unit/test_cli.py::TestMergeSboms::test_save_merged_sboms_to_file[merge_spdx_format] - assert False

I'd argue that you're not missing any important piece of information needed to debug the test. Quite the contrary: the fact that the original test (the top one) is defined with parametrize and yet doesn't provide an ID (a common antipattern in our old tests) makes it more confusing.
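For completeness, explicit IDs are cheap to add; a hypothetical sketch (the file names are placeholders) of how the readable node IDs in the failure report above can be produced:

import pytest

@pytest.mark.parametrize(
    "sbom_files_to_merge",
    [
        # Explicit ids are what show up in the failure report, e.g.
        # test_save_merged_sboms_to_file[merge_spdx_format].
        pytest.param(["a.cdx.json", "b.cdx.json"], id="merge_our_own_cyclonedx"),
        pytest.param(["a.spdx.json", "b.spdx.json"], id="merge_spdx_format"),
    ],
)
def test_save_merged_sboms_to_file(sbom_files_to_merge):
    ...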

> I insisted on not modifying pre-existing tests when SPDX was added and I insist on it now. Tests must not be touched unless they are non-valuable tests which do not actually test anything.

That's fair, so I won't pursue that change now and will drop it. However, when the time comes and we need to update the tests because of a code change, I will insist on making the change then and getting rid of this trivial code duplication.

> Having multiple test cases gradually adding features to be tested could greatly help during refactoring.

How is this in direct contrast with parametrizing tests?

@eskultety (Member Author):

Oh, one more thing - using and not using parametrize at the same time is also confusing and very unusual IMO, which was another motivator for the change in question :)

Contributor:

> Tests are still code

Treating them as regular code (and not a very special kind) implies that they need tests. This is a bit paradoxical.

> any refactoring ... will result in many identical changes to several tests

I'd say this is an indication of brittle tests which are tightly coupled to an implementation. Brittleness aside, this usually indicates that the tests are, in fact, just a crude reimplementation of existing logic and exist solely to bump coverage numbers, which is something to avoid. Tests which test through a contract, on the other hand, don't need to and must not be changed during refactoring at all. Tests are supposed to verify external behavior; how this behavior is implemented is immaterial.

The pytest part reads a bit like an ad populum argument. While parametrization is an awesome tool for one-dimensional substitutions and should be used for that, I think multidimensional expansion is a tool that has to be used sparingly. Parametrization and overgeneralization lead to tests like those for gomod: it is a bit hard to tell what is going on in some of those because of the multitude of parameters supplied.

The other thing which I find problematic is that here we are effectively lumping two distinct features together into one test. Merging the different types of SBOMs covers different features which were introduced separately, in separate user stories; that's why I think it is worth keeping their tests separate. The fact that they now share an implementation is, again, an implementation detail and should not affect how the tests are written.

> when the time comes and we need to update the tests because of a code change

This means that our tests are tightly coupled to our implementation and that we are checking how we wrote some piece of code vs checking if it does what it is expected to do.

This all stems from the fact that we have not too many business rules on one hand and use a lot of external services on the other. This is somewhat similar to microservices, which are usually heavily tested with integration tests and only sporadically with unit and e2e tests. I believe we should revisit our approach to testing and, more importantly, to coverage calculation and interpretation.

This is definitely a bigger discussion than could (and should) fit into a cleanup PR thread.

tests/unit/test_cli.py (resolved)
There really is no difference between merging 2 or 3 SBOMs, so let's just
test the more complex case of merging 3 and reduce the matrix.

Signed-off-by: Erik Skultety <[email protected]>
Cosmetic change, but if anyone ever needs to interactively debug this,
it's going to be easier to identify the tmp files based on our tool
name, test domain and the test case which generated them.

Signed-off-by: Erik Skultety <[email protected]>
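A minimal sketch of the naming idea (the prefix pieces and the helper are hypothetical, not necessarily the exact scheme used in the commit): prefix the temp files with the tool name, test domain and test case so leftovers are easy to attribute when debugging interactively.

import tempfile

def named_tmpfile(tool, domain, test_case, suffix=".json"):
    # Create a temp file whose name encodes where it came from, e.g.
    # /tmp/ourtool-cli-merge_sboms-XXXXXXXX.json (names are placeholders).
    with tempfile.NamedTemporaryFile(
        prefix=f"{tool}-{domain}-{test_case}-", suffix=suffix, delete=False
    ) as tmp:
        return tmp.name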
