Consider adopting the Open Test Reporting format #13045

Open
marcphilipp opened this issue Dec 9, 2024 · 12 comments
Labels
status: help wanted developers would like help from experts on this topic type: proposal proposal for a new feature, often to gather opinions or design the API around the new feature

Comments

@marcphilipp

TL;DR The JUnit team has defined a new language-agnostic test reporting format and implemented a CLI tool for validation, conversion, and HTML report generation. We're reaching out to well-known testing frameworks and reporting tools to ask for feedback and, ultimately, adoption, if you think this format provides value to your users.

Motivation and Context

You've probably come across the "JUnit XML" format for test reporting. This format did not originate from the JUnit project but was initially introduced by the Ant build tool and then adopted by other Java build tools like Maven and Gradle. Many build servers know how to parse the XML-based format, and even non-Java tools often support it. However, it’s based on the concept of test classes and methods, so using it for frameworks and tools where those elements are not present is awkward at best. Moreover, it does not support nested structures beyond a simple parent-child relationship. Finally, it is not extensible: no additional attributes can be added without the risk of breaking existing tools.

For those reasons, many testing frameworks (for example, TestNG and Spock in the Java ecosystem) have defined their own reporting formats. This has given them the flexibility they need, but the number of tools that can parse, display, or transform their custom formats is very limited.

To overcome these limitations, the JUnit team is defining a new format for test reporting. Its goal is to be platform-agnostic so that as many testing frameworks as possible can benefit from it. Moreover, it is designed to be extensible so new data can be added as needed, without breaking consumers. However, all well-known attributes are properly defined so it remains consumable by downstream reporting tools.

Of course, it will take a while for downstream tools to support the new format. However, the more testing frameworks adopt it, the more likely downstream tools are to do so as well.

Overview

The new format is based on XML because it provides more expressive ways to define schemas. Moreover, XML has typed extensions built-in via the use of multiple schemas. If a testing framework provides a listener mechanism, it should be possible to write an Open Test Reporting XML file from an extension.

Benefits

  • Easy to write event-based XML format (with optional Java API)
    • Supports infrastructure information, Git metadata, tags, data/file attachments
    • Custom data can be added by defining an extension schema
  • Language-agnostic
    • No Java-specific elements in the core schema
    • Not tied to the concept of test classes and methods
    • Full support for nested structures
  • Extensible HTML report generator
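
To make the "event-based" and "nested structures" points more concrete, here is a minimal Python sketch of what a listener-driven writer could look like. It is only an illustration: the `urn:example:*` namespaces, the `started`/`finished` element names, and the `id`/`parentId`/`time` attributes are placeholders chosen for this example and are not taken from the actual schema. Only the `<result status="…">` shape mirrors what is discussed in this thread.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone

# Placeholder namespaces: NOT the real Open Test Reporting schema URIs.
CORE_NS = "urn:example:reporting:core"
EVENTS_NS = "urn:example:reporting:events"
ET.register_namespace("c", CORE_NS)
ET.register_namespace("e", EVENTS_NS)


def _now():
    """Current UTC time as an ISO 8601 string."""
    return datetime.now(timezone.utc).isoformat()


class EventReportWriter:
    """Appends start/finish events for arbitrarily nested test nodes."""

    def __init__(self):
        self.root = ET.Element(f"{{{EVENTS_NS}}}events")
        self._next_id = 0

    def started(self, name, parent_id=None):
        # Each node gets its own id; nesting is expressed via parentId,
        # so no notion of "class" or "method" is needed.
        self._next_id += 1
        node_id = str(self._next_id)
        attrs = {"id": node_id, "name": name, "time": _now()}
        if parent_id is not None:
            attrs["parentId"] = parent_id
        ET.SubElement(self.root, f"{{{EVENTS_NS}}}started", attrs)
        return node_id

    def finished(self, node_id, status):
        event = ET.SubElement(
            self.root, f"{{{EVENTS_NS}}}finished", {"id": node_id, "time": _now()}
        )
        ET.SubElement(event, f"{{{CORE_NS}}}result", {"status": status})

    def to_xml(self):
        return ET.tostring(self.root, encoding="unicode")


# A framework's listener hooks would call started()/finished() as tests run.
writer = EventReportWriter()
suite = writer.started("checkout")
group = writer.started("with discount code", parent_id=suite)
test = writer.started("applies 10% off", parent_id=group)
writer.finished(test, "SUCCESSFUL")
writer.finished(group, "SUCCESSFUL")
writer.finished(suite, "SUCCESSFUL")
print(writer.to_xml())
```

Because events are appended in the order they happen, a writer like this never needs to hold the whole test tree in memory, which is presumably what makes such a format easy to produce from a listener.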

Next Steps

The JUnit team would be happy to get your feedback on this initiative. We can discuss here or you're welcome to start a thread or open an issue in the Open Test Reporting repo. Should you consider adopting the new format, we'd be happy to provide guidance but we won't have the resources to actually contribute an implementation.


This is a bit of an unusual request so please forgive me for not sticking to the issue template.

@webknjaz
Member

webknjaz commented Dec 9, 2024

Sounds reasonable to support. Not sure if we'd rush into it, though. One way could be starting this as a plugin with future adoption into core if it proves useful.

I remember facing xunit2 limitations (#7537), so having something new that might address such things would be useful.

@marcphilipp I haven't looked into the spec but have you considered custom test result statuses? Pytest has a few that I haven't seen anywhere else (xfail/xpass, for example) and some plugins extend that with more custom status names.

@marcphilipp
Author

One way could be starting this as a plugin with future adoption into core if it proves useful.

That sounds like a good strategy! 👍

I haven't looked into the spec but have you considered custom test result statuses? Pytest has a few that I haven't seen anywhere else (xfail/xpass, for example) and some plugins extend that with more custom status names.

Currently, there's a predefined list: https://github.com/ota4j-team/open-test-reporting/blob/66b1f088b599eaecb982eb9a3ccaa15a29c1e3ec/schema/src/main/resources/org/opentest4j/reporting/schema/core-0.2.0.xsd#L93-L100

Can your statuses all be mapped to those? Additionally, you could define a custom pytest.xsd that includes the more detailed ones. That would look something like this:

<result status="SUCCESSFUL">
  <pytest:status>xpass</pytest:status>
</result>
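
As a rough illustration of that idea, the following Python sketch builds such a `<result>` element with a nested extension element and then reads it back the way a tool that only knows the core status would. The `urn:example:pytest` namespace and the choice to carry the detailed outcome as element text are assumptions made for this example; no pytest.xsd is defined in this thread.

```python
import xml.etree.ElementTree as ET

# Hypothetical extension namespace; a real pytest.xsd would define its own.
PYTEST_NS = "urn:example:pytest"
ET.register_namespace("pytest", PYTEST_NS)


def build_result(core_status, pytest_outcome):
    """Return a <result> element carrying both the core and the detailed status."""
    result = ET.Element("result", {"status": core_status})
    detail = ET.SubElement(result, f"{{{PYTEST_NS}}}status")
    detail.text = pytest_outcome
    return result


xml_text = ET.tostring(build_result("SUCCESSFUL", "xpass"), encoding="unicode")
print(xml_text)

# A consumer that only understands the core schema still gets a usable status...
parsed = ET.fromstring(xml_text)
core_status = parsed.get("status")

# ...while a pytest-aware consumer can look for the extension element.
detail = parsed.find(f"{{{PYTEST_NS}}}status")
detailed_status = detail.text if detail is not None else None
print(core_status, detailed_status)
```

The point of the sketch is only that the extension data sits next to the core status rather than replacing it, so tools unaware of the extension are not broken by it.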

@webknjaz
Member

webknjaz commented Dec 9, 2024

Well, those statuses exist because their semantics are different, and they should be represented separately. For example, xpass means that a test was expected to fail because of some unfixed bug in the tested code, but it passed, which is unexpected. In strict mode, a test that suddenly starts passing evaluates to a failure, not a success. And xfail is the opposite: we expect a test against broken code to fail, and it does, so that doesn't fail the entire test session but shows that the expectations match.
With third-party plugins, though, they could stick a random string into a status name. Some implement retries and report statuses as re-run, for example.
Mapping those to the list of known statuses is of course required internally to know whether the test session failed, but doing that in a structured test report would lose context. For an agnostic format to be useful, this would need to be preserved: arbitrary statuses would need a defined representation, and tools that later present the data in formats like HTML would need to actually display all of them rather than reducing the test runs to a smaller subset of potentially misleading statuses.

@The-Compiler
Member

A few explanations around test statuses:

  • pytest core has "error" (as opposed to "failed") which means something went wrong during the setup phase (in a fixture), before the test function did run. I suppose that would map to your ABORTED?
  • It also has "xfail", as in "expected test failure". Imagine writing a test for a bug report, and then finding out that you can't fix the bug right now (say, it's in an upstream library). Instead of deleting the test or skipping it, you run it anyways, but mark the failure as expected. That could somewhat be mapped to either "skipped" or "passed", but is semantically somewhat different.
  • Then there is "xpass", a test which was marked as xfail but passed anyways. Depending on pytest configuration, that counts as a passed test with a sort of warning (yellow rather than green), or as a failure. Generalizing this a bit, I feel like there would at least need to be some sort of "warn" status maybe?
  • Plugins such as rerunfailures add their own test outcomes as well, such as "rerun" (a flaky test that failed, but the failure was ignored and we're doing a second run to find out if it's really failing or just flaky).

Those can all be mapped to success/skipped/aborted/failure, but it's a somewhat lossy operation as those all are semantically somewhat different to that list. I believe the topic came up in the past a few times, with the desire to have the semantics show up properly in "export" reporting formats as well.
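
A small Python sketch of why the mapping is lossy. The outcome labels are the ones used in this comment, and the particular choices (for example, xfail to SKIPPED rather than SUCCESSFUL, and xpass in non-strict mode) are just one possible mapping picked for illustration, not something pytest or the format prescribes.

```python
from collections import Counter

# One possible mapping onto the four core statuses (an assumption, see above).
CORE_STATUS_BY_OUTCOME = {
    "passed": "SUCCESSFUL",
    "xpass": "SUCCESSFUL",  # non-strict: counts as passed, with a warning
    "skipped": "SKIPPED",
    "xfail": "SKIPPED",     # could arguably be SUCCESSFUL instead
    "error": "ABORTED",     # setup/fixture problem before the test ran
    "failed": "FAILED",
}


def collapse(outcomes):
    """Count outcomes after mapping them onto the core statuses."""
    return Counter(CORE_STATUS_BY_OUTCOME[outcome] for outcome in outcomes)


run = ["passed", "xpass", "xfail", "skipped", "error", "failed", "passed"]
detailed = Counter(run)
collapsed = collapse(run)

print(len(detailed), "distinct outcomes before mapping")
print(len(collapsed), "distinct statuses after mapping")
print(dict(collapsed))
```

After collapsing, a reader of the report can no longer tell which of the SUCCESSFUL tests was an xpass or which of the SKIPPED tests actually ran and failed as expected, so the original outcomes cannot be recovered from the report alone.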

@marcphilipp
Author

  • pytest core has "error" (as opposed to "failed") which means something went wrong during the setup phase (in a fixture), before the test function did run. I suppose that would map to your ABORTED?

In JUnit that would be reported as a failure of the "container" of the test function (in our case usually a test class), but it sounds to me like ABORTED would work.

Those can all be mapped to success/skipped/aborted/failure, but it's a somewhat lossy operation as those all are semantically somewhat different to that list. I believe the topic came up in the past a few times, with the desire to have the semantics show up properly in "export" reporting formats as well.

Thanks for your explanations! I'm up for trying to model these more precisely. I don't want to make it completely generic because I'd like tools to be able to interpret the statuses.

Maybe by adding two additional ones like this?

  • successful
  • unexpectedly successful (for xpass)
  • skipped
  • aborted
  • failed
  • expectedly failed (for xfail)

  • Plugins such as rerunfailures add their own test outcomes as well, such as "rerun" (a flaky test that failed, but the failure was ignored and we're doing a second run to find out if it's really failing or just flaky).

Would those be reported as a separate test run or within the same run?

@RonnyPfannschmidt
Member

Rerunfailures is within the same test run, but it would trigger multiple reports.

@eli-schwartz

eli-schwartz commented Dec 10, 2024

@marcphilipp I haven't looked into the spec but have you considered custom test result statuses? Pytest has a few that I haven't seen anywhere else (xfail/xpass, for example) and some plugins extend that with more custom status names.

@webknjaz I am surprised that you've never seen these extremely common statuses before. :)

Automake supports them too: https://www.gnu.org/software/automake/manual/html_node/Generalities-about-Testing.html
(since 1999: https://git.savannah.gnu.org/cgit/automake.git/commit/?id=95e3bbed181ecb3497213ad01f07927e04737071)

TAP supports the underlying concept and calls it "TODO tests": https://testanything.org/tap-version-14-specification.html#todo-tests
(It doesn't mandate behavior for a reporter. A TODO test that fails maps to xfail, and cannot be reported as an error, but one that starts passing, mapping to the concept of xpass, isn't required to generate a reporter failure.)

https://mesonbuild.com/Unit-tests.html implements xpass and xfail as something we "copied from automake", including support for automake's testsuite harness handling of tests that exit 77 to indicate a unittest SKIP, and exit 99 to indicate a unittest ERROR.

Anyway, I agree these are extremely useful statuses that probably every testing framework should want to implement.

@ferdnyc

ferdnyc commented Dec 14, 2024

Those can all be mapped to success/skipped/aborted/failure, but it's a somewhat lossy operation as those all are semantically somewhat different to that list.

They are, but since (as @webknjaz notes) there's the added wrinkle that those semantics can change at runtime (for example, based on whether or not strict mode is enabled), then for a common reporting format they probably should be mapped, and the mapping should change accordingly.

IOW, if strict mode transforms a passing xfail test into a failure, then pytest should probably report "FAIL" in the common format when strict is enabled. That way, tools that can parse that output get all of the information they need in a format they can understand. Yes, some nuance is lost along the way, but that's a not-unexpected tradeoff when conforming output to fit a shared standard.

(Edit: The other option would be for the reporting to have a separate severity field, to complement the test result field. IOW, an "xfail" test could report WARNING+unexpected success when it passes, and FATAL+unexpected success when strict mode is enabled.)
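
The severity idea could be sketched in Python roughly as follows. The status and severity names (`UNEXPECTED_SUCCESS`, `EXPECTED_FAILURE`, `WARNING`, `FATAL`, and especially `INFO` for an expected failure) are made up for this example and are not part of pytest or the reporting format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReportedResult:
    status: str    # what actually happened
    severity: str  # how the producer wants it to be treated


def report_xfail_marked_test(test_passed, strict):
    """Report a test that was marked as an expected failure."""
    if not test_passed:
        # The failure was expected, so nothing needs attention (assumed INFO).
        return ReportedResult("EXPECTED_FAILURE", "INFO")
    # The status stays the same in both modes; only the severity changes.
    return ReportedResult("UNEXPECTED_SUCCESS", "FATAL" if strict else "WARNING")


def session_failed(results):
    """A consumer can decide the overall outcome from severity alone."""
    return any(result.severity == "FATAL" for result in results)


lenient = report_xfail_marked_test(test_passed=True, strict=False)
strict = report_xfail_marked_test(test_passed=True, strict=True)
print(lenient, session_failed([lenient]))
print(strict, session_failed([strict]))
```

With two fields, the report keeps the nuance (the test unexpectedly passed) while still telling tools unambiguously whether the run should count as failed.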

@Zac-HD added the "status: help wanted" and "type: proposal" labels on Dec 24, 2024

@webknjaz
Member

@eli-schwartz thanks for educating me :) My experience is obviously skewed towards the Python world, which is likely why I haven't seen the prior art on xfail. I also interpret these as a TDD thing + bug reproducers / acceptance tests that haven't yet been fixed.

I think I learned of this from pytest and @pganssle's lightning talks + posts shaped my understanding even more, helping me integrate them into my own set of best practices: https://blog.ganssle.io/articles/2021/11/pytest-xfail.html.

Also, let's link the issue you've started over in the other repo FTR: ota4j-team/open-test-reporting#224.

@marcphilipp it's important to bake all these common states into the standard because with that, their inclusion wouldn't require hacks from the tools that produce the reports. And the tools that consume / display / represent / render the pre-existing reports could decide how they can be shown.

One thing that concerns me is that the xpass has configurable influence on the test session outcome. In strict mode, xpass becomes a failure, while xfail remains a success at all times. I don't know if an XPASS (strict) needs to be represented as its own separate outcome, or this could be signalled in some other way.

Tools that represent the testing stats as tables might display “failed”, “skipped”, “successful” and “total” numbers. And depending on how this is stored, those might not add up visually, confusing the human staring at said representation.
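
A tiny Python illustration of the "numbers don't add up" concern, assuming a report that stores the six statuses floated earlier in this thread and a hypothetical table that only has three columns plus a total.

```python
from collections import Counter

# Columns a hypothetical reporting tool knows how to display.
DISPLAYED_COLUMNS = ("successful", "skipped", "failed")


def summarize(statuses):
    """Build one table row and record how many tests fit no displayed column."""
    counts = Counter(statuses)
    row = {column: counts[column] for column in DISPLAYED_COLUMNS}
    displayed = sum(row.values())
    row["total"] = len(statuses)
    row["unaccounted"] = row["total"] - displayed
    return row


statuses = [
    "successful",
    "successful",
    "failed",
    "skipped",
    "unexpectedly successful",  # xpass
    "expectedly failed",        # xfail
]
row = summarize(statuses)
print(row)
```

Here the three visible columns sum to less than the total, which is exactly the kind of mismatch a reader of the table would stumble over unless the tool either shows the extra statuses or folds them into a displayed column.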

@eli-schwartz

That does look like a pretty good overview of the concept, yeah. :) The explicit term "TODO tests" used by TAP conveys the same TDD idea, by the way (whereas Automake doesn't exactly spell it out, though "especially during early development stages" implies it a bit).

One thing that concerns me is that the xpass has configurable influence on the test session outcome. In strict mode, xpass becomes a failure, while xfail remains a success at all times. I don't know if an XPASS (strict) needs to be represented as its own separate outcome, or this could be signalled in some other way.

Is it relevant for pytest that "XPASS (strict)" is distinct from "XPASS" or does it suffice to let the software interpreting the report make its own decision about whether to treat XPASS strictly?

@marcphilipp
Author

it's important to bake all these common states into the standard because with that, their inclusion wouldn't require hacks from the tools that produce the reports. And the tools that consume / display / represent / render the pre-existing reports could decide how they can be shown.

@webknjaz I'm following along here and I'm open to add statuses (in ota4j-team/open-test-reporting#224).

@webknjaz
Member

Is it relevant for pytest that "XPASS (strict)" is distinct from "XPASS" or does it suffice to let the software interpreting the report make its own decision about whether to treat XPASS strictly?

@eli-schwartz that's what I'm not sure about. Technically, pytest would infer its own overall status based on that.
It might not be as important when something else shows tests in a table format. But then, it might mislead users into thinking there's no connection.
Perhaps statuses should have an additional/secondary property. Like an impact factor or something...
It just occurred to me that in GitHub Actions, steps have two such properties: outcome and conclusion. One is the actual status and the other is its interpretation after continue-on-error is applied. It seems to me that this would be similar semantically.
