-
Notifications
You must be signed in to change notification settings - Fork 279
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[FEATURE] Introduce commit queue on Jenkins (main
branch only) to proactively spot flaky tests
#4810
Comments
@peterzhuamazon FYI, would be great to have it implemented, ready to help as well |
Adding an initial change of switching gradle check to run every 2 hours and add concurrent runs from 20 to 30. Will need to monitor for tomorrow and see if it is stable. Thanks. |
Would need some inputs from @prudhvigodithi on how to link the metrics of gradle check failures to PRs with specific commit once merged. I think we can currently use the autocut issue created due to gradle check failure, and link them to specific PRs if certain failures pass a threshold. We can then gather these PRs information and display them into a dashboard. Also need some help from @reta on how to run specific tests if given an autocut issue, so that I can implement the new parameters in jenkins lib and allow gradle check workflow to run in full vs partial mode. If the partial mode is quick enough we can increase the run times. Thanks. |
Hey @reta and @peterzhuamazon
For this we have the Github issues created which are marked as Flaky tests (failing for a commit that was merged). Irrespective they failed upon retry or not, they are flagged as failures and issue is created. Also from
This is possible, taking the HEAD commit and run them periodically and find the tests that failed from
Coming to this, lets say upon periodic runs, found a flaky issue on a commit and once found the PR for that commit, still we cant flag that PR as offending as the test failure can be caused by some commit way back in history but now failed randomly (flaky) for this PR commit. There might be or might not be a co-relation for a failed test and with PR changes.
Ya once we found the right commit that has introduced the flaky test, then yes its easy to raise a revert PR or create an issue to revert the offending commit. From the issue title, all we want is find the flaky tests ? if so we already have the automation to find the flaky tests and create/update the issue. Are we looking for a mechanism to find the exact PR that introduced this flakiness ? Thank you |
I think from the baseline if we find a flaky test is linked to a specific PR, we can mark it for revert. Thanks. |
Also what we can think of is If a test fails once within the last X runs, we can mark it as unstable and flaky. A test is considered stable if it passes N times in a row. Then Ignoring flaky tests (which are marked as unstable and flaky) during gradle check helps reduce the impact on PR's, though it can affect reliability since the functionality covered by the flaky test remains untested while it's classified as flaky, we can mark this test as entry criteria for upcoming release, once the test is fixed then it should pass the release entry criteria and move forward with the release. With this mechanism we are forcing to fix the flaky tests and also for future what ever code is merged we need to make sure the commit passes N times in a row (after merged) and if failed once within the last X runs then plan for a revert. Having this we can ensure the |
Thanks @prudhvigodithi
This is correct, giving where we are now - the false positives are possible and it will take some time before we gain any confidence on the potential offending changes here.
The way it works - some tests are marked as "flaky" in Gradle build and will be retried by Gradle
"No" to first and "Yes" to second - we want to find the offender
This is what @andrross diligently does now: the tests are moved to from "flaky" to "unstable" manually, on ad-hoc fashion. I would prefer to keep it this way for now - otherwise we would be moving tests between buckets without any fixes in sight. |
I think one thing we can do now to quickly baseline in a hybrid fashion is:
At least when we have a baseline now, we can decide that any new flaky test introduced after the baseline has to be either blocked or reverted. And in the meantime we can gather resources to fix those flaky tests if possible. |
WDYT please let me know @reta @peterzhuamazon @andrross @dblock @getsaurabh02 |
Thanks @prudhvigodithi
Why not to collect them from GA issues upon scheduling the job? (once a day)
👍
💯
I think those would be naturally phased out by opensearch-project/OpenSearch#14475 (comment) if we construct the list every time we schedule the job. |
Thanks @reta, also we can do that or as well just run the same query on Metrics Cluster to get the list of tests to exclude 👍. |
main
branch only) to proactively spor flaky testsmain
branch only) to proactively spot flaky tests
@prudhvigodithi Did we decide on any solution, metrics cluster seems like a cleaner approach. If yes, can we close this issue? |
Is your feature request related to a problem? Please describe
The flaky tests are painful problem with no solution in sight (besides just fixing all of them). At least the idea we massaged along with @andrross and @dblock is how we could spot them proactively and revert the changes that introduce more of them.
Describe the solution you'd like
Here is what we would like to try out:
main
branch, do 10gradle check
runs (for now), if that takes too long - consider splitting them togradle internalClusterTest
&&gradle check -x internalClusterTest
Describe alternatives you've considered
N/A
Additional context
Discussed also today at the 2.15 retro call
The text was updated successfully, but these errors were encountered: