bugfixes #201

Ladicek · 2020-03-03T12:52:15Z

Resolves #197.
Resolves #198.
Resolves #199.

Previously, the fallback strategies used a fixed interception context for all fallback invocations. Specifically, it was the one from first invocation of the guarded method. This means that all fallback invocations were performed against the original instance and using the original parameters. That's obviously wrong. Each fallback needs to be performed against the "current" instance with "current" parameters. At the same time, we want to precompute as much as possible upfront, so that when the fallback is invoked, we don't do much more than forwarding the call. This commit achieves that by generalizing the `FallbackFunction` to no longer take just the causing `Throwable`, but a new abstraction called `FallbackContext`. This `FallbackContext` contains the causing `Throwable` and also the "current" `InvocationContext`. We also extend the `InvocationContext` to support carrying around arbitrary contextual data. When we create the `InvocationContext`, we store the "current" interception context into it. Later, when the `FallbackFunction` needs the interception context, it is easy to look up.

All asynchronous fault tolerance strategies now use a single pattern for handling asynchronous results: CompletableFuture<T> result = new CompletableFuture<>(); completionStage.whenComplete((value, exception) -> { ... if (exception == null) { ... result.complete(value); ... } else { ... result.completeExceptionally(exception); ... } ... }); return result; This is low-level, but easy to understand and conserves exceptions. Operators like `handle`, `thenApply` or `thenCompose` always wrap exceptions into `CompletionException`, which would have to be checked in other fault tolerance strategies, leading to more complex code.

No asynchronous strategy should ever offload tasks to an executor on its own, this should be managed elsewhere. For the purpose of unit tests of CompletionStageTimeout, which rely on this behavior, we temporarily add a flag to interrupt the "current" thread, but that should eventually be removed. In asynchronous setting, we can't know upfront which thread to interrupt, so we instead cancel the running task with interruption enabled. In addition to that, this commit also fixes how thread pools are disposed during shutdown. Previously, we only called `shutdown` on each thread pool, which lets currently executing tasks finish. This is bad for our current implementation of `CompletionStageBulkhead`, which lets too many tasks in and some of them immediately start waiting on a semaphore. Instead of `shutdown`, we now call `shutdownNow`, which interrupts running tasks. After that, we also `awaitTermination` of all executors for 1 second, so that at least all interrupts are delivered. The thread pools shutdown was previously triggered upon the `@BeforeDestroyed(ApplicationScoped.class)` event, but the Weld embedded Arquillian container doesn't fire it. Instead, we now use `@PreDestroyed`, which should be roughly the same thing. As a bonus, this is symmetrical to `@PostConstruct`, which is when the thread pools are set up.

Ladicek · 2020-03-03T13:03:40Z

Also CC @rhusar @mnovak1

This is best reviewed separately, one commit after the other.

I tried to run the WildFly tests by EAP QE and there's a couple of failures:

[ERROR] Failures: 
[ERROR]   MultipleFaultToleranceModuleEarTest.testDeployTwoModuleEarLogsWarning:110 Log does not contain warning that Hystrix was already initialized by first module
[ERROR]   MultipleJarDeploymentTest.testDeploySecondJarLogsHystrixNoConfigChangeWarning:81 Log does not contain WARNING Log: WFLYMPFTEXT0002: Hystrix was already configured! Skipping configuration from deployment - but it should.
[ERROR]   UndeployDeployTest.testDeployTwoWarsLogsWarning:252
[ERROR]   UndeployDeployTest.testNonMPFTDeploymentDoesNotBlockRestartHystrix:225 1 expectation failed.
Response body doesn't match expectation.
Expected: a string containing "Hello from @Timeout method, context = foobar"
  Actual: Fallback Hello, context = foobar

[ERROR]   UndeployDeployTest.testUndeployDeployChangesHystrixConfiguration:145 1 expectation failed.
Response body doesn't match expectation.
Expected: a string containing "Hello from @Timeout method, context = foobar"
  Actual: Fallback Hello, context = foobar

Just looking at the names, it seems these tests verify behavior of the old implementation that is no longer present, but it would be great if you guys could verify. It's possible these are genuine bugs.

Also if you could run the MP FT TCK with WildFly with this changeset, that would be great as well. I need to look into updating the Quarkus extension with this, because I had to do some changes in the CDI extension here :-)

Ladicek · 2020-03-03T13:07:52Z

In general, running as many tests as we can with this would be beneficial :-)

mnovak1 · 2020-03-03T13:18:34Z

Just looking at the names, it seems these tests verify behavior of the old implementation that is no longer present, but it would be great if you guys could verify. It's possible these are genuine bugs.

Correct, the other test failures can be ignored as they were relevant just for Hystrix. @tterem has updated branch of the tests for non-hystrix SM FT. @tterem could you link it here, please?

tterem · 2020-03-03T13:26:11Z

Hi @Ladicek,

here is my branch with updated tests:

https://github.com/tterem/eap-microprofile-test-suite/tree/1428

Tests that were depending on hystrix are removed there. You can try to run your fix against it. Meanwhile I will run it as well.

Ladicek · 2020-03-03T13:29:16Z

Cool, thanks.

tterem · 2020-03-03T14:56:00Z

I build your branch locally and I confirm that our MicroProfile testsuite is passing with these changes.

michalszynkiewicz · 2020-03-03T14:56:49Z

...main/java/io/smallrye/faulttolerance/core/circuit/breaker/CompletionStageCircuitBreaker.java

-                previousHalfOpenTime.addAndGet(now - halfOpenStart);
+            delegate.apply(target).whenComplete((value, error) -> {
+                if (error != null) {
+                    metricsRecorder.circuitBreakerFailed();


Maybe worth pulling out the three lines here to a separate method?

I'd prefer not. I tend to not create methods that can possibly only be called from a single place.

you have exactly the same lines in 145-147

Ah, true. That's in fact the same for all CompletionStage* strategies, and something I'd like to address in another way (later): the Invocation strategy would be, in the async flow, wrapped into an exception-handling strategy, which would catch exceptions thrown by the Invocation and propagate them to the CompletableFuture. No other CompletionStage* strategy would have to do that, anymore.

Ladicek · 2020-03-03T14:56:57Z

Very nice, thanks! I'm just doing the same, it took me a while :-)

rhusar · 2020-03-03T15:06:15Z

@tterem Great, thanks for the update! I was testing the integration and everything is passing there too (TCK+integ):

[INFO] Tests run: 407, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 406.126 s - in TestSuite

Ladicek · 2020-03-03T15:08:03Z

Awesomesauce!

(I just ran the QE test suite locally and it's passing for me too.)

rhusar · 2020-03-03T15:10:10Z

Are you sure you need the version bump? Nobody cares about internal/private API change.

michalszynkiewicz · 2020-03-03T15:11:00Z

...mentation/core/src/main/java/io/smallrye/faulttolerance/core/retry/CompletionStageRetry.java

-            int attempt,
-            RunningStopwatch stopwatch,
-            Throwable latestFailure) {
+    public CompletionStage<V> doRetry(InvocationContext<CompletionStage<V>> target, int attempt,


was it the auto-formatter? :(

Looks like unintended and unnecessary change.

But n.b. given the time pressure its moot to me.

This might have actually been me :-) The one line per param format is so inefficient with screen space :-)

rhusar · 2020-03-03T15:11:37Z

Moreover, @quarkusio will need this fixes anyway so the 4.0.x will be just not in use by anybody.

Ladicek · 2020-03-03T15:15:27Z

I'm not sure about the version bump, but I figured I'd bump a minor as it might affect integrations. Now thinking about it again, the internal API changes only required changing the FaultToleranceInterceptor, so they are fully contained and integrations don't need to know. I'll probably remove that version-bumping commit.

Ladicek · 2020-03-03T16:02:05Z

Removed the version-bumping commit.

rhusar

FWIW LGTM.

michalszynkiewicz · 2020-03-04T09:45:08Z

...tion/fault-tolerance/src/main/java/io/smallrye/faulttolerance/FaultToleranceInterceptor.java

-            InvocationContext invocationContext,
-            MetricsCollector collector,
-            InterceptionPoint point) throws Exception {
+    private <T> T syncFlow(FaultToleranceOperation operation, InvocationContext invocationContext,


this is much less readable than one arg per line

Ladicek added 5 commits March 3, 2020 13:50

clarify API stability note in the "core" module

fe39669

fix computing circuit breaker delay

918dccf

Ladicek added this to the 4.1.1 milestone Mar 3, 2020

Ladicek requested a review from michalszynkiewicz March 3, 2020 12:52

michalszynkiewicz reviewed Mar 3, 2020

View reviewed changes

Ladicek force-pushed the bugfixes branch from 263469c to 9c0d7e2 Compare March 3, 2020 16:01

rhusar approved these changes Mar 3, 2020

View reviewed changes

tterem approved these changes Mar 3, 2020

View reviewed changes

michalszynkiewicz reviewed Mar 4, 2020

View reviewed changes

michalszynkiewicz approved these changes Mar 4, 2020

View reviewed changes

Ladicek merged commit d0e706b into smallrye:master Mar 4, 2020

Ladicek deleted the bugfixes branch March 4, 2020 10:25

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

bugfixes #201

bugfixes #201

Ladicek commented Mar 3, 2020

Ladicek commented Mar 3, 2020

Ladicek commented Mar 3, 2020

mnovak1 commented Mar 3, 2020

tterem commented Mar 3, 2020 •

edited

Loading

Ladicek commented Mar 3, 2020

tterem commented Mar 3, 2020

michalszynkiewicz Mar 3, 2020

Ladicek Mar 3, 2020

michalszynkiewicz Mar 3, 2020

Ladicek Mar 4, 2020

Ladicek Mar 4, 2020

michalszynkiewicz Mar 4, 2020

Ladicek commented Mar 3, 2020

rhusar commented Mar 3, 2020 •

edited

Loading

Ladicek commented Mar 3, 2020

rhusar commented Mar 3, 2020

michalszynkiewicz Mar 3, 2020

rhusar Mar 3, 2020

rhusar Mar 3, 2020 •

edited

Loading

Ladicek Mar 3, 2020

rhusar commented Mar 3, 2020

Ladicek commented Mar 3, 2020

Ladicek commented Mar 3, 2020

rhusar left a comment

michalszynkiewicz Mar 4, 2020

bugfixes #201

bugfixes #201

Conversation

Ladicek commented Mar 3, 2020

Ladicek commented Mar 3, 2020

Ladicek commented Mar 3, 2020

mnovak1 commented Mar 3, 2020

tterem commented Mar 3, 2020 • edited Loading

Ladicek commented Mar 3, 2020

tterem commented Mar 3, 2020

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Ladicek commented Mar 3, 2020

rhusar commented Mar 3, 2020 • edited Loading

Ladicek commented Mar 3, 2020

rhusar commented Mar 3, 2020

Choose a reason for hiding this comment

Choose a reason for hiding this comment

rhusar Mar 3, 2020 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

rhusar commented Mar 3, 2020

Ladicek commented Mar 3, 2020

Ladicek commented Mar 3, 2020

rhusar left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

tterem commented Mar 3, 2020 •

edited

Loading

rhusar commented Mar 3, 2020 •

edited

Loading

rhusar Mar 3, 2020 •

edited

Loading