Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

bugfixes #201

Merged
merged 5 commits into from
Mar 4, 2020
Merged

bugfixes #201

merged 5 commits into from
Mar 4, 2020

Conversation

Ladicek
Copy link
Contributor

@Ladicek Ladicek commented Mar 3, 2020

Resolves #197.
Resolves #198.
Resolves #199.

Ladicek added 5 commits March 3, 2020 13:50
Previously, the fallback strategies used a fixed interception context
for all fallback invocations. Specifically, it was the one from first
invocation of the guarded method. This means that all fallback
invocations were performed against the original instance and using
the original parameters.

That's obviously wrong. Each fallback needs to be performed against
the "current" instance with "current" parameters. At the same time,
we want to precompute as much as possible upfront, so that when
the fallback is invoked, we don't do much more than forwarding
the call.

This commit achieves that by generalizing the `FallbackFunction`
to no longer take just the causing `Throwable`, but a new abstraction
called `FallbackContext`. This `FallbackContext` contains the causing
`Throwable` and also the "current" `InvocationContext`. We also
extend the `InvocationContext` to support carrying around arbitrary
contextual data.

When we create the `InvocationContext`, we store the "current"
interception context into it. Later, when the `FallbackFunction`
needs the interception context, it is easy to look up.
All asynchronous fault tolerance strategies now use a single pattern
for handling asynchronous results:

    CompletableFuture<T> result = new CompletableFuture<>();
    completionStage.whenComplete((value, exception) -> {
        ...
        if (exception == null) {
          ...
          result.complete(value);
          ...
        } else {
          ...
          result.completeExceptionally(exception);
          ...
        }
        ...
    });
    return result;

This is low-level, but easy to understand and conserves exceptions.
Operators like `handle`, `thenApply` or `thenCompose` always wrap
exceptions into `CompletionException`, which would have to be checked
in other fault tolerance strategies, leading to more complex code.
No asynchronous strategy should ever offload tasks to an executor
on its own, this should be managed elsewhere. For the purpose of
unit tests of CompletionStageTimeout, which rely on this behavior,
we temporarily add a flag to interrupt the "current" thread,
but that should eventually be removed. In asynchronous setting,
we can't know upfront which thread to interrupt, so we instead
cancel the running task with interruption enabled.

In addition to that, this commit also fixes how thread pools
are disposed during shutdown. Previously, we only called
`shutdown` on each thread pool, which lets currently executing
tasks finish. This is bad for our current implementation
of `CompletionStageBulkhead`, which lets too many tasks in and
some of them immediately start waiting on a semaphore.
Instead of `shutdown`, we now call `shutdownNow`, which interrupts
running tasks. After that, we also `awaitTermination` of all
executors for 1 second, so that at least all interrupts
are delivered.

The thread pools shutdown was previously triggered upon
the `@BeforeDestroyed(ApplicationScoped.class)` event, but
the Weld embedded Arquillian container doesn't fire it.
Instead, we now use `@PreDestroyed`, which should be roughly
the same thing. As a bonus, this is symmetrical
to `@PostConstruct`, which is when the thread pools are set up.
@Ladicek Ladicek added this to the 4.1.1 milestone Mar 3, 2020
@Ladicek
Copy link
Contributor Author

Ladicek commented Mar 3, 2020

Also CC @rhusar @mnovak1

This is best reviewed separately, one commit after the other.

I tried to run the WildFly tests by EAP QE and there's a couple of failures:

[ERROR] Failures: 
[ERROR]   MultipleFaultToleranceModuleEarTest.testDeployTwoModuleEarLogsWarning:110 Log does not contain warning that Hystrix was already initialized by first module
[ERROR]   MultipleJarDeploymentTest.testDeploySecondJarLogsHystrixNoConfigChangeWarning:81 Log does not contain WARNING Log: WFLYMPFTEXT0002: Hystrix was already configured! Skipping configuration from deployment - but it should.
[ERROR]   UndeployDeployTest.testDeployTwoWarsLogsWarning:252
[ERROR]   UndeployDeployTest.testNonMPFTDeploymentDoesNotBlockRestartHystrix:225 1 expectation failed.
Response body doesn't match expectation.
Expected: a string containing "Hello from @Timeout method, context = foobar"
  Actual: Fallback Hello, context = foobar

[ERROR]   UndeployDeployTest.testUndeployDeployChangesHystrixConfiguration:145 1 expectation failed.
Response body doesn't match expectation.
Expected: a string containing "Hello from @Timeout method, context = foobar"
  Actual: Fallback Hello, context = foobar

Just looking at the names, it seems these tests verify behavior of the old implementation that is no longer present, but it would be great if you guys could verify. It's possible these are genuine bugs.

Also if you could run the MP FT TCK with WildFly with this changeset, that would be great as well. I need to look into updating the Quarkus extension with this, because I had to do some changes in the CDI extension here :-)

@Ladicek
Copy link
Contributor Author

Ladicek commented Mar 3, 2020

In general, running as many tests as we can with this would be beneficial :-)

@mnovak1
Copy link

mnovak1 commented Mar 3, 2020

Just looking at the names, it seems these tests verify behavior of the old implementation that is no longer present, but it would be great if you guys could verify. It's possible these are genuine bugs.

Correct, the other test failures can be ignored as they were relevant just for Hystrix. @tterem has updated branch of the tests for non-hystrix SM FT. @tterem could you link it here, please?

@tterem
Copy link

tterem commented Mar 3, 2020

Hi @Ladicek,

here is my branch with updated tests:

https://github.com/tterem/eap-microprofile-test-suite/tree/1428

Tests that were depending on hystrix are removed there. You can try to run your fix against it. Meanwhile I will run it as well.

@Ladicek
Copy link
Contributor Author

Ladicek commented Mar 3, 2020

Cool, thanks.

@tterem
Copy link

tterem commented Mar 3, 2020

I build your branch locally and I confirm that our MicroProfile testsuite is passing with these changes.

previousHalfOpenTime.addAndGet(now - halfOpenStart);
delegate.apply(target).whenComplete((value, error) -> {
if (error != null) {
metricsRecorder.circuitBreakerFailed();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe worth pulling out the three lines here to a separate method?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd prefer not. I tend to not create methods that can possibly only be called from a single place.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you have exactly the same lines in 145-147

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, true. That's in fact the same for all CompletionStage* strategies, and something I'd like to address in another way (later): the Invocation strategy would be, in the async flow, wrapped into an exception-handling strategy, which would catch exceptions thrown by the Invocation and propagate them to the CompletableFuture. No other CompletionStage* strategy would have to do that, anymore.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Filed #202

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

@Ladicek
Copy link
Contributor Author

Ladicek commented Mar 3, 2020

Very nice, thanks! I'm just doing the same, it took me a while :-)

@rhusar
Copy link
Contributor

rhusar commented Mar 3, 2020

@tterem Great, thanks for the update! I was testing the integration and everything is passing there too (TCK+integ):

[INFO] Tests run: 407, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 406.126 s - in TestSuite

@Ladicek
Copy link
Contributor Author

Ladicek commented Mar 3, 2020

Awesomesauce!

(I just ran the QE test suite locally and it's passing for me too.)

@rhusar
Copy link
Contributor

rhusar commented Mar 3, 2020

Are you sure you need the version bump? Nobody cares about internal/private API change.

int attempt,
RunningStopwatch stopwatch,
Throwable latestFailure) {
public CompletionStage<V> doRetry(InvocationContext<CompletionStage<V>> target, int attempt,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

was it the auto-formatter? :(

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like unintended and unnecessary change.

Copy link
Contributor

@rhusar rhusar Mar 3, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But n.b. given the time pressure its moot to me.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This might have actually been me :-) The one line per param format is so inefficient with screen space :-)

@rhusar
Copy link
Contributor

rhusar commented Mar 3, 2020

Moreover, @quarkusio will need this fixes anyway so the 4.0.x will be just not in use by anybody.

@Ladicek
Copy link
Contributor Author

Ladicek commented Mar 3, 2020

I'm not sure about the version bump, but I figured I'd bump a minor as it might affect integrations. Now thinking about it again, the internal API changes only required changing the FaultToleranceInterceptor, so they are fully contained and integrations don't need to know. I'll probably remove that version-bumping commit.

@Ladicek
Copy link
Contributor Author

Ladicek commented Mar 3, 2020

Removed the version-bumping commit.

Copy link
Contributor

@rhusar rhusar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FWIW LGTM.

InvocationContext invocationContext,
MetricsCollector collector,
InterceptionPoint point) throws Exception {
private <T> T syncFlow(FaultToleranceOperation operation, InvocationContext invocationContext,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is much less readable than one arg per line

@Ladicek Ladicek merged commit d0e706b into smallrye:master Mar 4, 2020
@Ladicek Ladicek deleted the bugfixes branch March 4, 2020 10:25
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
5 participants