
Failure handling and resilience patterns for the JVM


Karthik-Gupta/failsafe

 
 


Failsafe

Join the chat at https://gitter.im/jhalterman/failsafe

Introduction

Failsafe is a lightweight, zero-dependency library for handling failures in Java 8+, with a concise API for handling everyday use cases and the flexibility to handle everything else. It works by wrapping executable logic with one or more resilience policies, which can be combined and composed as needed. These policies include retries, circuit breakers, and fallbacks.

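It also provides features that allow you to integrate with various scenarios, including configurable schedulers, event listeners, execution context, strong typing, asynchronous API integration, CompletionStage integration, functional interface integration, execution tracking, and a Policy SPI.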

Setup

Add the latest Failsafe Maven dependency to your project.

Migrating from 1.x

Failsafe 2.0 has API and behavior changes from 1.x. See the CHANGES doc for more details.

Usage

Getting Started

To start, we'll create a RetryPolicy that defines which failures should be handled and when retries should be performed:

RetryPolicy<Object> retryPolicy = new RetryPolicy<>()
  .handle(ConnectException.class)
  .withDelay(Duration.ofSeconds(1))
  .withMaxRetries(3);

We can then execute a Runnable or Supplier with retries:

// Run with retries
Failsafe.with(retryPolicy).run(() -> connect());

// Get with retries
Connection connection = Failsafe.with(retryPolicy).get(() -> connect());

We can also execute a Runnable or Supplier asynchronously with retries:

// Run with retries asynchronously
CompletableFuture<Void> future = Failsafe.with(retryPolicy).runAsync(() -> connect());

// Get with retries asynchronously
CompletableFuture<Connection> future = Failsafe.with(retryPolicy).getAsync(() -> connect());

Composing Policies

Multiple policies can be arbitrarily composed to add additional layers of resilience or to handle different failures in different ways:

CircuitBreaker<Object> circuitBreaker = new CircuitBreaker<>();
Fallback<Object> fallback = Fallback.of(this::connectToBackup);

Failsafe.with(fallback, retryPolicy, circuitBreaker).get(this::connect);

Order matters when composing policies. See the Policy Composition section below for more details.

Failsafe Executor

Policy compositions can also be saved for later use via a FailsafeExecutor:

FailsafeExecutor<Object> executor = Failsafe.with(fallback, retryPolicy, circuitBreaker);
executor.run(this::connect);

Failure Policies

Failsafe uses policies to handle failures. By default, policies treat any Exception as a failure. But policies can also be configured to handle more specific failures or conditions:

policy
  .handle(ConnectException.class, SocketException.class)
  .handleIf(failure -> failure instanceof ConnectException);

They can also be configured to handle specific results or result conditions:

policy
  .handleResult(null)
  .handleResultIf(result -> result == null);  

Retries

Retry policies express when retries should be performed for an execution failure.

By default, a RetryPolicy will perform a maximum of 3 execution attempts. You can configure a max number of attempts or retries:

retryPolicy.withMaxAttempts(3);

And a delay between attempts:

retryPolicy.withDelay(Duration.ofSeconds(1));

You can add a delay that backs off exponentially:

retryPolicy.withBackoff(1, 30, ChronoUnit.SECONDS);

A random delay for some range:

retryPolicy.withDelay(1, 10, ChronoUnit.SECONDS);

Or a computed delay based on an execution. You can add a random jitter factor to a delay:

retryPolicy.withJitter(.1);

Or a time-based jitter:

retryPolicy.withJitter(Duration.ofMillis(100));

You can add a max retry duration:

retryPolicy.withMaxDuration(Duration.ofMinutes(5));

You can specify which results, failures or conditions to abort retries on:

retryPolicy
  .abortWhen(true)
  .abortOn(NoRouteToHostException.class)
  .abortIf(result -> result == true);

And of course you can arbitrarily combine any of these things into a single policy.
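To make the backoff and jitter options above more concrete, here is a minimal, self-contained sketch of how such delays could be computed. It is not Failsafe's implementation: the class BackoffSketch, its method names, and the doubling factor of 2 are assumptions made purely for illustration.

```java
import java.time.Duration;
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Hypothetical helper, not part of Failsafe.
final class BackoffSketch {

  /** Returns the delay before each retry: base, base*2, base*4, ... capped at max. */
  static List<Duration> backoffDelays(Duration base, Duration max, int retries) {
    List<Duration> delays = new ArrayList<>();
    Duration delay = base;
    for (int i = 0; i < retries; i++) {
      delays.add(delay);
      Duration doubled = delay.multipliedBy(2); // assumed backoff factor of 2
      delay = doubled.compareTo(max) > 0 ? max : doubled;
    }
    return delays;
  }

  /** Randomly shifts a delay by up to +/- (delay * factor); a factor of 0.1 means +/- 10%. */
  static Duration withJitter(Duration delay, double factor, Random random) {
    double offset = (random.nextDouble() * 2 - 1) * factor; // in [-factor, +factor)
    return Duration.ofMillis(Math.round(delay.toMillis() * (1 + offset)));
  }

  public static void main(String[] args) {
    Random random = new Random();
    // Mirrors withBackoff(1, 30, ChronoUnit.SECONDS) combined with withJitter(.1)
    for (Duration d : backoffDelays(Duration.ofSeconds(1), Duration.ofSeconds(30), 7)) {
      System.out.println(d + " -> with jitter " + withJitter(d, 0.1, random));
    }
  }
}
```

With a 1 second base and a 30 second cap, the un-jittered delays in this sketch grow as 1, 2, 4, 8 and 16 seconds, then stay at 30 seconds.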

Circuit Breakers

Circuit breakers allow you to create systems that fail fast by temporarily disabling execution as a way of preventing system overload. Creating a CircuitBreaker is straightforward:

CircuitBreaker<Object> breaker = new CircuitBreaker<>()
  .handle(ConnectException.class)
  .withFailureThreshold(3, 10)
  .withSuccessThreshold(5)
  .withDelay(Duration.ofMinutes(1));

When a configured threshold of execution failures occurs on a circuit breaker, the circuit is opened and further execution requests fail with CircuitBreakerOpenException. After a delay, the circuit is half-opened and trial executions are attempted to determine whether the circuit should be closed or opened again. If the trial executions meet a success threshold, the breaker is closed again and executions will proceed as normal.
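The closed, open and half-open transitions described above can be pictured as a small state machine. The following is a simplified sketch under stated assumptions, not Failsafe's CircuitBreaker: the class BreakerSketch is hypothetical, it only counts consecutive failures and successes, and the current time is passed in explicitly to keep the example deterministic.

```java
import java.time.Duration;
import java.time.Instant;

// Hypothetical, simplified breaker; not Failsafe's implementation.
final class BreakerSketch {
  enum State { CLOSED, OPEN, HALF_OPEN }

  private final int failureThreshold;
  private final int successThreshold;
  private final Duration delay;
  private State state = State.CLOSED;
  private int failures;
  private int successes;
  private Instant openedAt;

  BreakerSketch(int failureThreshold, int successThreshold, Duration delay) {
    this.failureThreshold = failureThreshold;
    this.successThreshold = successThreshold;
    this.delay = delay;
  }

  /** Returns true if an execution may proceed at the given time. */
  synchronized boolean allowsExecution(Instant now) {
    if (state == State.OPEN && !now.isBefore(openedAt.plus(delay))) {
      state = State.HALF_OPEN; // the delay has elapsed, so allow trial executions
      successes = 0;
    }
    return state != State.OPEN;
  }

  synchronized void recordSuccess() {
    failures = 0; // a success interrupts any run of consecutive failures
    if (state == State.HALF_OPEN && ++successes >= successThreshold)
      state = State.CLOSED; // enough trial executions succeeded
  }

  synchronized void recordFailure(Instant now) {
    if (state == State.OPEN)
      return; // already open, nothing to record
    // Any failure while half-open re-opens; otherwise open once the threshold is reached
    if (state == State.HALF_OPEN || ++failures >= failureThreshold) {
      state = State.OPEN;
      openedAt = now;
      failures = 0;
    }
  }

  synchronized State state() {
    return state;
  }

  public static void main(String[] args) {
    BreakerSketch breaker = new BreakerSketch(3, 2, Duration.ofMinutes(1));
    Instant start = Instant.now();
    for (int i = 0; i < 3; i++)
      breaker.recordFailure(start);
    System.out.println(breaker.state());
    System.out.println(breaker.allowsExecution(start.plus(Duration.ofMinutes(1))));
    System.out.println(breaker.state());
  }
}
```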

Circuit Breaker Configuration

Circuit breakers can be flexibly configured to express when the circuit should be opened or closed.

A circuit breaker can be configured to open when a successive number of executions have failed:

breaker.withFailureThreshold(5);

Or when, for example, the last 3 out of 5 executions have failed:

breaker.withFailureThreshold(3, 5);

After opening, a breaker will delay for 1 minute by default before attempting to close again, or you can configure a specific delay:

breaker.withDelay(Duration.ofSeconds(30));

The breaker can be configured to close again if a number of trial executions succeed, else it will re-open:

breaker.withSuccessThreshold(5);

The breaker can also be configured to close again if, for example, the last 3 out of 5 executions succeed, else it will re-open:

breaker.withSuccessThreshold(3, 5);

And the breaker can be configured to recognize executions that exceed a certain timeout as failures:

breaker.withTimeout(Duration.ofSeconds(10));
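One way to picture the "last 3 out of 5 executions" thresholds above is as a sliding window over recent outcomes. The sketch below is an illustration only: ThresholdWindowSketch is a hypothetical class, and it assumes the threshold counts as met as soon as enough failures are in the window, even before the window is full, which may differ from how Failsafe evaluates thresholds.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical sliding-window counter; not Failsafe's implementation.
final class ThresholdWindowSketch {
  private final int threshold;
  private final int windowSize;
  private final Deque<Boolean> outcomes = new ArrayDeque<>(); // true = failed execution

  ThresholdWindowSketch(int threshold, int windowSize) {
    this.threshold = threshold;
    this.windowSize = windowSize;
  }

  /** Records an outcome and returns true if the failure threshold is met within the window. */
  boolean record(boolean failed) {
    outcomes.addLast(failed);
    if (outcomes.size() > windowSize)
      outcomes.removeFirst(); // forget the oldest outcome
    long failures = outcomes.stream().filter(f -> f).count();
    return failures >= threshold;
  }

  public static void main(String[] args) {
    // Analogous in spirit to breaker.withFailureThreshold(3, 5)
    ThresholdWindowSketch window = new ThresholdWindowSketch(3, 5);
    boolean[] results = {true, false, true, false, true};
    for (boolean failed : results)
      System.out.println("failed=" + failed + " thresholdMet=" + window.record(failed));
  }
}
```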

Circuit Breaker Metrics

CircuitBreaker can provide metrics regarding the number of recorded successes or failures in the current state.

Best Practices

A circuit breaker can and should be shared across code that accesses inter-dependent system components that fail together. This ensures that if the circuit is opened, executions against one component that rely on another component will not be allowed until the circuit is closed again. For example, if multiple connections or requests are made to the same external server, typically they should all go through the same circuit breaker.

Standalone Usage

A CircuitBreaker can also be manually operated in a standalone way:

breaker.open();
breaker.halfOpen();
breaker.close();

if (breaker.allowsExecution()) {
  try {
    breaker.preExecute();
    doSomething();
    breaker.recordSuccess();
  } catch (Exception e) {
    breaker.recordFailure(e);
  }
}

Fallbacks

Fallbacks allow you to provide an alternative result for a failed execution. They can also be used to suppress exceptions and provide a default result:

Fallback<Object> fallback = Fallback.of(null);

Throw a custom exception:

Fallback<Object> fallback = Fallback.of(failure -> { throw new CustomException(failure); });

Or compute an alternative result such as from a backup resource:

Fallback<Object> fallback = Fallback.of(this::connectToBackup);

For computations that block, a Fallback can be configured to run asynchronously:

Fallback<Object> fallback = Fallback.ofAsync(this::blockingCall);

Policy Composition

Policies can be composed in any way desired, including multiple policies of the same type. Policies handle execution results in reverse order, similar to the way that function composition works. For example, consider:

Failsafe.with(fallback, retryPolicy, circuitBreaker).get(supplier);

This results in the following internal composition when executing the supplier and handling its result:

Fallback(RetryPolicy(CircuitBreaker(Supplier)))

This means the CircuitBreaker is first to evaluate the Supplier's result, then the RetryPolicy, then the Fallback. Each policy makes its own determination as to whether the result represents a failure. This allows different policies to be used for handling different types of failures.
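The reverse ordering can be demonstrated with plain function composition. The sketch below does not use Failsafe at all: CompositionSketch and its "policies" are hypothetical wrappers around a Supplier that simply log their name when they see the result, which is enough to show that the innermost wrapper handles the result first.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;
import java.util.function.UnaryOperator;

// Hypothetical illustration of composition order; not Failsafe's implementation.
final class CompositionSketch {

  /** Creates a stand-in "policy" that wraps a supplier and logs when it sees the result. */
  static UnaryOperator<Supplier<String>> policy(String name, List<String> log) {
    return inner -> () -> {
      String result = inner.get(); // delegate to the wrapped policy or supplier first
      log.add(name);               // then handle the result on the way back out
      return result;
    };
  }

  /** Composes policies so that the first one listed is outermost. */
  @SafeVarargs
  static Supplier<String> compose(Supplier<String> supplier,
      UnaryOperator<Supplier<String>>... policies) {
    Supplier<String> composed = supplier;
    for (int i = policies.length - 1; i >= 0; i--)
      composed = policies[i].apply(composed); // wrap from the innermost policy outward
    return composed;
  }

  /** Runs a composed execution and returns the order in which the policies saw the result. */
  static List<String> evaluationOrder() {
    List<String> log = new ArrayList<>();
    compose(() -> "result",
        policy("Fallback", log),
        policy("RetryPolicy", log),
        policy("CircuitBreaker", log)).get();
    return log;
  }

  public static void main(String[] args) {
    System.out.println(evaluationOrder()); // prints [CircuitBreaker, RetryPolicy, Fallback]
  }
}
```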

Typical Composition

A typical Failsafe configuration that uses multiple policies will place a Fallback as the outer-most policy, followed by a RetryPolicy, and a CircuitBreaker as the inner-most policy:

Failsafe.with(fallback, retryPolicy, circuitBreaker)

That said, it really depends on how the policies are being used, and different compositions make sense for different use cases.

Additional Features

Configurable Schedulers

By default, Failsafe uses the ForkJoinPool's common pool to perform async executions, but you can also configure a specific ScheduledExecutorService, custom Scheduler, or ExecutorService to use:

Failsafe.with(policy).with(scheduler).getAsync(this::connect);

Event Listeners

Failsafe supports event listeners, both in the top-level Failsafe API and in the different Policy implementations.

At the top level, it can notify you when an execution completes for all policies:

Failsafe.with(retryPolicy, circuitBreaker)
  .onComplete(e -> {
    if (e.getResult() != null)
      log.info("Connected to {}", e.getResult());
    else if (e.getFailure() != null)
      log.error("Failed to create connection", e.getFailure());
  })
  .get(this::connect);

It can notify you when an execution completes successfully for all policies:

Failsafe.with(retryPolicy, circuitBreaker)
  .onSuccess(e -> log.info("Connected to {}", e.getResult()))
  .get(this::connect);

Or when an execution fails for any policy:

Failsafe.with(retryPolicy, circuitBreaker)
  .onFailure(e -> log.error("Failed to create connection", e.getFailure()))
  .get(this::connect);

At the policy level, it can notify you when an execution succeeds or fails for a particular policy:

policy
  .onSuccess(e -> log.info("Connected to {}", e.getResult()))
  .onFailure(e -> log.error("Failed to create connection", e.getFailure()));

When an execution attempt fails and before a retry is performed for a RetryPolicy:

retryPolicy
  .onFailedAttempt(e -> log.error("Connection attempt failed", e.getLastFailure()))
  .onRetry(e -> log.warn("Failure #{}. Retrying.", e.getAttemptCount()));

Or when an execution fails and the max retries are exceeded for a RetryPolicy:

retryPolicy.onRetriesExceeded(e -> log.warn("Failed to connect. Max retries exceeded."));

For CircuitBreakers, Failsafe can notify you when the state changes:

circuitBreaker
  .onClose(() -> log.info("The circuit breaker was closed"))
  .onOpen(() -> log.info("The circuit breaker was opened"))
  .onHalfOpen(() -> log.info("The circuit breaker was half-opened"));

Execution Context

Failsafe can provide an ExecutionContext containing execution-related information, such as the number of execution attempts as well as start and elapsed times:

Failsafe.with(retryPolicy).run(ctx -> {
  log.debug("Connection attempt #{}", ctx.getAttemptCount());
  connect();
});

Strong typing

Failsafe Policies are typed based on the expected result. For generic policies that are used for various executions, the result type may just be Object:

RetryPolicy<Object> retryPolicy = new RetryPolicy<>();

But for other policies we may declare a more specific result type:

RetryPolicy<HttpResponse> retryPolicy = new RetryPolicy<HttpResponse>()
  .handleResultIf(response -> response.getStatusCode() == 500)
  .onFailedAttempt(e -> log.warn("Failed attempt: {}", e.getLastResult().getStatusCode()));

This allows Failsafe to ensure that the same result type used for the policy is returned by the execution:

HttpResponse response = Failsafe.with(retryPolicy)
  .onSuccess(e -> log.info("Success: {}", e.getResult().getStatusCode()))  
  .get(this::sendHttpRequest);

Asynchronous API Integration

Failsafe can be integrated with asynchronous code that reports completion via callbacks. The runAsyncExecution, getAsyncExecution and futureAsyncExecution methods provide an AsyncExecution reference that can be used to manually schedule retries or complete the execution from inside asynchronous callbacks:

Failsafe.with(retryPolicy)
  .getAsyncExecution(execution -> service.connect().whenComplete((result, failure) -> {
    if (execution.complete(result, failure))
      log.info("Connected");
    else if (!execution.retry())
      log.error("Connection attempts failed", failure);
  }));

Failsafe can also perform asynchronous executions and retries on 3rd party schedulers via the Scheduler interface. See the Vert.x example for a more detailed implementation.

CompletionStage Integration

Failsafe can accept a CompletionStage and return a new CompletableFuture with failure handling built-in:

Failsafe.with(retryPolicy)
  .getStageAsync(this::connectAsync)
  .thenApplyAsync(value -> value + "bar")
  .thenAccept(System.out::println);
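To give a rough idea of what wrapping a CompletionStage with retries involves, here is a self-contained sketch built only on java.util.concurrent. It is not how Failsafe implements getStageAsync: StageRetrySketch, getStageWithRetries and flakyConnect are hypothetical names, and the sketch retries immediately with no delay, scheduler or policy configuration.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionStage;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

// Hypothetical illustration; not Failsafe's implementation.
final class StageRetrySketch {

  /** Invokes the supplier and re-invokes it up to maxRetries times if a stage fails. */
  static <T> CompletableFuture<T> getStageWithRetries(
      Supplier<CompletionStage<T>> supplier, int maxRetries) {
    CompletableFuture<T> result = new CompletableFuture<>();
    attempt(supplier, maxRetries, result);
    return result;
  }

  private static <T> void attempt(
      Supplier<CompletionStage<T>> supplier, int retriesLeft, CompletableFuture<T> result) {
    supplier.get().whenComplete((value, failure) -> {
      if (failure == null)
        result.complete(value);                    // success: complete the outer future
      else if (retriesLeft > 0)
        attempt(supplier, retriesLeft - 1, result); // failure: try again
      else
        result.completeExceptionally(failure);      // retries exhausted: propagate the failure
    });
  }

  /** Stand-in async operation that fails on its first two invocations. */
  static CompletionStage<String> flakyConnect(AtomicInteger calls) {
    CompletableFuture<String> stage = new CompletableFuture<>();
    if (calls.incrementAndGet() < 3)
      stage.completeExceptionally(new IllegalStateException("connection refused"));
    else
      stage.complete("connected");
    return stage;
  }

  public static void main(String[] args) {
    AtomicInteger calls = new AtomicInteger();
    String value = getStageWithRetries(() -> flakyConnect(calls), 3).join();
    System.out.println(value + " after " + calls.get() + " attempts");
  }
}
```

The returned CompletableFuture can be chained with thenApplyAsync or thenAccept in the same way as the Failsafe example above.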

Functional Interface Integration

Failsafe can be used to create resilient functional interfaces:

Function<String, Connection> connect = address -> Failsafe.with(retryPolicy).get(() -> connect(address));

We can wrap Stream operations:

Stream.of("foo").map(value -> Failsafe.with(retryPolicy).get(() -> value + "bar"));

Or individual CompletableFuture stages:

CompletableFuture.supplyAsync(() -> Failsafe.with(retryPolicy).get(() -> "foo"))
  .thenApplyAsync(value -> Failsafe.with(retryPolicy).get(() -> value + "bar"));
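The idea of a resilient functional interface can also be shown without the library. The sketch below is a hand-rolled stand-in, assuming a hypothetical helper named withRetries that retries immediately on any RuntimeException; in real code the wrapped call would be Failsafe.with(retryPolicy).get(...) as in the examples above.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

// Hypothetical illustration; not part of Failsafe.
final class ResilientFunctionSketch {

  /** Wraps a function so that it is re-invoked up to maxRetries times when it throws. */
  static <T, R> Function<T, R> withRetries(Function<T, R> fn, int maxRetries) {
    if (maxRetries < 0)
      throw new IllegalArgumentException("maxRetries must be >= 0");
    return input -> {
      RuntimeException last = null;
      for (int attempt = 0; attempt <= maxRetries; attempt++) {
        try {
          return fn.apply(input);
        } catch (RuntimeException e) {
          last = e; // remember the failure and try again
        }
      }
      throw last; // every attempt failed, so surface the last failure
    };
  }

  public static void main(String[] args) {
    AtomicInteger calls = new AtomicInteger();
    // Stand-in function that fails twice before succeeding
    Function<String, String> flaky = value -> {
      if (calls.incrementAndGet() < 3)
        throw new IllegalStateException("temporary failure");
      return value + "bar";
    };
    System.out.println(withRetries(flaky, 3).apply("foo"));
  }
}
```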

Execution Tracking

In addition to automatically performing retries, Failsafe can be used to track executions for you, allowing you to manually retry as needed:

Execution execution = new Execution(retryPolicy);
while (!execution.isComplete()) {
  try {
    doSomething();
    execution.complete();
  } catch (ConnectException e) {
    execution.recordFailure(e);
  }
}

Execution tracking is also useful for integrating with APIs that have their own retry mechanism:

Execution execution = new Execution(retryPolicy);

// On failure
if (execution.canRetryOn(someFailure))
  service.scheduleRetry(execution.getWaitTime().toNanos(), TimeUnit.NANOSECONDS);

See the RxJava example for a more detailed implementation.

Policy SPI

Failsafe provides an SPI that allows you to implement your own Policy and plug it into Failsafe. Each Policy implementation must return a PolicyExecutor which is responsible for performing synchronous or asynchronous execution, handling pre-execution requests, or handling post-execution results. The existing PolicyExecutor implementations are a good reference for creating additional implementations.

Additional Resources

Library and API Integration

For library and public API developers, Failsafe integrates nicely into existing APIs, allowing your users to configure retry policies for different operations. One integration approach is to subclass the RetryPolicy class and expose that as part of your API while the rest of Failsafe remains internal. Another approach is to use something like the Maven shade plugin to rename and relocate Failsafe classes into your project's package structure as desired.

Contribute

Failsafe is a volunteer effort. If you use it and you like it, let us know, and also help by spreading the word!

License

Copyright 2015-2019 Jonathan Halterman and friends. Released under the Apache 2.0 license.
