Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Retry on configurable exception #6991

Open
wants to merge 5 commits into
base: main
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Original file line number Diff line number Diff line change
@@ -1,2 +1,7 @@
Comparing source compatibility of opentelemetry-sdk-common-1.46.0-SNAPSHOT.jar against opentelemetry-sdk-common-1.45.0.jar
No changes.
**** MODIFIED CLASS: PUBLIC ABSTRACT io.opentelemetry.sdk.common.export.RetryPolicy (not serializable)
=== CLASS FILE FORMAT VERSION: 52.0 <- 52.0
+++* NEW METHOD: PUBLIC(+) ABSTRACT(+) java.util.function.Predicate<java.io.IOException> getRetryExceptionPredicate()
**** MODIFIED CLASS: PUBLIC ABSTRACT STATIC io.opentelemetry.sdk.common.export.RetryPolicy$RetryPolicyBuilder (not serializable)
=== CLASS FILE FORMAT VERSION: 52.0 <- 52.0
+++* NEW METHOD: PUBLIC(+) ABSTRACT(+) io.opentelemetry.sdk.common.export.RetryPolicy$RetryPolicyBuilder setRetryExceptionPredicate(java.util.function.Predicate<java.io.IOException>)
Original file line number Diff line number Diff line change
Expand Up @@ -1000,7 +1000,7 @@ void stringRepresentation() throws IOException, CertificateEncodingException {
+ ", "
+ "compressorEncoding=gzip, "
+ "headers=Headers\\{.*foo=OBFUSCATED.*\\}, "
+ "retryPolicy=RetryPolicy\\{maxAttempts=2, initialBackoff=PT0\\.05S, maxBackoff=PT3S, backoffMultiplier=1\\.3\\}"
+ "retryPolicy=RetryPolicy\\{maxAttempts=2, initialBackoff=PT0\\.05S, maxBackoff=PT3S, backoffMultiplier=1\\.3, retryExceptionPredicate=io.opentelemetry.sdk.common.export.RetryPolicy.*\\}"
+ ".*" // Maybe additional grpcChannel field, signal specific fields
+ "\\}");
} finally {
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -904,7 +904,7 @@ void stringRepresentation() throws IOException, CertificateEncodingException {
+ ", "
+ "exportAsJson=false, "
+ "headers=Headers\\{.*foo=OBFUSCATED.*\\}, "
+ "retryPolicy=RetryPolicy\\{maxAttempts=2, initialBackoff=PT0\\.05S, maxBackoff=PT3S, backoffMultiplier=1\\.3\\}"
+ "retryPolicy=RetryPolicy\\{maxAttempts=2, initialBackoff=PT0\\.05S, maxBackoff=PT3S, backoffMultiplier=1\\.3, retryExceptionPredicate=io.opentelemetry.sdk.common.export.RetryPolicy.*\\}"
+ ".*" // Maybe additional signal specific fields
+ "\\}");
} finally {
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -42,7 +42,9 @@ public RetryInterceptor(RetryPolicy retryPolicy, Function<Response, Boolean> isR
this(
retryPolicy,
isRetryable,
RetryInterceptor::isRetryableException,
e ->
retryPolicy.getRetryExceptionPredicate().test(e)
|| RetryInterceptor.isRetryableException(e),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The OR here is interesting. It means a user can choose to expand the definition of what is retryable but not reduce it. I wonder if there are any cases when you would not want to retry when the default would retry. 🤔

Copy link
Author

@YuriyHolinko YuriyHolinko Jan 7, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

a user can choose to expand the definition of what is retryable but not reduce it

it's exactly the idea

I wonder if there are any cases when you would not want to retry when the default would retry

I would say no 🤔

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would say no 🤔

I think we might..

Suppose we want expose options to give full control to the user over what's retryable (as I alluded to in the end of this comment), we'd probably want to do something like:

  • Expose a single configurable predicate option of the form setRetryPredicate(Predicate<Throwable>)
  • Funnel all failed requests through this predicate, whether they resolved a response with a status code or ended with an exception
  • This means we'd need to translate requests with a non-200 status code to an equivalent exception to pass to the predicate
  • If the user doesn't define their own predicate, default to one that retriable when status is retryable (429, 502, 503, 504) or when the exception is retryable (like one of the SocketTimeoutException we've discussed).

In this case, its possible that a user doesn't want to retry on a particular response status code like 502, even when the default behavior is to retry on it.

TimeUnit.NANOSECONDS::sleep,
bound -> ThreadLocalRandom.current().nextLong(bound));
}
Expand Down Expand Up @@ -144,6 +146,11 @@ private static String responseStringRepresentation(Response response) {
return joiner.toString();
}

// Visible for testing
boolean shouldRetryOnException(IOException e) {
return isRetryableException.apply(e);
}

// Visible for testing
static boolean isRetryableException(IOException e) {
if (e instanceof SocketTimeoutException) {
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -24,6 +24,7 @@
import io.opentelemetry.sdk.common.export.RetryPolicy;
import java.io.IOException;
import java.net.ConnectException;
import java.net.HttpRetryException;
import java.net.ServerSocket;
import java.net.SocketTimeoutException;
import java.time.Duration;
Expand Down Expand Up @@ -61,7 +62,9 @@ void setUp() {
new Function<IOException, Boolean>() {
@Override
public Boolean apply(IOException exception) {
return RetryInterceptor.isRetryableException(exception);
return RetryInterceptor.isRetryableException(exception)
|| (exception instanceof HttpRetryException
&& exception.getMessage().contains("timeout retry"));
}
});
retrier =
Expand Down Expand Up @@ -214,20 +217,51 @@ private OkHttpClient connectTimeoutClient() {
void isRetryableException() {
// Should retry on connection timeouts, where error message is "Connect timed out" or "connect
// timed out"
assertThat(
RetryInterceptor.isRetryableException(new SocketTimeoutException("Connect timed out")))
assertThat(retrier.shouldRetryOnException(new SocketTimeoutException("Connect timed out")))
.isTrue();
assertThat(
RetryInterceptor.isRetryableException(new SocketTimeoutException("connect timed out")))
assertThat(retrier.shouldRetryOnException(new SocketTimeoutException("connect timed out")))
.isTrue();
// Shouldn't retry on read timeouts, where error message is "Read timed out"
assertThat(RetryInterceptor.isRetryableException(new SocketTimeoutException("Read timed out")))
assertThat(retrier.shouldRetryOnException(new SocketTimeoutException("Read timed out")))
.isFalse();
// Shouldn't retry on write timeouts, where error message is "timeout", or other IOException
assertThat(RetryInterceptor.isRetryableException(new SocketTimeoutException("timeout")))
assertThat(retrier.shouldRetryOnException(new SocketTimeoutException("timeout"))).isFalse();
assertThat(retrier.shouldRetryOnException(new SocketTimeoutException())).isTrue();
assertThat(retrier.shouldRetryOnException(new IOException("error"))).isFalse();

// Testing configured predicate
assertThat(retrier.shouldRetryOnException(new HttpRetryException("error", 400))).isFalse();
assertThat(retrier.shouldRetryOnException(new HttpRetryException("timeout retry", 400)))
.isTrue();
}

@Test
void isRetryableExceptionDefaultBehaviour() {
RetryInterceptor retryInterceptor =
new RetryInterceptor(RetryPolicy.getDefault(), OkHttpHttpSender::isRetryable);
assertThat(
retryInterceptor.shouldRetryOnException(
new SocketTimeoutException("Connect timed out")))
.isTrue();
assertThat(retryInterceptor.shouldRetryOnException(new IOException("Connect timed out")))
.isFalse();
assertThat(RetryInterceptor.isRetryableException(new SocketTimeoutException())).isTrue();
assertThat(RetryInterceptor.isRetryableException(new IOException("error"))).isFalse();
}

@Test
void isRetryableExceptionCustomRetryPredicate() {
RetryInterceptor retryInterceptor =
new RetryInterceptor(
RetryPolicy.builder()
.setRetryExceptionPredicate((IOException e) -> e.getMessage().equals("retry"))
.build(),
OkHttpHttpSender::isRetryable);

assertThat(retryInterceptor.shouldRetryOnException(new IOException("some message"))).isFalse();
assertThat(retryInterceptor.shouldRetryOnException(new IOException("retry"))).isTrue();
assertThat(
retryInterceptor.shouldRetryOnException(
new SocketTimeoutException("Connect timed out")))
.isTrue();
}

private Response sendRequest() throws IOException {
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -8,7 +8,9 @@
import static io.opentelemetry.api.internal.Utils.checkArgument;

import com.google.auto.value.AutoValue;
import java.io.IOException;
import java.time.Duration;
import java.util.function.Predicate;

/**
* Configuration for exporter exponential retry policy.
Expand Down Expand Up @@ -43,7 +45,8 @@ public static RetryPolicyBuilder builder() {
.setMaxAttempts(DEFAULT_MAX_ATTEMPTS)
.setInitialBackoff(Duration.ofSeconds(DEFAULT_INITIAL_BACKOFF_SECONDS))
.setMaxBackoff(Duration.ofSeconds(DEFAULT_MAX_BACKOFF_SECONDS))
.setBackoffMultiplier(DEFAULT_BACKOFF_MULTIPLIER);
.setBackoffMultiplier(DEFAULT_BACKOFF_MULTIPLIER)
.setRetryExceptionPredicate((e) -> false);
}

/**
Expand All @@ -66,6 +69,9 @@ public static RetryPolicyBuilder builder() {
/** Returns the backoff multiplier. */
public abstract double getBackoffMultiplier();

/** Returns the predicate if exception is retryable. */
public abstract Predicate<IOException> getRetryExceptionPredicate();

/** Builder for {@link RetryPolicy}. */
@AutoValue.Builder
public abstract static class RetryPolicyBuilder {
Expand Down Expand Up @@ -96,6 +102,13 @@ public abstract static class RetryPolicyBuilder {
*/
public abstract RetryPolicyBuilder setBackoffMultiplier(double backoffMultiplier);

/**
* Set the predicate to determine if retry should happen based on exception. No retry by
* default.
*/
public abstract RetryPolicyBuilder setRetryExceptionPredicate(
Predicate<IOException> retryExceptionPredicate);

abstract RetryPolicy autoBuild();

/** Build and return a {@link RetryPolicy} with the values of this builder. */
Expand Down
Loading