docs: update retry

This commit is contained in:
Ilkka Seppälä
2024-05-19 11:39:39 +03:00
parent a8433cc634
commit f7c00f2b16
3 changed files with 80 additions and 71 deletions
+76 -70
View File
@@ -1,16 +1,22 @@
---
title: Retry
category: Behavioral
category: Resilience
language: en
tag:
- Performance
- Cloud distributed
- Fault tolerance
- Performance
- Retry
- Resilience
---
## Also known as
* Retry Logic
* Retry Mechanism
## Intent
Transparently retry certain operations that involve communication with external resources,
particularly over the network, isolating calling code from the retry implementation details.
Transparently retry certain operations that involve communication with external resources, particularly over the network, isolating calling code from the retry implementation details.
## Explanation
@@ -39,9 +45,7 @@ the simple example that accompanies this `README.md`.
Real world example
> Our application uses a service providing customer information. Once in a while the service seems
> to be flaky and can return errors or sometimes it just times out. To circumvent these problems we
> apply the retry pattern.
> Imagine you're a delivery driver attempting to deliver a package to a customer's house. You ring the doorbell, but no one answers. Instead of leaving immediately, you wait for a few minutes and try again, repeating this process a few times. This is similar to the Retry pattern in software, where a system retries a failed operation (e.g., making a network request) a certain number of times before finally giving up, in hopes that the issue (e.g., transient network glitch) will be resolved and the operation will succeed.
In plain words
@@ -49,14 +53,13 @@ In plain words
[Microsoft documentation](https://docs.microsoft.com/en-us/azure/architecture/patterns/retry) says
> Enable an application to handle transient failures when it tries to connect to a service or
> network resource, by transparently retrying a failed operation. This can improve the stability of
> the application.
> Enable an application to handle transient failures when it tries to connect to a service or network resource, by transparently retrying a failed operation. This can improve the stability of the application.
**Programmatic Example**
In our hypothetical application, we have a generic interface for all operations on remote
interfaces.
The Retry design pattern is a resilience pattern that allows an application to retry an operation in the expectation that it'll succeed. This pattern is particularly useful when the application is connecting to a network service or a remote resource, where temporary failures are common.
First, we have a `BusinessOperation` interface that represents an operation that can be performed and might throw a `BusinessException`.
```java
public interface BusinessOperation<T> {
@@ -64,92 +67,95 @@ public interface BusinessOperation<T> {
}
```
And we have an implementation of this interface that finds our customers by looking up a database.
Next, we have a `FindCustomer` class that implements this interface. This class simulates a flaky service that intermittently fails by throwing `BusinessException`s before eventually returning a customer's ID.
```java
public final class FindCustomer implements BusinessOperation<String> {
@Override
public String perform() throws BusinessException {
...
// ...
}
}
```
Our `FindCustomer` implementation can be configured to throw `BusinessException`s before returning
the customer's ID, thereby simulating a flaky service that intermittently fails. Some exceptions,
like the `CustomerNotFoundException`, are deemed to be recoverable after some hypothetical analysis
because the root cause of the error stems from "some database locking issue". However, the
`DatabaseNotAvailableException` is considered to be a definite showstopper - the application should
not attempt to recover from this error.
We can model a recoverable scenario by instantiating `FindCustomer` like this:
The `Retry` class is where the Retry pattern is implemented. It takes a `BusinessOperation` and a number of attempts, and it will keep trying to perform the operation until it either succeeds or the maximum number of attempts is reached.
```java
final var op = new FindCustomer(
"12345",
new CustomerNotFoundException("not found"),
new CustomerNotFoundException("still not found"),
new CustomerNotFoundException("don't give up yet!")
);
public final class Retry<T> implements BusinessOperation<T> {
private final BusinessOperation<T> operation;
private final int maxAttempts;
private final long delay;
private final Predicate<Exception> isRecoverable;
public Retry(BusinessOperation<T> operation, int maxAttempts, long delay, Predicate<Exception> isRecoverable) {
this.operation = operation;
this.maxAttempts = maxAttempts;
this.delay = delay;
this.isRecoverable = isRecoverable;
}
@Override
public T perform() throws BusinessException {
for (int attempt = 0; attempt < maxAttempts; attempt++) {
try {
return operation.perform();
} catch (Exception e) {
if (!isRecoverable.test(e) || attempt == maxAttempts - 1) {
throw e;
}
try {
Thread.sleep(delay);
} catch (InterruptedException ie) {
Thread.currentThread().interrupt();
throw new BusinessException("Retry operation was interrupted", ie);
}
}
}
throw new BusinessException("Retry attempts exceeded");
}
}
```
In this configuration, `FindCustomer` will throw `CustomerNotFoundException` three times, after
which it will consistently return the customer's ID (`12345`).
In this class, the `perform` method tries to perform the operation. If the operation throws an exception, it checks if the exception is recoverable and if the maximum number of attempts has not been reached. If both conditions are true, it waits for a specified delay and then tries again. If the exception is not recoverable or the maximum number of attempts has been reached, it rethrows the exception.
In our hypothetical scenario, our analysts indicate that this operation typically fails 2-4 times
for a given input during peak hours, and that each worker thread in the database subsystem typically
needs 50ms to "recover from an error". Applying these policies would yield something like this:
```java
final var op = new Retry<>(
new FindCustomer(
"1235",
new CustomerNotFoundException("not found"),
new CustomerNotFoundException("still not found"),
new CustomerNotFoundException("don't give up yet!")
),
5,
100,
e -> CustomerNotFoundException.class.isAssignableFrom(e.getClass())
);
```
Executing `op` once would automatically trigger at most 5 retry attempts, with a 100 millisecond
delay between attempts, ignoring any `CustomerNotFoundException` thrown while trying. In this
particular scenario, due to the configuration for `FindCustomer`, there will be 1 initial attempt
and 3 additional retries before finally returning the desired result `12345`.
If our `FindCustomer` operation were instead to throw a fatal `DatabaseNotFoundException`, which we
were instructed not to ignore, but more importantly we did not instruct our `Retry` to ignore, then
the operation would have failed immediately upon receiving the error, not matter how many attempts
were left.
This way, the Retry pattern allows the application to handle temporary failures gracefully, improving its resilience and reliability.
## Class diagram
![alt text](./etc/retry.png "Retry")
![Retry](./etc/retry.png "Retry")
## Applicability
Whenever an application needs to communicate with an external resource, particularly in a cloud
environment, and if the business requirements allow it.
* Use when operations can fail transiently, such as network calls, database connections, or external service integrations.
* Ideal for scenarios where the likelihood of transient failure is high but the cost of retries is low.
## Known Uses
* In network communication libraries to handle transient failures.
* Database connection libraries to manage temporary outages or timeouts.
* APIs interacting with third-party services that may be temporarily unavailable.
## Consequences
**Pros:**
Benefits:
* Resiliency
* Provides hard data on external failures
* Increases the robustness and fault tolerance of applications.
* Can significantly reduce the impact of transient failures.
**Cons:**
Trade-offs:
* Complexity
* Operations maintenance
* May introduce latency due to retries.
* Can lead to resource exhaustion if not managed properly.
* Requires careful configuration of retry parameters to avoid exacerbating the problem.
## Related Patterns
* [Circuit Breaker](https://java-design-patterns.com/patterns/circuit-breaker/)
* [Circuit Breaker](https://java-design-patterns.com/patterns/circuit-breaker/): Used to stop the flow of requests to an external service after a failure threshold is reached, preventing system overload.
## Credits
* [Retry pattern](https://docs.microsoft.com/en-us/azure/architecture/patterns/retry)
* [Cloud Design Patterns: Prescriptive Architecture Guidance for Cloud Applications](https://www.amazon.com/gp/product/1621140369/ref=as_li_tl?ie=UTF8&tag=javadesignpat-20&camp=1789&creative=9325&linkCode=as2&creativeASIN=1621140369&linkId=3e3f686af5e60a7a453b48adb286797b)
* [Cloud Design Patterns: Prescriptive Architecture Guidance for Cloud Applications](https://amzn.to/4dLvowg)
* [Design Patterns: Elements of Reusable Object-Oriented Software](https://amzn.to/3w0pvKI)
* [Java Concurrency in Practice](https://amzn.to/4aRMruW)
* [Release It!: Design and Deploy Production-Ready Software](https://amzn.to/3UPwmPh)
* [Retry pattern - Microsoft](https://docs.microsoft.com/en-us/azure/architecture/patterns/retry)
@@ -24,6 +24,8 @@
*/
package com.iluwatar.retry;
import java.io.Serial;
/**
* Indicates that the customer was not found.
*
@@ -34,6 +36,8 @@ package com.iluwatar.retry;
* @author George Aristy (george.aristy@gmail.com)
*/
public final class CustomerNotFoundException extends BusinessException {
@Serial
private static final long serialVersionUID = -6972888602621778664L;
/**
@@ -111,4 +111,3 @@ public final class RetryExponentialBackoff<T> implements BusinessOperation<T> {
} while (true);
}
}