The English version of quarkus.io is the official project site. Translated sites are community supported on a best-effort basis.

Mutiny - How does retry... retries?

Last week, David, a Quarkus user, asked me about retrying an asynchronous operation. David has a picky remote HTTP service, which sometimes misbehaves. Easy answer! Just do: uni.onFailure().retry().atMost(x). But, David is curious, and asked me a second question: how does retry work? Well, that’s simple; it just retries…​ As you can imagine, that did not satisfy David’s curiosity.

While I was answering to David, I realized that retrying is not that simple and deserves more explanation. That’s what we are going to see in this blog post.

Disclaimer about retries

Okay, if you are the kind of reader who skips the terms and conditions, you can jump to the next section. But, for others, I need to warn you about retries. Retries may not be the solution to your problem, as it can be terrible. Retrying can only be done if your system can handle duplicated requests or messages. In other words, you can only retry if your system is idempotent, i.e., sending a request or a message multiple times does not change the overall state. In doubt, check before implementing a retry, as it can harm your system.

Disclaimer said! Let’s look under the hood of retry.

Retry is about resubscribing

Let’s imagine you have a Uni representing your asynchronous action, like in David’s case, an invocation of a remote service:

Uni<String> uni = invokePickyService(client);

Unis are lazy. Until someone subscribes to them, nothing happens. In our case, the request is only sent to the remote service when someone subscribes to the uni. So to execute the request, we need to subscribe:

Uni<String> uni = invokePickyService(client);
uni.subscribe().with(
    resp -> System.out.println("Success: " + resp),
    failure -> System.out.println("Failed: " + failure.getMessage())
);

In Quarkus, most of the time, you return the Uni, and Quarkus subscribes to it. So, don’t be mistaken, there is a subscription, you may not see it.

This laziness is the retry secret. In the following sequence diagram, you can see that the request is sent when we get a subscriber:

subscription

An interesting consequence of this is that if you subscribe twice, you emit two requests, and so get two responses:

double subscription

But let’s go back to retry. What’s a retry? A retry is a reaction to a failure, so you are going to write: uni.onFailure().retry(). You also need to indicate how long do you want to retry:

Uni<String> uni = invokePickyService(client)
    .onFailure().retry().indefinitely();
uni.subscribe().with(
        resp -> System.out.println("Success: " + resp)
);

In this snippet, we retry indefinitely until we get a successful result.

But, how does it work under the hood? It’s quite simple. If it gets a failure, it just re-subscribes:

retry

It resubscribes until it gets a successful response. In other words, retrying is resubscribing.

How many times should I retry?

That’s always a good question. Should I retry indefinitely? Most probably, not. Indefinitely can be very long, and it could never end if the called service fails continuously.

You can configure the number of retries using atMost:

 Uni<String> uni = invokePickyService(client)
    .onFailure().retry().atMost(2);
uni.subscribe().with(
        resp -> System.out.println("Success: " + resp),
        failure -> System.out.println("Failure: " + failure.getMessage())
);

atMost indicates that at most n attempts will be done before failing. If we still get a failure after that number of resubscription, the last failure is sent to the subscriber.

retry failed

You can also use until and decide to retry by looking at the received failure:

Uni<String> uni = invokePickyService(client)
    .onFailure().retry().until(failure -> ! (failure instanceof TooManyRequestsException));

Bonus: Exponential backoff

So far, our retries happen immediately. It might not be very wise, and separating a bit our retries may give better results, especially when facing intermittent failures due to the load or other external causes. Using exponential backoff provides a reasonable tradeoff. Retrying with exponential backoff:

  • retries after an initial delay,

  • on every failure, it doubles the previous delay, with an optional maximum,

  • it can use a jitter to add a random duration to the delay,

  • it can have a max delay if needed,

  • it is still constrained by atMost

Uni<String> uni = invokePickyService(client)
    .onFailure().retry()
        .withBackOff(Duration.ofSeconds(1)).withJitter(0.2).atMost(10);

This last snippet configures the retry with exponential backoff. The first retry happens after 1 second, and then it doubles every time without an upper limit. Random jitter is applied to delays. If, unfortunately, after ten attempts, it still fails, the failure is sent to the subscriber.

Demo time!

Ok, enough talking. I’ve built a simple playground where you can adjust and try the various retry policies: https://gist.github.com/cescoffier/e9abce907a1c3d05d70bea3dae6dc3d5. You can jbang the script by downloading it and running jbang Retry.java, or just run:

jbang https://gist.github.com/cescoffier/e9abce907a1c3d05d70bea3dae6dc3d5

The called service fails 50% of the time (well, it uses a random, so statistically 50% of the time).

That concludes this blog post. Again, before using retry, please verify that your system can support it. Retrying is resubscribing. You can configure how long, how many times, and when retrying should be attempted. There are many more options offered by Mutiny, like when or using deadlines (expireIn and expireAt) when using exponential backoff. You can try all these with the provided playground.

Stay tuned! We have plenty of other gems to discuss here!