In computer software, failures happen all the time: databases fail, networks are unreliable, disk space and RAM are limited, and third-party APIs can become temporarily unavailable. These are just a few examples; almost anything can go wrong.

In building Zenaton, we have always focused on how we can help our users deal with failures. Failures are identified and reported on your Zenaton dashboard. When a task fails, you have the option to retry simply by clicking on the retry button. Convenient, right?

Well, sometimes it is not convenient enough. For example, you may have some tasks that fail on a regular basis, possibly because you are using a third-party API that often has technical issues.

Failed tasks are reported on your dashboard and you can retry them by clicking on the button.

When you have such a task, it can become cumbersome to go to your dashboard every day, or even multiple times per day, just to check and hit the “retry all” button. That’s why one of the features most requested by our users is the ability to define retry policies that allow for automatic retries. We are working on this! But in the meantime, I’m going to show you a little hack to accomplish automatic retries by coding them directly into your workflow.

Being able to write workflows in code is one of the most interesting features of Zenaton, and our users sometimes don’t realize that. It allows you to do many powerful things and today we will implement an automatic retry of a task inside a workflow to demonstrate it. The workflow will be the following:

  • Execute a randomly failing task.
  • If the task fails, we want to wait 1 minute, and then try again.
  • If the task fails again, we will wait 3 minutes and proceed to a new try.
  • If it still fails, this time we will wait 6 minutes and proceed to the last try.
  • If the task is still failing at this point, we want it to be displayed on the dashboard to allow a manual retry at a later date.

This gives us a total of 4 tries over a 10-minute period, which should cover most of the short-lived outages that could happen. The following code samples are written in PHP, but this will work in every language supported by Zenaton.
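To double-check the arithmetic above, here is a small sketch in plain PHP with no Zenaton classes involved. The function name summarizeRetrySchedule is hypothetical and exists only for this illustration.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper (not part of Zenaton): summarizes a retry schedule.
 *
 * @param int[] $delays Delays in minutes between consecutive tries
 * @return array{tries: int, totalWaitMinutes: int}
 */
function summarizeRetrySchedule(array $delays): array
{
    return [
        // One initial try, plus one try after each delay
        'tries' => count($delays) + 1,
        // Total time spent waiting between the first and the last try
        'totalWaitMinutes' => array_sum($delays),
    ];
}

$summary = summarizeRetrySchedule([1, 3, 6]);
// Prints "4 tries, 10 minutes of waiting"
printf("%d tries, %d minutes of waiting\n", $summary['tries'], $summary['totalWaitMinutes']);
```

Changing the array is all it takes to try out a different schedule, for example a longer final delay for an API with slower recovery times.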

Let’s start with the task! The trick is to make the task catch the exceptions and errors that can happen and return them instead of throwing them. Throwing an error interrupts the execution of the code and is reported as an error on the dashboard, whereas a returned exception is like any other return value of a task, and we can make our workflow act differently based on this result. Also, if the task is still failing after all these automatic retries, we want it to throw an error so that it can be reported on the dashboard. Here is what it could look like:

<?php
use Zenaton\Interfaces\TaskInterface;
use Zenaton\Traits\Zenatonable;
class RandomlyFailingTask implements TaskInterface
{
    use Zenatonable;
    /** @var bool */
    private $throw;
    /**
     * @param bool $throw Whether the task will throw or return when an error happens
     */
    public function __construct(bool $throw)
    {
        $this->throw = $throw;
    }
    public function handle()
    {
        try {
            return $this->doHandle();
        } catch (\Throwable $e) {
            if ($this->throw) {
                throw $e;
            }
            return $e;
        }
    }
    private function doHandle()
    {
        if (random_int(0, 1) % 2 === 0) {
            return 42;
        }
        throw new \RuntimeException('Failure');
    }
}

The code is pretty straightforward. This task takes a bool parameter $throw that determines whether an error is returned or thrown. The workflow sends the correct value for this parameter: as long as we want automatic retries to happen, the workflow sends false so that the task returns errors. For the last automatic retry, the workflow sends true, which lets the task throw the error and makes it appear on your dashboard, where you can choose to manually retry the task.
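The return-or-throw pattern can also be tried out in isolation. The following is a minimal sketch in plain PHP, assuming nothing about Zenaton itself: runGuarded and $alwaysFails are hypothetical names introduced only for this example.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper (not a Zenaton API): runs $operation and either
 * rethrows or returns whatever it throws, depending on $throw.
 *
 * @return mixed The operation's result, or the caught \Throwable
 */
function runGuarded(callable $operation, bool $throw)
{
    try {
        return $operation();
    } catch (\Throwable $e) {
        if ($throw) {
            throw $e;
        }
        return $e;
    }
}

// Hypothetical operation that fails on every call
$alwaysFails = function () {
    throw new \RuntimeException('Failure');
};

// With $throw = false the failure comes back as an ordinary return value
$result = runGuarded($alwaysFails, false);
var_dump($result instanceof \Throwable); // bool(true)

// With $throw = true the same failure propagates to the caller
try {
    runGuarded($alwaysFails, true);
    $propagated = false;
} catch (\RuntimeException $e) {
    $propagated = true;
}
var_dump($propagated); // bool(true)
```

The caller decides what a failure means simply by checking the result with instanceof \Throwable, which is the check the workflow relies on.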

Now let’s continue with the workflow code. The workflow needs to loop until the task returns a correct result or until all of the automatic retries have been used and the task is still failing. The workflow could look like this:

<?php
use Zenaton\Interfaces\WorkflowInterface;
use Zenaton\Tasks\Wait;
use Zenaton\Traits\Zenatonable;
class AutomaticRetryWorkflow implements WorkflowInterface
{
    use Zenatonable;
    public function handle()
    {
        // the different retry delays (in minutes) we want to use
        $retries = [1, 3, 6];
        $result = (new RandomlyFailingTask(false))->execute();
        // Loop while there are retries left and the task keeps failing
        while (count($retries) > 0 && $result instanceof \Throwable) {
            // Wait for the retry delay before doing the next task execution
            (new Wait())->minutes(array_shift($retries))->execute();
            // Task should throw if this is the last loop iteration
            $taskShouldThrow = count($retries) === 0;
            $result = (new RandomlyFailingTask($taskShouldThrow))->execute();
        }
    }
}

I have commented the workflow code, so hopefully it is easy to understand.

  • $retries = [1, 3, 6]: we define the different delays, in minutes, we want to use between tries.
  • The while condition: we loop as long as there are elements left in the $retries array and the last result is a \Throwable. Every automatic retry removes the corresponding element from the array, which will eventually end the loop. As soon as the task returns a correct result, the instanceof check fails and we exit the loop. We could have written while (true) with a break statement and gotten the same result, but unnecessary infinite loops are not a good idea.
  • The Wait task: this is the wait that is executed between tries. The array_shift() function removes the first element from the $retries array and returns it as the delay.
  • $taskShouldThrow: we define whether the task should throw or return errors. Basically, we want to throw an error only when we are on the last iteration of the retry loop. This will make the task appear on the dashboard, where it can be manually retried.
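To trace the loop without waiting for real delays, the same logic can be simulated in plain PHP. This is only a sketch: simulateRetries and its fake task are hypothetical stand-ins, the delays are recorded rather than executed, and a caught exception stands for the task being reported as failed on the dashboard.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical stand-in for the workflow loop, in plain PHP (no Zenaton).
 * $failuresBeforeSuccess tells the fake task how many times to fail first.
 *
 * @param int[] $retries Delays in minutes between tries
 * @return array{tries: int, waits: int[], threw: bool}
 */
function simulateRetries(array $retries, int $failuresBeforeSuccess): array
{
    $tries = 0;
    $waits = [];
    // Fake task: fails until it has been called often enough
    $task = function (bool $throw) use (&$tries, $failuresBeforeSuccess) {
        $tries++;
        if ($tries > $failuresBeforeSuccess) {
            return 42;
        }
        $e = new \RuntimeException('Failure');
        if ($throw) {
            throw $e;
        }
        return $e;
    };

    try {
        $result = $task(false);
        while (count($retries) > 0 && $result instanceof \Throwable) {
            // Record the delay instead of actually waiting
            $waits[] = array_shift($retries);
            // The last iteration is the only one allowed to throw
            $taskShouldThrow = count($retries) === 0;
            $result = $task($taskShouldThrow);
        }
        $threw = false;
    } catch (\RuntimeException $e) {
        // Stands for the task showing up as failed on the dashboard
        $threw = true;
    }

    return ['tries' => $tries, 'waits' => $waits, 'threw' => $threw];
}

// Task succeeds on the third try: two delays are consumed, one is left unused
print_r(simulateRetries([1, 3, 6], 2));
// Task never succeeds: all three delays are used and the fourth try throws
print_r(simulateRetries([1, 3, 6], 10));
```

The first call reports 3 tries with waits of 1 and 3 minutes, and the second reports 4 tries with all three waits and a thrown error.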

And that’s all you need to have 3 automatic retries of this task, using a variable delay between them.

Let’s check the dashboard to monitor our results. If the task succeeds the first time, it looks like this:

The task has been executed only once, returning 42.

If it succeeds after a few tries, it looks like this:

The task succeeded on the third try. There is one last try that was not used.

Finally, when the task fails four times in a row, it will look like this and give us the option to manually retry the task:

The task has failed four times in a row. We can clearly see the variable delays between tries.

The behavior is exactly what we specified at the beginning of this article, and we were able to do that with only a few lines of code.

The monitoring feature is a great way to see what happened for a workflow. You can read more about it here.

Conclusion

Hopefully, after seeing this, you will better appreciate why we built Zenaton workflows as code, and why we are convinced it is the right choice. It gives you a lot of power and versatility when using the platform. It allows for maximum flexibility because, most of the time, even if we don’t have a specific feature, you will be able to build it yourself using a little creativity and a few lines of code.

Although we were able to implement automatic retries using our ‘hack’, we believe that workflows should represent business logic and should not be cluttered with technical stuff. If you have a complex workflow, and you were to use this kind of code to handle technical failures, it would clutter the workflow and you would lose readability. So we are planning an elegant way of specifying automatic retries so that your workflows can stay focused only on the business case they are solving. Stay tuned!