
Error Handling

Debugging your jobs

Background tasks will fail: APIs become temporarily unavailable, databases go down, networks are unreliable, and so on. To help you fix these errors, Zenaton provides tools to quickly identify errors in tasks and retry them.

  • Errors are automatically reported on the Dashboard, grouped by workflow or task and by error type
  • Drill down to see error messages, stacktraces and more
  • Pause, retry or kill workflows and tasks

Failures are identified and reported on your Zenaton Dashboard. If a running instance fails, click the failed task to see its details, including the stacktrace.


Errors Tab

On the Zenaton Dashboard, the Errors Tab lists all failed tasks, whether they are standalone tasks or part of a workflow.


Errors details

Click an error class to see the details of all the failed tasks in that class. From there, you can retry all of them manually.


Retries

With Zenaton, you can retry failed tasks manually from the dashboard, or automatically by implementing a retry strategy in your code.

Manual Retry

Log in to the dashboard and retry failed tasks from the workflow tab or the Errors Tab. On the Errors Tab, you can retry a single occurrence of an error or use 'retry all' to retry every occurrence.

Automatic Retry

Automatic retry requires Zenaton Agent version 0.8.0 and the Zenaton library version .

You can enable automatic retries for standalone tasks as well as for tasks that are part of a workflow. Each retry happens after a delay that you specify. To enable automatic retries for a task, implement an onErrorRetryDelay method (on_error_retry_delay in Ruby and Python) in your task.

The onErrorRetryDelay method receives the error as its first parameter and returns either a positive number, the delay in seconds to wait before the next try, or false to stop retrying.

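Inside this method, you can access the task's execution context (this.context in Node.js, $this->getContext() in PHP, @context in Ruby, self._context in Python). Its retry index starts at 1 and increases by one for every retry, which lets you implement any retry strategy you need.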
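Because the method receives the error, a strategy can also decide based on what failed. The sketch below is written as a plain Python function so it runs outside Zenaton: it takes the retry index as an argument instead of reading it from the task context. TransientError and PermanentError are hypothetical exception classes, and the 30-second delay and 5-retry limit are arbitrary illustrative values.

```python
class TransientError(Exception):
    """Hypothetical error for failures that may succeed later (e.g. an API timeout)."""


class PermanentError(Exception):
    """Hypothetical error for failures that retrying cannot fix (e.g. invalid input)."""


def retry_delay(retry_index, exception):
    """Return a delay in seconds before the next try, or False to stop retrying.

    In a real task, this logic would live in on_error_retry_delay and read the
    retry index from the task context (an assumption of this sketch).
    """
    # Retrying will not fix a permanent failure: give up immediately.
    if isinstance(exception, PermanentError):
        return False
    # Any other error: retry every 30 seconds, at most 5 times.
    if retry_index > 5:
        return False
    return 30


print(retry_delay(1, TransientError()))  # 30
print(retry_delay(1, PermanentError()))  # False
print(retry_delay(6, TransientError()))  # False
```

A task that stops retrying still appears in the list of errors on the dashboard, so it can be retried manually once the underlying problem is fixed.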

Here is an example of a task which will automatically be retried at most 3 times, increasing the delay time between each try:

<?php

use Zenaton\Interfaces\TaskInterface;
use Zenaton\Traits\Zenatonable;

class SimpleTask implements TaskInterface
{
    use Zenatonable;

    public function handle()
    {
        // [...] task implementation
    }

    public function onErrorRetryDelay($exception)
    {
        // The retry index starts at 1 and increases by one for every retry.
        // This can be used to increase the time between each attempt.
        $n = $this->getContext()->getRetryIndex();
        if ($n > 3) {
            return false;
        }

        return $n * 60;
    }
}
const { Task } = require("zenaton");

module.exports = Task("SimpleTask", {

    async handle() {
        // [...] task implementation
    },

    onErrorRetryDelay(exception) {
        // The retry index starts at 1 and increases by one for every retry.
        // This can be used to increase the time between each attempt.
        const n = this.context.retryIndex;
        if (n > 3) {
            return false;
        }

        return n * 60;
    }
});
const { task } = require("zenaton");

module.exports = task("SimpleTask", {

    async handle() {
        // [...] task implementation
    },

    onErrorRetryDelay(exception) {
        // The retry index starts at 1 and increases by one for every retry.
        // This can be used to increase the time between each attempt.
        const n = this.context.retryIndex;
        if (n > 3) {
            return false;
        }

        return n * 60;
    }
});
class SimpleTask < Zenaton::Interfaces::Task
    include Zenaton::Traits::Zenatonable

    def handle
        # [...] task implementation
    end

    def on_error_retry_delay(exception)
        # The retry index starts at 1 and increases by one for every retry.
        # This can be used to increase the time between each attempt.
        n = @context.retry_index
        if n > 3
            false
        else
            n * 60
        end
    end
end
from zenaton.abstracts.task import Task
from zenaton.traits.zenatonable import Zenatonable

class SimpleTask(Task, Zenatonable):

    def handle(self):
        # [...] task implementation
        pass

    def on_error_retry_delay(self, exception):
        # The retry index starts at 1 and increases by one for every retry.
        # This can be used to increase the time between each attempt.
        n = self._context.retry_index
        if n > 3:
            return False

        return n * 60
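To check which delays a strategy produces, you can call its logic with successive retry indices. The sketch below is a local simulation, not part of the Zenaton library: FakeContext is a hypothetical stand-in that only models retry_index, and retry_schedule stops at the first False or after 100 tries, mirroring the 100-try limit for automatic retries.

```python
class FakeContext:
    """Hypothetical stand-in for the task context; only retry_index is modeled."""

    def __init__(self, retry_index):
        self.retry_index = retry_index


def simple_task_delay(context, exception):
    # Same decision as SimpleTask.on_error_retry_delay above.
    n = context.retry_index
    if n > 3:
        return False
    return n * 60


def retry_schedule(strategy, max_tries=100):
    """Collect the delays a strategy returns until it gives up or max_tries is reached."""
    delays = []
    for retry_index in range(1, max_tries + 1):
        delay = strategy(FakeContext(retry_index), None)
        if delay is False:
            break
        delays.append(delay)
    return delays


print(retry_schedule(simple_task_delay))  # [60, 120, 180]
```

SimpleTask therefore waits 1, 2, then 3 minutes before its three retries (6 minutes in total) and then stops retrying.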

You can implement the onErrorRetryDelay method in any manner that suits your needs. Here is an example of a widely used strategy, exponential backoff with random jitter:

public function onErrorRetryDelay($exception)
{
    $n = $this->getContext()->getRetryIndex();

    return $n <= 12 ? 5 * mt_rand(0, 2 ** $n) : false;
}
onErrorRetryDelay(exception) {
    const n = this.context.retryIndex;
    const rand = (min, max) => Math.floor(Math.random() * (max - min + 1)) + min;

    return n <= 12 ? 5 * rand(0, 2 ** n) : false;
}
def on_error_retry_delay(exception)
    n = @context.retry_index

    if n <= 12
        5 * rand(0..(2 ** n))
    else
        false
    end
end
import random

def on_error_retry_delay(self, exception):
    n = self._context.retry_index
    if n <= 12:
        return 5 * random.randint(0, 2 ** n)
    else:
        return False
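Because the delay before retry n is random, it helps to bound it. In the PHP and Node.js versions, which retry while n <= 12, the delay is 5 * rand(0, 2**n) seconds: at most 5 * 2**n, and on average half of that, since the random integer is drawn uniformly. The arithmetic below is a small sketch of that calculation.

```python
def max_delay(n):
    # The random factor is at most 2**n, so the delay is at most 5 * 2**n seconds.
    return 5 * 2 ** n


def mean_delay(n):
    # A uniform integer in [0, 2**n] has mean 2**n / 2.
    return 5 * 2 ** n / 2


retries = range(1, 13)  # retry indices 1..12
worst_total = sum(max_delay(n) for n in retries)
mean_total = sum(mean_delay(n) for n in retries)

print(max_delay(12))                    # 20480
print(worst_total, worst_total / 3600)  # 40950 11.375
print(mean_total, mean_total / 3600)    # 20475.0 5.6875
```

The last delay alone can reach about 5.7 hours, and the twelve retries span about 11.4 hours in the worst case (about 5.7 hours on average) before the method returns false.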

When a task has automatic retry configured, its failures are still displayed as errors on the dashboard, and you can still retry it manually there. This lets you retry a task right away instead of waiting for the next automatic retry.

Automatic retry can retry a task at most 100 times. Once this limit is reached, the task still appears in the list of errors on the dashboard, where you can retry it manually. A manual retry resets the automatic retry count.
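With up to 100 automatic tries, an uncapped exponential strategy would quickly produce impractically long delays (5 * 2**100 seconds at the last try). A common variant caps the delay. The sketch below is one such capped exponential backoff with jitter; the 5-second base, the one-hour cap and the 20-retry limit are assumptions chosen for illustration, and the function takes the retry index directly rather than reading it from the task context.

```python
import random

BASE_DELAY = 5    # seconds; illustrative value
MAX_DELAY = 3600  # hypothetical cap: never wait more than one hour
MAX_RETRIES = 20  # hypothetical limit, well under the 100-try maximum


def capped_backoff_delay(retry_index, rng=random):
    """Return a delay in seconds before the next try, or False to stop retrying."""
    if retry_index > MAX_RETRIES:
        return False
    # Exponential growth, clamped so that late retries stay bounded.
    ceiling = min(MAX_DELAY, BASE_DELAY * 2 ** retry_index)
    # Jitter: pick a random delay between 1 second and the ceiling,
    # so that many failing tasks do not all retry at the same moment.
    return rng.randint(1, ceiling)
```

Since 5 * 2**10 = 5120 already exceeds 3600, every retry from the 10th onward waits at most one hour.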

Alerting

You will receive alerts whenever an error occurs for a task or workflow so that you can log into your dashboard and investigate or retry the task or resume the workflow.

Depending on your alerting preferences, you can receive the following emails:

  • Immediate email for the first daily occurrence of a task or decision error (including timeouts)
  • A daily summary of all the errors from the day before, if any

Timeouts

Timeouts can occur for different reasons, such as an API that does not respond or a workflow launched without its sources. If a task or workflow times out, the timeout appears in the list of errors, where you can see its details.

Note that a timeout occurs when a task runs for more than 5 minutes (the maximum processing time) or when a decision takes more than 30 seconds.
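One way to keep long-running work under the 5-minute limit is to check the elapsed time inside the task and hand back whatever is left, for example so that another task can process it. The sketch below is a generic pattern, not a Zenaton API: process_with_budget, handle_item and the 4-minute budget (a margin below the 5-minute limit) are hypothetical, and the clock is injectable so the function can be tested.

```python
import time

TIME_BUDGET = 4 * 60  # seconds; hypothetical safety margin under the 5-minute limit


def process_with_budget(items, handle_item, budget=TIME_BUDGET, clock=time.monotonic):
    """Process items until the time budget is spent; return the unprocessed ones."""
    start = clock()
    for index, item in enumerate(items):
        # Stop before starting a new item once the budget is used up.
        if clock() - start >= budget:
            return items[index:]
        handle_item(item)
    return []
```

The check runs only between items, so each individual item must itself finish well within the remaining margin.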