Troubleshooting and Versioning Workflows with Zenaton

Zenaton is a SaaS solution for orchestrating long-running processes. It includes retries, workload distribution, event handling, and a full-featured monitoring website.

We are going to walk through what happens when you are a new developer on a team that uses Zenaton, an error occurs over the weekend, and you are the one in charge of fixing it.

So you were given access to the monitoring website as part of your “on call” package. Today, you’re going to discover some of the dos and don’ts of Zenaton regarding decisions, and how the versioning capabilities let you fix the mistakes.

The ‘ModifiedDecider’ error

[Screenshot: ModifiedDecider errors on the Zenaton dashboard]

You check your Zenaton dashboard and there are 490 ModifiedDecider errors on your workflow. No less. And that’s just a screenshot: on the live dashboard you can see the number rising in real time, just like your stress level.

What’s a ModifiedDecider error, you ask? It has to do with idempotence. Too erudite? OK, let me explain.

Zenaton lets you describe the series of steps needed to perform a certain operation. We call that a workflow. You can write your workflow in JavaScript or in one of the other supported languages. It looks like this:

module.exports = Workflow("SimpleConditionalWorkflow", async function() {
  const foo = await new WhateverTask().execute();

  if (foo === "bar") {
    await new DoThisTask().execute();
  } else {
    await new OrDoThatTask().execute();
  }
});

Workflows are written as ordinary code in your programming language, which allows you to use all the usual tools: conditions, loops, recursion... Here, for example, we execute a first task called WhateverTask and, depending on the value it returns, execute one of two other tasks.

Now, to understand what a ModifiedDecider error is, we need to remember that those tasks are run by a scheduling system that distributes them over your various workers (the machines where you have installed the Zenaton Agent). Every time it needs to establish what to do next, the scheduling system asks one of your workers to execute the workflow, in a process called the decision. This makes the worker have the following inner monologue:

OK, so it says I have to execute WhateverTask. Mmmmmh… But this one is already finished, and the result was the string “ounga ounga”. So let’s continue, shall we? OK, now it says I have to execute OrDoThatTask. Have I done that? No. OK, let’s schedule its execution and stop for the time being.

During the lifetime of a workflow this will happen many many times, and it’s important to remember that each of those times the whole workflow code will be executed again, from the beginning.
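To make the replay idea concrete, here is a minimal sketch in plain JavaScript. It does not use the Zenaton library and is not how the Zenaton engine is implemented: `decide`, `ScheduleSignal`, and the shape of `history` are hypothetical names made up for illustration, and the workflow is written synchronously to keep things short. Each decision re-runs the workflow function from the top, reuses the recorded results of completed tasks, and stops at the first task that has not run yet.

```javascript
// Hypothetical replay sketch: NOT the Zenaton engine, just the idea behind a decision.

// Thrown to interrupt the workflow as soon as a not-yet-executed task is reached.
class ScheduleSignal extends Error {
  constructor(taskName) {
    super(`schedule ${taskName}`);
    this.taskName = taskName;
  }
}

// Runs one decision: replays `workflow` against `history` (completed tasks, in order)
// and returns the name of the next task to schedule, or null when the workflow is done.
function decide(workflow, history) {
  let position = 0;
  const execute = (taskName) => {
    if (position < history.length) {
      // Already completed: reuse the recorded result instead of running the task again.
      return history[position++].result;
    }
    throw new ScheduleSignal(taskName);
  };

  try {
    workflow(execute);
    return null;
  } catch (signal) {
    if (signal instanceof ScheduleSignal) return signal.taskName;
    throw signal;
  }
}

// Same shape as SimpleConditionalWorkflow above.
function simpleConditionalWorkflow(execute) {
  const foo = execute("WhateverTask");
  if (foo === "bar") {
    execute("DoThisTask");
  } else {
    execute("OrDoThatTask");
  }
}

const firstDecision = decide(simpleConditionalWorkflow, []);
const secondDecision = decide(simpleConditionalWorkflow, [
  { name: "WhateverTask", result: "ounga ounga" },
]);
const thirdDecision = decide(simpleConditionalWorkflow, [
  { name: "WhateverTask", result: "ounga ounga" },
  { name: "OrDoThatTask", result: undefined },
]);

console.log(firstDecision, secondDecision, thirdDecision);
// prints: WhateverTask OrDoThatTask null
```

Notice that the workflow function runs three times in total, once per decision, while each task appears in the history only once.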

But of course, workflows are expected to evolve. Imagine that you now need to perform an additional step after WhateverTask:

module.exports = Workflow("SimpleConditionalWorkflow", async function() {
  const foo = await new WhateverTask().execute();
  
  await new SomethingSomethingTask().execute();

  if (foo === "bar") {
    await new DoThisTask().execute();
  } else {
    await new OrDoThatTask().execute();
  }
});

Notice the new SomethingSomethingTask? It means that your workflow is now behaving differently. But what if we already have an instance of that workflow running in the wild? In such a case, this is the inner monologue that will happen on the next decision:

OK, so it says I have to execute WhateverTask. Mmmmmh… But this one is already finished, and the result was the string “ounga ounga”. So let’s continue, shall we? OK, now it says I have to execute SomethingSomethingTask. Wait a minute… Last time at this point I was told to execute OrDoThatTask. This is a scandal! I can’t work in such conditions! Ph’nglui mglw’nafh Cthulhu!

And this is exactly how you’ll get a ModifiedDecider error being thrown… Lesson learned: you cannot arbitrarily modify a workflow once it has started to run in production.
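The worker’s outrage can be sketched in a few lines of plain JavaScript. This is only an illustration of the kind of check that could raise such an error, not Zenaton’s actual implementation: `replay`, the `history` format, and the `ModifiedDeciderError` class (named after the dashboard error) are assumptions. The replay compares each task the code asks for with the task recorded at the same position when the instance first ran.

```javascript
// Hypothetical sketch: NOT the Zenaton engine, only the kind of check that could raise the error.

class ModifiedDeciderError extends Error {}

// Replays `workflow` against the recorded `history` and checks that the code
// asks for the same tasks, in the same order, as when the instance first ran.
function replay(workflow, history) {
  let position = 0;
  const pending = Symbol("pending");
  const execute = (taskName) => {
    if (position >= history.length) throw pending; // first task not yet run: stop here
    const recorded = history[position++];
    if (recorded.name !== taskName) {
      throw new ModifiedDeciderError(
        `expected ${recorded.name} at step ${position}, but the code asked for ${taskName}`
      );
    }
    return recorded.result;
  };
  try {
    workflow(execute);
  } catch (error) {
    if (error !== pending) throw error;
  }
}

function originalWorkflow(execute) {
  const foo = execute("WhateverTask");
  execute(foo === "bar" ? "DoThisTask" : "OrDoThatTask");
}

function updatedWorkflow(execute) {
  const foo = execute("WhateverTask");
  execute("SomethingSomethingTask"); // the newly added step
  execute(foo === "bar" ? "DoThisTask" : "OrDoThatTask");
}

// What an instance started with the original code has recorded so far.
const history = [
  { name: "WhateverTask", result: "ounga ounga" },
  { name: "OrDoThatTask", result: undefined },
];

replay(originalWorkflow, history); // replays cleanly

let caught = null;
try {
  replay(updatedWorkflow, history);
} catch (error) {
  caught = error;
}
console.log(caught instanceof ModifiedDeciderError, caught.message);
```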

Versioning a workflow

Now that you have a better understanding of what a ModifiedDecider error is, it’s a lot easier to investigate what might have happened with your customer acquisition pipeline. You check the source code of the corresponding Zenaton workflow, and notice that Roger updated it before the last production release. There is now an additional step right in the middle of it: AdvertizeVacuumCleanerTask. But wait! There’s more!

const day = new Date().getDay();
const isWeekend = (day === 6) || (day === 0);

if (isWeekend) {
  await new AdvertizeVacuumCleanerTask().execute();
}

This additional step is conditioned on the current day being a weekend day. Sure, why not? But if you followed our lesson on ModifiedDecider errors carefully, you’ll remember that workflows need to be idempotent. That is: however many times you execute them, they need to take the same path and yield the same results. Here, depending on the day a decision occurs, AdvertizeVacuumCleanerTask may or may not be scheduled for execution. You might argue that this is not a problem if your workflow doesn’t take more than a day to execute entirely, but they are called long-running workflows for a reason and you shouldn’t rely on that. For example, just imagine that one of those workflows starts on a Friday at 11:59 pm and that one of the following decisions runs on Saturday at 12:10 am. Get the idea?
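The Friday-night scenario is easy to reproduce outside of Zenaton. In the sketch below, `isWeekendAt` is a hypothetical helper that applies the same check to a date passed as a parameter, and the two dates are arbitrary examples picked to fall on a Friday and the following Saturday.

```javascript
// Same check as in the workflow, but taking the date as a parameter so that
// two decisions made at different moments can be compared.
// `isWeekendAt` is a hypothetical helper, not part of Zenaton.
function isWeekendAt(date) {
  const day = date.getDay(); // 0 = Sunday, 6 = Saturday (local time)
  return day === 6 || day === 0;
}

// Months are zero-based in the Date constructor: 4 = May, 5 = June.
const workflowStart = new Date(2019, 4, 31, 23, 59); // Friday, 11:59 pm
const nextDecision = new Date(2019, 5, 1, 0, 10); // Saturday, 12:10 am

console.log(isWeekendAt(workflowStart)); // false
console.log(isWeekendAt(nextDecision)); // true
```

Eleven minutes apart, the same condition sends the workflow down two different paths.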

To fix this, we are going to take two actions:

First, we need to refactor the code to move the calculation of whether or not this is a weekend day into a separate task.
Second, we need to fix the ModifiedDecider error once and for all. To achieve this, we are going to leverage the versioning capability of Zenaton.

The first step is easy enough. Let’s just create an IsWeekendTask.

Don’t forget that in JavaScript, Zenaton tasks must return a promise. Declaring the function async will do the trick.

const { Task } = require("zenaton");

module.exports = Task("IsWeekendTask", async function () {
  const day = new Date().getDay();
  const isWeekend = (day === 6) || (day === 0);
  return isWeekend;
});

Then we can use it in CustomerAcquisitionWorkflow:

const { Workflow } = require("zenaton");

const WelcomeCustomerTask = require("../tasks/WelcomeCustomerTask");
const IsWeekendTask = require("../tasks/IsWeekendTask");
const AdvertizeVacuumCleanerTask = require("../tasks/AdvertizeVacuumCleanerTask");

module.exports = Workflow("CustomerAcquisitionWorkflow", async function () {
  await new WelcomeCustomerTask().execute();
  
  const isWeekend = await new IsWeekendTask().execute();

  if (isWeekend) {
    await new AdvertizeVacuumCleanerTask().execute();
  }
  
  /* Here more tasks... */
});

At this point, you might ask: what’s the difference? Aren’t we still conditioning a task execution on whether or not it’s the weekend? True. But the calculation of whether or not it’s the weekend now happens only once, because Zenaton caches task results. When a workflow is executed to make a decision, every task that has already completed returns the result it yielded at the time of its completion instead of running again. This is what guarantees the consistency of your workflows.
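Here is a rough sketch of why a cached result keeps decisions stable, again in plain JavaScript and without the Zenaton library. `executeOnce`, `completedResults`, and the fake clock `now` are all hypothetical, and keying the cache by task name is a simplification that only works because this example runs each task once.

```javascript
// Hypothetical sketch of result caching: NOT Zenaton's implementation.
const completedResults = new Map(); // task name -> result recorded at completion

function executeOnce(taskName, taskBody) {
  if (!completedResults.has(taskName)) {
    completedResults.set(taskName, taskBody()); // the task really runs only here
  }
  return completedResults.get(taskName);
}

// A fake clock, so the example can "travel" from Saturday to Monday.
let now = new Date(2019, 5, 1, 0, 10); // Saturday, June 1, 2019
const isWeekendTask = () => {
  const day = now.getDay();
  return day === 6 || day === 0;
};

// First decision, on Saturday: the task runs and its result is recorded.
const onSaturday = executeOnce("IsWeekendTask", isWeekendTask);

// Later decision, on Monday: the recorded result is reused, the clock is not consulted.
now = new Date(2019, 5, 3, 9, 0); // Monday, June 3, 2019
const onMonday = executeOnce("IsWeekendTask", isWeekendTask);

console.log(onSaturday, onMonday); // true true
```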

And this is also why versioning is important. With long-running workflows that can take days, weeks or even months to execute entirely, you have to let Zenaton figure out which version of the source code was used to start each instance currently running in production.

Versioning a workflow boils down to making a copy of its latest definition, giving it a different name that reflects the change (it can be as simple as ...V2), and adding it at the end of the workflow’s list of versions.

To continue with our customer acquisition pipeline example, this translates to the following steps:

Rename the existing file containing the workflow to CustomerAcquisitionWorkflowV1.js, and make a copy of it named CustomerAcquisitionWorkflowV2.js.
Revert CustomerAcquisitionWorkflowV1.js to the original source code of the workflow, the way it was before Roger put his paws on it, and rename the workflow inside it to CustomerAcquisitionWorkflowV1.
In the CustomerAcquisitionWorkflowV2.js copy that you have made, rename the workflow to CustomerAcquisitionWorkflowV2.
Now create a new file named CustomerAcquisitionWorkflow.js, the same name as the original workflow file, which will be the new root definition of the workflow and will list its various versions.

const { Version } = require("zenaton");

const CustomerAcquisitionWorkflowV1 = require("./CustomerAcquisitionWorkflowV1");
const CustomerAcquisitionWorkflowV2 = require("./CustomerAcquisitionWorkflowV2");

module.exports = Version("CustomerAcquisitionWorkflow", [
  CustomerAcquisitionWorkflowV1,
  CustomerAcquisitionWorkflowV2,
]);

Finally, deploy this new source code to all your machines running the Zenaton Agent, and that’s it. You’re done!

What’s going to happen from now on is:

For already running instances of CustomerAcquisitionWorkflow, which were versionless, Zenaton will take what we call the “initial version”, which is always the first in the list.
For newly starting instances of CustomerAcquisitionWorkflow, Zenaton will take the last version in the list.

In both circumstances, the workflows are guaranteed to run the way they were intended to when they started.
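The selection rule above can be summarized in a few lines. This is only a sketch of the rule as described in this post, not Zenaton’s code: `resolveVersion` and the `instance` fields (`isNew`, `startedWith`) are hypothetical, and the last branch, where an instance started under a named version keeps that version, is an assumption based on the guarantee stated above.

```javascript
// Hypothetical sketch of the selection rule described above: NOT Zenaton's code.
const versions = ["CustomerAcquisitionWorkflowV1", "CustomerAcquisitionWorkflowV2"];

// `instance.startedWith` is the version name an instance was started with,
// or undefined for instances started before the workflow was versioned.
function resolveVersion(versions, instance) {
  if (instance.isNew) return versions[versions.length - 1]; // newest version
  if (instance.startedWith === undefined) return versions[0]; // the "initial version"
  return instance.startedWith; // assumption: keep the code it started with
}

console.log(resolveVersion(versions, { isNew: false })); // CustomerAcquisitionWorkflowV1
console.log(resolveVersion(versions, { isNew: true })); // CustomerAcquisitionWorkflowV2
```

This also shows why the order of the list matters: new versions go at the end, and the first entry must stay the code that the versionless instances started with.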

Retry failed tasks

We are not completely done yet. We still have a few hundred workflows stuck because of a ModifiedDecider error. Should we cancel and restart them? Of course not! That would potentially replay some already completed tasks.

The right answer is a lot simpler than that: just go back to the dashboard and click the “Retry all” button next to the error. Failed decisions will be re-executed, and since the workflow is now properly idempotent, they should complete normally, allowing your pipeline to resume business as usual. You can easily check this by going to the “Workflows” part of the Zenaton dashboard and clicking on CustomerAcquisitionWorkflow.

[Screenshot: versioned workflow instances in the Zenaton dashboard]

Let yourself be hypnotized by the sweet ballet of your workflow’s instances starting and completing in unison. Production and your weekend are saved. So much for Roger.

