line

Tutorial: ETL

A simple ETL workflow using an open API that allows you to call an API, run calculations and send the data to a google sheet.

Here is a simple ETL workflow using an open API that allows you to call an API, run calculations and send the data to a google sheet:

  • Calls the SWAPI (star wars API) using an http request
  • Pulls a list of Star Wars films and characters
  • Calculates which characters appear in at least 5 films
  • Pulls data for those characters
  • Saves the results to a Google Sheet
  • Sends a report by email using Sendgrid.

View the Github project. View all of the files, fork the project and deploy to Heroku in a few clicks.

Tutorial Steps

  1. Launch a workflow and view the details in 'Workflow Executions'.
  2. Push the aggregated data to GoogleSheet using connectors.
  3. Send a report by email using the Sendgrid connector.
  4. Deploy to Heroku or to another hosting solution

What you will learn

  • how to do HTTP requests within a workflow
  • how to use the Zenaton API connectors for GoogleSheet and Sendgrid

1) Launch the workflow

Launch a workflow instance using HTTP

curl -X POST https://gateway.zenaton.com/rest-api/v1/instances \
  -H "Content-Type: application/json" \
  -H "Accept: application/json" \
  -H "app-env: dev" \
  -H "app-id: FBKTJXUMLO" \
  -H "api-token: Bt29HVlvrY9atH41LNwnAp5MQYkEut8oCqXKH8NGOjUAtJltB6Sea82FbP2P" \
  -d '{"name":"SimpleEtl","input":[],"version":"v1"}'

or via ou SDK

const { run } = require("../client");

run.workflow("SimpleEtl");

This type of ETL workflows are usually scheduled periodically, see our schedule feature.

The first thing the workflow will do, is to call SWAPI using the http connector

It is a public API so we won't need to provide any credentials.

module.exports.handle = function*() {
  const http = this.connector("http");

  const films = (yield http.get("https://swapi.co/api/films/")).data.results;
}

You can follow the real-time execution on your dashboard.

To see the details of the API request, click on the first "http:get" step to see the response of the SWAPI API call.

In the "Task's Output", under 'data > results', you will see the API result:

Then for the second phase "Transformation" it's mostly plain javascript using the map, filter, reduce functions to run calculations.

 // Extract characters ids of all films, some can appear multiple times.
  const characters = films.flatMap(x => x.characters);


  // Count the number of films per characters ids.
  const nbMoviesByCharacter = characters.reduce(
    (acc, c) => ({ ...acc, [c]: 1 + (acc[c] || 0) }),
    {}
  );

  // Filter the characters ids to keep only those that appears in at least that 4 movies.
  const famousIds = Object.keys(nbMoviesByCharacter)
    .map(uri => ({ uri, nb: nbMoviesByCharacter[uri] }))
    .filter(x => x.nb >= 4);

Now that we have extracted the ids of the most famous characters, let's get some details for each of them using the Swapi API.

Here it's a simple loop that call the SWAPI API for every famous character's id.

// Get details on those most famous characters.
  let famousCharacters = [];
  for (const c of famousIds) {
    // Get the details about a given character from the API
    const characterData = (yield http.get(c.uri)).data;
    famousCharacters.push({
      birth_year: characterData.birth_year,
      name: characterData.name,
      height: characterData.height,
      movies: nbMoviesByCharacter[c.uri]
    });
  }

On the dashboard, you can see an "http:get" step for each API calls, so you can see the details of the request and the response.

Here we have the details of the characters with the id "2".

2) Send data to GoogleSheet (optional)

Now we will push the results to GoogleSheet using the Zenaton API connector function.

To set up a connection to your GoogleSheets account, go to the connectors section of your dashboard : search for "sheet", and click the "Add" button.

Then follow the authentication prompts for OAuth. Google sheets will appear in your list of connectors.

The ID 6cf48070-1b2e-11ea-b11c-c54519488059 will be your GoogleSheets connector id and Zenaton will manage the authentication for all API calls that you make.

Then get the id of a GoogleSheet spreadsheet, from it's url ( 13AJpMxxxxxxxxxxxxxxxxxxxxxx in the example below)

https://docs.google.com/spreadsheets/d/13AJpMxxxxxxxxxxxxxxxxxxxxxx/edit

Uncomment the body of the sendToGoogleSheet function and set the googleSheetConnectorId and spreadsheetId variables at the top of the code in the workflow/SimpleEtl.js file.

Here is the snippet of code in the workflow that pushes the data to GoogleSheets:

const googleSheetConnectorId= "6cf48070-1b2e-11ea-b11c-c54519488059";
const googleSheets = this.connector("google_sheets", googleSheetConnectorId);
const spreadsheetId = "13AJpMxxxxxxxxxxxxxxxxxxxxxx";

const headers = ["Birth year", "Name", "Height", "Movies"];
const cells = famousCharacters.map(x => Object.values(x));

// Send to Google Sheet
const sendToGoogleSheet = function*(googleSheets, headers, cells) {
  googleSheets.post(`/${spreadsheetId}/values:batchUpdate`, {
    body: {
      valueInputOption: "USER_ENTERED",
      data: [{
          range: "Sheet1!A1:D40",
          majorDimension: "ROWS",
          values: [headers].concat(cells)
      }]
    }
  });
};

Now launch the workflow again and view the "google_sheets:post" with the details of the API call

Then check the GoogleSheet to see the report.

3) Send a report by email using Sendgrid (optional)

Set up the Sendgrid connector from your dashboard as you did with GoogleSheets.

Uncomment the body of the `sendByEmail` function and set the `sendgridConnectorId` and `yourEmail` variables at the top of the code in the workflow/SimpleEtl.js file.

const sendgrid = this.connector("sendgrid", sendgridConnectorId);

const sendByEmail = function*(sendGrid, from, to, headers, cells) {
  const payload = {
    body: {
      personalizations: [{ to: [{ email: to }] }],
      content: [{
          type: "text/plain",
          value: "Here is your result: \n" + JSON.stringify([headers].concat(cells), null ,2)
      }],
      subject: "SW results",
      from: { email:  from}
    }
  };

  sendGrid.post("/mail/send", payload);
};

Now launch the workflow again, and view the API request/response details on the dashboard in the step called "sendgrid:post"

Then check your email for the report.

Workflow code

View the Github project. View all of the files, fork the project and deploy to heroku in a few clicks.

const googleSheetConnectorId = "";
const spreadsheetId = "";
const sendgridConnectorId = "";
const yourEmail = "";
const emailFrom = ["zenaton-tutorial@zenaton.com](mailto:%22zenaton-tutorial@zenaton.com)";

module.exports.handle = function*() {
const http = this.connector("http");
const googleSheets = this.connector("google_sheets", googleSheetConnectorId);
const sendgrid = this.connector("sendgrid", sendgridConnectorId);

/*
EXTRACT Phase
*/

// Get the list of Star wars films including the characters list as ids.
const films = (yield http.get("[https://swapi.co/api/films/](https://swapi.co/api/films/)")).data.results;

// Extract characters ids of all films, some can appear multiple times.
const characters = films.reduce((acc, x) => [...acc, ...x.characters], []);
/*
TRANSFORM Phase
*/

// Count the number of films per characters ids.
const nbMoviesByCharacter = characters.reduce(
(acc, c) => ({ ...acc, [c]: 1 + (acc[c] || 0) }),
{}
);

// Filter the characters ids to keep only those that appears in at least that 4 movies.
const famousIds = Object.keys(nbMoviesByCharacter)
.map(uri => ({ uri, nb: nbMoviesByCharacter[uri] }))
.filter(x => x.nb >= 4);

// Get details on those most famous characters.
let famousCharacters = [];
for (const c of famousIds) {
// Get the details about a given character from the API
const characterData = (yield http.get(c.uri)).data;
famousCharacters.push({
birth_year: characterData.birth_year,
name: [characterData.name](http://characterdata.name/),
height: characterData.height,
movies: nbMoviesByCharacter[c.uri]
});
}

/*
LOAD phase
*/

// Save da to GoogleSheet
const headers = ["Birth year", "Name", "Height", "Movies"];
const cells = famousCharacters.map(x => Object.values(x));

// send results to google sheets
yield* sendToGoogleSheet(googleSheets, headers, cells);

// send results by email using sendgrid
yield* sendByEmail(sendgrid, emailFrom, yourEmail, headers, cells);
};

// Send to Google Sheet Task
const sendToGoogleSheet = function*(googleSheets, headers, cells) {
/*
googleSheets.post(`/${spreadsheetId}/values:batchUpdate`, {
body: {
valueInputOption: "USER_ENTERED",
data: [{
range: "Sheet1!A1:D40",
majorDimension: "ROWS",
values: [headers].concat(cells)
}]
}
});
*/
};

// Send By Email Task
const sendByEmail = function*(sendGrid, from, to, headers, cells) {
/*
const payload = {
body: {
personalizations: [{ to: [{ email: to }] }],
content: [{
type: "text/plain",
value: "Here is your result: \n" + JSON.stringify([headers].concat(cells), null ,2)
}],
subject: "SW results",
from: { email: from}
}
};

sendGrid.post("/mail/send", payload);
*/
};