Using Javascript to Create an Acquisition Funnel Dataset | Dataform
Tutorial

Using Javascript to Create an Acquisition Funnel Dataset

How to use a combination of SQL and Javascript to create an acquisition funnel dataset.

Javascript
picture of author

Emily Wilson @ Echo on November 9, 2020

I work as a data analyst on the growth team at Echo, where a key goal is to improve signup conversion rates. We use acquisition funnel datasets to get visibility into where users are dropping off in the sign up process. To manually write all of the SQL queries needed for this would be time consuming and difficult to manage. Using a combination of SQL and JavaScript makes this process much easier.

I use Dataform to do this, which provides a place to write and save the queries, and runs them on schedules which publish up to date data every hour to a table in our warehouse (BigQuery). This table can then be queried for analyses and to create visualisations.

This post aims to explain the steps to create this dataset.

Step 1: Define the Funnel Steps

Firstly, we need to set up a data structure that defines each stage of the funnel.

In a file called funneldata.js in the includes folder, a list is created that will contain one entry for each funnel. Each funnel has a name (e.g. signup_funnel), and a list of steps (e.g. landing, start_signup, verify_email). Each step has a name, and a list of tracks that indicate the user has reached that stage of the funnel. In our case, we use specific frontend tracks that we get from Segment. Some steps have multiple tracks because there are multiple ways a user could reach that step - for example on the landing step, users can either reach it through landing on the homepage, on web, or installing the app on mobile.

const funnels = [
  {
    name: "signup_funnel",
    steps: [
      {
        name: "landing",
        events: ["application_installed", "home_page"]
      },
      {
        name: "start_signup",
        events: ["signup_started"]
      },
      {
        name: "verify_email",
        events: ["email_verified"]
      },
      {
        name: "signup_complete",
        events: ["signup_complete"]
      }
    ]
  }
];

module.exports = funnels;

Step 2: Create One Table per Step

Next we create another file called funnel_steps.js, in the definitions folder. This creates one table per funnel step, which has one row per user, and records when they first reached that step of the funnel.

  1. We use the ForEach JavaScript function to loop through each funnel (we only have one for now, but we may add more later) and then loop through each step in the funnel.
funnels.forEach(funnel => {
    funnel.steps.forEach((step, i) => {
  1. For each step, we use the publish and config functions (Dataform-specific functions) to create a table in our data warehouse with a unique name (the funnel name & step name combined).
  publish(`${funnel.name}_${step.name}`).config({
      type: "table",
      schema: "funnel_data"
    }).query(
    ctx => {
  1. We add a variable to the SQL query which means for every step after the first, the where clause will have an additional filter on the user id being in the previous step. This ensures that the funnel is sequential, i.e. users have to have hit the first step to be recorded in the second step, and so on. The ${ctx.ref( )} syntax is another Dataform-specific feature, which ensures that the previous step's table is published before the current table, to ensure data is up to date.
let where = "";
if (i > 0) {
  let previousStep = `${funnel.name}_${funnel.steps[i - 1].name}`;
  where = `and user_id in (SELECT user_id from ${ctx.ref(previousStep)})`;
}
  1. Lastly we write the SQL query, plugging in the variables. The bulk of the query is hard-coded as it’s the same for each funnel step. The variables are the parts that can change for each step - the step name, step number, the frontend tracks for the step that we defined in funneldata.js, and the optional additional filter using the where clause.
 return `
            SELECT * EXCEPT(row_number) FROM (
	            SELECT
                user_id,
                "${step.name}" as step_name,
                ${i} as step_number,
                timestamp as step_started_at,
                row_number() OVER (PARTITION BY user_id ORDER BY timestamp asc) as row_number
	            FROM ${ctx.ref("segment")}
	            where lower(event) in ("${step.events.join('", "')}")
	            ${where}
	            GROUP BY 1,2,3,4
		          )
            where row_number = 1
				`
            }
        );
    });
})

This is the SQL that the JavaScript generates for the first step:

SELECT * EXCEPT(row_number) FROM (
  SELECT
    user_id,
    "landing" as step_name,
    0 as step_number,
    timestamp as step_started_at,
    row_number() OVER (PARTITION BY user_id ORDER BY timestamp asc) as row_number
  FROM `api.segment`
  WHERE lower(event) in ("application_installed", "home_page")
  GROUP BY 1,2,3,4
  )
where row_number = 1

This is an example of the SQL generated by the JavaScript for the subsequent steps:

SELECT * EXCEPT(row_number) FROM (
	SELECT
	  user_id,
	  "start_signup" as step_name,
	  1 as step_number,
	  timestamp as step_started_at,
	  row_number() OVER (PARTITION BY user_id ORDER BY timestamp asc) as row_number
	FROM `api.segment`
	where lower(event) in ("signup_started")
		and user_id in (SELECT user_id from `funnel_data.signup_funnel_landing`)
	GROUP BY 1,2,3,4
	)
where row_number = 1

Step 3: Combine the Data Into One Funnel Table

Finally, we combine the data from each of the step tables into one table that summarises the funnel - this is the one we query for analyses. We do this in a file called funnel_combine.js, which needs to be in the definitions folder in your Dataform project.

Again, we start by referencing the funneldata.js file. Then we use the ForEach function to loop through each funnel (we only have one for now). A select * statement for each step is created, and they're joined with union all. The publish and config functions are again used to create a table in the warehouse.

const funnels = require('./funneldata.js');

funnels.forEach(funnel => {
    publish(`${funnel.name}`, {
        type: "table",
        schema: "funnel_test"
    })
    .query(ctx =>
      funnel.steps
      .map(step => `select * from ${ctx.ref(`${funnel.name}_${step.name}`)}`)
      .join(`\n union all \n`)
    );
})

This is the SQL that the Javascript generates:

 select * from `funnels.signup_funnel_landing`
  union all
select * from `funnels.signup_funnel_start_signup`
  union all
select * from `funnels.signup_funnel_verify_email`
  union all
select * from `funnels.signup_funnel_signup_complete`

Running the Javascript in the final file publishes a table that looks like this to the data warehouse:

An image showing what the joint funnel table looks like

This dataset can be used to quickly get insights on activity during sign up, such as what % of users that start the funnel get to each step, and how long users spend on each step. Future funnels can be created easily too - all that needs to be done is to define the steps in the funneldata.js file, and to run the funnel_combine.js script. Anyone that wants to create a funnel dataset can do so now - no prior SQL experience is needed! Dataform makes it super easy to use JavaScript to define simple scripts, constants or macros that can be reused across your project, saving you time writing out long SQL queries. Sign up for a free Dataform account here.

More content from Dataform

Dataform is joining Google Cloud

Announcement

Dataform is joining Google Cloud

Today we're excited to announce that we've joined Google Cloud. Over the course of the past several months, our partnership with Google Cloud has deepened and we believe that our new combined efforts can make our customers and partners even more successful.
Learn more
How to add a new warehouse adapter to Dataform illustration
Guide

How to add a new warehouse adapter to Dataform

I recently wrote a Presto adapter for Dataform during a week long quarterly hackathon. Here's a guide on how I'd do it again.
Learn more
Switching to a better data warehouse naming convention illustration
Opinion

Switching to a better data warehouse naming convention

We realised that our data warehouse was starting to become a mess and difficult to use. This post walks through the process we followed to decide upon a better solution, and then roll out the change.
Learn more

Learn more about the Dataform data modeling platform

Dataform brings open source tooling, best practices, and software engineering inspired workflows to advanced data teams that are seeking to scale, enabling you to deliver reliable data to your entire organization.