Product Update

Using environments with SQL pipelines

At Dataform, we believe that analytics should follow software engineering best practices, which is why we use multiple deployment environments to test SQL code.

Features

Josie Hall on March 13, 2020

A recommended practice for any software development process is the use of multiple deployment environments. Using multiple environments ensures that your code is well tested before it is deployed to production and made available to the end user.

An example setup would have three main environments:

  • Development: The development environment is your local machine. This is where you deploy code and test newly implemented features; any bugs found are fixed before re-deploying for further testing.
  • Staging: The staging environment mirrors the production environment as closely as possible. The code runs on a server rather than a local machine, so configuration changes and major version updates can be tested here.
  • Production: The production environment is what people mean when they talk about making a project "live". It is where users access the final code, after all updates and testing are complete.

At Dataform, we believe that analytics should follow software engineering best practices, and therefore that SQL-based pipelines should be subject to the same rigorous testing process.

Environments in Dataform

By default, Dataform runs all of your project code from your project's master Git branch. Configuring environments allows you to control this behaviour, enabling you to run multiple different versions of your project code and run schedules against them.

An environment is effectively a wrapper around a version of your project code. Once you have defined an environment, and added some schedules to that environment, Dataform runs all of those schedules using the environment's version of the project code.
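As a rough sketch of this idea, an entry in environments.json names an environment, points it at a version of the code via a Git reference, and attaches the schedules that should run against it. The branch and schedule shown here are illustrative placeholders, not values from a real project:

```json
{
  "environments": [
    {
      "name": "staging",
      "gitRef": "staging",
      "schedules": [
        { "name": "nightly", "cron": "0 2 * * *" }
      ]
    }
  ]
}
```

With this in place, the "nightly" schedule would run using whatever code is currently on the staging branch, independently of master.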

Using separate databases for development and production data

Some teams may not be at the stage where they require a staging environment, but would still like to keep development and production data separated. A clean and simple way to do this is to use a different database for each environment. This can be done using a `configOverride` in the production environment.

In the example below:

  • any code deployed during development will use the `defaultDatabase` from the dataform.json file
  • schedules are defined in the production environment, and so use the `defaultDatabase` from that environment's `configOverride`
  • the production environment specifies the master Git branch, so all of its schedules will run using the latest version of the code

dataform.json:

{
  "warehouse": "bigquery",
  "defaultSchema": "dataform_data",
  "defaultDatabase": "analytics-development"
}

environments.json:

{
  "environments": [
    {
      "name": "production",
      "gitRef": "master",
      "configOverride": { "defaultDatabase": "analytics-production" },
      "schedules": [ ... ]
    }
  ]
}

Note that the `defaultDatabase` setting is only supported for BigQuery and Snowflake. For other warehouses, we recommend overriding schema suffixes instead (see our docs for details).
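For warehouses without a `defaultDatabase`, a similar separation can be sketched by overriding the schema suffix in the production environment instead, so that development and production datasets land in differently suffixed schemas. The `schemaSuffix` field and suffix value below are illustrative; check the docs for the exact override fields your warehouse supports:

```json
{
  "environments": [
    {
      "name": "production",
      "gitRef": "master",
      "configOverride": { "schemaSuffix": "prod" },
      "schedules": [ ... ]
    }
  ]
}
```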

Managing a production release process using environments

For more advanced Dataform projects a common use-case for environments is to run a staged release process. All changes to project code go into a staging environment which is intentionally kept separate from production. Once the code in staging has been verified to be sufficiently stable, that version of the code is then promoted to the production environment.
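One way such a setup could look, sketched with illustrative branch names and database names: the staging environment tracks a staging branch that receives all changes, while the production environment is pinned to a separate reference that is only moved forward once staging has been verified.

```json
{
  "environments": [
    {
      "name": "staging",
      "gitRef": "staging",
      "configOverride": { "defaultDatabase": "analytics-staging" },
      "schedules": [ ... ]
    },
    {
      "name": "production",
      "gitRef": "production",
      "configOverride": { "defaultDatabase": "analytics-production" },
      "schedules": [ ... ]
    }
  ]
}
```

Promoting a release then amounts to updating the production environment's Git reference to the verified version of the code.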

Read more about how to configure environments in our docs.

As always, if you have any feedback on features, please message us on Slack or email us at dataform-support@google.com.
