Product Update

Using environments with SQL pipelines

At Dataform, we believe that analytics should follow software engineering best practices and therefore use multiple deployment environments to test SQL code.


Josie Hall on March 13, 2020

A recommended practice for any software development process is the use of multiple deployment environments. Using multiple environments ensures that your code is well tested before it is deployed to production and made available to the end user.

An example setup would have three main environments:

  • Development: The development environment is your local machine. This is where you write and test newly implemented features; any bugs found are fixed before re-deploying for further testing.
  • Staging: The staging environment mirrors production as closely as possible. The code runs on a server rather than a local machine, and configuration changes or major version updates can be tested here before release.
  • Production: The live environment. This is where end users access the final code after all updates have been tested.

At Dataform, we believe that analytics should follow software engineering best practices, and that SQL-based pipelines should therefore be subject to the same rigorous testing process.

Environments in Dataform

By default, Dataform runs all of your project code from your project's master Git branch. Configuring environments allows you to control this behaviour, enabling you to run multiple different versions of your project code and run schedules against them.

An environment is effectively a wrapper around a version of your project code. Once you have defined an environment, and added some schedules to that environment, Dataform runs all of those schedules using the environment's version of the project code.

Using separate databases for development and production data

Some teams may not be at the stage where they require a staging environment, but still would like to keep development and production data separated. A clean and simple way to do this is to use a different database for each environment. This can be done using a configOverride in the production environment.

In the example below:

  • any code deployed during development will use the defaultDatabase from the dataform.json file
  • schedules are defined in the production environment, and so use the defaultDatabase from that environment's configOverride
  • the production environment specifies the master Git branch, so all of its schedules will run using the latest version of the code

dataform.json:

{
  "warehouse": "bigquery",
  "defaultSchema": "dataform_data",
  "defaultDatabase": "analytics-development"
}

environments.json:

{
  "environments": [
    {
      "name": "production",
      "gitRef": "master",
      "configOverride": { "defaultDatabase": "analytics-production" },
      "schedules": [ ... ]
    }
  ]
}

Note that the defaultDatabase setting is only supported for BigQuery and Snowflake. For other warehouses, we recommend overriding schema suffixes instead (read the docs here).
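For those other warehouses, the same pattern can be expressed as a suffix override in environments.json. The sketch below assumes a schemaSuffix setting analogous to defaultDatabase; check the docs for the exact option name supported by your warehouse.

```json
{
  "environments": [
    {
      "name": "production",
      "gitRef": "master",
      "configOverride": { "schemaSuffix": "prod" },
      "schedules": [ ... ]
    }
  ]
}
```

With this setup, development runs write to the plain schemas defined in your project, while production schedules write to suffixed schemas (for example, dataform_data_prod), keeping the two sets of data apart within a single database.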

Managing a production release process using environments

For more advanced Dataform projects, a common use case for environments is running a staged release process. All changes to project code go into a staging environment, which is intentionally kept separate from production. Once the code in staging has been verified to be sufficiently stable, that version of the code is promoted to the production environment.
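One way to set this up is with two environments in environments.json, each pinned to its own Git branch; the staging branch name and database names below are illustrative:

```json
{
  "environments": [
    {
      "name": "staging",
      "gitRef": "staging",
      "configOverride": { "defaultDatabase": "analytics-staging" },
      "schedules": [ ... ]
    },
    {
      "name": "production",
      "gitRef": "master",
      "configOverride": { "defaultDatabase": "analytics-production" },
      "schedules": [ ... ]
    }
  ]
}
```

Promoting a release then amounts to merging the staging branch into master: because the production environment tracks master, its schedules pick up the newly promoted code on their next run.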

Read more about how to configure environments in our docs.

As always, if you have any feedback on features, please message us on Slack or email us at team@dataform.co.
