CI/CD for ETL/ELT SQL pipelines

Configure CI/CD workflows for your Dataform SQL pipelines to prevent project breakages.


Ben Birt on June 5, 2020

One of Dataform’s key motivations has been to bring software engineering best practices to teams building ETL/ELT SQL pipelines. To further that goal, we recently launched support for you to run Continuous Integration (CI) checks against your Dataform projects.

What is CI/CD?

CI/CD is a set of processes which aim to help teams ship software quickly and reliably.

Continuous integration (CI) checks automatically verify that all changes to your code work as expected, and typically run before the change is merged into your Git master branch. This ensures that the version of the code on the master branch always works correctly.

Continuous deployment (CD) tools automatically (and frequently) deploy the latest version of your code to production. This is intended to minimize the time it takes for new features or bugfixes to be available in production.

CI/CD for Dataform projects

Dataform already does most of the CD gruntwork for you. By default, all code committed to the master branch is automatically deployed. For more advanced use cases, you can configure exactly what gets deployed, and when, using environments.

CI checks, however, are usually configured as part of your Git repository (usually hosted on GitHub, though Dataform supports other Git hosting providers).

How to configure CI checks

Dataform distributes a Docker image which can be used to run the equivalent of Dataform CLI commands. For most CI tools, this Docker image is what you'll use to run your automated checks.
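For example, you could invoke the same image locally to reproduce what a CI step would run. This is a sketch, not from the original post: it assumes the image's entrypoint is the dataform binary (as the Actions configuration below implies, since it passes `args: compile` straight through), and the mount path is arbitrary.

```shell
# Sketch: run the Dataform CLI through the distributed Docker image.
# Mount the project directory into the container and run `compile` from there.
docker run --rm -v "$PWD":/project -w /project \
  dataformco/dataform:1.6.11 compile
```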

If you host your Dataform Git repository on GitHub, you can use GitHub Actions to run CI workflows. This post assumes you’re using GitHub Actions, but other CI tools are configured in a similar way.

Here’s a simple example of a GitHub Actions workflow for a Dataform project. Once you put this in a .github/workflows/<some filename>.yaml file, GitHub will run the workflow on each pull request and commit to your master branch.

name: CI
on:
  push:
    branches:
      - master
  pull_request:
    branches:
      - master
jobs:
  run-checks:
    runs-on: ubuntu-latest
    steps:
      - name: Checkout code into workspace directory
        uses: actions/checkout@v2
      - name: Install project dependencies
        uses: docker://dataformco/dataform:1.6.11
        with:
          args: install
      - name: Run dataform compile
        uses: docker://dataformco/dataform:1.6.11
        with:
          args: compile

This workflow runs dataform compile: if the project fails to compile, the workflow fails, and the failure is reflected in the GitHub UI.

Note that it’s possible to run any dataform CLI command in a CI workflow. However, some commands need credentials in order to run queries against your data warehouse. In that case, you should encrypt those credentials and commit the encrypted file to your Git repository. Then, in your CI workflow, decrypt the credentials so that the Dataform CLI can use them.
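As a sketch of what that decryption step might look like in a GitHub Actions workflow — the use of gpg, the file name .df-credentials.json.gpg, and the secret name here are illustrative assumptions, not part of the original post:

```yaml
# Hypothetical step, placed before any step that queries the warehouse.
# Assumes the credentials file was encrypted with symmetric GPG and committed
# as .df-credentials.json.gpg, with the passphrase stored as a repo secret.
- name: Decrypt warehouse credentials
  run: |
    gpg --quiet --batch --yes --decrypt \
      --passphrase="$CREDENTIALS_PASSPHRASE" \
      --output .df-credentials.json .df-credentials.json.gpg
  env:
    CREDENTIALS_PASSPHRASE: ${{ secrets.CREDENTIALS_PASSPHRASE }}
```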

We recently hosted an office hours session where we talked through this process in more detail.

For further details on configuring CI/CD for your Dataform SQL pipelines, please see our docs. As always, if you have any questions, or would like to get in touch with us, please send us a message on Slack!

