
CI/CD for ETL/ELT SQL pipelines

Configure CI/CD workflows for your Dataform SQL pipelines to prevent project breakages.

Ben Birt on June 5, 2020

One of Dataform’s key motivations has been to bring software engineering best practices to teams building ETL/ELT SQL pipelines. To further that goal, we recently launched support for you to run Continuous Integration (CI) checks against your Dataform projects.

What is CI/CD?

CI/CD is a set of processes which aim to help teams ship software quickly and reliably.

Continuous integration (CI) checks automatically verify that all changes to your code work as expected, and typically run before the change is merged into your Git master branch. This ensures that the version of the code on the master branch always works correctly.

Continuous deployment (CD) tools automatically (and frequently) deploy the latest version of your code to production. This is intended to minimize the time it takes for new features or bugfixes to be available in production.

CI/CD for Dataform projects

Dataform already does most of the CD gruntwork for you. By default, all code committed to the master branch is automatically deployed. For more advanced use cases, you can use environments to configure exactly what gets deployed, and when.

CI checks, however, are usually configured as part of your Git repository (usually hosted on GitHub, though Dataform supports other Git hosting providers).

How to configure CI checks

Dataform distributes a Docker image that can be used to run Dataform CLI commands. For most CI tools, this Docker image is what you'll use to run your automated checks.
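As an illustration, you can run the same image locally to check a project before pushing. The image name and tag below match the CI workflow later in this post; the volume mount and working directory are assumptions about your local project layout:

```shell
# Run "dataform compile" against the project in the current directory,
# using the same Docker image the CI workflow uses. The mount point
# (/project) is an arbitrary choice for this example.
docker run --rm \
  -v "$(pwd)":/project \
  -w /project \
  dataformco/dataform:1.6.11 compile
```

A non-zero exit code from the container indicates a failure, which is exactly what your CI tool keys off.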

If you host your Dataform Git repository on GitHub, you can use GitHub Actions to run CI workflows. This post assumes you’re using GitHub Actions, but other CI tools are configured in a similar way.

Here’s a simple example of a GitHub Actions workflow for a Dataform project. Once you put this in a .github/workflows/<some filename>.yaml file, GitHub will run the workflow on each pull request and commit to your master branch.

```yaml
name: CI

on:
  push:
    branches:
      - master
  pull_request:
    branches:
      - master

jobs:
  run:
    runs-on: ubuntu-latest
    steps:
      - name: Checkout code into workspace directory
        uses: actions/checkout@v2
      - name: Install project dependencies
        uses: docker://dataformco/dataform:1.6.11
        with:
          args: install
      - name: Run dataform compile
        uses: docker://dataformco/dataform:1.6.11
        with:
          args: compile
```

This workflow runs dataform compile: if the project fails to compile, the workflow fails, and the failure is surfaced in the GitHub UI.

Note that it’s possible to run any dataform CLI command in a CI workflow. However, some commands do need credentials in order to run queries against your data warehouse. In these circumstances, you should encrypt those credentials and commit the encrypted file to your Git repository. Then, in your CI workflow, you decrypt the credentials so that the Dataform CLI can use them.
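One common way to do this (a sketch, not the only supported approach) is symmetric encryption with gpg: encrypt the credentials file locally, commit only the encrypted file, store the passphrase as a CI secret, and decrypt at the start of the workflow. The filenames and the CREDENTIALS_PASSPHRASE variable below are hypothetical:

```shell
# In CI, CREDENTIALS_PASSPHRASE would come from a repository secret;
# here we fall back to a stand-in value so the example runs end to end.
CREDENTIALS_PASSPHRASE="${CREDENTIALS_PASSPHRASE:-example-passphrase}"

# Create a stand-in credentials file (the filename and content are
# hypothetical; real credentials come from your warehouse setup).
echo '{"projectId": "my-project"}' > df-credentials.json

# One time, locally: encrypt the file and commit only the .gpg output.
gpg --batch --yes --symmetric --cipher-algo AES256 \
    --pinentry-mode loopback --passphrase "$CREDENTIALS_PASSPHRASE" \
    --output df-credentials.json.gpg df-credentials.json
rm df-credentials.json

# In the CI workflow: decrypt so the Dataform CLI can read the file.
gpg --quiet --batch --yes --decrypt \
    --pinentry-mode loopback --passphrase "$CREDENTIALS_PASSPHRASE" \
    --output df-credentials.json df-credentials.json.gpg
```

With the decrypted file in place, a subsequent workflow step can run warehouse-facing commands such as dataform run.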

We recently hosted an office hours session where we talked through this process in more detail.

For further details on configuring CI/CD for your Dataform SQL pipelines, please see our docs. As always, if you have any questions, or would like to get in touch with us, please send us a message on Slack!

