Data Engineers | Dataform

Build SQL pipelines that scale

Create and manage a centralized data model the same way you build software. SQLX extends the BigQuery SQL dialect with features for dependency management, testing, documentation, and more.

Where Dataform fits in your stack

Dataform lets your team manage every transformation in your warehouse, turning raw data into the clean datasets you need for analytics.

How does SQLX work?

SQLX is an open source extension of SQL that adds features to make development faster, more reliable, and more scalable.

Define data with SQLX


config { type: "view" }

select
  country,
  device_type,
  sum(page_views) as page_views,
  sum(sessions)   as sessions
from ${ref("sessions")}
group by 1, 2

Define everything as a select statement and tell Dataform which type of relation to create, such as a table, view, or incremental table. Dataform generates the create, insert, and drop boilerplate for you.

Manage dependencies


select
  *
from ${ref("users")}
left join ${ref("sessions")} using (user_id)
left join ${ref("pageviews")} using (user_id)

Reference other datasets with Dataform's built-in ref function. Dataform uses these references to build the dependency graph between datasets and actions automatically, so each action runs after the ones it depends on.
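Raw source tables that Dataform does not create itself, such as a users table loaded by an external tool, can be made available to ref by declaring them. The sketch below is a declaration file that assumes a hypothetical GCP project ID and a hypothetical BigQuery dataset named segment; with it in place, ref("users") resolves to that table and appears in the dependency graph as a source.

```sqlx
config {
  // Declares an existing table so other files can ref() it; nothing is created or run.
  type: "declaration",
  database: "my-gcp-project", // hypothetical GCP project ID
  schema: "segment",          // hypothetical BigQuery dataset holding raw Segment data
  name: "users"               // referenced elsewhere as ref("users")
}
```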

Test data quality


config {
  type: "view",
  assertions: {
    uniqueKeys: [
      ["column1"],
      ["column1", "column2"]
    ],
    nonNull: ["column1"],
    rowConditions: [
      "column1 > 0",
      "column2 is null or column2 >= column1"
    ]
  }
}

Define data quality checks in the same file as the model you're testing, and receive alerts automatically when those checks fail.
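Checks that don't fit the built-in options can be written as a standalone assertion file: a query that selects the rows violating a rule, which passes only when it returns no rows. A minimal sketch, assuming the sessions dataset from the first example has a numeric sessions column; the rule itself (no negative session counts) is illustrative only.

```sqlx
config { type: "assertion" }

-- Each returned row is a violation; the assertion fails if any rows come back.
select *
from ${ref("sessions")}
where sessions < 0
```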

Write documentation

config {
  type: "view",
  description: "Stats about app sessions",
  columns: {
    country: "Country of the user",
    device_type: "Either 'mobile', 'desktop', or 'tablet'"
  }
}

Write data documentation that is automatically included in Dataform's data catalog. Share it with your team, or integrate it with other data cataloging tools.

Dataform deployment options


Dataform Web (hosted)

Dataform Web is a fully managed, web-based interface for developing, deploying, and managing Dataform projects. It is free to use and saves you the hassle of managing your own infrastructure.


Dataform CLI (open source)

The core Dataform framework is open source and bundled with a command-line tool that can initialize, test, and run Dataform projects. The Dataform CLI is useful for local development and can also be integrated into an Airflow pipeline with Airflow's BashOperator.

Change-driven, observable data pipelines

Dataform detects changes in your source data and updates downstream datasets only when needed, which reduces end-to-end data latency and saves on BigQuery costs.

Figure: example pipeline with sources (contacts from Salesforce; identifies, tracks, and users from Segment) and Dataform datasets (customers, customer_stats, domain_stats, daily_customer_stats), each labeled with its run status (updated, source, cached, success, error, or running).

Cut BigQuery costs with run caching

With run caching enabled, Dataform updates a dataset only if its definition or its source data has changed, keeping your pipelines fast and reducing BigQuery costs.


Designed for enterprise scale

Dataform is designed to scale to thousands of data models and can compile your entire project into SQL in seconds, giving you real-time feedback on your code even as your team grows and your project becomes more complex.

A wealth of learning resources

Join our growing community of data leaders who use Dataform to empower their teams, streamline workflows, and answer their toughest questions.

Powerful and extensible data modeling

Designed for modern, engineering-driven data teams on BigQuery.

Incremental table loads

Save time and money with incremental table loads that process only new data.
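A sketch of an incremental table in SQLX, assuming the sessions dataset has a timestamp column (hypothetical here). On the first run Dataform builds the full table; on later runs the when(incremental(), ...) clause adds a filter so that only rows newer than those already in the table, referenced through self(), are processed and inserted.

```sqlx
config { type: "incremental" }

select
  timestamp,
  user_id,
  country
from ${ref("sessions")}
-- Applied only on incremental runs: skip rows already loaded into this table.
${when(incremental(), `where timestamp > (select max(timestamp) from ${self()})`)}
```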

Reusable SQL snippets

Write common, parameterized SQL code once and reuse it across multiple queries.
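One way to share a snippet is a JavaScript file in the project's includes/ folder, whose exports Dataform makes available in every SQLX file under that file's name. The sketch below assumes a hypothetical includes/mapping.js exporting a country_group function that returns a SQL case expression; the helper's contents are shown in the comment, and the grouping logic is illustrative only.

```sqlx
config { type: "view" }

-- Assumes a hypothetical includes/mapping.js containing:
--   function country_group(column) {
--     return "case when " + column + " in ('US', 'CA') then 'North America' else 'Other' end";
--   }
--   module.exports = { country_group };

select
  ${mapping.country_group("country")} as country_group,
  sum(sessions) as sessions
from ${ref("sessions")}
group by 1
```

Because the helper returns a plain SQL string, any other SQLX file can call mapping.country_group on a different column and get the same logic.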

Extensible APIs

Use the JavaScript API and custom SQL operations to extend Dataform to fit your own requirements.
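Custom SQL operations run hand-written SQL that Dataform doesn't generate for you. A minimal sketch, assuming the goal is a table of distinct countries built with explicit DDL: the operations type runs the statements as written, and hasOutput: true tells Dataform the file creates a table at self(), so other files can ref it by this file's name.

```sqlx
config {
  type: "operations",
  hasOutput: true // this file creates a table that other files can ref() by name
}

-- Hand-written SQL; self() resolves to the fully qualified name of this file's output.
create or replace table ${self()} as
select distinct country
from ${ref("sessions")}
```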

SQL unit testing

For your most critical SQL queries and business logic, write unit tests that run against mocked data sources.
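A sketch of a SQLX unit test for the view in the first example, assuming that view's file is named session_stats.sqlx (a hypothetical name). The input block replaces ref("sessions") with mock rows, and the final query holds the rows the view is expected to produce; the mock values are made up for illustration.

```sqlx
config {
  type: "test",
  dataset: "session_stats" // hypothetical name of the view under test
}

input "sessions" {
  -- Mock rows that replace ref("sessions") while the test runs.
  select "GB" as country, "mobile" as device_type, 3 as page_views, 1 as sessions union all
  select "GB" as country, "mobile" as device_type, 5 as page_views, 2 as sessions union all
  select "US" as country, "desktop" as device_type, 4 as page_views, 1 as sessions
}

-- Expected output: the two GB mobile rows are summed (3 + 5 page views, 1 + 2 sessions).
select "GB" as country, "mobile" as device_type, 8 as page_views, 3 as sessions union all
select "US" as country, "desktop" as device_type, 4 as page_views, 1 as sessions
```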

CI/CD integration

Use Dataform's pre-built Docker images to integrate testing and deployment into your own CI/CD tools.

Snapshots

Create snapshots of datasets every day, week, or month for easy archival.
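One way to keep daily snapshots, sketched here with an append-only incremental table rather than any dedicated snapshot feature, is to select the source with the current date attached and run the file on a daily schedule. The sketch assumes a customers dataset exists; the snapshot_date column name is hypothetical.

```sqlx
config { type: "incremental" }

-- Each run appends a full copy of customers stamped with today's date.
select
  current_date() as snapshot_date,
  *
from ${ref("customers")}
-- On incremental runs, insert nothing if today's snapshot is already present.
${when(incremental(), `where current_date() > (select max(snapshot_date) from ${self()})`)}
```

For weekly or monthly snapshots, the same pattern could stamp and compare date_trunc(current_date(), week) or date_trunc(current_date(), month) instead of current_date().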