You can’t make informed decisions if you don’t trust your underlying data. Dataform helps your team automate data quality testing to ensure that data in BigQuery is accurate for all your analytics projects.
Test for nulls, uniqueness, expected values or custom logic against all your columns in the same file as your SQL code.
Write tests against any error condition by writing custom assertions in SQL.
Optionally add assertions as dependencies to your tables and views to ensure bad data doesn’t get propagated further down your pipeline.
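config {
  type: "table",
  assertions: {
    // Each entry is a set of columns whose combined values must be unique.
    uniqueKeys: [
      ["column1"],
      ["column1", "column2"]
    ],
    // These columns must never be null.
    nonNull: ["column1"],
    // Every row must satisfy each of these SQL conditions.
    rowConditions: [
      "column1 > 0",
      "column2 is null or column2 >= column1"
    ]
  }
}

select * from ...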
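Custom assertions and assertion dependencies, mentioned above, can be sketched in the same SQLX style. This is a minimal illustration, not a definitive setup: the file names, the "orders" table, and its columns are hypothetical. It assumes the standard Dataform behaviour that an assertion fails when its query returns any rows.

```sqlx
-- definitions/assert_no_negative_amounts.sqlx (hypothetical file name)
config { type: "assertion" }

-- Any row returned by this query counts as a failure.
select
  order_id,
  amount
from ${ref("orders")}  -- "orders" is a hypothetical table
where amount < 0
```

A downstream table can then list that assertion as a dependency, so it is only built once the check has passed:

```sqlx
-- definitions/daily_revenue.sqlx (hypothetical file name)
config {
  type: "table",
  // Run only after the assertion above has passed.
  dependencies: ["assert_no_negative_amounts"]
}

select
  order_date,
  sum(amount) as revenue
from ${ref("orders")}
group by order_date
```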
Being able to produce analytics tables that we are confident in the output of (because of assertions) and are as up to date as we need them to be (because of scheduling) makes our lives really easy. The UI is incredibly easy and intuitive to use, meaning we spend little of our time setting these things up, and most of our time writing SQL!
I love the dependency tree in Dataform. For me this is a central place for sanity checking my data flows, understanding if I'm reimplementing a dataset which already exists, and verifying logic. Secondly, I love SQLX for generating SQL of a similar structure again and again; it really speeds up development and lets you abstract away logic.
Having modeled data using other tools in the past, this is much simpler and an easier environment to code in. The code compiles in real time and lets you know if there are errors in the syntax. It also helps generate a dependency graph for the data pipeline which is insanely useful.