Snowflake is a SQL-based data warehouse with a unique architecture designed specifically for the cloud. Snowflake offers excellent support for semi-structured data such as JSON. Compute usage is billed per second, with a 60-second minimum, so you only pay for what you use. Dataform lets you manage all the data processes happening in your Snowflake warehouse, turning raw data into the datasets that power your company’s analytics.
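As a sketch of what that semi-structured support looks like, JSON loaded into a Snowflake VARIANT column can be queried directly with path notation and cast inline (the `raw.events` table and `payload` column below are hypothetical):

```sql
-- Query a hypothetical events table whose payload column holds JSON (VARIANT type)
select
  payload:user_id::string as user_id,            -- extract and cast a top-level field
  payload:properties.plan::string as plan,       -- drill into nested objects with dot paths
  event_timestamp
from raw.events
where payload:event_name::string = 'signup';
```

No upfront schema definition is needed; fields are extracted and typed at query time.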
The Dataform web IDE is natively integrated with GitHub and GitLab. Version controlling your SQL has never been easier: create branches, commit changes, revert files, and open pull requests without ever touching the command line.
Keep your Snowflake warehouse up to date with Dataform’s powerful scheduling features. Schedules can be triggered by API, by webhook, or at a time of your choosing. Success and failure alerts are sent to your team via Slack or email. Detailed run logs show exactly which SQL statements ran and when, making debugging simple. And our parallel execution strategy minimises schedule durations.
Dataform’s built-in SQLX functions enable Dataform to infer dependencies and automatically build the dependency graph for your data transformation pipeline.
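A minimal sketch of how this works in practice (table and column names are illustrative): referencing an upstream dataset with `ref()` both resolves to the fully-qualified table name at compile time and registers that dataset as a dependency, so Dataform can order executions automatically.

```sql
config { type: "table" }

-- ref("raw_orders") compiles to the full table name, and raw_orders is
-- recorded as an upstream dependency of this dataset in the graph.
select
  customer_id,
  sum(amount) as lifetime_value
from ${ref("raw_orders")}
group by customer_id
```

Because every dependency is declared through `ref()`, the full pipeline graph is derived from the code itself rather than maintained by hand.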
Being able to produce analytics tables that we are confident in the output of (because of assertions) and are as up to date as we need them to be (because of scheduling) makes our lives really easy. The UI is incredibly easy and intuitive to use, meaning we spend little of our time setting these things up, and most of our time writing SQL!
I love the dependency tree in Dataform. For me this is a central place for sanity-checking my data flows, understanding if I'm reimplementing a dataset which already exists, and verifying logic. Secondly, I love SQLX for generating SQL of a similar structure again and again; it really speeds up development and lets you abstract away logic.
Having modeled data using other tools in the past, this is a much simpler and easier environment to code in. The code compiles in real time and lets you know if there are errors in the syntax. It also generates a dependency graph for the data pipeline, which is insanely useful.