Introducing a faster and more efficient way to manage data in your warehouse with Dataform.
Data is the lifeblood of many businesses. Managing it is one of the most critical activities of any company. Using data to make better-informed decisions or accurately predict the future can mean the difference between the success or death of a company.
Our founding team met at Google where we managed engineering teams and led product analytics for several products. We relied on data (and data pipelines!) heavily to generate insights to drive better decisions and build better products. Companies like Google invested a lot to build internal data tools for analysts to manage data and build data pipelines. In 5 minutes, I could define a new dataset in SQL and use it in my reports without having to worry about how to refresh it or whether I would notice if it stopped being updated.
When we left, we realized that most companies were not well-equipped to leverage their data effectively. Data is often siloed, ill-defined, hard to find and difficult to work with. Each data project, from conducting an analysis to answering a simple data question, can take days to months. It’s often hard to know if the data in dashboards is correct and the data team is always afraid the pipelines will break. Lacking the right tools to manage their data efficiently, a lot of businesses have to rely on manual work or need to invest precious engineering hours in building custom in-house systems.
We saw a need for a self-service solution for data teams to manage data efficiently and decided to start Dataform. Through the development of our open-source SDK and web platform, we provide a solution where analysts can own the entire workflow from raw data to analytics, directly from their cloud data warehouse. We are building Dataform with the following principles in mind:
Data teams waste a lot of their precious time just making their pipelines work so they can get the data they need for analysis or to refresh dashboards. Other companies decide to build custom in-house solutions to manage their data, investing months of data engineering work just to get the basics working.
We believe data experts want to focus on the areas where they can add the most value: having a deep understanding of their data, transforming it, and analyzing it, not managing custom infrastructure.
Dataform provides a cheaper, faster and better version of what businesses have to build and maintain in-house today. Using our platform, our customers have built their core data analytics stack in days instead of months. Teams of analysts manage hundreds of datasets in a fraction of the time and can manage their pipeline without requiring engineering resources.
The explosion of data sources is generating increasing complexity that data teams have to deal with. As a result, many problems can arise: it takes hours to add new dimensions or metrics, business data depends on manual processes, errors are continuously introduced, and organizations lose trust in their dashboards. All of these problems can be addressed by taking a good approach to DataOps and following industry best practices like configuration as code, version control, testing, sandboxing and isolated deployments.
Dataform makes it simple for entire data teams to adopt DataOps to build reliable analytics.
Many products attempt to lock customers in. These products force businesses to store or process data on the vendor’s servers, or make it hard to use that data outside of the product.
Dataform does the opposite. Your data is transformed in your warehouse, never leaves your servers and is always available. The code you write in our platform is available on GitHub and you can run it with our open-source framework.
Dataform is a platform for data analysts to manage data workflows in cloud data warehouses such as Google BigQuery, Amazon Redshift or Snowflake. It provides all the tools analysts need to build workflows that transform raw data into reliable datasets ready for analysis.
Develop SQL from a rich cloud IDE. As you write SQL, use the Dataform SDK to seamlessly create tables, define dependencies and more. Dataform lets your team members develop simultaneously in the cloud, from different branches.
Create a single repository for all your data management. All your data definitions are stored in a single repository, synced with GitHub and accessible by your entire team.
Version control. Dataform’s development environment integrates with Git, so users can work from individual branches. Push your changes directly or enforce code reviews for all changes.
Automate data quality testing. Dataform lets you define tests against your input raw data and the output of data transformations, with issues triggering alerts before they hit your analytics. Read more about testing with Dataform here.
Safe deployments. Dataform helps you enforce sandboxing, development environments and CI/CD to integrate new changes safely.
Schedule your datasets to update every day, every hour or even every 5 minutes without having to maintain custom infrastructure. Dataform alerts you when potential errors occur and gives you detailed logs so you can fix issues quickly.
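To make two of the ideas above more concrete (datasets that declare their dependencies, and data quality tests that run against the output), here is a minimal Python sketch. It does not use the Dataform SDK or reproduce its syntax: the dataset names, the `DATASETS` and `ASSERTIONS` dictionaries, and the `build_order` and `run` helpers are all hypothetical, and an in-memory SQLite database stands in for a cloud warehouse. Each test is written as a query that selects rows violating a rule, so an empty result means the test passes. It assumes Python 3.9 or later for `graphlib`.

```python
import sqlite3
from graphlib import TopologicalSorter

# Hypothetical dataset definitions: name -> (upstream dependencies, SELECT statement).
DATASETS = {
    "customer_revenue": (
        {"clean_orders"},
        "SELECT customer, SUM(amount) AS revenue FROM clean_orders GROUP BY customer",
    ),
    "clean_orders": (
        {"raw_orders"},
        "SELECT id, customer, amount FROM raw_orders WHERE amount IS NOT NULL",
    ),
}

# Hypothetical tests: each query returns the rows that break a rule.
ASSERTIONS = {
    "revenue_is_positive": "SELECT * FROM customer_revenue WHERE revenue <= 0",
    "customer_is_unique": (
        "SELECT customer FROM customer_revenue GROUP BY customer HAVING COUNT(*) > 1"
    ),
}


def build_order(datasets):
    """Return dataset names so that every dataset comes after its dependencies."""
    graph = {name: deps for name, (deps, _) in datasets.items()}
    ordered = TopologicalSorter(graph).static_order()
    # Raw source tables appear in the graph but are not built here, so skip them.
    return [name for name in ordered if name in datasets]


def run(conn, datasets, assertions):
    """Rebuild every dataset in dependency order, then collect failing tests."""
    for name in build_order(datasets):
        conn.execute(f"DROP TABLE IF EXISTS {name}")
        conn.execute(f"CREATE TABLE {name} AS {datasets[name][1]}")
    failures = {}
    for name, query in assertions.items():
        rows = conn.execute(query).fetchall()
        if rows:  # any returned row is a violation
            failures[name] = rows
    return failures


# Stand-in for raw data loaded into the warehouse.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE raw_orders (id INTEGER, customer TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO raw_orders VALUES (?, ?, ?)",
    [(1, "acme", 20.0), (2, "acme", 5.0), (3, "globex", None), (4, "initech", -3.0)],
)

failures = run(conn, DATASETS, ASSERTIONS)
print(build_order(DATASETS))
print(failures)
```

In this sketch `clean_orders` is built before `customer_revenue` even though it is listed second, because the order comes from the declared dependencies rather than from the order of definition. The negative order for `initech` makes the `revenue_is_positive` test fail, which is the kind of issue an alert would surface before it reaches a dashboard.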
Cloud solutions (including BigQuery, Redshift, and Snowflake) have become the de-facto data warehouse standard for most companies. Many products help businesses load data in those warehouses, from web events to production databases and SaaS applications.
With Dataform, data teams and analysts can manage all data processes happening in the warehouse, turning raw data into the datasets organizations need for use in BI tools and to conduct analysis.
Today, we’re privileged to support our customers in their quest to be more data-driven. Many companies from high-tech startups to high street retailers like Charlotte Tilbury use Dataform every day to manage their data efficiently.
We’re just getting started. Our mission is to provide the tools for data experts to solve big problems with data, enabling businesses to be data-driven and helping them leverage data to create new applications. If that sounds like something you want to be part of, please come and talk to us!