Data warehousing technologies are advancing fast. The cloud data warehousing revolution means more and more companies are moving away from an ETL approach and towards an ELT approach for managing analytical data.
Over the last year at Dataform we have spoken to hundreds of data teams that are migrating from an ETL to ELT approach in their data stack. ETL (Extract, Transform, Load) and ELT (Extract, Load, Transform) are processes for moving data from one system to another (data sources to a data warehouse). The fundamental difference between these two approaches lies in how the raw data is managed, at which stage it is loaded into the warehouse and how analysis is then performed.
In ETL data moves from the data source, to staging and then into the warehouse. All transformations are performed before the data is loaded into the warehouse. ELT offers a modern alternative to ETL where analysts load data into the warehouse before transforming it, supporting a more flexible and agile way of working.
In this article we’ll discuss the differences between these two approaches and what the key benefits are of making the flip to an ELT approach.
Regardless of whether it’s ETL or ELT, both processes involve the following steps:
ETL (extract, transform, load) has been the traditional approach in analytics for several decades. It was originally designed to work with relational databases which have historically dominated the market. ETL requires the transformations to happen before the loading process. Data is extracted from data sources and then deposited into a staging area. Data is then cleaned, enriched, transformed and finally loaded into the data warehouse.
Data teams using this approach have to predict all the uses cases for the data before any analysis is even performed and then create transformations accordingly. This often means that data teams are left waiting as they can’t access any information until the whole process is completed.
Because Engineering teams typically own the extract and transform process, analysts have little visibility into the logic that has been used for the transformation process. This makes it hard for analysts to understand exactly what the data represents, often leading to incorrectly drawn conclusions. And because the analysts have little visibility, they can’t help the engineering team fix any bugs with the transformation code.
ELT is a modern variation of ETL where data cleaning, enrichment and transformation happen after the loading process. This is enabled by the fact that modern cloud data warehouses are extremely scalable and separate storage from compute resources. So essentially the main difference between ETL and ELT is the order that these steps take place.
With the transformation happening in the warehouse, it’s typically defined using SQL. This allows analysts to contribute to (or indeed entirely own) the transformation logic.
So why is it beneficial to transform your data after loading it into the data warehouse?
The ETL process was developed in order to cope with limitations of traditional legacy data warehouses such as the high costs of computation and storage. With the rapid growth of cloud based solutions and the decreasing costs of cloud based storage, ETL is becoming slightly out-dated.
There are still some good situations where a traditional ETL approach makes sense, or should be combined with ELT (ETLT).
The main advantage of using an ELT approach is that you can move all raw data from a multitude of sources into a single, unified repository (a single source of truth) and have unlimited access to all of your data at any time. You can work more flexibly and it makes it easy to store new, unstructured data. Data analysts and engineers save time when working with new information as they no longer have to develop complex ETL processes before the data is loaded into the warehouse.
Dataform is a framework fundamentally designed around an ELT approach. Raw data is piped into your warehouse where Dataform sits and conducts transformations as and when you need them. It helps to automate some of the manual tasks involved in building and maintaining a data pipeline, allowing data teams to focus on what they want to be doing; gaining valuable insights for the business.
We publish great new resources every week, get them straight to your inbox.
Turn on BigQuery audit log exports to start analysing your BigQuery usage
A deep dive into some advanced data quality testing use cases with SQL and the open-source Dataform framework.
In this article we walk through building a simple end to end BigQuery Machine Learning pipeline using Dataform to help us manage the end to end process of data preparation, training and prediction.