In my current role as CTO at Dataform, I spend a lot of time speaking to data teams around the world about the challenges they face and the problems they encounter.
Of the many trends we’ve observed, one of the most striking is that the best teams we speak to see themselves and their role quite differently from a traditional analyst. These teams increasingly see themselves as responsible for building and maintaining a form of internal product for the company.
Building data products requires a different approach from the traditional service model of a data team, including new tooling and development workflows.
Below we look at what exactly it means to build data products, what happens when this isn’t done, and what needs to change, both culturally and technically, to make it achievable.
The data team
When we speak to advanced data teams we usually see the following core roles:
- Data engineers - manage core infrastructure, automate ETL pipelines and produce stable, well-defined datasets that can be consumed by others.
- Data analysts - partner with business stakeholders to help answer questions, build dashboards, and possibly do some exploratory analysis.
- Data scientists - solve optimization problems, typically using machine learning, exploratory data analysis, etc.
The traditional expectation of a data analyst is predominantly reactive: to serve requests and answer questions coming from the business.
In the above model, only data engineering teams have typically built something to be used by others (in this case, usually the data analysts who consume the datasets that engineering builds and maintains).
That makes sense: they are, after all, engineers, and engineers are accustomed to building things.
But what about the data / BI analyst? We believe that role is changing into one in which the analyst is also responsible for building data products that can be used by the rest of the business. We like to refer to this role as the “full stack analyst”, or the “analytics engineer”.
This partly requires a cultural change, but also a change in skills, tooling and approach.
What products should data teams be building
A data model
A central repository for data definitions across the company.
Data analysts are building a product for themselves, as well as the organization as a whole.
Without a data model that is well defined and can be developed collaboratively, building self-service data tools is likely to fail. Finding the right datasets and making sure they are accurate and up to date is a time-consuming task that we see analysts spending too much time on.
The data model is a centralized source of truth for the data team and other data consumers. It is the result of hard work and collaboration across many individuals. When everyone invests their knowledge into building a data model, it lives beyond the individuals who created it and can serve the rest of the organization.
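To make the idea of a shared definition concrete, here is a minimal sketch in Python using the standard library’s sqlite3 module. It is an illustration only, not how any particular tool implements a data model: the table raw_orders, the view completed_orders and the rule that only completed orders count as revenue are all hypothetical. The point is that the definition is written once and every consumer queries it, rather than each analyst re-deriving it in their own script.

```python
import sqlite3

# In-memory database standing in for a warehouse (hypothetical data).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE raw_orders (
        order_id INTEGER, customer_id INTEGER, status TEXT, amount REAL
    );
    INSERT INTO raw_orders VALUES
        (1, 10, 'completed', 25.0),
        (2, 10, 'refunded',  25.0),
        (3, 11, 'completed', 40.0),
        (4, 12, 'cancelled', 15.0);

    -- The shared definition: which orders count lives in exactly one place.
    CREATE VIEW completed_orders AS
        SELECT order_id, customer_id, amount
        FROM raw_orders
        WHERE status = 'completed';
""")

# Two different consumers (say, a dashboard and an ad hoc analysis)
# build on the same definition instead of repeating the filter.
total_revenue = conn.execute(
    "SELECT SUM(amount) FROM completed_orders"
).fetchone()[0]
paying_customers = conn.execute(
    "SELECT COUNT(DISTINCT customer_id) FROM completed_orders"
).fetchone()[0]

print(total_revenue, paying_customers)
```

If the business later decides that refunded orders should be handled differently, only the view changes, and both numbers stay consistent with each other.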
Self service data tooling
Where anyone can answer data questions.
Great analysts don’t just answer questions; they also make it possible for anyone in the organization to quickly and easily answer questions themselves. This isn’t new, and it is part of the wider drive towards data democratization.
This is the difference between a data team that builds and maintains dashboards used regularly by the business and one that builds one-off reports and shares spreadsheets that go out of date and break.
Visualization tools such as Looker have enabled analysts to do this well. Analysts focus less on answering one-off questions, building charts, or creating single-use dashboards with few controls. Instead they invest heavily in describing data, building dashboards that live for the lifetime of the company, can be continually adapted and improved, and can be used by everyone.
What happens when data teams don’t build products
Without an investment in building a centralized data model, data analyst teams struggle to grow and collaborate, as knowledge of the data becomes fragmented across a number of different people.
Onboarding new team members takes an increasing amount of time, as the new joiner has no reference (the data model) to learn from. They must pull knowledge from others, and build up their own assortment of scripts and understanding before they can become productive.
Work is duplicated, productivity decreases and mistakes are inevitably made, leading to inconsistent answers and to dashboards that are misinterpreted, incorrect or just plain out of date.
When there isn’t an investment in self-service tooling, analysts become increasingly bogged down with reactive question answering, which can be repetitive and draining for the team. It is also frustrating for business stakeholders, who can’t answer questions without an analyst’s help or, in the worst case, have to file tickets and wait for someone to pick them up.
When there is a well defined data model, this becomes your onboarding material. When your superstar analyst decides to leave, you need not panic - because all of their wisdom is built into the data model, the rest of the team can pick up right where they left off.
How you should approach building data products
Building products collaboratively as a team requires tooling and workflows different from those analysts are typically used to. Thankfully there is a model we can follow here - engineering teams have been collaboratively building products for decades and have adopted workflows and tooling that are increasingly standardized across the industry.
- Version control: centralize your team’s knowledge in a single repository
- Code reviews: enforce standards, delegate ownership across the team
- Testing: reduce breakages and detect issues early
- Continuous integration, deployment and monitoring: eliminate manual, repetitive work
Without the above, engineering teams typically start to grind to a halt beyond 5 or so people. An increasing amount of time is spent on repetitive tasks, fixing mistakes, and onboarding new team members.
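As an illustration of what testing can look like for data rather than application code, here is a minimal sketch in Python using the standard library’s sqlite3 module. It assumes a common convention in which each test is a SQL query that returns the rows violating an expectation, so an empty result means the test passes. The orders table, the assertion names and the failing_rows helper are hypothetical, and this is not a description of any specific product’s API.

```python
import sqlite3


def failing_rows(conn, assertion_sql):
    """Run an assertion query; any rows it returns are violations."""
    return conn.execute(assertion_sql).fetchall()


# Hypothetical dataset containing a duplicate order and a missing customer.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (order_id INTEGER, customer_id INTEGER, amount REAL);
    INSERT INTO orders VALUES
        (1, 10,   25.0),
        (2, 11,   40.0),
        (2, 11,   40.0),
        (3, NULL, 15.0);
""")

# Each assertion selects the rows that break an expectation about the data.
ASSERTIONS = {
    "order_id_is_unique":
        "SELECT order_id FROM orders GROUP BY order_id HAVING COUNT(*) > 1",
    "customer_id_not_null":
        "SELECT order_id FROM orders WHERE customer_id IS NULL",
    "amount_is_positive":
        "SELECT order_id FROM orders WHERE amount <= 0",
}

results = {name: failing_rows(conn, sql) for name, sql in ASSERTIONS.items()}
failed = sorted(name for name, rows in results.items() if rows)

for name in failed:
    print(f"FAILED {name}: {results[name]}")
```

Because the assertions are plain SQL kept alongside the data model, they can be version controlled, code reviewed and run automatically on every change, which ties the four practices in the list together.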
Adopting all of the above sounds hard, particularly for a data team - but it doesn’t have to be.
At Dataform, we’ve spent the last few years building tooling designed for data analysts who want to build products instead of just answering questions. We build on the skills they typically already have, such as SQL, and give them an easy way to adopt product engineering best practices without having to learn how to code or manage infrastructure.