One of the things I love about my role at Dataform is that I get to speak to people working in data teams at other companies almost every day. Each of these conversations helps me broaden my knowledge - and I’m meeting people and making friends in the process!
But all too often, these conversations make me sad. One thing I repeatedly hear is that data analysts are expected to do their jobs with unfit tools, slowing them down and limiting the potential of what they can achieve.
Full disclosure: I have a vested interest in this issue, as I work for a company that makes and sells software for analysts. You’ll have to trust me when I say that I’m not writing this post for selfish reasons. I’m writing it because it makes me sad and angry that there are smart, passionate analysts out there who are striving to do great things for their organisations, but who spend far too much time on menial tasks because of the tools they’re forced to work with.
Analysts: tools of the trade
There are of course many things that analysts need to do their job properly: a good laptop, supportive managers, a comfortable chair... but there are two analyst-specific needs that seem to be commonly underestimated, and those are what I’d like to explore in more detail here.
Analysts use data to help organisations make informed decisions, quickly (if you’d like to read more about what an analyst is, I strongly recommend reading this article). So clearly analysts need two things: access to data, and the ability to move fast with it.
Access: the data warehouse
I won’t go into detail in this post about why a data warehouse is the optimal place for storing and accessing data (more on that here), but I’ll point out two key things: modern, cloud-based data warehouses are fast and SQL-based. Fast, as in you can scan terabytes of data in seconds. And SQL-based - SQL being a language that all good analysts are familiar with.
Too many analysts I’ve spoken to are expected to access data from some other system: MySQL, Grafana, Elasticsearch, or even Excel (!). Some of these options are worse than others, but each fails at least one of the two key requirements: speed and ease of access.
Expecting analysts to work with data outside of a data warehouse is like asking software engineers to serve their application from a machine sitting under your desk. It could be done, sure. And on the surface it may seem a little cheaper than using AWS or GCP. But it’s a bad decision in so many ways - and besides, who are you to tell the engineers how they should set up their infrastructure?
Development experience: working with data
Step one to setting up your analysts for success is letting them work with data in a place that’s built for purpose: a cloud data warehouse. BigQuery and Snowflake are good places to start. But once they’ve got access to data in a warehouse, they need a way to access and manage the data within the warehouse.
Going back to the software engineering analogy: they need a way to write code, test it, deploy it, and version control all code changes.
It turns out that analysts need to do all of those things too. They need to write SQL, test datasets, send SQL queries to the data warehouse as part of complex pipelines, and version control their SQL code.
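To make one of those needs concrete: a dataset "test" can be as simple as an assertion over query results. Here’s a minimal sketch in Python - the `assert_unique_not_null` helper and the inline rows are hypothetical stand-ins for results that would normally come back from a warehouse client:

```python
def assert_unique_not_null(rows, key):
    """Fail if the given column contains nulls or duplicate values."""
    values = [row[key] for row in rows]
    if any(v is None for v in values):
        raise AssertionError(f"null values found in column '{key}'")
    if len(values) != len(set(values)):
        raise AssertionError(f"duplicate values found in column '{key}'")

# In practice these rows would come from a warehouse query, e.g.
#   SELECT user_id FROM analytics.daily_active_users
rows = [{"user_id": 1}, {"user_id": 2}, {"user_id": 3}]
assert_unique_not_null(rows, "user_id")  # passes silently on clean data
```

Run as part of a scheduled pipeline, a check like this stops a broken table from silently reaching downstream dashboards.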
At any company I’ve been at, software engineering teams have been left to make their own choices about what text editor they use, which CI/CD platform, the right programming language for each project, and where they should store their code. But for some reason, there are many organisations that don’t let analyst teams make these decisions themselves.
Let me describe three (real life) scenarios that I've encountered. They represent how a disappointingly large number of organisations work today.
Scenario one: no data warehouse
One analyst I know is expected to work with data in a MySQL database. MySQL can’t handle the queries that he needs to run, so for each analysis he has to download tens of gigabytes of data into CSVs, spend tedious hours loading it into a Jupyter notebook (hoping his notebook server doesn’t crash), and then spend more tedious hours running simple aggregation queries. When a new request comes along, the process starts again.
Scenario two: no analyst tools
Another team of analysts is expected to use the same tools as software engineers: the command line and a text editor for writing and version controlling their code. But those text editors aren’t built for analysts, and don’t have the suite of plugins available to engineers. Analysts don’t want to write code in a different environment from the data. We want to write SQL and run it all in the same place, because seeing the results of the SQL is a core part of our development workflow. It doesn’t make sense to make analysts work with tools that weren’t built for them.
Scenario three: in-house analyst tools
A third analyst, at a well-known US multi-channel retailer, was “lucky” enough to have an internal engineering team volunteer to build something for them. “We’re ACME INC, we don’t buy software, we build it, we can do this”. If you’re lucky enough to work at Facebook or Google, this may end well. But in this case, the result is that analysts are forced to work with substandard tooling every single day. Saved files have no revision history, so queries are regularly overwritten by accident, leading to hours of wasted time rewriting them. There is a version control system, but it is completely separate from the query runner, so most analysts forget to use it. And the web-based editor has none of the useful features - autocomplete, multi-line editing - that are the norm in dedicated text editors. The best analysts at this company will only put up with this for so long before they head elsewhere (I know this because the analyst told me so).
Going back to the software engineering example: how many software engineering teams build their own text editor? How many build their own version control system? Not many, because building these things well - better than the (often paid-for) alternatives out there - is almost certainly not something your engineering team has the resources to do for an internal tool. So please don’t do this for analyst tooling.
Which tools would I recommend for analyst teams working with data? Dataform was built to solve all of the problems discussed at the start of this section, so I’d recommend you start there - there are several other options worth investigating too.
But, this sounds expensive?
Hopefully I’ve convinced you by this point that it’s important that analysts' needs are treated with respect, and that they shouldn’t be given sub-standard tools. The problem, however, is that cloud data warehouses and data modelling platforms (like Dataform) cost money! At this point, it seems that many organisations get scared and revert to asking analysts to cobble something together with cardboard and sellotape.
But, hang on a minute. Are these tools really that expensive? Let’s do a back of the envelope calculation (remember, I’m an analyst, speed is important).
I’ll use BigQuery, because its pricing is transparent and easy to understand. Let’s say that you’re storing 1TB of data and your analyst is querying 100GB of that data each day. This will cost you roughly $10/month for data storage and roughly $10/month for data processing: about $20 in total.
Granted, data volumes vary wildly, so this estimate may not apply to your organisation - but for a team supported by a single analyst, it’s probably the right order of magnitude.
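The data-processing half of that estimate is easy to reproduce. A quick sketch, assuming BigQuery’s on-demand rate of $5 per TB scanned and its 1TB-per-month query free tier (do check current pricing before relying on these figures):

```python
# Back-of-the-envelope BigQuery query cost, under assumed on-demand pricing.
USD_PER_TB_SCANNED = 5.0   # assumed on-demand rate
FREE_TB_PER_MONTH = 1.0    # assumed monthly query free tier

def monthly_query_cost(gb_scanned_per_day, days=30):
    tb_per_month = gb_scanned_per_day * days / 1000
    billable_tb = max(0.0, tb_per_month - FREE_TB_PER_MONTH)
    return billable_tb * USD_PER_TB_SCANNED

print(monthly_query_cost(100))  # 100GB/day -> 3TB/month -> 2 billable TB -> 10.0
```

Scanning 100GB a day comes to 3TB a month; after the free tier, that’s 2TB billed at $5 each, or $10/month.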
Dataform packages all of the tooling an analyst needs into one platform: a SQL IDE, a query scheduler, a testing framework, integrated git version control, and documentation. For our basic package (which will satisfy all but the largest organisations), this costs $30/analyst/month.
So, $20/month for the warehouse and $30/month for high-quality tooling. For $50/month, we can equip analysts with all of the tools they need, built for their needs, built and supported by organisations dedicated to making analysts successful. This seems… pretty reasonable?
But even if it turned out to be expensive to get this right - let’s say $500/month - it would still be worth it! This is about more than appeasing grumpy analysts.
If you asked your software engineers to use sub-standard tools, you’d do more than just make your engineers angry. They’d spend more time on toil, you’d release fewer features, the ones you did release would have more bugs, and your users would stop using your product. Brooks’ law suggests that hiring more engineers may not solve your problems.
If analysts are working with sub-standard tooling, your entire data strategy suffers. Analysts are slowed down by menial work, fewer decisions are supported by data, where data is used it’s often inconsistent or just wrong, and your organisation will consistently make suboptimal decisions.
Hopefully the analysts at your organisation are already using the right tools for their job. But if they’re not, and you care about your organisation using data to inform decision making, can you please pass this along to someone that can help resolve the problem? And if you’re one of the people telling analysts that they don’t need a cloud data warehouse, or that the internally built SQL IDE at your company should be fine, have I made you reconsider?
BigQuery, Snowflake and Dataform all offer free trials and rolling monthly payments (no need to commit to a contract upfront). Can I suggest you at least let your analyst team experiment with improving their development stack? Ten years of experience tell me you are unlikely to regret it, and the potential rewards for your organisation - a happy, performant and thriving analytics function - are surely worth it?