One of the key assets of a data team is data that can be trusted. If the data you share can’t be trusted, it won’t be used to make decisions, and your organization will be left to base its decisions on guesswork and intuition.
Modern data teams use automated data quality tests to check the validity of the data they provide to their organization before it is used for analytics and decision making.
A lot has been written about data quality, but it’s typically useful to consider it along these 5 dimensions:
- Completeness: Does the dataset contain all of the data it should?
- Uniqueness: Are there any duplicate entries in the dataset?
- Validity: Does the dataset conform to expected business rules?
- Consistency: Does the same data agree across tables and systems in the warehouse?
- Timeliness: Is the data available at the time it should be?
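To make the dimensions above concrete, here is a minimal sketch of how a few of them might be expressed as automated checks in Python. The `orders` table, the column names, and the business rule (positive amounts) are all hypothetical examples, not part of the original text:

```python
from datetime import datetime, timedelta

# Hypothetical sample of rows loaded into a warehouse table.
orders = [
    {"order_id": 1, "customer_id": 10, "amount": 25.0, "loaded_at": datetime(2024, 1, 2, 9, 0)},
    {"order_id": 2, "customer_id": 11, "amount": -5.0, "loaded_at": datetime(2024, 1, 2, 9, 0)},
    {"order_id": 2, "customer_id": 11, "amount": 40.0, "loaded_at": datetime(2024, 1, 2, 9, 0)},
    {"order_id": 4, "customer_id": None, "amount": 15.0, "loaded_at": datetime(2024, 1, 2, 9, 0)},
]

def check_completeness(rows, column):
    """Completeness: every row has a value in a required column."""
    return all(row[column] is not None for row in rows)

def check_uniqueness(rows, column):
    """Uniqueness: no duplicate values in a key column."""
    values = [row[column] for row in rows]
    return len(values) == len(set(values))

def check_validity(rows):
    """Validity: an example business rule -- order amounts must be positive."""
    return all(row["amount"] > 0 for row in rows)

def check_timeliness(rows, now, max_age=timedelta(days=1)):
    """Timeliness: the newest row landed within the expected window."""
    newest = max(row["loaded_at"] for row in rows)
    return now - newest <= max_age

now = datetime(2024, 1, 2, 12, 0)
print(check_completeness(orders, "customer_id"))  # False: a customer_id is missing
print(check_uniqueness(orders, "order_id"))       # False: order_id 2 appears twice
print(check_validity(orders))                     # False: one amount is negative
print(check_timeliness(orders, now))              # True: data loaded within a day
```

In practice, teams typically run checks like these in the warehouse itself (for example as SQL assertions scheduled alongside their data pipelines) rather than in application code, but the logic per dimension is the same.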
Two practitioners discuss these ideas in depth:

- Dan Lee (Head of Data @ Dataform): why data quality testing is important, how you can apply software engineering best practices to your data quality framework, and the dimensions to consider when writing data quality tests.
- Sean Pegado (Product Analytics Manager @ Cisco Meraki): how his team built a data quality framework from scratch and the challenges they encountered along the way.