You’ve been promised a Big Data utopia where all business decisions are data-driven. Instead, you got a report with data you don’t trust, or a client accused you of delivering incorrect data on a dashboard you’re pretty sure is neat.

Nobody wants to make decisions based on data whose accuracy they doubt. It’s like a fly in your data soup. That’s why one of the main challenges for everyone who works with data is its quality and integrity.

Now take a deep breath. I’ll guide you through this nightmare of numbers that don’t match expectations. Whether you need to make data-driven decisions or to provide data for the people who have this power, the five questions below will help you deal with “wrong” numbers.

1. Are apples being compared to bananas?

Data nerds often use apples and bananas as a metaphor for absurd comparisons. When analyzing fruit sales, if you’re measuring the sales of apples while your client is measuring the sales of bananas, the numbers won’t match (except by coincidence).

To avoid such mistakes, always define your data properly. Specify on the report or dashboard:

  • the data source;
  • period of measurement;
  • definition of the metric or KPI;
  • which base metrics compose the KPI and how it is calculated, if applicable;
  • any filters and segments applied.
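As a rough illustration, the checklist above could be captured as a small record kept next to each number, so two people can see exactly where their definitions diverge. The class name, field names, and sample values below are hypothetical, not taken from any specific tool.

```python
from dataclasses import dataclass, fields
from datetime import date


@dataclass(frozen=True)
class MetricDefinition:
    """Metadata to publish alongside a metric (illustrative field names)."""
    name: str
    source: str              # the data source
    period_start: date       # period of measurement
    period_end: date
    definition: str          # what the metric or KPI means
    formula: str = ""        # how the KPI is built from base metrics, if applicable
    filters: tuple = ()      # filters and segments applied


def differing_fields(a, b):
    """Return the names of the fields on which two definitions disagree."""
    return [f.name for f in fields(a) if getattr(a, f.name) != getattr(b, f.name)]


# Hypothetical example: same metric name, different source and definition.
ours = MetricDefinition(
    name="Sales",
    source="web analytics tool",
    period_start=date(2024, 1, 1),
    period_end=date(2024, 1, 31),
    definition="units of apples sold",
    filters=("country = BR",),
)
theirs = MetricDefinition(
    name="Sales",
    source="CRM export",
    period_start=date(2024, 1, 1),
    period_end=date(2024, 1, 31),
    definition="units of bananas sold",
    filters=("country = BR",),
)

print(differing_fields(ours, theirs))
```

If the list comes back non-empty, the two numbers were never meant to match, and the conversation can start from the definitions instead of the totals.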

Sometimes these misleading comparisons are not obvious. Different tools may use the same name for a metric or KPI while measuring it with slightly different methods and definitions.

Besides, be aware of measurements that can vary according to when the data was processed, so always check the source specifications for data freshness and its limits.

2. How is the data being validated?

Make sure everything is properly configured: the measurement in the data source, the data extraction, and treatment. If you input “dirty” data in a dashboard, don’t expect anything different from the output.

To identify and avoid mistakes when treating data, you need clear processes to validate your data and make sure it’s accurate. Validating means comparing the expected vs. actual values, and asking questions such as:

  • Is the data in the dashboard/report the same as in the data source?
  • How are the calculations being done?
  • Which filters, segments, and period are being applied?

If none of the above cleared it up:

  • Re-do the steps to get that number and review them carefully.
  • Ask a colleague for a second opinion: it’s always healthy to look from other perspectives.
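The first question above, whether the report shows the same number as the source, can be sketched as a small comparison with a tolerance. This is only a minimal sketch: the function name and the 0.1% tolerance are assumptions, and the right tolerance depends on your data.

```python
import math


def reconcile(source_value, report_value, rel_tol=0.001):
    """Compare a report number with its source value.

    rel_tol is an assumed relative tolerance (0.1%) to absorb rounding.
    """
    return {
        "matches": math.isclose(source_value, report_value, rel_tol=rel_tol),
        "difference": report_value - source_value,
    }


# Hypothetical values read from the source and from the dashboard.
print(reconcile(1000, 1000.5))  # small rounding gap, within tolerance
print(reconcile(1000, 1050))    # gap too large, worth investigating
```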

Whenever possible, avoid unnecessary headaches and automate the validation.

The first step to automate your data validation is to determine very clear parameters of what is expected of the data. Then build the functions to do the comparison, depending on the tools and programming language you’re using to treat your data.

Here is a simple example that uses an IF function to check whether the ‘DailyHours’ values in a range are equal to or below 8:

IF('DailyHours' <= 8, 'Ok', 'Check values')
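If you treat your data in a programming language rather than in a spreadsheet or dashboard tool, the same rule might look like the Python sketch below. The function name and the sample values are made up for illustration.

```python
def check_daily_hours(hours, limit=8):
    """Label a value 'Ok' when it is equal to or below the limit."""
    return "Ok" if hours <= limit else "Check values"


# Hypothetical DailyHours values for three rows.
daily_hours = [6, 8, 9.5]
labels = [check_daily_hours(h) for h in daily_hours]
print(labels)
```

Only the last row gets flagged, because 9.5 is above the expected maximum of 8 hours.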

A specific type of validation that is worth automating is a taxonomy check. In this case, you should use a naming convention to compare with the actual data.

A taxonomy check would have prevented inappropriate values in the language filter…
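A taxonomy check can be sketched in a few lines of Python. The allowed language codes and the campaign naming pattern below are hypothetical conventions invented for this example; replace them with your own.

```python
import re

# Assumed naming conventions, for illustration only.
ALLOWED_LANGUAGES = {"en", "pt", "es"}
CAMPAIGN_PATTERN = re.compile(r"[a-z]{2}_[a-z0-9]+_\d{4}")  # e.g. "en_newsletter_2024"


def taxonomy_violations(values, is_valid):
    """Return the values that do not follow the naming convention."""
    return [v for v in values if not is_valid(v)]


languages = ["en", "pt", "English", "es", "(not set)"]
bad_languages = taxonomy_violations(languages, lambda v: v in ALLOWED_LANGUAGES)

campaigns = ["en_newsletter_2024", "Newsletter-EN", "pt_blackfriday_2023"]
bad_campaigns = taxonomy_violations(
    campaigns, lambda v: CAMPAIGN_PATTERN.fullmatch(v) is not None
)

print(bad_languages)
print(bad_campaigns)
```

Running a check like this before the data reaches the dashboard lists the offending values, so they can be fixed at the source instead of showing up as odd entries in a filter.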

Automated validation is a smart way to understand how much you can trust your data and how urgently you need to correct it.

3. Is the data wrong or just weird?

If the data is not as expected, most people will jump to conclusions like “the data on the report doesn’t match what it should” and “the dashboard is wrong”. How frustrating it is to hear this! Your data has the right to be presumed innocent until proven guilty…

The data might be just weird, not necessarily wrong. Sometimes there are problems in measurement that might be beyond correction or external factors that may affect the data. It’s important to research the causes that explain why the data is behaving differently than expected.

A challenge worth taking is to spread this mindset:

Before stating it’s wrong just because it’s not what was expected, question and investigate what happened.

4. Is the data really wrong?

Unfortunately, every once in a while your numbers will indeed be wrong. This can happen for multiple reasons, usually knowledge gaps, distraction, or a lack of proper validation processes when configuring the data treatment.

Identify why the data is wrong, using validation processes similar to those described above, and fix it. Explain the situation and apologize to whoever was affected. Understand the consequences of that mistake and work to reverse them if possible.

Learn from the mistake, teach others and document your learnings whenever possible. Then move on.

5. Is the data consistently wrong?

What if you knew your KPI would always be around 5% above what it should be, because of measurement challenges that currently can’t be fixed?

That data can still be useful to monitor trends.
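A quick calculation shows why. If we assume the error is a constant proportional bias (every measurement is exactly 5% too high, which is a simplification), the period-over-period percentage changes are the same for the measured and the true values. The numbers below are invented for illustration.

```python
import math


def pct_change(series):
    """Percentage change between consecutive values."""
    return [(curr - prev) / prev * 100 for prev, curr in zip(series, series[1:])]


true_values = [200.0, 220.0, 209.0]          # hypothetical "real" KPI
measured = [v * 1.05 for v in true_values]   # assumed constant +5% bias

true_trend = pct_change(true_values)
measured_trend = pct_change(measured)

# The bias cancels out in the ratio, so both trends agree (up to float rounding).
same_trend = all(
    math.isclose(t, m) for t, m in zip(true_trend, measured_trend)
)
print(true_trend, measured_trend, same_trend)
```

The absolute levels are still off by 5%, so this only holds for relative changes, and only while the bias stays stable over time.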

As a self-confessed perfectionist, it was hard for me to get used to this mindset. But it helps you become a more resilient analyst, capable of squeezing out all the value that data can provide, even when it’s not 100% accurate.

To recap:

  • Always define the data properly, and don’t compare apples to bananas;
  • Make sure to have validation processes for all the steps when working with data;
  • Question and investigate before stating the data is wrong;
  • Recognize mistakes and work to mitigate the consequences;
  • Be resilient and make use of all the value your data can provide.

In conclusion, if there is a fly in your data soup, don’t accept it. But don’t let your day be ruined by awkward numbers on reports and dashboards anymore! Keep those questions in mind and you’ll be much less stressed next time you face the challenge of dealing with “wrong” data.
