Data Quality: The All-Important Factor in Business Intelligence
Picture yourself opening the graphs that your business intelligence tools have generated from your data. You are excited, thinking about all the insights you can get about your customers and other stakeholders. But then, one look at the graphs and tables and your heart sinks. You see duplicated records that differ only by very similar names: one entry each for Ibrahim Aronson, Ibrahin Aronson, and Ibrahin Aronsen. Or perhaps, looking at your Web traffic data, you see that more than half of your visitors are listed as UNKNOWN.
Data quality has been a much-discussed topic in business intelligence for a very long time now. In fact, if you search for business intelligence tips and how-tos, you will almost certainly come across a couple of articles about data quality. Perfect data can seem like a pipe dream, especially with the advent of big data and the ever-growing volume of data that we have to manage.
Most businesses only find out how bad their data quality really is when they start using their business intelligence software. Business leaders become disheartened when the software’s dashboard churns out figures that they cannot use in certain scenarios, and they soon realize that the cause is incomplete or poor-quality data.
The sad truth that all business and IT leaders should face is that data quality will always be an issue, but that does not mean that you should just live with poor data quality.
What can you do?
- Set up accountability for data quality. Someone should be responsible for each type of data that you gather. For instance, your sales and marketing manager would take care of your customer records and information, the HR supervisor would be responsible for employee-related data, and so on. This way, recognition goes to the right person when it is due, and you also know who is responsible for correcting things when something is wrong.
- Determine which parts of your data need to be of high quality. While accepting that pure and pristine data is not a reality for now, you should identify which data are important to you. This way, you can focus on these important data sets and make sure that they are of good quality to start with. For instance, if you are an online retailer working with a lot of erroneous customer addresses, you have a very big problem unless you correct them. The same problem would not be such a big deal for a manufacturing company. Knowing which data matter most also helps you plan your data gathering and data quality improvements over time, because you should be able to estimate how much you stand to gain or lose from the quality of that data. For example, you might estimate that making your customer data 99% to 100% accurate would bring a significant increase in sales and revenue, while letting its quality drop to around 85% or even 80% would mean significant losses. So you do not need to improve the quality of every type of data that you have, just the data where you stand to gain the most or lose big.
- Start at the source. Instead of spending time and money fine-tuning your ETL processes, focus more on the source of the data. For instance, if you have a Web form that takes customer information, you should write scripts to ensure that the ZIP code, phone number, and address are correct and completely filled in. This way, you have better-quality data rather than a lot of blanks for your BI software to crunch.
- Review all your processes, identify any mistakes, and fix everything that needs fixing in your data quality. Then do it all over again. This is the only way to make sure that you have better data quality in the long run.
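To make the "start at the source" advice a little more concrete, here is a minimal Python sketch of the kind of check a Web form handler could run before a customer record is saved. The field names (name, address, zip, phone), the US-style ZIP format, and the 10-digit phone rule are all assumptions made for illustration, so adapt them to whatever your own form collects.

```python
import re

# Assumed formats for illustration: US ZIP (12345 or 12345-6789)
# and a phone number containing exactly 10 digits.
ZIP_RE = re.compile(r"\d{5}(-\d{4})?")
PHONE_DIGITS = 10

# Hypothetical field names; a real form will have its own.
REQUIRED_FIELDS = ("name", "address", "zip", "phone")


def clean(value):
    """Turn a raw form value into a stripped string (None becomes '')."""
    return str(value or "").strip()


def validate_customer(form):
    """Return a list of problems found in a submitted form (empty if none)."""
    problems = []

    # Completeness: every required field must be filled in.
    for field in REQUIRED_FIELDS:
        if not clean(form.get(field)):
            problems.append(f"{field} is missing")

    # Format: only checked when the field was actually filled in.
    zip_code = clean(form.get("zip"))
    if zip_code and not ZIP_RE.fullmatch(zip_code):
        problems.append("zip is not a valid ZIP code")

    phone = clean(form.get("phone"))
    digits = re.sub(r"\D", "", phone)  # drop spaces, dashes, parentheses
    if phone and len(digits) != PHONE_DIGITS:
        problems.append("phone must have 10 digits")

    return problems
```

A form handler would reject or flag any submission for which validate_customer returns a non-empty list, so that blanks and malformed values never reach the data your BI software reads. Format checks like these only catch values that are obviously wrong; they cannot confirm that an address or phone number really belongs to the customer.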
Call Four Cornerstone today and find out how you could improve on your data quality.
Photo by Christoph Scholz.