Photo appears courtesy of Todd Huffman.

Data seems to be the be-all and end-all of today's business world. We have access to more data than ever before, but no one seems to know what to do with it. Enter Tableau. I spent three days in a Tableau class this summer learning how to use this software package to discover interesting insights hidden in data about a company's customers and products. Tableau is a data visualization tool that lets you explore and better understand your data and create insightful visuals and dashboard displays that help with decision making. Its analytical tools and robust visualizations can unveil trends, correlations and meaningful statistics that are not obvious from the raw data alone. Tableau also lets you tell an interactive story about the data, slicing and dicing it on the fly to answer your audience's questions during a presentation and reveal even more insights. The software is powerful and easy to use, and many organizations are using it to uncover a wealth of information from their data and inform their decisions.
What becomes quickly apparent when working with this type of software is that it is essential to spend time up front reviewing and cleaning the data sets. For the software to provide valuable, actionable information that can be trusted, the data must be accurate, consistent and complete, which raw data rarely is. The data cleaning step (also called data cleansing or data scrubbing) can take some time, but the extra effort up front lets the software deliver far more meaningful results.
Data should come from a trusted source, compiled and maintained under a sound data governance program that uses good editing and verification techniques to ensure its integrity. In a data cleaning process, every field used for analysis should be verified: values must not be missing, and each field must contain valid data. For instance, meaningful and accurate geographic analysis requires that no state field be blank or contain an invalid state code. Codes must often be standardized when combining data from different divisions of a company so that they are consistent across organizational lines.
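To make the idea concrete, here is a minimal sketch of such a cleaning pass. The field names, the (deliberately partial) list of valid states, and the division code map are all hypothetical, invented for illustration; a real cleaning job would be driven by the company's actual reference data.

```python
# Hypothetical cleaning pass: hold back rows with a missing or invalid state
# code, and standardize division-specific item codes to one company-wide code.
VALID_STATES = {"NY", "NJ", "CT", "PA"}           # illustrative subset, not all 50
CODE_MAP = {"WIDGET-A": "W100", "WA": "W100"}     # two divisions, one standard code

def clean(rows):
    """Split rows into (clean, rejected); rejected rows need manual review."""
    good, bad = [], []
    for row in rows:
        state = (row.get("state") or "").strip().upper()
        if state not in VALID_STATES:
            bad.append(row)                        # blank or invalid state code
            continue
        row["state"] = state                       # normalized casing
        row["item"] = CODE_MAP.get(row["item"], row["item"])
        good.append(row)
    return good, bad

good, bad = clean([
    {"state": "ny", "item": "WIDGET-A"},           # cleaned and standardized
    {"state": "",   "item": "W100"},               # rejected: blank state
])
```

The point of returning the rejected rows rather than silently dropping them is that a data governance process needs to see what failed and why, so the source systems can be corrected.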
This is no different from what we find with EDI transactions. A missing item number will cause an outgoing Invoice map to fail when that data element is a mandatory field. An invalid item number on an incoming Purchase Order will delay processing, because the item will be unknown to the Trading Partner receiving the order. We are currently working with a customer whose Trading Partner keeps sending invalid item numbers that slow down the incoming purchase order process, while still demanding that our customer ship the order within a short window or face penalties. In addition, users at the company have been changing data on the order accidentally (carelessly?), which causes problems with the outgoing Invoice and ASN process. This is frustrating for everyone involved, and the chargebacks are mounting.
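The same validate-before-you-send discipline can be sketched in a few lines. This is not the customer's actual system; the function name, field name, and item list are assumptions used only to show the shape of the check that would catch a missing mandatory element before the map fails, or an unknown item before it reaches a Trading Partner.

```python
# Hedged sketch: validate a transaction line before it is handed to the EDI
# map. Field names and the known-item list are hypothetical.
KNOWN_ITEMS = {"W100", "W200"}

def validate_line(line):
    """Return a list of problems; an empty list means the line is safe to map."""
    problems = []
    item = line.get("item_number")
    if not item:
        problems.append("missing mandatory item number (outgoing map would fail)")
    elif item not in KNOWN_ITEMS:
        problems.append(f"unknown item number {item!r} (partner cannot process it)")
    return problems
```

Catching these problems at the edge, before the document leaves or enters the system, is far cheaper than untangling a failed map or a chargeback afterward.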
I have been in the Information Technology field for many decades, and while so much has changed, one thing has remained constant: only good data yields meaningful results that can be trusted for decision making. Data cleaning is a mundane, time-consuming task, and it is not at the top of anyone's list of fun things to do, but it is the essential foundation of all information processing.