“Sometimes there are practices that create problems but go unnoticed simply because it’s part of our everyday experience and we are not conscious of how it creates problems for us. The problems persist and only come to light when an unintended consequence happens that is important to you or someone else. This usually stems from not having a plan that focuses on solving these issues. It could be sometime before we realise the overall impact and how long it’s been going on.”
Now, the above statement could be applied to any aspects of our lives, but in our world of data quality, the act of import/exporting data seems to fall into this category. Let me explain.
We receive a lot of data to review and surprisingly find that import/export errors persist in the data from either the recent task of sending the data to us or a previous import/export that has left the data with problems lingering in their database, sometimes unnoticed for months or even years.
In the distant past I have instigated such problems myself without realising what I had done. At the time, you deal with the issues at hand. However, over time you become wiser and take more care.
But over the years the proliferation of these problems and the subsequent time wasted resolving what is preventable, has got me to ask is there a better way.
These problems undermine data quality, but more importantly waste a great deal of time. The data needs to be re-imported or re-exported, and this time you carefully check for errors. So, why not deploy procedures (manual or automated) that check for these particular errors proactively.
Anyone manipulating data is familiar with these problems, particularly if you are using desktop applications such as MS Excel, MS Access, Notepad, etc.; or relational databases such as MS SQL Server, Oracle, DB2, etc.; or CRM applications such salesforce.com, Saleslogix, Siebel, SAP, Goldmine, etc; and other proprietary applications. Although importing and exporting is made easy (to be honest - exporting is all ways made easy, importing is another matter), not taking into account the nuances in your data that conflict with imports/exports or not using the application properly leads to errors.
Let’s look at some common problems.
1. Text Delimiter Problem – When the character chosen as the text delimiter can be found in the data, import errors will occur. The application will think there are extra fields, incomplete field or even merge fields. How often have you switched from using commas, tabs to pipe characters to avoid errors.
2. Text Qualifier Problem – Often a character is used that should be surrounding a piece of data that is used by a software application to say this is the start and end of the data. If one is missing then this may look familiar:
3. Diacritics – Non-Roman characters become formatted to strange looking combination of text. For example: Számítástechnikai becomes SzßmÝtßstechnikai. Which people often spend a great deal of time to resolve by either (i) using Find/Replace operations or (ii) deleting these characters. The simple answer is to fix the application that caused the problem or if you are lucky then you can use a text editor and ‘Save As’ using a new encoding method such a Unicode to automatically resolve this.
4. Unexpected Characters – Adding new line characters or carriage returns can confuse an application. It can also cause problems with applications that manipulate data – we don’t often expect a new line character to be in a telephone number, but some CRM applications allow this.
To resolve this we have written our own verification product that identifies any issues before we import or export data. This allows us to inform clients of errors on data that they have sent us, and enables us to verify our data before the it is distributed. In addition, simple manual checking such as (i) record counts, (ii) listing distinct values for a field, (iii) identifying any conversion problems, (iv) noticing that a number should be right justified rather than left (v) checking diacritic characters are correctly represented, etc. can help significantly reduce time and effort.
Overall, our automated software and the practice of proactively checking saves a great deal of time.