You then go through the process of tidying this up, and if you’re like me, you wonder how you missed all this; I was always sure I worked thoroughly.
Well, we are only human, and we are going to miss a bit here and there.
Missing a bit here and there during a data migration can be costly. It’s one thing to decorate your own room; it’s another thing to perform a data migration. When I say costly, I mean not only in terms of time and money, but in terms of end-user confidence. If you decide not to cleanse, or to cleanse only some of the data, you can bet that someone will talk about how bad the data is and ask how we could possibly have gone live without cleansing it.
There’s a lot that can be overlooked during a migration, but I want to draw your attention to one scenario in particular.
Many organisations cleanse their data prior to a new system roll-out, but this process can take time, particularly if you have large amounts of data, many different data stores, or complex international data.
From the start of the data cleanse to the time it’s re-imported, the data in your application will have changed.
Your data is changing at a given rate, and that rate depends on your business (it’s different for each business, and primarily depends on how often you use your data).
Let’s assume a change rate of 4% per month.
If the data cleansing process takes 3 months, then with a 4% change per month you will accumulate a total of 12% changed records.
So 12% of the data you cleansed is now uncleansed at the point of re-import. Seems like a never-ending battle.
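The arithmetic above can be sketched as a quick sanity check. This is purely illustrative: the 4% monthly rate and the simple linear accumulation are taken from the figures in the text, and the compounded variant is an alternative reading, not a claim about any real dataset.

```python
# Change-rate arithmetic from the text: 4% of records change per month,
# and the cleanse takes 3 months.
monthly_change_rate = 0.04
months = 3

# Simple linear accumulation, as used in the text.
changed = monthly_change_rate * months
print(f"Changed records at re-import: {changed:.0%}")  # 12%

# A compounding view (records changed at least once) gives a slightly
# lower figure, since some records change more than once.
compounded = 1 - (1 - monthly_change_rate) ** months
print(f"Compounded estimate: {compounded:.1%}")  # 11.5%
```

Either way, roughly one record in eight has drifted out of date by the time you re-import.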
So what can be done about this?
Well, best practice is to cleanse that 12% just prior to re-importing. We call this a Delta Cleanse (or Δ Cleanse).
Some of you may be shouting: if it took 3 months to cleanse 100%, wouldn’t this 12% need proportionally more time, perhaps another week or two?
Not if you use automation to perform your cleansing. During our data cleansing projects we automate the entire process, aiming to achieve 95% automation (there may be some manual work, but this is kept to a minimum).
Prior to re-import, we get an extract of the 12% that’s since been modified, and re-cleanse these records. This is usually completed over a weekend or an evening, minimising disruption and minimising records being changed during the re-cleanse.
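The delta extract itself is conceptually simple: pull only the records modified since the original cleanse began. A minimal sketch, assuming each record carries a `last_modified` timestamp (the field name and dates here are hypothetical, not any specific system’s schema):

```python
from datetime import datetime

def delta_records(records, cleanse_started):
    """Return only the records modified since the original cleanse began.

    These are the records that need re-cleansing before re-import;
    everything else is still clean from the first pass.
    """
    return [r for r in records if r["last_modified"] > cleanse_started]

# Hypothetical example: the cleanse kicked off on 1 January.
cleanse_started = datetime(2024, 1, 1)
records = [
    {"id": 1, "last_modified": datetime(2023, 12, 15)},  # untouched since cleanse
    {"id": 2, "last_modified": datetime(2024, 2, 3)},    # changed -> re-cleanse
]

delta = delta_records(records, cleanse_started)
print([r["id"] for r in delta])  # [2]
```

In practice this would be a query against the source system’s audit or modified-date column rather than an in-memory filter, but the principle is the same.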
The re-cleansed 12% and the initial 88% that hasn’t changed can then be safely re-imported, so tomorrow morning users can have better data.
Having completed hundreds of data projects, I can safely argue that many are forgetting this final stage.
So why is it so important?
Apart from the 12% needing to be cleansed, it’s about user confidence.
The biggest reason applications don’t get used is poor data: users lose confidence in the application when it’s really the data that’s at fault. They get tired of bad data and resort to using their own data stores.
So when announcing you’ve cleansed the data, it’s best that it’s 100% clean. We are all human and we will always spot the issues; with a potential 12% not cleansed, you can be sure someone will exclaim, ‘Did we cleanse our data?’ No one, and I mean no one, will say, ‘I noticed the data was 88% clean.’ This never happens!