I came up with six features that best illustrate some of the complexity around international addresses and will help you understand this type of data better.
This article isn’t about all the differing address formats from around the world (there’s plenty of information on the web about this), but about the issues you may face when evaluating who to use when cleansing international addresses.
These are just some of the things you should note:
1. What! More than One Language
2. The 3 Levels of Postcode Granularity
3. Some With, Some Without - (The use of International Postal Address Files (PAF))
4. Diacritics in Addresses Explained
5. The Hidden Alias
6. The All Important Country Field
What! More than One Language
Well, the words in an address are not sentences you find in a book but names of buildings, streets, towns, cities, states and countries.
So if they are in a language you’re not familiar with then it’s not too difficult to grasp the address once you know the basic address words in that language.
The task is made more difficult if there is more than one language.
Let’s consider Belgian addresses for a moment. Some addresses are in French and some are in Dutch.
Belgium has three official languages: French (mainly spoken in the southern provinces), Dutch (predominantly spoken in the northern Flemish provinces) and German, the least spoken of the three.
Addresses are expected in the correct language.
Having addresses in the right language is so important to recipients of the mail.
It shows your company cares about its audience and also protects your reputation as someone who can be trusted to understand the local region.
It is so important that all mailings, correspondence and shipments are addressed respectfully in the local format.
Here are examples of other countries with more than one official language.
Canada: English & French
Switzerland: German, French, Italian & Romansh
Singapore: English, Chinese, Malay & Tamil
Spain: Spanish (Castilian), plus Catalan/Valencian, Basque, Galician & Aranese as co-official regional languages
The 3 Levels of Postcode Granularity
In the UK, entering just a postcode and a house number is enough to find a full address. This bit of magic exists because Royal Mail maintains a file of postal addresses for every delivery point in the UK.
Clever software allows a postcode to be searched and then a house number. The whole process takes less than half a second to find the address.
I call this a 2-step process to arrive at an address (the postcode and house number).
This works because, on average, a UK postcode covers 15 houses. The postcode narrows the search to those 15 or so houses and the house number pin-points the exact location.
The granularity of the postcode dictates a 2-step process. I consider this medium granularity.
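The 2-step lookup can be sketched in a few lines of Python. This is only an illustration: the dictionary `TOY_PAF`, the postcode and the addresses in it are invented, and the function names are my own. A real Postal Address File is far larger and is licensed data.

```python
from typing import Optional

# Toy stand-in for a postal address file: postcode -> {house number: address}.
# The postcode and the addresses are invented for illustration only.
TOY_PAF = {
    "ZZ1 1AA": {
        "1": "1 Example Street, Sampletown",
        "2": "2 Example Street, Sampletown",
        "3A": "3A Example Street, Sampletown",
    },
}


def normalise_postcode(raw: str) -> str:
    """Upper-case a UK-style postcode and put one space before the last 3 characters."""
    compact = "".join(raw.split()).upper()
    return f"{compact[:-3]} {compact[-3:]}"


def find_address(postcode: str, house_number: str) -> Optional[str]:
    """Step 1: narrow to the postcode's delivery points. Step 2: pick the house."""
    delivery_points = TOY_PAF.get(normalise_postcode(postcode), {})
    return delivery_points.get(house_number.strip().upper())


print(find_address("zz11aa", "3a"))  # 3A Example Street, Sampletown
```

The postcode does most of the work by cutting the search down to a handful of delivery points, which is why the second step can be a simple lookup.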
However, this is not the case for all countries.
For example, in Australia a postcode refers to an area (a town or suburb). You have to specify the locality, street and house number to get precisely to the right address.
In most cases, supplying the house number and street after the postcode is sufficient.
As this is a 3-step or sometimes a 4-step process, we consider this of low granularity.
Now in Singapore, there is high granularity. A postcode is a 6-digit number: the first two digits represent a sector in Singapore (an area) and the last four digits represent a delivery point for a house or building.
This allows mail to be delivered very accurately with just a postcode.
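As a small illustration of that structure, the Python sketch below splits a 6-digit postcode into its sector and delivery point parts. The function name `split_sg_postcode` and the sample value are made up for this example, and the check only covers the shape of the code, not whether the sector actually exists.

```python
import re
from typing import Tuple

# Shape check only: exactly six ASCII digits.
SG_POSTCODE = re.compile(r"[0-9]{6}")


def split_sg_postcode(postcode: str) -> Tuple[str, str]:
    """Return (sector, delivery_point) for a 6-digit Singapore-style postcode."""
    code = postcode.strip()
    if not SG_POSTCODE.fullmatch(code):
        raise ValueError(f"expected 6 digits, got {postcode!r}")
    return code[:2], code[2:]


sector, delivery_point = split_sg_postcode("123456")  # made-up postcode
print(sector, delivery_point)  # 12 3456
```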
Why is granularity important?
Well, when cleansing addresses you want to match an address to the Postal Address File for that country.
If the granularity is high, you are matching on fewer pieces of information and the likelihood of an accurate match is high.
Whereas with low granularity, you need more accuracy in the initial address to guarantee an accurate match.
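One way to picture this is as a list of fields you need before a match is even worth attempting. The Python sketch below is an assumption on my part rather than how any particular cleansing product works: the field names, the two lookup tables and the function `missing_fields` are all hypothetical, and the three countries are simply the examples used above.

```python
from typing import Dict, List

# Fields needed for a confident match at each granularity level (illustrative).
REQUIRED_FIELDS = {
    "high": ["postcode"],                                        # e.g. Singapore
    "medium": ["postcode", "house_number"],                      # e.g. UK
    "low": ["postcode", "locality", "street", "house_number"],   # e.g. Australia
}

# Hypothetical mapping from ISO country code to granularity level.
COUNTRY_GRANULARITY = {"SG": "high", "GB": "medium", "AU": "low"}


def missing_fields(country_code: str, address: Dict[str, str]) -> List[str]:
    """List the fields still needed before attempting a match for this country."""
    granularity = COUNTRY_GRANULARITY[country_code]
    return [
        field
        for field in REQUIRED_FIELDS[granularity]
        if not address.get(field, "").strip()
    ]


print(missing_fields("AU", {"postcode": "2000", "street": "George St"}))
# ['locality', 'house_number']
```

The lower the granularity, the longer the list, and the more places a typo or a gap in the original record can stop the match.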
Some With, Some Without - (The use of International Postal Address Files (PAF))
Although many countries have had postcodes for many years, it’s only in the last 20 years that we have seen formal PAF files.
In the UK and US, PAF files have been around much longer, but in Australia the first formal PAF was released in 2000 and in New Zealand in 2007.
There are many countries that don’t even have postcodes (e.g. Angola), let alone a PAF file.
So without a PAF file, how do you cleanse your addresses?
There’s a lot that can be done.
Having the list of formal names for towns, cities, and states allows spellings to be corrected and information to be inserted where missing.
Additionally, knowing the structure of an address with all its abbreviations and aliases can make an address far more readable and deliverable.
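Here is a rough Python sketch of both ideas: correcting a town against a reference list and expanding abbreviations. It uses the standard library's `difflib` for fuzzy matching. The reference list, the abbreviation table, the 0.8 cutoff and the function names are all assumptions for illustration. A real system would use a full gazetteer for the country in question.

```python
import difflib
from typing import Optional

# Tiny hand-picked reference list (Angolan cities) standing in for a real gazetteer.
OFFICIAL_TOWNS = ["Luanda", "Huambo", "Lobito", "Benguela"]

# Illustrative Portuguese street-type abbreviations.
ABBREVIATIONS = {"r": "Rua", "av": "Avenida"}


def correct_town(raw: str, cutoff: float = 0.8) -> Optional[str]:
    """Return the closest official town name, or None if nothing is close enough."""
    candidate = raw.strip().title()
    matches = difflib.get_close_matches(candidate, OFFICIAL_TOWNS, n=1, cutoff=cutoff)
    return matches[0] if matches else None


def expand_abbreviations(line: str) -> str:
    """Replace known abbreviations (with or without a trailing dot) by the full word."""
    words = []
    for word in line.split():
        key = word.rstrip(".").lower()
        words.append(ABBREVIATIONS.get(key, word))
    return " ".join(words)


print(correct_town("luandda"))                     # Luanda
print(expand_abbreviations("Av. 4 de Fevereiro"))  # Avenida 4 de Fevereiro
```

Returning `None` when nothing is close enough matters: it is better to flag an address for review than to "correct" it to the wrong town.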
Diacritics in Addresses Explained
Diacritics can be found in Latin alphabets as well as non-Latin alphabets.
Latin Alphabets: (e.g. French)
◌́ – acute accent, ◌̀ – grave accent, ◌̂ – circumflex accent
Non-Latin Alphabets: (e.g. Arabic)
ــَـ fatḥa (a), ــِـ kasra (i), ــُـ ḍamma (u), ــْـ sukūn
Many languages use the Latin alphabet with special characters or diacritic marks.
So why are diacritics important?
There are two main reasons:
1. Replacing special characters with characters without a mark may change the meaning of the word; and
2. It’s your reputation. When someone receives a mailing with the address in the local language, there is more trust. Conversely, a poorly written address will affect your reputation, just as if you had spelt their name incorrectly.
It’s important your software application enables the correct entry of special and diacritic characters.
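Keeping diacritics in the stored address does not stop you from matching against data that was typed without them. One common approach, sketched below in Python with the standard `unicodedata` module, is to build a separate accent-free key used only for comparison while the original spelling is what gets stored and printed. The function names are my own.

```python
import unicodedata


def fold_diacritics(text: str) -> str:
    """Decompose characters (NFD) and drop the combining marks."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def match_key(text: str) -> str:
    """Comparison-only key: accent-free and case-insensitive."""
    return fold_diacritics(text).casefold()


stored = "Besançon"   # what we keep and print
typed = "BESANCON"    # what arrived from a form or a legacy system
print(match_key(stored) == match_key(typed))  # True
print(stored)                                  # Besançon
```

This only handles letters that decompose into a base letter plus a mark. Characters such as ø or ß are letters in their own right and pass through unchanged, so they need their own rules.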
Note: Sometimes when transferring data from one format to another, diacritic characters can be corrupted. This happens when the character encoding is not set correctly, so text written in one encoding is read as another.
For example, ü can be corrupted to Ã¼ when UTF-8 text is read as Windows-1252.
You certainly don’t want addresses with corrupted characters.
I’ve seen suppliers corrupt data and then clients unknowingly import corrupted data, so be careful.
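One common form of this corruption is UTF-8 text being read as Windows-1252. The Python sketch below reproduces it and shows a cautious repair for that one specific case. The function names are hypothetical, and other encoding mix-ups need different handling.

```python
def simulate_mojibake(text: str) -> str:
    """Write text as UTF-8, then wrongly read the bytes as Windows-1252."""
    return text.encode("utf-8").decode("cp1252")


def repair_mojibake(text: str) -> str:
    """Undo a UTF-8-read-as-Windows-1252 mix-up; return the input if it doesn't apply."""
    try:
        return text.encode("cp1252").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        return text  # not this kind of corruption, leave untouched


broken = simulate_mojibake("München")
print(broken)                   # MÃ¼nchen
print(repair_mojibake(broken))  # München
```

Repair is a last resort. It is far safer to agree the encoding with your supplier up front and to check a sample of accented addresses before importing a file.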
The Hidden Alias
Geographical data changes regularly. Your database is likely to contain addresses that use an old name for an area, but when mailing you want to ensure that the right address is printed.
For example, Bombay is now Mumbai and Burma is now Myanmar.
Old names may still make the address deliverable, but it is the official address that people expect.
So any international address cleansing system must keep up with these name changes.
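At its simplest, alias handling is a lookup from old name to official name. The Python sketch below uses only the two examples above. The table and the function name `official_name` are illustrative, and a real alias list would be much longer and would need regular updates.

```python
# Old name (lower-cased) -> current official name. Illustrative entries only.
PLACE_ALIASES = {
    "bombay": "Mumbai",
    "burma": "Myanmar",
}


def official_name(name: str) -> str:
    """Return the official name for a known alias, otherwise the cleaned input."""
    cleaned = name.strip()
    return PLACE_ALIASES.get(cleaned.casefold(), cleaned)


print(official_name(" Bombay "))  # Mumbai
print(official_name("Paris"))     # Paris
```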
The All Important Country Field
This field is used to determine which country to match the address against.
If it’s missing, then the address may be marked as un-processed or something similar.
But in my experience there are plenty of organisations that have missing country information, for whatever reason.
In this case, you can derive the country from the address by analysing the town, city, state and postcode structure. They give us clues to the identity of the country.
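To show how those clues can be combined, here is a deliberately small Python sketch. Everything in it is an assumption for illustration: the postcode patterns are simplified (the UK one in particular), the city table has three entries, and `guess_country` is a made-up name. It returns nothing when the clues are ambiguous or disagree, so those records can go to manual review.

```python
import re
from typing import Optional

# Simplified postcode shapes. Note that 4 digits fits both Australia and Belgium.
POSTCODE_PATTERNS = {
    "GB": re.compile(r"[A-Z]{1,2}[0-9][A-Z0-9]? ?[0-9][A-Z]{2}"),
    "SG": re.compile(r"[0-9]{6}"),
    "AU": re.compile(r"[0-9]{4}"),
    "BE": re.compile(r"[0-9]{4}"),
}

# Tiny illustrative city table: city (lower-cased) -> ISO country code.
KNOWN_CITIES = {"london": "GB", "sydney": "AU", "brussels": "BE"}


def guess_country(city: str, postcode: str) -> Optional[str]:
    """Guess an ISO country code from city and postcode; None if unclear."""
    code = postcode.strip().upper()
    by_postcode = {c for c, p in POSTCODE_PATTERNS.items() if p.fullmatch(code)}
    by_city = KNOWN_CITIES.get(city.strip().casefold())
    if by_city is not None:
        # The city decides, provided the postcode does not contradict it.
        if not code or by_city in by_postcode:
            return by_city
        return None
    if len(by_postcode) == 1:
        return next(iter(by_postcode))
    return None  # no clue, or several countries fit


print(guess_country("Sydney", "2000"))  # AU
print(guess_country("", "2000"))        # None (fits both AU and BE)
```

The second call shows why postcode structure alone is rarely enough: many countries share the same shape, so the town or state is needed to break the tie.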
So ensure your supplier can provide this facility.
If you can, always use the formal ISO country code in your database. It makes life easier for cleansing, reporting, analysis and so on.
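A minimal Python sketch of that normalisation step is shown below, assuming you settle on ISO 3166-1 alpha-2 codes. The lookup table is a hand-made sample and `to_iso_code` is a hypothetical helper. In practice you would load a complete country list rather than type one in.

```python
from typing import Optional

# Free-text country value (lower-cased) -> ISO 3166-1 alpha-2 code. Sample entries only.
COUNTRY_TO_ISO = {
    "united kingdom": "GB",
    "great britain": "GB",
    "uk": "GB",
    "belgium": "BE",
    "belgique": "BE",
    "belgië": "BE",
    "singapore": "SG",
    "australia": "AU",
}

KNOWN_CODES = set(COUNTRY_TO_ISO.values())


def to_iso_code(raw: str) -> Optional[str]:
    """Map a free-text country value to its ISO code, or None if unrecognised."""
    value = raw.strip()
    if value.upper() in KNOWN_CODES:
        return value.upper()  # already a code we know
    return COUNTRY_TO_ISO.get(value.casefold())


print(to_iso_code("UK"))        # GB
print(to_iso_code("België"))    # BE
print(to_iso_code("Atlantis"))  # None
```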
FREE Audit for All Your International Addresses
If you would like a no-obligation audit of your international addresses, then send me an email (firstname.lastname@example.org) and I will get back to you shortly.