Address Cleansing | What it is and How to do it.
Address cleansing is the collective process of standardizing, correcting and then validating a postal address.
Before an address can be validated, it must first be structured in the official postal format for the appropriate country, and any missing or incorrect information must be added or corrected.
Once the address is in the official postal format, with all the required information, it can be compared against the official address database for the country in question. In the United States, the official address database is managed by the USPS. If the newly 'cleansed' address matches an address in the official database, it is determined to be a 'valid' address.
Table of Contents
- Algorithms Used to Clean Addresses
- Address Cleansing Tools
- Using an API for Address Cleansing
- Address Validation API demos
Using Algorithms to Clean Addresses
When someone is looking for an algorithm to clean addresses, it's often because they are dealing with either a large Excel sheet of addresses or an entire address database. Cleaning that many addresses one at a time would be tedious and inefficient, so finding an algorithm to do the work programmatically just makes sense.
So, what kinds of algorithms are used?
Some people are tempted to try using regular expressions to clean up addresses. However, that approach is full of problems, because addresses vary too much (abbreviations, unit numbers, missing fields) for any one pattern to cover, and it may actually make your job as a programmer more difficult.
In reality, cleaning an address requires a number of different algorithms, each performing a related but distinct part of the address validation process. The algorithms being used must collectively:
- Parse the address and break it into its individual components (e.g., name, house number, street name, city name, state name, and ZIP Code).
- Standardize the data of each individual component so that it matches the format of the official postal database to be referenced.
- Validate the now standardized address against the official address database.
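The three steps above can be sketched in a few lines of Python. This is only a toy illustration, not how any production validator works: the one-record REFERENCE_DB, the short ABBREVIATIONS table, and the function names parse, standardize, validate, and cleanse are all assumptions made up for this example, and the parser only understands the single layout "number street, city, state zip".

```python
# Hypothetical stand-in for an official postal database: a set of
# already-standardized (number, street, city, state, zip) records.
REFERENCE_DB = {
    ("1600", "PENNSYLVANIA AVE NW", "WASHINGTON", "DC", "20500"),
}

# Illustrative subset of street-word abbreviations; a real table is far larger.
ABBREVIATIONS = {
    "AVENUE": "AVE", "STREET": "ST", "ROAD": "RD", "BOULEVARD": "BLVD",
    "NORTHWEST": "NW", "NORTHEAST": "NE", "SOUTHWEST": "SW", "SOUTHEAST": "SE",
}


def parse(raw):
    """Step 1: split 'number street, city, state zip' into named components."""
    parts = [part.strip() for part in raw.split(",")]
    if len(parts) != 3:
        raise ValueError(f"expected 'street, city, state zip', got: {raw!r}")
    street_line, city, state_zip = parts
    number, _, street = street_line.partition(" ")
    state, _, zip_code = state_zip.partition(" ")
    return {"number": number, "street": street, "city": city,
            "state": state, "zip": zip_code}


def standardize(components):
    """Step 2: uppercase, collapse whitespace, abbreviate known street words."""
    result = {key: " ".join(value.upper().split())
              for key, value in components.items()}
    words = [ABBREVIATIONS.get(word, word) for word in result["street"].split()]
    result["street"] = " ".join(words)
    return result


def validate(components):
    """Step 3: look the standardized address up in the reference set."""
    key = (components["number"], components["street"], components["city"],
           components["state"], components["zip"])
    return key in REFERENCE_DB


def cleanse(raw):
    """Run all three steps and attach the validation result."""
    components = standardize(parse(raw))
    components["valid"] = validate(components)
    return components


if __name__ == "__main__":
    print(cleanse("1600 pennsylvania avenue northwest, washington, dc 20500"))
```

Even this small sketch hints at why the real algorithms are hard: every input layout the toy parser does not anticipate raises an error or produces the wrong components.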
The individual algorithms that are most effective in cleaning up addresses are proprietary, and are usually part of an address validation company's software. This is true for both USPS and international addresses.
There just are not that many open-source algorithms that can effectively scrub addresses. However, a number of the dominant software solutions do have free usage options available.
Address Cleansing Tools
You'll find that address validation software usually features a number of different tools that can be used to scrub your addresses, and each of these tools requires a different level of skill to use effectively. Some are as simple as a "copy and paste" interface. Others require basic to advanced programming skills.
"Copy/Paste" Address Cleaning Tools
An example of a "copy/paste" tool is the SmartyStreets SmartyList tool. This type of tool is really helpful for individuals who have little to no programming skills, but still need to make sure that their list of addresses is standardized and validated.
As the name implies, you simply copy your list of addresses from your Excel spreadsheet, and paste it into the SmartyList tool. Here are the steps involved:
- Select from "validate US addresses", "match ZIP Codes to US cities and states" or "validate international addresses"
- Paste your list (the one you copied from your Excel spreadsheet) into the section labeled "Paste your list below".
- Click on "Process My List". The software automatically cleans up the addresses, standardizes them, corrects or adds data as necessary, and then validates them against the official address database for the country in question.
- Copy the newly cleaned list and paste it back into your spreadsheet.
It really is that easy.
For individuals who are on the n00b side of the programmer scale, using a "copy/paste" tool like SmartyList can save a lot of time and hassle.
Using an API for Address Cleansing
For individuals who have more solid developer skills, using an API to programmatically clean up addresses is probably the best route to take. While there are many different APIs out there, the SmartyStreets collection of Address Validation APIs is crazy fast and easy to use. They only require a simple HTTP request, and they send back cleaned address data with up to 45 metadata points in a convenient JSON format. When properly configured, they can process up to 100,000 addresses a second.
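As a rough idea of what "a simple HTTP request" and a JSON reply look like in practice, here is a minimal Python sketch. The endpoint URL, the query parameter names (auth-id, auth-token, street, city, state, zipcode), and the response fields (delivery_line_1, last_line) are assumptions based on the US Street Address API and should be checked against the current documentation. SAMPLE_RESPONSE is made-up placeholder data, and the helper names build_request_url and summarize_candidates are hypothetical. No network call is made here.

```python
import json
from urllib.parse import urlencode

# Assumed endpoint for the US Street Address API; verify before use.
US_STREET_ENDPOINT = "https://us-street.api.smartystreets.com/street-address"

# Made-up placeholder reply, shaped like a JSON array of candidate addresses.
SAMPLE_RESPONSE = json.dumps([
    {
        "delivery_line_1": "123 Example St",
        "last_line": "Anytown ST 00000",
        "metadata": {"county_name": "Example"},
    }
])


def build_request_url(auth_id, auth_token, street, city, state, zipcode=""):
    """Build a GET URL with the address fields as query parameters."""
    params = {
        "auth-id": auth_id,
        "auth-token": auth_token,
        "street": street,
        "city": city,
        "state": state,
    }
    if zipcode:
        params["zipcode"] = zipcode
    return f"{US_STREET_ENDPOINT}?{urlencode(params)}"


def summarize_candidates(response_body):
    """Turn the JSON array of candidates into one-line cleaned addresses."""
    candidates = json.loads(response_body)
    return [
        f"{candidate['delivery_line_1']}, {candidate['last_line']}"
        for candidate in candidates
    ]


if __name__ == "__main__":
    # YOUR_AUTH_ID and YOUR_AUTH_TOKEN are placeholders for real credentials.
    print(build_request_url("YOUR_AUTH_ID", "YOUR_AUTH_TOKEN",
                            "123 Example St", "Anytown", "ST"))
    print(summarize_candidates(SAMPLE_RESPONSE))
```

With real credentials, the URL returned by build_request_url could be fetched with urllib.request.urlopen and the response body passed to summarize_candidates.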
Here is a list of all of the SmartyStreets address validation API live demos:
- US Street Address API: Validate USPS addresses
- International Street Address API: Validate addresses in 240+ Countries
- US ZIP Code API: Look up and verify city, state, and ZIP Code combinations
- US Autocomplete API: Suggest addresses to users in real-time
- US Extract API: Extract address data from any text
If you're trying to clean a lot of addresses in a relatively short amount of time, your best option is to use some sort of address validation algorithm. The best algorithms are most often found in some form of proprietary software. This kind of software usually offers a number of different address cleansing tools that you can choose from, depending on your level of programming skills. And the best ones offer some form of free usage, especially while you're testing them out.