Before an address can be validated, it must first be structured in the official postal format for the appropriate country, and any missing or incorrect information must be added or corrected.
Once the address is in the official postal format, with all the required information, it can be compared against the official address database for the country in question. In the United States, the official address database is managed by the USPS. If the newly 'cleansed' address matches an address in the official database, it is determined to be a 'valid' address.
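The two stages described above (standardize first, then compare against a reference database) can be sketched roughly as follows. This is only a toy illustration: the `SUFFIXES` table, the `REFERENCE` set, and both function names are hypothetical stand-ins, and a real validator relies on the full postal standard and the official database rather than a three-entry dictionary.

```python
# Hypothetical abbreviation table; a real postal standard defines many more.
SUFFIXES = {"street": "ST", "avenue": "AVE", "road": "RD", "drive": "DR"}

# Toy stand-in for an official address database (illustrative data only).
REFERENCE = {"123 MAIN ST, SPRINGFIELD, IL 62701"}


def standardize(raw: str) -> str:
    """Upper-case, collapse whitespace, and abbreviate known street suffixes."""
    parts = [p.strip() for p in raw.split(",")]
    # Only the first comma-separated part (the street line) gets suffix handling.
    words = parts[0].replace(".", "").split()
    street = " ".join(SUFFIXES.get(w.lower(), w.upper()) for w in words)
    rest = [" ".join(p.upper().split()) for p in parts[1:]]
    return ", ".join([street] + rest)


def is_valid(raw: str) -> bool:
    """An address counts as 'valid' only if its cleansed form is in the reference set."""
    return standardize(raw) in REFERENCE


print(standardize("123 main street, Springfield,  il 62701"))
print(is_valid("999 Nowhere Road, Springfield, IL 62701"))
```

Note that the comparison only works because both sides are in the same standardized form, which is why cleansing has to come before validation.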
When someone is looking for an algorithm to clean addresses, it's often because they are dealing with either a large Excel sheet of addresses to clean, or an entire address database. Cleaning that many addresses one at a time would be inefficient and tedious. So, finding an algorithm to do the work programmatically just makes sense.
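To make "programmatically" concrete, here is a minimal sketch of looping over a spreadsheet exported as CSV. The `clean` function is a placeholder assumption (it only collapses whitespace and upper-cases), and the `address` column name is hypothetical; the point is the batch loop, not the cleaning logic.

```python
import csv
import io


def clean(raw: str) -> str:
    """Placeholder cleaning step: collapse whitespace and upper-case."""
    return " ".join(raw.split()).upper()


def clean_csv(text: str, column: str = "address") -> list[dict]:
    """Return every row of the CSV text with an added 'cleaned' field."""
    rows = []
    for row in csv.DictReader(io.StringIO(text)):
        row["cleaned"] = clean(row[column])
        rows.append(row)
    return rows


# Illustrative two-row sheet; a real export would have thousands of rows.
sheet = "name,address\nAda,  12  oak   lane \nBob,7 elm st\n"
for row in clean_csv(sheet):
    print(row["name"], "->", row["cleaned"])
```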
So, what kinds of algorithms are used?
Some people are tempted to try using regular expressions to clean up addresses. However, that approach is full of problems, because real addresses vary far more than any single pattern can anticipate, and it may actually make your job as a programmer more difficult.
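A quick experiment shows how fast a regular expression runs into trouble. The pattern below is a deliberately naive one written for this illustration (it is not taken from any library), and the sample addresses are just common shapes an address list might contain.

```python
import re

# Naive pattern: house number, street name, then one of a few street suffixes.
NAIVE = re.compile(r"^(\d+)\s+([A-Za-z ]+?)\s+(St|Ave|Rd|Blvd)\.?$", re.IGNORECASE)

samples = [
    "123 Main St",                  # the shape the pattern was written for
    "123 N. Main St Apt 4",         # directional with a period, plus a unit
    "PO Box 1234",                  # no street at all
    "One Infinite Loop",            # spelled-out number, unusual suffix
    "1600 Pennsylvania Avenue NW",  # unabbreviated suffix, trailing directional
]

matched = [s for s in samples if NAIVE.match(s)]
print(f"{len(matched)} of {len(samples)} matched: {matched}")
```

Only the first sample matches. Each failure can be patched with another alternation or optional group, but every patch makes the pattern harder to read and tends to break a case that used to work.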
In reality, cleaning an address requires the use of a number of different algorithms, each performing a related, though unique, part of the address validation process. The algorithms being used must collectively:
The individual algorithms that are most effective in cleaning up addresses are proprietary, and are usually part of an address validation company's software. This is true for both USPS and international addresses.
There just are not that many open-source algorithms that can effectively scrub addresses. However, a number of the dominant software solutions do have free usage options available.
You'll find that address validation software usually features a number of different tools that can be used to scrub your addresses. And, each of these tools requires a different level of skill to use effectively. Some of these tools are as simple as a "copy and paste" interface. Other tools require basic to advanced programming skills.
An example of a "copy/paste" tool is SmartyStreets' SmartyList tool. This type of tool is really helpful for individuals who have little to no programming skills, but still need to make sure that their list of addresses is standardized and validated.
As the name implies, you simply copy your list of addresses from your Excel spreadsheet, and paste it into the SmartyList tool. Here are the steps involved:
It really is that easy.
For individuals who are on the n00b side of the programmer scale, using a "copy/paste" tool like SmartyList can save a lot of time and hassle.
For individuals who have more solid developer skills, using an API to programmatically clean up addresses is probably the best route to take. While there are many different APIs out there, the SmartyStreets collection of Address Validation APIs is crazy fast and easy to use. They only require a simple HTTP request. They send back cleaned address data with up to 45 metadata points, in a convenient JSON format. And, when properly configured, they can process up to 100,000 addresses a second.
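The request-and-response flow looks roughly like the sketch below. Everything provider-specific here is an assumption made for illustration: the `api.example.com` endpoint, the query parameter names, and the idea that the response body is a JSON array of candidate addresses. Check your provider's documentation for the real URL, credentials, and response fields. The sketch only builds the URL and parses a response body, so it runs without a network connection.

```python
import json
from urllib.parse import urlencode

# Hypothetical endpoint and parameter names; substitute your provider's own.
ENDPOINT = "https://api.example.com/street-address"


def build_request_url(street: str, city: str, state: str, key: str) -> str:
    """Encode one address as query parameters for a simple HTTP GET."""
    query = urlencode({"street": street, "city": city, "state": state, "key": key})
    return f"{ENDPOINT}?{query}"


def extract_candidates(body: str) -> list:
    """Parse a response body, assuming it is a JSON array of candidates.

    An empty list is treated here as 'no matching address found'.
    """
    data = json.loads(body)
    return data if isinstance(data, list) else []


url = build_request_url("123 Main St", "Springfield", "IL", "YOUR-KEY")
print(url)
print(extract_candidates("[]"))
```

In practice you would send the URL with an HTTP client of your choice and pass the response text to `extract_candidates`.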
If you're trying to clean a lot of addresses in a relatively short amount of time, your best option is to use some sort of address validation algorithm. The best algorithms are most often found in some form of proprietary software. This kind of software usually offers a number of different address cleansing tools that you can choose from, depending on your level of programming skills. And, the best ones offer some form of free usage, especially while you're testing them out.