Question
Last Updated: March 29, 2013
I see you provide latitude & longitude. I've looked at geocode data from SmartyStreets (as well as some of your competitors) and then compared it to the lat/lon data that Google and Bing have.
An example is:
4200 Truxtun Ave, Bakersfield, CA 93309-0694
SmartyStreets validated the address and returned the coordinates 35.36644, -119.06434. If you plot these on a map (whether Google, Bing, or MapQuest), they are quite a distance apart. Why is there a difference in the results?
We provide geocode latitude/longitude data with every address that we process. This allows you to determine the distance between two points or to quickly plot an address on a map. This is available with both our LiveAddress API and our LiveAddress for Lists address verification service.
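As a rough illustration of the distance use case, here is a minimal sketch of the haversine (great-circle) formula in Python. The function name `haversine_km` and the second point in the usage lines are assumptions made for this example; they are not part of the LiveAddress API.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius; a spherical approximation


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two lat/lon points (degrees)."""
    phi1 = math.radians(lat1)
    phi2 = math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


if __name__ == "__main__":
    # Coordinates returned for the address in the question.
    returned = (35.36644, -119.06434)
    # Hypothetical second point, e.g. a coordinate read off another map.
    other = (35.36700, -119.06300)
    print(f"{haversine_km(*returned, *other):.3f} km apart")
```

Comparing the coordinates from two providers this way gives a quick measure of how far apart their results are for the same address.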
Geocoding data (meaning latitude/longitude coordinates as they relate to physical addresses) is available from two main groups:
- Public data: interpreted results, generally less expensive, street-level resolution
- Private data: positioned results, VERY expensive, building/structure/rooftop resolution
The public data comes from the US Census Bureau and is known as TIGER data. (Warning: there's nothing to see there unless you can write code that will interpret the data that is provided.) TIGER data is gathered and compiled using taxpayer funds, and thus it is available to the public at no charge. Taking the data and combining it with data from other sources to make it valuable is the tricky part, and there is no general consensus on the best way to do that. If the source data is based on the TIGER dataset, you will always end up with "interpreted data." Interpreted means that the coordinate is a calculated result based on various datasets and best practices, not a measured position.
Wikipedia has some excellent examples of how the interpreted results work as well as some of the downsides.
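To make "interpreted" more concrete, here is a simplified sketch of address-range interpolation, the general technique commonly used with street-segment data such as TIGER. It is not our actual algorithm: the function name, the house-number range, and the segment endpoints below are all made up for illustration, and a real implementation also has to handle curved segments, odd/even sides of the street, and offsets from the centerline.

```python
def interpolate_address(house_number, range_start, range_end,
                        start_point, end_point):
    """Estimate a (lat, lon) by placing a house number proportionally
    along a straight street segment.

    range_start/range_end: house numbers at the two ends of the segment.
    start_point/end_point: (lat, lon) tuples for those two ends.
    """
    if range_start == range_end:
        fraction = 0.5  # single-number range: fall back to the midpoint
    else:
        fraction = (house_number - range_start) / (range_end - range_start)
    # Clamp so out-of-range numbers still land on the segment.
    fraction = min(max(fraction, 0.0), 1.0)
    lat = start_point[0] + fraction * (end_point[0] - start_point[0])
    lon = start_point[1] + fraction * (end_point[1] - start_point[1])
    return lat, lon


if __name__ == "__main__":
    # Hypothetical segment covering house numbers 4100 to 4298.
    print(interpolate_address(4200, 4100, 4298,
                              (35.3660, -119.0630), (35.3660, -119.0660)))
```

The result assumes houses are evenly spaced along the block. That assumption is the main reason an interpolated point can sit some distance from the actual building.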
The private data comes from various sources, such as Google, Bing, Navteq, Garmin, and other such services. As you can imagine, they don't share, so each has had to make its own maps and gather its own data. They compete on accuracy, image resolution, timeliness, and other features. Generally this data is gathered by a fleet of vehicles that travel public streets collecting map/GPS data as they go. Another method is to overlay high-resolution images onto a known map and then manually pick out the structures and assign the corresponding geocode. This is very labor-intensive and still somewhat error-prone.
For example, if a property has two structures on it, a house and a much larger barn, the barn may get tagged as the principal structure because it is the largest.
A precise latitude and longitude coordinate also doesn't automatically match up with a physical address. The mailbox for a house can be several hundred feet away from the actual house. This is the reason that Google Maps can show you a house in Street View but will only approximate the address.
How expensive is the data from the private group? It's hard to say. None of the companies I have contacted want to give away the price without first qualifying you as a buyer with sufficient resources. I have yet to get a straight answer on pricing on the first call. The best estimates that I have been able to find range from $500k to millions per year. And that data comes with a very strict set of terms: you will not be able to make that data available through a service (if that's what you're doing) without paying a lot more.
Geocoding is a very difficult and time-consuming task to begin with, and if absolute precision is required, the complexity of the task is greatly compounded. Best results will come from first determining what level of precision is needed for your project and then comparing that to your budget. Our lat/lon data is based on the TIGER dataset, and as such we are not able to give you 100% accurate rooftop or "structure" data. Anyone who uses the TIGER data will be limited in the same way. (Any company that promises rooftop-level data based on the TIGER data is not being very honest with you.)
Our geodata is compiled using a number of complex algorithms and derived from the most precise data that is available to us. We are constantly looking for ways to improve these algorithms by combining data from multiple sources.
We currently have 28,830,200 unique lat/lon coordinates. Within that set, here is a breakdown of the various levels of geocode precision:
- ZIP5 - 41,799 (0.1%)
- ZIP6 - 114,345 (0.4%)
- ZIP7 - 625,935 (2.2%)
- ZIP8 - 3,747,379 (13%)
- ZIP9 - 24,300,742 (84%)
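If you want to check the breakdown yourself, the short sketch below recomputes each level's share from the counts listed above. The dictionary and the function name `precision_shares` are just illustrative names chosen for this example.

```python
# Counts per geocode precision level, copied from the list above.
PRECISION_COUNTS = {
    "ZIP5": 41_799,
    "ZIP6": 114_345,
    "ZIP7": 625_935,
    "ZIP8": 3_747_379,
    "ZIP9": 24_300_742,
}


def precision_shares(counts):
    """Return each level's fraction of the total number of coordinates."""
    total = sum(counts.values())
    return {level: n / total for level, n in counts.items()}


if __name__ == "__main__":
    print(f"total: {sum(PRECISION_COUNTS.values()):,}")
    for level, share in precision_shares(PRECISION_COUNTS).items():
        print(f"{level}: {PRECISION_COUNTS[level]:,} ({share:.1%})")
```

The five counts add up to the 28,830,200 unique coordinates mentioned above, so the shares cover the whole set.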