US Extract API - Address Extraction Made Easy

This page describes how to use the extraction endpoint to find and validate addresses in arbitrary text input.

Contents

  1. HTTP Request
    1. URL Composition
    2. Request Methods
    3. Input Fields
    4. Headers
  2. HTTP Response
    1. Status Codes and Results
    2. Annotated Response
  3. Supplementary Materials
    1. Subscription Usage
    2. Credits
    3. SSL/TLS Information
    4. Try the demo

HTTP Request: URL Composition

Proper URL construction is required for all API requests. Here is an example URL:

https://us-extract.api.smarty.com?auth-id=123&auth-token=abc

Here is a more granular examination of the example above:

URL Component | Value | Notes
Scheme | https | NOTE: Non-secure http requests are not supported.
Hostname | us-extract.api.smarty.com |
Query String | ?auth-id=123&auth-token=abc | Authentication information, inputs, etc. Additional query string parameters are introduced in the next section.

For additional information about URLs, please read our article about URL components.

Please note that all query string parameter values must be URL-encoded (spaces become + or %20, for example) to ensure that the data is transferred correctly. A common mistake is a non-encoded pound sign (#), as in an apartment number (# 409). When properly encoded in a URL, this character becomes %23. When not encoded, it marks the start of the fragment identifier, which is ignored by our API servers.
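As a small illustration of the encoding rule above, the following Python sketch uses the standard library's urllib.parse module to encode a value containing a pound sign. The parameter name "example" is hypothetical and is used only to show how the characters are transformed; it is not an input field of this API.

```python
from urllib.parse import quote, quote_plus, urlencode

raw_value = "# 409"

# quote() percent-encodes both the pound sign and the space.
encoded = quote(raw_value, safe="")

# quote_plus() does the same, but writes the space as "+".
encoded_plus = quote_plus(raw_value)

# urlencode() applies quote_plus() to every key and value in a mapping.
# "example" is a hypothetical parameter name used only for illustration.
query_string = urlencode({"example": raw_value})

print(encoded)       # %23%20409
print(encoded_plus)  # %23+409
print(query_string)  # example=%23+409
```

Either form of the encoded space is accepted in a query string; the important part is that the pound sign never appears unencoded.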

HTTP Request: Supported Methods/Verbs

HTTP requests can be categorized according to their HTTP method. Most HTTP requests are defined using the GET method. We call these "get requests." Other common methods are PUT, POST, and DELETE.

The following method is supported by this API: POST.

Note: When calling any of our APIs using "embedded key" authentication, only the HTTP GET method is allowed; this means embedded keys are NOT supported by this API. With "secret key" authentication, only the HTTP POST method is allowed.

Send the text with addresses to extract as the body of the request. Set the value of the Content-Type header to text/plain; charset=utf-8. Each request body is limited to a maximum length of 64 kilobytes. Here's an example POST request submitted using the curl command:

curl -v 'https://us-extract.api.smarty.com/?auth-id=YOUR_AUTH_ID&auth-token=YOUR_AUTH_TOKEN' \
	-H 'Content-Type: text/plain; charset=utf-8' \
	--data-binary '
		There are addresses everywhere.
		1109 Ninth 85007
		Smarty can find them.
		3785 Las Vegs Av.
		Los Vegas, Nevada
		That is all.'
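For readers who prefer a scripting language over curl, here is a minimal Python sketch of the same request using only the standard library. The function name build_extract_request and the constant MAX_BODY_BYTES are hypothetical, and the sketch assumes that "64 kilobytes" means 65,536 bytes. It builds the request without sending it, so it can be inspected safely.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Assumption: "64 kilobytes" is interpreted as 64 * 1024 bytes.
MAX_BODY_BYTES = 64 * 1024


def build_extract_request(text, auth_id, auth_token):
    """Build (but do not send) a POST request for the extraction endpoint."""
    body = text.encode("utf-8")
    if len(body) > MAX_BODY_BYTES:
        raise ValueError("request body exceeds the 64 kilobyte limit")
    query = urlencode({"auth-id": auth_id, "auth-token": auth_token})
    return Request(
        url="https://us-extract.api.smarty.com/?" + query,
        data=body,
        headers={"Content-Type": "text/plain; charset=utf-8"},
        method="POST",
    )


request = build_extract_request(
    "There are addresses everywhere.\n1109 Ninth 85007\n",
    auth_id="YOUR_AUTH_ID",
    auth_token="YOUR_AUTH_TOKEN",
)
print(request.get_method(), request.full_url)
```

Passing the returned object to urllib.request.urlopen would send it. Checking the body size locally avoids a round trip that would otherwise end in a 413 response.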

HTTP Request: Input Fields

Along with the body of your POST request (the input string from which to extract addresses), several other parameters affect address extraction behavior. These parameters are submitted in the query string and are detailed in the table below:

Name | Default | Description
html | derived | HTML input is automatically detected and stripped, but you can manually specify whether your input is formatted as HTML by setting this to true or false.
aggressive | false | Aggressive mode may use more lookups on your account, but it can find addresses in populous cities without needing a state and ZIP Code, as well as finding addresses in some messy inputs.
addr_line_breaks | true | This parameter specifies if addresses in your input will ever have line breaks.
addr_per_line | 0 | Limits the extractor to a certain number of addresses per input line. Generally, you will not need this parameter unless you are submitting structured data that you know will only have a certain number of addresses per line. Set to 0 (default) for no limit.
license | derived | Specifies the license or licenses (comma separated) to use for this lookup. Valid values can be found in your account's Subscriptions page. If multiple licenses are specified, they are considered in left to right order. We recommend that each request explicitly specify a license value.
match | strict | The match output strategy to be employed for this lookup. See more here.
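The parameters in the table above can be combined with the authentication values in a single query string. The following Python sketch shows one way to do that; the helper name extract_query is hypothetical, and writing booleans as lowercase true/false is an assumption based on the values shown in the table.

```python
from urllib.parse import urlencode


def extract_query(auth_id, auth_token, **options):
    """Return a query string with credentials plus optional input fields."""
    params = {"auth-id": auth_id, "auth-token": auth_token}
    for name, value in options.items():
        # Assumption: booleans are sent as lowercase "true" / "false".
        if isinstance(value, bool):
            value = "true" if value else "false"
        params[name] = value
    # urlencode() URL-encodes every name and value.
    return urlencode(params)


query = extract_query(
    "YOUR_AUTH_ID",
    "YOUR_AUTH_TOKEN",
    aggressive=True,
    addr_line_breaks=False,
    addr_per_line=1,
)
print(query)
```

Parameters that are omitted keep the defaults listed in the table, so only the values you want to override need to be passed.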

HTTP Request: Headers

You must include the following required HTTP headers in all requests:

Header | Description | Example
Content-Type | The purpose of the Content-Type field is to describe the data contained in the body fully enough that the receiving user agent can pick an appropriate agent or mechanism to present the data to the user, or otherwise deal with the data in an appropriate manner. | Content-Type: text/plain; charset=utf-8
Host | The Host request header field specifies the internet host and port number of the resource being requested. | Host: us-extract.api.smarty.com

HTTP Response: Status Codes and Results

Responses will have a status header with a numeric value. This value is what you should check for when writing code to parse the response. The only response body that should be read and parsed is a 200 response.

Status Code | Response and Explanation
401 | Unauthorized: The credentials were provided incorrectly or did not match any existing, active credentials.
402 | Payment Required: There is no active subscription for the account associated with the credentials submitted with the request.
400 | Bad Request (Malformed Payload): The request body was blank or otherwise malformed.
422 | Unprocessable Entity: Returns errors describing what needs to be corrected.
429 | Too Many Requests: When using public embedded key authentication, we restrict the number of requests coming from a given source over too short of a time. If you use embedded key authentication, you can avoid this error by adding your IP address as an authorized host for the embedded key in question.
413 | Request Entity Too Large: The request body was larger than 64 kilobytes.
200 | OK (success!): The response body is a JSON object containing metadata about the results and zero or more extracted addresses from the input provided with the request. See the annotated example below for details.
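The status code table above can be turned into a small dispatch routine. The Python sketch below is one possible shape, not an official client: the names ExtractError, STATUS_MESSAGES, and parse_response are hypothetical, and the messages are paraphrased from the table. It parses the body only when the status is 200.

```python
import json

# Explanations paraphrased from the status code table.
STATUS_MESSAGES = {
    400: "Bad Request: the request body was blank or malformed",
    401: "Unauthorized: credentials incorrect or not active",
    402: "Payment Required: no active subscription",
    413: "Request Entity Too Large: body larger than 64 kilobytes",
    422: "Unprocessable Entity: see the returned errors",
    429: "Too Many Requests: too many requests in too short a time",
}


class ExtractError(Exception):
    """Hypothetical exception type raised for any non-200 status."""


def parse_response(status, body):
    """Return the decoded JSON body for a 200 response, else raise."""
    if status == 200:
        return json.loads(body)
    message = STATUS_MESSAGES.get(status, "Unexpected status")
    raise ExtractError(f"{status} {message}")


result = parse_response(200, '{"meta": {"address_count": 0}, "addresses": []}')
print(result["addresses"])
```

Raising a single exception type for every non-200 status keeps the calling code simple; callers that need to retry on 429 can inspect the message or extend the sketch with a status attribute.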

HTTP Response: An Annotated Example

Rather than writing your own code to parse the JSON response, we recommend using a tried and tested JSON parser that is specific to your programming language. There is a comprehensive list of such tools (as well as the complete JSON specification) at json.org.

NOTE: Any returned fields that are not defined within this document should be considered experimental and may be changed or discontinued at any time without notice.

curl -v 'https://us-extract.api.smarty.com?auth-id=YOUR+AUTH-ID+HERE&auth-token=YOUR+AUTH-TOKEN+HERE' \
	-H 'Content-Type: text/plain; charset=utf-8' \
	--data-binary '
	<div>
		<p>
			Meet me at 5732 Lincoln Drive Minneapolis MN
		</p>
	</div>'

The above sample request yields the following JSON output. NOTE: We have modified the output with // comment statements (which are actually NOT valid JSON) as minimal documentation. Also, it is important to notice that the api_output field has structural parity with the response of the address verification endpoint:

{
	"meta":{
		// How many total lines of input were received?
		"lines":6,

		// Did the text have unicode characters or was it plain ASCII?
		"unicode":false,

		// How many addresses were found in the input?
		"address_count":1,

		// How many of the found addresses were valid?
		"verified_count":1,

		// Length of the input in bytes:
		"bytes":53,

		// Length of the input in characters:
		"character_count":53
	},

	// Array of addresses extracted from the input.
	"addresses":[
		{
		// The actual input text:
		"text":"5732 Lincoln Drive Minneapolis MN",

		// Was this address verified successfully?
		"verified":true,

		// The starting line of the 'text' in the input:
		"line":4,

		// The starting character index of the 'text':
		"start":16,

		// The ending character index of the text:
		"end":49,

		// The actual response from the US Street API:
		"api_output":[
			{
				"candidate_index":0,
				"delivery_line_1":"5732 Lincoln Dr",
				"last_line":"Minneapolis MN 55436-1608",
				"delivery_point_barcode":"554361608327",
				"components":{
					"primary_number":"5732",
					"street_name":"Lincoln",
					"street_suffix":"Dr",
					"city_name":"Minneapolis",
					"state_abbreviation":"MN",
					"zipcode":"55436",
					"plus4_code":"1608",
					"delivery_point":"32",
					"delivery_point_check_digit":"7"
				},
				"metadata":{
					"record_type":"S",
					"zip_type":"Standard",
					"county_fips":"27053",
					"county_name":"Hennepin",
					"carrier_route":"C009",
					"congressional_district":"03",
					"rdi":"Commercial",
					"elot_sequence":"0035",
					"elot_sort":"A",
					"latitude":44.90127,
					"longitude":-93.40045,
					"precision":"Zip9",
					"time_zone":"Central",
					"utc_offset":-6,
					"dst":true
				},
				"analysis":{
					"dpv_match_code":"Y",
					"dpv_footnotes":"AABB",
					"dpv_cmra":"N",
					"dpv_vacant":"N",
					"active":"Y",
					"footnotes":"N#"
				}
			}
		]
		}
	]
}
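To show how the fields in the annotated response might be consumed, here is a short Python sketch that parses an abbreviated copy of the sample output (with the comments removed, since they are not valid JSON) and collects the standardized lines for each verified address. The function name verified_addresses is hypothetical.

```python
import json

# Abbreviated copy of the sample response, without the // comments.
sample = """
{
  "meta": {"lines": 6, "address_count": 1, "verified_count": 1},
  "addresses": [
    {
      "text": "5732 Lincoln Drive Minneapolis MN",
      "verified": true,
      "line": 4,
      "start": 16,
      "end": 49,
      "api_output": [
        {"delivery_line_1": "5732 Lincoln Dr",
         "last_line": "Minneapolis MN 55436-1608"}
      ]
    }
  ]
}
"""


def verified_addresses(response):
    """Return (delivery_line_1, last_line) pairs for each verified address."""
    results = []
    for address in response["addresses"]:
        if not address["verified"]:
            continue
        for candidate in address["api_output"]:
            results.append((candidate["delivery_line_1"], candidate["last_line"]))
    return results


response = json.loads(sample)
found = verified_addresses(response)
print(found)  # [('5732 Lincoln Dr', 'Minneapolis MN 55436-1608')]

first = response["addresses"][0]
# In this sample, end - start equals the length of the matched text.
print(first["end"] - first["start"] == len(first["text"]))  # True
```

The sketch skips unverified entries before reading api_output and makes no assumption about what that field contains for them.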

Subscription Usage

With the extraction endpoint, the usage on your subscription varies depending on your input. One request to the extraction API will use zero or more lookups on your subscription. Aggressive mode will probably use more lookups, but it may find more addresses.

Credits

The US Extract API is brought to you, in part, by the following source code package:

github.com/glenn-brown/golang-pkg-pcre

Copyright (c) 2011 Florian Weimer. All rights reserved.

Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are
met:

* Redistributions of source code must retain the above copyright
notice, this list of conditions and the following disclaimer.

* Redistributions in binary form must reproduce the above copyright
notice, this list of conditions and the following disclaimer in the
documentation and/or other materials provided with the distribution.

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
"AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
(INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

SSL/TLS Information

Use modern security software and cipher suites.
