Bulk Address Validation: Command-Line Interface

Reading not your thing? Watch our quick-start video instead (Windows | Mac).

If you have large lists of addresses to process, and you have some experience with the command line, our Command-Line Interface (CLI) might become your new best friend. It can process millions of either US or international (non-US) addresses very quickly. Each address processed will count as one "lookup" from your US or international subscription. (If you're not yet familiar with the command line, try our Web Interface. It can process up to 500,000 US or international addresses at once.)

Contents

  1. On Scripting and Automation
  2. Download
  3. Installation
  4. Preparing Your Input File
  5. Using the Interface
  6. The Output File
  7. The Log File
  8. Command-Line Parameters
  9. Updates
  10. Troubleshooting

An Important Note On Scripting and Automation

This Command-Line Interface tool is provided as a convenience for (mostly) non-programmers who need to process large quantities of addresses formatted as CSV or PSV records. It is meant to be invoked manually by human users typing at a command prompt (not the most friendly of user experiences, we get it). Deploying this tool into an automated environment to process ad-hoc address data is not supported. This constraint stems from how we provide software updates for this tool (see Updates). If you need to process address data from deployed software running autonomously, we recommend our officially supported SDKs. For those who seek an even more direct HTTP integration, we also provide detailed US Street Address API documentation.

Download

You can download (free) the Command-Line Interface for the following platforms:

You're welcome!

Installation

After downloading one of the above packages, extract the contents of the archive to your desktop. You'll see a SmartyList folder containing the following files:

  • smartylist: This is the application. Instead of double-clicking it, you will access it from the command line.
  • sample-input.csv: This is a simple address list for your reference.
  • sample-output.csv: This is the output produced by processing the sample-input.csv file above.
  • change-log.txt: A log of recent changes made by the software developers.
  • DO-NOT-README.txt: Actually, please read it.

Power users: Feel free to copy or move smartylist to wherever is convenient. On a Linux machine you might put it in /usr/local/bin or somewhere else that is already in your $PATH.

Preparing Your Input File

Save your input data as a CSV or PSV file (comma-separated-values or pipe-separated-values), within the SmartyList folder on your desktop. Within that file, have your data organized into columns using one of the combinations shown below. (The more data provided, the better.) The top row MUST consist of field names, spelled exactly as you see here.

For US addresses, use one of these combinations of columns:
street | city | state | zipcode
street | city | state
street | zipcode
address (entire address in a single field)

If you have secondary information (apartment/suite/etc.) in a separate column, label that column "secondary". Such a column can be added to any of the first three combinations shown above. For example:
street | secondary | city | state | zipcode

street | city | state | zipcode
11310 Old Seward Highway | Anchorage | AK | 99515
3211 Edwards Lake Pkwy | Birmingham | AL |
11219 N Rodney Parham Road | | | 72212
4507 North US Highway 89 | Flagstaff | AZ | 86004
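Before running a large list, it can help to confirm that the header row contains one of the US column combinations listed above. The following is a minimal Python sketch of such a pre-check, not part of the tool itself; the function names are hypothetical, and it assumes field names must match exactly, including case. The field-name listing the tool prints before processing remains the authoritative check.

```python
import csv
import io

# Column combinations listed above for US addresses.
US_COMBINATIONS = [
    {"street", "city", "state", "zipcode"},
    {"street", "city", "state"},
    {"street", "zipcode"},
    {"address"},
]


def matching_us_combination(header):
    """Return the largest listed combination fully present in the header, or None."""
    present = set(header)
    matches = [combo for combo in US_COMBINATIONS if combo <= present]
    return max(matches, key=len) if matches else None


def read_header(text, delimiter=","):
    """Read the first row of CSV or PSV text as a list of field names."""
    return next(csv.reader(io.StringIO(text), delimiter=delimiter))


# A small pipe-separated sample with an extra "secondary" column.
sample = "street|secondary|city|state|zipcode\n11310 Old Seward Highway||Anchorage|AK|99515\n"
header = read_header(sample, delimiter="|")
print(sorted(matching_us_combination(header)))
```

Extra columns such as "secondary" or an ID field do not affect the result, because the check only asks whether a listed combination is present.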

For international addresses, use one of these combinations of columns:
country | address1 | locality | administrative_area | postal_code
country | address1 | locality | administrative_area
country | address1 | postal_code
country | freeform (entire address except country in a single column)

country | address1 | locality | administrative_area | postal_code
AUS | 200 River Terrace | Kangaroo Point | Queensland | 4169
DEU | Hainichener Strasse 64 | Freiberg | Sachsen |
PYF | 21 Allée Pierre Loti | Papeete | | 98714
RUS | ул. Фурштатская, д. 13 | | | 191028
JPN | きみ野 6-1-8 | 大和市 | 神奈川県 | 242-0001

For either US or international addresses, you can include fields that contain non-address data (like ID number or business name). All your input data will be returned untouched as part of the output. (If you do include non-address data, be sure to give those data fields non-address names.)

One final consideration: Make sure your list doesn't include blank lines (except at the end). By "blank lines" we mean lines that have no delimiters (commas or pipes) and no data except a carriage return character (and/or line feed character). Blank lines can cause line numbers to output incorrectly, which makes pasting back into a spreadsheet a bit tricky. If you insist on having blank lines, make sure each record has an 'ID' field containing a unique value.
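If your list does contain blank lines, one way to remove them before processing is sketched below in Python. This is an illustration only: the function name is hypothetical, and it assumes no quoted field in your file spans multiple lines. Lines that contain only delimiters are kept, since they are not "blank" by the definition above.

```python
def drop_blank_lines(lines):
    """Keep only lines that contain something other than CR/LF characters."""
    return [line for line in lines if line.strip("\r\n") != ""]


# Two of these four lines hold nothing but line-ending characters.
raw = ["street,zipcode\n", "\n", "11219 N Rodney Parham Road,72212\n", "\r\n"]
cleaned = drop_blank_lines(raw)
print(cleaned)
```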

Using the Interface

Open your favorite command-line application, and use the "change directory" command to navigate to the directory where your Command-Line Interface files reside. On Windows, we recommend running your command-line application as an administrator. This is what the "change directory" command might look like:

Windows:

cd /Users/[username]/Desktop/smartylist_windows_latest

Mac:

cd ~/Desktop/smartylist_osx_latest

Three specific command-line parameters are required in order to process a list: -auth-id, -auth-token, and -input. (To find your -auth-id and -auth-token, open the API Keys tab of your account and look under the heading of Secret Keys.) The -input parameter tells the tool where your input file is. If you placed your input file inside of the SmartyList folder, the complete command to process it might look like this:

Windows:

smartylist -auth-id="123" -auth-token="Abc" -input="your_file"

Mac:

./smartylist -auth-id="123" -auth-token="Abc" -input="your_file"

We suggest you try a short list first, to make sure everything is working as expected. When you run the command, the terminal will first display your current configuration settings, so you can verify that they are as desired. It will also list your input field names, and below those, the matching data type for each. Make sure these are correct.

Finally, the prompt will ask if everything appears to be in order. If everything looks right, type "y" then hit "enter." During processing, the terminal will display a progress bar. (Although, if your list is small, the job will be done almost instantly.)

The Output File

By default, the output file will be placed next to the input file, and it will be named like the input file, except with "-output" appended. (If you wish, you can specify a different output directory using the -output command-line parameter.)

When viewing the output file, you will see all of your original data fields on the left, followed by an empty field, followed by our output fields on the right, with field names in brackets.
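If you want to work with the output file in a script, the layout described above can be used to separate your original columns from ours. The Python sketch below splits a header row at the first empty field name. The function name and the sample header are illustrative assumptions (including the use of square brackets); the real output field names come from the tool.

```python
import csv
import io


def split_columns(header):
    """Split a header row at the first empty field name.

    Returns (input_fields, output_fields). If there is no empty field,
    everything is treated as input.
    """
    if "" not in header:
        return header, []
    gap = header.index("")
    return header[:gap], header[gap + 1:]


# Illustrative header only; not copied from a real output file.
sample = "id,street,zipcode,,[delivery_line_1],[last_line]\n"
header = next(csv.reader(io.StringIO(sample)))
inputs, outputs = split_columns(header)
print(inputs)
print(outputs)
```

Note that this approach would split too early if one of your own input columns had an empty name.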

The CLI output fields for US addresses are very similar to the raw output from the US Street Address API, though in a different order. For an explanation of the US output fields, please see Address Output Fields.

The CLI output fields for international addresses are likewise very similar to the raw output from the International Street Address API. There are two differences: (1) There is one new field in the CLI output for international addresses: line_number. This is simply a numbering of rows, to help keep track of their original output order. (2) The "changes" output fields, which are part of the raw API response, are currently absent from the CLI output. These fields will likely be added in a future update to the CLI.

The Log File

Every time you process a list with the Command-Line Interface, it will produce a log file and place it next to the corresponding input file. The name of the log file will follow this pattern:

[name-of-input-file]-log_[date-time]

The file will contain all the information displayed by the terminal before processing, as well as a precise play-by-play of the tool's various actions. In the unlikely event that your list fails to process, check the log file for the gory details of what happened. If you contact Support with questions, they may ask to see this file in order to aid in the debugging process.

Command-Line Parameters

Here we list all the command-line parameters that can be used with our Command-Line Interface. As explained above, the first three parameters listed below are all that are required to process a list. The others are optional; you can employ them to customize the tool's functionality. To use them, simply list them when you run smartylist at the command prompt, following this model:

smartylist -[parameter] -[another-parameter]

  • -auth-id="123"

    The auth-id value (or name of environment variable) to use for API requests.

  • -auth-token="Abc"

    The auth-token value (or name of environment variable) to use for API requests.

  • -input="path/to/the/input/file"

    The path to the input file which has addresses you want to validate.

  • -output="/path/to/the/output/file"

    If provided, this is where the bulk validation tool will place the output file containing the results of processing your input file. If not provided, the tool will place the output alongside the input.

  • -log="path/to/the/log/file"

    If desired, you can tell the bulk validation tool where to put the diagnostic log file. If this parameter is not provided, the tool will place the log file alongside the input file.

  • -api="name-of-api"

    Valid values are "us-street", "international-street", and "us-enrichment". If this parameter is not provided, "us-street" will be assumed by default. If an invalid value is provided, an error will be thrown, and the process will not run.

  • -license="name-of-license"

    Use this parameter to specify the license to use for the chosen input file. Valid values can be found in your subscriptions page, under the appropriate subscription.

  • -base-url="http://www.your-site.com"

    The base URL to use for API requests if you are pointing to an onsite API installation. If you are using our regular cloud service, this parameter is not necessary.

  • -format="format-value"

    This parameter should only be used when processing US addresses. When you provide this parameter, the tool will override the default output format. Valid values are the same as the format parameter for the US Street Address API. If you would like to set formatting to the Project USA Format, we recommend you set -format="project-usa".

  • -match="match-value"

    This parameter should only be used when processing US addresses. When you provide this parameter, the tool will override any values in the match column of the input file. Valid values are the same as the match parameter for the US Street Address API. If you are using one of our newer "Core" licenses, we highly recommend you set -match="enhanced".

  • -enrichment-dataset="name-of-enrichment-dataset"

    This parameter should only be used when processing US addresses with the "us-enrichment" API to target a specific enrichment dataset.

  • -enrichment-data-subset="name-of-enrichment-data-subset"

    This parameter should only be used when processing US addresses with the "us-enrichment" API, and only when the chosen enrichment dataset contains subsets.

  • -rate-limit=[integer]

    With this command-line parameter, you can choose how fast to send addresses to the API, in addresses per second. For example: -rate-limit=300 will cause the CLI to send 300 addresses per second. Valid values are positive integers. If the value is less than 1 or is not an integer, an error will be thrown, and the CLI process will not run. If this parameter is not used, no rate limit will be applied.

  • -proxy="www.your-proxy.com"

    The URL of your proxy, if one has been configured for your network. In most cases this flag is not necessary.

  • -silent

    Tells the tool to squelch all diagnostic output and process the list without a confirmation prompt if possible. (No value needed.)

  • -timeout

    If your network connection is slow, you may receive timeout errors during execution, such as "context deadline exceeded (Client.Timeout or context cancellation while reading body)". This parameter can help prevent those errors by allowing more time for the response to be received from the server. The value is in seconds; the default is 5.

  • -version

    When you provide this parameter, the tool simply prints the version of the application to stdout and exits. (No value needed.)
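Two of the parameters above have strictly defined values: -api accepts one of three names, and -rate-limit accepts positive integers. If you keep your settings in a notes file or spreadsheet, a quick pre-check like the Python sketch below can catch a typo before you type the command. The function names are hypothetical, and this is only an approximation of the rules as described here; the tool performs its own validation when it runs.

```python
# Values listed above for the -api parameter.
VALID_APIS = {"us-street", "international-street", "us-enrichment"}


def check_api(value):
    """True if the text is one of the listed -api values."""
    return value in VALID_APIS


def check_rate_limit(value):
    """True if the text is a positive integer such as "300"."""
    return value.isascii() and value.isdigit() and int(value) >= 1


print(check_api("us-street"))
print(check_rate_limit("300"))
print(check_rate_limit("0"))
```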

Updates (Pay attention, this is important!)

Try this command at the command prompt:

smartylist -version

(Mac/Linux users may need to insert ./ in front of the word smartylist.)

If the version number you see doesn't match the latest released version, you might be missing out on recent improvements and should probably download and install the latest version.

The version number you see is the semantic version number of your copy of the application. Each of the three dot-delimited numbers is significant: major.minor.patch

The first of the three numbers in the version output is the "major" version number. If we need to release a new major version, any copies of the old version will be automatically disabled, requiring you to download and install the latest version before processing any additional lists. (Read that last sentence again...slowly...just to make sure it sinks in.) This is not something we will do often and certainly not ever without extensive consideration.

The second of the three numbers in the version output is the "minor" version number. Incrementing this number means we have released new functionality that is still backwards-compatible. It would behoove you to download and install the latest version. Until you do, a message will be sent to stderr and a non-zero exit status will be returned by the application as a signal that something is amiss. The application will continue to process your lists.

The third of the three numbers in the version output is the "patch" version number. It covers patches and bug fixes (corrections to existing behavior). If this number doesn't match, it would be a good idea (probably worth a promotion!) for you to download and install the latest version so you have the most current and correct software. In this situation, a message will be sent to stdout as a signal that something is amiss. The application will continue to process your lists.
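To make the three cases above concrete, here is a small Python sketch that compares two version strings and reports which part differs first. It assumes the version is available as bare "major.minor.patch" text; the exact text printed by the -version parameter is not shown here, and the version numbers in the example are made up.

```python
def parse_version(text):
    """Turn "major.minor.patch" text into a tuple of three integers."""
    major, minor, patch = (int(part) for part in text.strip().split("."))
    return major, minor, patch


def describe_difference(installed, latest):
    """Name the most significant part in which the two versions differ."""
    have, want = parse_version(installed), parse_version(latest)
    if have[0] != want[0]:
        return "major"    # old copies are disabled until you update
    if have[1] != want[1]:
        return "minor"    # message on stderr, non-zero exit, lists still process
    if have[2] != want[2]:
        return "patch"    # message on stdout, lists still process
    return "current"


# Hypothetical version numbers, for illustration only.
print(describe_difference("2.3.1", "2.4.0"))
```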

New releases are announced in our open-source Changelog repository.

Troubleshooting

If Excel Doesn’t Display Some Characters Correctly

The Smarty CLI outputs a comma-delimited or pipe-delimited file encoded as UTF-8, so characters from many languages are written correctly. If Excel is not displaying some characters correctly, we recommend this procedure:

  1. Instead of opening the output file directly with Excel (e.g., by double-clicking on the file), open Excel and open a brand new, empty file.
  2. From within the Excel application, go to the File menu and choose Import.
  3. During the Import process, choose the comma-delimited or pipe-delimited file that was output by the Smarty CLI.
  4. Also during the Import process, be sure to tell Excel that the "File origin" or character set you want to use is "Unicode (UTF-8)."
  5. Finally, you will be given the opportunity to set the file delimiters. Choose the one that makes the preview look right (probably either comma or pipe).
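As an alternative to the Import procedure (and not something the CLI does for you), some users re-save a copy of the output with a UTF-8 byte order mark, which many versions of Excel use to detect the encoding when a file is opened directly. The Python sketch below shows the idea using the standard "utf-8-sig" codec; the function name is hypothetical, and whether your version of Excel honors the mark is an assumption you should verify on a small file first.

```python
def with_utf8_bom(data: bytes) -> bytes:
    """Return UTF-8 bytes that start with exactly one byte order mark."""
    text = data.decode("utf-8-sig")   # accepts input with or without a mark
    return text.encode("utf-8-sig")   # writes the mark, then the UTF-8 text


original = "locality\n大和市\n".encode("utf-8")
marked = with_utf8_bom(original)
print(marked[:3])
```

Work on a copy rather than the original output file, so the file produced by the CLI stays untouched.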