• 0

[SQL] Compare Addresses


Question

Hello gang,

We are having an issue where we need to compare some address data to insure that we are not entering a duplicate record. (The data is coming from multiple outside sources, so an identity/GUID won't work... but thanks for thinking of that)

Consider the following

Company Name

Address

City

State

ZIP Code

If the first person was to send us:

"Bob's Restaurant"

"123 Main Street"

"New York"

"NY"

"10018"

and the second person sends us:

"Bobs Restaurant"

"123 Main St."

"New York"

"NY"

"10018"

Notice that the second restaurant does not have the appostrophy and the street name is abreviated. So an exact match is SQL would not work.

We're looking at Latitude and Longitude, and that should get us there, but in case anyone has any other thoughts..... thanks.

Link to comment
https://www.neowin.net/forum/topic/719338-sql-compare-addresses/
Share on other sites

8 answers to this question

Recommended Posts

  • 0

I'm not sure what database server you are using, but how about something like soundex()? If you can compile a list of common ways to abbreviate parts of an address, and could be reasonably sure that all occurances of St. really equals Street (I'm not sure how your data is stored, if you have city in one column, state in one column, street address in one column, name of location in one column, etc. then this would probably doable), then this would work for you:

DROP TEMPORARY TABLE IF EXISTS t1;
CREATE TEMPORARY TABLE t1 
(
  address text
);

INSERT INTO t1 (address) 
VALUES 
('Bob\'s Restaurant 123 Main Street New York NY 10018'),
('Bobs Restaurant 123 Main St. New York NY 10018');

UPDATE t1 SET address = REPLACE(address, 'St.', 'Street');

SELECT address, 
		   soundex(address) 
FROM t1;

This returns this data set, so, you can see that it is a duplicate address:

+----------------------------------------------------+-------------------+
| address											| soundex(address)  |
+----------------------------------------------------+-------------------+
| Bob's Restaurant 123 Main Street New York NY 10018 | B2623653523635625 |
| Bobs Restaurant 123 Main Street New York NY 10018  | B2623653523635625 |
+----------------------------------------------------+-------------------+

In practice, the way I would do this is to create a stored procedure that takes an address as a parameter or set of parameters, one param for each part of the address. Then you could run a set of replaces against the parameter to normalize it to the rest of the data already in the database (replace St. with Street, or, Street with St., whichever you want, Dr. with Drive, Ln. with Lane, etc.). So, assuming your sp took a param named p_address, and you have already ran your replaces to normalize the string, you could run a query like this:

DECLARE p_is_duplicate tinyint(1) NOT NULL DEFAULT 0;

SET p_is_duplicate = EXISTS(SELECT * FROM addresses WHERE soundex(address) = soundex(p_address);

IF NOT p_is_duplicate THEN 
  INSERT INTO address SET address = p_address;
END IF;

Hope this helps, latitude and longitude will probably be more accurate in the long run, but something like this may work for you depending on what dbms you use and how your data is stored. :)

  • 0

UPDATE:

There is an issue with SoundEx where is does not acurately compare. The two examples below both return values that state they are the same even though they are not (different city, state and zip)

I understand that I can break out the street while doing my compare, and I may do that, I simply wanted to alert other developers of this issue.

SELECT SOUNDEX ('Bob 500 1st St. Seattle, WA 98109'), SOUNDEX ('Bobby 500 First St New York, NY')

SELECT DIFFERENCE ('Bob 500 1st St, Seattle 98109', 'Bobby 500 First St New York, NY 10036')

I may actually may use a DLL that allows me to create lat/long and compare these values....

Thanks again for the feedback and idea.

  • 0

I knew there was another SoundEx example that I wanted to mention:

SELECT SOUNDEX ('500 First Street'), SOUNDEX ('400 First Street')

Both of these come back as a match. So be aware that soundex is prob not the best solution for address compare. I've moved on to (attempting) to work with lat/long. (It's the getting of these values that is killing me)

Have a great weekend.

  • 0
I'm not sure what database server you are using, but how about something like soundex()? If you can compile a list of common ways to abbreviate parts of an address, and could be reasonably sure that all occurances of St. really equals Street (I'm not sure how your data is stored, if you have city in one column, state in one column, street address in one column, name of location in one column, etc. then this would probably doable), then this would work for you:

DROP TEMPORARY TABLE IF EXISTS t1;
CREATE TEMPORARY TABLE t1 
(
  address text
);

INSERT INTO t1 (address) 
VALUES 
('Bob\'s Restaurant 123 Main Street New York NY 10018'),
('Bobs Restaurant 123 Main St. New York NY 10018');

UPDATE t1 SET address = REPLACE(address, 'St.', 'Street');

SELECT address, 
		   soundex(address) 
FROM t1;

This returns this data set, so, you can see that it is a duplicate address:

+----------------------------------------------------+-------------------+
| address											| soundex(address)  |
+----------------------------------------------------+-------------------+
| Bob's Restaurant 123 Main Street New York NY 10018 | B2623653523635625 |
| Bobs Restaurant 123 Main Street New York NY 10018  | B2623653523635625 |
+----------------------------------------------------+-------------------+

In practice, the way I would do this is to create a stored procedure that takes an address as a parameter or set of parameters, one param for each part of the address. Then you could run a set of replaces against the parameter to normalize it to the rest of the data already in the database (replace St. with Street, or, Street with St., whichever you want, Dr. with Drive, Ln. with Lane, etc.). So, assuming your sp took a param named p_address, and you have already ran your replaces to normalize the string, you could run a query like this:

DECLARE p_is_duplicate tinyint(1) NOT NULL DEFAULT 0;

SET p_is_duplicate = EXISTS(SELECT * FROM addresses WHERE soundex(address) = soundex(p_address);

IF NOT p_is_duplicate THEN 
  INSERT INTO address SET address = p_address;
END IF;

Hope this helps, latitude and longitude will probably be more accurate in the long run, but something like this may work for you depending on what dbms you use and how your data is stored. :)

Soundex performs a match based on the pronumciation of the first four constanants it encounters (I had to look this up before because I ran into a similar situation). So "Bob's R" and "Bobs R" would be used for comparison and thus match.

It's also a good habit to ALWAYS replace input quotation marks and apostrophes with "`" or "``". You're open to SQL injection attacks if you don't.

  • 0

Try using a CHECKSUM instead:

SELECT * FROM ADDRESS WHERE CHECKSUM(company + address + city + state + zip) = CHECKSUM(@company + @address + @city + @state + @zip)

Be sure to encase any fields with an isnull() to default the value out to '' if any of those fields can be null. EG

SELECT * FROM ADDRESS WHERE CHECKSUM(isnull(company, '') + isnull(address, '') + isnull(city, '') + isnull(state, '') + isnull(zip, '')) = CHECKSUM(@company + @address + @city + @state + @zip)

If the query returns any rows, you have a duplicate.

This topic is now closed to further replies.
  • Recently Browsing   0 members

    • No registered users viewing this page.
  • Posts

    • Anybody that thinks flying cars were possible are idiots. Everyone would basically need a pilot licence, can you imagine how insane and dangerous that would be, people can barely handle driving on land safely right now.
    • Microsoft Edge 149.0.4022.80 by Razvan Serea Microsoft Edge is a super fast and secure web browser from Microsoft. It works on almost any device, including PCs, iPhones and Androids. It keeps you safe online, protects your privacy, and lets you browse the web quickly. You can even use it on all your devices and keep your browsing history and favorites synced up. Built on the same technology as Chrome, Microsoft Edge has additional built-in features like Startup boost and Sleeping tabs, which boost your browsing experience with world class performance and speed that are optimized to work best with Windows. Microsoft Edge security and privacy features such as Microsoft Defender SmartScreen, Password Monitor, InPrivate search, and Kids Mode help keep you and your loved ones protected and secure online. Microsoft Edge has features to keep both you and your family protected. Enable content filters and access activity reports with your Microsoft Family Safety account and experience a kid-friendly web with Kids Mode. The new Microsoft Edge is now compatible with your favorite extensions, so it’s easy to personalize your browsing experience. Microsoft Edge 149.0.4022.80 changelog: Fixes Fixed an issue that prevented QR code generation from working. Feature updates Intune MAM Protected Downloads. The protected downloads feature for Intune MAM will now save downloaded files to the Documents > Microsoft Edge > Downloads folder in OneDrive. Extensions monitoring in the Edge management service. The Microsoft Edge management service now allows admins to gain visibility into extensions installed across their managed users. From the extensions monitoring page, admins can see which extensions have been installed as well as manage user requests for blocked extensions. For more information, see Microsoft Edge Extensions Monitoring. Validate Edge builds early with enterprise preview. Enterprise preview provides a simpler way for admins to flight pre-release Edge builds to their users. To reduce friction and bolster usage, users will receive pre-release builds directly inside of their Stable Edge application. Admins can allow users to easily opt-out of the preview experience, using built-in rollback to switch between their pre-release and stable channels with ease. Microsoft 365 admin center users can configure the feature, view their flighting population, and receive personalized recommendations all in one place. For more information, see Get started with Enterprise Preview in Microsoft Edge. Download: Microsoft Edge (64-bit) | 193.0 MB (Freeware) Download: Microsoft Edge (32-bit) | 170.0 MB Download: Microsoft Edge (ARM64) | 188.0 MB View: Microsoft Edge Website | Release History Get alerted to all of our Software updates on Twitter at @NeowinSoftware
    • The machines are starting to fight back any way they can.
    • No news articles about the Arch Linux repo being majorly infected with malware?!?
    • Waymo recalls self-driving software after cars enter closed freeway work zones by Paul Hill Waymo, the self-driving car maker owned by Alphabet – the parent company of Google –, has recalled some of its fifth-generation Automated Driving Systems (ADS). It did so after some of its cars drove through closed construction zones. According to the National Highway Traffic Safety Administration (NHTSA), the affected vehicles were capable of driving through a closed freeway construction zone and continuing to drive at speed. The listing on the NHTSA website says that Waymo is currently developing a solution to fix this issue, but in the meantime, freeway driving is being restricted. Waymo will update its ADS software so that vehicles can detect when they can avoid entering construction zones. According to the Safety Recall Report, on April 20, 2026, Waymo’s Field Safety Committee began meetings reviewing an event from April 11, 2026, and five events from April 19, 2026, where Waymo’s autonomous vehicles didn’t recognize and drove past ramp closure signs into the pre-planned freeway construction zones. This took place in Phoenix, Arizona. Separately, on May 18, 2026, seven Waymo vehicles entered freeway lanes with active construction in the San Francisco Bay Area by driving between cones that were placed to show the lane was closed. On the back of both of these events, Waymo restricted freeway driving until it could address the issue. In June, Waymo’s Safety Board reviewed the issue and additional information related to ADS performances around construction zones; then, as a result, it decided to conduct a recall. This development is not good for Waymo as it adds to a growing list of technical hiccups its cars have experienced. Ultimately, it will lead to more scrutiny from lawmakers around the world who will be more cautious about letting autonomous vehicles on their roads without tighter regulation. For readers in areas where Waymo operates, does this news make you more wary about stepping into one of these vehicles?
  • Recent Achievements

    • Week One Done
      Eurosoft10 earned a badge
      Week One Done
    • One Month Later
      Eurosoft10 earned a badge
      One Month Later
    • One Year In
      Skeet Campbell earned a badge
      One Year In
    • One Month Later
      Sharbel earned a badge
      One Month Later
    • First Post
      BizSAR earned a badge
      First Post
  • Popular Contributors

    1. 1
      +primortal
      599
    2. 2
      +Edouard
      190
    3. 3
      PsYcHoKiLLa
      79
    4. 4
      Michael Scrip
      77
    5. 5
      Steven P.
      69
  • Tell a friend

    Love Neowin? Tell a friend!