• 0

[SQL] Compare Addresses


Question

Hello gang,

We are having an issue where we need to compare some address data to insure that we are not entering a duplicate record. (The data is coming from multiple outside sources, so an identity/GUID won't work... but thanks for thinking of that)

Consider the following

Company Name

Address

City

State

ZIP Code

If the first person was to send us:

"Bob's Restaurant"

"123 Main Street"

"New York"

"NY"

"10018"

and the second person sends us:

"Bobs Restaurant"

"123 Main St."

"New York"

"NY"

"10018"

Notice that the second restaurant does not have the appostrophy and the street name is abreviated. So an exact match is SQL would not work.

We're looking at Latitude and Longitude, and that should get us there, but in case anyone has any other thoughts..... thanks.

Link to comment
https://www.neowin.net/forum/topic/719338-sql-compare-addresses/
Share on other sites

8 answers to this question

Recommended Posts

  • 0

I'm not sure what database server you are using, but how about something like soundex()? If you can compile a list of common ways to abbreviate parts of an address, and could be reasonably sure that all occurances of St. really equals Street (I'm not sure how your data is stored, if you have city in one column, state in one column, street address in one column, name of location in one column, etc. then this would probably doable), then this would work for you:

DROP TEMPORARY TABLE IF EXISTS t1;
CREATE TEMPORARY TABLE t1 
(
  address text
);

INSERT INTO t1 (address) 
VALUES 
('Bob\'s Restaurant 123 Main Street New York NY 10018'),
('Bobs Restaurant 123 Main St. New York NY 10018');

UPDATE t1 SET address = REPLACE(address, 'St.', 'Street');

SELECT address, 
		   soundex(address) 
FROM t1;

This returns this data set, so, you can see that it is a duplicate address:

+----------------------------------------------------+-------------------+
| address											| soundex(address)  |
+----------------------------------------------------+-------------------+
| Bob's Restaurant 123 Main Street New York NY 10018 | B2623653523635625 |
| Bobs Restaurant 123 Main Street New York NY 10018  | B2623653523635625 |
+----------------------------------------------------+-------------------+

In practice, the way I would do this is to create a stored procedure that takes an address as a parameter or set of parameters, one param for each part of the address. Then you could run a set of replaces against the parameter to normalize it to the rest of the data already in the database (replace St. with Street, or, Street with St., whichever you want, Dr. with Drive, Ln. with Lane, etc.). So, assuming your sp took a param named p_address, and you have already ran your replaces to normalize the string, you could run a query like this:

DECLARE p_is_duplicate tinyint(1) NOT NULL DEFAULT 0;

SET p_is_duplicate = EXISTS(SELECT * FROM addresses WHERE soundex(address) = soundex(p_address);

IF NOT p_is_duplicate THEN 
  INSERT INTO address SET address = p_address;
END IF;

Hope this helps, latitude and longitude will probably be more accurate in the long run, but something like this may work for you depending on what dbms you use and how your data is stored. :)

  • 0

UPDATE:

There is an issue with SoundEx where is does not acurately compare. The two examples below both return values that state they are the same even though they are not (different city, state and zip)

I understand that I can break out the street while doing my compare, and I may do that, I simply wanted to alert other developers of this issue.

SELECT SOUNDEX ('Bob 500 1st St. Seattle, WA 98109'), SOUNDEX ('Bobby 500 First St New York, NY')

SELECT DIFFERENCE ('Bob 500 1st St, Seattle 98109', 'Bobby 500 First St New York, NY 10036')

I may actually may use a DLL that allows me to create lat/long and compare these values....

Thanks again for the feedback and idea.

  • 0

I knew there was another SoundEx example that I wanted to mention:

SELECT SOUNDEX ('500 First Street'), SOUNDEX ('400 First Street')

Both of these come back as a match. So be aware that soundex is prob not the best solution for address compare. I've moved on to (attempting) to work with lat/long. (It's the getting of these values that is killing me)

Have a great weekend.

  • 0
I'm not sure what database server you are using, but how about something like soundex()? If you can compile a list of common ways to abbreviate parts of an address, and could be reasonably sure that all occurances of St. really equals Street (I'm not sure how your data is stored, if you have city in one column, state in one column, street address in one column, name of location in one column, etc. then this would probably doable), then this would work for you:

DROP TEMPORARY TABLE IF EXISTS t1;
CREATE TEMPORARY TABLE t1 
(
  address text
);

INSERT INTO t1 (address) 
VALUES 
('Bob\'s Restaurant 123 Main Street New York NY 10018'),
('Bobs Restaurant 123 Main St. New York NY 10018');

UPDATE t1 SET address = REPLACE(address, 'St.', 'Street');

SELECT address, 
		   soundex(address) 
FROM t1;

This returns this data set, so, you can see that it is a duplicate address:

+----------------------------------------------------+-------------------+
| address											| soundex(address)  |
+----------------------------------------------------+-------------------+
| Bob's Restaurant 123 Main Street New York NY 10018 | B2623653523635625 |
| Bobs Restaurant 123 Main Street New York NY 10018  | B2623653523635625 |
+----------------------------------------------------+-------------------+

In practice, the way I would do this is to create a stored procedure that takes an address as a parameter or set of parameters, one param for each part of the address. Then you could run a set of replaces against the parameter to normalize it to the rest of the data already in the database (replace St. with Street, or, Street with St., whichever you want, Dr. with Drive, Ln. with Lane, etc.). So, assuming your sp took a param named p_address, and you have already ran your replaces to normalize the string, you could run a query like this:

DECLARE p_is_duplicate tinyint(1) NOT NULL DEFAULT 0;

SET p_is_duplicate = EXISTS(SELECT * FROM addresses WHERE soundex(address) = soundex(p_address);

IF NOT p_is_duplicate THEN 
  INSERT INTO address SET address = p_address;
END IF;

Hope this helps, latitude and longitude will probably be more accurate in the long run, but something like this may work for you depending on what dbms you use and how your data is stored. :)

Soundex performs a match based on the pronumciation of the first four constanants it encounters (I had to look this up before because I ran into a similar situation). So "Bob's R" and "Bobs R" would be used for comparison and thus match.

It's also a good habit to ALWAYS replace input quotation marks and apostrophes with "`" or "``". You're open to SQL injection attacks if you don't.

  • 0

Try using a CHECKSUM instead:

SELECT * FROM ADDRESS WHERE CHECKSUM(company + address + city + state + zip) = CHECKSUM(@company + @address + @city + @state + @zip)

Be sure to encase any fields with an isnull() to default the value out to '' if any of those fields can be null. EG

SELECT * FROM ADDRESS WHERE CHECKSUM(isnull(company, '') + isnull(address, '') + isnull(city, '') + isnull(state, '') + isnull(zip, '')) = CHECKSUM(@company + @address + @city + @state + @zip)

If the query returns any rows, you have a duplicate.

This topic is now closed to further replies.
  • Recently Browsing   0 members

    • No registered users viewing this page.
  • Posts

    • I bought this game. Played it for an hour, and then got a refund from Steam. Not a fun game at all.
    • Nothing Ear buds with active noise cancellation are at their lowest price ever with 51% off by Fiza Ali Amazon is currently offering the Nothing Ear wireless earbuds at their lowest price ever with 51% off limited prime deal. The earbuds feature an 11mm dynamic drivers with a ceramic diaphragm, and support high-resolution audio codecs including AAC, SBC, LDAC, and LHDC 5.0. They support active noise cancellation of up to 45dB across a frequency range of up to 5000Hz, and include a smart ANC algorithm, adaptive noise cancellation, and a transparency mode that allows surrounding sounds to be heard when needed. Connectivity is provided via Bluetooth 5.3, with support for multiple profiles including HFP, A2DP, AVRCP, and others. The earbuds also support dual connection, allowing them to be paired with two devices at the same time. Additional features include IP54 water and dust resistance for the earbuds and IP55 for the charging case, in-ear detection, pinch controls, low-latency mode, Google Fast Pair, Microsoft Swift Pair, and a three-microphone system per earbud for clearer voice calls. The Nothing X app, available on Android and iOS, provides access to custom EQ settings, bass enhancement, personal sound profiles, ear tip fit testing, firmware updates, customisable controls, dual-device management, and a find-my-earbuds feature. In terms of battery performance, each earbud has a 46mAh battery and the charging case has a 500mAh capacity. With active noise cancellation (ANC) turned off, the earbuds should offer up to 8.5 hours of playback on a single charge and up to 40.5 hours in total with the charging case. With ANC enabled, playback should last up to 5.2 hours on the earbuds and up to 24 hours with the case. For calls, talk time should reach up to 5 hours on the earbuds and 23 hours with the case when ANC is off, while ANC on should provide up to 4 hours on the earbuds and 18 hours with the case. Finally, fast charging should deliver up to 10 hours of playback from 10 minutes of charging when ANC is disabled. Nothing Ear Wireless Earbuds Bluetooth: $73.15 (Amazon US) - 51% off Good to know This Amazon deal is U.S. specific, and not available in other regions unless specified. We only use first-party seller links (at the time of article publishing); ensure that you purchase from a first-party seller link only. Check out Today's Deals on Amazon | or our recent tech deals. Become a Prime member (for Students or SNAP) via Neowin Get Prime Access - Prime for half price (for qualifying Medicaid, EBT, SNAP) Subscribe to Prime Video, Audible Plus, Music Unlimited or Kindle Unlimited via Neowin As an Amazon Associate, we earn from qualifying purchases.
    • Microsoft officially launched its Copilot Cowork enterprise AI agent on June 16, 2026, switching to usage-based pricing on the same day it disclosed it is considering a Microsoft-hosted version of China's DeepSeek V4 as a lower-cost engine for the platform — a pairing that puts the company on a collision course with both its enterprise customers' security teams and a White House that has spent months trying to wall off Chinese AI from American infrastructure.................... https://www.techtimes.com/articles/318647/20260618/microsoft-eyes-deepseek-v4-copilot-cowork-what-azure-hosting-cannot-fix.htm  
    • Forza Horizon 6 gets another hotfix for one of the game's online modes by Taras Buria Recently, Forza Horizon 6 players discovered an interesting glitch that allowed farming a crazy amount of in-game credits in a few minutes. Playground Games quickly pulled the plug on the exploit by disabling one of the game's online modes, and today, the studio is rolling out another hotfix. In my review, I complained about the game still showering gamers with cars, credits, and wheelspins. As such, earning money in Forza Horizon 6 is not a particularly difficult task. You simply have to play the game, crazy, I know. However, people still found an easier path to becoming a billionaire in Forza Horizon 6. All you had to do was purchase the Hummer EV, install a specific tune, shift in reverse while going at about 15 MPH, hit a wall, and get launched into the stratosphere at the speed of light. While mid-air, launch Eliminator and quickly get eliminated. Boom, the game just awarded you with a few million in-game credits. Initially, Playground Games disabled Eliminator to prevent people from farming credits. Now, following the release of the first balancing update, developers are rolling out a new update that re-enables Eliminator and gives users a free McLaren Sabre as a gesture of goodwill. Here is the changelog: One critical issue remains unpatched, though. There are quite a few reports of the game wiping gamers' saves, and developers are still looking into that. To avoid potential data loss, Playground Games recommends taking one of the steps outlined in a previously published support article.
  • Recent Achievements

    • First Post
      AndreaB earned a badge
      First Post
    • Week One Done
      Huge Trailer earned a badge
      Week One Done
    • Week One Done
      Classifyskilleducation earned a badge
      Week One Done
    • One Month Later
      eurospharma62 earned a badge
      One Month Later
    • Week One Done
      With What earned a badge
      Week One Done
  • Popular Contributors

    1. 1
      +primortal
      591
    2. 2
      +Edouard
      170
    3. 3
      PsYcHoKiLLa
      76
    4. 4
      Michael Scrip
      67
    5. 5
      neufuse
      64
  • Tell a friend

    Love Neowin? Tell a friend!