• 0

[SQL] Compare Addresses


Question

Hello gang,

We are having an issue where we need to compare some address data to insure that we are not entering a duplicate record. (The data is coming from multiple outside sources, so an identity/GUID won't work... but thanks for thinking of that)

Consider the following

Company Name

Address

City

State

ZIP Code

If the first person was to send us:

"Bob's Restaurant"

"123 Main Street"

"New York"

"NY"

"10018"

and the second person sends us:

"Bobs Restaurant"

"123 Main St."

"New York"

"NY"

"10018"

Notice that the second restaurant does not have the appostrophy and the street name is abreviated. So an exact match is SQL would not work.

We're looking at Latitude and Longitude, and that should get us there, but in case anyone has any other thoughts..... thanks.

Link to comment
https://www.neowin.net/forum/topic/719338-sql-compare-addresses/
Share on other sites

8 answers to this question

Recommended Posts

  • 0

I'm not sure what database server you are using, but how about something like soundex()? If you can compile a list of common ways to abbreviate parts of an address, and could be reasonably sure that all occurances of St. really equals Street (I'm not sure how your data is stored, if you have city in one column, state in one column, street address in one column, name of location in one column, etc. then this would probably doable), then this would work for you:

DROP TEMPORARY TABLE IF EXISTS t1;
CREATE TEMPORARY TABLE t1 
(
  address text
);

INSERT INTO t1 (address) 
VALUES 
('Bob\'s Restaurant 123 Main Street New York NY 10018'),
('Bobs Restaurant 123 Main St. New York NY 10018');

UPDATE t1 SET address = REPLACE(address, 'St.', 'Street');

SELECT address, 
		   soundex(address) 
FROM t1;

This returns this data set, so, you can see that it is a duplicate address:

+----------------------------------------------------+-------------------+
| address											| soundex(address)  |
+----------------------------------------------------+-------------------+
| Bob's Restaurant 123 Main Street New York NY 10018 | B2623653523635625 |
| Bobs Restaurant 123 Main Street New York NY 10018  | B2623653523635625 |
+----------------------------------------------------+-------------------+

In practice, the way I would do this is to create a stored procedure that takes an address as a parameter or set of parameters, one param for each part of the address. Then you could run a set of replaces against the parameter to normalize it to the rest of the data already in the database (replace St. with Street, or, Street with St., whichever you want, Dr. with Drive, Ln. with Lane, etc.). So, assuming your sp took a param named p_address, and you have already ran your replaces to normalize the string, you could run a query like this:

DECLARE p_is_duplicate tinyint(1) NOT NULL DEFAULT 0;

SET p_is_duplicate = EXISTS(SELECT * FROM addresses WHERE soundex(address) = soundex(p_address);

IF NOT p_is_duplicate THEN 
  INSERT INTO address SET address = p_address;
END IF;

Hope this helps, latitude and longitude will probably be more accurate in the long run, but something like this may work for you depending on what dbms you use and how your data is stored. :)

  • 0

UPDATE:

There is an issue with SoundEx where is does not acurately compare. The two examples below both return values that state they are the same even though they are not (different city, state and zip)

I understand that I can break out the street while doing my compare, and I may do that, I simply wanted to alert other developers of this issue.

SELECT SOUNDEX ('Bob 500 1st St. Seattle, WA 98109'), SOUNDEX ('Bobby 500 First St New York, NY')

SELECT DIFFERENCE ('Bob 500 1st St, Seattle 98109', 'Bobby 500 First St New York, NY 10036')

I may actually may use a DLL that allows me to create lat/long and compare these values....

Thanks again for the feedback and idea.

  • 0

I knew there was another SoundEx example that I wanted to mention:

SELECT SOUNDEX ('500 First Street'), SOUNDEX ('400 First Street')

Both of these come back as a match. So be aware that soundex is prob not the best solution for address compare. I've moved on to (attempting) to work with lat/long. (It's the getting of these values that is killing me)

Have a great weekend.

  • 0
I'm not sure what database server you are using, but how about something like soundex()? If you can compile a list of common ways to abbreviate parts of an address, and could be reasonably sure that all occurances of St. really equals Street (I'm not sure how your data is stored, if you have city in one column, state in one column, street address in one column, name of location in one column, etc. then this would probably doable), then this would work for you:

DROP TEMPORARY TABLE IF EXISTS t1;
CREATE TEMPORARY TABLE t1 
(
  address text
);

INSERT INTO t1 (address) 
VALUES 
('Bob\'s Restaurant 123 Main Street New York NY 10018'),
('Bobs Restaurant 123 Main St. New York NY 10018');

UPDATE t1 SET address = REPLACE(address, 'St.', 'Street');

SELECT address, 
		   soundex(address) 
FROM t1;

This returns this data set, so, you can see that it is a duplicate address:

+----------------------------------------------------+-------------------+
| address											| soundex(address)  |
+----------------------------------------------------+-------------------+
| Bob's Restaurant 123 Main Street New York NY 10018 | B2623653523635625 |
| Bobs Restaurant 123 Main Street New York NY 10018  | B2623653523635625 |
+----------------------------------------------------+-------------------+

In practice, the way I would do this is to create a stored procedure that takes an address as a parameter or set of parameters, one param for each part of the address. Then you could run a set of replaces against the parameter to normalize it to the rest of the data already in the database (replace St. with Street, or, Street with St., whichever you want, Dr. with Drive, Ln. with Lane, etc.). So, assuming your sp took a param named p_address, and you have already ran your replaces to normalize the string, you could run a query like this:

DECLARE p_is_duplicate tinyint(1) NOT NULL DEFAULT 0;

SET p_is_duplicate = EXISTS(SELECT * FROM addresses WHERE soundex(address) = soundex(p_address);

IF NOT p_is_duplicate THEN 
  INSERT INTO address SET address = p_address;
END IF;

Hope this helps, latitude and longitude will probably be more accurate in the long run, but something like this may work for you depending on what dbms you use and how your data is stored. :)

Soundex performs a match based on the pronumciation of the first four constanants it encounters (I had to look this up before because I ran into a similar situation). So "Bob's R" and "Bobs R" would be used for comparison and thus match.

It's also a good habit to ALWAYS replace input quotation marks and apostrophes with "`" or "``". You're open to SQL injection attacks if you don't.

  • 0

Try using a CHECKSUM instead:

SELECT * FROM ADDRESS WHERE CHECKSUM(company + address + city + state + zip) = CHECKSUM(@company + @address + @city + @state + @zip)

Be sure to encase any fields with an isnull() to default the value out to '' if any of those fields can be null. EG

SELECT * FROM ADDRESS WHERE CHECKSUM(isnull(company, '') + isnull(address, '') + isnull(city, '') + isnull(state, '') + isnull(zip, '')) = CHECKSUM(@company + @address + @city + @state + @zip)

If the query returns any rows, you have a duplicate.

This topic is now closed to further replies.
  • Recently Browsing   0 members

    • No registered users viewing this page.
  • Posts

    • The possibility that milk gathers back into a glass implies that gravity can be 'reversed'.
    • VidCoder 12.20 by Razvan Serea  VidCoder is a DVD/Blu-ray ripping and video transcoding application for Windows. It uses HandBrake as its encoding engine. Calling directly into the HandBrake library gives it a more rich UI than the official HandBrake Windows GUI. VidCoder can rip DVDs but does not defeat the CSS encryption found in most commercial DVDs. You’ll need the NET 8 Desktop Runtime. If you don’t have it, VidCoder will prompt you to download and install it. The Portable version is self-contained and does not require any .NET Runtime to be installed. You do not need to install HandBrake for VidCoder to work. Feature list: Multi-threaded MP4, MKV containers Completely integrated encoding pipeline: everything is in one process and no huge intermediate temporary files H.264, H.265, MPEG-4, MPEG-2, VP8, Theora video Hardware-accelerated encoding with AMD VCE, Nvidia NVENC and Intel QuickSync AAC, MP3, Vorbis, AC3, FLAC audio encoding and AAC/AC3/MP3/DTS/DTS-HD passthrough Target bitrate, size or quality for video 2-pass encoding Decomb, detelecine, deinterlace, rotate, reflect, chroma smooth, colorspace filters Powerful batch encoding with simultaneous encodes Customizable Pickers to automatically pick audio and subtitle tracks, destination, titles and more Instant source previews Creates small encoded preview clips Pause, resume encoding VidCoder 12.20 changes: Updated HandBrake core to 1.11.2. Download: VidCoder 12.20 | 47.0 MB (Open Source) Download: Portable VidCoder 12.19 | 89.3 MB Link: VidCoder Home Page | Github | Screenshot Get alerted to all of our Software updates on Twitter at @NeowinSoftware
    • Too soon, I'm still not over this death!
    • Normally, I admit when a title is clickbait (unfortunately, it's become somewhat necessary to compete against AI-dominated news sections today), but in this case, all supported versions is implied and doesn't need to be spelled out in the title. Of course, I'm covering a Patch Tuesday update bug that is only available to supported Windows SKUs. All our coverage relates to supported Windows software and SKUs only unless we expressly state that it's "unsupported", "unofficial", or "third-party". I'm sorry, but supported/official SKUs don't need to be spelled out as such in every Neowin headline.
  • Recent Achievements

    • Week One Done
      Jordan Smith earned a badge
      Week One Done
    • Reacting Well
      BizSAR earned a badge
      Reacting Well
    • First Post
      AndreaB earned a badge
      First Post
    • Week One Done
      Huge Trailer earned a badge
      Week One Done
    • Week One Done
      Classifyskilleducation earned a badge
      Week One Done
  • Popular Contributors

    1. 1
      +primortal
      593
    2. 2
      +Edouard
      185
    3. 3
      PsYcHoKiLLa
      77
    4. 4
      Michael Scrip
      73
    5. 5
      Steven P.
      66
  • Tell a friend

    Love Neowin? Tell a friend!