Jump to content



Photo

c# Short URL from GUID

c# short url guid

  • Please log in to reply
16 replies to this topic

#1 James Rose

James Rose

    Software Developer

  • Tech Issues Solved: 1
  • Joined: 20-January 04
  • Location: New York City

Posted 06 March 2014 - 14:12

Hello gang,

 

I am working on a new project for a site and I would like to implement short urls.  Historically I have used GUIDs as table ids so that replication is not an issue.  So, now I'm looking at creating a short url for these values, the thing is a shortened guid is not that short (vs a shortened Int) Before getting too far down a path, I thought I'd ask if anyone had any thoughts.

 

Thanks




#2 firey

firey

    F͎̗͉͎͈͑͡ȉ͎̣̐́ṙ͖̺͕͙̓̌è̤̞͉̟̲͇̍̍̾̓ͥͅy͓̍̎̌̏̒

  • Tech Issues Solved: 6
  • Joined: 30-October 05
  • Location: Ontario, Canada
  • OS: Windows 7
  • Phone: Android (4.1.2)

Posted 06 March 2014 - 20:00

What is it you are trying to do?

Like you have something say:

guasd3 = www.google.ca
-3idjqis = www.neowin.net

or?  

I guess I don't understand what you are trying to do exactly.



#3 +virtorio

virtorio

    4089 III

  • Tech Issues Solved: 11
  • Joined: 28-April 03
  • Location: New Zealand
  • OS: OSX 10.9, Windows 8.1
  • Phone: Samsung Galaxy SIII

Posted 06 March 2014 - 20:32

Just generate strings of whatever size you consider to be short, made up of numbers, uppercase letters and lowercase letters, and associated them with your ID field.

 

[short_url_assoc_table]

table_id : GUID

short_url : string



#4 OP James Rose

James Rose

    Software Developer

  • Tech Issues Solved: 1
  • Joined: 20-January 04
  • Location: New York City

Posted 06 March 2014 - 21:16

What is it you are trying to do?

Like you have something say:

guasd3 = www.google.ca
-3idjqis = www.neowin.net

or?  

I guess I don't understand what you are trying to do exactly.

 

 

I need to use GUIDs as the table ident fields for the issue of replication.  If I use numeric values for the identity, which is easy to convert to a short URL by using Base64 (quite a number of examples around the net for doing this)  however if I use numeric I'm going to have collisions when multiple machines are making new records.  When I have looked at making a short url from a guid, the value is shorter than the GUID, but still longer than the average short GUID.

 

Just generate strings of whatever size you consider to be short, made up of numbers, uppercase letters and lowercase letters, and associated them with your ID field.

 

[short_url_assoc_table]

table_id : GUID

short_url : string

 

Interesting idea, but this could be an issue with replication.  I am also concerned about multiple identity fields (waste of time, space, etc)   Thanks though



#5 +virtorio

virtorio

    4089 III

  • Tech Issues Solved: 11
  • Joined: 28-April 03
  • Location: New Zealand
  • OS: OSX 10.9, Windows 8.1
  • Phone: Samsung Galaxy SIII

Posted 06 March 2014 - 21:29

 

 

Interesting idea, but this could be an issue with replication.  I am also concerned about multiple identity fields (waste of time, space, etc)   Thanks though

 

What exactly is being replicated?

 

 

I need to use GUIDs as the table ident fields for the issue of replication.  If I use numeric values for the identity, which is easy to convert to a short URL by using Base64 (quite a number of examples around the net for doing this)  however if I use numeric I'm going to have collisions when multiple machines are making new records.  When I have looked at making a short url from a guid, the value is shorter than the GUID, but still longer than the average short GUID.

What am I missing here? This shouldn't be an issue with a relational database. You don't tell your database what the ID of a row is, you let the database decide when it inserts the row(s).



#6 firey

firey

    F͎̗͉͎͈͑͡ȉ͎̣̐́ṙ͖̺͕͙̓̌è̤̞͉̟̲͇̍̍̾̓ͥͅy͓̍̎̌̏̒

  • Tech Issues Solved: 6
  • Joined: 30-October 05
  • Location: Ontario, Canada
  • OS: Windows 7
  • Phone: Android (4.1.2)

Posted 06 March 2014 - 21:37

Im not sure where replication is happening.  Also why do you have to use guids for the identity, why not just use an incrementing number, and hash it's value or something to get the url you want to use?  Also, the database should be able to handle the ID itself.. and there should never be an issue of too many inserts causing problems.



#7 Andre S.

Andre S.

    Asik

  • Tech Issues Solved: 10
  • Joined: 26-October 05

Posted 06 March 2014 - 21:47

If you have a method for converting numeric values into short URLs, then simply convert your GUID to a numeric value. Here's an example: http://stackoverflow.../4207060/154766



#8 wrack

wrack

    Wireless Robotic Android Calibrated for Killing

  • Joined: 09-December 06
  • Location: Melbourne, Australia

Posted 06 March 2014 - 21:51

Just throwing a wild idea but wouldn't a CRC of the Guid as a string will give you a nice short form? I am sure a CRC might repeat at some point but all you have to do is add some more text to it.



#9 OP James Rose

James Rose

    Software Developer

  • Tech Issues Solved: 1
  • Joined: 20-January 04
  • Location: New York City

Posted 06 March 2014 - 22:01

Okay gang,

 

Replication of tables between multiple servers cannot use numeric values as, for example in SQL Server the Identity value is incrimented by 1 (and yes, you can change this value, but it wouldn't help when the app needs to scale)  Imagine two servers, each one adding new values to a table; "Customers" One each server they would both get identity #1 for the first record, when the two servers attempt to merge (ever hour, every minute, whenever) there would be a collision since there would be two records with the same identity value.  Using GUIDs for the key field avoids this issue as it is, almost, impossible to have the same guid twice.

 

 

Thanks Asik, however the "guidAsString" variable is still too long to be a short url.



#10 OP James Rose

James Rose

    Software Developer

  • Tech Issues Solved: 1
  • Joined: 20-January 04
  • Location: New York City

Posted 06 March 2014 - 22:09

Looks like I found the answer.

                //Guid guid = Guid.NewGuid();
                string sGUID = "a33d4a21-7d95-41f7-859e-bf02b2fda650"
                string hashCode = String.Format("{0:X}", sGUID.GetHashCode());
                Console.WriteLine(hashCode);

This makes a nice small url, can anyone think of why this should not be used?



#11 +snaphat (Myles Landwehr)

snaphat (Myles Landwehr)

    Electrical & Computer Engineer

  • Tech Issues Solved: 29
  • Joined: 23-August 05
  • OS: Win/Lin/Bsd/Osx
  • Phone: dumb phone

Posted 06 March 2014 - 22:13

^you shouldn't use it for the reasons listed here (about uniqueness guarantees and differences between versions): http://stackoverflow...ed-to-be-unique

 

Hash the GUID using SHA1 and truncate it or something like that. That's probably the best you are going to do. (perhaps you will have to truncate it to much, forcing a too high of probability for collision -- you should check the probability).

 

EDIT: Oh also, if you encode the result of hashing in a higher base, you reduce the amount of information loss during truncation.



#12 OP James Rose

James Rose

    Software Developer

  • Tech Issues Solved: 1
  • Joined: 20-January 04
  • Location: New York City

Posted 06 March 2014 - 22:25

^you shouldn't use it for the reasons listed here (about uniqueness guarantees and differences between versions): http://stackoverflow...ed-to-be-unique

 

Hash the GUID using SHA1 and truncate it or something like that. That's probably the best you are going to do. (perhaps you will have to truncate it to much, forcing a too high of probability for collision -- you should check the probability).

 

EDIT: Oh also, if you encode the result of hashing in a higher base, you reduce the amount of information loss during truncation.

 

EDIT:  Maybe what the article is saying, and what you are trying to tell me is that two different GUIDs could return the same hex?

 

 

 

Pardon me if I appear dense; I just read that article and ran some test against the same guid value for 1 billion iterations and it always comes up with the same hex.  I understand that 1 billion isn't necessarily that large a number...  what I am asking is shouldn't the hex value for a specific string always return the same hex value. 

 

Quote: "does not guarantee unique return values for different objects."  Since the app will pull the guid from the db, and then issue a hex on demand wouldn't that value always be the same?

 

Thanks for your input



#13 +snaphat (Myles Landwehr)

snaphat (Myles Landwehr)

    Electrical & Computer Engineer

  • Tech Issues Solved: 29
  • Joined: 23-August 05
  • OS: Win/Lin/Bsd/Osx
  • Phone: dumb phone

Posted 06 March 2014 - 22:31

Pardon me if I appear dense; I just read that article and ran some test against the same guid value for 1 billion iterations and it always comes up with the same hex.  I understand that 1 billion isn't necessarily that large a number...  what I am asking is shouldn't the hex value for a specific string always return the same hex value. 

 

Quote: "does not guarantee unique return values for different objects."  Since the app will pull the guid from the db, and then issue a hex on demand wouldn't that value always be the same?

 

Thanks for your input

 

It's the same each time because you are using the same version of the .net runtime on the same object for each run so it's producing the same hash. What they are saying is really two things: (1) if you switch versions of the .net runtime (e.g. 3.5 to 4), the returned result can be different for the same object, and (2) and within the same version of the runtime (e.g. 4) there can be collisions in hashes between different objects. There are no uniqueness guarantees.

 

So for example GUID_A.getHashCode() can return different results if you switch .net runtimes. And GUID_B.getHashCode() and GUID_C.getHashCode() could return the same result in the same runtime.



#14 OP James Rose

James Rose

    Software Developer

  • Tech Issues Solved: 1
  • Joined: 20-January 04
  • Location: New York City

Posted 06 March 2014 - 22:39

So for example GUID_A.getHashCode() can return different results if you switch .net runtimes. And GUID_B.getHashCode() and GUID_C.getHashCode() could return the same result in the same runtime.

 

yea, this is the answer I finally got to (see my edit above).  I was having a very hard time getting to the idea that a 30+ char piece of data could reliably be set to a shorter value.



#15 +snaphat (Myles Landwehr)

snaphat (Myles Landwehr)

    Electrical & Computer Engineer

  • Tech Issues Solved: 29
  • Joined: 23-August 05
  • OS: Win/Lin/Bsd/Osx
  • Phone: dumb phone

Posted 06 March 2014 - 23:03

yea, this is the answer I finally got to (see my edit above).  I was having a very hard time getting to the idea that a 30+ char piece of data could reliably be set to a shorter value.

Well in any case, you should re-encode whatever you do use to a higher number base that is still valid as url characters. For example, as I was saying before if you do the following you can store more information of your hash in less characters. 

String result=re_encode_as_base_X(SHA1_hash(GUID), N) //base 16 --> base N

I think at the end of the day, you will have to truncate though regardless of what you do. 





Click here to login or here to register to remove this ad, it's free!