5 replies to this topic

#1 Complete

Complete

    Neowinian

  • Joined: 07-January 06

Posted 28 November 2012 - 07:36

How do you programmatically expand a url link to its true location?

Do you know about << spam >> and baidu.com? Baidu.com is a search engine that tries to discourage people from building metasearch engines on top of its site by hiding its links in much the same way << spam >> does. << spam >> is a web site where, if you want to give someone a link and that link is long, you can use tinyurl to produce a tiny URL for presentation purposes.

Anyway, what I want to do is to find a way to programmatically take the link http://www.baidu.com...il7qccoOX3rynaE (the first link in a search for Jessica Alba using baidu.com) and have it return the actual link, http://baike.baidu.com/view/270790.htm . That is just one example. What I want to do is not specific to Jessica but for using Baidu.com as part of my group of search engines in my meta search engine project.

Maybe there is a way of using the WebBrowser class but I did not see a member that was the URL.

Maybe there is a way of using WebRequest and WebResponse.


#2 rfirth

rfirth

    Software Engineer

  • Tech Issues Solved: 2
  • Joined: 11-September 09
  • Location: Baton Rouge, Louisiana
  • OS: Windows 8
  • Phone: Nokia Lumia 620

Posted 28 November 2012 - 07:45

Maybe there is a way of using the WebBrowser class but I did not see a member that was the URL.


http://msdn.microsof...rowser.url.aspx

#3 -Alex-

-Alex-

    Noob Hunter

  • Joined: 08-August 06
  • Location: Oslo, Norway

Posted 28 November 2012 - 07:48

I'm presuming this is in C#...

Anyway, here you go, it's nice and easy! :)

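// Assumes "using System.Net;" and a WinForms project (for MessageBox)
WebRequest WReq = WebRequest.Create("http://www.baidu.com/link?url=mW91GJqjJ4zBBpC8yDF8xDhiqDSn1JZjFWsHhEoSNd85PkV8Xil7qccoOX3rynaE");
WReq.Method = "HEAD"; // Only download the headers, not the page content
// GetResponse() follows redirects by default; dispose the response when done
using (WebResponse WRes = WReq.GetResponse())
{
    string ActualURL = WRes.ResponseUri.ToString(); // Final URL after any redirects
    MessageBox.Show(ActualURL);
}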


Personally, I'm curious as to what website "<< spam >>" could be :p

#4 the evn show

the evn show

    Removed

  • Joined: 10-June 02

Posted 28 November 2012 - 15:49

Removed

#5 SPEhosting

SPEhosting

    C++ n00b

  • Tech Issues Solved: 1
  • Joined: 15-July 08
  • Location: my room
  • OS: windows 7, backtrack 5, OSx 10.6

Posted 29 November 2012 - 00:53

Probably one of the url shortening services like tiny URL dot com.


a website so simple it was genius

#6 +Karl L.

Karl L.

    xorangekiller

  • Tech Issues Solved: 15
  • Joined: 24-January 09
  • Location: Virginia, USA
  • OS: Debian Testing

Posted 29 November 2012 - 03:25

I've never really thought about this before, but I'm intrigued by the idea. Thanks -Alex-, I didn't realize it was so simple! After playing with it for a few minutes, I managed to accomplish the same thing using cURL (which isn't really anything novel).

curl -s -I 'http://www.baidu.com/link?url=mW91GJqjJ4zBBpC8yDF8xDhiqDSn1JZjFWsHhEoSNd85PkV8Xil7qccoOX3rynaE' | awk '{if($1 == "Location:") print $2}'

Edit: Cool! The same thing works with bit.ly.

curl -s -I 'bit.ly/TuX5wi' | awk '{if($1 == "Location:") print $2}'
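
One caveat with the cURL one-liners above: HTTP header lines end in a carriage return, and some servers send the header name in lowercase ("location:"), so the exact-match awk test can miss the header or leave a stray \r on the printed URL. Here is a rough sketch of a more forgiving filter. The function name extract_location is made up for illustration, and it assumes a POSIX shell with tr and awk available.

```shell
#!/bin/sh
# extract_location is a hypothetical helper name, not part of curl.
# It reads raw HTTP response headers on stdin and prints the value of
# every Location header it finds.
extract_location() {
    # tr drops the carriage returns that terminate HTTP header lines;
    # tolower() makes the header-name comparison case-insensitive.
    tr -d '\r' | awk 'tolower($1) == "location:" { print $2 }'
}

# Usage (needs network access, so it is left commented out):
# curl -s -I 'bit.ly/TuX5wi' | extract_location
```

If the first hop redirects to yet another redirect, adding -L to the curl call makes it follow the chain, and the filter then prints one line per hop. curl can also report just the final address itself with -w '%{url_effective}', which is roughly what ResponseUri gives you in the C# version.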



