Microsoft caught using Google search results...

4,603 · November 11, 2004

I was questioned today by a developer who was watching a particular IP address scan his site. The IP was 65.54.188.86 and is registered to Microsoft Corp., located at One Microsoft Way, Redmond, Washington 98052. This visitor was not sending the web server the normal header information associated with a crawler, such as an HTTP robot name, other identifying info, or even a browser name.

Its behavior made it look like a crawler, especially since it was spidering URLs that no longer exist (search engine spiders crawl site segments at regular intervals and often come back when an initial crawl left URLs uncrawled) and doing so at the rate of 1 page every 3 to 5 seconds. The visitor started its visit at 7:37 am and was still on the site at 12:00 pm.

Correction, the data was there after all, here's the crawler info... msnbot/0.3 (+http://search.msn.com/msnbot.htm)
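For anyone who wants to check their own logs for the same pattern, here is a minimal sketch of the kind of analysis described above. It assumes the server writes Apache-style combined log lines and that requests for removed URLs return 404; the function name `summarize_visitor` and the sample lines are made up for illustration and are not taken from the developer's actual logs.

```python
import re
from datetime import datetime

# Combined Log Format (assumed):
# ip ident user [time] "request" status size "referer" "user-agent"
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) \S+ "(?P<referer>[^"]*)" "(?P<agent>[^"]*)"$'
)


def summarize_visitor(lines, ip):
    """Summarize requests from one IP: count, user agents, 404s, mean gap."""
    times, agents, not_found = [], set(), 0
    for line in lines:
        match = LOG_LINE.match(line.strip())
        if match is None or match.group("ip") != ip:
            continue  # skip unparseable lines and other visitors
        times.append(
            datetime.strptime(match.group("time"), "%d/%b/%Y:%H:%M:%S %z")
        )
        agents.add(match.group("agent"))
        if match.group("status") == "404":
            not_found += 1
    times.sort()
    # Seconds between consecutive requests from this IP
    gaps = [(b - a).total_seconds() for a, b in zip(times, times[1:])]
    return {
        "requests": len(times),
        "user_agents": sorted(agents),
        "not_found": not_found,
        "mean_gap_seconds": sum(gaps) / len(gaps) if gaps else None,
    }


# Hypothetical sample lines, for illustration only
SAMPLE = [
    '65.54.188.86 - - [11/Nov/2004:07:37:00 -0800] "GET /old/page1.html HTTP/1.0" '
    '404 512 "-" "msnbot/0.3 (+http://search.msn.com/msnbot.htm)"',
    '65.54.188.86 - - [11/Nov/2004:07:37:04 -0800] "GET /old/page2.html HTTP/1.0" '
    '404 512 "-" "msnbot/0.3 (+http://search.msn.com/msnbot.htm)"',
    '10.0.0.1 - - [11/Nov/2004:07:37:05 -0800] "GET / HTTP/1.1" '
    '200 1024 "-" "Mozilla/4.0"',
]

summary = summarize_visitor(SAMPLE, "65.54.188.86")
print(summary)
```

A mean gap of a few seconds combined with a high share of 404s would match the behavior described here, though on its own it says nothing about where the visitor got its URL list.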

Here's the kicker

So now you're saying, so what, big deal. But this really is a big deal. It's a big deal not only because the URLs this visitor was requesting don't exist any longer, but because the only place these URLs can be found is in Google's search results using site:www.sitename.com. A similar query on MSN Search doesn't show the URLs at all, even on the beta version of their new Microsoft search engine. But then, within just hours of the visitor's exit from the site, the same search on Microsoft's new search engine showed all of the URLs in question fully indexed in its results.

My Theory On This Mysterious Microsoft Crawler

The old MSN required a fee to be crawled by its spider. But a few months back MSN dropped the fee and said they were going to begin crawling the entire web without charge. However, that's no easy task. So I believe MSN is using the results from Google, and possibly even Yahoo, to get all of the pages those engines have indexed on sites that have a relatively low page count in the current MSN search engine.

First off, that's the fastest way to get the relevant pages from a web site. Sure, they could just go to the site directly and start crawling, but in doing so they're going to get tons of duplicate URLs and URLs that seem different but point to the same content. Crawling Google's results will reduce the bandwidth to some extent but will not completely take care of the duplicate content issue their spider will encounter.

Secondly, crawling Google's results can act as a qualitative measure for their new search engine. By creating a baseline number of pages per site when the new Microsoft Search is launched and running a comparison on a regular interval for the next 6 months, they'll be able to determine internally if their engine is finding and indexing the same links and as many links as Google. Call it competitive analysis or whatever you want.
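To make the baseline idea concrete, here is a rough sketch of what such a comparison could look like. It is only an illustration, not a description of anything Microsoft actually runs: the function name `coverage_report` and the example URLs are hypothetical, and real URL lists would need proper normalization first.

```python
def coverage_report(baseline_urls, candidate_urls):
    """Compare the URLs one engine lists for a site against another engine's."""
    baseline = {u.strip() for u in baseline_urls if u.strip()}
    candidate = {u.strip() for u in candidate_urls if u.strip()}
    shared = baseline & candidate
    return {
        "shared": len(shared),
        "only_in_baseline": sorted(baseline - candidate),
        "only_in_candidate": sorted(candidate - baseline),
        # Fraction of the baseline's URLs that the candidate also has
        "coverage": len(shared) / len(baseline) if baseline else None,
    }


# Hypothetical URL lists, for illustration only
baseline_listing = [
    "http://www.sitename.com/",
    "http://www.sitename.com/a.html",
    "http://www.sitename.com/b.html",
    "http://www.sitename.com/c.html",
]
candidate_listing = [
    "http://www.sitename.com/",
    "http://www.sitename.com/a.html",
    "http://www.sitename.com/new.html",
]

report = coverage_report(baseline_listing, candidate_listing)
print(report["coverage"])
```

Running the same report at a regular interval would show whether the coverage figure climbs over time.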

So Microsoft's Screen Scraping?

Obviously my conclusion should be taken with a grain of salt, but it's a definite possibility. Microsoft very well could be screen scraping Google (or maybe even using their API, LOL) and crawling the URLs it finds. It makes sense as a business case, but I wonder if there are any legal issues there. I doubt it. It's like putting garbage out on the curb: once it's out there it's fair game, but I bet Google's lawyers would have more to say than that on the case.

http://www.webpronews.com/insiderreports/s...archEngine.html

2,609 · November 11, 2004

Interesting......

2,780 · November 11, 2004

^^ agree, normally I don't read that much, but this was interesting

2,842 · November 11, 2004

hehe.. time to sue ms again. :p

10,349 · November 11, 2004

Damn Microsoft.. when will they ever learn! :|

6,535 · November 11, 2004

Oooh Microsoft is going to bring out a Google Buster Search engine.... Based on ..... Wait for it.... GOOGLE! :laugh:

1,527 · November 11, 2004

Based on google yet the results seem to be so much slower, to me at least. Then you always seem to get the server too busy error message. My conclusion: stick with google.

2,097 · November 11, 2004

People post any crap to discredit MS.

He has presented no proof whatsoever.

I can publish the same article reversing Google and MSN

13,447 · November 11, 2004

hahaha good read

thanks

November 11, 2004

That article is full of **** IMO.

8,323 · November 11, 2004

b3ta said:
Oooh Microsoft is going to bring out a Google Buster Search engine.... Based on ..... Wait for it.... GOOGLE! :laugh:

:laugh: :laugh: :laugh: :laugh: :laugh:

6,009 · November 11, 2004

The new msn search beta thingy is not as good as I expected. It gave me an error on the first query I threw at it!

Bad bad bad...

Anyway, let's see how it evolves :ninja:

1,317 · November 11, 2004

Which leads me to think: why bother

12,077 · November 11, 2004

oh well life goes on... :sleep:

5,884 · November 11, 2004

:hmmm:

22,661 · November 12, 2004

so how come ms's search is faster than google's?

November 12, 2004

ohhhhhhhhhhhh that's cheap

November 12, 2004

hmm interesting; i'd like to see a statement from msn on this though.

6,532 · November 12, 2004

it's technically (and i use this word loosely) not microsoft's fault, it would be the person who tells the web masters to do this actually lol

2,681 · November 12, 2004

slimy said:
so how come ms's search is faster than google's?

Because it only has its ten pages to search through? :)

1,268 · November 12, 2004

Go to msn's new search engine.. at http://beta.search.msn.com/

and type 'google' and scroll to the bottom of the page and you'd find this...

I wonder which search engine they got it from...

3,364 · November 12, 2004

^ Uh oh.

25,585 · November 12, 2004

this doesn't mean crap. for all we know it was a research project from microsoft research doing something else. just because an IP is owned by MS doesn't mean it's being used by the company MS, it could be an at home employee or any number of other things... including VPN users that have access to MS's ip range

November 12, 2004

Quote
So now you're saying, so what, big deal

that's where you should have stopped

32,574 · November 12, 2004

i guess microsoft uses the few text filters (e.g. -microsoft, and the search won't bring up anything with microsoft in it...) so i guess they only put a "microsoft.com: keywords" so you can only browse through their site... cheap cheap microsoft... richest person in the world can't afford to get developers to create a search engine?

CaKeY
itsnotabigtruck
Lare2
[yt]
Chicane-UK Veteran
b3ta
Kupo-Cheer
figgy
Pink Floyd Veteran
Guest Dan C
NeoXY
Julius Caro
Sporkguy
Samoa
Nelsinho
Slimy
LastSamurai
kitchenutensils
AcdShdw
TheDogsBed
Coolme
ootput
neufuse Veteran
scoobydoobie
Andrew Lyle Global Moderator
