Microsoft caught using Google search results...



from: WebProNews

A developer asked me today about a particular IP address he had been watching scan his site. The IP, 65.54.188.86, is registered to Microsoft Corp., One Microsoft Way, Redmond, Washington 98052. The visitor was not sending the header information a crawler normally sends to the web server, such as a robot name, other identifying info, or even a browser name.

Its behavior made it look like a crawler, especially since it was spidering URLs that no longer exist (search engine spiders crawl site segments at regular intervals and often come back when an initial crawl left URLs uncrawled) and doing so at a rate of one page every 3 to 5 seconds. The visit started at 7:37 am, and the visitor was still on the site at 12:00 pm.

Correction: the data was there after all. Here's the crawler info: msnbot/0.3 (+http://search.msn.com/msnbot.htm)
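For anyone who wants to check their own logs for the same pattern, here is a rough sketch of how the observations above (request rate, user agent, requests for pages that no longer exist) could be pulled from an access log. It assumes the common Apache/Nginx "combined" log format, which the article does not confirm; the function name summarize_visitor and the sample log lines, including their dates and paths, are made up for illustration.

```python
import re
from datetime import datetime

# Assumption: "combined" log format. The article does not say which
# server or log format the developer was actually using.
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) \S+ '
    r'"[^"]*" "(?P<agent>[^"]*)"'
)

def summarize_visitor(lines, ip):
    """Collect request times, user agents, and 404 paths for one IP."""
    times, agents, missing = [], set(), []
    for line in lines:
        m = LOG_PATTERN.match(line)
        if not m or m.group("ip") != ip:
            continue
        times.append(datetime.strptime(m.group("time"), "%d/%b/%Y:%H:%M:%S %z"))
        agents.add(m.group("agent"))
        if m.group("status") == "404":
            missing.append(m.group("path"))
    times.sort()
    # Seconds between consecutive requests from this IP.
    gaps = [(b - a).total_seconds() for a, b in zip(times, times[1:])]
    return {
        "requests": len(times),
        "first": times[0] if times else None,
        "last": times[-1] if times else None,
        "mean_gap_seconds": sum(gaps) / len(gaps) if gaps else None,
        "user_agents": agents,
        "missing_paths": missing,
    }

# Hypothetical sample lines; not taken from the developer's real logs.
sample = [
    '65.54.188.86 - - [01/Dec/2004:07:37:00 -0500] "GET /old-page-1.html HTTP/1.0" 404 512 "-" "msnbot/0.3 (+http://search.msn.com/msnbot.htm)"',
    '65.54.188.86 - - [01/Dec/2004:07:37:04 -0500] "GET /old-page-2.html HTTP/1.0" 404 512 "-" "msnbot/0.3 (+http://search.msn.com/msnbot.htm)"',
    '10.0.0.1 - - [01/Dec/2004:07:37:05 -0500] "GET /index.html HTTP/1.1" 200 2048 "-" "Mozilla/4.0"',
]

report = summarize_visitor(sample, "65.54.188.86")
print(report["requests"], report["mean_gap_seconds"], report["user_agents"])
```

A mean gap of a few seconds sustained over several hours, together with a run of 404s for old paths, would match the crawler-like behavior described in the article. It says nothing about where the visitor got the URLs from.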

Here's the kicker

So now you're saying, "So what, big deal." But this really is a big deal, not only because the URLs this visitor was requesting no longer exist, but because the only place these URLs can be found is in Google's search results for site:www.sitename.com. A similar query on MSN Search doesn't show the URLs at all, even on the beta version of Microsoft's new search engine. But within just hours of the visitor's exit from the site, the same search on Microsoft's new search engine showed all of the URLs in question fully indexed in its results.

My Theory On This Mysterious Microsoft Crawler

The old MSN charged a fee to be crawled by its spider. A few months back, though, MSN dropped the fee and said it would begin crawling the entire web free of charge. That's no easy task, so I believe MSN is using the results from Google, and possibly even Yahoo, to find all of the pages those engines have indexed on sites that have a relatively low page count in the current MSN search engine.

First off, that's the fastest way to get the relevant pages from a web site. Sure, they could just go to the site directly and start crawling, but in doing so they would pick up tons of duplicate URLs and URLs that look different but point to the same content. Crawling Google's results will cut the wasted bandwidth to some extent, but it will not completely take care of the duplicate content issue their spider will encounter.

Secondly, crawling Google's results can act as a qualitative measure for their new search engine. By recording a baseline number of pages per site when the new Microsoft Search is launched and rerunning the comparison at regular intervals for the next 6 months, they'll be able to determine internally whether their engine is finding and indexing the same links, and as many links, as Google. Call it competitive analysis or whatever you want.
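To make the comparison idea concrete, here is a minimal sketch of what such a baseline check might look like: take the set of URLs one engine lists for a site, take the set another engine lists, and measure the overlap. This is purely illustrative. Nothing in the article says Microsoft does it this way, and the function name coverage and the example URLs are hypothetical.

```python
def coverage(baseline_urls, candidate_urls):
    """Compare a candidate index against a baseline index for one site."""
    baseline = set(baseline_urls)
    candidate = set(candidate_urls)
    shared = baseline & candidate
    return {
        "shared": len(shared),
        "missing_from_candidate": sorted(baseline - candidate),
        "only_in_candidate": sorted(candidate - baseline),
        # Fraction of the baseline's URLs that the candidate also has.
        "coverage_ratio": len(shared) / len(baseline) if baseline else None,
    }

# Hypothetical URL lists for illustration only.
baseline = [
    "http://www.sitename.com/a.html",
    "http://www.sitename.com/b.html",
    "http://www.sitename.com/c.html",
    "http://www.sitename.com/d.html",
]
candidate = [
    "http://www.sitename.com/a.html",
    "http://www.sitename.com/b.html",
    "http://www.sitename.com/e.html",
]

result = coverage(baseline, candidate)
print(result["coverage_ratio"], result["missing_from_candidate"])
```

Rerunning the same comparison at each interval and tracking the coverage ratio over time would give the kind of trend described above.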

So Microsoft's Screen Scraping?

Obviously my conclusion should be taken with a grain of salt, but it's a definite possibility. Microsoft very well could be screen scraping Google (or maybe even using its API, LOL) and crawling the URLs it finds. It makes sense as a business case, but I wonder if there are any legal issues there. I doubt it. It's like putting garbage out at the curb: once it's out there, it's fair game. But I bet Google's lawyers would have more to say than that on the matter.

http://www.webpronews.com/insiderreports/s...archEngine.html

This doesn't mean crap. For all we know it was a research project from Microsoft Research doing something else. Just because an IP is owned by MS doesn't mean it's being used by MS the company; it could be an at-home employee or any number of other things, including VPN users that have access to MS's IP range.

I guess Microsoft uses a few text filters (e.g. -microsoft, and the search won't bring up anything with microsoft in it...), so I guess they only put in a "microsoft.com: keywords" so you can only browse through their site... cheap, cheap Microsoft... the richest person in the world can't afford to get developers to create a search engine?
