How Google Works



By Jonathan Strickland

Fair-goers use laptops at Google's stand at the Frankfurt Book Fair on Oct. 8, 2006. Take a look inside Google with Googleplex pictures. Torsten Silz/AFP/Getty Images

What began as a project helmed by Larry Page and Sergey Brin, two students in Stanford University's Ph.D. program, is now one of the most influential companies on the World Wide Web: Google. At first, the students' goal was to make an efficient search engine that gave users relevant links in response to search requests. While that's still Google's core purpose today, the company now provides services ranging from e-mail and document storage to productivity software and mobile phone operating systems. In less than a decade, Google evolved from a two-man enterprise to a multibillion-dollar corporation.

Today, Google's popularity continues to grow. In 2007, the company surpassed Microsoft as the most visited site on the Web [source: Kopytoff]. The company's influence on the Web is undeniable. Practically every webmaster wants his or her site listed high on Google's search engine results pages (SERPs), because it almost always translates into more traffic on the corresponding Web site. Google has also acquired other Internet companies, ranging from blogging services to the video-sharing site YouTube. For a while, the company's search technology even powered rival companies' search engines -- Yahoo! relied on Google searches for nearly four years until developing its own search engine technologies in 2004 [sources: Google; Hu and Olsen].

Google's influence isn't limited to just the Web. In 2007, company executives announced their intention to enter the FCC's auction of the wireless spectrum in the 700 megahertz (MHz) band. That part of the wireless spectrum previously belonged to analog television broadcasters. Google representatives said the company entered the auction to foster competition within the wireless service industry. Google supported an open technology approach to wireless service in which consumers could use any device with any provider rather than face limited choices determined by the provider and its preferred vendors. In order to participate in the auction, Google had to prove it was ready to meet the reserve price for the spectrum: $4.6 billion. Ultimately, Google didn't win the auction. But the company still achieved its main goal -- Verizon, which won the bid, must follow the open technology approach Google wanted.

In this article, we'll learn about the backbone of Google's business: its search engine. We'll also look at the other services Google offers to both average users and to commercial businesses. Then we'll take a quick peek at some of the tools Google has developed over the years. We'll also learn more about the equipment Google uses to keep its massive operation running. Finally, we'll take a closer look at Google the company.

How Many Zeros?
Google's name is a variation of the word "googol," which is a mathematical term for a one followed by 100 zeros. Page and Brin felt the name helped illustrate Google's monumental mission: Organizing billions of bytes of data found on the Web.


The Google Search Engine

Google's search engine is a powerful tool. Without search engines like Google, it would be practically impossible to find the information you need when you browse the Web. Like all search engines, Google uses a special algorithm to generate search results. While Google shares general facts about its algorithm, the specifics are a company secret. This helps Google remain competitive with other search engines on the Web and reduces the chance of someone finding out how to abuse the system.

Google uses automated programs called spiders or crawlers, just like most search engines. Also like other search engines, Google has a large index of keywords and where those words can be found. What sets Google apart is how it ranks search results, which in turn determines the order in which Google displays results on its search engine results page (SERP). Google uses a trademarked algorithm called PageRank, which assigns each Web page a relevancy score.
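
The "index of keywords and where those words can be found" is commonly built as an inverted index. The following is a minimal sketch of that idea, not Google's implementation; the page URLs and text are hypothetical, and the function names `build_index` and `lookup` are invented for illustration.

```python
from collections import defaultdict


def build_index(pages):
    """Map each lowercase keyword to the set of page URLs that contain it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in text.lower().split():
            index[word].add(url)
    return index


def lookup(index, query):
    """Return the pages that contain every word in the query."""
    words = query.lower().split()
    if not words:
        return set()
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())
    return results


# Hypothetical pages, invented for this example.
pages = {
    "example.com/earth": "planet earth documentary series",
    "example.com/mars": "planet mars rover mission",
}
index = build_index(pages)
print(lookup(index, "planet earth"))
```

A real search engine would also record where on the page each word appears, which this sketch leaves out.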

Does Whatever a Spider Can
A search engine spider does the search engine's grunt work: It scans Web pages and creates indexes of keywords. Once a spider has visited, scanned and categorized a page, it follows links from that page to other sites. The spider will continue to crawl from one site to the next, which means the search engine's index becomes more comprehensive and robust. To learn more about these programs, read How Search Engines Work.

Hitting the Links
Google uses lots of tricks to prevent people from cheating the system to get higher placement on SERPs. For example, as a Web page adds links to more sites, its voting power decreases. A Web page that has a high PageRank with lots of outgoing links can have less influence than a lower-ranked page with only one or two outgoing links.

A Web page's PageRank depends on a few factors:

• The frequency and location of keywords within the Web page: If the keyword only appears once within the body of a page, it will receive a low score for that keyword.
• How long the Web page has existed: People create new Web pages every day, and not all of them stick around for long. Google places more value on pages with an established history.
• The number of other Web pages that link to the page in question: Google looks at how many Web pages link to a particular site to determine its relevance.

Out of these three factors, the third is the most important. It's easier to understand it with an example. Let's look at a search for the terms "Planet Earth."
As more Web pages link to Discovery's Planet Earth page, the Discovery page's rank increases. When Discovery's page ranks higher than other pages, it shows up at the top of the Google search results page.
Because Google looks at links to a Web page as a vote, it's not easy to cheat the system. The best way to make sure your Web page is high up on Google's search results is to provide great content so that people will link back to your page. The more links your page gets, the higher its PageRank score will be. If you attract the attention of sites with a high PageRank score, your score will grow faster.
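
To make the "links as votes" idea concrete, here is a minimal sketch of the link-based calculation described in Brin and Page's published paper (see References). It is a simplified illustration, not Google's production ranking, which remains secret. The damping value of 0.85 follows the paper; the page names and link graph are hypothetical, and the handling of pages with no outgoing links is an assumption of this sketch.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Estimate a score for each page.

    links maps each page name to the list of pages it links to.
    Every link target must also appear as a key in links.
    """
    pages = list(links)
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Every page starts each round with a small baseline score.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page, outgoing in links.items():
            if not outgoing:
                # Assumption: a page with no links spreads its vote evenly.
                for target in pages:
                    new_rank[target] += damping * rank[page] / n
                continue
            # A page's vote is split among all of its outgoing links.
            share = damping * rank[page] / len(outgoing)
            for target in outgoing:
                new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical link graph: three pages link to "discovery".
links = {
    "discovery": ["blog_a"],
    "blog_a": ["discovery"],
    "blog_b": ["discovery", "blog_a"],
    "blog_c": ["discovery"],
}
ranks = pagerank(links)
for page, score in sorted(ranks.items(), key=lambda item: -item[1]):
    print(f"{page}: {score:.3f}")
```

In this toy graph the "discovery" page ends up with the highest score because it collects the most votes, and the vote from "blog_b" counts for less because it is split across two outgoing links.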
Google initiated an experiment with its search engine in 2008. For the first time, Google is allowing a group of beta testers to change the ranking order of search results. In this experiment, beta testers can promote or demote search results and tailor their search experience so that it's more personally relevant. Google executives say there's no guarantee that the company will ever implement this feature into the search engine globally.
Google offers many different kinds of services in addition to chat.
Google Services

As Google has grown, the company has added several new services for its users. Some of the services are designed to help make Web searches more efficient and relevant, while others seem to have little in common with search engines. With many of its services, Google has entered into direct competition with other companies.

Google's specialized searches are an extension of its normal search engine protocol. With specialized searches, you can narrow your search to specific resources. You can enter keywords into Google and search for:

• Images related to your keywords
• Maps
• News articles or footage
• Products or services you can purchase online
• Blog entries containing the keywords you've chosen
• Content in books
• Videos
• Scholarly papers


For these searches, Google has created specialized indexes that only contain relevant sources. For example, if you search for the term "Planet Earth" in the news category, the results will include only news articles that contain those keywords. The results will look very different from Google's normal SERP.

In the last few years, Google has unveiled services that, at first glance, don't seem related to search engines. For example, Google's Gmail is a free Web-based e-mail program. When the service first launched, Google limited the number of users who could create accounts. The first group of users could invite a limited number of people to join the service, and so Gmail invitations became a commodity. Today, anyone can sign up for a free Gmail account.

Gmail organizes e-mails into conversations. This means that when you send an e-mail to someone and he or she replies, both e-mails are grouped together as a thread in your inbox. This makes it easier to follow the flow of an e-mail exchange. If you reply to your friend's response, Google will attach your message to the bottom of the thread. It's easy to navigate through the e-mail program and follow specific conversations.

Another free service from Google is Google Docs, a storage database and collaborative productivity software suite. It includes word processing, spreadsheet and presentation programs. Creating a Docs account is free and allows you to store up to 5,000 documents and images online. Each document can be up to 500 kilobytes, and each embedded image can be up to 2 megabytes. You can share documents on Google Docs, which allows your friends to view and make changes to documents. You can also store all of your documents on Google's servers and access them wherever there's an Internet connection.

Google on the Go and Advanced Services
You can perform a Google search with any short message service (SMS) compatible cell phone, even if you can't access the Web with your phone. Simply text your query to 466453 (which spells GOOGLE on a phone pad). Google will send a response back within a couple of seconds.
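
As a quick check of the number above, this small sketch maps letters to digits using the standard telephone keypad layout. The names `KEYPAD`, `LETTER_TO_DIGIT` and `to_digits` are invented for illustration.

```python
# Standard telephone keypad letter groups.
KEYPAD = {
    "2": "ABC", "3": "DEF", "4": "GHI", "5": "JKL",
    "6": "MNO", "7": "PQRS", "8": "TUV", "9": "WXYZ",
}
LETTER_TO_DIGIT = {
    letter: digit for digit, letters in KEYPAD.items() for letter in letters
}


def to_digits(word):
    """Convert a word to the digits you would press on a phone keypad."""
    return "".join(LETTER_TO_DIGIT[ch] for ch in word.upper())


print(to_digits("GOOGLE"))  # 466453
```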


With an advanced search, you can use Google to retrieve the most relevant results for your keywords. You can search for documents written in a specific language or saved in a particular file format like .pdf or .rtf. You can tell Google where to look for the keywords, such as in page titles or headers. Google even allows you to limit searches to a single domain name. Try typing in "site:howstuffworks.com 'cloud computing'" in the Google search bar to see how it works. Each choice you make tells Google which index to use when returning your search results.
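
The same kind of restricted query can be assembled in code. This is a minimal sketch: `site:` and `filetype:` are operators Google's search box accepts, but the helper names `build_query` and `search_url` are hypothetical, and the search URL format is an assumption based on how the public search page behaves rather than a documented API.

```python
from urllib.parse import urlencode


def build_query(phrase, site=None, filetype=None):
    """Combine an exact phrase with optional site and file-format limits."""
    parts = []
    if site:
        parts.append(f"site:{site}")
    if filetype:
        parts.append(f"filetype:{filetype}")
    parts.append(f'"{phrase}"')
    return " ".join(parts)


def search_url(query):
    """Build a search URL for the query (assumed URL format)."""
    return "https://www.google.com/search?" + urlencode({"q": query})


query = build_query("cloud computing", site="howstuffworks.com")
print(query)
print(search_url(query))
```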

Google Tools

Google offers a popular tool called Google Maps, an online mapping service similar to MapQuest. Google uses map sources from companies like NAVTEQ and TeleAtlas, as well as satellite data from DigitalGlobe and MDA Federal, to create interactive maps. You can use Google Maps to view an address's location or get driving directions to a particular destination.

Google Maps has several view modes. The map view is a basic road map, satellite view overlays a road map on top of satellite photos of the region, terrain view creates a topographic map with a road map overlay, and the traffic view uses red, yellow and green to indicate congested major roadways in the area. Street view mode is available in several U.S. cities. Selecting street view in such locations as Orlando, Fla., gives you the option to view photos taken from street level. You can navigate through the city by clicking on arrows in the photographs, and you can rotate your view 360 degrees.

Google Maps can also integrate business information. You can use Google Maps like a search engine to find a business, such as "HowStuffWorks, Atlanta, Ga.," which will show you our office's location. You can also search for general businesses. If you're in the mood to eat sushi in San Francisco, Calif., you can type "sushi, San Francisco," and with a click of the Search button, Google Maps will display a map of the city with several sushi restaurants tagged.

A product related to Google Maps is Google Earth, an interactive digital globe. It uses the same satellite images licensed for Google Maps, but you must download the application and install it on your computer to access all of its functions. Google Earth requires an Internet connection to be fully functional, though you can still view locations on the globe even if you aren't connected. A scaled-back, Web-based version is also available -- you can even embed it in existing Web sites. To learn more about this program, read "How Google Earth Works."

The Google Toolbar is another handy add-on available for Firefox or Internet Explorer users. The toolbar has customizable buttons. Each button maps to a particular function, which can include anything from viewing a Web site's PageRank to translating a word from one language to another.
Google Desktop is another application you can download for free. This program lets you search your computer the way you would search the Internet using the Google search engine. You can also choose to download Google Gadgets, computer programs that integrate seamlessly into your desktop. Each gadget does something different. Gadgets include clocks, calendars, news feeds and weather reports.

Smile, You're on Street View!
Some people feel that Google's street view function is a violation of privacy. For example, homeowners who were behind in their yard work became worried that anyone viewing pictures of their home through Google would see a messy house, making it harder to sell the property in the future. Several individuals and communities have filed lawsuits against Google, demanding that the company remove images of certain areas from street view.


Google Revenue

Unlike some Internet companies, Google has multiple ways of generating revenue beyond private investment or selling shares of its stock. Google uses three methods to partner with merchants and advertisers: Google Checkout, Google AdWords and Google AdSense.

Google Checkout is a service designed to make online purchases easier for both the consumer and the retailer. On the consumer end, users create a free Google Checkout account. Part of the account creation process includes entering a credit or debit card number, which Google stores in a secure database. When the user visits a retailer that subscribes to Google Checkout, he or she can click on the checkout option and Google facilitates the transaction. This means that the user doesn't have to enter a card number every time he or she makes a purchase.

Retailers can set up Google Checkout accounts for free, but as of August 2008, Google charges a 2 percent plus 20-cent fee per transaction. For example, if a customer buys a $10 item from a merchant, Google will charge that merchant 40 cents for that transaction.
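
The fee arithmetic can be written out as a short sketch. The 2 percent rate and 20-cent fixed fee are the August 2008 figures quoted above; the function name `checkout_fee` is invented for illustration.

```python
from decimal import Decimal


def checkout_fee(price, rate=Decimal("0.02"), fixed=Decimal("0.20")):
    """Return the per-transaction fee: a percentage of the price plus a fixed charge."""
    return (price * rate + fixed).quantize(Decimal("0.01"))


# 2 percent of $10 is 20 cents, plus the 20-cent fixed fee.
print(checkout_fee(Decimal("10.00")))  # 0.40
```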

Another way Google generates revenue is through a pair of Web advertising services called AdWords and AdSense. With AdWords, advertisers can submit ads to Google that include a list of keywords relating to the product, service or business. When a Google user searches the Web using one or more of those keywords, the ad appears on the SERP in a sidebar. The advertiser pays Google every time a user clicks on the ad.

AdSense is similar, except that instead of displaying ads on a Google SERP, a webmaster can choose to integrate ads into his or her own site. Google's spiders crawl the site and analyze the content. Then, Google selects ads that contain keywords relevant to the webmaster's site. The webmaster can customize the location and color of the sidebar containing the ads. Every time someone clicks on an ad on the webmaster's site, the webmaster receives a portion of the ad revenue (Google gets the rest).
With both AdWords and AdSense, Google's strategy is to provide targeted advertising to users. Google believes that by providing advertising relevant to the information for which the user is already searching, the chances of someone following the ad are greatly increased [source: Google].

Google's Acquisitions
Google isn't just famous for creating and providing useful services -- it has also bought a few innovative companies and integrated them. These include YouTube (a video-sharing Web site), Blogger (a weblog service), Picasa (a photo-sharing service) and Jaiku (an SMS and micro-blog service).


Google Equipment

Back in 1998, Google's equipment was relatively modest. Co-founders Larry Page and Sergey Brin used Stanford equipment and donated machines to run Google's search engine duties. The equipment at that time included:
• Two 300-megahertz (MHz) Dual Pentium II servers with 512 megabytes (MB) of memory
• A four-processor F50 IBM RS6000 computer with 512 MB of memory
• A dual-processor Sun Ultra II computer with 256 MB of memory
• Several hard drives (some of which were housed in a box covered in LEGO bricks) ranging from 4 to 9 gigabytes (GB) for a total of more than 350 GB of storage space [source: Google Stanford Hardware]

Today, Google uses hundreds of thousands of servers to provide services to its users. Google's strategy is to use relatively inexpensive machines running on a customized operating system based on Linux. A program called Google File System manages the data on Google's servers [source: Google Cluster Architecture].

You Got Served
How many servers does Google have? The company is quiet about the subject, but estimates range from 200,000 to more than 450,000 machines.

Google and Bandwidth
Some webmasters feel that Google's spiders consume too much bandwidth per month. When a spider follows a link to a Web page, it uses up bandwidth. Most Web hosting services charge webmasters for bandwidth consumption. If the webmaster feels that Google's spiders are a liability, he or she can create a robots.txt file in the root directory of the Web site that will tell the spiders to ignore the site.
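
Here is a minimal sketch of what such a robots.txt file can look like and how a well-behaved crawler would interpret it, using Python's standard `urllib.robotparser` module. Googlebot is the name of Google's crawler; the example.com URL and the "OtherBot" crawler name are placeholders.

```python
from urllib.robotparser import RobotFileParser

# Contents of a robots.txt file that asks Google's crawler to skip the whole site.
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# Googlebot is told to stay away; crawlers not named in the file are unaffected.
print(parser.can_fetch("Googlebot", "https://example.com/index.html"))  # False
print(parser.can_fetch("OtherBot", "https://example.com/index.html"))   # True
```

Replacing `Googlebot` with `*` in the file would apply the rule to every crawler that honors robots.txt.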


Google uses servers for different tasks. Web servers receive and process user queries, sending the request on to the next appropriate server. Index servers store Google's indexes and search results. Document servers store search summaries, user information, Gmail and Google Docs files. Ad servers store the advertisements Google displays on search pages.

Google divides the information on each index server into 64 MB blocks. There are three copies of each block of data, and each copy is stored on a different server running on a separate power strip. The blocks of data are distributed semi-randomly so that no two servers have the exact same collection of data blocks. That way, if there's a problem with one server, the data will still exist in other machines. Using multiple copies of data to prevent an interruption in service is called redundancy. Find out more in How the Google File System Works.
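
The block-and-replica scheme can be sketched in a few lines. This is an illustration of the idea only, not the Google File System's actual placement logic: the server names are hypothetical, the random placement stands in for "semi-random" distribution, and the sketch ignores the power-strip constraint.

```python
import random

BLOCK_SIZE_MB = 64  # block size quoted in the article
REPLICAS = 3        # copies kept of each block


def count_blocks(total_mb):
    """Number of 64 MB blocks needed to hold total_mb of data (rounded up)."""
    return -(-total_mb // BLOCK_SIZE_MB)


def place_blocks(num_blocks, servers, seed=0):
    """Assign each block to three distinct servers chosen at random."""
    rng = random.Random(seed)
    return {block: rng.sample(servers, REPLICAS) for block in range(num_blocks)}


# Hypothetical cluster of six servers holding 200 MB of data.
servers = [f"server-{i}" for i in range(6)]
placement = place_blocks(count_blocks(200), servers)
for block, replicas in placement.items():
    print(block, replicas)
```

Because every block lives on three different machines, losing any single server still leaves two copies of each block it held.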

Google Company Culture

Google has come a long way since Sergey Brin and Larry Page networked a few computers together at Stanford. What started as a modest project is now a multibillion-dollar global organization that employs more than 19,000 people around the world. Brin and Page are still very much involved with Google's operations -- they're Presidents of Google's Technology and Products divisions, respectively.

In September 2008, Google's market capitalization figure (Google's stock price multiplied by the number of outstanding company shares) was more than $145 billion. Google's stock is listed on the NASDAQ as GOOG, and in late 2008 Google had more than 314 million outstanding shares in the marketplace [source: Google Finance].

Google's headquarters are in Mountain View, Calif. Google cheekily calls its campus the Googleplex -- a combination of the words "Google" and "complex" and a play on the term googolplex: One followed by a googol of zeroes. Life at the Googleplex is pretty sweet. Here's just a small list of the amenities you can find there:

• Several café stations where employees can gather to eat free food and have conversations
• Snack rooms stocked with goodies ranging from candy to healthy foods like carrots and yogurt
• Exercise rooms
• Game rooms with video games, foosball, pool tables and ping-pong
• A baby grand piano for those who enjoy tickling the ivories


In addition to these amenities, Google employees receive a comprehensive benefits package that includes not only medical and dental coverage, but also a host of other services. These include tuition reimbursement, a child care center, adoption assistance services, an on-site doctor, financial planning classes and lots of opportunities to gather with coworkers at special corporate events. Google's philosophy also places importance on nonprofit work, and so Google will match up to $3,000 of any employee's contributions to nonprofit organizations.

Google has asserted itself as one of the most dominant forces on the Internet. Still, the company says its mission is "to organize the world's information and make it universally accessible and useful" [source: Google]. With a goal that lofty, it's a good bet that the people behind Google feel their work is just beginning.

Green Google
Besides being an Internet juggernaut, Google is also a leader in pursuing environmentally friendly methods of conducting business. Google launched an eco-friendly initiative it calls Develop Renewable Energy Cheaper than Coal (RE<C). Google plans to switch to renewable energy sources including solar, geothermal and wind power. Google also plans to make significant investments in renewable energy companies.


A master computer manages each set of servers. The master computer's job is to keep track of which servers hold each block of data in the event of a catastrophe. If one server goes down, the master computer redirects all traffic to the other servers containing the same data.
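
The master computer's bookkeeping can be pictured as a lookup table from data blocks to the servers that hold them. The sketch below is a simplified illustration under that assumption; the block IDs, server names and the function `servers_for_block` are all hypothetical.

```python
def servers_for_block(block_map, block_id, down_servers):
    """Return the servers that still hold a readable copy of the block."""
    return [s for s in block_map[block_id] if s not in down_servers]


# Hypothetical record kept by the master: which servers hold each block.
block_map = {
    "block-0": ["server-a", "server-b", "server-c"],
    "block-1": ["server-b", "server-d", "server-e"],
}

# If server-b fails, requests are sent to the remaining copies.
print(servers_for_block(block_map, "block-0", down_servers={"server-b"}))
```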

References
• Atwood, Jeff. "Google's Custom Built Servers." Google Operating System. March 14, 2007. http://googlesystem.blogspot.com/2007/03/googles-custom-built-servers.html
• Austin, David. "How Google Finds Your Needle in the Web's Haystack." Grand Valley State University. http://www.ams.org/featurecolumn/archive/pagerank.html
• Barroso, Luiz Andre, et al. "Web Search for a Planet: The Google Cluster Architecture." IEEE Computer Society. 2003. http://labs.google.com/papers/googlecluster-ieee.pdf
• Bereitschaft, Brad. "Getting Listed in the ODP, Google Directory." Search Engine Guide. March 23, 2005. http://www.searchengineguide.com/brad-bereitschaft/getting-listed-in-the-odp-google-directory.php
• Brin, Sergey and Page, Lawrence. "The Anatomy of a Large-Scale Hypertextual Web Search Engine." Computer Science Department, Stanford University, California. http://infolab.stanford.edu/~backrub/google.html
• Callan, David. "Google Ranking Tips." AKAMarketing.com. http://www.akamarketing.com/google-ranking-tips.html
• Collins, Gord. "The Latest on Google's Algorithm." SEO Today. April 6, 2004. http://www.seotoday.com/browse.php/category/articles/id/446/index.php
• Fishkin, Rand. "A Little Piece of the Google Algorithm - Revealed." Seomoz.org. October 16, 2006. http://www.seomoz.org/blog/a-little-piece-of-the-google-algorithm-revealed
• Fishkin, Rand. "If They Did Leak the Google Algorithm . . ." Seomoz.org. October 12, 2006. http://www.seomoz.org/blog/if-they-did-leak-the-google-algo
• Google. http://www.google.com
• Google. "Yahoo! Selects Google as its Default Search Engine Provider." June 26, 2000. (Sept. 3, 2008) http://www.google.com/press/pressrel/pressrelease25.html
• Google Finance. "GOOG." http://finance.google.com/finance?client=ob&q=NASDAQ:GOOG
• Google Hardware. http://backrub.c63.be/May1998/hardware.htm
• Hansell, Saul. "The People Inside Google's Black Box." Bits. The New York Times. December 18, 2007. http://bits.blogs.nytimes.com/2007/12/18/the-people-inside-googles-black-box/index.html
• Hu, Jim and Olsen, Stefanie. "Yahoo dumps Google search technology." CNET News.com. February 17, 2004. http://www.news.com/2100-1024_3-5160710.html
• Kopytoff, Verne. "Google surpasses Microsoft as world's most-visited site." San Francisco Chronicle. April 25, 2007. http://www.sfgate.com/cgi-bin/article.cgi?file=/c/a/2007/04/25/MNGELPF0DR1.DTL&type=tech
• Kuchinskas, Susan. "Peeking Into Google." Internetnews.com. March 2, 2005. http://www.internetnews.com/xSP/article.php/3487041
• Perez, Juan Carlos. "Google may let users comment on, rearrange search results." The Industry Standard. Aug. 26, 2008. (Aug. 27, 2008) http://www.thestandard.com/news/2008/08/26/google-may-let-users-comment-rearrange-search-results
• A Promotion Guide. "Ranking high at Google." http://www.apromotionguide.com/google.html
• Rogers, Ian. "Google Pagerank Algorithm and How it Works." Ian Rogers. http://www.ianrogers.net/google-page-rank/
• Search Engine Promotion Help. "Google's New Web Page Spider." Oct. 5, 2004. http://www.searchenginepromotionhelp.com/m/articles/search-engine-optimization/googles-new-spider.php
• Sobek, Markus. "The PageRank Algorithm." eFactory. http://pr.efactory.de/e-pagerank-algorithm.shtml

