Google Ngram Database Tracks Popularity Of 500 Billion Words

Did you know that "google" appeared in print as early as 1908? Or that "email" first popped up in 1524?

Google has quietly released Ngram, a free tool that allows users to sift through 500 billion words contained in nearly 5.2 million books published between 1500 and 2008 in English, French, Spanish, German, Chinese and Russian. The tool can compare words and phrases and provide a year-by-year breakdown of when and how often they appeared in print. The entire dataset that powers the tool is also available for download.
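For readers who download the dataset, here is a minimal Python sketch of the kind of year-by-year tally the tool performs. It assumes a simplified tab-separated layout whose first three columns are the n-gram, the year and the match count; the real files may carry extra columns, which the sketch ignores. The function names are hypothetical. Far more books were printed in later centuries, so raw counts grow over time, and a common approach is to divide each year's count by that year's total word count.

```python
from collections import defaultdict


def yearly_counts(lines, target):
    """Sum the match counts per year for one n-gram.

    Assumes a simplified, hypothetical tab-separated layout in which the
    first three columns are: ngram, year, match_count. Any further columns
    are ignored.
    """
    counts = defaultdict(int)
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        ngram, year, match_count = fields[0], int(fields[1]), int(fields[2])
        if ngram == target:
            counts[year] += match_count
    return dict(sorted(counts.items()))


def relative_frequency(counts, totals):
    """Divide each year's count by that year's total word count.

    `totals` is a hypothetical {year: total_words} mapping; years with a
    missing or zero total are skipped to avoid division by zero.
    """
    return {
        year: count / totals[year]
        for year, count in counts.items()
        if totals.get(year)
    }


sample = [
    "email\t1990\t5\t3",
    "email\t1991\t12\t7",
    "bing\t1990\t2\t1",
]
print(yearly_counts(sample, "email"))  # {1990: 5, 1991: 12}
print(relative_frequency({1990: 5, 1991: 12}, {1990: 1000, 1991: 2000}))  # {1990: 0.005, 1991: 0.006}
```

Comparing two words, as the tool does, then amounts to running `yearly_counts` for each one and plotting their relative frequencies on the same timeline.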

The tool is expected to prove invaluable for researchers and academics, and provides a fascinating window into the past for the rest of us. For example, a quick search shows that references to communism peaked around 1960 at the height of the Cold War, while the word "internet" only appeared sporadically in print before usage skyrocketed in the late 1980s.

According to the New York Times, the project was a collaboration between Google and Harvard. Erez Lieberman Aiden, a junior fellow at Harvard's Society of Fellows, told the newspaper that the goal was to make it simple for people to browse cultural trends as reflected in books.

"We wanted to show what becomes possible when you apply very high-turbo data analysis to questions in the humanities," he said.

On the tech front, Microsoft will be pleased to know that "bing" first appeared in print in 1650, and it will come as no surprise to learn that "apple" was around as early as the late 1500s. Sadly, "Neowin" does not appear once in the centuries of text indexed by the tool. Those looking for a special treat should try entering the phrase "never gonna give you up" into the tool.


15 Comments

"how did the word email pop up before the actual idea was even conceived? "

English is not, and was not in the 16th century, the only language.
"Email" existed in French before the 1500s, referring to enamel in heraldry, pottery and enamelled items. So enamel it is... far from electronic mailboxes.

jasonon said,
how did the word email pop up before the actual idea was even conceived?

From the looks of it, it's misreading "small". The words are in a typeface that's hard to pick up with OCR.

What's actually amusing is that I searched for both of those items in the snapshot (as well as Shangri-La) after having a flashback to Journeyman Project 3.

Except that nearly all the early occurrences of the word "email" are actually just a computer misreading the word "small". There weren't any references to the actual word email in the 1500s, but I didn't go through all the centuries to see when the first real reference was.

Wow! Since the 1500s? That's crazy.

I tried it with World War II and AIDS (the first things that popped up; I must be depressed), and the results were exactly as expected.

s3n4te said,
Impossible. I don't think there are 500 billion different words in all the languages on Earth combined.

500 billion printed words in 5.2 million books. Read much?

That works out to roughly 96,000 words per book on average (500 billion ÷ 5.2 million), or about 100,000 in round numbers, across the 5.2 million books it has analyzed.