Google Ngram Viewer provides searchable dataset of books

Dec. 16, 2010 4:47 PM PT

This article was originally on a blog post platform and may be missing photos, graphics or links.

Want to know when ‘hep cat’ entered the popular lexicon? Or when ‘dying of consumption’ fell out of literary use? Or which of three former presidents -- Abraham Lincoln, George Washington or Thomas Jefferson -- made the most appearances in print in a given decade? (Turns out Washington surpassed Lincoln sometime around 1928 and has remained in the lead ever since.)

Google’s latest data-visualization tool, Ngram Viewer, allows the curious to search through datasets of 500 billion words from 5.2 million books in Chinese, English, French, German, Russian and Spanish to determine the approximate frequency with which sets of up to three words or phrases have appeared from year to year. Users can search the data using the viewer tool or freely download the datasets for their own use.
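For readers who download the data, the year-to-year frequency the viewer charts can be approximated as a phrase's count in a given year divided by the total number of words counted for that year. The Python sketch below illustrates that arithmetic only. The row layout (phrase, year, count), the function name `yearly_frequency` and all the numbers are assumptions made up for illustration; they are not Google's actual file format or real results.

```python
from collections import defaultdict


def yearly_frequency(rows, totals):
    """Return {phrase: {year: relative frequency}}.

    rows   -- iterable of (phrase, year, match_count) tuples (assumed layout)
    totals -- dict mapping year -> total words counted that year (assumed)
    """
    freq = defaultdict(dict)
    for phrase, year, count in rows:
        total = totals.get(year)
        if total:  # skip years with no recorded total
            freq[phrase][year] = freq[phrase].get(year, 0) + count / total
    return {phrase: dict(by_year) for phrase, by_year in freq.items()}


# Made-up numbers for illustration only, not real Ngram data.
rows = [("hep cat", 1938, 2), ("hep cat", 1944, 30), ("hep cat", 1950, 12)]
totals = {1938: 1_000_000, 1944: 1_500_000, 1950: 2_000_000}

result = yearly_frequency(rows, totals)
# Year in which the phrase had its highest relative frequency.
peak_year = max(result["hep cat"], key=result["hep cat"].get)
```

Dividing by each year's total matters because far more books were printed in some years than in others, so raw counts alone would overstate a phrase's popularity in the busier years.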

The datasets backing the Ngram Viewer are a subset of the more than 15 million books Google has digitized since 2004.

‘We know nothing can replace the balance of art and science that is the qualitative cornerstone of research in the humanities,’ wrote Google Books engineering manager Jon Orwant on the company’s blog. ‘But we hope the Google Books Ngram Viewer will spark some new hypotheses ripe for in-depth investigation, and invite casual exploration at the same time.’

-- Abby Sewell


Abby Sewell is a former staff writer for the Los Angeles Times.
