Google Ngram Viewer provides searchable dataset of books

Dec. 16, 2010 4:47 PM PT


Want to know when ‘hep cat’ entered the popular lexicon? Or when ‘dying of consumption’ fell out of literary use? Or which of three former presidents -- Abraham Lincoln, George Washington or Thomas Jefferson -- made the most appearances in print in a given decade? (It turns out Washington surpassed Lincoln sometime around 1928 and has remained in the lead ever since.)

Google’s latest data-visualization tool, Ngram Viewer, allows the curious to search through datasets of 500 billion words from 5.2 million books in Chinese, English, French, German, Russian and Spanish to determine the approximate frequency with which sets of up to three words or phrases have appeared from year to year. Users can search the data using the viewer tool or freely download the datasets for their own use.
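For readers who download the data, the year-by-year frequency idea can be illustrated with a minimal Python sketch. It is not Google's implementation and does not read Google's actual files: the `toy_corpus` dictionary, the `ngrams` helper and the `yearly_frequency` function are all hypothetical names invented for this illustration, and the tokenization is a naive whitespace split.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return every run of n consecutive tokens as a tuple."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def yearly_frequency(corpus, phrase):
    """Map each year to the share of that year's n-grams equal to phrase.

    corpus is a hypothetical {year: [text, ...]} dictionary, not the
    format of the downloadable Google datasets.
    """
    target = tuple(phrase.lower().split())
    n = len(target)
    result = {}
    for year, texts in corpus.items():
        counts = Counter()
        for text in texts:
            # Naive tokenization: lowercase, split on whitespace.
            counts.update(ngrams(text.lower().split(), n))
        total = sum(counts.values())
        # Avoid dividing by zero for years with no usable text.
        result[year] = counts[target] / total if total else 0.0
    return result


# Made-up sample sentences, for illustration only.
toy_corpus = {
    1940: ["that hep cat can play", "a hep cat walked in"],
    1990: ["the cat sat on the mat"],
}
freq = yearly_frequency(toy_corpus, "hep cat")
print(freq)
```

In this toy data, ‘hep cat’ accounts for two of the eight two-word sequences from 1940 and none from 1990. Dividing by each year's total, rather than reporting raw counts, is what makes years with very different amounts of text comparable.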

The datasets backing the Ngram Viewer are a subset of the more than 15 million books Google has digitized since 2004.

‘We know nothing can replace the balance of art and science that is the qualitative cornerstone of research in the humanities,’ wrote Google Books engineering manager Jon Orwant on the company’s blog. ‘But we hope the Google Books Ngram Viewer will spark some new hypotheses ripe for in-depth investigation, and invite casual exploration at the same time.’

-- Abby Sewell

Technology Blog

Abby Sewell

Abby Sewell is a former staff writer for the Los Angeles Times.
