Our FAQs have moved!

As of August 7, 2024, Baker Library's FAQs have moved to https://www.library.hbs.edu/services/help-center.  Please visit Baker's Help Center to find our research FAQs and send us your questions.  Thank you!

Answered By: Jen Beauregard
Last Updated: May 08, 2024

Text Mining Resources


Baker Library has licensed the following newspapers from ProQuest for data mining. The newspapers are currently available on hard drives. For more information, please contact Alex Caracuzzo (acaracuzzo@hbs.edu).

Harvard affiliates may also want to explore ProQuest TDM Studio, a tool that allows you to mine large volumes of published content. 

| Newspaper Title | Years of XML/PDF Articles | Article-level vs. Page-level |
| --- | --- | --- |
| Atlanta Constitution | 1868-1930 (XML only) | TBD |
| Austin American Statesman | 1871-1926 | all years article-level |
| The Baltimore Sun | 1837-1932 | all years article-level |
| The Boston Globe | 1872-1987 | all years article-level |
| Chicago Tribune | 1849-1935 | all years article-level |
| The Christian Science Monitor | 1908-1995 | all years article-level |
| The Cincinnati Enquirer | 1841-2009 | 1841-1922 article-level; 1923-2009 page-level |
| Dayton Daily News | TBD | TBD |
| Detroit Free Press | 1831-1999 | 1831-1922 article-level; 1923-1999 page-level |
| Hartford Courant | 1764-1934 | all years article-level |
| Los Angeles Times | 1881-1950 | all years article-level |
| Louisville Courier-Journal | 1830-2000 | 1830-1922 article-level; 1923-2000 page-level |
| Nashville Tennessean | 1812-2002 | 1812-1922 article-level; 1923-2002 page-level |
| The New York Times | 1851-1933 (XML only) | TBD |
| New York Tribune/Herald Tribune | 1841-1962 | all years article-level |
| Newsday | 1940-1990 | all years article-level |
| Philadelphia Inquirer | 1860-2001 | all years page-level |
| San Francisco Chronicle | 1865-1922 | all years article-level |
| St. Louis Post-Dispatch | 1874-2003 | 1874-1922 article-level; 1923-2003 page-level |
| Wall Street Journal | 1889-1932 (XML only) | TBD |
| Washington Post | 1877-1937 | TBD |

The Harvard Kennedy School also has a guide to resources available for text mining.

Text Analysis Tools
NVivo - https://library.harvard.edu/services-tools/nvivo

MALLET - http://mallet.cs.umass.edu/ 

Voyant Tools - http://voyant-tools.org/

Computational Literature Review (clR) - https://github.com/rvidgen/clr

Google Books Ngram Viewer - https://books.google.com/ngrams

Natural Language Toolkit (Python) - http://www.nltk.org/ 

Stanford CoreNLP - https://stanfordnlp.github.io/CoreNLP/index.html#download
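
For a rough sense of the first-pass analysis these tools automate, here is a minimal Python sketch that pulls the text out of one XML article and counts word frequencies using only the standard library. The sample record and the tag names (`Record`, `Title`, `FullText`) are hypothetical and are not taken from the ProQuest files; inspect a file from the hard drive and adjust `text_tag` to match its actual structure.

```python
import re
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical sample record. The real XML on the licensed drives may use
# different element names, so check a file before adapting this.
SAMPLE_XML = """<Record>
  <Title>Market Report</Title>
  <FullText>Cotton prices rose. Wheat prices fell, and cotton exports grew.</FullText>
</Record>"""


def article_text(xml_string, text_tag="FullText"):
    """Return the text of the first <text_tag> element, or '' if absent."""
    root = ET.fromstring(xml_string)
    node = root.find(f".//{text_tag}")
    return "".join(node.itertext()) if node is not None else ""


def word_counts(text):
    """Lowercase the text, split it into alphabetic tokens, and count them."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


counts = word_counts(article_text(SAMPLE_XML))
print(counts.most_common(2))
```

For a full collection, the same two functions could be applied to each file in turn and the resulting counters summed. Toolkits such as NLTK add proper tokenization, stop-word handling, and more on top of this basic pattern.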