Baker Library has licensed the following newspapers from ProQuest for data mining. Currently, the newspapers are available on hard drives. For more information, please contact Alex Caracuzzo (acaracuzzo@hbs.edu).
Harvard affiliates may also want to explore ProQuest TDM Studio, a tool that allows you to mine large volumes of published content.
Newspaper Title | Years of XML/PDF Articles | Article-level vs. Page-level |
---|---|---|
Atlanta Constitution | 1868-1930 (XML only) | TBD |
Austin American Statesman | 1871-1926 | all years article-level |
The Baltimore Sun | 1837-1932 | all years article-level |
The Boston Globe | 1872-1987 | all years article-level |
Chicago Tribune | 1849-1935 | all years article-level |
The Christian Science Monitor | 1908-1995 | all years article-level |
The Cincinnati Enquirer | 1841-2009 | 1841-1922 article-level; 1923-2009 page-level |
Dayton Daily News | TBD | TBD |
Detroit Free Press | 1831-1999 | 1831-1922 article-level; 1923-1999 page-level |
Hartford Courant | 1764-1934 | all years article-level |
Los Angeles Times | 1881-1950 | all years article-level |
Louisville Courier-Journal | 1830-2000 | 1830-1922 article-level; 1923-2000 page-level |
Nashville Tennessean | 1812-2002 | 1812-1922 article-level; 1923-2002 page-level |
The New York Times | 1851-1933 (XML only) | TBD |
New York Tribune/Herald Tribune | 1841-1962 | all years article-level |
Newsday | 1940-1990 | all years article-level |
Philadelphia Inquirer | 1860-2001 | all years page-level |
San Francisco Chronicle | 1865-1922 | all years article-level |
St. Louis Post-Dispatch | 1874-2003 | 1874-1922 article-level; 1923-2003 page-level |
Wall Street Journal | 1889-1932 (XML only) | TBD |
Washington Post | 1877-1937 | TBD |
The Harvard Kennedy School also has a guide on resources available for text mining.
Text Analysis Tools
NVivo - https://library.harvard.edu/services-tools/nvivo
MALLET - http://mallet.cs.umass.edu/
Voyant Tools - http://voyant-tools.org/
Computational Literature Review (clR) - https://github.com/rvidgen/clr
Google n-gram - https://books.google.com/ngrams
Natural Language Toolkit (Python) - http://www.nltk.org/
Stanford CoreNLP - https://stanfordnlp.github.io/CoreNLP/index.html#download
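As a rough illustration of a first step with XML article files, the sketch below counts word frequencies using only the Python standard library. The sample record and its element names are hypothetical and do not reflect the actual ProQuest file layout; the function reads every text node, so it does not rely on any particular tag names. The tools listed above offer far more capable tokenization and analysis.

```python
import re
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical sample record; real files will use the vendor's own element names.
SAMPLE_XML = """<record>
  <title>Local News</title>
  <body>The river rose. The town watched the river.</body>
</record>"""


def word_counts(xml_text):
    """Count lowercase word tokens across all text nodes in an XML string."""
    root = ET.fromstring(xml_text)
    # itertext() walks every element, so no specific tag names are assumed.
    text = " ".join(root.itertext())
    # Simple regex tokenizer: runs of letters and apostrophes.
    tokens = re.findall(r"[a-z']+", text.lower())
    return Counter(tokens)


counts = word_counts(SAMPLE_XML)
print(counts.most_common(3))
```

To process a file on disk instead of a string, the same approach should work with `ET.parse(path).getroot()` in place of `ET.fromstring`.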
Copyright © 2022 President & Fellows of Harvard College.