Text Analysis for Everyone

The Digital Humanities (DH) can be intimidating to those just getting started. Can I just specialize in one thing and be a digital humanist or do I have to know XML, GIS, text analysis like topic modeling, and everything else? Do I have to be able to program? What do the techniques and tools of DH do for me that I can’t already figure out using traditional approaches to humanities questions?

DH should not necessarily be an end, but a means to an end. If presented in that way, it can be more palatable to scholars unfamiliar with it and more approachable for those considering making their own map, marking up a document in TEI, or venturing into the world of web scraping. For me, DH is a supplement to traditional humanities techniques. For a historian, traditional means the close reading of archival resources. It also generally means working solo. But some projects are too large or too complex not to be collaborative, and visualizations, while they might not tell the researcher anything they don’t know from the archive already, can be very useful to those approaching a topic for the first time. Let me use my work for the past three years on Founders Online as an example of what I mean.

The first ever word cloud made from the famous retirement-era correspondence of Thomas Jefferson and John Adams (1812-1826).

A few months ago, I was asked to give a presentation about my work on the Early Access portion of Founders Online. I spoke about the value of having the letters of the Founding Fathers online for free, about the generous assistance we received from the editorial staffs of the Founding Fathers projects, and about the workflow we developed at Documents Compass in order to proofread or transcribe over 50,000 documents in only three years.

After my talk, I fielded a number of questions about the site and our work on Early Access. One of the best questions was about how the letters and data on Founders Online could be put to other uses by digital scholars. At the time, the most obvious answer to me was to point to the People of the Founding Era (PFE) project that Documents Compass has been working on for the last few years. PFE, which launched earlier this year, aims to create a prosopography of the founding era of the United States (i.e., the late 18th and early 19th centuries). In creating PFE, Documents Compass extracted identifications of people mentioned in the letters of the Founding Fathers found on Rotunda’s Founding Era Collection, from the famous, such as Patrick Henry, to the private soldier of Washington’s army. Many of these letters, and the biographical data their footnotes contain, are now available on Founders Online.

In addition to pointing to PFE, I also suggested that someone might use place information in the dateline to map the network of correspondence between the Founding Fathers and their constituents, similar to Stanford’s Mapping the Republic of Letters. While it would take some work to separate place names from date information and to standardize highly variable eighteenth-century spelling, scholars and students could use one of the many free geospatial tools available online to create a visualization of Washington’s correspondence with his army commanders during the Yorktown campaign or to map out Jefferson’s correspondence with his friends, family, and creditors in his retirement years.

Another obvious idea would be to use digital tools to text mine or do topic modeling of the corpus of letters on Founders Online, which will total nearly 200,000 documents when complete. At a recent presentation at UVA’s Scholars’ Lab, historian Micki Kaufman discussed how she evaluated and visualized a selected portion of the correspondence of Henry Kissinger in her project, Quantifying Kissinger. While her work is very impressive, many of the visualization and sentiment analysis tools she demonstrated, such as Gephi, have a steep learning curve that might deter all but the most devoted digital humanists from using them on a larger corpus of correspondence like Founders Online.

Fortunately, there are more user-friendly tools that historians and students with no background in digital humanities can use to visualize well-used historical letters. For example, I decided to use a website called Voyant to analyze Early Access’s proofread transcriptions of John Adams’s and Thomas Jefferson’s famous retirement correspondence. I uploaded a combined XML file of the more than 150 letters comprising this famous exchange to Voyant and was easily able to create the word cloud at the top of this post by making a few tweaks to a pre-set filter meant to eliminate articles and prepositions appearing in the text. The larger a word appears in the cloud, the more frequently it occurred in their correspondence.
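For readers curious about what sits behind a word cloud, the counting step can be sketched in a few lines of Python. This is only an illustration of the general idea, not how Voyant itself works: the stop-word list is a tiny hypothetical stand-in for Voyant's pre-set filter, the function name is my own, and the sample sentence is invented rather than taken from the letters.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list (an assumption for this
# sketch); a real filter such as Voyant's is far longer.
STOP_WORDS = {"the", "a", "an", "of", "to", "in", "and", "on", "for", "with", "by", "at", "from"}

def word_frequencies(text, stop_words=STOP_WORDS):
    """Count how often each word occurs, ignoring case and stop words."""
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(w for w in words if w not in stop_words)

# Invented sample text, not a quotation from the correspondence.
sample = "The government of the people and the history of government."
freqs = word_frequencies(sample)

# The most frequent words are the ones a word cloud would draw largest.
print(freqs.most_common(3))
```

A word cloud then simply scales each word's size by its count, so the words with the highest counts appear largest.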

The word cloud shows how often such topics as God, government, and history came up in these two famous Founders’ letters to each other. While the prevalence of such topics in the Adams-Jefferson letters would not surprise most American historians, the word cloud does provide those unfamiliar with the letters a preview of what to expect when doing a closer reading of the letters’ content. Best of all, it only took me a few minutes. Uploading documents in a variety of formats is incredibly easy, and the website provides a handful of other tools for counting which words occur most frequently in the text, tracking how the frequency of certain words changes from beginning to end, and analyzing the context in which a selected keyword appears.
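The last of those tools, viewing a keyword in its context, is also simple to approximate by hand. The sketch below is a minimal Python version of the general technique and makes no claim to match Voyant's implementation; the function name, the `window` parameter, and the sample text are all assumptions made for illustration.

```python
import re

def keyword_in_context(text, keyword, window=3):
    """Return each occurrence of keyword with `window` words on either side."""
    words = re.findall(r"\w+", text.lower())
    target = keyword.lower()
    hits = []
    for i, word in enumerate(words):
        if word == target:
            left = " ".join(words[max(0, i - window):i])
            right = " ".join(words[i + 1:i + 1 + window])
            hits.append((left, word, right))
    return hits

# Invented sample text, not a quotation from the correspondence.
sample = "History teaches caution. We read history to understand government."
for left, word, right in keyword_in_context(sample, "history", window=2):
    # Right-align the left context so the keyword lines up in a column.
    print(f"{left:>20} [{word}] {right}")
```

Lining the keyword up in a column makes it easy to scan the words that tend to surround it, which is a useful starting point before returning to a close reading of the letters themselves.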

A Screenshot of the TJ-JA Correspondence in Voyant

While the word cloud I created from Voyant may not be as useful as data visualized in more sophisticated textual analysis programs, I hope it demonstrates the potential of Founders Online for traditional scholars and digital humanists alike.

An earlier version of this blog post appeared on the Documents Compass website in September 2014.