Using Voyant: Text Analysis meets Historical Research


Text analysis is described as “the use of computers as an aide in the interpretation of electronic texts” (Sinclair and Rockwell 242), and is often used to make sense of document collections that are too large for close reading (Huijnen et al. 72). However, a study by Gibbs and Owens, based on a sample of historians, found that respondents had little interest in using digital tools as a method of historical analysis.[1] Indeed, Huijnen et al. suggest that “historians have only just begun to explore what it means doing history from the perspective of both humanities and computer sciences” (72). There may be various reasons why historians have not adopted text analysis tools on a larger scale, and these are discussed elsewhere by Robertson. Nevertheless, Huijnen et al. believe that text analysis tools are useful for historical research as they often “trigger historians, to draw their attention to potentially interesting cases to explore” (83). To investigate this further, this blog post looks at Voyant Tools in order to examine text analysis as a complementary tool for qualitative historical research.

Voyant is a free, online text analysis program which provides good support documentation and is compatible with a wide range of document formats, including plain text, HTML, XML, PDF, RTF, and MS Word (Sinclair and Rockwell 259). The tool allows users to interact with a text through a word cloud of its most frequent words and through Keyword-in-Context (KWIC) displays, which show each occurrence of a word within its surrounding sentence. It can also generate graphs of word frequency within a single text or across multiple texts. While the interface was designed to provide “a low technical bar of entry” for humanists, the tools also provide “more advanced operations” (Sinclair and Rockwell 259). For example, alternative visualisations of word frequencies and trends are available through other tools in the Voyant environment, such as Bubblelines and Knots.
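
To make these two displays more concrete, here is a minimal Python sketch of a word-frequency count and a Keyword-in-Context listing. It is not Voyant's own code: the function names `tokenize`, `word_frequencies` and `kwic`, the very simple tokeniser, and the sample sentence are all assumptions made purely for illustration.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into words (a deliberately simple tokeniser)."""
    return re.findall(r"[a-z']+", text.lower())


def word_frequencies(text, top_n=5):
    """Return the top_n most frequent words together with their counts."""
    return Counter(tokenize(text)).most_common(top_n)


def kwic(text, keyword, window=3):
    """Return each occurrence of keyword with `window` words of context on either side."""
    tokens = tokenize(text)
    lines = []
    for i, token in enumerate(tokens):
        if token == keyword.lower():
            left = " ".join(tokens[max(0, i - window):i])
            right = " ".join(tokens[i + 1:i + 1 + window])
            lines.append(f"{left} [{token}] {right}".strip())
    return lines


# Invented sample text, used only to show the shape of the output.
sample = ("The strike began in Dublin. The strike spread, "
          "and the city council met the strike leaders.")

print(word_frequencies(sample, top_n=2))
for line in kwic(sample, "strike"):
    print(line)
```

Even in this toy example, the most frequent word is "the", which is why a stopword filter is usually applied before the counts are read for meaning.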


Voyant offers three options for uploading text as shown above. If the tool is being used to compare online documents by URL, users need to be aware that any supplementary or introductory information on a page containing a text transcript will also be analysed and will affect the results. Thus, preparing the files to be analysed is the first step to a successful experience with this tool. Once the text is prepared and uploaded to the ‘Add Text’ box, a series of windows shows a breakdown of the text, and the gear icon in any window can be clicked to apply a filter for ‘stopwords’.


Applying ‘stopwords’ – English Taporware
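
The effect of a stopword filter can be sketched in a few lines of Python. This is only an illustration of the idea, not of how Voyant implements it: the short `STOPWORDS` set is a hypothetical stand-in rather than the Taporware list, and the function name `frequent_words` and the sample sentence are invented.

```python
import re
from collections import Counter

# Hypothetical, very short stopword list for illustration only;
# real lists contain many more function words.
STOPWORDS = {"the", "and", "of", "in", "to", "a", "was", "by"}


def frequent_words(text, stopwords=frozenset(), top_n=3):
    """Count words, skipping any that appear in the stopword set."""
    tokens = re.findall(r"[a-z']+", text.lower())
    kept = [t for t in tokens if t not in stopwords]
    return Counter(kept).most_common(top_n)


# Invented sample sentence.
sample = ("The rebellion of the tenants was reported in the press, "
          "and the press was blamed by the landlords.")

print(frequent_words(sample))             # unfiltered: function words dominate
print(frequent_words(sample, STOPWORDS))  # filtered: content words come to the front
```

Without the filter the list is headed by "the"; with it, a content word ("press") moves to the top, which is the kind of shift the stopword setting produces in the word cloud.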

One of the advantages of this tool is its ability to export results from the main environment, through “live tool widgets” which can then be embedded in blogs and websites (Sinclair and Rockwell 259). This adds support to an argument, as other researchers may corroborate the findings for themselves.


Exporting data

While historians tend to read primary texts closely, they may not always see word patterns which could add further meaning. When using the tool to compare multiple documents, the preparation of files is especially important, as Voyant does not handle text vs. date or text vs. geography. To compare documents geographically, a user needs to divide a corpus into geographical areas; to compare texts over time, the files need to be arranged in chronological order. This is reflected in the digital history studies by Baker at the British Library, Anderson at Rice University, and the Emory Library Project, although their results seem to justify the time spent preparing files. However, there is a scarcity of literature which speculates on the impact of Voyant on the historical community; thus, it is hard to assess whether potential results would justify the time and human resources needed to compartmentalise larger document collections for the purpose of text analysis through Voyant.
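
Part of this preparation can be scripted. The following is a minimal Python sketch of putting files into chronological order before upload, assuming (and this is only an assumption) that each filename contains a date written as YYYY-MM-DD. The filenames and the helper names `file_date` and `chronological` are hypothetical.

```python
import re
from datetime import date

# Assumed naming pattern: each filename contains a date written as YYYY-MM-DD.
DATE_PATTERN = re.compile(r"(\d{4})-(\d{2})-(\d{2})")


def file_date(filename):
    """Extract the date embedded in a filename, or raise ValueError if none is found."""
    match = DATE_PATTERN.search(filename)
    if match is None:
        raise ValueError(f"no date found in {filename!r}")
    year, month, day = (int(part) for part in match.groups())
    return date(year, month, day)


def chronological(filenames):
    """Return the filenames sorted from the earliest to the latest embedded date."""
    return sorted(filenames, key=file_date)


# Hypothetical filenames, deliberately out of order.
files = [
    "letter_1921-12-06.txt",
    "letter_1916-04-24.txt",
    "letter_1919-01-21.txt",
]

for name in chronological(files):
    print(name)
```

The same approach could group files by place if a place name were encoded in each filename, but for a large collection the naming itself is still manual work.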

In conclusion, Voyant has the potential to reveal new areas of inquiry through word frequency patterns and is designed to be user-friendly for humanists. However, the preparation of files is important for achieving adequate results, as the tool does not handle text vs. date or text vs. geography, and it takes time to put a large corpus of text in order. The tool allows results to be exported and allows others to return to the original data to corroborate findings, which is a significant advantage in terms of methodological transparency. Overall, I found the tool easy to use, with some interesting results, and would certainly use it again.


[1] From a sampling of 213 historians, mostly from Western Europe and North America, a study by Gibbs and Owens suggests that “finding references and information is a much higher priority than using tools to analyze primary sources”. Moreover, while respondents applauded the growing availability of primary sources online, there was “little comment about a need for, or interest in, any specific tools to help make use of these archives in novel ways.” Thus, Gibbs and Owens surmise that “the uses of digital tools among our respondents are of the most general kind: Google searches and the use of digitized primary and secondary sources” (italics in original).

[2] Sinclair and Rockwell suggest that “using computers to perform formal operations on texts does not require humanists to approach texts from a positivistic perspective: we can ask formal questions of texts in service of speculative or hermeneutic objectives” (255-256). This type of approach is expressed as “algorithmic criticism” by Stephen Ramsay who suggests “‘one would not ask how the ends of interpretation were or were not justified by means of the algorithms imposed, but rather, how successful the algorithms were in provoking thought and allowing insight’” (qtd. in Sinclair and Rockwell 256).


