Mapping Texts: Computational Text Analysis for the Social Sciences

Metaphors orient us. Many guides to text analysis use the mining metaphor. When mining, we search for a valuable vein to extract from valueless gangue. Information retrieval scholars pioneered computer-assisted text analysis by confronting the problem of (as you might guess) retrieving the most relevant documents from ever-growing databases. Much of any given document was unhelpful for that task, especially when computational resources were scarce. Mining was an apt metaphor.

Mapping, by contrast, is not about extraction. It is about reduction to aid interpretation. When mapping texts, we simplify their information, but always for particular uses. Many useful maps can be drawn of the same territory: road maps, contour maps, political maps, to name a few. Likewise, wrangling, pruning, removing stopwords, and transforming text all involve decisions informed by a particular goal. To put it plainly: there is not a sole kernel of truth to be extracted, but rather a range of empirical patterns. There is an unfinished quality to text analysis, perhaps to all science. Revisiting and repeating our analyses, then, is a cornerstone of this [section]. While scale can undoubtedly be useful, iteration is the unsung hero of computational methods.
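
To make the point concrete, consider a minimal sketch in Python. The tiny corpus, stoplist, and tokenizer below are invented for illustration, not drawn from the original text: the same handful of documents yields different word counts, and therefore different "maps," depending on how we wrangle, prune, and stop the text.

```python
# A minimal, hypothetical sketch: the same small "territory" of documents
# produces different "maps" depending on our preprocessing decisions.
from collections import Counter
import re

docs = [
    "The miners dug for the vein of ore beneath the mountain.",
    "The cartographer mapped the mountain, the river, and the road.",
    "Roads and rivers were redrawn as the mapping continued.",
]

# An illustrative stoplist; in practice the list itself is an analytic decision.
stopwords = {"the", "of", "for", "and", "as", "were", "was"}

def tokenize(text):
    """Lowercase and split on non-letters: one of many possible wrangling choices."""
    return re.findall(r"[a-z]+", text.lower())

# Map 1: keep every token.
map_all = Counter(tok for doc in docs for tok in tokenize(doc))

# Map 2: prune stopwords, a different reduction of the same territory.
map_pruned = Counter(
    tok for doc in docs for tok in tokenize(doc) if tok not in stopwords
)

print(map_all.most_common(5))     # dominated by function words like "the"
print(map_pruned.most_common(5))  # foregrounds content words like "mountain"
```

Neither map is the "true" one; each reduction serves a different question, which is the sense in which mapping, not mining, describes the work.
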

Merging computational techniques with social scientific text analysis blurs the line between qualitative and quantitative methods. While computational methods involve quantification, there is interpretation before, during, and after any quantification. Text analysis is about establishing whether patterns are present in a collection of texts and how they vary. Determining what these patterns mean and which are important, however, is a qualitative process. So, we look to centuries of scholarly analysis of texts to guide our computational workflows.