Manual Annotation of Unsupervised Models: Close and Distant Reading of Politics on Reddit

Christoph Aurnhammer*, Iris Cuppen, Inge van de Ven, Menno van Zaanen

*Corresponding author for this work

Research output: Contribution to journal > Article > Scientific > peer-review


This article offers a methodological contribution to manually assisted topic modeling. With the availability of vast amounts of (online) texts, performing full-scale literary analysis using a close reading approach is not practically feasible. The set of alternatives proposed by Franco Moretti (2000) under the umbrella term of "distant reading" aims to show broad patterns that can be found throughout the entire text collection. After a survey of literary-critical practices that combine close and distant reading methods, we use manual annotations of a thread on Reddit both to evaluate a Latent Dirichlet Allocation (LDA) model and to provide information that topic modeling lacks. We also make a case for applying these reading techniques, which originate in literary reading, more broadly to online, non-literary contexts. Given a large collection of posts from a Reddit thread, we compare a manual, close reading analysis against an automatic, computational distant reading approach based on topic modeling using LDA. For each text in the collection, we label the contents, effectively clustering related texts. Next, we evaluate the similarity of the respective outcomes of the two approaches. Our results show that the computational content/topic-based labeling partially overlaps with the manual annotation. However, the close reading approach identifies not only texts with similar content but also those with similar function. The differences in annotation approaches require rethinking the purpose of computational techniques in reading analysis. Thus, we present a model that could be valuable for scholars who have a small amount of manual annotation that could be used to tune an unsupervised model of a larger dataset.

Original language: English
Number of pages: 18
Journal: Digital Humanities Quarterly
Issue number: 3
Publication status: Published - 2019

