Unigrams, bigrams and LSA: Corpus linguistic explorations of genres in Shakespeare's plays

Max Louwerse, Gwyneth A. Lewis, Jie Wu

Research output: Chapter in Book/Report/Conference proceeding › Chapter › Scientific › peer-review

Abstract

Corpus and computational linguistics could further strengthen the thriving field of empirical studies of literature. This chapter discusses three straightforward corpus linguistic techniques: unigrams, bigrams and latent semantic analysis (LSA). The three techniques are then applied to Shakespeare's plays to determine how well they categorize the plays into genres. In the n-gram analyses, the frequencies of words shared across the plays are entered into a Multi-Dimensional Scaling (MDS) analysis; in LSA, the similarity values between the plays are entered into MDS. With all three techniques, two categories emerged: comedies on the one hand, and tragedies and histories on the other. Moreover, a strong correlation was found between the results of the three fundamentally different techniques.
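The pipeline summarized above (n-gram frequencies or LSA similarities, then MDS) can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the chapter's implementation: tokenization is plain lowercased whitespace splitting, n-gram profiles are compared with cosine similarity, LSA is approximated by a truncated SVD of the plays' own unigram count matrix (standard LSA normally uses a larger training corpus and term weighting, and the chapter's exact LSA space is not stated here), and classical (Torgerson) MDS stands in for whatever MDS variant the authors used. The helper names (`ngrams`, `count_matrix`, `lsa_similarity`, `classical_mds`) and the four toy texts are hypothetical and invented for illustration; they are not Shakespeare data.

```python
from collections import Counter

import numpy as np


def ngrams(text, n):
    """Lowercased whitespace tokens grouped into n-grams (n=1 unigrams, n=2 bigrams)."""
    tokens = text.lower().split()
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def count_matrix(texts, n):
    """Rows = texts, columns = n-grams (sorted); cells hold raw frequencies."""
    counts = [Counter(ngrams(t, n)) for t in texts]
    vocab = sorted(set().union(*counts))
    return np.array([[c[g] for g in vocab] for c in counts], dtype=float)


def cosine_similarity(X):
    """Pairwise cosine similarity between the rows of X (all-zero rows left at zero)."""
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    U = X / np.where(norms == 0, 1.0, norms)
    return U @ U.T


def lsa_similarity(X, k):
    """Assumed LSA stand-in: project texts onto the top-k singular dimensions, then compare."""
    U, s, _ = np.linalg.svd(X, full_matrices=False)
    return cosine_similarity(U[:, :k] * s[:k])


def classical_mds(D, dims=2):
    """Classical (Torgerson) MDS: embed a distance matrix D in `dims` dimensions."""
    m = D.shape[0]
    J = np.eye(m) - np.ones((m, m)) / m          # centering matrix
    B = -0.5 * J @ (D ** 2) @ J                  # double-centered squared distances
    vals, vecs = np.linalg.eigh(B)
    order = np.argsort(vals)[::-1][:dims]        # largest eigenvalues first
    return vecs[:, order] * np.sqrt(np.maximum(vals[order], 0.0))


if __name__ == "__main__":
    # Hypothetical toy "plays"; invented strings, not data from the chapter.
    texts = {
        "comedy_a": "the lover sings of love and marriage and a merry jest",
        "comedy_b": "a merry jest of love and the wedding of the lover",
        "tragedy_a": "the king is dead and blood and death fill the court",
        "tragedy_b": "death comes to the king and blood stains the crown",
    }
    docs = list(texts.values())
    sims = {
        "unigram": cosine_similarity(count_matrix(docs, 1)),
        "bigram": cosine_similarity(count_matrix(docs, 2)),
        "lsa": lsa_similarity(count_matrix(docs, 1), k=2),
    }
    for label, S in sims.items():
        coords = classical_mds(1.0 - S)          # cosine distance fed into MDS
        print(label, dict(zip(texts, np.round(coords, 2).tolist())))

    # Compare techniques by correlating their pairwise similarity values.
    iu = np.triu_indices(len(docs), k=1)
    r = np.corrcoef(sims["unigram"][iu], sims["lsa"][iu])[0, 1]
    print("unigram vs. LSA similarity correlation:", round(r, 3))
```

Using one shared distance-to-MDS step for all three techniques mirrors the abstract's design, in which only the input (n-gram frequencies or LSA similarities) changes; the correlation at the end illustrates, on toy data only, how agreement between techniques could be quantified.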
Original language: English
Title of host publication: New Directions in Literary Studies
Editors: W. van Peer, J. Auracher
Place of publication: Newcastle
Publisher: Cambridge Scholars Publishing
Chapter: 5
Pages: 108-129
Publication status: Published - 2008
Externally published: Yes

Cite this

Louwerse, M., Lewis, G. A., & Wu, J. (2008). Unigrams, bigrams and LSA: Corpus linguistic explorations of genres in Shakespeare's plays. In W. van Peer & J. Auracher (Eds.), New Directions in Literary Studies (pp. 108-129). Newcastle: Cambridge Scholars Publishing.