Corpus and computational linguistics could further strengthen the thriving field of empirical studies of literature. This chapter discusses three straightforward corpus linguistic techniques: unigrams, bigrams, and latent semantic analysis (LSA). The three techniques are then applied to Shakespeare's plays to determine how well they can categorize the plays into genres. In the n-gram analyses, frequencies of shared words across the plays are entered into a Multi-Dimensional Scaling (MDS) analysis; in LSA, the similarity values between the plays are entered into MDS. With all three techniques, two categories emerged: comedies on the one hand, and tragedies/histories on the other. Moreover, a strong correlation was found between the three fundamentally different techniques.
|Title of host publication|New Directions in Literary Studies|
|Editors|W. van Peer, J. Auracher|
|Place of Publication|Newcastle|
|Publisher|Cambridge Scholars Publishing|
|Publication status|Published - 2008|
Louwerse, M., Lewis, G. A., & Wu, J. (2008). Unigrams, bigrams and LSA: Corpus linguistic explorations of genres in Shakespeare's plays. In W. van Peer, & J. Auracher (Eds.), New Directions in Literary Studies (pp. 108-129). Cambridge Scholars Publishing.