Abstract
Corpus and computational linguistics could further strengthen the thriving field of empirical studies of literature. This chapter discusses three straightforward corpus linguistic techniques: unigrams, bigrams and latent semantic analysis (LSA). The three techniques are then applied to Shakespeare's plays to determine how well they categorize the plays into genres. In the n-gram analyses, frequencies of shared words across the plays are entered into a multidimensional scaling (MDS) analysis; in LSA, the similarity values between the plays are entered into MDS. With all three techniques, two categories emerged: comedies on the one hand, and tragedies/histories on the other. Moreover, a strong correlation was found between the three fundamentally different techniques.
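As a rough illustration of the n-gram step summarized in the abstract, the sketch below counts unigrams or bigrams per text, keeps only those n-grams that occur in every text, and builds a pairwise distance matrix of the kind an MDS routine takes as input. It is not the chapter's actual procedure. The tokenizer, the use of relative frequencies, the Euclidean distance, the function names and the three toy strings are all assumptions made for the example, and the strings are invented placeholders rather than text from the plays.

```python
from collections import Counter
from itertools import combinations
import math
import re


def tokenize(text):
    """Lowercase the text and split it into simple word tokens (assumed tokenizer)."""
    return re.findall(r"[a-z']+", text.lower())


def ngram_counts(tokens, n=1):
    """Count n-grams in a token list (n=1 for unigrams, n=2 for bigrams)."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def shared_frequency_vectors(texts, n=1):
    """Return the n-grams found in every text and each text's relative frequencies for them."""
    counts = {name: ngram_counts(tokenize(t), n) for name, t in texts.items()}
    shared = sorted(set.intersection(*(set(c) for c in counts.values())))
    vectors = {}
    for name, c in counts.items():
        total = sum(c.values())
        vectors[name] = [c[g] / total for g in shared]
    return shared, vectors


def distance_matrix(vectors):
    """Pairwise Euclidean distances between frequency vectors (assumed distance measure)."""
    names = sorted(vectors)
    dist = {(a, a): 0.0 for a in names}
    for a, b in combinations(names, 2):
        dist[(a, b)] = dist[(b, a)] = math.dist(vectors[a], vectors[b])
    return names, dist


# Invented placeholder texts, not quotations from any play.
texts = {
    "play_a": "the king and the fool",
    "play_b": "the fool and the queen",
    "play_c": "the king and the king and the crown",
}

shared, vectors = shared_frequency_vectors(texts, n=1)
names, dist = distance_matrix(vectors)
```

The resulting `dist` values could then be arranged as a square matrix and passed to any MDS implementation that accepts precomputed dissimilarities, for example scikit-learn's `sklearn.manifold.MDS` with `dissimilarity="precomputed"`. Calling `shared_frequency_vectors(texts, n=2)` gives the bigram variant.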
Original language | English |
---|---|
Title of host publication | New Directions in Literary Studies |
Editors | W. van Peer, J. Auracher |
Place of Publication | Newcastle |
Publisher | Cambridge Scholars Publishing |
Chapter | 5 |
Pages | 108-129 |
Publication status | Published - 2008 |
Externally published | Yes |