
PoeTree: Poetry Treebanks in Czech, English, French, German, Hungarian, Italian, Portuguese, Russian, Slovenian and Spanish

  • Petr Plecháč
  • Silvie Cinková
  • Robert Kolár
  • Artjoms Šeļa
  • Mirella De Sisto
  • Lara Nugues
  • Thomas Haider
  • Neža Kočnik

    Research output: Contribution to journal › Article › Scientific › Peer-review


    Abstract

    This article presents a set of standardised poetry corpora comprising over 330,000 poems in ten languages (Czech, English, French, German, Hungarian, Italian, Portuguese, Russian, Slovenian, and Spanish). Each corpus has been deduplicated, annotated with Universal Dependencies, supplemented with additional metadata, and converted into a unified JSON structure.
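    The processing steps named in the abstract (deduplication, Universal Dependencies annotation, a unified JSON structure) can be illustrated with a minimal Python sketch. The record layout and every field name except the CoNLL-U-style token keys (`form`, `lemma`, `upos`, `head`, `deprel`) are hypothetical, and the hash-based deduplication on normalised text is an assumed technique, not necessarily the one the authors used.

```python
import hashlib
import json
import unicodedata
from collections import Counter

# HYPOTHETICAL record layout: "id", "lang", "body", "text" and "tokens" are
# illustrative assumptions, not the published PoeTree schema. Token keys follow
# the CoNLL-U column names used by Universal Dependencies.
def tok(i, form, lemma, upos, head, deprel):
    return {"id": i, "form": form, "lemma": lemma,
            "upos": upos, "head": head, "deprel": deprel}

LINE = {"text": "The rose is red", "tokens": [
    tok(1, "The", "the", "DET", 2, "det"),
    tok(2, "rose", "rose", "NOUN", 4, "nsubj"),
    tok(3, "is", "be", "AUX", 4, "cop"),
    tok(4, "red", "red", "ADJ", 0, "root"),  # head 0 marks the root
]}

# body = list of stanzas; each stanza = list of annotated lines.
# The second poem differs only in spacing and case, so it is a duplicate.
POEMS = [
    {"id": "en_0001", "lang": "en", "body": [[LINE]]},
    {"id": "en_0002", "lang": "en", "body": [[dict(LINE, text="The  rose is RED")]]},
]

# Simulate reading the records from a JSON file.
records = json.loads(json.dumps(POEMS, ensure_ascii=False))

def normalise(text):
    # Unicode NFC, lowercase, and collapse all whitespace (including newlines).
    text = unicodedata.normalize("NFC", text).lower()
    return " ".join(text.split())

def poem_text(poem):
    return "\n".join(line["text"] for stanza in poem["body"] for line in stanza)

def fingerprint(poem):
    # ASSUMPTION: two poems count as duplicates if their normalised texts match.
    return hashlib.sha1(normalise(poem_text(poem)).encode("utf-8")).hexdigest()

def deduplicate(poems):
    # Keep the first poem seen for each fingerprint.
    seen, unique = set(), []
    for poem in poems:
        fp = fingerprint(poem)
        if fp not in seen:
            seen.add(fp)
            unique.append(poem)
    return unique

def upos_counts(poems):
    # Tally universal part-of-speech tags across all tokens.
    counts = Counter()
    for poem in poems:
        for stanza in poem["body"]:
            for line in stanza:
                counts.update(t["upos"] for t in line["tokens"])
    return counts

if __name__ == "__main__":
    unique = deduplicate(records)
    print(len(unique))  # → 1
    print(upos_counts(unique))
```

    Hashing normalised text only catches exact duplicates up to spacing, case and Unicode form; near-duplicate detection would need a fuzzier comparison.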
    Original language: English
    Number of pages: 17
    Journal: Research Data Journal for the Humanities and Social Sciences
    Publication status: Published (Sept 2024)

    Keywords

    • poetry
    • computational poetry
    • corpus linguistics
    • digital humanities
