PoeTree: Poetry Treebanks in Czech, English, French, German, Hungarian, Italian, Portuguese, Russian, Slovenian and Spanish

Petr Plecháč, Silvie Cinková, Robert Kolár, Artjoms Šeļa, Mirella De Sisto, Lara Nugues, Thomas Haider, Neža Kočnik

Research output: Contribution to journal › Article › Scientific › peer-review

Abstract

This article presents a set of standardised corpora of poetry comprising over 330,000 poems in ten languages (Czech, English, French, German, Hungarian, Italian, Portuguese, Russian, Slovenian, and Spanish). Each corpus has been deduplicated, enriched with Universal Dependencies annotation, provided with additional metadata, and converted into a unified JSON structure.
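
As a rough illustration of the processing steps named in the abstract, the following Python sketch deduplicates a few toy poems and prints one of them as JSON. It is not the PoeTree schema or the authors' pipeline: the record layout (`id`, `lang`, `author`, `title`, `body`, `words`) and the helper names `normalise` and `deduplicate` are assumptions made for this example, and the exact-match hashing stands in for whatever deduplication method the article actually uses. Only the token-level keys (`form`, `lemma`, `upos`, `head`, `deprel`) follow standard Universal Dependencies field names.

```python
import hashlib
import json
import unicodedata


def normalise(text):
    """Apply NFKC, lowercase, and collapse whitespace so trivial variants compare equal."""
    return " ".join(unicodedata.normalize("NFKC", text).lower().split())


def deduplicate(poems):
    """Keep the first poem for each distinct normalised text (exact matches only)."""
    seen = set()
    unique = []
    for poem in poems:
        text = "\n".join(line["text"] for line in poem["body"])
        digest = hashlib.sha256(normalise(text).encode("utf-8")).hexdigest()
        if digest not in seen:
            seen.add(digest)
            unique.append(poem)
    return unique


# Toy records in a hypothetical layout; all values are placeholders.
poems = [
    {
        "id": "demo-1",
        "lang": "en",
        "author": "Hypothetical Author",
        "title": "Demo",
        "body": [
            {
                "text": "Roses are red",
                # Token annotation using standard UD field names.
                "words": [
                    {"form": "Roses", "lemma": "rose", "upos": "NOUN", "head": 3, "deprel": "nsubj"},
                    {"form": "are", "lemma": "be", "upos": "AUX", "head": 3, "deprel": "cop"},
                    {"form": "red", "lemma": "red", "upos": "ADJ", "head": 0, "deprel": "root"},
                ],
            }
        ],
    },
    # Same text with different casing and spacing: treated as a duplicate.
    {"id": "demo-2", "lang": "en", "author": "Hypothetical Author", "title": "Demo",
     "body": [{"text": "ROSES  are red", "words": []}]},
    {"id": "demo-3", "lang": "en", "author": "Hypothetical Author", "title": "Demo 2",
     "body": [{"text": "Violets are blue", "words": []}]},
]

unique = deduplicate(poems)
print(json.dumps(unique[0], ensure_ascii=False, indent=2))
```

Hashing a normalised text only catches near-identical copies; reprints with spelling or punctuation differences would need a fuzzier comparison, which this sketch does not attempt.
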
Original language: English
Journal: Research Data Journal for the Humanities and Social Sciences
Publication status: Published - Sept 2024
