Abstract
This paper describes the CACAPO dataset, built for training both neural pipeline and end-to-end data-to-text language generation systems. The dataset is multilingual (Dutch and English) and contains almost 10,000 sentences from human-written news texts in the sports, weather, stocks, and incidents domains, together with aligned attribute-value paired data. The dataset is unique in that the linguistic variation and indirect ways of expressing data in these texts reflect the challenges of real-world NLG tasks.
| Original language | English |
|---|---|
| Title of host publication | Proceedings of The 13th International Conference on Natural Language Generation |
| Place of Publication | Dublin, Ireland |
| Pages | 68-79 |
| Number of pages | 15 |
| Publication status | Published - 1 Dec 2020 |
| Event | International Conference on Natural Language Generation - online, Dublin, Ireland. Duration: 15 Dec 2020 → 18 Dec 2020. Conference number: 13. https://www.inlg2020.org/ |
Conference
| Conference | International Conference on Natural Language Generation |
|---|---|
| Abbreviated title | INLG 2020 |
| Country/Territory | Ireland |
| City | Dublin |
| Period | 15/12/20 → 18/12/20 |
Fingerprint
Dive into the research topics of 'The CACAPO Dataset: A Multilingual, Multi-Domain Dataset for Neural Pipeline and End-to-End Data-to-Text Generation'. Together they form a unique fingerprint.
Datasets
CACAPO dataset
van der Lee, C. (Creator), Emmery, C. (Creator), Wubben, S. (Creator) & Krahmer, E. (Creator), DataverseNL, 2 Aug 2022
DOI: 10.34894/LIBYHP, https://dataverse.nl/citation?persistentId=doi:10.34894/LIBYHP
Dataset