Abstract
We introduce the Iens corpus, a dataset of over 684,000 Dutch restaurant reviews posted on the website iens.nl between 2012 and 2017. As such, it is a large language dataset for Dutch. While similar corpora exist for English (e.g., the Yelp dataset or the Amazon review corpus), easily available, high-quality data of this kind is lacking for low-resource languages. The Iens corpus is intended to fill this gap. In addition, it has several unique properties that make it a valuable resource for computational linguistics research. In this paper, we describe the construction and contents of the corpus, discuss its distinguishing features, and present some of its possible applications in computational linguistics.
| Original language | English |
|---|---|
| Publication status | Published - 9 Jul 2021 |
| Event | Computational Linguistics in The Netherlands, Ghent, Belgium. Duration: 9 Jul 2021 → 9 Jul 2021. Conference number: 31. https://www.clin31.ugent.be/ |
Conference
| Conference | Computational Linguistics in The Netherlands |
|---|---|
| Abbreviated title | CLIN |
| Country/Territory | Belgium |
| City | Ghent |
| Period | 9/07/21 → 9/07/21 |
| Internet address | https://www.clin31.ugent.be/ |