We introduce the Iens corpus, a dataset of over 684,000 Dutch restaurant reviews posted on the website iens.nl between 2012 and 2017. As such, it is a large language dataset for Dutch. While similar corpora exist for English (e.g., the Yelp dataset or the Amazon review corpus), easily available, high-quality data of this kind is lacking for low-resource languages. The Iens corpus is intended to fill this gap. In addition, the corpus has several unique properties that make it a valuable resource for computational linguistics research. In this paper, we describe the construction and contents of the corpus, discuss its distinguishing features, and present some of its possible applications in computational linguistics.
|Publication status||Published - 9 Jul 2021|
|Event||Computational Linguistics in The Netherlands - Ghent, Belgium|
|Duration||9 Jul 2021 → 9 Jul 2021|
|Conference number||31|