Hungry for language data? Introducing a large Dutch corpus of restaurant reviews

Research output: Contribution to conference › Abstract › Scientific › peer-review

Abstract

We introduce the Iens corpus, a dataset of over 684,000 Dutch restaurant reviews posted on the website iens.nl between 2012 and 2017. This makes it a large language dataset for Dutch. While similar corpora exist for English (e.g., the Yelp dataset or the Amazon review corpus), easily available, high-quality data for low-resource languages is lacking. The Iens corpus is intended to fill this gap. In addition, the corpus has several unique properties that make it a valuable resource for computational linguistics research. In this paper, we describe the construction and contents of the corpus, discuss its distinguishing features, and present some of its possible applications in computational linguistics.
Original language: English
Publication status: Published - 9 Jul 2021
Event: Computational Linguistics in The Netherlands - Ghent, Belgium
Duration: 9 Jul 2021 to 9 Jul 2021
Conference number: 31
https://www.clin31.ugent.be/

Conference

Conference: Computational Linguistics in The Netherlands
Abbreviated title: CLIN
Country/Territory: Belgium
City: Ghent
Period: 9/07/21 to 9/07/21