Hungry for language data? Introducing a large Dutch corpus of restaurant reviews

    Research output: Contribution to conference › Abstract › Scientific › peer-review

    Abstract

    We introduce the Iens corpus, a dataset of over 684,000 Dutch restaurant reviews posted on the website iens.nl between 2012 and 2017. As such, it is a large language dataset for Dutch. While similar corpora exist for English (e.g., the Yelp dataset or the Amazon review corpus), easily available, high-quality data are lacking for low-resource languages. The Iens corpus is intended to fill this gap. In addition, the Iens corpus has several unique properties that make it a valuable resource for computational linguistics research. In this paper, we describe the construction and contents of the corpus, discuss its distinguishing features, and present some of its possible applications in computational linguistics.
    Original language: English
    Publication status: Published - 9 Jul 2021
    Event: Computational Linguistics in The Netherlands - Ghent, Belgium
    Duration: 9 Jul 2021 → 9 Jul 2021
    Conference number: 31
    https://www.clin31.ugent.be/

    Conference

    Conference: Computational Linguistics in The Netherlands
    Abbreviated title: CLIN
    Country/Territory: Belgium
    City: Ghent
    Period: 9/07/21 → 9/07/21
