Human evaluation of automatically generated text: Current trends and best practice guidelines

Research output: Contribution to journal › Article › Scientific › peer-review

3 Citations (Scopus)

Abstract

Currently, there is little agreement on how Natural Language Generation (NLG) systems should be evaluated, with a particularly high degree of variation in the way that human evaluation is carried out. This paper provides an overview of how (mostly intrinsic) human evaluation is currently conducted and presents a set of best practices, grounded in the literature. These best practices are linked to the stages that researchers go through when conducting an evaluation (the planning stage, and the execution and release stage), and to the specific steps within these stages. With this paper, we hope to contribute to the quality and consistency of human evaluations in NLG.
Original language: English
Article number: 101151
Pages (from-to): 1-24
Number of pages: 24
Journal: Computer Speech and Language: An official publication of the International Speech Communication Association (ISCA)
Volume: 67
DOIs
Publication status: Published - 21 May 2021

Keywords

  • BIAS
  • DESIGN
  • Ethics
  • Human evaluation
  • INFORMED-CONSENT
  • INTERRATER RELIABILITY
  • LANGUAGE
  • Literature review
  • NUMBER
  • Natural Language Generation
  • Open science
  • POWER
  • RATING-SCALES
  • RESPONSE CATEGORIES
  • Recommendations
  • VALIDITY
