Human evaluation of automatically generated text: Current trends and best practice guidelines

Research output: Contribution to journal › Article › Scientific › peer-review

Abstract

Currently, there is little agreement on how Natural Language Generation (NLG) systems should be evaluated, with a particularly high degree of variation in the way that human evaluation is carried out. This paper provides an overview of how (mostly intrinsic) human evaluation is currently conducted and presents a set of best practices, grounded in the literature. These best practices are also linked to the stages that researchers go through when conducting evaluation research (the planning stage, and the execution and release stage) and to the specific steps within these stages. With this paper, we hope to contribute to the quality and consistency of human evaluations in NLG.
Original language: English
Pages (from-to): 1-24
Number of pages: 24
Journal: Computer Speech and Language: An official publication of the International Speech Communication Association (ISCA)
Volume: 67
Publication status: Published - 21 May 2021

Keywords

  • Natural Language Generation
  • Human evaluation
  • Recommendations
  • Literature review
  • Open science
  • Ethics
