Abstract
Currently, there is little agreement as to how Natural Language Generation (NLG) systems should be evaluated. While there is some agreement regarding automatic metrics, there is a high degree of variation in the way that human evaluation is carried out. This paper provides an overview of how human evaluation is currently conducted, and presents a set of best practices, grounded in the literature. With this paper, we hope to contribute to the quality and consistency of human evaluations in NLG.
Original language | English |
---|---|
Title of host publication | Proceedings of the 12th International Conference on Natural Language Generation |
Place of Publication | Tokyo, Japan |
Publisher | Association for Computational Linguistics |
Pages | 355-368 |
Number of pages | 14 |
Publication status | Published - 1 Oct 2019 |
Event | 12th International Conference on Natural Language Generation (INLG 2019), Tokyo, Japan, 29 Oct 2019 → 1 Nov 2019, https://www.inlg2019.com |
Conference
Conference | 12th International Conference on Natural Language Generation (INLG 2019) |
---|---|
Country/Territory | Japan |
City | Tokyo |
Period | 29/10/19 → 1/11/19 |
Internet address | https://www.inlg2019.com |