Abstract
Currently, there is little agreement as to how Natural Language Generation (NLG) systems should be evaluated. While there is some agreement regarding automatic metrics, there is a high degree of variation in the way that human evaluation is carried out. This paper provides an overview of how human evaluation is currently conducted, and presents a set of best practices, grounded in the literature. With this paper, we hope to contribute to the quality and consistency of human evaluations in NLG.
| Original language | English |
|---|---|
| Title of host publication | Proceedings of the 12th International Conference on Natural Language Generation |
| Place of Publication | Tokyo, Japan |
| Publisher | Association for Computational Linguistics |
| Pages | 355-368 |
| Number of pages | 14 |
| Publication status | Published - 1 Oct 2019 |
| Event | 12th International Conference on Natural Language Generation (INLG 2019), Tokyo, Japan; 29 Oct 2019 → 1 Nov 2019; https://www.inlg2019.com |
Conference
| Conference | 12th International Conference on Natural Language Generation (INLG 2019) |
|---|---|
| Country/Territory | Japan |
| City | Tokyo |
| Period | 29/10/19 → 1/11/19 |
| Internet address | https://www.inlg2019.com |
Title: Best practices for the human evaluation of automatically generated text