Abstract
This review gives an overview of evaluation methods for task-oriented dialogue systems. It discusses the constructs, metrics and operationalisations used in previous work and highlights the challenges of dialogue system evaluation. The objective is to encourage a more critical approach to evaluating dialogue systems. To that end, a systematic search of four databases (ACL, ACM, IEEE and Web of Science) was conducted, which after screening yielded 122 studies. These studies were analysed for the constructs and methods they proposed for evaluation. Four of the most frequently occurring constructs (satisfaction, correctness, quality, and efficiency) are discussed as examples of how constructs are operationalised and measured in research. Recent developments in large language models are also discussed with respect to their applicability to dialogue system evaluation. Furthermore, considerations and concerns about validity and reliability are discussed in relation to the identified constructs and metrics. To improve consistency across evaluation approaches, future work should take a critical and systematic approach to specifying and operationalising the constructs used. To work towards this aim, the review ends with a research agenda for dialogue system evaluation and a set of outstanding questions.
| Original language | English |
|---|---|
| Number of pages | 38 |
| Journal | Northern European Journal of Language Technology |
| Volume | 12 |
| Issue number | 1 |
| Publication status | Published - 1 Mar 2026 |
Title: Evaluating Task-oriented Dialogue Systems: A Systematic Review of Measures, Constructs and their Operationalisations
Projects
- 1 Finished
Smooth Operator. Development and effects of personalized conversational AI.
Liebrecht, C. (Principal Investigator), van Hooijdonk, C. M. J. (CoPI), Krahmer, E. (CoPI), van Miltenburg, E. (CoPI), Kunneman, F. (CoPI), Hoeken, H. (CoPI) & te Molder, H. (CoPI)
31/03/21 → 31/03/25
Project: Research project