Abstract
NLG researchers often use uncontrolled corpora to train and evaluate their systems, relying on textual similarity metrics such as BLEU. This position paper argues in favour of two alternative evaluation strategies, based on grammars or rule-based systems. These strategies are particularly useful for identifying the strengths and weaknesses of different systems. We contrast our proposals with the (extended) WebNLG dataset, which turns out to have a skewed distribution of predicates. We predict that this distribution affects the quality of the predictions of systems trained on this data. However, this hypothesis can only be tested thoroughly (without any confounds) once we are able to systematically manipulate the skewness of the data, using a rule-based approach.
Original language | English |
---|---|
Title of host publication | Proceedings of the 1st Workshop on Evaluating NLG Evaluation |
Place of Publication | Dublin, Ireland |
Publisher | Association for Computational Linguistics |
Pages | 17-27 |
Number of pages | 11 |
Publication status | Published - 1 Dec 2020 |
Event | Workshop on Evaluating NLG Evaluation, Online, Dublin, Ireland; Duration: 18 Dec 2020 → …; https://evalnlg-workshop.github.io/ |
Conference
Conference | Workshop on Evaluating NLG Evaluation |
---|---|
Country/Territory | Ireland |
City | Dublin |
Period | 18/12/20 → … |