Evaluation rules! On the use of grammars and rule-based systems for NLG evaluation

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Scientific › peer-review


NLG researchers often use uncontrolled corpora to train and evaluate their systems, relying on textual similarity metrics such as BLEU. This position paper argues in favour of two alternative evaluation strategies, using grammars or rule-based systems. These strategies are particularly useful for identifying the strengths and weaknesses of different systems. We contrast our proposals with the (extended) WebNLG dataset, which we show to have a skewed distribution of predicates. We predict that this distribution affects the quality of the output of systems trained on this data. However, this hypothesis can only be thoroughly tested (without any confounds) once we are able to systematically manipulate the skewness of the data, using a rule-based approach.
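For illustration only, here is a minimal sketch (not from the paper) of how the predicate skew mentioned in the abstract could be quantified over WebNLG-style triples. The toy triples and the choice of normalised entropy as the skewness measure are assumptions made for this example.

```python
from collections import Counter
import math

# Toy RDF-style (subject, predicate, object) triples standing in for WebNLG data;
# the actual dataset and predicate inventory are assumptions for illustration.
triples = [
    ("Alan_Bean", "birthPlace", "Wheeler_Texas"),
    ("Alan_Bean", "occupation", "Test_pilot"),
    ("Buzz_Aldrin", "birthPlace", "Glen_Ridge"),
    ("Buzz_Aldrin", "occupation", "Fighter_pilot"),
    ("Apollo_12", "operator", "NASA"),
    ("Apollo_12", "crewMembers", "Alan_Bean"),
    ("Apollo_11", "operator", "NASA"),
]

# Count how often each predicate occurs in the corpus.
predicate_counts = Counter(pred for _, pred, _ in triples)
total = sum(predicate_counts.values())

# Normalised entropy of the predicate distribution:
# 1.0 means predicates are uniformly distributed; values near 0 indicate heavy skew.
entropy = -sum((c / total) * math.log2(c / total) for c in predicate_counts.values())
max_entropy = math.log2(len(predicate_counts))
normalised_entropy = entropy / max_entropy if max_entropy > 0 else 1.0

print(predicate_counts.most_common())
print(f"normalised entropy: {normalised_entropy:.3f}")
```

A rule-based generator, as proposed in the paper, would make it possible to resample or regenerate triples so that this distribution can be manipulated systematically rather than merely measured.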
Original language: English
Title of host publication: Proceedings of the 1st Workshop on Evaluating NLG Evaluation
Place of publication: Dublin, Ireland
Publisher: Association for Computational Linguistics
Number of pages: 11
Publication status: Published - 1 Dec 2020
Event: Workshop on Evaluating NLG Evaluation - Online, Dublin, Ireland
Duration: 18 Dec 2020 → …


Conference: Workshop on Evaluating NLG Evaluation
Period: 18/12/20 → …


