Neural data-to-text generation: A comparison between pipeline and end-to-end architectures

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Scientific › peer-review

Abstract

Traditionally, most data-to-text applications have been designed using a modular pipeline architecture, in which non-linguistic input data is converted into natural language through several intermediate transformations. By contrast, recent neural models for data-to-text generation have been proposed as end-to-end approaches, where the non-linguistic input is rendered in natural language with far fewer explicit intermediate representations in between. This study introduces a systematic comparison between neural pipeline and end-to-end data-to-text approaches for the generation of text from RDF triples. Both architectures were implemented using encoder-decoder Gated Recurrent Units (GRU) and the Transformer, two state-of-the-art deep learning methods. Automatic and human evaluations, together with a qualitative analysis, suggest that having explicit intermediate steps in the generation process results in better texts than those generated by end-to-end approaches. Moreover, the pipeline models generalize better to unseen inputs. Data and code are publicly available.
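To make the contrast concrete, below is a minimal, purely illustrative Python sketch of the two architectures. The step names (ordering, lexicalization, realization), the template dictionary, and the Alan Bean triples are simplified stand-ins invented for this example; they are not the authors' implementation, and in the paper the mappings are learned by a GRU or Transformer encoder-decoder rather than hand-written.

from dataclasses import dataclass
from typing import List

@dataclass(frozen=True)
class Triple:
    subj: str
    pred: str
    obj: str

def order_triples(triples: List[Triple]) -> List[Triple]:
    # Discourse ordering: decide in which order the facts are mentioned.
    return sorted(triples, key=lambda t: t.pred)

def lexicalize(triple: Triple) -> str:
    # Lexicalization: map each predicate to a sentence template.
    templates = {
        "birthPlace": "{subj} was born in {obj}",
        "occupation": "{subj} worked as a {obj}",
    }
    template = templates.get(triple.pred, "{subj} {pred} {obj}")
    return template.format(subj=triple.subj, pred=triple.pred, obj=triple.obj)

def realize(sentences: List[str]) -> str:
    # Textual realization: assemble the final surface string.
    return " ".join(sentence + "." for sentence in sentences)

def pipeline_generate(triples: List[Triple]) -> str:
    # Pipeline architecture: explicit intermediate representations
    # between the non-linguistic input and the output text.
    ordered = order_triples(triples)              # intermediate step 1
    sentences = [lexicalize(t) for t in ordered]  # intermediate step 2
    return realize(sentences)

def end_to_end_generate(triples: List[Triple]) -> str:
    # End-to-end architecture: a single opaque input-to-text mapping;
    # in the paper this function is what the neural model learns directly.
    return realize([lexicalize(t) for t in triples])

if __name__ == "__main__":
    data = [
        Triple("Alan_Bean", "occupation", "test pilot"),
        Triple("Alan_Bean", "birthPlace", "Wheeler,_Texas"),
    ]
    print(pipeline_generate(data))
    # Alan_Bean was born in Wheeler,_Texas. Alan_Bean worked as a test pilot.
    print(end_to_end_generate(data))
    # Alan_Bean worked as a test pilot. Alan_Bean was born in Wheeler,_Texas.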
Original language: English
Title of host publication: Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)
Place of publication: Hong Kong, China
Publisher: Association for Computational Linguistics
Pages: 552-562
Number of pages: 11
Publication status: Published - 1 Nov 2019
Event: 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing - AsiaWorld-Expo, Hong Kong, China
Duration: 3 Nov 2019 - 7 Nov 2019
Internet address: https://www.emnlp-ijcnlp2019.org/

Conference

Conference: 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing
Abbreviated title: EMNLP-IJCNLP
Country: China
City: Hong Kong
Period: 3/11/19 - 7/11/19
Internet address: https://www.emnlp-ijcnlp2019.org/

Cite this

Castro Ferreira, T., van der Lee, C., van Miltenburg, E., & Krahmer, E. (2019). Neural data-to-text generation: A comparison between pipeline and end-to-end architectures. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP) (pp. 552-562). Hong Kong, China: Association for Computational Linguistics.
Castro Ferreira, Thiago; van der Lee, Chris; van Miltenburg, Emiel; Krahmer, Emiel. / Neural data-to-text generation: A comparison between pipeline and end-to-end architectures. Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP). Hong Kong, China: Association for Computational Linguistics, 2019. pp. 552-562.
@inproceedings{b0ed5e4c4c1e40d5b9369472dc05a3d1,
title = "Neural data-to-text generation: A comparison between pipeline and end-to-end architectures",
abstract = "Traditionally, most data-to-text applications have been designed using a modular pipeline architecture, in which non-linguistic input data is converted into natural language through several intermediate transformations. By contrast, recent neural models for data-to-text generation have been proposed as end-to-end approaches, where the non-linguistic input is rendered in natural language with far fewer explicit intermediate representations in between. This study introduces a systematic comparison between neural pipeline and end-to-end data-to-text approaches for the generation of text from RDF triples. Both architectures were implemented using encoder-decoder Gated Recurrent Units (GRU) and the Transformer, two state-of-the-art deep learning methods. Automatic and human evaluations, together with a qualitative analysis, suggest that having explicit intermediate steps in the generation process results in better texts than those generated by end-to-end approaches. Moreover, the pipeline models generalize better to unseen inputs. Data and code are publicly available.",
author = "{Castro Ferreira}, Thiago and {van der Lee}, Chris and {van Miltenburg}, Emiel and Emiel Krahmer",
year = "2019",
month = nov,
day = "1",
language = "English",
pages = "552--562",
booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)",
publisher = "Association for Computational Linguistics",
address = "Hong Kong, China",
}

Castro Ferreira, T, van der Lee, C, van Miltenburg, E & Krahmer, E 2019, Neural data-to-text generation: A comparison between pipeline and end-to-end architectures. in Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP). Association for Computational Linguistics, Hong Kong, China, pp. 552-562, 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing, Hong Kong, China, 3/11/19.

Neural data-to-text generation: A comparison between pipeline and end-to-end architectures. / Castro Ferreira, Thiago; van der Lee, Chris; van Miltenburg, Emiel; Krahmer, Emiel.

Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP). Hong Kong, China: Association for Computational Linguistics, 2019. p. 552-562.

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Scientific › peer-review

TY  - GEN
T1  - Neural data-to-text generation
T2  - A comparison between pipeline and end-to-end architectures
AU  - Castro Ferreira, Thiago
AU  - van der Lee, Chris
AU  - van Miltenburg, Emiel
AU  - Krahmer, Emiel
PY  - 2019/11/1
Y1  - 2019/11/1
N2  - Traditionally, most data-to-text applications have been designed using a modular pipeline architecture, in which non-linguistic input data is converted into natural language through several intermediate transformations. By contrast, recent neural models for data-to-text generation have been proposed as end-to-end approaches, where the non-linguistic input is rendered in natural language with far fewer explicit intermediate representations in between. This study introduces a systematic comparison between neural pipeline and end-to-end data-to-text approaches for the generation of text from RDF triples. Both architectures were implemented using encoder-decoder Gated Recurrent Units (GRU) and the Transformer, two state-of-the-art deep learning methods. Automatic and human evaluations, together with a qualitative analysis, suggest that having explicit intermediate steps in the generation process results in better texts than those generated by end-to-end approaches. Moreover, the pipeline models generalize better to unseen inputs. Data and code are publicly available.
AB  - Traditionally, most data-to-text applications have been designed using a modular pipeline architecture, in which non-linguistic input data is converted into natural language through several intermediate transformations. By contrast, recent neural models for data-to-text generation have been proposed as end-to-end approaches, where the non-linguistic input is rendered in natural language with far fewer explicit intermediate representations in between. This study introduces a systematic comparison between neural pipeline and end-to-end data-to-text approaches for the generation of text from RDF triples. Both architectures were implemented using encoder-decoder Gated Recurrent Units (GRU) and the Transformer, two state-of-the-art deep learning methods. Automatic and human evaluations, together with a qualitative analysis, suggest that having explicit intermediate steps in the generation process results in better texts than those generated by end-to-end approaches. Moreover, the pipeline models generalize better to unseen inputs. Data and code are publicly available.
M3  - Conference contribution
SP  - 552
EP  - 562
BT  - Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)
PB  - Association for Computational Linguistics
CY  - Hong Kong, China
ER  -

Castro Ferreira T, van der Lee C, van Miltenburg E, Krahmer E. Neural data-to-text generation: A comparison between pipeline and end-to-end architectures. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP). Hong Kong, China: Association for Computational Linguistics. 2019. p. 552-562.