Combining SMT and NMT back-translated data for efficient NMT

Alberto Poncelas, Maja Popović, Dimitar Shterionov, Gideon Maillette De Buy Wenniger, Andy Way

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Scientific › peer-review

1 Citation (Scopus)

Abstract

Neural Machine Translation (NMT) models achieve their best performance when large sets of parallel data are used for training. Consequently, techniques for augmenting the training set have become popular recently. One of these methods is back-translation (Sennrich et al., 2016a), which consists of generating synthetic sentences by translating a set of monolingual, target-language sentences using a Machine Translation (MT) model. Generally, NMT models are used for back-translation. In this work, we analyze the performance of models when the training data is extended with synthetic data generated using different MT approaches. In particular, we investigate back-translated data generated not only by NMT but also by Statistical Machine Translation (SMT) models and combinations of both. The results reveal that the models achieve the best performance when the training set is augmented with back-translated data created by merging different MT approaches.
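
The data flow described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the toy "translators", and the example sentences are all hypothetical, and plain concatenation is only one assumed way of merging the back-translated sets. Each monolingual target-language sentence is paired with a synthetic source sentence from each MT system, and the synthetic pairs are appended to the authentic parallel data.

```python
from typing import Callable, List, Tuple

Pair = Tuple[str, str]  # (source sentence, target sentence)


def back_translate(mono_target: List[str],
                   translate_tgt_to_src: Callable[[str], str]) -> List[Pair]:
    """Pair each authentic target sentence with a synthetic source sentence."""
    return [(translate_tgt_to_src(t), t) for t in mono_target]


def build_training_set(authentic: List[Pair],
                       mono_target: List[str],
                       systems: List[Callable[[str], str]]) -> List[Pair]:
    """Authentic pairs followed by the back-translated pairs of every system.

    Assumption: the synthetic sets are merged by simple concatenation.
    """
    data = list(authentic)
    for translate in systems:
        data.extend(back_translate(mono_target, translate))
    return data


# Hypothetical stand-ins for trained target-to-source SMT and NMT systems.
def toy_smt(sentence: str) -> str:
    return "smt(" + sentence + ")"


def toy_nmt(sentence: str) -> str:
    return "nmt(" + sentence + ")"


authentic = [("ein Haus", "a house")]   # toy authentic parallel data
mono = ["a tree", "a river"]            # toy monolingual target-language data
hybrid = build_training_set(authentic, mono, [toy_smt, toy_nmt])
```

In this sketch, passing only `[toy_nmt]` or only `[toy_smt]` corresponds to augmenting with a single MT approach, while passing both corresponds to the combined setting; in practice the placeholders would be replaced by real target-to-source MT systems.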

Original language: English
Title of host publication: International Conference on Recent Advances in Natural Language Processing in a Deep Learning World, RANLP 2019 - Proceedings
Editors: Galia Angelova, Ruslan Mitkov, Ivelina Nikolova, Irina Temnikova
Publisher: Incoma Ltd., Shoumen, Bulgaria
Pages: 922-931
Number of pages: 10
ISBN (Electronic): 9789544520557
Publication status: Published - 2019
Externally published: Yes
Event: 12th International Conference on Recent Advances in Natural Language Processing, RANLP 2019 - Varna, Bulgaria
Duration: 2 Sep 2019 to 4 Sep 2019

Publication series

Name: International Conference Recent Advances in Natural Language Processing, RANLP
Volume: 2019-September
ISSN (Print): 1313-8502

Conference

Conference: 12th International Conference on Recent Advances in Natural Language Processing, RANLP 2019
Country: Bulgaria
City: Varna
Period: 2/09/19 to 4/09/19


  • Cite this

    Poncelas, A., Popović, M., Shterionov, D., De Buy Wenniger, G. M., & Way, A. (2019). Combining SMT and NMT back-translated data for efficient NMT. In G. Angelova, R. Mitkov, I. Nikolova, & I. Temnikova (Eds.), International Conference on Recent Advances in Natural Language Processing in a Deep Learning World, RANLP 2019 - Proceedings (pp. 922-931). (International Conference Recent Advances in Natural Language Processing, RANLP; Vol. 2019-September). Incoma Ltd., Shoumen, Bulgaria.