TY - GEN
T1 - Combining SMT and NMT back-translated data for efficient NMT
AU - Poncelas, Alberto
AU - Popović, Maja
AU - Shterionov, Dimitar
AU - Maillette de Buy Wenniger, Gideon
AU - Way, Andy
PY - 2019
Y1 - 2019
AB - Neural Machine Translation (NMT) models achieve their best performance when large sets of parallel data are used for training. Consequently, techniques for augmenting the training set have recently become popular. One of these methods is back-translation (Sennrich et al., 2016a), which consists of generating synthetic sentences by translating a set of monolingual, target-language sentences using a Machine Translation (MT) model. Generally, NMT models are used for back-translation. In this work, we analyze the performance of models when the training data is extended with synthetic data produced by different MT approaches. In particular, we investigate back-translated data generated not only by NMT but also by Statistical Machine Translation (SMT) models, as well as combinations of both. The results reveal that the models achieve the best performance when the training set is augmented with back-translated data created by merging different MT approaches.
UR - http://www.scopus.com/inward/record.url?scp=85076461454&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:85076461454
T3 - International Conference Recent Advances in Natural Language Processing, RANLP
SP - 922
EP - 931
BT - International Conference on Recent Advances in Natural Language Processing in a Deep Learning World, RANLP 2019 - Proceedings
A2 - Angelova, Galia
A2 - Mitkov, Ruslan
A2 - Nikolova, Ivelina
A2 - Temnikova, Irina
PB - Incoma Ltd., Shoumen, Bulgaria
T2 - 12th International Conference on Recent Advances in Natural Language Processing, RANLP 2019
Y2 - 2 September 2019 through 4 September 2019
ER -