Zero-shot translation for Indian languages with sparse data

Giulia Mattoni, Pat Nagle, Carlos Collantes, Dimitar Shterionov

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Scientific › peer-review


Neural Machine Translation (NMT) is a recently emerged paradigm for Machine Translation (MT) that has shown promising results as well as great potential to solve challenging MT tasks. One such task is providing good MT for languages with sparse training data. In this paper we investigate a Zero-Shot Translation (ZST) approach for such language combinations. ZST is a multilingual translation mechanism that uses a single NMT engine to translate between multiple languages, including language pairs for which no direct parallel data was provided during training. After assessing the feasibility of ZST by training a proof-of-concept ZST engine on French↔English and Italian↔English data, we focus on languages with sparse training data. In particular, we address the Tamil↔Hindi language pair. Our analysis shows the potential and effectiveness of ZST in such scenarios. To train and translate with ZST engines, we extend the training and translation pipelines of a commercial MT provider, KantanMT, with ZST capabilities, making this technology available to all users of the platform.
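The abstract does not say how the single engine is told which language to produce. One widely used way to build a multilingual engine of this kind is to prepend an artificial target-language token to every source sentence. The sketch below illustrates that idea only as an assumption; it is not taken from the paper or from the KantanMT pipeline. The token format `<2xx>`, the function names, and the toy sentences are all hypothetical.

```python
from typing import List, Set, Tuple

# (source language, target language, source text, target text)
Example = Tuple[str, str, str, str]


def tag_source(text: str, tgt_lang: str) -> str:
    """Prepend a hypothetical target-language token such as '<2it>'."""
    return f"<2{tgt_lang}> {text}"


def build_training_pairs(corpus: List[Example]) -> List[Tuple[str, str]]:
    """Turn a multilingual corpus into (tagged source, target) pairs
    that a single engine could be trained on."""
    return [(tag_source(src, tgt_lang), tgt) for _, tgt_lang, src, tgt in corpus]


# Toy corpus mirroring the proof-of-concept setup: only French<->English
# and Italian<->English data, with no direct French<->Italian examples.
corpus: List[Example] = [
    ("fr", "en", "Bonjour le monde", "Hello world"),
    ("en", "fr", "Hello world", "Bonjour le monde"),
    ("it", "en", "Ciao mondo", "Hello world"),
    ("en", "it", "Hello world", "Ciao mondo"),
]

pairs = build_training_pairs(corpus)

# Translation directions that actually occur in the training data.
seen_directions: Set[Tuple[str, str]] = {(s, t) for s, t, _, _ in corpus}

# A zero-shot request asks for a direction never seen in training
# (French -> Italian) simply by choosing a different token.
zero_shot_request = tag_source("Bonjour le monde", "it")
```

Under this assumption, the Tamil↔Hindi case would follow the same pattern: the engine is trained on pairs that involve each language alongside some other language, and the unseen direction is requested at translation time by the token alone.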
Original language: English
Title of host publication: Proceedings of the MT Summit
Publication status: Published - 18 Sept 2017
Externally published: Yes

