Improving KantanMT training efficiency with fast_align

Dimitar Shterionov, Jinhua Du, Marc Anthony Palminteri, Laura Casanellas, Tony O'Dowd, Andy Way

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Scientific › peer-review

2 Citations (Scopus)

Abstract

In recent years, statistical machine translation (SMT) has been widely deployed in translators' workflows, with significant improvements in productivity. However, before an SMT system can be invoked to translate an unknown text, an SMT engine needs to be built. The building speed of the engine is therefore essential to the translation workflow: the sooner an engine is built, the sooner it can be put to use. With the increase in the computational capabilities of recent technology, the building time for an SMT engine has decreased substantially. For example, cloud-based SMT providers, such as KantanMT, can build high-quality, ready-to-use, custom SMT engines in less than a couple of days. To speed up this process further, we look into optimizing the word alignment step that takes place while the SMT engine is being built. Namely, we substitute the word alignment tool used by the KantanMT pipeline, Giza++, with a more efficient one, fast_align. In this work we present the design and implementation of the KantanMT pipeline that uses fast_align in place of Giza++. We also compare the two word alignment tools on industry data and report our findings. To our knowledge, such an extensive empirical evaluation of the two tools has not been done before.

Original language: English
Title of host publication: MT Users' Track
Editors: Olga Beregovaya, Jennifer Doyon, Lucie Langlois, Steve Richardson
Publisher: Association for Machine Translation in the Americas
Pages: 222-231
Number of pages: 10
ISBN (Electronic): 9780000000002
Publication status: Published - 2016
Externally published: Yes
Event: 12th Conference of the Association for Machine Translation in the Americas, AMTA 2016 - Austin, United States
Duration: 28 Oct 2016 - 1 Nov 2016

Publication series

Name: Proceedings - AMTA 2016: 12th Conference of the Association for Machine Translation in the Americas
Volume: 2

Conference

Conference: 12th Conference of the Association for Machine Translation in the Americas, AMTA 2016
Country/Territory: United States
City: Austin
Period: 28/10/16 - 1/11/16
