Machine translationese: Effects of algorithmic bias on linguistic complexity in machine translation

Research output: Chapter in Book/Report/Conference proceedingConference contributionScientificpeer-review

52 Citations (Scopus)

Abstract

Recent studies in the field of Machine Translation (MT) and Natural Language Processing (NLP) have shown that existing models amplify biases observed in the training data. The amplification of biases in language technology has mainly been examined with respect to specific phenomena, such as gender bias. In this work, we go beyond the study of gender in MT and investigate how bias amplification might affect language in a broader sense. We hypothesize that the 'algorithmic bias', i.e. an exacerbation of frequently observed patterns in combination with a loss of less frequent ones, not only exacerbates societal biases present in current datasets but could also lead to an artificially impoverished language: 'machine translationese'. We assess the linguistic richness (on a lexical and morphological level) of translations created by different data-driven MT paradigms - phrase-based statistical (PB-SMT) and neural MT (NMT). Our experiments show that there is a loss of lexical and morphological richness in the translations produced by all investigated MT paradigms for two language pairs (EN↔FR and EN↔ES).

Original languageEnglish
Title of host publicationEACL 2021 - 16th Conference of the European Chapter of the Association for Computational Linguistics, Proceedings of the Conference
PublisherAssociation for Computational Linguistics (ACL)
Pages2203-2213
Number of pages11
ISBN (Electronic)9781954085022
DOIs
Publication statusPublished - May 2021
Event16th Conference of the European Chapter of the Associationfor Computational Linguistics, EACL 2021 - Virtual, Online
Duration: 19 Apr 202123 Apr 2021

Publication series

NameEACL 2021 - 16th Conference of the European Chapter of the Association for Computational Linguistics, Proceedings of the Conference

Conference

Conference16th Conference of the European Chapter of the Associationfor Computational Linguistics, EACL 2021
CityVirtual, Online
Period19/04/2123/04/21

Fingerprint

Dive into the research topics of 'Machine translationese: Effects of algorithmic bias on linguistic complexity in machine translation'. Together they form a unique fingerprint.

Cite this