Tailoring Domain Adaptation for Machine Translation Quality Estimation

    Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Scientific › peer-review

    Abstract

    While quality estimation (QE) can play an important role in the translation process, its effectiveness relies on the availability and quality of training data. For QE in particular, high-quality labeled data is often lacking due to the high cost and effort associated with labeling such data. Aside from the data scarcity challenge, QE models should also be generalizable, i.e., they should be able to handle data from different domains, both generic and specific. To alleviate these two main issues (data scarcity and domain mismatch), this paper combines domain adaptation and data augmentation within a robust QE system. Our method first trains a generic QE model and then fine-tunes it on a specific domain while retaining generic knowledge. Our results show a significant improvement for all the language pairs investigated, better cross-lingual inference, and superior performance in zero-shot learning scenarios compared to state-of-the-art baselines.
    Original language: English
    Title of host publication: Proceedings of the 24th Annual Conference of the European Association for Machine Translation
    Number of pages: 13
    Publication status: Accepted/In press - 18 Apr 2023
    Event: The 24th Annual Conference of the European Association for Machine Translation - Tampere, Finland
    Duration: 12 Jun 2023 → 15 Jun 2023

    Conference

    Conference: The 24th Annual Conference of the European Association for Machine Translation
    Abbreviated title: EAMT 2023
    Country/Territory: Finland
    City: Tampere
    Period: 12/06/23 → 15/06/23

    Keywords

    • cs.CL
    • Quality estimation
    • Translation
    • Data scarcity
