Challenges with Sign Language Datasets

    Research output: Chapter in Book/Report/Conference proceeding › Chapter › Scientific › peer-review


    Abstract

    Sign languages are the primary means of communication for more than half a million people in Europe alone. However, the development of sign language recognition and translation tools is slowed by two obstacles: resource scarcity and, where data is available, a lack of standardisation in that data.

    The former challenge relates to the volume and quality of data available for machine learning as well as the time required to collect and process new data. The latter obstacle is linked to the variety of the data, i.e., annotation formats are not unified and vary amongst different resources. The available data formats are often not suitable for machine learning, obstructing the provision of automatic tools based on neural models.

    This chapter provides an overview of such challenges by comparing various sign language corpora and sign language machine learning datasets. Furthermore, it proposes a framework to address the lack of standardisation at format level, unify the available resources and facilitate sign language research for different languages. The framework takes ELAN files as inputs and returns textual and visual data ready to train sign language recognition and translation models. We present a proof of concept, training neural translation models on the data produced by the proposed framework.
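
    As a rough illustration of the first step such a pipeline needs, the sketch below reads time-aligned annotations from one tier of an ELAN `.eaf` file, which is XML, and returns them as plain tuples. It is not the chapter's framework: the function name `read_eaf_annotations` and the tier name used in the usage note are hypothetical, and only time-aligned annotations (`ALIGNABLE_ANNOTATION`) are handled.

```python
import xml.etree.ElementTree as ET


def read_eaf_annotations(eaf_source, tier_id):
    """Return sorted (start_ms, end_ms, text) tuples for one tier of an ELAN .eaf file.

    Hypothetical helper for illustration; `eaf_source` is a path or a file object.
    """
    root = ET.parse(eaf_source).getroot()

    # TIME_ORDER maps symbolic time-slot ids to millisecond offsets.
    # Slots without a TIME_VALUE (unaligned) are skipped.
    slots = {
        ts.get("TIME_SLOT_ID"): int(ts.get("TIME_VALUE"))
        for ts in root.iter("TIME_SLOT")
        if ts.get("TIME_VALUE") is not None
    }

    rows = []
    for tier in root.iter("TIER"):
        if tier.get("TIER_ID") != tier_id:
            continue
        # Only time-aligned annotations are read; reference annotations
        # (REF_ANNOTATION) are ignored in this sketch.
        for ann in tier.iter("ALIGNABLE_ANNOTATION"):
            start = slots.get(ann.get("TIME_SLOT_REF1"))
            end = slots.get(ann.get("TIME_SLOT_REF2"))
            text = (ann.findtext("ANNOTATION_VALUE") or "").strip()
            if start is not None and end is not None and text:
                rows.append((start, end, text))
    return sorted(rows)
```

    Calling `read_eaf_annotations("session.eaf", "Gloss")`, with a hypothetical file and tier name, would yield one tuple per annotation; the timestamps could then be used to cut the matching video segments.
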
    Original language: English
    Title of host publication: Sign Language Machine Translation
    Editors: Andy Way, Lorraine Leeson, Dimitar Shterionov
    Publisher: Springer Cham
    Chapter: 5
    Pages: 117-139
    Number of pages: 23
    ISBN (Electronic): 978-3-031-47362-3
    ISBN (Print): 978-3-031-47361-6
    Publication status: Published - 12 Nov 2024

    Publication series

    Name: Machine Translation: Technologies and Applications
    Volume: 5
    ISSN (Print): 2522-8021
    ISSN (Electronic): 2522-803X

    Keywords

    • sign languages
