Abstract
Sign languages are the primary means of communication for more than half a million people in Europe alone. However, the development of sign language recognition and translation tools is slowed down by two main obstacles: resource scarcity and, where data is available, a lack of standardisation in that data.
The former challenge relates to the volume and quality of data available for machine learning as well as the time required to collect and process new data. The latter obstacle is linked to the variety of the data, i.e., annotation formats are not unified and vary amongst different resources. The available data formats are often not suitable for machine learning, obstructing the provision of automatic tools based on neural models.
This chapter provides an overview of these challenges by comparing various sign language corpora and sign language machine learning datasets. Furthermore, it proposes a framework to address the lack of standardisation at the format level, unify the available resources and facilitate sign language research for different languages. The framework takes ELAN files as input and returns textual and visual data ready to train sign language recognition and translation models. As a proof of concept, we train neural translation models on the data produced by the proposed framework.
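The chapter's framework itself is not reproduced here. As a minimal sketch of the kind of conversion it performs, the code below reads an ELAN `.eaf` file (an XML format in which `TIME_SLOT` elements hold millisecond timestamps and `TIER` elements hold `ALIGNABLE_ANNOTATION`s pointing at those slots) and pairs each sentence-level translation with the glosses it spans and the corresponding video frame range. The tier names `Gloss` and `Translation`, the 25 fps frame rate and the helpers `read_tier`, `ms_to_frames` and `build_examples` are hypothetical assumptions for illustration; real corpora name their tiers differently, and the visual side is represented only as frame indices rather than extracted video.

```python
import math
import xml.etree.ElementTree as ET
from dataclasses import dataclass


@dataclass
class Segment:
    tier: str
    start_ms: int
    end_ms: int
    text: str


def read_tier(root: ET.Element, tier_id: str) -> list[Segment]:
    """Return the time-aligned annotations of one ELAN tier, sorted by start time."""
    # TIME_SLOT_ID -> time in milliseconds; slots without TIME_VALUE are unaligned
    slots = {
        ts.get("TIME_SLOT_ID"): int(ts.get("TIME_VALUE"))
        for ts in root.iter("TIME_SLOT")
        if ts.get("TIME_VALUE") is not None
    }
    segments = []
    for tier in root.iter("TIER"):
        if tier.get("TIER_ID") != tier_id:
            continue
        for ann in tier.iter("ALIGNABLE_ANNOTATION"):
            ref1 = ann.get("TIME_SLOT_REF1")
            ref2 = ann.get("TIME_SLOT_REF2")
            if ref1 not in slots or ref2 not in slots:
                continue  # skip annotations whose time slots are unaligned
            text = (ann.findtext("ANNOTATION_VALUE") or "").strip()
            segments.append(Segment(tier_id, slots[ref1], slots[ref2], text))
    return sorted(segments, key=lambda s: s.start_ms)


def ms_to_frames(start_ms: int, end_ms: int, fps: float) -> tuple[int, int]:
    """Map a millisecond interval to a [start, end) video frame range."""
    return math.floor(start_ms * fps / 1000), math.ceil(end_ms * fps / 1000)


def build_examples(eaf_xml: str, gloss_tier: str, text_tier: str, fps: float = 25.0):
    """Pair each translation segment with the glosses and frames it spans (hypothetical)."""
    root = ET.fromstring(eaf_xml)
    glosses = read_tier(root, gloss_tier)
    examples = []
    for sent in read_tier(root, text_tier):
        # keep glosses that lie entirely inside the translation segment
        inside = [g.text for g in glosses
                  if g.start_ms >= sent.start_ms and g.end_ms <= sent.end_ms]
        examples.append({
            "glosses": " ".join(inside),
            "text": sent.text,
            "frames": ms_to_frames(sent.start_ms, sent.end_ms, fps),
        })
    return examples


# A tiny hand-written EAF fragment (hypothetical data, not from any corpus)
EAF = """<ANNOTATION_DOCUMENT>
  <TIME_ORDER>
    <TIME_SLOT TIME_SLOT_ID="ts1" TIME_VALUE="0"/>
    <TIME_SLOT TIME_SLOT_ID="ts2" TIME_VALUE="600"/>
    <TIME_SLOT TIME_SLOT_ID="ts3" TIME_VALUE="1200"/>
  </TIME_ORDER>
  <TIER TIER_ID="Gloss">
    <ANNOTATION><ALIGNABLE_ANNOTATION ANNOTATION_ID="a1" TIME_SLOT_REF1="ts1" TIME_SLOT_REF2="ts2">
      <ANNOTATION_VALUE>MY</ANNOTATION_VALUE></ALIGNABLE_ANNOTATION></ANNOTATION>
    <ANNOTATION><ALIGNABLE_ANNOTATION ANNOTATION_ID="a2" TIME_SLOT_REF1="ts2" TIME_SLOT_REF2="ts3">
      <ANNOTATION_VALUE>HOUSE</ANNOTATION_VALUE></ALIGNABLE_ANNOTATION></ANNOTATION>
  </TIER>
  <TIER TIER_ID="Translation">
    <ANNOTATION><ALIGNABLE_ANNOTATION ANNOTATION_ID="a3" TIME_SLOT_REF1="ts1" TIME_SLOT_REF2="ts3">
      <ANNOTATION_VALUE>This is my house.</ANNOTATION_VALUE></ALIGNABLE_ANNOTATION></ANNOTATION>
  </TIER>
</ANNOTATION_DOCUMENT>"""

print(build_examples(EAF, "Gloss", "Translation"))
# → [{'glosses': 'MY HOUSE', 'text': 'This is my house.', 'frames': (0, 30)}]
```

Each resulting record gives a gloss sequence and a target sentence for text-based translation, plus a frame range that a separate video-processing step could use to cut the matching clip. Only time-aligned annotations are handled; symbolic `REF_ANNOTATION`s, which some corpora use for dependent tiers, would need additional resolution.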
| Original language | English |
|---|---|
| Title of host publication | Sign Language Machine Translation |
| Editors | Andy Way, Lorraine Leeson, Dimitar Shterionov |
| Publisher | Springer Cham |
| Chapter | 5 |
| Pages | 117-139 |
| Number of pages | 23 |
| ISBN (Electronic) | 978-3-031-47362-3 |
| ISBN (Print) | 978-3-031-47361-6 |
| DOIs | |
| Publication status | Published - 12 Nov 2024 |
Publication series
| Name | Machine Translation: Technologies and Applications |
|---|---|
| Volume | 5 |
| ISSN (Print) | 2522-8021 |
| ISSN (Electronic) | 2522-803X |
Keywords
- sign languages