
Genetic Classification of Accented Speech from Audio Recordings of Spoken Nonsense Words

    Research output: Contribution to conference > Abstract > Scientific > peer-review

    Abstract

    This thesis investigates the extent to which accented speech can be classified by its underlying genetic language class from recordings of spoken nonsense words. Three models were compared: k-Nearest Neighbors (KNN), Convolutional Neural Networks (CNNs), and Long Short-Term Memory Recurrent Neural Networks (LSTM-RNNs). In addition, two audio feature representations (Mel-Frequency Cepstral Coefficients (MFCCs) and the first two vowel formants, F1/F2) and two taxonomic levels (family and genus) are explored. For the vowel formant representation, two ways of combining models are explored: early fusion, which combines all training data into one model, and late fusion, which aggregates the prediction results per word. While it remains inconclusive whether language families can be used as a classification basis, the CNN using MFCCs was the most effective for genus-based accent classification, achieving an accuracy of 60.49% and outperforming the baseline dummy classifier. These findings can thus serve as a baseline for language accent group classification in future work.
    Original language: English
    Number of pages: 3
    Publication status: Published - 9 Nov 2023
    Event: 35th Benelux Conference on Artificial Intelligence and the 32nd Belgian Dutch Conference on Machine Learning
    - TU Delft, Delft, Netherlands
    Duration: 8 Nov 2023 to 10 Nov 2023
    https://bnaic2023.tudelft.nl/

    Conference

    Conference: 35th Benelux Conference on Artificial Intelligence and the 32nd Belgian Dutch Conference on Machine Learning
    Abbreviated title: BNAIC/BeNeLearn 2023
    Country/Territory: Netherlands
    City: Delft
    Period: 8/11/23 to 10/11/23

    UN SDGs

    This output contributes to the following UN Sustainable Development Goals (SDGs)

    1. SDG 10 - Reduced Inequalities

    Keywords

    • accent classification
    • audio/sound classification
    • mel-frequency cepstral coefficients
    • vowel formants
