Abstract
This thesis investigates the extent to which accented speech can be classified according to its underlying genetic language class, using recordings of spoken nonsense words. Three models were compared: k-Nearest Neighbors (KNN), Convolutional Neural Networks (CNNs), and Long Short-Term Memory Recurrent Neural Networks (LSTM-RNNs). In addition, two audio feature representations (Mel-Frequency Cepstral Coefficients (MFCCs) and the first two vowel formants) and two taxonomic levels (family and genus) are explored. For the vowel formant representation (F1/F2), two ways of combining models are explored: early fusion, which combines all training data into one model, and late fusion, which aggregates the prediction results per word. While it remains inconclusive whether language families can be used as a classification basis, the CNN using MFCCs was found to be most effective in genus-based accent classification, achieving an accuracy of 60.49% and outperforming the baseline dummy classifier. These findings can thus serve as a baseline for language accent group classification in future work.
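The difference between the two fusion strategies can be illustrated with a minimal sketch. The abstract does not state how late fusion aggregates the per-word predictions, so majority voting is assumed here purely for illustration; the function names, word keys, and genus labels are likewise hypothetical and not taken from the thesis.

```python
from collections import Counter


def early_fusion_pool(features_by_word):
    """Early fusion (sketch): pool the (F1, F2) samples of every word
    into one training set, so that a single model can be fitted on it."""
    return [sample for samples in features_by_word.values() for sample in samples]


def late_fusion_vote(per_word_predictions):
    """Late fusion (sketch): one prediction per word is assumed to come
    from a separate per-word model; the final label is the majority vote.
    Ties resolve to the label that was encountered first."""
    label, _count = Counter(per_word_predictions).most_common(1)[0]
    return label


# Hypothetical formant samples (Hz) for two nonsense words.
features = {"word_a": [(310, 2200)], "word_b": [(650, 1100), (700, 1200)]}
pooled = early_fusion_pool(features)

# Hypothetical per-word genus predictions for one speaker.
final_label = late_fusion_vote(["Germanic", "Romance", "Germanic"])
```

Under this assumption, early fusion yields one pooled training set, while late fusion keeps the per-word models separate and only merges their outputs.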
| Original language | English |
|---|---|
| Number of pages | 3 |
| Publication status | Published - 9 Nov 2023 |
| Event | 35th Benelux Conference on Artificial Intelligence and the 32nd Belgian Dutch Conference on Machine Learning - TU Delft, Delft, Netherlands. Duration: 8 Nov 2023 → 10 Nov 2023. https://bnaic2023.tudelft.nl/ |
Conference
| Conference | 35th Benelux Conference on Artificial Intelligence and the 32nd Belgian Dutch Conference on Machine Learning |
|---|---|
| Abbreviated title | BNAIC/BeNeLearn 2023 |
| Country/Territory | Netherlands |
| City | Delft |
| Period | 8/11/23 → 10/11/23 |
| Internet address | https://bnaic2023.tudelft.nl/ |
UN SDGs
This output contributes to the following UN Sustainable Development Goals (SDGs)
- SDG 10 Reduced Inequalities
Keywords
- accent classification
- audio/sound classification
- mel-frequency cepstral coefficients
- vowel formants
Fingerprint
Dive into the research topics of 'Genetic Classification of Accented Speech from Audio Recordings of Spoken Nonsense Words'. Together they form a unique fingerprint.