Abstract
With increased interest in the use of virtual avatars for educational purposes, there is a growing need for high-quality text-to-speech (TTS) solutions. However, the effects of using synthesized speech in educational applications for younger listeners are still unclear, as past findings have been inconsistent and most were obtained in a lab setting with adult assessors. In addition, it is unclear how much training material is needed for high-quality speech synthesis. Particularly for low-resource languages, the assumption that good-quality synthesized speech requires substantial amounts of vocal recordings for training may be hindering the development of TTS-based solutions. In this study, we created four Dutch TTS models from different amounts of training material and evaluated the models in terms of voice perception and recall with K12 students in a classroom environment. Results showed that while the original human voice outperformed the synthesized voices in terms of listening experience and knowledge test score, more hours of training material did not necessarily result in better outcomes, suggesting that 10-15 hours of speech material might be sufficient for training a Dutch TTS model. A weak positive correlation was found between listening experience and knowledge test performance, with low listening effort being the most important factor. This outcome suggests that comprehensibility is likely the most important TTS feature for educational applications.
Original language | English |
---|---|
Title of host publication | ICEEL '22: Proceedings of the 2022 6th International Conference on Education and E-Learning |
Publisher | Association for Computing Machinery |
Pages | 182-188 |
Number of pages | 7 |
ISBN (Print) | 978-1-4503-9842-8 |
Publication status | Published - 21 Nov 2022 |
Event | ICEEL 2022: 2022 6th International Conference on Education and E-Learning - Yamanashi, Japan. Duration: 21 Nov 2022 → 23 Nov 2022 |
Conference
Conference | ICEEL 2022: 2022 6th International Conference on Education and E-Learning |
---|---|
Country/Territory | Japan |
City | Yamanashi |
Period | 21/11/22 → 23/11/22 |
Keywords
- Text-to-speech
- K12 education