The Processing of Stress in End-to-End Automatic Speech Recognition Models

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution > Scientific > peer-review

Abstract


Listeners use stress to facilitate word recognition and speech segmentation. Classical ASR systems did not incorporate stress in their recognition process. In contrast, end-to-end ASR systems may use the information carried by stress. The present study shows that Wav2vec 2.0 is indeed sensitive to stress, and that this sensitivity is not a mere reflection of acoustic correlates of stress. Diagnostic classifiers trained on the CNN output reveal vowel-specific stress representations that perform on par with acoustic features. Stress classifiers trained on transformer layers outperform classifiers based on acoustic correlates, but degrade when context is removed, showing that higher layers take the relative nature of stress into account. Results obtained by testing a stress classifier on a vowel it was not trained on show that stress processing is to some extent abstract, i.e., the classifier does not simply detect a set of stressed vowel representations but rather their common denominator.
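The cross-vowel test described in the abstract can be illustrated with a toy sketch. Everything below is an assumption made for illustration and is not the authors' code or data: the "representations" are synthetic 4-dimensional Gaussian vectors rather than Wav2vec 2.0 activations, the probe is a simple nearest-class-mean linear classifier swapped in for whatever diagnostic classifier the paper used, and all names (`make_tokens`, `fit_probe`, `stress_dim`) are hypothetical. The sketch trains a stress probe on one vowel and tests it on another, once when both vowels share a stress direction and once when each vowel encodes stress in its own dimension.

```python
import random


def make_tokens(rng, offset, stress_dim, n=400, shift=4.0):
    """Simulate n vowel tokens (toy stand-ins for layer activations).

    Each token is unit Gaussian noise around a vowel-specific offset;
    stressed tokens are additionally shifted along `stress_dim`.
    """
    tokens = []
    for i in range(n):
        label = i % 2  # 1 = stressed, 0 = unstressed (balanced classes)
        vec = [o + rng.gauss(0.0, 1.0) for o in offset]
        if label:
            vec[stress_dim] += shift
        tokens.append((vec, label))
    return tokens


def class_mean(tokens, label):
    """Per-dimension mean of all vectors carrying the given label."""
    rows = [vec for vec, y in tokens if y == label]
    return [sum(col) / len(rows) for col in zip(*rows)]


def fit_probe(tokens):
    """Nearest-class-mean linear probe.

    The weight vector is the difference between the class means and the
    decision threshold sits at the midpoint between them.
    """
    mu1, mu0 = class_mean(tokens, 1), class_mean(tokens, 0)
    w = [a - b for a, b in zip(mu1, mu0)]
    mid = [(a + b) / 2 for a, b in zip(mu1, mu0)]
    bias = -sum(wi * mi for wi, mi in zip(w, mid))
    return w, bias


def accuracy(probe, tokens):
    """Fraction of tokens whose predicted stress label is correct."""
    w, bias = probe
    hits = 0
    for vec, label in tokens:
        score = sum(wi * xi for wi, xi in zip(w, vec)) + bias
        hits += int((score > 0) == bool(label))
    return hits / len(tokens)


rng = random.Random(0)  # fixed seed so the toy run is reproducible

# Vowel A and vowel B differ in their offsets (assumed values).
vowel_a_train = make_tokens(rng, [0, 1, 0, 0], stress_dim=0)
vowel_a_test = make_tokens(rng, [0, 1, 0, 0], stress_dim=0)
# Case 1: vowel B encodes stress along the same dimension as vowel A.
vowel_b_shared = make_tokens(rng, [0, 0, 1, 0], stress_dim=0)
# Case 2: vowel B encodes stress along its own, different dimension.
vowel_b_own = make_tokens(rng, [0, 0, 1, 0], stress_dim=3)

probe = fit_probe(vowel_a_train)
within = accuracy(probe, vowel_a_test)
cross_shared = accuracy(probe, vowel_b_shared)
cross_own = accuracy(probe, vowel_b_own)

print(f"within-vowel accuracy:          {within:.2f}")
print(f"cross-vowel, shared direction:  {cross_shared:.2f}")
print(f"cross-vowel, vowel-specific:    {cross_own:.2f}")
```

In this toy setting the probe transfers to the unseen vowel only when the two vowels share a stress direction; when stress is encoded separately per vowel, cross-vowel accuracy falls to about chance. This mirrors the logic of the abstract's argument, not its actual results.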
Original language: English
Title of host publication: Interspeech 2024
Pages: 2350-2354
Number of pages: 5
DOIs
Publication status: Published - 1 Sept 2024
Event: Interspeech 2024 - Kos, Greece
Duration: 1 Sept 2024 to 5 Sept 2024

Conference

Conference: Interspeech 2024
Country/Territory: Greece
City: Kos
Period: 1/09/24 to 5/09/24

Keywords

  • word stress
  • explainable AI
  • ASR
