Predicting end of utterance in multimodal and unimodal conditions

P. Barkhuysen, E.J. Krahmer, M.G.J. Swerts

    Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Scientific › peer-review

    Abstract

    In this paper, we describe a series of perception studies on uni- and multimodal cues to end of utterance. The aim was to determine the relative weight of the different modalities used for end-of-utterance marking. Stimuli were fragments taken from a recorded interview session, consisting of the parts in which speakers provided answers. The answers varied in length and were presented without the interviewer's preceding question. Subjects had to predict when the speaker would finish his turn, based on video and/or auditory material. The experiment consisted of three conditions: in a multimodal condition, the stimuli were presented as they were recorded (both audio and vision), so that subjects had auditory and visual cues at their disposal; in the two remaining, unimodal conditions, stimuli were presented in only the auditory or only the visual channel. Results show that the audiovisual condition evoked the fastest reaction times and the visual condition the slowest. Arguably, cues from the different modalities function as complementary sources and might thus improve prediction.
    Original language: English
    Title of host publication: Proceedings of the 9th European Conference on Speech Communication and Technology (Interspeech 2005)
    Place of publication: Lisbon, Portugal
    Publisher: ISCA
    Publication status: Published - 2005
