Predicting end of utterance in multimodal and unimodal conditions

P. Barkhuysen, E.J. Krahmer, M.G.J. Swerts

    Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Scientific › peer-review

    Abstract

    In this paper, we describe a series of perception studies on uni- and multimodal cues to end of utterance. The stimuli were fragments taken from a recorded interview session, consisting of the parts in which speakers provided answers. The answers varied in length and were presented without the interviewer's preceding question. Subjects had to predict when the speaker would finish his turn, based on video material, auditory material, or both. The experiment consisted of three conditions: in one condition, the stimuli were presented as they were recorded (both audio and vision); in the two remaining conditions, the stimuli were presented in only the auditory or only the visual channel. Results show that the audiovisual condition evoked the fastest reaction times and the visual condition the slowest. Arguably, cues from different modalities function as complementary sources, so combining them might improve prediction. The studies aim to determine the relative weight of the different modalities used for end-of-utterance marking. We compare a multimodal condition, in which subjects have both auditory and visual cues at their disposal (stimuli presented as they were produced), with two unimodal conditions, in which subjects could only use auditory or visual cues.
    Original language: English
    Title of host publication: Proceedings of the 9th European Conference on Speech Communication and Technology (Interspeech 2005)
    Place of publication: Lisbon, Portugal
    Publisher: ISCA
    Publication status: Published - 2005

    Cite this

    Barkhuysen, P., Krahmer, E. J., & Swerts, M. G. J. (2005). Predicting end of utterance in multimodal and unimodal conditions. In Proceedings of the 9th European Conference on Speech Communication and Technology (Interspeech 2005). Lisbon, Portugal: ISCA.
    Barkhuysen, P. ; Krahmer, E.J. ; Swerts, M.G.J. / Predicting end of utterance in multimodal and unimodal conditions. Proceedings of the 9th European Conference on Speech Communication and Technology (Interspeech 2005). Lisbon, Portugal : ISCA, 2005.
    @inproceedings{905e2d981aa54395b952830c82fd2b2b,
    title = "Predicting end of utterance in multimodal and unimodal conditions",
    author = "P. Barkhuysen and E.J. Krahmer and M.G.J. Swerts",
    year = "2005",
    language = "English",
    booktitle = "Proceedings of the 9th European Conference on Speech Communication and Technology (Interspeech 2005)",
    publisher = "ISCA",
    address = "Lisbon, Portugal",
    }

    Barkhuysen, P, Krahmer, EJ & Swerts, MGJ 2005, Predicting end of utterance in multimodal and unimodal conditions. in Proceedings of the 9th European Conference on Speech Communication and Technology (Interspeech 2005). ISCA, Lisbon, Portugal.

    TY - GEN

    T1 - Predicting end of utterance in multimodal and unimodal conditions

    AU - Barkhuysen, P.

    AU - Krahmer, E.J.

    AU - Swerts, M.G.J.

    PY - 2005

    Y1 - 2005

    M3 - Conference contribution

    BT - Proceedings of the 9th European Conference on Speech Communication and Technology (Interspeech 2005)

    PB - ISCA

    CY - Lisbon, Portugal

    ER -

    Barkhuysen P, Krahmer EJ, Swerts MGJ. Predicting end of utterance in multimodal and unimodal conditions. In Proceedings of the 9th European Conference on Speech Communication and Technology (Interspeech 2005). Lisbon, Portugal: ISCA. 2005