In this paper, we describe a series of perception studies on uni- and multimodal cues to end of utterance. Stimuli were fragments taken from a recorded interview session, consisting of the parts in which speakers provided answers. The answers varied in length and were presented without the interviewer's preceding question. Subjects had to predict when the speaker would finish the turn, based on video material, auditory material, or both. The experiment consisted of three conditions: in one condition, the stimuli were presented as they were recorded (both audio and video); in the two remaining conditions, stimuli were presented in only the auditory or only the visual channel. Results show that the audiovisual condition evoked the fastest reaction times and the visual condition the slowest. Arguably, cues from different modalities function as complementary sources and might thus improve prediction. The studies aim to determine the relative weight of the different modalities used for end-of-utterance marking. We compare a multimodal condition, in which subjects have both auditory and visual cues at their disposal (stimuli presented as they were produced), with two unimodal conditions in which subjects could use only auditory or only visual cues.
|Title of host publication||Proceedings of the 9th European Conference on Speech Communication and Technology (Interspeech 2005)|
|Place of Publication||Lisbon, Portugal|
|Publication status||Published - 2005|