A perceptual study of how rapidly and accurately audiovisual cues to utterance-final boundaries can be interpreted in Chinese and English

Ran Bi, Marc Swerts

Research output: Contribution to journal › Article › Scientific › peer-review

Abstract

Speakers and their addressees make use of both auditory and visual features as cues to the end of a speaking turn. Prior work, mostly based on analyses of languages like Dutch and English, has shown that intonational markers such as melodic boundary tones as well as variation in eye gaze behaviour are often exploited to pre-signal the terminal edge of an utterance. However, we still lack knowledge on how such auditory and visual cues relate to each other, and whether the results for Dutch and English also generalize to other languages. This article compares possible audiovisual cues to prosodic boundaries in two typologically different languages, i.e., English and Chinese. A specific paradigm was used to elicit natural stimuli from 16 speakers, evenly distributed over both languages, which were then presented to L1 and L2 observers. They were asked to judge whether a spoken fragment had occurred in utterance-final position or not, measuring both the participants’ reaction time and accuracy. Participants were exposed to stimuli in three different formats: audio-only, vision-only or audiovisual. Our most important results are that (1) visual cues were important for boundary perception in both languages; (2) judges from either language group identified boundaries faster and more accurately in English than in Chinese; (3) there is no in-group advantage as observers were equally good in judging finality in their L1 and L2; (4) there are consistent correlations between the measures of reaction time and accuracy (shorter responses correlate with higher accuracy).
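The abstract's fourth finding, that shorter reaction times go with higher accuracy, is a standard Pearson correlation over per-participant means. As a hedged illustration (the data below are invented, not the authors'), a minimal sketch of that analysis:

```python
# Hypothetical illustration only: a Pearson correlation between
# per-participant mean reaction time and accuracy, mirroring the
# reported pattern that shorter responses go with higher accuracy.
from statistics import mean

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

# Invented per-participant values: mean RT in ms, proportion correct.
rts = [450, 520, 610, 700, 820]
accs = [0.95, 0.90, 0.84, 0.78, 0.70]

r = pearson_r(rts, accs)
# A strongly negative r reproduces the reported direction of the effect.
```

The sketch assumes per-participant aggregation; the paper itself may correlate at the trial or condition level.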
Original language: English
Pages (from-to): 68-77
Journal: Speech Communication
Volume: 95
DOIs: 10.1016/j.specom.2017.07.002
Publication status: Published - Dec 2017

Keywords

  • Chinese and English
  • L1 and L2 observers
  • Perceptual study
  • Utterance-final boundaries
  • Audiovisual cues

Cite this

@article{30bb81e112d544d6b246bc9d98bcad9b,
title = "A perceptual study of how rapidly and accurately audiovisual cues to utterance-final boundaries can be interpreted in Chinese and English",
abstract = "Speakers and their addressees make use of both auditory and visual features as cues to the end of a speaking turn. Prior work, mostly based on analyses of languages like Dutch and English, has shown that intonational markers such as melodic boundary tones as well as variation in eye gaze behaviour are often exploited to pre-signal the terminal edge of an utterance. However, we still lack knowledge on how such auditory and visual cues relate to each other, and whether the results for Dutch and English also generalize to other languages. This article compares possible audiovisual cues to prosodic boundaries in two typologically different languages, i.e., English and Chinese. A specific paradigm was used to elicit natural stimuli from 16 speakers, evenly distributed over both languages, which were then presented to L1 and L2 observers. They were asked to judge whether a spoken fragment had occurred in utterance-final position or not, measuring both the participants’ reaction time and accuracy. Participants were exposed to stimuli in three different formats: audio-only, vision-only or audiovisual. Our most important results are that (1) visual cues were important for boundary perception in both languages; (2) judges from either language group identified boundaries faster and more accurately in English than in Chinese; (3) there is no in-group advantage as observers were equally good in judging finality in their L1 and L2; (4) there are consistent correlations between the measures of reaction time and accuracy (shorter responses correlate with higher accuracy).",
keywords = "Chinese and English, L1 and L2 observers, Perceptual study, Utterance-final boundaries, Audiovisual cues",
author = "Ran Bi and Marc Swerts",
year = "2017",
month = "12",
doi = "10.1016/j.specom.2017.07.002",
language = "English",
volume = "95",
pages = "68--77",
journal = "Speech Communication",
issn = "0167-6393",
publisher = "Elsevier Science BV",
}

A perceptual study of how rapidly and accurately audiovisual cues to utterance-final boundaries can be interpreted in Chinese and English. / Bi, Ran; Swerts, Marc.

In: Speech Communication, Vol. 95, 12.2017, p. 68-77.
TY - JOUR

T1 - A perceptual study of how rapidly and accurately audiovisual cues to utterance-final boundaries can be interpreted in Chinese and English

AU - Bi, Ran

AU - Swerts, Marc

PY - 2017/12

Y1 - 2017/12

N2 - Speakers and their addressees make use of both auditory and visual features as cues to the end of a speaking turn. Prior work, mostly based on analyses of languages like Dutch and English, has shown that intonational markers such as melodic boundary tones as well as variation in eye gaze behaviour are often exploited to pre-signal the terminal edge of an utterance. However, we still lack knowledge on how such auditory and visual cues relate to each other, and whether the results for Dutch and English also generalize to other languages. This article compares possible audiovisual cues to prosodic boundaries in two typologically different languages, i.e., English and Chinese. A specific paradigm was used to elicit natural stimuli from 16 speakers, evenly distributed over both languages, which were then presented to L1 and L2 observers. They were asked to judge whether a spoken fragment had occurred in utterance-final position or not, measuring both the participants’ reaction time and accuracy. Participants were exposed to stimuli in three different formats: audio-only, vision-only or audiovisual. Our most important results are that (1) visual cues were important for boundary perception in both languages; (2) judges from either language group identified boundaries faster and more accurately in English than in Chinese; (3) there is no in-group advantage as observers were equally good in judging finality in their L1 and L2; (4) there are consistent correlations between the measures of reaction time and accuracy (shorter responses correlate with higher accuracy).

AB - Speakers and their addressees make use of both auditory and visual features as cues to the end of a speaking turn. Prior work, mostly based on analyses of languages like Dutch and English, has shown that intonational markers such as melodic boundary tones as well as variation in eye gaze behaviour are often exploited to pre-signal the terminal edge of an utterance. However, we still lack knowledge on how such auditory and visual cues relate to each other, and whether the results for Dutch and English also generalize to other languages. This article compares possible audiovisual cues to prosodic boundaries in two typologically different languages, i.e., English and Chinese. A specific paradigm was used to elicit natural stimuli from 16 speakers, evenly distributed over both languages, which were then presented to L1 and L2 observers. They were asked to judge whether a spoken fragment had occurred in utterance-final position or not, measuring both the participants’ reaction time and accuracy. Participants were exposed to stimuli in three different formats: audio-only, vision-only or audiovisual. Our most important results are that (1) visual cues were important for boundary perception in both languages; (2) judges from either language group identified boundaries faster and more accurately in English than in Chinese; (3) there is no in-group advantage as observers were equally good in judging finality in their L1 and L2; (4) there are consistent correlations between the measures of reaction time and accuracy (shorter responses correlate with higher accuracy).

KW - Chinese and English

KW - L1 and L2 observers

KW - Perceptual study

KW - Utterance-final boundaries

KW - Audiovisual cues

U2 - 10.1016/j.specom.2017.07.002

DO - 10.1016/j.specom.2017.07.002

M3 - Article

VL - 95

SP - 68

EP - 77

JO - Speech Communication

JF - Speech Communication

SN - 0167-6393

ER -