Children probably store short rather than frequent or predictable chunks

quantitative evidence from a corpus study

Robert Grimm*, Giovanni Cassani, Steven Gillis, Walter Daelemans

*Corresponding author for this work

Research output: Contribution to journal, Article, Scientific, peer-review

Abstract

One of the tasks faced by young children is the segmentation of a continuous stream of speech into discrete linguistic units. Early in development, syllables emerge as perceptual primitives, and the wholesale storage of syllable chunks is one possible strategy for bootstrapping the segmentation process. Here, we investigate what types of chunks children store. Our method involves selecting syllabified utterances from corpora of child-directed speech, which we vary according to (a) their length in syllables, (b) the mutual predictability of their syllables, and (c) their frequency. We then use the number of utterances within which words are contained to predict the time course of word learning, arguing that utterances which perform well at this task are also more likely to be stored, by young children, as undersegmented chunks. Our results show that short utterances are best suited for predicting when children acquire the words contained within them, although the effect is rather small. Beyond this, we also find that short utterances are the most likely to correspond to words. Together, the two findings suggest that children may not store many complete utterances as undersegmented chunks, with most of the units that children store as hypothesized words corresponding to actual words. However, dovetailing with an item-based account of language acquisition, when children do store undersegmented chunks, these are likely to be short sequences, not frequent or internally predictable multi-word chunks. We end by discussing implications for work on formulaic multi-word sequences.
Original language: English
Article number: 80
Pages (from-to): 1-19
Number of pages: 19
Journal: Frontiers in Psychology
Volume: 10
DOI: 10.3389/FPSYG.2019.00080
Publication status: Published - 30 Jan 2019
Externally published: Yes

Keywords

  • ACQUISITION
  • DIRECTED SPEECH
  • DURATIONAL CUES
  • INDIVIDUAL-DIFFERENCES
  • INFANTS
  • LANGUAGE-DEVELOPMENT
  • TRANSITION
  • UNITS
  • VOCABULARY
  • WORD
  • age of first production
  • chunks
  • formulaic language
  • multi-word units
  • segmentation
  • undersegmentation

Cite this

@article{b33bd98f07ab435db2636032bbf115cc,
title = "Children probably store short rather than frequent or predictable chunks: quantitative evidence from a corpus study",
abstract = "One of the tasks faced by young children is the segmentation of a continuous stream of speech into discrete linguistic units. Early in development, syllables emerge as perceptual primitives, and the wholesale storage of syllable chunks is one possible strategy for bootstrapping the segmentation process. Here, we investigate what types of chunks children store. Our method involves selecting syllabified utterances from corpora of child-directed speech, which we vary according to (a) their length in syllables, (b) the mutual predictability of their syllables, and (c) their frequency. We then use the number of utterances within which words are contained to predict the time course of word learning, arguing that utterances which perform well at this task are also more likely to be stored, by young children, as undersegmented chunks. Our results show that short utterances are best suited for predicting when children acquire the words contained within them, although the effect is rather small. Beyond this, we also find that short utterances are the most likely to correspond to words. Together, the two findings suggest that children may not store many complete utterances as undersegmented chunks, with most of the units that children store as hypothesized words corresponding to actual words. However, dovetailing with an item-based account of language acquisition, when children do store undersegmented chunks, these are likely to be short sequences, not frequent or internally predictable multi-word chunks. We end by discussing implications for work on formulaic multi-word sequences.",
keywords = "ACQUISITION, DIRECTED SPEECH, DURATIONAL CUES, INDIVIDUAL-DIFFERENCES, INFANTS, LANGUAGE-DEVELOPMENT, TRANSITION, UNITS, VOCABULARY, WORD, age of first production, chunks, formulaic language, multi-word units, segmentation, undersegmentation",
author = "Robert Grimm and Giovanni Cassani and Steven Gillis and Walter Daelemans",
year = "2019",
month = "1",
day = "30",
doi = "10.3389/FPSYG.2019.00080",
language = "English",
volume = "10",
pages = "1--19",
journal = "Frontiers in Psychology",
issn = "1664-1078",
publisher = "Frontiers Media S.A.",
}

Children probably store short rather than frequent or predictable chunks: quantitative evidence from a corpus study. / Grimm, Robert; Cassani, Giovanni; Gillis, Steven; Daelemans, Walter.

In: Frontiers in Psychology, Vol. 10, 80, 30.01.2019, p. 1-19.

TY - JOUR

T1 - Children probably store short rather than frequent or predictable chunks

T2 - quantitative evidence from a corpus study

AU - Grimm, Robert

AU - Cassani, Giovanni

AU - Gillis, Steven

AU - Daelemans, Walter

PY - 2019/1/30

Y1 - 2019/1/30

AB - One of the tasks faced by young children is the segmentation of a continuous stream of speech into discrete linguistic units. Early in development, syllables emerge as perceptual primitives, and the wholesale storage of syllable chunks is one possible strategy for bootstrapping the segmentation process. Here, we investigate what types of chunks children store. Our method involves selecting syllabified utterances from corpora of child-directed speech, which we vary according to (a) their length in syllables, (b) the mutual predictability of their syllables, and (c) their frequency. We then use the number of utterances within which words are contained to predict the time course of word learning, arguing that utterances which perform well at this task are also more likely to be stored, by young children, as undersegmented chunks. Our results show that short utterances are best suited for predicting when children acquire the words contained within them, although the effect is rather small. Beyond this, we also find that short utterances are the most likely to correspond to words. Together, the two findings suggest that children may not store many complete utterances as undersegmented chunks, with most of the units that children store as hypothesized words corresponding to actual words. However, dovetailing with an item-based account of language acquisition, when children do store undersegmented chunks, these are likely to be short sequences, not frequent or internally predictable multi-word chunks. We end by discussing implications for work on formulaic multi-word sequences.

KW - ACQUISITION

KW - DIRECTED SPEECH

KW - DURATIONAL CUES

KW - INDIVIDUAL-DIFFERENCES

KW - INFANTS

KW - LANGUAGE-DEVELOPMENT

KW - TRANSITION

KW - UNITS

KW - VOCABULARY

KW - WORD

KW - age of first production

KW - chunks

KW - formulaic language

KW - multi-word units

KW - segmentation

KW - undersegmentation

U2 - 10.3389/FPSYG.2019.00080

DO - 10.3389/FPSYG.2019.00080

M3 - Article

VL - 10

SP - 1

EP - 19

JO - Frontiers in Psychology

JF - Frontiers in Psychology

SN - 1664-1078

M1 - 80

ER -