Sidestepping the Combinatorial Explosion: An Explanation of n-gram Frequency Effects Based on Naive Discriminative Learning

R. Harald Baayen, Peter Hendrix, Michael Ramscar

Research output: Contribution to journal › Article › Scientific › peer-review

43 Citations (Scopus)

Abstract

Arnon and Snider (2010; More than words: Frequency effects for multi-word phrases. Journal of Memory and Language, 62, 67-82) documented frequency effects for compositional four-grams independently of the frequencies of lower-order n-grams. They argue that comprehenders apparently store frequency information about multi-word units. We show that n-gram frequency effects can emerge in a parameter-free computational model driven by naive discriminative learning, trained on a sample of 300,000 four-word phrases from the British National Corpus. The discriminative learning model is a full decomposition model, associating orthographic input features straightforwardly with meanings. The model does not make use of separate representations for derived or inflected words, nor for compounds, nor for phrases. Nevertheless, frequency effects are correctly predicted for all these linguistic units. Naive discriminative learning provides the simplest and most economical explanation for frequency effects in language processing, obviating the need to posit counters in the head for, and the existence of, hundreds of millions of n-gram representations.
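
As a rough illustration of the idea in the abstract, the sketch below applies the standard Rescorla-Wagner update (named in the keywords) to orthographic cues and word meanings. It is not the authors' implementation: the abstract describes their model as parameter-free, whereas this iterative toy version assumes a learning rate. The cue scheme (letter bigrams), the two phrases, their counts, and the names `letter_bigrams`, `rescorla_wagner`, and `activation` are all hypothetical choices made for the example.

```python
from collections import defaultdict


def letter_bigrams(phrase):
    """Hypothetical cue extractor: letter bigrams, with '#' marking word boundaries."""
    text = "#" + phrase.replace(" ", "#") + "#"
    return {text[i:i + 2] for i in range(len(text) - 1)}


def rescorla_wagner(events, outcomes, rate=0.01, lam=1.0):
    """Run one pass of Rescorla-Wagner updates over (cues, present_outcomes) events.

    rate stands in for the product of the cue and outcome salience parameters
    and is an assumed value, as is the maximum association strength lam.
    """
    weights = defaultdict(lambda: defaultdict(float))
    for cues, present in events:
        for outcome in outcomes:
            # Summed association strength of all cues present on this trial.
            total = sum(weights[c][outcome] for c in cues)
            # Target is lam when the outcome occurs, 0 when it does not.
            target = lam if outcome in present else 0.0
            delta = rate * (target - total)
            for c in cues:
                weights[c][outcome] += delta
    return weights


def activation(weights, phrase, outcome):
    """Summed weight from a phrase's cues to one outcome (meaning)."""
    return sum(weights[c][outcome] for c in letter_bigrams(phrase))


# Toy data with made-up counts: one phrase seen 8 times, a near neighbour seen twice.
frequent = "dont have to worry"
rare = "dont have to wait"
events = [(letter_bigrams(frequent), set(frequent.split()))] * 8
events += [(letter_bigrams(rare), set(rare.split()))] * 2
outcomes = set(frequent.split()) | set(rare.split())

weights = rescorla_wagner(events, outcomes)
print(activation(weights, frequent, "worry"))
print(activation(weights, rare, "wait"))
```

In this toy run the more frequent phrase gives stronger support to its final word than the rarer phrase does to its own, even though no phrase-level unit or frequency counter is stored anywhere; only cue-to-meaning weights are kept.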

Original language: English
Pages (from-to): 329-347
Number of pages: 19
Journal: Language and Speech
Volume: 56
Issue number: 3
Publication status: Published - Sep 2013
Externally published: Yes

Keywords

  • computational modeling
  • n-gram frequency effects
  • Naive discriminative learning
  • Rescorla-Wagner equations
