Conceptualization in Reference Production: Probabilistic Modeling and Experimental Testing

Roger P. G. van Gompel*, Kees van Deemter, Albert Gatt, Rick Snoeren, Emiel J. Krahmer

*Corresponding author for this work

Research output: Contribution to journal, Article, Scientific, peer-review

Abstract

In psycholinguistics, there has been relatively little work investigating conceptualization: how speakers decide which concepts to express. This contrasts with work in natural language generation (NLG), a subfield of artificial intelligence, where much research has explored content determination during the generation of referring expressions. Existing NLG algorithms for conceptualization during reference production do not fully explain previous psycholinguistic results, so we developed new models that we tested in three language production experiments. In our experiments, participants described target objects to another participant. In Experiment 1, either size, color, or both distinguished the target from all distractor objects; in Experiment 2, either color, type, or both color and type distinguished it from all distractors; in Experiment 3, color, size, or the border around the object distinguished the target. We tested how well the different models fit the distribution of description types (e.g., "small candle," "gray candle," "small gray candle") that participants produced. Across these experiments, the probabilistic referential overspecification model (PRO) provided the best fit. In this model, speakers first choose a property that rules out all distractors. If there is more than one such property, then they probabilistically choose one on the basis of a preference for that property. Next, they sometimes add another property, with the probability again determined by its preference and speakers' eagerness to overspecify.
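
The three steps attributed to PRO in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The function name `pro_describe`, the dictionary-based object representation, the numeric preference weights, and the rule that the probability of adding a property is the product of eagerness and normalized preference are all assumptions made for illustration; the abstract states only that this probability is "determined by" preference and eagerness. The abstract also does not say what happens when no single property rules out all distractors, so the sketch raises an error in that case.

```python
import random


def pro_describe(target, distractors, preference, eagerness, rng=random):
    """Pick the attributes to mention for one target (illustrative sketch only).

    target, distractors: dicts mapping attribute name -> value (hypothetical format).
    preference: dict mapping attribute name -> positive weight (assumed values).
    eagerness: float in [0, 1], the speaker's eagerness to overspecify.
    """
    # Step 1: find attributes that rule out every distractor on their own.
    distinguishing = [
        attr for attr in target
        if all(d.get(attr) != target[attr] for d in distractors)
    ]
    if not distinguishing:
        # Not covered by the abstract, so the sketch does not guess.
        raise ValueError("no single attribute rules out all distractors")

    # Step 2: if several attributes qualify, choose one in proportion to preference.
    weights = [preference[attr] for attr in distinguishing]
    first = rng.choices(distinguishing, weights=weights, k=1)[0]
    chosen = [first]

    # Step 3: possibly add a further attribute (overspecification).
    # ASSUMPTION: P(add) = eagerness * normalized preference of that attribute.
    total = sum(preference[attr] for attr in target)
    for attr in target:
        if attr == first:
            continue
        if rng.random() < eagerness * preference[attr] / total:
            chosen.append(attr)
    return chosen


if __name__ == "__main__":
    # Hypothetical trial: only size distinguishes the small gray target.
    target = {"size": "small", "color": "gray"}
    distractors = [{"size": "large", "color": "gray"}]
    preference = {"size": 1.0, "color": 3.0}  # made-up weights, not fitted values
    print(pro_describe(target, distractors, preference, eagerness=0.5))
```

With two attributes, as in this hypothetical trial, at most one extra property can be added, matching the abstract's wording. Setting `eagerness` to 0 yields only minimally specified descriptions, while higher values make overspecified descriptions such as "small gray candle" more frequent, and more so for the more preferred property.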

Original language: English
Pages (from-to): 345-373
Number of pages: 29
Journal: Psychological Review
Volume: 126
Issue number: 3
DOIs: https://doi.org/10.1037/rev0000138
Publication status: Published - Apr 2019

Keywords

  • reference production
  • referring expressions
  • conceptualization
  • overspecification
  • computational models
  • REFERRING EXPRESSIONS
  • MESSAGE FORMULATION
  • LEXICAL ACCESS
  • SPEAKERS
  • COLOR
  • GENERATION
  • KNOWLEDGE
  • SPEECH
  • OVERSPECIFICATION
  • COMPREHENSION

Cite this

van Gompel, Roger P. G.; van Deemter, Kees; Gatt, Albert; Snoeren, Rick; Krahmer, Emiel J. / Conceptualization in Reference Production: Probabilistic Modeling and Experimental Testing. In: Psychological Review. 2019; Vol. 126, No. 3. pp. 345-373.
@article{c72f8ca29e5f419886b2534bb0ce39e2,
title = "Conceptualization in Reference Production: Probabilistic Modeling and Experimental Testing",
abstract = "In psycholinguistics, there has been relatively little work investigating conceptualization: how speakers decide which concepts to express. This contrasts with work in natural language generation (NLG), a subfield of artificial intelligence, where much research has explored content determination during the generation of referring expressions. Existing NLG algorithms for conceptualization during reference production do not fully explain previous psycholinguistic results, so we developed new models that we tested in three language production experiments. In our experiments, participants described target objects to another participant. In Experiment 1, either size, color, or both distinguished the target from all distractor objects; in Experiment 2, either color, type, or both color and type distinguished it from all distractors; in Experiment 3, color, size, or the border around the object distinguished the target. We tested how well the different models fit the distribution of description types (e.g., {"}small candle,{"} {"}gray candle,{"} {"}small gray candle{"}) that participants produced. Across these experiments, the probabilistic referential overspecification model (PRO) provided the best fit. In this model, speakers first choose a property that rules out all distractors. If there is more than one such property, then they probabilistically choose one on the basis of a preference for that property. Next, they sometimes add another property, with the probability again determined by its preference and speakers' eagerness to overspecify.",
keywords = "reference production, referring expressions, conceptualization, overspecification, computational models, REFERRING EXPRESSIONS, MESSAGE FORMULATION, LEXICAL ACCESS, SPEAKERS, COLOR, GENERATION, KNOWLEDGE, SPEECH, OVERSPECIFICATION, COMPREHENSION",
author = "{van Gompel}, {Roger P. G.} and {van Deemter}, Kees and Gatt, Albert and Snoeren, Rick and Krahmer, {Emiel J.}",
year = "2019",
month = "4",
doi = "10.1037/rev0000138",
language = "English",
volume = "126",
pages = "345--373",
journal = "Psychological Review",
issn = "0033-295X",
publisher = "American Psychological Association",
number = "3",
}

Conceptualization in Reference Production: Probabilistic Modeling and Experimental Testing. / van Gompel, Roger P. G.; van Deemter, Kees; Gatt, Albert; Snoeren, Rick; Krahmer, Emiel J.

In: Psychological Review, Vol. 126, No. 3, 04.2019, p. 345-373.

TY - JOUR

T1 - Conceptualization in Reference Production

T2 - Probabilistic Modeling and Experimental Testing

AU - van Gompel, Roger P. G.

AU - van Deemter, Kees

AU - Gatt, Albert

AU - Snoeren, Rick

AU - Krahmer, Emiel J.

PY - 2019/4

Y1 - 2019/4

N2 - In psycholinguistics, there has been relatively little work investigating conceptualization: how speakers decide which concepts to express. This contrasts with work in natural language generation (NLG), a subfield of artificial intelligence, where much research has explored content determination during the generation of referring expressions. Existing NLG algorithms for conceptualization during reference production do not fully explain previous psycholinguistic results, so we developed new models that we tested in three language production experiments. In our experiments, participants described target objects to another participant. In Experiment 1, either size, color, or both distinguished the target from all distractor objects; in Experiment 2, either color, type, or both color and type distinguished it from all distractors; in Experiment 3, color, size, or the border around the object distinguished the target. We tested how well the different models fit the distribution of description types (e.g., "small candle," "gray candle," "small gray candle") that participants produced. Across these experiments, the probabilistic referential overspecification model (PRO) provided the best fit. In this model, speakers first choose a property that rules out all distractors. If there is more than one such property, then they probabilistically choose one on the basis of a preference for that property. Next, they sometimes add another property, with the probability again determined by its preference and speakers' eagerness to overspecify.

AB - In psycholinguistics, there has been relatively little work investigating conceptualization: how speakers decide which concepts to express. This contrasts with work in natural language generation (NLG), a subfield of artificial intelligence, where much research has explored content determination during the generation of referring expressions. Existing NLG algorithms for conceptualization during reference production do not fully explain previous psycholinguistic results, so we developed new models that we tested in three language production experiments. In our experiments, participants described target objects to another participant. In Experiment 1, either size, color, or both distinguished the target from all distractor objects; in Experiment 2, either color, type, or both color and type distinguished it from all distractors; in Experiment 3, color, size, or the border around the object distinguished the target. We tested how well the different models fit the distribution of description types (e.g., "small candle," "gray candle," "small gray candle") that participants produced. Across these experiments, the probabilistic referential overspecification model (PRO) provided the best fit. In this model, speakers first choose a property that rules out all distractors. If there is more than one such property, then they probabilistically choose one on the basis of a preference for that property. Next, they sometimes add another property, with the probability again determined by its preference and speakers' eagerness to overspecify.

KW - reference production

KW - referring expressions

KW - conceptualization

KW - overspecification

KW - computational models

KW - REFERRING EXPRESSIONS

KW - MESSAGE FORMULATION

KW - LEXICAL ACCESS

KW - SPEAKERS

KW - COLOR

KW - GENERATION

KW - KNOWLEDGE

KW - SPEECH

KW - OVERSPECIFICATION

KW - COMPREHENSION

U2 - 10.1037/rev0000138

DO - 10.1037/rev0000138

M3 - Article

VL - 126

SP - 345

EP - 373

JO - Psychological Review

JF - Psychological Review

SN - 0033-295X

IS - 3

ER -