The bootstrapping toolbox

Which cues are more useful to learn lexical categories and why

Giovanni Cassani*

*Corresponding author for this work

Research output: Thesis › Doctoral Thesis › Scientific

Abstract

This work addresses the problem of the bootstrapping of lexical categories, i.e. how children learn that words can be clustered according to categorical restrictions which define their use in the language, separating, for example, nouns from verbs. The experiments described here tackle the question of which properties of the input children receive determine the usefulness of linguistic items in supporting the learning of lexical categories, focusing on how frequent a cue is, on how many other linguistic units it co-occurs with (i.e. how diverse it is), and on how predictable it is given co-occurring linguistic items. A number of computational simulations are carried out using corpora of transcribed child-directed speech.
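The three cue properties at the heart of the work can be made concrete with a toy sketch. The snippet below computes frequency, diversity, and one simple operationalisation of predictability for a word cue over a handful of invented utterances; the actual thesis works with CHILDES-style transcripts and model-based measures, so everything here (the data, the predictability formula) is illustrative only.

```python
from itertools import chain

# Toy utterances standing in for transcribed child-directed speech
# (hypothetical data; the thesis uses corpora of real transcripts).
utterances = [
    ["the", "dog", "barks"],
    ["the", "cat", "sleeps"],
    ["a", "dog", "sleeps"],
    ["the", "dog", "runs"],
]

def cue_stats(cue, utterances):
    """Return (frequency, diversity, predictability) for a word cue.

    frequency:      how often the cue occurs in the corpus
    diversity:      number of distinct words it co-occurs with
    predictability: proportion of utterances containing one of the
                    cue's co-occurring words that also contain the
                    cue -- one simple stand-in for the model-based
                    predictability measures used in the thesis.
    """
    freq = sum(u.count(cue) for u in utterances)
    neighbours = set(chain.from_iterable(
        [w for w in u if w != cue] for u in utterances if cue in u))
    with_neighbour = [u for u in utterances if neighbours & set(u)]
    pred = sum(cue in u for u in with_neighbour) / len(with_neighbour)
    return freq, len(neighbours), pred

freq, div, pred = cue_stats("the", utterances)
```

On this toy corpus, "the" occurs three times, co-occurs with five distinct words, and appears in three of the four utterances that contain one of its neighbours.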
Part I of this book focuses on the role of distributional information in supporting the learning of lexical categories. Chapter 1 provides an introduction to distributional bootstrapping, i.e. the hypothesis that children start learning about lexical categories by tracking co-occurrence patterns, and presents the main issues addressed in the rest of the work.
Chapter 2 focuses on distributional contexts, i.e. words and combinations thereof, and on the properties which make them more useful for learning the lexical categories of co-occurring words. Methods from machine learning are used to quantify the usefulness of each context, and linear mixed effect models are used to investigate the effect of context frequency, diversity, and predictability on usefulness. Results show that diversity is a more important predictor than frequency, with a positive effect.
Chapter 3 complements the previous one, analyzing which properties of words make them easier to categorize by means of logistic mixed effect models. The simulations in this chapter also test a number of predictions derived from the studies in Chapter 2 and validate the methods used there. Results show that words are easier to categorize when they cannot be easily predicted given co-occurring contexts.
Chapter 4 describes a functional model which quantifies context usefulness as a combination of frequency, diversity, and predictability. This selection heuristic is inspired by the results presented in Chapter 2 and is compared to previous heuristics which rely on context frequency alone to decide whether contexts are useful. Two categorization experiments show that the proposed heuristic selects more useful contexts, which result in more accurate categorization of the words in the input corpora. The results obtained in these three chapters are discussed in Chapter 5.
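A selection heuristic of this kind can be sketched as a ranking function over contexts. The snippet below normalises each of the three properties to [0, 1] and combines them with fixed weights; the weights, the normalisation, and the example statistics are invented for illustration, not the functional model fitted in the thesis.

```python
def rank_contexts(stats, weights=(0.2, 0.5, 0.3)):
    """Rank contexts from most to least useful, scoring each as a
    weighted combination of frequency, diversity, and predictability.
    `stats` maps context -> (frequency, diversity, predictability);
    the weights are illustrative, not the thesis's fitted values."""
    dims = list(zip(*stats.values()))  # one tuple of values per property
    def norm(x, vals):
        lo, hi = min(vals), max(vals)
        return 0.0 if hi == lo else (x - lo) / (hi - lo)
    scores = {c: sum(w * norm(v, dims[i])
                     for i, (w, v) in enumerate(zip(weights, vals)))
              for c, vals in stats.items()}
    return sorted(scores, key=scores.get, reverse=True)

# Hypothetical context statistics: (frequency, diversity, predictability)
stats = {"the_X": (120, 40, 0.9),
         "you_X": (80, 55, 0.7),
         "X_ball": (15, 3, 0.4)}
ranking = rank_contexts(stats)
```

A frequency-only heuristic would ignore the second and third columns entirely; here, "you_X" nearly overtakes "the_X" because of its higher diversity, the property the thesis finds most predictive.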
The studies described in Part II introduce three main innovations. First, simulations are carried out using an error-driven learning mechanism rather than a count model. Second, phonological information is considered as well as distributional information. Third, statistical analyses are conducted using Generalized Additive Mixed Models to dispense with the assumption that predictors of interest have linear effects on the target dependent variable. Chapter 6 introduces the new learning mechanism, the Naïve Discriminative Learning (NDL) model proposed by Baayen, Milin, Durdević, Hendrix, and Marelli (2011), and reviews the psycholinguistic evidence supporting its validity.
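NDL learns cue-outcome associations with the Rescorla-Wagner update rule, which is what makes it error-driven rather than count-based: association weights change in proportion to the prediction error, not to raw co-occurrence counts. A minimal sketch of one learning event, with invented phonological bigram cues and outcomes:

```python
from collections import defaultdict

def rw_update(weights, cues, outcomes, all_outcomes, rate=0.1, lam=1.0):
    """One Rescorla-Wagner learning event: adjust the association from
    every present cue to every outcome in proportion to the prediction
    error (present outcomes pulled towards lam, absent ones towards 0)."""
    for o in all_outcomes:
        activation = sum(weights[(c, o)] for c in cues)
        target = lam if o in outcomes else 0.0
        delta = rate * (target - activation)
        for c in cues:
            weights[(c, o)] += delta

weights = defaultdict(float)
cues = ("#d", "do", "og", "g#")  # toy phonological bigram cues
# Repeatedly present the cues of "dog" with the outcome "dog":
for _ in range(100):
    rw_update(weights, cues, outcomes={"dog"},
              all_outcomes=("dog", "cat"))
# The summed activation of "dog" from its cues converges towards lam,
# while associations to the never-present outcome "cat" stay at zero.
```

Crucially for the thesis, nothing in this rule mentions lexical categories: the weight matrix it produces is what the later chapters probe for implicit categorical structure.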
Chapter 7 describes two experiments which aim to verify whether accurate lexical categorization can be obtained from the output of the NDL model. Two training conditions are explored, isolated words and full, segmented utterances, crossed with two vocabulary conditions, one with fewer and one with more target words. Categorization from full and partial phonological information as well as from full and partial distributional information is explored. Results show that categorization is particularly accurate in two conditions: first, when the full phonological information is leveraged, regardless of the availability of context and of the size of the target vocabulary; second, when distributional information is considered, even partially, and the NDL model is trained on full utterances.
Chapter 8 analyzes which predictors best explain the usefulness of phonological and distributional cues in the two training conditions, quantifying usefulness as the variance of each cue vector. Results show that frequency best predicts phonological cue variance in both training conditions, and also distributional cue variance when the NDL model is trained on isolated words. By contrast, distributional cue variance is best predicted by diversity when the NDL model is trained on full utterances. Moreover, findings show that the effect of predictability is modulated by the availability of context during training: linear when context is not available, quadratic when it is.
Finally, Chapter 9 investigates the role of semantic information encoded in distributional co-occurrences and how this information relates to phonological cues. A function mapping the phonology of a word to its meaning is learned and used to generate the meaning of a set of non-words, created to phonologically resemble English nouns or verbs. The target non-words are then clustered on the basis of their relation to the semantic information of known words. Results show that the phonological distinction is reflected in the semantic domain and that the likely lexical category of a word can be inferred from its sound alone, by considering how its implicit semantics relate to the semantics of known words. Chapter 10 reviews the results of all the studies in Part II and connects them to those obtained in the first part of the work.
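The core move in that chapter, learning a phonology-to-meaning mapping and reading off a non-word's implicit semantics, can be sketched in a few lines. The snippet uses random toy vectors in place of the corpus-derived phonological and distributional representations, and ordinary least squares as a stand-in for the mapping actually learned in the thesis; only the overall pipeline (fit the map, project a non-word, compare to known words by cosine similarity) mirrors the text.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-ins for the thesis's representations: P holds phonological
# cue vectors of known words, S their distributional semantic vectors.
n_words, n_phon, n_sem = 40, 12, 6
P = rng.normal(size=(n_words, n_phon))
true_map = rng.normal(size=(n_phon, n_sem))  # hidden structure to recover
S = P @ true_map + 0.01 * rng.normal(size=(n_words, n_sem))

# Learn a linear phonology-to-meaning mapping F by least squares: S ~ P F.
F, *_ = np.linalg.lstsq(P, S, rcond=None)

# Generate the "implicit semantics" of a non-word from its phonology
# and relate it to known words via cosine similarity.
nonword_phon = rng.normal(size=n_phon)
nonword_sem = nonword_phon @ F

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

sims = [cosine(nonword_sem, s) for s in S]
nearest = max(range(n_words), key=sims.__getitem__)
```

In the thesis, the known words nearest to the non-word's generated meaning carry category labels, so the non-word's likely category can be read off its semantic neighbourhood.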
Importantly, all algorithms used in this work learn representations which support categorical abstractions without being trained to learn categories. If categorical abstractions can be learned from a model's representations, it is because the interaction between the learning algorithm and the input data highlighted regularities which implicitly encode lexical categories.
Summing up, this work connects predictors of usefulness (diversity, frequency, and predictability) to information sources (phonological or distributional cues) and learning conditions (modulating the availability of context and the vocabulary size). Results presented in this book offer the possibility of inferring other aspects of category learning and its driving factors by targeting only one of the three: for example, if it is shown that children only learn categories when contextual information is available, it is predicted that learning is primarily driven by diversity and relies on distributional co-occurrences. This information can be used to devise behavioral experiments to elucidate how children learn categories and what information they rely upon.
Original language: English
Qualification: Doctor of Philosophy
Awarding Institution
  • University of Antwerp
Supervisors/Advisors
  • Daelemans, Walter, Promotor, External person
  • Gillis, Steven, Promotor, External person
Award date: 16 May 2019
Publication status: Published - May 2019
Externally published: Yes

Cite this

@phdthesis{dd04b2d66657437fbb9480dfd777a53d,
title = "The bootstrapping toolbox: Which cues are more useful to learn lexical categories and why",
author = "Giovanni Cassani",
year = "2019",
month = "5",
language = "English",
school = "University of Antwerp",

}

The bootstrapping toolbox: Which cues are more useful to learn lexical categories and why. / Cassani, Giovanni.

2019. 355 p.


TY - THES

T1 - The bootstrapping toolbox

T2 - Which cues are more useful to learn lexical categories and why

AU - Cassani, Giovanni

PY - 2019/5

Y1 - 2019/5

M3 - Doctoral Thesis

ER -