Abstract
Mainstream linguistic theories posit that the relation between the form of a word and its meaning is arbitrary, implying that the meaning of a word cannot be inferred from its form [1]. However, recent research points to the contrary, highlighting that the relation between the form and the meaning of a word is likely not entirely arbitrary [2,3,4,5]. Non-arbitrary form-to-meaning relations seem especially prevalent in the first few years of language acquisition [6] and have been suggested to help children acquire new words [5,7].
The current work investigates the relationship between word forms and lexical meanings during language acquisition, building on [4] and [5]. Using computational tools from distributional semantics and adopting a zero-shot learning perspective, we tested whether systematicity facilitates subsequent word learning. Specifically, we used the Linear Discriminative Learning model (LDL [4]) and Form-Semantic Consistency (FSC [5]) to capture systematicity, as both have been used successfully to study form-to-meaning systematicity in language processing and acquisition [2,3,5]. These models map word-form representations onto semantic space and can generate semantic representations for novel word forms by exploiting statistical regularities in the form-to-meaning mappings observed in the vocabulary learned up to that point. This improves over previous approaches to the study of systematicity in language acquisition, which looked at correlations between form and meaning representations of known words and could not assess the degree of systematicity of a novel word with respect to a learner's available vocabulary [6].
The US and UK English portions of the CHILDES database were used as the training corpus, split into nine age bins (0-24m, 25-30m, 31-36m, 37-42m, 43-48m, 49-54m, 55-60m, 61-72m, 73-120m) so as to have sufficient data in each bin. Different semantic spaces were obtained for each bin: word2vec [8] spaces for the FSC model and Naïve Discriminative Learning (NDL) spaces for the LDL model [4]. Moreover, Boolean form vectors were obtained by coding which character 3-grams are present in each word. Both the FSC and LDL models were used to derive a measure of form-to-meaning systematicity for words yet to be learned, taking the words produced by children at previous time points as the reference vocabulary. The analysis focused on words that occurred at least twice in the corpus, for a total of 7,791 words. We then coded to-be-learned words according to the corpus bin in which they were first produced by a child and fitted a Cumulative Link Mixed Model in R to predict when a word would be produced (ordinal, 3 levels: in the same bin, in a later bin, not produced in the corpus) as a function of its frequency, semantic neighborhood density, and length (base model). These measures were considered because they have been shown to be relevant predictors of word learning [9]. We then added the target measures of systematicity separately and evaluated the change in AIC to determine whether systematicity helps predict age of first production. All variables were Box-Cox transformed and z-standardized.
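To make the form representation concrete, the following is a minimal sketch of how such Boolean character 3-gram vectors can be built; the `#` boundary padding and the helper names are our assumptions, not details reported in the abstract.

```python
# A minimal sketch of the Boolean form vectors described above: each word is
# coded for which character 3-grams it contains. The "#" boundary padding is
# an assumption, not a detail reported in the abstract.
import numpy as np

def char_ngrams(word, n=3, pad="#"):
    """Return the set of character n-grams of a word, with boundary padding."""
    padded = pad + word + pad
    return {padded[i:i + n] for i in range(len(padded) - n + 1)}

def form_matrix(words, n=3):
    """Build a Boolean word-by-n-gram matrix for a vocabulary."""
    grams = sorted(set().union(*(char_ngrams(w, n) for w in words)))
    index = {g: j for j, g in enumerate(grams)}
    C = np.zeros((len(words), len(grams)), dtype=bool)
    for i, w in enumerate(words):
        for g in char_ngrams(w, n):
            C[i, index[g]] = True
    return C, grams

C, grams = form_matrix(["cat", "cap", "dog"])  # 3 words x 8 distinct 3-grams
```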
Results indicate that FSC has a positive effect on word learning (ΔAIC = 25.13, β = 0.23, se = 0.05, z = -5.15, p < 0.001) but LDL does not (ΔAIC = 2.38, β = 0.01, se = 0.04, z = 0.38, p = 0.7). Words with higher FSC scores given the available vocabulary thus tend to be learned earlier. Moreover, we ensured that the effect of FSC was not a by-product of other properties of the lexicon by randomly permuting the semantic vectors of all words, destroying any systematic relation between form and meaning. AIC scores were then computed for models that added the systematicity scores derived from the permuted embeddings to the baseline model, and no improvement was observed.
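The permutation control lends itself to a very short sketch: shuffling the rows of the semantic matrix breaks any form-to-meaning link while leaving the distribution of embeddings intact. Here `S` is a hypothetical word-by-dimension matrix and the seed is arbitrary.

```python
# Sketch of the permutation control: reassigning semantic vectors to words at
# random destroys any systematic form-meaning relation while preserving the
# marginal distribution of the embeddings. S is a hypothetical placeholder
# for the word-by-dimension semantic matrix.
import numpy as np

rng = np.random.default_rng(42)

def permute_semantics(S):
    """Shuffle the rows of the semantic matrix, breaking form-meaning links."""
    return S[rng.permutation(S.shape[0])]

# Systematicity scores recomputed on permute_semantics(S) should no longer
# improve model fit (AIC) over the baseline predictors.
```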
These results support theories that posit a role for form-to-meaning systematicity in facilitating word learning [6,7] and provide, for the first time, evidence that systematicity computed over developing vocabularies predicts subsequent word learning. FSC [5], which relies on local analogies in form and meaning, appears to be a better predictor than LDL [4], which attempts to derive a general mapping function; this offers an indication of which mechanism may better leverage systematic relations in the vocabulary.
Computational modelling frameworks
The FSC model derives a distributed semantic representation for any string by considering its neighbors in form space. A word's FSC score is high when the word's form neighbors are semantically similar to the word itself and to each other. The model can also generate semantic representations for novel word forms by exploiting statistical regularities in the form-to-meaning mappings observed in the vocabulary learned up to that point (zero-shot learning) [5]. In our implementation, FSC is computed using Equation (1), where $t$ is the target word, $n_i$ is the $i$-th of the $N$ form neighbors identified on the basis of Levenshtein distance, $\cos(\cdot,\cdot)$ is the cosine similarity function, and $\mathbf{t}$ and $\mathbf{n}_i$ are the embeddings of the target and its neighbors. The symbol $d_i$ indicates the Levenshtein distance between the target $t$ and the neighbor $n_i$. Semantic similarity is weighted by distance in form space, such that closer form neighbors weigh more on the measure of systematicity.
(1) $\mathrm{FSC}(t) = \frac{1}{N}\sum_{i=1}^{N} \cos(\mathbf{t}, \mathbf{n}_i) \cdot \frac{1}{d_i}$
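A minimal Python implementation of Equation (1) could look as follows. The neighbor-selection rule (all known words within a maximum Levenshtein distance) and the `max_dist` parameter are our assumptions, since the abstract does not specify how the neighbor set is chosen.

```python
# Sketch of Equation (1): cosine similarities between a target and its form
# neighbors, weighted by inverse Levenshtein distance and averaged.
import numpy as np

def levenshtein(a, b):
    """Classic dynamic-programming edit distance."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                 # deletion
                            curr[j - 1] + 1,             # insertion
                            prev[j - 1] + (ca != cb)))   # substitution
        prev = curr
    return prev[-1]

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def fsc(target, t_vec, vocab_embeddings, max_dist=2):
    """FSC(t) as in Equation (1). vocab_embeddings maps known words to their
    semantic vectors; max_dist is an assumed neighbor-selection threshold."""
    scores = []
    for word, vec in vocab_embeddings.items():
        if word == target:
            continue
        d = levenshtein(target, word)
        if 0 < d <= max_dist:
            scores.append(cosine(t_vec, vec) / d)
    return sum(scores) / len(scores) if scores else 0.0
```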
The LDL model uses linear networks that map form onto meaning and vice versa [4]. Given the form-to-meaning mapping, inputting the form vector of a word returns the expected semantic vector of that word. The same mapping can also be used to generate the semantic vectors of yet unknown words. LDL is implemented using Equation (2), in which $C_w$ is the form matrix of all words in the corpus and $S_w$ is the semantic matrix of all words in the corpus (with the same number of rows as $C_w$); the mapping $F$ is obtained by multiplying the Moore-Penrose generalized inverse of $C_w$ with $S_w$. The estimated semantic matrix of novel words ($S_{nw}$) can then be generated by multiplying the form matrix of novel words ($C_{nw}$) with $F$, following Equation (3).
(2) $C_w F = S_w$
(3) $C_{nw} F = S_{nw}$
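Equations (2) and (3) translate directly into a few lines of linear algebra. The sketch below assumes form and semantic matrices built as described above; the toy dimensions are arbitrary.

```python
# Sketch of Equations (2)-(3): estimate the linear form-to-meaning mapping F
# from known words via the Moore-Penrose generalized inverse, then project the
# form vectors of novel words through F.
import numpy as np

def estimate_mapping(C_w, S_w):
    """Solve C_w F = S_w in the least-squares sense: F = pinv(C_w) @ S_w."""
    return np.linalg.pinv(C_w) @ S_w

def predict_semantics(C_nw, F):
    """Equation (3): estimated semantic matrix for novel word forms."""
    return C_nw @ F

# Toy example: 5 known words, 12 form features (3-grams), 4 semantic dimensions.
rng = np.random.default_rng(0)
C_w, S_w = rng.random((5, 12)), rng.random((5, 4))
F = estimate_mapping(C_w, S_w)
S_nw = predict_semantics(C_w[:2], F)  # pretend the first two forms are novel
```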
References:
[1] de Saussure, F. (1916). Course in general linguistics. New York, NY: McGraw-Hill.
[2] Cassani, G., Chuang, Y., and Baayen, R. (2020). On the Semantics of Nonwords and Their Lexical Category. J Exp Psychol Learn, 46(4), 621-637.
[3] Hendrix, P. and Sun, C. C. (2020). A Word or Two About Nonwords: Frequency, Semantic Neighborhood Density, and Orthography-to-Semantics Consistency Effects for Nonwords in the Lexical Decision Task. J Exp Psychol Learn, 47(1), 157-183.
[4] Baayen, R. H., et al. (2019). The Discriminative Lexicon: A Unified Computational Model for the Lexicon and Lexical Processing in Comprehension and Production Grounded Not in (De)Composition but in Linear Discriminative Learning. Complexity, 2019.
[5] Cassani, G. and Limacher, N. (2021). Not just form, not just meaning: Words with consistent form-meaning mappings are learned earlier. Q J Exp Psychol, October 2021.
[6] Monaghan, P., Shillcock, R. C., Christiansen, M. H., and Kirby, S. (2014). How arbitrary is language? Philos T Roy Soc B, 369, 20130299.
[7] Imai, M., et al. (2015). Sound symbolism facilitates word learning in 14-month-olds. PLoS ONE, 10(2), e0116494.
[8] Mikolov, T., Sutskever, I., Chen, K., Corrado, G., and Dean, J. (2013). Distributed representations of words and phrases and their compositionality. NeurIPS, Lake Tahoe, Nevada.
[9] Yu, H., Zhang, H., and Xu, W. (2017). A Deep Compositional Framework for Human-like Language Acquisition in Virtual Environment. CoRR.
[10] Braginsky, M., Yurovsky, D., Marchman, V. A., and Frank, M. C. (2019). Consistency and Variability in Children's Word Learning Across Languages. Open Mind, 3, 52-67.
| Original language | English |
| --- | --- |
| Pages | 1-2 |
| Number of pages | 2 |
| Publication status | Published - Sept 2022 |
| Event | Architectures and Mechanisms for Language Processing (2022), York, United Kingdom, 7 Sept 2022 → 9 Sept 2022 (https://amlap2022.york.ac.uk) |

Conference

| Conference | Architectures and Mechanisms for Language Processing (2022) |
| --- | --- |
| Abbreviated title | AMLaP |
| Country/Territory | United Kingdom |
| City | York |
| Period | 7/09/22 → 9/09/22 |
| Internet address | https://amlap2022.york.ac.uk |
Keywords
- Word learning
- Computational approaches
- Word forms
- Non-arbitrary form-to-meaning
- Lexical meaning