Representation of linguistic form and function in recurrent neural networks

    Research output: Contribution to journal › Article › Scientific › peer-review


    Abstract

    We present novel methods for analyzing the activation patterns of recurrent neural networks from a linguistic point of view and explore the types of linguistic structure they learn. As a case study, we use a standard standalone language model, and a multi-task gated recurrent network architecture consisting of two parallel pathways with shared word embeddings: The Visual pathway is trained on predicting the representations of the visual scene corresponding to an input sentence, and the Textual pathway is trained to predict the next word in the same sentence. We propose a method for estimating the amount of contribution of individual tokens in the input to the final prediction of the networks. Using this method, we show that the Visual pathway pays selective attention to lexical categories and grammatical functions that carry semantic information, and learns to treat word types differently depending on their grammatical function and their position in the sequential structure of the sentence. In contrast, the language models are comparatively more sensitive to words with a syntactic function. Further analysis of the most informative n-gram contexts for each model shows that in comparison with the Visual pathway, the language models react more strongly to abstract contexts that represent syntactic constructions.
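The abstract describes, but does not spell out, a method for estimating how much each input token contributes to the network's final prediction. A minimal sketch of one omission-based way such a score could be computed — compare the representation of the full sentence with the representation obtained when a token is left out. The `sentence_repr` interface and the toy embedding model `toy_repr` below are illustrative assumptions, not the paper's actual architecture:

```python
import numpy as np

def cosine(a, b):
    # Cosine similarity between two vectors.
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def omission_scores(sentence_repr, tokens):
    # Score each token by how much the sentence representation changes
    # when that token is omitted: 1 - cos(full, reduced).
    full = sentence_repr(tokens)
    return [1.0 - cosine(full, sentence_repr(tokens[:i] + tokens[i + 1:]))
            for i in range(len(tokens))]

# Toy stand-in for a trained network: sum of fixed random word embeddings.
# (Hypothetical; any callable mapping a token list to a vector would do.)
rng = np.random.default_rng(0)
emb = {}
def toy_repr(tokens):
    return np.sum([emb.setdefault(t, rng.normal(size=16)) for t in tokens], axis=0)

scores = omission_scores(toy_repr, "the dog chased the cat".split())
print([round(s, 3) for s in scores])
```

With a real trained model in place of `toy_repr`, tokens whose removal moves the sentence representation the most receive the highest scores, which is the kind of per-token analysis the abstract refers to.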
    Original language: English
    Pages (from-to): 761-780
    Journal: Computational Linguistics
    Volume: 43
    Issue number: 4
    DOIs: 10.1162/COLI_a_00300
    Publication status: Published - 2017

    Cite this

    @article{26d4cf881d1f4b6b8dd513a3481d0401,
    title = "Representation of linguistic form and function in recurrent neural networks",
    abstract = "We present novel methods for analyzing the activation patterns of recurrent neural networks from a linguistic point of view and explore the types of linguistic structure they learn. As a case study, we use a standard standalone language model, and a multi-task gated recurrent network architecture consisting of two parallel pathways with shared word embeddings: The Visual pathway is trained on predicting the representations of the visual scene corresponding to an input sentence, and the Textual pathway is trained to predict the next word in the same sentence. We propose a method for estimating the amount of contribution of individual tokens in the input to the final prediction of the networks. Using this method, we show that the Visual pathway pays selective attention to lexical categories and grammatical functions that carry semantic information, and learns to treat word types differently depending on their grammatical function and their position in the sequential structure of the sentence. In contrast, the language models are comparatively more sensitive to words with a syntactic function. Further analysis of the most informative n-gram contexts for each model shows that in comparison with the Visual pathway, the language models react more strongly to abstract contexts that represent syntactic constructions.",
    author = "{\'A}kos K{\'a}d{\'a}r and Grzegorz Chrupa{\l}a and Afra Alishahi",
    year = "2017",
    doi = "10.1162/COLI_a_00300",
    language = "English",
    volume = "43",
    pages = "761--780",
    journal = "Computational Linguistics",
    issn = "0891-2017",
    publisher = "The MIT Press",
    number = "4",

    }

    Representation of linguistic form and function in recurrent neural networks. / Kádár, Ákos; Chrupała, Grzegorz; Alishahi, Afra.

    In: Computational Linguistics, Vol. 43, No. 4, 2017, p. 761-780.

    TY - JOUR

    T1 - Representation of linguistic form and function in recurrent neural networks

    AU - Kádár, Ákos

    AU - Chrupała, Grzegorz

    AU - Alishahi, Afra

    PY - 2017

    Y1 - 2017

    N2 - We present novel methods for analyzing the activation patterns of recurrent neural networks from a linguistic point of view and explore the types of linguistic structure they learn. As a case study, we use a standard standalone language model, and a multi-task gated recurrent network architecture consisting of two parallel pathways with shared word embeddings: The Visual pathway is trained on predicting the representations of the visual scene corresponding to an input sentence, and the Textual pathway is trained to predict the next word in the same sentence. We propose a method for estimating the amount of contribution of individual tokens in the input to the final prediction of the networks. Using this method, we show that the Visual pathway pays selective attention to lexical categories and grammatical functions that carry semantic information, and learns to treat word types differently depending on their grammatical function and their position in the sequential structure of the sentence. In contrast, the language models are comparatively more sensitive to words with a syntactic function. Further analysis of the most informative n-gram contexts for each model shows that in comparison with the Visual pathway, the language models react more strongly to abstract contexts that represent syntactic constructions.

    U2 - 10.1162/COLI_a_00300

    DO - 10.1162/COLI_a_00300

    M3 - Article

    VL - 43

    SP - 761

    EP - 780

    JO - Computational Linguistics

    JF - Computational Linguistics

    SN - 0891-2017

    IS - 4

    ER -