Encoding of phonology in a recurrent neural model of grounded speech

    Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Scientific › Peer-reviewed

    Abstract

    We study the representation and encoding of phonemes in a recurrent
    neural network model of grounded speech. We use a model which
    processes images and their spoken descriptions, and projects the
    visual and auditory representations into the same semantic space. We
    perform a number of analyses on how information about individual
    phonemes is encoded in the MFCC features extracted from the speech
    signal, and the activations of the layers of the model. Via
    experiments with phoneme decoding and phoneme discrimination we show
    that phoneme representations are most salient in the lower layers of
    the model, where low-level signals are processed at a fine-grained
    level, although a large amount of phonological information is
    retained at the top recurrent layer. We further find that the
    attention mechanism following the top recurrent layer significantly
    attenuates encoding of phonology and makes the utterance embeddings
    much more invariant to synonymy. Moreover, a hierarchical clustering
    of phoneme representations learned by the network shows an
    organizational structure of phonemes similar to that proposed in
    linguistics.
    Original language: English
    Title of host publication: Proceedings of the 21st Conference on Computational Natural Language Learning (CoNLL 2017)
    Editors: Roger Levy, Lucia Specia
    Place of publication: Vancouver, Canada
    Publisher: Association for Computational Linguistics
    Pages: 368-378
    Number of pages: 11
    ISBN (electronic): 9781945626548
    DOI: 10.18653/v1/K17-1037
    Publication status: Published - 2017
    Event: Conference on Computational Natural Language Learning (CoNLL 2017) - Vancouver, Canada
    Duration: 3 Aug 2017 - 4 Aug 2017
    Conference number: 21

    Conference

    Conference: Conference on Computational Natural Language Learning
    Country: Canada
    City: Vancouver
    Period: 3/08/17 - 4/08/17


    Cite this

    Alishahi, A., Barking, M., & Chrupala, G. (2017). Encoding of phonology in a recurrent neural model of grounded speech. In R. Levy, & L. Specia (Eds.), Proceedings of the 21st Conference on Computational Natural Language Learning (CoNLL 2017) (pp. 368-378). Vancouver, Canada: Association for Computational Linguistics. https://doi.org/10.18653/v1/K17-1037
    @inproceedings{4adf30c388d746dcb41eea3048715f3c,
    title = "Encoding of phonology in a recurrent neural model of grounded speech",
    abstract = "We study the representation and encoding of phonemes in a recurrent neural network model of grounded speech. We use a model which processes images and their spoken descriptions, and projects the visual and auditory representations into the same semantic space. We perform a number of analyses on how information about individual phonemes is encoded in the MFCC features extracted from the speech signal, and the activations of the layers of the model. Via experiments with phoneme decoding and phoneme discrimination we show that phoneme representations are most salient in the lower layers of the model, where low-level signals are processed at a fine-grained level, although a large amount of phonological information is retained at the top recurrent layer. We further find that the attention mechanism following the top recurrent layer significantly attenuates encoding of phonology and makes the utterance embeddings much more invariant to synonymy. Moreover, a hierarchical clustering of phoneme representations learned by the network shows an organizational structure of phonemes similar to that proposed in linguistics.",
    author = "Afra Alishahi and Marie Barking and Grzegorz Chrupala",
    year = "2017",
    doi = "10.18653/v1/K17-1037",
    language = "English",
    pages = "368--378",
    editor = "Roger Levy and Lucia Specia",
    booktitle = "Proceedings of the 21st Conference on Computational Natural Language Learning (CoNLL 2017)",
    publisher = "Association for Computational Linguistics",

    }
