From phonemes to images: levels of representation in a recurrent neural model of visually-grounded language learning

    Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Scientific › peer-review

    Abstract

    We present a model of visually-grounded language learning based on stacked gated recurrent neural networks which learns to predict visual features given an image description in the form of a sequence of phonemes. The learning task resembles that faced by human language learners who need to discover both structure and meaning from noisy and ambiguous data across modalities.
    We show that our model indeed learns to predict features of the visual context given phonetically transcribed image descriptions, and show that it represents linguistic information in a hierarchy of levels: lower layers in the stack are comparatively more sensitive to form, whereas higher layers are more sensitive to meaning.
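    The architecture described in the abstract can be made concrete with a short sketch. The following is a minimal illustration, not the authors' implementation: it assumes a PyTorch stack of GRU layers over phoneme embeddings, with the final top-layer state projected to a visual feature vector (e.g. CNN image features). The layer sizes, phoneme inventory, loss function, and 4096-dimensional visual target are all illustrative assumptions.

        # Minimal sketch (not the authors' code): a stacked-GRU model that reads a
        # phoneme sequence and predicts a visual feature vector for the paired image.
        import torch
        import torch.nn as nn

        class PhonemesToImage(nn.Module):
            def __init__(self, n_phonemes=50, emb_dim=64, hidden_dim=512,
                         n_layers=3, visual_dim=4096):
                super().__init__()
                self.embed = nn.Embedding(n_phonemes, emb_dim)
                # Stacked gated recurrent layers; the paper probes these layers and
                # finds lower ones more form-sensitive, higher ones more meaning-sensitive.
                self.gru = nn.GRU(emb_dim, hidden_dim, num_layers=n_layers,
                                  batch_first=True)
                self.to_visual = nn.Linear(hidden_dim, visual_dim)

            def forward(self, phoneme_ids):
                # phoneme_ids: (batch, seq_len) integer phoneme indices
                emb = self.embed(phoneme_ids)
                outputs, _ = self.gru(emb)
                # Use the final time step's top-layer state as the utterance summary.
                return self.to_visual(outputs[:, -1, :])

        model = PhonemesToImage()
        phonemes = torch.randint(0, 50, (8, 30))   # a batch of 8 phonetic transcriptions
        target = torch.randn(8, 4096)              # stand-in for CNN features of the images
        loss = nn.functional.mse_loss(model(phonemes), target)
        loss.backward()
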
    Original language: English
    Title of host publication: Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers
    Publisher: International Committee on Computational Linguistics
    Pages: 1309-1319
    Number of pages: 11
    ISBN (Electronic): 978-4-87974-702-0
    Publication status: Published - 2016

    Cite this

    APA:
    Gelderloos, L. J., & Chrupala, G. (2016). From phonemes to images: levels of representation in a recurrent neural model of visually-grounded language learning. In Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers (pp. 1309-1319). International Committee on Computational Linguistics.
    Author:
    Gelderloos, L.J.; Chrupala, Grzegorz. / From phonemes to images: levels of representation in a recurrent neural model of visually-grounded language learning. Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers. International Committee on Computational Linguistics, 2016. pp. 1309-1319.
    BibTeX:
    @inproceedings{20e1c29876444557bc2b570b824b19f9,
      title = "From phonemes to images: levels of representation in a recurrent neural model of visually-grounded language learning",
      abstract = "We present a model of visually-grounded language learning based on stacked gated recurrent neural networks which learns to predict visual features given an image description in the form of a sequence of phonemes. The learning task resembles that faced by human language learners who need to discover both structure and meaning from noisy and ambiguous data across modalities. We show that our model indeed learns to predict features of the visual context given phonetically transcribed image descriptions, and show that it represents linguistic information in a hierarchy of levels: lower layers in the stack are comparatively more sensitive to form, whereas higher layers are more sensitive to meaning.",
      author = "L.J. Gelderloos and Grzegorz Chrupala",
      year = "2016",
      language = "English",
      pages = "1309--1319",
      booktitle = "Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers",
      publisher = "International Committee on Computational Linguistics",
    }

    Harvard:
    Gelderloos, LJ & Chrupala, G 2016, From phonemes to images: levels of representation in a recurrent neural model of visually-grounded language learning. in Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers. International Committee on Computational Linguistics, pp. 1309-1319.

    Standard:
    From phonemes to images: levels of representation in a recurrent neural model of visually-grounded language learning. / Gelderloos, L.J.; Chrupala, Grzegorz.
    Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers. International Committee on Computational Linguistics, 2016. p. 1309-1319.
    Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Scientific › peer-review

    RIS:
    TY  - GEN
    T1  - From phonemes to images
    T2  - levels of representation in a recurrent neural model of visually-grounded language learning
    AU  - Gelderloos, L.J.
    AU  - Chrupala, Grzegorz
    PY  - 2016
    Y1  - 2016
    N2  - We present a model of visually-grounded language learning based on stacked gated recurrent neural networks which learns to predict visual features given an image description in the form of a sequence of phonemes. The learning task resembles that faced by human language learners who need to discover both structure and meaning from noisy and ambiguous data across modalities. We show that our model indeed learns to predict features of the visual context given phonetically transcribed image descriptions, and show that it represents linguistic information in a hierarchy of levels: lower layers in the stack are comparatively more sensitive to form, whereas higher layers are more sensitive to meaning.
    AB  - We present a model of visually-grounded language learning based on stacked gated recurrent neural networks which learns to predict visual features given an image description in the form of a sequence of phonemes. The learning task resembles that faced by human language learners who need to discover both structure and meaning from noisy and ambiguous data across modalities. We show that our model indeed learns to predict features of the visual context given phonetically transcribed image descriptions, and show that it represents linguistic information in a hierarchy of levels: lower layers in the stack are comparatively more sensitive to form, whereas higher layers are more sensitive to meaning.
    M3  - Conference contribution
    SP  - 1309
    EP  - 1319
    BT  - Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers
    PB  - International Committee on Computational Linguistics
    ER  -

    Vancouver:
    Gelderloos LJ, Chrupala G. From phonemes to images: levels of representation in a recurrent neural model of visually-grounded language learning. In Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers. International Committee on Computational Linguistics. 2016. p. 1309-1319