Learning language through pictures

    Research output: Chapter in Book/Report/Conference proceeding > Conference contribution > Scientific > peer-review

    Abstract

    We propose Imaginet, a model of learning visually grounded representations of language from coupled textual and visual input. The model consists of two Gated Recurrent Unit networks with shared word embeddings, and uses a multi-task objective by receiving a textual description of a scene and trying to concurrently predict its visual representation and the next word in the sentence. Like humans, it acquires meaning representations for individual words from descriptions of visual scenes. Moreover, it learns to effectively use sequential structure in semantic interpretation of multi-word phrases.
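The architecture described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the toy dimensions, the random untrained weights, the choice of mean squared error for the visual task and cross-entropy for next-word prediction, and the mixing weight `alpha` are all assumptions made for illustration. Only the overall shape (two GRU pathways reading the same shared word embeddings, trained on a combined objective) comes from the abstract.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy sizes, all hypothetical: vocabulary, embedding, hidden state, image features.
VOCAB, EMB, HID, IMG = 10, 8, 16, 12


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def init_gru(input_dim, hidden_dim):
    """Random parameters for one GRU: update gate z, reset gate r, candidate h."""
    def mat(rows, cols):
        return rng.normal(0.0, 0.1, size=(rows, cols))
    return {
        "Wz": mat(input_dim, hidden_dim), "Uz": mat(hidden_dim, hidden_dim),
        "Wr": mat(input_dim, hidden_dim), "Ur": mat(hidden_dim, hidden_dim),
        "Wh": mat(input_dim, hidden_dim), "Uh": mat(hidden_dim, hidden_dim),
    }


def gru_states(params, inputs):
    """Run a GRU over a (T, input_dim) sequence and return all (T, hidden_dim) states."""
    h = np.zeros(params["Uz"].shape[0])
    states = []
    for x in inputs:
        z = sigmoid(x @ params["Wz"] + h @ params["Uz"])              # update gate
        r = sigmoid(x @ params["Wr"] + h @ params["Ur"])              # reset gate
        h_tilde = np.tanh(x @ params["Wh"] + (r * h) @ params["Uh"])  # candidate state
        h = (1.0 - z) * h + z * h_tilde
        states.append(h)
    return np.stack(states)


# One embedding table shared by both pathways, as in the abstract.
embeddings = rng.normal(0.0, 0.1, size=(VOCAB, EMB))
visual_gru = init_gru(EMB, HID)
textual_gru = init_gru(EMB, HID)
W_img = rng.normal(0.0, 0.1, size=(HID, IMG))     # final state -> image features
W_word = rng.normal(0.0, 0.1, size=(HID, VOCAB))  # state -> next-word logits


def multitask_loss(word_ids, image_vec, alpha=0.5):
    """Combined loss for one (sentence, image) pair; needs at least two words."""
    word_ids = np.asarray(word_ids)
    x = embeddings[word_ids]  # (T, EMB)

    # Visual pathway: predict the image feature vector from the final state.
    img_pred = gru_states(visual_gru, x)[-1] @ W_img
    loss_visual = np.mean((img_pred - image_vec) ** 2)

    # Textual pathway: at each position except the last, predict the next word.
    logits = gru_states(textual_gru, x[:-1]) @ W_word  # (T-1, VOCAB)
    logits = logits - logits.max(axis=1, keepdims=True)
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    targets = word_ids[1:]
    loss_textual = -np.mean(log_probs[np.arange(len(targets)), targets])

    total = alpha * loss_textual + (1.0 - alpha) * loss_visual
    return total, loss_textual, loss_visual


# Usage with made-up data: a four-word sentence and a placeholder image vector.
total, loss_t, loss_v = multitask_loss([1, 4, 2, 7], np.ones(IMG), alpha=0.3)
```

In this sketch, gradients from both losses would flow into the same `embeddings` table during training, which is one way to read "shared word embeddings" in the abstract. Training itself (gradient computation and parameter updates) is left out.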
    Original language: English
    Title of host publication: Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 2: Short Papers)
    Editors: Chengqing Zong, Michael Strube
    Place of Publication: Beijing, China
    Publisher: Association for Computational Linguistics
    Pages: 112-118
    Number of pages: 6
    ISBN (Electronic): 9781941643730
    Publication status: Published - 2015
    Event: 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing - Beijing, China
    Duration: 26 Jul 2015 to 31 Jul 2015

    Conference

    Conference: 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing
    Country: China
    City: Beijing
    Period: 26/07/15 to 31/07/15

    Cite this

    Chrupala, G., Kadar, A., & Alishahi, A. (2015). Learning language through pictures. In C. Zong, & M. Strube (Eds.), Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 2: Short Papers) (pp. 112-118). Beijing, China: Association for Computational Linguistics.
    @inproceedings{b9a427df6fc7416b94b0df38706e998c,
    title = "Learning language through pictures",
    abstract = "We propose Imaginet, a model of learning visually grounded representations of language from coupled textual and visual input. The model consists of two Gated Recurrent Unit networks with shared word embeddings, and uses a multi-task objective by receiving a textual description of a scene and trying to concurrently predict its visual representation and the next word in the sentence. Like humans, it acquires meaning representations for individual words from descriptions of visual scenes. Moreover, it learns to effectively use sequential structure in semantic interpretation of multi-word phrases.",
    author = "Grzegorz Chrupala and Akos Kadar and Afra Alishahi",
    year = "2015",
    language = "English",
    pages = "112--118",
    editor = "Chengqing Zong and Michael Strube",
    booktitle = "Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 2: Short Papers)",
    publisher = "Association for Computational Linguistics",

    }
