Learning language through pictures

    Research output: Chapter in Book/Report/Conference proceeding > Conference contribution > Scientific > peer-review


    Abstract

    We propose Imaginet, a model that learns visually grounded representations of language from coupled textual and visual input. The model consists of two Gated Recurrent Unit networks with shared word embeddings and is trained with a multi-task objective: given a textual description of a scene, it concurrently predicts the scene's visual representation and the next word in the sentence. Like humans, it acquires meaning representations for individual words from descriptions of visual scenes. Moreover, it learns to exploit sequential structure in the semantic interpretation of multi-word phrases.
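    As a concrete illustration of the two-pathway design the abstract describes, the sketch below shows a shared embedding layer feeding two GRUs, one predicting image features from the final hidden state and one predicting the next word at each position. This is not the authors' implementation: the choice of PyTorch, all layer sizes, the MSE image loss, and the weighting parameter alpha are assumptions made for readability.

```python
import torch
import torch.nn as nn

class Imaginet(nn.Module):
    """Minimal sketch of a two-GRU multi-task model (sizes are assumptions)."""

    def __init__(self, vocab_size, embed_dim=256, hidden_dim=512, img_dim=4096):
        super().__init__()
        # Shared word embeddings feed both pathways.
        self.embed = nn.Embedding(vocab_size, embed_dim)
        # Visual pathway: the final hidden state predicts image features.
        self.visual_gru = nn.GRU(embed_dim, hidden_dim, batch_first=True)
        self.to_image = nn.Linear(hidden_dim, img_dim)
        # Textual pathway: a language model predicting the next word.
        self.textual_gru = nn.GRU(embed_dim, hidden_dim, batch_first=True)
        self.to_vocab = nn.Linear(hidden_dim, vocab_size)

    def forward(self, tokens):
        e = self.embed(tokens)                       # (batch, seq, embed_dim)
        _, h_vis = self.visual_gru(e)                # final hidden state
        img_pred = self.to_image(h_vis.squeeze(0))   # predicted visual features
        out_txt, _ = self.textual_gru(e)
        next_word_logits = self.to_vocab(out_txt)    # logits at every position
        return img_pred, next_word_logits

if __name__ == "__main__":
    model = Imaginet(vocab_size=1000)
    tokens = torch.randint(0, 1000, (2, 7))          # toy batch of 2 captions
    img_feats = torch.randn(2, 4096)                 # e.g. CNN image features
    img_pred, logits = model(tokens)
    alpha = 0.5                                      # task weighting (assumed)
    # Multi-task objective: image-feature regression plus next-word prediction.
    loss = alpha * nn.functional.mse_loss(img_pred, img_feats) \
        + (1 - alpha) * nn.functional.cross_entropy(
            logits[:, :-1].reshape(-1, 1000), tokens[:, 1:].reshape(-1))
    print(loss.item())
```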
    Original language: English
    Title of host publication: Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 2: Short Papers)
    Editors: Chengqing Zong, Michael Strube
    Place of publication: Beijing, China
    Publisher: Association for Computational Linguistics
    Pages: 112-118
    Number of pages: 6
    ISBN (Electronic): 9781941643730
    Publication status: Published - 2015
    Event: 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing - Beijing, China
    Duration: 26 Jul 2015 - 31 Jul 2015

    Conference

    Conference: 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing
    Country: China
    City: Beijing
    Period: 26/07/15 - 31/07/15

