Modeling relations in a referential game

    Research output: Contribution to conference › Abstract › Other research output

    Abstract

    Grounding language in the physical world enables humans to use words and sentences in context and to link them to actions. Several recent computer vision studies have addressed the task of expression grounding: learning to select the part of an image that depicts the referent of a multi-word expression. The task is approached by jointly processing the language expression, visual information about individual candidate referents, and in some cases the general visual context, using neural models that combine recurrent and convolutional components (Rohrbach et al., 2016; Hu et al., 2016a,b). However, more than the intended referent by itself determines how a referring expression is phrased: when referring to an element of a scene, speakers take its relations with and contrasts to other elements into account in order to produce an expression that uniquely identifies the intended referent. Inspired by recent work on visual question answering using Relation Networks (Santoro et al., 2017), we build and evaluate models of expression grounding that take into account interactions between elements of the visual scene. We provide an analysis of the performance and the relational representations learned in this setting.
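    The abstract does not include code, but the Relation Network formulation it builds on (Santoro et al., 2017) can be sketched: RN(O) = f(Σ over pairs (i, j) of g(o_i, o_j, q)), where g scores pairwise object relations conditioned on a query embedding q and f reads out a prediction from the aggregated relations. Below is a minimal, untrained sketch adapted to scoring candidate referents for a referring expression; all dimensions, weight names, and the use of random-weight MLPs for g and f are illustrative assumptions, not the authors' model.

    ```python
    import numpy as np

    rng = np.random.default_rng(0)
    d_obj, d_expr, d_hidden, n_candidates = 8, 16, 32, 5

    # Toy scene: feature vectors for 5 candidate referents,
    # plus one embedding of the referring expression.
    objects = rng.normal(size=(n_candidates, d_obj))
    expression = rng.normal(size=d_expr)

    # g and f are small two-layer ReLU MLPs with random weights,
    # standing in for trained networks.
    W_g1 = rng.normal(size=(2 * d_obj + d_expr, d_hidden))
    W_g2 = rng.normal(size=(d_hidden, d_hidden))
    W_f1 = rng.normal(size=(d_hidden, d_hidden))
    W_f2 = rng.normal(size=(d_hidden, n_candidates))

    def g(pair_and_query):
        """Relation function over one (o_i, o_j, expression) triple."""
        return np.maximum(pair_and_query @ W_g1, 0.0) @ W_g2

    def f(aggregated):
        """Readout: aggregated relations -> one score per candidate."""
        return np.maximum(aggregated @ W_f1, 0.0) @ W_f2

    def relation_network(objects, expression):
        # Sum g over all ordered object pairs, conditioned on the expression.
        total = np.zeros(d_hidden)
        for i in range(len(objects)):
            for j in range(len(objects)):
                total += g(np.concatenate([objects[i], objects[j], expression]))
        return f(total)

    scores = relation_network(objects, expression)
    predicted = int(np.argmax(scores))  # index of the chosen referent
    ```

    The key design choice is that the pairwise sum makes the model invariant to the order of scene elements, while still letting every relation between candidate referents influence the final choice.
    
    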
    Original language: English
    Publication status: Published - 2017
    Event: Computational Linguistics in the Netherlands - Nijmegen, Netherlands
    Duration: 26 Jan 2018 → …



    Cite this

    Gelderloos, L., Alishahi, A., Chrupala, G., & Fernández, R. (2017). Modeling relations in a referential game. Abstract from Computational Linguistics in the Netherlands, Nijmegen, Netherlands.