How Do Image Description Systems Describe People? A Targeted Assessment of System Competence in the PEOPLE-domain

    Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Scientific › peer-review

    Abstract

    Evaluations of image description systems are typically domain-general: generated descriptions for the held-out test images are either compared to a set of reference descriptions (using automated metrics), or rated by human judges on one or more Likert scales (for fluency, overall quality, and other quality criteria). While useful, these evaluations do not tell us anything about the kinds of image descriptions that systems are able to produce. Or, phrased differently, these evaluations do not tell us anything about the cognitive capabilities of image description systems. This paper proposes a different kind of assessment, one that quantifies the extent to which these systems are able to describe humans. This assessment is based on a manual characterisation (a context-free grammar) of English entity labels in the PEOPLE domain, to determine the range of possible outputs. We examined 9 systems to see what kinds of labels they actually use. We found that these systems use only a small subset of at most 13 different kinds of modifiers (e.g. tall and short modify HEIGHT, sad and happy modify MOOD), while 27 kinds of modifiers are never used. Future research could study these semantic dimensions in more detail.
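As a rough illustration of the kind of assessment the abstract describes, the sketch below parses entity labels against a toy grammar of the form MODIFIER* HEAD_NOUN and reports which modifier categories a set of labels uses and which it never uses. It is not the authors' grammar or code: only the four modifiers (tall, short, sad, happy) and their categories (HEIGHT, MOOD) come from the abstract, while the head nouns, the function names `parse_label` and `modifier_coverage`, and the grammar shape are assumptions made for this example.

```python
# Toy lexicon. Only these four modifiers and their two categories come from
# the abstract; the head nouns below are hypothetical placeholders.
MODIFIERS = {"tall": "HEIGHT", "short": "HEIGHT", "sad": "MOOD", "happy": "MOOD"}
HEAD_NOUNS = {"man", "woman", "person", "child"}


def parse_label(label):
    """Parse a label of the assumed form MODIFIER* HEAD_NOUN.

    Returns (list of modifier categories, head noun), or None if the label
    is not covered by this toy grammar.
    """
    tokens = label.lower().split()
    if not tokens or tokens[-1] not in HEAD_NOUNS:
        return None
    categories = []
    for token in tokens[:-1]:
        if token not in MODIFIERS:
            return None
        categories.append(MODIFIERS[token])
    return categories, tokens[-1]


def modifier_coverage(labels):
    """Return (categories used, categories never used) across `labels`."""
    used = set()
    for label in labels:
        parsed = parse_label(label)
        if parsed is not None:
            used.update(parsed[0])
    return used, set(MODIFIERS.values()) - used
```

Applied to the labels extracted from a system's generated descriptions, `modifier_coverage` would give a small-scale analogue of the used versus never-used modifier counts reported in the abstract.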
    Original language: English
    Title of host publication: Proceedings of the Second Workshop on Beyond Vision and LANguage: inTEgrating Real-world kNowledge (LANTERN)
    Place of Publication: Barcelona, Spain
    Publisher: Association for Computational Linguistics
    Pages: 30-36
    Number of pages: 7
    Publication status: Published - 1 Dec 2020
    Event: Workshop on Beyond Vision and LANguage: inTEgrating Real-world kNowledge - online, Saarbrucken, Germany
    Duration: 1 Dec 2020 → …
    Conference number: 2
    https://www.lantern.uni-saarland.de/2020/

    Workshop

    Workshop: Workshop on Beyond Vision and LANguage: inTEgrating Real-world kNowledge
    Abbreviated title: LANTERN
    Country/Territory: Germany
    City: Saarbrucken
    Period: 1/12/20 → …
