How Do Image Description Systems Describe People? A Targeted Assessment of System Competence in the PEOPLE-domain

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Scientific › peer-review

Abstract

Evaluations of image description systems are typically domain-general: generated descriptions for the held-out test images are either compared to a set of reference descriptions (using automated metrics), or rated by human judges on one or more Likert scales (for fluency, overall quality, and other quality criteria). While useful, these evaluations do not tell us anything about the kinds of image descriptions that systems are able to produce. Or, phrased differently, these evaluations do not tell us anything about the cognitive capabilities of image description systems. This paper proposes a different kind of assessment that is able to quantify the extent to which these systems are able to describe humans. This assessment is based on a manual characterisation (a context-free grammar) of English entity labels in the PEOPLE domain, to determine the range of possible outputs. We examined 9 systems to see what kinds of labels they actually use. We found that these systems only use a small subset of at most 13 different kinds of modifiers (e.g. tall and short modify HEIGHT, sad and happy modify MOOD), but 27 kinds of modifiers are never used. Future research could study these semantic dimensions in more detail.
Original language: English
Title of host publication: Proceedings of the Second Workshop on Beyond Vision and LANguage: inTEgrating Real-world kNowledge (LANTERN)
Place of Publication: Barcelona, Spain
Publisher: Association for Computational Linguistics
Pages: 30-36
Number of pages: 7
Publication status: Published - 1 Dec 2020
Event: Workshop on Beyond Vision and LANguage: inTEgrating Real-world kNowledge - online, Saarbrucken, Germany
Duration: 1 Dec 2020 → …
Conference number: 2
https://www.lantern.uni-saarland.de/2020/

Workshop

Workshop: Workshop on Beyond Vision and LANguage: inTEgrating Real-world kNowledge
Abbreviated title: LANTERN
Country: Germany
City: Saarbrucken
Period: 1/12/20 → …
