The PhotoBook Dataset: Building Common Ground through Visually-Grounded Dialogue

Janosch Haber*, Tim Baumgärtner, Ece Takmaz, Lieke Gelderloos, Elia Bruni, Raquel Fernández

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Scientific › peer-review

Abstract

This paper introduces the PhotoBook dataset, a large-scale collection of visually-grounded, task-oriented dialogues in English designed to investigate shared dialogue history accumulating during conversation. Taking inspiration from seminal work on dialogue analysis, we propose a data-collection task formulated as a collaborative game that prompts two online participants to refer to images using both their visual context and previously established referring expressions. We provide a detailed description of the task setup and a thorough analysis of the 2,500 dialogues collected. To further illustrate the novel features of the dataset, we propose a baseline model for reference resolution which uses a simple method to take into account shared information accumulated in a reference chain. Our results show that this information is particularly important for resolving later descriptions and underline the need to develop more sophisticated models of common ground in dialogue interaction.
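The abstract refers to a baseline that resolves referring expressions by taking into account information accumulated in a reference chain. The snippet below is not the authors' implementation; it is only a minimal sketch of that general idea, assuming pre-extracted image features and a tokenised description to which earlier descriptions from the same chain have been prepended. The class name, dimensions, and toy inputs are hypothetical placeholders.

```python
# Minimal sketch (not the PhotoBook authors' model): score candidate images
# against the current referring expression, optionally prefixed with earlier
# descriptions from the same reference chain.
import torch
import torch.nn as nn


class ChainAwareResolver(nn.Module):
    def __init__(self, vocab_size, embed_dim=128, hidden_dim=256, img_dim=2048):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim, padding_idx=0)
        self.encoder = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.img_proj = nn.Linear(img_dim, hidden_dim)

    def forward(self, utterance_ids, image_feats):
        # utterance_ids: (batch, seq_len) token ids of the current description,
        #                with the chain's earlier descriptions prepended.
        # image_feats:   (batch, n_candidates, img_dim) features of the
        #                speaker's visual context (placeholder extractor).
        _, (h, _) = self.encoder(self.embed(utterance_ids))
        text = h[-1]                              # (batch, hidden_dim)
        imgs = self.img_proj(image_feats)         # (batch, n_candidates, hidden_dim)
        # Dot-product scores: higher means a better match to the description.
        return torch.bmm(imgs, text.unsqueeze(-1)).squeeze(-1)


# Toy usage with made-up sizes: 2 dialogues, 6 candidate images each.
model = ChainAwareResolver(vocab_size=5000)
tokens = torch.randint(1, 5000, (2, 20))      # chain history + current description
candidates = torch.randn(2, 6, 2048)          # e.g. pre-extracted CNN features
print(model(tokens, candidates).shape)        # torch.Size([2, 6])
```

Dropping the prepended chain history from `utterance_ids` would give a history-agnostic variant, which is one way to probe how much the accumulated common ground contributes to resolving later descriptions.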
Original language: English
Title of host publication: Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics
Publisher: Association for Computational Linguistics
Pages: 1895-1910
Publication status: Published - Jul 2019
Event: Annual Meeting of the Association for Computational Linguistics 2019 - Florence, Italy
Duration: 28 Jul 2019 - 2 Aug 2019
Conference number: 57
http://www.acl2019.org/EN/index.xhtml

Conference

Conference: Annual Meeting of the Association for Computational Linguistics 2019
Abbreviated title: ACL 2019
Country/Territory: Italy
City: Florence
Period: 28/07/19 - 2/08/19
Internet address: http://www.acl2019.org/EN/index.xhtml
