Query-based summarization of discussion threads

Suzan Verberne, Emiel Krahmer, Sander Wubben, Antal van den Bosch

Research output: Contribution to journal › Article › Scientific › peer-review


In this paper, we address query-based summarization of discussion threads. New users can profit from the information shared in the forum, if they can find the previously posted information. However, discussion threads on a single topic can easily comprise dozens or hundreds of individual posts. Our aim is to summarize forum threads given real web search queries. We created a data set with search queries from a discussion forum's search engine log and the discussion threads that were clicked by the user who entered the query. For 120 thread-query combinations, a reference summary was made by five different human raters. We compared two methods for automatic summarization of the threads: a query-independent method based on post features, and Maximum Marginal Relevance (MMR), a method that takes the query into account. We also compared four different word embedding representations as alternatives to standard word vectors in extractive summarization. We find (1) that the agreement between human summarizers does not improve when a query is provided; (2) that the query-independent post features as well as a centroid-based baseline outperform MMR by a large margin; (3) that combining the post features with query similarity gives a small improvement over the use of post features alone; and (4) that for the word embeddings, a match in domain appears to be more important than corpus size and dimensionality. However, the differences between the models were not reflected by differences in quality of the summaries created with the help of these models. We conclude that query-based summarization with web queries is challenging because the queries are short, and a click on a result is not a direct indicator of the relevance of the result.
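
For readers unfamiliar with MMR, the following is a minimal sketch of the general technique, not the authors' implementation. It assumes whitespace tokenization, bag-of-words cosine similarity, and a trade-off weight `lam`; the function names (`bow`, `cosine`, `mmr_select`) and the default values are hypothetical choices made for illustration. Each step picks the post that is most similar to the query while being least similar to the posts already selected.

```python
import math
from collections import Counter


def bow(text):
    """Bag-of-words term counts (assumption: lowercase, whitespace tokens)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def mmr_select(posts, query, k=2, lam=0.7):
    """Return indices of up to k posts chosen greedily by MMR.

    score(i) = lam * sim(post_i, query)
               - (1 - lam) * max sim(post_i, already selected post)
    """
    q = bow(query)
    vecs = [bow(p) for p in posts]
    selected = []
    candidates = list(range(len(posts)))
    while candidates and len(selected) < k:
        def score(i):
            relevance = cosine(vecs[i], q)
            redundancy = max(
                (cosine(vecs[i], vecs[j]) for j in selected), default=0.0
            )
            return lam * relevance - (1 - lam) * redundancy

        best = max(candidates, key=score)  # ties go to the earliest post
        selected.append(best)
        candidates.remove(best)
    return selected
```

With `lam=1.0` the redundancy term vanishes and the selection reduces to ranking posts by query similarity; lower values trade relevance for diversity among the selected posts.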

Original language: English
Article number: 1351324919000123
Pages (from-to): 3-29
Number of pages: 27
Journal: Natural Language Engineering
Issue number: 1
Publication status: Published - Jan 2020


  • discussion forums
  • evaluation
  • query-based summarization
  • reference summaries
  • word embeddings

