EMBench++: Benchmark data for thorough evaluation of matching-related methods

Ekaterini Ioannou*, Yannis Velegrakis

*Corresponding author for this work

Research output: Contribution to journal › Article › Scientific › peer-review

Abstract

Matching-related methods, i.e., entity resolution, entity search, and detection of entity evolution, are essential parts of a variety of applications. This research area contains a plethora of methods that focus on efficiently and effectively detecting whether two different pieces of information describe the same real-world object or, in the case of entity search and evolution, retrieving the entities of a given collection that best match the user’s description. A primary limitation of the area is the lack of a widely accepted benchmark for extensive experimental evaluation of the proposed methods, covering not only the accuracy of results but also scalability and performance under different data characteristics.

This paper introduces EMBench++, a principled system that can be used for generating benchmark data for the extensive evaluation of matching-related methods. Our tool is a continuation of a previous system, with the primary contributions including: modifiers that consider not only individual entity types but all available types according to the overall schema; techniques supporting the evolution of entities; and mechanisms for controlling the generation of not single data sets but collections of data sets. We also illustrate collections of entity sets generated by EMBench++ and discuss the benefits of using our system through the results of an experimental evaluation.
Original language: English
Pages (from-to): 435-450
Journal: Semantic Web
Volume: 10
Issue number: 2
Publication status: Published - 2019
Externally published: Yes

Keywords

  • data integration
  • matching-related methods
  • benchmarking data
  • benchmark tool
  • entity resolution
  • blocking
  • WEB
