EMBench++: Benchmark data for thorough evaluation of matching-related methods

Ekaterini Ioannou*, Yannis Velegrakis

*Corresponding author for this work

Research output: Contribution to journal › Article › Scientific › peer-review

Abstract

Matching-related methods, i.e., entity resolution, entity search, or detecting evolution of entities, are essential parts in a variety of applications. The specific research area contains a plethora of methods focusing on efficiently and effectively detecting whether two different pieces of information describe the same real world object or, in the case of entity search and evolution, retrieving the entities of a given collection that best match the user’s description. A primary limitation of the particular research area is the lack of a widely accepted benchmark for performing extensive experimental evaluation of the proposed methods, including not only the accuracy of results but also scalability as well as performance given different data characteristics.

This paper introduces EMBench++, a principled system that can be used for generating benchmark data for the extensive evaluation of matching-related methods. Our tool is a continuation of a previous system, with the primary contributions including: modifiers that consider not only individual entity types but all available types according to the overall schema; techniques supporting the evolution of entities; and mechanisms for controlling the generation of not single data sets but collections of data sets. We also illustrate collections of entity sets generated by EMBench++ and discuss the benefits of using our system through the results of an experimental evaluation.
Original language: English
Pages (from-to): 435-450
Journal: Semantic web
Volume: 10
Issue number: 2
DOI: 10.3233/SW-180331
Publication status: Published - 2019
Externally published: Yes

Keywords

  • data integration
  • matching-related methods
  • benchmarking data
  • benchmark tool
  • entity resolution
  • blocking
  • WEB

Cite this

@article{c79abdc72f644fb585a6efdd6513d48f,
title = "EMBench++: Benchmark data for thorough evaluation of matching-related methods",
abstract = "Matching-related methods, i.e., entity resolution, entity search, or detecting evolution of entities, are essential parts in a variety of applications. The specific research area contains a plethora of methods focusing on efficiently and effectively detecting whether two different pieces of information describe the same real world object or, in the case of entity search and evolution, retrieving the entities of a given collection that best match the user’s description. A primary limitation of the particular research area is the lack of a widely accepted benchmark for performing extensive experimental evaluation of the proposed methods, including not only the accuracy of results but also scalability as well as performance given different data characteristics. This paper introduces EMBench++, a principled system that can be used for generating benchmark data for the extensive evaluation of matching-related methods. Our tool is a continuation of a previous system, with the primary contributions including: modifiers that consider not only individual entity types but all available types according to the overall schema; techniques supporting the evolution of entities; and mechanisms for controlling the generation of not single data sets but collections of data sets. We also illustrate collections of entity sets generated by EMBench++ and discuss the benefits of using our system through the results of an experimental evaluation.",
keywords = "data integration, matching-related methods, benchmarking data, benchmark tool, entity resolution, blocking, WEB",
author = "Ekaterini Ioannou and Yannis Velegrakis",
year = "2019",
doi = "10.3233/SW-180331",
language = "English",
volume = "10",
pages = "435--450",
journal = "Semantic web",
issn = "1570-0844",
publisher = "IOS Press",
number = "2",
}

EMBench++: Benchmark data for thorough evaluation of matching-related methods. / Ioannou, Ekaterini; Velegrakis, Yannis.

In: Semantic web, Vol. 10, No. 2, 2019, p. 435-450.

TY - JOUR

T1 - EMBench++

T2 - Benchmark data for thorough evaluation of matching-related methods

AU - Ioannou, Ekaterini

AU - Velegrakis, Yannis

PY - 2019

Y1 - 2019

N2 - Matching-related methods, i.e., entity resolution, entity search, or detecting evolution of entities, are essential parts in a variety of applications. The specific research area contains a plethora of methods focusing on efficiently and effectively detecting whether two different pieces of information describe the same real world object or, in the case of entity search and evolution, retrieving the entities of a given collection that best match the user’s description. A primary limitation of the particular research area is the lack of a widely accepted benchmark for performing extensive experimental evaluation of the proposed methods, including not only the accuracy of results but also scalability as well as performance given different data characteristics. This paper introduces EMBench++, a principled system that can be used for generating benchmark data for the extensive evaluation of matching-related methods. Our tool is a continuation of a previous system, with the primary contributions including: modifiers that consider not only individual entity types but all available types according to the overall schema; techniques supporting the evolution of entities; and mechanisms for controlling the generation of not single data sets but collections of data sets. We also illustrate collections of entity sets generated by EMBench++ and discuss the benefits of using our system through the results of an experimental evaluation.

AB - Matching-related methods, i.e., entity resolution, entity search, or detecting evolution of entities, are essential parts in a variety of applications. The specific research area contains a plethora of methods focusing on efficiently and effectively detecting whether two different pieces of information describe the same real world object or, in the case of entity search and evolution, retrieving the entities of a given collection that best match the user’s description. A primary limitation of the particular research area is the lack of a widely accepted benchmark for performing extensive experimental evaluation of the proposed methods, including not only the accuracy of results but also scalability as well as performance given different data characteristics. This paper introduces EMBench++, a principled system that can be used for generating benchmark data for the extensive evaluation of matching-related methods. Our tool is a continuation of a previous system, with the primary contributions including: modifiers that consider not only individual entity types but all available types according to the overall schema; techniques supporting the evolution of entities; and mechanisms for controlling the generation of not single data sets but collections of data sets. We also illustrate collections of entity sets generated by EMBench++ and discuss the benefits of using our system through the results of an experimental evaluation.

KW - data integration

KW - matching-related methods

KW - benchmarking data

KW - benchmark tool

KW - entity resolution

KW - blocking

KW - WEB

U2 - 10.3233/SW-180331

DO - 10.3233/SW-180331

M3 - Article

VL - 10

SP - 435

EP - 450

JO - Semantic web

JF - Semantic web

SN - 1570-0844

IS - 2

ER -