The five generations of entity resolution on web data

Nikoletos Konstantinos, Ekaterini Ioannou, George Papadakis

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Scientific › peer-review

Abstract

Entity Resolution is a core data integration task that has attracted a large body of work on improving its effectiveness and time efficiency. This tutorial provides a comprehensive overview of the field, organizing the relevant methods into five main generations. The first generation targets Veracity in the context of structured data with a clean schema. The second generation extends the focus to Volume as well, leveraging multi-core or massive parallelization to process large-scale datasets. The third generation addresses the additional challenge of Variety, targeting voluminous, noisy, semi-structured, and highly heterogeneous data from the Semantic Web. The fourth generation also tackles Velocity, so as to process data collections of continuously increasing volume. The latest works belong to the fifth generation, which involves pre-trained (large) language models that rely heavily on external knowledge to address all four Vs with high effectiveness.
Original language: English
Title of host publication: International Conference on Web Engineering
Subtitle of host publication: ICWE 2024
Publisher: Springer
Publication status: Published - 2024
Event: 24th International Conference on Web Engineering - Tampere, Finland
Duration: 17 Jun 2024 → 20 Jun 2024
https://icwe2024.webengineering.org/

Conference

Conference: 24th International Conference on Web Engineering
Abbreviated title: ICWE 2024
Country/Territory: Finland
City: Tampere
Period: 17/06/24 → 20/06/24
Keywords

  • Entity Resolution
  • Data Integration
  • LLMs
