Entity resolution: Past, present and yet-to-come: From structured to heterogeneous, to crowd-sourced, to deep learned

George Papadakis, Ekaterini Ioannou, Themis Palpanas

Research output: Contribution to conference › Paper › Scientific › peer-review

Abstract

Entity Resolution (ER) lies at the core of data integration, with the bulk of research focusing on its effectiveness and time efficiency. Most past works were crafted to address Veracity over structured (relational) data. They typically rely on schema, expert and external knowledge to maximize accuracy. Some of these methods have recently been extended to process large volumes of data through massive parallelization techniques, such as the MapReduce paradigm. With the advent of Big Web Data, the focus has moved towards Variety, aiming to handle semi-structured data collections with noisy and highly heterogeneous information. Relevant works adopt a novel, loosely schema-aware functionality that emphasizes scalability and robustness to noise. Another line of current research focuses on Velocity, i.e., processing data collections of continuously increasing volume.

In this tutorial, we present the ER generations by discussing past, present, and yet-to-come mechanisms. For each generation, we outline the corresponding ER workflow along with the state-of-the-art methods per workflow step. Thus, we provide the participants with a deep understanding of the broad field of ER, highlighting the recent advances in crowd-sourcing and deep learning applications in this active research domain. We also equip them with practical skills in applying ER workflows through a hands-on session that involves our publicly available ER toolbox and data.
Original language: English
Publication status: Published - 2020
Event: EDBT/ICDT 2020 Joint Conference - Copenhagen, Denmark
Duration: 20 Mar 2020 → 2 Apr 2020
https://diku-dk.github.io/edbticdt2020/

Conference

Conference: EDBT/ICDT 2020 Joint Conference
Country: Denmark
City: Copenhagen
Period: 20/03/20 → 2/04/20

  • Projects

    4gER

    Papadakis, G., Ioannou, E. & Palpanas, T.

    1/01/20 → …

    Project: Research project

    Cite this

    Papadakis, G., Ioannou, E., & Palpanas, T. (2020). Entity resolution: Past, present and yet-to-come: From structured to heterogeneous, to crowd-sourced, to deep learned. Paper presented at EDBT/ICDT 2020 Joint Conference, Copenhagen, Denmark.