Entity resolution: Past, present and yet-to-come: From structured to heterogeneous, to crowd-sourced, to deep learned

George Papadakis, Ekaterini Ioannou, Themis Palpanas

Research output: Contribution to conference › Paper › Scientific › peer-review

Abstract

Entity Resolution (ER) lies at the core of data integration, with the bulk of research focusing on its effectiveness and time efficiency. Most past works addressed Veracity over structured (relational) data. They typically rely on schema, expert, and external knowledge to maximize accuracy. Some of these methods have recently been extended to process large volumes of data through massive parallelization techniques, such as the MapReduce paradigm. With the advent of Big Web Data, the focus has moved towards Variety, aiming to handle semi-structured data collections with noisy and highly heterogeneous information. Relevant works adopt a novel, loosely schema-aware functionality that emphasizes scalability and robustness to noise. Another line of current research focuses on Velocity, i.e., processing data collections of continuously increasing volume.

In this tutorial, we present the ER generations by discussing past, present, and yet-to-come mechanisms. For each generation, we outline the corresponding ER workflow along with the state-of-the-art methods per workflow step. Thus, we provide the participants with a deep understanding of the broad field of ER, highlighting the recent advances in crowd-sourcing and deep learning applications in this active research domain. We also equip them with practical skills in applying ER workflows through a hands-on session that involves our publicly available ER toolbox and data.
Original language: English
Publication status: Published - 2020
Event: EDBT/ICDT 2020 Joint Conference - Copenhagen, Denmark
Duration: 20 Mar 2020 - 2 Apr 2020
https://diku-dk.github.io/edbticdt2020/

Conference

Conference: EDBT/ICDT 2020 Joint Conference
Country: Denmark
City: Copenhagen
Period: 20/03/20 - 2/04/20
Internet address: https://diku-dk.github.io/edbticdt2020/

Fingerprint

Data integration
Scalability
Deep learning

Cite this

Papadakis, G., Ioannou, E., & Palpanas, T. (2020). Entity resolution: Past, present and yet-to-come: From structured to heterogeneous, to crowd-sourced, to deep learned. Paper presented at EDBT/ICDT 2020 Joint Conference, Copenhagen, Denmark.
Papadakis, George; Ioannou, Ekaterini; Palpanas, Themis. / Entity resolution: Past, present and yet-to-come: From structured to heterogeneous, to crowd-sourced, to deep learned. Paper presented at EDBT/ICDT 2020 Joint Conference, Copenhagen, Denmark.
@conference{bf176f9157b24a5693bcc761a14589a3,
title = "Entity resolution: Past, present and yet-to-come: From structured to heterogeneous, to crowd-sourced, to deep learned",
abstract = "Entity Resolution (ER) lies at the core of data integration, with a bulk of research focusing on its effectiveness and its time efficiency. Most past relevant works were crafted for addressing Veracity over structured (relational) data. They typically rely on schema, expert and external knowledge to maximize accuracy. Part of these methods have been recently extended to process large volumes of data through massive parallelization techniques, such as the MapReduce paradigm. With the present advent of Big Web Data, the scope moved towards Variety, aiming to handle semi-structured data collections, with noisy and highly heterogeneous information. Relevant works adopt a novel, loosely schema-aware functionality that emphasizes scalability and robustness to noise. Another line of present research focuses on Velocity, i.e., processing data collections of a continuously increasing volume. In this tutorial, we present the ER generations by discussing past, present, and yet-to-come mechanisms. For each generation, we outline the corresponding ER workflow along with the state-of-the-art methods per workflow step. Thus, we provide the participants with a deep understanding of the broad field of ER, highlighting the recent advances in crowd-sourcing and deep learning applications in this active research domain. We also equip them with practical skills in applying ER workflows through a hands-on session that involves our publicly available ER toolbox and data.",
author = "George Papadakis and Ekaterini Ioannou and Themis Palpanas",
year = "2020",
language = "English",
note = "EDBT/ICDT 2020 Joint Conference ; Conference date: 20-03-2020 Through 02-04-2020",
url = "https://diku-dk.github.io/edbticdt2020/",

}

Papadakis, G, Ioannou, E & Palpanas, T 2020, 'Entity resolution: Past, present and yet-to-come: From structured to heterogeneous, to crowd-sourced, to deep learned', Paper presented at EDBT/ICDT 2020 Joint Conference, Copenhagen, Denmark, 20/03/20 - 2/04/20.

Entity resolution: Past, present and yet-to-come: From structured to heterogeneous, to crowd-sourced, to deep learned. / Papadakis, George; Ioannou, Ekaterini; Palpanas, Themis.

2020. Paper presented at EDBT/ICDT 2020 Joint Conference, Copenhagen, Denmark.

TY - CONF

T1 - Entity resolution: Past, present and yet-to-come

T2 - From structured to heterogeneous, to crowd-sourced, to deep learned

AU - Papadakis, George

AU - Ioannou, Ekaterini

AU - Palpanas, Themis

PY - 2020

Y1 - 2020

N2 - Entity Resolution (ER) lies at the core of data integration, with a bulk of research focusing on its effectiveness and its time efficiency. Most past relevant works were crafted for addressing Veracity over structured (relational) data. They typically rely on schema, expert and external knowledge to maximize accuracy. Part of these methods have been recently extended to process large volumes of data through massive parallelization techniques, such as the MapReduce paradigm. With the present advent of Big Web Data, the scope moved towards Variety, aiming to handle semi-structured data collections, with noisy and highly heterogeneous information. Relevant works adopt a novel, loosely schema-aware functionality that emphasizes scalability and robustness to noise. Another line of present research focuses on Velocity, i.e., processing data collections of a continuously increasing volume. In this tutorial, we present the ER generations by discussing past, present, and yet-to-come mechanisms. For each generation, we outline the corresponding ER workflow along with the state-of-the-art methods per workflow step. Thus, we provide the participants with a deep understanding of the broad field of ER, highlighting the recent advances in crowd-sourcing and deep learning applications in this active research domain. We also equip them with practical skills in applying ER workflows through a hands-on session that involves our publicly available ER toolbox and data.

UR - https://research.tilburguniversity.edu/en/projects/4ger

M3 - Paper

ER -

Papadakis G, Ioannou E, Palpanas T. Entity resolution: Past, present and yet-to-come: From structured to heterogeneous, to crowd-sourced, to deep learned. 2020. Paper presented at EDBT/ICDT 2020 Joint Conference, Copenhagen, Denmark.