Abstract
Entity Resolution (ER) lies at the core of data integration, with the bulk of research focusing on its effectiveness and its time efficiency. Most earlier works were designed to address Veracity over structured (relational) data. They typically rely on schema, expert and external knowledge to maximize accuracy. Some of these methods have recently been extended to process large volumes of data through massive parallelization techniques, such as the MapReduce paradigm. With the advent of Big Web Data, the focus has moved towards Variety, aiming to handle semi-structured data collections with noisy and highly heterogeneous information. Relevant works adopt a novel, loosely schema-aware functionality that emphasizes scalability and robustness to noise. Another line of current research focuses on Velocity, i.e., processing data collections of a continuously increasing volume.
In this tutorial, we present the ER generations by discussing past, present, and yet-to-come mechanisms. For each generation, we outline the corresponding ER workflow along with the state-of-the-art methods per workflow step. Thus, we provide the participants with a deep understanding of the broad field of ER, highlighting the recent advances in crowd-sourcing and deep learning applications in this active research domain. We also equip them with practical skills in applying ER workflows through a hands-on session that involves our publicly available ER toolbox and data.
Original language | English |
---|---|
Publication status | Published - 2020 |
Event | EDBT/ICDT 2020 Joint Conference - Copenhagen, Denmark Duration: 20 Mar 2020 → 2 Apr 2020 https://diku-dk.github.io/edbticdt2020/ |
Conference
Conference | EDBT/ICDT 2020 Joint Conference |
---|---|
Country/Territory | Denmark |
City | Copenhagen |
Period | 20/03/20 → 2/04/20 |
Internet address | https://diku-dk.github.io/edbticdt2020/ |
Title | Entity resolution: Past, present and yet-to-come: From structured to heterogeneous, to crowd-sourced, to deep learned |