Abstract
Entity Resolution is a core data integration task that has attracted a large body of work on improving its effectiveness and time efficiency. This tutorial provides a comprehensive overview of the field, dividing the relevant methods into five main generations. The first generation targets Veracity in the context of structured data with a clean schema. The second generation extends the focus to Volume as well, leveraging multi-core or massive parallelization to process large-scale datasets. The third generation addresses the additional challenge of Variety, targeting voluminous, noisy, semi-structured, and highly heterogeneous data from the Semantic Web. The fourth generation also tackles Velocity, so as to process data collections of continuously increasing volume. The most recent works belong to the fifth generation, which uses pre-trained (large) language models that rely heavily on external knowledge to address all four Vs with high effectiveness.
Original language | English |
---|---|
Title of host publication | International Conference on Web Engineering |
Subtitle of host publication | ICWE 2024 |
Publisher | Springer |
Publication status | Published - 2024 |
Event | 24th International Conference on Web Engineering - Tampere, Finland. Duration: 17 Jun 2024 → 20 Jun 2024. https://icwe2024.webengineering.org/ |
Conference
Conference | 24th International Conference on Web Engineering |
---|---|
Abbreviated title | ICWE 2024 |
Country/Territory | Finland |
City | Tampere |
Period | 17/06/24 → 20/06/24 |
Internet address | https://icwe2024.webengineering.org/ |
Keywords
- Entity Resolution
- Data Integration
- LLMs