Entity resolution in large patent databases: An optimization approach

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Scientific › peer-review

Abstract

Entity resolution in databases focuses on detecting and merging entities that refer to the same real-world object. Collective resolution is among the most prominent mechanisms suggested to address this challenge since the resolution decisions are not made independently, but are based on the available relationships within the data. In this paper, we introduce a novel resolution approach that combines the essence of collective resolution with rules and transformations among entity attributes and values. We illustrate how the approach’s parameters are optimized based on a global optimization algorithm, i.e., simulated annealing, and explain how this optimization is performed using a small training set. The quality of the approach is verified through an extensive experimental evaluation with 40M real-world scientific entities from the Patstat database.
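The abstract describes tuning the approach's parameters with simulated annealing on a small training set. As a rough illustration only, the sketch below applies simulated annealing to a toy pairwise matcher. It is not the authors' implementation: the training pairs, the two features (a name similarity and a shared-relationship ratio), the weighted-sum decision rule, the F1 objective, and the cooling schedule are all assumptions made for this example.

```python
import math
import random

# Hypothetical labeled training pairs (not from the paper):
# (name_similarity, shared_relation_ratio, is_same_entity)
TRAINING_PAIRS = [
    (0.95, 0.80, True),
    (0.90, 0.10, True),
    (0.60, 0.90, True),
    (0.55, 0.60, True),
    (0.85, 0.00, False),
    (0.40, 0.20, False),
    (0.30, 0.70, False),
    (0.70, 0.05, False),
]


def f1_score(params, pairs):
    """F1 of an assumed matcher: match if w*name + (1-w)*relation >= threshold."""
    weight, threshold = params
    tp = fp = fn = 0
    for name_sim, rel_sim, label in pairs:
        predicted = weight * name_sim + (1.0 - weight) * rel_sim >= threshold
        if predicted and label:
            tp += 1
        elif predicted:
            fp += 1
        elif label:
            fn += 1
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def clamp(value):
    """Keep a parameter inside [0, 1]."""
    return min(1.0, max(0.0, value))


def simulated_annealing(pairs, steps=2000, start_temp=1.0, cooling=0.995, seed=0):
    """Search (weight, threshold) maximizing F1 on the training pairs."""
    rng = random.Random(seed)
    current = (0.5, 0.5)
    current_score = f1_score(current, pairs)
    best, best_score = current, current_score
    temp = start_temp
    for _ in range(steps):
        # Propose a neighbor by adding small Gaussian noise to each parameter.
        candidate = tuple(clamp(p + rng.gauss(0.0, 0.1)) for p in current)
        candidate_score = f1_score(candidate, pairs)
        delta = candidate_score - current_score
        # Always accept improvements; accept worse moves with probability
        # exp(delta / temp), which shrinks as the temperature cools.
        if delta >= 0 or rng.random() < math.exp(delta / temp):
            current, current_score = candidate, candidate_score
        if current_score > best_score:
            best, best_score = current, current_score
        temp *= cooling  # geometric cooling schedule (an assumption)
    return best, best_score


if __name__ == "__main__":
    params, score = simulated_annealing(TRAINING_PAIRS)
    print("weight=%.3f threshold=%.3f F1=%.3f" % (params[0], params[1], score))
```

Accepting occasional worse moves while the temperature is high is what lets the search escape local optima; a real resolution system would have more parameters and a far richer objective than this two-parameter toy.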
Original language: English
Title of host publication: Proceedings of the 23rd International Conference on Enterprise Information Systems (ICEIS 2021)
Editors: Joaquim Filipe, Michal Smialek, Alexander Brodsky, Slimane Hammoudi
Publisher: INSTICC Press
Pages: 148-156
Volume: 1
Edition: 23
ISBN (Print): 9789897585098
Publication status: Published - 1 May 2021
Event: 23rd International Conference on Enterprise Information Systems (ICEIS 2021)
Duration: 26 Apr 2021 to 28 Apr 2021
Conference number: 23
http://www.iceis.org/

Publication series

ISSN (Print): 2184-4992

Conference

Conference: 23rd International Conference on Enterprise Information Systems (ICEIS 2021)
Abbreviated title: ICEIS 2021
Period: 26/04/21 to 28/04/21

Keywords

  • Entity Resolution
  • Data Disambiguation
  • Data Cleaning
  • Data Integration
  • Bibliographic Databases
