Real-world K-Anonymity applications: The KGEN approach and its evaluation in fraudulent transactions

Research output: Contribution to journal › Article › Scientific › peer-review

Abstract

K-Anonymity is a property for the measurement, management, and governance of data anonymization. Many implementations of K-anonymity have been described in the state of the art, but most of them are not practically usable over a large number of attributes in a "Big" dataset, i.e., a dataset drawing from Big Data. To address this significant shortcoming, we introduce and evaluate KGEN, an approach to K-anonymity featuring meta-heuristics, specifically Genetic Algorithms, to compute a permutation of the dataset which is both K-anonymized and still usable for further processing, e.g., for private-by-design analytics. KGEN promotes such a meta-heuristic approach since it can solve the problem by finding a pseudo-optimal solution in a reasonable time over a considerable load of input. KGEN allows the data manager to guarantee a high anonymity level while preserving usability and preventing loss of information entropy over the data. Unlike other approaches that provide optimal global solutions compatible with smaller datasets, KGEN also works properly over Big datasets while still providing a good-enough K-anonymized but still processable dataset. Evaluation results show that our approach can still work efficiently on a real-world dataset, provided by the Dutch Tax Authority, with 47 attributes (i.e., the columns of the dataset to be anonymized) and over 1.5K observations (i.e., the rows of that dataset), as well as on a dataset with 97 attributes and over 3942 observations. (c) 2023 The Authors. Published by Elsevier Ltd. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).
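
To make the property named in the abstract concrete, the following is a minimal sketch of how the K-anonymity level of a small table could be measured, and how generalizing one attribute can raise it. It is not the KGEN algorithm and uses no Genetic Algorithm; the function names, the column names (`age`, `zip`, `amount`), and the sample records are all hypothetical and chosen only for illustration.

```python
from collections import Counter


def k_anonymity_level(rows, quasi_identifiers):
    """Return the size of the smallest group of rows that share the same
    values on every quasi-identifier (0 for an empty table)."""
    if not rows:
        return 0
    groups = Counter(tuple(row[q] for q in quasi_identifiers) for row in rows)
    return min(groups.values())


def generalize_age(row, width=10):
    """Hypothetical generalization step: replace an exact age by a range."""
    low = (row["age"] // width) * width
    return {**row, "age": f"{low}-{low + width - 1}"}


# Invented sample records, for illustration only.
records = [
    {"age": 31, "zip": "5611", "amount": 120},
    {"age": 34, "zip": "5611", "amount": 80},
    {"age": 38, "zip": "5611", "amount": 950},
    {"age": 42, "zip": "5612", "amount": 60},
    {"age": 47, "zip": "5612", "amount": 300},
]
qi = ["age", "zip"]

# Every exact age is unique, so each row forms its own group.
raw_k = k_anonymity_level(records, qi)

# After generalizing ages to decades, rows fall into groups of 3 and 2.
generalized = [generalize_age(r) for r in records]
generalized_k = k_anonymity_level(generalized, qi)

print(raw_k, generalized_k)
```

The sketch also hints at the trade-off the abstract describes: coarser generalization raises K but discards information, which is why an approach over many attributes has to search for a balance rather than generalize everything.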
Original language: English
Article number: 102193
Number of pages: 13
Journal: Information Systems
Volume: 115
Early online date: Feb 2023
Publication status: Published - May 2023

Keywords

  • Big data
  • Data-intensive applications design & Operations
  • K-Anonymity
  • Privacy-by-design
  • Scalability
