
Investigating Fairness with FanFAIR: is Pre-processing Useful Only for Performances?

  • Michele Rispoli
  • Marco S. Nobile
  • Luca Manzoni
  • Alberto D'Onofrio
  • Marco Confalonieri
  • Francesco Salton
  • Paola Confalonieri
  • Barbara Ruaro
  • Chiara Gallese
  • IEEE

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Scientific › peer-review

Abstract

Artificial Intelligence (AI) and Machine Learning (ML) systems are becoming pervasive in our society, from industry to public administration. AI can often provide an efficient means of supporting decision-making, but it can pose a danger in high-risk applications such as biomedicine and healthcare. In particular, biased datasets might lead to inaccurate or discriminatory ML systems, undermining the accuracy of their predictions and putting patients' health at risk. FanFAIR is a Python tool that provides the community with a semi-automatic means of assessing dataset fairness. FanFAIR is designed to integrate qualitative considerations (such as ethics, human rights assessment, and data protection) with quantitative indicators of dataset fairness, such as balance, the presence of invalid entries, and outliers. In this work, we extend FanFAIR to handle categorical data and introduce a new algorithm for outlier detection in the presence of missing values. We then present a case study on data collected from COVID patients admitted to pneumology departments in Italy. We show how the successive steps of data cleaning and variable selection improve the indicators provided by FanFAIR. This shows that data cleaning procedures are not only necessary to improve the performance of the machine learning algorithms trained on the data, but are also a way to improve (a measure of) fairness. Hence, the case study provides an example in which performance and fairness are not in conflict, as is commonly believed, but improve together.
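The abstract names balance, invalid entries, and outliers as quantitative indicators. However, it does not give FanFAIR's formulas or describe its new outlier-detection algorithm. The snippet below is therefore only a hypothetical sketch of what such indicators might look like, using the Python standard library. It makes three assumptions. Balance is measured as normalized Shannon entropy. Invalid entries are measured as the share of missing cells, where `None` or NaN is taken to mean missing. Outliers are detected with a simple per-column interquartile-range (IQR) rule that skips missing values. The function names `balance`, `invalid_fraction`, and `iqr_outliers` are illustrative and are not FanFAIR's API.

```python
import math
import statistics
from collections import Counter
from typing import Hashable, List, Optional, Sequence

# Hypothetical sketch: these are NOT FanFAIR's formulas or its outlier
# algorithm, only simple stand-ins for the indicators named in the abstract.


def is_missing(value: object) -> bool:
    """Treat None and float NaN as missing (assumed encoding of missing data)."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def balance(labels: Sequence[Hashable]) -> float:
    """Normalized Shannon entropy of a categorical column: 1.0 means all
    observed categories are equally frequent. Missing values are ignored."""
    observed = [x for x in labels if not is_missing(x)]
    counts = Counter(observed)
    if len(counts) <= 1:
        return 0.0  # assumption: zero or one category counts as fully unbalanced
    n = len(observed)
    entropy = -sum((c / n) * math.log(c / n) for c in counts.values())
    return entropy / math.log(len(counts))


def invalid_fraction(column: Sequence[object]) -> float:
    """Share of cells that are missing (None or NaN)."""
    if not column:
        return 0.0
    return sum(is_missing(v) for v in column) / len(column)


def iqr_outliers(column: Sequence[Optional[float]], k: float = 1.5) -> List[int]:
    """Indices of values outside [Q1 - k*IQR, Q3 + k*IQR], computed on the
    observed (non-missing) values only; missing cells are never flagged."""
    observed = [(i, v) for i, v in enumerate(column) if not is_missing(v)]
    if len(observed) < 4:
        return []  # too few observations for meaningful quartiles
    q1, _, q3 = statistics.quantiles(
        [v for _, v in observed], n=4, method="inclusive"
    )
    spread = q3 - q1
    low, high = q1 - k * spread, q3 + k * spread
    return [i for i, v in observed if v < low or v > high]


if __name__ == "__main__":
    # Toy values for illustration only, not taken from the paper's dataset.
    sex = ["F", "M", "F", None, "F"]
    age = [61.0, 58.0, None, 64.0, 70.0, float("nan"), 67.0, 59.0, 63.0, 62.0, 140.0]
    print(round(balance(sex), 3), invalid_fraction(age), iqr_outliers(age))
```

Skipping missing values, rather than imputing them, keeps the returned indices aligned with the original rows. This means flagged values can be traced back to individual records. A multivariate method, or the algorithm actually proposed in the paper, may flag different rows.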
Original language: English
Title of host publication: 2025 IEEE Symposium on Computational Intelligence in Health and Medicine, CIHM
Publisher: IEEE
Number of pages: 7
ISBN (Electronic): 979-8-3315-0833-3
ISBN (Print): 979-8-3315-0834-0
Publication status: Published - 2025
Event: 2025 Symposium on Computational Intelligence in Health and Medicine (CIHM), Trondheim, Norway
Duration: 17 Mar 2025 to 20 Mar 2025

Conference

Conference: 2025 Symposium on Computational Intelligence in Health and Medicine (CIHM)
Country/Territory: Norway
City: Trondheim
Period: 17/03/25 to 20/03/25

Keywords

  • Data cleaning
  • Dataset assessment
  • Debiasing
  • Fairness
  • Preprocessing
  • Sensitive attributes
