TY - JOUR
T1 - Explainable data mining model for hyperinsulinemia diagnostics
AU - Ranković, Nevena
AU - Rankovic, Dragica
AU - Ivanovic, Mirjana
AU - Lukić, Igor
N1 - Publisher Copyright: © 2024 The Author(s). Published by Informa UK Limited, trading as Taylor & Francis Group.
PY - 2024/3/4
Y1 - 2024/3/4
N2 - We present a data mining model for the early diagnosis of hyperinsulinemia, potentially reducing the risk of diabetes, heart disease, and other chronic conditions. The dataset, gathered from 2019 to 2022 by Serbia's Healthcare Center through an observational cross-sectional study, includes 1008 adolescents. Medical datasets are often highly imbalanced and may contain irrelevant features that hinder predictive performance. To address these challenges in medical data analysis, we propose a model employing Functional Principal Component Analysis (FPCA), which also accounts for outliers that could otherwise lead to the inclusion of irrelevant features. Unlike standard Principal Component Analysis (PCA), which is sensitive to the initial positions of cluster centers influencing the final outcome, our model integrates FPCA with K-Means clustering to improve the preprocessing stage. Additionally, we have incorporated the post-hoc explanatory method SHAP (SHapley Additive exPlanations) alongside algorithms such as Random Forest, XGBoost, and LightGBM to provide deeper insights into our model, identifying the features that contribute most to the development of hyperinsulinemia. Experimental results showed that combining FPCA with K-Means clustering enhances the accuracy of the XGBoost classifier, with the resulting model achieving an accuracy of 0.99.
AB - We present a data mining model for the early diagnosis of hyperinsulinemia, potentially reducing the risk of diabetes, heart disease, and other chronic conditions. The dataset, gathered from 2019 to 2022 by Serbia's Healthcare Center through an observational cross-sectional study, includes 1008 adolescents. Medical datasets are often highly imbalanced and may contain irrelevant features that hinder predictive performance. To address these challenges in medical data analysis, we propose a model employing Functional Principal Component Analysis (FPCA), which also accounts for outliers that could otherwise lead to the inclusion of irrelevant features. Unlike standard Principal Component Analysis (PCA), which is sensitive to the initial positions of cluster centers influencing the final outcome, our model integrates FPCA with K-Means clustering to improve the preprocessing stage. Additionally, we have incorporated the post-hoc explanatory method SHAP (SHapley Additive exPlanations) alongside algorithms such as Random Forest, XGBoost, and LightGBM to provide deeper insights into our model, identifying the features that contribute most to the development of hyperinsulinemia. Experimental results showed that combining FPCA with K-Means clustering enhances the accuracy of the XGBoost classifier, with the resulting model achieving an accuracy of 0.99.
KW - FPCA
KW - Hyperinsulinemia
KW - K-Means
KW - PCA
KW - SHAP
UR - http://www.scopus.com/inward/record.url?scp=85186619850&partnerID=8YFLogxK
U2 - 10.1080/09540091.2024.2325496
DO - 10.1080/09540091.2024.2325496
M3 - Article
SN - 0954-0091
VL - 36
JO - Connection Science: Journal of Neural Computing, Artificial Intelligence and Cognitive Research
JF - Connection Science: Journal of Neural Computing, Artificial Intelligence and Cognitive Research
IS - 1
M1 - 2325496
ER -