Data quality measures based on granular computing for multi-label classification

Marilyn Bello*, Gonzalo Nápoles, Koen Vanhoof, Rafael Bello

*Corresponding author for this work

    Research output: Contribution to journal, Article, Scientific, peer-review

    27 Citations (Scopus)

    Abstract

    Rough set theory is a granular computing formalism that allows analyzing a given dataset through well-defined measures. Some of these measures aim to characterize datasets used to discover knowledge, mostly in traditional classification problems. Measuring the data quality is pivotal to estimate beforehand the problem’s difficulty since a classification model’s accuracy heavily depends on the data quality. However, to the best of our knowledge, there are no measures devoted to analyzing the quality of multi-label datasets. In this paper, we propose six data quality measures for multi-label problems, which are based on different granular approaches. Some of these measures redefine the decision class concept, while others redefine the consistency concept. Moreover, we study the impact of the similarity threshold parameters and the distance functions on the behavior of these measures. The numerical simulations show a statistical correlation between the measures that redefine the consistency concept and the performance of the ML-kNN classifier.
    Original language: English
    Pages (from-to): 51-67
    Number of pages: 17
    Journal: Information Sciences
    Volume: 560
    Publication status: Published - Jun 2021

    Keywords

    • Multi-label classification
    • Granular computing
    • Rough set theory
    • Data quality measures
