RASCL: a randomised approach to subspace clusters

Sandy Moens, Boris Čule, Bart Goethals

    Research output: Contribution to journal › Article › Scientific › peer-review

    Abstract

    Subspace clustering aims to discover clusters in projections of highly dimensional numerical data. In this paper, we focus on discovering small collections of highly interesting subspace clusters that do not try to cluster all data points, leaving noisy data points unclustered. To this end, we propose a randomised method that first converts the highly dimensional database to a binarised one using projected samples of the original database. Subsequently, this database is mined for frequent itemsets, which we show can be translated back to subspace clusters. In this way, we are able to explore multiple subspaces of different sizes at the same time. In our extensive experimental analysis, we show on synthetic as well as real-world data that our method is capable of discovering highly interesting subspace clusters efficiently.
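
    The three steps named in the abstract (binarise via projected samples, mine frequent itemsets, translate itemsets back to subspace clusters) can be illustrated with a toy sketch. This is not the authors' implementation. The sampling scheme (a random centre point plus a random subset of dimensions), the `radius` neighbourhood test, the brute-force itemset enumeration, and every function name below are assumptions made purely for illustration.

```python
import random
from itertools import combinations


def binarise(data, n_samples, n_dims, radius, rng):
    """Assumed scheme: each projected sample becomes one binary item.

    A sample is a (centre, dims) pair. A row contains item i when it lies
    within `radius` of sample i's centre in every one of that sample's dims.
    """
    d = len(data[0])
    samples = []
    for _ in range(n_samples):
        centre = rng.choice(data)                            # random data point
        dims = tuple(sorted(rng.sample(range(d), n_dims)))   # random projection
        samples.append((centre, dims))
    transactions = [
        frozenset(
            i for i, (centre, dims) in enumerate(samples)
            if all(abs(row[j] - centre[j]) <= radius for j in dims)
        )
        for row in data
    ]
    return samples, transactions


def frequent_itemsets(transactions, min_support, max_size=2):
    """Brute-force enumeration (illustrative only, not an efficient miner).

    Returns {itemset: indices of the rows that contain every item in it}.
    """
    items = sorted(set().union(*transactions))
    found = {}
    for size in range(1, max_size + 1):
        for cand in combinations(items, size):
            cover = [t for t, tx in enumerate(transactions) if set(cand) <= tx]
            if len(cover) >= min_support:
                found[cand] = cover
    return found


def to_subspace_cluster(itemset, cover, samples):
    """Assumed translation: dims = union of the items' dims, points = cover."""
    dims = sorted(set().union(*(samples[i][1] for i in itemset)))
    return dims, cover


if __name__ == "__main__":
    rng = random.Random(0)
    # 30 points that are tight in dims 0 and 1 but noisy in dim 2,
    # followed by 30 points that are uniform noise in all three dims.
    data = [(0.5 + rng.uniform(-0.02, 0.02),
             0.5 + rng.uniform(-0.02, 0.02),
             rng.random()) for _ in range(30)]
    data += [(rng.random(), rng.random(), rng.random()) for _ in range(30)]

    samples, tx = binarise(data, n_samples=8, n_dims=2, radius=0.05, rng=rng)
    for itemset, cover in frequent_itemsets(tx, min_support=20).items():
        print(itemset, to_subspace_cluster(itemset, cover, samples))
```

    Because each item may cover a different projection, a single mining pass over the binarised data can surface itemsets whose combined dimensions differ in size, which is the property the abstract highlights. Rows that support no frequent itemset are simply left unclustered in this sketch.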
    Original language: English
    Pages (from-to): 243-259
    Journal: International Journal of Data Science and Analytics
    Volume: 14
    Issue number: 3
    Publication status: Published - 11 May 2022

    Keywords

    • Subspace Clusters
    • High-dimensional Data
    • Sampling
    • Maximal Itemset
