Subspace K-means clustering

Marieke E. Timmerman*, Eva Ceulemans, Kim De Roover, Karla Van Leeuwen

*Corresponding author for this work

Research output: Contribution to journal › Article › Scientific › peer-review

25 Citations (Scopus)

Abstract

To achieve an insightful clustering of multivariate data, we propose subspace K-means. Its central idea is to model the centroids and cluster residuals in reduced spaces, which allows for dealing with a wide range of cluster types and yields rich interpretations of the clusters. We review the existing related clustering methods, including deterministic, stochastic, and unsupervised learning approaches. To evaluate subspace K-means, we performed a comparative simulation study, in which we manipulated the overlap of subspaces, the between-cluster variance, and the error variance. The study shows that the subspace K-means algorithm is sensitive to local minima but that the problem can be reasonably dealt with by using partitions of various cluster procedures as a starting point for the algorithm. Subspace K-means performs very well in recovering the true clustering across all conditions considered and appears to be superior to its competitor methods: K-means, reduced K-means, factorial K-means, mixtures of factor analyzers (MFA), and MCLUST. The best competitor method, MFA, showed a performance similar to that of subspace K-means in easy conditions but deteriorated in more difficult ones. Using data from a study on parental behavior, we show that subspace K-means analysis provides a rich insight into the cluster characteristics, in terms of both the relative positions of the clusters (via the centroids) and the shape of the clusters (via the within-cluster residuals).
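The local-minima issue mentioned in the abstract can be illustrated with a minimal multi-start sketch. It uses plain K-means (Lloyd's algorithm), not subspace K-means, and it draws random starting centroids rather than starting from partitions produced by other cluster procedures as the abstract describes. The function names (kmeans_once, kmeans_multistart) and all parameter values are hypothetical choices for illustration only.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two points given as lists."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(points):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return [sum(col) / n for col in zip(*points)]


def kmeans_once(data, k, rng, max_iter=100):
    """One run of plain K-means from k randomly sampled data points."""
    centroids = rng.sample(data, k)
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: sq_dist(p, centroids[c])) for p in data
        ]
        if new_labels == labels:
            break  # partition unchanged, so the run has converged
        labels = new_labels
        # Update step: each centroid becomes the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(data, labels) if lab == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = mean(members)
    # Loss: within-cluster sum of squared distances.
    loss = sum(sq_dist(p, centroids[lab]) for p, lab in zip(data, labels))
    return loss, labels, centroids


def kmeans_multistart(data, k, n_starts=20, seed=0):
    """Run K-means from several starts and keep the lowest-loss solution."""
    rng = random.Random(seed)
    runs = [kmeans_once(data, k, rng) for _ in range(n_starts)]
    return min(runs, key=lambda run: run[0])


if __name__ == "__main__":
    toy = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
           [5.0, 5.0], [5.1, 5.0], [5.0, 5.1]]
    best_loss, best_labels, _ = kmeans_multistart(toy, k=2)
    print(best_loss, best_labels)
```

Each run converges to a local minimum of the loss, so keeping the best of several runs lowers the risk of reporting a poor partition. It does not guarantee that the global minimum is found.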

Original language: English
Pages (from-to): 1011-1023
Number of pages: 13
Journal: Behavior Research Methods
Volume: 45
Issue number: 4
DOIs: https://doi.org/10.3758/s13428-013-0329-y
Publication status: Published - Dec 2013
Externally published: Yes

Keywords

  • Cluster analysis
  • Cluster recovery
  • Multivariate data
  • Reduced K-means
  • K-means
  • Factorial K-means
  • Mixtures of factor analyzers
  • MCLUST
  • PRINCIPAL COMPONENT ANALYSIS
  • HIGH-DIMENSIONAL DATA
  • PARENTAL BEHAVIOR
  • LOCAL OPTIMA
  • MODEL
  • ALGORITHM
  • COMPLEXITIES
  • PSYCHOLOGY
  • REDUCTION
  • FACTORIAL

Cite this

Timmerman, M. E., Ceulemans, E., De Roover, K., & Van Leeuwen, K. (2013). Subspace K-means clustering. Behavior Research Methods, 45(4), 1011-1023. https://doi.org/10.3758/s13428-013-0329-y