Model selection techniques for sparse weight-based principal component analysis

Niek de Schipper*, Katrijn Van Deun

*Corresponding author for this work

Research output: Contribution to journal › Article › Scientific › peer-review

Abstract

Many studies make use of multiple types of data that are collected for the same
set of samples, resulting in so-called multiblock data (e.g., multiomics studies).
A popular analysis framework is sparse principal component analysis (PCA) of
the concatenated data. The sparseness in the component weights of these
models is usually induced by penalties. A crucial factor in the use of such
penalized methods is a proper tuning of the regularization parameters used to
give more or less weight to the penalties. In this paper, we examine several
model selection procedures to tune these regularization parameters for
sparse PCA. The model selection procedures include cross-validation, Bayesian
information criterion (BIC), index of sparseness, and the convex hull procedure. Furthermore, to account for the multiblock structure, we present a sparse PCA algorithm with an added group least absolute shrinkage and selection operator (LASSO) penalty that selects or cancels out entire blocks of data in an automated way. The tuning of the group LASSO parameter with the proposed model selection procedures is also studied. We conclude that when the component weights are to be interpreted, cross-validation with the one-standard-error rule is preferred; alternatively, if the interest lies in obtaining component scores from a very limited set of variables, the convex hull, BIC, and index of sparseness are all suitable.
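Penalties of the kind described above typically enter a sparse PCA algorithm through a thresholding (proximal) step on the weight vector. The sketch below is not the authors' algorithm; it only illustrates, under stated assumptions, how a LASSO penalty zeroes individual weights while a group LASSO penalty can zero a whole block at once. The function names, the block layout as (start, stop) index pairs, and the sqrt(block size) group weight are illustrative assumptions.

```python
import math


def soft_threshold(v, lam):
    """Element-wise soft-thresholding: proximal operator of lam * ||v||_1."""
    return [math.copysign(max(abs(x) - lam, 0.0), x) for x in v]


def group_soft_threshold(v, lam):
    """Block soft-thresholding: proximal operator of lam * ||v||_2.

    Shrinks the whole block toward zero and sets it exactly to zero
    when its Euclidean norm does not exceed lam.
    """
    norm = math.sqrt(sum(x * x for x in v))
    if norm <= lam:
        return [0.0] * len(v)
    scale = 1.0 - lam / norm
    return [scale * x for x in v]


def sparse_group_prox(w, blocks, lam_lasso, lam_group):
    """Proximal step for lam_lasso*||w||_1 + lam_group*sum_b sqrt(p_b)*||w_b||_2.

    blocks: hypothetical list of (start, stop) pairs, one per data block.
    The sqrt(p_b) block-size weight is an assumed (common) convention.
    Applying the element-wise step first and the block step second gives
    the exact proximal operator of this combined penalty.
    """
    out = []
    for start, stop in blocks:
        wb = soft_threshold(w[start:stop], lam_lasso)
        out.extend(group_soft_threshold(wb, lam_group * math.sqrt(stop - start)))
    return out
```

For example, with `w = [0.9, -0.2, 0.05, 0.15, -0.12, 0.1]`, blocks `[(0, 3), (3, 6)]`, `lam_lasso=0.05` and `lam_group=0.1`, the second block survives the element-wise step but is then cancelled as a whole by the group step, while the first block keeps two nonzero weights.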
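The one-standard-error rule recommended above for interpretable weights can be sketched as follows: among candidate penalty values, take the sparsest one whose mean cross-validation error lies within one standard error of the best mean error. This assumes the per-fold prediction errors have already been computed; how the data are left out in cross-validation for PCA is not specified here, and the function name and the "larger lambda = sparser" ordering are hypothetical conventions.

```python
import math
from statistics import mean, stdev


def one_se_rule(lambdas, fold_errors):
    """Select a tuning parameter with the one-standard-error rule.

    lambdas: candidate values ordered from least to most sparse
             (assumed convention: larger lambda gives a sparser model).
    fold_errors: fold_errors[i][k] is the prediction error of lambdas[i]
                 on cross-validation fold k (at least two folds).
    """
    means = [mean(e) for e in fold_errors]
    # Standard error of each mean CV error across folds.
    ses = [stdev(e) / math.sqrt(len(e)) for e in fold_errors]
    best = min(range(len(lambdas)), key=lambda i: means[i])
    threshold = means[best] + ses[best]
    # Among sparser candidates, keep the sparsest one under the threshold.
    chosen = best
    for i in range(best + 1, len(lambdas)):
        if means[i] <= threshold:
            chosen = i
    return lambdas[chosen]
```

The rule trades a small, statistically indistinguishable loss in prediction error for a sparser, easier-to-interpret set of weights.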
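The convex hull procedure balances fit against complexity. A simplified sketch in the spirit of the CHull scree test is given below: keep the best model per complexity level, drop models that fit no better than a simpler one, take the upper boundary of the convex hull, and pick the interior hull point with the largest ratio of the slope before it to the slope after it. Measuring complexity as the number of nonzero weights and fit as, for example, the proportion of variance accounted for are assumptions here, as is the function name.

```python
def _cross(o, a, b):
    """2-D cross product of vectors o->a and o->b (negative = clockwise turn)."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def chull_select(models):
    """models: list of (complexity, fit) pairs; higher fit is better.

    Returns the selected (complexity, fit) pair, or None when fewer than
    three points lie on the upper hull (no interior point to test).
    """
    # 1. Best-fitting model per complexity level, sorted by complexity.
    best = {}
    for c, f in models:
        if c not in best or f > best[c]:
            best[c] = f
    # 2. Drop models that fit no better than a less complex model.
    kept = []
    for c, f in sorted(best.items()):
        if not kept or f > kept[-1][1]:
            kept.append((c, f))
    # 3. Upper convex hull, scanned left to right (monotone chain).
    hull = []
    for p in kept:
        while len(hull) >= 2 and _cross(hull[-2], hull[-1], p) >= 0:
            hull.pop()
        hull.append(p)
    if len(hull) < 3:
        return None

    # 4. Scree-test ratio: slope before a hull point over slope after it.
    def scree_ratio(i):
        (c0, f0), (c1, f1), (c2, f2) = hull[i - 1], hull[i], hull[i + 1]
        return ((f1 - f0) / (c1 - c0)) / ((f2 - f1) / (c2 - c1))

    return hull[max(range(1, len(hull) - 1), key=scree_ratio)]
```

With models `[(1, 0.20), (2, 0.55), (3, 0.45), (3, 0.60), (4, 0.63), (5, 0.65), (6, 0.64)]`, the fit gain flattens sharply after complexity 2, so that model is selected.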
Original language: English
Article number: e3289
Number of pages: 20
Journal: Journal of Chemometrics
Publication status: E-pub ahead of print, 2020

Keywords

  • JOINT
  • REGRESSION
  • REGULARIZATION
  • TUTORIAL
  • model selection
  • multiblock data
  • sparse PCA