Obtaining insights from high-dimensional data: Sparse principal covariates regression

Research output: Contribution to journal › Article › Scientific › peer-review

2 Citations (Scopus)
30 Downloads (Pure)

Abstract

Background
Data analysis methods are usually subdivided into two distinct classes: methods for prediction and methods for exploration. In practice, however, there is often a need to learn from the data in both ways. For example, when predicting antibody titers a few weeks after vaccination on the basis of genome-wide mRNA transcription rates, mechanistic insights into the effect of vaccination on the immune system are also sought. Principal covariates regression (PCovR) is a method that combines both purposes. Yet its representations of the data are hard to interpret because they include all the variables.
Results
Here, we propose a sparse extension of principal covariates regression such that the resulting solutions are based on an automatically selected subset of the variables. In a simulation study, our method outperforms competing methods such as sparse principal components regression and sparse partial least squares. Furthermore, the good performance of the method is illustrated on publicly available data that include antibody titers and genome-wide transcription rates for subjects vaccinated against the flu: the genes selected by sparse PCovR are highly enriched for immune-related terms, and the method predicts the titers for an independent test sample well. In comparison, no significantly enriched terms were found for the genes selected by sparse partial least squares, and its out-of-sample prediction was worse.
Conclusions
Sparse principal covariates regression is a promising and competitive tool for obtaining insights from high-dimensional data.
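As an illustration of how one criterion can serve both prediction and exploration, the following is a minimal sketch of a PCovR-style loss with a sparsity penalty on the component weights. It is not taken from the article: the function name, the parameter names (alpha, lam_lasso, lam_ridge), the elastic-net form of the penalty, and the convention that alpha weights the reconstruction of the predictors are all assumptions made for this example.

```python
import numpy as np

def sparse_pcovr_loss(X, y, W, Px, py, alpha, lam_lasso=0.0, lam_ridge=0.0):
    """Hypothetical PCovR-style loss with an elastic-net penalty on W.

    X  : (n, J) predictor matrix
    y  : (n,)   outcome vector
    W  : (J, R) component weights (sparsity is encouraged here)
    Px : (J, R) loadings used to reconstruct X from the components
    py : (R,)   regression weights used to predict y from the components
    alpha : weight in [0, 1] given to reconstructing X (assumed convention)
    """
    T = X @ W                                             # component scores, (n, R)
    recon_err = np.sum((X - T @ Px.T) ** 2) / np.sum(X ** 2)  # exploration part
    pred_err = np.sum((y - T @ py) ** 2) / np.sum(y ** 2)     # prediction part
    penalty = lam_lasso * np.abs(W).sum() + lam_ridge * np.sum(W ** 2)
    return alpha * recon_err + (1 - alpha) * pred_err + penalty

# Toy data in which y depends on only three of the five predictors.
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 5))
b = np.array([1.0, 0.0, -2.0, 0.0, 0.5])
y = X @ b

# With as many components as variables, both error terms vanish.
W = np.eye(5)
Px = np.eye(5)
loss_unpenalized = sparse_pcovr_loss(X, y, W, Px, b, alpha=0.5)
loss_penalized = sparse_pcovr_loss(X, y, W, Px, b, alpha=0.5, lam_lasso=0.1)
```

In this sketch, alpha near 1 emphasizes summarizing the predictors and alpha near 0 emphasizes predicting the outcome, while the lasso term pushes entries of W to zero so that each component depends on a subset of the variables. An actual estimation procedure would minimize such a loss over W, Px and py, which the sketch does not attempt.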
Original language: English
Article number: 104
Number of pages: 13
Journal: BMC Bioinformatics
Volume: 19
Issue number: 1
Publication status: Published - 2018

Keywords

  • Algorithms
  • Computer Simulation
  • Gene Ontology
  • Humans
  • Influenza Vaccines/immunology
  • Least-Squares Analysis
  • Principal Component Analysis
  • Regression Analysis
  • Selection, Genetic
  • Systems Biology
