Obtaining insights from high-dimensional data: Sparse principal covariates regression

Research output: Contribution to journal › Article › Scientific › peer-review

Abstract

Background
Data analysis methods are usually subdivided into two distinct classes: methods for prediction and methods for exploration. In practice, however, there is often a need to learn from the data in both ways. For example, when antibody titers a few weeks after vaccination are predicted from genome-wide mRNA transcription rates, mechanistic insights into the effect of vaccination on the immune system are also sought. Principal covariates regression (PCovR) is a method that combines both purposes. Yet, it does not yield insightful representations of the data, because these representations include all the variables.
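As a rough illustration of how PCovR combines both purposes, the sketch below assumes the standard formulation from the PCovR literature rather than anything stated in this record: component scores T = XW are chosen to minimize alpha * ||X - T Px'||^2 / ||X||^2 + (1 - alpha) * ||y - T py'||^2 / ||y||^2 subject to T'T = I. With alpha = 1 this reduces to principal components analysis, and small alpha emphasizes predicting y. Under that assumption, T consists of the leading eigenvectors of a weighted sum of XX' and the outer product of y projected onto the column space of X. The function name `pcovr`, its arguments, and the tolerance `tol` are hypothetical.

```python
import numpy as np


def pcovr(X, y, n_components, alpha, tol=1e-10):
    """Closed-form principal covariates regression (illustrative, non-sparse).

    Assumed loss: alpha * ||X - T Px^T||^2 / ||X||^2
                  + (1 - alpha) * ||y - T py^T||^2 / ||y||^2,
    with scores T = X W and T^T T = I.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float).reshape(X.shape[0], -1)
    # Center both blocks so components describe deviations from the mean.
    X = X - X.mean(axis=0)
    y = y - y.mean(axis=0)
    # Thin SVD of X; singular values below tol * largest are treated as zero,
    # which drops the direction removed by centering.
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    keep = s > tol * s[0]
    U, s, Vt = U[:, keep], s[keep], Vt[keep]
    # H_X y: projection of y onto the column space of X.
    Hy = U @ (U.T @ y)
    G = (alpha * (X @ X.T) / np.sum(X ** 2)
         + (1 - alpha) * (Hy @ Hy.T) / np.sum(y ** 2))
    _, eigvecs = np.linalg.eigh(G)              # eigenvalues in ascending order
    T = eigvecs[:, ::-1][:, :n_components]      # leading eigenvectors = scores
    W = Vt.T @ ((U.T @ T) / s[:, None])         # minimum-norm W with X W = T
    Px = X.T @ T                                # X loadings (least squares, T^T T = I)
    py = y.T @ T                                # regression weights of y on T
    return W, T, Px, py
```

Because every column of T lies in the column space of X, the weights reproduce the scores exactly (XW = T). In this non-sparse form W generally has no zero entries, so every variable contributes to every component, which is the limitation the sparse extension targets.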
Results
Here, we propose a sparse extension of principal covariates regression such that the resulting solutions are based on an automatically selected subset of the variables. In a simulation study, our method is shown to outperform competing methods such as sparse principal components regression and sparse partial least squares. Furthermore, the good performance of the method is illustrated on publicly available data including antibody titers and genome-wide transcription rates for subjects vaccinated against the flu: the genes selected by sparse PCovR are highly enriched for immune-related terms, and the method predicts the titers well for an independent test sample. In comparison, no significantly enriched terms were found for the genes selected by sparse partial least squares, and its out-of-sample prediction was worse.
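The record does not give the objective or the fitting algorithm of sparse PCovR. Purely to illustrate how a sparsity penalty can make the weights depend on an automatically selected subset of variables, the sketch below assumes a lasso penalty lam * sum|W| added to a PCovR fit term written as ||Z - XWP'||^2, with Z = [sqrt(alpha) X / ||X||, sqrt(1 - alpha) y / ||y||] and the combined loadings constrained to P'P = I. It alternates an orthogonal Procrustes update of P with coordinate-descent lasso sweeps over W. The name `sparse_pcovr`, the penalty form, the constraint, and the iteration scheme are hypothetical choices, not the authors' published algorithm.

```python
import numpy as np


def soft_threshold(z, t):
    """Lasso shrinkage operator: sign(z) * max(|z| - t, 0)."""
    return np.sign(z) * max(abs(z) - t, 0.0)


def sparse_pcovr(X, y, n_components, alpha, lam, n_iter=200, tol=1e-8):
    """Hypothetical lasso-penalized PCovR sketch (not the article's algorithm).

    Minimizes ||Z - X W P^T||^2 + lam * sum|W| subject to P^T P = I, where
    Z = [sqrt(alpha) X / ||X||, sqrt(1 - alpha) y / ||y||] stacks the weighted
    data blocks. Each iteration runs one coordinate-descent sweep over W and
    then an exact orthogonal-Procrustes update of P.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float).reshape(X.shape[0], -1)
    X = X - X.mean(axis=0)
    y = y - y.mean(axis=0)
    Z = np.hstack([np.sqrt(alpha) * X / np.linalg.norm(X),
                   np.sqrt(1 - alpha) * y / np.linalg.norm(y)])
    p = X.shape[1]
    col_sq = np.sum(X ** 2, axis=0)             # ||x_j||^2 for each variable
    # Orthonormal start: leading right singular vectors of Z.
    P = np.linalg.svd(Z, full_matrices=False)[2][:n_components].T
    W = np.zeros((p, n_components))

    def loss(W, P):
        return np.sum((Z - X @ W @ P.T) ** 2) + lam * np.abs(W).sum()

    history = [loss(W, P)]
    for _ in range(n_iter):
        # W-step: with P^T P = I the fit term equals ||Z P - X W||^2 + const,
        # so each column of W is a lasso regression of (Z P)[:, r] on X.
        target = Z @ P
        for r in range(n_components):
            resid = target[:, r] - X @ W[:, r]
            for j in range(p):
                if col_sq[j] == 0.0:
                    continue
                # Correlation of x_j with the residual that excludes x_j.
                rho = X[:, j] @ resid + col_sq[j] * W[j, r]
                new = soft_threshold(rho, lam / 2) / col_sq[j]
                resid += X[:, j] * (W[j, r] - new)
                W[j, r] = new
        # P-step: orthogonal Procrustes, P = U V^T from the SVD of Z^T X W.
        U, _, Vt = np.linalg.svd(Z.T @ X @ W, full_matrices=False)
        P = U @ Vt
        history.append(loss(W, P))
        if history[-2] - history[-1] < tol * history[-2]:
            break
    return W, P, history
```

Variables whose weights are zero in every column of W drop out of the components, and increasing `lam` shrinks the selected subset. Each update exactly minimizes the objective over its own block or coordinate, so the recorded loss never increases, which makes a convenient sanity check.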
Conclusions
Sparse principal covariates regression is a promising and competitive tool for obtaining insights from high-dimensional data.
Language: English
Article number: 104
Journal: BMC Bioinformatics
Volume: 19
DOI: 10.1186/s12859-018-2114-5
State: Published - 2018

Cite this

@article{ece3e4bd836a41b2800a4c2d90134957,
title = "Obtaining insights from high-dimensional data: Sparse principal covariates regression",
abstract = "Background: Data analysis methods are usually subdivided into two distinct classes: methods for prediction and methods for exploration. In practice, however, there is often a need to learn from the data in both ways. For example, when antibody titers a few weeks after vaccination are predicted from genome-wide mRNA transcription rates, mechanistic insights into the effect of vaccination on the immune system are also sought. Principal covariates regression (PCovR) is a method that combines both purposes. Yet, it does not yield insightful representations of the data, because these representations include all the variables. Results: Here, we propose a sparse extension of principal covariates regression such that the resulting solutions are based on an automatically selected subset of the variables. In a simulation study, our method is shown to outperform competing methods such as sparse principal components regression and sparse partial least squares. Furthermore, the good performance of the method is illustrated on publicly available data including antibody titers and genome-wide transcription rates for subjects vaccinated against the flu: the genes selected by sparse PCovR are highly enriched for immune-related terms, and the method predicts the titers well for an independent test sample. In comparison, no significantly enriched terms were found for the genes selected by sparse partial least squares, and its out-of-sample prediction was worse. Conclusions: Sparse principal covariates regression is a promising and competitive tool for obtaining insights from high-dimensional data.",
author = "{Van Deun}, K. and E.A.V. Crompvoets and Eva Ceulemans",
year = "2018",
doi = "10.1186/s12859-018-2114-5",
language = "English",
volume = "19",
journal = "BMC Bioinformatics",
issn = "1471-2105",
publisher = "BioMed Central",

}

Obtaining insights from high-dimensional data: Sparse principal covariates regression. / Van Deun, K.; Crompvoets, E.A.V.; Ceulemans, Eva.

In: BMC Bioinformatics, Vol. 19, 104, 2018.

TY - JOUR

T1 - Obtaining insights from high-dimensional data: Sparse principal covariates regression

T2 - BMC Bioinformatics

AU - Van Deun, K.

AU - Crompvoets, E.A.V.

AU - Ceulemans, Eva

PY - 2018

Y1 - 2018

N2 - Background: Data analysis methods are usually subdivided into two distinct classes: methods for prediction and methods for exploration. In practice, however, there is often a need to learn from the data in both ways. For example, when antibody titers a few weeks after vaccination are predicted from genome-wide mRNA transcription rates, mechanistic insights into the effect of vaccination on the immune system are also sought. Principal covariates regression (PCovR) is a method that combines both purposes. Yet, it does not yield insightful representations of the data, because these representations include all the variables. Results: Here, we propose a sparse extension of principal covariates regression such that the resulting solutions are based on an automatically selected subset of the variables. In a simulation study, our method is shown to outperform competing methods such as sparse principal components regression and sparse partial least squares. Furthermore, the good performance of the method is illustrated on publicly available data including antibody titers and genome-wide transcription rates for subjects vaccinated against the flu: the genes selected by sparse PCovR are highly enriched for immune-related terms, and the method predicts the titers well for an independent test sample. In comparison, no significantly enriched terms were found for the genes selected by sparse partial least squares, and its out-of-sample prediction was worse. Conclusions: Sparse principal covariates regression is a promising and competitive tool for obtaining insights from high-dimensional data.

AB - Background: Data analysis methods are usually subdivided into two distinct classes: methods for prediction and methods for exploration. In practice, however, there is often a need to learn from the data in both ways. For example, when antibody titers a few weeks after vaccination are predicted from genome-wide mRNA transcription rates, mechanistic insights into the effect of vaccination on the immune system are also sought. Principal covariates regression (PCovR) is a method that combines both purposes. Yet, it does not yield insightful representations of the data, because these representations include all the variables. Results: Here, we propose a sparse extension of principal covariates regression such that the resulting solutions are based on an automatically selected subset of the variables. In a simulation study, our method is shown to outperform competing methods such as sparse principal components regression and sparse partial least squares. Furthermore, the good performance of the method is illustrated on publicly available data including antibody titers and genome-wide transcription rates for subjects vaccinated against the flu: the genes selected by sparse PCovR are highly enriched for immune-related terms, and the method predicts the titers well for an independent test sample. In comparison, no significantly enriched terms were found for the genes selected by sparse partial least squares, and its out-of-sample prediction was worse. Conclusions: Sparse principal covariates regression is a promising and competitive tool for obtaining insights from high-dimensional data.

U2 - 10.1186/s12859-018-2114-5

DO - 10.1186/s12859-018-2114-5

M3 - Article

VL - 19

JO - BMC Bioinformatics

JF - BMC Bioinformatics

SN - 1471-2105

M1 - 104

ER -