Obtaining insights from high-dimensional data

Sparse principal covariates regression

Research output: Contribution to journal › Article › Scientific › Peer-review

Abstract

Background
Data analysis methods are usually divided into two distinct classes: methods for prediction and methods for exploration. In practice, however, there is often a need to learn from the data in both ways. For example, when antibody titers a few weeks after vaccination are predicted from genome-wide mRNA transcription rates, mechanistic insight into the effect of vaccination on the immune system is also sought. Principal covariates regression (PCovR) is a method that serves both purposes. Yet, it does not yield insightful representations of the data, because these representations include all the variables.
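To make concrete how PCovR balances exploration and prediction, here is a minimal NumPy sketch of classic, non-sparse PCovR. It assumes the usual weighted least-squares criterion: minimize alpha * ||X - T Px'||^2 / ||X||^2 + (1 - alpha) * ||y - T py||^2 / ||y||^2, with component scores T = X W constrained to T'T = I, and X and y column-centered. Under these assumptions the optimal T consists of the leading eigenvectors of a weighted sum of X X' and the projected outcome. The function name `pcovr` and its signature are hypothetical, and this is not the sparse extension proposed in the article, which additionally selects a subset of the variables.

```python
import numpy as np


def pcovr(X, y, n_components, alpha):
    """Classic (non-sparse) PCovR via its eigen-solution; an illustrative sketch.

    Assumes X (I x J) and y (length I) are column-centered and that
    alpha in [0, 1] weights reconstruction of X against prediction of y.
    Returns scores T (I x R), weights W (J x R), loadings Px (J x R) and
    regression weights py (R,), so that T = X W, X ~ T Px' and y ~ T py.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float).reshape(-1, 1)

    # Orthogonal projector onto the column space of X (also valid when J > I).
    X_pinv = np.linalg.pinv(X)
    Hy = X @ (X_pinv @ y)

    # Weighted sum of the exploration part (X X') and the prediction part
    # (projected y y'), each scaled by its total sum of squares.
    G = (alpha * (X @ X.T) / np.sum(X ** 2)
         + (1.0 - alpha) * (Hy @ Hy.T) / np.sum(y ** 2))

    # eigh returns eigenvalues in ascending order; keep the R largest.
    _, eigvecs = np.linalg.eigh(G)
    T = eigvecs[:, ::-1][:, :n_components]

    W = X_pinv @ T           # T lies in the column space of X, so X W = T
    Px = X.T @ T             # least-squares loadings given orthonormal T
    py = (T.T @ y).ravel()   # least-squares regression weights given T
    return T, W, Px, py
```

Setting alpha = 1 reduces the criterion to a principal component analysis of X, while values closer to 0 put more weight on predicting y. Every variable receives a nonzero weight in W here, which is exactly the lack of sparsity the article addresses.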
Results
Here, we propose a sparse extension of principal covariates regression whose solutions are based on an automatically selected subset of the variables. In a simulation study, our method outperforms competing methods such as sparse principal components regression and sparse partial least squares. Furthermore, the good performance of the method is illustrated on publicly available data comprising antibody titers and genome-wide transcription rates for subjects vaccinated against the flu: the genes selected by sparse PCovR are highly enriched for immune-related terms, and the method predicts the titers of an independent test sample well. In comparison, no significantly enriched terms were found for the genes selected by sparse partial least squares, and its out-of-sample prediction was worse.
Conclusions
Sparse principal covariates regression is a promising and competitive tool for obtaining insights from high-dimensional data.
Original language: English
Article number: 104
Number of pages: 13
Journal: BMC Bioinformatics
Volume: 19
Issue number: 1
DOI: 10.1186/s12859-018-2114-5
Publication status: Published, 2018

Keywords

  • Algorithms
  • Computer Simulation
  • Gene Ontology
  • Humans
  • Influenza Vaccines/immunology
  • Least-Squares Analysis
  • Principal Component Analysis
  • Regression Analysis
  • Selection, Genetic
  • Systems Biology

Cite this

@article{ece3e4bd836a41b2800a4c2d90134957,
title = "Obtaining insights from high-dimensional data: Sparse principal covariates regression",
abstract = "Background: Data analysis methods are usually subdivided in two distinct classes: There are methods for prediction and there are methods for exploration. In practice, however, there often is a need to learn from the data in both ways. For example, when predicting the antibody titers a few weeks after vaccination on the basis of genomewide mRNA transcription rates, also mechanistic insights about the effect of vaccinations on the immune system are sought. Principal covariates regression (PCovR) is a method that combines both purposes. Yet, it misses insightful representations of the data as these include all the variables. Results: Here, we propose a sparse extension of principal covariates regression such that the resulting solutions are based on an automatically selected subset of the variables. Our method is shown to outperform competing methods like sparse principal components regression and sparse partial least squares in a simulation study. Furthermore good performance of the method is illustrated on publicly available data including antibody titers and genomewide transcription rates for subjects vaccinated against the flu: the selected genes by sparse PCovR are highly enriched for immune related terms and the method predicts the titers for an independent test sample well. In comparison, no significantly enriched terms were found for the genes selected by sparse partial least squares and out-of-sample prediction was worse. Conclusions: Sparse principal covariates regression is a promising and competitive tool for obtaining insights from high-dimensional data.",
keywords = "Algorithms, Computer Simulation, Gene Ontology, Humans, Influenza Vaccines/immunology, Least-Squares Analysis, Principal Component Analysis, Regression Analysis, Selection, Genetic, Systems Biology",
author = "{Van Deun}, K. and E.A.V. Crompvoets and Eva Ceulemans",
year = "2018",
doi = "10.1186/s12859-018-2114-5",
language = "English",
volume = "19",
journal = "BMC Bioinformatics",
issn = "1471-2105",
publisher = "BioMed Central",
number = "1",

}

Obtaining insights from high-dimensional data : Sparse principal covariates regression. / Van Deun, K.; Crompvoets, E.A.V.; Ceulemans, Eva.

In: BMC Bioinformatics, Vol. 19, No. 1, 104, 2018.
