DISCO-SCA and properly applied GSVD as swinging methods to find common and distinctive processes

K. Van Deun, Iven Van Mechelen, Lieven Thorrez, Martijn Schouteden, Bart De Moor, Mariët J. Van Der Werf, Lieven De Lathauwer, Age K. Smilde, Henk A. L. Kiers, Anna Tramontano (Editor)

Research output: Contribution to journal, Article, Scientific, peer-review

Abstract

Background
In systems biology it is common to obtain for the same set of biological entities information from multiple sources. Examples include expression data for the same set of orthologous genes screened in different organisms and data on the same set of culture samples obtained with different high-throughput techniques. A major challenge is to find the important biological processes underlying the data and to disentangle therein processes common to all data sources and processes distinctive for a specific source. Recently, two promising simultaneous data integration methods have been proposed to attain this goal, namely generalized singular value decomposition (GSVD) and simultaneous component analysis with rotation to common and distinctive components (DISCO-SCA).
Results
Both theoretical analyses and applications to biologically relevant data show that: (1) straightforward applications of GSVD yield unsatisfactory results, (2) DISCO-SCA performs well, (3) provided proper pre-processing and algorithmic adaptations, GSVD reaches a performance level similar to that of DISCO-SCA, and (4) DISCO-SCA is directly generalizable to more than two data sources. The biological relevance of DISCO-SCA is illustrated with two applications. First, in a setting of comparative genomics, it is shown that DISCO-SCA recovers a common theme of cell cycle progression and a yeast-specific response to pheromones. The biological annotation was obtained by applying Gene Set Enrichment Analysis in an appropriate way. Second, in an application of DISCO-SCA to metabolomics data for Escherichia coli obtained with two different chemical analysis platforms, it is illustrated that the metabolites involved in some of the biological processes underlying the data are detected by one of the two platforms only; therefore, platforms for microbial metabolomics should be tailored to the biological question.
Conclusions
Both DISCO-SCA and properly applied GSVD are promising integrative methods for finding common and distinctive processes in multisource data. Open source code for both methods is provided.
Original language: English
Article number: e37840
Journal: PLoS ONE
Volume: 7
Issue number: 5
DOI: https://doi.org/10.1371/journal.pone.0037840
Publication status: Published - 2012
Externally published: Yes

Fingerprint

Singular value decomposition
Information Storage and Retrieval
Genes
Data integration
Pheromones
Cell Cycle
Metabolites
Yeast
Escherichia coli
Cells
Throughput
Processing
Chemical analysis
Metabolomics

Cite this

Van Deun, K., Van Mechelen, I., Thorrez, L., Schouteden, M., De Moor, B., Van Der Werf, M. J., ... Tramontano, A. (Ed.) (2012). DISCO-SCA and properly applied GSVD as swinging methods to find common and distinctive processes. PLoS ONE, 7(5), [e37840]. https://doi.org/10.1371/journal.pone.0037840
Van Deun, K. ; Van Mechelen, Iven ; Thorrez, Lieven ; Schouteden, Martijn ; De Moor, Bart ; Van Der Werf, Mariët J. ; De Lathauwer, Lieven ; Smilde, Age K. ; Kiers, Henk A. L. ; Tramontano, Anna (Editor). / DISCO-SCA and properly applied GSVD as swinging methods to find common and distinctive processes. In: PLoS ONE. 2012 ; Vol. 7, No. 5.
@article{ccd435a2f5f54dc5abb62e1aaf66081e,
title = "DISCO-SCA and properly applied GSVD as swinging methods to find common and distinctive processes",
abstract = "Background: In systems biology it is common to obtain for the same set of biological entities information from multiple sources. Examples include expression data for the same set of orthologous genes screened in different organisms and data on the same set of culture samples obtained with different high-throughput techniques. A major challenge is to find the important biological processes underlying the data and to disentangle therein processes common to all data sources and processes distinctive for a specific source. Recently, two promising simultaneous data integration methods have been proposed to attain this goal, namely generalized singular value decomposition (GSVD) and simultaneous component analysis with rotation to common and distinctive components (DISCO-SCA). Results: Both theoretical analyses and applications to biologically relevant data show that: (1) straightforward applications of GSVD yield unsatisfactory results, (2) DISCO-SCA performs well, (3) provided proper pre-processing and algorithmic adaptations, GSVD reaches a performance level similar to that of DISCO-SCA, and (4) DISCO-SCA is directly generalizable to more than two data sources. The biological relevance of DISCO-SCA is illustrated with two applications. First, in a setting of comparative genomics, it is shown that DISCO-SCA recovers a common theme of cell cycle progression and a yeast-specific response to pheromones. The biological annotation was obtained by applying Gene Set Enrichment Analysis in an appropriate way. Second, in an application of DISCO-SCA to metabolomics data for Escherichia coli obtained with two different chemical analysis platforms, it is illustrated that the metabolites involved in some of the biological processes underlying the data are detected by one of the two platforms only; therefore, platforms for microbial metabolomics should be tailored to the biological question. Conclusions: Both DISCO-SCA and properly applied GSVD are promising integrative methods for finding common and distinctive processes in multisource data. Open source code for both methods is provided.",
author = "{Van Deun}, K. and {Van Mechelen}, Iven and Thorrez, Lieven and Schouteden, Martijn and {De Moor}, Bart and {Van Der Werf}, {Mari{\"e}t J.} and {De Lathauwer}, Lieven and Smilde, {Age K.} and Kiers, {Henk A. L.}",
editor = "Tramontano, Anna",
year = "2012",
doi = "10.1371/journal.pone.0037840",
language = "English",
volume = "7",
journal = "PLoS ONE",
issn = "1932-6203",
publisher = "Public Library of Science",
number = "5",
pages = "e37840",
}

Van Deun, K, Van Mechelen, I, Thorrez, L, Schouteden, M, De Moor, B, Van Der Werf, MJ, De Lathauwer, L, Smilde, AK, Kiers, HAL & Tramontano, A (ed.) 2012, 'DISCO-SCA and properly applied GSVD as swinging methods to find common and distinctive processes', PLoS ONE, vol. 7, no. 5, e37840. https://doi.org/10.1371/journal.pone.0037840

DISCO-SCA and properly applied GSVD as swinging methods to find common and distinctive processes. / Van Deun, K.; Van Mechelen, Iven; Thorrez, Lieven; Schouteden, Martijn; De Moor, Bart; Van Der Werf, Mariët J.; De Lathauwer, Lieven; Smilde, Age K.; Kiers, Henk A. L.; Tramontano, Anna (Editor).

In: PLoS ONE, Vol. 7, No. 5, e37840, 2012.

TY - JOUR

T1 - DISCO-SCA and properly applied GSVD as swinging methods to find common and distinctive processes

AU - Van Deun, K.

AU - Van Mechelen, Iven

AU - Thorrez, Lieven

AU - Schouteden, Martijn

AU - De Moor, Bart

AU - Van Der Werf, Mariët J.

AU - De Lathauwer, Lieven

AU - Smilde, Age K.

AU - Kiers, Henk A. L.

A2 - Tramontano, Anna

PY - 2012

Y1 - 2012

AB - Background: In systems biology it is common to obtain for the same set of biological entities information from multiple sources. Examples include expression data for the same set of orthologous genes screened in different organisms and data on the same set of culture samples obtained with different high-throughput techniques. A major challenge is to find the important biological processes underlying the data and to disentangle therein processes common to all data sources and processes distinctive for a specific source. Recently, two promising simultaneous data integration methods have been proposed to attain this goal, namely generalized singular value decomposition (GSVD) and simultaneous component analysis with rotation to common and distinctive components (DISCO-SCA). Results: Both theoretical analyses and applications to biologically relevant data show that: (1) straightforward applications of GSVD yield unsatisfactory results, (2) DISCO-SCA performs well, (3) provided proper pre-processing and algorithmic adaptations, GSVD reaches a performance level similar to that of DISCO-SCA, and (4) DISCO-SCA is directly generalizable to more than two data sources. The biological relevance of DISCO-SCA is illustrated with two applications. First, in a setting of comparative genomics, it is shown that DISCO-SCA recovers a common theme of cell cycle progression and a yeast-specific response to pheromones. The biological annotation was obtained by applying Gene Set Enrichment Analysis in an appropriate way. Second, in an application of DISCO-SCA to metabolomics data for Escherichia coli obtained with two different chemical analysis platforms, it is illustrated that the metabolites involved in some of the biological processes underlying the data are detected by one of the two platforms only; therefore, platforms for microbial metabolomics should be tailored to the biological question. Conclusions: Both DISCO-SCA and properly applied GSVD are promising integrative methods for finding common and distinctive processes in multisource data. Open source code for both methods is provided.

U2 - 10.1371/journal.pone.0037840

DO - 10.1371/journal.pone.0037840

M3 - Article

VL - 7

JO - PLoS ONE

JF - PLoS ONE

SN - 1932-6203

IS - 5

M1 - e37840

ER -