Integrating functional genomics data using maximum likelihood based simultaneous component analysis

Robert A Van Den Berg, Iven Van Mechelen, Tom F Wilderjans, K. Van Deun, Henk Al Kiers, Age K Smilde

Research output: Contribution to journalArticleScientificpeer-review

Abstract

Background
In contemporary biology, complex biological processes are increasingly studied by collecting and analyzing measurements of the same entities that are collected with different analytical platforms. Such data comprise a number of data blocks that are coupled via a common mode. The goal of collecting this type of data is to discover biological mechanisms that underlie the behavior of the variables in the different data blocks. The simultaneous component analysis (SCA) family of data analysis methods is suited for this task. However, a SCA may be hampered by the data blocks being subjected to different amounts of measurement error, or noise. To unveil the true mechanisms underlying the data, it could be fruitful to take noise heterogeneity into consideration in the data analysis. Maximum likelihood based SCA (MxLSCA-P) was developed for this purpose. In a previous simulation study it outperformed normal SCA-P. This previous study, however, did not mimic in many respects typical functional genomics data sets, such as, data blocks coupled via the experimental mode, more variables than experimental units, and medium to high correlations between variables. Here, we present a new simulation study in which the usefulness of MxLSCA-P compared to ordinary SCA-P is evaluated within a typical functional genomics setting. Subsequently, the performance of the two methods is evaluated by analysis of a real life Escherichia coli metabolomics data set.
Results
In the simulation study, MxLSCA-P outperforms SCA-P in terms of recovery of the true underlying scores of the common mode and of the true values underlying the data entries. MxLSCA-P further performed especially better when the simulated data blocks were subject to different noise levels. In the analysis of an E. coli metabolomics data set, MxLSCA-P provided a slightly better and more consistent interpretation.
Conclusion
MxLSCA-P is a promising addition to the SCA family. The analysis of coupled functional genomics data blocks could benefit from its ability to take different noise levels per data block into consideration and improve the recovery of the true patterns underlying the data. Moreover, the maximum likelihood based approach underlying MxLSCA-P could be extended to custom-made solutions to specific problems encountered.
Original languageEnglish
Article number340
JournalBMC Bioinformatics
Volume10
Issue number1
DOIs
Publication statusPublished - 2009
Externally publishedYes

Fingerprint

Functional Genomics
Maximum likelihood
Maximum Likelihood
Escherichia coli
Recovery
Measurement errors
Data acquisition
Metabolomics
Genomics
Simulation Study
Escherichia Coli
Data analysis

Cite this

Van Den Berg, R. A., Van Mechelen, I., Wilderjans, T. F., Van Deun, K., Kiers, H. A., & Smilde, A. K. (2009). Integrating functional genomics data using maximum likelihood based simultaneous component analysis. BMC Bioinformatics, 10(1), [340]. https://doi.org/10.1186/1471-2105-10-340
Van Den Berg, Robert A ; Van Mechelen, Iven ; Wilderjans, Tom F ; Van Deun, K. ; Kiers, Henk Al ; Smilde, Age K. / Integrating functional genomics data using maximum likelihood based simultaneous component analysis. In: BMC Bioinformatics. 2009 ; Vol. 10, No. 1.
@article{17a43a66a5224323bc08e41c3c21f3c6,
title = "Integrating functional genomics data using maximum likelihood based simultaneous component analysis",
abstract = "BackgroundIn contemporary biology, complex biological processes are increasingly studied by collecting and analyzing measurements of the same entities that are collected with different analytical platforms. Such data comprise a number of data blocks that are coupled via a common mode. The goal of collecting this type of data is to discover biological mechanisms that underlie the behavior of the variables in the different data blocks. The simultaneous component analysis (SCA) family of data analysis methods is suited for this task. However, a SCA may be hampered by the data blocks being subjected to different amounts of measurement error, or noise. To unveil the true mechanisms underlying the data, it could be fruitful to take noise heterogeneity into consideration in the data analysis. Maximum likelihood based SCA (MxLSCA-P) was developed for this purpose. In a previous simulation study it outperformed normal SCA-P. This previous study, however, did not mimic in many respects typical functional genomics data sets, such as, data blocks coupled via the experimental mode, more variables than experimental units, and medium to high correlations between variables. Here, we present a new simulation study in which the usefulness of MxLSCA-P compared to ordinary SCA-P is evaluated within a typical functional genomics setting. Subsequently, the performance of the two methods is evaluated by analysis of a real life Escherichia coli metabolomics data set.ResultsIn the simulation study, MxLSCA-P outperforms SCA-P in terms of recovery of the true underlying scores of the common mode and of the true values underlying the data entries. MxLSCA-P further performed especially better when the simulated data blocks were subject to different noise levels. In the analysis of an E. coli metabolomics data set, MxLSCA-P provided a slightly better and more consistent interpretation.ConclusionMxLSCA-P is a promising addition to the SCA family. The analysis of coupled functional genomics data blocks could benefit from its ability to take different noise levels per data block into consideration and improve the recovery of the true patterns underlying the data. Moreover, the maximum likelihood based approach underlying MxLSCA-P could be extended to custom-made solutions to specific problems encountered.",
author = "{Van Den Berg}, {Robert A} and {Van Mechelen}, Iven and Wilderjans, {Tom F} and {Van Deun}, K. and Kiers, {Henk Al} and Smilde, {Age K}",
year = "2009",
doi = "10.1186/1471-2105-10-340",
language = "English",
volume = "10",
journal = "BMC Bioinformatics",
issn = "1471-2105",
publisher = "BioMed Central",
number = "1",

}

Integrating functional genomics data using maximum likelihood based simultaneous component analysis. / Van Den Berg, Robert A; Van Mechelen, Iven; Wilderjans, Tom F; Van Deun, K.; Kiers, Henk Al; Smilde, Age K.

In: BMC Bioinformatics, Vol. 10, No. 1, 340, 2009.

Research output: Contribution to journalArticleScientificpeer-review

TY - JOUR

T1 - Integrating functional genomics data using maximum likelihood based simultaneous component analysis

AU - Van Den Berg, Robert A

AU - Van Mechelen, Iven

AU - Wilderjans, Tom F

AU - Van Deun, K.

AU - Kiers, Henk Al

AU - Smilde, Age K

PY - 2009

Y1 - 2009

N2 - BackgroundIn contemporary biology, complex biological processes are increasingly studied by collecting and analyzing measurements of the same entities that are collected with different analytical platforms. Such data comprise a number of data blocks that are coupled via a common mode. The goal of collecting this type of data is to discover biological mechanisms that underlie the behavior of the variables in the different data blocks. The simultaneous component analysis (SCA) family of data analysis methods is suited for this task. However, a SCA may be hampered by the data blocks being subjected to different amounts of measurement error, or noise. To unveil the true mechanisms underlying the data, it could be fruitful to take noise heterogeneity into consideration in the data analysis. Maximum likelihood based SCA (MxLSCA-P) was developed for this purpose. In a previous simulation study it outperformed normal SCA-P. This previous study, however, did not mimic in many respects typical functional genomics data sets, such as, data blocks coupled via the experimental mode, more variables than experimental units, and medium to high correlations between variables. Here, we present a new simulation study in which the usefulness of MxLSCA-P compared to ordinary SCA-P is evaluated within a typical functional genomics setting. Subsequently, the performance of the two methods is evaluated by analysis of a real life Escherichia coli metabolomics data set.ResultsIn the simulation study, MxLSCA-P outperforms SCA-P in terms of recovery of the true underlying scores of the common mode and of the true values underlying the data entries. MxLSCA-P further performed especially better when the simulated data blocks were subject to different noise levels. In the analysis of an E. coli metabolomics data set, MxLSCA-P provided a slightly better and more consistent interpretation.ConclusionMxLSCA-P is a promising addition to the SCA family. The analysis of coupled functional genomics data blocks could benefit from its ability to take different noise levels per data block into consideration and improve the recovery of the true patterns underlying the data. Moreover, the maximum likelihood based approach underlying MxLSCA-P could be extended to custom-made solutions to specific problems encountered.

AB - BackgroundIn contemporary biology, complex biological processes are increasingly studied by collecting and analyzing measurements of the same entities that are collected with different analytical platforms. Such data comprise a number of data blocks that are coupled via a common mode. The goal of collecting this type of data is to discover biological mechanisms that underlie the behavior of the variables in the different data blocks. The simultaneous component analysis (SCA) family of data analysis methods is suited for this task. However, a SCA may be hampered by the data blocks being subjected to different amounts of measurement error, or noise. To unveil the true mechanisms underlying the data, it could be fruitful to take noise heterogeneity into consideration in the data analysis. Maximum likelihood based SCA (MxLSCA-P) was developed for this purpose. In a previous simulation study it outperformed normal SCA-P. This previous study, however, did not mimic in many respects typical functional genomics data sets, such as, data blocks coupled via the experimental mode, more variables than experimental units, and medium to high correlations between variables. Here, we present a new simulation study in which the usefulness of MxLSCA-P compared to ordinary SCA-P is evaluated within a typical functional genomics setting. Subsequently, the performance of the two methods is evaluated by analysis of a real life Escherichia coli metabolomics data set.ResultsIn the simulation study, MxLSCA-P outperforms SCA-P in terms of recovery of the true underlying scores of the common mode and of the true values underlying the data entries. MxLSCA-P further performed especially better when the simulated data blocks were subject to different noise levels. In the analysis of an E. coli metabolomics data set, MxLSCA-P provided a slightly better and more consistent interpretation.ConclusionMxLSCA-P is a promising addition to the SCA family. The analysis of coupled functional genomics data blocks could benefit from its ability to take different noise levels per data block into consideration and improve the recovery of the true patterns underlying the data. Moreover, the maximum likelihood based approach underlying MxLSCA-P could be extended to custom-made solutions to specific problems encountered.

U2 - 10.1186/1471-2105-10-340

DO - 10.1186/1471-2105-10-340

M3 - Article

VL - 10

JO - BMC Bioinformatics

JF - BMC Bioinformatics

SN - 1471-2105

IS - 1

M1 - 340

ER -