A structured overview of simultaneous component based data integration

K. Van Deun, Age K. Smilde, Mariët J. Van Der Werf, Henk A. L. Kiers, Iven Van Mechelen

Research output: Contribution to journal › Article › Scientific › peer-review

Abstract

Background
Data integration is currently one of the main challenges in the biomedical sciences. Often different pieces of information are gathered on the same set of entities (e.g., tissues, culture samples, biomolecules) with the different pieces stemming, for example, from different measurement techniques. This implies that more and more data appear that consist of two or more data arrays that have a shared mode. An integrative analysis of such coupled data should be based on a simultaneous analysis of all data arrays. In this respect, the family of simultaneous component methods (e.g., SUM-PCA, unrestricted PCovR, MFA, STATIS, and SCA-P) is a natural choice. Yet, different simultaneous component methods may lead to quite different results.
Results
We offer a structured overview of simultaneous component methods that frames them in a principal components setting such that both the common core of the methods and the specific elements with regard to which they differ are highlighted. An overview of principles is given that may guide the data analyst in choosing an appropriate simultaneous component method. Several theoretical and practical issues are illustrated with an empirical example on metabolomics data for Escherichia coli as obtained with different analytical chemical measurement methods.
Conclusion
Of the aspects in which the simultaneous component methods differ, pre-processing and weighting are consequential. In particular, the type of weighting of the different matrices is essential for simultaneous component analysis. These types of weighting are shown to be linked to different specifications of the idea of a fair integration of the different coupled arrays.
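
As a rough illustration of why the weighting of the coupled arrays matters, the following sketch column-centers several data blocks that share their row mode, optionally rescales each block, concatenates them, and extracts common components with a singular value decomposition. It is not the authors' implementation. The function name `simultaneous_components`, the option names, and the toy data are assumptions made for this example. Scaling each block to unit sum of squares and dividing each block by its first singular value (the weighting commonly associated with MFA) are shown only as two possible specifications of a "fair" integration.

```python
import numpy as np


def simultaneous_components(blocks, n_components=2, weighting="sum_of_squares"):
    """Illustrative simultaneous component analysis for row-coupled blocks.

    blocks    : list of 2-D arrays with the same number of rows (shared mode)
    weighting : "none", "sum_of_squares", or "first_singular_value"
                (hypothetical option names chosen for this sketch)
    Returns common scores and one loading matrix per block.
    """
    processed = []
    for block in blocks:
        X = np.asarray(block, dtype=float)
        X = X - X.mean(axis=0)  # pre-processing: center each variable
        if weighting == "sum_of_squares":
            scale = np.linalg.norm(X)  # Frobenius norm of the block
        elif weighting == "first_singular_value":
            scale = np.linalg.svd(X, compute_uv=False)[0]
        elif weighting == "none":
            scale = 1.0
        else:
            raise ValueError(f"unknown weighting: {weighting!r}")
        processed.append(X / scale if scale > 0 else X)

    # Blocks share the row mode, so they are concatenated side by side.
    concatenated = np.hstack(processed)
    U, s, Vt = np.linalg.svd(concatenated, full_matrices=False)
    scores = U[:, :n_components] * s[:n_components]  # common to all blocks
    loadings = Vt[:n_components].T                    # stacked over blocks

    # Cut the stacked loadings back into one matrix per block.
    split_points = np.cumsum([X.shape[1] for X in processed])[:-1]
    return scores, np.split(loadings, split_points, axis=0)


# Toy data (assumed): two blocks on the same 10 samples, very different scales.
rng = np.random.default_rng(0)
block_a = rng.normal(size=(10, 4))
block_b = 100.0 * rng.normal(size=(10, 6))

for w in ("none", "sum_of_squares"):
    scores, (load_a, load_b) = simultaneous_components([block_a, block_b], 2, w)
    share_a = (load_a**2).sum() / ((load_a**2).sum() + (load_b**2).sum())
    print(w, round(float(share_a), 3))
```

Without weighting, the block measured on the larger scale tends to dominate the common components. Rescaling the blocks before concatenation changes how much each block contributes, which is one way to see why different simultaneous component methods can give quite different results on the same data.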
Original language: English
Article number: 246
Journal: BMC Bioinformatics
Volume: 10
Issue number: 1
DOI: https://doi.org/10.1186/1471-2105-10-246
Publication status: Published - 2009
Externally published: Yes

Cite this

Van Deun, K., Smilde, A. K., Van Der Werf, M. J., Kiers, H. A., & Van Mechelen, I. (2009). A structured overview of simultaneous component based data integration. BMC Bioinformatics, 10(1), [246]. https://doi.org/10.1186/1471-2105-10-246