Identifying common and distinctive processes underlying multiset data

K. Van Deun, A.K. Smilde, L. Thorrez, H.A.L. Kiers, I. Van Mechelen

Research output: Contribution to journal › Article › Scientific › peer-review

Abstract

In many research domains it has become common practice to rely on multiple sources of data to study the same object of interest. Examples include a systems biology approach to immunology that collects both gene expression data and immunological readouts for the same set of subjects, and the use of several high-throughput techniques for the same set of fermentation batches. A major challenge is to find the processes underlying such multiset data and to disentangle the processes common to all sources from those that are distinctive for a specific source. Several integrative methods have been proposed to address this challenge, including canonical correlation analysis, simultaneous component analysis, OnPLS, generalized singular value decomposition, DISCO-SCA, and ECO-POWER. To better understand 1) how these methods find common and distinctive components and 2) how the methods relate to one another, this paper brings them together and compares them both theoretically and through analyses of high-dimensional microarray gene expression data obtained from subjects vaccinated against influenza.
Keywords: Multiset data, Common and distinctive, Data integration
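The abstract lists simultaneous component analysis (SCA) among the integrative methods. As a minimal sketch, and not the authors' implementation (nor DISCO-SCA or ECO-POWER), the Python below fits a single SCA component to column-centered blocks that share the same subjects as rows, then reports the share of each block's sum of squares that this component explains. It assumes no block weighting and uses power iteration in place of a full singular value decomposition; the function names `center_columns`, `concatenate`, `first_component` and `block_fit` are hypothetical.

```python
import math
from typing import List

Matrix = List[List[float]]


def center_columns(block: Matrix) -> Matrix:
    """Subtract each column mean so components model variation, not level."""
    n = len(block)
    means = [sum(row[j] for row in block) / n for j in range(len(block[0]))]
    return [[value - means[j] for j, value in enumerate(row)] for row in block]


def concatenate(blocks: List[Matrix]) -> Matrix:
    """Place blocks side by side; every block must have the same rows (subjects)."""
    return [[value for block in blocks for value in block[i]]
            for i in range(len(blocks[0]))]


def first_component(x: Matrix, iterations: int = 500) -> List[float]:
    """Leading right singular vector of x (unit length), found by power iteration."""
    p = len(x[0])
    v = [1.0] * p
    for _ in range(iterations):
        scores = [sum(row[j] * v[j] for j in range(p)) for row in x]  # x v
        w = [sum(x[i][j] * scores[i] for i in range(len(x)))
             for j in range(p)]  # x^T x v
        norm = math.sqrt(sum(c * c for c in w))
        v = [c / norm for c in w]
    return v


def block_fit(blocks: List[Matrix], iterations: int = 500) -> List[float]:
    """Share of each block's sum of squares explained by the first SCA component."""
    centered = [center_columns(b) for b in blocks]
    x = concatenate(centered)
    loadings = first_component(x, iterations)
    scores = [sum(row[j] * loadings[j] for j in range(len(loadings))) for row in x]
    score_ss = sum(s * s for s in scores)
    fits, start = [], 0
    for block in centered:
        width = len(block[0])
        # The rank-1 fit of this block is the outer product of the scores with
        # this block's slice of the loadings; its sum of squares factorizes as
        # score_ss * (sum of squared loadings in the slice).
        loading_ss = sum(c * c for c in loadings[start:start + width])
        block_ss = sum(value * value for row in block for value in row)
        fits.append(score_ss * loading_ss / block_ss)
        start += width
    return fits


if __name__ == "__main__":
    # Toy data in which one process drives both blocks (a common component).
    block_a = [[1, 1], [-1, -1], [2, 2], [-2, -2]]
    block_b = [[1, 2], [-1, -2], [2, 4], [-2, -4]]
    print(block_fit([block_a, block_b]))
```

A component that explains a large share of the variance in every block is a candidate common process, while one that explains variance in essentially a single block is a candidate distinctive process. Real analyses would extract several components, weight the blocks, and apply a formal criterion rather than inspecting these ratios by eye.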
Original language: English
Pages (from-to): 40-51
Journal: Chemometrics & Intelligent Laboratory Systems
Volume: 129
DOI: 10.1016/j.chemolab.2013.07.005
Publication status: Published - 2013
Externally published: Yes

Cite this

Van Deun, K.; Smilde, A.K.; Thorrez, L.; Kiers, H.A.L.; Van Mechelen, I. / Identifying common and distinctive processes underlying multiset data. In: Chemometrics & Intelligent Laboratory Systems. 2013; Vol. 129. pp. 40-51.
@article{25d81391155b45269e4561fde8dde62e,
title = "Identifying common and distinctive processes underlying multiset data",
abstract = "In many research domains it has become a common practice to rely on multiple sources of data to study the same object of interest. Examples include a systems biology approach to immunology with collection of both gene expression data and immunological readouts for the same set of subjects, and the use of several high-throughput techniques for the same set of fermentation batches. A major challenge is to find the processes underlying such multiset data and to disentangle therein the common processes from those that are distinctive for a specific source. Several integrative methods have been proposed to address this challenge including canonical correlation analysis, simultaneous component analysis, OnPLS, generalized singular value decomposition, DISCO-SCA, and ECO-POWER. To get a better understanding 1) of the methods with respect to finding common and distinctive components and 2) of the relations between these methods, this paper brings the methods together and compares them both on a theoretical level and in terms of analyses of high-dimensional micro-array gene expression data obtained from subjects vaccinated against influenza.",
keywords = "Multiset data, Common and distinctive, Data integration",
author = "{Van Deun}, K. and A.K. Smilde and L. Thorrez and H.A.L. Kiers and {Van Mechelen}, I.",
year = "2013",
doi = "10.1016/j.chemolab.2013.07.005",
language = "English",
volume = "129",
pages = "40--51",
journal = "Chemometrics & Intelligent Laboratory Systems",
issn = "0169-7439",
publisher = "Elsevier Science BV",

}
