Introducing SNAC: Sparse Network and Component model for integration of multi-source data

Pia Tio, Lourens J Waldorp, Katrijn Van Deun

Research output: Working paper › Scientific

Abstract

Gaussian graphical models (GGMs) are a popular method for analysing complex data by modelling the unique relationships between variables, that is, the association between two variables after controlling for all other variables. Recently, interest has shifted from investigating relationships within a discipline (e.g. genetics) to estimating relationships between variables from different disciplines (e.g. how gene expression relates to cognitive performance). It is therefore not surprising that there is a growing need to analyse large, so-called multi-source datasets, each containing detailed information from many data sources on the same individuals. GGMs are a straightforward statistical candidate for estimating unique cross-source relationships from such network-oriented data. However, the multi-source nature of the data poses two challenges. First, different sources may inherently differ from one another, biasing the estimated relationships. Second, GGMs are not designed to separate cross-source relationships from all other, source-specific relationships. In this paper we propose adding a simultaneous component model as a pre-processing step to the GGM; the combination is suited to estimating cross-source relationships from multi-source data. Compared to the graphical lasso (a commonly used GGM technique), this Sparse Network And Component (SNAC) model estimates the unique cross-source relationships from multi-source data more accurately. This holds in particular when the data contain more variables than observations (p > n). Neither differences in the sparseness of the underlying component structure of the data nor differences in the relative dominance of cross-source over source-specific relationships strongly affect the relationship estimates. Sparse Network And Component analysis, a hybrid component-graphical model, is a promising tool for modelling unique relationships between different data sources, thus providing insight into how various disciplines are connected to one another.
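
To make the notion of "unique cross-source relationships" concrete, here is a minimal sketch of the GGM side only. It computes partial correlations from a precision (inverse correlation) matrix and extracts the block of edges that links two sources. Several assumptions apply. The data are simulated. A simple ridge-regularised inverse stands in for the graphical lasso, so edges are shrunk but never set exactly to zero. The simultaneous component pre-processing step is not included. The helper names `partial_correlations` and `cross_source_block` are hypothetical. This is not the authors' SNAC implementation.

```python
import numpy as np

def partial_correlations(X, ridge=0.1):
    """Partial correlations from a ridge-regularised precision matrix.

    Assumption: the ridge penalty is a stand-in for the graphical lasso;
    it shrinks edges towards zero but does not produce exact zeros.
    """
    R = np.corrcoef(X, rowvar=False)                       # p x p correlation matrix
    theta = np.linalg.inv(R + ridge * np.eye(R.shape[0]))  # regularised precision matrix
    d = np.sqrt(np.diag(theta))
    P = -theta / np.outer(d, d)  # rho_ij = -theta_ij / sqrt(theta_ii * theta_jj)
    np.fill_diagonal(P, 1.0)
    return P

def cross_source_block(P, sizes):
    """Hypothetical helper: edges between source 0 (rows) and source 1 (columns)."""
    p0, p1 = sizes
    return P[:p0, p0:p0 + p1]

# Simulated example: one latent factor z drives the first variable of each source;
# the second variable of each source is pure noise.
rng = np.random.default_rng(0)
n = 200
z = rng.normal(size=n)
genes = np.column_stack([z + rng.normal(scale=0.5, size=n), rng.normal(size=n)])
cognition = np.column_stack([z + rng.normal(scale=0.5, size=n), rng.normal(size=n)])

X = np.hstack([genes, cognition])  # n x 4 multi-source data matrix
P = partial_correlations(X)
C = cross_source_block(P, (2, 2))  # 2 x 2 block of cross-source edges
print(np.round(C, 2))
```

Only the first gene-cognition pair should show a substantial cross-source edge. The remaining entries of `C` should stay close to zero. In real multi-source data, differences between sources and strong source-specific structure complicate this picture, which is the problem the abstract describes.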
Original language: English
Publisher: PsyArXiv Preprints
Number of pages: 23
Publication status: Published - 2018
