Multiple nested reductions of single data modes as a tool to deal with large data sets

Iven Van Mechelen, K. Van Deun

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Scientific


The increased accessibility and concerted use of novel measurement technologies give rise to a data tsunami with matrices that comprise both a high number of variables and a high number of objects. As an example, one may think of transcriptomics data pertaining to the expression of a large number of genes in a large number of samples or tissues (as included in various compendia). The analysis of such data typically implies ill-conditioned optimization problems, as well as major challenges on both a computational and an interpretational level.
In the present paper, we develop a generic method to deal with these problems. This method was first proposed, in brief form, by Van Mechelen and Schepers (2007). It implies that single data modes (i.e., the set of objects or the set of variables under study) are subjected to multiple (discrete and/or dimensional) nested reductions.
We first formally introduce the generic multiple nested reductions method. Next, we show how a few recently proposed modeling approaches fit within the framework of this method. Subsequently, we briefly introduce a novel instantiation of the generic method, which simultaneously includes a two-mode partitioning of the objects and variables under study (Van Mechelen et al., 2004) and a low-dimensional, principal component-type dimensional reduction of the two-mode cluster centroids. We illustrate this novel instantiation with an application to transcriptomics data for normal and tumourous colon tissues.
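As a rough illustration of what nested reductions of the data modes can look like, the sketch below partitions the objects (rows) and the variables (columns), computes the two-mode cluster centroids, and then applies a low-rank reduction to that centroid matrix. It is not the authors' algorithm: the abstract describes a simultaneous model, whereas this sketch runs the steps sequentially, uses plain k-means for both partitions, and uses an uncentered truncated SVD as a stand-in for the principal component-type step. All function and parameter names (`kmeans_labels`, `nested_reduction`, `n_row_clusters`, `n_col_clusters`, `n_components`) are hypothetical.

```python
import numpy as np


def kmeans_labels(X, k, n_iter=50, seed=0):
    """Plain Lloyd-style k-means; returns one cluster label per row of X."""
    rng = np.random.default_rng(seed)
    # Initialise the centers with k distinct rows of X (fancy indexing copies).
    centers = X[rng.choice(X.shape[0], size=k, replace=False)]
    for _ in range(n_iter):
        # Squared Euclidean distance from every row to every center.
        d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        for j in range(k):
            members = X[labels == j]
            if len(members) > 0:  # keep the old center if a cluster empties
                centers[j] = members.mean(axis=0)
    return labels


def nested_reduction(X, n_row_clusters, n_col_clusters, n_components, seed=0):
    """Sequential sketch (assumption): partition both modes, then reduce centroids."""
    # Discrete reduction of each mode.
    row_labels = kmeans_labels(X, n_row_clusters, seed=seed)
    col_labels = kmeans_labels(X.T, n_col_clusters, seed=seed)

    # Two-mode cluster centroids: mean of each (row cluster, column cluster) block.
    C = np.zeros((n_row_clusters, n_col_clusters))
    for p in range(n_row_clusters):
        for q in range(n_col_clusters):
            block = X[np.ix_(row_labels == p, col_labels == q)]
            if block.size > 0:  # empty blocks keep a centroid of 0
                C[p, q] = block.mean()

    # Dimensional reduction nested inside the partition: truncated SVD of C.
    U, s, Vt = np.linalg.svd(C, full_matrices=False)
    r = n_components
    C_low = (U[:, :r] * s[:r]) @ Vt[:r]

    # Model-implied data: every cell takes the reduced centroid of its block.
    X_hat = C_low[np.ix_(row_labels, col_labels)]
    return row_labels, col_labels, C, C_low, X_hat
```

Calling `nested_reduction(X, 3, 2, 1)` on an objects-by-variables array `X` returns the two partitions, the 3 x 2 centroid matrix, its rank-1 approximation, and the reconstructed data. The point of the nesting is that the dimensional step operates on the small centroid matrix rather than on the full data, which is what keeps the problem tractable for large matrices.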
In the discussion, we highlight multiple nested mode reductions as a key feature of the novel method. Furthermore, we contrast the novel method with other approaches that imply different reductions for different modes, and approaches that imply a hybrid dimensional/discrete reduction of a single mode. Finally, we show in which way the multiple reductions method allows a researcher to deal with the challenges implied by the analysis of large data sets as outlined above.
Keywords: high dimensional data, clustering, dimension reduction
Original language: English
Title of host publication: Proceedings of COMPSTAT'2010
Editors: Y. Lechevallier, G. Saporta
ISBN (Electronic): 978-3-7908-2604-3
ISBN (Print): 978-3-7908-2603-6
Publication status: Published - 2010
Externally published: Yes

