Multiple nested reductions of single data modes as a tool to deal with large data sets

Iven Van Mechelen, K. Van Deun

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Scientific

Abstract

The increased accessibility and concerted use of novel measurement technologies give rise to a data tsunami with matrices that comprise both a high number of variables and a high number of objects. As an example, one may think of transcriptomics data pertaining to the expression of a large number of genes in a large number of samples or tissues (as included in various compendia). The analysis of such data typically implies ill-conditioned optimization problems, as well as major challenges on both a computational and an interpretational level.
In the present paper, we develop a generic method to deal with these problems. This method was originally briefly proposed by Van Mechelen and Schepers (2007). It implies that single data modes (i.e., the set of objects or the set of variables under study) are subjected to multiple (discrete and/or dimensional) nested reductions.
We first formally introduce the generic multiple nested reductions method. Next, we show how a few recently proposed modeling approaches fit within the framework of this method. Subsequently, we briefly introduce a novel instantiation of the generic method, which simultaneously includes a two-mode partitioning of the objects and variables under study (Van Mechelen et al. (2004)) and a low-dimensional, principal component-type dimensional reduction of the two-mode cluster centroids. We illustrate this novel instantiation with an application on transcriptomics data for normal and tumourous colon tissues.
In the discussion, we highlight multiple nested mode reductions as a key feature of the novel method. Furthermore, we contrast the novel method with other approaches that imply different reductions for different modes, and approaches that imply a hybrid dimensional/discrete reduction of a single mode. Finally, we show in which way the multiple reductions method allows a researcher to deal with the challenges implied by the analysis of large data sets as outlined above.
Keywords: high dimensional data, clustering, dimension reduction
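
The novel instantiation mentioned in the abstract combines a two-mode partitioning with a principal component-type reduction of the two-mode cluster centroids. The following is a minimal illustrative sketch of that idea only, not the authors' algorithm: it assumes the object and variable partitions are already given (the paper estimates the reductions simultaneously), it skips centering, and it extracts a single component by power iteration. The function names `centroid_matrix` and `leading_component` and the toy data are hypothetical.

```python
def centroid_matrix(data, row_labels, col_labels, n_row_clusters, n_col_clusters):
    """Average the data entries within each (object cluster, variable cluster) block."""
    sums = [[0.0] * n_col_clusters for _ in range(n_row_clusters)]
    counts = [[0] * n_col_clusters for _ in range(n_row_clusters)]
    for i, row in enumerate(data):
        for j, value in enumerate(row):
            p, q = row_labels[i], col_labels[j]
            sums[p][q] += value
            counts[p][q] += 1
    # Empty blocks get a centroid of 0.0 (an arbitrary choice for this sketch).
    return [
        [sums[p][q] / counts[p][q] if counts[p][q] else 0.0
         for q in range(n_col_clusters)]
        for p in range(n_row_clusters)
    ]


def leading_component(matrix, n_iter=200):
    """Rank-one approximation of `matrix` by power iteration.

    Returns (scores, loadings) such that scores[i] * loadings[j]
    approximates matrix[i][j]. The uniform starting vector is assumed
    not to be orthogonal to the leading right singular vector.
    """
    n_rows, n_cols = len(matrix), len(matrix[0])
    v = [1.0 / n_cols ** 0.5] * n_cols
    for _ in range(n_iter):
        # u = M v, then w = M^T u
        u = [sum(matrix[i][j] * v[j] for j in range(n_cols)) for i in range(n_rows)]
        w = [sum(matrix[i][j] * u[i] for i in range(n_rows)) for j in range(n_cols)]
        norm = sum(x * x for x in w) ** 0.5
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    scores = [sum(matrix[i][j] * v[j] for j in range(n_cols)) for i in range(n_rows)]
    return scores, v


# Toy example: 4 objects x 4 variables, two clusters per mode (made-up numbers).
data = [
    [1, 1, 5, 5],
    [1, 1, 5, 5],
    [3, 3, 7, 7],
    [3, 3, 7, 7],
]
row_labels = [0, 0, 1, 1]   # first reduction of the object mode: a partition
col_labels = [0, 0, 1, 1]   # first reduction of the variable mode: a partition
centroids = centroid_matrix(data, row_labels, col_labels, 2, 2)
scores, loadings = leading_component(centroids)  # second, dimensional reduction
```

The nesting is visible in the data flow: the partitions reduce each mode to a small set of clusters, and the dimensional reduction is then applied to the cluster centroids rather than to the raw data matrix.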
Original language: English
Title of host publication: Proceedings of COMPSTAT'2010
Editors: Y. Lechevallier, G. Saporta
Publisher: Physica-Verlag
Pages: 349-358
ISBN (Electronic): 978-3-7908-2604-3
ISBN (Print): 978-3-7908-2603-6
DOI: https://doi.org/10.1007/978-3-7908-2604-3_32
Publication status: Published - 2010
Externally published: Yes

Cite this

Van Mechelen, I., & Van Deun, K. (2010). Multiple nested reductions of single data modes as a tool to deal with large data sets. In Y. Lechevallier, & G. Saporta (Eds.), Proceedings of COMPSTAT'2010 (pp. 349-358). Physica-Verlag. https://doi.org/10.1007/978-3-7908-2604-3_32
Van Mechelen, Iven ; Van Deun, K. / Multiple nested reductions of single data modes as a tool to deal with large data sets. Proceedings of COMPSTAT'2010. editor / Y. Lechevallier ; G. Saporta. Physica-Verlag, 2010. pp. 349-358
@inproceedings{5dff4b9dc6af40159c44197357f8c31b,
title = "Multiple nested reductions of single data modes as a tool to deal with large data sets",
abstract = "The increased accessibility and concerted use of novel measurement technologies give rise to a data tsunami with matrices that comprise both a high number of variables and a high number of objects. As an example, one may think of transcriptomics data pertaining to the expression of a large number of genes in a large number of samples or tissues (as included in various compendia). The analysis of such data typically implies ill-conditioned optimization problems, as well as major challenges on both a computational and an interpretational level. In the present paper, we develop a generic method to deal with these problems. This method was originally briefly proposed by Van Mechelen and Schepers (2007). It implies that single data modes (i.e., the set of objects or the set of variables under study) are subjected to multiple (discrete and/or dimensional) nested reductions. We first formally introduce the generic multiple nested reductions method. Next, we show how a few recently proposed modeling approaches fit within the framework of this method. Subsequently, we briefly introduce a novel instantiation of the generic method, which simultaneously includes a two-mode partitioning of the objects and variables under study (Van Mechelen et al. (2004)) and a low-dimensional, principal component-type dimensional reduction of the two-mode cluster centroids. We illustrate this novel instantiation with an application on transcriptomics data for normal and tumourous colon tissues. In the discussion, we highlight multiple nested mode reductions as a key feature of the novel method. Furthermore, we contrast the novel method with other approaches that imply different reductions for different modes, and approaches that imply a hybrid dimensional/discrete reduction of a single mode. Finally, we show in which way the multiple reductions method allows a researcher to deal with the challenges implied by the analysis of large data sets as outlined above. Keywords: high dimensional data, clustering, dimension reduction",
author = "{Van Mechelen}, Iven and {Van Deun}, K.",
year = "2010",
doi = "10.1007/978-3-7908-2604-3_32",
language = "English",
isbn = "978-3-7908-2603-6",
pages = "349--358",
editor = "Y. Lechevallier and G. Saporta",
booktitle = "Proceedings of COMPSTAT'2010",
publisher = "Physica-Verlag",
address = "Germany",

}

Van Mechelen, I & Van Deun, K 2010, Multiple nested reductions of single data modes as a tool to deal with large data sets. in Y Lechevallier & G Saporta (eds), Proceedings of COMPSTAT'2010. Physica-Verlag, pp. 349-358. https://doi.org/10.1007/978-3-7908-2604-3_32

Multiple nested reductions of single data modes as a tool to deal with large data sets. / Van Mechelen, Iven; Van Deun, K.

Proceedings of COMPSTAT'2010. ed. / Y. Lechevallier; G. Saporta. Physica-Verlag, 2010. p. 349-358.

TY - GEN

T1 - Multiple nested reductions of single data modes as a tool to deal with large data sets

AU - Van Mechelen, Iven

AU - Van Deun, K.

PY - 2010

Y1 - 2010

AB - The increased accessibility and concerted use of novel measurement technologies give rise to a data tsunami with matrices that comprise both a high number of variables and a high number of objects. As an example, one may think of transcriptomics data pertaining to the expression of a large number of genes in a large number of samples or tissues (as included in various compendia). The analysis of such data typically implies ill-conditioned optimization problems, as well as major challenges on both a computational and an interpretational level. In the present paper, we develop a generic method to deal with these problems. This method was originally briefly proposed by Van Mechelen and Schepers (2007). It implies that single data modes (i.e., the set of objects or the set of variables under study) are subjected to multiple (discrete and/or dimensional) nested reductions. We first formally introduce the generic multiple nested reductions method. Next, we show how a few recently proposed modeling approaches fit within the framework of this method. Subsequently, we briefly introduce a novel instantiation of the generic method, which simultaneously includes a two-mode partitioning of the objects and variables under study (Van Mechelen et al. (2004)) and a low-dimensional, principal component-type dimensional reduction of the two-mode cluster centroids. We illustrate this novel instantiation with an application on transcriptomics data for normal and tumourous colon tissues. In the discussion, we highlight multiple nested mode reductions as a key feature of the novel method. Furthermore, we contrast the novel method with other approaches that imply different reductions for different modes, and approaches that imply a hybrid dimensional/discrete reduction of a single mode. Finally, we show in which way the multiple reductions method allows a researcher to deal with the challenges implied by the analysis of large data sets as outlined above. Keywords: high dimensional data, clustering, dimension reduction

U2 - 10.1007/978-3-7908-2604-3_32

DO - 10.1007/978-3-7908-2604-3_32

M3 - Conference contribution

SN - 978-3-7908-2603-6

SP - 349

EP - 358

BT - Proceedings of COMPSTAT'2010

A2 - Lechevallier, Y.

A2 - Saporta, G.

PB - Physica-Verlag

ER -

Van Mechelen I, Van Deun K. Multiple nested reductions of single data modes as a tool to deal with large data sets. In Lechevallier Y, Saporta G, editors, Proceedings of COMPSTAT'2010. Physica-Verlag. 2010. p. 349-358 https://doi.org/10.1007/978-3-7908-2604-3_32