Effect of distance measures on confidences of t-SNE embeddings and its implications on clustering for scRNA-seq data

    Research output: Contribution to journal › Article › Scientific › peer-review

    4 Citations (Scopus)

    Abstract

    t-distributed stochastic neighbor embedding (t-SNE) is arguably one of the best-known dimensionality reduction algorithms in use today. Although it is widely used to visualize scRNA-seq data, it is, like any algorithm, prone to errors and may lead to inaccurate interpretations of the visualized data. A reasonable way to avoid misinterpretation is to quantify the reliability of the visualizations. This work first seeks the best way to predict sample-based confidence scores for t-SNE embeddings and then uses these confidence scores to improve clustering algorithms. We adopt a random forest (RF) regression algorithm that uses seven distance measures as features to obtain sample-based confidence scores under a variety of distance measures. The best configuration is used to assess the clustering improvement with K-means and Density-Based Spatial Clustering of Applications with Noise (DBSCAN), based on Adjusted Rand Index (ARI), Normalized Mutual Information (NMI), and accuracy (ACC) scores. The experimental results show that the choice of distance measure has a considerable effect on the precision of the confidence scores, and that clustering performance can be improved substantially if these confidence scores are incorporated before the clustering algorithm is applied. Our findings reveal the usefulness of these confidence scores for downstream analyses of scRNA-seq data.
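
For readers unfamiliar with two of the clustering scores named in the abstract, the following is a minimal sketch of how ARI and NMI compare a predicted clustering against reference labels. It is not the authors' code. The function names are illustrative, and the arithmetic-mean normalization used for NMI is an assumption, since the abstract does not state which variant was used.

```python
from collections import Counter
from math import comb, log


def adjusted_rand_index(labels_true, labels_pred):
    """ARI from the contingency table of two labelings (illustrative helper)."""
    n = len(labels_true)
    if n < 2:
        return 1.0  # no pairs to compare; treat as perfect agreement
    cells = Counter(zip(labels_true, labels_pred))  # n_ij
    rows = Counter(labels_true)                     # a_i
    cols = Counter(labels_pred)                     # b_j
    index = sum(comb(c, 2) for c in cells.values())
    sum_rows = sum(comb(c, 2) for c in rows.values())
    sum_cols = sum(comb(c, 2) for c in cols.values())
    expected = sum_rows * sum_cols / comb(n, 2)     # chance-level agreement
    max_index = (sum_rows + sum_cols) / 2
    if max_index == expected:
        return 1.0  # degenerate case, e.g. both labelings are one cluster
    return (index - expected) / (max_index - expected)


def normalized_mutual_info(labels_true, labels_pred):
    """NMI with arithmetic-mean normalization (an assumed variant)."""
    n = len(labels_true)
    cells = Counter(zip(labels_true, labels_pred))
    rows = Counter(labels_true)
    cols = Counter(labels_pred)
    mutual_info = sum(
        (c / n) * log(n * c / (rows[a] * cols[b]))
        for (a, b), c in cells.items()
    )
    h_true = -sum((c / n) * log(c / n) for c in rows.values())
    h_pred = -sum((c / n) * log(c / n) for c in cols.values())
    mean_entropy = (h_true + h_pred) / 2
    if mean_entropy == 0:
        return 1.0  # both labelings are a single cluster
    return mutual_info / mean_entropy


# Toy labels: the first prediction matches the reference up to renaming,
# the second splits every reference cluster evenly.
reference = [0, 0, 1, 1]
renamed = [1, 1, 0, 0]
crossed = [0, 1, 0, 1]
ari_renamed = adjusted_rand_index(reference, renamed)
ari_crossed = adjusted_rand_index(reference, crossed)
nmi_renamed = normalized_mutual_info(reference, renamed)
nmi_crossed = normalized_mutual_info(reference, crossed)
```

Both scores ignore the names of the cluster labels, so a clustering that matches the reference up to renaming scores as a perfect match. ARI additionally corrects for chance agreement and can be negative.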
    Original language: English
    Article number: 6567
    Number of pages: 10
    Journal: Scientific Reports
    Volume: 13
    Publication status: Published - 21 Apr 2023

    Keywords

    • Algorithms
    • Cluster Analysis
    • Reproducibility of Results
    • Sequence Analysis, RNA
    • Single-Cell Analysis
    • Single-Cell Gene Expression Analysis
