Published January 1, 2023 | Version v1
Journal article | Open access

SUMA: a lightweight machine learning model-powered shared nearest neighbour-based clustering application interface for scRNA-Seq data

  • 1. Gebze Tech Univ, Fac Engn, Dept Bioengn, Kocaeli, Turkiye
  • 2. Idea Technol Solut R&D Ctr, Istanbul, Turkiye

Description

Background/aim: Single-cell transcriptomics (scRNA-Seq) explores cellular diversity at the gene expression level. scRNA-Seq data are inherently sparse and noisy, and the types of the sequenced cells are often uncertain, so effective clustering and cell type annotation are essential. Graph-based clustering of scRNA-Seq data is a simple yet powerful approach: the data are represented as a "shared nearest neighbour" (SNN) graph, and the cells are clustered with graph clustering algorithms. These algorithms depend on several user-defined parameters. Here we present SUMA, a lightweight tool that uses a random forest model to predict the number of neighbours that yields the best clustering results. We also integrated our method with other commonly used methods in an RShiny application. SUMA can be used in a local environment (https://github.com/hkarakurt8742/SUMA) or as a browser tool (https://hkarakurt.shinyapps.io/suma/).

Materials and methods: Publicly available scRNA-Seq datasets and 3 different graph-based clustering algorithms were used to develop SUMA, and a wide range of values for the number of neighbours and the number of variable genes was considered. Clustering quality was assessed with the adjusted Rand index (ARI) against the true labels of each dataset. The data were split into training and test sets, and the model was built and optimised using the Scikit-learn (Python) and randomForest (R) libraries.

Results: The accuracy of our machine learning model was 0.96, and the area under the ROC curve (AUC) was 0.98. The model indicated that the number of cells in the scRNA-Seq data is the most important feature for deciding the number of neighbours.

Conclusion: We developed and evaluated the SUMA model and implemented the method in the SUMAShiny app. The app integrates SUMA with different clustering methods and lets users who are not bioinformaticians easily cluster and visualise their scRNA-Seq data. The SUMAShiny app is available for both desktop and browser use.
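The graph-based pipeline described above can be sketched in Python. The pipeline builds an SNN graph for a given number of neighbours `k`, clusters it, and scores the result with ARI against known labels. This is a minimal illustration under stated assumptions, not SUMA's implementation. The Jaccard edge weights and the 1/15 pruning cutoff follow a common SNN convention. scikit-learn's spectral clustering stands in for the three graph clustering algorithms, which the abstract does not name. Synthetic `make_blobs` data replace a real (e.g. PCA-reduced) expression matrix. The helper names `snn_graph` and `ari_by_k` are hypothetical.

```python
import numpy as np
from sklearn.cluster import SpectralClustering
from sklearn.datasets import make_blobs
from sklearn.metrics import adjusted_rand_score
from sklearn.neighbors import NearestNeighbors


def snn_graph(X, k, prune=1 / 15):
    """Hypothetical helper: SNN affinity as the Jaccard overlap of k-NN sets."""
    n = X.shape[0]
    # Ask for k + 1 neighbours because each cell is returned as its own neighbour.
    _, idx = NearestNeighbors(n_neighbors=k + 1).fit(X).kneighbors(X)
    # A[i, j] = 1 if cell j is in the neighbourhood of cell i.
    A = np.zeros((n, n))
    A[np.repeat(np.arange(n), k + 1), idx.ravel()] = 1.0
    shared = A @ A.T                # size of the intersection of N(i) and N(j)
    union = 2 * (k + 1) - shared    # size of the union; every set has k + 1 members
    S = shared / union              # Jaccard index, symmetric by construction
    S[S < prune] = 0.0              # drop weak edges (assumed cutoff)
    np.fill_diagonal(S, 0.0)        # no self-loops
    return S


def ari_by_k(X, y_true, ks, n_clusters):
    """Hypothetical helper: cluster the SNN graph for each k and score it by ARI."""
    scores = {}
    for k in ks:
        S = snn_graph(X, k)
        labels = SpectralClustering(
            n_clusters=n_clusters, affinity="precomputed", random_state=0
        ).fit_predict(S)
        scores[k] = adjusted_rand_score(y_true, labels)
    return scores


# Synthetic stand-in for a reduced expression matrix (cells x components) with known labels.
X, y_true = make_blobs(n_samples=300, centers=3, n_features=10, random_state=0)
scores = ari_by_k(X, y_true, ks=(5, 20, 50), n_clusters=3)
for k, ari in scores.items():
    print(f"k={k:>2}  ARI={ari:.3f}")
```

Repeating such a sweep across many datasets gives the neighbour count with the highest ARI for each dataset. That is one plausible way, assumed here, to derive training labels for a predictor like SUMA's.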

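The model-building step in the abstract can be sketched with scikit-learn's `RandomForestClassifier`. That step consists of a train/test split, a random forest, accuracy and ROC AUC, and feature importance. Everything specific here is an assumption:

- the feature set beyond the number of cells;
- the three neighbour-count classes;
- the synthetic labels, which by construction depend only on the number of cells.

The printed metrics describe this toy data only. They are not a reproduction of the reported accuracy of 0.96 and AUC of 0.98.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n_runs = 600

# Hypothetical per-dataset features; only the number of cells is named in the abstract.
feature_names = ["n_cells", "n_variable_genes", "median_genes_per_cell"]
X = np.column_stack([
    rng.integers(200, 20_000, size=n_runs),            # number of cells
    rng.choice([500, 1000, 2000, 3000], size=n_runs),  # variable genes used (assumed)
    rng.integers(300, 5000, size=n_runs),              # hypothetical sparsity proxy
])

# Synthetic target: index of the "best" neighbour-count class (e.g. small/medium/large k).
# This toy rule depends only on the number of cells; it is NOT SUMA's training data.
y = np.digitize(X[:, 0], [2_000, 8_000])

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)
model = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_train, y_train)

accuracy = accuracy_score(y_test, model.predict(X_test))
# One-vs-rest ROC AUC computed from the predicted class probabilities.
auc = roc_auc_score(y_test, model.predict_proba(X_test), multi_class="ovr")
# Rank features by impurity-based importance, highest first.
ranking = sorted(zip(model.feature_importances_, feature_names), reverse=True)

print(f"accuracy={accuracy:.3f}  AUC={auc:.3f}")
for importance, name in ranking:
    print(f"{name:<22}{importance:.3f}")
```

Because the toy labels are a function of `n_cells` alone, the importance ranking puts that feature first by design. With real data the ranking has to be read from the fitted model, as the abstract reports.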
Files

bib-92c2a33e-acc6-42aa-ae98-1fb69dc71bf0.txt (212 Bytes)
md5:911a122bb58447a96b18ad47c0ad9dc1