Abstract
Gene set analysis is a leading bioinformatics technique that allows phenotypes to be compared at the gene set level, and it is applied across different transcriptome-wide gene expression platforms and omics levels. The aim of this study was to measure the performance of three single-sample gene set enrichment algorithms, based on their ability to obtain the statistical significance of enrichment in each cell separately using scRNA-Seq data. A peripheral blood mononuclear cell dataset was used in the evaluation, and individual enrichment within the B cell subtype was investigated using a reference gene set collection. Sensitivity, specificity, prioritization, and balanced accuracy were used as evaluation metrics, accompanied by correlation analysis between gene sets. AUCell, originally designed for scRNA-Seq, showed the best sensitivity and balanced accuracy, good prioritization, and acceptable specificity. However, a strong correlation between gene set size and specificity was observed, so we recommend using it on large gene sets (more than 80 genes). Moreover, its computational time is much longer than that of the other tested methods. Among the other algorithms, CERNO gave very high specificity and prioritization, but its sensitivity needs to be improved through changes to the algorithm. Finally, the problem of a “gold standard” dataset and gene set collection for evaluating the performance of gene set analysis algorithms in scRNA-Seq was stated, and an initial solution was presented.
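As an illustration of three of the metrics named above, the following minimal sketch computes sensitivity, specificity, and balanced accuracy from per-cell significance calls. It assumes the standard confusion-matrix definitions, which the abstract does not spell out, so the chapter's exact definitions may differ. The names `Metrics` and `evaluate_calls` and the example labels are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass


@dataclass
class Metrics:
    sensitivity: float
    specificity: float
    balanced_accuracy: float


def evaluate_calls(significant, expected):
    """Compare per-cell enrichment calls with expected labels.

    significant[i] is True when a gene set was called significantly
    enriched in cell i; expected[i] is True when enrichment is expected
    in that cell (for example, a B cell gene set in a B cell).
    Assumption: standard confusion-matrix definitions of the metrics.
    """
    if len(significant) != len(expected):
        raise ValueError("inputs must have the same length")

    pairs = list(zip(significant, expected))
    tp = sum(1 for s, e in pairs if s and e)          # true positives
    fn = sum(1 for s, e in pairs if not s and e)      # false negatives
    fp = sum(1 for s, e in pairs if s and not e)      # false positives
    tn = sum(1 for s, e in pairs if not s and not e)  # true negatives

    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one expected-positive and one expected-negative cell")

    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return Metrics(sensitivity, specificity, (sensitivity + specificity) / 2)


# Hypothetical calls for eight cells: four where enrichment is expected, four where it is not.
calls = [True, True, False, False, False, True, False, False]
truth = [True, True, True, True, False, False, False, False]
print(evaluate_calls(calls, truth))
```

Balanced accuracy is the mean of sensitivity and specificity, so it stays informative when expected-positive cells are much rarer than expected-negative cells.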
Acknowledgements
This work was financed by a Silesian University of Technology grant for maintaining and developing research potential (MM, JZ). Anna Mrukwa takes part in the mentoring program “Spread your wings” at the Silesian University of Technology and was financed by the reserve of the Vice-Rector for Student Affairs and Education (60/001).
Copyright information
© 2022 Springer Nature Switzerland AG
About this paper
Cite this paper
Mrukwa, A., Marczyk, M., Zyla, J. (2022). Finding Significantly Enriched Cells in Single-Cell RNA Sequencing by Single-Sample Approaches. In: Rojas, I., Valenzuela, O., Rojas, F., Herrera, L.J., Ortuño, F. (eds) Bioinformatics and Biomedical Engineering. IWBBIO 2022. Lecture Notes in Computer Science, vol 13347. Springer, Cham. https://doi.org/10.1007/978-3-031-07802-6_3
DOI: https://doi.org/10.1007/978-3-031-07802-6_3
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-07801-9
Online ISBN: 978-3-031-07802-6
eBook Packages: Computer Science, Computer Science (R0)