DOI: 10.1145/3383783.3383787
Research article

Identifying the best metrics to find the best quality clusters of genes from gene expression data

Published: 17 April 2020

Abstract

With recent advances in computing techniques and data availability in computational biology, scientists have a great opportunity to study the evolutionary relationships among living beings in terms of their genotypic and phenotypic attributes. Microarrays, an efficient way to measure the expression levels of genes in living organisms, can be used to group a set of genes based on their phenotypic information. Such groupings play an important role in pathway analysis, disease prediction, target identification in drug design, and many other applications in biology. However, selecting an appropriate distance metric for computing the similarity between genes remains a major challenge. In this work, we study 16 combinations of distance metrics and linkage methods for grouping genes with similar expression levels: we build a phylogenetic tree that captures the relationships among the genes and keeps the most closely related genes together. We validate our findings by running the same trials on different data sets. We find that, for grouping similar genes by building a phylogenetic tree, the combination of the maximum (Chebyshev) distance metric and average linkage tends to give the best-quality clusters.
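The abstract does not spell out the clustering procedure, so the following is only a minimal sketch of agglomerative hierarchical clustering over gene expression profiles, assuming each gene is represented as a vector of expression levels across samples. The metric names (euclidean, manhattan, maximum), the linkage names (single, complete, average), the function agglomerate, and the toy data EXAMPLE_PROFILES are illustrative assumptions; they are not necessarily the paper's 16 combinations. Each recorded merge corresponds to one internal node of the resulting tree.

```python
import math
from itertools import combinations
from typing import Callable, Dict, List, Sequence, Tuple

Profile = Sequence[float]  # expression levels of one gene across samples


def euclidean(a: Profile, b: Profile) -> float:
    """Straight-line distance between two expression profiles."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def manhattan(a: Profile, b: Profile) -> float:
    """Sum of absolute per-sample differences."""
    return sum(abs(x - y) for x, y in zip(a, b))


def maximum(a: Profile, b: Profile) -> float:
    """Maximum (Chebyshev) distance: the largest per-sample difference."""
    return max(abs(x - y) for x, y in zip(a, b))


# Illustrative metric set (an assumption, not the paper's exact list).
METRICS: Dict[str, Callable[[Profile, Profile], float]] = {
    "euclidean": euclidean,
    "manhattan": manhattan,
    "maximum": maximum,
}


def linkage_distance(c1: List[int], c2: List[int],
                     dist: List[List[float]], method: str) -> float:
    """Distance between two clusters of gene indices under a linkage rule."""
    pairs = [dist[i][j] for i in c1 for j in c2]
    if method == "single":    # closest pair of members
        return min(pairs)
    if method == "complete":  # farthest pair of members
        return max(pairs)
    if method == "average":   # mean over all member pairs
        return sum(pairs) / len(pairs)
    raise ValueError(f"unknown linkage method: {method}")


def agglomerate(profiles: List[Profile], metric: str = "maximum",
                method: str = "average", n_clusters: int = 1
                ) -> Tuple[List[List[int]], List[Tuple[List[int], List[int], float]]]:
    """Bottom-up clustering; returns the final clusters and the merge history."""
    d = METRICS[metric]
    n = len(profiles)
    # Precompute the gene-to-gene distance matrix once.
    dist = [[d(profiles[i], profiles[j]) for j in range(n)] for i in range(n)]
    clusters: List[List[int]] = [[i] for i in range(n)]  # start with singletons
    merges = []
    while len(clusters) > max(n_clusters, 1):
        # Pick the pair of clusters with the smallest linkage distance
        # (ties keep the first pair found).
        best = None
        for a, b in combinations(range(len(clusters)), 2):
            score = linkage_distance(clusters[a], clusters[b], dist, method)
            if best is None or score < best[0]:
                best = (score, a, b)
        score, a, b = best
        merges.append((clusters[a], clusters[b], score))  # one tree node
        merged = clusters[a] + clusters[b]
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)]
        clusters.append(merged)
    return sorted(sorted(c) for c in clusters), merges


# Hypothetical toy data: 5 genes measured in 4 samples.
EXAMPLE_PROFILES = [
    [1.0, 2.0, 1.5, 1.8],
    [1.1, 2.1, 1.4, 1.9],
    [5.0, 0.5, 4.8, 0.7],
    [5.2, 0.4, 5.0, 0.6],
    [1.2, 1.9, 1.6, 1.7],
]

if __name__ == "__main__":
    groups, history = agglomerate(EXAMPLE_PROFILES, "maximum", "average", n_clusters=2)
    print(groups)  # prints [[0, 1, 4], [2, 3]]
```

Looping over every metric and linkage pair reproduces the kind of grid search the abstract describes, although scoring cluster quality is left out here because the abstract does not define its criterion. In practice, SciPy's scipy.cluster.hierarchy.linkage(X, method="average", metric="chebyshev") builds an average-linkage tree under the maximum (Chebyshev) distance directly. The hand-written version above only makes each merge step explicit.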


        Published In

        ICBRA '19: Proceedings of the 6th International Conference on Bioinformatics Research and Applications
        December 2019
        169 pages
        ISBN:9781450372183
        DOI:10.1145/3383783
        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

        In-Cooperation

        • Sun Yat-Sen University
        • Seoul National University

        Publisher

        Association for Computing Machinery

        New York, NY, United States


        Author Tags

        1. Bioinformatics
        2. Distance Metric
        3. Gene Expression
        4. Hierarchical Clustering
        5. Linkage Method
        6. Microarray
        7. Phylogenetic Tree

        Qualifiers

        • Research-article
        • Research
        • Refereed limited

        Conference

        ICBRA '19


        Article Metrics

        • 0 Total Citations
        • 69 Total Downloads
        • Downloads (last 12 months): 7
        • Downloads (last 6 weeks): 0
        Reflects downloads up to 15 Jan 2025
