research-article

Computational cluster validation in post-genomic data analysis

Published: 01 August 2005

Abstract

Motivation: The discovery of novel biological knowledge from the ab initio analysis of post-genomic data relies upon the use of unsupervised processing methods, in particular clustering techniques. Much recent research in bioinformatics has therefore been focused on the transfer of clustering methods introduced in other scientific fields and on the development of novel algorithms specifically designed to tackle the challenges posed by post-genomic data. The partitions returned by a clustering algorithm are commonly validated using visual inspection and concordance with prior biological knowledge; whether the clusters actually correspond to the real structure in the data is somewhat less frequently considered. Suitable computational cluster validation techniques are available in the general data-mining literature, but have been given only a fraction of the same attention in bioinformatics.
Results: This review paper aims to familiarize the reader with the battery of techniques available for the validation of clustering results, with a particular focus on their application to post-genomic data analysis. Synthetic and real biological datasets are used to demonstrate the benefits, and also some of the perils, of analytical cluster validation.
Availability: The software used in the experiments is available at http://dbkweb.ch.umist.ac.uk/handl/clustervalidation/
Supplementary information: Enlarged colour plots are provided in the Supplementary Material, which is available at http://dbkweb.ch.umist.ac.uk/handl/clustervalidation/

Published In

Bioinformatics, Volume 21, Issue 15
August 2005
131 pages

Publisher

Oxford University Press, Inc.

United States
