research-article

Penalized and weighted K-means for clustering with scattered objects and prior information in high-throughput biological data

Published: 15 August 2007

Abstract

Motivation: Cluster analysis is one of the most important data mining tools for investigating high-throughput biological data. The existence of many scattered objects that should not be clustered has been found to hinder performance of most traditional clustering algorithms in such a high-dimensional complex situation. Very often, additional prior knowledge from databases or previous experiments is also available in the analysis. Excluding scattered objects and incorporating existing prior information are desirable to enhance the clustering performance.
Results: In this article, a class of loss functions is proposed for cluster analysis and applied to high-throughput genomic and proteomic data. Two major extensions of K-means are involved: penalization and weighting. An additive penalty term allows a set of scattered objects to remain unclustered. Weights are introduced to account for prior information about preferred or prohibited cluster patterns. The relationship of these loss functions with the classification likelihood of Gaussian mixture models is explored. Incorporating good prior information is also shown to alleviate the global optimization problem in clustering. Applications of the proposed method to simulated data, as well as to high-throughput data sets from tandem mass spectrometry (MS/MS) and microarray experiments, are presented. Our results demonstrate its superior performance over most existing methods, and its computational simplicity and extensibility when applied to large, complex biological data sets.
Availability: http://www.pitt.edu/~ctseng/research/software.html
Supplementary information: Supplementary data are available at Bioinformatics online.
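
The penalization idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation, and the exact loss function is not given on this page. The sketch assumes a loss equal to the within-cluster sum of squared distances plus a constant penalty `lam` for each object left unclustered. The names `penalized_kmeans`, `penalized_loss`, `lam` and `init_centers` are hypothetical, and the weighting extension for prior information is not covered.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def penalized_kmeans(points, k, lam, init_centers=None, n_iter=50, seed=0):
    """Lloyd-style iterations with an extra 'scattered' set (label -1).

    Assumed loss (illustrative only, not taken from the paper):
        sum of squared distances of clustered points to their centroid
        + lam * (number of points left unclustered).
    """
    rng = random.Random(seed)
    if init_centers is None:
        init_centers = rng.sample(points, k)
    centers = [list(c) for c in init_centers]
    labels = [None] * len(points)
    for _ in range(n_iter):
        # Assignment step: for fixed centers, the assumed loss is minimized
        # by clustering a point only if its nearest centroid costs <= lam.
        new_labels = []
        for p in points:
            dists = [sq_dist(p, c) for c in centers]
            j = min(range(k), key=dists.__getitem__)
            new_labels.append(j if dists[j] <= lam else -1)
        if new_labels == labels:
            break
        labels = new_labels
        # Update step: each centroid becomes the mean of its members;
        # scattered points (label -1) do not influence any centroid.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centers


def penalized_loss(points, labels, centers, lam):
    """Evaluate the assumed penalized loss for a given assignment."""
    total = 0.0
    for p, lab in zip(points, labels):
        total += lam if lab == -1 else sq_dist(p, centers[lab])
    return total


# Toy data: two tight groups plus one isolated point.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (50, -50)]
labels, centers = penalized_kmeans(
    points, k=2, lam=4.0, init_centers=[(0, 0), (10, 10)]
)
loss = penalized_loss(points, labels, centers, lam=4.0)
```

In this toy run the isolated point is left out of both clusters rather than pulling a centroid toward it. A larger `lam` makes exclusion more expensive, and a sufficiently large value recovers ordinary K-means.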

Published In

Bioinformatics  Volume 23, Issue 17
August 2007
149 pages

Publisher

Oxford University Press, Inc.

United States

Qualifiers

  • Research-article

Cited By

  • (2025) Sparse dual-weighting ensemble clustering. Cluster Computing 28:2. DOI: 10.1007/s10586-024-04864-y. Online publication date: 1-Apr-2025
  • (2024) Jacobian-scaled K-means clustering for physics-informed segmentation of reacting flows. Journal of Computational Physics 514:C. DOI: 10.1016/j.jcp.2024.113227. Online publication date: 18-Oct-2024
  • (2024) Capacitated spatial clustering with multiple constraints and attributes. Engineering Applications of Artificial Intelligence 127:PA. DOI: 10.1016/j.engappai.2023.107182. Online publication date: 1-Feb-2024
  • (2023) Nonparametric bagging clustering methods to identify latent structures from a sequence of dependent categorical data. Computational Statistics & Data Analysis 177:C. DOI: 10.1016/j.csda.2022.107583. Online publication date: 1-Jan-2023
  • (2022) The bi-criteria seeding algorithms for two variants of k-means problem. Journal of Combinatorial Optimization 44:3 (1693-1704). DOI: 10.1007/s10878-020-00537-9. Online publication date: 1-Oct-2022
  • (2020) The seeding algorithm for k-means problem with penalties. Journal of Combinatorial Optimization 39:1 (15-32). DOI: 10.1007/s10878-019-00450-w. Online publication date: 1-Jan-2020
  • (2019) Local search approximation algorithms for the k-means problem with penalties. Journal of Combinatorial Optimization 37:2 (439-453). DOI: 10.1007/s10878-018-0278-6. Online publication date: 1-Feb-2019
  • (2018) Supervising unsupervised learning. Proceedings of the 32nd International Conference on Neural Information Processing Systems (4996-5006). DOI: 10.5555/3327345.3327407. Online publication date: 3-Dec-2018
  • (2018) A Survey of Data Mining and Deep Learning in Bioinformatics. Journal of Medical Systems 42:8 (1-20). DOI: 10.1007/s10916-018-1003-9. Online publication date: 1-Aug-2018
  • (2016) An improved computational framework using one stage filtration by incorporating knowledge in gene expression clustering. Proceedings of the International Conference on Artificial Intelligence and Robotics and the International Conference on Automation, Control and Robotics Engineering (1-5). DOI: 10.1145/2952744.2952752. Online publication date: 13-Jul-2016