DOI: 10.1109/CICN.2011.62
Article

Document Clustering Using K-Means, Heuristic K-Means and Fuzzy C-Means

Published: 07 October 2011

Abstract

Document clustering refers to the unsupervised classification (categorization) of documents into groups (clusters) such that documents in the same cluster are similar, whereas documents in different clusters are dissimilar. The documents may be web pages, blog posts, news articles, or other text files. This paper presents our experimental work on applying the K-means, heuristic K-means, and fuzzy C-means algorithms to cluster text documents. We experimented with different document representations (tf, tf.idf, and Boolean) and different feature selection schemes (with or without stop-word removal, and with or without stemming). We ran our implementations on several standard datasets and computed various performance measures for these algorithms. The results indicate that the tf.idf representation and the use of stemming yield better clustering. Moreover, fuzzy C-means produces better results than both K-means and heuristic K-means on almost all datasets, and is a more stable method.
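The abstract names the pipeline but gives no implementation details, so the following is only a minimal, self-contained sketch (plain Python, no third-party libraries) of tf.idf document vectors clustered with hard K-means and with fuzzy C-means. Several choices here are assumptions, not taken from the paper: whitespace tokenization with no stop-word removal or stemming, idf = log(N/df), unit-length vectors, fuzziness exponent m = 2, and a deterministic farthest-first seeding (chosen because random seeding can leave K-means in a poor local optimum on a corpus this small). The toy corpus and all function names (`tfidf_vectors`, `farthest_first`, `kmeans`, `fuzzy_cmeans`) are hypothetical. The paper's heuristic K-means variant is not described in the abstract and is not reproduced here.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Unit-length tf.idf vectors; tf = raw count, idf = log(N / df) (assumed weighting)."""
    tokenized = [doc.lower().split() for doc in docs]  # whitespace tokens, no stemming
    vocab = sorted({t for toks in tokenized for t in toks})
    df = Counter(t for toks in tokenized for t in set(toks))
    n = len(docs)
    vectors = []
    for toks in tokenized:
        tf = Counter(toks)
        vec = [tf[t] * math.log(n / df[t]) for t in vocab]
        norm = math.sqrt(sum(v * v for v in vec)) or 1.0  # guard against all-zero rows
        vectors.append([v / norm for v in vec])
    return vectors


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def farthest_first(X, k):
    """Hypothetical deterministic seeding: X[0], then the point farthest from the seeds so far."""
    seeds = [list(X[0])]
    while len(seeds) < k:
        seeds.append(list(max(X, key=lambda x: min(sq_dist(x, s) for s in seeds))))
    return seeds


def kmeans(X, k, iters=100):
    """Hard K-means: assign each point to its nearest centroid, then recompute the means."""
    centroids = farthest_first(X, k)
    labels = None
    for _ in range(iters):
        new_labels = [min(range(k), key=lambda j: sq_dist(x, centroids[j])) for x in X]
        if new_labels == labels:  # converged: no document changed cluster
            break
        labels = new_labels
        for j in range(k):
            members = [x for x, lab in zip(X, labels) if lab == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def fuzzy_cmeans(X, c, m=2.0, iters=100, tol=1e-6):
    """Fuzzy C-means: returns an n x c membership matrix whose rows sum to 1."""
    centroids = farthest_first(X, c)
    U = None
    for _ in range(iters):
        # Membership update: u_ij = 1 / sum_q (d_ij / d_iq) ** (2 / (m - 1))
        new_U = []
        for x in X:
            d = [math.sqrt(sq_dist(x, v)) for v in centroids]
            if min(d) == 0.0:  # document sits exactly on a centroid: crisp membership
                hits = [1.0 if dj == 0.0 else 0.0 for dj in d]
                new_U.append([h / sum(hits) for h in hits])
            else:
                new_U.append([1.0 / sum((d[j] / dq) ** (2 / (m - 1)) for dq in d)
                              for j in range(c)])
        # Centroid update: mean of all documents weighted by u_ij ** m
        for j in range(c):
            w = [row[j] ** m for row in new_U]
            total = sum(w)
            centroids[j] = [sum(wi * x[t] for wi, x in zip(w, X)) / total
                            for t in range(len(X[0]))]
        converged = U is not None and max(
            abs(a - b) for old, new in zip(U, new_U) for a, b in zip(old, new)) < tol
        U = new_U
        if converged:
            break
    return U


# Hypothetical toy corpus with two topics that share no vocabulary.
docs = [
    "cat dog pet", "dog cat animal", "pet animal cat",
    "stock market trade", "market trade price", "price stock market",
]
X = tfidf_vectors(docs)
print(kmeans(X, 2))  # [0, 0, 0, 1, 1, 1]
U = fuzzy_cmeans(X, 2)
print([max(range(2), key=lambda j: row[j]) for row in U])  # [0, 0, 0, 1, 1, 1]
```

Because the vectors have unit length, squared Euclidean distance equals 2 − 2·cos, so both algorithms effectively group documents by cosine similarity. Fuzzy C-means keeps a graded membership for every document; the hard labels above simply take the cluster with the largest membership.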

Cited By

  • (2022) A News-Based Framework for Uncovering and Tracking City Area Profiles: Assessment in Covid-19 Setting. ACM Transactions on Knowledge Discovery from Data 16:6 (1-29). DOI: 10.1145/3532186. Online publication date: 30-Jul-2022.
  • (2022) Konkani WordNet: Corpus-Based Enhancement using Crowdsourcing. ACM Transactions on Asian and Low-Resource Language Information Processing 21:4 (1-18). DOI: 10.1145/3503156. Online publication date: 4-Mar-2022.
  • (2018) Hybrid clustering analysis using improved krill herd algorithm. Applied Intelligence 48:11 (4047-4071). DOI: 10.5555/3288064.3288097. Online publication date: 1-Nov-2018.

Information

Published In

Guide Proceedings
CICN '11: Proceedings of the 2011 International Conference on Computational Intelligence and Communication Networks
October 2011
771 pages
ISBN:9780769545875

Publisher

IEEE Computer Society

United States

Author Tags

  1. Cluster Evaluation
  2. Document Clustering
  3. Fuzzy C-means
  4. Heuristic K-means
  5. K-means

Qualifiers

  • Article

Bibliometrics & Citations

Article Metrics

  • Downloads (last 12 months): 0
  • Downloads (last 6 weeks): 0
Reflects downloads up to 08 Feb 2025

Citations

  • (2018) Solving document clustering problem through meta heuristic algorithm. Proceedings of the 2nd International Conference on Machine Learning and Soft Computing (77-81). DOI: 10.1145/3184066.3184085. Online publication date: 2-Feb-2018.
  • (2017) Generating stochastic data to simulate a twitter user. Proceedings of the 20th Communications & Networking Symposium (1-11). DOI: 10.5555/3107979.3107989. Online publication date: 23-Apr-2017.
  • (2016) Parallel Document Clustering using Iterative MapReduce. Proceedings of the International Conference on Big Data and Advanced Wireless Technologies (1-5). DOI: 10.1145/3010089.3010122. Online publication date: 10-Nov-2016.
