Abstract
Data sets with multiple, heterogeneous feature spaces occur frequently. We present an abstract framework for integrating multiple feature spaces in the k-means clustering algorithm. Our main ideas are (i) to represent each data object as a tuple of multiple feature vectors, (ii) to assign a suitable (and possibly different) distortion measure to each feature space, (iii) to combine distortions on different feature spaces, in a convex fashion, by assigning (possibly) different relative weights to each, (iv) for a fixed weighting, to cluster using the proposed convex k-means algorithm, and (v) to determine the optimal feature weighting to be the one that yields the clustering that simultaneously minimizes the average within-cluster dispersion and maximizes the average between-cluster dispersion along all the feature spaces. Using precision/recall evaluations and known ground truth classifications, we empirically demonstrate the effectiveness of feature weighting in clustering on several different application domains.
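The abstract's fixed-weight clustering step can be illustrated concretely. Below is a minimal sketch, not the authors' implementation: each data object is a tuple of feature vectors (two spaces in this toy example), a squared-Euclidean distortion is assumed in each space purely for illustration, and the per-space distortions are combined convexly with fixed weights before the usual assign/update iteration. The function names (distortion, assign_step, update_step), the toy data, and the initialization are hypothetical.

```python
# Sketch of one fixed-weight "convex k-means" pass over objects that are
# tuples of feature vectors. Assumption: squared-Euclidean distortion in
# every feature space; the paper allows a different distortion per space.
import numpy as np

def distortion(x, c):
    """Squared-Euclidean distortion between a feature vector and a centroid."""
    return float(np.sum((x - c) ** 2))

def assign_step(objects, centroids, weights):
    """Assign each object (tuple of feature vectors) to the centroid tuple
    minimizing the convex combination of per-space distortions."""
    labels = []
    for obj in objects:
        costs = [
            sum(w * distortion(x, c)                 # weight * distortion per space
                for w, x, c in zip(weights, obj, centroid_tuple))
            for centroid_tuple in centroids
        ]
        labels.append(int(np.argmin(costs)))
    return labels

def update_step(objects, labels, prev_centroids):
    """Recompute, per cluster and per feature space, the mean feature vector;
    an empty cluster keeps its previous centroid (a simplification)."""
    new_centroids = []
    for j, prev in enumerate(prev_centroids):
        members = [obj for obj, lab in zip(objects, labels) if lab == j]
        if not members:
            new_centroids.append(prev)
            continue
        new_centroids.append(tuple(
            np.mean([m[s] for m in members], axis=0)
            for s in range(len(prev))
        ))
    return new_centroids

# Toy usage: 20 objects with a 3-dim and a 2-dim feature space, k = 2,
# and fixed convex weights (0.7, 0.3) summing to one.
rng = np.random.default_rng(0)
objects = [(rng.normal(size=3), rng.normal(size=2)) for _ in range(20)]
centroids = [objects[0], objects[1]]     # naive initialization from the data
weights = (0.7, 0.3)
for _ in range(10):
    labels = assign_step(objects, centroids, weights)
    centroids = update_step(objects, labels, centroids)
```

In the paper's full procedure this inner loop runs for each candidate weighting, and the weighting retained is the one whose clustering best trades off within-cluster dispersion against between-cluster dispersion across all feature spaces.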
Cite this article
Modha, D.S., Spangler, W.S. Feature Weighting in k-Means Clustering. Machine Learning 52, 217–237 (2003). https://doi.org/10.1023/A:1024016609528