Data clustering: a review

Published: 01 September 1999

Abstract

Clustering is the unsupervised classification of patterns (observations, data items, or feature vectors) into groups (clusters). The clustering problem has been addressed in many contexts and by researchers in many disciplines; this reflects its broad appeal and usefulness as one of the steps in exploratory data analysis. However, clustering is a combinatorially difficult problem, and differences in assumptions and contexts among communities have made the transfer of useful generic concepts and methodologies slow to occur. This paper presents an overview of pattern clustering methods from a statistical pattern recognition perspective, with the goal of providing useful advice and references to fundamental concepts accessible to the broad community of clustering practitioners. We present a taxonomy of clustering techniques and identify cross-cutting themes and recent advances. We also describe some important applications of clustering algorithms, such as image segmentation, object recognition, and information retrieval.

Reviews

Jose M. Ramirez

Data clustering is not defined the same way in each of the disciplines that use it to deal with problems that involve the extraction of information or structure from data. The authors have produced a good survey of this slippery topic. They devote a considerable amount of space to presenting clustering techniques from the perspective of several disciplines, including fuzzy systems, neural networks, and searching. The section of definitions and notations is weak: it is just a glossary of terms, with no context provided. It would have been better to define the terms when they were needed for each technique described. References are numerous, as expected in a survey, but are not annotated sufficiently to enable readers to define a research plan on a given aspect. In some places the paper does not reflect the state of the art in the use of clustering, as, for example, in neural networks and fuzzy systems. One cause of this weakness is the problem of dealing with a multidisciplinary subject whose advances are reported in a wide range of journals and proceedings. The other cause, namely the extremely long review process, is completely out of the authors' control: the paper was received in March 1997, but accepted in January 1999.

Information

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 01 September 1999
Published in CSUR Volume 31, Issue 3

Author Tags

  1. cluster analysis
  2. clustering applications
  3. exploratory data analysis
  4. incremental clustering
  5. similarity indices
  6. unsupervised learning

Qualifiers

  • Article

Article Metrics

  • Downloads (Last 12 months): 5,807
  • Downloads (Last 6 weeks): 616
Reflects downloads up to 22 Sep 2024

Citations

Cited By

  • (2024) Comprehensive analysis of clustering algorithms: exploring limitations and innovative solutions. PeerJ Computer Science 10 (e2286). DOI: 10.7717/peerj-cs.2286. Online publication date: 29-Aug-2024
  • (2024) A clustering effectiveness measurement model based on merging similar clusters. PeerJ Computer Science 10 (e1863). DOI: 10.7717/peerj-cs.1863. Online publication date: 29-Feb-2024
  • (2024) Bibliometric Analysis and Systematic Review of Infill Development Based on Mapping the Scientific Fields Structure. Journal of Researches in Islamic Architecture 12:1 (0-0). DOI: 10.61186/jria.12.1.7. Online publication date: 1-Mar-2024
  • (2024) MODELO DE AUTOENCODER COM ENSEMBLE LEARNING E CLUSTERIZAÇÃO PARA DETECÇÃO DE INTRUSÃO EM REDES. Revista Contemporânea 4:6 (e4910). DOI: 10.56083/RCV4N6-223. Online publication date: 28-Jun-2024
  • (2024) Clustering Techniques in Data Mining: A Survey of Methods, Challenges, and Applications (Veri Madenciliğinde Kümeleme Teknikleri: Yöntemler, Zorluklar ve Uygulamalar Üzerine Bir Araştırma). Computer Science. DOI: 10.53070/bbd.1421527. Online publication date: 5-Mar-2024
  • (2024) A Novel Study on IoT and Machine Learning-Based Transportation. Machine Learning Techniques and Industry Applications (1-28). DOI: 10.4018/979-8-3693-5271-7.ch001. Online publication date: 3-May-2024
  • (2024) Beyond Supervision. Recent Trends and Future Direction for Data Analytics (170-196). DOI: 10.4018/979-8-3693-3609-0.ch007. Online publication date: 12-Jul-2024
  • (2024) Discovering the Micro-Clusters From a Group of DHH Learners. Transforming Education for Personalized Learning (159-175). DOI: 10.4018/979-8-3693-0868-4.ch010. Online publication date: 26-Apr-2024
  • (2024) IMPROVEMENT OF INCIDENT MANAGEMENT MODEL USING MACHINE LEARNING METHODS. Mokslas - Lietuvos ateitis 16 (1-6). DOI: 10.3846/mla.2024.21633. Online publication date: 5-Jan-2024
  • (2024) Research on Water Resource Modeling Based on Machine Learning Technologies. Water 16:3 (472). DOI: 10.3390/w16030472. Online publication date: 31-Jan-2024
