Abstract
Hartigan (1972) discusses the direct clustering of a matrix of data into homogeneous blocks. He introduces a stepwise divisive method for block clustering within a certain class of block structures which induce clustering trees for both row and column margins. While this class of structures is appealing, the stopping criterion for his method, which is based on asymptotic theory and the assumption that the individual elements of the data matrix are normally distributed, is quite restrictive. In this paper we propose a permutation-based algorithm for block clustering within the same class of block structures. By using permutation arguments to decide where to split and when to stop, our algorithm becomes applicable in a wide variety of cases, including matrices of categorical data and matrices of small-to-moderate size. In addition, our algorithm offers considerable flexibility in how block homogeneity is defined. The algorithm is studied in a series of simulation experiments on matrices of known structure, and illustrated in examples drawn from the fields of taxonomy, political science, and data architecture.
Similar content being viewed by others
References
ADELSON, R. M., nORMAN, J. M., and LAPORTE, G. (1976), “A Dynamic Programming Formulation With Diverse Applications,”Operational Research Quaterly, 27, 119–121.
ALDOUS, D. (1987), “On the Markov Chain Simulation Method for Uniform Combinatorial Distributions and Simulated Annealing,”Probability in the Engineering and Informational Sciences, 1, 33–46.
ARABIE, P., BOORMAN, S. A. and LEVITT, P. R. (1978), “Constructing Blockmodels: How and Why,”Journal of Mathematical Psychology, 17, 21–63.
ARABIE, P., and BOORMAN, S. A. (1982), “Blockmodels; Development and Prospects,” inClassifying Social Data: New Applications of Analytic Methods for Social Science Research, Eds., Herschel C. Hudson and Associates, San Francisco: Jossey-Bass Publisher, Chapter 11.
BILLINGSLEY, P. (1968),Convergence of Probability Measures, New York: Wiley.
BOCK, H. H. (1979), “Simultaneous Clustering of Objects and Variables,” inAnalyse de Données et Informatique. Le Chesnay (France): Institut National de Recherche en Informatique et en Automatique, 187–203.
BREIGER, R. L., BOORMAN, S. A., and ARABIE, P. (1975), “An Algorithm for Clustering Relational Data With Applications to Social Network Analysis and Comparison with Multidimensional Scaling,”Journal of Matheamtical Psychology, 12, 328–383.
BREIMAN, L., FRIEDMAN, J. H., OLSHEN, R., and STONE, C. J. (1984),Classification and Regression Trees, Belmont, CA: Wadsworth:
DE SOETE, G., DESARBO, W. S., FURNAS, G. W., and CARROLL, J. D. (1984), “The Estimation of Ultrametric and Path Length Trees From Rectangular Proximity Data,”Psychometrika, 49 (3), 289–310.
DEUTSCH, S. B. and MARTIN J. J. (1971), “An Ordering Algorithm for Analysis of Data Arrays,”Operations Research, 19, 1350–1362.
DIACONIS, P., and SHASHAHANI, M. (1981), “Generating a Random Permutation with Random Transposition,”Zeitschrift fuer Wahrscheinlichkeitstheorie and Verwandte Gebiete, 57, 159–179.
DUFFY, D. E., and KEMPERMAN, J. H. B. (1990), “Entropy-Based Splitting Criteria,” Morristown, NJ: Bell Communications Research Technical memorandum, in preparation.
DUFFY, D. E., FOWLKES, E. B., and KANE, L. D. (1987), “Cluster Analysis in Strategic Data Architecture Design,” in 1987Bellcore Database Symposium. Morristown, NJ: Bell Communications Research, Inc., 175–186.
EDELSBRUNNER, H. (1987),Algorithms in Combinatorial Geometry, New York: Springer-Verlag.
GILULA, Z. (1986), “Grouping and Association in Contingency Tables: An Exploratory Canonical Correlation Approach,”Journal of the American Statistical Association, 81, 773–779.
GOODMAN, L. A. (1981), “Critieria for Determining Whether Certain Categories in a Cross-Classification Table Should be Combined With Special Reference to Occupation Categories in an Occupational Mobility Table,”American Journal of Sociology, 87, 612–650.
GOVAERT, G. (1977), “Algorithme de Classification d'un Tableau de Contingence,”Proceedings of the First International Symposium on Data Analysis and Informatics, 2, Le Chesnay (France): Institut National de Recherche en Informatique et Automatique, 487–500.
GREENACRE, M. J. (1988), “Clustering the Rows and Columns of a Contingency Table,”Journal of Classification, 5, 39–51.
HARTIGAN, J. A. (1972), “Direct Clustering of a Data Matrix,”Journal of the American Statistical Association, 6, 123–129.
HARTIGAN, J. A. (1975),Clustering Algorithms, New York: Wiley.
HARTIGAN, J. A. (1976), “Modal Blocks in Dentition of West Coast Mammals,”Systematic Zoology, 25, 149–160.
HEISER, W. J., and MEULMAN, J. (1983), “Analyzing Rectangular Tables by Joint and Constrained Multidimensional Scalling,”Journal of Econometrics, 22, 139–167.
HILL, M. O. (1974), “Correspondence Analysis: A Neglected Multivariate Method,”Applied Statistics, 23, 340–354.
HOLLAND, P. W., and LEINHARDT, S. (1981), “An Exponential Family of Probability Distributions for Directed Graphs (with discussion),”Journal of the American Statistical Association, 76, 33–65.
HUBERT, L. J. (1974), “Problems of Seriation Using a Subject by Item Response Matrix,”Psychological Bulletin, 81, 976–983.
HUBERT, L. J., and GOLLEDGE, R. G. (1981), “Matrix Reorganization and Dynamic Programming: Applications to Paired Comparisons and Unidimensional Seriation,”Psychometrika, 46, 429–441.
LAMBERT, J. M. and WILLIAMS, W. T. (1962), “Multivariate Methods in Plant Ecology IV. Nodal Analysis,”Journal of Ecology, 50, 775–802.
MEHTA, C. R., PATEL, N. R., and GRAY, R. (1985), “Computing an Exact Confidence Interval for the Common Odds Ratio in Several 2×2 Contingency Tables,”Journal of the American Statistical Association, 8, 969–973.
SCHMID, B. (1984), “Niche Width and Variation Within and Between Populations in Colonizing Species (Carex flava group)”Oecologia (Berlin),63, 1–5.
SOKAL, R. R., and SNEATH, P. H. A. (1963),Principles of Numerical Taxonomy, San Francisco: Freeman.
WANG, Y. J., and WONG, G. Y. (1987), “Stochastic Blockmodels for Directed Graphs,”Journal of the American Statistical Association, 82, 8–19.
WONG, G. Y. (1987), “Bayesian Models for Directed Graphs,”Journal of the American Statistical Association, 82, 140–148.
Author information
Authors and Affiliations
Rights and permissions
About this article
Cite this article
Duffy, D.E., Quiroz, a.J. A permutation-based algorithm for block clustering. Journal of Classification 8, 65–91 (1991). https://doi.org/10.1007/BF02616248
Issue Date:
DOI: https://doi.org/10.1007/BF02616248