Abstract
An algorithm for generating artificial data sets that contain distinct, nonoverlapping clusters is presented. The algorithm is useful for generating test data sets for Monte Carlo validation research conducted on clustering methods or statistics. The algorithm generates data sets that contain 1, 2, 3, 4, or 5 clusters. By default, the data are embedded in a 4-, 6-, or 8-dimensional space. Three different patterns for assigning the points to the clusters are provided. One pattern assigns the points equally to the clusters, while the remaining two patterns produce clusters of unequal sizes. Finally, several methods for introducing error into the data have been incorporated into the algorithm.
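The kind of generator described above can be illustrated with a minimal Python sketch. It is not the published algorithm: the function names (`cluster_sizes`, `bounded_normal`, `generate_clusters`), the 10% and 60% size fractions, the cluster widths and gaps, and the single Gaussian noise option are all assumptions chosen for illustration. The sketch keeps clusters nonoverlapping by placing them in disjoint intervals on the first dimension and drawing coordinates from normal distributions truncated to each cluster's bounds.

```python
import random


def cluster_sizes(n_points, n_clusters, pattern="equal"):
    """Split n_points among n_clusters under one of three illustrative patterns."""
    if pattern == "equal" or n_clusters == 1:
        base, extra = divmod(n_points, n_clusters)
        return [base + (1 if i < extra else 0) for i in range(n_clusters)]
    # Assumed fractions (hypothetical): one small or one large cluster,
    # with the remaining points shared equally among the other clusters.
    frac = {"one_small": 0.10, "one_large": 0.60}[pattern]
    first = max(1, round(frac * n_points))
    return [first] + cluster_sizes(n_points - first, n_clusters - 1, "equal")


def bounded_normal(rng, center, half_width):
    """Draw from a normal distribution truncated to center +/- half_width."""
    sd = half_width / 1.5  # assumed: bounds sit 1.5 standard deviations out
    while True:
        x = rng.gauss(center, sd)
        if abs(x - center) <= half_width:
            return x


def generate_clusters(n_points=50, n_clusters=3, n_dims=4,
                      pattern="equal", noise_sd=0.0, seed=None):
    """Return (points, labels) for an artificial clustered data set."""
    rng = random.Random(seed)
    points, labels = [], []
    lower = 0.0  # left edge of the current cluster on the first dimension
    for label, size in enumerate(cluster_sizes(n_points, n_clusters, pattern)):
        half_width = rng.uniform(1.0, 2.0)
        # Disjoint intervals on dimension 0 guarantee separation there;
        # centers on the other dimensions are placed freely.
        centers = [lower + half_width]
        centers += [rng.uniform(-5.0, 5.0) for _ in range(n_dims - 1)]
        for _ in range(size):
            point = [bounded_normal(rng, c, half_width) for c in centers]
            if noise_sd > 0.0:
                # One simple form of error: perturb every coordinate.
                point = [x + rng.gauss(0.0, noise_sd) for x in point]
            points.append(point)
            labels.append(label)
        lower += 2.0 * half_width + rng.uniform(0.5, 1.0)  # leave a gap
    return points, labels


if __name__ == "__main__":
    pts, labs = generate_clusters(n_points=50, n_clusters=3, n_dims=4,
                                  pattern="one_large", seed=1)
    print(len(pts), [labs.count(k) for k in range(3)])
```

With `noise_sd=0.0` the clusters never overlap on the first dimension; a positive `noise_sd` perturbs the coordinates and can blur the boundaries, which is how such data sets can be used to test the robustness of a clustering method.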
Cite this article
Milligan, G.W. An algorithm for generating artificial test clusters. Psychometrika 50, 123–127 (1985). https://doi.org/10.1007/BF02294153
DOI: https://doi.org/10.1007/BF02294153