A sampling-based exact algorithm for the solution of the minimax diameter clustering problem

Aloise, Daniel; Contardo, Claudio

doi:10.1007/s10898-018-0634-1

Published: 26 March 2018

Journal of Global Optimization, Volume 71, pages 613–630 (2018)

Abstract

We consider the problem of clustering a set of points so as to minimize the maximum intra-cluster dissimilarity, which is strongly NP-hard. Exact algorithms for this problem can handle datasets containing up to a few thousand observations, which falls far short of today's needs. The most popular heuristic for this problem, the complete-linkage hierarchical algorithm, provides feasible solutions that are usually far from optimal. We introduce a sampling-based exact algorithm aimed at solving large datasets. The algorithm alternates between an exact procedure applied to a small sample of points and a heuristic procedure that attempts to prove the optimality of the current solution. Our computational experience shows that the algorithm can solve to optimality problems containing more than 500,000 observations within moderate time limits, which is two orders of magnitude larger than the limits of previous exact methods.
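The alternation described in the abstract can be illustrated with a toy sketch. It is not the authors' procedure: the brute-force solver, the greedy extension step, the initial sample, and all function names (`solve_exact`, `try_extend`, `sampling_exact`) are assumptions made for illustration, with Euclidean distance standing in for the dissimilarity. The idea it relies on is that the optimal diameter of any sample is a lower bound for the full dataset, so if the sample solution can be extended to all points without exceeding that diameter, the extended clustering is optimal.

```python
from itertools import product
from math import dist


def diameter(points, labels):
    """Largest distance between two points that share a cluster label."""
    worst = 0.0
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            if labels[i] == labels[j]:
                worst = max(worst, dist(points[i], points[j]))
    return worst


def solve_exact(sample, k):
    """Brute force over all k^|sample| labelings (toy sizes only; assumption)."""
    best_labels, best_diam = None, float("inf")
    for labels in product(range(k), repeat=len(sample)):
        d = diameter(sample, labels)
        if d < best_diam:
            best_labels, best_diam = labels, d
    return list(best_labels), best_diam


def try_extend(points, sample_idx, sample_labels, k, bound):
    """Greedily place every non-sample point without exceeding `bound`.

    Returns (clusters, None) on success, or (None, i) where i is the index
    of the first point that could not be placed.
    """
    clusters = [[] for _ in range(k)]
    for i, lab in zip(sample_idx, sample_labels):
        clusters[lab].append(i)
    in_sample = set(sample_idx)
    for i in range(len(points)):
        if i in in_sample:
            continue
        for members in clusters:
            if all(dist(points[i], points[j]) <= bound for j in members):
                members.append(i)
                break
        else:
            return None, i
    return clusters, None


def sampling_exact(points, k):
    """Alternate exact solves on a growing sample with a heuristic extension."""
    sample_idx = list(range(min(k + 1, len(points))))  # arbitrary initial sample
    while True:
        sample = [points[i] for i in sample_idx]
        labels, bound = solve_exact(sample, k)  # lower bound for the full set
        clusters, failed = try_extend(points, sample_idx, labels, k, bound)
        if clusters is not None:
            return clusters, bound  # feasible at the lower bound, hence optimal
        sample_idx.append(failed)  # grow the sample with the offending point


points = [(0, 0), (1, 0), (2, 0), (10, 0), (11, 0), (12, 0)]
clusters, best = sampling_exact(points, k=2)
print(clusters, best)
```

On this six-point example the first sample (the three leftmost points) gives a bound of 1.0 that cannot be extended, the point at (10, 0) is added to the sample, and the second round returns the two groups of three with diameter 2.0. The loop always terminates because the sample grows by one point per failed round and the extension trivially succeeds once the sample is the whole dataset. The brute-force step is exponential in the sample size, so the sketch only shows the control flow, not how such a method would scale.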

Acknowledgements

This research was financed by the Fonds de recherche du Québec - Nature et technologies (FRQNT) under grant no. 181909 and by the Natural Sciences and Engineering Research Council of Canada (NSERC) under grants 435824-2013 and 2017-05617. This support is gratefully acknowledged.

Author information

Authors and Affiliations

Département de génie informatique et génie logiciel, École Polytechnique de Montréal, Montreal, QC, Canada
Daniel Aloise
Département de management et technologie, ESG UQÀM, Montreal, QC, Canada
Claudio Contardo

Corresponding author

Correspondence to Daniel Aloise.

About this article

Cite this article

Aloise, D., Contardo, C. A sampling-based exact algorithm for the solution of the minimax diameter clustering problem. J Glob Optim 71, 613–630 (2018). https://doi.org/10.1007/s10898-018-0634-1

Received: 11 January 2017
Accepted: 27 February 2018
Published: 26 March 2018
Issue Date: July 2018
DOI: https://doi.org/10.1007/s10898-018-0634-1
