A sampling-based exact algorithm for the solution of the minimax diameter clustering problem

Journal of Global Optimization

Abstract

We consider the problem of clustering a set of points so as to minimize the maximum intra-cluster dissimilarity, which is strongly NP-hard. Existing exact algorithms for this problem can handle datasets of up to a few thousand observations, far too few for today's needs. The most popular heuristic for this problem, the complete-linkage hierarchical algorithm, provides feasible solutions that are usually far from optimal. We introduce a sampling-based exact algorithm aimed at solving large datasets. The algorithm alternates between an exact procedure that solves the problem on a small sample of points and a heuristic procedure that attempts to prove the optimality of the current solution. Our computational experience shows that the algorithm can solve to optimality problems containing more than 500,000 observations within moderate time limits, two orders of magnitude beyond the limits of previous exact methods.
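The alternation described in the abstract can be illustrated with a minimal Python sketch. It is one plausible reading of the abstract, not the authors' implementation: every function name is hypothetical, the brute-force `solve_exact` is only a stand-in for the exact procedure (usable on tiny samples), and the greedy insertion in `try_extend` is an assumed heuristic. The sketch relies on one fact: the optimal diameter of a sample is a lower bound on the optimal diameter of the full dataset, so a full clustering that does not exceed that bound is provably optimal.

```python
from itertools import product

def diameter(points, labels, dist):
    """Largest dissimilarity between two points that share a cluster."""
    worst = 0.0
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            if labels[i] == labels[j]:
                worst = max(worst, dist(points[i], points[j]))
    return worst

def solve_exact(points, k, dist):
    """Brute-force stand-in for the exact procedure (assumption; tiny inputs only)."""
    best = None
    for labels in product(range(k), repeat=len(points)):
        d = diameter(points, labels, dist)
        if best is None or d < best[0]:
            best = (d, list(labels))
    return best

def try_extend(points, sample_idx, sample_labels, k, bound, dist):
    """Assumed greedy heuristic: insert each remaining point into the first
    cluster where it stays within `bound` of every member.
    Returns (labels, None) on success, or (None, i) for the first point i that fits nowhere."""
    clusters = [[] for _ in range(k)]
    labels = [None] * len(points)
    for i, c in zip(sample_idx, sample_labels):
        clusters[c].append(i)
        labels[i] = c
    in_sample = set(sample_idx)
    for i in range(len(points)):
        if i in in_sample:
            continue
        for c in range(k):
            if all(dist(points[i], points[j]) <= bound for j in clusters[c]):
                clusters[c].append(i)
                labels[i] = c
                break
        else:
            return None, i
    return labels, None

def sampling_minimax(points, k, dist):
    """Alternate between the exact solve on a sample and the heuristic extension."""
    # Initial sample: the first k + 1 points (an arbitrary choice for this sketch).
    sample_idx = list(range(min(k + 1, len(points))))
    while True:
        sample = [points[i] for i in sample_idx]
        bound, sample_labels = solve_exact(sample, k, dist)
        labels, failed = try_extend(points, sample_idx, sample_labels, k, bound, dist)
        if labels is not None:
            # The full clustering matches the sample lower bound, so it is optimal.
            return bound, labels
        sample_idx.append(failed)  # grow the sample with the point that did not fit

if __name__ == "__main__":
    pts = [0, 1, 10, 11, 20, 21]
    print(sampling_minimax(pts, 3, lambda a, b: abs(a - b)))
```

The loop always terminates: each failed extension adds one point to the sample, and once the sample is the whole dataset the extension step has nothing left to insert. In this sketch the cost is dominated by the brute-force solve, so it only illustrates the control flow and says nothing about the performance reported in the abstract.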

Acknowledgements

This research was financed by the Fonds de recherche du Québec - Nature et technologies (FRQNT) under grant no. 181909 and by the Natural Sciences and Engineering Research Council of Canada (NSERC) under grants 435824-2013 and 2017-05617. This support is gratefully acknowledged.

Author information

Corresponding author

Correspondence to Daniel Aloise.

About this article

Cite this article

Aloise, D., Contardo, C. A sampling-based exact algorithm for the solution of the minimax diameter clustering problem. J Glob Optim 71, 613–630 (2018). https://doi.org/10.1007/s10898-018-0634-1

  • DOI: https://doi.org/10.1007/s10898-018-0634-1
