DOI: 10.1145/182591.182605

A decomposition-based simulated annealing technique for data clustering

Published: 24 May 1994

Abstract

It has been demonstrated that simulated annealing provides high-quality results for the data clustering problem. However, existing simulated annealing schemes are memory-based algorithms; they are not suited to large problems such as data clustering, whose instances are typically too big to fit in memory in their entirety. Buffer replacement policies that assume either temporal or spatial locality are not useful in this case, since simulated annealing is based on a randomized search process. Poor locality of reference causes the memory to thrash because too many replacements are required, which incurs excessive disk accesses and forces the machine to run at the speed of the I/O subsystem. In this paper, we formulate the data clustering problem as a graph partition problem (GPP) and propose a decomposition-based approach to address the issue of excessive disk accesses during annealing. We apply statistical sampling to randomly select subgraphs of the GPP and load them into memory for annealing. Both analytical and experimental studies indicate that the decomposition-based approach can dramatically reduce costly disk I/O while obtaining excellent optimized results.
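
The abstract does not spell out the algorithm, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes the clustering instance is a weighted graph whose cost is the total weight of edges crossing partitions, that each round samples a random subset of vertices as the "in-memory" subgraph, and that annealing within a round swaps pairs of sampled vertices and evaluates cost only on edges inside the sample. The function names, the geometric cooling schedule, and all parameter values are hypothetical.

```python
import math
import random


def cut_weight(edges, part):
    """Total weight of edges whose endpoints lie in different partitions."""
    return sum(w for (u, v), w in edges.items() if part[u] != part[v])


def anneal_subgraph(sub_edges, nodes, part, rng, t0=1.0, alpha=0.99, steps=200):
    """Anneal only the sampled nodes; swaps keep partition sizes unchanged.

    Assumption: cost is measured on edges inside the sample only, so no
    data outside the in-memory subgraph is touched during the round.
    """
    cost = cut_weight(sub_edges, part)
    t = t0
    for _ in range(steps):
        u, v = rng.sample(nodes, 2)
        if part[u] == part[v]:
            continue  # swapping within one partition changes nothing
        part[u], part[v] = part[v], part[u]
        new_cost = cut_weight(sub_edges, part)
        delta = new_cost - cost
        # Metropolis rule: always accept improvements, sometimes accept uphill moves.
        if delta <= 0 or rng.random() < math.exp(-delta / t):
            cost = new_cost
        else:
            part[u], part[v] = part[v], part[u]  # reject: undo the swap
        t *= alpha  # hypothetical geometric cooling schedule
    return part


def decomposed_anneal(edges, part, sample_size, rounds, seed=0):
    """Repeatedly sample a random subgraph and anneal it in isolation."""
    rng = random.Random(seed)
    all_nodes = sorted(part)
    for _ in range(rounds):
        nodes = rng.sample(all_nodes, sample_size)
        chosen = set(nodes)
        # Keep only edges with both endpoints in the sample (the "loaded" subgraph).
        sub_edges = {e: w for e, w in edges.items()
                     if e[0] in chosen and e[1] in chosen}
        anneal_subgraph(sub_edges, nodes, part, rng)
    return part
```

Here `edges` maps vertex pairs to weights and `part` maps each vertex to a partition label; `decomposed_anneal` modifies `part` in place and returns it. Because each round ignores edges that leave the sample, the global cut weight is not guaranteed to decrease from one round to the next in this sketch.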

Published In

PODS '94: Proceedings of the thirteenth ACM SIGACT-SIGMOD-SIGART symposium on Principles of database systems
May 1994
313 pages
ISBN:0897916425
DOI:10.1145/182591
  • Chairman: Victor Vianu
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

  • Article

Conference

SIGMOD/PODS94

Acceptance Rates

PODS '94 Paper Acceptance Rate 28 of 117 submissions, 24%;
Overall Acceptance Rate 642 of 2,707 submissions, 24%

Cited By

  • (2011) Bibliography. Data Clustering in C++, pages 469-486. DOI: 10.1201/b10814-28. Online publication date: 10-May-2011
  • (2007) Physical Database Design. Online publication date: 21-Mar-2007
  • (2004) An overview on MEMS-based storage, its research issues and open problems. Proceedings of the international workshop on Storage network architecture and parallel I/Os, pages 48-57. DOI: 10.1145/1162628.1162635. Online publication date: 30-Sep-2004
  • (2004) Stochastic clustering for organizing distributed information sources. IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics, 34:5, pages 2035-2047. DOI: 10.1109/TSMCB.2004.833599. Online publication date: 1-Oct-2004
  • (2004) A simulated annealing approach for multimedia data placement. Journal of Systems and Software, 73:3, pages 467-480. DOI: 10.1016/j.jss.2003.09.020. Online publication date: 1-Nov-2004
  • (2003) Hierarchical data placement for navigational multimedia applications. Data & Knowledge Engineering, 44:1, pages 49-80. DOI: 10.1016/S0169-023X(02)00124-6. Online publication date: 1-Jan-2003
  • (2002) Automating physical database design in a parallel database. Proceedings of the 2002 ACM SIGMOD international conference on Management of data, pages 558-569. DOI: 10.1145/564691.564757. Online publication date: 3-Jun-2002
  • (2002) Video data storage policies: an access frequency based approach. Computers & Electrical Engineering, 28:6, pages 447-464. DOI: 10.1016/S0045-7906(00)00068-9. Online publication date: Nov-2002
  • (2000) Affinity-Based Probabilistic Reasoning and Document Clustering on the WWW. 24th International Computer Software and Applications Conference, pages 149-154. DOI: 10.5555/645982.674928. Online publication date: 25-Oct-2000
  • (2000) A Tool for Nesting and Clustering Large Objects. Proceedings of the 12th International Conference on Scientific and Statistical Database Management. DOI: 10.1109/SSDM.2000.869786. Online publication date: 26-Jul-2000
