Heuristic clustering of database objects according to multi-valued attributes

Teuhola, Jukka

doi:10.1007/BFb0022028

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 1308)

Included in the following conference series:

International Conference on Database and Expert Systems Applications

Abstract

This paper studies the clustering of objects on the basis of set-valued attributes, so that objects in the same cluster share as many attribute values as possible. Our primary application is the clustering of objects participating in a many-to-many relationship. Since the precise optimum is computationally too hard to find, a relatively simple and fast heuristic is developed. The sets of attribute values are represented by non-unique, fixed-size signatures, which constitute the basis for clustering. The objects to be clustered are stored on the leaf pages of a binary tree, where each internal node contains a pair of signatures directing the search for a suitable leaf. The core of the method is a page-splitting algorithm, which tries to combine two goals: enhancing the clustering and keeping the tree balanced. For random data, there is little chance of beneficial clustering. However, the dependencies and correlations between real-life objects enable us to achieve a notable increase in performance.
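The abstract gives no implementation details, so the following is only a rough sketch of the general idea it describes: fixed-size signatures built from sets of attribute values, and a binary tree whose internal nodes hold a pair of signatures that steer the search towards a leaf. Everything specific here is an assumption made for illustration and is not taken from the paper: the 64-bit width, the CRC-based hashing (a common superimposed-coding scheme), the bit-overlap similarity measure, and the names `signature`, `overlap`, `Node`, `Leaf` and `find_leaf`. The page-splitting algorithm itself is not sketched, because the abstract does not describe it.

```python
import zlib
from dataclasses import dataclass, field
from typing import Iterable, Union

SIG_BITS = 64  # assumed width; the abstract only says "fixed-size"


def signature(values: Iterable[str], bits: int = SIG_BITS) -> int:
    """Superimpose one hashed bit per attribute value (assumed scheme).

    Different value sets can map to the same bit pattern, which is
    why such signatures are non-unique.
    """
    sig = 0
    for value in values:
        sig |= 1 << (zlib.crc32(value.encode("utf-8")) % bits)
    return sig


def overlap(a: int, b: int) -> int:
    """Number of bits set in both signatures (assumed similarity measure)."""
    return bin(a & b).count("1")


@dataclass
class Leaf:
    """A leaf page holding the clustered objects."""
    objects: list = field(default_factory=list)


@dataclass
class Node:
    """An internal node with one guiding signature per subtree."""
    left_sig: int
    right_sig: int
    left: Union["Node", Leaf]
    right: Union["Node", Leaf]


def find_leaf(root: Union[Node, Leaf], sig: int) -> Leaf:
    """Descend towards the child whose signature overlaps `sig` most.

    Ties go to the left child; this tie-breaking rule is arbitrary.
    """
    node = root
    while isinstance(node, Node):
        if overlap(sig, node.left_sig) >= overlap(sig, node.right_sig):
            node = node.left
        else:
            node = node.right
    return node


# Hypothetical usage: route a new object to a leaf page and store it there.
root = Node(0b0011, 0b1100, Leaf(), Leaf())
target = find_leaf(root, 0b1000)
target.objects.append("object-1")
```

In a many-to-many setting, the attribute values of an object could be the identifiers of the objects it is related to, so that objects with similar partner sets tend to reach the same leaf. A real implementation would also have to update the guiding signatures on insertion and split full pages, which this sketch leaves out.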

This work was supported by the Academy of Finland.

Author information

Authors and Affiliations

Turku Centre for Computer Science (TUCS), University of Turku, Lemminkäisenkatu 14 A, FIN-20520, Turku, Finland
Jukka Teuhola

Editor information

Abdelkader Hameurlain, A Min Tjoa

Cite this paper

Teuhola, J. (1997). Heuristic clustering of database objects according to multi-valued attributes. In: Hameurlain, A., Tjoa, A.M. (eds) Database and Expert Systems Applications. DEXA 1997. Lecture Notes in Computer Science, vol 1308. Springer, Berlin, Heidelberg. https://doi.org/10.1007/BFb0022028

DOI: https://doi.org/10.1007/BFb0022028
Published: 10 June 2005
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-63478-2
Online ISBN: 978-3-540-69580-6
eBook Packages: Springer Book Archive
