Abstract
This paper studies the clustering of objects on the basis of set-valued attributes, so that objects in the same cluster share as many attribute values as possible. Our primary application is the clustering of objects that participate in a many-to-many relationship. Because the exact optimum is computationally too expensive to find, a relatively simple and fast heuristic is developed. The sets of attribute values are represented by non-unique, fixed-size signatures, which form the basis for clustering. The objects to be clustered are stored on the leaf pages of a binary tree, where each internal node contains a pair of signatures that direct the search for a suitable leaf. The core of the method is a page-splitting algorithm, which tries to reconcile two goals: improving the clustering and keeping the tree balanced. For randomly distributed data, there is little chance of beneficial clustering. However, the dependencies and correlations between real-life objects enable us to achieve a notable increase in performance.
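To make the structure described above more concrete, the following is a minimal sketch of fixed-size signatures and of a signature-directed descent through a binary tree. It is an illustration under stated assumptions, not the algorithm of the paper: the signature width, the number of bits set per value, the SHA-256 based hashing, the bit-overlap similarity measure, and the names `signature`, `overlap`, `Node`, and `choose_leaf` are all hypothetical choices, and the page-splitting step is not covered.

```python
import hashlib
from dataclasses import dataclass
from typing import Iterable, Optional

SIG_BITS = 64        # assumed signature width (not taken from the paper)
BITS_PER_VALUE = 3   # assumed number of bit positions set per attribute value


def signature(values: Iterable[str]) -> int:
    """Build a fixed-size signature by OR-ing hashed bit positions per value.

    Different value sets may yield the same signature (non-unique).
    """
    sig = 0
    for value in values:
        digest = hashlib.sha256(value.encode("utf-8")).digest()
        for i in range(BITS_PER_VALUE):
            sig |= 1 << (digest[i] % SIG_BITS)
    return sig


def overlap(a: int, b: int) -> int:
    """Count the bit positions set in both signatures (assumed similarity)."""
    return bin(a & b).count("1")


@dataclass
class Node:
    """Internal node with a pair of signatures, or a leaf page of objects."""
    left_sig: int = 0
    right_sig: int = 0
    left: Optional["Node"] = None
    right: Optional["Node"] = None
    objects: Optional[list] = None  # set only on leaf pages

    def is_leaf(self) -> bool:
        return self.objects is not None


def choose_leaf(root: Node, sig: int) -> Node:
    """Descend towards the child whose signature overlaps most with sig.

    Ties are resolved in favour of the left child (arbitrary assumption).
    """
    node = root
    while not node.is_leaf():
        if overlap(sig, node.left_sig) >= overlap(sig, node.right_sig):
            node = node.left
        else:
            node = node.right
    return node
```

In this sketch, an object with attribute values `{"db", "index"}` would be placed by calling `choose_leaf(root, signature({"db", "index"}))` and appending it to the `objects` list of the returned leaf. A real implementation would additionally update the signatures on the path and split a leaf page when it overflows.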
This work was supported by the Academy of Finland.
Copyright information
© 1997 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Teuhola, J. (1997). Heuristic clustering of database objects according to multi-valued attributes. In: Hameurlain, A., Tjoa, A.M. (eds) Database and Expert Systems Applications. DEXA 1997. Lecture Notes in Computer Science, vol 1308. Springer, Berlin, Heidelberg. https://doi.org/10.1007/BFb0022028
DOI: https://doi.org/10.1007/BFb0022028
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-63478-2
Online ISBN: 978-3-540-69580-6
eBook Packages: Springer Book Archive