Heuristic clustering of database objects according to multi-valued attributes

  • Object Oriented Databases II
  • Conference paper
Database and Expert Systems Applications (DEXA 1997)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 1308)

Abstract

This paper studies the clustering of objects on the basis of set-valued attributes, so that objects in the same cluster share as many attribute values as possible. Our primary application is the clustering of objects participating in a many-to-many relationship. Since the exact optimum is computationally too hard to find, a relatively simple and fast heuristic is developed. The sets of attribute values are represented by non-unique, fixed-size signatures, which constitute the basis for clustering. Objects to be clustered are stored on the leaf pages of a binary tree, where each internal node contains a pair of signatures directing the search for a suitable leaf. The core of the method is a page-splitting algorithm, which tries to combine two goals: to enhance clustering and to keep the tree balanced. In the random case, there is little chance of beneficial clustering. However, the dependencies and correlations between real-life objects enable us to achieve a notable increase in performance.
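
As a rough illustration of the structure described above, the following Python sketch builds fixed-size signatures by superimposing hashed bits and descends a binary tree whose internal nodes hold a pair of signatures. The abstract does not specify how signatures are generated or how the search chooses a branch, so the signature width (`SIG_BITS`), the hash function, the bit-overlap rule in `find_leaf`, and all class and function names are assumptions made for this example. The page-splitting algorithm, which is the core of the paper, is not reproduced here.

```python
import hashlib
from dataclasses import dataclass, field
from typing import Iterable, List, Union

SIG_BITS = 64  # assumed signature width; the abstract only says "fixed-size"


def signature(values: Iterable[str], bits: int = SIG_BITS) -> int:
    """Superimpose one hashed bit per attribute value into a fixed-size bitmask.

    Different value sets can yield the same bitmask, so signatures are
    non-unique, as the abstract notes.
    """
    sig = 0
    for value in values:
        digest = hashlib.sha256(value.encode("utf-8")).digest()
        sig |= 1 << (int.from_bytes(digest[:8], "big") % bits)
    return sig


def overlap(a: int, b: int) -> int:
    """Number of bit positions set in both signatures."""
    return bin(a & b).count("1")


@dataclass
class Leaf:
    """A leaf page holding the clustered objects (here: their value sets)."""
    objects: List[frozenset] = field(default_factory=list)


@dataclass
class Node:
    """An internal node with one signature per subtree."""
    left_sig: int
    right_sig: int
    left: Union["Node", Leaf]
    right: Union["Node", Leaf]


def find_leaf(root: Union[Node, Leaf], sig: int) -> Leaf:
    """Follow the branch whose signature shares more bits with sig.

    Assumed rule for illustration only; ties go to the left branch.
    """
    node = root
    while isinstance(node, Node):
        if overlap(sig, node.left_sig) >= overlap(sig, node.right_sig):
            node = node.left
        else:
            node = node.right
    return node


def insert(root: Union[Node, Leaf], values: Iterable[str]) -> Leaf:
    """Store an object on the leaf selected by its signature (no splitting)."""
    value_set = frozenset(values)
    leaf = find_leaf(root, signature(value_set))
    leaf.objects.append(value_set)
    return leaf
```

For example, with a root whose left signature is `signature({"a", "b", "c"})`, an object with values `{"a", "b"}` is routed to the left leaf, because every bit of its signature is also set in the left signature. A real implementation would additionally split a leaf page when it overflows and update the signature pairs along the path.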

This work was supported by the Academy of Finland.

Editor information

Abdelkader Hameurlain, A Min Tjoa

Copyright information

© 1997 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Teuhola, J. (1997). Heuristic clustering of database objects according to multi-valued attributes. In: Hameurlain, A., Tjoa, A.M. (eds) Database and Expert Systems Applications. DEXA 1997. Lecture Notes in Computer Science, vol 1308. Springer, Berlin, Heidelberg. https://doi.org/10.1007/BFb0022028

  • DOI: https://doi.org/10.1007/BFb0022028

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-63478-2

  • Online ISBN: 978-3-540-69580-6

  • eBook Packages: Springer Book Archive
