MKL-tree: an index structure for high-dimensional vector spaces

  • Regular Paper
Multimedia Systems

Abstract

In this work, a novel hierarchical data structure for high-dimensional data indexing is proposed. The MKL-tree is based on dimensionality reduction performed by means of the MKL transform, a multi-space generalization of the KL (Karhunen–Loève) transform. A local dimensionality reduction is performed at each node of the tree, which allows more selective features to be extracted and thus increases the discriminating power of the index. The mathematical foundations for representing nodes and leaves, and for the techniques used to manage the structure, are detailed. Moreover, the paper describes the algorithms for bulk loading the MKL-tree (i.e., creating the tree from a large number of objects at once), for updating and splitting nodes after new objects are inserted, and for performing similarity searches. The MKL-tree is compared with other well-known access methods in terms of I/O cost, CPU cost, and the precision of the results returned by similarity queries.
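The abstract states that each node performs its own dimensionality reduction through a KL (principal component) transform, and that MKL generalizes this to several subspaces. This excerpt does not give the node layout or the routing criterion, so the Python sketch below rests on assumptions: a node is summarized by its mean vector and its k leading covariance eigenvectors, and a new vector is routed to the child subspace that reconstructs it with the smallest error. The class `KLSubspace` and the function `choose_child` are hypothetical names for illustration, not the paper's API.

```python
import numpy as np


class KLSubspace:
    """Hypothetical node summary: a mean vector plus the k leading
    eigenvectors (KL / principal component basis) of the node's data."""

    def __init__(self, data: np.ndarray, k: int) -> None:
        self.mean = data.mean(axis=0)
        centered = data - self.mean
        cov = centered.T @ centered / len(data)
        eigvals, eigvecs = np.linalg.eigh(cov)   # eigenvalues in ascending order
        top = np.argsort(eigvals)[::-1][:k]      # indices of the k largest
        self.basis = eigvecs[:, top]             # shape (n, k), orthonormal columns

    def project(self, x: np.ndarray) -> np.ndarray:
        """Reduce x from n original features to k local features."""
        return (x - self.mean) @ self.basis

    def reconstruction_error(self, x: np.ndarray) -> float:
        """Euclidean distance between x and its reconstruction from k features."""
        recon = self.mean + self.basis @ self.project(x)
        return float(np.linalg.norm(x - recon))


def choose_child(x: np.ndarray, children: list[KLSubspace]) -> int:
    """Assumed routing rule: pick the child whose subspace represents x best."""
    return int(np.argmin([c.reconstruction_error(x) for c in children]))


# Usage: two elongated clusters in 3-D, one along x, one along y (offset to x = 10).
rng = np.random.default_rng(0)
cluster_a = np.column_stack([rng.normal(0, 3, 200),
                             rng.normal(0, 0.1, 200),
                             rng.normal(0, 0.1, 200)])
cluster_b = np.column_stack([rng.normal(10, 0.1, 200),
                             rng.normal(0, 3, 200),
                             rng.normal(0, 0.1, 200)])
children = [KLSubspace(cluster_a, k=1), KLSubspace(cluster_b, k=1)]

print(choose_child(np.array([2.0, 0.05, 0.0]), children))   # 0: lies along cluster A's axis
print(choose_child(np.array([10.0, 4.0, 0.0]), children))   # 1: lies along cluster B's axis
```

In this toy setting, a single global one-dimensional projection (dominated by the x direction) would discard almost all of cluster B's variation along y, whereas each local basis keeps its own cluster's dominant direction. Node splitting, bulk loading, and search-time pruning, which the paper also describes, are not modeled here.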

Author information

Correspondence to Annalisa Franco.

About this article

Cite this article

Franco, A., Lumini, A. & Maio, D. MKL-tree: an index structure for high-dimensional vector spaces. Multimedia Systems 12, 533–550 (2007). https://doi.org/10.1007/s00530-006-0070-9
