MKL-tree: an index structure for high-dimensional vector spaces

Franco, Annalisa; Lumini, Alessandra; Maio, Dario

doi:10.1007/s00530-006-0070-9

Regular Paper
Published: 09 November 2006

Multimedia Systems, Volume 12, pages 533–550 (2007)

Annalisa Franco¹,
Alessandra Lumini¹ &
Dario Maio²

Abstract

In this work, a novel hierarchical data structure for indexing high-dimensional data is proposed. The MKL-tree is based on dimensionality reduction performed by means of the MKL transform, a multi-space generalization of the KL transform. A local dimensionality reduction is performed at each node of the tree, allowing more selective features to be extracted and thus increasing the discriminating power of the index. The mathematical foundation for the representation of nodes and leaves, and for the techniques used to manage the structure, is detailed. Moreover, algorithms are described for bulk loading the MKL-tree (i.e., for creating the tree from a large number of objects supplied simultaneously), for updating and splitting nodes after the insertion of new objects, and for performing similarity searches. Results are reported comparing the MKL-tree with other well-known access methods in terms of I/O cost, CPU cost, and precision of the results when executing similarity queries.
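
The idea of local dimensionality reduction mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' algorithm: it only fits a separate KL (PCA) subspace to each of two hand-made groups of points and assigns a query to the subspace that reconstructs it best. The function names (kl_subspace, project, distance_from_space), the synthetic data, and the use of reconstruction error as the assignment criterion are all assumptions made for illustration.

```python
import numpy as np


def kl_subspace(points, k):
    """Fit a k-dimensional KL (PCA) subspace to `points` (shape n x d).

    Returns (mean, basis); the columns of `basis` (shape d x k) are the
    eigenvectors of the sample covariance with the k largest eigenvalues.
    """
    mean = points.mean(axis=0)
    centered = points - mean
    cov = centered.T @ centered / max(len(points) - 1, 1)
    eigvals, eigvecs = np.linalg.eigh(cov)  # eigenvalues in ascending order
    top = np.argsort(eigvals)[::-1][:k]     # indices of the k largest
    return mean, eigvecs[:, top]


def project(x, mean, basis):
    """Coordinates of x in the reduced (k-dimensional) space."""
    return (x - mean) @ basis


def distance_from_space(x, mean, basis):
    """Norm of the part of x that the subspace cannot reconstruct."""
    centered = x - mean
    residual = centered - (centered @ basis) @ basis.T
    return float(np.linalg.norm(residual))


# Hypothetical data: two groups in 3-D, each stretched along a different axis.
rng = np.random.default_rng(0)
t = rng.uniform(-1.0, 1.0, size=(200, 1))
cluster_a = t * np.array([1.0, 0.0, 0.0]) + 0.01 * rng.normal(size=(200, 3))
cluster_b = (np.array([5.0, 5.0, 5.0])
             + t * np.array([0.0, 1.0, 0.0])
             + 0.01 * rng.normal(size=(200, 3)))

# One local 1-D subspace per group, instead of a single global reduction.
spaces = [kl_subspace(c, k=1) for c in (cluster_a, cluster_b)]

# Assign a query to the subspace with the smallest reconstruction error
# (an illustrative criterion, assumed here).
query = np.array([0.5, 0.0, 0.0])
nearest = min(range(len(spaces)),
              key=lambda i: distance_from_space(query, *spaces[i]))
print("query assigned to subspace", nearest)
```

Because each group gets its own basis, one retained dimension per group captures almost all of its variance, whereas a single global one-dimensional projection of the same data would have to compromise between the two directions.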

Author information

Authors and Affiliations

Corso di Laurea in Scienze dell’Informazione, Università di Bologna, via Sacchi 3, 47023, Cesena, Italy
Annalisa Franco & Alessandra Lumini
DEIS - CSITE-CNR - Università di Bologna, viale Risorgimento 2, 40136, Bologna, Italy
Dario Maio

Corresponding author

Correspondence to Annalisa Franco.

About this article

Cite this article

Franco, A., Lumini, A. & Maio, D. MKL-tree: an index structure for high-dimensional vector spaces. Multimedia Systems 12, 533–550 (2007). https://doi.org/10.1007/s00530-006-0070-9

Published: 09 November 2006
Issue Date: May 2007
DOI: https://doi.org/10.1007/s00530-006-0070-9
