Abstract
This work proposes MKL-tree, a novel hierarchical data structure for indexing high-dimensional data. MKL-tree relies on dimensionality reduction performed by the MKL transform, a multi-space generalization of the Karhunen-Loève (KL) transform. A local dimensionality reduction is carried out at each node of the tree. This yields more selective features and thus increases the discriminating power of the index. The paper details the mathematical foundations for representing nodes and leaves and for the techniques that manage the structure. It also describes algorithms for bulk loading MKL-tree (i.e., building the tree from a large number of objects at once), for updating and splitting nodes after new objects are inserted, and for performing similarity searches. Finally, MKL-tree is compared with other well-known access methods in terms of I/O cost, CPU cost, and precision of the results returned by similarity queries.
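The benefit of local dimensionality reduction can be illustrated with the single-space KL transform that MKL generalizes (the KL transform is equivalent to principal component analysis). The following is a minimal sketch under stated assumptions, not the authors' method. It fits a KL space to a set of vectors, projects and reconstructs vectors, and measures a vector's distance from the space as its reconstruction error. It does not implement the multi-space MKL transform or MKL-tree's node representation, bulk loading, splitting, or search. The names `KLSpace`, `top_eigenvectors` and `distance_from_space` are hypothetical. Eigenvectors are computed by power iteration only to keep the example free of external dependencies.

```python
import math
import random
from typing import List

Vector = List[float]


def mean(vectors: List[Vector]) -> Vector:
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def covariance(vectors: List[Vector], mu: Vector) -> List[Vector]:
    """Population covariance matrix (divides by n)."""
    d, n = len(mu), len(vectors)
    c = [[0.0] * d for _ in range(d)]
    for v in vectors:
        diff = [vj - mj for vj, mj in zip(v, mu)]
        for i in range(d):
            for j in range(d):
                c[i][j] += diff[i] * diff[j]
    return [[x / n for x in row] for row in c]


def _orthonormalize(v: Vector, basis: List[Vector]) -> Vector:
    """Gram-Schmidt step: drop components along `basis`, then normalize.
    Returns an all-zero vector if nothing is left."""
    for b in basis:
        dot = sum(vi * bi for vi, bi in zip(v, b))
        v = [vi - dot * bi for vi, bi in zip(v, b)]
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 1e-12 else [0.0] * len(v)


def top_eigenvectors(c: List[Vector], k: int, iters: int = 500, seed: int = 0) -> List[Vector]:
    """k leading eigenvectors of a symmetric PSD matrix (assumes k <= dim),
    via power iteration kept orthogonal to the eigenvectors already found."""
    rng = random.Random(seed)
    d = len(c)
    basis: List[Vector] = []
    for _ in range(k):
        v = _orthonormalize([rng.uniform(-1.0, 1.0) for _ in range(d)], basis)
        for _ in range(iters):
            w = [sum(c[i][j] * v[j] for j in range(d)) for i in range(d)]
            w = _orthonormalize(w, basis)
            if not any(w):  # no variance left outside the current basis
                break
            v = w
        basis.append(v)
    return basis


class KLSpace:
    """Hypothetical k-dimensional KL space fitted to a set of vectors."""

    def __init__(self, vectors: List[Vector], k: int):
        self.mu = mean(vectors)
        self.basis = top_eigenvectors(covariance(vectors, self.mu), k)

    def project(self, x: Vector) -> Vector:
        diff = [xi - mi for xi, mi in zip(x, self.mu)]
        return [sum(di * bi for di, bi in zip(diff, b)) for b in self.basis]

    def reconstruct(self, y: Vector) -> Vector:
        x = self.mu[:]
        for yi, b in zip(y, self.basis):
            x = [xj + yi * bj for xj, bj in zip(x, b)]
        return x

    def distance_from_space(self, x: Vector) -> float:
        """Euclidean reconstruction error of x in this KL space."""
        r = self.reconstruct(self.project(x))
        return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, r)))


# Toy data (hypothetical): two clusters lying on perpendicular lines in 3-D.
cluster_a = [[float(t), 0.0, 0.0] for t in range(-2, 3)]
cluster_b = [[0.0, 5.0, float(t)] for t in range(-2, 3)]

global_space = KLSpace(cluster_a + cluster_b, k=1)  # one space for all data
local_a = KLSpace(cluster_a, k=1)                   # one space per cluster
local_b = KLSpace(cluster_b, k=1)

global_err = max(global_space.distance_from_space(x) for x in cluster_a + cluster_b)
local_err = max(max(local_a.distance_from_space(x) for x in cluster_a),
                max(local_b.distance_from_space(x) for x in cluster_b))
print(f"global 1-D KL space, max error: {global_err:.3f}")
print(f"local 1-D KL spaces, max error: {local_err:.3f}")
```

In this toy case, a single global 1-D space aligns with the direction separating the two clusters and leaves a reconstruction error of 2.0 for the cluster endpoints. One 1-D space per cluster reproduces every point up to rounding. This mirrors, in a much simplified form, why reducing dimensionality locally at each node can retain more discriminating information than a single global reduction.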
Cite this article
Franco, A., Lumini, A. & Maio, D. MKL-tree: an index structure for high-dimensional vector spaces. Multimedia Systems 12, 533–550 (2007). https://doi.org/10.1007/s00530-006-0070-9
DOI: https://doi.org/10.1007/s00530-006-0070-9