Abstract
The query space of a similarity query is usually narrowed down by pruning inactive query subspaces, which contain no query results, and keeping active query subspaces, which may contain objects corresponding to the request. However, some active query subspaces may contain no query results at all; these are called false active query subspaces. Query processing performance clearly degrades in the presence of false active query subspaces. Our experiments show that this problem becomes serious when the data are high dimensional, and that the number of accesses to false active subspaces increases as the dimensionality increases. To solve this problem, this paper proposes a space mapping approach that reduces such unnecessary accesses. A given query space can be refined by filtering within its mapped space. To this end, a mapping strategy called maxgap is proposed to improve the efficiency of the refinement process. Based on this mapping strategy, an index structure called MS-tree and the corresponding query processing algorithms are presented. Finally, the performance of MS-tree is compared with that of its competitors for range queries on a real data set.
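The distinction between inactive, active, and false active subspaces can be illustrated with a minimal sketch. It assumes ball-shaped subspaces (a center, a covering radius, and the points inside), as used by M-tree-style metric indexes, and Euclidean distance. The function name `classify_subspaces` and the toy data are hypothetical. The sketch only counts the subspaces a range query would visit unnecessarily; it does not implement the maxgap mapping or the MS-tree, whose details are not given in the abstract.

```python
import math


def dist(a, b):
    """Euclidean distance between two points of equal dimension."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def classify_subspaces(subspaces, query, radius):
    """Count inactive, active, and false active subspaces for a range query.

    Each subspace is a tuple (center, covering_radius, points).
    This layout is an assumption made for illustration only.
    """
    counts = {"inactive": 0, "active": 0, "false_active": 0}
    for center, cover, points in subspaces:
        # Triangle-inequality test: if the two balls cannot intersect,
        # the subspace is inactive and can be pruned without reading it.
        if dist(center, query) > cover + radius:
            counts["inactive"] += 1
            continue
        # The subspace cannot be pruned, so it has to be accessed.
        counts["active"] += 1
        # It is a false active subspace if none of its points qualifies.
        if not any(dist(p, query) <= radius for p in points):
            counts["false_active"] += 1
    return counts


# Toy data in two dimensions (hypothetical values).
subspaces = [
    ((0.0, 0.0), 1.0, [(0.6, 0.0), (0.0, 0.9)]),    # active, holds a result
    ((3.0, 0.0), 1.0, [(3.0, 1.0), (3.0, -1.0)]),   # active, but no result
    ((10.0, 10.0), 1.0, [(10.0, 10.5)]),            # inactive, pruned
]
query, radius = (1.5, 0.0), 1.0

print(classify_subspaces(subspaces, query, radius))
```

In this toy case the second subspace overlaps the query ball and must be read, yet it contributes nothing to the answer. Accesses of this kind are what the paper aims to filter out.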
Additional information
Supported by the National Basic Research Program of China (Grant No. 2006CB303103), the National Natural Science Foundation of China (Grant Nos. 60873011, 60802026, 60773219, 60773021), and the High Technology Program (Grant No. 2007AA01Z192).
Cite this article
Wang, G., Yu, G., Xin, J. et al. Fast filtering false active subspaces for efficient high dimensional similarity processing. Sci. China Ser. F-Inf. Sci. 52, 286–294 (2009). https://doi.org/10.1007/s11432-009-0051-7
DOI: https://doi.org/10.1007/s11432-009-0051-7