Abstract
Finding objects similar to a query under a given distance remains a fundamental problem for many applications. The general challenge in similarity search is to restrict the search to as few elements as possible while still finding the answer. Index structures divide the target dataset into subsets. With large amounts of data, the volumes of the subspaces grow exponentially, which degrades the search algorithms. This problem is caused by inherent deficiencies of space partitioning and by the overlap between regions. Such methods have proven unreliable, as it becomes hard to store, manage, and analyze these quantities of data, and the search tends to degenerate into a complete scan of the data set. In this paper, we propose a new indexing technique called XM-tree that partitions the space using spheres. The idea is to combine two structures, arborescent and sequential, in order to limit the volume of the outer regions of the spheres, by creating extended regions and storing them in linked lists, and by excluding empty sets (separable partitions) that do not contain objects. The goal is to eliminate some objects without computing their distances to the query object. We also propose a parallel version of the structure that runs on a set of real machines. We discuss the efficiency of the construction and querying phases, and assess the quality of our index by comparing it with recent techniques.
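The pruning goal stated in the abstract can be illustrated with a minimal sketch. This is not the XM-tree itself: it assumes Euclidean distance and a flat list of spheres rather than a tree with extended regions, and the names `Ball`, `build_balls`, and `range_query` are hypothetical. It only shows how the triangle inequality lets a range query discard a whole sphere after one distance computation, and how empty partitions are skipped at no cost.

```python
import math
from dataclasses import dataclass, field


def euclidean(a, b):
    """Euclidean distance; any metric satisfying the triangle inequality works."""
    return math.dist(a, b)


@dataclass
class Ball:
    center: tuple
    radius: float = 0.0
    members: list = field(default_factory=list)


def build_balls(points, centers, dist=euclidean):
    """Assign each point to its nearest center; the radius covers the farthest member."""
    balls = [Ball(c) for c in centers]
    for p in points:
        ball = min(balls, key=lambda b: dist(p, b.center))
        ball.members.append(p)
        ball.radius = max(ball.radius, dist(p, ball.center))
    return balls


def range_query(balls, q, r, dist=euclidean):
    """Return the points within distance r of q and the number of distances computed."""
    hits, computed = [], 0
    for ball in balls:
        if not ball.members:
            continue  # empty partition: excluded without any distance computation
        d_qc = dist(q, ball.center)
        computed += 1
        if d_qc > ball.radius + r:
            continue  # triangle inequality: no member can lie within r of q
        for p in ball.members:
            computed += 1
            if dist(q, p) <= r:
                hits.append(p)
    return hits, computed


if __name__ == "__main__":
    points = [(0, 0), (1, 0), (0, 1), (10, 10), (11, 10), (10, 11)]
    balls = build_balls(points, centers=[(0, 0), (10, 10), (50, 50)])
    hits, computed = range_query(balls, q=(0.5, 0.5), r=1.0)
    print(sorted(hits), computed)
```

In this toy run the far sphere is rejected with a single distance to its center and the empty sphere is never touched, so the query computes fewer distances than a sequential scan of all six points. The saving grows with the number of objects per discarded sphere.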
Notes
The first dataset is available from the COPhIR collection at http://cophir.isti.cnr.it, whereas the second one can be found at http://kdd.ics.uci.edu.
GBDI Arboretum is a C++ library that implements different metric access methods (MAMs) (cf. http://www.gbdi.icmc.usp.br/old/arboretum).
Cite this article
Kouahla, Z., Anjum, A., Akram, S. et al. XM-tree: data driven computational model by using metric extended nodes with non-overlapping in high-dimensional metric spaces. Comput Math Organ Theory 25, 196–223 (2019). https://doi.org/10.1007/s10588-018-9272-x
DOI: https://doi.org/10.1007/s10588-018-9272-x