Abstract
With the development of information technology, a large number of data resources of different types are appearing on the Internet, and some harmful ones have damaged society and citizens. To ensure social harmony, it is important to discover harmful resources among the massive, heterogeneous data resources in cyberspace, and Internet resource discovery has therefore attracted increasing attention. In this paper, we present iHash, a semantic-based organization and similarity search method for Internet data resources. First, iHash normalizes Internet data objects into a high-dimensional feature space, solving the “feature explosion” problem of the feature space. Second, we partition the high-dimensional data in the feature space using a clustering method, transform the data clusters into regular shapes, and organize the high-dimensional data with a Pyramid-like method. Third, we implement range and kNN queries on top of this organization. Finally, we evaluate the performance of iHash and find that it performs similarity search efficiently.
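The abstract does not specify how iHash maps a point to a one-dimensional key, so the following is only a minimal sketch of the original Pyramid-Technique mapping (Berchtold, Böhm, and Kriegel, SIGMOD 1998), which the "Pyramid-like" organization presumably resembles. The function name `pyramid_value` and the assumption that points are already normalized to the unit hypercube are illustrative choices, not details taken from the paper.

```python
from typing import Sequence


def pyramid_value(point: Sequence[float]) -> float:
    """Map a point in [0, 1]^d to a one-dimensional pyramid value.

    Assumption: this follows the original Pyramid-Technique, not
    necessarily the exact mapping used by iHash.
    """
    d = len(point)
    # Dimension in which the point lies farthest from the center 0.5.
    j_max = max(range(d), key=lambda j: abs(0.5 - point[j]))
    # Height of the point inside its pyramid, in the range [0, 0.5].
    height = abs(0.5 - point[j_max])
    # Pyramids 0..d-1 face the lower sides, d..2d-1 the upper sides.
    pyramid = j_max if point[j_max] < 0.5 else j_max + d
    # Integer part identifies the pyramid, fractional part the height.
    return pyramid + height


if __name__ == "__main__":
    for p in ([0.1, 0.5], [0.5, 0.9], [0.3, 0.45, 0.6]):
        print(p, pyramid_value(p))
```

Because the resulting keys are plain numbers, they could be stored in an ordinary ordered index such as a B+-tree, and a similarity query would then be answered by scanning a few key intervals and filtering the candidates by true distance.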
Copyright information
© 2014 IFIP International Federation for Information Processing
About this paper
Cite this paper
Ren, P., Wang, X., Sun, H., Zhao, B., Wu, C. (2014). An Efficient Semantic-Based Organization and Similarity Search Method for Internet Data Resources. In: Linawati, Mahendra, M.S., Neuhold, E.J., Tjoa, A.M., You, I. (eds) Information and Communication Technology. ICT-EurAsia 2014. Lecture Notes in Computer Science, vol 8407. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-55032-4_68
DOI: https://doi.org/10.1007/978-3-642-55032-4_68
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-55031-7
Online ISBN: 978-3-642-55032-4
eBook Packages: Computer Science (R0)