Abstract
Peer-to-peer (P2P) systems are widely used for sharing and exchanging data and resources among large numbers of computer nodes. Data objects identified by high-dimensional feature vectors, such as text, images, and genome sequences, are starting to leverage P2P technology. Most existing work focuses on queries over data objects with one or a few attributes and is therefore not applicable to high-dimensional data objects. In this study, we investigate K nearest neighbors (KNN) queries on high-dimensional data objects in P2P systems. We propose an efficient query algorithm, together with solutions that address technical challenges raised by high dimensionality, such as search space resolution and incremental search space refinement. An extensive simulation using both synthetic and real data sets demonstrates that our proposal efficiently supports KNN queries on high-dimensional data in P2P systems.
Cite this article
Li, M., Lee, WC., Sivasubramaniam, A. et al. Supporting K nearest neighbors query on high-dimensional data in P2P systems. Front. Comput. Sci. China 2, 234–247 (2008). https://doi.org/10.1007/s11704-008-0026-7
DOI: https://doi.org/10.1007/s11704-008-0026-7