DOI: 10.1145/3357384.3357834
research-article

Large-Scale Visual Search with Binary Distributed Graph at Alibaba

Published: 03 November 2019

Abstract

Graph-based approximate nearest neighbor search has attracted increasing attention because of its advantages in online search. Many methods have been proposed to improve its speed and recall. However, few of them address the efficiency and scale of offline graph construction. For a deployed visual search system with several billion online images in total, building a billion-scale offline graph within hours is essential, yet this is almost unachievable for most existing methods.
In this paper, we propose a novel algorithm called Binary Distributed Graph to solve this problem. Specifically, we combine binary codes with the graph structure to speed up both the offline and online procedures, and we achieve performance comparable to methods that use real-valued features by recalling and reranking more binary candidates. Furthermore, graph construction is optimized for a fully distributed implementation, which significantly accelerates the offline process and removes single-machine limitations such as memory and storage. Experimental comparisons on the Alibaba Commodity Data Set (more than three billion images) show that the proposed method outperforms state-of-the-art methods with respect to the online/offline trade-off.
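The abstract does not give algorithmic details, so the following is only a minimal Python sketch of the general idea it describes: binary codes attached to a neighbor graph, a best-first traversal that ranks nodes by Hamming distance while keeping a larger candidate pool of size `ef`, and a final rerank of those candidates with real-valued features. The function names, the `ef` parameter, the packed `uint8` code layout, and the prebuilt `graph` adjacency lists are illustrative assumptions, not the authors' implementation, and the distributed graph construction is not covered.

```python
import heapq

import numpy as np


def hamming(a: np.ndarray, b: np.ndarray) -> int:
    """Hamming distance between two packed binary codes (uint8 arrays)."""
    return int(np.unpackbits(np.bitwise_xor(a, b)).sum())


def graph_search_binary(codes, graph, query_code, entry, ef):
    """Best-first search over a neighbor graph, ranking nodes by Hamming distance.

    codes: (n, n_bytes) uint8 array of packed binary codes (assumed layout)
    graph: list of neighbor-id lists, one per node (assumed prebuilt)
    Returns up to `ef` candidate ids, closest in Hamming distance first.
    """
    d0 = hamming(codes[entry], query_code)
    visited = {entry}
    frontier = [(d0, entry)]  # min-heap: nodes still to expand
    results = [(-d0, entry)]  # max-heap via negation: best `ef` nodes so far
    while frontier:
        d, node = heapq.heappop(frontier)
        # Stop once the nearest unexpanded node is worse than the worst kept result.
        if len(results) >= ef and d > -results[0][0]:
            break
        for nb in graph[node]:
            if nb in visited:
                continue
            visited.add(nb)
            dn = hamming(codes[nb], query_code)
            if len(results) < ef or dn < -results[0][0]:
                heapq.heappush(frontier, (dn, nb))
                heapq.heappush(results, (-dn, nb))
                if len(results) > ef:
                    heapq.heappop(results)  # drop the current worst candidate
    return [i for _, i in sorted((-nd, i) for nd, i in results)]


def search(codes, feats, graph, query_code, query_feat, entry, ef, k):
    """Recall `ef` binary candidates, then rerank them by real-valued L2 distance."""
    cands = graph_search_binary(codes, graph, query_code, entry, ef)
    dists = np.linalg.norm(feats[cands] - query_feat, axis=1)
    return [cands[i] for i in np.argsort(dists, kind="stable")[:k]]
```

As one possible setup, codes could come from sign binarization, `np.packbits(feats > 0, axis=1)`. Because only the top `k` of the `ef` recalled candidates survive the rerank, raising `ef` trades more Hamming comparisons for higher recall, which is the lever the abstract refers to when it mentions recalling and reranking more binary candidates.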




Information

Published In

CIKM '19: Proceedings of the 28th ACM International Conference on Information and Knowledge Management
November 2019
3373 pages
ISBN: 9781450369763
DOI: 10.1145/3357384
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]


Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 03 November 2019


Author Tags

  1. binary codes
  2. distributed algorithm
  3. graph construction
  4. visual search

Qualifiers

  • Research-article

Conference

CIKM '19

Acceptance Rates

CIKM '19 Paper Acceptance Rate 202 of 1,031 submissions, 20%;
Overall Acceptance Rate 1,861 of 8,427 submissions, 22%


Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (last 12 months): 40
  • Downloads (last 6 weeks): 9
Reflects downloads up to 25 Dec 2024


Citations

Cited By

  • (2024) Bringing Multimodality to Amazon Visual Search System. Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, pp. 6390-6399. DOI: 10.1145/3637528.3671640. Online publication date: 25-Aug-2024.
  • (2023) Automated Siamese Network Design for Image Similarity Computation. Proceedings of the 20th International Conference on Content-based Multimedia Indexing, pp. 8-13. DOI: 10.1145/3617233.3617243. Online publication date: 20-Sep-2023.
  • (2023) Prophet: An Efficient Feature Indexing Mechanism for Similarity Data Sharing at Network Edge. IEEE INFOCOM 2023 - IEEE Conference on Computer Communications, pp. 1-10. DOI: 10.1109/INFOCOM53939.2023.10228941. Online publication date: 17-May-2023.
  • (2023) An efficient indexing technique for billion-scale nearest neighbor search. Multimedia Tools and Applications, Vol. 82, No. 20, pp. 31673-31689. DOI: 10.1007/s11042-023-14825-z. Online publication date: 23-Mar-2023.
  • (2023) Technologies for AI-Driven Fashion Social Networking Service with E-Commerce. Semantic Intelligence, pp. 187-204. DOI: 10.1007/978-981-19-7126-6_15. Online publication date: 1-Apr-2023.
  • (2022) ANN softmax. Proceedings of the VLDB Endowment, Vol. 15, No. 1, pp. 1-10. DOI: 10.14778/3485450.3485451. Online publication date: 14-Jan-2022.
  • (2022) Billion-Scale Pretraining with Vision Transformers for Multi-Task Visual Representations. 2022 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), pp. 1431-1440. DOI: 10.1109/WACV51458.2022.00150. Online publication date: Jan-2022.
  • (2021) A comprehensive survey and experimental comparison of graph-based approximate nearest neighbor search. Proceedings of the VLDB Endowment, Vol. 14, No. 11, pp. 1964-1978. DOI: 10.14778/3476249.3476255. Online publication date: 27-Oct-2021.
  • (2020) Large-Scale Training System for 100-Million Classification at Alibaba. Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, pp. 2909-2930. DOI: 10.1145/3394486.3403342. Online publication date: 23-Aug-2020.
