DOI: 10.1145/3240508.3240630
research-article
Open access

Reconfigurable Inverted Index

Published: 15 October 2018

Abstract

Existing approximate nearest neighbor search systems suffer from two fundamental problems that are of practical importance but have not received sufficient attention from the research community. First, although existing systems perform well when searching the whole database, it is difficult to run a search over a subset of the database. Second, there has been no discussion of the performance degradation that occurs after many new items have been added to a system. We develop a reconfigurable inverted index (Rii) to resolve these two issues. Based on the standard IVFADC system, we design a data layout in which items are stored linearly. This enables us to run a subset search efficiently by switching the search method to a linear PQ scan when the subset is small. Owing to the linear layout, the data structure can be dynamically adjusted after new items are added, maintaining the fast speed of the system. Extensive comparisons show that Rii achieves performance comparable to state-of-the-art systems such as Faiss.
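
The switching idea in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the function names, the fixed `threshold` parameter, the choice to encode raw vectors rather than residuals, and the decode-and-reassign step used for rebuilding posting lists are all assumptions made for illustration. PQ codes are kept in one flat list indexed by item id, so a small subset can be scanned directly, while a large subset goes through the posting lists.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def split(vec, m):
    """Split vec into m contiguous sub-vectors of equal length."""
    d = len(vec) // m
    return [vec[i * d:(i + 1) * d] for i in range(m)]

def pq_encode(vec, codebooks):
    """Index of the nearest codeword in each sub-space."""
    subs = split(vec, len(codebooks))
    return [min(range(len(cb)), key=lambda k: sq_dist(s, cb[k]))
            for s, cb in zip(subs, codebooks)]

def pq_decode(code, codebooks):
    """Approximate vector rebuilt by concatenating the chosen codewords."""
    out = []
    for k, cb in zip(code, codebooks):
        out.extend(cb[k])
    return out

def build_lists(codes, codebooks, coarse):
    """Posting lists: each item id goes to its nearest coarse center.
    Re-running this after appending to `codes` (possibly with more coarse
    centers) stands in for reconfiguration in this sketch (an assumption)."""
    lists = [[] for _ in coarse]
    for i, code in enumerate(codes):
        v = pq_decode(code, codebooks)
        c = min(range(len(coarse)), key=lambda j: sq_dist(v, coarse[j]))
        lists[c].append(i)
    return lists

def adc_table(query, codebooks):
    """Query-to-codeword squared distances for every sub-space."""
    subs = split(query, len(codebooks))
    return [[sq_dist(s, c) for c in cb] for s, cb in zip(subs, codebooks)]

def linear_pq_scan(table, codes, ids, topk):
    """Score the given ids by asymmetric distance and keep the best topk."""
    scored = sorted((sum(table[m][k] for m, k in enumerate(codes[i])), i)
                    for i in ids)
    return [i for _, i in scored[:topk]]

def search(query, codes, codebooks, coarse, lists, subset,
           topk=1, nprobe=1, threshold=8):
    """Small subset: scan its codes directly. Large subset: visit the
    nprobe nearest posting lists and keep only ids inside the subset.
    `threshold` is a hypothetical fixed cut-off for this sketch."""
    table = adc_table(query, codebooks)
    if len(subset) <= threshold:
        return linear_pq_scan(table, codes, subset, topk)
    wanted = set(subset)
    order = sorted(range(len(coarse)), key=lambda c: sq_dist(query, coarse[c]))
    cand = [i for c in order[:nprobe] for i in lists[c] if i in wanted]
    return linear_pq_scan(table, codes, cand, topk)

# Toy data: 4-dimensional vectors, two sub-spaces, four codewords each.
grid = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]]
codebooks = [grid, grid]
vectors = [a + b for a in grid for b in grid]        # 16 items, ids 0..15
codes = [pq_encode(v, codebooks) for v in vectors]   # flat, indexed by id
coarse = [[0.25, 0.5, 0.5, 0.5], [0.75, 0.5, 0.5, 0.5]]
lists = build_lists(codes, codebooks, coarse)

query = [0.9, 0.7, 0.2, 0.1]
small = search(query, codes, codebooks, coarse, lists, subset=[0, 5, 10])
large = search(query, codes, codebooks, coarse, lists, subset=list(range(16)))
# small == [10] (linear PQ scan), large == [12] (inverted-index path)
```

Because the codes live in one flat list, appending new items only requires encoding them and calling `build_lists` again; in this sketch that rebuild is the whole of the "reconfigure" step, which is a simplification.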

Supplementary Material

ZIP File (fp0587.zip)
Supplementary Material for "Reconfigurable Inverted Index"

Information & Contributors

Information

Published In

MM '18: Proceedings of the 26th ACM international conference on Multimedia
October 2018
2167 pages
ISBN: 9781450356657
DOI: 10.1145/3240508
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. approximate nearest neighbor search
  2. inverted index
  3. product quantization
  4. reconfigure
  5. subset search

Qualifiers

  • Research-article

Funding Sources

  • JST ACT-I

Conference

MM '18
MM '18: ACM Multimedia Conference
October 22 - 26, 2018
Seoul, Republic of Korea

Acceptance Rates

MM '18 Paper Acceptance Rate 209 of 757 submissions, 28%;
Overall Acceptance Rate 2,145 of 8,556 submissions, 25%

Cited By

  • (2024) Subset Retrieval Nearest Neighbor Machine Translation. Journal of Natural Language Processing 31:2 (374-406). DOI: 10.5715/jnlp.31.374. Online publication date: 2024
  • (2024) iRangeGraph: Improvising Range-dedicated Graphs for Range-filtering Nearest Neighbor Search. Proceedings of the ACM on Management of Data 2:6 (1-26). DOI: 10.1145/3698814. Online publication date: 20-Dec-2024
  • (2024) SeRF: Segment Graph for Range-Filtering Approximate Nearest Neighbor Search. Proceedings of the ACM on Management of Data 2:1 (1-26). DOI: 10.1145/3639324. Online publication date: 26-Mar-2024
  • (2024) FEDS-ICL. Information Processing and Management: an International Journal 61:5. DOI: 10.1016/j.ipm.2024.103825. Online publication date: 1-Sep-2024
  • (2023) Subset Retrieval Nearest Neighbor Machine Translation. Journal of Natural Language Processing 30:4 (1245-1250). DOI: 10.5715/jnlp.30.1245. Online publication date: 2023
  • (2023) ARKGraph: All-Range Approximate K-Nearest-Neighbor Graph. Proceedings of the VLDB Endowment 16:10 (2645-2658). DOI: 10.14778/3603581.3603601. Online publication date: 1-Jun-2023
  • (2021) Hierarchical quantization for billion-scale similarity retrieval on GPUs. Computers & Electrical Engineering 90 (107002). DOI: 10.1016/j.compeleceng.2021.107002. Online publication date: Mar-2021
