DeepSearch: A Fast Image Search Framework for Mobile Devices

Authors:

Chaoyang Zhao, and

Jian Cheng

ACM Transactions on Multimedia Computing, Communications, and Applications (TOMM), Volume 14, Issue 1

Article No.: 6, Pages 1 - 22

https://doi.org/10.1145/3152127

Published: 13 December 2017

Abstract

Content-based image retrieval (CBIR) is one of the most important applications of computer vision. In recent years, CBIR systems have advanced considerably, driven especially by Convolutional Neural Networks (CNNs) and other deep-learning techniques. However, current CNN-based CBIR systems suffer from the high computational complexity of CNNs. This problem becomes more severe as mobile applications grow in popularity. The current practice is to deploy the entire CBIR system on the server side, while the client side serves only as an image provider. This architecture can increase the computational burden on the server, which needs to process thousands of requests per second. Moreover, sending images to a server risks leaking personal information. As the need for mobile search expands, concerns about privacy are growing. In this article, we propose a fast image search framework, named DeepSearch, which makes complex CNN-based image search feasible on mobile phones. To handle the heavy computation of CNN models, we present a tensor Block Term Decomposition (BTD) approach as well as a nonlinear response reconstruction method to accelerate the CNNs involved in object detection and feature extraction. Extensive experiments on the ImageNet dataset and the Alibaba Large-scale Image Search Challenge dataset show that the proposed BTD acceleration approach can significantly speed up CNN models and makes CNN-based image search practical on common smartphones.
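
The abstract does not spell out how BTD reduces computation, so the following is only a rough sketch under stated assumptions, not the authors' implementation. It assumes one common way of applying a block term decomposition to a convolutional layer: the k x k convolution is replaced by a 1 x 1 convolution, a grouped k x k convolution with one group per block term, and a second 1 x 1 convolution. The function names and the parameters `blocks`, `r_in`, and `r_out` are hypothetical, and stride 1 is assumed so that all three layers run at the output resolution.

```python
def conv_macs(c_in, c_out, k, h_out, w_out):
    """Multiply-accumulates of a standard k x k convolution."""
    return c_in * c_out * k * k * h_out * w_out


def btd_conv_macs(c_in, c_out, k, h_out, w_out, blocks, r_in, r_out):
    """Multiply-accumulates of an assumed BTD-style replacement.

    Assumption (not taken from the abstract): the layer becomes
      1) a 1 x 1 conv from c_in to blocks * r_in channels,
      2) a grouped k x k conv with `blocks` groups, r_in -> r_out per group,
      3) a 1 x 1 conv from blocks * r_out to c_out channels.
    """
    per_pixel = (
        c_in * blocks * r_in              # step 1: 1 x 1 projection
        + blocks * r_in * r_out * k * k   # step 2: grouped k x k core
        + blocks * r_out * c_out          # step 3: 1 x 1 expansion
    )
    return per_pixel * h_out * w_out


def theoretical_speedup(c_in, c_out, k, h_out, w_out, blocks, r_in, r_out):
    """Ratio of original to decomposed multiply-accumulate counts."""
    original = conv_macs(c_in, c_out, k, h_out, w_out)
    decomposed = btd_conv_macs(c_in, c_out, k, h_out, w_out,
                               blocks, r_in, r_out)
    return original / decomposed


if __name__ == "__main__":
    # Illustrative layer shape only; not a configuration from the article.
    print(theoretical_speedup(256, 256, 3, 14, 14,
                              blocks=4, r_in=32, r_out=32))
```

This counts arithmetic only. Measured speed-ups on a phone also depend on memory traffic and on how well the grouped convolution is implemented, so the ratio should be read as an upper-level estimate rather than a prediction.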

Index Terms

DeepSearch: A Fast Image Search Framework for Mobile Devices
1. Computing methodologies
  1. Artificial intelligence
    1. Computer vision
2. Information systems
  1. Information retrieval
    1. Specialized information retrieval
      1. Multimedia and multimodal retrieval
        Image search

Published In

ACM Transactions on Multimedia Computing, Communications, and Applications Volume 14, Issue 1

February 2018

287 pages

ISSN:1551-6857

EISSN:1551-6865

DOI:10.1145/3173554

Editor:
Alberto Del Bimbo
University of Firenze, Italy

Copyright © 2017 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 13 December 2017

Accepted: 01 September 2017

Revised: 01 July 2017

Received: 01 March 2017

Published in TOMM Volume 14, Issue 1

Qualifiers

Research-article
Research
Refereed

Funding Sources

Jiangsu Key Laboratory of Big Data Analysis Technology
National Natural Science Foundation of China
863 program
