research-article

Boosting Mobile CNN Inference through Semantic Memory

Authors:

Yunxin Liu, and

Mengwei Xu

MM '21: Proceedings of the 29th ACM International Conference on Multimedia

October 2021

Pages 2362 - 2371

https://doi.org/10.1145/3474085.3475399

Published: 17 October 2021

Abstract

Human brains are known to speed up visual recognition of repeatedly presented objects through faster memory encoding and access on activated neurons. For the first time, we borrow and distill this capability into a semantic memory design, namely SMTM, to improve on-device CNN inference. SMTM employs a hierarchical memory architecture to leverage the long-tail distribution of objects of interest, and incorporates several novel techniques to put it into effect: (1) it encodes high-dimensional feature maps into low-dimensional semantic vectors for low-cost yet accurate caching and lookup; (2) it uses a novel metric to determine the exit timing, taking into account the inherent characteristics of different layers; (3) it adaptively adjusts the cache size and semantic vectors to fit the scene dynamics. SMTM is prototyped on a commodity CNN engine and runs on both mobile CPU and GPU. Extensive experiments on large-scale datasets and models show that SMTM can significantly speed up model inference over the standard approach (up to 2×) and prior cache designs (up to 1.5×), with acceptable accuracy loss.
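
The cache-and-exit idea described in the abstract can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: global average pooling stands in for the feature-map encoding, cosine similarity against a fixed per-layer threshold stands in for the exit metric, and the names `global_average_pool`, `lookup`, and `infer` are hypothetical. Adaptive cache resizing is not modeled.

```python
import math


def global_average_pool(feature_map):
    """Collapse a C x H x W feature map (nested lists) into a C-dim vector.

    Assumption: pooling is only a stand-in for the paper's encoding step.
    """
    return [
        sum(sum(row) for row in channel) / (len(channel) * len(channel[0]))
        for channel in feature_map
    ]


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def lookup(cache, vector, threshold):
    """Return the best-matching cached label, or None if no entry reaches the threshold.

    Assumption: a fixed similarity threshold replaces the paper's exit metric.
    """
    best_label, best_sim = None, threshold
    for label, cached_vector in cache.items():
        sim = cosine_similarity(vector, cached_vector)
        if sim >= best_sim:
            best_label, best_sim = label, sim
    return best_label


def infer(layers, caches, thresholds, classifier, x):
    """Run layers in order and exit early on the first confident cache hit.

    layers:     list of callables, each mapping a feature map to a feature map
    caches:     dict mapping layer index -> {label: semantic vector}
    thresholds: dict mapping layer index -> similarity threshold for that layer
    Returns (label, number of the layer at which inference stopped).
    """
    for i, layer in enumerate(layers):
        x = layer(x)
        cache = caches.get(i)
        if cache:
            label = lookup(cache, global_average_pool(x), thresholds[i])
            if label is not None:
                return label, i  # cache hit: skip the remaining layers
    return classifier(x), len(layers)  # cache miss everywhere: full inference
```

A caller would register a cache only at the layers where a lookup is worth its cost, and tune each layer's threshold separately, since shallow and deep layers produce features of different discriminative power.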

Supplementary Material

ZIP File (mfp1346aux.zip)

Auxiliary experimental results are provided in the attached PDF.

Index Terms

Boosting Mobile CNN Inference through Semantic Memory
1. Computer systems organization
  1. Embedded and cyber-physical systems
    1. Embedded systems
2. Computing methodologies
  1. Artificial intelligence
    1. Computer vision

Published In

MM '21: Proceedings of the 29th ACM International Conference on Multimedia

October 2021

5796 pages

ISBN:9781450386517

DOI:10.1145/3474085

General Chairs:
Heng Tao Shen, University of Electronic Science and Technology of China, China
Yueting Zhuang, Zhejiang University, China
John R. Smith, IBM, USA

Program Chairs:
Yang Yang, University of Electronic Science and Technology of China, China
Pablo Cesar, CWI & TU Delft, The Netherlands
Florian Metze, Facebook, Inc., USA
Balakrishnan Prabhakaran, University of Texas at Dallas, USA

Copyright © 2021 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

SIGMM: ACM Special Interest Group on Multimedia

Publisher

Association for Computing Machinery

New York, NY, United States


Funding Sources

the Fundamental Research Funds for the Central Universities and National Natural Science Foundation of China

Conference

MM '21

Sponsor:

SIGMM

MM '21: ACM Multimedia Conference

October 20 - 24, 2021

Virtual Event, China

Acceptance Rates

Overall Acceptance Rate 995 of 4,171 submissions, 24%
