DOI: 10.1145/3356250.3360030

Neuro.ZERO: a zero-energy neural network accelerator for embedded sensing and inference systems

Published: 10 November 2019

Abstract

We introduce Neuro.ZERO---a co-processor architecture consisting of a main microcontroller (MCU) that executes a scaled-down version of a deep neural network (DNN) inference task, and an accelerator microcontroller that is powered by harvested energy and follows the intermittent computing paradigm [76]. The goal of the accelerator is to enhance the inference performance of the DNN running on the main microcontroller. Neuro.ZERO opportunistically accelerates the run-time performance of a DNN via one of its four acceleration modes: extended inference, expedited inference, ensemble inference, and latent training. To enable these modes, we propose two sets of algorithms: (1) energy- and intermittence-aware DNN inference and training algorithms, and (2) a fast, high-precision adaptive fixed-point arithmetic that beats existing floating-point arithmetic in speed and existing fixed-point arithmetic in precision, achieving the best of both. To evaluate Neuro.ZERO, we implement low-power image and audio recognition applications and demonstrate that their inference speed increases by 1.6× and 1.7×, respectively, and their inference accuracy increases by 10% and 16%, respectively, compared to battery-powered single-MCU systems.
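
The adaptive fixed-point arithmetic mentioned in the abstract replaces a fixed radix point with one chosen per layer from that layer's observed dynamic range, so fractional precision is not wasted on unused integer bits. Below is a minimal C sketch of that general idea, assuming a 16-bit signed Q-format with a per-layer fractional width; all names here (afp_t, afp_choose_f, afp_mul) are illustrative and are not the paper's actual API.

```c
/* Minimal sketch of per-layer adaptive fixed-point arithmetic.
 * Assumption: 16-bit signed Q-format whose fractional width f is chosen
 * from the layer's dynamic range. Illustrative only, not Neuro.ZERO's API. */
#include <stdint.h>
#include <math.h>

typedef struct {
    int16_t v;  /* raw fixed-point value                  */
    uint8_t f;  /* fractional bits (radix-point position) */
} afp_t;

/* Largest fractional width that still fits |x| <= max_abs in 16 bits
 * (1 sign bit + integer bits + f fractional bits = 16). */
static uint8_t afp_choose_f(float max_abs)
{
    int int_bits = (int)ceilf(log2f(max_abs + 1.0f));
    int f = 15 - int_bits;
    return (uint8_t)(f < 0 ? 0 : f);
}

static afp_t afp_from_float(float x, uint8_t f)
{
    afp_t a = { (int16_t)lrintf(x * (float)(1 << f)), f };
    return a;
}

static float afp_to_float(afp_t a)
{
    return (float)a.v / (float)(1 << a.f);
}

/* Multiply in a 32-bit intermediate (format Q(a.f + b.f)), then shift
 * down to the caller's output format; assumes f_out <= a.f + b.f. */
static afp_t afp_mul(afp_t a, afp_t b, uint8_t f_out)
{
    int32_t wide = (int32_t)a.v * (int32_t)b.v;
    afp_t r = { (int16_t)(wide >> (a.f + b.f - f_out)), f_out };
    return r;
}

int main(void)
{
    /* A layer whose activations never exceed ~3.2 gets f = 12 (Q3.12)
     * under this conservative rule: more fractional precision than a
     * one-size-fits-all Q8.8 format would give it. */
    uint8_t f = afp_choose_f(3.2f);
    afp_t w = afp_from_float(0.37f, f);
    afp_t x = afp_from_float(1.25f, f);
    float y = afp_to_float(afp_mul(w, x, f));
    (void)y;
    return 0;
}
```

On an FPU-less MCU such as the MSP430 family targeted by this paper, an integer multiply-and-shift sequence like this runs much faster than emulated floating point, which is the speed/precision gap the paper's arithmetic is designed to close.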

References

[1]
National Highway Traffic Safety Administration. 2013. Traffic Safety Facts. https://crashstats.nhtsa.dot.gov/Api/Public/ViewPublication/812124. (2013).
[2]
SF Anderson, JG Earle, RE Goldschmidt, and DM Powers. 1967. The IBM System/360 Model 91: floating-point execution unit. IBM Journal of Research and Development 11, 1 (1967), 34--53.
[3]
Sajid Anwar, Kyuyeon Hwang, and Wonyong Sung. 2015. Fixed point optimization of deep convolutional neural networks for object recognition. In Acoustics, Speech and Signal Processing (ICASSP), 2015 IEEE International Conference on. IEEE, 1131--1135.
[4]
Apple. 2017. Neural Engine. https://www.apple.com/iphone-xs/a12-bionic/. (2017).
[5]
Arm. 2013. big.LITTLE technology. https://www.arm.com/files/pdf/bigLITTLETechnologytheFutueofMobile.pdf. (2013).
[6]
Nii O Attoh-Okine. 1999. Analysis of learning rate and momentum term in backpropagation neural network algorithm trained to predict pavement performance. Advances in Engineering Software 30, 4 (1999), 291--302.
[7]
David Audet, Leandro Collares De Oliveira, Neil MacMillan, Dimitri Marinakis, and Kui Wu. 2011. Scheduling recurring tasks in energy harvesting sensors. In 2011 IEEE Conference on Computer Communications Workshops (INFOCOM WKSHPS). IEEE, 277--282.
[8]
Abdulrahman Baknina and Sennur Ulukus. 2017. Online scheduling for energy harvesting channels with processing costs. IEEE Transactions on Green Communications and Networking 1, 3 (2017), 281--293.
[9]
Yoshua Bengio. 2012. Practical recommendations for gradient-based training of deep architectures. In Neural networks: Tricks of the trade. Springer, 437--478.
[10]
Sourav Bhattacharya and Nicholas D Lane. 2016. Sparsification and separation of deep learning layers for constrained resource inference on wearables. In Proceedings of the 14th ACM Conference on Embedded Network Sensor Systems CD-ROM. ACM, 176--189.
[11]
Dudley Allen Buck. 1952. Ferroelectrics for Digital Information Storage and Switching. Technical Report. Massachusetts Institute of Technology, Digital Computer Laboratory, Cambridge, MA.
[12]
Michael Buettner, Ben Greenstein, and David Wetherall. 2011. Dewdrop: an energy-aware runtime for computational RFID. In Proc. USENIX NSDI. 197--210.
[13]
Erik Cambria and Bebo White. 2014. Jumping NLP curves: A review of natural language processing research. IEEE Computational Intelligence Magazine 9, 2 (2014), 48--57.
[14]
Lukas Cavigelli, Michele Magno, and Luca Benini. 2015. Accelerating real-time embedded scene labeling with convolutional networks. In Proceedings of the 52nd Annual Design Automation Conference. ACM, 108.
[15]
Jagmohan Chauhan, Suranga Seneviratne, Yining Hu, Archan Misra, Aruna Seneviratne, and Youngki Lee. 2018. Breathing-Based Authentication on Resource-Constrained IoT Devices using Recurrent Neural Networks. Computer 51, 5 (2018), 60--67.
[16]
Liang-Bi Chen, Wan-Jung Chang, Jian-Ping Su, Ji-Yi Ciou, Yi-Jhan Ciou, Cheng-Chin Kuo, and Katherine Shu-Min Li. 2016. A wearable-glasses-based drowsiness-fatigue-detection system for improving road safety. In 2016 IEEE 5th Global Conference on Consumer Electronics. IEEE, 1--2.
[17]
Tianshi Chen, Zidong Du, Ninghui Sun, Jia Wang, Chengyong Wu, Yunji Chen, and Olivier Temam. 2014. Diannao: A small-footprint high-throughput accelerator for ubiquitous machine-learning. ACM Sigplan Notices 49, 4 (2014), 269--284.
[18]
Maryline Chetto and Hussein El Ghor. 2019. Scheduling and power management in energy harvesting computing systems with real-time constraints. Journal of Systems Architecture (2019).
[19]
Alexei Colin and Brandon Lucia. 2016. Chain: tasks and channels for reliable intermittent programs. ACM SIGPLAN Notices 51, 10 (2016), 514--530.
[20]
Alexei Colin, Emily Ruppel, and Brandon Lucia. 2018. A Reconfigurable Energy Storage Architecture for Energy-harvesting Devices. In Proceedings of the Twenty-Third International Conference on Architectural Support for Programming Languages and Operating Systems. ACM, 767--781.
[21]
Matthieu Courbariaux, Yoshua Bengio, and Jean-Pierre David. 2014. Training deep neural networks with low precision multiplications. arXiv preprint arXiv:1412.7024 (2014).
[22]
Matthieu Courbariaux, Yoshua Bengio, and Jean-Pierre David. 2015. Binaryconnect: Training deep neural networks with binary weights during propagations. In Advances in neural information processing systems. 3123--3131.
[23]
George Cybenko. 1989. Approximation by superpositions of a sigmoidal function. Mathematics of control, signals and systems 2, 4 (1989), 303--314.
[24]
Cypress. 2017. CY15B104Q. http://www.cypress.com/file/209146/download. (2017).
[25]
Nicolaas Govert de Bruijn. 1975. Acknowledgement of priority to C. Flye Sainte-Marie on the counting of circular arrangements of 2n zeros and ones that show each n-letter word exactly once. Department of Mathematics, Technological University.
[26]
Li Deng. 2014. A tutorial survey of architectures, algorithms, and applications for deep learning. APSIPA Transactions on Signal and Information Processing 3 (2014).
[27]
Li Deng, Geoffrey Hinton, and Brian Kingsbury. 2013. New types of deep neural network learning for speech recognition and related applications: An overview. In Acoustics, Speech and Signal Processing (ICASSP), 2013 IEEE International Conference on. IEEE, 8599--8603.
[28]
Sorin Draghici. 2002. On the capabilities of neural networks using limited precision weights. Neural networks 15, 3 (2002), 395--414.
[29]
Aysegul Dundar, Jonghoon Jin, Berin Martini, and Eugenio Culurciello. 2017. Embedded streaming deep neural networks accelerator with applications. IEEE transactions on neural networks and learning systems 28, 7 (2017), 1572--1583.
[30]
Clément Farabet, Berin Martini, Benoit Corda, Polina Akselrod, Eugenio Culurciello, and Yann LeCun. 2011. Neuflow: A runtime reconfigurable dataflow processor for vision. In Computer Vision and Pattern Recognition Workshops (CVPRW), 2011 IEEE Computer Society Conference on. IEEE, 109--116.
[31]
National Science Foundation. 2019. Real-Time Machine Learning (RTML). https://www.nsf.gov/pubs/2019/nsf19566/nsf19566.htm?WT.mcid=USNSF25&WT.mcev=click. (2019).
[32]
Xavier Glorot and Yoshua Bengio. 2010. Understanding the difficulty of training deep feedforward neural networks. In Proceedings of the thirteenth international conference on artificial intelligence and statistics. 249--256.
[33]
Graham Gobieski, Nathan Beckmann, and Brandon Lucia. 2018. Intelligence Beyond the Edge: Inference on Intermittent Embedded Systems. arXiv preprint arXiv:1810.07751 (2018).
[34]
Graham Gobieski, Nathan Beckmann, and Brandon Lucia. 2018. Intermittent Deep Neural Network Inference. SysML (2018).
[35]
Graham Gobieski, Nathan Beckmann, and Brandon Lucia. 2019. Intelligence Beyond the Edge: Inference on Intermittent Embedded Systems. In ACM International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS).
[36]
Vinayak Gokhale, Jonghoon Jin, Aysegul Dundar, Berin Martini, and Eugenio Culurciello. 2014. A 240 g-ops/s mobile coprocessor for deep neural networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops. 682--687.
[37]
Yunchao Gong, Liu Liu, Ming Yang, and Lubomir Bourdev. 2014. Compressing deep convolutional networks using vector quantization. arXiv preprint arXiv:1412.6115 (2014).
[38]
Ian Goodfellow, Yoshua Bengio, and Aaron Courville. 2016. Deep Learning. Vol. 1. MIT Press, Cambridge, MA.
[39]
Google. 2018. Google Clips. https://store.google.com/us/product/googleclips?hl=en-US. (2018).
[40]
Stefanie Günther, Lars Ruthotto, Jacob B Schroder, EC Cyr, and Nicolas R Gauger. 2018. Layer-parallel training of deep residual neural networks. arXiv preprint arXiv:1812.04352 (2018).
[41]
Suyog Gupta, Ankur Agrawal, Kailash Gopalakrishnan, and Pritish Narayanan. 2015. Deep learning with limited numerical precision. In International Conference on Machine Learning. 1737--1746.
[42]
Song Han, Xingyu Liu, Huizi Mao, Jing Pu, Ardavan Pedram, Mark A Horowitz, and William J Dally. 2016. EIE: efficient inference engine on compressed deep neural network. In Computer Architecture (ISCA), 2016 ACM/IEEE 43rd Annual International Symposium on. IEEE, 243--254.
[43]
Song Han, Huizi Mao, and William J Dally. 2015. Deep compression: Compressing deep neural networks with pruning, trained quantization and huffman coding. arXiv preprint arXiv:1510.00149 (2015).
[44]
Boris Hanin. 2017. Universal function approximation by deep neural nets with bounded width and relu activations. arXiv preprint arXiv:1708.02691 (2017).
[45]
Lars Kai Hansen and Peter Salamon. 1990. Neural network ensembles. IEEE Transactions on Pattern Analysis & Machine Intelligence 10 (1990), 993--1001.
[46]
Andrew Hard, Kanishka Rao, Rajiv Mathews, Françoise Beaufays, Sean Augenstein, Hubert Eichner, Chloé Kiddon, and Daniel Ramage. 2018. Federated learning for mobile keyboard prediction. arXiv preprint arXiv:1811.03604 (2018).
[47]
Douglas M Hawkins. 2004. The problem of overfitting. Journal of chemical information and computer sciences 44, 1 (2004), 1--12.
[48]
Jibo He, William Choi, Yan Yang, Junshi Lu, Xiaohui Wu, and Kaiping Peng. 2017. Detection of driver drowsiness using wearable devices: A feasibility study of the proximity sensor. Applied ergonomics 65 (2017), 473--480.
[49]
Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. 2016. Deep residual learning for image recognition. In Proceedings of the IEEE conference on computer vision and pattern recognition. 770--778.
[50]
Josiah Hester, Lanny Sitanayah, and Jacob Sorber. 2015. Tragedy of the coulombs: Federating energy storage for tiny, intermittently-powered sensors. In Proceedings of the 13th ACM Conference on Embedded Networked Sensor Systems. ACM, 5--16.
[51]
Josiah Hester, Kevin Storer, and Jacob Sorber. 2017. Timely Execution on Intermittently Powered Batteryless Sensors. In Proceedings of the 15th ACM Conference on Embedded Networked Sensor Systems (SenSys). ACM.
[52]
Embedded Intelligence Lab (UNC Chapel Hill). 2019. Neuro.ZERO open source project. https://github.com/learning1234embed/Neuro.ZERO. (2019).
[53]
Sepp Hochreiter, Yoshua Bengio, Paolo Frasconi, and Jürgen Schmidhuber. 2001. Gradient flow in recurrent nets: the difficulty of learning long-term dependencies. (2001).
[54]
Jordan L Holi and J-N Hwang. 1993. Finite precision error analysis of neural network hardware implementations. IEEE Trans. Comput. 3 (1993), 281--290.
[55]
Kurt Hornik. 1991. Approximation capabilities of multilayer feedforward networks. Neural networks 4, 2 (1991), 251--257.
[56]
Itay Hubara, Matthieu Courbariaux, Daniel Soudry, Ran El-Yaniv, and Yoshua Bengio. 2017. Quantized neural networks: Training neural networks with low precision weights and activations. The Journal of Machine Learning Research 18, 1 (2017), 6869--6898.
[57]
Kyuyeon Hwang and Wonyong Sung. 2014. Fixed-point feedforward deep neural network design using weights +1, 0, and −1. In Signal Processing Systems (SiPS), 2014 IEEE Workshop on. IEEE, 1--6.
[58]
Forrest N Iandola, Song Han, Matthew W Moskewicz, Khalid Ashraf, William J Dally, and Kurt Keutzer. 2016. SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and < 0.5 MB model size. arXiv preprint arXiv:1602.07360 (2016).
[59]
Shubham Kamdar and Neha Kamdar. 2015. big.LITTLE architecture: Heterogeneous multicore processing. International Journal of Computer Applications 119, 1 (2015).
[60]
Jack Kiefer and Jacob Wolfowitz. 1952. Stochastic estimation of the maximum of a regression function. The Annals of Mathematical Statistics 23, 3 (1952), 462--466.
[61]
Jonghong Kim, Kyuyeon Hwang, and Wonyong Sung. 2014. X1000 real-time phoneme recognition VLSI using feed-forward deep neural networks. In Acoustics, Speech and Signal Processing (ICASSP), 2014 IEEE International Conference on. IEEE, 7510--7514.
[62]
Diederik P Kingma and Jimmy Ba. 2014. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014).
[63]
Alex Krizhevsky and Geoffrey Hinton. 2009. Learning multiple layers of features from tiny images. Technical Report. Citeseer.
[64]
Alex Krizhevsky, Ilya Sutskever, and Geoffrey E Hinton. 2012. Imagenet classification with deep convolutional neural networks. In Advances in neural information processing systems. 1097--1105.
[65]
Anders Krogh and Jesper Vedelsby. 1995. Neural network ensembles, cross validation, and active learning. In Advances in neural information processing systems. 231--238.
[66]
Nicholas D Lane, Sourav Bhattacharya, Akhil Mathur, Petko Georgiev, Claudio Forlivesi, and Fahim Kawsar. 2017. Squeezing deep learning into mobile and embedded devices. IEEE Pervasive Computing 3 (2017), 82--88.
[67]
Steve Lawrence, C Lee Giles, and Ah Chung Tsoi. 1997. Lessons in neural network training: Overfitting may be harder than expected. In AAAI/IAAI. Citeseer, 540--545.
[68]
Steve Lawrence, C Lee Giles, and Ah Chung Tsoi. 1998. What size neural network gives optimal generalization? Convergence properties of backpropagation. Technical Report.
[69]
Yann LeCun, Yoshua Bengio, and Geoffrey Hinton. 2015. Deep learning. Nature 521, 7553 (2015), 436.
[70]
Yann LeCun, Léon Bottou, Yoshua Bengio, and Patrick Haffner. 1998. Gradient-based learning applied to document recognition. Proc. IEEE 86, 11 (1998), 2278--2324.
[71]
Dong-Hyun Lee. 2013. Pseudo-label: The simple and efficient semi-supervised learning method for deep neural networks. In Workshop on Challenges in Representation Learning, ICML, Vol. 3. 2.
[72]
Mu Li, Tong Zhang, Yuqiang Chen, and Alexander J Smola. 2014. Efficient mini-batch training for stochastic optimization. In Proceedings of the 20th ACM SIGKDD international conference on Knowledge discovery and data mining. ACM, 661--670.
[73]
Zhouhan Lin, Matthieu Courbariaux, Roland Memisevic, and Yoshua Bengio. 2015. Neural networks with few multiplications. arXiv preprint arXiv:1510.03009 (2015).
[74]
Hong Lu, AJ Bernheim Brush, Bodhi Priyantha, Amy K Karlson, and Jie Liu. 2011. Speakersense: Energy efficient unobtrusive speaker identification on mobile phones. In International conference on pervasive computing. Springer, 188--205.
[75]
Zhou Lu, Hongming Pu, Feicheng Wang, Zhiqiang Hu, and Liwei Wang. 2017. The expressive power of neural networks: A view from the width. In Advances in Neural Information Processing Systems. 6231--6239.
[76]
Brandon Lucia, Vignesh Balaji, Alexei Colin, Kiwan Maeng, and Emily Ruppel. 2017. Intermittent Computing: Challenges and Opportunities. In LIPIcs-Leibniz International Proceedings in Informatics, Vol. 71. Schloss Dagstuhl-Leibniz-Zentrum fuer Informatik.
[77]
Brandon Lucia and Benjamin Ransford. 2015. A simpler, safer programming and execution model for intermittent systems. ACM SIGPLAN Notices 50, 6 (2015), 575--585.
[78]
Yubo Luo and Shahriar Nirjon. 2019. SpotON: Just-in-Time Active Event Detection on Energy Autonomous Sensing Systems. In Proceedings of the 25th IEEE RealTime and Embedded Technology and Applications Symposium (RTAS WIP Session). IEEE, Montreal, Canada.
[79]
Kiwan Maeng, Alexei Colin, and Brandon Lucia. 2017. Alpaca: intermittent execution without checkpoints. Proceedings of the ACM on Programming Languages 1, OOPSLA (2017), 96.
[80]
Franco Manessi, Alessandro Rozza, Simone Bianco, Paolo Napoletano, and Raimondo Schettini. 2018. Automated pruning for deep neural network compression. In 2018 24th International Conference on Pattern Recognition (ICPR). IEEE, 657--664.
[81]
Dominic Masters and Carlo Luschi. 2018. Revisiting small batch training for deep neural networks. arXiv preprint arXiv:1804.07612 (2018).
[82]
Paul A Merolla, John V Arthur, Rodrigo Alvarez-Icaza, Andrew S Cassidy, Jun Sawada, Filipp Akopyan, Bryan L Jackson, Nabil Imam, Chen Guo, and Yutaka Nakamura. 2014. A million spiking-neuron integrated circuit with a scalable communication network and interface. Science 345, 6197 (2014), 668--673.
[83]
Milad Mohammadi and Subhasis Das. 2016. SNN: stacked neural networks. arXiv preprint arXiv:1605.08512 (2016).
[84]
Clemens Moser, Davide Brunelli, Lothar Thiele, and Luca Benini. 2007. Real-time scheduling for energy harvesting sensor nodes. Real-Time Systems 37, 3 (2007), 233--260.
[85]
Saman Naderiparizi, Pengyu Zhang, Matthai Philipose, Bodhi Priyantha, Jie Liu, and Deepak Ganesan. 2017. Glimpse: A programmable early-discard camera architecture for continuous mobile vision. In Proceedings of the 15th Annual International Conference on Mobile Systems, Applications, and Services. ACM, 292--305.
[86]
Yuval Netzer, Tao Wang, Adam Coates, Alessandro Bissacco, Bo Wu, and Andrew Y Ng. 2011. Reading digits in natural images with unsupervised feature learning. (2011).
[87]
Shahriar Nirjon. 2018. Lifelong Learning on Harvested Energy. In Proceedings of the 16th Annual International Conference on Mobile Systems, Applications, and Services. ACM, 500--501.
[88]
Erick L Oberstar. 2007. Fixed-point representation & fractional math. Oberstar Consulting (2007), 9.
[89]
Sinno Jialin Pan and Qiang Yang. 2009. A survey on transfer learning. IEEE Transactions on knowledge and data engineering 22, 10 (2009), 1345--1359.
[90]
Mohammad Peikari, Sherine Salama, Sharon Nofech-Mozes, and Anne L Martel. 2018. A cluster-then-label semi-supervised learning approach for pathology image classification. Scientific reports 8, 1 (2018), 7193.
[91]
Powercast. 2016. Powercast p2110b. http://www.powercastco.com/wp-content/uploads/2016/12/P2110B-Datasheet-Rev-3.pdf. (2016).
[92]
Powercast. 2016. Powercaster transmitter. http://www.powercastco.com/wp-content/uploads/2016/11/User-Manual-TX-915-01-Rev-A-4.pdf. (2016).
[93]
Jiantao Qiu, Jie Wang, Song Yao, Kaiyuan Guo, Boxun Li, Erjin Zhou, Jincheng Yu, Tianqi Tang, Ningyi Xu, and Sen Song. 2016. Going deeper with embedded fpga platform for convolutional neural network. In Proceedings of the 2016 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays. ACM, 26--35.
[94]
Qualcomm. 2017. Snapdragon 845 Mobile Platform. https://www.qualcomm.com/media/documents/files/snapdragon-845-mobile-platform-product-brief.pdf. (2017).
[95]
Qualcomm. 2018. Qualcomm Snapdragon 820E Processor (APQ8096SGE). https://developer.qualcomm.com/download/sd820e/qualcomm-snapdragon-820e-processor-apq8096sge-device-specification.pdf. (2018).
[96]
Mohammad Rastegari, Vicente Ordonez, Joseph Redmon, and Ali Farhadi. 2016. Xnor-net: Imagenet classification using binary convolutional neural networks. In European Conference on Computer Vision. Springer, 525--542.
[97]
Herbert Robbins and Sutton Monro. 1985. A stochastic approximation method. In Herbert Robbins Selected Papers. Springer, 102--109.
[98]
Mathieu Rouaud. 2012. Probabilités, statistiques et analyses multicritères [Probability, Statistics and Multicriteria Analyses]. (2012).
[99]
David E Rumelhart, Geoffrey E Hinton, and Ronald J Williams. 1988. Learning representations by back-propagating errors. Cognitive modeling 5, 3 (1988), 1.
[100]
Tara Sainath and Carolina Parada. 2015. Convolutional neural networks for small-footprint keyword spotting. (2015).
[101]
Jürgen Schmidhuber. 2015. Deep learning in neural networks: An overview. Neural networks 61 (2015), 85--117.
[102]
Florian Schroff, Dmitry Kalenichenko, and James Philbin. 2015. Facenet: A unified embedding for face recognition and clustering. In Proceedings of the IEEE conference on computer vision and pattern recognition. 815--823.
[103]
Khurram Shahzad and Bengt Oelmann. 2014. Investigating Energy Consumption of an SRAM-based FPGA for Duty-Cycle Applications. In International Conference on Parallel Computing-ParCo 2013, 10-13 Sept, Munich. 548--559.
[104]
Rajiv Ranjan Singh. 2007. Preventing Road Accidents with Wearable Biosensors and Innovative Architectural Design. In 2nd ISSS National Conference on MEMS, Pilani, India. 1--8.
[105]
Nitish Srivastava, Geoffrey Hinton, Alex Krizhevsky, Ilya Sutskever, and Ruslan Salakhutdinov. 2014. Dropout: a simple way to prevent neural networks from overfitting. The Journal of Machine Learning Research 15, 1 (2014), 1929--1958.
[106]
Johannes Stallkamp, Marc Schlipsing, Jan Salmen, and Christian Igel. 2011. The German Traffic Sign Recognition Benchmark: A multi-class classification competition. In IEEE International Joint Conference on Neural Networks. 1453--1460.
[107]
Texas Instruments. 2018. MSP430FR5994. http://www.ti.com/product/MSP430FR5994. (2018).
[108]
Lisa Torrey and Jude Shavlik. 2010. Transfer learning. In Handbook of research on machine learning applications and trends: algorithms, methods, and techniques. IGI Global, 242--264.
[109]
Varuna Tyagi and Anju Mishra. 2014. A survey on ensemble combination schemes of neural network. International Journal of Computer Applications 95, 16 (2014).
[110]
James Victor Uspensky. 1937. Introduction to mathematical probability. (1937).
[111]
Vincent Vanhoucke, Andrew Senior, and Mark Z Mao. 2011. Improving the speed of neural networks on CPUs. In Proc. Deep Learning and Unsupervised Feature Learning NIPS Workshop, Vol. 1. Citeseer, 4.
[112]
Chao Wang, Qi Yu, Lei Gong, Xi Li, Yuan Xie, and Xuehai Zhou. 2016. DLAU: A scalable deep learning accelerator unit on FPGA. arXiv preprint arXiv:1605.06894 (2016).
[113]
Lipo Wang, Hou Chai Quek, Keng Hoe Tee, Nina Zhou, and Chunru Wan. 2005. Optimal size of a feedforward neural network: How much does it matter? In Joint International Conference on Autonomic and Autonomous Systems and International Conference on Networking and Services (ICAS/ICNS '05). IEEE, 69--69.
[114]
Yue Wang, Tan Nguyen, Yang Zhao, Zhangyang Wang, Yingyan Lin, and Richard Baraniuk. 2018. EnergyNet: Energy-Efficient Dynamic Inference. (2018).
[115]
Pete Warden. 2018. Speech Commands: A Dataset for Limited-Vocabulary Speech Recognition. arXiv preprint arXiv:1804.03209 (2018). https://arxiv.org/abs/1804.03209
[116]
Alan R Weiss. 2002. Dhrystone benchmark: History, analysis, scores and recommendations. (2002).
[117]
Paul J Werbos. 1990. Backpropagation through time: what it does and how to do it. Proc. IEEE 78, 10 (1990), 1550--1560.
[118]
Han Xiao, Kashif Rasul, and Roland Vollgraf. 2017. Fashion-MNIST: a Novel Image Dataset for Benchmarking Machine Learning Algorithms. arXiv preprint arXiv:1708.07747 (2017).
[119]
Xilinx. 2011. Spartan-6 Family Overview. https://www.xilinx.com/support/documentation/datasheets/ds160.pdf. (2011).
[120]
Shuochao Yao, Yiran Zhao, Aston Zhang, Shaohan Hu, Huajie Shao, Chao Zhang, Lu Su, and Tarek Abdelzaher. 2018. Deep Learning for the Internet of Things. Computer 51, 5 (2018), 32--41.
[121]
Tom Young, Devamanyu Hazarika, Soujanya Poria, and Erik Cambria. 2018. Recent trends in deep learning based natural language processing. IEEE Computational Intelligence Magazine 13, 3 (2018), 55--75.
[122]
Chen Zhang, Peng Li, Guangyu Sun, Yijin Guan, Bingjun Xiao, and Jason Cong. 2015. Optimizing fpga-based accelerator design for deep convolutional neural networks. In Proceedings of the 2015 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays. ACM, 161--170.
[123]
Aojun Zhou, Anbang Yao, Kuan Wang, and Yurong Chen. 2018. Explicit Loss-Error-Aware Quantization for Low-Bit Deep Neural Networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 9426--9435.
[124]
Xiaojin Zhu and Andrew B Goldberg. 2009. Introduction to semi-supervised learning. Synthesis lectures on artificial intelligence and machine learning 3, 1 (2009), 1--130.

Published In

SenSys '19: Proceedings of the 17th Conference on Embedded Networked Sensor Systems
November 2019
472 pages
ISBN:9781450369503
DOI:10.1145/3356250
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. accelerator
  2. batteryless
  3. deep neural networks
  4. zero energy

Qualifiers

  • Research-article

Acceptance Rates

Overall Acceptance Rate 174 of 867 submissions, 20%

Article Metrics

  • Downloads (last 12 months): 146
  • Downloads (last 6 weeks): 24
Reflects downloads up to 10 Nov 2024

Cited By

  • (2024) Fast-Inf: Ultra-Fast Embedded Intelligence on the Batteryless Edge. Proceedings of the 22nd ACM Conference on Embedded Networked Sensor Systems, 239-252. https://doi.org/10.1145/3666025.3699335. Online publication date: 4-Nov-2024.
  • (2024) CRAM-Based Acceleration for Intermittent Computing of Parallelizable Tasks. IEEE Transactions on Emerging Topics in Computing 12(1), 48-59. https://doi.org/10.1109/TETC.2023.3293426. Online publication date: Jan-2024.
  • (2024) Unsupervised Joint Domain Adaptation for Decoding Brain Cognitive States From tfMRI Images. IEEE Journal of Biomedical and Health Informatics 28(3), 1494-1503. https://doi.org/10.1109/JBHI.2023.3348130. Online publication date: Mar-2024.
  • (2024) LACT. Journal of Systems Architecture: the EUROMICRO Journal 153(C). https://doi.org/10.1016/j.sysarc.2024.103213. Online publication date: 1-Aug-2024.
  • (2023) DaCapo: An On-Device Learning Scheme for Memory-Constrained Embedded Systems. ACM Transactions on Embedded Computing Systems 22(5s), 1-23. https://doi.org/10.1145/3609121. Online publication date: 9-Sep-2023.
  • (2023) Fine-grained Hardware Acceleration for Efficient Batteryless Intermittent Inference on the Edge. ACM Transactions on Embedded Computing Systems 22(5), 1-19. https://doi.org/10.1145/3608475. Online publication date: 10-Jul-2023.
  • (2023) SoundSieve: Seconds-Long Audio Event Recognition on Intermittently-Powered Systems. Proceedings of the 21st Annual International Conference on Mobile Systems, Applications and Services, 28-41. https://doi.org/10.1145/3581791.3596859. Online publication date: 18-Jun-2023.
  • (2023) BOBBER: A Prototyping Platform for Batteryless Intermittent Accelerators. Proceedings of the 2023 ACM/SIGDA International Symposium on Field Programmable Gate Arrays, 221-228. https://doi.org/10.1145/3543622.3573046. Online publication date: 12-Feb-2023.
  • (2022) Protean. Proceedings of the 20th ACM Conference on Embedded Networked Sensor Systems, 207-221. https://doi.org/10.1145/3560905.3568561. Online publication date: 6-Nov-2022.
  • (2022) Adaptive Intelligence for Batteryless Sensors Using Software-Accelerated Tsetlin Machines. Proceedings of the 20th ACM Conference on Embedded Networked Sensor Systems, 236-249. https://doi.org/10.1145/3560905.3568512. Online publication date: 6-Nov-2022.
