research-article
Public Access

Towards on-node Machine Learning for Ultra-low-power Sensors Using Asynchronous ΣΔ Streams

Published: 26 August 2020

    Abstract

    We propose a novel architecture that enables low-power, complex on-node data processing for the next generation of sensors for the internet of things (IoT), smartdust, or edge intelligence. Our architecture combines near-analog-memory computing (NAM) and asynchronous computing with streams (ACS), eliminating the need for analog-to-digital converters (ADCs). ACS provides the ultra-low-power, massive computational resources required to execute complex machine learning (ML) algorithms on-node, while NAM addresses the memory wall, a common bottleneck for ML and other complex functions. In ACS, an analog value is mapped to an asynchronous stream that takes one of two logic levels (vh, vl). This stream-based data representation enables area- and power-efficient computing units, such as a multiplier implemented as an AND gate, yielding power savings of ∼90% compared to digital approaches. Generating the streams for NAM and ACS by brute force, using ADCs followed by digital-to-stream converters, would sharply increase power, latency, and energy costs, making the approach impractical. Our NAM-ACS architecture eliminates these expensive conversions, enabling an end-to-end data path that operates on asynchronous streams. We tailor the NAM-ACS architecture to random forest (RaF), an ML algorithm chosen for its ability to classify using a reduced number of features. Simulations show that our NAM-ACS architecture achieves 75% power savings compared with a single ADC while obtaining a classification accuracy of 85% using an RaF-inspired algorithm.
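The idea of multiplying two values with an AND gate can be illustrated in software. The following is a minimal sketch under stated assumptions, not the authors' circuit: it uses a clocked, discrete-time model rather than asynchronous streams, values are restricted to [0, 1], and all function names are hypothetical. One operand is encoded with a textbook first-order sigma-delta accumulator and the other with an independent Bernoulli stream (classic stochastic computing), an illustrative pairing chosen so that the two streams are uncorrelated.

```python
import random


def sigma_delta_stream(x, n_bits):
    """Encode x in [0, 1] as a two-level stream with a first-order,
    discrete-time sigma-delta accumulator (illustrative model only)."""
    acc = 0.0
    bits = []
    for _ in range(n_bits):
        acc += x
        if acc >= 1.0:
            bits.append(1)   # emit the high level and subtract the feedback
            acc -= 1.0
        else:
            bits.append(0)   # emit the low level
    return bits


def bernoulli_stream(p, n_bits, rng):
    """Encode p in [0, 1] as a random stream whose bits are 1 with probability p."""
    return [1 if rng.random() < p else 0 for _ in range(n_bits)]


def and_multiply(stream_a, stream_b):
    """Bitwise AND of two streams; for uncorrelated streams the fraction of
    ones approximates the product of the encoded values."""
    return [a & b for a, b in zip(stream_a, stream_b)]


def decode(stream):
    """Recover the encoded value as the fraction of ones in the stream."""
    return sum(stream) / len(stream)


if __name__ == "__main__":
    rng = random.Random(0)   # fixed seed so the run is repeatable
    n = 100_000
    a = sigma_delta_stream(0.6, n)
    b = bernoulli_stream(0.5, n, rng)
    print("estimated product:", decode(and_multiply(a, b)))  # close to 0.6 * 0.5
```

The estimate improves as the streams get longer, which is the usual accuracy-versus-latency trade-off of stream-based arithmetic. AND-ing two correlated streams, for example two sigma-delta streams generated in lockstep, does not in general yield the product, which is why the sketch pairs the sigma-delta stream with an independent random one.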

    Published In

    ACM Journal on Emerging Technologies in Computing Systems  Volume 16, Issue 4
    Special Issue on Nanoelectronic Device, Circuit, Architecture Design, Part 2 and Regular Papers
    October 2020
    202 pages
    ISSN:1550-4832
    EISSN:1550-4840
    DOI:10.1145/3418801
    Editor: Ramesh Karri
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 26 August 2020
    Accepted: 01 June 2020
    Revised: 01 May 2020
    Received: 01 November 2019
    Published in JETC Volume 16, Issue 4

    Author Tags

    1. IoT
    2. Stochastic computing
    3. asynchronous computing
    4. edge intelligence
    5. machine learning
    6. random forest
    7. smartdust

    Qualifiers

    • Research-article
    • Research
    • Refereed

    Funding Sources

    • NSF

    Article Metrics

    • Downloads (last 12 months): 124
    • Downloads (last 6 weeks): 22
    Reflects downloads up to 26 Jul 2024

    Cited By

    • (2022) Temporal and SFQ pulse-streams encoding for area-efficient superconducting accelerators. Proceedings of the 27th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, 963-976. DOI: 10.1145/3503222.3507765. Online publication date: 28-Feb-2022
    • (2022) Gearbox. Proceedings of the 49th Annual International Symposium on Computer Architecture, 218-230. DOI: 10.1145/3470496.3527402. Online publication date: 18-Jun-2022
    • (2022) NexusEdge: Leveraging IoT Gateways for a Decentralized Edge Computing Platform. 2022 IEEE/ACM 7th Symposium on Edge Computing (SEC), 82-95. DOI: 10.1109/SEC54971.2022.00014. Online publication date: Dec-2022
    • (2021) ATCPiM: Analog to Time Coded Processing in Memory for IoT at the Edge. 2021 IEEE 7th World Forum on Internet of Things (WF-IoT), 704-709. DOI: 10.1109/WF-IoT51360.2021.9595467. Online publication date: 14-Jun-2021
