DOI: 10.1145/3299874.3317988
Efficient Softmax Hardware Architecture for Deep Neural Networks

Published: 13 May 2019

Abstract

Deep neural networks (DNNs) have become a pivotal machine learning and object recognition technology in the big data era. The softmax layer is one of the key layers for completing multi-class classification tasks. However, the softmax layer involves complex exponential and division operations, which lead to low accuracy and long critical paths in hardware accelerator designs. To address these issues, we present a softmax hardware architecture with adequate accuracy, a good trade-off between accuracy and resources, and strong extensibility. We summarize the classification rules of neural networks and balance calculation accuracy against resource consumption. On this basis, we propose an exponential calculation unit based on a grouped lookup table, an improved natural-logarithm calculation unit based on the Maclaurin series, and a matching data preprocessing scheme. Experimental results show that the proposed softmax hardware architecture achieves calculation accuracy to three decimal places and a classification accuracy of 99.01%. In theory, it can handle classification tasks with an arbitrary number of categories.
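
The abstract names the building blocks but not the datapath; a common way to combine them, and a reasonable reading of the design, is to make softmax division-free by rewriting softmax_i = exp(x_i) / sum_j exp(x_j) as exp(x_i - ln(sum_j exp(x_j))), evaluating exp() with small grouped lookup tables and ln() with a truncated Maclaurin series after range reduction. The Python below is a minimal behavioral sketch of that reading, not the authors' RTL; the function names (exp_lut, ln_maclaurin, softmax_hw), the 8-bit fraction split, the table ranges, and the four-term series are all illustrative assumptions.

    import math

    # Behavioral sketch of a division-free softmax datapath:
    #   softmax_i = exp(x_i - ln(sum_j exp(x_j)))
    # exp() comes from small grouped lookup tables, ln() from a truncated
    # Maclaurin series after range reduction. Widths and ranges are
    # illustrative assumptions, not taken from the paper.

    # Grouped-LUT exponential: split x = a + b/16 + c/256 into integer,
    # upper-fraction, and lower-fraction fields, so that
    # e^x = e^a * e^(b/16) * e^(c/256) and each table stays small
    # (32 + 16 + 16 entries instead of one 2^13-entry table).
    COARSE = {k: math.exp(k) for k in range(-16, 16)}    # integer part, assumed range
    MEDIUM = {k: math.exp(k / 16.0) for k in range(16)}  # upper 4 fraction bits
    FINE   = {k: math.exp(k / 256.0) for k in range(16)} # lower 4 fraction bits

    def exp_lut(x: float) -> float:
        a = math.floor(x)
        frac = int((x - a) * 256)       # truncate to 8 fractional bits
        b, c = frac >> 4, frac & 0xF
        return COARSE[a] * MEDIUM[b] * FINE[c]

    # Maclaurin-series logarithm with preprocessing: range-reduce
    # s = 2^k * m with m in [1, 2), then ln(s) = k*ln(2) + ln(1 + f)
    # with f = m - 1 and ln(1 + f) ~ f - f^2/2 + f^3/3 - f^4/4.
    LN2 = math.log(2.0)

    def ln_maclaurin(s: float, terms: int = 4) -> float:
        m, k = math.frexp(s)            # s = m * 2^k with m in [0.5, 1)
        m, k = 2.0 * m, k - 1           # renormalize so m is in [1, 2)
        f = m - 1.0
        series = sum((-1) ** (n + 1) * f ** n / n for n in range(1, terms + 1))
        return k * LN2 + series

    def softmax_hw(xs):
        """Division-free softmax: subtract the log-sum, then exponentiate."""
        log_sum = ln_maclaurin(sum(exp_lut(x) for x in xs))
        return [exp_lut(x - log_sum) for x in xs]

    if __name__ == "__main__":
        scores = [2.0, 1.0, 0.5, -1.25]
        denom = sum(math.exp(v) for v in scores)
        for a, e in zip(softmax_hw(scores), (math.exp(v) / denom for v in scores)):
            print(f"approx={a:.4f}  exact={e:.4f}")

Because the final division becomes a subtraction in the log domain, the denominator is computed once per input vector and no divider sits on the per-class critical path; accuracy is then set by the fraction width feeding the tables and the number of series terms, which is the accuracy/resource balance the abstract describes.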




    Published In

    GLSVLSI '19: Proceedings of the 2019 Great Lakes Symposium on VLSI
    May 2019
    562 pages
    ISBN: 9781450362528
    DOI: 10.1145/3299874
    Publication rights licensed to ACM. ACM acknowledges that this contribution was authored or co-authored by an employee, contractor or affiliate of a national government. As such, the Government retains a nonexclusive, royalty-free right to publish or reproduce this article, or to allow others to do so, for Government purposes only.


    Publisher

    Association for Computing Machinery

    New York, NY, United States


    Author Tags

    1. architecture
    2. dnn
    3. softmax
    4. vlsi

    Qualifiers

    • Research-article

    Conference

    GLSVLSI '19: Great Lakes Symposium on VLSI 2019
    May 9 - 11, 2019
    Tysons Corner, VA, USA

    Acceptance Rates

    Overall Acceptance Rate 312 of 1,156 submissions, 27%


