DOI: 10.1145/3299874.3317988
Efficient Softmax Hardware Architecture for Deep Neural Networks

Published: 13 May 2019

Abstract

Deep neural networks (DNNs) have become a pivotal machine learning and object recognition technology in the big data era. The softmax layer is one of the key layers for completing multi-class classification tasks. However, the softmax layer involves complex exponential and division operations, which lead to low accuracy and long critical paths in hardware accelerator designs. To address these issues, we present a softmax hardware architecture with adequate accuracy, a good trade-off between accuracy and resources, and strong extensibility. We summarize the classification rules of neural networks and balance calculation accuracy against resource consumption. On this basis, we propose an exponential calculation unit based on a grouped lookup table, an improved natural-logarithm calculation unit based on the Maclaurin series, and a matching data preprocessing scheme. Experimental results show that the proposed softmax hardware architecture achieves calculation accuracy to three decimal places and a classification accuracy of 99.01%. In theory, it can handle classification tasks with an arbitrary number of categories.
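
The abstract names the building blocks but not the datapath; a common way to combine them, and a reasonable reading of the design, is to make softmax division-free by rewriting softmax_i = exp(x_i) / sum_j exp(x_j) as exp(x_i - ln(sum_j exp(x_j))), evaluating exp() with small grouped lookup tables and ln() with a truncated Maclaurin series after range reduction. The Python below is a minimal behavioral sketch of that reading, not the authors' RTL; the function names (exp_lut, ln_maclaurin, softmax_hw), the 8-bit fraction split, the table ranges, and the four-term series are all illustrative assumptions.

    import math

    # Behavioral sketch of a division-free softmax datapath:
    #   softmax_i = exp(x_i - ln(sum_j exp(x_j)))
    # exp() comes from small grouped lookup tables, ln() from a truncated
    # Maclaurin series after range reduction. Widths and ranges are
    # illustrative assumptions, not taken from the paper.

    # Grouped-LUT exponential: split x = a + b/16 + c/256 into integer,
    # upper-fraction, and lower-fraction fields, so that
    # e^x = e^a * e^(b/16) * e^(c/256) and each table stays small
    # (32 + 16 + 16 entries instead of one 2^13-entry table).
    COARSE = {k: math.exp(k) for k in range(-16, 16)}    # integer part, assumed range
    MEDIUM = {k: math.exp(k / 16.0) for k in range(16)}  # upper 4 fraction bits
    FINE   = {k: math.exp(k / 256.0) for k in range(16)} # lower 4 fraction bits

    def exp_lut(x: float) -> float:
        a = math.floor(x)
        frac = int((x - a) * 256)       # truncate to 8 fractional bits
        b, c = frac >> 4, frac & 0xF
        return COARSE[a] * MEDIUM[b] * FINE[c]

    # Maclaurin-series logarithm with preprocessing: range-reduce
    # s = 2^k * m with m in [1, 2), then ln(s) = k*ln(2) + ln(1 + f)
    # with f = m - 1 and ln(1 + f) ~ f - f^2/2 + f^3/3 - f^4/4.
    LN2 = math.log(2.0)

    def ln_maclaurin(s: float, terms: int = 4) -> float:
        m, k = math.frexp(s)            # s = m * 2^k with m in [0.5, 1)
        m, k = 2.0 * m, k - 1           # renormalize so m is in [1, 2)
        f = m - 1.0
        series = sum((-1) ** (n + 1) * f ** n / n for n in range(1, terms + 1))
        return k * LN2 + series

    def softmax_hw(xs):
        """Division-free softmax: subtract the log-sum, then exponentiate."""
        log_sum = ln_maclaurin(sum(exp_lut(x) for x in xs))
        return [exp_lut(x - log_sum) for x in xs]

    if __name__ == "__main__":
        scores = [2.0, 1.0, 0.5, -1.25]
        denom = sum(math.exp(v) for v in scores)
        for a, e in zip(softmax_hw(scores), (math.exp(v) / denom for v in scores)):
            print(f"approx={a:.4f}  exact={e:.4f}")

Because the final division becomes a subtraction in the log domain, the denominator is computed once per input vector and no divider sits on the per-class critical path; accuracy is then set by the fraction width feeding the tables and the number of series terms, which is the accuracy/resource balance the abstract describes.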




    Published In

    GLSVLSI '19: Proceedings of the 2019 Great Lakes Symposium on VLSI
    May 2019
    562 pages
    ISBN: 9781450362528
    DOI: 10.1145/3299874
    Publication rights licensed to ACM. ACM acknowledges that this contribution was authored or co-authored by an employee, contractor or affiliate of a national government. As such, the Government retains a nonexclusive, royalty-free right to publish or reproduce this article, or to allow others to do so, for Government purposes only.


    Publisher

    Association for Computing Machinery

    New York, NY, United States


    Author Tags

    1. architecture
    2. dnn
    3. softmax
    4. vlsi

    Qualifiers

    • Research-article

    Conference

    GLSVLSI '19: Great Lakes Symposium on VLSI 2019
    May 9 - 11, 2019
    Tysons Corner, VA, USA

    Acceptance Rates

    Overall Acceptance Rate 312 of 1,156 submissions, 27%


