STADIA: Photonic Stochastic Gradient Descent for Neural Network Accelerators

Published: 09 September 2023

Abstract

Deep Neural Networks (DNNs) have demonstrated great success in many fields such as image recognition and text analysis. However, the ever-increasing sizes of both DNN models and training datasets make deep learning extremely computation- and memory-intensive. Recently, photonic computing has emerged as a promising technology for accelerating DNNs. While photonic accelerators for DNN inference and for the forward propagation phase of DNN training have been widely investigated, architectural acceleration of the equally important backpropagation phase of DNN training has not been well studied. In this paper, we propose a novel silicon photonic backpropagation accelerator for high-performance DNN training. Specifically, we design a general-purpose photonic gradient descent unit named STADIA that implements the multiplication, accumulation, and subtraction operations required for computing gradients using mature optical devices, including the Mach-Zehnder Interferometer (MZI) and the Microring Resonator (MRR), which significantly reduces training latency and improves the energy efficiency of backpropagation. To enable efficient parallel computing, we propose a STADIA-based backpropagation acceleration architecture and design a dataflow based on wavelength-division multiplexing (WDM). We analyze the precision of STADIA by quantifying the limitations imposed by optical losses and noise. Furthermore, we evaluate STADIA at different element sizes by analyzing the power, area, and time delay of photonic accelerators on DNN models such as AlexNet, VGG19, and ResNet. Simulation results show that the proposed STADIA architecture achieves improvements of 9.7× in time efficiency and 147.2× in energy efficiency over the most advanced optical-memristor-based backpropagation accelerator.
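The multiply, accumulate, and subtract operations that the abstract attributes to STADIA correspond to the standard stochastic gradient descent weight update, W ← W − η·∇L. The NumPy sketch below (with the hypothetical helper `sgd_update`) only illustrates that arithmetic for a single fully connected layer; it is not a model of the MZI/MRR hardware or the WDM dataflow described in the paper.

```python
import numpy as np

def sgd_update(W, x, error, lr=0.01):
    """Illustrative SGD weight update for one fully connected layer.

    W     : (out, in) weight matrix
    x     : (in,)     layer input from the forward pass
    error : (out,)    backpropagated error for this layer
    lr    : learning rate

    The gradient is the outer product error * x^T (multiplication and
    accumulation), and the update itself is a subtraction:
    W <- W - lr * grad. These are the three operations a photonic
    gradient descent unit has to realize.
    """
    grad = np.outer(error, x)   # multiply + accumulate
    return W - lr * grad        # subtract

# Tiny usage example with random data.
rng = np.random.default_rng(0)
W = rng.normal(size=(4, 3))
x = rng.normal(size=3)
error = rng.normal(size=4)
W_new = sgd_update(W, x, error, lr=0.05)
```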


Published In

ACM Transactions on Embedded Computing Systems, Volume 22, Issue 5s
Special Issue ESWEEK 2023
October 2023
1394 pages
ISSN:1539-9087
EISSN:1558-3465
DOI:10.1145/3614235
  • Editor: Tulika Mitra

Publisher

Association for Computing Machinery

New York, NY, United States


Publication History

Published: 09 September 2023
Accepted: 01 July 2023
Revised: 02 June 2023
Received: 23 March 2023
Published in TECS Volume 22, Issue 5s


Author Tags

  1. Stochastic gradient descent
  2. neural networks accelerator
  3. optical computing

Qualifiers

  • Research-article
