Open access

A domain-specific architecture for deep neural networks

Published: 22 August 2018

Abstract

Tensor processing units improve performance per watt of neural networks in Google datacenters by roughly 50x.




Published In

Communications of the ACM, Volume 61, Issue 9
September 2018
94 pages
ISSN:0001-0782
EISSN:1557-7317
DOI:10.1145/3271489
Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 22 August 2018
Published in CACM Volume 61, Issue 9


Qualifiers

  • Research-article
  • Popular
  • Refereed

