Open access

A domain-specific architecture for deep neural networks

Published: 22 August 2018

Abstract

Tensor processing units improve performance per watt of neural networks in Google datacenters by roughly 50x.




Published In

Communications of the ACM, Volume 61, Issue 9
September 2018
94 pages
ISSN:0001-0782
EISSN:1557-7317
DOI:10.1145/3271489
Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 22 August 2018
Published in CACM Volume 61, Issue 9


Qualifiers

  • Research-article
  • Popular
  • Refereed

