DOI: 10.1109/ISCA45697.2020.00045

Abstract

Machine-learning (ML) hardware and software system demand is burgeoning. Driven by ML applications, the number of different ML inference systems has exploded. Over 100 organizations are building ML inference chips, and the systems that incorporate existing models span at least three orders of magnitude in power consumption and five orders of magnitude in performance; they range from embedded devices to data-center solutions. Fueling the hardware are a dozen or more software frameworks and libraries. The myriad combinations of ML hardware and ML software make assessing ML-system performance in an architecture-neutral, representative, and reproducible manner challenging. There is a clear need for industry-wide standard ML benchmarking and evaluation criteria. MLPerf Inference answers that call. In this paper, we present our benchmarking method for evaluating ML inference systems. Driven by more than 30 organizations as well as more than 200 ML engineers and practitioners, MLPerf prescribes a set of rules and best practices to ensure comparability across systems with wildly differing architectures. The first call for submissions garnered more than 600 reproducible inference-performance measurements from 14 organizations, representing over 30 systems that showcase a wide range of capabilities. The submissions attest to the benchmark's flexibility and adaptability.
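To give a concrete sense of what an inference-performance measurement involves, the following is a purely illustrative sketch and not the MLPerf harness or its API: the function names, the simulated model, and the 90th-percentile summary are assumptions chosen for illustration. It shows a single-stream-style measurement that issues queries to a system under test one at a time and reports a tail-latency percentile.

    import random
    import time


    def run_inference(sample):
        """Stand-in for a system under test (hypothetical); a real measurement
        would invoke a trained model (e.g., an image classifier) here."""
        time.sleep(random.uniform(0.001, 0.003))  # simulate variable per-query latency
        return sample


    def single_stream_latency(samples, percentile=90):
        """Issue queries back to back, one at a time, and report the given
        tail-latency percentile in seconds."""
        latencies = []
        for sample in samples:
            start = time.perf_counter()
            run_inference(sample)
            latencies.append(time.perf_counter() - start)
        latencies.sort()
        index = min(len(latencies) - 1, int(len(latencies) * percentile / 100))
        return latencies[index]


    if __name__ == "__main__":
        queries = list(range(200))  # dummy query set
        p90 = single_stream_latency(queries)
        print(f"90th-percentile latency: {p90 * 1000:.2f} ms over {len(queries)} queries")

A real submission additionally fixes the model, dataset, target quality, and run rules so that results from wildly different hardware and software stacks remain comparable.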



Published In

ISCA '20: Proceedings of the ACM/IEEE 47th Annual International Symposium on Computer Architecture
May 2020
1152 pages
ISBN: 9781728146614

In-Cooperation

  • IEEE

Publisher

IEEE Press

Publication History

Published: 23 September 2020


Author Tags

  1. benchmarking
  2. inference
  3. machine learning

Qualifiers

  • Research-article

Conference

ISCA '20

Acceptance Rates

Overall acceptance rate: 543 of 3,203 submissions, 17%

