DOI: 10.1109/ISCA45697.2020.00045

Abstract

Machine-learning (ML) hardware and software system demand is burgeoning. Driven by ML applications, the number of different ML inference systems has exploded. Over 100 organizations are building ML inference chips, and the systems that incorporate existing models span at least three orders of magnitude in power consumption and five orders of magnitude in performance; they range from embedded devices to data-center solutions. Fueling the hardware are a dozen or more software frameworks and libraries. The myriad combinations of ML hardware and ML software make assessing ML-system performance in an architecture-neutral, representative, and reproducible manner challenging. There is a clear need for industry-wide standard ML benchmarking and evaluation criteria. MLPerf Inference answers that call. In this paper, we present our benchmarking method for evaluating ML inference systems. Driven by more than 30 organizations as well as more than 200 ML engineers and practitioners, MLPerf prescribes a set of rules and best practices to ensure comparability across systems with wildly differing architectures. The first call for submissions garnered more than 600 reproducible inference-performance measurements from 14 organizations, representing over 30 systems that showcase a wide range of capabilities. The submissions attest to the benchmark's flexibility and adaptability.
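To give a concrete sense of what an inference-performance measurement involves, the following is a purely illustrative sketch and not the MLPerf harness or its API: the function names, the simulated model, and the 90th-percentile summary are assumptions chosen for illustration. It shows a single-stream-style measurement that issues queries to a system under test one at a time and reports a tail-latency percentile.

    import random
    import time


    def run_inference(sample):
        """Stand-in for a system under test (hypothetical); a real measurement
        would invoke a trained model (e.g., an image classifier) here."""
        time.sleep(random.uniform(0.001, 0.003))  # simulate variable per-query latency
        return sample


    def single_stream_latency(samples, percentile=90):
        """Issue queries back to back, one at a time, and report the given
        tail-latency percentile in seconds."""
        latencies = []
        for sample in samples:
            start = time.perf_counter()
            run_inference(sample)
            latencies.append(time.perf_counter() - start)
        latencies.sort()
        index = min(len(latencies) - 1, int(len(latencies) * percentile / 100))
        return latencies[index]


    if __name__ == "__main__":
        queries = list(range(200))  # dummy query set
        p90 = single_stream_latency(queries)
        print(f"90th-percentile latency: {p90 * 1000:.2f} ms over {len(queries)} queries")

A real submission additionally fixes the model, dataset, target quality, and run rules so that results from wildly different hardware and software stacks remain comparable.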



Published In

ISCA '20: Proceedings of the ACM/IEEE 47th Annual International Symposium on Computer Architecture
May 2020
1152 pages
ISBN: 9781728146614

In-Cooperation

  • IEEE

Publisher

IEEE Press

Publication History

Published: 23 September 2020


Author Tags

  1. benchmarking
  2. inference
  3. machine learning

Qualifiers

  • Research-article

Conference

ISCA '20

Acceptance Rates

Overall acceptance rate: 543 of 3,203 submissions, 17%

