Performance prediction of deep learning applications training in GPU as a service systems

Published in: Cluster Computing

Abstract

Data analysts predict that the GPU as a service (GPUaaS) market will grow to support 3D modeling, animated video processing, gaming, and deep learning model training. The main cloud providers already offer, in their catalogs, VMs equipped with GPUs of different types and in different numbers. Because these VMs differ significantly in both performance and cost, selecting the most appropriate one for a given job is essential to minimize the training cost. Motivated by these considerations, this paper proposes performance models to predict the training time of neural networks (NNs) deployed on GPUs. The proposed approach is based on machine learning and exploits two main sets of features, capturing both NN properties and hardware characteristics. These data enable the learning of multiple linear regression models that, coupled with an established feature selection technique, become accurate prediction tools, with errors below 12% on average. An extensive experimental campaign, performed on both public and in-house private cloud deployments, considers popular deep NNs used for image classification and speech transcription. The results show that prediction errors remain small even when extrapolating outside the range spanned by the input data, with important implications for the models' applicability.
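As a rough illustration of the kind of predictor described in the abstract, the following minimal sketch (not the authors' code) trains a multiple linear regression model coupled with a feature selection step. It assumes scikit-learn is available; the feature names (batch size, number of GPUs, GPU peak GFLOPS, network depth), the file gpu_training_profiles.csv, and the use of sequential feature selection are hypothetical stand-ins for the features and the established selection technique mentioned in the abstract.

# Illustrative sketch of a regression-based training-time predictor.
# Assumptions: scikit-learn installed; feature names and CSV layout are hypothetical.
import pandas as pd
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_absolute_percentage_error

# Each row: one profiled training run; the target is the time per iteration (seconds).
data = pd.read_csv("gpu_training_profiles.csv")  # hypothetical profiling file
features = ["batch_size", "n_gpus", "gpu_gflops", "network_depth"]
X, y = data[features], data["time_per_iteration"]

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

# Feature selection coupled with multiple linear regression.
selector = SequentialFeatureSelector(LinearRegression(), n_features_to_select=3)
selector.fit(X_train, y_train)

model = LinearRegression().fit(selector.transform(X_train), y_train)
y_pred = model.predict(selector.transform(X_test))
print("MAPE:", mean_absolute_percentage_error(y_test, y_pred))

In this sketch the percentage error on the held-out runs plays the role of the average prediction error the abstract reports; the actual feature sets and selection procedure used in the paper are described in its methodology sections.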


Data availability

The datasets generated and analysed during the current study and the source code supporting the experiments are available in the Zenodo repository at https://doi.org/10.5281/zenodo.5327342.

Notes

  1. The experiment data and the source code of our scripts for model training and testing are available at https://doi.org/10.5281/zenodo.5327342.

  2. An implementation of ResNet with a variable number of inner modules is available at https://github.com/KellerJordan/ResNet-PyTorch-CIFAR10.


Acknowledgements

The results of this work have been partially funded by ATMOSPHERE (grant agreement no. 777154), a Research and Innovation Action funded by the European Commission under the Cooperation Programme, Horizon 2020 and the Ministério de Ciência, Tecnologia e Inovação, RNP/Brazil. Danilo Ardagna’s work is also supported by the AI-SPRINT (grant agreement no. 101016577) H2020 project. GPU cloud experiments have been supported by Microsoft under the Top Compsci University Azure Adoption program.

Author information

Corresponding author

Correspondence to Danilo Ardagna.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendices

Appendix 1: Number of samples in training data used in performance prediction models

This appendix reports the number of samples used to train each of the models presented in Sect. 5. Table 10 reports the data for the hold-out scenario, while the remaining tables detail the size of the training datasets for the extrapolation experiments. For example, Table 11 shows that the training sets for the batch size extrapolation experiments on AlexNet implemented in PyTorch with 1 and 2 P600 GPUs contain 72 and 204 samples, respectively (Tables 12, 13, 14).

Table 10 Number of samples in training data used in interpolation models
Table 11 Number of samples in training data used in batch size extrapolation models
Table 12 Number of samples in training data used in GPU number extrapolation models
Table 13 Number of samples in training data used in computational power extrapolation models
Table 14 Number of samples in training data used in network depth extrapolation models

Appendix 2: Root mean square error (RMSE) of performance prediction models

This appendix presents the RMSE of the estimated training time per iteration for the models presented in Sect. 5 (Tables 15, 16, 17, 18, 19).
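For n test samples with measured per-iteration times $y_i$ and predicted times $\hat{y}_i$, the RMSE reported below follows the standard definition:

$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2}$$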

Table 15 RMSE of interpolation models
Table 16 RMSE of batch size extrapolation models
Table 17 RMSE of GPU number extrapolation models
Table 18 RMSE of computational power extrapolation models
Table 19 RMSE of network depth extrapolation models

Appendix 3: Mean absolute error (MAE) of performance prediction models

This appendix presents the MAE of the estimated training time per iteration for the models presented in Sect. 5 (Tables 20, 21, 22, 23, 24).
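With the same notation as in Appendix 2, the MAE reported below follows the standard definition:

$$\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}\left|y_i - \hat{y}_i\right|$$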

Table 20 MAE of interpolation models
Table 21 MAE of batch size extrapolation models
Table 22 MAE of GPU number extrapolation models
Table 23 MAE of computational power extrapolation models
Table 24 MAE of network depth extrapolation models

About this article

Cite this article

Lattuada, M., Gianniti, E., Ardagna, D. et al. Performance prediction of deep learning applications training in GPU as a service systems. Cluster Comput 25, 1279–1302 (2022). https://doi.org/10.1007/s10586-021-03428-8
