DOI: 10.1145/3131365.3131372

Complexity vs. performance: empirical analysis of machine learning as a service

Published: 01 November 2017

Abstract

    Machine learning classifiers are basic research tools used in numerous types of network analysis and modeling. To reduce the need for domain expertise and costs of running local ML classifiers, network researchers can instead rely on centralized Machine Learning as a Service (MLaaS) platforms.
    In this paper, we evaluate the effectiveness of MLaaS systems ranging from fully-automated, turnkey systems to fully-customizable systems, and find that with more user control comes greater risk. Good decisions produce even higher performance, and poor decisions result in harsher performance penalties. We also find that server-side optimizations help fully-automated systems outperform the default settings of competitors, but these systems still lag far behind well-tuned MLaaS systems, which compare favorably to standalone ML libraries. Finally, we find that classifier choice is the dominating factor in determining model performance, and that users can approximate the performance of an optimal classifier choice by experimenting with a small subset of random classifiers. While network researchers should approach MLaaS systems with caution, they can achieve results comparable to standalone classifiers if they have sufficient insight into key decisions such as classifier choice and feature selection.
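
    The last finding, that trying a small random subset of classifiers can approximate the optimal choice, can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the classifier names, the validation scores, and the helper functions `expected_best_of_k` and `try_random_subset` are hypothetical and do not come from the paper or from any MLaaS platform.

```python
import random
from itertools import combinations

# HYPOTHETICAL validation scores, for illustration only; not results from the paper.
scores = {
    "logistic_regression": 0.81,
    "decision_tree": 0.78,
    "random_forest": 0.88,
    "boosted_trees": 0.90,
    "knn": 0.74,
    "naive_bayes": 0.69,
    "svm": 0.85,
    "mlp": 0.86,
}


def expected_best_of_k(scores, k):
    """Exact mean of the best score over all size-k subsets of classifiers."""
    subsets = list(combinations(scores.values(), k))
    return sum(max(subset) for subset in subsets) / len(subsets)


def try_random_subset(scores, k, rng):
    """Pick k classifiers uniformly at random; return the best (name, score)."""
    tried = rng.sample(sorted(scores), k)
    best = max(tried, key=scores.get)
    return best, scores[best]


optimal = max(scores.values())

# Expected gap to the optimal classifier when only k random classifiers are tried.
gaps = {
    k: optimal - expected_best_of_k(scores, k)
    for k in range(1, len(scores) + 1)
}

if __name__ == "__main__":
    for k, gap in gaps.items():
        print(f"k={k}: expected gap to optimal = {gap:.3f}")
    print(try_random_subset(scores, 3, random.Random(0)))
```

    Under these made-up scores the expected gap shrinks as k grows and reaches zero once every classifier has been tried. How quickly it shrinks in practice depends on how the scores are distributed for a given dataset, which is the empirical question the paper studies.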


      Published In

      IMC '17: Proceedings of the 2017 Internet Measurement Conference
      November 2017
      509 pages
      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org



      • USENIX Assoc


      Association for Computing Machinery

      New York, NY, United States

      Author Tags

      1. cloud computing
      2. machine learning


      • Research-article


      IMC '17: Internet Measurement Conference
      November 1 - 3, 2017
      London, United Kingdom

      Acceptance Rates

      Overall Acceptance Rate 277 of 1,083 submissions, 26%

