research-article

The Cosmos big data platform at Microsoft: over a decade of progress and a decade to look forward

Published: 01 July 2021

Abstract

The twenty-first century has been dominated by the need for large-scale data processing, marking the birth of big data platforms such as Cosmos. This paper describes the evolution of the exabyte-scale Cosmos big data platform at Microsoft: our journey from scale and reliability to efficiency and usability, and our next steps towards improving security, compliance, and support for heterogeneous analytics scenarios. We discuss how the evolution of Cosmos parallels the evolution of the big data field, and how the changes in Cosmos workloads over time parallel the changing requirements of users across industry.

Published In

Proceedings of the VLDB Endowment, Volume 14, Issue 12
July 2021
587 pages
ISSN: 2150-8097

Publisher

VLDB Endowment

Cited By

  • (2024) Proactive Resume and Pause of Resources for Microsoft Azure SQL Database Serverless. Companion of the 2024 International Conference on Management of Data, 227-240. DOI: 10.1145/3626246.3653371. Online publication date: 9-Jun-2024
  • (2024) An Empirical Study on Low GPU Utilization of Deep Learning Jobs. Proceedings of the IEEE/ACM 46th International Conference on Software Engineering, 1-13. DOI: 10.1145/3597503.3639232. Online publication date: 20-May-2024
  • (2023) Runtime Variation in Big Data Analytics. Proceedings of the ACM on Management of Data 1:1, 1-20. DOI: 10.1145/3588921. Online publication date: 30-May-2023
  • (2022) Deploying a Steered Query Optimizer in Production at Microsoft. Proceedings of the 2022 International Conference on Management of Data, 2299-2311. DOI: 10.1145/3514221.3526052. Online publication date: 10-Jun-2022
