Transforming ML Predictive Pipelines into SQL with MASQ

Published: 18 June 2021

Abstract

Inference of Machine Learning (ML) models, i.e., the process of obtaining predictions from trained models, is often overlooked. Model inference is, however, one of the main contributors to both technical debt and infrastructure complexity in ML applications. MASQ is a framework that runs inference of ML models directly on DBMSs. MASQ not only avoids expensive data movement in predictive scenarios where the data resides in a database, but also naturally exploits the "enterprise-grade" features, such as governance, security, and auditability, that make DBMSs the cornerstone of many businesses. MASQ compiles trained models and ML pipelines implemented in scikit-learn directly into standard SQL: neither UDFs nor vendor-specific syntax is used, so the queries can be readily executed on any DBMS. In this demo, we will showcase MASQ's capabilities through a GUI that allows attendees to: (1) train ML pipelines composed of data featurizers and ML models; (2) compile the trained pipelines into SQL and deploy them on different DBMSs (MySQL and SQL Server in the demo); and (3) compare performance under different configurations (e.g., the original pipeline on the ML framework against the SQL implementations).
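
To make the idea of compiling a pipeline into plain SQL concrete, here is a minimal sketch. It is not MASQ's compiler or API. It assumes a two-step pipeline (standard scaling followed by a linear model) whose parameters are hard-coded where a real workflow would read them from fitted scikit-learn objects. The table name, column names, parameter values, and the helpers `pipeline_to_sql` and `predict_in_python` are all hypothetical. SQLite is used only because it ships with Python; the generated query uses nothing beyond arithmetic in a SELECT.

```python
import sqlite3

# Hypothetical parameters, standing in for the values one would read from a
# fitted scikit-learn StandardScaler (mean_, scale_) and LinearRegression
# (coef_, intercept_). They are illustrative, not taken from the paper.
FEATURES = ["distance", "dep_hour"]
MEAN = [800.0, 12.0]
SCALE = [400.0, 6.0]
COEF = [5.0, 3.0]
INTERCEPT = 10.0


def pipeline_to_sql(table):
    """Render scaler + linear model as one SELECT that uses only arithmetic."""
    terms = [
        f"{w!r} * (({col} - {mu!r}) / {sd!r})"
        for col, mu, sd, w in zip(FEATURES, MEAN, SCALE, COEF)
    ]
    expr = " + ".join([repr(INTERCEPT)] + terms)
    return f"SELECT id, {expr} AS prediction FROM {table} ORDER BY id"


def predict_in_python(values):
    """Reference implementation of the same pipeline outside the database."""
    return INTERCEPT + sum(
        w * ((x - mu) / sd)
        for x, mu, sd, w in zip(values, MEAN, SCALE, COEF)
    )


# Load two made-up rows into an in-memory database.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE flights (id INTEGER PRIMARY KEY, distance REAL, dep_hour REAL)"
)
rows = [(1, 1200.0, 18.0), (2, 400.0, 6.0)]
conn.executemany("INSERT INTO flights VALUES (?, ?, ?)", rows)

# Run inference inside the database and, for comparison, in Python.
sql = pipeline_to_sql("flights")
sql_preds = [pred for _, pred in conn.execute(sql)]
py_preds = [predict_in_python(row[1:]) for row in rows]

print(sql)
print(sql_preds, py_preds)
```

Both prediction lists should agree, which is the property that comparison (3) in the demo relies on: the SQL query and the original pipeline compute the same function, so only their performance differs.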

Supplementary Material

MP4 File (3448016.3452771.mp4)

Published In

SIGMOD '21: Proceedings of the 2021 International Conference on Management of Data
June 2021
2969 pages
ISBN: 9781450383431
DOI: 10.1145/3448016

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. in-rdbms ml prediction
  2. machine learning
  3. relational databases

Qualifiers

  • Short-paper

Conference

SIGMOD/PODS '21

Acceptance Rates

Overall Acceptance Rate 785 of 4,003 submissions, 20%

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (last 12 months): 71
  • Downloads (last 6 weeks): 1
Reflects downloads up to 11 Sep 2024

Citations

Cited By

  • (2024) nsDB: Architecting the Next Generation Database by Integrating Neural and Symbolic Systems. Proceedings of the VLDB Endowment 17:11 (3283-3289). DOI: 10.14778/3681954.3682000. Online publication date: 30-Aug-2024
  • (2024) Pushing ML Predictions into DBMSs (Extended Abstract). 2024 IEEE 40th International Conference on Data Engineering (ICDE) (5725-5726). DOI: 10.1109/ICDE60146.2024.00494. Online publication date: 13-May-2024
  • (2024) TensorTable: Extending PyTorch for mixed relational and linear algebra pipelines. BenchCouncil Transactions on Benchmarks, Standards and Evaluations 4:1 (100161). DOI: 10.1016/j.tbench.2024.100161. Online publication date: Mar-2024
  • (2023) Pushing ML Predictions Into DBMSs. IEEE Transactions on Knowledge and Data Engineering 35:10 (10295-10308). DOI: 10.1109/TKDE.2023.3269592. Online publication date: 1-Oct-2023
