DOI: 10.1145/3183713.3183751
research-article

Accelerating Machine Learning Inference with Probabilistic Predicates

Published: 27 May 2018

Abstract

Classic query optimization techniques, including predicate pushdown, are of limited use for machine learning inference queries, because the user-defined functions (UDFs) which extract relational columns from unstructured inputs are often very expensive; query predicates will remain stuck behind these UDFs if they happen to require relational columns that are generated by the UDFs. In this work, we demonstrate constructing and applying probabilistic predicates to filter data blobs that do not satisfy the query predicate; such filtering is parametrized to different target accuracies. Furthermore, to support complex predicates and to avoid per-query training, we augment a cost-based query optimizer to choose plans with appropriate combinations of simpler probabilistic predicates. Experiments with several machine learning workloads on a big-data cluster show that query processing improves by as much as 10x.
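
The filtering idea described above can be illustrated with a minimal sketch. It is not the paper's implementation: the threshold rule, the toy data, and every name used (`pick_threshold`, `filter_blobs`, `cheap_score`) are assumptions made for illustration. A cheap score is computed for each blob, a threshold is chosen on labeled validation data so that at least a target fraction of truly matching blobs is kept, and only blobs at or above that threshold are passed on to the expensive UDF.

```python
import math


def pick_threshold(scores, labels, target_accuracy):
    """Return the largest threshold that keeps at least `target_accuracy`
    of the positive validation examples (a blob is kept if score >= threshold).

    Hypothetical helper: the selection rule is an illustrative assumption.
    """
    positives = sorted((s for s, y in zip(scores, labels) if y), reverse=True)
    if not positives:
        raise ValueError("need at least one positive validation example")
    # Number of positive examples that must survive the filter.
    keep = max(1, math.ceil(target_accuracy * len(positives)))
    return positives[keep - 1]


def filter_blobs(blobs, cheap_score, threshold):
    """Drop blobs whose cheap score falls below the threshold, so the
    expensive UDF only has to run on the survivors."""
    return [b for b in blobs if cheap_score(b) >= threshold]


# Toy validation data (made up): cheap scores and ground-truth predicate labels.
val_scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.2, 0.1, 0.05]
val_labels = [True, True, True, True, False, False, False, False]

# A stricter accuracy target forces a lower threshold, so fewer blobs are dropped.
strict = pick_threshold(val_scores, val_labels, target_accuracy=1.0)
relaxed = pick_threshold(val_scores, val_labels, target_accuracy=0.75)

blobs = [{"id": i, "score": s} for i, s in enumerate(val_scores)]
cheap = lambda b: b["score"]

kept_strict = filter_blobs(blobs, cheap, strict)
kept_relaxed = filter_blobs(blobs, cheap, relaxed)
print(len(kept_strict), len(kept_relaxed))
```

In this toy setup, lowering the accuracy target raises the threshold, so more blobs are discarded before the expensive UDF runs. That is the accuracy-for-speed trade-off the abstract refers to when it says the filtering is parametrized to different target accuracies.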

    Published In

    SIGMOD '18: Proceedings of the 2018 International Conference on Management of Data
    May 2018
    1874 pages
    ISBN: 9781450347037
    DOI: 10.1145/3183713
    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. image analysis
    2. inference
    3. machine learning
    4. model cascades
    5. probabilistic predicates
    6. query processing
    7. user-defined functions
    8. video analysis

    Qualifiers

    • Research-article

    Conference

    SIGMOD/PODS '18

    Acceptance Rates

    SIGMOD '18 Paper Acceptance Rate 90 of 461 submissions, 20%;
    Overall Acceptance Rate 785 of 4,003 submissions, 20%

    Cited By

    • (2024) Optimizing Video Queries with Declarative Clues. Proceedings of the VLDB Endowment 17:11 (3256-3268). DOI: 10.14778/3681954.3681998. Online publication date: 1-Jul-2024
    • (2024) Optimizing Video Selection LIMIT Queries with Commonsense Knowledge. Proceedings of the VLDB Endowment 17:7 (1751-1764). DOI: 10.14778/3654621.3654639. Online publication date: 1-Mar-2024
    • (2024) SketchQL: Video Moment Querying with a Visual Query Interface. Proceedings of the ACM on Management of Data 2:4 (1-27). DOI: 10.1145/3677140. Online publication date: 30-Sep-2024
    • (2024) GaussML: An End-to-End In-Database Machine Learning System. 2024 IEEE 40th International Conference on Data Engineering (ICDE) (5198-5210). DOI: 10.1109/ICDE60146.2024.00391. Online publication date: 13-May-2024
    • (2023) EQUI-VOCAL Demonstration: Synthesizing Video Queries from User Interactions. Proceedings of the VLDB Endowment 16:12 (3978-3981). DOI: 10.14778/3611540.3611600. Online publication date: 1-Aug-2023
    • (2023) DeepVQL: Deep Video Queries on PostgreSQL. Proceedings of the VLDB Endowment 16:12 (3910-3913). DOI: 10.14778/3611540.3611583. Online publication date: 1-Aug-2023
    • (2023) PAINE Demo: Optimizing Video Selection Queries with Commonsense Knowledge. Proceedings of the VLDB Endowment 16:12 (3902-3905). DOI: 10.14778/3611540.3611581. Online publication date: 1-Aug-2023
    • (2023) EQUI-VOCAL: Synthesizing Queries for Compositional Video Events from Limited User Interactions. Proceedings of the VLDB Endowment 16:11 (2714-2727). DOI: 10.14778/3611479.3611482. Online publication date: 1-Jul-2023
    • (2023) Scaling a Declarative Cluster Manager Architecture with Query Optimization Techniques. Proceedings of the VLDB Endowment 16:10 (2618-2631). DOI: 10.14778/3603581.3603599. Online publication date: 8-Aug-2023
    • (2023) Extract-Transform-Load for Video Streams. Proceedings of the VLDB Endowment 16:9 (2302-2315). DOI: 10.14778/3598581.3598600. Online publication date: 1-May-2023
