DOI: 10.1145/3068943.3068947
Research article, public access

Performance Prediction for Graph Queries

Published: 14 May 2017

Abstract

Query performance prediction has shown benefits to query optimization and resource allocation for relational databases. Emerging applications are leading to search scenarios where workloads with heterogeneous, structure-less analytical queries are processed over large-scale graph and network data. This calls for effective models to predict the performance of graph analytical queries, which are often more involved than their relational counterparts.
In this paper, we study and evaluate predictive techniques for graph query performance prediction. We make several contributions. (1) We propose a general learning framework that makes use of practical and computationally efficient statistics from query scenarios and employs regression models. (2) We instantiate the framework with two routinely issued query classes, namely, reachability and graph pattern matching, that exhibit different query complexity. We develop modeling and learning algorithms for both query classes. (3) We show that our prediction models readily apply to resource-bounded querying, by providing a learning-based workload optimization strategy. Given a query workload and a time bound, the models select queries to be processed with a maximized query profit and a total cost within the bound. Using real-world graphs, we experimentally demonstrate the efficacy of our framework in terms of accuracy and the effectiveness of workload optimization.
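
The resource-bounded selection in contribution (3) can be illustrated with a minimal sketch. It assumes each query already has a predicted cost (here an integer number of milliseconds, as if produced by a trained regression model) and a profit, and it treats selection as a standard 0/1 knapsack problem solved by dynamic programming. The abstract does not specify the paper's own selection algorithm, so this is a stand-in rather than the authors' method; the function name `select_queries`, the query ids, and all numbers are hypothetical.

```python
from typing import List, Tuple

# Hypothetical workload entry: (query_id, predicted_cost_ms, profit).
# predicted_cost_ms stands in for the output of a learned cost model;
# the values used below are invented for illustration only.
Query = Tuple[str, int, float]


def select_queries(workload: List[Query],
                   time_bound_ms: int) -> Tuple[List[str], float, int]:
    """Choose queries with maximum total profit within the time bound.

    Classic 0/1 knapsack dynamic program over non-negative integer costs.
    Returns (selected ids, total profit, total predicted cost).
    """
    # best[t] holds the best (profit, ids, cost) using total cost <= t.
    best = [(0.0, [], 0) for _ in range(time_bound_ms + 1)]
    for qid, cost, profit in workload:
        if cost > time_bound_ms:
            continue  # this query alone exceeds the bound
        # Walk budgets downwards so each query is selected at most once.
        for t in range(time_bound_ms, cost - 1, -1):
            prev_profit, prev_ids, prev_cost = best[t - cost]
            candidate = prev_profit + profit
            if candidate > best[t][0]:
                best[t] = (candidate, prev_ids + [qid], prev_cost + cost)
    total_profit, ids, total_cost = best[time_bound_ms]
    return ids, total_profit, total_cost


if __name__ == "__main__":
    workload = [("q1", 40, 3.0), ("q2", 30, 4.0),
                ("q3", 50, 5.0), ("q4", 20, 2.0)]
    ids, total_profit, total_cost = select_queries(workload, time_bound_ms=100)
    print(ids, total_profit, total_cost)
```

The table has one entry per millisecond of budget, so memory grows with the time bound. For large bounds, costs could be bucketed into coarser units, at the price of a less exact fit to the bound.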

Published In

NDA'17: Proceedings of the 2nd International Workshop on Network Data Analytics
May 2017
46 pages
ISBN: 9781450349901
DOI: 10.1145/3068943
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

SIGMOD/PODS'17
Acceptance Rates

Overall Acceptance Rate 4 of 8 submissions, 50%

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (last 12 months): 81
  • Downloads (last 6 weeks): 12

Reflects downloads up to 08 Feb 2025

Citations

Cited By

  • (2024) Cardinality estimation for property graph queries with gated learning approach on the graph database. Multimedia Tools and Applications. DOI: 10.1007/s11042-024-19215-7. Online publication date: 7 May 2024.
  • (2022) Execution Time Prediction for Cypher Queries in the Neo4j Database Using a Learning Approach. Symmetry, 14:1 (55). DOI: 10.3390/sym14010055. Online publication date: 1 Jan 2022.
  • (2018) Network Similarity Prediction in Time-Evolving Graphs: A Machine Learning Approach. 2018 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW), pages 1184-1193. DOI: 10.1109/IPDPSW.2018.00183. Online publication date: May 2018.
  • (2018) Multi-metric Graph Query Performance Prediction. Database Systems for Advanced Applications, pages 289-306. DOI: 10.1007/978-3-319-91452-7_19. Online publication date: 13 May 2018.
  • (2017) Event pattern discovery by keywords in graph streams. 2017 IEEE International Conference on Big Data (Big Data), pages 982-987. DOI: 10.1109/BigData.2017.8258019. Online publication date: Dec 2017.
