Threshold Queries

Published: 08 June 2023

Abstract

Threshold queries are an important class of queries that only require computing or counting answers up to a specified threshold value. To the best of our knowledge, threshold queries have been largely disregarded in the research literature, which is surprising considering how common they are in practice. We explore how such queries appear in practice and present a method that can be used to significantly improve the asymptotic bounds of their state-of-the-art evaluation algorithms. Our experimental evaluation of these methods shows order-of-magnitude performance improvements.
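To make the notion concrete, the following is a minimal sketch of what "counting answers up to a threshold" can mean, assuming a simple two-relation join. It is a naive illustration of the semantics only, not the evaluation method the article presents; the names `join_answers`, `count_up_to`, and `has_at_least`, and the relations R and S, are hypothetical and introduced here for illustration.

```python
from itertools import islice
from typing import Dict, Iterable, Iterator, List, Tuple


def join_answers(r: Iterable[Tuple[int, int]],
                 s: Iterable[Tuple[int, int]]) -> Iterator[Tuple[int, int, int]]:
    """Lazily yield answers (a, b, c) of the join R(a, b), S(b, c)."""
    # Index S by its join attribute so each R-tuple finds its partners quickly.
    index: Dict[int, List[int]] = {}
    for b, c in s:
        index.setdefault(b, []).append(c)
    for a, b in r:
        for c in index.get(b, []):
            yield (a, b, c)


def count_up_to(answers: Iterable, threshold: int) -> int:
    """Return min(number of answers, threshold).

    islice stops pulling from the generator once `threshold` answers
    have been seen, so the remaining answers are never produced.
    """
    return sum(1 for _ in islice(answers, threshold))


def has_at_least(answers: Iterable, threshold: int) -> bool:
    """Decide whether the query has at least `threshold` answers."""
    return count_up_to(answers, threshold) >= threshold


# Hypothetical toy relations; the full join has 5 answers.
R = [(1, 10), (2, 10), (3, 20)]
S = [(10, 100), (10, 101), (20, 200)]

print(count_up_to(join_answers(R, S), 3))   # stops after the third answer
print(has_at_least(join_answers(R, S), 6))  # exhausts all answers, then fails
```

The sketch only avoids enumerating answers beyond the threshold; it still builds the full index on S, so it says nothing about the asymptotic improvements claimed in the abstract.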


Published In

ACM SIGMOD Record  Volume 52, Issue 1
March 2023
118 pages
ISSN:0163-5808
DOI:10.1145/3604437
Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.

Publisher

Association for Computing Machinery

New York, NY, United States

Published in SIGMOD Volume 52, Issue 1
