Location via proxy:   [ UP ]  
[Report a bug]   [Manage cookies]                
skip to main content
article
Free access

Answering queries using materialized views with minimum size

Published: 01 September 2006 Publication History

Abstract

In this paper, we study the following problem. Given a database and a set of queries, we want to find a set of views that can compute the answers to the queries, such that the amount of space, in bytes, required to store the viewset is minimum on the given database. (We also handle problem instances where the input has a set of database instances, as described by an oracle that returns the sizes of view relations for given view definitions.) This problem is important for applications such as distributed databases, data warehousing, and data integration. We explore the decidability and complexity of the problem for workloads of conjunctive queries. We show that results differ significantly depending on whether the workload queries have self-joins. Further, for queries without self-joins we describe a very compact search space of views, which contains all views in at least one optimal viewset. We present techniques for finding a minimum-size viewset for a single query without self-joins by using the shape of the query and its constraints, and validate the approach by extensive experiments.

References

[1]
1. Bernstein, P.A., Goodman, N., Wong, E., Reeve, C.L., Rothnie, Jr., J.B.: Query processing in a system for distributed databases (SDD-1). ACM Trans. Database Syst. (TODS) 6(4), 602-625 (1981).
[2]
2. Ceri, S., Pelagatti, G.: Distributed Databases: Principles and Systems. McGraw-Hill Book Company, New York (1984).
[3]
3. Özsu, M.T., Valduriez, P.: Principles of Distributed Database Systems. Prentice-Hall, Englewood Cliffs, NJ (1999).
[4]
4. Lenzerini, M.: Data integration: A theoretical perspective. PODS pp. 233-246 (2002).
[5]
5. Hacigümüs, H., Iyer, B., Mehrotra, S.: Providing database as a service. ICDE (2002).
[6]
6. Wiederhold, G.: Mediators in the architecture of future information systems. IEEE Comput. 25(3), 38-49 (1992).
[7]
7. Haas, L.M., Kossmann, D., Wimmers, E.L., Yang, J.: Optimizing queries across diverse data sources. In: Proc. VLDB pp. 276-285 (1997).
[8]
8. Hacigümüs, H., Iyer, B., Li, C., Mehrotra, S.: Executing SQL over encrypted data in the database-service-provider model. SIGMOD (2002).
[9]
9. Gupta, H.: Selection of views to materialize in a data warehouse. ICDT (1997).
[10]
10. Theodoratos, D., Ligoudistianos, S., Sellis, T.: Designing the global data warehouse with spj views. CAiSE (1999).
[11]
11. Theodoratos, D., Sellis, T.: Data warehouse configuration. In: Proc. VLDB (1997).
[12]
12. Yang, J., Karlapalem, K., Li, Q.: Algorithms for materialized view design in data warehousing environment. In: Proc. VLDB (1997).
[13]
13. Baralis, E., Paraboschi, S., Teniente, E.: Materialized view selection in a multidimensional database. In: Proc. VLDB (1997).
[14]
14. Gupta, H., Harinarayan, V., Rajaraman, A., Ullman, J.: Index selection in olap. ICDE (1997).
[15]
15. Harinarayan, V., Rajaraman, A., Ullman, J.: Implementing data cubes efficiently. SIGMOD (1996).
[16]
16. Bello, R., Dias, K., Downing, A., Feenan, J., Finnerty, J., Norcott, W., Sun, H., Witkowski, A., Ziauddin, M.: Materialized views in Oracle. In: Proc. VLDB pp. 659-664 (1998).
[17]
17. Goldstein, J., Larson, P.-A.: Optimizing queries using materialized views: A practical, scalable solution. SIGMOD 331-342 (2001).
[18]
18. Zaharioudakis, M., Cochrane, R., Lapis, G., Pirahesh, H., Urata, M.: Answering complex SQL queries using automatic summary tables. SIGMOD pp. 105-116 (2000).
[19]
19. Agrawal, S., Chaudhuri, S., Narasayya, V.: Automated selection of materialized views and indexes in Microsoft SQL Server. Proc. VLDB 496-505 (2000).
[20]
20. Chirkova, R.: The view-selection problem has an exponential-time lower bound for conjunctive queries and views. In: PODS pp. 159-168 (2002).
[21]
21. Chirkova, R., Genesereth, M.R.: Linearly bounded reformulations of conjunctive databases. DOOD (2000).
[22]
22. Chirkova, R., Halevy, A.Y., Suciu, D.: A formal perspective on the view selection problem. VLDB J. 11(3), pp. 216-237. (2002).
[23]
23. Afrati, F., Li, C., Ullman, J.D.: Generating efficient plans using views. pp. 319-330, SIGMOD (2001).
[24]
24. Levy, A., Mendelzon, A.O., Sagiv, Y., Srivastava, D.: Answering queries using views. pp. 95-104, PODS (1995).
[25]
25. Halevy, A.: Answering queries using views: A survey. VLDB J. (2001).
[26]
26. Calvanese, D., De Giacomo, G., Lenzerini, M.: Answering queries using views over description logics knowledge bases. PODS pp. 386-391 (2000).
[27]
27. Calvanese, D., De Giacomo, G., Lenzerini, M., Vardi, M.Y.: Answering regular path queries using views. ICDE pp. 389-398 (2000).
[28]
28. Li, C., Bawa, M., Ullman, J.D.: Minimizing view sets without losing query-answering power. ICDT pp. 99-113 (2001).
[29]
29. Chen, Z., Seshadri, P.: An algebraic compression framework for query results. ICDE pp. 177-188 (2000).
[30]
30. TPC-H: http://www.tpc.org/tpch/
[31]
31. Chandra, A.K., Merlin, P.M.: Optimal implementation of conjunctive queries in relational data bases. STOC pp. 77-90 (1977).
[32]
32. Sagiv, Y., Yannakakis, M.: Equivalences among relational expressions with the union and difference operators. J. ACM 27(4), 633- 655 (1980).
[33]
33. Pottinger, R., Levy, A.: A scalable algorithm for answering queries using views. Proc. VLDB (2000).
[34]
34. Yannakakis, M.: Algorithms for acyclic database schemes. Proc. VLDB, IEEE Computer Society Press 82-94 (1981).
[35]
35. Gupta, A., Sagiv, Y., Ullman, J.D., Widom, J.: Constraint checking with partial information. PODS pp. 45-55 (1994).
[36]
36. Klug, A.: On conjunctive queries containing inequalities. J. ACM 35(1), 146-160 (1988).
[37]
37. Selinger, P.G., Astrahan, M.M., Chamberlin, D.D., Lorie, R.A., Price, T.G.: Access path selection in a relational database management system. SIGMOD pp. 23-34 (1979).
[38]
38. Graham, M.: On the universal relation. Technical report, University of Toronto, Canada (1979).
[39]
39. Barsalou, T., Keller, A.M., Siambela, N., Wiederhold, G.: Updating relational databases through object-based views. In: Proceedings of the 1991 ACM SIGMOD International Conference on Management of Data, pp. 248-257 (1991).
[40]
40. Ioannidis, Y.E., Poosala, V.: Balancing histogram optimality and practicality for query result size estimation. In: Proceedings of the SIGMOD Conference, pp. 233-244 (1995).
[41]
41. Gibbons, P.B., Matias, Y.: New sampling-based summary statistics for improving approximate query answers. In: Proceedings of the SIGMOD Conference, pp. 331-342 (1998).
[42]
42. Chaudhuri, S.: An overview of query optimization in relational systems. PODS pp. 34-43 (1998).
[43]
43. Haas, P.J., Naughton, J.F., Seshadri, S., Stokes, L.: Sampling-based estimation of the number of distinct values of an attribute. In: Proc. VLDB pp. 311-322 (1995).

Cited By

View all
  • (2023)RavenProceedings of the Workshop on Human-In-the-Loop Data Analytics10.1145/3597465.3605219(1-7)Online publication date: 18-Jun-2023
  • (2020)Shared Execution Techniques for Business Data Analytics over Big Data StreamsProceedings of the 32nd International Conference on Scientific and Statistical Database Management10.1145/3400903.3400932(1-4)Online publication date: 7-Jul-2020
  • (2020)QuickPoint: Efficiently Identifying Densest Sub-Graphs in Online Social Networks for Event Stream DisseminationIEEE Transactions on Knowledge and Data Engineering10.1109/TKDE.2018.288143532:2(332-346)Online publication date: 9-Jan-2020
  • Show More Cited By

Recommendations

Comments

Information & Contributors

Information

Published In

cover image The VLDB Journal — The International Journal on Very Large Data Bases
The VLDB Journal — The International Journal on Very Large Data Bases  Volume 15, Issue 3
September 2006
100 pages

Publisher

Springer-Verlag

Berlin, Heidelberg

Publication History

Published: 01 September 2006

Author Tags

  1. Data warehouses
  2. Distributed systems
  3. Minimum-size viewsets
  4. Views

Qualifiers

  • Article

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)18
  • Downloads (Last 6 weeks)6
Reflects downloads up to 24 Dec 2024

Other Metrics

Citations

Cited By

View all
  • (2023)RavenProceedings of the Workshop on Human-In-the-Loop Data Analytics10.1145/3597465.3605219(1-7)Online publication date: 18-Jun-2023
  • (2020)Shared Execution Techniques for Business Data Analytics over Big Data StreamsProceedings of the 32nd International Conference on Scientific and Statistical Database Management10.1145/3400903.3400932(1-4)Online publication date: 7-Jul-2020
  • (2020)QuickPoint: Efficiently Identifying Densest Sub-Graphs in Online Social Networks for Event Stream DisseminationIEEE Transactions on Knowledge and Data Engineering10.1109/TKDE.2018.288143532:2(332-346)Online publication date: 9-Jan-2020
  • (2016)SQL Scorecard for Improved Stability and Performance of Data WarehousesInternational Journal of Software Innovation10.4018/IJSI.20160701024:3(22-37)Online publication date: 1-Jul-2016
  • (2011)TIProceedings of the 2011 ACM SIGMOD International Conference on Management of data10.1145/1989323.1989391(649-660)Online publication date: 12-Jun-2011
  • (2008)A materialized view selection technique for XPath queriesProceedings of the 2nd conference on European computing conference10.5555/1562423.1562474(287-292)Online publication date: 11-Sep-2008

View Options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Login options

Full Access

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media