Approximate Query Answering Using Data Warehouse Striping

Bernardino, Jorge R.; Furtado, Pedro S.; Madeira, Henrique C.

doi:10.1023/A:1016551309288


Published: September 2002

Volume 19, pages 145–167 (2002)

Journal of Intelligent Information Systems



Abstract

This paper presents and evaluates a simple but very effective method for implementing large data warehouses on an arbitrary number of computers, achieving very high query execution performance and scalability. Using our technique, called data warehouse striping (DWS), the data is distributed across, and processed by, a potentially large number of autonomous computers. The major problem of the DWS technique is that it would require a very expensive cluster of computers with fault-tolerant capabilities to prevent a fault in a single computer from stopping the whole system. In this paper, we propose a radically different approach to the unavailability of one or more computers in the cluster, which allows DWS to be used with a very large number of inexpensive computers. The proposed approach is based on approximate query answering techniques that make it possible to deliver an approximate answer to the user even when one or more computers in the cluster are unavailable. The evaluation presented in the paper shows, both analytically and experimentally, that the approximate results obtained this way have a very small error, which can be considered negligible in most cases.
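To make the idea of striping with degraded-mode answering more concrete, here is a minimal Python sketch under stated assumptions. It is not the authors' implementation. It assumes that fact rows are assigned to N nodes round-robin, that each node returns a partial SUM and COUNT for the query, and that when only k of the N nodes answer, SUM and COUNT are extrapolated by the factor N/k while AVG is taken as the ratio of the available totals. The `Partial` record and the functions `stripe`, `local_aggregate` and `merge` are hypothetical names chosen for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class Partial:
    """Hypothetical partial result returned by one node for one query."""
    total: float  # SUM(measure) over the node's stripe
    count: int    # COUNT(*) over the node's stripe


def stripe(rows: Sequence[float], n_nodes: int) -> list[list[float]]:
    """Assign fact rows to nodes round-robin (assumed striping policy)."""
    stripes: list[list[float]] = [[] for _ in range(n_nodes)]
    for i, row in enumerate(rows):
        stripes[i % n_nodes].append(row)
    return stripes


def local_aggregate(stripe_rows: Sequence[float]) -> Partial:
    """What each node computes independently over its own stripe."""
    return Partial(total=sum(stripe_rows), count=len(stripe_rows))


def merge(partials: Sequence[Optional[Partial]]) -> dict:
    """Combine per-node partials; None marks a node that did not answer."""
    n = len(partials)
    alive = [p for p in partials if p is not None]
    k = len(alive)
    if k == 0:
        raise RuntimeError("no node available; cannot estimate")
    total = sum(p.total for p in alive)
    count = sum(p.count for p in alive)
    return {
        # Each stripe holds about 1/n of the rows, so scale by n/k.
        "sum": total * n / k,
        "count": count * n / k,
        # AVG is a ratio of the available totals and needs no scaling.
        "avg": total / count if count else None,
        "exact": k == n,
    }


if __name__ == "__main__":
    rows = [float(v) for v in range(1, 101)]  # toy measure values
    partials = [local_aggregate(s) for s in stripe(rows, 4)]
    print(merge(partials))   # all nodes up: exact answer
    partials[2] = None       # simulate failure of one node
    print(merge(partials))   # approximate answer from 3 of 4 nodes
```

When every node answers, the merged result is exact. When a node is missing, the sketch relies on each stripe holding roughly the same share of the rows, assigned independently of their values, so the surviving stripes act like a large sample of the fact table. If values were unevenly distributed across nodes, the extrapolation error would grow.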



Author information

Authors and Affiliations

Polytechnic of Coimbra, ISEC, DEIS, Apt. 10057, P-3030-601, Coimbra, Portugal
Jorge R. Bernardino
DEI, Pólo II, University of Coimbra, P-3030-290, Coimbra, Portugal
Pedro S. Furtado & Henrique C. Madeira


About this article

Cite this article

Bernardino, J.R., Furtado, P.S. & Madeira, H.C. Approximate Query Answering Using Data Warehouse Striping. Journal of Intelligent Information Systems 19, 145–167 (2002). https://doi.org/10.1023/A:1016551309288


Issue Date: September 2002
DOI: https://doi.org/10.1023/A:1016551309288
