Research article · DOI: 10.1145/2882903.2915240

iOLAP: Managing Uncertainty for Efficient Incremental OLAP

Published: 14 June 2016

Abstract

The size of data and the complexity of analytics continue to grow along with the need for timely and cost-effective analysis. However, the growth of computation power cannot keep up with the growth of data. This calls for a paradigm shift from the traditional batch OLAP processing model to an incremental OLAP processing model. In this paper, we propose iOLAP, an incremental OLAP query engine that provides a smooth trade-off between query accuracy and latency and fulfills a full spectrum of user requirements, from approximate but timely query execution to more traditional accurate query execution. iOLAP enables interactive incremental query processing using a novel mini-batch execution model: given an OLAP query, iOLAP first randomly partitions the input dataset into smaller sets (mini-batches) and then incrementally processes these mini-batches by executing a delta update query on each mini-batch, where each subsequent delta update query computes an update based on the output of the previous one. The key idea behind iOLAP is a novel delta update algorithm that models delta processing as an uncertainty propagation problem and minimizes the recomputation during each subsequent delta update by minimizing the uncertainties in the partial (including intermediate) query results. We implement iOLAP on top of Apache Spark and have successfully demonstrated it at scale on over 100 machines. Extensive experiments on a multitude of queries and datasets demonstrate that iOLAP can deliver approximate query answers for complex OLAP queries orders of magnitude faster than traditional OLAP engines, while continuously delivering updates every few seconds.
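
To make the mini-batch execution model concrete, the following is a minimal Python sketch of the general idea for a single SUM aggregate. It is not the iOLAP delta update algorithm: it does not model uncertainty propagation, nested queries, or error estimation. The names `RunningState`, `make_mini_batches`, `delta_update`, and `incremental_sum` are hypothetical and do not come from the paper or from Apache Spark.

```python
import random
from dataclasses import dataclass


@dataclass
class RunningState:
    """Partial result carried from one delta update to the next (hypothetical)."""
    rows_seen: int = 0
    total: float = 0.0


def make_mini_batches(rows, num_batches, seed=0):
    """Randomly partition rows into num_batches roughly equal mini-batches."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    # Striding over the shuffled list gives disjoint batches that cover every row.
    return [shuffled[i::num_batches] for i in range(num_batches)]


def delta_update(state, batch):
    """Fold one mini-batch into the previous state without rescanning old rows."""
    return RunningState(state.rows_seen + len(batch), state.total + sum(batch))


def incremental_sum(rows, num_batches, seed=0):
    """Yield a scaled-up SUM estimate after each non-empty mini-batch."""
    rows = list(rows)
    state = RunningState()
    for batch in make_mini_batches(rows, num_batches, seed):
        if not batch:
            continue  # nothing new to fold in
        state = delta_update(state, batch)
        # Scale the partial sum by the inverse of the fraction processed so far.
        yield state.total * len(rows) / state.rows_seen


if __name__ == "__main__":
    data = [float(x) for x in range(1, 1001)]
    for step, estimate in enumerate(incremental_sum(data, num_batches=5), start=1):
        print(f"after mini-batch {step}: SUM ~ {estimate:.1f}")
```

Each estimate is available as soon as its mini-batch has been folded in, and the final estimate equals the exact sum because every row has been processed by then. A SUM is the easy case, since the previous state is simply added to; according to the abstract, the harder part that iOLAP addresses is limiting recomputation when partial and intermediate results are still uncertain.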

Published In

SIGMOD '16: Proceedings of the 2016 International Conference on Management of Data
June 2016
2300 pages
ISBN: 9781450335317
DOI: 10.1145/2882903
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. OLAP
  2. approximate query processing
  3. bootstrap
  4. incremental

Qualifiers

  • Research-article

Funding Sources

  • LBNL Award
  • DARPA XData Award
  • NSF CISE Expeditions Award

Conference

SIGMOD/PODS'16: International Conference on Management of Data
June 26 - July 1, 2016
San Francisco, California, USA

Acceptance Rates

Overall Acceptance Rate 785 of 4,003 submissions, 20%

Article Metrics

  • Downloads (last 12 months): 12
  • Downloads (last 6 weeks): 2
Reflects downloads up to 02 Feb 2025

Cited By

  • (2024) Serendipitous, Open Big Data Management and Analytics: The SeDaSOMA Framework. Modelling 5:3 (1173-1196). DOI: 10.3390/modelling5030061. Online publication date: 4-Sep-2024
  • (2024) G-Learned Index: Enabling Efficient Learned Index on GPU. IEEE Transactions on Parallel and Distributed Systems 35:6 (950-967). DOI: 10.1109/TPDS.2024.3381214. Online publication date: Jun-2024
  • (2023) When Automatic Filtering Comes to the Rescue: Pre-Computing Company Competitor Pairs in Owler. Proceedings of the ACM on Management of Data 1:2 (1-23). DOI: 10.1145/3589787. Online publication date: 20-Jun-2023
  • (2023) Popularity Ratio Maximization: Surpassing Competitors through Influence Propagation. Proceedings of the ACM on Management of Data 1:2 (1-26). DOI: 10.1145/3589309. Online publication date: 20-Jun-2023
  • (2023) EARLY: Efficient and Reliable Graph Neural Network for Dynamic Graphs. Proceedings of the ACM on Management of Data 1:2 (1-28). DOI: 10.1145/3589308. Online publication date: 20-Jun-2023
  • (2023) Data Stream Clustering: An In-depth Empirical Study. Proceedings of the ACM on Management of Data 1:2 (1-26). DOI: 10.1145/3589307. Online publication date: 20-Jun-2023
  • (2023) Using Cloud Functions as Accelerator for Elastic Data Analytics. Proceedings of the ACM on Management of Data 1:2 (1-27). DOI: 10.1145/3589306. Online publication date: 20-Jun-2023
  • (2023) Efficient Personalized PageRank Computation: The Power of Variance-Reduced Monte Carlo Approaches. Proceedings of the ACM on Management of Data 1:2 (1-26). DOI: 10.1145/3589305. Online publication date: 20-Jun-2023
  • (2023) Deep Active Alignment of Knowledge Graph Entities and Schemata. Proceedings of the ACM on Management of Data 1:2 (1-26). DOI: 10.1145/3589304. Online publication date: 20-Jun-2023
  • (2023) Incentive-Aware Decentralized Data Collaboration. Proceedings of the ACM on Management of Data 1:2 (1-27). DOI: 10.1145/3589303. Online publication date: 20-Jun-2023
