research-article

Meet me halfway: split maintenance of continuous views

Authors: Christian Winter, Tobias Schmidt, Thomas Neumann, Alfons Kemper

Proceedings of the VLDB Endowment, Volume 13, Issue 12

Pages 2620-2633

https://doi.org/10.14778/3407790.3407849

Published: 01 July 2020

Abstract

From Industry 4.0-driven factories to real-time trading algorithms, businesses depend on analytics on high-velocity real-time data. Often these analytics are performed not in dedicated stream processing engines but on views within a general-purpose database to combine current with historical data. However, traditional view maintenance algorithms are not designed with both the volume and velocity of data streams in mind.

In this paper, we propose a new type of view specialized for queries involving high-velocity inputs, called continuous view. The key component of continuous views is a novel maintenance strategy, splitting the work between inserts and queries. By performing initial parts of the view's query for each insert and the remainder at query time, we achieve both high input rates and low query latency. Further, we keep the memory overhead of our views small, independent of input velocity. To demonstrate the practicality of this strategy, we integrate continuous views into our Umbra database system. We show that split maintenance can outperform even dedicated stream processing engines on analytical workloads, all while still offering similar insert rates. Compared to modern materialized view maintenance approaches, such as deferred and incremental view maintenance, which often need to materialize expensive deltas, we achieve up to an order of magnitude higher insert throughput.
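To make the idea of splitting maintenance work between inserts and queries more concrete, here is a minimal Python sketch under stated assumptions. It is not Umbra's implementation: the class `SplitMaintainedView`, its parameters (`predicate`, `min_count`, `flush_threshold`), and the example view `SELECT key, AVG(value) FROM stream WHERE value > 0 GROUP BY key HAVING COUNT(*) >= 2` are all hypothetical. Each insert runs only the initial part of the view's query (the filter and a partial SUM/COUNT aggregation into a small buffer). A query merges the buffered partials and evaluates the remainder (computing the average and applying the HAVING filter). The retained state is one (sum, count) pair per group, so its size depends on the number of distinct keys rather than on how fast tuples arrive.

```python
from typing import Callable, Dict, Hashable, Tuple

# Hypothetical sketch of split maintenance (not Umbra's implementation) for the view
#   SELECT key, AVG(value) FROM stream WHERE predicate(value)
#   GROUP BY key HAVING COUNT(*) >= min_count
class SplitMaintainedView:
    def __init__(self, predicate: Callable[[float], bool], min_count: int = 1,
                 flush_threshold: int = 1024) -> None:
        self.predicate = predicate
        self.min_count = min_count
        # Hypothetical knob: merge the insert-side buffer once it holds this many groups.
        self.flush_threshold = flush_threshold
        # Partial aggregates produced on the insert path: key -> (sum, count).
        self._pending: Dict[Hashable, Tuple[float, int]] = {}
        # Merged partial aggregates: one (sum, count) pair per group, so memory
        # grows with the number of distinct keys, not with the number of inserts.
        self._merged: Dict[Hashable, Tuple[float, int]] = {}

    def insert(self, key: Hashable, value: float) -> None:
        # Initial part of the view query, executed for every insert:
        # the selection and a partial SUM/COUNT aggregation.
        if not self.predicate(value):
            return
        s, c = self._pending.get(key, (0.0, 0))
        self._pending[key] = (s + value, c + 1)
        if len(self._pending) >= self.flush_threshold:
            self._merge_pending()

    def _merge_pending(self) -> None:
        # Fold the buffered partial aggregates into the merged state.
        for key, (s, c) in self._pending.items():
            ms, mc = self._merged.get(key, (0.0, 0))
            self._merged[key] = (ms + s, mc + c)
        self._pending.clear()

    def query(self) -> Dict[Hashable, float]:
        # Remainder of the view query, executed at query time:
        # merge outstanding partials, then compute AVG and apply HAVING.
        self._merge_pending()
        return {key: s / c for key, (s, c) in self._merged.items()
                if c >= self.min_count}


view = SplitMaintainedView(predicate=lambda v: v > 0, min_count=2)
for key, value in [("a", 1.0), ("a", 3.0), ("b", 5.0), ("b", -2.0), ("c", 4.0)]:
    view.insert(key, value)
print(view.query())  # {'a': 2.0}
```

Where to draw the split is the central design choice in this sketch: moving more work into `insert` shortens `query` but slows ingestion, while deferring more to `query` does the opposite. Here only work that can be done incrementally and cheaply per tuple (filtering and summing) happens on the insert path.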



Recommendations

Certain Answers Meet Zero-One Laws
PODS '18: Proceedings of the 37th ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems

Query answering over incomplete data invariably relies on the standard notion of certain answers which gives a very coarse classification of query answers into those that are certain and those that are not. Here we propose to refine it by measuring how ...
Optimal Join Algorithms Meet Top-k
SIGMOD '20: Proceedings of the 2020 ACM SIGMOD International Conference on Management of Data

Top-k queries have been studied intensively in the database community and they are an important means to reduce query cost when only the "best" or "most interesting" results are needed instead of the full output. While some optimality results exist, ...
Shortest paths in simple polygons with polygon-meet constraints

We study a constrained version of the shortest path problem in simple polygons, in which the path must visit a given target polygon. We provide a worst-case optimal algorithm for this problem and also present a method to construct a subdivision of the ...


Information

Published In

Proceedings of the VLDB Endowment, Volume 13, Issue 12, August 2020 (1710 pages)
ISSN: 2150-8097

Editors: Magdalena Balazinska (University of Washington) and Xiaofang Zhou (University of Queensland, Australia)

Publisher: VLDB Endowment

Article Metrics

Total citations: 5
Total downloads: 157 (23 in the last 12 months, 3 in the last 6 weeks)
Download figures reflect downloads up to 15 Oct 2024.

Cited By

Bonifati A, Tommasini R, Barcelo P, Sanchez-Pi N, Meliou A, Sudarshan S (2024). An Overview of Continuous Querying in (Modern) Data Systems. Companion of the 2024 International Conference on Management of Data, pages 605-612. Online publication date: 9 Jun 2024. https://dl.acm.org/doi/10.1145/3626246.3654679

Reif M, Neumann T (2022). A scalable and generic approach to range joins. Proceedings of the VLDB Endowment, 15(11), pages 3018-3030. Online publication date: 1 Jul 2022. https://dl.acm.org/doi/10.14778/3551793.3551849

Winter C, Giceva J, Neumann T, Kemper A (2022). On-demand state separation for cloud data warehousing. Proceedings of the VLDB Endowment, 15(11), pages 2966-2979. Online publication date: 29 Sep 2022. https://dl.acm.org/doi/10.14778/3551793.3551845

Schüle M, Lang H, Springer M, Kemper A, Neumann T, Günnemann S (2022). Recursive SQL and GPU-support for in-database machine learning. Distributed and Parallel Databases, 40(2-3), pages 205-259. Online publication date: 1 Sep 2022. https://dl.acm.org/doi/10.1007/s10619-022-07417-7

Schüle M, Lang H, Springer M, Kemper A, Neumann T, Günnemann S (2021). In-Database Machine Learning with SQL on GPUs. Proceedings of the 33rd International Conference on Scientific and Statistical Database Management, pages 25-36. Online publication date: 6 Jul 2021. https://dl.acm.org/doi/10.1145/3468791.3468840
