Meet me halfway: split maintenance of continuous views

Published: 01 July 2020

Abstract

From Industry 4.0-driven factories to real-time trading algorithms, businesses depend on analytics on high-velocity real-time data. Often these analytics are performed not in dedicated stream processing engines but on views within a general-purpose database to combine current with historical data. However, traditional view maintenance algorithms are not designed with both the volume and velocity of data streams in mind.
In this paper, we propose a new type of view specialized for queries involving high-velocity inputs, called a continuous view. The key component of continuous views is a novel maintenance strategy that splits the work between inserts and queries. By performing initial parts of the view's query for each insert and the remainder at query time, we achieve both high input rates and low query latency. Further, we keep the memory overhead of our views small, independent of input velocity. To demonstrate the practicality of this strategy, we integrate continuous views into our Umbra database system. We show that split maintenance can outperform even dedicated stream processing engines on analytical workloads, all while still offering similar insert rates. Compared to modern materialized view maintenance approaches, such as deferred and incremental view maintenance, that often need to materialize expensive deltas, we achieve up to an order of magnitude higher insert throughput.
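
The split between insert-time and query-time work can be illustrated with a small sketch. This is a toy model written for this summary, not Umbra's implementation: the class `SplitMaintainedView`, its methods, and the example query (a grouped average with a threshold filter and a sort) are all assumptions chosen for illustration. Each insert only folds the new tuple into a per-group partial aggregate; the remaining operators run when the view is queried.

```python
from __future__ import annotations

from collections import defaultdict
from dataclasses import dataclass


@dataclass
class PartialAgg:
    """Insert-side state for one group: enough to finish AVG later."""
    count: int = 0
    total: float = 0.0


class SplitMaintainedView:
    """Hypothetical toy view for a query of the shape:
    SELECT key, AVG(value) ... GROUP BY key HAVING AVG(value) > threshold ORDER BY key
    """

    def __init__(self, threshold: float) -> None:
        self.threshold = threshold
        # State size depends on the number of groups, not on how many
        # tuples have been inserted.
        self._state: dict[str, PartialAgg] = defaultdict(PartialAgg)

    def on_insert(self, key: str, value: float) -> None:
        # Insert-time half: fold the tuple into its group's partial aggregate.
        agg = self._state[key]
        agg.count += 1
        agg.total += value

    def query(self) -> list[tuple[str, float]]:
        # Query-time half: finalize the averages, apply the filter, sort.
        rows = [(k, a.total / a.count) for k, a in self._state.items()]
        return sorted(r for r in rows if r[1] > self.threshold)


view = SplitMaintainedView(threshold=10.0)
for k, v in [("a", 8.0), ("b", 20.0), ("a", 16.0), ("b", 10.0), ("c", 1.0)]:
    view.on_insert(k, v)
result = view.query()
```

In this sketch the per-insert cost is a constant-time update, and the stored state grows with the number of distinct groups rather than with the number of inserted tuples, which mirrors the abstract's claim that memory overhead is independent of input velocity. Where exactly a real query is split is a design decision that the abstract does not spell out.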

Published In

Proceedings of the VLDB Endowment, Volume 13, Issue 12
August 2020
1710 pages
ISSN: 2150-8097

Publisher

VLDB Endowment