InvaliDB: scalable push-based real-time queries on top of pull-based databases (extended)

Published: 01 August 2020

Abstract

Traditional databases are optimized for pull-based queries, i.e., they make information available in direct response to client requests. While this access pattern is adequate for mostly static domains, it requires inefficient and slow workarounds (e.g., periodic polling) when clients need to stay up-to-date. Acknowledging reactive and interactive workloads, modern real-time databases such as Firebase, Meteor, and RethinkDB proactively deliver result updates to their clients through push-based real-time queries. However, current implementations are of only limited practical relevance, since they are incompatible with existing technology stacks, fail under heavy load, or do not support complex queries to begin with. To address these issues, we propose the system design InvaliDB, which combines linear read and write scalability for real-time queries with superior query expressiveness and legacy compatibility. We compare InvaliDB against competing system designs to emphasize the benefits of our approach. To validate our claims of linear scalability, we further present an experimental evaluation of the InvaliDB prototype that has been serving customers at the Database-as-a-Service company Baqend since July 2017.

Cited By

  • (2024) An Overview of Continuous Querying in (Modern) Data Systems. Companion of the 2024 International Conference on Management of Data, pages 605--612. DOI: 10.1145/3626246.3654679. Online publication date: 9-Jun-2024
  • (2024) The Duck’s Brain. Datenbank-Spektrum, 24(3):209--221. DOI: 10.1007/s13222-024-00485-2. Online publication date: 9-Oct-2024

Published In

Proceedings of the VLDB Endowment  Volume 13, Issue 12
August 2020
1710 pages
ISSN:2150-8097

Publisher

VLDB Endowment

Qualifiers

  • Research-article
