Location via proxy:   [ UP ]  
[Report a bug]   [Manage cookies]                
skip to main content
10.1145/3062341.3062369acmconferencesArticle/Chapter ViewAbstractPublication PagespldiConference Proceedingsconference-collections
research-article
Public Access

StreamQRE: modular specification and efficient evaluation of quantitative queries over streaming data

Published: 14 June 2017 Publication History

Abstract

Real-time decision making in emerging IoT applications typically relies on computing quantitative summaries of large data streams in an efficient and incremental manner. To simplify the task of programming the desired logic, we propose StreamQRE, which provides natural and high-level constructs for processing streaming data. Our language has a novel integration of linguistic constructs from two distinct programming paradigms: streaming extensions of relational query languages and quantitative extensions of regular expressions. The former allows the programmer to employ relational constructs to partition the input data by keys and to integrate data streams from different sources, while the latter can be used to exploit the logical hierarchy in the input stream for modular specifications.
We first present the core language with a small set of combinators, formal semantics, and a decidable type system. We then show how to express a number of common patterns with illustrative examples. Our compilation algorithm translates the high-level query into a streaming algorithm with precise complexity bounds on per-item processing time and total memory footprint. We also show how to integrate approximation algorithms into our framework. We report on an implementation in Java, and evaluate it with respect to existing high-performance engines for processing streaming data. Our experimental evaluation shows that (1) StreamQRE allows more natural and succinct specification of queries compared to existing frameworks, (2) the throughput of our implementation is higher than comparable systems (for example, two-to-four times greater than RxJava), and (3) the approximation algorithms supported by our implementation can lead to substantial memory savings.

References

[1]
Apache Flink: Scalable batch and stream data processing. https://flink.apache.org/.
[2]
Esper for Java. http://www.espertech.com/esper/.
[3]
ReactiveX: An API for asynchronous programming with observable streams. http://reactivex.io/.
[4]
D. J. Abadi, Y. Ahmad, M. Balazinska, U. Cetintemel, M. Cherniack, J.-H. Hwang, W. Lindner, A. Maskey, A. Rasin, E. Ryvkina, N. Tatbul, Y. Xing, and S. Zdonik. The design of the Borealis stream processing engine. In Proceedings of the 2nd Biennial Conference on Innovative Data Systems Research (CIDR ’05), number 2005, pages 277–289, 2005.
[5]
D. J. Abadi, D. Carney, U. Cetintemel, M. Cherniack, C. Convey, S. Lee, M. Stonebraker, N. Tatbul, and S. Zdonik. Aurora: A new model and architecture for data stream management. The VLDB Journal, 12(2):120–139, 2003.
[6]
S. Abiteboul, R. Hull, and V. Vianu, editors. Foundations of Databases. Addison-Wesley, 1995.
[7]
M. Ali, B. Chandramouli, J. Goldstein, and R. Schindlauer. The extensibility framework in Microsoft StreamInsight. In Proceedings of the 27th IEEE International Conference on Data Engineering (ICDE ’11), pages 1242–1253, 2011.
[8]
N. Alon, Y. Matias, and M. Szegedy. The space complexity of approximating the frequency moments. Journal of Computer and System Sciences, 58(1):137–147, 1999.
[9]
R. Alur, E. Berger, A. Drobnis, L. Fix, K. Fu, G. Hager, D. Lopresti, K. Nahrstedt, E. Mynatt, S. Patel, J. Rexford, J. Stankovic, and B. Zorn. Systems computing challenges in the Internet of Things. In Computing Community Consortium Whitepaper, 2016. 02980.
[10]
R. Alur, L. D’Antoni, and M. Raghothaman. DReX: A declarative language for efficiently evaluating regular string transformations. In Proceedings of the 42nd Annual Symposium on Principles of Programming Languages, POPL ’15, pages 125–137, 2015.
[11]
R. Alur, D. Fisman, and M. Raghothaman. Regular programming for quantitative properties of data streams. In Proceedings of the 25th European Symposium on Programming (ESOP ’16), pages 15–40, 2016.
[12]
R. Alur and K. Mamouras. An introduction to the StreamQRE language. http://www.seas.upenn.edu/~mamouras/ papers/StreamQRE-Intro.pdf, 2017. manuscript.
[13]
R. Alur and P. ˇ Cern´y. Streaming transducers for algorithmic verification of single-pass list-processing programs. In Proceedings of the 38th Annual ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, POPL ’11, pages 599–610, 2011.
[14]
A. Arasu, B. Babcock, S. Babu, J. Cieslewicz, M. Datar, K. Ito, R. Motwani, U. Srivastava, and J. Widom. STREAM: The Stanford data stream management system. Technical Report 2004-20, Stanford InfoLab, 2004.
[15]
A. Arasu, S. Babu, and J. Widom. The CQL continuous query language: Semantic foundations and query execution. The VLDB Journal, 15(2):121–142, 2006.
[16]
A. Arasu and J. Widom. Resource sharing in continuous sliding-window aggregates. In Proceedings of the 30th International Conference on Very Large Data Bases, VLDB ’04, pages 336–347. VLDB Endowment, 2004.
[17]
S. Babu and J. Widom. Continuous queries over data streams. ACM Sigmod Record, 30(3):109–120, 2001.
[18]
R. S. Barga, J. Goldstein, M. Ali, and M. Hong. Consistent streaming through time: A vision for event stream processing. In Proceedings of the 3rd Biennial Conference on Innovative Data Systems Research (CIDR ’07), pages 363– 374, 2007.
[19]
R. Book, S. Even, S. Greibach, and G. Ott. Ambiguity in graphs and expressions. IEEE Transactions on Computers, 100(2):149–153, 1971.
[20]
L. Brenna, A. Demers, J. Gehrke, M. Hong, J. Ossher, B. Panda, M. Riedewald, M. Thatte, and W. White. Cayuga: A highperformance event processing engine. In Proceedings of the 2007 ACM SIGMOD International Conference on Management of Data, SIGMOD ’07, pages 1100–1102, 2007.
[21]
J. Chen, D. J. DeWitt, F. Tian, and Y. Wang. NiagaraCQ: A scalable continuous query system for internet databases. In Proceedings of the 2000 ACM SIGMOD International Conference on Management of Data, SIGMOD ’00, pages 379–390, 2000.
[22]
S. Chintapalli, D. Dagit, B. Evans, R. Farivar, T. Graves, M. Holderbaugh, Z. Liu, K. Nusbaum, K. Patil, B. J. Peng, and P. Poulosky. Benchmarking streaming computation engines: Storm, Flink and Spark Streaming. In 2016 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW), pages 1789–1792, 2016.
[23]
G. Cugola and A. Margara. Processing flows of information: From data stream to complex event processing. ACM Computing Surveys (CSUR), 44(3):15:1–15:62, 2012.
[24]
L. D’Antoni, M. Veanes, B. Livshits, and D. Molnar. Fast: A transducer-based language for tree manipulation. In Proceedings of the 35th ACM SIGPLAN Conference on Programming Language Design and Implementation, PLDI ’14, pages 384– 394, 2014.
[25]
M. Datar, A. Gionis, P. Indyk, and R. Motwani. Maintaining stream statistics over sliding windows. SIAM Journal on Computing, 31(6):1794–1813, 2002.
[26]
L. De Moura and N. Bjørner. Satisfiability modulo theories: Introduction and applications. Communications of the ACM, 54(9):69–77, 2011.
[27]
J. Dean and S. Ghemawat. MapReduce: Simplified data processing on large clusters. In Proceedings of the 6th Symposium on Operating System Design and Implementation (OSDI ’04), pages 137–150. USENIX Association, 2004.
[28]
J. Dean and S. Ghemawat. MapReduce: Simplified data processing on large clusters. Communications of the ACM, 51(1):107–113, 2008.
[29]
B. B. Grathwohl, F. Henglein, U. T. Rasmussen, K. A. Søholm, and S. P. Tørholm. Kleenex: Compiling nondeterministic transducers to deterministic streaming transducers. In Proceedings of the 43rd Annual ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, POPL ’16, pages 284– 297, 2016.
[30]
M. B. Greenwald and S. Khanna. Quantiles and equi-depth histograms over streams. In M. Garofalakis, J. Gehrke, and R. Rastogi, editors, Data Stream Management: Processing High-Speed Data Streams, pages 45–86. Springer, 2016.
[31]
S. Guha and A. McGregor. Approximate quantiles and the order of the stream. In Proceedings of the 25th ACM SIGMODSIGACT-SIGART Symposium on Principles of Database Systems, PODS ’06, pages 273–279. ACM, 2006.
[32]
D. Gyllstrom, E. Wu, H.-J. Chae, Y. Diao, P. Stahlberg, and G. Anderson. SASE: Complex event processing over streams. In Proceedings of the 3rd Biennial Conference on Innovative Data Systems Research (CIDR ’07), pages 407–411, 2007.
[33]
M. Hirzel. Partition and compose: Parallel complex event processing. In Proceedings of the 6th ACM International Conference on Distributed Event-Based Systems, DEBS ’12, pages 191–200, 2012.
[34]
M. Hirzel, H. Andrade, B. Gedik, G. Jacques-Silva, R. Khandekar, V. Kumar, M. Mendell, H. Nasgaard, S. Schneider, R. Soulé, and K. L. Wu. IBM Streams Processing Language: Analyzing big data in motion. IBM Journal of Research and Development, 57(3/4):7:1–7:11, 2013.
[35]
P. Hooimeijer, B. Livshits, D. Molnar, P. Saxena, and M. Veanes. Fast and precise sanitizer analysis with BEK. In Proceedings of the 20th USENIX Conference on Security, SEC ’11, pages 1–16. USENIX Association, 2011.
[36]
J. Li, D. Maier, K. Tufte, V. Papadimos, and P. A. Tucker. Semantics and evaluation techniques for window aggregates in data streams. In Proceedings of the 2005 ACM SIGMOD International Conference on Management of Data, SIGMOD ’05, pages 311–322. ACM, 2005.
[37]
B. Litt and Z. Ives. The international epilepsy electrophysiology database. In Proceedings of the Fifth International Workshop on Seizure Prediction, 2011.
[38]
B. Mozafari, K. Zeng, and C. Zaniolo. High-performance complex event processing over XML streams. In Proceedings of the 2012 ACM SIGMOD International Conference on Management of Data, SIGMOD ’12, pages 253–264, 2012.
[39]
S. Muthukrishnan. Data streams: Algorithms and applications. Foundations and Trends R in Theoretical Computer Science, 1(2):117–236, 2005.
[40]
R. E. Stearns and H. B. Hunt III. On the equivalence and containment problems for unambiguous regular expressions, regular grammars and finite automata. SIAM Journal on Computing, 14(3):598–611, 1985.
[41]
K. Tangwongsan, M. Hirzel, and S. Schneider. Constant-time sliding window aggregation. Technical Report RC25574 (WAT1511-030), IBM Research, 2015.
[42]
K. Tangwongsan, M. Hirzel, S. Schneider, and K.-L. Wu. General incremental sliding-window aggregation. Proceedings of the VLDB Endowment, 8(7):702–713, 2015.
[43]
P. Tucker, K. Tufte, V. Papadimos, and D. Maier. NEXMark: A benchmark for queries over data streams. Available at http: //datalab.cs.pdx.edu/niagara/NEXMark/, 2002.
[44]
P. A. Tucker, D. Maier, T. Sheard, and L. Fegaras. Exploiting punctuation semantics in continuous data streams. IEEE Transactions on Knowledge and Data Engineering, 15(3):555– 568, 2003.
[45]
M. Vaziri, O. Tardieu, R. Rabbah, P. Suter, and M. Hirzel. Stream processing with a spreadsheet. In Proceedings of the 28th European Conference on Object-Oriented Programming (ECOOP ’14), pages 360–384. Springer, 2014.
[46]
M. Veanes, P. de Halleux, and N. Tillmann. Rex: Symbolic regular expression explorer. In Proceedings of the 3rd International Conference on Software Testing, Verification and Validation (ICST ’10), pages 498–507. IEEE, 2010.
[47]
M. Veanes, P. Hooimeijer, B. Livshits, D. Molnar, and N. Bjorner. Symbolic finite state transducers: Algorithms and applications. In Proceedings of the 39th Annual ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, POPL ’12, pages 137–150, 2012.
[48]
E. Wu, Y. Diao, and S. Rizvi. High-performance complex event processing over streams. In Proceedings of the 2006 ACM SIGMOD International Conference on Management of Data, SIGMOD ’06, pages 407–418. ACM, 2006.
[49]
F. Zemke, A. Witkowski, M. Cherniack, and L. Colby. Pattern matching in sequences of rows. Technical report, 2007. ANSI Standard Proposal. Introduction Overview The StreamQRE Language Derived Stream Transformations Compilation Algorithm Approximation Experiments Related Work Conclusion

Cited By

View all
  • (2023)Dynamic Program Analysis with Flexible Instrumentation and Complex Event Processing2023 IEEE 34th International Symposium on Software Reliability Engineering (ISSRE)10.1109/ISSRE59848.2023.00048(742-751)Online publication date: 9-Oct-2023
  • (2023)A compositional framework for algebraic quantitative online monitoring over continuous-time signalsInternational Journal on Software Tools for Technology Transfer (STTT)10.1007/s10009-023-00719-w25:4(557-573)Online publication date: 1-Aug-2023
  • (2022)Stream processing with dependency-guided synchronizationProceedings of the 27th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming10.1145/3503221.3508413(1-16)Online publication date: 2-Apr-2022
  • Show More Cited By

Recommendations

Comments

Information & Contributors

Information

Published In

cover image ACM Conferences
PLDI 2017: Proceedings of the 38th ACM SIGPLAN Conference on Programming Language Design and Implementation
June 2017
708 pages
ISBN:9781450349888
DOI:10.1145/3062341
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Sponsors

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 14 June 2017

Permissions

Request permissions for this article.

Check for updates

Author Tags

  1. Quantitative Regular Expressions
  2. data stream processing

Qualifiers

  • Research-article

Funding Sources

Conference

PLDI '17
Sponsor:

Acceptance Rates

Overall Acceptance Rate 406 of 2,067 submissions, 20%

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)125
  • Downloads (Last 6 weeks)16
Reflects downloads up to 03 Oct 2024

Other Metrics

Citations

Cited By

View all
  • (2023)Dynamic Program Analysis with Flexible Instrumentation and Complex Event Processing2023 IEEE 34th International Symposium on Software Reliability Engineering (ISSRE)10.1109/ISSRE59848.2023.00048(742-751)Online publication date: 9-Oct-2023
  • (2023)A compositional framework for algebraic quantitative online monitoring over continuous-time signalsInternational Journal on Software Tools for Technology Transfer (STTT)10.1007/s10009-023-00719-w25:4(557-573)Online publication date: 1-Aug-2023
  • (2022)Stream processing with dependency-guided synchronizationProceedings of the 27th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming10.1145/3503221.3508413(1-16)Online publication date: 2-Apr-2022
  • (2022)BumbleBee: Application-aware adaptation for edge-cloud orchestration2022 IEEE/ACM 7th Symposium on Edge Computing (SEC)10.1109/SEC54971.2022.00017(122-135)Online publication date: Dec-2022
  • (2021)ASTRO: Reducing COVID-19 Exposure through Contact Prediction and AvoidanceACM Transactions on Spatial Algorithms and Systems10.1145/34904928:2(1-31)Online publication date: 30-Dec-2021
  • (2021)A Formal Framework for Complex Event RecognitionACM Transactions on Database Systems10.1145/348546346:4(1-49)Online publication date: 8-Dec-2021
  • (2021)Calculating dependently-typed compilers (functional pearl)Proceedings of the ACM on Programming Languages10.1145/34735875:ICFP(1-27)Online publication date: 19-Aug-2021
  • (2021)Automatic amortized resource analysis with the Quantum physicist’s methodProceedings of the ACM on Programming Languages10.1145/34735815:ICFP(1-29)Online publication date: 19-Aug-2021
  • (2021)How to evaluate blame for gradual typesProceedings of the ACM on Programming Languages10.1145/34735735:ICFP(1-29)Online publication date: 19-Aug-2021
  • (2021)An order-aware dataflow model for parallel Unix pipelinesProceedings of the ACM on Programming Languages10.1145/34735705:ICFP(1-28)Online publication date: 19-Aug-2021
  • Show More Cited By

View Options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Get Access

Login options

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media