A catalog of stream processing optimizations

Published: 01 March 2014

    Abstract

    Various research communities have independently arrived at stream processing as a programming model for efficient and parallel computing. These communities include digital signal processing, databases, operating systems, and complex event processing. Since each community faces applications with challenging performance requirements, each of them has developed some of the same optimizations, but often with conflicting terminology and unstated assumptions. This article presents a survey of optimizations for stream processing. It is aimed both at users who need to understand and guide the system’s optimizer and at implementers who need to make engineering tradeoffs. To consolidate terminology, this article is organized as a catalog, in a style similar to catalogs of design patterns or refactorings. To make assumptions explicit and help understand tradeoffs, each optimization is presented with its safety constraints (when does it preserve correctness?) and a profitability experiment (when does it improve performance?). We hope that this survey will help future streaming system builders to stand on the shoulders of giants from not just their own community.

    Published In

    ACM Computing Surveys  Volume 46, Issue 4
    April 2014
    463 pages
    ISSN:0360-0300
    EISSN:1557-7341
    DOI:10.1145/2597757
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from ACM.

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 01 March 2014
    Accepted: 01 September 2013
    Revised: 01 January 2013
    Received: 01 February 2012
    Published in CSUR Volume 46, Issue 4

    Author Tags

    1. Stream processing
    2. optimizations

    Qualifiers

    • Research-article
    • Research
    • Refereed
