Abstract
A data stream is a continuous, rapid, time-varying sequence of data elements that must be processed online. Such processing is studied in the context of Data Stream Management Systems (DSMSs). Single-processor DSMSs cannot properly satisfy the requirements of data stream applications; their main shortcomings are high tuple latency, tuple loss, and limited throughput. In our previous publications, we introduced parallel execution of continuous queries to overcome these problems by improving performance, especially in terms of tuple latency. There, operators were scheduled in an event-driven manner, which reduced system performance in the periods between consecutive scheduling instances.
In this paper, a continuous scheduling method (dispatching) is presented that is more compatible with the continuous nature of data streams and queries, with the aim of improving system adaptivity and performance. In a multiprocessing environment, the dispatching method has each processing node (logical machine) send its partially processed tuples to the next machine with the minimum workload, which then executes the next operator on them. Operator scheduling is thus done continuously and dynamically for each tuple processed by each operator. The dispatching method is described and formally presented, and its correctness is proved. It is also modeled with Petri nets and evaluated via simulation. Results show that the dispatching method significantly improves system performance in terms of tuple latency, throughput, and tuple loss. Furthermore, the fluctuation of system performance parameters (under variation of system and stream characteristics) diminishes considerably, which leads to high adaptivity to the underlying system.
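The per-tuple dispatching idea can be illustrated with a minimal sketch. It is not the paper's formal model: the names `Machine`, `dispatch`, and `run` are hypothetical, workload is assumed to be the length of a machine's pending queue, and the machines are simulated in a single round-robin loop rather than running in parallel.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, List, Optional, Tuple


@dataclass
class Machine:
    """A logical machine holding pending (tuple, operator index) pairs."""
    name: str
    queue: List[Tuple[Any, int]] = field(default_factory=list)

    @property
    def workload(self) -> int:
        # Assumption: workload is approximated by the queue length.
        return len(self.queue)


def dispatch(item: Any, op_index: int, machines: List[Machine]) -> Machine:
    """Send a tuple to the machine that currently has the minimum workload."""
    target = min(machines, key=lambda m: m.workload)
    target.queue.append((item, op_index))
    return target


def run(stream, operators: List[Callable[[Any], Optional[Any]]],
        machines: List[Machine]) -> List[Any]:
    """Process every tuple through the operator chain, dispatching per tuple."""
    results = []
    for item in stream:
        dispatch(item, 0, machines)
    # Round-robin simulation of the machines; a real system runs them in parallel.
    while any(m.queue for m in machines):
        for machine in machines:
            if not machine.queue:
                continue
            item, op_index = machine.queue.pop(0)
            out = operators[op_index](item)
            if out is None:
                continue  # the operator filtered the tuple out
            if op_index + 1 < len(operators):
                # Scheduling decision is made again for each partially processed tuple.
                dispatch(out, op_index + 1, machines)
            else:
                results.append(out)
    return results


if __name__ == "__main__":
    ops = [lambda x: x if x % 2 == 0 else None,  # select even values
           lambda x: x * 10]                     # map
    print(sorted(run(range(6), ops, [Machine("m1"), Machine("m2")])))
```

In this sketch the placement decision is repeated after every operator, so no operator is bound to a fixed machine between scheduling instances; that is the property the abstract contrasts with event-driven scheduling.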
Cite this article
Safaei, A.A., Haghjoo, M.S. Dispatching stream operators in parallel execution of continuous queries. J Supercomput 61, 619–641 (2012). https://doi.org/10.1007/s11227-011-0621-5
DOI: https://doi.org/10.1007/s11227-011-0621-5