Dispatching stream operators in parallel execution of continuous queries

The Journal of Supercomputing

Abstract

A data stream is a continuous, rapid, time-varying sequence of data elements that must be processed online. Such processing is the subject of research on Data Stream Management Systems (DSMSs). Single-processor DSMSs cannot properly satisfy the requirements of data stream applications; their main shortcomings concern tuple latency, tuple loss, and throughput. In our previous publications, we introduced parallel execution of continuous queries to overcome these problems by improving performance, especially in terms of tuple latency. There, operators were scheduled in an event-driven manner, which reduced system performance in the periods between consecutive scheduling instances.

In this paper, we present a continuous scheduling method (dispatching) that is more compatible with the continuous nature of data streams and queries, and that improves system adaptivity and performance. In a multiprocessing environment, the dispatching method makes each processing node (logical machine) send its partially processed tuples to the next machine with the minimum workload, which then executes the next operator on them. Operator scheduling is thus performed continuously and dynamically for each tuple processed by each operator. The dispatching method is described and formally presented, and its correctness is proved. It is also modeled in Petri nets and evaluated via simulation. The results show that the dispatching method significantly improves system performance in terms of tuple latency, throughput, and tuple loss. Furthermore, the fluctuation of system performance parameters under varying system and stream characteristics diminishes considerably, which leads to high adaptivity to the underlying system.
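The per-tuple dispatching idea described above can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes that a query is a linear chain of operators, that a machine's workload is approximated by its queue length, and that ties go to the first machine in the list. The names `Machine`, `dispatch`, and `run` are hypothetical.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Callable, Deque, Iterable, List, Optional, Tuple

# An operator maps a tuple to an output tuple, or to None to drop it (e.g. a filter).
Operator = Callable[[object], Optional[object]]


@dataclass
class Machine:
    """A logical machine. Workload is approximated by queue length (assumption)."""
    name: str
    queue: Deque[Tuple[object, int]] = field(default_factory=deque)

    @property
    def workload(self) -> int:
        return len(self.queue)


def dispatch(tup: object, next_op: int, machines: List[Machine]) -> Machine:
    """Send a partially processed tuple to the machine with the minimum workload."""
    target = min(machines, key=lambda m: m.workload)  # ties: first in list
    target.queue.append((tup, next_op))
    return target


def run(operators: List[Operator], machines: List[Machine],
        source: Iterable[object]) -> List[object]:
    """Push every source tuple through the operator chain, re-dispatching after each operator."""
    results: List[object] = []
    for tup in source:
        dispatch(tup, 0, machines)
    # Each machine processes one queued tuple per round until all queues are empty.
    while any(m.queue for m in machines):
        for m in machines:
            if not m.queue:
                continue
            tup, i = m.queue.popleft()
            out = operators[i](tup)
            if out is None:                # tuple dropped by this operator
                continue
            if i + 1 == len(operators):    # last operator: emit the result
                results.append(out)
            else:                          # scheduling decision made per tuple
                dispatch(out, i + 1, machines)
    return results
```

For example, `run([lambda x: x if x % 2 == 0 else None, lambda x: x * 10], [Machine("m1"), Machine("m2")], range(6))` filters even values and scales them. The scheduling decision is made inside `dispatch` for every tuple after every operator, rather than at separate scheduling instants as in the event-driven approach.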

Author information

Corresponding author

Correspondence to Ali A. Safaei.

About this article

Cite this article

Safaei, A.A., Haghjoo, M.S. Dispatching stream operators in parallel execution of continuous queries. J Supercomput 61, 619–641 (2012). https://doi.org/10.1007/s11227-011-0621-5

  • DOI: https://doi.org/10.1007/s11227-011-0621-5
