DOI: 10.1007/978-3-319-07782-6_6
Article

Operator Scale Out Using Time Utility Function in Big Data Stream Processing

Published: 23 June 2014

Abstract

Many important big data applications require real-time processing of arriving data with high scalability, especially IoT applications in which devices generate infinite data and environments are intrinsically volatile. Most current Stream Processing Systems (SPS), such as Storm or S4, often scale poorly because their architecture is based on static configurations. Although considerable research and industry effort has been invested in scaling out operators in SPS, most of that work focuses on how to scale out different types of operators on an on-demand infrastructure. Little of it considers when to scale out and which operators to scale out, even though an improper scale-out may introduce extra overhead to the system. In this paper, we present a novel approach that finds the bottleneck operator at run time and scales out only that operator. We design an algorithm that identifies the bottleneck operator based on a time utility function (TUF) model. The algorithm uses utility profit, utility penalty, and a utility threshold to evaluate the utility accrual of a run-time operator. By rewarding early completions and penalizing missed deadlines, the algorithm scales out an operator when its utility accrual falls below the threshold. Experimental results show that our time-aware utility accrual approach can precisely identify and efficiently scale out the bottleneck operator at run time in a data stream processing system.
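
The idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' algorithm: the abstract does not give the shape of the TUF or the exact accrual formula, so the linear-decay profit, the fixed penalty, the mean-based accrual, and every name below (`TimeUtilityFunction`, `utility_accrual`, `find_bottleneck`) are assumptions made only for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class TimeUtilityFunction:
    """Hypothetical TUF: linear profit before the deadline, fixed penalty after.

    The real TUF used in the paper is not specified in the abstract.
    """
    deadline: float    # assumed per-tuple deadline, in seconds
    max_profit: float  # assumed utility for an instantaneous completion
    penalty: float     # assumed utility lost when the deadline is missed

    def utility(self, latency: float) -> float:
        if latency <= self.deadline:
            # Early completions are rewarded: profit decays linearly
            # from max_profit to zero as latency approaches the deadline.
            return self.max_profit * (1.0 - latency / self.deadline)
        # Missing the deadline is penalized with a negative utility.
        return -self.penalty


def utility_accrual(tuf: TimeUtilityFunction, latencies: List[float]) -> float:
    """Mean utility over the observed latencies (an assumed accrual measure)."""
    return sum(tuf.utility(latency) for latency in latencies) / len(latencies)


def find_bottleneck(
    samples: Dict[str, List[float]],
    tuf: TimeUtilityFunction,
    threshold: float,
) -> Optional[str]:
    """Return the operator with the lowest accrual below the threshold, or None."""
    worst_name: Optional[str] = None
    worst_value = threshold
    for name, latencies in samples.items():
        if not latencies:
            continue  # no observations yet, so nothing to judge
        value = utility_accrual(tuf, latencies)
        if value < worst_value:
            worst_name, worst_value = name, value
    return worst_name
```

With a deadline of 1.0 s, a maximum profit of 1.0, a penalty of 2.0, and a threshold of 0.3, an operator whose tuples mostly miss the deadline ends up with a negative accrual and is the one selected for scale-out, while operators that finish early stay above the threshold and are left alone.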

Cited By

  • (2020) Resource Management and Scheduling in Distributed Stream Processing Systems. ACM Computing Surveys 53:3 (1-41). DOI: 10.1145/3355399. Online publication date: 28-May-2020

Published In

WASA 2014: Proceedings of the 9th International Conference on Wireless Algorithms, Systems, and Applications - Volume 8491
June 2014
803 pages
ISBN:9783319077819
Editors: Zhipeng Cai, Chaokun Wang, Siyao Cheng, Hongzhi Wang, Hong Gao

Publisher

Springer-Verlag

Berlin, Heidelberg

Author Tags

  1. Big data
  2. Scalability
  3. Stream processing
  4. Utility accrual

Qualifiers

  • Article
