DOI: 10.1145/2465351.2465353
research-article

TimeStream: reliable stream computation in the cloud

Published: 15 April 2013

Abstract

TimeStream is a distributed system designed specifically for low-latency continuous processing of big streaming data on a large cluster of commodity machines. The unique characteristics of this emerging application domain have led to a significantly different design from the popular MapReduce-style batch data processing. In particular, we advocate a powerful new abstraction called resilient substitution that caters to the specific needs in this new computation model to handle failure recovery and dynamic reconfiguration in response to load changes. Several real-world applications running on our prototype have been shown to scale robustly with low latency while at the same time maintaining the simple and concise declarative programming model. TimeStream handles an on-line advertising aggregation pipeline at a rate of 700,000 URLs per second with a 2-second delay, while performing sentiment analysis of Twitter data at a peak rate close to 10,000 tweets per second, with approximately 2-second delay.


Published In

EuroSys '13: Proceedings of the 8th ACM European Conference on Computer Systems
April 2013
401 pages
ISBN:9781450319942
DOI:10.1145/2465351

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. StreamInsight
  2. cluster computing
  3. distributed stream processing
  4. dynamic reconfiguration
  5. fault-tolerance
  6. real-time
  7. resilient substitution

Qualifiers

  • Research-article

Conference

EuroSys '13: Eighth EuroSys Conference 2013
April 15 - 17, 2013
Prague, Czech Republic

Acceptance Rates

EuroSys '13 Paper Acceptance Rate: 28 of 143 submissions, 20%
Overall Acceptance Rate: 241 of 1,308 submissions, 18%

Cited By

  • (2024) Storm-Based Scheduling Method for Streaming Computing Engine. 2024 Prognostics and System Health Management Conference (PHM) (20-28). DOI: 10.1109/PHM61473.2024.00012. Online publication date: 28-May-2024
  • (2023) StreamOps: Cloud-Native Runtime Management for Streaming Services in ByteDance. Proceedings of the VLDB Endowment 16:12 (3501-3514). DOI: 10.14778/3611540.3611543. Online publication date: 1-Aug-2023
  • (2023) Adaptive Fragment-Based Parallel State Recovery for Stream Processing Systems. IEEE Transactions on Parallel and Distributed Systems 34:8 (2464-2478). DOI: 10.1109/TPDS.2023.3251997. Online publication date: Aug-2023
  • (2023) Interminable Flows: A Generic, Joint, Customizable Resiliency Model for Big-Data Streaming Platforms. IEEE Access 11 (10762-10776). DOI: 10.1109/ACCESS.2023.3239365. Online publication date: 2023
  • (2023) State of the art on quality control for data streams: A systematic literature review. Computer Science Review 48 (100554). DOI: 10.1016/j.cosrev.2023.100554. Online publication date: May-2023
  • (2023) A survey on the evolution of stream processing systems. The VLDB Journal 33:2 (507-541). DOI: 10.1007/s00778-023-00819-8. Online publication date: 22-Nov-2023
  • (2022) Programmable Implementation and Blockchain Security Scheme Based on Edge Computing Firework Model. Research Anthology on Edge Computing Protocols, Applications, and Integration (480-503). DOI: 10.4018/978-1-6684-5700-9.ch024. Online publication date: 1-Apr-2022
  • (2022) Scabbard. Proceedings of the VLDB Endowment 15:2 (361-374). DOI: 10.14778/3489496.3489515. Online publication date: 4-Feb-2022
  • (2022) SWAN. Proceedings of the 13th ACM SIGOPS Asia-Pacific Workshop on Systems (78-84). DOI: 10.1145/3546591.3547524. Online publication date: 23-Aug-2022
  • (2022) An Adaptive Energy-Aware Stochastic Task Execution Algorithm in Virtualized Networked Datacenters. IEEE Transactions on Sustainable Computing 7:2 (371-385). DOI: 10.1109/TSUSC.2021.3115388. Online publication date: 1-Apr-2022
