Abstract
Supporting transactional properties and fault tolerance is key to applying stream processing to industry-scale applications; however, the resulting latency overhead must be minimized to accommodate real-time analytics. This issue has been studied in various contexts. In this work we develop a backtrack failure recovery mechanism that allows a task to roll forward in the failure-free case without waiting for acknowledgements from its downstream target tasks, and to request that its upstream source tasks resend missing tuples only during failure recovery. Because failure is the rare case, recovery has limited impact on overall performance. To reduce latency further, we extend our solution in another dimension by applying the notion of optimistic checkpointing to stream processing. We propose the Continued stream processing with Window-based Checkpoint and Recovery (CWCR) approach, which allows a task to emit results continuously, tuple by tuple, but to checkpoint in batch and acknowledge only once per window (e.g. a time window). We also tackle the hard problems found in implementing a transactional layer on top of an existing stream processing platform. We have implemented the proposed mechanisms on Fontainebleau, the distributed stream analytics infrastructure we built on top of the open-source Storm platform. Our experimental results illustrate the novelty of the proposed techniques and the feasibility of supporting fault tolerance with minimal latency overhead for real-time stream processing.
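The interplay of the two ideas in the abstract can be illustrated with a toy, single-process simulation. This is only a sketch under stated assumptions, not the Fontainebleau or Storm implementation: the class names `Source` and `SumTask`, the sequence-number scheme, the count-based window, and the running-sum operator are all hypothetical and chosen for illustration. The upstream task keeps emitted tuples until the downstream task acknowledges a whole window; the downstream task emits a result per tuple, checkpoints once per window, and on recovery backtracks to its upstream task for the tuples it has lost.

```python
class Source:
    """Hypothetical upstream task: retains emitted tuples until a window is acked."""

    def __init__(self):
        self.log = {}        # seq -> value, kept for a possible resend
        self.next_seq = 0

    def emit(self, value):
        seq = self.next_seq
        self.log[seq] = value
        self.next_seq += 1
        return seq, value    # rolls forward; does not wait for an ack

    def ack(self, upto):
        # Downstream has checkpointed every tuple with seq < upto; drop them.
        for seq in [s for s in self.log if s < upto]:
            del self.log[seq]

    def resend(self, from_seq):
        # Backtrack request: replay retained tuples starting at from_seq.
        return [(s, self.log[s]) for s in sorted(self.log) if s >= from_seq]


class SumTask:
    """Hypothetical downstream task computing a running sum."""

    def __init__(self, source, window_size):
        self.source = source
        self.window_size = window_size   # count-based window (an assumption)
        self.total = 0
        self.next_expected = 0
        self.checkpoint = (0, 0)         # (total, next_expected)
        self.emitted = {}                # seq -> result; keyed so replays are idempotent

    def process(self, seq, value):
        if seq != self.next_expected:
            return                       # ignore duplicates in this toy model
        self.total += value
        self.next_expected += 1
        self.emitted[seq] = self.total   # emit tuple by tuple
        if self.next_expected % self.window_size == 0:
            # Checkpoint and acknowledge once per window, not once per tuple.
            self.checkpoint = (self.total, self.next_expected)
            self.source.ack(self.next_expected)

    def crash_and_recover(self):
        # Restore the last window checkpoint, then ask upstream for the rest.
        self.total, self.next_expected = self.checkpoint
        for seq, value in self.source.resend(self.next_expected):
            self.process(seq, value)


def run_demo():
    source = Source()
    task = SumTask(source, window_size=2)
    for value in [1, 2, 3, 4, 5]:
        task.process(*source.emit(value))
    task.crash_and_recover()             # failure after the fifth tuple
    return source, task
```

In `run_demo`, the failure occurs after two full windows have been checkpointed, so only the fifth tuple (sequence number 4) is still held upstream and replayed. Keying `emitted` by sequence number stands in for whatever duplicate suppression a real downstream consumer would need; the abstract does not specify that mechanism.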
Copyright information
© 2015 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Chen, Q., Hsu, M., Castellanos, M. (2015). Backtrack-Based and Window-Oriented Optimistic Failure Recovery in Distributed Stream Processing. In: Castellanos, M., Dayal, U., Pedersen, T., Tatbul, N. (eds) Enabling Real-Time Business Intelligence. BIRTE 2013, BIRTE 2014. Lecture Notes in Business Information Processing, vol 206. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-662-46839-5_4
DOI: https://doi.org/10.1007/978-3-662-46839-5_4
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-662-46838-8
Online ISBN: 978-3-662-46839-5
eBook Packages: Computer Science (R0)