Storm is an open source, distributed real-time computation system for building streaming data pipelines that process unbounded streams of data reliably. Its key features: it is distributed and fault-tolerant, it guarantees message processing, and it provides a high-level abstraction over message passing.
2. Basic info
• Open sourced September 19th, 2011
• Implementation is 12,000 lines of code
• Used by over 25 companies
• >2280 watchers on GitHub (most watched JVM project)
• Very active mailing list
  • >1700 messages
  • >520 members
32. Stream grouping
• Shuffle grouping: pick a random task
• Fields grouping: mod hashing on a subset of tuple fields
• All grouping: send to all tasks
• Global grouping: pick task with lowest id
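
The four groupings above can be sketched as plain functions that map a tuple to the list of task ids that should receive it. This is a minimal illustration, not Storm's actual implementation: the function names, the dict representation of a tuple, and the use of CRC32 as the hash are all assumptions made for the example.

```python
import random
import zlib


def shuffle_grouping(tup, task_ids, rng=random):
    """Shuffle grouping: pick a random task."""
    return [rng.choice(task_ids)]


def fields_grouping(tup, task_ids, fields):
    """Fields grouping: hash the selected fields, then take the hash
    modulo the number of tasks. CRC32 is an assumption chosen because
    it is stable across runs; it is not claimed to be Storm's hash."""
    key = repr(tuple(tup[f] for f in fields)).encode("utf-8")
    return [task_ids[zlib.crc32(key) % len(task_ids)]]


def all_grouping(tup, task_ids):
    """All grouping: send to all tasks."""
    return list(task_ids)


def global_grouping(tup, task_ids):
    """Global grouping: pick the task with the lowest id."""
    return [min(task_ids)]


if __name__ == "__main__":
    tasks = [3, 1, 2]
    first = fields_grouping({"word": "storm", "count": 1}, tasks, ["word"])
    second = fields_grouping({"word": "storm", "count": 99}, tasks, ["word"])
    # Tuples that agree on the grouped field always land on the same task.
    print(first == second)
    print(global_grouping({}, tasks))
```

The property that matters for fields grouping is that tuples with equal values in the chosen fields always reach the same task, which is what makes per-key state such as running counts possible.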
78. Transactional topologies
• Will be available in next version of Storm (0.7.0)
• Requires a source queue that can replay identical batches of messages
• Aiming for first TransactionalSpout implementation to use Kafka
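
The replay requirement can be illustrated with a small sketch: if the source is an offset-addressed log and the offset range of each batch is recorded against its transaction id, then re-requesting a transaction returns exactly the same messages. Everything here is hypothetical (`ReplayableQueue`, `BatchCoordinator`, `emit_batch`); it is not the TransactionalSpout interface or a Kafka client, only one way such a source could behave.

```python
class ReplayableQueue:
    """Hypothetical in-memory source whose messages are addressed by offset."""

    def __init__(self, messages):
        self._log = list(messages)

    def read(self, offset, size):
        # Reading never consumes messages, so any range can be read again.
        return self._log[offset:offset + size]


class BatchCoordinator:
    """Hypothetical helper that pins each transaction id to a fixed range."""

    def __init__(self, queue, batch_size):
        self._queue = queue
        self._batch_size = batch_size
        self._ranges = {}  # txid -> (offset, size)
        self._next_offset = 0

    def emit_batch(self, txid):
        # The first request for a txid assigns the next range of the log;
        # later requests for the same txid replay that identical range.
        if txid not in self._ranges:
            self._ranges[txid] = (self._next_offset, self._batch_size)
            self._next_offset += self._batch_size
        offset, size = self._ranges[txid]
        return self._queue.read(offset, size)


if __name__ == "__main__":
    coordinator = BatchCoordinator(ReplayableQueue("abcdef"), batch_size=2)
    original = coordinator.emit_batch(1)
    coordinator.emit_batch(2)
    # Replaying transaction 1 after transaction 2 yields the same batch.
    print(coordinator.emit_batch(1) == original)
```

A queue that deletes messages on read could not support this, which is why the slide names replay of identical batches as a requirement on the source.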