This document discusses Uber's use of stream processing for their marketplace. It outlines several key use cases including real-time OLAP, complex event processing, and supply positioning. It then describes the challenges of processing large-scale geo-spatial temporal data in near real-time. The document proposes an overall architecture using Apache Kafka for event collection and Apache Samza for event processing. It notes some of the applications that can be built including dashboards, ad-hoc queries, and data visualizations. Finally, it discusses some trade-offs around using Lambda vs Kappa architectures and limitations of Samza.
Streaming Processing in Uber Marketplace for Kafka Summit 2016
45. Natural Choice: Apache Kafka
- Low latency and high throughput
- Persistent events
- Distributes a topic by partitions
- Groups consumers by consumer groups
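The last two bullets can be illustrated with a small, library-free sketch. It is not Kafka client code: `partition_for` and `assign_partitions` are hypothetical helpers, `zlib.crc32` stands in for whatever hash the real partitioner uses, and the round-robin assignment is only one possible strategy. It shows that events with the same key land on the same partition, and that each partition is read by exactly one member of a consumer group.

```python
import zlib
from collections import defaultdict


def partition_for(key: str, num_partitions: int) -> int:
    """Map a key to a partition (crc32 is a stand-in hash, an assumption)."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def assign_partitions(num_partitions: int, consumers: list) -> dict:
    """Spread partitions round-robin across the members of one consumer group."""
    members = sorted(consumers)
    assignment = defaultdict(list)
    for p in range(num_partitions):
        assignment[members[p % len(members)]].append(p)
    return dict(assignment)


if __name__ == "__main__":
    # Events keyed by the same (hypothetical) driver id share a partition.
    print(partition_for("driver-42", 8) == partition_for("driver-42", 8))
    # Two consumers in one group split four partitions between them.
    print(assign_partitions(4, ["consumer-b", "consumer-a"]))
```

Adding a consumer to the group changes only the assignment, not the key-to-partition mapping, which is why per-key ordering survives scaling the group.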
56. Why Apache Samza?
- DAG on Kafka
- Excellent integration with Kafka
- Built-in checkpointing
- Built-in state management
- Excellent support from our data team
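To make the checkpointing and state management bullets concrete, here is a toy sketch of the idea, assuming an in-memory snapshot of the form `(offset, state)`. It does not use Samza's API, and the `CountingTask` class is hypothetical. A task counts events per key, snapshots its offset and state, and a restarted task resumes from the snapshot without double counting replayed events.

```python
class CountingTask:
    """Counts events per key and remembers the last processed offset."""

    def __init__(self, checkpoint=None):
        # A checkpoint is a hypothetical (offset, state) snapshot.
        offset, state = checkpoint if checkpoint else (-1, {})
        self.offset = offset
        self.counts = dict(state)

    def process(self, offset, key):
        if offset <= self.offset:
            return  # already reflected in the restored state; skip the replay
        self.counts[key] = self.counts.get(key, 0) + 1
        self.offset = offset

    def checkpoint(self):
        # Copy the state so later updates do not mutate the snapshot.
        return (self.offset, dict(self.counts))


if __name__ == "__main__":
    log = ["a", "b", "a", "c", "a"]
    task = CountingTask()
    for offset, key in enumerate(log[:3]):
        task.process(offset, key)
    snapshot = task.checkpoint()

    # Simulate a crash: a new task restores the snapshot and replays the log.
    restarted = CountingTask(snapshot)
    for offset, key in enumerate(log):
        restarted.process(offset, key)
    print(restarted.counts)
```

Keeping the offset and the state in the same snapshot is what makes the replay safe in this sketch. A real system has to handle the case where the two are persisted separately.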
81. We Use Lambda
- Spark + HDFS/S3 for batch processing
- Yes, it is painful, but
- We may need to go way back due to changes in business requirements
- Batch processing can run faster; it scales differently
- It was not easy to start a new stream processing instance
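The Lambda split described in this slide can be sketched as a batch view recomputed from full history, plus a speed view over recent events, merged at query time. This is a minimal illustration in plain Python, not Spark or Samza code. The event fields `city` and `ts` and the three helper functions are assumptions made for the example.

```python
from collections import Counter


def batch_view(events, cutoff):
    """Recompute counts from history strictly before the cutoff (batch layer)."""
    return Counter(e["city"] for e in events if e["ts"] < cutoff)


def speed_view(events, cutoff):
    """Count only events at or after the cutoff (stream layer)."""
    return Counter(e["city"] for e in events if e["ts"] >= cutoff)


def query(batch, speed):
    """Serve a merged answer from both layers."""
    return batch + speed


if __name__ == "__main__":
    events = [
        {"city": "sf", "ts": 1},
        {"city": "nyc", "ts": 2},
        {"city": "sf", "ts": 5},
        {"city": "sf", "ts": 7},
    ]
    cutoff = 5
    print(query(batch_view(events, cutoff), speed_view(events, cutoff)))
```

If business requirements change, only `batch_view` needs to be rerun over old data. The cost is maintaining the same logic in two places, which is the pain the slide refers to.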
84. Dealing with Limitation of Samza
- No broadcasting. We have to override SystemStreamPartitionGrouper
- No dynamic topology. Can’t have arbitrary number of nested CEP queries
- Tedious configuration and deployment of jobs. In-house code-gen and deployment solution
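The broadcasting workaround in the first bullet amounts to a custom grouping of stream partitions into tasks. The sketch below only mimics that idea in Python. It is not the Samza `SystemStreamPartitionGrouper` interface, and the function name, the `task-N` naming, and the stream names `trips` and `rules` are all hypothetical. Regular partitions go to one task each, while every partition of a broadcast stream is handed to every task.

```python
from collections import defaultdict


def group_with_broadcast(partitions, broadcast_streams):
    """Group (stream, partition_id) pairs into tasks.

    Non-broadcast partitions map to the task named after their partition id;
    partitions of broadcast streams are added to every task.
    """
    tasks = defaultdict(set)
    broadcast = [sp for sp in partitions if sp[0] in broadcast_streams]
    for stream, pid in partitions:
        if stream not in broadcast_streams:
            tasks[f"task-{pid}"].add((stream, pid))
    for assigned in tasks.values():
        assigned.update(broadcast)
    return dict(tasks)


if __name__ == "__main__":
    partitions = [("trips", 0), ("trips", 1), ("rules", 0)]
    for task, assigned in sorted(group_with_broadcast(partitions, {"rules"}).items()):
        print(task, sorted(assigned))
```

With this grouping, a change published to the broadcast stream reaches every task. That is the kind of behavior needed, for example, to distribute a new query or rule to all workers.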