Real-Time Data Streaming
with Databricks, Spark
and Power BI
Bennie Haelen
Principal Architect – Insight Digital Innovation
Use Case Description
• Large Metropolitan Fire Department
• Implemented an MDW architecture on Azure
• Based upon the Insight repeatable MDW framework architecture
[Figure: MDW architecture diagram. Recoverable labels: sources Oracle and CSV file; storage account ins-swdi-lens-adls with zones DROPZONE, RAW, RAW/Archive, STAGE, and MDW (.parquet); pipelines PL_DATA_ORA_2_ADLS_FULL, PL_MT_raw2stage, PL_MT_stage2mdw, PL_DATA_mdw2asql, and PL_processAAS; resources Ins-swdi-lens-adf, Ins-swdi-lens-asql, Ins-swdi-lens-aas, Ins-swdi-lens-lapp, and Ins-swdi-lens-email-lapp; Azure Automation, Key Vaults, Databricks Hive Databases, Workspace Folders, and Power BI. The legend distinguishes Dataflow from Workflow, and the steps are numbered 1 to 9.]
Use Case Extension
• Need to add a real-time reporting channel
• Up-to-date location & status of equipment
• Location & status of firefighters, EMT personnel
• List of active incidents within the city
• Near real-time Visualization
• Automatically updating dashboard
• Map with automatic updates of locations and incidents
• Used by fire chiefs to make real-time move-up decisions
• Pre-emptively move up equipment & resources
Use Case Analysis
• Forwarding of events through the Azure cloud
• ESB exposes a WebSockets interface
• Azure Function reads events from the ESB through the WebSockets interface
• Function forwards the events to the Azure cloud
• Function is hosted in a Web Application
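The forwarding step described above can be sketched as a small batching loop. This is a minimal sketch under stated assumptions, not the Azure Function from the talk: `forward_events`, the `send_batch` callback, and the batch size are hypothetical names. In a real deployment the messages would arrive from the ESB WebSockets interface, and `send_batch` would wrap an Event Hubs producer client.

```python
import json
from typing import Callable, Iterable, List


def forward_events(
    messages: Iterable[str],
    send_batch: Callable[[List[dict]], None],
    max_batch_size: int = 100,
) -> int:
    """Parse raw JSON messages and forward them in batches.

    Returns the number of events forwarded. Malformed messages are skipped.
    """
    batch: List[dict] = []
    forwarded = 0
    for raw in messages:
        try:
            event = json.loads(raw)
        except json.JSONDecodeError:
            continue  # skip messages that are not valid JSON
        batch.append(event)
        if len(batch) >= max_batch_size:
            send_batch(batch)
            forwarded += len(batch)
            batch = []  # start a fresh list; the sent batch is left untouched
    if batch:  # flush the final partial batch
        send_batch(batch)
        forwarded += len(batch)
    return forwarded
```

Batching keeps the number of calls to the cloud endpoint low at high event rates; whether malformed messages should be dropped or routed to a dead-letter store is a design decision this sketch leaves open.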
Central FD Database: ingests data from the various event sources
Change Data Capture: triggered with each transactional operation
Enterprise Service Bus: ingests CDC events and forwards them to consumers
Solution
• Create Cloud ingest
• Real Time Stream processing
• Performant ACID Data Store
• Real-Time Visualization
Architectural Requirements
• Ingest Event Stream
• High ingestion rate (1000+ events per second)
• Need a high-performance, fault-tolerant service
• Stream events, perform domain-specific conversions
• Need real-time streaming analytics
• Store processed data in a high-performance data store
• Keyed access to the data
• Ability to perform UPSERT operations
• Visualize the data in a real-time dashboard
• Updates triggered by data changes in the underlying data store
Solution Architecture
Ingestion Channel: Azure Event Hubs
Event Processing: Databricks with Spark Structured Streaming
Real-Time Data Store: Databricks Delta Lake
Visualization: Power BI Service Dashboard
Ingest Event Stream
• High ingestion rate (1000+ events per second)
• Need a high-performance, fault-tolerant service
Azure Event Hubs
• Microsoft real-time data ingestion engine
• Can ingest millions of events/second
• Kafka compatibility
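Because Event Hubs exposes a Kafka-compatible endpoint, one way to read it from Spark is through the built-in Kafka source. The helper below is a minimal sketch that only builds the option dictionary; `event_hubs_kafka_options` and the namespace argument are hypothetical, and the talk may well have used the dedicated Event Hubs connector instead.

```python
from typing import Dict


def event_hubs_kafka_options(
    namespace: str,
    event_hub: str,
    connection_string: str,
    login_module: str = "org.apache.kafka.common.security.plain.PlainLoginModule",
) -> Dict[str, str]:
    """Build Spark Kafka-source options for the Event Hubs Kafka endpoint."""
    # Event Hubs authenticates Kafka clients with SASL PLAIN: the user name is
    # the literal string "$ConnectionString" and the password is the
    # namespace or event hub connection string.
    jaas = (
        f"{login_module} required "
        f'username="$ConnectionString" password="{connection_string}";'
    )
    return {
        "kafka.bootstrap.servers": f"{namespace}.servicebus.windows.net:9093",
        "subscribe": event_hub,  # the event hub name plays the role of the topic
        "kafka.security.protocol": "SASL_SSL",
        "kafka.sasl.mechanism": "PLAIN",
        "kafka.sasl.jaas.config": jaas,
    }
```

The result would be passed as `spark.readStream.format("kafka").options(**opts).load()`. On Databricks the Kafka client classes are typically shaded, so `login_module` may need the `kafkashaded.` prefix; treat that as something to verify against your runtime.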
Process Stream
• Continuous Processing
• Real time ingestion
• Micro-batch processing
Databricks on Azure
• Spark Structured Streaming
• Fault-tolerant stream processing engine
• Kafka compatibility
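The micro-batch model listed above can be illustrated without Spark. The sketch below is a plain-Python analogy, not Spark's implementation: `micro_batches` and `run_stream` are hypothetical names, and the handler signature `(batch, batch_id)` simply mirrors the shape of a Structured Streaming `foreachBatch` callback. Spark cuts batches by trigger interval and available data, whereas this sketch cuts them by a fixed event count for simplicity.

```python
from itertools import islice
from typing import Callable, Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def micro_batches(events: Iterable[T], max_events: int) -> Iterator[List[T]]:
    """Yield consecutive chunks of at most max_events items."""
    if max_events < 1:
        raise ValueError("max_events must be at least 1")
    iterator = iter(events)
    while True:
        batch = list(islice(iterator, max_events))
        if not batch:
            return  # the source is exhausted
        yield batch


def run_stream(
    events: Iterable[T],
    handler: Callable[[List[T], int], None],
    max_events: int = 3,
) -> int:
    """Feed each micro-batch to the handler; return the number of batches."""
    batch_id = -1
    for batch_id, batch in enumerate(micro_batches(events, max_events)):
        handler(batch, batch_id)
    return batch_id + 1
```

Processing a small batch at a time is what lets the sink apply one transactional write per batch instead of one per event.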
Real-Time Storage
• Keyed Access to Data
• Ability to perform UPSERTS
• Simple SQL-based access
Delta Lake
• ACID Transactions
• High Scalability
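Keyed access and UPSERTs map onto Delta Lake's `MERGE INTO` statement. The helper below is a minimal sketch that only generates the SQL text; `build_merge_sql` is a hypothetical function, and the column names in the usage note are assumptions, since the deck does not show the table schema.

```python
from typing import Sequence


def build_merge_sql(
    target: str,
    source: str,
    key_columns: Sequence[str],
    value_columns: Sequence[str],
) -> str:
    """Return a MERGE statement that updates matching keys and inserts new ones."""
    if not key_columns or not value_columns:
        raise ValueError("key_columns and value_columns must both be non-empty")
    on_clause = " AND ".join(f"t.{c} = s.{c}" for c in key_columns)
    set_clause = ", ".join(f"t.{c} = s.{c}" for c in value_columns)
    all_columns = list(key_columns) + list(value_columns)
    insert_cols = ", ".join(all_columns)
    insert_vals = ", ".join(f"s.{c}" for c in all_columns)
    return (
        f"MERGE INTO {target} AS t\n"
        f"USING {source} AS s\n"
        f"ON {on_clause}\n"
        f"WHEN MATCHED THEN UPDATE SET {set_clause}\n"
        f"WHEN NOT MATCHED THEN INSERT ({insert_cols}) VALUES ({insert_vals})"
    )
```

For example, `build_merge_sql("old_stream_fd.unit_status", "updates", ["unit_id"], ["status", "latitude", "longitude"])` yields a statement keyed on a hypothetical `unit_id` column. The function does no identifier escaping, so it should only be fed trusted names.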
Real-Time Visualization
• Simple Integration
• Updates through Data Triggers
• DirectQuery into the data source
Microsoft Power BI
• DirectQuery against Delta Lake
• Real-time dashboarding facilities
• Updates triggered through data changes or push datasets
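For the push-dataset route mentioned above, rows are posted to the Power BI REST API. The sketch below only assembles the URL and JSON body and sends nothing; `build_push_rows_request`, the dataset id, the table name, and the row fields are all hypothetical, and a real call would also need an `Authorization: Bearer` header carrying an access token.

```python
import json
from typing import Dict, List, Tuple
from urllib.parse import quote

POWER_BI_API = "https://api.powerbi.com/v1.0/myorg"


def build_push_rows_request(
    dataset_id: str, table_name: str, rows: List[Dict[str, object]]
) -> Tuple[str, bytes]:
    """Return the URL and JSON body for adding rows to a push dataset table."""
    url = (
        f"{POWER_BI_API}/datasets/{quote(dataset_id, safe='')}"
        f"/tables/{quote(table_name, safe='')}/rows"
    )
    # The API expects the new rows wrapped in a top-level "rows" array.
    body = json.dumps({"rows": rows}).encode("utf-8")
    return url, body
```

The returned pair would be sent as an HTTP POST with `Content-Type: application/json`. DirectQuery against the Delta table avoids this extra write path, at the cost of querying the source on each refresh.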
Demo Architecture
• nb-create-unitStatusTable notebook
Invokes the generic CreateDeltaTable with the appropriate parameters to create our UnitStatus table
• nb-create-delta-table notebook
Generic notebook which creates a Delta table
• nb-eventhub-spark-streaming notebook
Reads the events from Event Hubs and invokes the foreachBatch sink function implemented in the nb-unitstatus-event-processor notebook
• nb-unitstatus-event-processor notebook
Processes the events, performs the transformations, and finally updates our UnitStatusTable
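The foreachBatch sink described above could look roughly like the following. This is a minimal sketch, not the code from nb-unitstatus-event-processor: the function name, the `unit_id` key column, and the temp view name are assumptions, and the `DataFrame.sparkSession` property requires Spark 3.3 or later. The target table name old_stream_fd.unit_status is taken from the demo diagram.

```python
# Assumes the micro-batch columns match the target table's columns, so that
# UPDATE SET * and INSERT * can map them by name.
MERGE_SQL = """
MERGE INTO old_stream_fd.unit_status AS t
USING unit_status_updates AS s
ON t.unit_id = s.unit_id
WHEN MATCHED THEN UPDATE SET *
WHEN NOT MATCHED THEN INSERT *
"""


def process_unit_status_batch(batch_df, batch_id):
    """foreachBatch sink: upsert one micro-batch into the Delta table."""
    # MERGE fails when several source rows match the same target row, so keep
    # a single row per unit. dropDuplicates keeps an arbitrary row per key;
    # picking the newest one would need an ordering on an event-time column.
    deduped = batch_df.dropDuplicates(["unit_id"])
    deduped.createOrReplaceTempView("unit_status_updates")
    # Run the MERGE on the batch's own session so the temp view is visible.
    deduped.sparkSession.sql(MERGE_SQL)
```

Wiring it in would look like `stream_df.writeStream.foreachBatch(process_unit_status_batch).start()`, where `stream_df` is the streaming DataFrame read from Event Hubs; a `checkpointLocation` option is normally set as well so the stream can recover after a failure.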
[Figure: demo architecture diagram. Recoverable labels: C# .NET Console Application; Streaming-demo.eventsimulator; Event Hub Units-eh; Databricks notebooks nb-eventhub-spark-streaming, nb-unitstatus-event-processor, nb-create-unit-status-table, and nb-create-delta-table; Create Delta Table unit_status; UPSERTS; Delta Table old_stream_fd.unit_status; Power BI Premium; Power BI Report.]
Demo - Organization
• Creation of Delta Lake Table
• Implementation Resources Walk Through
• Spark Streaming Notebook
• Stream Processor Function
• Demo Run
• Event Simulator
Demo 1 – Infrastructure Walkthrough
Demo 2 – Code Walkthrough
Demo 3 – Sample Run
Summary
• The need for large-scale real-time stream processing becomes more evident every day
• Provides organizations with the ability to respond quickly to a dynamic business climate
• Spark Structured Streaming makes it easy to add a real-time channel
• Simple extensions on top of Spark SQL
Feedback
Your feedback is important to us.
Don’t forget to rate and review the sessions.
