This document discusses using Databricks, Spark, and Power BI for real-time data streaming. It describes a use case of a fire department needing real-time reporting of equipment locations, personnel statuses, and active incidents. The solution involves ingesting event data using Azure Event Hubs, processing the stream using Databricks and Spark Structured Streaming, storing the results in Delta Lake, and visualizing the data in Power BI dashboards. It then demonstrates the architecture by walking through creating Delta tables, streaming from Event Hubs to Delta Lake, and running a sample event simulator.
Build Real-Time Applications with Databricks Streaming
1. Real-Time Data Streaming with Databricks, Spark and Power BI
Bennie Haelen
Principal Architect – Insight Digital Innovation
2. Use Case Description
• Large Metropolitan Fire Department
• Implemented a Modern Data Warehouse (MDW) architecture on Azure
• Based on Insight's repeatable MDW framework architecture
[Diagram: Insight MDW framework on Azure, with numbered steps 1-9. Sources: Oracle and CSV files in a DROPZONE. Storage account ins-swdi-lens-adls with RAW/Archive, STAGE, and MDW zones (.parquet). Data Factory Ins-swdi-lens-adf pipelines: PL_DATA_ORA_2_ADLS_FULL, PL_MT_raw2stage, PL_MT_stage2mdw, PL_DATA_mdw2asql, PL_processAAS. Databricks workspace folders and Hive databases; Ins-swdi-lens-asql; Ins-swdi-lens-aas; Power BI. Supporting services: Azure Automation, Ins-swdi-lens-lapp, Ins-swdi-lens-email-lapp, Key Vaults. Legend: Dataflow, Workflow.]
3. Use Case Extension
• Need to add a real-time reporting channel
• Up-to-date location & status of equipment
• Location & status of firefighters, EMT personnel
• List of active incidents within the city
• Near real-time visualization
• Automatically updating dashboard
• Map with automatic updates of locations and incidents
• Used by fire chiefs to make real-time move-up decisions
• Pre-emptively move up equipment and resources
4. Use Case Analysis
• Forwarding of events through the Azure cloud
• The Enterprise Service Bus (ESB) exposes a WebSocket interface
• An Azure Function reads events from the ESB through the WebSocket interface
• The function forwards the events to the Azure cloud
• The function is hosted in a web application
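The forwarding step can be sketched as follows, assuming the function collects events from the WebSocket feed and sends them in size-bounded batches. The `forward_events` function, the `send_batch` callback, and the 1 MB limit are hypothetical placeholders: in a real function the callback would wrap an Event Hubs producer (the `azure-eventhub` SDK's `EventHubProducerClient` provides `create_batch()` and `send_batch()`), and the actual size limit depends on the Event Hubs tier.

```python
import json
from typing import Callable, Iterable, List

# Assumed per-batch byte limit; the real limit depends on the Event Hubs tier.
MAX_BATCH_BYTES = 1024 * 1024


def forward_events(
    events: Iterable[dict],
    send_batch: Callable[[List[bytes]], None],
    max_batch_bytes: int = MAX_BATCH_BYTES,
) -> int:
    """Serialize ESB events to JSON and forward them in size-bounded batches.

    send_batch is a hypothetical stand-in for the Event Hubs producer call.
    Returns the number of batches sent.
    """
    batch: List[bytes] = []
    batch_bytes = 0
    batches_sent = 0
    for event in events:
        payload = json.dumps(event).encode("utf-8")
        if len(payload) > max_batch_bytes:
            raise ValueError("single event exceeds the batch size limit")
        # Flush the current batch before adding this payload would overflow it.
        if batch and batch_bytes + len(payload) > max_batch_bytes:
            send_batch(batch)
            batches_sent += 1
            batch, batch_bytes = [], 0
        batch.append(payload)
        batch_bytes += len(payload)
    if batch:  # flush whatever remains when the input ends
        send_batch(batch)
        batches_sent += 1
    return batches_sent
```

Batching amortizes the per-call network overhead, which matters at the ingestion rates listed in the requirements.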
[Diagram: Central FD database (ingests data from the various event sources) → Change Data Capture (triggered with each transactional operation) → Enterprise Service Bus (ingests CDC events and forwards them to consumers).]
Solution
• Cloud ingestion
• Real-time stream processing
• High-performance ACID data store
• Real-time visualization
5. Architectural Requirements
• Ingest event stream
• High ingestion rate (1000+ events per second)
• Need a high-performance, fault-tolerant service
• Stream events and perform domain-specific conversions
• Need real-time streaming analytics
• Store processed data in a high-performance data store
• Keyed access to the data
• Ability to perform UPSERT operations
• Visualize the data in a real-time dashboard
• Updates triggered by data changes in the underlying data store
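To put the 1000+ events per second requirement in perspective, a rough Event Hubs sizing calculation can be sketched as follows. It assumes the Standard-tier rule that one throughput unit (TU) allows 1 MB/s or 1,000 events/s of ingress, whichever limit is reached first; tier limits change over time, so verify them against current Azure documentation. The function name is hypothetical.

```python
import math


def throughput_units(events_per_sec: float, avg_event_bytes: int) -> int:
    """Estimate Standard-tier Event Hubs throughput units for ingress.

    Assumes 1 TU = 1 MB/s or 1,000 events/s of ingress, whichever is hit first.
    1 MB is treated as 10**6 bytes, which errs on the side of more TUs.
    """
    by_count = math.ceil(events_per_sec / 1000)
    by_bytes = math.ceil(events_per_sec * avg_event_bytes / 1_000_000)
    return max(1, by_count, by_bytes)
```

For example, 1,500 small events per second needs two TUs because of the event-count limit, while 1,000 events of 2 KB each also needs two TUs because of the byte limit.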
6. Solution Architecture
Ingestion Channel: Azure Event Hubs → Event Processing: Databricks with Spark Structured Streaming → Real-Time Data Store: Databricks Delta Lake → Visualization: Power BI Service Dashboard
Ingest Event Stream
• High ingestion rate (1000+ events per second)
• Need a high-performance, fault-tolerant service
Azure Event Hubs
• Microsoft's real-time data ingestion engine
• Can ingest millions of events per second
• Kafka compatibility
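Because Event Hubs exposes a Kafka-compatible endpoint, Spark's built-in Kafka source can read from it. The sketch below only builds the option map; it assumes a hypothetical namespace and that the event hub (for example `units-eh`) serves as the Kafka topic. It is not necessarily how the demo connects, since the dedicated Event Hubs connector for Spark is another option. The `kafkashaded.` prefix on the login module applies to Databricks runtimes; open-source Spark uses `org.apache.kafka.common.security.plain.PlainLoginModule`.

```python
def eventhubs_kafka_options(namespace: str, connection_string: str, topic: str) -> dict:
    """Build Spark Kafka-source options for an Event Hubs Kafka endpoint.

    namespace and topic are hypothetical; the connection string should come
    from a secret store rather than being hard-coded.
    """
    # Event Hubs authenticates Kafka clients with SASL PLAIN, using the literal
    # user name "$ConnectionString" and the namespace connection string as password.
    jaas = (
        "kafkashaded.org.apache.kafka.common.security.plain.PlainLoginModule required "
        f'username="$ConnectionString" password="{connection_string}";'
    )
    return {
        "kafka.bootstrap.servers": f"{namespace}.servicebus.windows.net:9093",
        "kafka.security.protocol": "SASL_SSL",
        "kafka.sasl.mechanism": "PLAIN",
        "kafka.sasl.jaas.config": jaas,
        "subscribe": topic,  # the event hub name acts as the Kafka topic
        "startingOffsets": "latest",
    }
```

On Databricks the map would be used as `spark.readStream.format("kafka").options(**opts).load()`, with the connection string fetched via `dbutils.secrets.get(...)`.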
Process Stream
• Continuous processing
• Real-time ingestion
• Micro-batch processing
Databricks on Azure
• Spark Structured Streaming
• Fault-tolerant stream processing engine
• Kafka compatibility
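Micro-batch fault tolerance rests on committing progress only after a batch has been processed. The plain-Python model below is not Spark code: it assumes a hypothetical in-memory event log and a dict standing in for the checkpoint location, and shows why a crash replays the uncommitted batch instead of losing it.

```python
from typing import Callable, List, Sequence


def run_micro_batches(
    log: Sequence[dict],
    checkpoint: dict,
    process: Callable[[List[dict], int], None],
    max_per_batch: int = 100,
) -> None:
    """Model of micro-batch processing with a committed-offset checkpoint.

    Each batch covers log[start:end]. The offset is committed only after
    process() returns, so an exception leaves the batch to be replayed.
    """
    while checkpoint.get("offset", 0) < len(log):
        start = checkpoint.get("offset", 0)
        end = min(start + max_per_batch, len(log))
        batch_id = checkpoint.get("batch_id", 0)
        process(list(log[start:end]), batch_id)
        # Commit progress only after the batch succeeded.
        checkpoint["offset"] = end
        checkpoint["batch_id"] = batch_id + 1
```

Because a replayed batch is processed again, the sink should be idempotent; a keyed UPSERT such as Delta Lake's MERGE has that property.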
Real-Time Storage
• Keyed access to data
• Ability to perform UPSERTs
• Simple SQL-based access
Delta Lake
• ACID transactions
• High scalability
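The UPSERT requirement maps to Delta Lake's `MERGE INTO` statement. The sketch below models its matched/not-matched semantics on an in-memory dict keyed by a hypothetical `unit_id` column, with the corresponding SQL shown in a comment; the table and column names are illustrative only.

```python
from typing import Dict, Iterable

# Corresponding Delta Lake SQL (hypothetical table and column names):
#   MERGE INTO unit_status AS t
#   USING updates AS s
#   ON t.unit_id = s.unit_id
#   WHEN MATCHED THEN UPDATE SET *
#   WHEN NOT MATCHED THEN INSERT *


def upsert(table: Dict[str, dict], updates: Iterable[dict], key: str = "unit_id") -> Dict[str, int]:
    """Apply MERGE-style UPSERT semantics to an in-memory keyed table."""
    counts = {"updated": 0, "inserted": 0}
    for row in updates:
        k = row[key]
        if k in table:
            table[k].update(row)  # WHEN MATCHED THEN UPDATE SET *
            counts["updated"] += 1
        else:
            table[k] = dict(row)  # WHEN NOT MATCHED THEN INSERT *
            counts["inserted"] += 1
    return counts
```

One difference: this model applies duplicate keys in order, whereas Delta's MERGE raises an error when several source rows match the same target row, so each micro-batch should be reduced to one row per key before merging.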
Real-Time Visualization
• Simple integration
• Updates through data triggers
• DirectQuery into the data source
Microsoft Power BI
• DirectQuery against Delta Lake
• Real-time dashboarding facilities
• Updates triggered by data changes or push datasets
7. Demo Architecture
• nb-create-unitStatusTable notebook
Invokes the generic CreateDeltaTable notebook with the appropriate parameters to create our UnitStatus table
• nb-create-delta-table notebook
Generic notebook that creates a Delta table
• nb-eventhub-spark-streaming notebook
Reads the events from Event Hubs and invokes the foreachBatch sink function implemented in the nb-unitstatus-event-processor notebook
• nb-unitstatus-event-processor notebook
Processes the events, performs the transformations, and finally updates our UnitStatusTable
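A foreachBatch sink receives each micro-batch together with a batch ID; in PySpark it is registered with `stream.writeStream.foreachBatch(process_batch)`. The stdlib model below sketches what such a function might do. It is not the code of nb-unitstatus-event-processor, and the field names (`unit_id`, `status`, `event_time`) are hypothetical. It keeps the newest event per unit and ignores events older than the stored row, a guard a Delta MERGE can express as `WHEN MATCHED AND s.event_time >= t.event_time THEN UPDATE`.

```python
from typing import Dict, List


def latest_per_unit(events: List[dict]) -> List[dict]:
    """Keep only the newest event per unit_id (ties keep the later arrival)."""
    latest: Dict[str, dict] = {}
    for e in events:
        current = latest.get(e["unit_id"])
        if current is None or e["event_time"] >= current["event_time"]:
            latest[e["unit_id"]] = e
    return list(latest.values())


def process_batch(events: List[dict], batch_id: int, table: Dict[str, dict]) -> None:
    """Hypothetical model of a foreachBatch sink that UPSERTs unit status rows."""
    for e in latest_per_unit(events):
        existing = table.get(e["unit_id"])
        # Skip stale events so a late or replayed batch cannot overwrite newer state.
        if existing is None or e["event_time"] >= existing["event_time"]:
            table[e["unit_id"]] = {
                "unit_id": e["unit_id"],
                "status": e["status"],
                "event_time": e["event_time"],
                "batch_id": batch_id,
            }
```

Deduplicating inside the batch keeps the MERGE valid, and the timestamp guard makes the update safe to replay.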
[Diagram: Demo architecture. An event simulator (Streaming-demo.eventsimulator, a C# .NET console application) sends events to the Units-eh event hub. The nb-eventhub-spark-streaming Databricks notebook reads the stream and calls nb-unitstatus-event-processor, which applies UPSERTs to the Delta table old_stream_fd.unit_status. The nb-create-unit-status-table notebook uses nb-create-delta-table to create the unit_status table. A Power BI report on Power BI Premium visualizes the table.]
8. Demo - Organization
• Creation of Delta Lake table
• Implementation resources walk-through: Spark Streaming notebook, stream processor function
• Demo run: event simulator
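The demo drives the pipeline with an event simulator (the demo architecture shows a C# .NET console application). A comparable generator can be sketched in Python; the unit IDs, status values, JSON fields, and default coordinate box are all hypothetical.

```python
import json
import random
from datetime import datetime, timezone
from typing import Iterator, Optional, Tuple

# Hypothetical unit IDs and status codes; the demo's actual schema is not shown.
UNITS = ["E1", "E2", "L7", "M2", "M5"]
STATUSES = ["AVAILABLE", "DISPATCHED", "EN_ROUTE", "ON_SCENE", "RETURNING"]


def simulate_events(
    count: int,
    bbox: Tuple[float, float, float, float] = (0.0, 0.1, 0.0, 0.1),
    seed: Optional[int] = None,
) -> Iterator[str]:
    """Yield `count` JSON-encoded unit-status events.

    bbox is (min_lat, max_lat, min_lon, max_lon); the default is arbitrary.
    """
    rng = random.Random(seed)  # a seed makes runs reproducible
    min_lat, max_lat, min_lon, max_lon = bbox
    for _ in range(count):
        event = {
            "unit_id": rng.choice(UNITS),
            "status": rng.choice(STATUSES),
            "latitude": round(rng.uniform(min_lat, max_lat), 5),
            "longitude": round(rng.uniform(min_lon, max_lon), 5),
            "event_time": datetime.now(timezone.utc).isoformat(),
        }
        yield json.dumps(event)
```

To feed the Units-eh hub, each string could be wrapped in `EventData` and added to a batch created by the `azure-eventhub` SDK's `EventHubProducerClient`.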
12. Summary
• The need for large-scale real-time stream processing becomes more evident every day
• Provides organizations with the ability to respond quickly to a dynamic business climate
• Spark Structured Streaming makes it easy to add a real-time channel
• Simple extensions on top of Spark SQL