At Comcast, our team has been architecting a customer experience platform that reacts to near-real-time events and interactions and delivers appropriate and timely communications to customers. By combining the low-latency capabilities of Apache Flink with the dataflow capabilities of Apache NiFi, we are able to process events at high volume to trigger, enrich, filter, and act or communicate to enhance customer experiences. Apache Flink and Apache NiFi complement each other with their strengths in event streaming and correlation, state management, command-and-control, parallelism, development methodology, and interoperability with surrounding technologies. We will trace our journey from adopting Apache NiFi over three years ago to our more recent introduction of Apache Flink into our platform stack to handle more complex scenarios. In this presentation we will compare and contrast which business and technical use cases are best suited to which platform and explore different ways to integrate the two platforms into a single solution.
3. COMCAST CUSTOMER RELATIONSHIPS
30.7 MILLION OVERALL CUSTOMER RELATIONSHIPS AS OF Q1 2019, INCLUDING:
27.6 MILLION HIGH-SPEED INTERNET
21.9 MILLION VIDEO
11.4 MILLION VOICE
ONE MILLION CUSTOMER NET ADDITIONS IN 2018
4. DELIVER THE ULTIMATE CUSTOMER EXPERIENCE
IS THE CUSTOMER HAVING A GOOD EXPERIENCE WITH OUR PRODUCTS AND SERVICE?
IF THE CUSTOMER ENGAGES US DIGITALLY, CAN WE OFFER A SELF-SERVICE EXPERIENCE?
GUIDE THE CUSTOMER THROUGH A JOURNEY WITH DIGITAL COMMUNICATIONS
KEEP THE CUSTOMER INFORMED WITH THE RIGHT MESSAGE TO THE RIGHT PERSON AT THE RIGHT TIME
REDUCE TIME AND COST TO THE BUSINESS AND THE CUSTOMER
5. How do we personalize the conversation?
Comcast collects, stores, and uses all data in accordance with our privacy disclosures to users and applicable laws.
10. EXAMPLE WITH SMS RESPONSES
FOLLOWING UP ON THE INTERACTION:
Is the problem resolved?
If so, great!
If not, offer to talk with an agent.
11. APACHE NIFI
Apache®, Apache NiFi®, and the NiFi logo are either registered
trademarks or trademarks of the Apache Software Foundation in the
United States and/or other countries.
12. WHAT IS APACHE NIFI?
ENTERPRISE DATA FLOW…. GET STUFF FROM SOMEWHERE TO SOMEWHERE ELSE
Source Systems: FTP, HTTP, SFTP, Kafka, RabbitMQ, JDBC, Kinesis, S3, ….
Do Stuff!: Transform, Validate, Enrich, Protocol Conversion, ….
Destination Systems: FTP, HTTP, SFTP, Kafka, RabbitMQ, JDBC, Kinesis, S3, ….
350+ Processors, Controllers, and Reporting Tasks
14. WHAT IS NIFI GOOD FOR?
ASYNCHRONOUS AND STATELESS STREAM PROCESSING
PROTOCOL CONVERSION
FORMAT CONVERSION AND TRANSFORMATION
PUSH AND PULL SCENARIOS E.G. FTP
LOTS OF DIFFERENT SOURCE AND SINK TYPES
MILD CONTENT ENRICHMENT
SERVICE CALLS / REST CALLS
JDBC / CACHE LOOKUP
RAPIDLY CHANGING BUSINESS LOGIC***
RAPID PROTOTYPING***
CONFIGURE RATHER THAN CODE ***
EXTENSIBILITY (SCRIPTING PROCESSORS, CUSTOM (JAVA) PROCESSORS)
15. OUR TEAM’S HISTORY WITH NIFI
FIRST PRODUCTION WORKFLOW MAY 2016
RECENT SNAPSHOT:
• 65+ USE CASES
• 900+ PROCESS GROUPS
• 7400+ PROCESSORS
• 44000+ THREADS
• 12 NODE PRIMARY PRODUCTION CLUSTER (16VCPU/32GB)
18. APACHE FLINK
Apache®, Apache Flink®, and the squirrel logo are either registered
trademarks or trademarks of the Apache Software Foundation in the
United States and/or other countries.
19. WHAT IS APACHE FLINK?
REAL-TIME STREAM PROCESSING FRAMEWORK
DISTRIBUTED PARALLEL COMPUTE ENGINE
SIMILAR API STYLE TO APACHE SPARK
LOW LATENCY, HIGH PERFORMANCE
STATEFUL
[Diagram: two SOURCE nodes feeding Map, Filter, Join, Reduce, and Sum operators, ending in a SINK]
20. FLINK STREAMING API STYLES
TABLE / SQL API
SQL PROVIDED BY APACHE CALCITE
SELECTS, JOINS, GROUP-BY, AGGREGATIONS
WINDOWS
TIME AND COUNT
WINDOW-BASED JOINS
WINDOW-BASED AGGREGATIONS
TEMPORAL TABLES
UDF (USER-DEFINED FUNCTIONS)
DATASTREAM API
MAP / REDUCE / FOLD
FILTER
AGGREGATIONS (SUM, MIN, MAX)
WINDOWS
TIME AND COUNT
TUMBLING, SLIDING
STREAM UNION, JOIN, CO-MAP
ITERATIONS
NOTE: THERE IS ALSO A BATCH API
21. EXAMPLE “WORD COUNT” CODE
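import org.apache.flink.api.common.functions.FlatMapFunction;
import org.apache.flink.api.common.functions.ReduceFunction;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.windowing.time.Time;
import org.apache.flink.util.Collector;
// Assumes textInputStream (DataStream<String>) and the WordWithCount POJO are defined elsewhere.
DataStream<WordWithCount> windowCounts = textInputStream
    // Split each line on whitespace and emit one (word, 1) pair per word
    .flatMap(new FlatMapFunction<String, WordWithCount>() {
        @Override
        public void flatMap(String value, Collector<WordWithCount> out) {
            for (String word : value.split("\\s")) {
                out.collect(new WordWithCount(word, 1L));
            }
        }
    })
    // Group by the "word" field and sum the counts within 5-second windows
    .keyBy("word")
    .timeWindow(Time.seconds(5))
    .reduce(new ReduceFunction<WordWithCount>() {
        @Override
        public WordWithCount reduce(WordWithCount a, WordWithCount b) {
            return new WordWithCount(a.word, a.count + b.count);
        }
    });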
22. WHAT IS FLINK GOOD FOR?
HIGH THROUGHPUT STREAM PROCESSING
“MAP / REDUCE” STYLE PARALLEL COMPUTING
STATEFUL PROCESSING
AGGREGATIONS AND TIME WINDOWS
MULTIPLE-STREAM OPERATIONS
SQL-ON-STREAM
HOWEVER…
LIMITED “ORCHESTRATION”
LIMITED SOURCE / SINK TYPES
24. OUR TEAM’S HISTORY WITH FLINK
USED FOR 4+ DIFFERENT KINDS OF USE CASES
FIRST DEV – NOV 2016
FIRST PRODUCTION – MAY 2018
CUSTOMER EXPERIENCE USE CASE:
• 7 BILLION DATA POINTS PER DAY
PRODUCTION SIZE FOR ABOVE:
• 14 FLINK APPLICATION CLUSTERS
• 150 VMS
• 1100 VCPU
• 5.8 TB RAM
25. NIFI / FLINK MAJOR DIFFERENCES
NiFi | Flink
Distributed-capable | Distributed by nature
Lineage, queues, buffering | Straight-through processing
100s of processor types | Stream-oriented operators
Limited state processing | Natively stateful if desired
UI-driven visual development | Code / compiled / deployed
29. START SIMPLE (EVENT, CONDITION, ACTION)
[Diagram: Event Producers, Trigger, Action, Notification Services; callout: NEED MORE INFORMATION]
30. STATELESS USE CASE
[Diagram: Event Producers, Trigger, Enrich, Filter, Action, Notification Services; Enterprise Services (REST) supporting the Enrich step]
31. EXAMPLE: VIDEO ON DEMAND
EVENT:
RECEIVE “VIDEO ON DEMAND” MESSAGE
TRIGGER:
IF (PRICE > 5) AND (TYPE = ‘RENTAL’)
ENRICH:
PREFERRED COMMUNICATION (EMAIL OR SMS)
ACTION:
SEND CONFIRMATION EMAIL OR SMS
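The event, trigger, enrich, and action steps of this example can be sketched in plain Java. This is only an illustration of the pattern, not the NiFi or Flink implementation from the talk: the class name, the event fields, the in-memory preference lookup, and the EMAIL default are all assumptions.

```java
import java.util.HashMap;
import java.util.Map;

/** Sketch of the event / trigger / enrich / action steps for the VOD example. */
class VodConfirmationSketch {

    /** Hypothetical event shape; the field names are assumptions. */
    static final class VodEvent {
        final String accountId;
        final String type;
        final double price;

        VodEvent(String accountId, String type, double price) {
            this.accountId = accountId;
            this.type = type;
            this.price = price;
        }
    }

    /** Stand-in for the customer preference service: account id -> "EMAIL" or "SMS". */
    private final Map<String, String> preferredChannel = new HashMap<>();

    void setPreference(String accountId, String channel) {
        preferredChannel.put(accountId, channel);
    }

    /** TRIGGER: IF (PRICE > 5) AND (TYPE = 'RENTAL'). */
    static boolean triggers(VodEvent e) {
        return e.price > 5 && "RENTAL".equals(e.type);
    }

    /** ENRICH + ACTION: returns the message that would be sent, or null if not triggered. */
    String handle(VodEvent e) {
        if (!triggers(e)) {
            return null;
        }
        // Defaulting to EMAIL when no preference is known is an assumption.
        String channel = preferredChannel.getOrDefault(e.accountId, "EMAIL");
        return channel + " confirmation to " + e.accountId;
    }

    public static void main(String[] args) {
        VodConfirmationSketch flow = new VodConfirmationSketch();
        flow.setPreference("acct-1", "SMS");
        System.out.println(flow.handle(new VodEvent("acct-1", "RENTAL", 5.99)));
        System.out.println(flow.handle(new VodEvent("acct-1", "PURCHASE", 14.99)));
    }
}
```

Keeping the trigger as a pure function of the event and the enrichment as a separate lookup mirrors the split between the TRIGGER and ENRICH steps on the slide.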
32. NIFI VERSION
[NiFi flow diagram: Consume Events, Extract Attributes, Call Customer Pref Service, Set SMS Parameters, Set Email Parameters, Logging, Metrics, Send to Communication Handlers]
37. ENRICHMENT DATA PLANE
[Diagram labels: Streaming Compute Pipeline; Data Sets at Rest (AWS S3, HDFS, Databases) with a Data File Abstraction; Streaming State (Sum, Avg, Time Buckets) from Stream Data; Enterprise Services; MODEL; QUERY]
38. CALLING SERVICES - NIFI
INVOKEHTTP PROCESSOR
NIFI GOOD FOR
• REQUEST PREPARATION
• RESULT TRANSFORMATION
• HTTP ATTRIBUTE HANDLING
• FAILURE AND RETRY LOGIC
40. FLINK METHOD FOR CALLING SERVICES
ASYNC I/O OPERATOR
WORKS WITH ASYNC-CAPABLE POOLS
• HTTP
• JDBC
CODE-YOUR-OWN
NO BUILT-IN RETRY CAPABILITY
TIMEOUTS CAN LEAD TO FLOW FAILURE
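Since retries have to be coded by hand, one possible shape for that logic is sketched below with plain `CompletableFuture`, independent of any Flink class. The names `callWithRetry`, `maxAttempts`, and `fallback` are hypothetical, and completing with a fallback value instead of an exception is an assumption about how one might keep a failed lookup from failing the flow.

```java
import java.util.concurrent.CompletableFuture;
import java.util.function.Supplier;

/** Sketch of hand-written retry logic around an asynchronous service call. */
class AsyncRetrySketch {

    /**
     * Invokes the call up to maxAttempts times. If every attempt fails, the
     * returned future completes normally with the fallback value.
     */
    static <T> CompletableFuture<T> callWithRetry(
            Supplier<CompletableFuture<T>> call, int maxAttempts, T fallback) {
        return attempt(call, 1, maxAttempts, fallback);
    }

    private static <T> CompletableFuture<T> attempt(
            Supplier<CompletableFuture<T>> call, int attemptNo, int maxAttempts, T fallback) {
        // Note: an exception thrown synchronously by call.get() is not retried here.
        return call.get()
            .<CompletableFuture<T>>handle((value, error) -> {
                if (error == null) {
                    return CompletableFuture.completedFuture(value);
                }
                if (attemptNo < maxAttempts) {
                    // Retry immediately; a real client would likely add a backoff delay.
                    return attempt(call, attemptNo + 1, maxAttempts, fallback);
                }
                return CompletableFuture.completedFuture(fallback);
            })
            .thenCompose(future -> future);
    }

    public static void main(String[] args) {
        CompletableFuture<String> alwaysFails = new CompletableFuture<>();
        alwaysFails.completeExceptionally(new RuntimeException("service down"));
        System.out.println(callWithRetry(() -> alwaysFails, 3, "no-enrichment").join());
    }
}
```

Inside an async operator, the future returned by `callWithRetry` would be the one whose result is handed back to the stream, so a slow or failing service degrades to the fallback rather than an exception.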
41. FLINK CONNECTED STREAM PATTERN
REST Service
Connected Stream Operator
5 Minute Global Window
Enrichment Handler
43. WHAT IS “STATE”?
[State diagram: STATE 1, STATE 2, and STATE 3, linked by TRANSITION CONDITIONS, with ACTION ON ENTRY, ACTION ON EXIT, and a STATE TIMEOUT]
44. EXAMPLE STATEFUL JOURNEY
[Journey diagram: states ORDER PLACED, IN TRANSIT, OUT FOR DELIVERY; transitions SHIPPED, PLACED ON LOCAL TRUCK, 11PM EXPIRE; messages “YOUR ORDER IS ON ITS WAY” and “SORRY WE MISSED YOU”]
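The journey above can be read as a small state machine. The sketch below is one interpretation in plain Java: which message is sent on which transition, and the extra EXPIRED state, are assumptions drawn from the diagram labels rather than details confirmed by the slides.

```java
import java.util.ArrayList;
import java.util.List;

/** Sketch of the order journey as a state machine with actions on transitions. */
class OrderJourneySketch {

    enum State { ORDER_PLACED, IN_TRANSIT, OUT_FOR_DELIVERY, EXPIRED }

    enum Event { SHIPPED, PLACED_ON_LOCAL_TRUCK, TIMEOUT_11PM }

    private State state = State.ORDER_PLACED;
    private final List<String> sentMessages = new ArrayList<>();

    State state() {
        return state;
    }

    List<String> sentMessages() {
        return sentMessages;
    }

    /** Applies an event; events that do not match the current state are ignored. */
    void onEvent(Event event) {
        switch (state) {
            case ORDER_PLACED:
                if (event == Event.SHIPPED) {
                    state = State.IN_TRANSIT;
                    sentMessages.add("Your order is on its way");
                }
                break;
            case IN_TRANSIT:
                if (event == Event.PLACED_ON_LOCAL_TRUCK) {
                    state = State.OUT_FOR_DELIVERY;
                }
                break;
            case OUT_FOR_DELIVERY:
                // State timeout: the delivery did not complete before 11PM.
                if (event == Event.TIMEOUT_11PM) {
                    state = State.EXPIRED;
                    sentMessages.add("Sorry we missed you");
                }
                break;
            default:
                break;
        }
    }

    public static void main(String[] args) {
        OrderJourneySketch journey = new OrderJourneySketch();
        journey.onEvent(Event.SHIPPED);
        journey.onEvent(Event.PLACED_ON_LOCAL_TRUCK);
        journey.onEvent(Event.TIMEOUT_11PM);
        System.out.println(journey.state() + " " + journey.sentMessages());
    }
}
```

One instance of this object per order is the "state" that has to be stored somewhere between events, which is the problem the following slides address.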
45. NIFI STATE
PROCESSOR STATE (LOCAL AND CLUSTERED)
BACKED BY ZOOKEEPER
PROCESSORS:
UPDATEATTRIBUTE (LOCAL ONLY)
ATTRIBUTEROLLINGWINDOW
“DISTRIBUTED” MAP CACHE
IN-MEMORY OR REDIS-BACKED (NEW IN 1.8)
NODE-LOCAL OR “SINGLE NODE” CENTRAL CACHE
PROCESSORS:
PUTDISTRIBUTEDMAPCACHE, FETCHDISTRIBUTEDMAPCACHE
BEFORE NIFI 1.8: NO EASY PARTITIONING / SHARDING
1.8 AND LATER: NODE BALANCED CONNECTIONS
PARTITION BY ATTRIBUTE
CACHE != STATE (but you can store state in a cache)
46. USING EXTERNAL STATE WITH NIFI
USE EXTERNAL DATABASE (E.G. MYSQL)
PERIODIC QUERY TO FIND EXPIRED TIMERS
BEWARE OF RACE CONDITIONS / FREQUENT UPDATES
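The race the last bullet warns about shows up when a timer is re-armed, or picked up by a second poller, between the moment a sweep reads it and the moment it acts on it. The sketch below illustrates one way to guard against that with a compare-and-remove. It uses an in-memory `ConcurrentHashMap` as a stand-in for the external database, so the class and method names are hypothetical; with a real database the same idea would be a conditional update or delete whose affected-row count is checked.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Sketch of a periodic sweep that claims expired timers atomically. */
class ExpiredTimerSweepSketch {

    /** Stand-in for the external timer table: journey id -> expiry time (epoch millis). */
    private final ConcurrentHashMap<String, Long> timers = new ConcurrentHashMap<>();

    /** Creates or re-arms the timer for a journey. */
    void setTimer(String journeyId, long expiresAtMillis) {
        timers.put(journeyId, expiresAtMillis);
    }

    /** Returns the journeys whose timers expired and were claimed by this sweep. */
    List<String> sweep(long nowMillis) {
        List<String> claimed = new ArrayList<>();
        for (Map.Entry<String, Long> entry : timers.entrySet()) {
            Long expiresAt = entry.getValue();
            // remove(key, value) succeeds only if the timer still holds the value we
            // read, so a concurrent re-arm or a second sweeper cannot double-fire it.
            if (expiresAt <= nowMillis && timers.remove(entry.getKey(), expiresAt)) {
                claimed.add(entry.getKey());
            }
        }
        return claimed;
    }

    public static void main(String[] args) {
        ExpiredTimerSweepSketch store = new ExpiredTimerSweepSketch();
        store.setTimer("order-1", 100L);
        store.setTimer("order-2", 200L);
        System.out.println(store.sweep(150L));
        System.out.println(store.sweep(150L));
    }
}
```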
49. FLINK APPROACH TO STATE
KEYED (NODE LOCAL) STATE
WINDOWED OPERATIONS (E.G. 10 MINUTE WINDOW SLIDING BY 1 MINUTE)
EVERY OPERATOR CAN HAVE ITS OWN STATE
QUERYABLE STATE
ROCKSDB (IN-MEMORY + DISK STORAGE)
CHECKPOINTS AND SAVEPOINTS TO DURABLE FILESYSTEM (HDFS, S3)
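To make the windowed-operations bullet concrete: with a 10-minute window sliding by 1 minute, each event belongs to ten overlapping windows, and state is kept for each of them. The plain-Java sketch below computes the start times of those windows for one event timestamp. It follows the general arithmetic of sliding-window assignment and is not Flink's own code; the class and method names are hypothetical, and it assumes non-negative timestamps and no window offset.

```java
import java.util.ArrayList;
import java.util.List;

/** Sketch of which sliding windows a single event timestamp falls into. */
class SlidingWindowSketch {

    /**
     * Returns the start time of every window [start, start + size) that contains
     * the timestamp, newest window first. All values are in milliseconds.
     */
    static List<Long> windowStarts(long timestamp, long size, long slide) {
        List<Long> starts = new ArrayList<>();
        // Latest window start at or before the timestamp, aligned to the slide.
        long lastStart = timestamp - (timestamp % slide);
        for (long start = lastStart; start > timestamp - size; start -= slide) {
            starts.add(start);
        }
        return starts;
    }

    public static void main(String[] args) {
        long minute = 60_000L;
        // Event at 12 minutes 30 seconds, 10-minute window sliding by 1 minute.
        List<Long> starts = windowStarts(12 * minute + 30_000L, 10 * minute, minute);
        System.out.println(starts.size() + " windows, newest starts at minute "
                + starts.get(0) / minute);
    }
}
```

The ratio of window size to slide is the multiplier on per-key window state, which is worth keeping in mind when sizing state storage.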
50. DISTRIBUTED FLINK STATE
[Diagram: KAFKA BROKERS (PARTITION 1 to PARTITION 6), NETWORK IN, six FlinkKafkaConsumer instances across NODE 1, NODE 2, and NODE 3, keyBy() SHUFFLE/SORT, six KeyedStreamOperator instances each with Local STATE, NETWORK OUT to P1 to P4]
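The point of the keyBy() shuffle is that every record for a given key is routed to the same operator instance, so that instance can keep the key's state locally. The sketch below illustrates the idea in plain Java with a simple hash-modulo routing rule. That rule is a deliberate simplification and an assumption: Flink itself assigns keys to key groups before mapping them to operator instances. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Sketch of key-based routing with state held locally by the owning subtask. */
class KeyedStateRoutingSketch {

    /** One local state map per parallel operator instance (subtask). */
    private final List<Map<String, Long>> localState = new ArrayList<>();

    KeyedStateRoutingSketch(int parallelism) {
        for (int i = 0; i < parallelism; i++) {
            localState.add(new HashMap<>());
        }
    }

    /** Simplified routing rule: the same key always maps to the same subtask. */
    static int subtaskFor(String key, int parallelism) {
        return Math.floorMod(key.hashCode(), parallelism);
    }

    /** Routes the record to its subtask and increments that key's local counter. */
    long process(String key) {
        Map<String, Long> state = localState.get(subtaskFor(key, localState.size()));
        return state.merge(key, 1L, Long::sum);
    }

    Map<String, Long> stateOf(int subtask) {
        return localState.get(subtask);
    }

    public static void main(String[] args) {
        KeyedStateRoutingSketch router = new KeyedStateRoutingSketch(6);
        router.process("acct-1");
        router.process("acct-2");
        System.out.println("acct-1 seen " + router.process("acct-1") + " times on subtask "
                + subtaskFor("acct-1", 6));
    }
}
```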
51. WORKING WITH FLINK STATE
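import org.apache.flink.api.common.state.MapState;
import org.apache.flink.api.common.state.MapStateDescriptor;
import org.apache.flink.configuration.Configuration;
// Members of a RichMapFunction<String, String> applied to a keyed stream.
// DECLARE VARIABLE: handle to the per-key state, assigned in open()
private transient MapState<String, String> myState;
@Override
public void open(Configuration config) {
    // DESCRIPTOR WITH TYPE INFORMATION
    MapStateDescriptor<String, String> descriptor =
        new MapStateDescriptor<>(
            "myStateName",                // the state name
            String.class, String.class);  // K/V types
    // INITIALIZE STATE: get the MapState scoped to the current key
    myState = getRuntimeContext().getMapState(descriptor);
}
@Override
public String map(String myField) throws Exception {
    // READ/WRITE STATE
    String myValue = myState.get(myField);  // null the first time a field is seen
    String updated = (myValue == null ? "" : myValue) + " another one";
    myState.put(myField, updated);
    return updated;
}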
56. SOLUTION APPROACH
FLINK AS THE HIGH VOLUME EVENT PROCESSOR
• MANY USE CASES WITH ONE STREAM
• SQL ON STREAM
FLINK-BASED TRIGGER, FILTER, ENRICHMENT REQUEST, AND ACTION REQUEST
FLINK MANAGES CUSTOMER JOURNEY STATE
NIFI FOR:
NAMED “PROFILES” FOR ENRICHMENT SERVICES
NAMED “PROFILES” FOR NOTIFICATIONS AND ACTIONS
Configuration-based use cases in Flink
Library of handlers in NiFi
59. HIGH LEVEL SOLUTION (WITH STATE)
[Diagram labels: Event Producers, Trigger, Enrichment Orchestration, Filter, Action Orchestration, Enrich, Action, Enterprise Services (REST), Notification Services, EVENT AND MESSAGE ORCHESTRATION, Journey State Management, FLINK LOCAL STATE, FLINK STATE]
60. NIFI + FLINK SOLUTION SUMMARY
NIFI FOR SERVICES, DATAFLOW, AND TEXT HANDLING
FLINK FOR HIGH-PERFORMANCE STREAM PROCESSING
FLINK FOR COMMON PATTERNS – CONFIG DRIVEN
FLINK FOR STATE MANAGEMENT
DECOUPLED LIBRARY OF ENRICHMENT HANDLERS AND ACTION HANDLERS
61. FUTURE WORK
FLINK + NIFI
SELF-SERVICE USE CASE PORTAL
INCREASE CATALOG OF ACTIONS AND ENRICHMENT PROFILES
MOVE MORE COMMON CAPABILITIES TO FLINK