A new Flink state primitive to boost your application performance
Nico Kruber
David Anderson
Flink Forward
About me
Open source
● Apache Flink contributor/committer since 2016
○ Focus on network stack, usability, and performance
● PhD in Computer Science at HU Berlin / Zuse Institute Berlin
● Software engineer -> Solutions architect -> Head of Solutions Architecture
@ DataArtisans/Ververica (acquired by Alibaba)
● Engineering @ Immerok
About Immerok
● Building a fully managed Apache Flink cloud service
for powering real-time systems at any scale
○ immerok.com
● Motivation
● BinarySortedState
● Results & Future Work
Use Case: Stream Sort
Use Case: Stream Sort - Code
void processElement(Long event, /*...*/) {
  TimerService timerSvc = ctx.timerService();
  long ts = ctx.timestamp();
  if (!isLate(ts, timerSvc)) {
    // read-modify-write: fetch the list for this timestamp, append, write back
    List<Long> listAtTs = events.get(ts);
    if (listAtTs == null) {
      listAtTs = new ArrayList<>();
    }
    listAtTs.add(event);
    events.put(ts, listAtTs);
    // ...
  }
}

MapState<Long, List<Long>> events;
MapStateDescriptor<Long, List<Long>> desc =
    new MapStateDescriptor<>(/*...*/);
events = getRuntimeContext().getMapState(desc);

void onTimer(long ts, /*...*/) {
  // ...
}
Use Case: Stream Sort - What’s Happening Underneath
List<Long> listAtTs = events.get(ts);
● State (RocksDB): search memtable + sst files for one entry
● Full list returned as byte[]
● Deserialized into a (Java) list
events.put(ts, listAtTs);
● Full Java list serialized as byte[]
● State (RocksDB): add new entry to memtable (leave old one for compaction)
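The read-modify-write cycle described here can be made concrete with a small stand-alone model. This is only a sketch and uses no Flink or RocksDB classes: the `HashMap<Long, byte[]>` stands in for the state backend, and `ReadModifyWriteCost`, `append`, `serialize`, and `deserialize` are names made up for the illustration. It counts the bytes that cross the serialization boundary when every append rewrites the whole list.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model of MapState<Long, List<Long>> on a byte-oriented backend.
// All names are hypothetical; this is not Flink or RocksDB code.
final class ReadModifyWriteCost {
    private final Map<Long, byte[]> store = new HashMap<>(); // stand-in for RocksDB
    long bytesRead = 0;
    long bytesWritten = 0;

    static byte[] serialize(List<Long> values) {
        ByteBuffer buf = ByteBuffer.allocate(values.size() * Long.BYTES);
        for (long v : values) {
            buf.putLong(v);
        }
        return buf.array();
    }

    static List<Long> deserialize(byte[] bytes) {
        ByteBuffer buf = ByteBuffer.wrap(bytes);
        List<Long> values = new ArrayList<>();
        while (buf.hasRemaining()) {
            values.add(buf.getLong());
        }
        return values;
    }

    // Mirrors events.get(ts), list.add(event), events.put(ts, list).
    void append(long ts, long event) {
        byte[] old = store.get(ts);
        List<Long> list = new ArrayList<>();
        if (old != null) {
            bytesRead += old.length;   // the full list is fetched ...
            list = deserialize(old);   // ... and fully deserialized
        }
        list.add(event);
        byte[] updated = serialize(list); // the full list is serialized again
        bytesWritten += updated.length;
        store.put(ts, updated);
    }

    public static void main(String[] args) {
        ReadModifyWriteCost state = new ReadModifyWriteCost();
        for (long i = 0; i < 100; i++) {
            state.append(42L, i); // 100 events sharing one timestamp
        }
        System.out.println("read=" + state.bytesRead + " written=" + state.bytesWritten);
    }
}
```

In this model, 100 events on one timestamp carry 800 bytes of payload, yet the appends write 40,400 bytes and read 39,600 bytes in total, because the cost of each append grows with the list length.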
Use Case: Stream Sort - Alternative Solutions
● Using MapState<Long, Event> instead of MapState<Long, List<Event>>?
○ Cannot handle multiple events per timestamp
● Using Window API?
○ Efficient event storage per timestamp
○ No well-matching window type among sliding, tumbling, and session windows
● Using HashMapStateBackend?
○ No de-/serialization overhead
○ State limited by available memory
○ No incremental checkpoints
● Using ListState<Event> and filtering in onTimer()?
○ Reduced overhead in processElement() vs. more to do in onTimer()
Use Case: Event-Time Stream Join
Rates:
{ts: 5, code: GBP, rate: 1.20}
{ts: 10, code: USD, rate: 1.00}
{ts: 19, code: USD, rate: 1.02}
Transactions:
{ts: 15, code: USD, amount: 1.00}
{ts: 10, code: GBP, amount: 2.00}
{ts: 25, code: USD, amount: 3.00}
Joined result:
{ts1: 15, ts2: 10, amount: 1.00}
{ts1: 10, ts2: 5, amount: 2.40}
{ts1: 25, ts2: 19, amount: 3.06}
SELECT
  t.ts AS ts1, r.ts AS ts2,
  t.amount * r.rate AS amount
FROM transactions AS t
  JOIN rates FOR SYSTEM_TIME AS OF t.ts AS r
  ON t.code = r.code;
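The join semantics behind this use case can be sketched in plain Java: each transaction is matched with the latest rate for the same currency code whose timestamp is at or before the transaction's timestamp. The sketch below is an in-memory illustration only. `TemporalJoinSketch`, `addRate`, and `join` are hypothetical names, a `TreeMap` stands in for sorted state, and it assumes all relevant rates have already arrived (in the streaming job, waiting for the watermark provides that guarantee).

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

// In-memory illustration of the join semantics only; all names are hypothetical.
final class TemporalJoinSketch {
    static final class Joined {
        final long ts1;      // transaction timestamp
        final long ts2;      // timestamp of the rate that was applied
        final double amount; // converted amount

        Joined(long ts1, long ts2, double amount) {
            this.ts1 = ts1;
            this.ts2 = ts2;
            this.amount = amount;
        }

        @Override
        public String toString() {
            return String.format(Locale.ROOT, "{ts1: %d, ts2: %d, amount: %.2f}", ts1, ts2, amount);
        }
    }

    // Rates per currency code, sorted by timestamp.
    private final Map<String, TreeMap<Long, Double>> rates = new HashMap<>();

    void addRate(long ts, String code, double rate) {
        rates.computeIfAbsent(code, c -> new TreeMap<>()).put(ts, rate);
    }

    // Joins one transaction with the latest rate at or before its timestamp.
    // Returns null when no such rate exists.
    Joined join(long ts, String code, double amount) {
        TreeMap<Long, Double> perCode = rates.get(code);
        Map.Entry<Long, Double> rate = (perCode == null) ? null : perCode.floorEntry(ts);
        return (rate == null) ? null : new Joined(ts, rate.getKey(), amount * rate.getValue());
    }

    public static void main(String[] args) {
        TemporalJoinSketch sketch = new TemporalJoinSketch();
        sketch.addRate(5, "GBP", 1.20);
        sketch.addRate(10, "USD", 1.00);
        sketch.addRate(19, "USD", 1.02);

        System.out.println(sketch.join(15, "USD", 1.00));
        System.out.println(sketch.join(10, "GBP", 2.00));
        System.out.println(sketch.join(25, "USD", 3.00));
    }
}
```

Run against the sample rates and transactions of this use case, the sketch yields the three joined rows with (ts1, ts2) pairs (15, 10), (10, 5), and (25, 19). The key operation is the "at or before" lookup (`floorEntry`), which is cheap on a sorted structure and expensive when all entries must be fetched and sorted first.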
Use Case: Event-Time Stream Join - Code
void processElement1(Transaction value, /*...*/) {
  TimerService timerSvc = ctx.timerService();
  long ts = ctx.timestamp();
  if (!isLate(ts, timerSvc)) {
    addTransaction(ts, value);
    // ...
  }
}
// similar for processElement2()

MapState<Long, List<Transaction>> transactions;
MapState<Long, Double> rate;

void onTimer(long ts, /*...*/) {
  TreeSet<Entry<Long, Double>> rates =
      ratesInRange(ts, rate.entries());
  Double myRate = getLast(rates);
  // ...
      tx -> out.collect(
          new Joined(myRate, tx)));
  // ...
}
With RocksDB:
● always fetching all rates’ (key+value) bytes ⚠
● (also need to fit into memory ⚠)
● deserialize all keys during iteration ⚠
○ not deserializing values (at least) ✓
Use Case: Event-Time Stream Join - Code
void processElement1(Transaction value, /*...*/) {
  TimerService timerSvc = ctx.timerService();
  long ts = ctx.timestamp();
  if (!isLate(ts, timerSvc)) {
    addTransaction(ts, value);
    // ...
  }
}
// similar for processElement2()

MapState<Long, List<Transaction>> transactions;
MapState<Long, Double> rate;

Similar to stream sort, with RocksDB:
● always fetch/write full lists ⚠
● always de-/serialize full list ⚠
● additional stress on RocksDB compaction ⚠
BinarySortedState - History
“Temporal state” Hackathon project (David + Nico + Seth)
● Main primitives: getAt(), getAtOrBefore(), getAtOrAfter(), add(),
addAll(), update()
Nov 2021
April 2022
Started as a Hackathon project (David + Nico) on
custom windowing with process functions
Created FLIP-220 and discussed on dev@flink.apache.org
● Extended scope further to allow arbitrary user keys (not just timestamps)
○ Identified further use cases in SQL operators, e.g. min/max with retractions
● Clarified serializer requirements
● Extended proposed API to offer range read and clear operations
● …
BinarySortedState - Goals
● A new keyed-state primitive, built on top of ListState
● Efficiently add to the list of values for a user-provided key
● Efficiently iterate user-keys in a well-defined sort order,
with native state-backend support, especially RocksDB
● Efficient operations for time-based functions
(windowing, sorting, event-time joins, custom, ...)
● Operations on a subset of the state, based on user-key ranges
● Portable between state backends (RocksDB, HashMap)
BinarySortedState - API (subject to change!)
BinarySortedState<UK, UV>
● Point-operations:
○ valuesAt(key)
○ add(key, value)
○ put(key, values)
○ clearEntryAt(key)
● Lookups:
○ firstEntry(), firstEntry(fromKey)
○ lastEntry(), lastEntry(toKey)
● Cleanup:
○ clear()
● Range-operations:
○ readRange(fromKey, toKey, inclusiveToKey)
○ readRangeUntil(toKey, inclusiveToKey)
○ readRangeFrom(fromKey)
● Range-deletes:
○ clearEntryAt(key)
○ clearRange(fromKey, toKey, inclusiveToKey)
○ clearRangeUntil(toKey, inclusiveToKey)
○ clearRangeFrom(fromKey)
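To get a feel for how a few of these operations behave, here is a heap-only stand-in built on `java.util.TreeMap`. It is a sketch under assumptions, not the FLIP-220 implementation: the class name `SortedMultiMapSketch` is made up, return types are simplified, and treating `toKey` in `lastEntry(toKey)` as inclusive is a guess, since the list above does not say.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Heap-only stand-in for some of the listed operations; names and exact
// semantics are assumptions, not the proposed Flink API.
final class SortedMultiMapSketch<UK extends Comparable<UK>, UV> {
    private final TreeMap<UK, List<UV>> entries = new TreeMap<>();

    // Appends a value to the list stored under key.
    void add(UK key, UV value) {
        entries.computeIfAbsent(key, k -> new ArrayList<>()).add(value);
    }

    List<UV> valuesAt(UK key) {
        List<UV> values = entries.get(key);
        if (values == null) {
            return Collections.emptyList();
        }
        return Collections.unmodifiableList(values);
    }

    Map.Entry<UK, List<UV>> firstEntry() {
        return entries.firstEntry();
    }

    // Last entry with key <= toKey (inclusive bound assumed), or null.
    Map.Entry<UK, List<UV>> lastEntry(UK toKey) {
        return entries.floorEntry(toKey);
    }

    // Snapshot of the entries up to toKey, in key order.
    List<Map.Entry<UK, List<UV>>> readRangeUntil(UK toKey, boolean inclusiveToKey) {
        return new ArrayList<>(entries.headMap(toKey, inclusiveToKey).entrySet());
    }

    // Drops all entries up to toKey.
    void clearRangeUntil(UK toKey, boolean inclusiveToKey) {
        entries.headMap(toKey, inclusiveToKey).clear();
    }

    public static void main(String[] args) {
        SortedMultiMapSketch<Long, String> state = new SortedMultiMapSketch<>();
        state.add(10L, "a");
        state.add(5L, "b");
        state.add(10L, "c");
        System.out.println(state.valuesAt(10L));
        System.out.println(state.lastEntry(7L).getKey());
        state.clearRangeUntil(10L, false);
        System.out.println(state.firstEntry().getKey());
    }
}
```

The sorted map makes the lookups and range operations logarithmic or proportional to the range touched, which is the property the state primitive aims to provide natively in the state backend.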
BinarySortedState - How does it work with RocksDB?!
● RocksDB is a key-value store writing into MemTables → flushing into SST files
● SST files are sorted by the key in lexicographical binary order
(byte-wise unsigned comparison)
● RocksDB offers Prefix Seek and SeekForPrev
● RocksDBMapState.RocksDBMapIterator provides efficient iteration via:
○ Fetching up to 128 RocksDB entries at once
○ RocksDBMapEntry with lazy key/value deserialization
BinarySortedState - LexicographicalTypeSerializer
● “Just” need to provide serializers that are compatible with RocksDB’s sort order
● Based on lexicographical binary order as defined by byte-wise unsigned comparison
● Compatible serializers extend LexicographicalTypeSerializer
public abstract class LexicographicalTypeSerializer<T> extends TypeSerializer<T> {
  public Optional<Comparator<T>> findComparator() { return Optional.empty(); }
}
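What "compatible with RocksDB's sort order" means can be shown for `long` keys such as timestamps. A common order-preserving encoding writes the value big-endian with the sign bit flipped, so that byte-wise unsigned comparison of the encoded keys agrees with numeric comparison. The sketch below is a stand-alone illustration; `LexicographicLong` and its methods are hypothetical names and not part of Flink.

```java
import java.nio.ByteBuffer;

// Order-preserving encoding for signed longs; a sketch, not Flink's serializer.
final class LexicographicLong {
    // Big-endian bytes with the sign bit flipped, so negative values sort
    // before positive ones under byte-wise unsigned comparison.
    static byte[] encode(long value) {
        return ByteBuffer.allocate(Long.BYTES).putLong(value ^ Long.MIN_VALUE).array();
    }

    static long decode(byte[] bytes) {
        return ByteBuffer.wrap(bytes).getLong() ^ Long.MIN_VALUE;
    }

    // Lexicographical binary order: compare byte by byte as unsigned values.
    static int compareBytes(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int cmp = Integer.compare(a[i] & 0xFF, b[i] & 0xFF);
            if (cmp != 0) {
                return cmp;
            }
        }
        return Integer.compare(a.length, b.length);
    }

    public static void main(String[] args) {
        long[] samples = {-300L, -1L, 0L, 1L, 256L};
        for (int i = 0; i + 1 < samples.length; i++) {
            int cmp = compareBytes(encode(samples[i]), encode(samples[i + 1]));
            System.out.println(samples[i] + " < " + samples[i + 1] + " in byte order: " + (cmp < 0));
        }
    }
}
```

Without the sign flip, -1 would encode as eight 0xFF bytes and sort after every positive value, and a little-endian or variable-length encoding would break the order as well. This is why only serializers with this property qualify.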
Stream Sort with/without BinarySortedState (1)
With BinarySortedState:
private BinarySortedState<Long, Long> events;
BinarySortedStateDescriptor<Long, Long> desc =
    new BinarySortedStateDescriptor<>(/*...*/);
events = getRuntimeContext()/*...*/;
public void onTimer(long ts, /*...*/) {
  // ...
}
Without (MapState):
private MapState<Long, List<Long>> events;
MapStateDescriptor<Long, List<Long>> desc =
    new MapStateDescriptor<>(/*...*/);
events = getRuntimeContext().getMapState(desc);
public void onTimer(long ts, /*...*/) {
  // ...
}
Stream Sort with/without BinarySortedState (2)
With BinarySortedState:
public void processElement(/*...*/) {
  // ...
  events.add(ts, event);
  // ...
}
Without (MapState):
public void processElement(/*...*/) {
  // ...
  List<Long> listAtTs = events.get(ts);
  if (listAtTs == null) {
    listAtTs = new ArrayList<>();
  }
  listAtTs.add(event);
  events.put(ts, listAtTs);
  // ...
}
Stream Sort with BinarySortedState - What’s Happening Underneath
Add (events.add(ts, event)):
● New list entry serialized as byte[]
● State (RocksDB): add new merge op to memtable
Read:
● State (RocksDB): search memtable + sst files for all entries
● Full list returned as byte[] and deserialized into a (Java) list
Delete:
● State (RocksDB): mark k/v as deleted (removal during compaction)
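The append path can be modelled in a few lines to show why it is cheap: each add writes only the new element as a merge operand, and the operands are combined when the list is read. This is a stand-alone sketch with hypothetical names (`MergeAppendCost`, `add`, `valuesAt`); the `HashMap` of operand lists merely stands in for RocksDB's memtable and SST files.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model of appending via merge operands; not Flink or RocksDB code.
final class MergeAppendCost {
    // Stand-in for RocksDB: per key, the operands appended so far.
    private final Map<Long, List<byte[]>> operands = new HashMap<>();
    long bytesWritten = 0;

    // Writes only the new element; nothing is read or rewritten.
    void add(long ts, long event) {
        byte[] operand = ByteBuffer.allocate(Long.BYTES).putLong(event).array();
        operands.computeIfAbsent(ts, k -> new ArrayList<>()).add(operand);
        bytesWritten += operand.length;
    }

    // Combines all operands for the key into one list, in append order.
    List<Long> valuesAt(long ts) {
        List<Long> values = new ArrayList<>();
        for (byte[] operand : operands.getOrDefault(ts, new ArrayList<>())) {
            values.add(ByteBuffer.wrap(operand).getLong());
        }
        return values;
    }

    public static void main(String[] args) {
        MergeAppendCost state = new MergeAppendCost();
        for (long i = 0; i < 100; i++) {
            state.add(42L, i); // 100 events sharing one timestamp
        }
        System.out.println("written=" + state.bytesWritten
                + " values=" + state.valuesAt(42L).size());
    }
}
```

In this model, 100 appends of 8-byte values write 800 bytes in total, independent of how long the list already is. The full list is only materialized once, when it is read.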
Event-Time Stream Join with/without BinarySortedState (1)
With BinarySortedState:
private BinarySortedState<Long, Transaction> transactions;
private BinarySortedState<Long, Double> rate;
public void processElement1(/*...*/) {
  // ...
  if (!isLate(ts, timerSvc)) {
    // append to BinarySortedState:
    transactions.add(ts, value);
    // ...
  }
}
// similar for processElement2()
Without (MapState):
private MapState<Long, List<Transaction>> transactions;
private MapState<Long, Double> rate;
public void processElement1(/*...*/) {
  // ...
  if (!isLate(ts, timerSvc)) {
    // replace list in MapState:
    addTransaction(ts, value);
    // ...
  }
}
// similar for processElement2()
Event-Time Stream Join with/without BinarySortedState (2)
With BinarySortedState:
public void onTimer(long ts, /*...*/) {
  Entry<Long, Iterable<Double>> myRate =
      /*...*/;
  Double rateVal = Optional
      /*...*/
      .map(e -> e.getValue().iterator().next())
      /*...*/;
  // ...
      tx -> out.collect(
          new Joined(rateVal, tx)));
  // ...
  if (myRate != null) {
    rate.clearRangeUntil(myRate.getKey(), false);
  }
}
Without (MapState):
public void onTimer(long ts, /*...*/) {
  TreeSet<Entry<Long, Double>> rates =
      ratesInRange(ts, rate.entries());
  Double myRate = Optional
      /*...*/;
  // ...
      tx -> out.collect(
          new Joined(myRate, tx)));
  // ...
}
Stream Sort with BinarySortedState - Optimized
● Idea:
○ Increase efficiency by processing all events between watermarks
● Challenge:
○ Registering a timer for the next watermark will fire too often
➔ Solution:
○ Register timer for the first unprocessed event
○ When the timer fires:
■ Process all events until the current watermark (not the timer timestamp!)
● events.readRangeUntil(currentWatermark, true)
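The batching idea can be simulated without Flink to see how it reduces timer firings. In the sketch below, a `TreeMap` stands in for the sorted state, a single nullable field stands in for the registered event-time timer, and `advanceWatermark` plays the role of the runtime firing timers. All names (`BatchedStreamSort`, `advanceWatermark`, `timerFirings`) are hypothetical, and dropping late events is an assumption of this sketch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

// Simulation of "one timer for the first unprocessed event"; not Flink code.
final class BatchedStreamSort {
    private final TreeMap<Long, List<Long>> events = new TreeMap<>(); // sorted state stand-in
    private Long timer = null;                 // the single registered timer, if any
    private long watermark = Long.MIN_VALUE;
    final List<Long> output = new ArrayList<>();
    int timerFirings = 0;

    void processElement(long ts, long event) {
        if (ts <= watermark) {
            return; // late event: dropped in this sketch
        }
        events.computeIfAbsent(ts, k -> new ArrayList<>()).add(event);
        if (timer == null || ts < timer) {
            timer = ts; // keep the timer at the first unprocessed event
        }
    }

    void advanceWatermark(long newWatermark) {
        watermark = newWatermark;
        if (timer != null && timer <= watermark) {
            onTimer();
        }
    }

    private void onTimer() {
        timerFirings++;
        // Process everything up to the current watermark, not just the timer
        // timestamp (the equivalent of readRangeUntil(currentWatermark, true)).
        NavigableMap<Long, List<Long>> ready = events.headMap(watermark, true);
        for (List<Long> values : ready.values()) {
            output.addAll(values);
        }
        ready.clear();
        timer = events.isEmpty() ? null : events.firstKey();
    }

    public static void main(String[] args) {
        BatchedStreamSort sort = new BatchedStreamSort();
        sort.processElement(3, 3);
        sort.processElement(1, 1);
        sort.processElement(2, 2);
        sort.advanceWatermark(2);
        sort.processElement(5, 5);
        sort.processElement(4, 4);
        sort.advanceWatermark(10);
        System.out.println(sort.output + " firings=" + sort.timerFirings);
    }
}
```

In this run, five distinct timestamps are emitted in order with only two timer firings, one per watermark advance that has pending events. A timer per event timestamp would have fired five times.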
Event-Time Stream Join with BinarySortedState - Optimized
● Idea:
○ Increase efficiency by processing all events between watermarks
● Challenge:
○ Registering a timer for the next watermark will fire too often
➔ Solution:
○ Same as Stream Sort: Timer for first unprocessed event, processing until watermark, but:
○ When the timer fires:
■ Iterate both transactions and rates (in the appropriate time range) in event-time order
Results & Future Work
[Figure: result charts for Stream Sort and Stream Join]
BinarySortedState - Who will benefit?
● (Custom) stream sorting
● Time-based (custom) joins
● Code with range-reads or bulk-deletes
● Custom window implementations
● Min/Max with retractions
● …
● Basically everything maintaining a MapState<?, List<?>> or requiring range operations
What’s left to do?
● Iron out last bits and pieces + tests
○ Start voting thread for FLIP-220 on dev@flink.apache.org
○ Create a PR and get it merged
● Expected to land in Flink 1.17 (as experimental feature)
● Port Table/SQL/DataStream operators to improve efficiency:
○ TemporalRowTimeJoinOperator (PoC already done for validating the API ✓)
○ RowTimeSortOperator
○ IntervalJoinOperator
○ CepOperator
○ …
● Provide more LexicographicalTypeSerializers
Get ready to ROK!!!
Nico Kruber

More Related Content

What's hot

Dynamically Scaling Data Streams across Multiple Kafka Clusters with Zero Fli...
Dynamically Scaling Data Streams across Multiple Kafka Clusters with Zero Fli...Dynamically Scaling Data Streams across Multiple Kafka Clusters with Zero Fli...
Dynamically Scaling Data Streams across Multiple Kafka Clusters with Zero Fli...
Flink Forward
Dynamic Rule-based Real-time Market Data Alerts
Dynamic Rule-based Real-time Market Data AlertsDynamic Rule-based Real-time Market Data Alerts
Dynamic Rule-based Real-time Market Data Alerts
Flink Forward
Apache Flink in the Cloud-Native Era
Apache Flink in the Cloud-Native EraApache Flink in the Cloud-Native Era
Apache Flink in the Cloud-Native Era
Flink Forward
Tuning Apache Kafka Connectors for Flink.pptx
Tuning Apache Kafka Connectors for Flink.pptxTuning Apache Kafka Connectors for Flink.pptx
Tuning Apache Kafka Connectors for Flink.pptx
Flink Forward
Simplify CDC Pipeline with Spark Streaming SQL and Delta Lake
Simplify CDC Pipeline with Spark Streaming SQL and Delta LakeSimplify CDC Pipeline with Spark Streaming SQL and Delta Lake
Simplify CDC Pipeline with Spark Streaming SQL and Delta Lake
Stephan Ewen - Experiences running Flink at Very Large Scale
Stephan Ewen -  Experiences running Flink at Very Large ScaleStephan Ewen -  Experiences running Flink at Very Large Scale
Stephan Ewen - Experiences running Flink at Very Large Scale
Introducing the Apache Flink Kubernetes Operator
Introducing the Apache Flink Kubernetes OperatorIntroducing the Apache Flink Kubernetes Operator
Introducing the Apache Flink Kubernetes Operator
Flink Forward
Demystifying flink memory allocation and tuning - Roshan Naik, Uber
Demystifying flink memory allocation and tuning - Roshan Naik, UberDemystifying flink memory allocation and tuning - Roshan Naik, Uber
Demystifying flink memory allocation and tuning - Roshan Naik, Uber
Flink Forward
Exactly-Once Financial Data Processing at Scale with Flink and Pinot
Exactly-Once Financial Data Processing at Scale with Flink and PinotExactly-Once Financial Data Processing at Scale with Flink and Pinot
Exactly-Once Financial Data Processing at Scale with Flink and Pinot
Flink Forward
Fundamentals of Apache Kafka
Fundamentals of Apache KafkaFundamentals of Apache Kafka
Fundamentals of Apache Kafka
Chhavi Parasher
The top 3 challenges running multi-tenant Flink at scale
The top 3 challenges running multi-tenant Flink at scaleThe top 3 challenges running multi-tenant Flink at scale
The top 3 challenges running multi-tenant Flink at scale
Flink Forward
Presto on Apache Spark: A Tale of Two Computation Engines
Presto on Apache Spark: A Tale of Two Computation EnginesPresto on Apache Spark: A Tale of Two Computation Engines
Presto on Apache Spark: A Tale of Two Computation Engines
Flink powered stream processing platform at Pinterest
Flink powered stream processing platform at PinterestFlink powered stream processing platform at Pinterest
Flink powered stream processing platform at Pinterest
Flink Forward
Streaming SQL with Apache Calcite
Streaming SQL with Apache CalciteStreaming SQL with Apache Calcite
Streaming SQL with Apache Calcite
Julian Hyde
Apache Flink Worst Practices
Apache Flink Worst PracticesApache Flink Worst Practices
Apache Flink Worst Practices
Konstantin Knauf
Amazon S3 Best Practice and Tuning for Hadoop/Spark in the Cloud
Amazon S3 Best Practice and Tuning for Hadoop/Spark in the CloudAmazon S3 Best Practice and Tuning for Hadoop/Spark in the Cloud
Amazon S3 Best Practice and Tuning for Hadoop/Spark in the Cloud
Noritaka Sekiyama
Streaming Event Time Partitioning with Apache Flink and Apache Iceberg - Juli...
Streaming Event Time Partitioning with Apache Flink and Apache Iceberg - Juli...Streaming Event Time Partitioning with Apache Flink and Apache Iceberg - Juli...
Streaming Event Time Partitioning with Apache Flink and Apache Iceberg - Juli...
Flink Forward
A Thorough Comparison of Delta Lake, Iceberg and Hudi
A Thorough Comparison of Delta Lake, Iceberg and HudiA Thorough Comparison of Delta Lake, Iceberg and Hudi
A Thorough Comparison of Delta Lake, Iceberg and Hudi
Using the New Apache Flink Kubernetes Operator in a Production Deployment
Using the New Apache Flink Kubernetes Operator in a Production DeploymentUsing the New Apache Flink Kubernetes Operator in a Production Deployment
Using the New Apache Flink Kubernetes Operator in a Production Deployment
Flink Forward
Introduction to Redis
Introduction to RedisIntroduction to Redis
Introduction to Redis
Dvir Volk

What's hot (20)

Dynamically Scaling Data Streams across Multiple Kafka Clusters with Zero Fli...
Dynamically Scaling Data Streams across Multiple Kafka Clusters with Zero Fli...Dynamically Scaling Data Streams across Multiple Kafka Clusters with Zero Fli...
Dynamically Scaling Data Streams across Multiple Kafka Clusters with Zero Fli...
Dynamic Rule-based Real-time Market Data Alerts
Dynamic Rule-based Real-time Market Data AlertsDynamic Rule-based Real-time Market Data Alerts
Dynamic Rule-based Real-time Market Data Alerts
Apache Flink in the Cloud-Native Era
Apache Flink in the Cloud-Native EraApache Flink in the Cloud-Native Era
Apache Flink in the Cloud-Native Era
Tuning Apache Kafka Connectors for Flink.pptx
Tuning Apache Kafka Connectors for Flink.pptxTuning Apache Kafka Connectors for Flink.pptx
Tuning Apache Kafka Connectors for Flink.pptx
Simplify CDC Pipeline with Spark Streaming SQL and Delta Lake
Simplify CDC Pipeline with Spark Streaming SQL and Delta LakeSimplify CDC Pipeline with Spark Streaming SQL and Delta Lake
Simplify CDC Pipeline with Spark Streaming SQL and Delta Lake
Stephan Ewen - Experiences running Flink at Very Large Scale
Stephan Ewen -  Experiences running Flink at Very Large ScaleStephan Ewen -  Experiences running Flink at Very Large Scale
Stephan Ewen - Experiences running Flink at Very Large Scale
Introducing the Apache Flink Kubernetes Operator
Introducing the Apache Flink Kubernetes OperatorIntroducing the Apache Flink Kubernetes Operator
Introducing the Apache Flink Kubernetes Operator
Demystifying flink memory allocation and tuning - Roshan Naik, Uber
Demystifying flink memory allocation and tuning - Roshan Naik, UberDemystifying flink memory allocation and tuning - Roshan Naik, Uber
Demystifying flink memory allocation and tuning - Roshan Naik, Uber
Exactly-Once Financial Data Processing at Scale with Flink and Pinot
Exactly-Once Financial Data Processing at Scale with Flink and PinotExactly-Once Financial Data Processing at Scale with Flink and Pinot
Exactly-Once Financial Data Processing at Scale with Flink and Pinot
Fundamentals of Apache Kafka
Fundamentals of Apache KafkaFundamentals of Apache Kafka
Fundamentals of Apache Kafka
The top 3 challenges running multi-tenant Flink at scale
The top 3 challenges running multi-tenant Flink at scaleThe top 3 challenges running multi-tenant Flink at scale
The top 3 challenges running multi-tenant Flink at scale
Presto on Apache Spark: A Tale of Two Computation Engines
Presto on Apache Spark: A Tale of Two Computation EnginesPresto on Apache Spark: A Tale of Two Computation Engines
Presto on Apache Spark: A Tale of Two Computation Engines
Flink powered stream processing platform at Pinterest
Flink powered stream processing platform at PinterestFlink powered stream processing platform at Pinterest
Flink powered stream processing platform at Pinterest
Streaming SQL with Apache Calcite
Streaming SQL with Apache CalciteStreaming SQL with Apache Calcite
Streaming SQL with Apache Calcite
Apache Flink Worst Practices
Apache Flink Worst PracticesApache Flink Worst Practices
Apache Flink Worst Practices
Amazon S3 Best Practice and Tuning for Hadoop/Spark in the Cloud
Amazon S3 Best Practice and Tuning for Hadoop/Spark in the CloudAmazon S3 Best Practice and Tuning for Hadoop/Spark in the Cloud
Amazon S3 Best Practice and Tuning for Hadoop/Spark in the Cloud
Streaming Event Time Partitioning with Apache Flink and Apache Iceberg - Juli...
Streaming Event Time Partitioning with Apache Flink and Apache Iceberg - Juli...Streaming Event Time Partitioning with Apache Flink and Apache Iceberg - Juli...
Streaming Event Time Partitioning with Apache Flink and Apache Iceberg - Juli...
A Thorough Comparison of Delta Lake, Iceberg and Hudi
A Thorough Comparison of Delta Lake, Iceberg and HudiA Thorough Comparison of Delta Lake, Iceberg and Hudi
A Thorough Comparison of Delta Lake, Iceberg and Hudi
Using the New Apache Flink Kubernetes Operator in a Production Deployment
Using the New Apache Flink Kubernetes Operator in a Production DeploymentUsing the New Apache Flink Kubernetes Operator in a Production Deployment
Using the New Apache Flink Kubernetes Operator in a Production Deployment
Introduction to Redis
Introduction to RedisIntroduction to Redis
Introduction to Redis

Similar to Introducing BinarySortedMultiMap - A new Flink state primitive to boost your application performance

Lambda at Weather Scale - Cassandra Summit 2015
Lambda at Weather Scale - Cassandra Summit 2015Lambda at Weather Scale - Cassandra Summit 2015
Lambda at Weather Scale - Cassandra Summit 2015
Robbie Strickland
So you think you can stream.pptx
So you think you can stream.pptxSo you think you can stream.pptx
So you think you can stream.pptx
Prakash Chockalingam
OpenTSDB 2.0
OpenTSDB 2.0OpenTSDB 2.0
OpenTSDB 2.0
Norikra: SQL Stream Processing In Ruby
Norikra: SQL Stream Processing In RubyNorikra: SQL Stream Processing In Ruby
Norikra: SQL Stream Processing In Ruby
Application Monitoring using Open Source: VictoriaMetrics - ClickHouse
Application Monitoring using Open Source: VictoriaMetrics - ClickHouseApplication Monitoring using Open Source: VictoriaMetrics - ClickHouse
Application Monitoring using Open Source: VictoriaMetrics - ClickHouse
Application Monitoring using Open Source - VictoriaMetrics & Altinity ClickHo...
Application Monitoring using Open Source - VictoriaMetrics & Altinity ClickHo...Application Monitoring using Open Source - VictoriaMetrics & Altinity ClickHo...
Application Monitoring using Open Source - VictoriaMetrics & Altinity ClickHo...
Altinity Ltd
BigQuery case study in Groovenauts & Dive into the DataflowJavaSDK
BigQuery case study in Groovenauts & Dive into the DataflowJavaSDKBigQuery case study in Groovenauts & Dive into the DataflowJavaSDK
BigQuery case study in Groovenauts & Dive into the DataflowJavaSDK
nagachika t
WSO2Con USA 2015: WSO2 Analytics Platform - The One Stop Shop for All Your Da...
WSO2Con USA 2015: WSO2 Analytics Platform - The One Stop Shop for All Your Da...WSO2Con USA 2015: WSO2 Analytics Platform - The One Stop Shop for All Your Da...
WSO2Con USA 2015: WSO2 Analytics Platform - The One Stop Shop for All Your Da...
Deep Dive with Spark Streaming - Tathagata Das - Spark Meetup 2013-06-17
Deep Dive with Spark Streaming - Tathagata  Das - Spark Meetup 2013-06-17Deep Dive with Spark Streaming - Tathagata  Das - Spark Meetup 2013-06-17
Deep Dive with Spark Streaming - Tathagata Das - Spark Meetup 2013-06-17
RMLL 2013 - Synchronize OpenLDAP and Active Directory with LSC
RMLL 2013 - Synchronize OpenLDAP and Active Directory with LSCRMLL 2013 - Synchronize OpenLDAP and Active Directory with LSC
RMLL 2013 - Synchronize OpenLDAP and Active Directory with LSC
Clément OUDOT
Spark streaming: Best Practices
Spark streaming: Best PracticesSpark streaming: Best Practices
Spark streaming: Best Practices
Prakash Chockalingam
Webinar about Spring Data Neo4j 4
Webinar about Spring Data Neo4j 4Webinar about Spring Data Neo4j 4
Webinar about Spring Data Neo4j 4
SparkSQL: A Compiler from Queries to RDDs
SparkSQL: A Compiler from Queries to RDDsSparkSQL: A Compiler from Queries to RDDs
SparkSQL: A Compiler from Queries to RDDs
Java/Scala Lab: Анатолий Кметюк - Scala SubScript: Алгебра для реактивного пр...
Java/Scala Lab: Анатолий Кметюк - Scala SubScript: Алгебра для реактивного пр...Java/Scala Lab: Анатолий Кметюк - Scala SubScript: Алгебра для реактивного пр...
Java/Scala Lab: Анатолий Кметюк - Scala SubScript: Алгебра для реактивного пр...
GeeksLab Odessa
Introducing TiDB [Delivered: 09/27/18 at NYC SQL Meetup]
Introducing TiDB [Delivered: 09/27/18 at NYC SQL Meetup]Introducing TiDB [Delivered: 09/27/18 at NYC SQL Meetup]
Introducing TiDB [Delivered: 09/27/18 at NYC SQL Meetup]
Kevin Xu
WSO2Con ASIA 2016: WSO2 Analytics Platform: The One Stop Shop for All Your Da...
WSO2Con ASIA 2016: WSO2 Analytics Platform: The One Stop Shop for All Your Da...WSO2Con ASIA 2016: WSO2 Analytics Platform: The One Stop Shop for All Your Da...
WSO2Con ASIA 2016: WSO2 Analytics Platform: The One Stop Shop for All Your Da...
WSO2 Analytics Platform: The one stop shop for all your data needs
WSO2 Analytics Platform: The one stop shop for all your data needsWSO2 Analytics Platform: The one stop shop for all your data needs
WSO2 Analytics Platform: The one stop shop for all your data needs
Sriskandarajah Suhothayan
BDA403 How Netflix Monitors Applications in Real-time with Amazon Kinesis
BDA403 How Netflix Monitors Applications in Real-time with Amazon KinesisBDA403 How Netflix Monitors Applications in Real-time with Amazon Kinesis
BDA403 How Netflix Monitors Applications in Real-time with Amazon Kinesis
Amazon Web Services
Real-Time ETL in Practice with WSO2 Enterprise Integrator
Real-Time ETL in Practice with WSO2 Enterprise IntegratorReal-Time ETL in Practice with WSO2 Enterprise Integrator
Real-Time ETL in Practice with WSO2 Enterprise Integrator
Spark what's new what's coming
Spark what's new what's comingSpark what's new what's coming
Spark what's new what's coming

Similar to Introducing BinarySortedMultiMap - A new Flink state primitive to boost your application performance (20)

Lambda at Weather Scale - Cassandra Summit 2015
Lambda at Weather Scale - Cassandra Summit 2015Lambda at Weather Scale - Cassandra Summit 2015
Lambda at Weather Scale - Cassandra Summit 2015
So you think you can stream.pptx
So you think you can stream.pptxSo you think you can stream.pptx
So you think you can stream.pptx
OpenTSDB 2.0
OpenTSDB 2.0OpenTSDB 2.0
OpenTSDB 2.0
Norikra: SQL Stream Processing In Ruby
Norikra: SQL Stream Processing In RubyNorikra: SQL Stream Processing In Ruby
Norikra: SQL Stream Processing In Ruby
Application Monitoring using Open Source: VictoriaMetrics - ClickHouse
Application Monitoring using Open Source: VictoriaMetrics - ClickHouseApplication Monitoring using Open Source: VictoriaMetrics - ClickHouse
Application Monitoring using Open Source: VictoriaMetrics - ClickHouse
Application Monitoring using Open Source - VictoriaMetrics & Altinity ClickHo...
Application Monitoring using Open Source - VictoriaMetrics & Altinity ClickHo...Application Monitoring using Open Source - VictoriaMetrics & Altinity ClickHo...
Application Monitoring using Open Source - VictoriaMetrics & Altinity ClickHo...
BigQuery case study in Groovenauts & Dive into the DataflowJavaSDK
BigQuery case study in Groovenauts & Dive into the DataflowJavaSDKBigQuery case study in Groovenauts & Dive into the DataflowJavaSDK
BigQuery case study in Groovenauts & Dive into the DataflowJavaSDK
WSO2Con USA 2015: WSO2 Analytics Platform - The One Stop Shop for All Your Da...
WSO2Con USA 2015: WSO2 Analytics Platform - The One Stop Shop for All Your Da...WSO2Con USA 2015: WSO2 Analytics Platform - The One Stop Shop for All Your Da...
WSO2Con USA 2015: WSO2 Analytics Platform - The One Stop Shop for All Your Da...
Deep Dive with Spark Streaming - Tathagata Das - Spark Meetup 2013-06-17
Deep Dive with Spark Streaming - Tathagata  Das - Spark Meetup 2013-06-17Deep Dive with Spark Streaming - Tathagata  Das - Spark Meetup 2013-06-17
Deep Dive with Spark Streaming - Tathagata Das - Spark Meetup 2013-06-17
RMLL 2013 - Synchronize OpenLDAP and Active Directory with LSC
RMLL 2013 - Synchronize OpenLDAP and Active Directory with LSCRMLL 2013 - Synchronize OpenLDAP and Active Directory with LSC
RMLL 2013 - Synchronize OpenLDAP and Active Directory with LSC
Spark streaming: Best Practices
Spark streaming: Best PracticesSpark streaming: Best Practices
Spark streaming: Best Practices
Webinar about Spring Data Neo4j 4
Webinar about Spring Data Neo4j 4Webinar about Spring Data Neo4j 4
Webinar about Spring Data Neo4j 4
SparkSQL: A Compiler from Queries to RDDs
SparkSQL: A Compiler from Queries to RDDsSparkSQL: A Compiler from Queries to RDDs
SparkSQL: A Compiler from Queries to RDDs
Java/Scala Lab: Анатолий Кметюк - Scala SubScript: Алгебра для реактивного пр...
Java/Scala Lab: Анатолий Кметюк - Scala SubScript: Алгебра для реактивного пр...Java/Scala Lab: Анатолий Кметюк - Scala SubScript: Алгебра для реактивного пр...
Java/Scala Lab: Анатолий Кметюк - Scala SubScript: Алгебра для реактивного пр...
Introducing TiDB [Delivered: 09/27/18 at NYC SQL Meetup]
Introducing TiDB [Delivered: 09/27/18 at NYC SQL Meetup]Introducing TiDB [Delivered: 09/27/18 at NYC SQL Meetup]
Introducing TiDB [Delivered: 09/27/18 at NYC SQL Meetup]
WSO2Con ASIA 2016: WSO2 Analytics Platform: The One Stop Shop for All Your Da...
WSO2Con ASIA 2016: WSO2 Analytics Platform: The One Stop Shop for All Your Da...WSO2Con ASIA 2016: WSO2 Analytics Platform: The One Stop Shop for All Your Da...
WSO2Con ASIA 2016: WSO2 Analytics Platform: The One Stop Shop for All Your Da...
WSO2 Analytics Platform: The one stop shop for all your data needs
WSO2 Analytics Platform: The one stop shop for all your data needsWSO2 Analytics Platform: The one stop shop for all your data needs
WSO2 Analytics Platform: The one stop shop for all your data needs
BDA403 How Netflix Monitors Applications in Real-time with Amazon Kinesis
BDA403 How Netflix Monitors Applications in Real-time with Amazon KinesisBDA403 How Netflix Monitors Applications in Real-time with Amazon Kinesis
BDA403 How Netflix Monitors Applications in Real-time with Amazon Kinesis
Real-Time ETL in Practice with WSO2 Enterprise Integrator
Real-Time ETL in Practice with WSO2 Enterprise IntegratorReal-Time ETL in Practice with WSO2 Enterprise Integrator
Real-Time ETL in Practice with WSO2 Enterprise Integrator
Spark what's new what's coming
Spark what's new what's comingSpark what's new what's coming
Spark what's new what's coming

More from Flink Forward

“Alexa, be quiet!”: End-to-end near-real time model building and evaluation i...
“Alexa, be quiet!”: End-to-end near-real time model building and evaluation i...“Alexa, be quiet!”: End-to-end near-real time model building and evaluation i...
“Alexa, be quiet!”: End-to-end near-real time model building and evaluation i...
Flink Forward
One sink to rule them all: Introducing the new Async Sink
One sink to rule them all: Introducing the new Async SinkOne sink to rule them all: Introducing the new Async Sink
One sink to rule them all: Introducing the new Async Sink
Flink Forward
Flink SQL on Pulsar made easy
Flink SQL on Pulsar made easyFlink SQL on Pulsar made easy
Flink SQL on Pulsar made easy
Flink Forward
Processing Semantically-Ordered Streams in Financial Services
Processing Semantically-Ordered Streams in Financial ServicesProcessing Semantically-Ordered Streams in Financial Services
Processing Semantically-Ordered Streams in Financial Services
Flink Forward
Welcome to the Flink Community!
Welcome to the Flink Community!Welcome to the Flink Community!
Welcome to the Flink Community!
Flink Forward
Practical learnings from running thousands of Flink jobs
Practical learnings from running thousands of Flink jobsPractical learnings from running thousands of Flink jobs
Practical learnings from running thousands of Flink jobs
Flink Forward
Extending Flink SQL for stream processing use cases
Extending Flink SQL for stream processing use casesExtending Flink SQL for stream processing use cases
Extending Flink SQL for stream processing use cases
Flink Forward
Using Queryable State for Fun and Profit
Using Queryable State for Fun and ProfitUsing Queryable State for Fun and Profit
Using Queryable State for Fun and Profit
Flink Forward
Changelog Stream Processing with Apache Flink
Changelog Stream Processing with Apache FlinkChangelog Stream Processing with Apache Flink
Changelog Stream Processing with Apache Flink
Flink Forward
Large Scale Real Time Fraudulent Web Behavior Detection
Large Scale Real Time Fraudulent Web Behavior DetectionLarge Scale Real Time Fraudulent Web Behavior Detection
Large Scale Real Time Fraudulent Web Behavior Detection
Flink Forward
Squirreling Away $640 Billion: How Stripe Leverages Flink for Change Data Cap...
Squirreling Away $640 Billion: How Stripe Leverages Flink for Change Data Cap...Squirreling Away $640 Billion: How Stripe Leverages Flink for Change Data Cap...
Squirreling Away $640 Billion: How Stripe Leverages Flink for Change Data Cap...
Flink Forward
Building Reliable Lakehouses with Apache Flink and Delta Lake
Building Reliable Lakehouses with Apache Flink and Delta LakeBuilding Reliable Lakehouses with Apache Flink and Delta Lake
Building Reliable Lakehouses with Apache Flink and Delta Lake
Flink Forward
Near real-time statistical modeling and anomaly detection using Flink!
Near real-time statistical modeling and anomaly detection using Flink!Near real-time statistical modeling and anomaly detection using Flink!
Near real-time statistical modeling and anomaly detection using Flink!
Flink Forward

Introducing BinarySortedMultiMap - A new Flink state primitive to boost your application performance

  • 1. Introducing BinarySortedState: A new Flink state primitive to boost your application performance. Nico Kruber (Software Engineer), David Anderson (Community Engineering). Flink Forward 22
  • 2. About me
    Open source
    ● Apache Flink contributor/committer since 2016
      ○ Focus on network stack, usability, and performance
    Career
    ● PhD in Computer Science at HU Berlin / Zuse Institute Berlin
    ● Software engineer -> Solutions architect -> Head of Solutions Architecture @ DataArtisans/Ververica (acquired by Alibaba)
    ● Engineering @ Immerok
    About Immerok
    ● Building a fully managed Apache Flink cloud service for powering real-time systems at any scale
      ○ immerok.com
  • 5. Use Case: Stream Sort [figure: two sequences of numbered stream events, 20 15 11 30 35 27 21 40 41 35 and 1 2 5 7 6 9 10 11 13 14]
  • 6. Use Case: Stream Sort - Code
        void processElement(Long event, /*...*/) {
          TimerService timerSvc = ctx.timerService();
          long ts = ctx.timestamp();
          if (!isLate(ts, timerSvc)) {
            List<Long> listAtTs = events.get(ts);
            if (listAtTs == null) {
              listAtTs = new ArrayList<>();
            }
            listAtTs.add(event);
            events.put(ts, listAtTs);
            timerSvc.registerEventTimeTimer(ts);
          }
        }

        MapState<Long, List<Long>> events;
        MapStateDescriptor<Long, List<Long>> desc =
            new MapStateDescriptor<>(
                "Events", Types.LONG, Types.LIST(Types.LONG));
        events = getRuntimeContext().getMapState(desc);

        void onTimer(long ts, /*...*/) {
          events.get(ts).forEach(out::collect);
          events.remove(ts);
        }
  • 7. Use Case: Stream Sort - What’s Happening Underneath
        List<Long> listAtTs = events.get(ts);
        if (listAtTs == null) {
          listAtTs = new ArrayList<>();
        }
        listAtTs.add(event);
        events.put(ts, listAtTs);
    [Diagram, read path: lookup → State (RocksDB) searches memtable + sst files for one entry → full list as byte[] → De-/Serializer → deserialized (Java) list]
  • 8. Use Case: Stream Sort - What’s Happening Underneath
        List<Long> listAtTs = events.get(ts);
        if (listAtTs == null) {
          listAtTs = new ArrayList<>();
        }
        listAtTs.add(event);
        events.put(ts, listAtTs);
    [Diagram, write path: full Java list → De-/Serializer → serialized list as byte[] → State (RocksDB) adds new entry to memtable (leaving the old one for compaction)]
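The read-modify-write cycle on slides 7 and 8 means every `events.put(ts, listAtTs)` serializes the whole list again. The following is a minimal sketch in plain Java of how the written bytes add up. It uses neither Flink nor RocksDB: the class `ListRewriteCost` and its methods are hypothetical, and the 8-bytes-per-`Long` encoding without a length prefix is a simplifying assumption, not Flink's actual list serializer format.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical cost model: bytes written per add, full rewrite vs. append. */
final class ListRewriteCost {

    /** Serializes the whole list, as a put of a List<Long> value has to. */
    static byte[] serialize(List<Long> values) {
        ByteBuffer buffer = ByteBuffer.allocate(Long.BYTES * values.size());
        for (long v : values) {
            buffer.putLong(v);
        }
        return buffer.array();
    }

    /** Total bytes written when every add re-serializes the full list. */
    static long bytesWrittenWithFullRewrite(int adds) {
        List<Long> list = new ArrayList<>();
        long total = 0;
        for (int i = 0; i < adds; i++) {
            list.add((long) i);
            total += serialize(list).length;
        }
        return total;
    }

    /** Total bytes written when every add only appends the new element. */
    static long bytesWrittenWithAppend(int adds) {
        return (long) Long.BYTES * adds;
    }

    public static void main(String[] args) {
        int adds = 100;
        System.out.println("full rewrite: " + bytesWrittenWithFullRewrite(adds) + " bytes");
        System.out.println("append only:  " + bytesWrittenWithAppend(adds) + " bytes");
    }
}
```

Under these assumptions, 100 adds to the same timestamp write 40,400 bytes with full rewrites and 800 bytes with appends; the rewrite cost grows quadratically with the list length, while the append cost stays linear.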
  • 9. Use Case: Stream Sort - Alternative Solutions
    ● Using MapState<Long, Event> instead of MapState<Long, List<Event>>?
      ○ Cannot handle multiple events per timestamp
    ● Using Window API?
      ○ Efficient event storage per timestamp
      ○ No really well-matching window types: sliding, tumbling, and session windows
    ● Using HashMapStateBackend?
      ○ No de-/serialization overhead
      ○ State limited by available memory
      ○ No incremental checkpoints
    ● Using ListState<Event> and filtering in onTimer()?
      ○ Reduced overhead in processElement() vs. more to do in onTimer()
  • 10. Use Case: Event-Time Stream Join
    rates:
        {ts: 5, code: GBP, rate: 1.20}
        {ts: 10, code: USD, rate: 1.00}
        {ts: 19, code: USD, rate: 1.02}
    transactions:
        {ts: 15, code: USD, amount: 1.00}
        {ts: 10, code: GBP, amount: 2.00}
        {ts: 25, code: USD, amount: 3.00}
    query:
        SELECT
          t.ts AS ts1,
          r.ts AS ts2,
          t.amount * r.rate AS amount
        FROM transactions AS t
        LEFT JOIN rates FOR SYSTEM_TIME AS OF t.ts AS r
        ON t.code = r.code;
    result:
        {ts1: 15, ts2: 10, amount: 1.00}
        {ts1: 10, ts2: 5, amount: 2.40}
        {ts1: 25, ts2: 19, amount: 3.06}
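The join semantics on slide 10 can be reproduced outside Flink: for each transaction, pick the latest rate of the same currency whose timestamp is at or before the transaction's timestamp. The sketch below is an illustration in plain Java, not how Flink executes the query. The class `TemporalJoinSketch` and its members are hypothetical, and it returns `null` where the LEFT JOIN would emit a row with a NULL rate.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical in-memory version of the "rate valid as of t.ts" lookup. */
final class TemporalJoinSketch {

    /** One output row of the join. */
    static final class Joined {
        final long ts1;
        final long ts2;
        final double amount;

        Joined(long ts1, long ts2, double amount) {
            this.ts1 = ts1;
            this.ts2 = ts2;
            this.amount = amount;
        }
    }

    /** Rates per currency code, each sorted by timestamp. */
    private final Map<String, TreeMap<Long, Double>> rates = new HashMap<>();

    void addRate(long ts, String code, double rate) {
        rates.computeIfAbsent(code, c -> new TreeMap<>()).put(ts, rate);
    }

    /** Joins one transaction with the latest rate at or before its timestamp. */
    Joined join(long ts, String code, double amount) {
        TreeMap<Long, Double> perCode = rates.get(code);
        Map.Entry<Long, Double> rate = perCode == null ? null : perCode.floorEntry(ts);
        if (rate == null) {
            return null; // no rate known yet for this currency at this time
        }
        return new Joined(ts, rate.getKey(), amount * rate.getValue());
    }

    public static void main(String[] args) {
        TemporalJoinSketch join = new TemporalJoinSketch();
        join.addRate(5L, "GBP", 1.20);
        join.addRate(10L, "USD", 1.00);
        join.addRate(19L, "USD", 1.02);

        Joined row = join.join(25L, "USD", 3.00);
        System.out.printf("{ts1: %d, ts2: %d, amount: %.2f}%n", row.ts1, row.ts2, row.amount);
    }
}
```

The sorted lookup (`floorEntry`) is the operation that a plain `MapState<Long, Double>` cannot offer without iterating all entries, which is the problem the following slides address.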
  • 11. Use Case: Event-Time Stream Join - Code
        void processElement1(Transaction value,/*...*/) {
          TimerService timerSvc = ctx.timerService();
          long ts = ctx.timestamp();
          if (!isLate(ts, timerSvc)) {
            addTransaction(ts, value);
            timerSvc.registerEventTimeTimer(ts);
          }
        }
        // similar for processElement2()

        MapState<Long, List<Transaction>> transactions;
        MapState<Long, Double> rate;

        void onTimer(long ts, /*...*/) {
          TreeSet<Entry<Long, Double>> rates =
              ratesInRange(ts, rate.entries());
          Double myRate = getLast(rates);
          transactions.get(ts).forEach(
              tx -> out.collect(
                  new Joined(myRate, tx)));
          deleteAllButLast(rates);
          transactions.remove(ts);
        }
  • 12. Use Case: Event-Time Stream Join - Code
        void processElement1(Transaction value,/*...*/) {
          TimerService timerSvc = ctx.timerService();
          long ts = ctx.timestamp();
          if (!isLate(ts, timerSvc)) {
            addTransaction(ts, value);
            timerSvc.registerEventTimeTimer(ts);
          }
        }
        // similar for processElement2()
    Similar to stream sort, with RocksDB:
    ● always fetch/write full lists ⚠
    ● always de-/serialize full list ⚠
    ● additional stress on RocksDB compaction ⚠

        MapState<Long, List<Transaction>> transactions;
        MapState<Long, Double> rate;

        void onTimer(long ts, /*...*/) {
          TreeSet<Entry<Long, Double>> rates =
              ratesInRange(ts, rate.entries());
          Double myRate = getLast(rates);
          transactions.get(ts).forEach(
              tx -> out.collect(
                  new Joined(myRate, tx)));
          deleteAllButLast(rates);
          transactions.remove(ts);
        }
    With RocksDB:
    ● always fetching all rates’ (key+value) bytes ⚠
    ● (also need to fit into memory ⚠)
    ● deserialize all keys during iteration ⚠
      ○ not deserializing values (at least) ✓
  • 14. BinarySortedState - History
    2020: started as a Hackathon project (David + Nico) on custom windowing with process functions
    Nov 2021: “Temporal state” Hackathon project (David + Nico + Seth)
    ● Main primitives: getAt(), getAtOrBefore(), getAtOrAfter(), add(), addAll(), update()
    April 2022: Created FLIP-220 and discussed on dev@flink.apache.org
    ● Extended scope further to allow arbitrary user keys (not just timestamps)
      ○ Identified further use cases in SQL operators, e.g. min/max with retractions
    ● Clarified serializer requirements
    ● Extended proposed API to offer range read and clear operations
    ● …
  • 15. BinarySortedState - Goals
    ● A new keyed-state primitive, built on top of ListState
    ● Efficiently add to list of values for a user-provided key
    ● Efficiently iterate user-keys in a well-defined sort order, with native state-backend support, especially RocksDB
    ● Efficient operations for time-based functions (windowing, sorting, event-time joins, custom, ...)
    ● Operations on subset of the state, based on user-key ranges
    ● Portable between state backends (RocksDB, HashMap)
  • 16. BinarySortedState - API (subject to change!)
    BinarySortedState<UK, UV>
    ● Point-operations:
      ○ valuesAt(key)
      ○ add(key, value)
      ○ put(key, values)
      ○ clearEntryAt(key)
    ● Lookups:
      ○ firstEntry(), firstEntry(fromKey)
      ○ lastEntry(), lastEntry(toKey)
    ● Cleanup:
      ○ clear()
    ● Range-operations:
      ○ readRange(fromKey, toKey, inclusiveToKey)
      ○ readRangeUntil(toKey, inclusiveToKey)
      ○ readRangeFrom(fromKey)
    ● Range-deletes:
      ○ clearEntryAt(key)
      ○ clearRange(fromKey, toKey, inclusiveToKey)
      ○ clearRangeUntil(toKey, inclusiveToKey)
      ○ clearRangeFrom(fromKey)
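To get a feel for how a few of these operations behave, here is an in-memory stand-in built on `java.util.TreeMap`. It is a sketch only, not Flink code. The class name `SortedMultiMapModel`, the return types, and the exact semantics are assumptions; in particular, it assumes that `lastEntry(toKey)` returns the last entry with a key at or before `toKey`, which the slides do not state. The method names follow slide 16.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical in-memory model of a sorted multi-map; not the FLIP-220 implementation. */
final class SortedMultiMapModel<K extends Comparable<K>, V> {

    private final TreeMap<K, List<V>> entries = new TreeMap<>();

    /** Appends one value to the list stored under key. */
    void add(K key, V value) {
        entries.computeIfAbsent(key, k -> new ArrayList<>()).add(value);
    }

    /** Returns the values stored under key, or an empty list. */
    List<V> valuesAt(K key) {
        return entries.getOrDefault(key, Collections.emptyList());
    }

    /** Assumption: the last entry with a key <= toKey, or null if there is none. */
    Map.Entry<K, List<V>> lastEntry(K toKey) {
        return entries.floorEntry(toKey);
    }

    /** Returns a snapshot of all entries up to toKey, in key order. */
    List<Map.Entry<K, List<V>>> readRangeUntil(K toKey, boolean inclusiveToKey) {
        return new ArrayList<>(entries.headMap(toKey, inclusiveToKey).entrySet());
    }

    /** Removes all entries up to toKey. */
    void clearRangeUntil(K toKey, boolean inclusiveToKey) {
        entries.headMap(toKey, inclusiveToKey).clear();
    }

    public static void main(String[] args) {
        SortedMultiMapModel<Long, Double> rate = new SortedMultiMapModel<>();
        rate.add(5L, 1.20);
        rate.add(10L, 1.00);
        rate.add(19L, 1.02);

        // Find the rate valid at ts = 15, then drop everything older than it.
        Map.Entry<Long, List<Double>> current = rate.lastEntry(15L);
        rate.clearRangeUntil(current.getKey(), false);
        System.out.println(current.getKey() + " -> " + current.getValue());
    }
}
```

The usage in `main` follows the lookup-then-trim pattern of an event-time join: one sorted lookup replaces iterating over all rates, and one range delete replaces removing old rates one by one.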
  • 17. BinarySortedState - How does it work with RocksDB?!
    ● RocksDB is a key-value store writing into MemTables → flushing into SST files
    ● SST files are sorted by the key in lexicographical binary order (byte-wise unsigned comparison)
    ● RocksDB offers Prefix Seek and SeekForPrev
    ● RocksDBMapState.RocksDBMapIterator provides efficient iteration via:
      ○ Fetching up to 128 RocksDB entries at once
      ○ RocksDBMapEntry with lazy key/value deserialization
  • 18. BinarySortedState - LexicographicalTypeSerializer
    ● “Just” need to provide serializers that are compatible with RocksDB’s sort order
    ● Based on lexicographical binary order as defined by byte-wise unsigned comparison
    ● Compatible serializers extend LexicographicalTypeSerializer
        public abstract class LexicographicalTypeSerializer<T> extends TypeSerializer<T> {
          public Optional<Comparator<T>> findComparator() {
            return Optional.empty();
          }
        }
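A serializer is compatible with this sort order when comparing the serialized bytes unsigned, byte by byte, gives the same result as comparing the original values. For `long` keys, one common encoding is big-endian with the sign bit flipped. The sketch below shows that encoding in plain Java; whether the `LexicographicLongSerializer` from the slides uses exactly this layout is not stated there, and the class `LexicographicLongEncoding` is hypothetical.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

/** Hypothetical order-preserving byte encoding for long values. */
final class LexicographicLongEncoding {

    /** Big-endian bytes with the sign bit flipped, so negative values sort first. */
    static byte[] encode(long value) {
        return ByteBuffer.allocate(Long.BYTES).putLong(value ^ Long.MIN_VALUE).array();
    }

    /** Reverses encode(). */
    static long decode(byte[] bytes) {
        return ByteBuffer.wrap(bytes).getLong() ^ Long.MIN_VALUE;
    }

    /** Byte-wise unsigned comparison, the key order described on the slides. */
    static int compareBytes(byte[] a, byte[] b) {
        return Arrays.compareUnsigned(a, b);
    }

    public static void main(String[] args) {
        long[] samples = {Long.MIN_VALUE, -256L, -1L, 0L, 1L, 255L, 256L, Long.MAX_VALUE};
        for (int i = 0; i + 1 < samples.length; i++) {
            int byteOrder = compareBytes(encode(samples[i]), encode(samples[i + 1]));
            System.out.println(samples[i] + " < " + samples[i + 1] + " in byte order: " + (byteOrder < 0));
        }
    }
}
```

Without the sign-bit flip, plain big-endian bytes would place all negative values after the positive ones, because their first byte has the high bit set. Little-endian encodings break the order even for small positive values.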
  • 19. Stream Sort with/without BinarySortedState (1)
    With BinarySortedState:
        private BinarySortedState<Long, Long> events;
        BinarySortedStateDescriptor<Long, Long> desc =
            new BinarySortedStateDescriptor<>(
                "Events",
                LexicographicLongSerializer.INSTANCE,
                Types.LONG);
        events = getRuntimeContext()
            .getBinarySortedState(desc);

        public void onTimer(long ts, /*...*/) {
          events.valuesAt(ts).forEach(out::collect);
          events.clearEntryAt(ts);
        }
    Without (MapState):
        private MapState<Long, List<Long>> events;
        MapStateDescriptor<Long, List<Long>> desc =
            new MapStateDescriptor<>(
                "Events",
                Types.LONG,
                Types.LIST(Types.LONG));
        events = getRuntimeContext()
            .getMapState(desc);

        public void onTimer(long ts, /*...*/) {
          events.get(ts).forEach(out::collect);
          events.remove(ts);
        }
  • 20. Stream Sort with/without BinarySortedState (2)
    With BinarySortedState:
        public void processElement(/*...*/) {
          // ...
          events.add(ts, event);
          // ...
        }
    Without (MapState):
        public void processElement(/*...*/) {
          // ...
          List<Long> listAtTs = events.get(ts);
          if (listAtTs == null) {
            listAtTs = new ArrayList<>();
          }
          listAtTs.add(event);
          events.put(ts, listAtTs);
          // ...
        }
  • 21. Stream Sort with BinarySortedState - What’s Happening Underneath
        events.add(ts, event);
    [Diagram, write path: new list entry → De-/Serializer → serialized entry as byte[] → State (RocksDB) adds new merge op to memtable]
  • 22. Stream Sort with BinarySortedState - What’s Happening Underneath
        events.valuesAt(ts).forEach(out::collect);
    [Diagram, read path: lookup → State (RocksDB) searches memtable + sst files for all entries → full list as byte[] → De-/Serializer → deserialized (Java) list]
  • 23. Stream Sort with BinarySortedState - What’s Happening Underneath
        events.clearEntryAt(ts);
    [Diagram, delete path: delete → State (RocksDB) marks k/v as deleted (removal during compaction)]
  • 24. Event-Time Stream Join with/without BinarySortedState (1)
    With BinarySortedState:
        private BinarySortedState<Long, Transaction> transactions;
        private BinarySortedState<Long, Double> rate;

        public void processElement1(/*...*/) {
          // ...
          if (!isLate(ts, timerSvc)) {
            // append to BinarySortedState:
            transactions.add(ts, value);
            timerSvc.registerEventTimeTimer(ts);
          }
        }
        // similar for processElement2()
    Without (MapState):
        private MapState<Long, List<Transaction>> transactions;
        private MapState<Long, Double> rate;

        public void processElement1(/*...*/) {
          // ...
          if (!isLate(ts, timerSvc)) {
            // replace list in MapState:
            addTransaction(ts, value);
            timerSvc.registerEventTimeTimer(ts);
          }
        }
        // similar for processElement2()
  • 25. Event-Time Stream Join with/without BinarySortedState (2)
    With BinarySortedState:
        public void onTimer(long ts, /*...*/) {
          Entry<Long, Iterable<Double>> myRate =
              rate.lastEntry(ts);
          Double rateVal = Optional
              .ofNullable(myRate)
              .map(e -> e.getValue().iterator().next())
              .orElse(null);
          transactions.valuesAt(ts).forEach(
              tx -> out.collect(
                  new Joined(rateVal, tx)));
          if (myRate != null) {
            rate.clearRangeUntil(myRate.getKey(), false);
          }
          transactions.clearEntryAt(ts);
        }
    Without (MapState):
        public void onTimer(long ts, /*...*/) {
          TreeSet<Entry<Long, Double>> rates =
              ratesInRange(ts, rate.entries());
          Double myRate = Optional
              .ofNullable(rates.pollLast())
              .map(Entry::getValue)
              .orElse(null);
          transactions.get(ts).forEach(
              tx -> out.collect(
                  new Joined(myRate, tx)));
          rates.forEach(this::removeRate);
          transactions.remove(ts);
        }
  • 26. Stream Sort with BinarySortedState - Optimized
    ● Idea:
      ○ Increase efficiency by processing all events between watermarks
    ● Challenge:
      ○ Registering a timer for the next watermark will fire too often
    ➔ Solution:
      ○ Register timer for the first unprocessed event
      ○ When the timer fires:
        ■ Process all events until the current watermark (not the timer timestamp!)
          ● events.readRangeUntil(currentWatermark, true)
  • 27. Event-Time Stream Join with BinarySortedState - Optimized
    ● Idea:
      ○ Increase efficiency by processing all events between watermarks
    ● Challenge:
      ○ Registering a timer for the next watermark will fire too often
    ➔ Solution:
      ○ Same as Stream Sort: Timer for first unprocessed event, processing until watermark, but:
      ○ When the timer fires:
        ■ Iterate both, transactions and rate (in the appropriate time range) in event-time order
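The optimization on slide 26 can be simulated without Flink: keep one timer at the earliest buffered timestamp, and when it fires, drain everything up to the current watermark in one sorted range read. The sketch below uses a `TreeMap` in place of the state and a nullable field in place of Flink's timer service. The class `WatermarkBatchedSort` and its methods are hypothetical, and the late-event check (`isLate`) from the earlier slides is omitted for brevity.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

/** Hypothetical simulation of watermark-batched stream sorting. */
final class WatermarkBatchedSort {

    private final TreeMap<Long, List<Long>> events = new TreeMap<>();
    private final List<Long> output = new ArrayList<>();
    private Long registeredTimer = null; // stands in for a single event-time timer

    /** Buffers the event and keeps the timer at the first unprocessed timestamp. */
    void processElement(long ts, long event) {
        events.computeIfAbsent(ts, k -> new ArrayList<>()).add(event);
        if (registeredTimer == null || ts < registeredTimer) {
            registeredTimer = ts;
        }
    }

    /** Called when the watermark advances; fires the timer if it is due. */
    void advanceWatermark(long watermark) {
        if (registeredTimer == null || registeredTimer > watermark) {
            return; // nothing to emit yet
        }
        // Drain everything up to the watermark, not just the timer timestamp.
        NavigableMap<Long, List<Long>> ready = events.headMap(watermark, true);
        ready.values().forEach(output::addAll);
        ready.clear();
        registeredTimer = events.isEmpty() ? null : events.firstKey();
    }

    List<Long> output() {
        return output;
    }

    public static void main(String[] args) {
        WatermarkBatchedSort sorter = new WatermarkBatchedSort();
        for (long ts : new long[] {20L, 15L, 11L, 30L}) {
            sorter.processElement(ts, ts);
        }
        sorter.advanceWatermark(20L);
        System.out.println(sorter.output()); // [11, 15, 20]
    }
}
```

One timer firing handles three events here. Registering one timer per event timestamp would instead fire three times, each doing its own state lookup.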
  • 32. BinarySortedState - Who will benefit?
    ● (Custom) stream sorting
    ● Time-based (custom) joins
    ● Code with range-reads or bulk-deletes
    ● Custom window implementations
    ● Min/Max with retractions
    ● …
    ● Basically everything maintaining a MapState<?, List<?>> or requiring range operations
  • 33. What’s left to do?
    ● Iron out last bits and pieces + tests
      ○ Start voting thread for FLIP-220 on dev@flink.apache.org
      ○ Create a PR and get it merged
    ● Expected to land in Flink 1.17 (as experimental feature)
    ● Port Table/SQL/DataStream operators to improve efficiency:
      ○ TemporalRowTimeJoinOperator (PoC already done for validating the API ✓)
      ○ RowTimeSortOperator
      ○ IntervalJoinOperator
      ○ CepOperator
      ○ …
    ● Provide more LexicographicalTypeSerializers
  • 34. Get ready to ROK!!! Nico Kruber linkedin.com/in/nico-kruber nico@immerok.com