Location via proxy:   [ UP ]  
[Report a bug]   [Manage cookies]                
SlideShare a Scribd company logo
Introducing
BinarySortedState
A new Flink state
primitive to boost
your application
performance
Nico Kruber
–
Software
Engineer
——
David Anderson
–
Community
Engineering
–
Flink Forward
22
About me
Open source
● Apache Flink contributor/committer since 2016
○ Focus on network stack, usability, and performance
Career
● PhD in Computer Science at HU Berlin / Zuse Institute Berlin
● Software engineer -> Solutions architect -> Head of Solutions Architecture
@ DataArtisans/Ververica (acquired by Alibaba)
● Engineering @ Immerok
About Immerok
● Building a fully managed Apache Flink cloud service
for powering real-time systems at any scale
○ immerok.com
2
Agenda
● Motivation
● BinarySortedState
● Results & Future Work
Motivation
Use Case: Stream Sort
20
15
11
30 35
27
21
40
41
35
1
2
5
7 6
9
10
11
13
14
Use Case: Stream Sort - Code
void processElement(Long event, /*...*/) {
TimerService timerSvc = ctx.timerService();
long ts = ctx.timestamp();
if (!isLate(ts, timerSvc)) {
List<Long> listAtTs = events.get(ts);
if (listAtTs == null) {
listAtTs = new ArrayList<>();
}
listAtTs.add(event);
events.put(ts, listAtTs);
timerSvc.registerEventTimeTimer(ts);
}
}
MapState<Long, List<Long>> events;
MapStateDescriptor<Long, List<Long>> desc =
new MapStateDescriptor<>(
"Events",
Types.LONG,
Types.LIST(Types.LONG));
events = getRuntimeContext().getMapState(desc);
void onTimer(long ts, /*...*/) {
events.get(ts).forEach(out::collect);
events.remove(ts);
}
List<Long> listAtTs = events.get(ts);
if (listAtTs == null) {
listAtTs = new ArrayList<>();
}
listAtTs.add(event);
events.put(ts, listAtTs);
Use Case: Stream Sort - What’s
Happening Underneath
State (RocksDB)
De-/Serializer
full list
as byte[]
search memtable
+ sst files
for one entry
lookup
deserialized
(Java) list
List<Long> listAtTs = events.get(ts);
if (listAtTs == null) {
listAtTs = new ArrayList<>();
}
listAtTs.add(event);
events.put(ts, listAtTs);
Use Case: Stream Sort - What’s
Happening Underneath
State (RocksDB)
De-/Serializer
full Java list
serialized list
as byte[]
add new entry
to memtable
(leave old one for
compaction)
Use Case: Stream Sort - Alternative Solutions
● Using MapState<Long, Event> instead of MapState<Long, List<Event>>?
○ Cannot handle multiple events per timestamp
● Using Window API?
○ Efficient event storage per timestamp
○ No really well-matching window types: sliding, tumbling, and session windows
● Using HashMapStateBackend?
○ No de-/serialization overhead
○ state limited by available memory
○ no incremental checkpoints
● Using ListState<Event> and filtering in onTimer()?
○ Reduced overhead in processElement() vs. more to do in onTimer()
{ts: 5, code: GBP,
rate: 1.20}
{ts: 10, code: USD,
rate: 1.00}
{ts: 19, code: USD,
rate: 1.02}
rates
{ts: 15, code: USD,
amount: 1.00}
{ts: 10, code: GBP,
amount: 2.00}
{ts: 25, code: USD,
amount: 3.00}
transactions
Use Case: Event-Time Stream Join
{ts1: 15, ts2: 10,
amount: 1.00}
{ts1: 10, ts2: 5,
amount: 2.40}
{ts1: 25, ts2: 19,
amount: 3.06}
SELECT
t.ts AS ts1, r.ts AS ts2,
t.amount * r.rate AS amount
FROM transactions AS t
LEFT JOIN rates
FOR SYSTEM_TIME AS OF t.ts AS r
ON t.code = r.code;
void onTimer(long ts, /*...*/) {
TreeSet<Entry<Long, Double>> rates =
ratesInRange(ts, rate.entries());
Double myRate = getLast(rates);
transactions.get(ts).forEach(
tx -> out.collect(
new Joined(myRate, tx)));
deleteAllButLast(rates);
transactions.remove(ts);
}
Use Case: Event-Time Stream Join - Code
void processElement1(Transaction value,/*...*/) {
TimerService timerSvc = ctx.timerService();
long ts = ctx.timestamp();
if (!isLate(ts, timerSvc)) {
addTransaction(ts, value);
timerSvc.registerEventTimeTimer(ts);
}
}
// similar for processElement2()
MapState<Long, List<Transaction>>
transactions;
MapState<Long, Double> rate;
void onTimer(long ts, /*...*/) {
TreeSet<Entry<Long, Double>> rates =
ratesInRange(ts, rate.entries());
Double myRate = getLast(rates);
transactions.get(ts).forEach(
tx -> out.collect(
new Joined(myRate, tx)));
deleteAllButLast(rates);
transactions.remove(ts);
}
With RocksDB:
● always fetching all rates’ (key+value) bytes ⚠
● (also need to fit into memory ⚠)
● deserialize all keys keys during iteration ⚠
○ not deserializing values (at least) ✓
Use Case: Event-Time Stream Join - Code
void processElement1(Transaction value,/*...*/) {
TimerService timerSvc = ctx.timerService();
long ts = ctx.timestamp();
if (!isLate(ts, timerSvc)) {
addTransaction(ts, value);
timerSvc.registerEventTimeTimer(ts);
}
}
// similar for processElement2()
Similar to stream sort, with RocksDB:
● always fetch/write full lists ⚠
● always de-/serialize full list ⚠
● additional stress on RocksDB compaction ⚠
MapState<Long, List<Transaction>>
transactions;
MapState<Long, Double> rate;
BinarySortedState
BinarySortedState - History
“Temporal state” Hackathon project (David + Nico + Seth)
● Main primitives: getAt(), getAtOrBefore(), getAtOrAfter(), add(),
addAll(), update()
2020
Nov 2021
April 2022
started as a Hackathon project (David + Nico) on
custom windowing with process functions
Created FLIP-220 and discussed on dev@flink.apache.org
● Extended scope further to allow arbitrary user keys (not just timestamps)
○ Identified further use cases in SQL operators, e.g. min/max with retractions
● Clarified serializer requirements
● Extend proposed API to offer range read and clear operations
● …
● A new keyed-state primitive, built on top of ListState
● Efficiently add to list of values for a user-provided key
● Efficiently iterate user-keys in a well-defined sort order,
with native state-backend support, especially RocksDB
● Efficient operations for time-based functions
(windowing, sorting, event-time joins, custom, ...)
● Operations on subset of the state, based on user-key ranges
● Portable between state backends (RocksDB, HashMap)
BinarySortedState - Goals
BinarySortedState - API (subject to change!)
● Point-operations:
○ valuesAt(key)
○ add(key, value)
○ put(key, values)
○ clearEntryAt(key)
● Lookups:
○ firstEntry(), firstEntry(fromKey)
○ lastEntry(), lastEntry(UK toKey)
● Cleanup:
○ clear()
● Range-operations:
○ readRange(fromKey, toKey,
inclusiveToKey)
○ readRangeUntil(toKey, inclusiveToKey)
○ readRangeFrom(fromKey)
● Range-deletes:
○ clearEntryAt(key)
○ clearRange(
fromKey, toKey, inclusiveToKey)
○ clearRangeUntil(toKey, inclusiveToKey)
○ clearRangeFrom(fromKey)
BinarySortedState<UK, UV>
● RocksDB is a key-value store writing into MemTables → flushing into SST files
● SST files are sorted by the key in lexicographical binary order
(byte-wise unsigned comparison)
BinarySortedState - How does it work with RocksDB?!
● RocksDB offers Prefix Seek and SeekForPrev
● RocksDBMapState.RocksDBMapIterator provides efficient iteration via:
○ Fetching up to 128 RocksDB entries at once
○ RocksDBMapEntry with lazy key/value deserialization
BinarySortedState - LexicographicalTypeSerializer
● “Just” need to provide serializers that are compatible with RocksDB’s sort order
● Based on lexicographical binary order as defined by byte-wise unsigned comparison
● Compatible serializers extend LexicographicalTypeSerializer
public abstract class LexicographicalTypeSerializer<T> extends TypeSerializer<T> {
public Optional<Comparator<T>> findComparator() { return Optional.empty(); }
}
Stream Sort w/out BinarySortedState (1)
private BinarySortedState<Long, Long> events;
BinarySortedStateDescriptor<Long, Long> desc =
new BinarySortedStateDescriptor<>(
"Events",
LexicographicLongSerializer.INSTANCE,
Types.LONG);
events = getRuntimeContext()
.getBinarySortedState(desc);
private MapState<Long, List<Long>> events;
MapStateDescriptor<Long, List<Long>> desc =
new MapStateDescriptor<>(
"Events",
Types.LONG,
Types.LIST(Types.LONG));
events = getRuntimeContext()
.getMapState(desc);
public void onTimer(long ts, /*...*/) {
events.valuesAt(ts).forEach(out::collect);
events.clearEntryAt(ts);
}
public void onTimer(long ts, /*...*/) {
events.get(ts).forEach(out::collect);
events.remove(ts);
}
Stream Sort w/out BinarySortedState (2)
public void processElement(/*...*/) {
// ...
events.add(ts, event);
// ...
}
public void processElement(/*...*/) {
// ...
List<Long> listAtTs = events.get(ts);
if (listAtTs == null) {
listAtTs = new ArrayList<>();
}
listAtTs.add(event);
events.put(ts, listAtTs);
// ...
}
events.add(ts, event);
Stream Sort with
BinarySortedState - What’s
Happening Underneath
State (RocksDB)
De-/Serializer
add new merge
op to memtable
Serialized
entry as byte[]
new list
entry
Stream Sort with
BinarySortedState - What’s
Happening Underneath
State (RocksDB)
De-/Serializer
full list
as byte[]
Lookup
deserialized
(Java) list
search memtable
+ sst files
for all entries
events.valuesAt(ts).forEach(out::collect);
Mark k/v as
deleted
(removal during
compaction)
Stream Sort with
BinarySortedState - What’s
Happening Underneath
State (RocksDB)
De-/Serializer
events.clearEntryAt(ts);
Delete
Event-Time Stream Join w/out BinarySortedState (1)
private BinarySortedState<Long, Transaction>
transactions;
private BinarySortedState<Long, Double> rate;
private MapState<Long, List<Transaction>>
transactions;
private MapState<Long, Double> rate;
public void processElement1(/*...*/) {
// ...
if (!isLate(ts, timerSvc)) {
// append to BinarySortedState:
transactions.add(ts, value);
timerSvc.registerEventTimeTimer(ts);
}
}
// similar for processElement2()
public void processElement1(/*...*/) {
// ...
if (!isLate(ts, timerSvc)) {
// replace list in MapState:
addTransaction(ts, value);
timerSvc.registerEventTimeTimer(ts);
}
}
// similar for processElement2()
Event-Time Stream Join w/out BinarySortedState (2)
public void onTimer(long ts, /*...*/) {
Entry<Long, Iterable<Double>> myRate =
rate.lastEntry(ts);
Double rateVal = Optional
.ofNullable(myRate)
.map(e -> e.getValue().iterator().next())
.orElse(null);
transactions.valuesAt(ts).forEach(
tx -> out.collect(
new Joined(rateVal, tx)));
if (myRate != null) {
rate.clearRangeUntil(myRate.getKey(), false);
}
transactions.clearEntryAt(ts);
}
public void onTimer(long ts, /*...*/) {
TreeSet<Entry<Long, Double>> rates =
ratesInRange(ts, rate.entries());
Double myRate = Optional
.ofNullable(rates.pollLast())
.map(Entry::getValue)
.orElse(null);
transactions.get(ts).forEach(
tx -> out.collect(
new Joined(myRate, tx)));
rates.forEach(this::removeRate);
transactions.remove(ts);
}
Stream Sort with BinarySortedState - Optimized
● Idea:
○ Increase efficiency by processing all events between watermarks
● Challenge:
○ Registering a timer for the next watermark will fire too often
➔ Solution:
○ Register timer for the first unprocessed event
○ When the timer fires:
■ Process all events until the current watermark (not the timer timestamp!)
● events.readRangeUntil(currentWatermark, true)
Event-Time Stream Join with BinarySortedState - Optimized
● Idea:
○ Increase efficiency by processing all events between watermarks
● Challenge:
○ Registering a timer for the next watermark will fire too often
➔ Solution:
○ Same as Stream Sort: Timer for first unprocessed event, processing until watermark, but:
○ When the timer fires:
■ Iterate both, transactions and rate (in the appropriate time range) in event-time order
Results & Future Work
+50%
Stream Sort
+130%
Optimized
Stream Sort
+60%
Stream Join
● (Custom) stream sorting
● Time-based (custom) joins
● Code with range-reads or bulk-deletes
● Custom window implementations
● Min/Max with retractions
● …
● Basically everything maintaining a MapState<?, List<?>> or requiring range operations
BinarySortedState - Who will benefit?
What’s left to do?
● Iron out last bits and pieces + tests
○ Start voting thread for FLIP-220 on dev@flink.apache.org
○ Create a PR and get it merged
● Expected to land in Flink 1.17 (as experimental feature)
● Port Table/SQL/DataStream operators to improve efficiency:
○ TemporalRowTimeJoinOperator (PoC already done for validating the API ✓)
○ RowTimeSortOperator
○ IntervalJoinOperator
○ CepOperator
○ …
● Provide more LexicographicalTypeSerializers
Get ready to ROK!!!
Nico Kruber
linkedin.com/in/nico-kruber
nico@immerok.com

More Related Content

What's hot

Dynamically Scaling Data Streams across Multiple Kafka Clusters with Zero Fli...
Dynamically Scaling Data Streams across Multiple Kafka Clusters with Zero Fli...Dynamically Scaling Data Streams across Multiple Kafka Clusters with Zero Fli...
Dynamically Scaling Data Streams across Multiple Kafka Clusters with Zero Fli...
Flink Forward
 
Dynamic Rule-based Real-time Market Data Alerts
Dynamic Rule-based Real-time Market Data AlertsDynamic Rule-based Real-time Market Data Alerts
Dynamic Rule-based Real-time Market Data Alerts
Flink Forward
 
Apache Flink in the Cloud-Native Era
Apache Flink in the Cloud-Native EraApache Flink in the Cloud-Native Era
Apache Flink in the Cloud-Native Era
Flink Forward
 
Tuning Apache Kafka Connectors for Flink.pptx
Tuning Apache Kafka Connectors for Flink.pptxTuning Apache Kafka Connectors for Flink.pptx
Tuning Apache Kafka Connectors for Flink.pptx
Flink Forward
 
Simplify CDC Pipeline with Spark Streaming SQL and Delta Lake
Simplify CDC Pipeline with Spark Streaming SQL and Delta LakeSimplify CDC Pipeline with Spark Streaming SQL and Delta Lake
Simplify CDC Pipeline with Spark Streaming SQL and Delta Lake
Databricks
 
Stephan Ewen - Experiences running Flink at Very Large Scale
Stephan Ewen -  Experiences running Flink at Very Large ScaleStephan Ewen -  Experiences running Flink at Very Large Scale
Stephan Ewen - Experiences running Flink at Very Large Scale
Ververica
 
Introducing the Apache Flink Kubernetes Operator
Introducing the Apache Flink Kubernetes OperatorIntroducing the Apache Flink Kubernetes Operator
Introducing the Apache Flink Kubernetes Operator
Flink Forward
 
Demystifying flink memory allocation and tuning - Roshan Naik, Uber
Demystifying flink memory allocation and tuning - Roshan Naik, UberDemystifying flink memory allocation and tuning - Roshan Naik, Uber
Demystifying flink memory allocation and tuning - Roshan Naik, Uber
Flink Forward
 
Exactly-Once Financial Data Processing at Scale with Flink and Pinot
Exactly-Once Financial Data Processing at Scale with Flink and PinotExactly-Once Financial Data Processing at Scale with Flink and Pinot
Exactly-Once Financial Data Processing at Scale with Flink and Pinot
Flink Forward
 
Fundamentals of Apache Kafka
Fundamentals of Apache KafkaFundamentals of Apache Kafka
Fundamentals of Apache Kafka
Chhavi Parasher
 
The top 3 challenges running multi-tenant Flink at scale
The top 3 challenges running multi-tenant Flink at scaleThe top 3 challenges running multi-tenant Flink at scale
The top 3 challenges running multi-tenant Flink at scale
Flink Forward
 
Presto on Apache Spark: A Tale of Two Computation Engines
Presto on Apache Spark: A Tale of Two Computation EnginesPresto on Apache Spark: A Tale of Two Computation Engines
Presto on Apache Spark: A Tale of Two Computation Engines
Databricks
 
Flink powered stream processing platform at Pinterest
Flink powered stream processing platform at PinterestFlink powered stream processing platform at Pinterest
Flink powered stream processing platform at Pinterest
Flink Forward
 
Streaming SQL with Apache Calcite
Streaming SQL with Apache CalciteStreaming SQL with Apache Calcite
Streaming SQL with Apache Calcite
Julian Hyde
 
Apache Flink Worst Practices
Apache Flink Worst PracticesApache Flink Worst Practices
Apache Flink Worst Practices
Konstantin Knauf
 
Amazon S3 Best Practice and Tuning for Hadoop/Spark in the Cloud
Amazon S3 Best Practice and Tuning for Hadoop/Spark in the CloudAmazon S3 Best Practice and Tuning for Hadoop/Spark in the Cloud
Amazon S3 Best Practice and Tuning for Hadoop/Spark in the Cloud
Noritaka Sekiyama
 
Streaming Event Time Partitioning with Apache Flink and Apache Iceberg - Juli...
Streaming Event Time Partitioning with Apache Flink and Apache Iceberg - Juli...Streaming Event Time Partitioning with Apache Flink and Apache Iceberg - Juli...
Streaming Event Time Partitioning with Apache Flink and Apache Iceberg - Juli...
Flink Forward
 
A Thorough Comparison of Delta Lake, Iceberg and Hudi
A Thorough Comparison of Delta Lake, Iceberg and HudiA Thorough Comparison of Delta Lake, Iceberg and Hudi
A Thorough Comparison of Delta Lake, Iceberg and Hudi
Databricks
 
Using the New Apache Flink Kubernetes Operator in a Production Deployment
Using the New Apache Flink Kubernetes Operator in a Production DeploymentUsing the New Apache Flink Kubernetes Operator in a Production Deployment
Using the New Apache Flink Kubernetes Operator in a Production Deployment
Flink Forward
 
Introduction to Redis
Introduction to RedisIntroduction to Redis
Introduction to Redis
Dvir Volk
 

What's hot (20)

Dynamically Scaling Data Streams across Multiple Kafka Clusters with Zero Fli...
Dynamically Scaling Data Streams across Multiple Kafka Clusters with Zero Fli...Dynamically Scaling Data Streams across Multiple Kafka Clusters with Zero Fli...
Dynamically Scaling Data Streams across Multiple Kafka Clusters with Zero Fli...
 
Dynamic Rule-based Real-time Market Data Alerts
Dynamic Rule-based Real-time Market Data AlertsDynamic Rule-based Real-time Market Data Alerts
Dynamic Rule-based Real-time Market Data Alerts
 
Apache Flink in the Cloud-Native Era
Apache Flink in the Cloud-Native EraApache Flink in the Cloud-Native Era
Apache Flink in the Cloud-Native Era
 
Tuning Apache Kafka Connectors for Flink.pptx
Tuning Apache Kafka Connectors for Flink.pptxTuning Apache Kafka Connectors for Flink.pptx
Tuning Apache Kafka Connectors for Flink.pptx
 
Simplify CDC Pipeline with Spark Streaming SQL and Delta Lake
Simplify CDC Pipeline with Spark Streaming SQL and Delta LakeSimplify CDC Pipeline with Spark Streaming SQL and Delta Lake
Simplify CDC Pipeline with Spark Streaming SQL and Delta Lake
 
Stephan Ewen - Experiences running Flink at Very Large Scale
Stephan Ewen -  Experiences running Flink at Very Large ScaleStephan Ewen -  Experiences running Flink at Very Large Scale
Stephan Ewen - Experiences running Flink at Very Large Scale
 
Introducing the Apache Flink Kubernetes Operator
Introducing the Apache Flink Kubernetes OperatorIntroducing the Apache Flink Kubernetes Operator
Introducing the Apache Flink Kubernetes Operator
 
Demystifying flink memory allocation and tuning - Roshan Naik, Uber
Demystifying flink memory allocation and tuning - Roshan Naik, UberDemystifying flink memory allocation and tuning - Roshan Naik, Uber
Demystifying flink memory allocation and tuning - Roshan Naik, Uber
 
Exactly-Once Financial Data Processing at Scale with Flink and Pinot
Exactly-Once Financial Data Processing at Scale with Flink and PinotExactly-Once Financial Data Processing at Scale with Flink and Pinot
Exactly-Once Financial Data Processing at Scale with Flink and Pinot
 
Fundamentals of Apache Kafka
Fundamentals of Apache KafkaFundamentals of Apache Kafka
Fundamentals of Apache Kafka
 
The top 3 challenges running multi-tenant Flink at scale
The top 3 challenges running multi-tenant Flink at scaleThe top 3 challenges running multi-tenant Flink at scale
The top 3 challenges running multi-tenant Flink at scale
 
Presto on Apache Spark: A Tale of Two Computation Engines
Presto on Apache Spark: A Tale of Two Computation EnginesPresto on Apache Spark: A Tale of Two Computation Engines
Presto on Apache Spark: A Tale of Two Computation Engines
 
Flink powered stream processing platform at Pinterest
Flink powered stream processing platform at PinterestFlink powered stream processing platform at Pinterest
Flink powered stream processing platform at Pinterest
 
Streaming SQL with Apache Calcite
Streaming SQL with Apache CalciteStreaming SQL with Apache Calcite
Streaming SQL with Apache Calcite
 
Apache Flink Worst Practices
Apache Flink Worst PracticesApache Flink Worst Practices
Apache Flink Worst Practices
 
Amazon S3 Best Practice and Tuning for Hadoop/Spark in the Cloud
Amazon S3 Best Practice and Tuning for Hadoop/Spark in the CloudAmazon S3 Best Practice and Tuning for Hadoop/Spark in the Cloud
Amazon S3 Best Practice and Tuning for Hadoop/Spark in the Cloud
 
Streaming Event Time Partitioning with Apache Flink and Apache Iceberg - Juli...
Streaming Event Time Partitioning with Apache Flink and Apache Iceberg - Juli...Streaming Event Time Partitioning with Apache Flink and Apache Iceberg - Juli...
Streaming Event Time Partitioning with Apache Flink and Apache Iceberg - Juli...
 
A Thorough Comparison of Delta Lake, Iceberg and Hudi
A Thorough Comparison of Delta Lake, Iceberg and HudiA Thorough Comparison of Delta Lake, Iceberg and Hudi
A Thorough Comparison of Delta Lake, Iceberg and Hudi
 
Using the New Apache Flink Kubernetes Operator in a Production Deployment
Using the New Apache Flink Kubernetes Operator in a Production DeploymentUsing the New Apache Flink Kubernetes Operator in a Production Deployment
Using the New Apache Flink Kubernetes Operator in a Production Deployment
 
Introduction to Redis
Introduction to RedisIntroduction to Redis
Introduction to Redis
 

Similar to Introducing BinarySortedMultiMap - A new Flink state primitive to boost your application performance

Lambda at Weather Scale - Cassandra Summit 2015
Lambda at Weather Scale - Cassandra Summit 2015Lambda at Weather Scale - Cassandra Summit 2015
Lambda at Weather Scale - Cassandra Summit 2015
Robbie Strickland
 
So you think you can stream.pptx
So you think you can stream.pptxSo you think you can stream.pptx
So you think you can stream.pptx
Prakash Chockalingam
 
OpenTSDB 2.0
OpenTSDB 2.0OpenTSDB 2.0
OpenTSDB 2.0
HBaseCon
 
Norikra: SQL Stream Processing In Ruby
Norikra: SQL Stream Processing In RubyNorikra: SQL Stream Processing In Ruby
Norikra: SQL Stream Processing In Ruby
SATOSHI TAGOMORI
 
Application Monitoring using Open Source: VictoriaMetrics - ClickHouse
Application Monitoring using Open Source: VictoriaMetrics - ClickHouseApplication Monitoring using Open Source: VictoriaMetrics - ClickHouse
Application Monitoring using Open Source: VictoriaMetrics - ClickHouse
VictoriaMetrics
 
Application Monitoring using Open Source - VictoriaMetrics & Altinity ClickHo...
Application Monitoring using Open Source - VictoriaMetrics & Altinity ClickHo...Application Monitoring using Open Source - VictoriaMetrics & Altinity ClickHo...
Application Monitoring using Open Source - VictoriaMetrics & Altinity ClickHo...
Altinity Ltd
 
BigQuery case study in Groovenauts & Dive into the DataflowJavaSDK
BigQuery case study in Groovenauts & Dive into the DataflowJavaSDKBigQuery case study in Groovenauts & Dive into the DataflowJavaSDK
BigQuery case study in Groovenauts & Dive into the DataflowJavaSDK
nagachika t
 
WSO2Con USA 2015: WSO2 Analytics Platform - The One Stop Shop for All Your Da...
WSO2Con USA 2015: WSO2 Analytics Platform - The One Stop Shop for All Your Da...WSO2Con USA 2015: WSO2 Analytics Platform - The One Stop Shop for All Your Da...
WSO2Con USA 2015: WSO2 Analytics Platform - The One Stop Shop for All Your Da...
WSO2
 
Deep Dive with Spark Streaming - Tathagata Das - Spark Meetup 2013-06-17
Deep Dive with Spark Streaming - Tathagata  Das - Spark Meetup 2013-06-17Deep Dive with Spark Streaming - Tathagata  Das - Spark Meetup 2013-06-17
Deep Dive with Spark Streaming - Tathagata Das - Spark Meetup 2013-06-17
spark-project
 
RMLL 2013 - Synchronize OpenLDAP and Active Directory with LSC
RMLL 2013 - Synchronize OpenLDAP and Active Directory with LSCRMLL 2013 - Synchronize OpenLDAP and Active Directory with LSC
RMLL 2013 - Synchronize OpenLDAP and Active Directory with LSC
Clément OUDOT
 
Spark streaming: Best Practices
Spark streaming: Best PracticesSpark streaming: Best Practices
Spark streaming: Best Practices
Prakash Chockalingam
 
Webinar about Spring Data Neo4j 4
Webinar about Spring Data Neo4j 4Webinar about Spring Data Neo4j 4
Webinar about Spring Data Neo4j 4
GraphAware
 
SparkSQL: A Compiler from Queries to RDDs
SparkSQL: A Compiler from Queries to RDDsSparkSQL: A Compiler from Queries to RDDs
SparkSQL: A Compiler from Queries to RDDs
Databricks
 
Java/Scala Lab: Анатолий Кметюк - Scala SubScript: Алгебра для реактивного пр...
Java/Scala Lab: Анатолий Кметюк - Scala SubScript: Алгебра для реактивного пр...Java/Scala Lab: Анатолий Кметюк - Scala SubScript: Алгебра для реактивного пр...
Java/Scala Lab: Анатолий Кметюк - Scala SubScript: Алгебра для реактивного пр...
GeeksLab Odessa
 
Introducing TiDB [Delivered: 09/27/18 at NYC SQL Meetup]
Introducing TiDB [Delivered: 09/27/18 at NYC SQL Meetup]Introducing TiDB [Delivered: 09/27/18 at NYC SQL Meetup]
Introducing TiDB [Delivered: 09/27/18 at NYC SQL Meetup]
Kevin Xu
 
WSO2Con ASIA 2016: WSO2 Analytics Platform: The One Stop Shop for All Your Da...
WSO2Con ASIA 2016: WSO2 Analytics Platform: The One Stop Shop for All Your Da...WSO2Con ASIA 2016: WSO2 Analytics Platform: The One Stop Shop for All Your Da...
WSO2Con ASIA 2016: WSO2 Analytics Platform: The One Stop Shop for All Your Da...
WSO2
 
WSO2 Analytics Platform: The one stop shop for all your data needs
WSO2 Analytics Platform: The one stop shop for all your data needsWSO2 Analytics Platform: The one stop shop for all your data needs
WSO2 Analytics Platform: The one stop shop for all your data needs
Sriskandarajah Suhothayan
 
BDA403 How Netflix Monitors Applications in Real-time with Amazon Kinesis
BDA403 How Netflix Monitors Applications in Real-time with Amazon KinesisBDA403 How Netflix Monitors Applications in Real-time with Amazon Kinesis
BDA403 How Netflix Monitors Applications in Real-time with Amazon Kinesis
Amazon Web Services
 
Real-Time ETL in Practice with WSO2 Enterprise Integrator
Real-Time ETL in Practice with WSO2 Enterprise IntegratorReal-Time ETL in Practice with WSO2 Enterprise Integrator
Real-Time ETL in Practice with WSO2 Enterprise Integrator
WSO2
 
Spark what's new what's coming
Spark what's new what's comingSpark what's new what's coming
Spark what's new what's coming
Databricks
 

Similar to Introducing BinarySortedMultiMap - A new Flink state primitive to boost your application performance (20)

Lambda at Weather Scale - Cassandra Summit 2015
Lambda at Weather Scale - Cassandra Summit 2015Lambda at Weather Scale - Cassandra Summit 2015
Lambda at Weather Scale - Cassandra Summit 2015
 
So you think you can stream.pptx
So you think you can stream.pptxSo you think you can stream.pptx
So you think you can stream.pptx
 
OpenTSDB 2.0
OpenTSDB 2.0OpenTSDB 2.0
OpenTSDB 2.0
 
Norikra: SQL Stream Processing In Ruby
Norikra: SQL Stream Processing In RubyNorikra: SQL Stream Processing In Ruby
Norikra: SQL Stream Processing In Ruby
 
Application Monitoring using Open Source: VictoriaMetrics - ClickHouse
Application Monitoring using Open Source: VictoriaMetrics - ClickHouseApplication Monitoring using Open Source: VictoriaMetrics - ClickHouse
Application Monitoring using Open Source: VictoriaMetrics - ClickHouse
 
Application Monitoring using Open Source - VictoriaMetrics & Altinity ClickHo...
Application Monitoring using Open Source - VictoriaMetrics & Altinity ClickHo...Application Monitoring using Open Source - VictoriaMetrics & Altinity ClickHo...
Application Monitoring using Open Source - VictoriaMetrics & Altinity ClickHo...
 
BigQuery case study in Groovenauts & Dive into the DataflowJavaSDK
BigQuery case study in Groovenauts & Dive into the DataflowJavaSDKBigQuery case study in Groovenauts & Dive into the DataflowJavaSDK
BigQuery case study in Groovenauts & Dive into the DataflowJavaSDK
 
WSO2Con USA 2015: WSO2 Analytics Platform - The One Stop Shop for All Your Da...
WSO2Con USA 2015: WSO2 Analytics Platform - The One Stop Shop for All Your Da...WSO2Con USA 2015: WSO2 Analytics Platform - The One Stop Shop for All Your Da...
WSO2Con USA 2015: WSO2 Analytics Platform - The One Stop Shop for All Your Da...
 
Deep Dive with Spark Streaming - Tathagata Das - Spark Meetup 2013-06-17
Deep Dive with Spark Streaming - Tathagata  Das - Spark Meetup 2013-06-17Deep Dive with Spark Streaming - Tathagata  Das - Spark Meetup 2013-06-17
Deep Dive with Spark Streaming - Tathagata Das - Spark Meetup 2013-06-17
 
RMLL 2013 - Synchronize OpenLDAP and Active Directory with LSC
RMLL 2013 - Synchronize OpenLDAP and Active Directory with LSCRMLL 2013 - Synchronize OpenLDAP and Active Directory with LSC
RMLL 2013 - Synchronize OpenLDAP and Active Directory with LSC
 
Spark streaming: Best Practices
Spark streaming: Best PracticesSpark streaming: Best Practices
Spark streaming: Best Practices
 
Webinar about Spring Data Neo4j 4
Webinar about Spring Data Neo4j 4Webinar about Spring Data Neo4j 4
Webinar about Spring Data Neo4j 4
 
SparkSQL: A Compiler from Queries to RDDs
SparkSQL: A Compiler from Queries to RDDsSparkSQL: A Compiler from Queries to RDDs
SparkSQL: A Compiler from Queries to RDDs
 
Java/Scala Lab: Анатолий Кметюк - Scala SubScript: Алгебра для реактивного пр...
Java/Scala Lab: Анатолий Кметюк - Scala SubScript: Алгебра для реактивного пр...Java/Scala Lab: Анатолий Кметюк - Scala SubScript: Алгебра для реактивного пр...
Java/Scala Lab: Анатолий Кметюк - Scala SubScript: Алгебра для реактивного пр...
 
Introducing TiDB [Delivered: 09/27/18 at NYC SQL Meetup]
Introducing TiDB [Delivered: 09/27/18 at NYC SQL Meetup]Introducing TiDB [Delivered: 09/27/18 at NYC SQL Meetup]
Introducing TiDB [Delivered: 09/27/18 at NYC SQL Meetup]
 
WSO2Con ASIA 2016: WSO2 Analytics Platform: The One Stop Shop for All Your Da...
WSO2Con ASIA 2016: WSO2 Analytics Platform: The One Stop Shop for All Your Da...WSO2Con ASIA 2016: WSO2 Analytics Platform: The One Stop Shop for All Your Da...
WSO2Con ASIA 2016: WSO2 Analytics Platform: The One Stop Shop for All Your Da...
 
WSO2 Analytics Platform: The one stop shop for all your data needs
WSO2 Analytics Platform: The one stop shop for all your data needsWSO2 Analytics Platform: The one stop shop for all your data needs
WSO2 Analytics Platform: The one stop shop for all your data needs
 
BDA403 How Netflix Monitors Applications in Real-time with Amazon Kinesis
BDA403 How Netflix Monitors Applications in Real-time with Amazon KinesisBDA403 How Netflix Monitors Applications in Real-time with Amazon Kinesis
BDA403 How Netflix Monitors Applications in Real-time with Amazon Kinesis
 
Real-Time ETL in Practice with WSO2 Enterprise Integrator
Real-Time ETL in Practice with WSO2 Enterprise IntegratorReal-Time ETL in Practice with WSO2 Enterprise Integrator
Real-Time ETL in Practice with WSO2 Enterprise Integrator
 
Spark what's new what's coming
Spark what's new what's comingSpark what's new what's coming
Spark what's new what's coming
 

More from Flink Forward

“Alexa, be quiet!”: End-to-end near-real time model building and evaluation i...
“Alexa, be quiet!”: End-to-end near-real time model building and evaluation i...“Alexa, be quiet!”: End-to-end near-real time model building and evaluation i...
“Alexa, be quiet!”: End-to-end near-real time model building and evaluation i...
Flink Forward
 
One sink to rule them all: Introducing the new Async Sink
One sink to rule them all: Introducing the new Async SinkOne sink to rule them all: Introducing the new Async Sink
One sink to rule them all: Introducing the new Async Sink
Flink Forward
 
Flink SQL on Pulsar made easy
Flink SQL on Pulsar made easyFlink SQL on Pulsar made easy
Flink SQL on Pulsar made easy
Flink Forward
 
Processing Semantically-Ordered Streams in Financial Services
Processing Semantically-Ordered Streams in Financial ServicesProcessing Semantically-Ordered Streams in Financial Services
Processing Semantically-Ordered Streams in Financial Services
Flink Forward
 
Welcome to the Flink Community!
Welcome to the Flink Community!Welcome to the Flink Community!
Welcome to the Flink Community!
Flink Forward
 
Practical learnings from running thousands of Flink jobs
Practical learnings from running thousands of Flink jobsPractical learnings from running thousands of Flink jobs
Practical learnings from running thousands of Flink jobs
Flink Forward
 
Extending Flink SQL for stream processing use cases
Extending Flink SQL for stream processing use casesExtending Flink SQL for stream processing use cases
Extending Flink SQL for stream processing use cases
Flink Forward
 
Using Queryable State for Fun and Profit
Using Queryable State for Fun and ProfitUsing Queryable State for Fun and Profit
Using Queryable State for Fun and Profit
Flink Forward
 
Changelog Stream Processing with Apache Flink
Changelog Stream Processing with Apache FlinkChangelog Stream Processing with Apache Flink
Changelog Stream Processing with Apache Flink
Flink Forward
 
Large Scale Real Time Fraudulent Web Behavior Detection
Large Scale Real Time Fraudulent Web Behavior DetectionLarge Scale Real Time Fraudulent Web Behavior Detection
Large Scale Real Time Fraudulent Web Behavior Detection
Flink Forward
 
Squirreling Away $640 Billion: How Stripe Leverages Flink for Change Data Cap...
Squirreling Away $640 Billion: How Stripe Leverages Flink for Change Data Cap...Squirreling Away $640 Billion: How Stripe Leverages Flink for Change Data Cap...
Squirreling Away $640 Billion: How Stripe Leverages Flink for Change Data Cap...
Flink Forward
 
Building Reliable Lakehouses with Apache Flink and Delta Lake
Building Reliable Lakehouses with Apache Flink and Delta LakeBuilding Reliable Lakehouses with Apache Flink and Delta Lake
Building Reliable Lakehouses with Apache Flink and Delta Lake
Flink Forward
 
Near real-time statistical modeling and anomaly detection using Flink!
Near real-time statistical modeling and anomaly detection using Flink!Near real-time statistical modeling and anomaly detection using Flink!
Near real-time statistical modeling and anomaly detection using Flink!
Flink Forward
 

More from Flink Forward (13)

“Alexa, be quiet!”: End-to-end near-real time model building and evaluation i...
“Alexa, be quiet!”: End-to-end near-real time model building and evaluation i...“Alexa, be quiet!”: End-to-end near-real time model building and evaluation i...
“Alexa, be quiet!”: End-to-end near-real time model building and evaluation i...
 
One sink to rule them all: Introducing the new Async Sink
One sink to rule them all: Introducing the new Async SinkOne sink to rule them all: Introducing the new Async Sink
One sink to rule them all: Introducing the new Async Sink
 
Flink SQL on Pulsar made easy
Flink SQL on Pulsar made easyFlink SQL on Pulsar made easy
Flink SQL on Pulsar made easy
 
Processing Semantically-Ordered Streams in Financial Services
Processing Semantically-Ordered Streams in Financial ServicesProcessing Semantically-Ordered Streams in Financial Services
Processing Semantically-Ordered Streams in Financial Services
 
Welcome to the Flink Community!
Welcome to the Flink Community!Welcome to the Flink Community!
Welcome to the Flink Community!
 
Practical learnings from running thousands of Flink jobs
Practical learnings from running thousands of Flink jobsPractical learnings from running thousands of Flink jobs
Practical learnings from running thousands of Flink jobs
 
Extending Flink SQL for stream processing use cases
Extending Flink SQL for stream processing use casesExtending Flink SQL for stream processing use cases
Extending Flink SQL for stream processing use cases
 
Using Queryable State for Fun and Profit
Using Queryable State for Fun and ProfitUsing Queryable State for Fun and Profit
Using Queryable State for Fun and Profit
 
Changelog Stream Processing with Apache Flink
Changelog Stream Processing with Apache FlinkChangelog Stream Processing with Apache Flink
Changelog Stream Processing with Apache Flink
 
Large Scale Real Time Fraudulent Web Behavior Detection
Large Scale Real Time Fraudulent Web Behavior DetectionLarge Scale Real Time Fraudulent Web Behavior Detection
Large Scale Real Time Fraudulent Web Behavior Detection
 
Squirreling Away $640 Billion: How Stripe Leverages Flink for Change Data Cap...
Squirreling Away $640 Billion: How Stripe Leverages Flink for Change Data Cap...Squirreling Away $640 Billion: How Stripe Leverages Flink for Change Data Cap...
Squirreling Away $640 Billion: How Stripe Leverages Flink for Change Data Cap...
 
Building Reliable Lakehouses with Apache Flink and Delta Lake
Building Reliable Lakehouses with Apache Flink and Delta LakeBuilding Reliable Lakehouses with Apache Flink and Delta Lake
Building Reliable Lakehouses with Apache Flink and Delta Lake
 
Near real-time statistical modeling and anomaly detection using Flink!
Near real-time statistical modeling and anomaly detection using Flink!Near real-time statistical modeling and anomaly detection using Flink!
Near real-time statistical modeling and anomaly detection using Flink!
 

Recently uploaded

Blue Screen Of Death | Windows Down | Biggest IT failure
Blue Screen Of Death | Windows Down | Biggest IT failureBlue Screen Of Death | Windows Down | Biggest IT failure
Blue Screen Of Death | Windows Down | Biggest IT failure
Dexbytes Infotech Pvt Ltd
 
AMD Zen 5 Architecture Deep Dive from Tech Day
AMD Zen 5 Architecture Deep Dive from Tech DayAMD Zen 5 Architecture Deep Dive from Tech Day
AMD Zen 5 Architecture Deep Dive from Tech Day
Low Hong Chuan
 
TrustArc Webinar - Innovating with TRUSTe Responsible AI Certification
TrustArc Webinar - Innovating with TRUSTe Responsible AI CertificationTrustArc Webinar - Innovating with TRUSTe Responsible AI Certification
TrustArc Webinar - Innovating with TRUSTe Responsible AI Certification
TrustArc
 
Epicor Kinetic REST API Services Overview.pptx
Epicor Kinetic REST API Services Overview.pptxEpicor Kinetic REST API Services Overview.pptx
Epicor Kinetic REST API Services Overview.pptx
Piyush Khalate
 
Increase Quality with User Access Policies - July 2024
Increase Quality with User Access Policies - July 2024Increase Quality with User Access Policies - July 2024
Increase Quality with User Access Policies - July 2024
Peter Caitens
 
Getting Ready for Copilot for Microsoft 365 with Governance Features in Share...
Getting Ready for Copilot for Microsoft 365 with Governance Features in Share...Getting Ready for Copilot for Microsoft 365 with Governance Features in Share...
Getting Ready for Copilot for Microsoft 365 with Governance Features in Share...
Juan Carlos Gonzalez
 
Webinar: Transforming Substation Automation with Open Source Solutions
Webinar: Transforming Substation Automation with Open Source SolutionsWebinar: Transforming Substation Automation with Open Source Solutions
Webinar: Transforming Substation Automation with Open Source Solutions
DanBrown980551
 
IVE 2024 Short Course Lecture 9 - Empathic Computing in VR
IVE 2024 Short Course Lecture 9 - Empathic Computing in VRIVE 2024 Short Course Lecture 9 - Empathic Computing in VR
IVE 2024 Short Course Lecture 9 - Empathic Computing in VR
Mark Billinghurst
 
Mega MUG 2024: Working smarter in Marketo
Mega MUG 2024: Working smarter in MarketoMega MUG 2024: Working smarter in Marketo
Mega MUG 2024: Working smarter in Marketo
Stephanie Tyagita
 
Leading Bigcommerce Development Services for Online Retailers
Leading Bigcommerce Development Services for Online RetailersLeading Bigcommerce Development Services for Online Retailers
Leading Bigcommerce Development Services for Online Retailers
SynapseIndia
 
DefCamp_2016_Chemerkin_Yury_--_publish.pdf
DefCamp_2016_Chemerkin_Yury_--_publish.pdfDefCamp_2016_Chemerkin_Yury_--_publish.pdf
DefCamp_2016_Chemerkin_Yury_--_publish.pdf
Yury Chemerkin
 
STKI Israeli IT Market Study v2 August 2024.pdf
STKI Israeli IT Market Study v2 August 2024.pdfSTKI Israeli IT Market Study v2 August 2024.pdf
STKI Israeli IT Market Study v2 August 2024.pdf
Dr. Jimmy Schwarzkopf
 
Connecting Attitudes and Social Influences with Designs for Usable Security a...
Connecting Attitudes and Social Influences with Designs for Usable Security a...Connecting Attitudes and Social Influences with Designs for Usable Security a...
Connecting Attitudes and Social Influences with Designs for Usable Security a...
Cori Faklaris
 
IVE 2024 Short Course - Lecture 8 - Electroencephalography (EEG) Basics
IVE 2024 Short Course - Lecture 8 - Electroencephalography (EEG) BasicsIVE 2024 Short Course - Lecture 8 - Electroencephalography (EEG) Basics
IVE 2024 Short Course - Lecture 8 - Electroencephalography (EEG) Basics
Mark Billinghurst
 
Planetek Italia Corporate Profile Brochure
Planetek Italia Corporate Profile BrochurePlanetek Italia Corporate Profile Brochure
Planetek Italia Corporate Profile Brochure
Planetek Italia Srl
 
FIDO Munich Seminar: FIDO Tech Principles.pptx
FIDO Munich Seminar: FIDO Tech Principles.pptxFIDO Munich Seminar: FIDO Tech Principles.pptx
FIDO Munich Seminar: FIDO Tech Principles.pptx
FIDO Alliance
 
Scientific-Based Blockchain TON Project Analysis Report
Scientific-Based Blockchain  TON Project Analysis ReportScientific-Based Blockchain  TON Project Analysis Report
Scientific-Based Blockchain TON Project Analysis Report
SelcukTOPAL2
 
Bài tập tiếng anh lớp 9 - Ôn tập tuyển sinh
Bài tập tiếng anh lớp 9 - Ôn tập tuyển sinhBài tập tiếng anh lớp 9 - Ôn tập tuyển sinh
Bài tập tiếng anh lớp 9 - Ôn tập tuyển sinh
NguynThNhQunh59
 
FIDO Munich Seminar Workforce Authentication Case Study.pptx
FIDO Munich Seminar Workforce Authentication Case Study.pptxFIDO Munich Seminar Workforce Authentication Case Study.pptx
FIDO Munich Seminar Workforce Authentication Case Study.pptx
FIDO Alliance
 
Jacquard Fabric Explained: Origins, Characteristics, and Uses
Jacquard Fabric Explained: Origins, Characteristics, and UsesJacquard Fabric Explained: Origins, Characteristics, and Uses
Jacquard Fabric Explained: Origins, Characteristics, and Uses
ldtexsolbl
 

Recently uploaded (20)

Blue Screen Of Death | Windows Down | Biggest IT failure
Blue Screen Of Death | Windows Down | Biggest IT failureBlue Screen Of Death | Windows Down | Biggest IT failure
Blue Screen Of Death | Windows Down | Biggest IT failure
 
AMD Zen 5 Architecture Deep Dive from Tech Day
AMD Zen 5 Architecture Deep Dive from Tech DayAMD Zen 5 Architecture Deep Dive from Tech Day
AMD Zen 5 Architecture Deep Dive from Tech Day
 
TrustArc Webinar - Innovating with TRUSTe Responsible AI Certification
TrustArc Webinar - Innovating with TRUSTe Responsible AI CertificationTrustArc Webinar - Innovating with TRUSTe Responsible AI Certification
TrustArc Webinar - Innovating with TRUSTe Responsible AI Certification
 
Epicor Kinetic REST API Services Overview.pptx
Epicor Kinetic REST API Services Overview.pptxEpicor Kinetic REST API Services Overview.pptx
Epicor Kinetic REST API Services Overview.pptx
 
Increase Quality with User Access Policies - July 2024
Increase Quality with User Access Policies - July 2024Increase Quality with User Access Policies - July 2024
Increase Quality with User Access Policies - July 2024
 
Getting Ready for Copilot for Microsoft 365 with Governance Features in Share...
Getting Ready for Copilot for Microsoft 365 with Governance Features in Share...Getting Ready for Copilot for Microsoft 365 with Governance Features in Share...
Getting Ready for Copilot for Microsoft 365 with Governance Features in Share...
 
Webinar: Transforming Substation Automation with Open Source Solutions
Webinar: Transforming Substation Automation with Open Source SolutionsWebinar: Transforming Substation Automation with Open Source Solutions
Webinar: Transforming Substation Automation with Open Source Solutions
 
IVE 2024 Short Course Lecture 9 - Empathic Computing in VR
IVE 2024 Short Course Lecture 9 - Empathic Computing in VRIVE 2024 Short Course Lecture 9 - Empathic Computing in VR
IVE 2024 Short Course Lecture 9 - Empathic Computing in VR
 
Mega MUG 2024: Working smarter in Marketo
Mega MUG 2024: Working smarter in MarketoMega MUG 2024: Working smarter in Marketo
Mega MUG 2024: Working smarter in Marketo
 
Leading Bigcommerce Development Services for Online Retailers
Leading Bigcommerce Development Services for Online RetailersLeading Bigcommerce Development Services for Online Retailers
Leading Bigcommerce Development Services for Online Retailers
 
DefCamp_2016_Chemerkin_Yury_--_publish.pdf
DefCamp_2016_Chemerkin_Yury_--_publish.pdfDefCamp_2016_Chemerkin_Yury_--_publish.pdf
DefCamp_2016_Chemerkin_Yury_--_publish.pdf
 
STKI Israeli IT Market Study v2 August 2024.pdf
STKI Israeli IT Market Study v2 August 2024.pdfSTKI Israeli IT Market Study v2 August 2024.pdf
STKI Israeli IT Market Study v2 August 2024.pdf
 
Connecting Attitudes and Social Influences with Designs for Usable Security a...
Connecting Attitudes and Social Influences with Designs for Usable Security a...Connecting Attitudes and Social Influences with Designs for Usable Security a...
Connecting Attitudes and Social Influences with Designs for Usable Security a...
 
IVE 2024 Short Course - Lecture 8 - Electroencephalography (EEG) Basics
IVE 2024 Short Course - Lecture 8 - Electroencephalography (EEG) BasicsIVE 2024 Short Course - Lecture 8 - Electroencephalography (EEG) Basics
IVE 2024 Short Course - Lecture 8 - Electroencephalography (EEG) Basics
 
Planetek Italia Corporate Profile Brochure
Planetek Italia Corporate Profile BrochurePlanetek Italia Corporate Profile Brochure
Planetek Italia Corporate Profile Brochure
 
FIDO Munich Seminar: FIDO Tech Principles.pptx
FIDO Munich Seminar: FIDO Tech Principles.pptxFIDO Munich Seminar: FIDO Tech Principles.pptx
FIDO Munich Seminar: FIDO Tech Principles.pptx
 
Scientific-Based Blockchain TON Project Analysis Report
Scientific-Based Blockchain  TON Project Analysis ReportScientific-Based Blockchain  TON Project Analysis Report
Scientific-Based Blockchain TON Project Analysis Report
 
Bài tập tiếng anh lớp 9 - Ôn tập tuyển sinh
Bài tập tiếng anh lớp 9 - Ôn tập tuyển sinhBài tập tiếng anh lớp 9 - Ôn tập tuyển sinh
Bài tập tiếng anh lớp 9 - Ôn tập tuyển sinh
 
FIDO Munich Seminar Workforce Authentication Case Study.pptx
FIDO Munich Seminar Workforce Authentication Case Study.pptxFIDO Munich Seminar Workforce Authentication Case Study.pptx
FIDO Munich Seminar Workforce Authentication Case Study.pptx
 
Jacquard Fabric Explained: Origins, Characteristics, and Uses
Jacquard Fabric Explained: Origins, Characteristics, and UsesJacquard Fabric Explained: Origins, Characteristics, and Uses
Jacquard Fabric Explained: Origins, Characteristics, and Uses
 

Introducing BinarySortedMultiMap - A new Flink state primitive to boost your application performance

  • 1. Introducing BinarySortedState A new Flink state primitive to boost your application performance Nico Kruber – Software Engineer —— David Anderson – Community Engineering – Flink Forward 22
  • 2. About me Open source ● Apache Flink contributor/committer since 2016 ○ Focus on network stack, usability, and performance Career ● PhD in Computer Science at HU Berlin / Zuse Institute Berlin ● Software engineer -> Solutions architect -> Head of Solutions Architecture @ DataArtisans/Ververica (acquired by Alibaba) ● Engineering @ Immerok About Immerok ● Building a fully managed Apache Flink cloud service for powering real-time systems at any scale ○ immerok.com 2
  • 5. Use Case: Stream Sort 20 15 11 30 35 27 21 40 41 35 1 2 5 7 6 9 10 11 13 14
  • 6. Use Case: Stream Sort - Code void processElement(Long event, /*...*/) { TimerService timerSvc = ctx.timerService(); long ts = ctx.timestamp(); if (!isLate(ts, timerSvc)) { List<Long> listAtTs = events.get(ts); if (listAtTs == null) { listAtTs = new ArrayList<>(); } listAtTs.add(event); events.put(ts, listAtTs); timerSvc.registerEventTimeTimer(ts); } } MapState<Long, List<Long>> events; MapStateDescriptor<Long, List<Long>> desc = new MapStateDescriptor<>( "Events", Types.LONG, Types.LIST(Types.LONG)); events = getRuntimeContext().getMapState(desc); void onTimer(long ts, /*...*/) { events.get(ts).forEach(out::collect); events.remove(ts); }
  • 7. List<Long> listAtTs = events.get(ts); if (listAtTs == null) { listAtTs = new ArrayList<>(); } listAtTs.add(event); events.put(ts, listAtTs); Use Case: Stream Sort - What’s Happening Underneath State (RocksDB) De-/Serializer full list as byte[] search memtable + sst files for one entry lookup deserialized (Java) list
  • 8. List<Long> listAtTs = events.get(ts); if (listAtTs == null) { listAtTs = new ArrayList<>(); } listAtTs.add(event); events.put(ts, listAtTs); Use Case: Stream Sort - What’s Happening Underneath State (RocksDB) De-/Serializer full Java list serialized list as byte[] add new entry to memtable (leave old one for compaction)
  • 9. Use Case: Stream Sort - Alternative Solutions ● Using MapState<Long, Event> instead of MapState<Long, List<Event>>? ○ Cannot handle multiple events per timestamp ● Using Window API? ○ Efficient event storage per timestamp ○ No really well-matching window types: sliding, tumbling, and session windows ● Using HashMapStateBackend? ○ No de-/serialization overhead ○ state limited by available memory ○ no incremental checkpoints ● Using ListState<Event> and filtering in onTimer()? ○ Reduced overhead in processElement() vs. more to do in onTimer()
  • 10. {ts: 5, code: GBP, rate: 1.20} {ts: 10, code: USD, rate: 1.00} {ts: 19, code: USD, rate: 1.02} rates {ts: 15, code: USD, amount: 1.00} {ts: 10, code: GBP, amount: 2.00} {ts: 25, code: USD, amount: 3.00} transactions Use Case: Event-Time Stream Join {ts1: 15, ts2: 10, amount: 1.00} {ts1: 10, ts2: 5, amount: 2.40} {ts1: 25, ts2: 19, amount: 3.06} SELECT t.ts AS ts1, r.ts AS ts2, t.amount * r.rate AS amount FROM transactions AS t LEFT JOIN rates FOR SYSTEM_TIME AS OF t.ts AS r ON t.code = r.code;
  • 11. void onTimer(long ts, /*...*/) { TreeSet<Entry<Long, Double>> rates = ratesInRange(ts, rate.entries()); Double myRate = getLast(rates); transactions.get(ts).forEach( tx -> out.collect( new Joined(myRate, tx))); deleteAllButLast(rates); transactions.remove(ts); } Use Case: Event-Time Stream Join - Code void processElement1(Transaction value,/*...*/) { TimerService timerSvc = ctx.timerService(); long ts = ctx.timestamp(); if (!isLate(ts, timerSvc)) { addTransaction(ts, value); timerSvc.registerEventTimeTimer(ts); } } // similar for processElement2() MapState<Long, List<Transaction>> transactions; MapState<Long, Double> rate;
  • 12. void onTimer(long ts, /*...*/) { TreeSet<Entry<Long, Double>> rates = ratesInRange(ts, rate.entries()); Double myRate = getLast(rates); transactions.get(ts).forEach( tx -> out.collect( new Joined(myRate, tx))); deleteAllButLast(rates); transactions.remove(ts); } With RocksDB: ● always fetching all rates’ (key+value) bytes ⚠ ● (also need to fit into memory ⚠) ● deserialize all keys keys during iteration ⚠ ○ not deserializing values (at least) ✓ Use Case: Event-Time Stream Join - Code void processElement1(Transaction value,/*...*/) { TimerService timerSvc = ctx.timerService(); long ts = ctx.timestamp(); if (!isLate(ts, timerSvc)) { addTransaction(ts, value); timerSvc.registerEventTimeTimer(ts); } } // similar for processElement2() Similar to stream sort, with RocksDB: ● always fetch/write full lists ⚠ ● always de-/serialize full list ⚠ ● additional stress on RocksDB compaction ⚠ MapState<Long, List<Transaction>> transactions; MapState<Long, Double> rate;
  • 14. BinarySortedState - History “Temporal state” Hackathon project (David + Nico + Seth) ● Main primitives: getAt(), getAtOrBefore(), getAtOrAfter(), add(), addAll(), update() 2020 Nov 2021 April 2022 started as a Hackathon project (David + Nico) on custom windowing with process functions Created FLIP-220 and discussed on dev@flink.apache.org ● Extended scope further to allow arbitrary user keys (not just timestamps) ○ Identified further use cases in SQL operators, e.g. min/max with retractions ● Clarified serializer requirements ● Extend proposed API to offer range read and clear operations ● …
  • 15. ● A new keyed-state primitive, built on top of ListState ● Efficiently add to list of values for a user-provided key ● Efficiently iterate user-keys in a well-defined sort order, with native state-backend support, especially RocksDB ● Efficient operations for time-based functions (windowing, sorting, event-time joins, custom, ...) ● Operations on subset of the state, based on user-key ranges ● Portable between state backends (RocksDB, HashMap) BinarySortedState - Goals
  • 16. BinarySortedState - API (subject to change!) ● Point-operations: ○ valuesAt(key) ○ add(key, value) ○ put(key, values) ○ clearEntryAt(key) ● Lookups: ○ firstEntry(), firstEntry(fromKey) ○ lastEntry(), lastEntry(UK toKey) ● Cleanup: ○ clear() ● Range-operations: ○ readRange(fromKey, toKey, inclusiveToKey) ○ readRangeUntil(toKey, inclusiveToKey) ○ readRangeFrom(fromKey) ● Range-deletes: ○ clearEntryAt(key) ○ clearRange( fromKey, toKey, inclusiveToKey) ○ clearRangeUntil(toKey, inclusiveToKey) ○ clearRangeFrom(fromKey) BinarySortedState<UK, UV>
  • 17. ● RocksDB is a key-value store writing into MemTables → flushing into SST files ● SST files are sorted by the key in lexicographical binary order (byte-wise unsigned comparison) BinarySortedState - How does it work with RocksDB?! ● RocksDB offers Prefix Seek and SeekForPrev ● RocksDBMapState.RocksDBMapIterator provides efficient iteration via: ○ Fetching up to 128 RocksDB entries at once ○ RocksDBMapEntry with lazy key/value deserialization
  • 18. BinarySortedState - LexicographicalTypeSerializer ● “Just” need to provide serializers that are compatible with RocksDB’s sort order ● Based on lexicographical binary order as defined by byte-wise unsigned comparison ● Compatible serializers extend LexicographicalTypeSerializer public abstract class LexicographicalTypeSerializer<T> extends TypeSerializer<T> { public Optional<Comparator<T>> findComparator() { return Optional.empty(); } }
  • 19. Stream Sort w/out BinarySortedState (1) private BinarySortedState<Long, Long> events; BinarySortedStateDescriptor<Long, Long> desc = new BinarySortedStateDescriptor<>( "Events", LexicographicLongSerializer.INSTANCE, Types.LONG); events = getRuntimeContext() .getBinarySortedState(desc); private MapState<Long, List<Long>> events; MapStateDescriptor<Long, List<Long>> desc = new MapStateDescriptor<>( "Events", Types.LONG, Types.LIST(Types.LONG)); events = getRuntimeContext() .getMapState(desc); public void onTimer(long ts, /*...*/) { events.valuesAt(ts).forEach(out::collect); events.clearEntryAt(ts); } public void onTimer(long ts, /*...*/) { events.get(ts).forEach(out::collect); events.remove(ts); }
  • 20. Stream Sort w/out BinarySortedState (2) public void processElement(/*...*/) { // ... events.add(ts, event); // ... } public void processElement(/*...*/) { // ... List<Long> listAtTs = events.get(ts); if (listAtTs == null) { listAtTs = new ArrayList<>(); } listAtTs.add(event); events.put(ts, listAtTs); // ... }
  • 21. events.add(ts, event); Stream Sort with BinarySortedState - What’s Happening Underneath State (RocksDB) De-/Serializer add new merge op to memtable Serialized entry as byte[] new list entry
  • 22. Stream Sort with BinarySortedState - What’s Happening Underneath State (RocksDB) De-/Serializer full list as byte[] Lookup deserialized (Java) list search memtable + sst files for all entries events.valuesAt(ts).forEach(out::collect);
  • 23. Mark k/v as deleted (removal during compaction) Stream Sort with BinarySortedState - What’s Happening Underneath State (RocksDB) De-/Serializer events.clearEntryAt(ts); Delete
  • 24. Event-Time Stream Join w/out BinarySortedState (1) private BinarySortedState<Long, Transaction> transactions; private BinarySortedState<Long, Double> rate; private MapState<Long, List<Transaction>> transactions; private MapState<Long, Double> rate; public void processElement1(/*...*/) { // ... if (!isLate(ts, timerSvc)) { // append to BinarySortedState: transactions.add(ts, value); timerSvc.registerEventTimeTimer(ts); } } // similar for processElement2() public void processElement1(/*...*/) { // ... if (!isLate(ts, timerSvc)) { // replace list in MapState: addTransaction(ts, value); timerSvc.registerEventTimeTimer(ts); } } // similar for processElement2()
  • 25. Event-Time Stream Join w/out BinarySortedState (2) public void onTimer(long ts, /*...*/) { Entry<Long, Iterable<Double>> myRate = rate.lastEntry(ts); Double rateVal = Optional .ofNullable(myRate) .map(e -> e.getValue().iterator().next()) .orElse(null); transactions.valuesAt(ts).forEach( tx -> out.collect( new Joined(rateVal, tx))); if (myRate != null) { rate.clearRangeUntil(myRate.getKey(), false); } transactions.clearEntryAt(ts); } public void onTimer(long ts, /*...*/) { TreeSet<Entry<Long, Double>> rates = ratesInRange(ts, rate.entries()); Double myRate = Optional .ofNullable(rates.pollLast()) .map(Entry::getValue) .orElse(null); transactions.get(ts).forEach( tx -> out.collect( new Joined(myRate, tx))); rates.forEach(this::removeRate); transactions.remove(ts); }
  • 26. Stream Sort with BinarySortedState - Optimized ● Idea: ○ Increase efficiency by processing all events between watermarks ● Challenge: ○ Registering a timer for the next watermark will fire too often ➔ Solution: ○ Register timer for the first unprocessed event ○ When the timer fires: ■ Process all events until the current watermark (not the timer timestamp!) ● events.readRangeUntil(currentWatermark, true)
  • 27. Event-Time Stream Join with BinarySortedState - Optimized ● Idea: ○ Increase efficiency by processing all events between watermarks ● Challenge: ○ Registering a timer for the next watermark will fire too often ➔ Solution: ○ Same as Stream Sort: Timer for first unprocessed event, processing until watermark, but: ○ When the timer fires: ■ Iterate both, transactions and rate (in the appropriate time range) in event-time order
  • 32. ● (Custom) stream sorting ● Time-based (custom) joins ● Code with range-reads or bulk-deletes ● Custom window implementations ● Min/Max with retractions ● … ● Basically everything maintaining a MapState<?, List<?>> or requiring range operations BinarySortedState - Who will benefit?
  • 33. What’s left to do? ● Iron out last bits and pieces + tests ○ Start voting thread for FLIP-220 on dev@flink.apache.org ○ Create a PR and get it merged ● Expected to land in Flink 1.17 (as experimental feature) ● Port Table/SQL/DataStream operators to improve efficiency: ○ TemporalRowTimeJoinOperator (PoC already done for validating the API ✓) ○ RowTimeSortOperator ○ IntervalJoinOperator ○ CepOperator ○ … ● Provide more LexicographicalTypeSerializers
  • 34. Get ready to ROK!!! Nico Kruber linkedin.com/in/nico-kruber nico@immerok.com