
1

How We Added Replication to
QuestDB, a time-series database
Javier Ramírez
@supercoco9
Database Advocate

2

Agenda.
If you dislike technical details, this is
the wrong presentation
● Intro to Fast & Streaming Data
● Overview of QuestDB Storage
● About Replication
● Common solutions
● The QuestDB implementation
● Parallel Write-Ahead Log
● Physical layout
● Object Storage
● Dealing with upgrades
● What’s next

3

Not all data
problems are
the same

4

We have 400k smart meters, each
sending a record every 5 minutes.
~120 million rows per day
Real request from potential user

5

● a factory floor with 500 machines, or
● a fleet with 500 vehicles, or
● 50 trains, with 10 cars each, or
● 500 users with a mobile phone
Sending data every second
How to be a (data) billionaire
43,200,000 rows a day
302,400,000 rows a week
1,314,144,000 rows a month
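The arithmetic behind those figures, as a quick sanity check (a minimal sketch; the monthly figure assumes an average month of roughly 30.42 days):

fn main() {
    let sources: u64 = 500;                        // machines, vehicles, phones...
    let per_day = sources * 60 * 60 * 24;          // 43_200_000 rows a day
    let per_week = per_day * 7;                    // 302_400_000 rows a week
    let per_month = per_day * 3042 / 100;          // ~30.42 days -> 1_314_144_000 rows a month
    println!("{per_day} rows/day, {per_week} rows/week, {per_month} rows/month");
}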

6

● Optimised for fast ingestion
● Data lifecycle policies
● Analytics over chunks of time
● Time-based aggregations
● Often power real-time dashboards
Time-series database basics

7


8

QuestDB would like to be known for:
● Performance
○ Also with smaller machines
● Developer Experience
○ Multiple protocols and client libraries. Sensible SQL
extensions
● Open Source
○ (Apache 2.0 license)*
* Enterprise and Cloud versions add non-OSS features like Single Sign-On, RBAC, managed snapshots, or multi-primary replication

9

QuestDB in action: quick showcase
https://dashboard.questdb.io/d/fb13b4ab-b1c9-4a54-a920-b60c5fb0363f/public-dashboard-questdb-io-use-cases-crypto?orgId=1&refresh=250ms
https://dashboard.questdb.io/d/d0ede584-a923-4b21-acd1-af8b7c63f5c8/public-dashboard-dashboards-taxi?orgId=1&refresh=1s
https://github.com/questdb/time-series-streaming-analytics-template
https://demo.questdb.io

10

A year ago

11

Production is a scary place
● Application errors
● Connectivity issues
● Network timeout/server busy
● Component temporarily
offline/restarting/updating
● Hardware failure
● Full disk
● Just how protocols work

12

The path to implementing replication
● Reflecting on our current (at the time) storage layer
● Deciding the flavour of replication we want
● Decoupling ingestion from storage
● Making it robust (upgrades, fault-tolerance…)

13

QuestDB at a glance
(Architecture diagram) Components: Network API, Compute API (SQL Engine), Storage API (Storage Engine, Writer API, Reader APIs). Data ingress: ILP (over TCP socket or HTTP), Bulk Loader, REST, PG Wire. Data egress: Reader API.

14

QuestDB ingestion and storage layer
● Data always stored by incremental timestamp.
● No indexes needed*. Data is immediately available
after writing.
● Data partitioned by time units and stored in tabular
columnar format.
● Predictable ingestion rate, even under demanding
workloads (millions/second).
● Row updates and upserts supported.
https://questdb.io/docs/concept/storage-model/

15

Storage Engine - file system layout
(Diagram) One directory per partition (e.g. 2022-04-11, 2022-04-12). Each partition holds fixed-size column files (e.g. price.d) and var-size column files (e.g. string.d plus its string.i index).

16

Storage Engine - var-size
(Diagram: index entries O1…O4 in the .i file point into the .d data file)
● Index contains 64-bit offsets into data file
● Data entries are length prefixed
● Index contains N+1 elements
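A minimal sketch of how a reader could resolve entry i from the .i/.d pair described above. Function name, the already-parsed offsets slice, and the 4-byte little-endian length prefix are assumptions for illustration; the real on-disk format carries more detail (encodings, null handling).

// Hypothetical sketch: resolve the i-th var-size value from a column's
// index offsets (.i) and data file (.d), following the layout described above.
fn read_var_entry<'a>(index: &[u64], data: &'a [u8], i: usize) -> &'a [u8] {
    // N+1 offsets: entry i spans [index[i], index[i + 1]) in the data file.
    let start = index[i] as usize;
    let end = index[i + 1] as usize;
    // Entries are length-prefixed; assume a 4-byte little-endian prefix here.
    let mut prefix = [0u8; 4];
    prefix.copy_from_slice(&data[start..start + 4]);
    let len = u32::from_le_bytes(prefix) as usize;
    debug_assert_eq!(start + 4 + len, end); // prefix must agree with the offsets
    &data[start + 4..start + 4 + len]
}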

17

Storage Engine - Partitioning
Rows as ingested (arrival order):
Ticker Side Price Amount Time
ETH-USD sell 2615.54 0.00044 18:00
ETH-USD sell 2617.13 0.001 19:00
BTC-USD buy 39269.98 0.000245 18:02
ETH-USD buy 2640.21 0.14810976 18:07
BTC-USD buy 39270.01 0.000245 19:05
BTC-USD sell 39250.21 0.00046562 18:05
ETH-USD sell 2621.03 0.000127 19:09
ETH-USD buy 2640.18 0.02593599 19:06
BTC-USD buy 39270.01 0.0127959 18:10
Same rows partitioned by hour and sorted by time:
18H partition:
ETH-USD sell 2615.54 0.00044 18:00
BTC-USD buy 39269.98 0.000245 18:02
BTC-USD sell 39250.21 0.00046562 18:05
ETH-USD buy 2640.21 0.14810976 18:07
BTC-USD buy 39270.01 0.0127959 18:10
19H partition:
ETH-USD sell 2617.13 0.001 19:00
BTC-USD buy 39270.01 0.000245 19:05
ETH-USD buy 2640.18 0.02593599 19:06
ETH-USD sell 2621.03 0.000127 19:09
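A minimal sketch of the bucketing the slide shows (illustrative Rust, not QuestDB internals): incoming rows are grouped into hourly partitions and kept sorted by the designated timestamp within each partition. The Trade struct and the hour-granularity key are assumptions matching the 18H/19H example.

use std::collections::BTreeMap;

#[derive(Debug, Clone)]
struct Trade { ticker: String, side: String, price: f64, amount: f64, epoch_secs: i64 }

// Group rows into hourly partitions and keep each partition sorted by time.
fn partition_by_hour(rows: Vec<Trade>) -> BTreeMap<i64, Vec<Trade>> {
    let mut partitions: BTreeMap<i64, Vec<Trade>> = BTreeMap::new();
    for row in rows {
        let hour_bucket = row.epoch_secs / 3600; // partition key, hour granularity
        partitions.entry(hour_bucket).or_default().push(row);
    }
    for rows in partitions.values_mut() {
        rows.sort_by_key(|r| r.epoch_secs); // sorted by designated timestamp
    }
    partitions
}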

18

● Partitions are versioned
● Columns are versioned within partition
● Merge operation will create a new partition with new transaction index
● Queries will switch over to new snapshot when they are ready
Storage Engine - snapshots
Example of versioned names: partition directory 2022-04-11T18 at versions .9901 and .9945; column file ticker.d at versions .10031 and .10049
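A minimal sketch of the copy-on-write publishing those versioned names enable, with assumed type and field names (not QuestDB's actual code): a merge builds the new partition/column versions on disk, then swaps an in-memory pointer, so queries already running keep reading the snapshot they started with.

use std::sync::{Arc, Mutex};

#[derive(Debug)]
struct PartitionVersion { txn: u64, dir: String } // e.g. "2022-04-11T18.9945"

struct Table {
    current: Mutex<Arc<PartitionVersion>>, // latest published snapshot
}

impl Table {
    // Readers grab an Arc to the current version; it stays valid for the
    // whole query even if a merge publishes a newer version meanwhile.
    fn snapshot(&self) -> Arc<PartitionVersion> {
        self.current.lock().unwrap().clone()
    }

    // A merge writes a new directory, then switches readers over atomically.
    fn publish(&self, txn: u64, dir: String) {
        *self.current.lock().unwrap() = Arc::new(PartitionVersion { txn, dir });
    }
}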

19

The classic storage model would lock concurrent writers on PG Wire

20

Enter Replication

21

Architectural Considerations
● Synchronous vs Asynchronous replication
● Multi primary vs Single primary with Read-Only Replicas
● External coordinator vs Peer-to-Peer
● Replicate everything vs Replicate Shards
● Write Ahead Log vs non-sorted (for example, hinted handoffs)

22

WAL-based replication basics

23

PostgreSQL Replication Overview
● WAL based
● Read-replicas
● Primary node is the coordinator

24

ClickHouse Replication Overview
● WAL based
● Multi Primary
● Coordination via ClickHouse Keeper
● Replicas download WAL parts from primary
● Sharding available

25

QuestDB Replication:
● Multi-primary
● Parallel Write Ahead Log
● Asynchronous
● Coordination via custom Sequencer
● No visibility needed across nodes
● Dataset replicated via shared storage (S3/AZ Blob/NFS/HDFS…)
● No sharding or distributed queries yet (on the roadmap)
● Optimistic lock-free schema changes
● Side effect: deduplication at ingestion time

26


27

Parallel Write Ahead Log (WAL)
27

28

Non-blocking parallel WAL across protocols and connections.
Deduplication happens here.

29

Parallelism in the Write-Ahead Log
(Diagram) Client connections C1, C2, C3 each write through their own WAL writer (W1, W2, W3). Each writer appends its transactions (tx01 … tx12) independently, and the Sequencer records the single global order: W1[0], W1[1], W3[0], W2[0], … (see the sketch below).
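A minimal sketch of what the sequencer does conceptually: WAL writers commit their segments independently, and the sequencer hands out the single global transaction order by recording (walId, segmentTxn) pairs. Names are illustrative; the real sequencer also persists this log (the _txnlog shown later) and validates schema versions.

use std::sync::Mutex;

#[derive(Debug, Clone, Copy)]
struct WalRef { wal_id: u32, segment_txn: u64 } // e.g. W1[0], W1[1], W3[0], ...

#[derive(Default)]
struct Sequencer {
    log: Mutex<Vec<WalRef>>, // global order of committed WAL transactions
}

impl Sequencer {
    // Called by a WAL writer after it has durably written its transaction.
    // Returns the global (table-level) transaction number.
    fn next_txn(&self, wal_id: u32, segment_txn: u64) -> u64 {
        let mut log = self.log.lock().unwrap();
        log.push(WalRef { wal_id, segment_txn });
        log.len() as u64 // 1-based global txn number
    }
}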

30

Out-of-order Merge
(Diagram) Transactions from the parallel WAL writers (W1, W2, W3) are taken in sequencer order (W1[0], W1[1], W3[0], W2[0], …) and merged into the table in timestamp order, even though each transaction only covers part of the time range:
tx01:
ts price symbol qty
ts01 178.08 AAPL 1000
ts02 148.66 GOOGL 400
ts03 424.86 MSFT 5000
ts10 178.09 AMZN 100
ts11 505.08 META 2500
ts12 394.14 GS 2000
… … … …
tx02:
ts price symbol qty
ts04 192.42 JPM 5000
ts05 288.78 V 300
ts06 156.40 JNJ 6500
ts07 181.62 AMD 7800
ts08 37.33 BAC 1500
ts09 60.83 KO 4000
… … … …
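A minimal sketch of the merge step, and of the deduplication side effect mentioned earlier: transactions are taken in sequencer order, their rows merged into timestamp order, and rows sharing the same (assumed) upsert key keep only the latest write. This shows the concept only; the real merge works column by column on memory-mapped partition files.

#[derive(Debug, Clone)]
struct Row { ts: i64, symbol: String, price: f64, qty: i64 }

// Merge WAL transactions (already in sequencer order) into one
// timestamp-ordered batch, deduplicating on (ts, symbol): last write wins.
fn merge_transactions(txns: Vec<Vec<Row>>) -> Vec<Row> {
    let mut merged: Vec<Row> = txns.into_iter().flatten().collect();
    // Stable sort keeps sequencer (arrival) order among rows with equal keys...
    merged.sort_by(|a, b| (a.ts, &a.symbol).cmp(&(b.ts, &b.symbol)));
    // ...so keeping the last row per key implements "latest write wins".
    let mut deduped: Vec<Row> = Vec::with_capacity(merged.len());
    for row in merged {
        let same_key = deduped
            .last()
            .map_or(false, |prev| prev.ts == row.ts && prev.symbol == row.symbol);
        if same_key {
            *deduped.last_mut().unwrap() = row; // replace with the newer write
        } else {
            deduped.push(row);
        }
    }
    deduped
}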

31

Types of WAL records
● Data Record
● SQL Record (DDL Schema Changes)
● Symbol Entry and Symbol Map Records
● Bind Variable and Named Bind Variable Records
● Commit Record

32

Logical view of WAL records
"records": [
{"type": "SymbolMap", "columnCount": 2, "symbolCounts": [256, 256]},
{"type": "SymbolEntry", "columnIndex": 0, "symbolName": "ETH-USD"},
{"type": "SymbolEntry", "columnIndex": 0, "symbolName": "BTC-USD"},
{"type": "SymbolEntry", "columnIndex": 1, "symbolName": "buy"},
{"type": "SymbolEntry", "columnIndex": 1, "symbolName": "sell"},
{"type": "Data", "columnCount": 5, "rowCount": 10,
"columnTypes": ["SYMBOL", "SYMBOL", "DOUBLE", "DOUBLE", "TIMESTAMP"],
"columnData": [
[0, 0, 0, 0, 1, 1, 1, 1, 1, 1],
[1, 1, 1, 1, 0, 0, 0, 0, 0, 0],
[3201.14, 3201.18, 3201.18, 3201.14, 65204.52, 65204.52, 65204.39, 65202.28, 65200.06, 65200.05],
[0.86098648, 0.04550806, 0.12002546, 1.352E-5, 0.01118905, 0.00730946, 0.0001, 0.04907981, 0.00157006,
0.00075903],
["2024-05-06T09:40:04.747332Z", "2024-05-06T09:40:04.747332Z", "2024-05-06T09:40:04.747332Z",
"2024-05-06T09:40:04.747494Z", "2024-05-06T09:40:04.988272Z", "2024-05-06T09:40:04.988272Z",
"2024-05-06T09:40:04.988272Z", "2024-05-06T09:40:04.988272Z", "2024-05-06T09:40:04.988272Z",
"2024-05-06T09:40:05.000078Z"]
]
},
{"type": "Commit", "timestamp": "2024-05-06T09:40:05.000078Z"
}
]
CREATE TABLE trades (symbol SYMBOL capacity 256 CACHE, side SYMBOL capacity 256 CACHE, price DOUBLE, amount DOUBLE, timestamp TIMESTAMP) timestamp (timestamp) PARTITION BY DAY WAL;

33

Physical layout of the WAL files
├── db
│   ├── Table
│   │   ├── Partition 1
│   │   │   ├── _archive
│   │   │   ├── column1.d
│   │   │   ├── column2.d
│   │   │   ├── column2.k
│   │   │   └── ...
│   │   ├── Partition 2
│   │   │   ├── _archive
│   │   │   ├── column1.d
│   │   │   ├── column2.d
│   │   │   ├── column2.k
│   │   │   └── ...
│   │   ├── txn_seq
│   │   │   ├── _meta
│   │   │   ├── _txnlog
│   │   │   └── _wal_index.d
│   │   ├── wal1
│   │   │   └── 0
│   │   │       ├── _meta
│   │   │       ├── _event
│   │   │       ├── column1.d
│   │   │       ├── column2.d
│   │   │       └── ...
│   │   ├── wal2
│   │   │   ├── 0
│   │   │   │   ├── _meta
│   │   │   │   ├── _event
│   │   │   │   ├── column1.d
│   │   │   │   ├── column2.d
│   │   │   │   └── ...
│   │   │   └── 1
│   │   │       ├── _meta
│   │   │       ├── _event
│   │   │       ├── column1.d
│   │   │       ├── column2.d
│   │   │       └── ...
│   │   ├── _meta
│   │   ├── _txn
│   │   └── _cv
● One WALx subfolder per table and connection
● txn_seq folder to serialize transactions across parallel WAL folders
● _event file as transaction index for each WAL folder
● _meta files with schema version/data
● One file per column (for now), with the binary data
● _cv file for Commit Verification
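To tie the pieces together, a hypothetical sketch of how an apply job could walk this layout: read the serialized order from txn_seq, resolve each entry to a walN/segment pair, and use that segment's _event index to find the transaction's rows. The record layouts below are invented for illustration; the real _txnlog and _event files are binary and richer.

// Hypothetical, simplified view of the on-disk pieces described above.
struct TxnLogEntry { wal_id: u32, segment_id: u32, segment_txn: u32 }      // db/Table/txn_seq/_txnlog
struct EventIndexEntry { segment_txn: u32, start_row: u64, row_count: u64 } // db/Table/walN/S/_event

// Apply transactions strictly in txn_seq order, pulling each one's rows
// from the WAL segment that the entry points at.
fn apply_in_order(
    txn_log: &[TxnLogEntry],
    event_index: impl Fn(u32, u32) -> Vec<EventIndexEntry>,
    mut apply_rows: impl FnMut(u32, u32, u64, u64), // (wal, segment, start_row, row_count)
) {
    for entry in txn_log {
        let events = event_index(entry.wal_id, entry.segment_id);
        if let Some(ev) = events.iter().find(|e| e.segment_txn == entry.segment_txn) {
            apply_rows(entry.wal_id, entry.segment_id, ev.start_row, ev.row_count);
        }
    }
}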

34

34

35

Optimistic lock-free schema changes
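A minimal sketch of the optimistic idea, assuming a compare-and-swap on a schema version (names are illustrative, not QuestDB's): a writer prepares its change against the structure version it saw; the change is applied only if nobody bumped the version in the meantime, otherwise the caller reloads the schema and retries instead of holding a lock.

use std::sync::atomic::{AtomicU64, Ordering};

struct TableSchema {
    structure_version: AtomicU64, // bumped on every applied schema change
}

impl TableSchema {
    // Try to apply a schema change prepared against `expected_version`.
    // Returns false if another writer changed the schema first; the caller
    // then reloads the schema and retries, never blocking on a lock.
    fn try_apply(&self, expected_version: u64) -> bool {
        self.structure_version
            .compare_exchange(expected_version, expected_version + 1,
                              Ordering::AcqRel, Ordering::Acquire)
            .is_ok()
    }
}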

36

WAL Applied to Replicas: Compress and Send to Shared / Object Storage
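A heavily hedged sketch of the flow this slide names: a committed WAL segment is compressed and pushed to shared or object storage, from where replicas later fetch and replay it. The traits, the compression codec, and the object key naming are all assumptions for illustration; QuestDB Enterprise ships its own implementations for S3, Azure Blob, NFS, and friends.

// Hypothetical interfaces; only the shape of the flow is shown here.
trait Compressor { fn compress(&self, raw: &[u8]) -> Vec<u8>; }
trait ObjectStore { fn put(&self, key: &str, bytes: &[u8]); }

// Ship one committed WAL segment to shared storage. Replicas later list
// these objects, download them, and replay them through the apply path.
fn ship_segment(
    store: &dyn ObjectStore,
    codec: &dyn Compressor,
    table: &str,
    wal_id: u32,
    segment_id: u32,
    segment_bytes: &[u8],
) {
    let key = format!("{table}/wal{wal_id}/{segment_id}.seg.zst"); // naming is an assumption
    store.put(&key, &codec.compress(segment_bytes));
}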

37

Dealing with upgrades: index.msgpack
pub struct TableMetadata {
    /// The number of transactions in each sequencer part.
    pub sequencer_part_txn_count: u32,
    /// The first transaction with data in the object store.
    /// Note: `TxnId::zero()` represents a newly created table.
    pub first_txn: TxnId,
    /// Timestamp of the `first_txn`.
    /// If `first_txn > 0` (i.e. a non-new table),
    /// then this represents the lowest bound for a minimum required
    /// full-database snapshot.
    pub first_at: EpochMicros,
    /// The last transaction (inclusive) with data in the object store.
    pub last_txn: TxnId,
    /// The timestamp of when the table was created.
    pub created_at: EpochMicros,
    /// The timestamp when the table was dropped.
    pub deleted_at: Option<EpochMicros>,
}

pub struct Index {
    /// Format version
    pub version: u64,
    pub sync_id: IndexSyncId,
    /// Map of tables to their creation and deletion times.
    pub tables: HashMap<TableDirName, TableMetadata>,
}
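A small sketch of how this metadata could drive a replica catching up, inferred from the field comments above rather than from QuestDB source: if the replica's next needed transaction is older than first_txn, that history is no longer in the object store and a full snapshot is required first; otherwise the replica replays WAL transactions from where it left off.

// Decide how a replica catches up on one table, based on index.msgpack fields.
enum CatchUp {
    AlreadyCurrent,
    ReplayFrom(u64),  // replay WAL transactions starting at this txn id
    NeedFullSnapshot, // history before `first_txn` is gone from object storage
}

fn plan_catch_up(first_txn: u64, last_txn: u64, replica_last_applied: u64) -> CatchUp {
    let next_needed = replica_last_applied + 1;
    if replica_last_applied >= last_txn {
        CatchUp::AlreadyCurrent
    } else if next_needed < first_txn {
        CatchUp::NeedFullSnapshot
    } else {
        CatchUp::ReplayFrom(next_needed)
    }
}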

38

Multi-primary ingestion (Enterprise only right now)
Same concept as the local sequencer and transaction IDs, but with a sequencer backed by FoundationDB, which stores metadata and information about cluster members.
Client libraries transparently get the addresses of available primaries and replicas to send data and queries.
Optimistic locking for conflict resolution.

39

What’s next: Parquet (also coming to Open Source)
● Separation of storage and computation
● Allows using datasets larger than a single drive
● Allows for a data lakehouse architecture
● “First-mile” time-series queries are served from local storage (also on Parquet), while older data is served from the shared file system
● Tight integration with our query engine to leverage compression as much as possible
● Arrow Database Connectivity (ADBC) compatibility to read data out quickly

40

What we discussed.
If you dislike technical details, it is
probably too late now
● Intro to Fast & Streaming Data
● Overview of QuestDB Storage
● About Replication
● Common solutions
● The QuestDB implementation
● Parallel Write-Ahead Log
● Physical layout
● Object Storage
● Dealing with upgrades
● What’s next

41

QuestDB OSS
Open Source. Self-managed. Suitable for
production workloads.
https://github.com/questdb/questdb
QuestDB Enterprise
Licensed. Self-managed. Enterprise features like
RBAC, compression, replication, TLS on all
protocols, cold storage, K8s operator…
https://questdb.io/enterprise/
QuestDB Cloud
Fully managed, pay-per-usage environment with enterprise-grade features.
https://questdb.io/cloud/

42

42
● github.com/questdb/questdb
● https://questdb.io
● https://demo.questdb.io
● https://slack.questdb.io/
● https://github.com/questdb/time-series-streaming-analytics-template
We 💕 contributions
and GitHub ⭐ stars
Javier Ramírez
@supercoco9
Database Advocate
