Pinot: Realtime Distributed OLAP datastore

Pinot
Kishore Gopalakrishna
Tuesday, August 18, 2015

Agenda
• Pinot @ LinkedIn - Current
• Pinot - Architecture
• Pinot Operations
• Pinot @ LinkedIn - Future

Slice and Dice Metrics

Pinot @ LinkedIn
Customers, Members, Internal tools

• 100B documents
• 1B documents ingested per day
• 100M queries per day
• Tens of ms latency
• 30 tables in prod, 250 * 3 std app nodes

Pinot @ LinkedIn

Key features
• SQL-like interface
• Columnar storage and indexing
• Real-time data load

(S)QL: Filters and Aggs
SELECT count(*)
FROM companyFollowHistoricalEvents
WHERE entityId = 121011 AND
'day' >= 15949 AND 'day' <= 15963 AND
paid = 'y' AND
action = 'stop'

(S)QL: Group By
SELECT count(*)
FROM companyFollowHistoricalEvents
WHERE 'day' >= 15949 AND 'day' <= 15963 AND
paid = 'y'
GROUP BY action

(S)QL: ORDER BY and LIMIT
SELECT *
FROM companyFollowHistoricalEvents
WHERE entityId = 1000 AND
action = 'start'
ORDER BY creationTime DESC LIMIT 1

What's not supported
• JOIN: unpredictable performance
• NOT A SOURCE OF TRUTH
• Mutation

Pinot
• Data flow
• Query Execution
• How to use/operate
• Pinot @ LinkedIn - Future

Pinot Architecture
[Diagram: components shown are Broker, Helix, Realtime servers, Historical servers, Kafka, and Hadoop; inputs are Queries and Raw Data]

Pinot
• Pinot segments

Pinot Segment layout: Columnar storage

Pinot Segment layout: Sorted Forward Index
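
A minimal sketch of the idea behind a sorted forward index, assuming the documents in a segment are sorted on one column. Each distinct value then occupies one contiguous run of doc ids, so the column can be kept as a (start, end) range per value and an equality filter becomes a range lookup. The function names are illustrative only, not Pinot's API.

```python
from typing import Dict, List, Tuple


def build_sorted_index(sorted_column: List[int]) -> Dict[int, Tuple[int, int]]:
    """Map each distinct value to its inclusive (start, end) doc id range."""
    if any(a > b for a, b in zip(sorted_column, sorted_column[1:])):
        raise ValueError("column must be sorted")
    ranges: Dict[int, Tuple[int, int]] = {}
    for doc_id, value in enumerate(sorted_column):
        if value in ranges:
            start, _ = ranges[value]
            ranges[value] = (start, doc_id)  # extend the current run
        else:
            ranges[value] = (doc_id, doc_id)  # first doc with this value
    return ranges


def matching_docs(index: Dict[int, Tuple[int, int]], value: int) -> range:
    """Return the doc ids whose column equals `value`, without scanning."""
    if value not in index:
        return range(0)
    start, end = index[value]
    return range(start, end + 1)
```

With the column sorted, no inverted index is needed for this column: the range itself identifies every matching document.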

Pinot Segment layout: Other techniques
• Indexes: Inverted index, Bitmap, RoaringBitmap
• Compression: Dictionary Encoding, P4Delta
• Multi-valued columns, skip lists
• HyperLogLog for unique counts
• T-digest for percentile, quantile
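
Two of the techniques listed above, dictionary encoding and an inverted index, can be sketched in a few lines. This is an illustration under simplifying assumptions: posting lists are plain Python lists here, whereas a real implementation would use compressed bitmaps such as RoaringBitmap, and the function names are hypothetical.

```python
from typing import Dict, List, Tuple


def dictionary_encode(values: List[str]) -> Tuple[List[str], List[int]]:
    """Replace each value with a small integer id into a sorted dictionary."""
    dictionary = sorted(set(values))
    ids = {value: i for i, value in enumerate(dictionary)}
    forward_index = [ids[value] for value in values]
    return dictionary, forward_index


def build_inverted_index(forward_index: List[int]) -> Dict[int, List[int]]:
    """Map each dictionary id to the doc ids that contain it."""
    inverted: Dict[int, List[int]] = {}
    for doc_id, dict_id in enumerate(forward_index):
        inverted.setdefault(dict_id, []).append(doc_id)
    return inverted


actions = ["start", "stop", "start", "stop", "stop"]
dictionary, forward = dictionary_encode(actions)
inverted = build_inverted_index(forward)
# Docs where action = 'stop', found without scanning the column.
stop_docs = inverted[dictionary.index("stop")]
```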


Star-tree Index
• Data-aware pre-computation

Pinot
• Query Execution

Pinot Query Execution: Distributed
[Diagram: Brokers, Helix, and two groups of Servers, each holding S1, S2, S3]
1. Query
2. Fetch routing table from Helix
3. Scatter Request
4. Process Request & send response
5. Gather Response
6. Return Response
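
The scatter-gather steps above can be sketched as follows. This is a toy model, not Pinot code: the segment data, the server names, and the hard-coded routing table (standing in for what the broker would fetch from Helix) are all assumptions made for illustration.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List

# Hypothetical segments: segment name -> values of an "action" column.
SEGMENTS: Dict[str, List[str]] = {
    "seg_0": ["start", "stop", "stop"],
    "seg_1": ["stop", "start"],
    "seg_2": ["stop", "stop", "stop"],
}

# Hypothetical routing table: server -> segments it should answer for.
ROUTING_TABLE: Dict[str, List[str]] = {
    "server_a": ["seg_0"],
    "server_b": ["seg_1"],
    "server_c": ["seg_2"],
}


def server_count(segment_names: List[str], action: str) -> int:
    """Step 4: a server processes its segments and sends a partial count."""
    return sum(SEGMENTS[name].count(action) for name in segment_names)


def broker_count(action: str) -> int:
    """Steps 1-6 as seen from the broker for a count(*) with one filter."""
    routing = ROUTING_TABLE  # step 2: fetch routing table
    with ThreadPoolExecutor(max_workers=len(routing)) as pool:
        # Step 3: scatter one request per server.
        futures = [
            pool.submit(server_count, segment_names, action)
            for segment_names in routing.values()
        ]
        # Step 5: gather the partial responses.
        partials = [future.result() for future in futures]
    return sum(partials)  # step 6: merge and return the response
```

Calling `broker_count("stop")` fans the request out to the three simulated servers and sums their partial counts.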

Pinot Query Execution: Single Node Architecture
[Diagram: layers of a single node: Planner, Execution Engine, Inverted Index, Bitmap Index, Column Format]

Pinot Query Execution: Single Node Architecture
SELECT campaignId, sum(clicks)
FROM Table A
WHERE accountId = 121011 AND
'day' >= 15949
GROUP BY campaignId

[Diagram: operator pipeline over Pinot segments. Data sources (columns accountId, day, campaignId, click) feed the Filter Operator, which emits matching doc ids; the Projection Operator emits (campaignId, click) tuples; the Aggregation Group-by Operator aggregates them; the Combine Operator merges the results]
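
The operator pipeline for this query can be sketched as below. It is a simplified model, assuming each segment is a dict of column name to a list of values (one entry per doc id); the operator function names mirror the labels in the diagram but are not Pinot's actual classes.

```python
from collections import Counter
from typing import Dict, List, Tuple

Segment = Dict[str, List[int]]  # column name -> values, indexed by doc id


def filter_operator(seg: Segment, account_id: int, min_day: int) -> List[int]:
    """Return the doc ids that satisfy the WHERE clause."""
    return [
        doc_id
        for doc_id in range(len(seg["accountId"]))
        if seg["accountId"][doc_id] == account_id and seg["day"][doc_id] >= min_day
    ]


def projection_operator(seg: Segment, doc_ids: List[int]) -> List[Tuple[int, int]]:
    """Read only the needed columns for the matching doc ids."""
    return [(seg["campaignId"][d], seg["clicks"][d]) for d in doc_ids]


def group_by_operator(tuples: List[Tuple[int, int]]) -> Counter:
    """Compute sum(clicks) per campaignId within one segment."""
    sums: Counter = Counter()
    for campaign_id, clicks in tuples:
        sums[campaign_id] += clicks
    return sums


def combine_operator(partials: List[Counter]) -> Dict[int, int]:
    """Merge the per-segment partial sums."""
    total: Counter = Counter()
    for partial in partials:
        total.update(partial)  # Counter.update adds values per key
    return dict(total)


def execute(segments: List[Segment], account_id: int, min_day: int) -> Dict[int, int]:
    partials = []
    for seg in segments:
        doc_ids = filter_operator(seg, account_id, min_day)
        partials.append(group_by_operator(projection_operator(seg, doc_ids)))
    return combine_operator(partials)
```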

Pinot
• Operations

Cluster Management: Deployment
• Brokers and Servers register themselves in Helix
• All servers start with no use-case-specific configuration
[Diagram: Controller, Helix, Brokers, Servers]

Onboarding a new use case
[Diagram: a Create Table command is sent to the Controller; Helix, Brokers, and Servers are shown, with nodes tagged XLNT]
TableName: XLNT_T1
Tag: XLNT
Servers: 3
Brokers: 1

Segment Assignment
[Diagram: an Upload Segment request goes to the Controller; Helix, Brokers, and Servers are shown, with S1, S2, S3 placed across the servers]
TableName: XLNT_T1
Copies: 2
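
A rough sketch of placing segments on servers with a configured number of copies. Round-robin placement is an assumption chosen for simplicity here; the slides do not state which strategy the Controller uses, and the function and server names are hypothetical.

```python
from typing import Dict, List


def assign_segments(
    segments: List[str], servers: List[str], copies: int
) -> Dict[str, List[str]]:
    """Give each segment `copies` distinct servers, rotating through the list."""
    if copies > len(servers):
        raise ValueError("not enough servers for the requested number of copies")
    assignment: Dict[str, List[str]] = {}
    for i, segment in enumerate(segments):
        # Consecutive offsets modulo the server count are always distinct
        # as long as copies <= len(servers).
        assignment[segment] = [
            servers[(i + replica) % len(servers)] for replica in range(copies)
        ]
    return assignment


assignment = assign_segments(
    ["S1", "S2", "S3"], ["server_a", "server_b", "server_c"], copies=2
)
```

With two copies per segment, any single server can fail and every segment still has one live replica.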

Pinot - Fault tolerance/Elasticity
• AUTO recovery mode: automatically redistribute segments on failure/addition of new nodes
• Custom mode: run in degraded mode until the node is restarted/replaced

Pinot vs Druid

Architecture
  Druid: Realtime + Offline, Realtime only
  Pinot: Realtime + Offline
  Notes: With realtime only, consistency is hard and schema evolution/bootstrap is hard

Inverted Index
  Druid: Always on all columns, fixed
  Pinot: Configurable on a per-column basis
  Notes: Allows a trade-off between scanning vs. inverted index + scanning. More data can fit in a given memory size

Data organization
  Druid: N/A
  Pinot: Sorts data
  Notes: Organizing data provides speed/better compression and removes the need for an inverted index

Smart pre-materialization
  Druid: N/A
  Pinot: Star-tree
  Notes: Allows a trade-off between latency and space

Query Execution Layer
  Druid: Fixed plan
  Pinot: Split into planning and execution
  Notes: Smart choices can be made at runtime based on metadata/query

Pinot - Future
• Documentation & tooling
• In progress: consistency among real-time replicas
• Improve cost to serve: leverage SSD, partial pre-materialization
• ThirdEye: Business Metrics Monitoring

Thank You