Peter Marshall, Technology Evangelist at Imply
Abstract: Apache Druid® can revolutionise business decision-making with a view of the freshest of fresh data in web, mobile, desktop, and data science notebooks. In this talk, we look at key activities to integrate into Apache Druid POCs, discussing common hurdles and signposting to important information.
Bio: Peter Marshall (https://petermarshall.io) is an Apache Druid Technology Evangelist at Imply (http://imply.io/), a company founded by original developers of Apache Druid. He has 20 years architecture experience in CRM, EDRM, ERP, EIP, Digital Services, Security, BI, Analytics, and MDM. He is TOGAF certified and has a BA degree in Theology and Computer Studies from the University of Birmingham in the United Kingdom.
Druid Adoption Tips and Tricks
1. Apache®, Apache Druid®, Druid®, and the Druid logo are either registered trademarks or trademarks of The Apache Software Foundation in the United States and other countries.
2. peter.marshall@imply.io
20 years in Enterprise Architecture
CRM, EDRM, ERP, EIP, Digital Services,
Security, BI, Analytics, and MDM
BA Theology (!) and Computer Studies
TOGAF certified
Book collector & A/V buyer
Prime Timeline = proper timeline
#werk
Peter Marshall
Technology Evangelist
petermarshall.io
6. ASSIST
The day-to-day problems people have
The most common issues
Potential content
Potential docs updates
Potential code changes
Archdruids!
7. DESIGN: Defining the data pipeline, noting all the building blocks required - not only how they will be realised, but also what data objects will flow through that pipeline, and with what size, shape, and regularity.
DEPLOY: Manually or with automation, assigning Apache Druid components and configurations to the infrastructure - including network services, routing and firewall configuration, and encryption certificates - along with the three Druid dependencies: deep storage, ZooKeeper, and a metadata database.
CREATE: Using the features of Apache Druid within the pipeline to achieve the desired design.
STABILISE: Hardening and all those tasks you would associate with good service transition, from defining OLAs / SLAs to training and educating your target audience.
OPERATE: Monitoring, support, and maintenance of the transitioned system to meet SLAs.
Ingest / DBM / Query
8. Ingest / DBM / Query
Ingest: Defining ingestion tasks that will bring statistics-ready data into Druid from storage and delivery services (including schema, parsing rules, transformation, filtering, connectivity, and tuning options), and ensuring their distributed execution, led by the overlord, is performant and complete.
DBM: Led by the coordinator, replication and distribution of the ingested data according to rules, while allowing for defragmenting (“compact”), reshaping, heating / cooling, and deleting that data.
Query: Programming SQL / Druid Native code executed by the distributed processes that are led by the broker service (possibly via the router process), with security applied.
10. Ingest / DBM / Query - common question areas for each:
General Questions (Ingest)
Specifications (ingestion and compaction), and how they are written or generated
Execution of the ingestion
Inbound integration with things like Hadoop and Kafka
General Questions (DBM)
Deletion (kill tasks) and distribution of ingested data, whether that happens immediately after ingestion or later
Any metadata questions, i.e. the sys.* tables (see the sys.segments sketch below)
Auto Compaction configuration (not the job itself - that’s a spec…)
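Many of the sys.* questions can be answered with Druid SQL against the system tables. A minimal sketch:

    -- How many segments does each datasource have, and how big are they?
    SELECT "datasource",
           COUNT(*) AS segment_count,
           SUM("size") / 1048576 AS total_mib,
           AVG("num_rows") AS avg_rows_per_segment
    FROM sys.segments
    WHERE is_published = 1
    GROUP BY "datasource"
    ORDER BY SUM("size") DESC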
General Questions (Query)
Authorisation and Authentication via the broker
Designing fast, effective queries, whether that’s SQL or Native (see the SQL sketch below)
Execution of queries
Outbound Integration of Druid with tools like Superset
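A sketch of the "fast, effective" query shape - a narrow time filter, coarse time bucketing, and a small group - reusing the hypothetical clicks datasource and events metric from the ingestion sketch above:

    -- Top pages by views over the last day, one country, bucketed hourly
    SELECT TIME_FLOOR(__time, 'PT1H') AS "hour",
           "page",
           SUM("events") AS views
    FROM "clicks"
    WHERE __time >= CURRENT_TIMESTAMP - INTERVAL '1' DAY
      AND "country" = 'GB'
    GROUP BY 1, 2
    ORDER BY SUM("events") DESC
    LIMIT 10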
13. “ingestion is not happening to druid even if the data is present in the topic.”
“compact task seems to be blocked by the index task”
“failing task with "runnerStatusCode":"WAITING"”
“Ingestion task fails with RunTime Exception during BUILD_SEGMENTS phase”
“the task is still running until the time limit specified and then is marked as FAILED”
“it seems that the throughput does not cross 1M average”
“its taking more than hour to ingest. When we triggered kill task, its taking forever”
“tips or tricks on improving ingestion performance?”
“Ingestion was throttled for [35,610] millis because persists were pending”
Examples of Ingestion Execution problems
14. Examples of Ingestion Specification problems
“How to resolve for NULL values when they are coming from source table?”
“Previous sequenceNumber [289] is no longer available for partition [0].”
“Error on batch ingesting from different druid datasource”
“how to do some data formatting while handling schema changes”
“I am not seeing Druid doing any rollups”
“regexp_extract function is causing nullpointerexceptions”
“Anyone tried to hardcode the timeStamp?”
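Several of these are ingestion-time shaping problems. A sketch of the transformSpec section of a dataSchema that fills nulls and drops unusable rows (column names are hypothetical; nvl is a built-in Druid expression function):

    {
      "transformSpec": {
        "transforms": [
          { "type": "expression", "name": "country", "expression": "nvl(\"country\", 'unknown')" }
        ],
        "filter": {
          "type": "not",
          "field": { "type": "selector", "dimension": "user_id", "value": null }
        }
      }
    }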
15. Don’t Walk Alone
Work with your end users to sketch
out what your Druid-powered user
interface is going to look like.
DESIGN DEPLOY CREATE STABILISE OPERATE
16. Tips for Druid Design
Real-time data analysis starts with time as a key dimension.
Comparisons make people think differently.
Filters make one visual cover multiple contexts.
Measures make one visual cover multiple indicators.
Create data sources focused on speed.
Create magic!
20. DESIGN DEPLOY CREATE STABILISE OPERATE
Druid != Island
Think about how and what you will
deploy onto your infrastructure,
especially Druid’s dependencies
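A sketch of how those dependencies are wired up in common.runtime.properties, assuming one common pairing - S3 for deep storage and PostgreSQL for metadata. Hosts, bucket, and database names are placeholders:

    # Extensions for the chosen deep storage, metadata database, and Kafka ingestion
    druid.extensions.loadList=["druid-s3-extensions","postgresql-metadata-storage","druid-kafka-indexing-service"]

    # ZooKeeper ensemble
    druid.zk.service.host=zk1.example.com:2181,zk2.example.com:2181

    # Deep storage
    druid.storage.type=s3
    druid.storage.bucket=my-druid-segments
    druid.storage.baseKey=segments

    # Metadata database
    druid.metadata.storage.type=postgresql
    druid.metadata.storage.connector.connectURI=jdbc:postgresql://db.example.com:5432/druid
    druid.metadata.storage.connector.user=druid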
21. Pipeline building blocks (diagram):
Production: Specialised collectors, Applications & APIs, Machine & Human Data, Environmental Sensors, Stream & Bulk Repositories
Delivery: Real-time Analytics, BI Reporting & Dashboards, Buses, Queues & Databases, Search & Filtering UIs, Applications & APIs
Processing: Consolidation, Enrichment, Transformation & Stripping, Verification & Validation, Filtering & Sorting (ENRICHMENT AND CLEANSING; REDUCING THE SIZE OF THE INPUT)
Storage (RETENTION FOR LATER; FOR OPERATIONS & FOR RESEARCH)
Query: Feature & Structure Discovery, Segmentation & Classification, Recommendation, Prediction & Anomaly Detection, Statistical Calculations (EASY AND SPEEDY ANALYSIS BY STATS, LEARNING, SEARCH...)
Delivery (GUARANTEED DELIVERY; POSSIBLY STORING FOR A LIMITED PERIOD)
22. DESIGN DEPLOY CREATE STABILISE OPERATE
Jump Right In? Er… no.
Configure JRE properties well -
especially heaps - and choose the right
hardware
Take time to read the tuning guide
druid.apache.org/docs/latest/operations/basic-cluster-tuning.html
druid.apache.org/docs/latest/configuration/index.html#jvm-configuration-best-practices
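For example, a Historical's jvm.config might look like the sketch below. The sizes are purely illustrative and must come out of the tuning-guide arithmetic - in particular, direct memory has to cover (druid.processing.numThreads + druid.processing.numMergeBuffers + 1) × druid.processing.buffer.sizeBytes. The timezone, encoding, and OOM flags follow the JVM best-practices page linked above.

    -server
    -Xms8g
    -Xmx8g
    -XX:MaxDirectMemorySize=13g
    -XX:+ExitOnOutOfMemoryError
    -Duser.timezone=UTC
    -Dfile.encoding=UTF-8
    -Djava.io.tmpdir=var/tmp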
24. DESIGN DEPLOY CREATE STABILISE OPERATE
Stay Connected
Druid is a highly distributed, loosely
coupled system on purpose.
Care for your interprocess
communication systems and paths:
especially ZooKeeper and HTTP
druid.apache.org/docs/latest/dependencies/zookeeper.html
druid.apache.org/docs/latest/configuration/index.html#zookeeper
25. Know Your Team
Get to know the core distributed
collaborations of Apache Druid
DESIGN DEPLOY CREATE STABILISE OPERATE
27. DESIGN DEPLOY CREATE STABILISE OPERATE
Love Your Log
Get to know the logs.
For ingestion, particularly the overlord,
middle manager and its tasks.
For what happens next, particularly
the coordinator and historicals.
28. DESIGN DEPLOY CREATE STABILISE OPERATE
K.I.S.S.
Be agile: set up a lab, start simple and
start small, working up to perfection
29. Create a target query list
Understand which source data columns you will need
at ingestion time (filtering, transformation, lookup) and
which are used at query time (filtering, sorting,
calculations, grouping, bucketing, aggregations)
Set up your dimension spec and execute queries,
recording query performance (see the EXPLAIN sketch after this list)
Explore what other queries (Time Series, Group By,
Top N) you could do with the data
Add more subtasks and monitor the payloads
Add more data and check the lag
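For the query-recording step, EXPLAIN PLAN FOR is a cheap first check: it shows the native query that a target SQL query will become (timeseries, topN, groupBy, or scan) before you benchmark it. A sketch, again using the hypothetical clicks datasource:

    -- Which native query type will this SQL become?
    EXPLAIN PLAN FOR
    SELECT TIME_FLOOR(__time, 'PT1M') AS "minute", COUNT(*) AS events
    FROM "clicks"
    WHERE __time >= CURRENT_TIMESTAMP - INTERVAL '1' HOUR
    GROUP BY 1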
Some Ingestion Spec Tips
Use ingestion-time filters to eke out performance and
storage efficiencies
Use transforms to replace or create data closer to the
queries that people will execute
Use time granularity and roll-up to generate metrics
and datasketches (set, quantile, and cardinality) - see the sketch below
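A sketch of granularitySpec and metricsSpec fragments that put the roll-up and datasketch tips into practice. Column names are hypothetical, and the sketch aggregators assume the druid-datasketches extension is loaded:

    {
      "granularitySpec": {
        "segmentGranularity": "DAY",
        "queryGranularity": "FIFTEEN_MINUTE",
        "rollup": true
      },
      "metricsSpec": [
        { "type": "count", "name": "events" },
        { "type": "longSum", "name": "bytes", "fieldName": "bytes" },
        { "type": "thetaSketch", "name": "unique_users", "fieldName": "user_id" },
        { "type": "quantilesDoublesSketch", "name": "latency_sketch", "fieldName": "latency_ms" }
      ]
    }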
30. DESIGN DEPLOY CREATE STABILISE OPERATE
Digest the Specifics
Learn ingestion specifications in detail
through your own exploration and
experimentation, from the docs, and
from the community
31. DESIGN DEPLOY CREATE STABILISE OPERATE
Understand Segments
Historicals serve all used segments,
and deep storage stores them
Query time relates directly to segment
size: lots of small segments means lots
of small query tasks
Segments are tracked in master nodes
and registered in the metadata DB
32. Segment Tips & Tricks
Filter rows and think carefully about what dimensions you need
Use different segment granularities and row maximums to control the number of
segments generated
Apply time bucketing with query granularity and roll-up
Think about tiering your historicals using drop and load rules
Consider not just initial ingestion but on-going re-indexing
Never forget compaction! (see the auto-compaction sketch after this list)
Check local (maxBytesInMemory, maxRowsInMemory,
intermediatePersistPeriod) and deep storage (maxRowsPerSegment,
maxTotalRows, intermediateHandoffPeriod, taskDuration) persists
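A sketch of an auto-compaction configuration covering several of these tips at once; it would be POSTed to the coordinator at /druid/coordinator/v1/config/compaction. The datasource name and sizes are hypothetical:

    {
      "dataSource": "clicks",
      "skipOffsetFromLatest": "P1D",
      "granularitySpec": { "segmentGranularity": "DAY" },
      "tuningConfig": {
        "type": "index_parallel",
        "partitionsSpec": { "type": "dynamic", "maxRowsPerSegment": 5000000 }
      }
    }

skipOffsetFromLatest keeps compaction away from the most recent data, where real-time tasks are still writing.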
33. DESIGN DEPLOY CREATE STABILISE OPERATE
Ask Us Anything!
Find other people in the community
who have had the same issue as you
34. DESIGN DEPLOY CREATE STABILISE OPERATE
Learn from the best!
Find other people in the community
who have walked your walk!
36. DESIGN DEPLOY CREATE STABILISE OPERATE
Get Meta
Collect metrics and understand how
your work affects them
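Collecting those metrics starts with a couple of runtime properties. A sketch, assuming an HTTP endpoint to receive the metric events (the recipient URL is a placeholder; other emitters, such as the logging emitter, exist too):

    # Monitors add periodic metrics (JVM stats here); the emitter ships everything out
    druid.monitoring.monitors=["org.apache.druid.java.util.metrics.JvmMonitor"]
    druid.emitter=http
    druid.emitter.http.recipientBaseUrl=http://metrics.example.com:8080/druid-metrics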
37. Metrics & Measures
Infrastructure: Host, Druid Service, Instance Type, Imply Version, Druid Version, Memory Used, Maximum Memory Used, Garbage Collections, Total / Average GC Time, Total CPU Time, User CPU Time, CPU Wait Time, CPU System Use, Avg Jetty Connections, Min / Max Jetty Conns
Query Data: Num Metrics, Num Complex Metrics, Interval, Duration, Filters?, Remote Address, Avg Return Size, Total CPU Time, Subquery Count, Avg Subquery Time, Data Source, Query Type, Native / Query ID, Successful?, Identity, Context
Query Cache: Hit Rate, Hits, Misses, Timeouts, Errors, Size, Number of Entries, Average Entry Size, Evictions
Query Patterns: Query Count, Average Query Time, 98%ile Query Time, Max Query Time
Tasks: Task Id, Task Status
Ingestion: Events Processed, Unparseable Events, Thrown Away Events, Output Rows, Persists, Total Back Pressure, Message Gap, Kafka Lag
https://druid.apache.org/docs/latest/operations/metrics.html
38. Inform capacity planning
Isolate potential execution bottlenecks
Check and investigate cluster performance
Flatten the learning curve for running Druid at scale
Infrastructure / Use & Experience / Ingestion
39. How can we all help each other?
1 Come with us, join us...
Join ASF Slack and the Google User Group, say hi, give people (socially distanced) hugs -
and link back to the docs
2 Tippy-Tappy-Typey-Type-Type
Blog helpful tips and tricks and walkthroughs of your own ingestion integrations,
specifications, and execution configurations. Contribute code and doc updates :-D
3 Make Pretty Slides
Take part in Ask Me Anything, Town Hall, and Druid meetups about ingestion.