HBase in Practice
Lars George – Partner and Co-Founder @ OpenCore
DataWorks Summit 2017 - Munich
NoSQL is no SQL is SQL?
About Me
• Partner & Co-Founder at OpenCore
• Before that
• Lars: EMEA Chief Architect at Cloudera (5+ years)
• Hadoop since 2007
• Apache Committer & Apache Member
• HBase (also in PMC)
• Lars: O'Reilly Author: HBase – The Definitive Guide
• Contact
• lars.george@opencore.com
• @larsgeorge
Website: www.opencore.com
Agenda
• Brief Intro To Core Concepts
• Access Options
• Data Modeling
• Performance Tuning
• Use-Cases
• Summary
Introduction To Core Concepts
HBase Tables
• From the user's perspective, HBase is similar to a database, or spreadsheet
• There are rows and columns, storing values
• By default, asking for a specific row/column combination returns the current value (that is, the last value stored there) – see the sketch below
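A minimal sketch of such a point read with the standard Java client; the table "orders", family "d", and column "status" are made up for illustration:

```java
import java.io.IOException;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.*;
import org.apache.hadoop.hbase.util.Bytes;

public class GetExample {
  public static void main(String[] args) throws IOException {
    try (Connection conn = ConnectionFactory.createConnection(HBaseConfiguration.create());
         Table table = conn.getTable(TableName.valueOf("orders"))) {
      Get get = new Get(Bytes.toBytes("order-4711"));             // row key
      get.addColumn(Bytes.toBytes("d"), Bytes.toBytes("status")); // family:qualifier
      Result result = table.get(get);
      // By default this returns the current, i.e. latest, value at that coordinate
      byte[] status = result.getValue(Bytes.toBytes("d"), Bytes.toBytes("status"));
      System.out.println(Bytes.toString(status));
    }
  }
}
```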
HBase Tables
• HBase can have a different schema per row
• Could be called schema-less
• Primary access by the user-given row key and column name
• Sorting of rows and columns by their key (aka names)
HBase Tables
• Each row/column coordinate is tagged with a version number, allowing multi-versioned values
• Version is usually the current time (as epoch)
• API lets the user ask for versions (specific, by count, or by ranges) – see the sketch below
• Up to 2B versions
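A hedged sketch of version-aware reads with the HBase 1.x-era API (setMaxVersions was reworked in later releases); table and column names are again illustrative:

```java
import java.io.IOException;
import org.apache.hadoop.hbase.Cell;
import org.apache.hadoop.hbase.CellUtil;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.*;
import org.apache.hadoop.hbase.util.Bytes;

public class VersionsExample {
  public static void main(String[] args) throws IOException {
    byte[] cf = Bytes.toBytes("d"), col = Bytes.toBytes("status");
    try (Connection conn = ConnectionFactory.createConnection(HBaseConfiguration.create());
         Table table = conn.getTable(TableName.valueOf("orders"))) {
      Get get = new Get(Bytes.toBytes("order-4711"));
      get.addColumn(cf, col);
      get.setMaxVersions(3);                             // by count: up to 3 latest versions
      get.setTimeRange(0L, System.currentTimeMillis());  // by range of timestamps
      Result result = table.get(get);
      for (Cell cell : result.getColumnCells(cf, col)) { // newest first
        System.out.println(cell.getTimestamp() + " -> "
            + Bytes.toString(CellUtil.cloneValue(cell)));
      }
    }
  }
}
```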
HBase Tables
• Table data is cut into pieces to distribute over the cluster
• Regions split a table into shards at size boundaries
• Families split within regions to group sets of columns together
• At least one of each is needed
Scalability – Regions as Shards
• A region is served by exactly one region server
• Every region server serves many regions
• Table data is spread over servers
• Distribution of I/O
• Assignment is based on configurable logic
• Balancing cluster load
• Clients talk directly to region servers
Column Family-Oriented
• Group multiple columns into physically separated locations
• Apply different properties to each family
• TTL, compression, versions, …
• Useful to separate distinct data sets that are related
• Also useful to separate larger blobs from metadata (see the sketch below)
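A sketch of creating a table with two differently configured families, using the HBase 1.x admin API (HTableDescriptor/HColumnDescriptor); the table "events" and family names are illustrative, and Snappy assumes the native libraries are installed:

```java
import java.io.IOException;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.io.compress.Compression;

public class CreateTableExample {
  public static void main(String[] args) throws IOException {
    HTableDescriptor desc = new HTableDescriptor(TableName.valueOf("events"));
    HColumnDescriptor meta = new HColumnDescriptor("m");   // small metadata columns
    meta.setMaxVersions(3);                                // keep three versions
    meta.setTimeToLive(30 * 24 * 3600);                    // expire after 30 days
    HColumnDescriptor blob = new HColumnDescriptor("b");   // larger payloads
    blob.setCompressionType(Compression.Algorithm.SNAPPY); // compress the blobs
    desc.addFamily(meta);
    desc.addFamily(blob);
    try (Connection conn = ConnectionFactory.createConnection(HBaseConfiguration.create());
         Admin admin = conn.getAdmin()) {
      admin.createTable(desc);
    }
  }
}
```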
Data Management
• What is available is tracked in three locations
• System catalog table hbase:meta
• Files in HDFS directories
• Open region instances on servers
• System aligns these locations
• Sometimes (very rarely) a repair may be needed using HBase Fsck
• Redundant information is useful to repair corrupt tables
HBase really is…
• A distributed Hash Map
• Imagine a complex, concatenated key including the user-given row key and column name, plus the timestamp (version)
• The complex key points to the actual value, that is, the cell
Fold, Store, and Shift
• Logical rows in tables are really stored as flat key-value pairs
• Each carries its full coordinates
• Pertinent information can be freely placed in the cell to improve lookup
• HBase is a column-family-grouped key-value store
HFile Format Information
• All data is stored in a custom (open-source) format, called HFile
• Data is stored in blocks (64KB default)
• Trade-off between lookups and I/O throughput
• Compression and encoding are applied _after_ the limit check
• Index, filter, and meta data are stored in separate blocks
• A fixed trailer allows traversal of the file structure
• Newer versions introduce multilayered index and filter structures
• Only load the master index, and load partial index blocks on demand
• Reading data requires deserialization of a block into cells
• A kind of Amdahl's Law applies
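Block size and encoding are per-family settings; a minimal sketch with the HBase 1.x API (family name and values illustrative):

```java
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.io.encoding.DataBlockEncoding;

public class BlockSizeExample {
  public static void main(String[] args) {
    // Larger blocks favor sequential scans; smaller blocks favor point gets
    HColumnDescriptor cf = new HColumnDescriptor("d");
    cf.setBlocksize(128 * 1024);                       // 128KB, default is 64KB
    cf.setDataBlockEncoding(DataBlockEncoding.PREFIX); // per-block key encoding
    System.out.println(cf);
  }
}
```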
HBase Architecture
• One Master and many Worker servers
• Clients mostly communicate with workers
• Workers store the actual data
• Memstore for accruing writes
• HFile for persistence
• WAL for fail-safety
• Data provided as regions
• HDFS is the backing store
• But it could be another one
HBase Architecture (cont.)
• Based on Log-Structured Merge-Trees (LSM-Trees)
• Inserts are done in the write-ahead log first
• Data is stored in memory and flushed to disk at regular intervals or based on size
• Small flushes are merged in the background to keep the number of files small
• Reads check the memory stores first and the disk-based files second
• Deletes are handled with "tombstone" markers
• Atomicity on the row level, no matter how many columns
• Keeps the locking model easy
Merge Reads
• Read Memstore & StoreFiles using separate scanners
• Merge matching cells into a single row "view"
• Deletes mask existing data
• Bloom filters help skip StoreFiles
• Reads may have to span many files
APIs and Access Options
HBase Clients
• Native Java Client/API
• Non-Java Clients
• REST server
• Thrift server
• Jython, Groovy DSL
• Spark
• TableInputFormat/TableOutputFormat for MapReduce
• HBase as MapReduce source and/or target
• Also available for table snapshots
• HBase Shell
• JRuby shell adding get, put, scan etc. and admin calls
• Phoenix, Impala, Hive, …
Java API
From Wikipedia:
• CRUD: "In computer programming, create, read, update, and delete are the four basic functions of persistent storage."
• Other variations of CRUD include
• BREAD (Browse, Read, Edit, Add, Delete)
• MADS (Modify, Add, Delete, Show)
• DAVE (Delete, Add, View, Edit)
• CRAP (Create, Retrieve, Alter, Purge)
Wait, what?
Java API (cont.)
• CRUD
• put: Create and update a row (CU)
• get: Retrieve an entire, or partial, row (R)
• delete: Delete a cell, column, columns, or row (D)
• CRUD+SI
• scan: Scan any number of rows (S)
• increment: Increment a column value (I)
• CRUD+SI+CAS
• Atomic compare-and-swap (CAS)
• Combined get, check, and put operation
• Helps to overcome the lack of full transactions (see the sketch below)
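A sketch of the operations above in one place, using the HBase 1.x-era client API (the checkAndPut signature shown here was reworked in later versions); table, family, and values are illustrative:

```java
import java.io.IOException;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.*;
import org.apache.hadoop.hbase.util.Bytes;

public class CrudExample {
  public static void main(String[] args) throws IOException {
    byte[] cf = Bytes.toBytes("d");
    try (Connection conn = ConnectionFactory.createConnection(HBaseConfiguration.create());
         Table table = conn.getTable(TableName.valueOf("orders"))) {
      byte[] row = Bytes.toBytes("order-4711");
      // CU: put creates or updates
      Put put = new Put(row);
      put.addColumn(cf, Bytes.toBytes("status"), Bytes.toBytes("NEW"));
      table.put(put);
      // R: get retrieves an entire or partial row
      Result result = table.get(new Get(row));
      // S: scan any number of rows
      try (ResultScanner scanner = table.getScanner(new Scan())) {
        for (Result r : scanner) { /* process row */ }
      }
      // I: increment a column value atomically
      table.incrementColumnValue(row, cf, Bytes.toBytes("edits"), 1L);
      // CAS: combined get, check, and put (HBase 1.x API)
      Put update = new Put(row);
      update.addColumn(cf, Bytes.toBytes("status"), Bytes.toBytes("SHIPPED"));
      boolean applied = table.checkAndPut(row, cf, Bytes.toBytes("status"),
          Bytes.toBytes("NEW"), update);
      // D: delete a cell, column, or row
      table.delete(new Delete(row));
    }
  }
}
```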
Java API (cont.)
• Batch Operations (see the sketch below)
• Support Get, Put, and Delete
• Reduce network round-trips
• If possible, batch operations to the server to gain better overall throughput
• Filters
• Can be used with Get and Scan operations
• Server-side hinting
• Reduce data transferred to the client
• Filters are no guarantee for fast scans
• Still a full table scan in the worst-case scenario
• Might have to implement your own
• Filters can hint at the next row key
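A combined sketch of both, assuming the same illustrative "orders" table; Table.batch() takes a mixed list of actions, and the PrefixFilter still scans server-side but ships only matching rows:

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.*;
import org.apache.hadoop.hbase.filter.PrefixFilter;
import org.apache.hadoop.hbase.util.Bytes;

public class BatchAndFilterExample {
  public static void main(String[] args) throws IOException, InterruptedException {
    try (Connection conn = ConnectionFactory.createConnection(HBaseConfiguration.create());
         Table table = conn.getTable(TableName.valueOf("orders"))) {
      // Batch mixed operations, reducing network round-trips
      List<Row> actions = new ArrayList<>();
      actions.add(new Get(Bytes.toBytes("order-1")));
      actions.add(new Put(Bytes.toBytes("order-2"))
          .addColumn(Bytes.toBytes("d"), Bytes.toBytes("status"), Bytes.toBytes("NEW")));
      actions.add(new Delete(Bytes.toBytes("order-3")));
      Object[] results = new Object[actions.size()];
      table.batch(actions, results);
      // Server-side filter: still a scan, but less data crosses the network
      Scan scan = new Scan();
      scan.setFilter(new PrefixFilter(Bytes.toBytes("order-2017")));
      try (ResultScanner scanner = table.getScanner(scan)) {
        for (Result r : scanner) { /* matching rows only */ }
      }
    }
  }
}
```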
Data Modeling
Where's your data at?
Key Cardinality
• The best performance is gained from using row keys
• Time-range-bound reads can skip store files
• So can Bloom Filters
• Selecting column families reduces the amount of data to be scanned
• Pure value-based access is a full table scan
• Filters often are too, but reduce network traffic
Key/Table Design
• Crucial to gain the best performance
• Why do I need to know? Well, an RDBMS also only works well when columns are indexed and the query plan is OK
• Absence of secondary indexes forces use of row key or column name sorting
• Transfer multiple indexes into one
• Generates a large table -> Good, since it fits the architecture and spreads across the cluster
• DDI
• Stands for Denormalization, Duplication, and Intelligent Keys
• Needed to overcome trade-offs of the architecture
• Denormalization -> Replacement for JOINs
• Duplication -> Design for reads
• Intelligent Keys -> Implement indexing and sorting, optimize reads
Pre-materialize Everything
• Achieve one read per customer request if possible
• Otherwise keep it at the lowest number
• Reads between 10ms (cache miss) and 1ms (cache hit)
• Use MapReduce or Spark to compute exact results in batch
• Store and merge updates live
• Use the increment() methods (see the sketch below)
➜ Motto: "Design for Reads"
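A minimal sketch of live rollup maintenance with increments; the "rollups" table and its row-key layout are invented for illustration:

```java
import java.io.IOException;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.*;
import org.apache.hadoop.hbase.util.Bytes;

public class RollupExample {
  public static void main(String[] args) throws IOException {
    try (Connection conn = ConnectionFactory.createConnection(HBaseConfiguration.create());
         Table table = conn.getTable(TableName.valueOf("rollups"))) {
      // One row per user and day; the server merges deltas atomically per row
      Increment inc = new Increment(Bytes.toBytes("user42|2017-04-05"));
      inc.addColumn(Bytes.toBytes("c"), Bytes.toBytes("clicks"), 1L);
      inc.addColumn(Bytes.toBytes("c"), Bytes.toBytes("views"), 1L);
      table.increment(inc);
    }
  }
}
```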
Tall-Narrow vs. Flat-Wide Tables
• Rows do not split
• Might end up with one row per region
• Same storage footprint
• Put more details into the row key
• Sometimes a dummy column only
• Make use of partial key scans (see the sketch below)
• Tall with Scans, Wide with Gets
• Atomicity only on the row level
• Examples
• Large graphs, stored as adjacency matrix (narrow)
• Message inbox (wide)
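A sketch of a partial key scan over a tall-narrow message inbox; the key layout <user>|<message-id> is illustrative, and setRowPrefixFilter assumes HBase 1.1 or later:

```java
import java.io.IOException;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.*;
import org.apache.hadoop.hbase.util.Bytes;

public class PartialKeyScanExample {
  public static void main(String[] args) throws IOException {
    try (Connection conn = ConnectionFactory.createConnection(HBaseConfiguration.create());
         Table table = conn.getTable(TableName.valueOf("inbox"))) {
      // Tall-narrow layout: one message per row, key = <user>|<message-id>.
      // A partial key scan reads one user's messages without touching others.
      Scan scan = new Scan();
      scan.setRowPrefixFilter(Bytes.toBytes("user42|"));
      try (ResultScanner scanner = table.getScanner(scan)) {
        for (Result r : scanner) {
          System.out.println(Bytes.toString(r.getRow()));
        }
      }
    }
  }
}
```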
Sequential Keys
<timestamp><more key>: {CF: {CQ: {TS : Val}}}
• Hotspotting on regions is bad!
• Instead do one of the following:
• Salting
• Prefix <timestamp> with a distributed value
• Binning or bucketing rows across regions (see the sketch below)
• Key field swap/promotion
• Move <more key> before the timestamp (see OpenTSDB)
• Randomization
• Move <timestamp> out of the key, or prefix it with an MD5 hash
• Might also be mitigated by the overall spread of workloads
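A sketch of salting; the bucket count and key format are chosen for illustration, and readers have to fan out over all buckets to reassemble a time range:

```java
import org.apache.hadoop.hbase.util.Bytes;

public class SaltedKeyExample {
  static final int BUCKETS = 16;  // illustrative; match the number of presplit regions

  // Prefix the sequential key with a stable bucket so writes spread over regions
  static byte[] saltedKey(long timestamp, String moreKey) {
    int bucket = ((moreKey + timestamp).hashCode() & 0x7fffffff) % BUCKETS;
    return Bytes.toBytes(String.format("%02d|%d|%s", bucket, timestamp, moreKey));
  }

  public static void main(String[] args) {
    System.out.println(Bytes.toString(saltedKey(1491400000000L, "sensor-7")));
  }
}
```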
Key Design Choices
• Based on the access pattern, either use sequential or random keys
• Often a combination of both is needed
• Overcome architectural limitations
• Neither is necessarily bad
• Use bulk import for sequential keys and reads
• Random keys are good for random access patterns
Checklist
• Design for the Use-Case
• Read, Write, or Both?
• Avoid Hotspotting
• Hash the leading key part, or use salting/bucketing
• Use bulk loading where possible
• Monitor your servers!
• Presplit tables (see the sketch below)
• Try prefix encoding when values are small
• Otherwise use compression (or both)
• For Reads: Restrict yourself
• Specify what you need, i.e., columns, families, time range
• Shift details to the appropriate position
• Composite Keys
• Column Qualifiers
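A sketch of presplitting at the salt-bucket boundaries from the earlier example (HBase 1.x admin API; table and family names illustrative):

```java
import java.io.IOException;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.util.Bytes;

public class PresplitExample {
  public static void main(String[] args) throws IOException {
    HTableDescriptor desc = new HTableDescriptor(TableName.valueOf("events"));
    desc.addFamily(new HColumnDescriptor("d"));
    // One split point per salt-bucket prefix ("00".."15" with 16 buckets)
    byte[][] splits = new byte[15][];
    for (int i = 1; i <= 15; i++) {
      splits[i - 1] = Bytes.toBytes(String.format("%02d", i));
    }
    try (Connection conn = ConnectionFactory.createConnection(HBaseConfiguration.create());
         Admin admin = conn.getAdmin()) {
      admin.createTable(desc, splits);  // table starts with 16 regions
    }
  }
}
```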
Performance Tuning
1000 knobs to turn… 20 are important?
Everything is Pluggable
• Cell
• Memstore
• Flush Policy
• Compaction Policy
• Cache
• WAL
• RPC handling
• …
Cluster Tuning
• First, tune the global settings (a few are sketched below)
• Heap size and GC algorithm
• Memory share for reads and writes
• Enable Block Cache
• Number of RPC handlers
• Load Balancer
• Default flush and compaction strategy
• Thread pools (10+)
• Next, tune the per-table and family settings
• Region sizes
• Block sizes
• Compression and encoding
• Compactions
• …
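A few of the global knobs live in hbase-site.xml on the servers; the property names below exist in HBase 1.x, but the values are purely illustrative, not recommendations:

```xml
<!-- Sketch of commonly tuned hbase-site.xml settings (server side) -->
<property>
  <name>hbase.regionserver.handler.count</name>
  <value>60</value> <!-- number of RPC handlers -->
</property>
<property>
  <name>hfile.block.cache.size</name>
  <value>0.4</value> <!-- heap share for the block cache (reads) -->
</property>
<property>
  <name>hbase.regionserver.global.memstore.size</name>
  <value>0.4</value> <!-- heap share for the memstores (writes) -->
</property>
<property>
  <name>hbase.hregion.max.filesize</name>
  <value>10737418240</value> <!-- region split size: 10GB -->
</property>
```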
Region Balancer Tuning
• A background process in the HBase Master tracks the load on servers
• The load balancer moves regions occasionally
• Multiple implementations exist
• Simple counts the number of regions
• Stochastic determines a cost
• Favored Node pins HDFS block replicas
• Can be tuned further
• Cluster-wide setting!
RPC Tuning
• Default is one queue for all types of requests
• Can be split into separate queues for reads and writes (see the sketch below)
• The read queue can be further split into reads and scans
➜ Stricter resource limits, but may avoid cross-starvation
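The queue split is controlled by server-side properties; a sketch with illustrative values (these properties exist in HBase 1.0+):

```xml
<!-- Sketch: splitting the RPC call queues in hbase-site.xml -->
<property>
  <name>hbase.ipc.server.callqueue.handler.factor</name>
  <value>0.1</value> <!-- number of queues as a fraction of handlers -->
</property>
<property>
  <name>hbase.ipc.server.callqueue.read.ratio</name>
  <value>0.5</value> <!-- split queues between reads and writes -->
</property>
<property>
  <name>hbase.ipc.server.callqueue.scan.ratio</name>
  <value>0.5</value> <!-- split read queues between gets and scans -->
</property>
```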
Key Tuning
• Design keys to match the use-case
• Sequential, salted, or random
• Use sorting to convey meaning
• Colocate related data
• Spread load over all servers
• Clever key design can make use of distribution: aging-out regions
Compaction Tuning
• Default compaction settings are aggressive
• Set for the update use-case
• For insert use-cases, Blooms are effective
• Allows tuning down compactions (see the sketch below)
• Saves resources by reducing write amplification
• More store files also enable faster full table scans with time-range-bound scans
• The server can ignore older files
• Large regions may be eligible for advanced compaction strategies
• Stripe or date-tiered compactions
• Reduce rewrites to a fraction of the region size
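A sketch of per-table and per-family compaction-related settings with the HBase 1.x API; the ratio value is illustrative:

```java
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.regionserver.BloomType;

public class CompactionTuningExample {
  public static void main(String[] args) {
    HTableDescriptor desc = new HTableDescriptor(TableName.valueOf("events"));
    HColumnDescriptor cf = new HColumnDescriptor("d");
    cf.setBloomFilterType(BloomType.ROW);  // row-key Blooms, the 1.x default
    desc.addFamily(cf);
    // Per-table override: a lower ratio means fewer rewrites (less write amplification)
    desc.setConfiguration("hbase.hstore.compaction.ratio", "1.0");
    System.out.println(desc);
  }
}
```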
Use-Cases
What works well, what does not, and what is so-so
Placing the Use-Case
• HBase works best for random access
• You can optimize a table to prefer scans over gets
• Fewer columns with larger payloads
• Larger HFile block sizes (maybe even duplicate data in two differently configured column families)
• After that is the realm of hybrid systems
• For the fastest scans, use brute-force HDFS and a native query engine with a columnar format
Big Data Workloads
[Chart: latency (low latency vs. batch) plotted against access pattern (random access, short scan, full scan). Low latency: HBase for random access and short scans, HDFS + SQL for full scans. Batch: HBase + MR/Spark for random access and short scans, HDFS + MR/Spark (Hive/Pig) and HBase + Snapshots -> HDFS + MR/Spark for full scans.]
Big Data Workloads (cont.)
[The same chart, annotated with example workloads: current metrics, graph data, simple entities, messages, hybrid entity time series (rollup serving and rollup generation), analytic archive, index building, and entity time series.]
Summary
Wrapping it up…
Optimizations
Mostly Inserts Use-Cases
• Tune down compactions
• Compaction ratio, max store file size
• Use Bloom Filters
• On by default for row keys
Mostly Update Use-Cases
• Batch updates if possible
Mostly Serial Keys
• Use bulk loading or salting
Mostly Random Keys
• Hash the key with an MD5 prefix
Mostly Random Reads
• Decrease the HFile block size
• Use random keys
Mostly Scans
• Increase the HFile (and HDFS) block size
• Reduce columns and increase cell sizes
What matters…
• For optimal performance, two things need to be considered:
• Optimize the cluster and table settings
• Choose the matching key schema
• Ensure load is spread over tables and cluster nodes
• HBase works best for random access and bound scans
• HBase can be optimized for larger scans, but its sweet spot is short burst scans (which can be parallelized too) and random point gets
• Java heap space limits the addressable space
• Play with region sizes, compaction strategies, and key design to maximize results
• Using HBase for a suitable use-case will make for a happy customer…
• Conversely, forcing it into non-suitable use-cases may cause trouble
Questions?
Thank You!
@larsgeorge

