From: DataWorks Summit 2017 - Munich - 20170406
HBase has established itself as the backend for many operational and interactive use-cases, powering well-known services that support millions of users and thousands of concurrent requests. In terms of features, HBase has come a long way, offering advanced options such as multi-level caching on- and off-heap, pluggable request handling, fast recovery options such as region replicas, table snapshots for data governance, tunable write-ahead logging, and so on. This talk is based on the research for the upcoming second edition of the speaker's HBase book, combined with practical experience from medium to large HBase projects around the world. You will learn how to plan for HBase, starting with the selection of matching use-cases, to determining the number of servers needed, leading into performance tuning options. There is no reason to be afraid of using HBase, but knowing its basic premises and technical choices will make using it much more successful. You will also learn about many of the new features of HBase up to version 1.3, and where they are applicable.
1. HBase in Practice
Lars George – Partner and Co-Founder @ OpenCore
DataWorks Summit 2017 - Munich
NoSQL is no SQL is SQL?
2. About Me
• Partner & Co-Founder at OpenCore
• Before that
  • Lars: EMEA Chief Architect at Cloudera (5+ years)
  • Hadoop since 2007
  • Apache Committer & Apache Member
  • HBase (also in PMC)
• Lars: O'Reilly Author: HBase – The Definitive Guide
• Contact
  • lars.george@opencore.com
  • @larsgeorge
Website: www.opencore.com
5. HBase Tables
• From a user perspective, HBase is similar to a database or spreadsheet
• There are rows and columns, storing values
• By default, asking for a specific row/column combination returns the current value (that is, the last value stored there)
6. HBase Tables
• HBase can have a different schema per row
• Could be called schema-less
• Primary access by the user-given row key and column name
• Sorting of rows and columns by their key (aka names)
7. HBase Tables
• Each row/column coordinate is tagged with a version number, allowing multi-versioned values
• The version is usually the current time (as epoch)
• The API lets the user ask for versions (specific, by count, or by ranges)
• Up to 2B versions
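The versioning model above can be sketched in a few lines — this is a conceptual in-memory illustration, not the HBase API; the class and method names are invented for the example:

```python
import time

class VersionedCell:
    """Toy model of one HBase cell coordinate holding multiple versions."""

    def __init__(self, max_versions=3):
        self.max_versions = max_versions
        self.versions = []  # list of (timestamp, value), newest first

    def put(self, value, ts=None):
        # Version defaults to the current time as epoch milliseconds.
        ts = ts if ts is not None else int(time.time() * 1000)
        self.versions.append((ts, value))
        self.versions.sort(key=lambda tv: tv[0], reverse=True)
        del self.versions[self.max_versions:]  # keep only the newest N

    def get(self):
        # Default read returns the current (newest) value.
        return self.versions[0][1]

    def get_versions(self, count):
        # The API also allows asking for older versions, e.g. by count.
        return self.versions[:count]

cell = VersionedCell()
cell.put("v1", ts=100)
cell.put("v2", ts=200)
print(cell.get())            # -> v2 (newest version wins)
print(cell.get_versions(2))  # -> [(200, 'v2'), (100, 'v1')]
```

The `max_versions` trim mirrors the per-family version limit: older versions beyond the configured count become eligible for removal.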
8. HBase Tables
• Table data is cut into pieces to distribute over the cluster
• Regions split a table into shards at size boundaries
• Families split within regions to group sets of columns together
• At least one of each is needed
9. Scalability ā Regions as Shards
• A region is served by exactly one region server
• Every region server serves many regions
• Table data is spread over servers
  • Distribution of I/O
• Assignment is based on configurable logic
  • Balancing cluster load
• Clients talk directly to region servers
10. Column Family-Oriented
• Group multiple columns into physically separated locations
• Apply different properties to each family
  • TTL, compression, versions, …
• Useful to separate distinct data sets that are related
• Also useful to separate larger blobs from metadata
11. Data Management
• What is available is tracked in three locations
  • System catalog table hbase:meta
  • Files in HDFS directories
  • Open region instances on the servers
• The system aligns these locations
• Sometimes (very rarely) a repair may be needed using HBase Fsck
• Redundant information is useful to repair corrupt tables
12. HBase really isā¦.
• A distributed Hash Map
• Imagine a complex, concatenated key including the user-given row key and column name, plus the timestamp (version)
• The complex key points to the actual value, that is, the cell
13. Fold, Store, and Shift
• Logical rows in tables are really stored as flat key-value pairs
• Each carries its full coordinates
• Pertinent information can be freely placed in the cell to improve lookup
• HBase is a column-family-grouped key-value store
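The "folding" of a logical row into flat, fully-coordinated key-value pairs can be sketched as follows — a conceptual Python illustration (row, family, and qualifier names are made up for the example):

```python
# Each cell is stored as one flat entry keyed by its full coordinates:
# (row key, column family, qualifier, timestamp) -> value
store = {}

def put_cell(row, family, qualifier, ts, value):
    store[(row, family, qualifier, ts)] = value

# One logical row becomes three independent flat cells:
put_cell("user42", "info", "name", 100, "Alice")
put_cell("user42", "info", "email", 100, "alice@example.com")
put_cell("user42", "blob", "avatar", 100, "<binary avatar data>")

# Keys sort by row, then family, then qualifier, so cells of the same
# family group together — the "column-family-grouped" part of the model.
for key in sorted(store):
    print(key, "->", store[key])
```

Because each flat pair carries its full coordinates, a lookup never needs the rest of the logical row — only the matching entries are read.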
14. HFile Format Information
• All data is stored in a custom (open-source) format, called HFile
• Data is stored in blocks (64KB default)
  • Trade-off between lookups and I/O throughput
  • Compression and encoding are applied after the limit check
• Index, filter, and meta data are stored in separate blocks
• A fixed trailer allows traversal of the file structure
• Newer versions introduce multi-layered index and filter structures
  • Only the master index is loaded up front; partial index blocks are loaded on demand
• Reading data requires deserialization of a block into cells
  • A kind of Amdahl's Law applies
15. HBase Architecture
• One Master and many Worker servers
• Clients mostly communicate with the workers
• Workers store the actual data
  • Memstore for accruing writes
  • HFile for persistence
  • WAL for fail-safety
• Data is served as regions
• HDFS is the backing store
  • But it could be another
17. HBase Architecture (cont.)
• Based on Log-Structured Merge-Trees (LSM-Trees)
• Inserts are done in the write-ahead log first
• Data is stored in memory and flushed to disk at regular intervals or based on size
• Small flushes are merged in the background to keep the number of files small
• Reads check the memory stores first, then the disk-based files second
• Deletes are handled with "tombstone" markers
• Atomicity is on the row level, no matter how many columns
  • Keeps the locking model easy
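The LSM write path above — WAL first, then memstore, then a size-triggered flush to an immutable sorted file — can be sketched as follows. This is a minimal conceptual model, not HBase code; the flush threshold is an arbitrary example value:

```python
wal = []          # write-ahead log: appended to first, replayed on recovery
memstore = {}     # in-memory store; sorted when flushed
store_files = []  # immutable flushed files, newest last

FLUSH_SIZE = 3    # toy threshold; real flushes trigger on bytes, not count

def put(key, value):
    wal.append((key, value))  # 1. durability first
    memstore[key] = value     # 2. then the in-memory store
    if len(memstore) >= FLUSH_SIZE:
        flush()

def flush():
    # Write a new immutable, sorted file and clear the memstore.
    # Background merges (compactions) would later combine small files.
    store_files.append(dict(sorted(memstore.items())))
    memstore.clear()

for i in range(5):
    put(f"row{i}", f"val{i}")

print(len(store_files), len(memstore))  # -> 1 2 (one flush, two keys pending)
```

Note how the WAL retains everything: after a crash, replaying it rebuilds the memstore contents that had not yet been flushed.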
18. Merge Reads
• Read the Memstore & StoreFiles using separate scanners
• Merge matching cells into a single row "view"
• Deletes mask existing data
• Bloom filters help skip StoreFiles
• Reads may have to span many files
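The merge-read behavior — newest source wins, tombstones mask older data — can be sketched conceptually (again not HBase code; the sentinel and function names are invented):

```python
# Sentinel standing in for a "tombstone" delete marker.
TOMBSTONE = object()

def merge_read(key, memstore, store_files):
    """Check sources newest-first: memstore, then files newest to oldest."""
    for source in [memstore] + store_files[::-1]:
        if key in source:
            value = source[key]
            # A tombstone masks any older versions of the same cell.
            return None if value is TOMBSTONE else value
    return None  # never written

older = {"a": "1", "b": "2"}   # oldest store file
newer = {"a": "3"}             # newer store file: "a" was rewritten
memstore = {"b": TOMBSTONE}    # "b" deleted, marker not yet compacted away

print(merge_read("a", memstore, [older, newer]))  # -> 3 (newest wins)
print(merge_read("b", memstore, [older, newer]))  # -> None (masked)
```

This also shows why reads can get slower as file counts grow: in the worst case every store file must be consulted, which is exactly where Bloom filters help by ruling out files that cannot contain the key.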
20. HBase Clients
• Native Java Client/API
• Non-Java Clients
  • REST server
  • Thrift server
  • Jython, Groovy DSL
  • Spark
• TableInputFormat/TableOutputFormat for MapReduce
  • HBase as MapReduce source and/or target
  • Also available for table snapshots
• HBase Shell
  • JRuby shell adding get, put, scan etc. and admin calls
• Phoenix, Impala, Hive, …
21. Java API
From Wikipedia:
• CRUD: "In computer programming, create, read, update, and delete are the four basic functions of persistent storage."
• Other variations of CRUD include
  • BREAD (Browse, Read, Edit, Add, Delete)
  • MADS (Modify, Add, Delete, Show)
  • DAVE (Delete, Add, View, Edit)
  • CRAP (Create, Retrieve, Alter, Purge)
Wait, what?
22. Java API (cont.)
• CRUD
  • put: Create and update a row (CU)
  • get: Retrieve an entire or partial row (R)
  • delete: Delete a cell, column, columns, or row (D)
• CRUD+SI
  • scan: Scan any number of rows (S)
  • increment: Increment a column value (I)
• CRUD+SI+CAS
  • Atomic compare-and-swap (CAS)
  • Combined get, check, and put operation
  • Helps to overcome the lack of full transactions
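The semantics of these operations — in particular the compare-and-swap as a combined get, check, and put — can be illustrated with an in-memory sketch. This is deliberately Python, not the HBase Java client; the function names only mirror the operations listed above:

```python
table = {}  # row key -> {column: value}

def put(row, column, value):
    table.setdefault(row, {})[column] = value

def get(row, column=None):
    cells = table.get(row, {})
    return cells if column is None else cells.get(column)

def delete(row, column=None):
    if column is None:
        table.pop(row, None)           # delete the whole row
    else:
        table.get(row, {}).pop(column, None)  # delete one cell

def check_and_put(row, column, expected, new_value):
    """Atomic compare-and-swap: write only if the current value matches."""
    if get(row, column) == expected:
        put(row, column, new_value)
        return True
    return False

put("r1", "cf:qual", "a")
print(check_and_put("r1", "cf:qual", "a", "b"))  # -> True, value swapped
print(check_and_put("r1", "cf:qual", "a", "c"))  # -> False, unchanged
print(get("r1", "cf:qual"))                      # -> b
```

In HBase the check and the put happen atomically on the server under the row lock, which is what lets CAS substitute for small transactions on a single row.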
23. Java API (cont.)
• Batch Operations
  • Support Get, Put, and Delete
  • Reduce network round-trips
  • If possible, batch operations to the server to gain better overall throughput
• Filters
  • Can be used with Get and Scan operations
  • Server-side hinting
  • Reduce data transferred to the client
  • Filters are no guarantee for fast scans
    • Still a full table scan in the worst-case scenario
    • Might have to implement your own
  • Filters can hint at the next row key
25. Key Cardinality
• The best performance is gained from using row keys
• Time-range-bound reads can skip store files
  • So can Bloom filters
• Selecting column families reduces the amount of data to be scanned
• Pure value-based access is a full table scan
  • Filters often are too, but reduce network traffic
26. Key/Table Design
• Crucial to gain the best performance
  • Why do I need to know? Well, you also need to know that an RDBMS only works well when columns are indexed and the query plan is OK
• Absence of secondary indexes forces use of the row key or column name sorting
  • Transfer multiple indexes into one
  • Generates large tables -> good, since that fits the architecture and spreads across the cluster
• DDI
  • Stands for Denormalization, Duplication, and Intelligent Keys
  • Needed to overcome the trade-offs of the architecture
  • Denormalization -> replacement for JOINs
  • Duplication -> design for reads
  • Intelligent Keys -> implement indexing and sorting, optimize reads
27. Pre-materialize Everything
• Achieve one read per customer request if possible
  • Otherwise keep it at the lowest number
• Reads take between 10ms (cache miss) and 1ms (cache hit)
• Use MapReduce or Spark to compute exacts in batch
  • Store and merge updates live
  • Use increment() methods
Motto: "Design for Reads"
28. Tall-Narrow vs. Flat-Wide Tables
• Rows do not split
  • Might end up with one row per region
• Same storage footprint
• Put more details into the row key
  • Sometimes a dummy column only
  • Make use of partial key scans
• Tall with Scans, Wide with Gets
  • Atomicity only on the row level
• Examples
  • Large graphs, stored as an adjacency matrix (narrow)
  • Message inbox (wide)
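Why tall-narrow schemas pair well with scans: with details promoted into the row key, a partial (prefix) key scan over the sorted keys returns one logical entity per row without touching unrelated data. A conceptual sketch (the key layout `user|type|date` is an invented example):

```python
import bisect

# Tall-narrow layout: one message per row, details in the row key.
rows = sorted([
    "user1|msg|2017-01-01",
    "user1|msg|2017-02-01",
    "user2|msg|2017-01-15",
])

def prefix_scan(rows, prefix):
    # Sorted keys let us jump to the first possible match and stop at
    # the first non-match — no full table scan needed.
    start = bisect.bisect_left(rows, prefix)
    result = []
    for key in rows[start:]:
        if not key.startswith(prefix):
            break
        result.append(key)
    return result

print(prefix_scan(rows, "user1|"))  # -> only the two user1 rows
```

The flat-wide alternative would store all of user1's messages as columns of a single row, trading this scan pattern for a single atomic `get` per user.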
29. Sequential Keys
<timestamp><more key>: {CF: {CQ: {TS : Val}}}
• Hotspotting on regions is bad!
• Instead do one of the following:
  • Salting
    • Prefix <timestamp> with a distributed value
    • Binning or bucketing rows across regions
  • Key field swap/promotion
    • Move <more key> before the timestamp (see OpenTSDB)
  • Randomization
    • Move <timestamp> out of the key, or prefix it with an MD5 hash
• Might also be mitigated by the overall spread of workloads
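A minimal salting sketch: prefixing a monotonically increasing key with a deterministic, hash-derived bucket spreads sequential writes across key ranges (and thus regions). The bucket count of 8 is an arbitrary example value:

```python
import hashlib

BUCKETS = 8  # example value; choose based on cluster/region count

def salted_key(timestamp_key: str) -> str:
    # Deterministic salt: same input key always lands in the same bucket,
    # so reads can recompute the prefix instead of guessing.
    digest = hashlib.md5(timestamp_key.encode()).hexdigest()
    bucket = int(digest, 16) % BUCKETS
    return f"{bucket}-{timestamp_key}"

# Sequential timestamps now sort into different key ranges:
keys = [salted_key(str(ts)) for ts in range(1491465600, 1491465610)]
print(sorted(set(k.split("-")[0] for k in keys)))  # buckets actually used
```

The trade-off: a time-range scan must now fan out over all buckets and merge the results, which is why salting suits write-heavy, point-read workloads better than pure scan workloads.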
30. Key Design Choices
• Based on the access pattern, use either sequential or random keys
• Often a combination of both is needed
  • Overcome architectural limitations
• Neither is necessarily bad
  • Use bulk import for sequential keys and reads
  • Random keys are good for random access patterns
31. Checklist
• Design for the use-case
  • Read, write, or both?
• Avoid hotspotting
  • Hash the leading key part, or use salting/bucketing
• Use bulk loading where possible
• Monitor your servers!
• Presplit tables
• Try prefix encoding when values are small
  • Otherwise use compression (or both)
• For reads: restrict yourself
  • Specify what you need, i.e. columns, families, time range
• Shift details to the appropriate position
  • Composite keys
  • Column qualifiers
34. Cluster Tuning
• First, tune the global settings
  • Heap size and GC algorithm
  • Memory share for reads and writes
  • Enable the Block Cache
  • Number of RPC handlers
  • Load Balancer
  • Default flush and compaction strategy
  • Thread pools (10+)
• Next, tune the per-table and per-family settings
  • Region sizes
  • Block sizes
  • Compression and encoding
  • Compactions
  • …
35. Region Balancer Tuning
• A background process in the HBase Master tracks the load on the servers
• The load balancer moves regions occasionally
• Multiple implementations exist
  • Simple counts the number of regions
  • Stochastic determines a cost
  • Favored Node pins HDFS block replicas
• Can be tuned further
• Cluster-wide setting!
36. RPC Tuning
• Default is one queue for all types of requests
• Can be split into separate queues for reads and writes
• The read queue can be further split into reads and scans
-> Stricter resource limits, but may avoid cross-starvation
37. Key Tuning
• Design keys to match the use-case
  • Sequential, salted, or random
• Use sorting to convey meaning
  • Colocate related data
  • Spread load over all servers
• Clever key design can make use of the distribution: aging-out regions
38. Compaction Tuning
• Default compaction settings are aggressive
  • Set for the update use-case
• For insert use-cases, Blooms are effective
  • Allows tuning down compactions
  • Saves resources by reducing write amplification
• More store files also enable faster full table scans with time-range-bound scans
  • The server can ignore older files
• Large regions may be eligible for advanced compaction strategies
  • Stripe or date-tiered compactions
  • Reduce rewrites to a fraction of the region size
40. Placing the Use-Case
• HBase is designed to work best for random access
• You can optimize a table to prefer scans over gets
  • Fewer columns with larger payloads
  • Larger HFile block sizes (maybe even duplicate data in two differently configured column families)
• After that is the realm of hybrid systems
  • For the fastest scans, use brute-force HDFS and a native query engine with a columnar format
42. Big Data Workloads
[Quadrant chart: latency (low latency vs. batch) plotted against access pattern (random access, short scan, full scan). Systems placed on the chart: HBase; HDFS + SQL; HDFS + MR/Spark (Hive/Pig); HBase + MR/Spark; HBase + Snapshots -> HDFS + MR/Spark. Workloads placed on the chart: current metrics, graph data, simple entities, messages, hybrid entity time series + rollup serving, analytic archive, index building, entity time series, hybrid entity time series + rollup generation.]
44. Optimizations
Mostly Insert Use-Cases
• Tune down compactions
  • Compaction ratio, max store file size
• Use Bloom filters
  • On by default for row keys
Mostly Update Use-Cases
• Batch updates if possible
Mostly Serial Keys
• Use bulk loading or salting
Mostly Random Keys
• Hash the key with an MD5 prefix
Mostly Random Reads
• Decrease the HFile block size
• Use random keys
Mostly Scans
• Increase the HFile (and HDFS) block size
• Reduce columns and increase cell sizes
45. What mattersā¦
• For optimal performance, two things need to be considered:
  • Optimize the cluster and table settings
  • Choose a matching key schema
  • Ensure the load is spread over tables and cluster nodes
• HBase works best for random access and bound scans
• HBase can be optimized for larger scans, but its sweet spot is short burst scans (which can be parallelized too) and random point gets
  • Java heap space limits the addressable space
  • Play with region sizes, compaction strategies, and key design to maximize the result
• Using HBase for a suitable use-case will make for a happy customer…
  • Conversely, forcing it into non-suitable use-cases may cause trouble