Apache HBase 0.98
Committer, Apache HBase, Apache Software Foundation
Big Data US Research and Development, Intel
Who Am I?
Committer and PMC Member, Apache HBase project
Member of the Big Data Research and Development Group at Intel
Release manager for Apache HBase 0.98
Apache Software Foundation community project
Open source
Free license
Chubby (Google) → ZooKeeper (Apache, Yahoo, FB)
Diagram: HBase in the Hadoop ecosystem. Components shown: HDFS 2.0 (Hadoop Distributed File System), YARN (MRv2), Spark, ZooKeeper (coordination), HBase (distributed database), HBase Coprocessors, Flume (log data collector), Sqoop, Hive and Shark (structured query), Pig (data manipulation), Oozie (data flow), Mahout (data mining), Giraph (graph analysis framework), R (statistics), plus a "data execution engine" layer.
Diagram: a table (Table B) splits into regions, and the regions are assigned to RegionServers.
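Each region covers a contiguous, non-overlapping range of row keys. Because of this, the region that holds a given row is the one with the greatest start key that is less than or equal to that row. The sketch below models only that lookup rule. It is a minimal sketch under stated assumptions:

- Row keys are simplified to ASCII `String`s, whereas real HBase uses unsigned byte arrays.
- Regions are held in a local `TreeMap`, whereas the real client consults the `hbase:meta` table.
- The class and method names are hypothetical.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical model: each region is identified by its start row key and is
// assigned to one RegionServer. Regions cover contiguous key ranges, so the
// region holding a row is the one with the greatest start key <= that row.
class RegionLookupSketch {
    // start key -> RegionServer name; "" is the start key of the first region
    private final TreeMap<String, String> regionsByStartKey = new TreeMap<>();

    void assign(String startKey, String regionServer) {
        regionsByStartKey.put(startKey, regionServer);
    }

    String serverFor(String rowKey) {
        Map.Entry<String, String> e = regionsByStartKey.floorEntry(rowKey);
        if (e == null) {
            throw new IllegalStateException("no region covers row " + rowKey);
        }
        return e.getValue();
    }

    public static void main(String[] args) {
        RegionLookupSketch table = new RegionLookupSketch();
        table.assign("", "rs1");  // rows in ["", "g")
        table.assign("g", "rs2"); // rows in ["g", "p")
        table.assign("p", "rs3"); // rows from "p" to the end of the table
        System.out.println(table.serverFor("apple"));    // rs1
        System.out.println(table.serverFor("grape"));    // rs2
        System.out.println(table.serverFor("zucchini")); // rs3
    }
}
```

When a region splits, this model only needs one new map entry at the split key. The new entry points at whichever RegionServer the daughter region is assigned to.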
System integrators can deploy application code that runs where the data resides
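Running code where the data resides means each region can compute a small partial result locally. The client then only combines the partial results. The sketch below illustrates that pattern with a sum, the way an endpoint-style coprocessor aggregation works in spirit. It assumes the following:

- Regions are modeled as in-memory lists of `long` values.
- `Region`, `partialSum`, and `sumAcrossRegions` are hypothetical names, not the HBase Coprocessor API.

```java
import java.util.List;

// Hypothetical model of an endpoint-style coprocessor: instead of shipping
// every row to the client, each region computes a partial result next to its
// data, and the client only combines the small partial results.
class EndpointAggregationSketch {
    // A region holds the values of its rows (simplified to longs).
    record Region(String name, List<Long> values) {
        // Runs "where the data resides": only this sum leaves the region.
        long partialSum() {
            long sum = 0;
            for (long v : values) sum += v;
            return sum;
        }
    }

    // Client side: combine one partial result per region.
    static long sumAcrossRegions(List<Region> regions) {
        long total = 0;
        for (Region r : regions) total += r.partialSum();
        return total;
    }

    public static void main(String[] args) {
        List<Region> regions = List.of(
            new Region("r1", List.of(1L, 2L, 3L)),
            new Region("r2", List.of(10L, 20L)),
            new Region("r3", List.of(100L)));
        System.out.println(sumAcrossRegions(regions)); // 136
    }
}
```

Only one number per region crosses the network, rather than every row. That saving is the motivation for pushing computation to the RegionServers.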
HBase
Column oriented
Multi-row transactions within a single region only
No native query language
AuthN and AuthZ (ACLs; visibility labels new in 0.98)
Only a single index (on the row key)
Petabytes of data
Millions of operations per second
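Rows are kept sorted by row key, which is the only index. Lookups by a row key or a row-key prefix are therefore cheap range scans. Querying on any other attribute means scanning everything. The sketch below is a minimal illustration under these assumptions:

- A `TreeMap` stands in for a sorted table.
- The composite `user#month` key layout is a hypothetical schema choice.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical model: rows sorted by row key, the only index. All keys that
// share a prefix are contiguous in sorted order, so a prefix scan starts at
// the prefix and stops at the first key that no longer matches.
class RowKeyScanSketch {
    private final TreeMap<String, String> rows = new TreeMap<>();

    void put(String rowKey, String value) {
        rows.put(rowKey, value);
    }

    List<String> prefixScan(String prefix) {
        List<String> result = new ArrayList<>();
        // tailMap starts at the first key >= prefix (no full-table scan)
        for (Map.Entry<String, String> e : rows.tailMap(prefix, true).entrySet()) {
            if (!e.getKey().startsWith(prefix)) break; // left the contiguous block
            result.add(e.getKey());
        }
        return result;
    }

    public static void main(String[] args) {
        RowKeyScanSketch table = new RowKeyScanSketch();
        table.put("u1#2014-01", "a");
        table.put("u1#2014-02", "b");
        table.put("u2#2014-01", "c");
        table.put("u0#2014-03", "d");
        System.out.println(table.prefixScan("u1#")); // [u1#2014-01, u1#2014-02]
    }
}
```

With this layout, "all rows for user u1" is a cheap prefix scan. "All rows for month 2014-01" would have to visit every row. This is why row-key design matters so much when the row key is the only index.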
Performance improvements
Improved WAL write threading model (HBASE-8755)
Stripe compactions (HBASE-7667)
REST streaming scans (HBASE-9343)
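As we understand it, HBASE-8755 moved WAL writing and syncing from the RPC handler threads to dedicated threads. The general idea behind such designs is group commit: many concurrent appends are acknowledged by a single sync. The sketch below shows group commit in general, not HBase's actual WAL classes. It assumes the following:

- The "log" is an in-memory list.
- Incrementing a counter stands in for an HDFS sync.
- All names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical group-commit model (not HBase's WAL code): handler threads
// enqueue edits and wait; one writer thread drains everything pending,
// appends the batch, and issues ONE sync before acknowledging every edit.
class GroupCommitWalSketch {
    private record Pending(String edit, CompletableFuture<Void> done) {}

    private final LinkedBlockingQueue<Pending> queue = new LinkedBlockingQueue<>();
    private final List<String> log = new ArrayList<>();
    private volatile int syncs = 0; // only the writer thread increments this

    GroupCommitWalSketch() {
        Thread writer = new Thread(this::writerLoop, "wal-writer");
        writer.setDaemon(true); // lets the JVM exit without stopping it explicitly
        writer.start();
    }

    // Called by handler threads; returns only after the edit has been "synced".
    void append(String edit) throws Exception {
        CompletableFuture<Void> done = new CompletableFuture<>();
        queue.put(new Pending(edit, done));
        done.get();
    }

    private void writerLoop() {
        List<Pending> batch = new ArrayList<>();
        try {
            while (true) {
                batch.add(queue.take()); // wait for at least one edit
                queue.drainTo(batch);    // then take everything else already queued
                synchronized (log) {
                    for (Pending p : batch) log.add(p.edit());
                }
                syncs++; // stand-in for one filesystem sync covering the batch
                for (Pending p : batch) p.done().complete(null);
                batch.clear();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    int logSize() { synchronized (log) { return log.size(); } }

    int syncCount() { return syncs; }

    public static void main(String[] args) throws Exception {
        GroupCommitWalSketch wal = new GroupCommitWalSketch();
        List<Thread> handlers = new ArrayList<>();
        for (int t = 0; t < 8; t++) {
            final int id = t;
            Thread h = new Thread(() -> {
                for (int i = 0; i < 100; i++) {
                    try {
                        wal.append("h" + id + "-" + i);
                    } catch (Exception e) {
                        throw new RuntimeException(e);
                    }
                }
            });
            handlers.add(h);
            h.start();
        }
        for (Thread h : handlers) h.join();
        // 800 edits; the sync count is at most 800 and typically lower
        System.out.println("edits=" + wal.logSize() + " syncs=" + wal.syncCount());
    }
}
```

The sync count depends on thread timing. Under contention, batches grow, and the number of expensive syncs per edit drops.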
Cell Tags
All values written to HBase are stored in cells
Cells can now also carry one or more tags
Tags are metadata, considered distinct from the key and the value
We use tags to implement per-cell ACLs and visibility labels
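A tag rides along with a cell but is neither part of its key nor part of its value. This makes tags a natural place to attach security metadata such as a visibility label. The sketch below is a simplified model under these assumptions:

- The tag type code is hypothetical.
- A cell is visible only if the reader holds every label tagged on it. Real HBase visibility expressions also support `|`, `!`, and parentheses.
- The names are illustrative, not the HBase `Cell`/`Tag` API.

```java
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.Set;

// Hypothetical model of a cell carrying tags alongside (not inside) its key
// and value. The tag type code is made up for illustration.
class CellTagsSketch {
    static final byte VISIBILITY_TAG = 1; // hypothetical type code

    record Tag(byte type, byte[] value) {}

    record Cell(String row, String family, String qualifier, long timestamp,
                byte[] value, List<Tag> tags) {}

    // Simplification: the reader must hold every visibility label on the cell.
    static boolean visibleTo(Cell cell, Set<String> authorizations) {
        for (Tag tag : cell.tags()) {
            if (tag.type() == VISIBILITY_TAG) {
                String label = new String(tag.value(), StandardCharsets.UTF_8);
                if (!authorizations.contains(label)) return false;
            }
        }
        return true; // untagged cells are visible to everyone
    }

    public static void main(String[] args) {
        Cell cell = new Cell("row1", "cf", "q", 1L,
            "v".getBytes(StandardCharsets.UTF_8),
            List.of(new Tag(VISIBILITY_TAG, "secret".getBytes(StandardCharsets.UTF_8))));
        System.out.println(visibleTo(cell, Set.of("secret"))); // true
        System.out.println(visibleTo(cell, Set.of("public"))); // false
    }
}
```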
HFile Version 3
New file format, supporting cell tags and block encryption
Enabled with a site configuration file change
hfile.format.version = 3
HFile v2 data is transparently migrated over time as new files are written by flushes and compactions
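HFile v3 makes block encryption possible. The core idea is to encrypt each block with a symmetric data key and a fresh IV, and to store the IV alongside the ciphertext. The sketch below uses AES in CTR mode through the standard JCE. This is an assumption chosen for illustration, not a statement of HBase's exact cipher configuration. Key management, such as protecting the data key with a master key, is omitted. The class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.security.SecureRandom;
import java.util.Arrays;
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.IvParameterSpec;

// Hypothetical per-block encryption: AES/CTR under a data key, with a fresh
// random IV per block stored next to the ciphertext. Generic JCE code only.
class BlockEncryptionSketch {
    private static final SecureRandom RANDOM = new SecureRandom();

    record EncryptedBlock(byte[] iv, byte[] ciphertext) {}

    static EncryptedBlock encrypt(SecretKey key, byte[] block) throws Exception {
        byte[] iv = new byte[16]; // AES block size; never reuse an IV with the same key
        RANDOM.nextBytes(iv);
        Cipher cipher = Cipher.getInstance("AES/CTR/NoPadding");
        cipher.init(Cipher.ENCRYPT_MODE, key, new IvParameterSpec(iv));
        return new EncryptedBlock(iv, cipher.doFinal(block));
    }

    static byte[] decrypt(SecretKey key, EncryptedBlock b) throws Exception {
        Cipher cipher = Cipher.getInstance("AES/CTR/NoPadding");
        cipher.init(Cipher.DECRYPT_MODE, key, new IvParameterSpec(b.iv()));
        return cipher.doFinal(b.ciphertext());
    }

    public static void main(String[] args) throws Exception {
        KeyGenerator gen = KeyGenerator.getInstance("AES");
        gen.init(128);
        SecretKey dataKey = gen.generateKey();
        byte[] block = "row1/cf:q/1/value".getBytes(StandardCharsets.UTF_8);
        EncryptedBlock enc = encrypt(dataKey, block);
        System.out.println(Arrays.equals(block, decrypt(dataKey, enc))); // true
    }
}
```

CTR mode produces ciphertext of the same length as the plaintext. As a result, the encrypted block needs no padding bookkeeping.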
0.98.0 adds the Execute (X) permission
Versions prior to 0.98.0 ignore X
Now access to coprocessor Endpoint invocations can be controlled on a global, per-table, or per-column-family basis
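Scoped Execute grants nest: a global grant covers every table, and a table grant covers every column family in that table. The sketch below models that check. It is a simplification under these assumptions:

- Grants are held in an in-memory set.
- `null` stands for "any table" or "any family".
- Every name is hypothetical rather than HBase's `AccessController` API.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical model of scoped Execute (X) grants: global covers all tables,
// a table grant covers all of that table's column families.
class ExecPermissionSketch {
    // null table/family means "any" (a broader-scope grant)
    private record Grant(String user, String table, String family) {}

    private final Set<Grant> grants = new HashSet<>();

    void grantGlobal(String user) { grants.add(new Grant(user, null, null)); }

    void grantTable(String user, String table) { grants.add(new Grant(user, table, null)); }

    void grantFamily(String user, String table, String family) {
        grants.add(new Grant(user, table, family));
    }

    // Checked before an Endpoint invocation on (table, family).
    boolean canExec(String user, String table, String family) {
        return grants.contains(new Grant(user, null, null))     // global
            || grants.contains(new Grant(user, table, null))    // per-table
            || grants.contains(new Grant(user, table, family)); // per-column-family
    }

    public static void main(String[] args) {
        ExecPermissionSketch acl = new ExecPermissionSketch();
        acl.grantTable("alice", "t1");
        acl.grantFamily("bob", "t1", "cf2");
        System.out.println(acl.canExec("alice", "t1", "cf")); // true
        System.out.println(acl.canExec("bob", "t1", "cf"));   // false
    }
}
```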
Stripe compactions
Can reduce read latency variability and reduce compaction data volume (write amplification)
Some use cases can benefit, but the feature is complex to configure and tune; consult the documentation for details
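Stripe compactions divide a store into row-key ranges (stripes) so that a compaction rewrites only the files of one stripe instead of the whole store. This is where the reduction in write amplification comes from. The sketch below only does that arithmetic, using a simplified model:

- The file sizes are made-up illustrative numbers, not measurements.
- The names are hypothetical.

```java
import java.util.List;

// Hypothetical model: store files grouped into stripes by row-key range.
// A whole-store compaction rewrites every file; a stripe compaction rewrites
// only the files belonging to one stripe.
class StripeCompactionSketch {
    record StoreFile(int stripe, long sizeBytes) {}

    static long wholeStoreRewrite(List<StoreFile> files) {
        return files.stream().mapToLong(StoreFile::sizeBytes).sum();
    }

    static long stripeRewrite(List<StoreFile> files, int stripe) {
        return files.stream()
                    .filter(f -> f.stripe() == stripe)
                    .mapToLong(StoreFile::sizeBytes)
                    .sum();
    }

    public static void main(String[] args) {
        List<StoreFile> files = List.of(
            new StoreFile(0, 400), new StoreFile(0, 50),
            new StoreFile(1, 400), new StoreFile(1, 60),
            new StoreFile(2, 400), new StoreFile(2, 40));
        System.out.println(wholeStoreRewrite(files)); // 1350
        System.out.println(stripeRewrite(files, 1));  // 460
    }
}
```

The trade-off is extra bookkeeping, such as stripe boundaries and how many stripes to keep. That bookkeeping is part of why the feature takes care to configure.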
Binary API compatibility is not guaranteed; some applications may need minor changes
APIs (HBase 1.0)
Tag compression in HFiles
Performance improvements for encryption
End
Questions?