NoSQL & HBase overview
Big Data – 4 V’s
NoSQL 
• NoSQL is all about scalability 
• Scaling to size 
• Scaling to complexity 
• Delivers heavy read/write (R/W) workloads 
• Data duplication and denormalization are first-class citizens
RDBMS vs NoSQL
No SQL Types
Database Chart
CAP Theorem
Re-check 
• What is the CAP theorem? 
• Does NoSQL support transactions? 
• What are the NoSQL types?
HBase 
• Scalable, distributed data store 
• Sorted map of maps / key-value store 
• Open-source avatar of Google’s Bigtable 
• Sparse 
• Multidimensional 
• Tightly integrated with Hadoop 
• Not an RDBMS
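The "sorted map of maps" view can be sketched in plain Java. This is a simulation with `TreeMap`, not the HBase client API; the row and column names here are made up for illustration:

```java
import java.util.TreeMap;

public class SortedMapOfMaps {
    public static void main(String[] args) {
        // rowkey -> (column -> value); both levels stay sorted,
        // mirroring how HBase sorts rows by rowkey and columns within a row
        TreeMap<String, TreeMap<String, String>> table = new TreeMap<>();

        table.computeIfAbsent("row2", k -> new TreeMap<>()).put("cf:ip", "20.20.20.20");
        table.computeIfAbsent("row1", k -> new TreeMap<>()).put("cf:ip", "10.10.10.10");

        // Rows come back in rowkey order regardless of insertion order
        System.out.println(table.firstKey());               // row1
        System.out.println(table.get("row2").get("cf:ip")); // 20.20.20.20
    }
}
```

Sorted order is what makes range scans by rowkey cheap, which is why rowkey design matters so much in HBase.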
Architecture 
• HDFS (DataNodes): storage 
• ZooKeeper: membership management 
• RegionServers: serve the regions 
• HBase Masters: janitorial work
Column-oriented
Distributed
Variable number of columns
Important Terms 
• Table 
• Consists of rows and columns 
• Row 
• Contains a set of columns 
• Identified by a rowkey (primary key) 
• Column Qualifier 
• Dynamic column name 
• Column Family 
• Column groups - logical and physical (Similar access pattern) 
• Cell 
• The actual element that contains the data for a row-column insertion 
• Version 
• Every cell can hold multiple timestamped versions
Logical v/s Physical: Tall v/s Wide tables, storage footprint 
     CF1                CF2 
r1   c1:v1              c1:v9  c6:v2 
r2   c1:v2  c3:v6 
r3   c2:v3              c5:v6 
r4   c2:v4 
r5   c1:v1  c3:v5       c7:v8 
HFile for CF1 HFile for CF2 
r1:CF1:c1:t1:v1 
r2:CF1:c1:t2:v2 
r2:CF1:c3:t3:v6 
r3:CF1:c2:t1:v3 
r4:CF1:c2:t1:v4 
r5:CF1:c1:t2:v1 
r5:CF1:c3:t3:v5 
r1:CF2:c1:t1:v9 
r1:CF2:c6:t4:v2 
r3:CF2:c5:t4:v6 
r5:CF2:c7:t3:v8 
Result object returned for a Get() on row r5 
r5:CF1:c1:t2:v1 
r5:CF1:c3:t3:v5 
r5:CF2:c7:t3:v8 
KeyValue objects 
Row Key | Col Fam | Col Qual | Time Stamp | Value 
The Key portion of a KeyValue is Row Key + Col Fam + Col Qual + Time Stamp; the Value portion holds the Cell’s data. 
Logical representation of an HBase table. 
We'll look at what it means to Get() row r5 from this table. 
Actual physical storage of the table 
Structure of a KeyValue object
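The KeyValue layout above can be modeled in plain Java. This is a sketch of the fields only; the real class is `org.apache.hadoop.hbase.KeyValue`, which stores all of these as byte arrays:

```java
// Plain-Java sketch of the fields a KeyValue carries (not the real HBase class)
public class KeyValueSketch {
    final String rowKey, columnFamily, columnQualifier;
    final long timestamp;
    final String value;

    KeyValueSketch(String rowKey, String cf, String cq, long ts, String value) {
        this.rowKey = rowKey;
        this.columnFamily = cf;
        this.columnQualifier = cq;
        this.timestamp = ts;
        this.value = value;
    }

    // The "key" part is everything except the value
    String key() {
        return rowKey + ":" + columnFamily + ":" + columnQualifier + ":t" + timestamp;
    }

    public static void main(String[] args) {
        KeyValueSketch kv = new KeyValueSketch("r5", "CF1", "c1", 2, "v1");
        System.out.println(kv.key() + ":" + kv.value); // r5:CF1:c1:t2:v1
    }
}
```

Note how the printed form matches the physical-storage rows shown above (e.g. `r5:CF1:c1:t2:v1`): each cell is stored as one fully qualified key plus its value.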
(J)Ruby Shell Commands 
• General 
• DDL 
• Create 
• Describe 
• Namespace 
• DML 
• Put 
• Get 
• Scan 
• Delete 
• Tools 
• Replication 
• Snapshot 
• Security 
• Visibility 
Creating Table: 
create 'DEVICE_DETAIL','BASIC_INFO','CONTRACT_INFO' 
Data Generation: 
put 'DEVICE_DETAIL','Device1','BASIC_INFO:IP_ADDR','10.10.10.10' 
put 'DEVICE_DETAIL','Device2','BASIC_INFO:IP_ADDR','20.20.20.20' 
Describing Table: 
describe 'DEVICE_DETAIL' 
Altering Table: 
alter 'DEVICE_DETAIL',{NAME => 'CONTRACT_INFO',VERSIONS => 3 } 
Update Data: 
put 'DEVICE_DETAIL','Device2','CONTRACT_INFO:CONTRACT_NUMBER','22222222' 
Multi-Version Example: 
get 'DEVICE_DETAIL','Device2', {COLUMN=>'CONTRACT_INFO:CONTRACT_NUMBER', VERSIONS=>2} 
Scan Info: 
scan 'DEVICE_DETAIL' 
Scan with Filter: 
scan 'DEVICE_DETAIL', { COLUMNS => 'CONTRACT_INFO:STATUS', LIMIT => 10, FILTER => 
"ValueFilter( =, 'binary:IN_ACTIVE' )" } 
Delete Info: 
delete 'DEVICE_DETAIL','Device2','CONTRACT_INFO:STATUS'
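The multi-version behavior exercised by the `alter ... VERSIONS => 3` and `get ... VERSIONS => 2` commands above can be sketched in plain Java. This is a simulation of how a cell keeps its newest N timestamped versions, not the HBase client API; the contract numbers are made up:

```java
import java.util.Comparator;
import java.util.List;
import java.util.TreeMap;
import java.util.stream.Collectors;

public class CellVersions {
    public static void main(String[] args) {
        int maxVersions = 3; // as set by: alter ... VERSIONS => 3
        // timestamp -> value, newest first (versions are ordered by descending timestamp)
        TreeMap<Long, String> versions = new TreeMap<>(Comparator.reverseOrder());

        long ts = 0;
        for (String v : new String[]{"11111111", "22222222", "33333333", "44444444"}) {
            versions.put(++ts, v);
            while (versions.size() > maxVersions) {
                versions.remove(versions.lastKey()); // oldest version is dropped
            }
        }

        // A plain get returns only the newest version
        System.out.println(versions.firstEntry().getValue()); // 44444444

        // get ... VERSIONS => 2 returns the two newest versions
        List<String> latestTwo =
                versions.values().stream().limit(2).collect(Collectors.toList());
        System.out.println(latestTwo); // [44444444, 33333333]
    }
}
```

Once the configured version count is exceeded, the oldest version becomes eligible for removal, which is why the first put ("11111111") is no longer retrievable here.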
Java API 
• HTable 
• HBaseAdmin 
• HTablePool 
• Get 
• Put 
• Delete 
• Scan 
• Increment 
• HTableDescriptor 
• HTableInterface 
• Result 
• ResultScanner 
• KeyValue 
HTable table = new HTable(configuration, hbaseTableName); 

// write one cell into the row identified by rowKey 
Put row = new Put(Bytes.toBytes(rowKey)); 
row.add(Bytes.toBytes(columnFamily), Bytes.toBytes(qualifier), 
        Bytes.toBytes(value)); 
table.put(row); 

// read the row back 
Get get = new Get(Bytes.toBytes(rowKey)); 
Result result = table.get(get);
Spark HBase 
// create configuration 
val config = HBaseConfiguration.create() 
config.set("hbase.zookeeper.quorum", "localhost") 
config.set("hbase.zookeeper.property.clientPort", "2181") 
config.set(TableInputFormat.INPUT_TABLE, "hbaseTableName") // "hbase.mapreduce.inputtable" 
// read data (org.apache.hadoop.hbase.mapreduce.TableInputFormat) 
val hbaseData = sparkContext.newAPIHadoopRDD(config, classOf[TableInputFormat], 
  classOf[ImmutableBytesWritable], classOf[Result]) 
// count rows 
println(hbaseData.count)
HBase Architecture
Write & Read Logic
SQL
Re-check 
• What is a column family? 
• What are the HBase components? 
• Name a few shell commands 
• How do versions work in HBase?
Reference Slides
Use Case 
• Canonical use case: storing crawl data and indices for search 

Web Search powered by Bigtable 

Indexing the Internet 
1. Crawlers constantly scour the Internet for new pages. Those pages are stored as individual records in Bigtable. 
2. A MapReduce job runs over the entire table, generating search indexes for the Web Search application. 

Searching the Internet 
3. The user initiates a Web Search request. 
4. The Web Search application queries the Search Indexes and retrieves matching documents directly from Bigtable. 
5. Search results are presented to the user. 

(Diagram: Crawlers → Bigtable → MapReduce → Search Indexes → Web Search → user)
HBase Architecture
Replications
CAP Theorem


Editor's Notes

  1. Most NoSQL stores lack true ACID transactions, although a few recent systems, such as FairCom c-treeACE, Google Spanner (technically a NewSQL database), and FoundationDB, have made them central to their designs. Eventual consistency is a consistency model used in distributed computing to achieve high availability; it informally guarantees that, if no new updates are made to a given data item, all accesses to that item will eventually return the last updated value. Eventually consistent services are often classified as providing BASE (Basically Available, Soft state, Eventual consistency) semantics, in contrast to traditional ACID (Atomicity, Consistency, Isolation, Durability) guarantees.
  2. http://blog.monitis.com/2011/05/22/picking-the-right-nosql-database-tool/
  3. Eric Brewer’s CAP theorem says that if you want consistency, availability, and partition tolerance, you have to settle for two out of three. Consistency means that each client always has the same view of the data. Availability means that all clients can always read and write. Partition tolerance means that the system continues to work across physical network partitions; a few nodes can fail and the system keeps going, unless there is a total network failure.
  4. http://localhost:60010/master-status