3. NoSQL
• NoSQL is all about scalability
• Scaling to size
• Scaling to complexity
• Handles heavy read/write workloads
• Data duplication and denormalization are first-class citizens
8. Recheck
• What is the CAP theorem?
• Does NoSQL support transactions?
• What are the NoSQL types?
9. HBase
• Scalable, distributed data store
• Sorted map of maps / key-value store
• Open source avatar of Google’s Bigtable
• Sparse
• Multi dimensional
• Tightly integrated with Hadoop
• Not an RDBMS
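The "sorted map of maps" idea can be sketched in plain Java. This is a hypothetical in-memory model, not the HBase client API: the class name `SortedMapOfMaps` and its methods are assumptions made for illustration, and the sample row keys are borrowed from the shell examples later in the deck.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

// Hypothetical model of the logical layout; NOT the HBase client API.
class SortedMapOfMaps {
    // rowkey -> column family -> column qualifier -> value; every level stays sorted.
    private final NavigableMap<String, NavigableMap<String, NavigableMap<String, String>>> rows =
            new TreeMap<>();

    void put(String rowKey, String family, String qualifier, String value) {
        rows.computeIfAbsent(rowKey, k -> new TreeMap<>())
            .computeIfAbsent(family, k -> new TreeMap<>())
            .put(qualifier, value);
    }

    // Sparse: a cell that was never written takes no space and reads as null.
    String get(String rowKey, String family, String qualifier) {
        NavigableMap<String, NavigableMap<String, String>> families = rows.get(rowKey);
        if (families == null) return null;
        NavigableMap<String, String> columns = families.get(family);
        return columns == null ? null : columns.get(qualifier);
    }

    // Because row keys are sorted, a range scan [startRow, stopRow) is a sub-map view.
    List<String> scanRowKeys(String startRow, String stopRow) {
        return new ArrayList<>(rows.subMap(startRow, true, stopRow, false).keySet());
    }
}
```

Rows inserted in any order come back in row-key order, which is what makes range scans cheap in this kind of store.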
15. Important Terms
• Table
  • Consists of rows and columns
• Row
  • Has a bunch of columns
  • Identified by a rowkey (primary key)
• Column Qualifier
  • Dynamic column name
• Column Family
  • Groups columns logically and physically (columns with a similar access pattern)
• Cell
  • The actual element that contains the data at a row-column intersection
• Version
  • Every cell can hold multiple versions, each identified by a timestamp
16. Logical & Physical Structure (Tall vs. Wide table storage footprint)

Logical representation of an HBase table. We'll look at what it means to Get() row r5 from this table.

      CF1              CF2
r1    c1:v1            c1:v9  c6:v2
r2    c1:v2  c3:v6
r3    c2:v3            c5:v6
r4    c2:v4
r5    c1:v1  c3:v5     c7:v8

Actual physical storage of the table

HFile for CF1        HFile for CF2
r1:CF1:c1:t1:v1      r1:CF2:c1:t1:v9
r2:CF1:c1:t2:v2      r1:CF2:c6:t4:v2
r2:CF1:c3:t3:v6      r3:CF2:c5:t4:v6
r3:CF1:c2:t1:v3      r5:CF2:c7:t3:v8
r4:CF1:c2:t1:v4
r5:CF1:c1:t2:v1
r5:CF1:c3:t3:v5

Result object returned for a Get() on row r5 (a list of KeyValue objects)

r5:CF1:c1:t2:v1
r5:CF1:c3:t3:v5
r5:CF2:c7:t3:v8

Structure of a KeyValue object

Key:   Row Key | Col Fam | Col Qual | Time Stamp
Value: Cell Value
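The KeyValue ordering and the Get() on row r5 can be sketched as follows. This is a simplified model under stated assumptions: `SimpleKeyValue` and `KeyValueStore` are hypothetical names, a single sorted list stands in for the per-family HFiles, and the older version `v0` at `t1` in the usage note is invented to show version handling. Cells sort by row, family and qualifier ascending, then by timestamp descending, so the newest version of a column comes first.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical stand-in for an HBase KeyValue; field names are illustrative.
final class SimpleKeyValue {
    final String row, family, qualifier, value;
    final long timestamp;

    SimpleKeyValue(String row, String family, String qualifier, long timestamp, String value) {
        this.row = row; this.family = family; this.qualifier = qualifier;
        this.timestamp = timestamp; this.value = value;
    }

    // Row, family, qualifier ascending; timestamp descending (newest first).
    static final Comparator<SimpleKeyValue> ORDER = (a, b) -> {
        int c = a.row.compareTo(b.row);
        if (c == 0) c = a.family.compareTo(b.family);
        if (c == 0) c = a.qualifier.compareTo(b.qualifier);
        if (c == 0) c = Long.compare(b.timestamp, a.timestamp);
        return c;
    };

    @Override public String toString() {
        return row + ":" + family + ":" + qualifier + ":t" + timestamp + ":" + value;
    }
}

// One sorted list stands in for the HFiles (an assumption made for brevity).
class KeyValueStore {
    private final List<SimpleKeyValue> cells = new ArrayList<>();

    void add(String row, String family, String qualifier, long timestamp, String value) {
        cells.add(new SimpleKeyValue(row, family, qualifier, timestamp, value));
        cells.sort(SimpleKeyValue.ORDER);
    }

    // Returns the newest version of every column in the row, in sorted order.
    List<String> get(String row) {
        List<String> result = new ArrayList<>();
        String lastColumn = null;
        for (SimpleKeyValue kv : cells) {
            if (!kv.row.equals(row)) continue;
            String column = kv.family + ":" + kv.qualifier;
            if (column.equals(lastColumn)) continue; // older version of the same column
            result.add(kv.toString());
            lastColumn = column;
        }
        return result;
    }
}
```

Adding the r5 cells from the slide in any order, plus an older `r5:CF1:c1:t1:v0`, and calling `get("r5")` yields the three entries shown in the Result object above; the older version is skipped because the newer one sorts ahead of it.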
17. (J)Ruby Shell Commands
• General
• DDL
• Create
• Describe
• Namespace
• DML
• Put
• Get
• Scan
• Delete
• Tools
• Replication
• Snapshot
• Security
• Visibility
Creating Table:
create 'DEVICE_DETAIL','BASIC_INFO','CONTRACT_INFO'
Inserting Data:
put 'DEVICE_DETAIL','Device1','BASIC_INFO:IP_ADDR','10.10.10.10'
put 'DEVICE_DETAIL','Device2','BASIC_INFO:IP_ADDR','20.20.20.20'
Describing the Table:
describe 'DEVICE_DETAIL'
Altering the Table:
alter 'DEVICE_DETAIL', {NAME => 'CONTRACT_INFO', VERSIONS => 3}
Update Data:
put 'DEVICE_DETAIL','Device2','CONTRACT_INFO:CONTRACT_NUMBER','22222222'
Multi-Version Example:
get 'DEVICE_DETAIL','Device2', {COLUMN=>'CONTRACT_INFO:CONTRACT_NUMBER', VERSIONS=>2}
Scan Info:
scan 'DEVICE_DETAIL'
Scan with Filter:
scan 'DEVICE_DETAIL', {COLUMNS => 'CONTRACT_INFO:STATUS', LIMIT => 10, FILTER => "ValueFilter(=, 'binary:IN_ACTIVE')"}
Delete Info:
delete 'DEVICE_DETAIL','Device2','CONTRACT_INFO:STATUS'
18. Java API
• HTable
• HBaseAdmin
• HTablePool
• Get
• Put
• Delete
• Scan
• Increment
• HTableDescriptor
• HTableInterface
• Result
• ResultScanner
• KeyValue
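// Open the table (classic HTable client API)
HTable table = new HTable(configuration, hbasetablename);
// Write one cell: row key, column family, column qualifier (key), value
Put row = new Put(Bytes.toBytes(rowKey));
row.add(Bytes.toBytes(columnFamily), Bytes.toBytes(key), Bytes.toBytes(value));
table.put(row);
// Read the row back by its row key
Get getKey = new Get(Bytes.toBytes(rowKey));
Result result = table.get(getKey);
table.close();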
19. Spark HBase
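// imports needed by this snippet
import org.apache.hadoop.hbase.HBaseConfiguration
import org.apache.hadoop.hbase.client.Result
import org.apache.hadoop.hbase.io.ImmutableBytesWritable
import org.apache.hadoop.hbase.mapreduce.TableInputFormat
// create configuration
val config = HBaseConfiguration.create()
config.set("hbase.zookeeper.quorum", "localhost")
config.set("hbase.zookeeper.property.clientPort", "2181")
config.set("hbase.mapreduce.inputtable", "hbaseTableName")
// read data (the mapreduce TableInputFormat is a new-API InputFormat, so use newAPIHadoopRDD)
val hbaseData = sparkContext.newAPIHadoopRDD(config, classOf[TableInputFormat],
  classOf[ImmutableBytesWritable], classOf[Result])
// count rows
println(hbaseData.count)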
28. Use Case
• Canonical use case: storing crawl data and indices for search

Figure: Web Search powered by Bigtable. Components shown: Internets, Crawlers, Bigtable, MapReduce, Search Index, Web Search, You.

Indexing the Internet
1 Crawlers constantly scour the Internet for new pages. Those pages are stored as individual records in Bigtable.
2 A MapReduce job runs over the entire table, generating search indexes for the Web Search application.

Searching the Internet
3 The user initiates a Web Search request.
4 The Web Search application queries the Search Indexes and retrieves matching documents directly from Bigtable.
5 Search results are presented to the user.
Most NoSQL stores lack true ACID transactions, although a few recent systems, such as FairCom c-treeACE, Google Spanner (though technically a NewSQL database) and FoundationDB, have made them central to their designs.
Eventual consistency is a consistency model used in distributed computing to achieve high availability. Informally, it guarantees that if no new updates are made to a given data item, all accesses to that item will eventually return the last updated value.
Eventually consistent services are often classified as providing BASE (Basically Available, Soft state, Eventual consistency) semantics, in contrast to traditional ACID (Atomicity, Consistency, Isolation, Durability) guarantees.
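A minimal sketch of eventual consistency, assuming two replicas that reconcile with a last-write-wins rule on a caller-supplied version number. The class `Replica` and its methods are hypothetical, and last-write-wins is only one of several possible conflict-resolution strategies; it is used here because it is the shortest to show.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical replica with last-write-wins reconciliation (an assumed strategy).
class Replica {
    private final Map<String, String> values = new HashMap<>();
    private final Map<String, Long> versions = new HashMap<>();

    // Accept a write only if it is newer than what this replica already holds.
    void write(String key, String value, long version) {
        Long current = versions.get(key);
        if (current == null || version > current) {
            values.put(key, value);
            versions.put(key, version);
        }
    }

    // Reads are always served locally, so they may return stale data.
    String read(String key) {
        return values.get(key);
    }

    // Background sync: push every local entry to the peer, which keeps the newer one.
    void syncTo(Replica peer) {
        for (Map.Entry<String, Long> e : versions.entrySet()) {
            peer.write(e.getKey(), values.get(e.getKey()), e.getValue());
        }
    }
}
```

Right after a write to one replica, the other may still return the old value or nothing at all. Once updates stop and the replicas have synced in both directions, every read returns the last written value.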
Eric Brewer’s CAP theorem says that if you want consistency, availability, and partition tolerance, you have to settle for two out of three. (For a distributed system, partition tolerance means the system will continue to work unless there is a total network failure. A few nodes can fail and the system keeps going.)
Consistency means that each client always has the same view of the data.
Availability means that all clients can always read and write.
Partition tolerance means that the system works well across physical network partitions.
http://localhost:60010/master-status