Lesson 6 NoSQL Databases HBase
[Figure: a traditional database (DB) holds structured data; NoSQL holds unstructured data]
Why NoSQL?
With the explosion of social media sites, such as Facebook and Twitter, the demand to manage
large data has grown tremendously.
[Figure: graph example in which Records have Properties]
Problem Statement: In this demonstration, you will learn how to tune YARN so that HBase runs
smoothly without being starved of resources.
Access: Click on the Practice Labs tab on the left side panel of the LMS. Copy or note the
username and password that is generated. Click on the Launch Lab button. On the page that
appears, enter the username and password in the respective fields, and click Login.
HBase Overview
What Is HBase?
HBase can store huge amounts of data in tabular format and supports extremely fast reads and writes.
HBase is mostly used in scenarios that require regular, consistent inserting and overwriting of data.
Why HBase?
Therefore, a solution is required to access, read, or write data anytime regardless of its sequence in the
clusters of data.
Characteristics of HBase
HBase is a database in which tables have no schema. At the time of table creation, column families are
defined, not columns.
HBase: Real-Life Connect
Facebook’s Messenger platform needs to store over 135 billion messages every month.
HBase Nodes
HBase has two types of nodes: Master and RegionServer. Their characteristics are as follows:
Master
• Single Master node running at a time
• Manages cluster operations
• Not a part of the read or write path
RegionServer
• One or more RegionServers running at a time
• Hosts tables, performs reads, and buffers writes
• Clients communicate with RegionServers in order to read and write
A region in HBase is a subset of a table’s rows. The Master node detects the status of the RegionServers and
assigns regions to them.
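The Master's role can be pictured with a small sketch. This is not how HBase implements assignment (the real Master uses its own load balancer). It is a minimal round-robin illustration, and the names `assign_regions`, `rs1`, `rs2`, `rs3` and `region-1` to `region-5` are hypothetical.

```python
from itertools import cycle


def assign_regions(regions, live_servers):
    """Assign each region to one live RegionServer in round-robin order."""
    if not live_servers:
        raise ValueError("no live RegionServers to host regions")
    servers = cycle(sorted(live_servers))
    return {region: next(servers) for region in regions}


# Hypothetical cluster state as seen by the Master: rs3 is reported as down.
server_status = {"rs1": True, "rs2": True, "rs3": False}
regions = ["region-1", "region-2", "region-3", "region-4", "region-5"]

live = [name for name, alive in server_status.items() if alive]
plan = assign_regions(regions, live)
for region, server in plan.items():
    print(region, "->", server)
```

Only servers reported as alive receive regions, which mirrors the idea that the Master tracks RegionServer status before assigning work.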
HBase Components
HDFS
Storage Model of HBase
Partitioning:
• A table is horizontally partitioned into regions.
• Each region is managed by a RegionServer.
• A RegionServer may hold multiple regions.
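[Figure: logical view of all rows in a table, sorted by row key (A1, A2, A22, A3, ..., K4, ..., O90, ..., Z30, Z55). The rows are split into regions by row key range (null to A3, A3 to F34, F34 to K80, K80 to O95, O95 to null), and the regions are distributed across three RegionServers.]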
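The partitioning rules above can be sketched as a lookup over sorted region start keys. This is a minimal illustration, not HBase's actual client code. The region boundaries loosely follow the figure, and the mapping of regions to `rs1`, `rs2` and `rs3` is an assumption made for the example.

```python
from bisect import bisect_right

# Sorted start keys of the regions; "" stands for the open (null) start.
REGION_START_KEYS = ["", "A3", "F34", "K80", "O95"]
# Hypothetical hosting: a RegionServer may hold multiple regions.
REGION_TO_SERVER = ["rs1", "rs1", "rs2", "rs2", "rs3"]


def find_region(row_key):
    """Return the index of the region whose key range contains row_key."""
    # bisect_right gives the position of the first start key greater than
    # row_key, so the region just before that position owns the key.
    return bisect_right(REGION_START_KEYS, row_key) - 1


def find_server(row_key):
    """Return the RegionServer that hosts the region for row_key."""
    return REGION_TO_SERVER[find_region(row_key)]


for key in ["A1", "A22", "A3", "K4", "O90", "Z55"]:
    print(key, "-> region", find_region(key), "on", find_server(key))
```

Because row keys are compared lexicographically, "A22" sorts before "A3" and therefore falls in the first region, just as in the figure.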
Data Storage in HBase
Data is stored in files called HFiles or StoreFiles that are usually saved in HDFS.
[Figure: each row is identified by a row key and holds cells addressed as column family:column, for example rowkey, CF1:C1, CF1:C2, CF1:C3, CF2:C1, CF1:C8]
Cells within a column family are physically sorted. The table is very sparse, as most cells have NULL values.
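A toy in-memory model can make the sparse, column-family-oriented layout more concrete. It is only a sketch: real HBase persists cells in HFiles and also tracks timestamps, which are left out here. The class name `SparseTable` and the sample values are hypothetical.

```python
class SparseTable:
    """Toy model of an HBase table: only cells that were written are stored."""

    def __init__(self, column_families):
        # Column families are fixed when the table is created; columns are not.
        self.column_families = set(column_families)
        # One store per column family: {(row_key, qualifier): value}
        self.stores = {cf: {} for cf in self.column_families}

    def put(self, row_key, column, value):
        family, qualifier = column.split(":", 1)
        if family not in self.column_families:
            raise KeyError(f"unknown column family: {family}")
        self.stores[family][(row_key, qualifier)] = value

    def get(self, row_key, column):
        family, qualifier = column.split(":", 1)
        # A cell that was never written simply does not exist (None here).
        return self.stores[family].get((row_key, qualifier))

    def cells(self, family):
        """Cells of one column family, sorted by row key, then qualifier."""
        return sorted(self.stores[family].items())


table = SparseTable(["CF1", "CF2"])
table.put("row2", "CF1:C8", "h")
table.put("row1", "CF1:C1", "a")
table.put("row1", "CF2:C1", "x")

print(table.cells("CF1"))
print(table.get("row2", "CF1:C1"))
```

Any column can be added to an existing family at write time, while writing to an undeclared family fails, which reflects the rule that families, not columns, are defined at table creation.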
The table shows a comparison between HBase and a Relational Database Management System (RDBMS):
HBase | RDBMS
Automatic partitioning | Usually manual, admin-driven partitioning
Scales linearly and automatically with new nodes | Usually scales vertically by adding more hardware resources
Leverages batch processing with MapReduce distributed processing | Relies on multiple threads or processes rather than MapReduce distributed processing
Connecting to HBase
[Figure: MapReduce, Hive/Pig/HCatalog/Hue, Java applications, and the REST/Thrift gateway connect to HBase through the Java API; the diagram also shows ZooKeeper, with HBase running on top of HDFS]
HBase Shell Commands
Common commands include, but are not limited to, the following:
• Drop: drops the named table. The table must first be disabled. Example: hbase> drop 't1'
• Delete: deletes a cell value
• Put: puts a cell value
Problem Statement: Create a sample HBase table on the cluster, enter some data, query the table, then
clean up the data and exit.
Unassisted Practice
Steps to Perform
• HBase Shell
# Create a table called simplilearn with one column family named stats:
create 'simplilearn', 'stats'
# Add a test value to the daily column in the stats column family for row 1:
put 'simplilearn', 'row1', 'stats:daily', 'test-daily-value'
# Add a test value to the weekly column in the stats column family for row 1:
put 'simplilearn', 'row1', 'stats:weekly', 'test-weekly-value'
# Add a test value to the weekly column in the stats column family for row 2:
put 'simplilearn', 'row2', 'stats:weekly', 'test-weekly-value'
A graph database focuses on the relationships between entities and is able to infer new knowledge
from existing information.
Why Graph Databases?
Independent of the total size of your dataset, graph databases excel at managing
highly connected data and complex queries.
Property Graph Model
Nodes
Nodes are the entities in the graph. Nodes can be tagged with labels, representing their
different roles in your domain.
Relationships
Relationships provide directed, named, semantically relevant connections between two node
entities. A relationship always has a direction, a type, a start node, and an end node.
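The property graph model described above can be sketched with two small data classes. This is an illustration only and not the API of any particular graph database. The `Node` and `Relationship` classes, the `outgoing` helper and the sample data are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """An entity in the graph, tagged with labels and carrying properties."""
    name: str
    labels: set = field(default_factory=set)
    properties: dict = field(default_factory=dict)


@dataclass
class Relationship:
    """A directed, typed connection from a start node to an end node."""
    type: str
    start: Node
    end: Node
    properties: dict = field(default_factory=dict)


def outgoing(node, relationships):
    """Relationships that start at the given node (direction matters)."""
    return [r for r in relationships if r.start is node]


alice = Node("alice", labels={"Person"}, properties={"age": 30})
acme = Node("acme", labels={"Company"})
works_at = Relationship("WORKS_AT", start=alice, end=acme,
                        properties={"since": 2019})

print([r.type for r in outgoing(alice, [works_at])])
print([r.type for r in outgoing(acme, [works_at])])
```

The relationship is found when traversing from its start node but not from its end node, which shows why every relationship needs a direction as well as a type.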
Assisted Practice
Problem Statement: In this demonstration, you will learn how to create a NoSQL graph database.
Key Takeaways
c. In variable schema
Global Transport Private Limited works in transport analytics and is keen to ensure the
safety of people. As the population increases, accidents are becoming
more and more frequent. Accidents occur mostly when the route is long, the driver is drunk,
or the roads are damaged. The company collects data on all accidents and provides
important insights that can reduce the number of accidents. The company wants to create a
public portal where anyone can see aggregated accident data.
Your task is to suggest a suitable database and design a schema which can cover most of the
use cases.
You are given a file that contains details about the various parameters of accidents.
The column details are as follows:
1. Year
2. TYPE
3. 0-3 hrs. (Night)
4. 3-6 hrs. (Night)
5. 6-9 hrs (Day)
6. 9-12 hrs (Day)
7. 12-15 hrs (Day)
8. 15-18 hrs (Day)
9. 18-21 hrs (Night)
10. 21-24 hrs (Night)
11. Total
Lesson-End-Project
Problem Statement:
Save the given data in HBase in such a way that you can answer the queries below.
State what you are selecting as the row key and why.
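One possible starting point, offered only as a sketch, is a composite row key built from TYPE and Year. It assumes that queries filter mainly by accident type and then by year, which the project statement does not confirm. The column family names `slots` and `summary`, the helper names and the sample record are all hypothetical.

```python
# Time-slot columns, in the order given in the file description.
TIME_SLOTS = ["0-3", "3-6", "6-9", "9-12", "12-15", "15-18", "18-21", "21-24"]


def make_row_key(accident_type, year):
    """Composite key: type first, so rows of one type sort next to each other."""
    normalized = "_".join(accident_type.upper().split())
    return f"{normalized}|{int(year):04d}"


def to_cells(row):
    """Turn (year, type, 8 slot counts, total) into (row_key, {column: value})."""
    year, accident_type, *counts, total = row
    cells = {f"slots:{slot}": str(count)
             for slot, count in zip(TIME_SLOTS, counts)}
    cells["summary:total"] = str(total)
    return make_row_key(accident_type, year), cells


# Made-up sample record, used only to show the shape of the output.
sample = (2014, "Road Accidents", 10, 20, 30, 40, 50, 60, 70, 80, 360)
key, cells = to_cells(sample)
print(key)
for column, value in cells.items():
    print(column, "=", value)
```

If the expected queries instead filter by year first, placing Year ahead of TYPE in the key would be the more natural choice. The justification for the row key should follow from the queries that must be served.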