DDMUNIT5
MONGO DB:
SQL NOSQL
What is MongoDB?
✔ Instead of storing your data in tables and rows, as you would with a relational
database, in MongoDB you store JSON-like documents with dynamic
schemas (schema-free, schemaless).
MongoDB is a schema-free database.
MongoDB does not need any predefined data schema.
Every document can have different fields!
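As a rough illustration of what "schema-free" means, the sketch below models a collection as a plain Python list of dictionaries. The collection name `people` and all field names are made up for this example, and no MongoDB server or driver is involved.

```python
# A collection modelled as a list of JSON-like documents (plain dicts).
# The collection name and field names are hypothetical.
people = [
    {"name": "Alex", "dept": "IT"},
    {"name": "Bob", "dept": "Sales", "phones": ["555-0100", "555-0101"]},
    {"name": "Cathy", "address": {"city": "Pune"}},
]

# No schema was declared up front, so each document carries its own fields.
field_sets = [sorted(doc.keys()) for doc in people]
for fields in field_sets:
    print(fields)
```

Each document in the list has a different set of fields, which is the property the notes describe.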
MongoDB Architecture
Sharding is a method for distributing data across multiple machines.
• Partition your data
• Scale write throughput
• Increase capacity
• Auto-balancing
MongoDB uses sharding to support deployments with very large data sets and high
throughput operations.
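The idea of partitioning data by a key can be sketched in a few lines of Python. This is a deliberately simplified illustration of hash-based partitioning, not a description of how MongoDB assigns or balances data internally; the key name `user_id`, the shard count, and the helper names are assumptions made for the example.

```python
import hashlib

def shard_for(key: str, num_shards: int) -> int:
    """Map a shard key to a shard index using a stable hash."""
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_shards

# Three hypothetical shards, each a dict of key -> document.
shards = [dict() for _ in range(3)]

def insert(doc: dict) -> int:
    """Store the document on the shard chosen by its "user_id" field."""
    idx = shard_for(doc["user_id"], len(shards))
    shards[idx][doc["user_id"]] = doc
    return idx

for n in range(1000):
    insert({"user_id": f"user{n}", "visits": n})

print([len(s) for s in shards])  # sizes of the three partitions
```

Because every write touches only one partition, adding machines spreads both the stored data and the write load, which is what the bullet points above refer to.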
Features of MongoDB
1. Document-Oriented storage
2. Full Index Support
3. Replication & High Availability
4. Auto-Sharding
5. Aggregation
6. MongoDB Atlas
7. Various APIs: JavaScript, Python, Ruby, Perl, Java, Scala, C#, C++,
Haskell, Erlang
8. Community
MONGODB CRUD OPERATIONS
• Create
    • db.collection.insert( <document> )
    • db.collection.save( <document> )
    • db.collection.update( <query>, <update>, { upsert: true } )
• Read
    • db.collection.find( <query>, <projection> )
    • db.collection.findOne( <query>, <projection> )
• Update
    • db.collection.update( <query>, <update>, <options> )
• Delete
    • db.collection.remove( <query>, <justOne> )
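To make the semantics of these calls concrete, here is a toy in-memory stand-in written in Python. It is not a MongoDB client and does not use any driver; the class `ToyCollection`, its method signatures, and the sample documents are hypothetical. It supports only equality queries and a `$set` update, which is enough to show insert, find, update with upsert, and remove.

```python
class ToyCollection:
    """In-memory stand-in that mimics a few shell calls; not a MongoDB client."""

    def __init__(self):
        self.docs = []

    def insert(self, document):
        self.docs.append(dict(document))

    def find(self, query=None):
        # Equality matching only: every key in the query must match.
        query = query or {}
        return [d for d in self.docs
                if all(d.get(k) == v for k, v in query.items())]

    def update(self, query, update, upsert=False):
        matches = self.find(query)
        if matches:
            matches[0].update(update["$set"])  # first match only
        elif upsert:
            # No match: create a document from the query plus the new values.
            self.docs.append({**query, **update["$set"]})

    def remove(self, query, just_one=False):
        matches = self.find(query)
        if just_one:
            matches = matches[:1]
        self.docs = [d for d in self.docs
                     if not any(d is m for m in matches)]


users = ToyCollection()
users.insert({"name": "Alex", "dept": "IT"})
users.update({"name": "Bob"}, {"$set": {"dept": "Sales"}}, upsert=True)
users.update({"name": "Alex"}, {"$set": {"dept": "HR"}})
users.remove({"name": "Bob"})
print(users.find())
```

The upsert call creates the "Bob" document because no document matched the query, which is why update with `upsert: true` is listed under Create as well as Update.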
CASE 1:
HBASE
Apache Hadoop is an open source framework that is used to efficiently store and process
large datasets ranging in size from gigabytes to petabytes of data.
● HBase is an open source, sparse, consistent, distributed, sorted map modeled after Google’s
BigTable.
● Began as a project by Powerset to process massive amounts of data for natural language
processing.
● Developed as part of Apache’s Hadoop project and runs on top of the Hadoop Distributed
File System.
HBase: HBase is an open source database from Apache that runs on a Hadoop cluster. It is a
non-relational (NoSQL), column-oriented database management system.
HBase can also be used without Hadoop: running HBase in standalone mode uses the
local file system. In a cluster deployment, Hadoop's HDFS provides the distributed file system,
with redundancy and the ability to scale to very large sizes.
Architecture of HBase
HBase is a columnar data store, also called a tabular data store. The main
difference between a column-oriented database and a row-oriented database
(RDBMS) is how data is stored on disk. Check how the following table would be
serialized using a row-oriented and a column-oriented approach
(Source: Columnar Database, Wikipedia).
EmpId  Lastname  Firstname  Salary
1      Smith     Joe        40000
2      Jones     Mary       50000
3      Johnson   Cathy      44000
Row-oriented
1,Smith,Joe,40000;
2,Jones,Mary,50000;
3,Johnson,Cathy,44000;
Column-oriented
1,2,3;
Smith,Jones,Johnson;
Joe,Mary,Cathy;
40000,50000,44000;
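The two layouts can be reproduced with a short Python sketch. The function names are chosen for this example only; the data is the employee table from above.

```python
# The employee table from the notes, one tuple per row.
rows = [
    (1, "Smith", "Joe", 40000),
    (2, "Jones", "Mary", 50000),
    (3, "Johnson", "Cathy", 44000),
]

def serialize_row_oriented(table):
    """Write all values of one row together, row after row."""
    return [",".join(str(v) for v in row) + ";" for row in table]

def serialize_column_oriented(table):
    """Write all values of one column together, column after column."""
    return [",".join(str(v) for v in col) + ";" for col in zip(*table)]

print("\n".join(serialize_row_oriented(rows)))
print("\n".join(serialize_column_oriented(rows)))
```

In the column-oriented layout all salaries sit next to each other, so a query that reads only the Salary column does not have to touch names or IDs.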
HBase CRUD Operations
General Commands
HBase provides shell commands to interact directly with the database. Below are a few of the most
commonly used shell commands.
status: This command displays the cluster information and the health of the cluster.
hbase(main):> status
hbase(main):> status "detailed"
version: This provides information about the version of HBase.
hbase(main):> version
whoami: This shows the current user.
hbase(main):> whoami
table_help: This displays help for the table-reference shell commands.
hbase(main):009:> table_help
Create
Let’s create an HBase table and insert data into it. When creating a table, the user must
define the required column families.
The general syntax is shown first, followed by an example for a table ‘bankdetails’. The examples
in the rest of this section use a table ‘employee’ with two column families: the first is
‘Personal info’ and the second is ‘Professional Info’.
create 'tableName', {NAME => 'CF1', VERSIONS => 5}, {NAME => 'CF2', VERSIONS => 5}
create 'bankdetails', {NAME => 'address', VERSIONS => 5}
Put:
The put command is used to insert records into HBase.
put 'employee', 1, 'Personal info:empId', 10
put 'employee', 1, 'Personal info:Name', 'Alex'
put 'employee', 1, 'Professional Info:Dept', 'IT'
In the above example, all cells written with row key 1 belong to a single row in HBase. To
add multiple rows, repeat the put command with a different row key.
Read
The ‘get’ and ‘scan’ commands are used to read data from HBase. Let’s first discuss the ‘get’ operation.
get: The ‘get’ operation returns a single row from the HBase table. The syntax for ‘get’ is given
below.
get 'Table Name', 'Row Key'
hbase(main):022:> get 'employee', 1
COLUMN CELL
Personal info:Name timestamp=1504600767520, value=Alex
Personal info:empId timestamp=1504600767491, value=10
Professional Info:Dept timestamp=1504600767540, value=IT
3 row(s) in 0.0250 seconds
Note: A timestamp is attached to each cell. Whenever a cell value is updated, the new value is
stored with a new timestamp. Older values are retained (up to the number of versions configured
for the column family), but by default only the value with the latest timestamp is displayed.
The command below retrieves several versions of a cell. Here ‘VERSIONS => 3’ defines the number of
versions to be retrieved.
get 'Table Name', 'Row Key', {COLUMN => 'Column Family', VERSIONS => 3}
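The versioning behaviour described in the note can be modelled with a small Python sketch. This is a simplified model under stated assumptions, not HBase code: the class `VersionedCell` is hypothetical, and excess versions are dropped immediately here, whereas the timing of cleanup in a real cluster is an implementation detail this sketch does not try to reproduce. The timestamps and values are the empId cell values that appear in the shell output in these notes.

```python
class VersionedCell:
    """One cell that keeps several timestamped versions (hypothetical model)."""

    def __init__(self, max_versions=5):
        self.max_versions = max_versions
        self.versions = []  # list of (timestamp, value), newest first

    def put(self, timestamp, value):
        # An update never overwrites in place; it adds a newer version.
        self.versions.append((timestamp, value))
        self.versions.sort(key=lambda tv: tv[0], reverse=True)
        del self.versions[self.max_versions:]  # keep only the newest ones

    def get(self, versions=1):
        # By default only the newest version is returned.
        return self.versions[:versions]


emp_id = VersionedCell(max_versions=5)
emp_id.put(1504600767491, 10)
emp_id.put(1504606480934, 15)

print(emp_id.get())            # newest version only
print(emp_id.get(versions=3))  # up to three versions, newest first
```

Asking for three versions returns only two here because the cell has been written twice; the requested count is an upper bound.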
scan:
The ‘scan’ command is used to retrieve multiple rows.
Select all:
The command below is an example of a basic scan of the entire table.
scan 'Table Name'
hbase(main):074:> scan 'employee'
ROW COLUMN+CELL
1 column=Personal info:Name, timestamp=1504600767520, value=Alex
1 column=Personal info:empId, timestamp=1504606480934, value=15
1 column=Professional Info:Dept, timestamp=1504600767540, value=IT
2 column=Personal info:Name, timestamp=1504600767588, value=Bob
2 column=Personal info:empId, timestamp=1504600767568, value=20
2 column=Professional Info:Dept, timestamp=1504600768266, value=Sales
2 row(s) in 0.0500 seconds
Note: Rows are sorted by row key, and the columns of each row are listed together.
Column Selection:
The command below scans only a particular column.
scan 'employee', {COLUMNS => 'Personal info:Name'}
ROW COLUMN+CELL
1 column=Personal info:Name, timestamp=1504600767520, value=Alex
2 column=Personal info:Name, timestamp=1504600767588, value=Bob
2 row(s) in 0.3660 seconds
Limit Query:
The command below limits the number of rows returned by the scan.
scan 'employee', {COLUMNS => 'Personal info:Name', LIMIT => 1}
ROW COLUMN+CELL
1 column=Personal info:Name, timestamp=1504600767520, value=Alex
1 row(s) in 0.0270 seconds
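The three scan variants (whole table, column selection, row limit) can be sketched in Python over a dictionary that stands in for the table. The function `scan` and its parameters are hypothetical names for this illustration; the row keys are strings sorted lexicographically, and the data mirrors the 'employee' rows shown in the output above.

```python
# Row key -> {"family:qualifier": value}; inserted out of order on purpose.
table = {
    "2": {"Personal info:Name": "Bob",
          "Personal info:empId": "20",
          "Professional Info:Dept": "Sales"},
    "1": {"Personal info:Name": "Alex",
          "Personal info:empId": "15",
          "Professional Info:Dept": "IT"},
}

def scan(table, columns=None, limit=None):
    """Return (row_key, cells) pairs in row-key order."""
    results = []
    for row_key in sorted(table):
        cells = table[row_key]
        if columns is not None:
            # Column selection: keep only the requested columns.
            cells = {c: v for c, v in cells.items() if c in columns}
        if not cells:
            continue  # the row has none of the requested columns
        results.append((row_key, cells))
        if limit is not None and len(results) >= limit:
            break  # limit query: stop after the requested number of rows
    return results

print(scan(table))
print(scan(table, columns=["Personal info:Name"]))
print(scan(table, columns=["Personal info:Name"], limit=1))
```

Because rows are visited in key order, a limit of 1 always returns the row with the smallest key, which matches the single row for Alex in the limit output.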
Update
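To update a record, HBase uses the ‘put’ command. To update a column value, the user puts the new
value, and HBase stores it with the latest timestamp. For example, the empId of row 1, shown as
15 in the scan output above, was changed with a put of this form:
put 'employee', 1, 'Personal info:empId', 15
Earlier versions of the updated cell can then be retrieved with:
get 'Table Name', 'Row Key', {COLUMN => 'Column Family', VERSIONS => 3}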
Delete
The ‘delete’ command is used to delete individual cells of a record.
The syntax of the delete command in the HBase shell is:
delete 'Table Name', 'Row Key', 'Column Family:Column'
To remove a whole table, the table must first be disabled:
disable 'employee'
Once the table is disabled, the user can drop it using the syntax below.
drop 'employee'
You can verify whether the table still exists using the ‘exists’ command. To enable a table that
has been disabled, use the ‘enable’ command.
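The ordering rule behind these commands (a table has to be disabled before it can be dropped) can be shown with a minimal Python model. The class `ToyAdmin` and its methods are hypothetical and only track table names and an enabled flag; they are not an HBase API.

```python
class ToyAdmin:
    """Tracks tables and their enabled state (hypothetical model)."""

    def __init__(self):
        self.tables = {}  # table name -> True if enabled

    def create(self, name):
        self.tables[name] = True  # new tables start enabled

    def disable(self, name):
        self.tables[name] = False

    def enable(self, name):
        self.tables[name] = True

    def exists(self, name):
        return name in self.tables

    def drop(self, name):
        # Dropping an enabled table is refused, as in the shell workflow.
        if self.tables[name]:
            raise RuntimeError(f"table {name!r} must be disabled before drop")
        del self.tables[name]


admin = ToyAdmin()
admin.create("employee")
admin.disable("employee")
admin.drop("employee")
print(admin.exists("employee"))
```

Calling `drop` without the preceding `disable` raises an error in this model, which is the reason the notes list the two commands in that order.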