unit-4 DBMS

UNIT-4 DATABASE
TECHNOLOGIES
Introduction to NoSQL databases: MongoDB, Cassandra, Redis, key-value stores,
document stores, column-family stores, graph databases
What is NoSQL?
A NoSQL database is a non-relational data management system that
does not require a fixed schema. It avoids joins and is easy to scale.
The major purpose of a NoSQL database is to serve distributed data
stores with very large storage needs. NoSQL is used for big data
and real-time web apps. For example, companies like Twitter,
Facebook and Google collect terabytes of user data every single day.
NoSQL stands for “Not Only SQL” or “Not SQL.” Though a
better term would be “NoREL”, NoSQL caught on. Carlo Strozzi
first used the name NoSQL in 1998.
A traditional RDBMS uses SQL syntax to store and retrieve data for
further insights. A NoSQL database system, by contrast, encompasses a
wide range of database technologies that can store structured,
semi-structured, unstructured and polymorphic data.
Why NoSQL?
The concept of NoSQL databases became popular with Internet giants
like Google, Facebook, Amazon, etc. who deal with huge volumes of
data. The system response time becomes slow when you use RDBMS
for massive volumes of data.
To resolve this problem, we could “scale up” our systems by
upgrading our existing hardware. This process is expensive.
The alternative is to distribute the database load across
multiple hosts whenever the load increases. This method is known as
“scaling out.”
NoSQL databases are non-relational and are designed with web
applications in mind, so they scale out better than relational databases.
Brief History of NoSQL Databases

1998- Carlo Strozzi uses the term NoSQL for his lightweight, open-source
relational database
2000- Graph database Neo4j is launched
2004- Google BigTable is launched
2005- CouchDB is launched
2007- The research paper on Amazon Dynamo is released
2008- Facebook open-sources the Cassandra project
2009- The term NoSQL is reintroduced
Features of NoSQL
Non-relational
NoSQL databases do not follow the relational model
They do not provide tables with flat, fixed-column records
They work with self-contained aggregates or BLOBs
They do not require object-relational mapping or data normalization
They typically omit complex features such as query languages, query planners,
referential integrity, joins and ACID
Schema-free
NoSQL databases are either schema-free or have relaxed schemas
They do not require any definition of the schema of the data
They allow heterogeneous data structures in the same domain
Simple API
Offers easy-to-use interfaces for storing and querying data
APIs allow low-level data manipulation and selection methods
Text-based protocols are mostly used, typically HTTP REST with JSON
Mostly there is no standards-based NoSQL query language
Web-enabled databases run as internet-facing services
Distributed
• Multiple NoSQL databases can be executed in a distributed fashion
• Offers auto-scaling and fail-over capabilities
• ACID guarantees are often sacrificed for scalability and throughput
• Mostly no synchronous replication between distributed nodes; instead
asynchronous multi-master replication, peer-to-peer replication or HDFS
replication
• Often provides only eventual consistency
• Shared-nothing architecture, which enables less coordination and
higher distribution
Types of NoSQL Databases

NoSQL databases are mainly categorized into four types: key-value
pair, column-oriented, graph-based and document-oriented. Every
category has its own attributes and limitations. None of these
types is the best choice for every problem. Users
should select the database based on their product needs.
Types of NoSQL Databases:

Key-value pair based
Column-oriented
Graph-based
Document-oriented
Key Value Pair Based

Data is stored in key/value pairs. This design is meant to handle
large volumes of data and heavy load.
Key-value databases store data as a hash table where
each key is unique, and the value can be JSON, a BLOB (Binary Large
Object), a string, etc.
For example, a key-value pair may contain a key like “Website”
associated with a value like “Guru99”.
It is one of the most basic types of NoSQL database. This kind of
NoSQL database is used like collections, dictionaries, associative
arrays, etc.
Key-value stores help the developer to store schema-less data. They
work well for data such as shopping cart contents.
Redis, Dynamo and Riak are some examples of key-value store
databases. Riak in particular follows the design described in Amazon’s Dynamo paper.
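The hash-table idea behind a key-value store can be sketched in a few lines of Python. This is only an illustration built on a plain dict: the class name KeyValueStore and its put/get/delete methods are made up for this example and are not the API of Redis, Riak or any other product.

```python
import json


class KeyValueStore:
    """Toy in-memory key-value store backed by a Python dict (a hash table)."""

    def __init__(self):
        self._data = {}

    def put(self, key, value):
        # The store treats values as opaque, so they are serialised to a string.
        self._data[key] = json.dumps(value)

    def get(self, key, default=None):
        raw = self._data.get(key)
        return default if raw is None else json.loads(raw)

    def delete(self, key):
        # Returns True if the key existed.
        return self._data.pop(key, None) is not None


store = KeyValueStore()
store.put("Website", "Guru99")
store.put("cart:42", {"items": ["pen", "notebook"], "total": 7.5})
print(store.get("Website"))
print(store.get("cart:42")["items"])
```

Every operation needs only the key, which is why such stores are simple to distribute and fast for lookups, but offer no way to search by the contents of a value.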
Column-based
Column-oriented databases work on columns and are based on the
BigTable paper by Google. Every column is treated separately. The values
of a single column are stored contiguously.
They deliver high performance on aggregation queries like SUM,
COUNT, AVG, MIN etc., as the data is readily available in a column.
Column-based NoSQL databases are widely used to manage data
warehouses, business intelligence, CRM and library card catalogs.
HBase, Cassandra and Hypertable are examples of
column-based databases.
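To see why aggregation suits a column layout, the same small data set can be written both ways in Python. The field names (id, region, sales) and the values are invented for this sketch; real column stores add compression and on-disk formats that are not modelled here.

```python
# Row-oriented layout: each record is stored together.
rows = [
    {"id": 1, "region": "north", "sales": 120},
    {"id": 2, "region": "south", "sales": 80},
    {"id": 3, "region": "north", "sales": 50},
]

# Column-oriented layout: the values of each column are stored contiguously.
columns = {
    "id": [1, 2, 3],
    "region": ["north", "south", "north"],
    "sales": [120, 80, 50],
}

# A row store must visit every record to aggregate one field.
total_from_rows = sum(record["sales"] for record in rows)

# A column store reads only the one column the query needs.
sales = columns["sales"]
total_from_columns = sum(sales)
average_sales = total_from_columns / len(sales)

print(total_from_rows, total_from_columns, min(sales), average_sales)
```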
Document-Oriented
A document-oriented NoSQL DB stores and retrieves data as key-value
pairs, but the value part is stored as a document. The document
is stored in JSON or XML format. The value is understood by the DB
and can be queried.
Relational Vs. Document
A relational database stores data in rows and columns, while a document
database stores records whose structure is similar to JSON. For a
relational database, you have to know in advance
what columns you have. For a document database, however,
data is stored like a JSON object, and you do not need to define its
structure up front, which makes it flexible.
The document type is mostly used for CMS systems, blogging
platforms, real-time analytics and e-commerce applications. It should
not be used for complex transactions which require multiple operations or
queries against varying aggregate structures.
Amazon SimpleDB, CouchDB, MongoDB, Riak and Lotus Notes
are popular document-oriented DBMS systems.
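The difference from a plain key-value store is that the database can look inside the value. A rough Python sketch, assuming a collection held as a dict of JSON-like documents; the helper find and the sample posts are hypothetical and stand in for a real query engine.

```python
import json

# Two documents in the same collection with different fields (no fixed schema).
posts = {
    "post:1": {"title": "Intro to NoSQL", "tags": ["nosql", "db"],
               "author": {"name": "Asha", "city": "Pune"}},
    "post:2": {"title": "Graph basics", "tags": ["graph"], "views": 310},
}


def find(collection, field, value):
    """Return keys of documents whose top-level field equals or contains value."""
    matches = []
    for key, doc in collection.items():
        stored = doc.get(field)
        if stored == value or (isinstance(stored, list) and value in stored):
            matches.append(key)
    return matches


print(find(posts, "tags", "graph"))
print(json.dumps(posts["post:1"], indent=2))
```

Documents that lack the queried field are simply skipped, which is how one collection can hold records of different shapes.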
Graph-Based
A graph database stores entities as well as the relations amongst
those entities. Each entity is stored as a node, and each relationship as
an edge. An edge gives a relationship between nodes. Every node and
edge has a unique identifier.
Compared to a relational database, where tables are loosely connected,
a graph database is multi-relational in nature. Traversing relationships
is fast because they are already captured in the DB, and there is no need to
calculate them.
Graph databases are mostly used for social networks, logistics and spatial
data.
Neo4J, Infinite Graph, OrientDB and FlockDB are some popular
graph-based databases.
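Because the edges are stored directly, a traversal just follows them. The sketch below uses an adjacency list and a breadth-first search in Python; the follows graph and the function reachable_within are invented for illustration and say nothing about how Neo4J or the other products store data internally.

```python
from collections import deque

# Nodes are people; each edge is a stored "follows" relationship.
follows = {
    "alice": ["bob", "carol"],
    "bob": ["dave"],
    "carol": ["dave"],
    "dave": [],
}


def reachable_within(graph, start, max_hops):
    """Breadth-first traversal: nodes reachable from start in at most max_hops edges."""
    seen = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if seen[node] == max_hops:
            continue  # do not expand beyond the hop limit
        for neighbour in graph[node]:
            if neighbour not in seen:
                seen[neighbour] = seen[node] + 1
                queue.append(neighbour)
    del seen[start]
    return seen


print(reachable_within(follows, "alice", 1))
print(reachable_within(follows, "alice", 2))
```

A "friends of friends" query like this needs no join: each hop is a lookup of the edges already attached to a node.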
Query Mechanism tools for NoSQL

The most common data retrieval mechanism is REST-based
retrieval of a value based on its key/ID with a GET request on the resource.
Document store databases offer more complex queries, as they
understand the value in a key-value pair. For example, CouchDB allows
defining views with MapReduce.
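The MapReduce idea behind such views can be sketched in Python. This is not CouchDB code (its views are normally written as JavaScript functions); map_fn, reduce_fn, build_view and the sample documents are assumptions made purely to show the two phases.

```python
from collections import defaultdict

docs = [
    {"_id": "a", "type": "order", "customer": "c1", "amount": 30},
    {"_id": "b", "type": "order", "customer": "c2", "amount": 15},
    {"_id": "c", "type": "order", "customer": "c1", "amount": 5},
    {"_id": "d", "type": "note", "text": "call back"},
]


def map_fn(doc):
    """Map phase: emit (key, value) pairs; non-order documents emit nothing."""
    if doc.get("type") == "order":
        yield doc["customer"], doc["amount"]


def reduce_fn(values):
    """Reduce phase: combine all values emitted for one key."""
    return sum(values)


def build_view(documents):
    grouped = defaultdict(list)
    for doc in documents:
        for key, value in map_fn(doc):
            grouped[key].append(value)
    return {key: reduce_fn(values) for key, values in grouped.items()}


view = build_view(docs)
print(view)
```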
What is the CAP Theorem?

The CAP theorem is also called Brewer’s theorem. It states that it is
impossible for a distributed data store to offer more than two out of
the following three guarantees:
•Consistency
•Availability
•Partition Tolerance
Consistency:
The data should remain consistent even after the execution of an
operation. This means that once data is written, any future read request
should return that data. For example, after updating the order status,
all the clients should be able to see the same data.
Availability:
The database should always be available and responsive. It should not
have any downtime.
Partition Tolerance:
Partition tolerance means that the system should continue to function
even if the communication among the servers is not stable. For
example, the servers can be partitioned into multiple groups which
may not communicate with each other. Here, if part of the database is
unavailable, the other parts should remain unaffected.
Eventual Consistency
The term “eventual consistency” arises when copies of data are kept on
multiple machines to get high availability and scalability. Changes
made to any data item on one machine have to be propagated
to the other replicas.
Data replication may not be instantaneous: some copies will be
updated immediately, while others are updated in due course of time. These copies
may be mutually inconsistent for a while, but in due course of time they become consistent.
Hence the name eventual consistency.
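A small simulation makes the "stale for a while, then consistent" behaviour concrete. It is a deliberately simplified Python sketch: the Replica class, the pending queue and the write/propagate functions are invented here, and real systems use far more elaborate replication protocols.

```python
class Replica:
    def __init__(self, name):
        self.name = name
        self.data = {}


def write(primary, pending, key, value):
    """Apply the write on one replica now and queue it for the others."""
    primary.data[key] = value
    pending.append((key, value))


def propagate(replicas, pending):
    """Deliver every queued write to every replica (the 'due course of time')."""
    while pending:
        key, value = pending.pop(0)
        for replica in replicas:
            replica.data[key] = value


r1, r2, r3 = Replica("r1"), Replica("r2"), Replica("r3")
pending = []

write(r1, pending, "order:7", "shipped")
stale_read = r2.data.get("order:7")   # this replica has not been updated yet
propagate([r1, r2, r3], pending)
fresh_read = r2.data.get("order:7")   # the replicas have now converged

print(stale_read, fresh_read)
```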
BASE: Basically Available, Soft state, Eventual consistency

Basically available means the DB is available all the time, in the
sense of the CAP theorem
Soft state means that the system state may change even without
input
Eventual consistency means that the system will become consistent
over time
Advantages of NoSQL
Can be used as Primary or Analytic Data Source
Big Data Capability
No Single Point of Failure
Easy Replication
No Need for Separate Caching Layer
It provides fast performance and horizontal scalability.
Can handle structured, semi-structured, and unstructured data with
equal effect
• Object-oriented programming which is easy to use and flexible
• NoSQL databases don’t need a dedicated high-performance server
• Supports key developer languages and platforms
• Simpler to implement than an RDBMS
• Handles big data, managing data velocity, variety, volume
and complexity
• Excels at distributed database and multi-data center operations
• Offers a flexible schema design which can easily be altered
without downtime or service disruption
Disadvantages of NoSQL
• No standardization rules
• Limited query capabilities
• RDBMS databases and tools are comparatively mature
• It does not offer some traditional database capabilities, like
consistency when multiple transactions are performed
simultaneously
• As the volume of data increases, it becomes difficult to maintain
unique keys
• Doesn’t work as well with relational data
• The learning curve is steep for new developers
• Many options are open source, which makes them less popular with enterprises
What is MongoDB?
MongoDB is a document-oriented NoSQL database system that provides high
scalability, flexibility, and performance. Unlike standard relational databases,
MongoDB stores data in a JSON-like document structure. This makes it easy
to work with dynamic and unstructured data. MongoDB is an
open-source, cross-platform database system.
MongoDB – Working and Features

MongoDB is a document-oriented database that is designed to
store data at large scale and to let you work with that data very
efficiently. It is categorized as a NoSQL (Not only SQL) database because
the storage and retrieval of data in MongoDB are not in the form of
tables.
MongoDB is developed and managed by MongoDB, Inc. under the
SSPL (Server Side Public License) and was initially released in February 2009. It also
provides official driver support for popular languages such as C, C++, C# and
.NET, Go, Java, Node.js, Perl, PHP, Python, Motor, Ruby, Scala, Swift and Mongoid, so
that you can create an application using any of these languages. Many
companies, such as Facebook, Nokia, eBay, Adobe and
Google, use MongoDB to store large amounts of data.
How it works ?
Now we will see how things work behind the scenes. MongoDB
is a database server, and the data is stored in its databases. In other
words, the MongoDB environment gives you a server that you can start and then create
multiple databases on.
Because it is a NoSQL database, the data is stored in collections and documents.
The database, collections, and documents are related to each other as follows:
• A MongoDB database contains collections, just as a MySQL database
contains tables. You are allowed to create multiple databases and multiple
collections.
• Inside a collection we have documents. These documents
contain the data we want to store in the MongoDB database. A single
collection can contain multiple documents, and collections are schema-less, meaning
it is not necessary that one document is similar to another.
• Documents are made up of fields. Fields are key-value pairs in
the documents, much like columns in a relational database. The value
of a field can be of any BSON data type, such as double, string, boolean,
etc.
• The data stored in MongoDB is in the format of BSON documents.
Here, BSON stands for the binary representation of JSON documents. In
other words, in the backend the MongoDB server converts the JSON data
into a binary form known as BSON, which is stored and queried
more efficiently.
• In MongoDB documents, you are allowed to store nested data. This
nesting allows you to create complex relations between data and
store them in the same document, which makes working with and fetching
data efficient compared to SQL, where you would need to write
joins to get the data from table 1 and table 2. The maximum size
of a BSON document is 16MB.
NOTE: In a MongoDB server, you are allowed to run multiple databases.
For example, suppose we have a database named GeeksforGeeks. Inside this
database we have two collections, and in each of these collections we have two
documents. In these documents we store our data in the form of
fields.
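The database, collection, document and field hierarchy can be mimicked with nested Python dicts and lists. This only models the structure and does not talk to a MongoDB server; the collection names (students, courses) and all field values are made up for the example.

```python
# server -> databases -> collections -> documents (all names are illustrative)
server = {
    "GeeksforGeeks": {                       # database
        "students": [                        # collection
            {"_id": 1, "name": "Ravi", "branch": "CSE"},
            {"_id": 2, "name": "Meena",
             "address": {"city": "Chennai", "pin": "600001"}},  # nested data
        ],
        "courses": [                         # collection
            {"_id": 1, "title": "DBMS", "credits": 4},
            {"_id": 2, "title": "Networks"},
        ],
    }
}


def field_names(collection):
    """List the field names used by each document in a collection."""
    return [sorted(doc) for doc in collection]


students = server["GeeksforGeeks"]["students"]
print(field_names(students))              # the two documents differ in shape
print(students[1]["address"]["city"])     # nested data is reached without a join
```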
How mongoDB is different from RDBMS ?

Some major differences in between MongoDB and the RDBMS are as follows:
MongoDB | RDBMS
It is a non-relational and document-oriented database. | It is a relational database.
It is suitable for hierarchical data storage. | It is not suitable for hierarchical data storage.
It has a dynamic schema. | It has a predefined schema.
It centers around the CAP theorem (Consistency, Availability, and Partition tolerance). | It centers around ACID properties (Atomicity, Consistency, Isolation, and Durability).
In terms of performance, it is much faster than RDBMS. | In terms of performance, it is slower than MongoDB.
Features of MongoDB –
• Schema-less Database: This is a major feature of MongoDB.
A schema-less database means one collection can hold different types of
documents. In other words, in a MongoDB database, a single
collection can hold multiple documents, and these documents may
differ in their number of fields, content, and size. Unlike rows in
relational databases, one document need not be similar to another.
This gives MongoDB great flexibility.
• Document Oriented: In MongoDB, all the data is stored in documents
instead of tables as in an RDBMS. In these documents, the data is stored in
fields (key-value pairs) instead of rows and columns, which makes the data
much more flexible in comparison to an RDBMS. Each document
contains its own unique object id.
• Indexing: In a MongoDB database, fields in the documents can be indexed
with primary and secondary indices, which makes it easier and faster
to get or search data from the pool of data. If the data is not indexed,
the database must check each document against the specified query, which
takes a lot of time and is not efficient.
• Scalability: MongoDB provides horizontal scalability with the help of
sharding. Sharding means distributing data across multiple servers: a large
amount of data is partitioned into data chunks using the shard key, and
these data chunks are evenly distributed across shards that reside on
many physical servers. New machines can also be added to a running
database.
• Replication: MongoDB provides high availability and redundancy with the
help of replication. It creates multiple copies of the data and sends these
copies to different servers so that if one server fails, the data can be
retrieved from another server.
• Aggregation: It allows you to perform operations on grouped data and get
a single or computed result. It is similar to the SQL GROUP BY clause.
It provides three kinds of aggregation: the aggregation pipeline,
the map-reduce function, and single-purpose aggregation methods.
• High Performance: MongoDB offers high performance and data
persistence compared to other databases, due to features like
scalability, indexing, replication, etc.
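The effect of an index can be illustrated in Python by comparing a full scan with a lookup table built in advance. This is a sketch under simplifying assumptions: a dict stands in for the ordered index structure a real database would use, and scan, build_index and the users data are hypothetical.

```python
from collections import defaultdict

users = [
    {"_id": 1, "name": "Asha", "city": "Pune"},
    {"_id": 2, "name": "Ravi", "city": "Delhi"},
    {"_id": 3, "name": "Kiran", "city": "Pune"},
]


def scan(collection, field, value):
    """Without an index: examine every document in the collection."""
    return [doc["_id"] for doc in collection if doc.get(field) == value]


def build_index(collection, field):
    """Secondary index: maps each field value to the _ids of matching documents."""
    index = defaultdict(list)
    for doc in collection:
        if field in doc:
            index[doc[field]].append(doc["_id"])
    return index


city_index = build_index(users, "city")
print(scan(users, "city", "Pune"))   # touches all three documents
print(city_index["Pune"])            # one lookup, same answer
```

The index costs extra memory and must be updated on every write, which is why databases index selected fields rather than everything.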
Advantages of MongoDB :
It is a schema-less NoSQL database. You need not design the schema of the
database when you are working with MongoDB.
It does not require join operations, since related data can be nested in one document.
It provides great flexibility to the fields in the documents.
It can contain heterogeneous data.
It provides high performance, availability, and scalability.
It supports geospatial data efficiently.
It is a document-oriented database and the data is stored in BSON
documents.
It also supports multi-document ACID transactions (starting from MongoDB
4.0).
Because queries are not built as SQL strings, it is not exposed to classic SQL injection.
It is easily integrated with Big Data Hadoop.
Disadvantages of MongoDB :
It uses a lot of memory for data storage.
You are not allowed to store more than 16MB of data in a single document.
The nesting of data in BSON is also limited: you are not allowed to nest data more
than 100 levels deep.
Introduction to Cassandra
Cassandra is an open-source, distributed, wide-column NoSQL database
management system designed to handle large amounts of data across many
commodity servers, providing high availability with no single point of failure. It
is written in Java and developed by the Apache Software Foundation. Avinash Lakshman
and Prashant Malik initially developed Cassandra at Facebook to power the
Facebook inbox search feature. Facebook released Cassandra as an open-source
project on Google Code in July 2008. In March 2009 it became an Apache Incubator
project, and in February 2010 it became a top-level project. Its
technical features have made Cassandra popular.
Apache Cassandra is used to manage very large amounts of structured data spread
out across the world. It provides a highly available service with no single point of
failure. Listed below are some key points about Apache Cassandra:
It is scalable, fault-tolerant, and offers tunable consistency.
It is a column-oriented database.
Its distributed design is based on Amazon’s Dynamo and its data model on Google’s
Bigtable.
It was created at Facebook and it differs sharply from relational database
management systems.
Cassandra implements a Dynamo-style replication model with no single point of
failure, but adds a more powerful “column family” data model. Cassandra is
used by some of the biggest companies, such as Facebook, Twitter, Cisco,
Rackspace, eBay, Netflix, and more. The design goal of Cassandra is to handle big
data workloads across multiple nodes without any single point of failure. Cassandra
has a peer-to-peer distributed system across its nodes, and data is distributed among
all the nodes of the cluster. All the nodes in a Cassandra cluster play the same
role. Each node is independent and at the same time interconnected to the other nodes.
Each node in a cluster can accept read and write requests, regardless of where the
data is actually located in the cluster. When a node goes down, read/write requests
can be served from other nodes in the network.
Features of Cassandra:
Cassandra has become popular because of its technical features. Some of
these features are:
1. Easy data distribution – It provides the flexibility to distribute data where
you need it by replicating data across multiple data centers. For example, if there are 5
nodes, say N1, N2, N3, N4 and N5, a partitioning algorithm decides
the token ranges and distributes data accordingly. Each node has a specific token
range within which its data falls.
Figure – Ring structure with token range.
2. Flexible data storage – Cassandra accommodates all possible data formats,
including structured, semi-structured, and unstructured. It can
dynamically accommodate changes to your data structures according to
your needs.
3. Elastic scalability – Cassandra is highly scalable and allows you to add more
hardware to accommodate more customers and more data as
required.
4. Fast writes – Cassandra was designed to run on cheap commodity
hardware. Cassandra performs very fast writes and can store
hundreds of terabytes of data without sacrificing read efficiency.
5. Always-on architecture – Cassandra has no single point of failure and is
continuously available for business-critical applications that can’t afford a
failure.
6. Fast linear-scale performance – Cassandra is linearly scalable, so
throughput increases as you increase the number of nodes in the
cluster. It maintains a quick response time.
7. Transaction support – Cassandra offers atomic, isolated and durable writes
at the level of a single partition, with tunable consistency, rather than full
ACID (Atomicity, Consistency, Isolation, Durability) transactions in the relational sense.
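The token-range idea from the first feature (five nodes N1 to N5, each owning a range of the ring) can be sketched in Python. The ring size of 100, the token values and the use of CRC32 as the hash are all assumptions chosen to keep the example small; Cassandra's real partitioners use a much larger token space and a different hash function.

```python
import bisect
import zlib

RING_SIZE = 100  # illustrative token space 0..99

# Each node owns the tokens up to and including its own token.
node_tokens = [(19, "N1"), (39, "N2"), (59, "N3"), (79, "N4"), (99, "N5")]
tokens = [token for token, _ in node_tokens]


def token_for(partition_key):
    """Hash the partition key onto the ring (CRC32 is only a stand-in hash)."""
    return zlib.crc32(partition_key.encode("utf-8")) % RING_SIZE


def owner_of_token(token):
    """Return the first node clockwise whose token is >= the given token."""
    position = bisect.bisect_left(tokens, token) % len(tokens)
    return node_tokens[position][1]


for key in ("user:1000", "user:1001", "user:1002"):
    t = token_for(key)
    print(key, "-> token", t, "->", owner_of_token(t))
```

Because ownership depends only on the hash of the key, any node can work out where a row lives without consulting a central directory.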
Architecture of Apache Cassandra
Fueled by the internet revolution, mobile
devices, and e-commerce, modern applications have outgrown relational
databases. Out of necessity, a new generation of databases has emerged
to address large-scale, globally distributed data management challenges.
Cassandra powers online services and mobile backends for some of the
world’s most recognizable brands, including Apple, Netflix, and Facebook.
In this section we will describe the following components of Apache
Cassandra.
Basic Terminology:
Node
Data center
Cluster
Operations:
Read Operation
Write Operation
Storage Engine:
CommitLog
Memtables
SSTables
Data Replication Strategies
Let’s discuss these one by one.
Basic Terminology:
1. Node:
A node is the basic component in Apache Cassandra. It is the place where
data is actually stored. For example, in the diagram the node with IP address
10.0.0.7 contains data (a keyspace which contains one or more tables).
Figure – Node
2. Data Center:
A data center is a collection of nodes. For example:
DC = N1 + N2 + N3 ….
DC: Data Center
N1: Node 1
N2: Node 2
N3: Node 3
3. Cluster:
A cluster is a collection of many data centers. For
example:
C = DC1 + DC2 + DC3….
C: Cluster
DC1: Data Center 1
DC2: Data Center 2
DC3: Data Center 3
Figure – Node, Data center, Cluster

Operations:
1. Read Operation:
In a read operation there are three types of read requests
that a coordinator can send to a replica. The node that
accepts the client request is called the coordinator for
that particular operation.
Step-1: Direct Request:
The coordinator node sends the read request
to one of the replicas.
Step-2: Digest Request:
The coordinator contacts the replicas
specified by the consistency level. For example,
CONSISTENCY TWO simply means that any two nodes in
the data center will acknowledge.
Step-3: Read Repair Request:
If the data is not consistent across
the nodes, a background read repair request is initiated
that makes sure that the most recent data is available
across the nodes.
2. Write Operation:
Step-1:
As soon as a write request is received, it
is first written to the commit log to make sure that the data is
saved.
Step-2:
The data is then also written to the
MemTable, which holds the data until it gets full.
Step-3:
When the MemTable reaches its threshold, the data is flushed to
an SSTable.
Figure – Write Operation in Cassandra

Storage Engine:
1. Commit log:
The commit log is the first entry point when writing to disk or
the Mem-table. The
purpose of the commit log in Apache Cassandra is to resolve sync
issues if a data node is down.
2. Mem-table:
After the data is written to the commit log, it is
written to the Mem-table. Data is held in the Mem-table
temporarily.
3. SSTable:
Once the Mem-table reaches a certain threshold, the data is
flushed to an SSTable file on disk.
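The commit log, Mem-table and SSTable sequence can be mimicked with Python lists and dicts. This is a toy model only: the class ToyStorageEngine, its memtable_limit parameter and the tiny threshold are invented for the example, and details of the real engine such as commit log cleanup and compaction are left out.

```python
class ToyStorageEngine:
    """Commit log -> memtable -> SSTable, reduced to Python lists and dicts."""

    def __init__(self, memtable_limit=3):
        self.commit_log = []      # append-only record of every write
        self.memtable = {}        # in-memory table holding recent writes
        self.sstables = []        # sorted, never-modified tables "on disk"
        self.memtable_limit = memtable_limit

    def write(self, key, value):
        self.commit_log.append((key, value))   # step 1: log the write first
        self.memtable[key] = value             # step 2: then update the memtable
        if len(self.memtable) >= self.memtable_limit:
            self.flush()                       # step 3: flush at the threshold

    def flush(self):
        self.sstables.append(sorted(self.memtable.items()))
        self.memtable = {}

    def read(self, key):
        if key in self.memtable:
            return self.memtable[key]
        for table in reversed(self.sstables):  # newest SSTable first
            for stored_key, value in table:
                if stored_key == key:
                    return value
        return None


engine = ToyStorageEngine(memtable_limit=3)
for i in range(4):
    engine.write(f"k{i}", i * 10)
print(len(engine.commit_log), len(engine.sstables), engine.memtable)
```

Writes stay fast because each one is an append to the log plus an in-memory update; the slower work of producing sorted files happens only at flush time.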
Data Replication Strategy:
Replication is used for backup, to ensure there is no single point of
failure. Cassandra uses replication to achieve
high availability and durability. Each data item is replicated at
N hosts, where N is the replication factor configured
“per-instance”.
There are two types of replication strategy: Simple Strategy
and Network Topology Strategy. These are explained
below.
1. Simple Strategy:
This strategy allows a single integer RF
(replication_factor) to be defined. It determines the number
of nodes that should contain a copy of each row. For
example, if replication_factor is 2, then two different nodes
should store a copy of each row. It treats all nodes
identically, ignoring any configured datacenters or
racks.
CQL (Cassandra Query Language) query for Simple Strategy. A
keyspace is created using a CREATE KEYSPACE statement:
create_keyspace_statement ::=
CREATE KEYSPACE [ IF NOT EXISTS ]
keyspace_name WITH options
For instance:
CREATE KEYSPACE User_data
WITH replication = {'class': 'SimpleStrategy',
'replication_factor' : 2};
To check the keyspace schema, use the following CQL query.
DESCRIBE KEYSPACE User_data
Pictorial Representation of Simple Strategy.
Figure – Simple Strategy
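Replica placement under Simple Strategy can be sketched in Python as "the primary node plus the next nodes clockwise on the ring". The five-node ring and the function simple_strategy_replicas are assumptions for illustration; choosing the primary node from the row's token is taken as given here.

```python
ring = ["N1", "N2", "N3", "N4", "N5"]   # nodes in ring order (illustrative)


def simple_strategy_replicas(primary, replication_factor):
    """Primary node plus the next nodes clockwise, ignoring datacenters and racks."""
    if replication_factor > len(ring):
        raise ValueError("replication_factor cannot exceed the number of nodes")
    start = ring.index(primary)
    return [ring[(start + i) % len(ring)] for i in range(replication_factor)]


print(simple_strategy_replicas("N2", 2))
print(simple_strategy_replicas("N5", 3))   # wraps around the end of the ring
```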

2. Network Topology Strategy:
This strategy allows a replication factor to be specified for each datacenter in
the cluster. Even if your cluster only uses a single datacenter, this strategy should
be preferred over SimpleStrategy, to make it easier to add new physical or virtual
datacenters to the cluster later.
CQL (Cassandra Query Language) query for Network Topology Strategy.

CREATE KEYSPACE User_data
WITH replication = {'class': 'NetworkTopologyStrategy', 'DC1' : 2, 'DC2' : 3}
AND durable_writes = false;
To check the keyspace schema, use the following CQL query.
DESCRIBE KEYSPACE User_data
Pictorial Representation of Network Topology Strategy.
Figure – Network Topology Strategy

Table Structure in Cassandra:
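USE User_data;

CREATE TABLE User_table (
User_id int,
User_name text,
User_add text,
User_phone text,
PRIMARY KEY (User_id)
);

-- Rows go into the table User_table; columns left out of an INSERT stay empty.
INSERT INTO User_table (User_id, User_name, User_add, User_phone)
VALUES (1000, 'Ashish', 'Noida', '8077178539');
INSERT INTO User_table (User_id, User_name, User_add)
VALUES (1001, 'Ashish Gupta', 'Bangalore');
INSERT INTO User_table (User_id, User_name)
VALUES (1002, 'Abi');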
Output:
Figure – Table Structure

Application of Apache Cassandra:
Some of the application use cases that Cassandra excels in include:
Real-time, big data workloads
Time series data management
High-velocity device data consumption and analysis
Media streaming management (e.g., music, movies)
Social media (i.e., unstructured data) input and analysis
Online web retail (e.g., shopping carts, user transactions)
Real-time data analytics
Online gaming (e.g., real-time messaging)
Software as a Service (SaaS) applications that utilize web services
Online portals (e.g., healthcare provider/patient interactions)
Most write-intensive systems
Redis
Introduction to Redis
Redis is an in-memory data structure store that is used for fast access to data. It is used
to store data that needs to be accessed frequently and quickly. It is not typically used for storing
large amounts of data. If you want to store and retrieve large amounts of data, you
need to use a database such as MongoDB or MySQL. Redis provides a
variety of data structures such as sets, strings, hashes, and lists.
The Redis server is a program that runs and stores data in memory.
You can connect to that server and use it to store and retrieve data quickly.
Because the data lives in memory, Redis is usually not the primary persistent store: unless
persistence is configured, data can be lost if the system crashes.
Redis is scalable, as you can run multiple instances of the server.
It is often used as a cache that stores data temporarily and provides faster
access to frequently used data.
When to use Redis Server?
Suppose you have a MySQL database and you are constantly querying it.
Each query reads data from secondary storage, computes the
result, and returns it.
If the data in the database is not changing much, you can store the results
of the query in the Redis server. Then, instead of querying the database,
which may take 100-1000 milliseconds, you can check whether
the result of the query is already available in Redis and return it,
which is much faster because it is already in
memory.
Note: In a messaging app, Redis can be used to store the last five messages
that the user has sent and received, using the built-in list data structure
provided in Redis.
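The caching pattern described here can be sketched in Python without a Redis server. In this illustration a plain dict stands in for Redis and a deque with a maximum length stands in for a capped Redis list; query_database, get_user and the key format are hypothetical names, not part of any Redis client library.

```python
from collections import deque

cache = {}      # stand-in for a Redis server
db_calls = 0    # counts how often the slow database is queried


def query_database(user_id):
    """Hypothetical slow query against the primary database."""
    global db_calls
    db_calls += 1
    return {"user_id": user_id, "name": f"user-{user_id}"}


def get_user(user_id):
    key = f"user:{user_id}"
    if key in cache:               # cache hit: skip the database
        return cache[key]
    result = query_database(user_id)
    cache[key] = result            # cache miss: store the result for next time
    return result


get_user(7)
get_user(7)                        # served from the cache
print(db_calls)

# Keep only the last five messages, like a capped list.
recent = deque(maxlen=5)
for n in range(1, 8):
    recent.append(f"message {n}")
print(list(recent))
```

A real deployment also has to decide when cached results expire or are invalidated, which is why this works best for data that does not change often.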
Advantages of Redis Server
• High Performance:
Redis excels in terms of performance due to its in-memory
nature. It can deliver extremely fast read and write operations,
making it suitable for scenarios where low latency is critical.
• Simple and Easy-to-Use API:
Redis has a straightforward API that consists of simple and
intuitive commands, making it easy for developers to use and
integrate into their applications.
• Data Structures:
Redis supports a variety of data structures, including strings,
lists, sets, hashes, and more. This versatility allows developers to
model their data more effectively, choosing the right data
structure for the task at hand.
• Atomic Operations:
Redis supports atomic operations on these data structures,
making it a great fit for scenarios that require consistency and
reliability in multi-step operations.
• Persistence Options:
While Redis is an in-memory database, it provides persistence
options such as snapshots and append-only files. This allows
users to configure the level of durability needed for their specific use case.
• Replication and High Availability:
Redis supports master-slave replication, enabling the creation
of replicas of the master server. This provides high availability and fault
tolerance in case the master node fails.
Disadvantages of Redis Server
• Persistence Mechanism Complexity:
Redis is an in-memory database, and while it supports
persistence, the mechanisms for achieving this (such as
snapshots and append-only files) can be complex and may
impact performance.
• Limited Query Capability:
Redis is not a full-fledged relational database and lacks the
complex querying capabilities of traditional databases. It
primarily operates on key-value pairs and offers basic data
structures like strings, lists, sets, and hashes.
• Memory Usage:
Since Redis stores all its data in memory, the amount of data it
can handle is limited by the available system memory. Large
datasets may require significant memory resources, which can
be a potential constraint.
• Single-Threaded Nature:
Redis traditionally uses a single-threaded event-loop
architecture. While this design simplifies certain aspects of the
system, it may limit performance on multi-core systems.
However, recent versions of Redis have introduced
multi-threading in some parts to address this limitation.
• No Built-in Security Features:
Redis initially lacked built-in security features, and it was
recommended to be run in trusted environments.
While newer versions include authentication mechanisms, it’s
essential to configure and manage these security features
properly.
KEY VALUE DATABASES:
A key-value data model or database is also referred to as a key-value
store. It is a non-relational type of database in which an
associative array is the basic data model: each individual key is
linked with exactly one value in a collection. Keys are unique
identifiers for the values, and a value can be any kind of entity.
A collection of key-value pairs stored as separate records is called
a key-value database, and it has no predefined structure.
How do key-value databases work?
A key-value database associates a value, which may be anything from a simple string
to a complex entity, with a key that is used to keep track of that entity. As in many
programming paradigms, a key-value database resembles a map object, array, or
dictionary, except that it is stored persistently and managed by a DBMS.
The key-value store uses an efficient and compact index structure so that it can
find a value by its key quickly and reliably. For example, Redis is a key-value store
used to track lists, maps, sets, and primitive types (which are simple data
structures) in a persistent database. By supporting only a predetermined number of
value types, Redis can expose a very simple interface for querying and manipulating
values, and when configured properly it is capable of high throughput.
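The map-like behaviour described above can be sketched in a few lines of Python. This is a minimal in-memory illustration, not how any particular product is implemented; the class name KeyValueStore and the sample keys are made up for this example, and a real key-value database would add persistence and networking on top.

```python
class KeyValueStore:
    """Toy key-value store: each key maps to exactly one value."""

    def __init__(self):
        self._index = {}  # hash-based index: key -> value

    def put(self, key, value):
        # Storing under an existing key replaces the previous value.
        self._index[key] = value

    def get(self, key, default=None):
        return self._index.get(key, default)

    def delete(self, key):
        # Returns True if the key existed, False otherwise.
        if key in self._index:
            del self._index[key]
            return True
        return False


kv = KeyValueStore()
kv.put("session:42", {"user": "anay", "cart": ["book", "pen"]})
kv.put("page:home", "<html>...</html>")
kv.put("session:42", {"user": "anay", "cart": []})  # same key, new value

print(kv.get("session:42"))
print(kv.get("missing"))
print(kv.delete("page:home"))
```

Note that the only way to reach a value is through its key: there is no operation for "find all sessions with an empty cart" short of scanning every entry.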
When to use a key-value database:
Here are a few situations in which you can use a key-value database:
• Storing user session attributes in an online app such as finance or gaming, which
  requires real-time random data access.
• As a caching mechanism for frequently accessed data or for key-based designs.
• When the application is built on queries that are based on keys.
Features:
• It is one of the simplest kinds of NoSQL data models.
• Key-value databases use simple functions for storing, getting, and removing data.
• Key-value databases typically have no query language.
• Built-in redundancy makes this database more reliable.
Advantages:
• It is very easy to use. Because the database is so simple, a value can be of any
  type, or even of different types when required.
• Its response time is fast due to its simplicity, provided that the surrounding
  environment is well built and optimized.
• Key-value store databases are scalable vertically as well as horizontally.
• Built-in redundancy makes this database more reliable.
Disadvantages:
• As key-value databases have no common query language, queries cannot be ported
  from one database to a different database.
• The key-value store is not a refined query tool. You cannot query the database
  without a key.
Some examples of key-value databases:
Here are some popular key-value databases which are widely used:
• Couchbase: It permits SQL-style querying and searching for text.
• Amazon DynamoDB: One of the most widely used key-value databases, trusted by a
  large number of users. It can easily handle a large number of requests every day
  and it also provides various security options.
• Riak: It is a distributed key-value database used to develop applications.
• Aerospike: It is an open-source, real-time database that works with billions of
  transactions.
• Berkeley DB: It is a high-performance, open-source database providing scalability.
Document Data Model:
The Document Data Model differs from other data models because it stores
data in JSON, BSON, or XML documents. In this data model, documents can be nested
inside other documents, and any particular element can be indexed to
run queries faster. Documents are often stored and retrieved in a form that is
close to the data objects used in applications, which means
very little translation is required to use the data in an application. JSON is a
native format that is often used both to store and to query the data.
In the document data model, each document consists of key-value pairs. Below is an
example:
{
"Name" : "Yashodhra",
"Address" : "Near Patel Nagar",
"Email" : "yahoo123@yahoo.com",
"Contact" : "12345"
}
Working of Document Data Model:
This is a semi-structured data model in which a record and the data associated
with it are stored together in a single document, which means the data is not
completely unstructured. The main point is that the data is stored in documents.
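To make the idea of a semi-structured collection concrete, here is a small sketch that uses plain Python dictionaries as documents. It is only an illustration: the second document, the helper names find and build_index, and the field values are invented for this example and do not correspond to the API of any particular document database.

```python
import json

collection = [
    {"Name": "Yashodhra", "Address": "Near Patel Nagar",
     "Email": "yahoo123@yahoo.com", "Contact": "12345"},
    # A second, made-up document with different fields and a nested sub-document.
    {"Name": "Tanmay", "Courses": ["DBMS", "Networks"],
     "Address": {"City": "Pune", "Pin": "411001"}},
]


def find(docs, field, value):
    """Return the documents whose top-level `field` equals `value`."""
    return [doc for doc in docs if doc.get(field) == value]


def build_index(docs, field):
    """Map each value of `field` to the positions of the documents that hold it."""
    index = {}
    for position, doc in enumerate(docs):
        if field in doc:
            index.setdefault(doc[field], []).append(position)
    return index


matches = find(collection, "Name", "Tanmay")
name_index = build_index(collection, "Name")

print(json.dumps(matches, indent=2))
print(name_index)
```

The two documents share only the Name field, yet both live in the same collection, and the index on Name lets a lookup skip documents that cannot match.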
Features:
• Document Type Model: Data is stored in documents rather than tables or graphs,
  so it maps easily onto objects in many programming languages.
• Flexible Schema: The overall schema is very flexible, since not all documents in
  a collection need to have the same fields.
• Distributed and Resilient: Document data models are highly distributed, which is
  what enables horizontal scaling and distribution of data.
• Manageable Query Language: The query language allows developers to perform CRUD
  (Create, Read, Update, Destroy) operations on the data model.
Examples of Document Data Models :
Amazon DocumentDB
MongoDB
Cosmos DB
ArangoDB
Couchbase Server
CouchDB
Advantages:
• Schema-less: These models are very good at retaining existing data at massive
  volumes because there are no restrictions on the format and structure
  of data storage.
• Faster creation of document and maintenance: Creating a document is very simple,
  and maintenance requires almost no effort.
• Open formats: It has a very simple build process that uses XML, JSON,
  and related formats.
• Built-in versioning: As documents grow in size, they may also grow in
  complexity. Built-in versioning decreases conflicts.
Disadvantages:
• Weak Atomicity: Document databases have traditionally lacked support for
  multi-document ACID transactions. A change involving two collections then
  requires two separate queries, one for each collection, which breaks atomicity
  requirements.
• Consistency Check Limitations: One can search collections and documents that are
  not connected to, say, an author collection, but doing so might degrade
  database performance.
• Security: Many web applications today lack security, which results in the
  leakage of sensitive data. This is a point of concern, and one must pay
  attention to web app vulnerabilities.
Applications of Document Data Model:
• Content Management: These data models are widely used in building video
  streaming platforms, blogs, and similar services, because each item is stored as
  a single document and the database is much easier to maintain as the service
  evolves over time.
• Book Database: These data models are useful for building book databases
  because they let us nest data.
• Catalog: These data models are widely used for storing and reading catalog
  files because reads stay fast even when catalogs have thousands of
  attributes stored.
• Analytics Platform: These data models are widely used in analytics platforms.
Columnar Data Model of NoSQL
NoSQL databases differ from SQL databases because they use a data model with a
different structure from the row-and-column table model used by
relational database management systems (RDBMS). NoSQL databases use a flexible
schema model that is designed to scale horizontally across many servers and is
used for large volumes of data.
Basically, a relational database stores data in rows and also reads the data row by
row, whereas a column store is organized as a set of columns. So if someone wants to
run analytics on a small number of columns, those columns can be read directly
without consuming memory on unwanted data. The values in a column are of
the same type and so benefit from more efficient compression, which makes reads
faster. Examples of Columnar Data Model: Cassandra and Apache HBase.
Working of Columnar Data Model:
The Columnar Data Model organizes information into columns instead of rows.
In other respects, these columns function much like tables in
relational databases. Because it is a type of NoSQL database, this data model
is also much more flexible. The example below will help in
understanding the Columnar data model:
Row-Oriented Table:
S.No.  Name       Course  Branch       ID
01.    Tanmay     B-Tech  Computer     2
02.    Abhishek   B-Tech  Electronics  5
03.    Samriddha  B-Tech  IT           7
04.    Aditi      B-Tech  E & TC       8
Column-Oriented Table:
S.No.  Name       ID
01.    Tanmay     2
02.    Abhishek   5
03.    Samriddha  7
04.    Aditi      8
S.No.  Course  ID
01.    B-Tech  2
02.    B-Tech  5
03.    B-Tech  7
04.    B-Tech  8
S.No.  Branch       ID
01.    Computer     2
02.    Electronics  5
03.    IT           7
04.    E & TC       8
Columnar Data Model uses the concept of keyspace, which is like a schema
in relational models.
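The difference between the two layouts in the tables above can be sketched with ordinary Python lists. This is only an illustration of the storage idea, assuming the same four student records; the helper names to_columns and run_length_encode are invented here, and real column stores such as Cassandra or HBase organize data quite differently on disk.

```python
# Row-oriented layout: one record per student.
rows = [
    {"name": "Tanmay", "course": "B-Tech", "branch": "Computer", "id": 2},
    {"name": "Abhishek", "course": "B-Tech", "branch": "Electronics", "id": 5},
    {"name": "Samriddha", "course": "B-Tech", "branch": "IT", "id": 7},
    {"name": "Aditi", "course": "B-Tech", "branch": "E & TC", "id": 8},
]


def to_columns(records):
    """Pivot a list of records into one list of values per field."""
    columns = {}
    for record in records:
        for field, value in record.items():
            columns.setdefault(field, []).append(value)
    return columns


def run_length_encode(values):
    """Compress consecutive repeats into [value, count] pairs."""
    encoded = []
    for value in values:
        if encoded and encoded[-1][0] == value:
            encoded[-1][1] += 1
        else:
            encoded.append([value, 1])
    return encoded


columns = to_columns(rows)

# An aggregate over one field reads only that column, not whole records.
id_total = sum(columns["id"])

# A column of same-typed, repetitive values compresses well.
course_encoded = run_length_encode(columns["course"])

print(id_total)
print(course_encoded)
```

Summing the ID column never touches names or branches, and the Course column collapses to a single value with a count, which is the intuition behind fast aggregation and good compression.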
Advantages of Columnar Data Model:
• Well structured: Because these data models compress well, they are
  well organized in terms of storage.
• Flexibility: There is a large amount of flexibility, as the columns need not
  look like each other, which means one can add new and different columns
  without disrupting the whole database.
• Aggregation queries are fast: Aggregation queries are quite fast because the
  majority of the information needed is stored in a single column. An example
  would be adding up the total number of students enrolled in one year.
• Scalability: It can be spread across large clusters of machines, even
  numbering in the thousands.
• Load Times: Since a row table can easily be loaded in a few seconds,
  load times are excellent.
Disadvantages of Columnar Data Model:
• Designing indexing schema: Designing an effective, working schema is
  difficult and very time-consuming.
• Suboptimal data loading: Incremental data loading is suboptimal and
  should be avoided, but this might not be an issue for some users.
• Security vulnerabilities: If security is one of the priorities, it must be
  noted that the Columnar data model lacks built-in security features. In this
  case, one should look into relational databases.
• Online Transaction Processing (OLTP): OLTP applications are also not
  compatible with columnar data models because of the way data is stored.
Applications of Columnar Data Model:
• Columnar Data Model is widely used in various blogging platforms.
• It is used in content management systems like WordPress, Joomla, etc.
• It is used in systems that maintain counters.
• It is used in systems that require heavy write requests.
• It is used in services that have expiring usage.
Graph Database
A graph database (GDB) is a database that uses graph structures for storing
data. It uses nodes, edges, and properties instead of tables or documents
to represent and store data. The edges represent relationships between
the nodes. This helps in retrieving data more easily and, in many cases,
with one operation. Graph databases are commonly classified as NoSQL
databases.
Representation:
The graph database is based on graph theory. The data is stored in the nodes
of the graph, and the relationships between the data are represented by
the edges between the nodes.
Figure – Graph representation of data
When do we need Graph Database?
• It solves many-to-many relationship problems
If we have friends of friends and similar chains, these are many-to-many
relationships. A graph database is used when the equivalent query in a relational
database is very complex.
• When relationships between data elements are more important
For example, a profile contains some specific information, but the major selling
point is the relationship between the different profiles, that is, how you get
connected within a network.
In the same way, a graph database may hold multiple user data elements, but the
relationships between them are the deciding factor for all the data elements
stored inside the graph database.
• Low latency with large-scale data
When you add lots of relationships in a relational database, the data sets become
huge, and queries become more complex and take longer than usual. A graph
database, however, is specifically designed for this purpose, and one can query
relationships with ease.
Why do Graph Databases matter? Because graphs are good at handling
relationships, some databases store data in the form of a graph.
Example: We have a social network in which five friends are all connected. These
friends are Anay, Bhagya, Chaitanya, Dilip, and Erica. A users table that
stores their personal information may look something like this:
id  first name  last name  email                  phone
1   Anay        Agarwal    anay@example.net       555-111-5555
2   Bhagya      Kumar      bhagya@example.net     555-222-5555
3   Chaitanya   Nayak      chaitanya@example.net  555-333-5555
4   Dilip       Jain       dilip@example.net      555-444-5555
5   Erica       Emmanuel   erica@example.net      555-555-5555
Now, we will also need another table to capture the friendship/relationship
between users/friends. Our friendship table will look something like this:
user_id friend_id
1 2
1 3
1 4
1 5
2 1
2 3
2 4
2 5
3 1
3 2
3 4
3 5
4 1
4 2
4 3
4 5
5 1
5 2
5 3
5 4
We will avoid going deep into database (primary key and foreign key) theory.
Instead, just assume that the friendship table uses the ids of both friends. Assume
that our social network has a feature that allows every user to see the
personal information of his/her friends. So, if Chaitanya were requesting
this information, she would need information about Anay, Bhagya, Dilip,
and Erica. We will approach this problem the traditional way (relational database).
We must first identify Chaitanya’s id in the users table:
id  first name  last name  email                  phone
3   Chaitanya   Nayak      chaitanya@example.net  555-333-5555
Now, we look for all tuples in the friendship table where the user_id is 3. The
resulting relation would be something like this:
user_id friend_id
3 1
3 2
3 4
3 5
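The relational lookup just described can be tried out with Python's built-in sqlite3 module. This is a sketch under stated assumptions: the table and column names (users, friendship, first_name, and so on) follow the tables above but are otherwise this example's own choice, the email and phone columns are left out for brevity, and the index name is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE users (id INTEGER PRIMARY KEY, first_name TEXT, last_name TEXT)"
)
conn.execute("CREATE TABLE friendship (user_id INTEGER, friend_id INTEGER)")
# The index lets the database locate a user's rows without scanning the table.
conn.execute("CREATE INDEX idx_friendship_user ON friendship (user_id)")

users = [
    (1, "Anay", "Agarwal"),
    (2, "Bhagya", "Kumar"),
    (3, "Chaitanya", "Nayak"),
    (4, "Dilip", "Jain"),
    (5, "Erica", "Emmanuel"),
]
conn.executemany("INSERT INTO users VALUES (?, ?, ?)", users)

# Everyone is friends with everyone else, as in the friendship table above.
pairs = [(a, b) for a in range(1, 6) for b in range(1, 6) if a != b]
conn.executemany("INSERT INTO friendship VALUES (?, ?)", pairs)

# Step 1: find Chaitanya's id. Step 2: find her rows in friendship.
# Step 3: join back to users to get each friend's details.
friend_names = [
    row[0]
    for row in conn.execute(
        """
        SELECT f.first_name
        FROM users AS u
        JOIN friendship AS fr ON fr.user_id = u.id
        JOIN users AS f ON f.id = fr.friend_id
        WHERE u.first_name = ?
        ORDER BY f.id
        """,
        ("Chaitanya",),
    )
]
print(friend_names)
conn.close()
```

Each friend lookup goes through the friendship table and then back into users, which is the indirection that the graph approach avoids.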
Now, let’s analyse the time taken in this relational database approach. Each lookup
takes approximately log(N) steps, where N is the number of tuples in the friendship
table (the number of relations), because the database maintains the rows in the
order of the ids. So, in general, for M queries we have a time complexity
of M*log(N). Had we used a graph database approach instead, the total time
complexity would have been O(N), because once we have located Chaitanya in the
database, we need only a single step to find her friends. Here is how our
query would be executed:
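As a rough illustration of the single-step lookup, the sketch below stores each person's friends directly on that person's node as an adjacency set. It is a toy in-memory model, not the storage engine or query language of any real graph database; the names people, friends_of, friends, and friends_of_friends are invented for this example.

```python
people = {1: "Anay", 2: "Bhagya", 3: "Chaitanya", 4: "Dilip", 5: "Erica"}

# Each node keeps direct references to its neighbours (its outgoing edges).
# Here everyone is connected to everyone else, as in the example above.
friends_of = {pid: {other for other in people if other != pid} for pid in people}


def friends(person_id):
    """One hop: follow the edges stored on the node itself."""
    return sorted(people[fid] for fid in friends_of[person_id])


def friends_of_friends(person_id):
    """Two hops: neighbours of neighbours, minus the person and direct friends."""
    direct = friends_of[person_id]
    reachable = set()
    for fid in direct:
        reachable |= friends_of[fid]
    return sorted(people[pid] for pid in reachable - direct - {person_id})


print(friends(3))
print(friends_of_friends(3))
```

Once Chaitanya's node (id 3) is located, her friends are read straight from her own edge set with no separate friendship table to search. The two-hop result is empty in this data set only because everyone is already a direct friend.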
Advantages: The graph model handles frequent schema changes, large volumes of data,
real-time query response requirements, and more intelligent data activation
requirements.
Disadvantages: Note that graph databases aren’t always the best solution for an
application. We need to assess the needs of the application before deciding on the
architecture.
Limitations of Graph Databases:
• Graph databases may not offer a better choice than other NoSQL
  variations.
• If the application needs to scale horizontally, this may lead to poor
  performance.
• They are not very efficient when all nodes with a given
  parameter need to be updated.
