unit-4 DBMS
UNIT-4 DATABASE TECHNOLOGIES
Introduction to NoSQL databases: MongoDB, Cassandra, Redis, key-value stores, document stores, column-family stores, graph databases
What is NoSQL?
A NoSQL database is a non-relational data management system that
does not require a fixed schema. It avoids joins and is easy to scale.
NoSQL databases are mainly used for distributed data stores with very
large data storage needs, such as Big Data and real-time web apps.
For example, companies like Twitter, Facebook, and Google collect
terabytes of user data every single day.
NoSQL stands for “Not Only SQL” or “Not SQL.” Though a better term
would be “NoREL”, the name NoSQL caught on. Carlo Strozzi introduced
the term NoSQL in 1998.
A traditional RDBMS uses SQL syntax to store and retrieve data for
further insights. A NoSQL database system, by contrast, encompasses a
wide range of database technologies that can store structured,
semi-structured, unstructured, and polymorphic data. Let’s understand
NoSQL with a diagram:
Why NoSQL?
The concept of NoSQL databases became popular with Internet giants
like Google, Facebook, and Amazon, which deal with huge volumes of
data. System response time becomes slow when you use an RDBMS
for massive volumes of data.
Features of NoSQL
Non-relational
NoSQL databases never follow the relational model
Never provide tables with flat, fixed-column records
Work with self-contained aggregates or BLOBs
Do not require object-relational mapping or data normalization
No complex features like query languages, query planners, or referential integrity
Schema-free
NoSQL databases are either schema-free or have relaxed schemas
Do not require any definition of the schema of the data
Offer heterogeneous structures of data in the same domain
Figure – NoSQL is Schema-Free
Simple API
Offers easy-to-use interfaces for storing and querying data
APIs allow low-level data manipulation and selection methods
Mostly text-based protocols, typically HTTP REST with JSON
Mostly no standards-based NoSQL query language
Web-enabled databases running as internet-facing services
Distributed
… throughput
Mostly no synchronous replication between distributed nodes
… higher distribution.
Column-based
Column-oriented databases work on columns and are based on Google’s
Bigtable paper. Every column is treated separately, and the values
of a single column are stored contiguously.
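To make the storage difference concrete, here is a minimal Python sketch, assuming a tiny in-memory table with made-up marks; it is not how any particular column store lays out data on disk. It contrasts a row-oriented layout with a column-oriented one and computes the same aggregate from each.

```python
# Hypothetical (name, course, marks) records; the marks are made up for illustration.
rows = [
    ("Tanmay", "B-Tech", 81),
    ("Abhishek", "B-Tech", 74),
    ("Aditi", "B-Tech", 90),
]

# Row-oriented layout: the values of one record sit next to each other.
row_store = [value for row in rows for value in row]
# -> ['Tanmay', 'B-Tech', 81, 'Abhishek', 'B-Tech', 74, 'Aditi', 'B-Tech', 90]

# Column-oriented layout: all values of one column are stored contiguously.
column_store = {
    "name": [row[0] for row in rows],
    "course": [row[1] for row in rows],
    "marks": [row[2] for row in rows],
}


def average_marks_row_store(flat, width=3, col=2):
    # Marks are interleaved with the other columns, so pick every `width`-th value.
    values = flat[col::width]
    return sum(values) / len(values)


def average_marks_column_store(store):
    # Only the contiguous "marks" list is read; other columns are never touched.
    marks = store["marks"]
    return sum(marks) / len(marks)


print(average_marks_row_store(row_store))
print(average_marks_column_store(column_store))
```

Because the column store keeps all marks together, an aggregate over one column reads only that column; storing similar values side by side is also why column stores tend to compress well.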
Document-Oriented
A document-oriented NoSQL DB stores and retrieves data as key-value
pairs, but the value part is stored as a document. The document
is stored in JSON or XML format. The value is understood by the DB
and can be queried.
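The difference between an opaque key-value value and a queryable document can be sketched in plain Python, assuming two dictionaries stand in for the two kinds of store (the keys, names, and cities are hypothetical).

```python
import json

# Key-value store: the value is an opaque byte string the database cannot look inside.
kv_store = {
    "user:1": json.dumps({"name": "Yashodhra", "city": "Pune"}).encode(),
}

# Document store: the value is a document whose fields the database understands.
doc_store = {
    "user:1": {"name": "Yashodhra", "city": "Pune"},
    "user:2": {"name": "Aditi", "city": "Delhi", "phone": "12345"},  # extra field is fine
}


def find_by_field(store, field, value):
    """Return the keys of documents whose `field` equals `value` (full scan)."""
    return [key for key, doc in store.items() if doc.get(field) == value]


print(find_by_field(doc_store, "city", "Delhi"))  # ['user:2']

# With the key-value store, the application must fetch the blob and decode it itself.
print(json.loads(kv_store["user:1"])["city"])  # Pune
```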
In the diagram, on the left you can see rows and columns, and on the
right a document database whose structure is similar to JSON. For the
relational database, you have to know in advance what columns you have,
and so on. For a document database, however, data is stored like a
JSON object. You do not need to define a schema up front, which makes
it flexible.
Graph-Based
A graph database stores entities as well as the relationships among
those entities. Each entity is stored as a node, and each relationship as
an edge. An edge gives a relationship between nodes. Every node and
edge has a unique identifier.
Graph databases are mostly used for social networks, logistics, and spatial
data.
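A minimal sketch of the node/edge model described above, assuming a simple in-memory design (the helper names `add_node`, `add_edge`, and `neighbours`, and the sample data, are hypothetical and not any graph database's API).

```python
import itertools

# Hypothetical in-memory graph: one shared counter gives every node and edge a unique id.
_next_id = itertools.count(1)
nodes = {}      # node_id -> properties
edges = {}      # edge_id -> (source_node_id, relationship, target_node_id)
out_edges = {}  # node_id -> ids of the edges that start at that node


def add_node(**properties):
    node_id = next(_next_id)
    nodes[node_id] = properties
    out_edges[node_id] = []
    return node_id


def add_edge(source, relationship, target):
    edge_id = next(_next_id)
    edges[edge_id] = (source, relationship, target)
    out_edges[source].append(edge_id)
    return edge_id


def neighbours(node_id, relationship):
    """Follow only this node's own outgoing edges with the given label."""
    return [edges[e][2] for e in out_edges[node_id] if edges[e][1] == relationship]


anay = add_node(name="Anay")
bhagya = add_node(name="Bhagya")
warehouse = add_node(name="Warehouse 7", kind="location")
add_edge(anay, "FRIEND_OF", bhagya)
add_edge(bhagya, "DELIVERS_TO", warehouse)

print([nodes[n]["name"] for n in neighbours(anay, "FRIEND_OF")])  # ['Bhagya']
```

Keeping a per-node list of outgoing edges means a neighbour lookup only touches that node's own edges rather than scanning every edge in the graph.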
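CAP Theorem
•Consistency
•Availability
•Partition Tolerance
Consistency: every read returns the most recent write, so all nodes see the same data at the same time.
Availability: every request receives a non-error response, even if it does not reflect the most recent write.
Partition Tolerance: the system keeps operating even when network messages between nodes are lost or delayed.
When a network partition occurs, a distributed system has to give up either consistency or availability.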
Eventual Consistency
The term “eventual consistency” applies when copies of data are kept on
multiple machines to get high availability and scalability. Changes made
to any data item on one machine have to be propagated to the other
replicas, so for a short time some replicas may still return old values.
… change
Eventual consistency means that the system will become consistent
over time.
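The idea can be illustrated with a toy simulation, assuming three replicas named A, B, and C and an explicit `propagate()` step standing in for background replication; it is a sketch of the concept, not any database's replication protocol.

```python
from collections import deque


class EventuallyConsistentStore:
    """Toy model: a write lands on one replica and is copied to the others later.

    Illustrative only; real systems add versioning, conflict resolution, retries, etc.
    """

    def __init__(self, replica_names):
        self.replicas = {name: {} for name in replica_names}
        self.pending = deque()  # (target_replica, key, value) not yet delivered

    def write(self, replica, key, value):
        self.replicas[replica][key] = value
        for other in self.replicas:
            if other != replica:
                self.pending.append((other, key, value))

    def read(self, replica, key):
        return self.replicas[replica].get(key)

    def propagate(self):
        """Deliver every queued update; in a real system this happens over time."""
        while self.pending:
            target, key, value = self.pending.popleft()
            self.replicas[target][key] = value


store = EventuallyConsistentStore(["A", "B", "C"])
store.write("A", "x", 1)
print(store.read("A", "x"), store.read("B", "x"))  # 1 None  (B is still stale)
store.propagate()
print(store.read("B", "x"), store.read("C", "x"))  # 1 1  (replicas have converged)
```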
Advantages of NoSQL
Can be used as Primary or Analytic Data Source
Big Data Capability
No Single Point of Failure
Easy Replication
No Need for Separate Caching Layer
It provides fast performance and horizontal scalability.
Can handle structured, semi-structured, and unstructured data with
equal effect
Disadvantages of NoSQL
No standardization rules
Limited query capabilities
RDBMS databases and tools are comparatively mature
It does not offer some traditional database capabilities, like …
What is MongoDB?
MongoDB is a document-oriented NoSQL database system that provides high
scalability, flexibility, and performance. Unlike standard relational databases,
MongoDB stores data in JSON-like documents (BSON internally). This makes it easy
to work with dynamic and unstructured data. MongoDB is an open-source,
cross-platform database system.
How it works?
Now, let’s see what actually happens behind the scenes. MongoDB is a database
server, and data is stored in databases on that server. In other words, the
MongoDB environment gives you a server that you can start and then create
multiple databases on.
Because MongoDB is a NoSQL database, data is stored in collections and documents.
The database, collections, and documents are related to each other as shown
below:
A MongoDB database contains collections just like a MySQL database
contains tables. You can create multiple databases, and multiple
collections in each database.
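The server → database → collection → document hierarchy can be mimicked with nested Python dictionaries, as a rough analogy only (the database name `school`, the collection names, and the helper functions are hypothetical). With the real PyMongo driver, the corresponding calls would be along the lines of `MongoClient()["school"]["students"].insert_one(doc)` and `.find({"name": "Aditi"})`, which need a running MongoDB server, so they are not used here.

```python
# Rough analogy of MongoDB's server -> database -> collection -> document
# hierarchy using nested dictionaries. All names here are hypothetical.
server = {}


def get_collection(db_name, coll_name):
    # As in MongoDB, a database and a collection are created on first use.
    return server.setdefault(db_name, {}).setdefault(coll_name, [])


students = get_collection("school", "students")
students.append({"_id": 1, "name": "Tanmay", "course": "B-Tech"})
# Documents in the same collection do not need to share the same fields.
students.append({"_id": 2, "name": "Aditi", "branch": "E & TC", "hobbies": ["chess"]})
get_collection("school", "teachers").append({"_id": 1, "name": "Erica"})


def find(collection, query):
    """Return documents whose fields match every key/value pair in `query`."""
    return [doc for doc in collection
            if all(doc.get(key) == value for key, value in query.items())]


print(list(server))  # ['school']
print(list(server["school"]))  # ['students', 'teachers']
print(find(students, {"name": "Aditi"})[0]["branch"])  # E & TC
```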
MongoDB      RDBMS
Database     Database
Collection   Table
Document     Row
Field        Column
Features of MongoDB
Advantages of MongoDB:
It is a schema-less NoSQL database. You do not need to design the schema of the
database when you are working with MongoDB.
It does not rely on join operations; related data is usually embedded in a single document (the aggregation framework does offer a $lookup stage for joins).
It provides great flexibility for the fields in documents.
It can store heterogeneous data.
It provides high performance, availability, and scalability.
It supports geospatial queries efficiently.
Disadvantages of MongoDB:
Introduction to Cassandra
Apache Cassandra is used to manage very large amounts of structured data spread
out across the world. It provides a highly available service with no single point of
failure. Listed below are some points about Apache Cassandra:
It is scalable, fault-tolerant, and consistent (its consistency level is tunable per query).
It is a column-oriented (column-family) database.
Its distributed design is based on Amazon’s Dynamo and its data model on Google’s
Bigtable.
It was created at Facebook, and it differs sharply from relational database
management systems.
Cassandra implements a Dynamo-style replication model with no single point of
failure, but adds a more powerful “column family” data model. Cassandra is
used by some of the biggest companies, such as Facebook, Twitter, Cisco,
Rackspace, eBay, and Netflix. The design goal of Cassandra is to handle big
data workloads across multiple nodes without any single point of failure. Cassandra
uses a peer-to-peer distributed architecture, and data is distributed among
all the nodes of the cluster. All the nodes in a Cassandra cluster play the same
role: each node is independent and, at the same time, interconnected with the other nodes.
Each node in a cluster can accept read and write requests, regardless of where the
data is actually located in the cluster. When a node goes down, read/write requests
can be served by other nodes in the network.
Features of Cassandra:
Cassandra has become popular because of its technical features. Here are some of
them:
1. Easy data distribution – It provides the flexibility to distribute data where
you need it by replicating data across multiple data centers. For example, if there are 5
nodes, say N1, N2, N3, N4, and N5, a partitioning algorithm decides the token ranges
and the data is distributed accordingly. Each node owns a specific token range, and
data whose token falls in that range is stored on it. Let’s look at a diagram for better
understanding.
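The token-range idea can be sketched as follows, assuming a toy ring of the five nodes N1 to N5 with hand-picked tokens between 0 and 99 and MD5 as the hash (Cassandra's default Murmur3 partitioner uses a much larger 64-bit token range). Replicas are placed on the owner and then the next nodes clockwise, which mirrors the idea behind Cassandra's SimpleStrategy; the function names are hypothetical.

```python
import bisect
import hashlib

# Hypothetical 5-node ring. Each node owns the token range that ends at its token:
# N1 owns 0-19, N2 owns 20-39, N3 owns 40-59, N4 owns 60-79, N5 owns 80-99.
# MD5 modulo 100 is a simplification of Cassandra's real 64-bit Murmur3 tokens.
NODE_TOKENS = [19, 39, 59, 79, 99]
NODES = ["N1", "N2", "N3", "N4", "N5"]


def token_for(partition_key):
    digest = hashlib.md5(partition_key.encode()).digest()
    return int.from_bytes(digest, "big") % 100


def owner_index(token):
    # The first node whose token is >= the key's token owns it (wrapping around the ring).
    return bisect.bisect_left(NODE_TOKENS, token) % len(NODES)


def replicas(partition_key, replication_factor=3):
    """Owner first, then the next nodes clockwise, like Cassandra's SimpleStrategy."""
    start = owner_index(token_for(partition_key))
    return [NODES[(start + k) % len(NODES)] for k in range(replication_factor)]


for key in ["user:1", "user:2", "user:3"]:
    print(key, token_for(key), replicas(key))
```

With a replication factor of 3, losing any single node still leaves two copies of every row, which is how the ring avoids a single point of failure.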
Cassandra powers online services and mobile backends for some of the
world’s most recognizable brands, including Apple, Netflix, and Facebook.
Figure – Memtables and SSTables
Data Replication Strategies
Let’s discuss them one by one.
Basic Terminology:
1. Node:
Node is the basic component in Apache Cassandra. It is where data is actually
stored. For example, as shown in the diagram, the node with IP address
10.0.0.7 contains data (a keyspace that contains one or more tables).
Figure – Node
2. Data Center:
A data center is a collection of nodes. For example:
DC = N1 + N2 + N3 + …
DC: Data Center
N1: Node 1
N2: Node 2
N3: Node 3
3. Cluster:
A cluster is a collection of many data centers. For
example:
C = DC1 + DC2 + DC3 + …
C: Cluster
DC1: Data Center 1
DC2: Data Center 2
DC3: Data Center 3
Redis
Introduction to Redis
Redis is an in-memory data structure store that is used for fast access to data. It is used
to store data that needs to be accessed frequently and quickly. It is generally not used for
storing large amounts of data, because the data must fit in memory. If you want to store and
retrieve large amounts of data, you should use a traditional database such as MongoDB or MySQL.
Redis provides a variety of data structures such as strings, lists, sets, and hashes.
The Redis server is a program that runs and stores data in memory.
You connect to that server and use it to store and retrieve data quickly.
For that reason, Redis is usually not relied on as the sole persistent store: unless
persistence is configured, all data is lost if the system crashes.
Redis is scalable, as you can run multiple instances of the server.
It is often used as a cache that stores data temporarily and provides faster
access to frequently used data.
When to use Redis Server?
Consider a MySQL database that you query constantly; each query reads the
data from secondary storage, computes the result, and returns it.
If the data in the database does not change much, you can store the results
of the query in the Redis server. Then, instead of querying the database,
which may take 100-1000 milliseconds, you first check whether the result
of the query is already available in Redis and, if so, return it, which is
much faster because the result is already in
memory.
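The pattern described above is usually called cache-aside. Here is a minimal Python sketch, assuming a plain dictionary with expiry times stands in for the Redis server and `slow_db_query` is a hypothetical placeholder for the MySQL query. With the redis-py client, the dictionary lookups would roughly become `r.get(key)` and `r.set(key, value, ex=60)`.

```python
import time

db_calls = 0


def slow_db_query(sql):
    """Hypothetical stand-in for a MySQL query that reads from secondary storage."""
    global db_calls
    db_calls += 1
    time.sleep(0.1)  # pretend the query takes about 100 ms
    return [("Tanmay", 2), ("Aditi", 8)]


cache = {}  # stand-in for Redis: key -> (expiry_timestamp, value)
TTL_SECONDS = 60


def cached_query(sql):
    """Cache-aside: look in the cache first and query the database only on a miss."""
    entry = cache.get(sql)
    if entry is not None and entry[0] > time.monotonic():
        return entry[1]  # hit: served from memory
    result = slow_db_query(sql)  # miss (or expired): go to the database
    cache[sql] = (time.monotonic() + TTL_SECONDS, result)
    return result


def timed(fn, *args):
    start = time.perf_counter()
    fn(*args)
    return (time.perf_counter() - start) * 1000  # milliseconds


query = "SELECT name, id FROM students"
print(f"first call (miss): {timed(cached_query, query):.1f} ms")
print(f"second call (hit): {timed(cached_query, query):.3f} ms")
print("database calls:", db_calls)  # 1
```

The time-to-live (TTL) keeps a stale result from being served forever once the underlying data does change.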
Note: In a messaging app, Redis can be used to store the last five messages
that the user has sent and received, using the built-in list data structure
provided in Redis.
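In Redis this is commonly done with `LPUSH` followed by `LTRIM key 0 4`. The sketch below imitates that behaviour in Python with a bounded `deque`, assuming one list per user (the user name and messages are hypothetical).

```python
from collections import deque

recent = {}  # user -> bounded list of that user's most recent messages


def push_message(user, message, keep=5):
    messages = recent.setdefault(user, deque(maxlen=keep))
    # appendleft puts the newest message first (like LPUSH); once `keep` items are
    # stored, the oldest one is dropped from the other end (like LTRIM 0 keep-1).
    messages.appendleft(message)


for i in range(1, 8):
    push_message("chaitanya", f"message {i}")

print(list(recent["chaitanya"]))
# ['message 7', 'message 6', 'message 5', 'message 4', 'message 3']
```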
Advantages of Redis Server
• High Performance:
Redis excels in terms of performance due to its in-memory
nature. It can deliver extremely fast read and write operations,
making it suitable for scenarios where low latency is critical.
• Simple and Easy-to-Use API:
Redis has a straightforward API that consists of simple and
intuitive commands, making it easy for developers to use and
integrate into their applications.
• Data Structures:
Redis supports a variety of data structures, including strings,
lists, sets, hashes, and more. This versatility allows developers to
model their data more effectively, choosing the right data
structure for the task at hand.
• Atomic Operations:
Redis supports atomic operations on these data structures,
making it a great fit for scenarios that require consistency and
reliability in multi-step operations.
• Persistence Options:
While Redis is an in-memory database, it provides persistence
options such as snapshots and append-only files. This allows
users to configure the level of durability needed for their specific use case.
• Replication and High Availability:
Redis supports master-slave replication, enabling the creation
of replicas of the master server. This provides high availability and fault
tolerance in case the master node fails.
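To illustrate the append-only idea behind one of these persistence options, here is a toy store that logs every write to a file and rebuilds its state by replaying the log on restart. It assumes a JSON-lines file format of its own and is not Redis's actual AOF format; in Redis, how often the file is synced to disk is controlled by the `appendfsync` setting (`always`, `everysec`, or `no`), whereas this sketch always syncs.

```python
import json
import os
import tempfile


class TinyAOFStore:
    """Toy in-memory store that appends every write to a log file.

    On start-up the log is replayed to rebuild the in-memory state, which is the
    idea behind an append-only file. The file format here is made up.
    """

    def __init__(self, path):
        self.path = path
        self.data = {}
        if os.path.exists(path):
            with open(path) as log:
                for line in log:  # replay every logged command in order
                    self._apply(json.loads(line))

    def _apply(self, command):
        if command["op"] == "SET":
            self.data[command["key"]] = command["value"]
        elif command["op"] == "DEL":
            self.data.pop(command["key"], None)

    def _log_and_apply(self, command):
        with open(self.path, "a") as log:
            log.write(json.dumps(command) + "\n")
            log.flush()
            os.fsync(log.fileno())  # sync on every write: safest but slowest
        self._apply(command)

    def set(self, key, value):
        self._log_and_apply({"op": "SET", "key": key, "value": value})

    def delete(self, key):
        self._log_and_apply({"op": "DEL", "key": key})


log_path = os.path.join(tempfile.mkdtemp(), "appendonly.log")
store = TinyAOFStore(log_path)
store.set("counter", 1)
store.set("counter", 2)
store.set("temp", "x")
store.delete("temp")

restarted = TinyAOFStore(log_path)  # simulate a crash followed by a restart
print(restarted.data)  # {'counter': 2}
```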
Disadvantages of Redis Server
Persistence Mechanism Complexity:
Key-Value Stores
A key-value database stores each value, which may be a simple string or a complex
entity, together with a key that is used to look it up. A key-value database
resembles the map, array, or dictionary objects found in many programming
paradigms; however, it stores the data persistently and is managed by a DBMS.
A key-value store uses an efficient and compact index structure so that it can
quickly and reliably find a value using its key. For example, Redis is a key-value
store used to keep lists, hashes (maps), sets, and primitive types (which are
simple data structures) in a persistent database. By supporting only a
predetermined set of value types, Redis can expose a very simple interface to
query and manipulate values and, when configured appropriately, it can deliver high
throughput.
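Python's standard library happens to ship a small persistent key-value store, `shelve`, which behaves like a dictionary saved to disk. The sketch below uses it only to illustrate put, get, and delete by key; the file path and keys are hypothetical.

```python
import os
import shelve
import tempfile

db_path = os.path.join(tempfile.mkdtemp(), "kvstore")

# put: any picklable value can be stored under a string key
with shelve.open(db_path) as db:
    db["session:42"] = {"user": "Erica", "cart": ["book", "pen"]}
    db["counter"] = 7

# Reopen the file: the data survived, and values are fetched by key.
with shelve.open(db_path) as db:
    print(db["session:42"]["cart"])  # ['book', 'pen']
    del db["counter"]  # delete by key
    print("counter" in db)  # False
```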
Here are a few situations in which you can use a key-value database:
User session attributes in an online app such as finance or gaming, which require
real-time random data access.
Caching mechanisms for frequently accessed data or key-based designs.
Applications whose queries are based on keys.
Features:
Advantages:
It is very easy to use. Because of the simplicity of the database, values can be of any
type, or even of different types when required.
Its response time is fast due to its simplicity, provided that the surrounding
environment is well built and optimized.
Key-value store databases are scalable vertically as well as horizontally.
Built-in redundancy makes this database more reliable.
Disadvantages:
Here are some popular key-value databases which are widely used:
Couchbase: It permits SQL-style querying and text search.
Amazon DynamoDB: One of the most widely used key-value databases; it is
trusted by a large number of users, can easily handle a large number of requests
every day, and provides various security options.
Riak: A distributed key-value database used to develop applications.
Aerospike: An open-source, real-time database that handles billions of
transactions.
Berkeley DB: A high-performance, open-source database that provides scalability.
Document Data Model:
A Document Data Model is quite different from other data models because it stores
data in JSON, BSON, or XML documents. In this data model, documents can be nested
inside other documents, and any particular element can be indexed to
run queries faster. Documents are often stored and retrieved in a form close
to the data objects used in many applications, which means very little
translation is required to use the data in applications. JSON is a
native format that is often used to store and query data too.
So, in the document data model, each document consists of key-value pairs. Below is
an example:
{
  "Name" : "Yashodhra",
  "Address" : "Near Patel Nagar",
  "Email" : "yahoo123@yahoo.com",
  "Contact" : "12345"
}
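Nesting and field indexing can be sketched in plain Python, assuming the example document above is extended with a nested `Address` sub-document and two more made-up documents. The dotted path `Address.City` imitates the dot notation document databases such as MongoDB use for nested fields; the helper functions are hypothetical.

```python
# Hypothetical documents; the first extends the example above with a nested Address.
documents = [
    {"_id": 1, "Name": "Yashodhra", "Contact": "12345",
     "Address": {"Line": "Near Patel Nagar", "City": "Delhi"}},
    {"_id": 2, "Name": "Anay", "Address": {"City": "Mumbai"}},
    {"_id": 3, "Name": "Bhagya", "Address": {"City": "Delhi"}},
]


def get_path(doc, dotted):
    """Follow a dotted path such as 'Address.City' into nested documents."""
    for part in dotted.split("."):
        if not isinstance(doc, dict) or part not in doc:
            return None
        doc = doc[part]
    return doc


def build_index(docs, dotted):
    """Secondary index: field value -> list of document ids."""
    index = {}
    for doc in docs:
        index.setdefault(get_path(doc, dotted), []).append(doc["_id"])
    return index


city_index = build_index(documents, "Address.City")
print(city_index["Delhi"])  # [1, 3]
```

Once the index is built, a query on `Address.City` is a single dictionary lookup instead of a scan over every document, at the cost of updating the index on every write.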
Working of Document Data Model:
This is a semi-structured data model in which a record and the data associated
with it are stored in a single document, which means this data model is not
completely unstructured. The main point is that data here is stored in
documents.
Features:
Advantages:
Schema-less: Document stores are very good at retaining existing data at massive
volumes because there are no restrictions on the format and structure
of data storage.
Faster creation of documents and maintenance: It is very simple to create a
document, and maintenance requires almost no effort.
Open formats: It has a very simple build process that uses XML, JSON,
and their other forms.
Built-in versioning: As documents grow in size, they may also grow in
complexity; built-in versioning decreases conflicts.
Disadvantages:
Weak Atomicity: It often lacks support for multi-document ACID transactions. A change
in the document data model involving two collections requires running two
separate queries, i.e., one for each collection. This is where it breaks atomicity
requirements.
Consistency Check Limitations: One can search collections for documents
that are not connected to an author collection, but doing this may
degrade database performance.
Security: Nowadays many web applications lack security, which in turn results
in the leakage of sensitive data. This makes security a point of concern, and one
must pay attention to web app vulnerabilities.
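The weak-atomicity problem can be simulated in a few lines, assuming two hypothetical collections, `authors` and `books`, and a simulated crash between the two separate writes.

```python
# Two hypothetical collections: a book's author is recorded in both of them.
authors = {"a1": {"books": ["b1"]}, "a2": {"books": []}}
books = {"b1": {"author": "a1"}}


class Crash(Exception):
    """Simulates the process dying between two writes."""


def move_book(book_id, new_author, crash_between_writes=False):
    """Two independent writes, one per collection, with no transaction around them."""
    old_author = books[book_id]["author"]
    books[book_id]["author"] = new_author  # write 1: books collection
    if crash_between_writes:
        raise Crash("died before updating the authors collection")
    authors[old_author]["books"].remove(book_id)  # write 2: authors collection
    authors[new_author]["books"].append(book_id)


try:
    move_book("b1", "a2", crash_between_writes=True)
except Crash:
    pass

# The collections now disagree: books says a2, but a1 still lists b1.
print(books["b1"]["author"], authors["a1"]["books"], authors["a2"]["books"])
# a2 ['b1'] []
```

A database that supports multi-document transactions would roll back the first write in this situation, leaving both collections unchanged.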
Applications of Document Data Model:
Content Management: These data models are widely used to build
video streaming platforms, blogs, and similar services, because each item is stored as
a single document and the database is much easier to maintain as the service
evolves over time.
Book Database: These are very useful for building book databases,
because this data model lets us nest data.
Catalog: These data models are widely used for storing and reading catalog
files, because reads stay fast even when catalogs have thousands of attributes.
Analytics Platform: These data models are widely used in
analytics platforms.
Columnar Data Model of NoSQL
The columnar data model is an important NoSQL data model. NoSQL databases differ
from SQL databases because they use data models whose structure is different
from the row-and-column table model traditionally used by
relational database management systems (RDBMS). NoSQL databases use a flexible
schema model that is designed to scale horizontally across many servers and to
handle large volumes of data.
S.No.  Name       ID
01.    Tanmay     2
02.    Abhishek   5
03.    Samriddha  7
04.    Aditi      8
S.No.  Course  ID
01.    B-Tech  2
02.    B-Tech  5
03.    B-Tech  7
04.    B-Tech  8
S.No.  Branch       ID
01.    Computer     2
02.    Electronics  5
03.    IT           7
04.    E & TC       8
The columnar data model uses the concept of a keyspace, which is like a schema
in relational models.
Advantages of Columnar Data Model :
Well structured: Since these data models compress well, they are
very well organized in terms of storage.
Flexibility: They offer a large amount of flexibility, as columns need not
look like each other, which means one can add new and different columns
without disrupting the whole database.
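The student tables above can be modelled as column families inside a keyspace, as a minimal sketch: each column maps a row ID to a value, and a logical row is reassembled by ID. The keyspace layout and the added `Email` column are hypothetical.

```python
# The student data from the tables above, stored column by column in a
# hypothetical keyspace. Each column maps row key (ID) -> value.
keyspace = {
    "students": {
        "Name": {2: "Tanmay", 5: "Abhishek", 7: "Samriddha", 8: "Aditi"},
        "Course": {2: "B-Tech", 5: "B-Tech", 7: "B-Tech", 8: "B-Tech"},
        "Branch": {2: "Computer", 5: "Electronics", 7: "IT", 8: "E & TC"},
    }
}


def get_row(table, row_id):
    """Reassemble one logical row by looking the ID up in every column."""
    return {col: values[row_id] for col, values in table.items() if row_id in values}


students = keyspace["students"]
print(get_row(students, 7))
# {'Name': 'Samriddha', 'Course': 'B-Tech', 'Branch': 'IT'}

# Flexibility: add a new column for only some rows without touching the others.
students["Email"] = {8: "aditi@example.com"}
print(get_row(students, 8)["Email"], "Email" in get_row(students, 2))
# aditi@example.com False
```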
Graph Databases
If we have friends of friends and similar connections, these are many-to-many
relationships.
Graph databases are used when the equivalent query in a relational database is very complex.
For example, a profile holds some specific information, but the major selling point
is the relationship between different profiles, that is, how you are connected
within a network.
In the same way, a graph database may contain many user data elements, but the
relationships between them are the deciding factor for all the data elements stored
in the graph database.
When you add lots of relationships to a relational database, the data sets become
huge, queries become more complex, and they take longer than usual to run. A graph
database, however, is specifically designed for this purpose, and relationships can
be queried with ease.
Example: We have a social network in which five friends are all connected. These
friends are Anay, Bhagya, Chaitanya, Dilip, and Erica. A graph database that will
store their personal information may look something like this:
user_id  friend_id
1        2
1        3
1        4
1        5
2        1
2        3
2        4
2        5
3        1
3        2
3        4
3        5
4        1
4        2
4        3
4        5
5        1
5        2
5        3
5        4
We will avoid going deep into database theory (primary keys and foreign keys).
Instead, just assume that the friendship table uses the ids of both friends. Assume
that our social network has a feature that allows every user to see the
personal information of his/her friends. So, if Chaitanya requests this
information, she needs information about Anay, Bhagya, Dilip, and Erica.
Let’s approach this problem the traditional way (relational database).
We must first identify Chaitanya’s id in the Users table:
id   first name   last name   email   phone
Now, we look for all tuples in the friendship table where user_id is 3. The resulting
relation would be something like this:
user_id  friend_id
3        1
3        2
3        4
3        5
Now, let’s analyse the time taken by this relational database approach. Each lookup
takes approximately log(N) time, where N is the number of tuples in the friendship
table (the number of relations), because the database maintains the rows in the order
of their ids. So, in general, for M queries (one per friend), the time complexity is
M*log(N). If we had used a graph database approach instead, the total time
complexity would have been only O(M), because once we have located Chaitanya in the
database, each of her friends is just a single step away. Here is how our
query would be executed:
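The difference can be sketched in Python, assuming the five users and the friendship table above, with binary search (`bisect`) standing in for the relational index and direct object references standing in for the graph database's edges. It illustrates the complexity argument and is not how any particular database executes the query.

```python
import bisect

# Relational side: tables kept sorted by key, as an index would keep them.
users_table = [(1, "Anay"), (2, "Bhagya"), (3, "Chaitanya"), (4, "Dilip"), (5, "Erica")]
friendship_table = sorted((u, f) for u in range(1, 6) for f in range(1, 6) if u != f)


def friend_names_relational(user_id):
    # O(log N) to find the first friendship row for this user ...
    i = bisect.bisect_left(friendship_table, (user_id,))
    names = []
    while i < len(friendship_table) and friendship_table[i][0] == user_id:
        friend_id = friendship_table[i][1]
        # ... plus another O(log N) index lookup per friend: M*log(N) in total.
        j = bisect.bisect_left(users_table, (friend_id,))
        names.append(users_table[j][1])
        i += 1
    return names


# Graph side: every node holds direct references to its neighbouring nodes.
class Node:
    def __init__(self, name):
        self.name = name
        self.friends = []


graph = {uid: Node(name) for uid, name in users_table}
for u, f in friendship_table:
    graph[u].friends.append(graph[f])


def friend_names_graph(user_id):
    # Locate the start node once, then one pointer hop per friend: O(M).
    return [friend.name for friend in graph[user_id].friends]


print(friend_names_relational(3))  # ['Anay', 'Bhagya', 'Dilip', 'Erica']
print(friend_names_graph(3))  # ['Anay', 'Bhagya', 'Dilip', 'Erica']
```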
Disadvantages of Graph Databases:
Graph databases may not offer a better choice than the other NoSQL
variations.
If the application needs to scale horizontally, this may lead to poor
performance.
Not very efficient when all nodes with a given parameter need to be updated.