Module 5 Dbms Notes bcs403
Module 5
Chapter 21.1 to 21.5
21.1 Two-phase Locking Techniques for Concurrency Control
21.2 Concurrency Control Based on Timestamp Ordering
21.3 Multiversion Concurrency Control techniques
21.4 Validation Techniques and Snapshot Isolation
21.5 Granularity of Data Items and Multiple Granularity
Chapter 24.1 to 24.6
24.1 Introduction to NOSQL Systems
24.2 The CAP Theorem
24.3 Document-Based NOSQL Systems and MongoDB
24.4 NOSQL Key-Value Stores
24.5 Column-Based or Wide Column NOSQL Systems
24.6 NOSQL Graph Databases and Neo4j
1. Briefly discuss the Two-Phase locking protocol used in concurrency control. June/July 2023,
10 Marks
2. Explain different types of lock in concurrency control. Feb/Mar 2022, 6 Marks
3. Briefly explain Two-Phase locking protocol used in concurrency control. Jan/Feb 2023, 6
Marks
4. Briefly discuss the Two-Phase locking protocol used in concurrency control. Jan/Feb 2021,
10 Marks
Two-Phase Locking (2PL) is a concurrency control method used in database systems to ensure
serializability, which means transactions are executed in such a way that the result is the same as if the
transactions were executed in a sequential, non-overlapping manner. A transaction follows the 2PL protocol
if all of its locking operations precede its first unlock operation. Its execution is thus divided into two
phases: an expanding (growing) phase, during which new locks can be acquired but none released, and a
shrinking phase, during which existing locks are released but no new locks can be acquired.
For this purpose, locks with multiple modes are used. In this scheme, called shared/exclusive or
read/write locking, there are three locking operations: read_lock(X), write_lock(X), and unlock(X).
A read-locked item is also called share-locked because other transactions are allowed to read the item,
whereas a write-locked item is called exclusive-locked because a single transaction exclusively holds the
lock on the item.
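As a rough illustration, the shared/exclusive locking rules can be sketched with a simple lock table (a minimal simulation for study purposes, not how a real DBMS implements its lock manager; the class and method names here are invented, and lock upgrading/downgrading is omitted):

```python
class LockTable:
    """Minimal shared/exclusive lock table: one entry per data item."""

    def __init__(self):
        # item -> {"mode": "read" or "write", "holders": set of transaction ids}
        self.locks = {}

    def read_lock(self, item, txn):
        entry = self.locks.get(item)
        if entry is None:
            self.locks[item] = {"mode": "read", "holders": {txn}}
            return True
        if entry["mode"] == "read":   # shared: more readers may join
            entry["holders"].add(txn)
            return True
        return False                  # write-locked: the reader must wait

    def write_lock(self, item, txn):
        entry = self.locks.get(item)
        if entry is None:
            self.locks[item] = {"mode": "write", "holders": {txn}}
            return True
        return False                  # any existing lock blocks a writer

    def unlock(self, item, txn):
        entry = self.locks.get(item)
        if entry and txn in entry["holders"]:
            entry["holders"].discard(txn)
            if not entry["holders"]:
                del self.locks[item]
```

For example, two transactions can hold a read lock on X at the same time, but a write lock on X is granted only when no other transaction holds any lock on X.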
Every transaction is issued a timestamp based on when it enters the system: if an older transaction Ti
has timestamp TS(Ti), a newer transaction Tj is assigned a timestamp TS(Tj) such that TS(Ti) < TS(Tj).
Two timestamp values are associated with each data item X:
read_TS(X): the largest timestamp among all transactions that have successfully read item X.
write_TS(X): the largest timestamp among all transactions that have successfully written item X.
The concurrency-control algorithm must check whether conflicting operations violate timestamp ordering in
the following cases:
1. When transaction T issues write_item(X): if read_TS(X) > TS(T) or write_TS(X) > TS(T), abort and
roll back T; otherwise execute the write and set write_TS(X) to TS(T).
2. When transaction T issues read_item(X): if write_TS(X) > TS(T), abort and roll back T; otherwise
execute the read and set read_TS(X) to the larger of TS(T) and the current read_TS(X).
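These timestamp-ordering checks can be sketched in a few lines of Python (a simplified simulation; the name TimestampScheduler is invented for this example, and aborted transactions are simply reported rather than restarted with a new timestamp):

```python
class TimestampScheduler:
    """Basic timestamp-ordering checks for read/write requests."""

    def __init__(self):
        self.read_ts = {}    # item -> largest TS of a successful read
        self.write_ts = {}   # item -> largest TS of a successful write

    def write_item(self, item, ts):
        # Reject if a younger transaction already read or wrote the item.
        if self.read_ts.get(item, 0) > ts or self.write_ts.get(item, 0) > ts:
            return "abort"
        self.write_ts[item] = ts
        return "ok"

    def read_item(self, item, ts):
        # Reject if a younger transaction already wrote the item.
        if self.write_ts.get(item, 0) > ts:
            return "abort"
        self.read_ts[item] = max(ts, self.read_ts.get(item, 0))
        return "ok"
```

For instance, once a transaction with timestamp 10 has read X, an older transaction with timestamp 5 can no longer write X and must be rolled back.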
5. What is NOSQL? Explain the CAP theorem. 6 Marks, Model Question Paper
6. What are document-based NOSQL systems? Explain basic CRUD in MongoDB. 10 Marks, Model
Question Paper
7. What is a NOSQL graph database? Explain Neo4j. 10 Marks, Model Question Paper
5. NoSQL is a type of database management system (DBMS) that is designed to handle and store large
volumes of unstructured and semi-structured data.
Unlike traditional relational databases that use tables with pre-defined schemas to store data, NoSQL
databases use flexible data models that can adapt to changes in data structures and are capable of scaling
horizontally to handle growing amounts of data.
The term NoSQL originally referred to “non-SQL” or “non-relational” databases, but the term has
since evolved to mean “not only SQL,” as NoSQL databases have expanded to include a wide range of
different database architectures and data models.
The CAP theorem, originally introduced as the CAP principle, can be used to explain some of the
competing requirements in a distributed system with replication.
It is a tool used to make system designers aware of the trade-offs while designing networked shared-
data systems.
The three letters in CAP refer to three desirable properties of distributed systems with replicated
data: consistency (among replicated copies), availability (of the system for read and write operations)
and partition tolerance (in the face of the nodes in the system being partitioned by a network fault).
The CAP theorem states that it is not possible to guarantee all three of the desirable properties –
consistency, availability, and partition tolerance at the same time in a distributed system with data
replication.
The theorem states that networked shared-data systems can only strongly support two of the
following three properties:
1.Consistency –
Consistency means that the nodes will have the same copies of a replicated data item visible for various
transactions. A guarantee that every node in a distributed cluster returns the same, most recent and a
successful write. Consistency refers to every client having the same view of the data. There are various types
of consistency models. Consistency in CAP refers to sequential consistency, a very strong form of
consistency.
2.Availability –
Availability means that each read or write request for a data item will either be processed successfully or will
receive a message that the operation cannot be completed. Every non-failing node returns a response for all
the read and write requests in a reasonable amount of time. The key word here is “every”. In simple terms,
every node (on either side of a network partition) must be able to respond in a reasonable amount of time.
3.Partition Tolerance –
Partition tolerance means that the system can continue operating even if the network connecting the nodes
has a fault that results in two or more partitions, where the nodes in each partition can only communicate
among each other. That means, the system continues to function and upholds its consistency guarantees in
spite of network partitions. Network partitions are a fact of life. Distributed systems guaranteeing partition
tolerance can gracefully recover from partitions once the partition heals.
6.
A Document Data Model is a lot different than other data models because it stores data in JSON, BSON,
or XML documents.
In this data model, we can move documents under one document and apart from this, any particular
elements can be indexed to run queries faster.
Documents are often stored and retrieved in a form close to the data objects used in applications, so very
little translation is required to use the data in an application.
JSON is a native format that is often used to store and query this data.
In the document data model, each document consists of key-value pairs. Below is an example:
{
"Name" : "Yashodhra",
"Address" : "Near Patel Nagar",
"Email" : "yahoo123@yahoo.com",
"Contact" : "12345"
}
Working of the Document Data Model:
This is a semi-structured data model: a record and the data associated with it are stored together in a single
document, which means this data model is not completely unstructured.
Features:
1.Document Type Model: Data is stored in documents rather than tables or graphs, so it maps easily onto
objects in many programming languages.
2.Flexible Schema: The overall schema is very flexible; not all documents in a collection need to have the
same fields.
3.Distributed and Resilient: Document databases are designed to be distributed, which enables horizontal
scaling and distribution of data.
4.Manageable Query Language: The query language allows developers to perform CRUD (Create, Read,
Update, Delete) operations on the data model.
Most of the transactional interactions that a user has with a digital platform, like a website or web application,
include requests for four basic operations: create, read, update, and delete. These four basic functions are
collectively called CRUD. CRUD is data-oriented, and it maps naturally onto the HTTP action verbs. The
front end of an application captures the information and sends it as an HTTP request to the middleware,
which calls the appropriate database functions to complete the task.
Create operation
For MongoDB CRUD, if the specified collection doesn't exist, the create operation will create the collection
when it's executed. Create operations in MongoDB target a single collection, not multiple collections. Insert
operations in MongoDB are atomic on a single document level.
MongoDB provides two different create operations that you can use to insert documents into a collection:
db.collection.insertOne()
db.collection.insertMany()
Read operations
The read operations allow you to supply special query filters and criteria that let you specify which
documents you want. The MongoDB documentation contains more information on the available
query filters. Query modifiers may also be used to change how many results are returned.
db.collection.find()
db.collection.findOne()
Update operations
Like create operations, update operations operate on a single collection, and they are atomic at a single
document level. An update operation takes filters and criteria to select the documents you want to update.
You should be careful when updating documents, as updates are permanent and can’t be rolled back. This
applies to delete operations as well.
For MongoDB CRUD, there are three different methods of updating documents:
db.collection.updateOne()
db.collection.updateMany()
db.collection.replaceOne()
Delete operations
Delete operations operate on a single collection, like update and create operations. Delete operations are also
atomic for a single document. You can provide delete operations with filters and criteria to specify which
documents you would like to delete from a collection. The filter options rely on the same syntax that read
operations utilize.
db.collection.deleteOne()
db.collection.deleteMany()
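To see how these operations fit together, here is a toy in-memory collection in Python that mimics the semantics of the MongoDB methods above (a simplified sketch for illustration only; a real application would use an actual MongoDB driver such as PyMongo, and the class name and equality-only filter matching here are invented):

```python
class ToyCollection:
    """In-memory stand-in for a MongoDB collection supporting basic CRUD."""

    def __init__(self):
        self.docs = []

    def _matches(self, doc, flt):
        # Simple equality filter, like {"Name": "Yashodhra"} in MongoDB.
        return all(doc.get(k) == v for k, v in flt.items())

    def insert_one(self, doc):            # cf. db.collection.insertOne()
        self.docs.append(dict(doc))

    def find(self, flt=None):             # cf. db.collection.find()
        flt = flt or {}
        return [d for d in self.docs if self._matches(d, flt)]

    def update_one(self, flt, changes):   # cf. db.collection.updateOne()
        for d in self.docs:
            if self._matches(d, flt):
                d.update(changes)
                return 1                  # one document modified
        return 0

    def delete_many(self, flt):           # cf. db.collection.deleteMany()
        before = len(self.docs)
        self.docs = [d for d in self.docs if not self._matches(d, flt)]
        return before - len(self.docs)    # number of documents removed

users = ToyCollection()
users.insert_one({"Name": "Yashodhra", "Contact": "12345"})
users.update_one({"Name": "Yashodhra"}, {"Contact": "67890"})
```

After the update, a read with the filter {"Name": "Yashodhra"} returns the document with the new contact number, illustrating how the same filter syntax drives read, update, and delete operations.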
7.
A graph database is a type of NoSQL database that is designed to handle data with complex relationships and
interconnections. In a graph database, data is stored as nodes and edges, where nodes represent entities and
edges represent the relationships between those entities.
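As a rough illustration of the nodes-and-edges model, a tiny property graph can be represented directly in Python (a hand-rolled sketch to show the idea, not how a graph database stores data internally; the node ids and relationship types are invented):

```python
# Nodes: entity id -> properties
nodes = {
    "alice": {"label": "Person", "name": "Alice"},
    "bob":   {"label": "Person", "name": "Bob"},
    "neo4j": {"label": "Database", "name": "Neo4j"},
}

# Edges: (source, relationship type, target)
edges = [
    ("alice", "FRIEND_OF", "bob"),
    ("alice", "USES", "neo4j"),
    ("bob",   "USES", "neo4j"),
]

def neighbours(node, rel_type):
    """Follow edges of a given relationship type out of a node."""
    return [dst for src, rel, dst in edges if src == node and rel == rel_type]
```

A query such as "who are Alice's friends?" becomes a traversal of the FRIEND_OF edges from her node rather than a join across relational tables.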
1.Graph databases are particularly well-suited for applications that require deep and complex queries, such
as social networks, recommendation engines, and fraud detection systems. They can also be used for other
types of applications, such as supply chain management, network and infrastructure management, and
bioinformatics.
2.One of the main advantages of graph databases is their ability to handle and represent relationships between
entities. This is because the relationships between entities are as important as the entities themselves, and
often cannot be easily represented in a traditional relational database.
3.Another advantage of graph databases is their flexibility. Graph databases can handle data with changing
structures and can be adapted to new use cases without requiring significant changes to the database schema.
This makes them particularly useful for applications with rapidly changing data structures or complex data
requirements.
4.However, graph databases may not be suitable for all applications. For example, they may not be the best
choice for applications that require simple queries or that deal primarily with data that can be easily
represented in a traditional relational database. Additionally, graph databases may require more specialized
knowledge and expertise to use effectively.
Neo4j
Figure: Neo4j
Neo4j is a native graph database, which means that it implements a true graph model all the way down to
the storage level. The data is stored as you whiteboard it, instead of as a "graph abstraction" on top of another
technology. Beyond the core graph, Neo4j also provides: ACID transactions, cluster support, and runtime
failover.
Neo4j is offered as a managed cloud service via AuraDB. But you can also run Neo4j yourself with either
Community Edition or Enterprise Edition.
The Enterprise Edition includes all the features of Community Edition, plus extra enterprise requirements
such as backups, clustering, and failover abilities.
Neo4j is written in Java and Scala, and the source code is available on GitHub.
What makes Neo4j the easiest graph to work with?
Cypher®, a declarative query language similar to SQL but optimized for graphs, now used by other
databases like SAP HANA Graph and RedisGraph via the openCypher project.
Constant time traversals in big graphs for both depth and breadth due to efficient representation of
nodes and relationships. Enables scale-up to billions of nodes on moderate hardware.
Flexible property graph schema that can adapt over time, making it possible to materialize and add
new relationships later to shortcut and speed up the domain data when the business needs change.
Drivers for popular programming languages, including Java, JavaScript, .NET, Python, and many
more.
Use Cases: Ideal for content management systems, user profiles, product catalogs, and applications
requiring flexible and evolving schemas.
Performance and Scalability: Designed for high performance and scalability, document stores can
efficiently manage large volumes of data and handle distributed systems.
iv) Column Family Stores
Column family stores are a type of NoSQL database that organizes data into columns and column families
rather than rows.
Data Organization: Data is stored in tables, but instead of traditional rows, it is organized into columns
and column families, allowing for efficient storage and retrieval of large datasets.
High Performance: Optimized for write-heavy operations and read operations where only a subset of
columns is needed, which improves performance for specific use cases.
Scalability: Designed to scale horizontally, column family stores can handle large volumes of data across
distributed systems.
Flexibility: Schema can be dynamic, allowing different rows to have different columns within the same
column family.
Complex Query Capabilities: Supports complex queries, indexing, and filtering, but typically less
flexible than document stores in terms of query language and capabilities.
Use Cases: Suitable for time-series data, event logging, recommendation systems, and scenarios requiring
high write throughput and fast retrieval of specific columns.
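The organization described above can be pictured as a nested map, keyed first by row, then by column family, then by column (an informal sketch of the data layout only, not an actual wide-column store implementation; the row keys and family names are invented):

```python
# row key -> column family -> column -> value
table = {
    "user:1001": {
        "profile":  {"name": "Asha", "city": "KGF"},
        "activity": {"last_login": "2024-05-01", "clicks": 42},
    },
    "user:1002": {
        "profile":  {"name": "Ravi"},
        "activity": {"last_login": "2024-05-03"},
    },
}

def read_columns(row_key, family, columns):
    """Fetch only the requested columns of one family for a row,
    mirroring how wide-column stores avoid reading whole rows."""
    fam = table.get(row_key, {}).get(family, {})
    return {c: fam[c] for c in columns if c in fam}
```

Note that the two rows have different columns within the same family, which is exactly the "flexible schema" property described above.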