CP4152-Database Practices-Unit-1,2
3 0 2 4
COURSE OBJECTIVES:
● Describe the fundamental elements of relational database management systems
● Explain the basic concepts of relational data model, entity-relationship model, relational
database design, relational algebra and SQL.
● Understand query processing in a distributed database system
● Understand the basics of XML and create well-formed and valid XML documents.
● To understand the different models involved in database security and their applications in the real
world to protect the database and the information associated with them
UNIT I RELATIONAL DATA MODEL 15
Entity Relationship Model – Relational Data Model – Mapping Entity Relationship Model to Relational
Model – Relational Algebra – Structured Query Language – Database Normalization.
Suggested Activities:
Data Definition Language
● Create, Alter and Drop
● Enforce Primary Key, Foreign Key, Check, Unique and Not Null Constraints
● Creating Views
Data Manipulation Language
● Insert, Delete, Update
● Cartesian Product, Equi Join, Left Outer Join, Right Outer Join and Full Outer Join
● Aggregate Functions
● Set Operations
● Nested Queries
Transaction Control Language
● Commit, Rollback and Save Points
UNIT II DISTRIBUTED DATABASES, ACTIVE DATABASES AND OPEN DATABASE CONNECTIVITY 15
Distributed Database Architecture – Distributed Data Storage – Distributed Transactions – Distributed
Query Processing – Distributed Transaction Management – Event Condition Action Model – Design
and Implementation Issues for Active Databases – Open Database Connectivity
Suggested Activities:
● Distributed Database Design and Implementation
● XML Querying
UNIT IV NOSQL DATABASES AND BIG DATA STORAGE SYSTEMS 15
NoSQL – Categories of NoSQL Systems – CAP Theorem – Document-Based NoSQL Systems and
MongoDB – MongoDB Data Model – MongoDB Distributed Systems Characteristics – NoSQL Key-
Value Stores – DynamoDB Overview – Voldemort Key-Value Distributed Data Store – Wide Column
NoSQL Systems – HBase Data Model – HBase CRUD Operations – HBase Storage and Distributed
System Concepts – NoSQL Graph Databases and Neo4j – Cypher Query Language of Neo4j – Big Data
– MapReduce – Hadoop – YARN
Suggested Activities:
● Creating databases using MongoDB, DynamoDB, Voldemort Key-Value Distributed Data Store, HBase and Neo4j.
● Writing simple queries to access databases created using MongoDB, DynamoDB, Voldemort Key-Value Distributed Data Store, HBase and Neo4j.
UNIT V DATABASE SECURITY 15
Database Security Issues – Discretionary Access Control Based on Granting and Revoking Privileges –
Mandatory Access Control and Role-Based Access Control for Multilevel Security – SQL Injection –
Statistical Database Security – Flow Control – Encryption and Public Key Infrastructures – Preserving
Data Privacy – Challenges to Maintaining Database Security – Database Survivability – Oracle Label-
Based Security.
Suggested Activities:
Implementing Access Control in Relational Databases
TOTAL: 75 PERIODS
COURSE OUTCOMES:
At the end of the course, the students will be able to
● Convert the ER-model to relational tables, populate relational databases and
formulate SQL queries on data.
● Understand and write well-formed XML documents
● Use the data control, definition, and manipulation languages of the NoSQL databases
REFERENCES:
1. R. Elmasri, S.B. Navathe, “Fundamentals of Database Systems”, Seventh
Edition, Pearson Education 2016.
2. Henry F. Korth, Abraham Silberschatz, S. Sudharshan, “Database System
Concepts”, Seventh Edition, McGraw Hill, 2019.
3. C.J. Date, A. Kannan, S. Swamynathan, “An Introduction to Database Systems”,
Eighth Edition, Pearson Education, 2006.
4. Raghu Ramakrishnan , Johannes Gehrke “Database Management
Systems”, Fourth Edition, McGraw Hill Education, 2015.
5. Harrison, Guy, “Next Generation Databases, NoSQL and Big Data” , First
Edition, Apress publishers, 2015
6. Thomas Connolly and Carolyn Begg, “Database Systems, A Practical Approach to
Design, Implementation and Management”, Sixth Edition, Pearson Education, 2015.
UNIT-1 RELATIONAL DATA MODEL
Entity Relationship Model – Relational Data Model – Mapping Entity Relationship Model to
Relational Model – Relational Algebra – Structured Query Language – Database Normalization.
Database
A database is a collection of related data organized in a way that the data can be easily accessed,
managed and updated. A database can be software based or hardware based, with one sole
purpose: storing data.
What is Database?
● A database is a data structure that stores organized information.
● Most databases contain multiple tables, which may each include several different fields.
o For example, a company database may include tables for products, employees, and
financial records.
o Each of these tables would have different fields that are relevant to the information
stored in the table.
DBMS (or) Database Management System
A DBMS is the software that creates, manages and provides controlled access to databases.
Database Applications: banking, airline and railway reservation, universities, sales, manufacturing and human resources are typical applications built on databases.
1.1 Introduction
The entity relationship model is a collection of basic objects called entities and
relationship among those objects.
Entity-Relationship (ER) Model is based on the notion of real-world entities and relationships
among them.
While formulating real-world scenario into the database model, the ER Model creates entity set,
relationship set, general attributes and constraints.
ER Model is best used for the conceptual design of a database.
ER Model is based on
a) Entities and their attributes.
b) Relationships among entities.
An entity is a thing or object in the real world that is distinguishable from other objects.
• It maps well to the relational model. The constructs used in the ER model can easily be transformed
into relational tables.
• It is simple and easy to understand with a minimum of training. Therefore, the model can be used by
the database designer to communicate the design to the end user.
• In addition, the model can be used as a design plan by the database developer to implement a data
model in specific database management software.
a) Entity
An entity in an ER Model is a real-world object having properties called attributes. Every attribute is defined by its set of values, called a domain.
o For example, in a school database, a student is considered as an entity. Student has
various attributes like name, age, class, etc.
o Entity Representation : A Simple rectangular box represents an Entity.
An Entity is generally a real-world object which has characteristics and holds
relationships in a DBMS.
If a Student is an Entity, then the complete dataset of all the students will be the
Entity Set
Entity set: The set of all entities of the same type is termed as an entity set.
Entity type:
An entity type defines a collection of entities that have the same attributes.
Example:
For a School Management Software, we will have to
store Student information, Teacher information, Classes, Subjects taught in each class etc.
b) Relationship
The logical association among entities is called a relationship. Relationships are mapped with entities in various ways.
c) Mapping Cardinalities
Mapping cardinalities define the number of associations between two entities. The mapping cardinalities are:
o one to one
o one to many
o many to one
o many to many
ER Model: Attributes
Attributes
An attribute describes a property or characteristic of an entity.
An entity is represented by a set of attributes. Attributes are descriptive properties possessed by each
member of an entity set.
Example:
Possible attributes of the customer entity are customer name, customer id, customer street and customer city.
For example, Name, Age, Address etc. can be attributes of a Student. An attribute is represented using an ellipse.
Key Attribute
A key attribute represents the main characteristic of an entity. It is used to represent a primary key.
An ellipse with the text underlined represents a key attribute.
Example :
If a Student is an Entity, then student's roll no., student's name, student's age,
student's gender etc will be its attributes.
An attribute can be of many types, here are different types of attributes defined in ER database model:
1. Simple attribute: The attributes with values that are atomic and cannot be broken down
further are simple attributes. For example, student's age.
2. Composite attribute: A composite attribute is made up of more than one simple attribute. For
example, student's address will contain, house no., street name, pincode etc.
Composite Attribute for any Entity
A composite attribute is the attribute, which also has attributes.
3. Derived attribute: These are the attributes which are not present in the whole database
management system, but are derived using other attributes. For example, average age of
students in a class.
Derived Attribute for any Entity
Derived attributes are those which are derived based on other attributes, for example, age can be
derived from date of birth.
To represent a derived attribute, another dotted ellipse is created inside the main ellipse.
Relationships
● A relationship is an association among several entities.
● When an Entity is related to another Entity, they are said to have a relationship.
Example: A depositor relationship associates a customer with each account that he/she has.
Example
For example, A Class Entity is related to Student entity, because students study in classes, hence
this is a relationship.
Depending upon the number of entities involved, a degree is assigned to relationships. For
example, if 2 entities are involved, it is said to be Binary relationship, if 3 entities are involved,
it is said to be Ternary relationship, and so on
Relationship set
Relationship set : The set of all relationships of the same type is termed as a relationship set.
Key attribute : An entity type usually has an attribute whose values are distinct for each individual entity in the collection. Such an attribute is called a key attribute.
Value set: Each simple attribute of an entity type is associated with a value set that specifies the
set of values that may be assigned to that attribute for each individual entity.
Cardinality
Mapping cardinalities or cardinality ratios express the number of entities to which another entity
can be associated. Mapping cardinalities must be one of the following:
• One to one
• One to many
• Many to one
• Many to many
• While creating relationship between two entities, we may often need to face the cardinality
problem.
• This simply means that how many entities of the first set are related to how many entities of the
second set.
a) One-to-One
• Only one entity of the first set is related to only one entity of the second set. E.g. A teacher
teaches a student.
• Only one teacher is teaching only one student.
b) One-to-Many
• Only one entity of the first set is related to multiple entities of the second set.
• E.g. A teacher teaches students. Only one teacher is teaching many students.
c) Many-to-One
• Multiple entities of the first set are related to only one entity of the second set.
E.g. Teachers teach a student.
d) Many-to-Many
• Multiple entities of the first set are related to multiple entities of the second set.
E.g. Teachers teach students.
Strong and Weak Entities
• An entity having its own primary key is called a strong entity, and an entity not having its own primary key is called a weak entity.
• Whenever we need to relate a strong and a weak entity together, the ERD changes just a little.
Example:
• Say, for example, we have a statement “A Student lives in a Home.” STUDENT is obviously a
strong entity having a primary key Roll.
• But HOME may not have a unique primary key, as its only attribute Address may be shared by
many homes (what if it is a housing estate?).
• HOME is a weak entity in this case.
As you can see, the weak entity itself and the relationship linking a strong and weak entity must have a double border.
a) No industry standard for notation: There is no industry standard notation for developing an
E-R diagram.
b) Popular for high-level design: The E-R data model is especially popular for high level.
5.1 Introduction
The Relational Model is a depiction of how each piece of stored information relates to the other
stored information.
It shows how tables are linked, what type of links are between tables, what keys are used, what
information is referenced between tables.
It's an essential part of developing a normalized database structure to prevent repeat and
redundant data storage.
The basic idea behind the relational model is that a database consists of a series of unordered
tables (or relations) that can be manipulated using non-procedural operations that return tables.
The RELATIONAL database model is based on the Relational Algebra, set theory and predicate
logic.
It is commonly thought that the word relational in the relational model comes from the fact that you
relate together tables in a relational database.
Example:
● The columns enumerate the various attributes of the entity (the employee's name, address
or phone number, for example), and a row is an actual instance of the entity (a specific
employee) that is represented by the relation.
● As a result, each tuple of the employee table represents various attributes of a single
employee.
· Relation : Table
· Tuple : Row
· Attribute : Column
The relational model has three major components:
1. The set of relations and set of domains that defines the way data can be represented (data structure).
2. Integrity rules that define the procedure to protect the data (data integrity).
3. Operations that define how the data can be manipulated (data manipulation).
Relational Database
A relational model database is defined as a database that allows you to group its data items into one or
more independent tables that can be related to one another by using fields common to each related
table.
a) The whole data is conceptually represented as an orderly arrangement of data into rows and columns, called a relation or table.
b) All values are scalar. That is, at any given row/column position in the relation there is one and only one value.
c) All operations are performed on an entire relation and the result is an entire relation, a concept known as closure.
● Dr. Codd, when formulating the relational model, chose the term "relation" because it was comparatively free of connotations, unlike, for example, the word "table".
● It is a common misconception that the relational model is so called because relationships are established between tables.
● In fact, the name is derived from the mathematical relations on which it is based.
● Notice that the model requires only that data be conceptually represented as a relation, it does
not specify how the data should be physically implemented.
● A relation is a relation provided that it is arranged in row and column format and its values are
scalar.
● Its existence is completely independent of any physical representation.
The figure shows a relation with the formal names of the basic components marked. The entire structure is, as we have said, a relation.
a) Tuples of a Relation
Each row of data is a tuple. Actually, each row is an n-tuple, but the "n-" is usually dropped.
b) Cardinality of a relation: The number of tuples in a relation determines its cardinality. In this
case, the relation has a cardinality of 4.
c) Degree of a relation: Each column in the tuple is called an attribute. The number of attributes
in a relation determines its degree. The relation in figure has a degree of 3.
d) Domains: A domain definition specifies the kind of data represented by the attribute.
● More particularly, a domain is the set of all possible values that an attribute may validly contain.
● Domains are often confused with data types, but this is inaccurate.
● Data type is a physical concept while domain is a logical one. "Number" is a data type
and "Age" is a domain.
● To give another example "StreetName" and "Surname" might both be represented as text
fields, but they are obviously different kinds of text fields; they belong to different
domains.
● Domain is also a broader concept than data type, in that a domain definition includes a
more specific description of the valid data.
For example, consider the domain DegreeAwarded, which represents the degrees awarded by a university.
In the database schema, this attribute might be defined as Text[3], but it's not just any three-character string; it's a member of the set {BA, BS, MA, MS, PhD, LLB, MD}.
Of course, not all domains can be defined by simply listing their values. Age, for example, contains a
hundred or so values if we are talking about people, but tens of thousands if we are talking about
museum exhibits.
In such instances it's useful to define the domain in terms of the rules, which can be used to determine
the membership of any specific value in the set of all valid values.
For example, PersonAge could be defined as "an integer in the range 0 to 120" whereas ExhibitAge (the age of any object for exhibition) might simply be "an integer equal to or greater than 0."
Body of a Relation: The body of the relation consists of an unordered set of zero or more tuples.
Therefore, for a table to qualify as a relation each record must be uniquely identifiable and the table
must contain no duplicate records.
Keys of a Relation
It is a set of one or more columns whose combined values are unique among all occurrences in a given
table.
A key is the relational means of specifying uniqueness. Some different types of keys are:
a) Primary key is an attribute or a set of attributes of a relation which possesses the properties of
uniqueness and irreducibility (no proper subset of it is unique).
For example: Supplier number in S table is primary key, Part number in P table is primary key
and the combination of Supplier number and Part Number in SP table is a primary key
b) Foreign key is an attribute (or set of attributes) of a table which refers to the primary key of another table.
A foreign key permits only those values which appear in the primary key of the table to which it
refers, or it may be null (unknown value).
For example: SNO in SP table refers the SNO of S table, which is the primary key of S table, so
we can say that SNO in SP table is the foreign key.
PNO in SP table refers the PNO of P table, which is the primary key of P table, so we can say
that PNO in SP table is the foreign key.
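A minimal SQL sketch of these keys; only SNO, PNO and the quantity of a shipment come from the discussion above, the remaining column names and sizes are illustrative:
CREATE TABLE S (
  SNO   CHAR(5)      PRIMARY KEY,
  SNAME VARCHAR(30),
  CITY  VARCHAR(30)
);
CREATE TABLE P (
  PNO   CHAR(5)      PRIMARY KEY,
  PNAME VARCHAR(30),
  CITY  VARCHAR(30)
);
CREATE TABLE SP (
  SNO  CHAR(5),
  PNO  CHAR(5),
  QTY  INT,
  PRIMARY KEY (SNO, PNO),
  FOREIGN KEY (SNO) REFERENCES S(SNO),   -- SNO in SP refers to the primary key of S
  FOREIGN KEY (PNO) REFERENCES P(PNO)    -- PNO in SP refers to the primary key of P
);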
The database of Customer-Loan, which we discussed earlier for hierarchical model and network
model, is now represented for Relational model as shown.
It can easily be understood that this model is very simple and has no redundancy.
The total database is divided into two tables. The Customer table contains
the information about the customers, with CNO as the primary key.
The Customer_Loan table stores the information about CNO, LNO and AMOUNT.
It has the primary key combination of CNO and LNO.
Here, CNO also acts as the foreign key and refers to CNO of the Customer table.
It means that only those customer numbers are allowed in the transaction table Customer_Loan that have
an entry in the master Customer table.
Relational View of Sample database
Let us take an example of a sample database consisting of supplier, parts and shipments tables. The
table structure and some sample records for supplier, parts and shipments tables are given as Tables as
shown below:
We assume that each row in the Supplier table is identified by a unique SNo (Supplier Number),
which uniquely identifies the entire row of the table. Likewise, each part has a unique PNo (Part
Number).
Also, we assume that no more than one shipment exists for a given supplier/part combination in
the Shipments table.
Note that the relations Parts and Shipments have PNo (Part Number) in common, and the Supplier
and Shipments relations have SNo (Supplier Number) in common.
The Supplier and Parts relations have City in common.
For example, the fact that supplier S3 and part P2 are located in the same city is represented by
the appearance of the same value, Amritsar, in the City column of the two tuples in those relations.
5.4 Operations in Relational Model
There are four basic operations in the relational model, as follows:
a) Insert
b) Update
c) Delete
d) Retrieve
The four operations are shown below on the sample database in relational model:
a) Insert Operation:
● The information of a supplier who does not supply any part can be inserted in the S table without any anomaly, e.g. S4 can be inserted in the S table.
● Similarly, if we wish to insert information of a new part that is not supplied by any supplier can
be inserted into a P table.
● If a supplier starts supplying any new part, then this information can be stored in shipment table
SP with the supplier number, part number and supplied quantity.
● So, we can say that insert operations can be performed in all the cases without any anomaly.
b) Update Operation:
● Suppose supplier S1 has moved from Qadian to Jalandhar.
● In that case we need to make changes in the record, so that the supplier table is up-to- date.
● Since supplier number is the primary key in the S (supplier) table, there is only a single entry for S1, which needs a single update, and the problem of data inconsistencies does not arise.
● Similarly, part and shipment information can be updated by a single modification in the tables P
and SP respectively without the problem of inconsistency.
● The update operation in the relational model is thus very simple and free of anomalies.
c) Delete Operation:
● Suppose if supplier S3 stops the supply of part P2, then we have to delete the shipment
connecting part P2 and supplier S3 from shipment table SP.
● This information can be deleted from SP table without affecting the details of supplier of S3 in
supplier table and part P2 information in part table.
● Similarly, we can delete the information of parts in the P table and their shipments in the SP table, and we can delete the information of suppliers in the S table and their shipments in the SP table.
d) Record Retrieval:
Record retrieval methods for relational model are simple and symmetric which can be clarified with
the following queries:
Query1: Find the supplier numbers for suppliers who supply part P2.
Solution: In order to get this information we have to search the information of part P2 in the SP table
(shipment table). For this a loop is constructed to find the records of P2 and on getting the records,
corresponding supplier numbers are printed.
Algorithm:
for each record of the SP table do
    if PNO = 'P2' then
        print SNO;
end;
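In SQL the same retrieval is a single declarative statement; a possible equivalent, assuming the SP table and column names used above:
SELECT SNO
FROM SP
WHERE PNO = 'P2';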
● The relational model's disadvantages are very minor as compared to the advantages and their
capabilities far outweigh the shortcomings
● Also, the drawbacks of the relational database systems could be avoided if proper corrective
measures are taken.
● The drawbacks are not because of the shortcomings in the database model, but the way it is
being implemented.
6. MAPPING FROM ER MODEL TO RELATIONAL MODEL
6.1 Introduction
• The ER Model can be represented using ER Diagrams which is a great way of designing and
representing the database design in more of a flow chart form.
• It is very convenient to design the database
– using the ER Model by creating an ER diagram and
– later on converting it into relational model to design your tables.
• Not all the ER Model constraints and components can be directly transformed into
relational model, but an approximate schema can be derived.
• The basic idea of mapping a real-world scenario into the ER model and then into the relational model is depicted as follows:
3) The primary key specified for the entity in the ER model, will become the primary key for the
table in relational model.
Example:
In a university, a Student enrolls in Courses. A student must be assigned to at least one or more
Courses. Each course is taught by a single Professor. To maintain instruction quality, a Professor can
deliver only one course
Example:For example, in a University database, we might have entities for Students, Courses, and
Lecturers. Students entity can have attributes like Rollno, Name, and DeptID. They might have
relationships with Courses and Lecturers.
Step 1) Entity Identification
We have three entities
Student
Course
Professor
Mapping Process for relationships (each relationship is mapped to a newly created table):
i. Create table for a relationship.
ii. Add the primary keys of all participating Entities as fields of table with their respective data
types.
iii. If relationship has any attribute, add each attribute as field of table.
iv. Declare a primary key composing all the primary keys of participating entities.
v. Declare all foreign key constraints.
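As an illustration of this process, the Enrolls relationship between Student and Course from the university example above could be mapped as follows; column names such as Rollno and CourseID are assumptions, and the Student and Course tables are assumed to already exist:
CREATE TABLE Enrolls (
  Rollno    INT,
  CourseID  INT,
  PRIMARY KEY (Rollno, CourseID),
  FOREIGN KEY (Rollno)   REFERENCES Student(Rollno),
  FOREIGN KEY (CourseID) REFERENCES Course(CourseID)
);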
Note:
· Similarly we can generate relational database schema using the ER diagram.
· We cannot import all the ER constraints into relational model, but an approximate schema can
be generated.
· There are several processes and algorithms available to convert ER Diagrams into Relational
Schema.
A special scenario
Following are some key points to keep in mind while doing so:
a) Entity gets converted into Table, with all the attributes becoming fields(columns) in the table.
b) Relationship between entities is also converted into table with primary keys of the
related entities also stored in it as foreign keys.
c) Primary Keys should be properly set.
d) For any relationship involving a weak entity, if the primary key of any other entity is included in a table, a foreign key constraint must be defined.
RELATIONAL ALGEBRA:
Relational Algebra is a formal language used to describe operations on relational database tables. It
provides a theoretical foundation for querying and manipulating relational databases. These operations
help users retrieve, filter, combine, and transform data stored in relational database systems. The results of
these operations are also tables, allowing for further manipulation and analysis.
1. **Selection (σ):** This operation selects rows from a table that satisfy a given condition. It is denoted
by the Greek letter σ (sigma). For example, selecting all employees with a certain job title: σ(JobTitle =
'Manager')(Employees).
2. **Projection (π):** This operation selects specific columns from a table while eliminating duplicates. It
is denoted by the Greek letter π (pi). For example, selecting only the names and ages of employees:
π(Name, Age)(Employees).
3. **Union (∪):** This operation combines the rows of two tables with the same schema, eliminating duplicates. For example, combining two customer lists: CustomersA ∪ CustomersB.
4. **Intersection (∩):** This operation returns only the rows that are present in both tables, again requiring the same schema: CustomersA ∩ CustomersB.
5. **Difference (- or \):** This operation returns the rows present in the first table but not in the second.
For example, finding customers who bought product A but not product B: CustomersWhoBoughtA -
CustomersWhoBoughtB.
6. **Cartesian Product (×):** This operation combines every row from the first table with every row from
the second table, resulting in a new table with all possible combinations of rows. It is used less frequently
due to its potential for generating large results.
7. **Join (⋈):** This operation combines rows from two or more tables based on a common attribute. Different types of join exist, such as equi join, natural join and outer join.
8. **Renaming (ρ):** This operation is used to rename relations or attributes. For example, renaming the
"EmployeeName" attribute to "Name" in the Employees table: ρ(Name/EmployeeName)(Employees).
These operations can be combined to form more complex queries. Relational Algebra serves as the
theoretical basis for query languages like SQL (Structured Query Language) that are used to interact with
relational databases. It helps database developers and users understand the underlying principles of
querying and manipulating data in a relational database management system (DBMS).
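For reference, the operations above correspond roughly to the following SQL; relation and attribute names are taken from the examples where given, and the rest (CustomersA, CustomersB, Departments, DeptID) are illustrative:
-- Selection: σ(JobTitle = 'Manager')(Employees)
SELECT * FROM Employees WHERE JobTitle = 'Manager';
-- Projection: π(Name, Age)(Employees)
SELECT DISTINCT Name, Age FROM Employees;
-- Union and Intersection
SELECT Name FROM CustomersA UNION SELECT Name FROM CustomersB;
SELECT Name FROM CustomersA INTERSECT SELECT Name FROM CustomersB;
-- Difference: CustomersWhoBoughtA - CustomersWhoBoughtB (Oracle uses MINUS instead of EXCEPT)
SELECT CustomerID FROM CustomersWhoBoughtA
EXCEPT
SELECT CustomerID FROM CustomersWhoBoughtB;
-- Cartesian product and join
SELECT * FROM Employees CROSS JOIN Departments;
SELECT * FROM Employees E JOIN Departments D ON E.DeptID = D.DeptID;
-- Renaming: ρ(Name/EmployeeName)(Employees)
SELECT EmployeeName AS Name FROM Employees;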
7. STRUCTURED QUERY LANGUAGE (SQL)
7.1 Introduction
● Using the data definition properties of SQL, one can design and modify the database schema,
whereas the data manipulation properties allow SQL to store and retrieve data from the database.
● Two classes of languages
o Procedural – user specifies what data is required and how to get those data
o Nonprocedural – user specifies what data is required without specifying how to get
those data.
● SQL is the most widely used query language.
• SQL is a standard language for storing, manipulating and retrieving data in databases.
• SQL, Structured Query Language, is a programming language designed to manage data stored
in relational databases.
• SQL operates through simple, declarative statements.
7.1.2. Capabilities of SQL
• SQL can
– execute queries against a database
– retrieve data from a database
– insert records in a database
– update records in a database
– delete records from a database
– create new databases
– create new tables in a database
– create stored procedures in a database
– create views in a database
– set permissions on tables, procedures, and views
Data Definition Language (DDL)
1) CREATE TABLE
• CREATE TABLE creates a new table in the database.
• It allows you to specify the name of the table and the name of each column in the table.
• It is the specification notation for defining the database schema.
• CREATE can also create databases and views in the RDBMS.
Syntax :
• CREATE TABLE table_name ( column_1 datatype, column_2 datatype, column_3
datatype );
Example 1:
Create database tutorials;
Create table article;
Create view for_students;
Example 2:
Create database bank;
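A fuller sketch combining CREATE TABLE with the constraints listed in the Suggested Activities (primary key, foreign key, check, unique, not null) and a view. The article and book_author names reuse earlier examples, while the column definitions are illustrative:
CREATE TABLE book_author (
  author_id   INT          PRIMARY KEY,
  author_name VARCHAR(50)  NOT NULL,
  age         INT          CHECK (age > 0)
);
CREATE TABLE article (
  article_id  INT          PRIMARY KEY,
  title       VARCHAR(100) NOT NULL,
  isbn        CHAR(13)     UNIQUE,
  author_id   INT,
  FOREIGN KEY (author_id) REFERENCES book_author(author_id)
);
CREATE VIEW for_students AS
  SELECT title, author_id FROM article;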
3) TRUNCATE TABLE
• The TRUNCATE command will delete all the records of the selected table, but the table structure will remain.
• TRUNCATE TABLE table_name;
4) DROP TABLE
• The DROP command will delete the structure and all the records of the selected table.
• DROP removes objects such as tables, views and databases from the RDBMS.
Syntax :
• DROP TABLE table_name
• Drop object_type object_name;
Example:
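Possible examples, reusing the illustrative object names from above:
DROP TABLE article;
DROP DATABASE tutorials;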
5) COMMENT
Adds additional information and meaning to schema objects such as tables and columns.
Syntax :
COMMENT ON TABLE table_name IS 'text';
Example:
COMMENT ON TABLE article IS 'EB SYSTEM';
6) RENAME
To rename or change the name of the given object.
Syntax:
RENAME old_object_name TO new_object_name;
Example:
RENAME emp TO employee;
• Data Manipulation Language (DML) statements are used for managing data within
schema objects.
TYPES:
1) SELECT - retrieves data from a database
2) INSERT - inserts data into a table
3) UPDATE - updates existing data within a table
4) DELETE - deletes records from a table; the space for the records remains
These basic constructs allow database programmers and users to enter data and information into
the database and retrieve it efficiently using a number of filter options.
a) SELECT/FROM/WHERE
The SQL SELECT Statement
SELECT Syntax
– SELECT column1, column2, ...
FROM table_name
WHERE condition
ORDER BY column1, column2, ... ASC|DESC
GROUP BY column_name
HAVING condition;
– Here, column1, column2, ... are the field names of the table you want to select data
from. If you want to select all the fields available in the table,
a) SELECT clause: This is one of the fundamental query commands of SQL. It is similar to the projection operation of relational algebra; it selects the attributes to be displayed, based on the condition described by the WHERE clause.
b) FROM clause: This clause takes a relation (table) name as an argument from which attributes are to be selected or projected.
c) WHERE clause: This clause defines the predicate or conditions which must match in order for a tuple to qualify for the result.
Example:
Select author_name
From book_author
Where age > 50;
This command will yield the names of authors from the relation book_author whose age is greater
than 50.
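A slightly fuller example using the optional clauses from the syntax above; the relation and columns reuse the book_author example, and the aggregate itself is illustrative:
SELECT age, COUNT(*) AS author_count
FROM book_author
GROUP BY age
HAVING COUNT(*) > 1
ORDER BY age ASC;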
b) INSERT INTO/VALUES
This command is used for inserting values into the rows of a table (relation).
• The INSERT INTO statement is used to insert new records in a table.
INSERT INTO Syntax
• It is possible to write the INSERT INTO statement in two ways.
• The first way specifies both the column names and the values to be inserted:
– INSERT INTO table_name (column1, column2, column3, ...)
VALUES (value1, value2, value3, ...);
• If you are adding values for all the columns of the table, you do not need to
specify the column names in the SQL query.
– INSERT INTO table_name
VALUES (value1, value2, value3, ...);
Example:
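A possible example, assuming a tutorials table that has only the columns Title and Author (illustrative):
-- first form: column names listed explicitly
INSERT INTO tutorials (Title, Author) VALUES ('SQL Basics', 'webmaster');
-- second form: values supplied for every column, in table order
INSERT INTO tutorials VALUES ('Normalization', 'anonymous');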
c) UPDATE/SET/WHERE
This command is used for updating or modifying the values of columns in a table (relation).
• The UPDATE statement is used to modify the existing records in a table.
UPDATE Syntax
• UPDATE table_name
SET column1 = value1, column2 = value2, ...
WHERE condition;
• Note:
– Be careful when updating records in a table! Notice the WHERE clause in the
UPDATE statement.
– The WHERE clause specifies which record(s) that should be updated. If you omit the
WHERE clause, all records in the table will be updated!
Example:
UPDATE tutorials SET Author="webmaster" WHERE Author="anonymous";
d) DELETE/FROM/WHERE
This command is used for removing one or more rows from a table (relation). The
DELETE statement is used to delete existing records in a table.
DELETE Syntax
• DELETE FROM table_name [WHERE condition];
• Note:
– Be careful when deleting records in a table! Notice the WHERE clause in the
DELETE statement.
– The WHERE clause specifies which record(s) should be deleted. If you omit the
WHERE clause, all records in the table will be deleted!
Example:
DELETE FROM tutorials
WHERE Author="unknown";
• Transaction Control (TCL) statements are used to manage the changes made by DML
statements.
• It allows statements to be grouped together into logical transactions.
• A transaction is a sequence of SQL statements that Oracle treats as a single unit.
• Results of Data Manipulation Language (DML) are not permanently updated to table until
explicit or implicit COMMIT occurs
• Transaction control statements can:
• Commit data through COMMIT command
• Undo data changes through ROLLBACK command
a) COMMIT
Syntax
• COMMIT [ WORK ]
• Where WORK is supported for compliance with standard SQL.
• The statements COMMIT and COMMIT WORK are equivalent.
b) ROLLBACK
• Used to “undo” changes that have not been committed
• Occurs when:
– ROLLBACK; is executed
– System restarts after crash
Syntax
• ROLLBACK [ WORK ] [ TO [ SAVEPOINT ] savepoint_name ];
Note:
Using ROLLBACK without the TO SAVEPOINT clause performs the following
operations:
a) Ends the transaction.
b) Undoes all changes in the current transaction
c) Erases all savepoints in the transaction
d) Releases the transaction's locks
Using ROLLBACK with the TO SAVEPOINT clause performs the following operations:
a) Rolls back just the portion of the transaction after the savepoint.
b) Erases all savepoints created after that savepoint. The named savepoint is retained, so you
can roll back to the same savepoint multiple times. Prior savepoints are also retained.
c) Releases all table and row locks acquired since the savepoint. Other transactions that have
requested access to rows locked after the savepoint must continue to wait until the transaction
is committed or rolled back. Other transactions that have not already requested the rows can
request and access the rows immediately.
c) SAVEPOINT
• Identifies a point in a transaction to which you can later roll back.
Syntax
• SAVEPOINT save_point;
• Where save_point is the name of the savepoint to be created.
Example:
• To update BLAKE's and CLARK's salary, check that the total company salary does not
exceed 2,7,00, then reenter CLARK's salary,
enter:
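A possible sequence of statements; an emp table with ename and sal columns is assumed, and the salary figures are illustrative:
UPDATE emp SET sal = 2000 WHERE ename = 'BLAKE';
SAVEPOINT blake_sal;
UPDATE emp SET sal = 1500 WHERE ename = 'CLARK';
SAVEPOINT clark_sal;
SELECT SUM(sal) FROM emp;
-- if the total exceeds 27,000, undo CLARK's update and reenter his salary
ROLLBACK TO SAVEPOINT blake_sal;
UPDATE emp SET sal = 1300 WHERE ename = 'CLARK';
COMMIT;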
• Data Control Language (DCL) statements grant permission(s) and, when necessary, revoke previously granted permission(s).
TYPES:-
a) GRANT - gives user's access privileges to database
b) REVOKE - withdraw access privileges given with the GRANT command
a) GRANT statement
These commands grant and revoke access rights to and from various objects of the DBMS.
Syntax:
GRANT privilege_name TO user;
The following example gives only the CREATE privilege to the user scott.
Example:
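A possible form, assuming the CREATE TABLE system privilege is the one intended:
GRANT CREATE TABLE TO scott;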
Syntax:
REVOKE privilege_name FROM user;
The following example withdraws the previously granted CREATE privilege from the user scott.
Example:
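A possible form, again assuming the CREATE TABLE system privilege:
REVOKE CREATE TABLE FROM scott;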
Thus, using these four types of SQL statements (DDL, DML, TCL and DCL), a DBMS can complete its major
tasks.
NORMALIZATION
Introduction
What is Normalization?
Normalization is a systematic approach of decomposing tables to eliminate data redundancy
(repetition) and undesirable characteristics like insertion, update and deletion anomalies. It is a multi-
step process that puts data into tabular form, removing duplicated data from the relation tables.
Normalization is the process of reducing the duplication of data. "NF" refers to "normal form".
Purpose:
Normalization is used for mainly two purposes,
Eliminating redundant (useless) data.
Ensuring data dependencies make sense, i.e. data is logically stored.
a) First Normal Form (1NF):
Rule : A table is said to be in First Normal Form (1NF) if and only if each attribute of the
relation is atomic.
That is, each row in a table should be identified by a primary key (a unique column value or group
of unique column values), and no row of data should have a repeating group of column values.
An attribute (column) of a table cannot hold multiple values. It should hold only atomic values.
For a table to be in the First Normal Form, it should follow the following 4 rules:
1. It should only have single(atomic) valued attributes/columns.
2. Values stored in a column should be of the same domain
3. All the columns in a table should have unique names.
4. And the order in which data is stored, does not matter.
Example: Suppose a company wants to store the names and contact details of its employees. It creates a
table that looks like this:
Two employees (Jon & Lester) are having two mobile numbers so the company stored them in the same
field as you can see in the table above.
This table is not in 1NF, as the rule says "each attribute of a table must have atomic (single)
values"; the emp_mobile values for employees Jon & Lester violate that rule.
To make the table comply with 1NF, each mobile number must be stored as a separate, atomic value (for example, by repeating the row or by moving mobile numbers to a separate table).
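A small SQL sketch of the second option; column sizes are illustrative:
-- violates 1NF: one field may hold several numbers separated by commas
CREATE TABLE employee_non1nf (
  emp_id     INT PRIMARY KEY,
  emp_name   VARCHAR(50),
  emp_mobile VARCHAR(100)
);
-- 1NF design: one atomic mobile number per row
CREATE TABLE employee (
  emp_id   INT PRIMARY KEY,
  emp_name VARCHAR(50)
);
CREATE TABLE employee_mobile (
  emp_id     INT REFERENCES employee(emp_id),
  emp_mobile VARCHAR(15),
  PRIMARY KEY (emp_id, emp_mobile)
);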
b) Second Normal Form (2NF):
Rule : A table is said to be in Second Normal Form (2NF) if both of the following conditions hold:
● Table is in 1NF (First normal form)
● No non-prime attribute is dependent on the proper subset of any candidate key of table.
For a table to be in the Second Normal form, it should be in the First Normal form and it should
not have Partial Dependency.
Partial Dependency exists, when for a composite primary key, any attribute in the table
depends only on a part of the primary key and not on the complete primary key.
To remove Partial dependency, we can divide the table, remove the attribute which is
causing partial dependency, and move it to some other table where it fits in well.
Example:
Suppose a school wants to store the data of teachers and the subjects they teach.
They create a table that looks like this. Since a teacher can teach more than one subject, the table
can have multiple rows for the same teacher.
teacher_id subject teacher_age
111 Maths 38
111 Physics 38
222 Biology 38
333 Physics 40
333 Chemistry 40
The candidate key here is {teacher_id, subject}, and the non-prime attribute teacher_age depends only on teacher_id, which is a partial dependency. To make the table comply with 2NF we can break it into two tables like this:
teacher_details table:
teacher_id teacher_age
111 38
222 38
333 40
teacher_subject table:
teacher_id subject
111 Maths
111 Physics
222 Biology
333 Physics
333 Chemistry
Now the tables comply with Second normal form (2NF).
c) Third Normal Form (3NF):
Rule : A table is in Third Normal Form (3NF) if it is in 2NF and no non-prime attribute is transitively dependent on any candidate key.
● Transitive functional dependency of a non-prime attribute on any super key should be removed.
● An attribute that is not part of any candidate key is known as a non-prime attribute.
In other words 3NF can be explained like this: A table is in 3NF if it is in 2NF and for each functional
dependency X-> Y at least one of the following conditions hold:
● X is a super key of the table
● Y is a prime attribute of the table, i.e. each element of Y is part of some candidate key.
An attribute that is a part of one of the candidate keys is known as a prime attribute.
Example: Suppose a company wants to store the complete address of each employee, they create
a table named employee_details that looks like this:
emp_id emp_name emp_zip emp_state emp_city emp_district
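Here emp_zip determines emp_state, emp_city and emp_district, which is a transitive dependency on the key emp_id. A sketch of the 3NF decomposition in SQL; the data types are illustrative:
CREATE TABLE employee_zip (
  emp_zip      CHAR(6) PRIMARY KEY,
  emp_state    VARCHAR(30),
  emp_city     VARCHAR(30),
  emp_district VARCHAR(30)
);
CREATE TABLE employee (
  emp_id   INT PRIMARY KEY,
  emp_name VARCHAR(50),
  emp_zip  CHAR(6) REFERENCES employee_zip(emp_zip)
);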
d) Boyce-Codd Normal Form (BCNF):
Rule : A table is in BCNF if it is in 3NF and, for every functional dependency X -> Y, X is a super key.
Example: Suppose there is a company wherein employees work in more than one department.
They store the data like this:
emp_id emp_nationality emp_dept dept_type dept_no_of_emp
1001 Austrian Production and planning D001 200
To make the table comply with BCNF, it is decomposed into three tables:
emp_nationality table:
emp_id emp_nationality
1001 Austrian
1002 American
emp_dept table:
emp_dept dept_type dept_no_of_emp
emp_dept_mapping table:
emp_id emp_dept
1001 stores
Functional dependencies:
emp_id -> emp_nationality
emp_dept -> {dept_type, dept_no_of_emp}
Candidate keys:
For first table: emp_id
For second table: emp_dept
For third table: {emp_id, emp_dept}
This is now in BCNF, as in both functional dependencies the left-hand side is a key. While normalization
makes databases more efficient to maintain, it can also make them more complex, because data is
separated into many different tables.
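A sketch of the three BCNF tables in SQL, showing that the determinant of each functional dependency is declared as a key; the data types are illustrative:
CREATE TABLE emp_nationality (
  emp_id          INT PRIMARY KEY,
  emp_nationality VARCHAR(30)
);
CREATE TABLE emp_dept (
  emp_dept       VARCHAR(50) PRIMARY KEY,
  dept_type      CHAR(4),
  dept_no_of_emp INT
);
CREATE TABLE emp_dept_mapping (
  emp_id   INT REFERENCES emp_nationality(emp_id),
  emp_dept VARCHAR(50) REFERENCES emp_dept(emp_dept),
  PRIMARY KEY (emp_id, emp_dept)
);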
*******************
UNIT II DISTRIBUTED DATABASES, ACTIVE DATABASES AND OPEN
DATABASE CONNECTIVITY
Distributed Database Architecture – Distributed Data Storage – Distributed Transactions – Distributed
Query Processing – Distributed Transaction Management – Event Condition Action Model – Design and
Implementation Issues for Active Databases – Open Database Connectivity
❖ Introduction:
For the proper working of any business or organization, there is a requirement for a well-organized database
management system. In the past, databases were centralized in nature; but with the growth of globalization,
organizations have expanded across the world.
For this reason they have to choose distributed data instead of a centralized system.
This is how the concept of distributed databases came into the picture.
A Distributed Database Management System is a software system that manages a distributed database which is
partitioned and placed at different locations. Its objective is to hide the data distribution so that the database appears as one logical
database system to the clients.
“A distributed database management system (DDBMS) can be defined as the software system that permits the
management of the distributed database and makes the distribution transparent to the users.”
-M. Tamer Özsu
A Distributed Database Management System allows end users or application programmers to view a pool of
physically separate databases as one logical unit. In other words, a distributed database is one in which
data is stored at multiple locations connected via a network, but to the user it appears as a single
logical unit.
Distributed Database Management System
2. Autonomy
Autonomy, in this perspective, refers to the distribution of control, not of data. It indicates the
degree to which control of the database system is distributed and the degree to which each
component DBMS can operate independently. Autonomy is a function of a number of factors,
such as whether the component systems exchange information, whether they can independently
execute transactions, and whether one is allowed to modify them. Requirements of an autonomous
system have been stated as follows:
● The local operations of the individual DBMSs are not affected by their participation in the
distributed system.
● The manner in which the individual DBMSs process queries and optimize them should not be
affected by the execution of global queries that access multiple databases.
1. Client-Server Architecture:
● The functionality is divided into servers, which manage the data, and clients, which provide the user interface and send user requests to the servers.
2. Multi-DBMS Architecture: an integration of two or more autonomous database systems, described by the following schema levels:
● Multi-database Conceptual Level – depicts the integrated multi-database, comprising the global logical multi-database structure definitions.
● Multi-database Internal Level – depicts the data distribution across different sites and the mapping from the multi-database to the local data.
● Local database View Level – depicts the public view of local data.
● Local database Conceptual Level – describes the local data organization at each site.
● Local database Internal Level – shows the physical data organization at each site.
❖ Distributed Data Storage
Distributed data storage refers to the practice of storing data across multiple physical or logical
locations, often on different servers, nodes, or data centers. This approach offers benefits such as
improved data availability, fault tolerance, scalability, and performance. Various technologies and
architectures are used to implement distributed data storage systems. Here are some key concepts
and approaches:
1. **Replication:** Data replication involves creating and maintaining multiple copies of the same
data across different nodes. This helps enhance data availability and fault tolerance. Replicated data
can be stored on geographically dispersed servers to minimize the impact of hardware failures or
network outages.
2. **Sharding or Data Partitioning:** Sharding involves dividing a dataset into smaller subsets (shards)
based on certain criteria, such as a range of values or a hash function. Each shard is stored on a
separate node. Sharding improves data distribution and can lead to better performance by allowing
parallel processing of queries.
3. **Consistency Models:** Distributed systems need mechanisms to maintain data consistency across
replicas. Consistency models, such as strong consistency, eventual consistency, and causal
consistency, define how and when data changes are propagated to replicas.
4. **Data Synchronization:** When data is replicated, mechanisms are required to keep the replicas
synchronized. Techniques like two-phase commit, three-phase commit, and distributed consensus
protocols (e.g., Paxos, Raft) ensure that all replicas agree on updates.
5. **Distributed File Systems:** Distributed file systems provide a way to store and manage files
across multiple servers. Examples include Hadoop Distributed File System (HDFS) and Ceph.
These systems offer features like data replication, fault tolerance, and high throughput.
6. **NoSQL Databases:** Many NoSQL databases are designed with distributed storage in mind.
These databases, such as Cassandra, MongoDB, and Couchbase, provide horizontal scalability, data
distribution, and support for various consistency models.
7. **Content Delivery Networks (CDNs):** CDNs distribute content, such as web pages, images, and
videos, to multiple servers located at different geographic locations. This reduces latency for users
by serving content from a nearby server.
8. **Distributed Object Storage:** Distributed object storage systems, like Amazon S3 and OpenStack
Swift, allow users to store and retrieve objects (files) through an API. These systems distribute data
across multiple nodes, providing high availability and scalability.
9. **Distributed Database Management Systems (DDBMS):** DDBMSs manage data across multiple
nodes while providing mechanisms for data distribution, replication, and querying. Examples
include Google Spanner and CockroachDB.
10. **Blockchain and Distributed Ledgers:** Blockchain technology provides a distributed and
tamper-resistant ledger for recording transactions. Each block in the chain contains a copy of the
entire ledger, distributed across nodes in a network.
11. **Erasure Coding:** Erasure coding is a technique that breaks data into smaller fragments and
adds redundant data (parity) to enable recovery in case of data loss. It's often used in distributed
storage systems to save space compared to traditional replication.
12. **Hybrid Cloud Storage:** Hybrid cloud solutions combine on-premises storage with cloud-based
storage, allowing organizations to maintain a balance between local control and cloud scalability.
Distributed data storage solutions are widely used in various industries and applications, including
cloud computing, big data analytics, IoT (Internet of Things), content delivery, and more. They
address the challenges of data growth, accessibility, and reliability in today's interconnected and
data-intensive world.
❖ Distributed Transactions
A transaction is a program including a collection of database operations, executed as a logical unit of data
processing. The operations performed in a transaction include one or more of database operations like
insert, delete, update or retrieve data. It is an atomic process that is either performed into completion
entirely or is not performed at all. A transaction involving only data retrieval without any data update is
called read-only transaction.
Each high level operation can be divided into a number of low level tasks or operations. For example, a data update operation involves reading the data item from storage into main memory, modifying it there, and writing the modified value back to storage. A transaction is delimited by operations such as the following:
● begin_transaction – a marker that specifies the start of transaction execution.
● read_item or write_item – database operations that may be interleaved with main memory operations as a part of the transaction.
● commit – a signal to specify that the transaction has been completed successfully in its entirety and will not be undone.
● rollback – a signal to specify that the transaction has been unsuccessful and so all temporary changes in the database are undone; a committed transaction cannot be rolled back.
Transaction States
A transaction may go through a subset of five states, active, partially committed, committed, failed and
aborted.
● Active – the initial state where the transaction enters is the active state. The transaction remains in this state while it is executing read or write operations.
● Partially Committed – the transaction enters this state after the last statement of the transaction has been executed.
● Committed – the transaction enters this state after successful completion of the transaction and after the system checks have issued a commit signal.
● Failed – the transaction goes from the partially committed state or the active state to the failed state when it is discovered that normal execution can no longer proceed or that system checks have failed.
● Aborted – this is the state after the transaction has been rolled back after failure and the database has been restored to the state it was in before the transaction began.
The following state transition diagram depicts the states in the transaction and the low level transaction
operations that cause changes in state.
The desirable properties of a transaction are the ACID properties:
● Atomicity – this property states that a transaction is an atomic unit of processing, that is, either it is performed in its entirety or not performed at all.
● Consistency – a transaction should take the database from one consistent state to another consistent state. It should not adversely affect any data item in the database.
● Isolation – a transaction should be executed as if it is the only one in the system. There should not be any interference from the other concurrent transactions.
● Durability – if a committed transaction brings about a change, that change should be durable in the database and should not be lost in case of any failure.
Schedules
A schedule is the chronological order in which the operations of concurrent transactions are executed.
● Serial Schedules – in a serial schedule, at any point of time only one transaction is active, i.e. there is no overlapping of transactions.
● Parallel Schedules – in parallel schedules, more than one transaction is active simultaneously, i.e. the transactions contain operations that overlap in time.
Conflicts in Schedules
In a schedule comprising multiple transactions, a conflict occurs when two active transactions perform non-compatible operations, i.e. both operate on the same data item and at least one of them is a write operation.
❖ Distributed Query Processing
Distributed query processing is the procedure of answering queries over data that is stored at multiple sites. Its main steps and concerns are:
1. **Query Decomposition: ** When a query is submitted to a distributed DBMS, the query is first
decomposed into subqueries that can be executed on different nodes. These subqueries are sent to
the appropriate nodes based on data distribution and availability.
2. **Data Localization:** One of the key goals of distributed query processing is to minimize the
amount of data that needs to be transferred across the network. This is achieved by executing
subqueries on nodes where the relevant data is located, reducing the need for extensive data
movement.
3. **Query Optimization:** Each node receives its respective subquery and optimizes it based on local
data. This optimization includes selecting appropriate indexes, joining tables, and applying other
query optimization techniques to improve performance.
4. **Parallel Execution:** Once subqueries are optimized, they can be executed in parallel across
different nodes. This parallelism improves query execution speed by utilizing the computational
resources of multiple nodes simultaneously.
6. **Global Optimization:** In some cases, global optimization techniques consider the overall query
plan, considering the costs and benefits of different execution strategies on various nodes. This
helps to make decisions that improve the performance of the entire distributed query.
8. **Data Consistency and Isolation:** Distributed query processing needs to ensure proper transaction
isolation levels and consistency across distributed nodes, especially when multiple transactions are
executing concurrently.
9. **Query Result Integration:** Once subqueries are executed and their results are obtained, they
need to be integrated to produce the final query result. This integration process might involve
additional processing and computations.
11. **Caching and Materialized Views:** Caching intermediate results or using materialized views
can improve performance by reducing the need to recompute certain parts of the query during
subsequent executions.
12. **Query Routing:** Query routing mechanisms determine which node should execute which part
of the query. Load balancing and smart routing strategies contribute to efficient resource utilization.
Distributed query processing is essential in scenarios where data is distributed across multiple
locations or nodes, such as in cloud computing environments or global enterprises. It addresses the
challenges of data distribution, network latency, and optimizing the use of distributed resources to
provide users with efficient and consistent query results.
❖ Distributed Transaction Management
1. **Transaction Coordinator**:
- A central coordinator or transaction manager is responsible for initiating, coordinating, and
monitoring distributed transactions.
- It ensures that the ACID properties (Atomicity, Consistency, Isolation, Durability) are maintained
across participating databases.
2. **Participant Databases**:
- Each participating database is a node that can execute a portion of the distributed transaction.
- Participants must support distributed transactions and adhere to the protocols and mechanisms for
transaction coordination.
8. **Compensating Transactions**:
- Sometimes, due to unforeseen issues, a distributed transaction might need to be rolled back using
compensating transactions that undo the effects of the original transaction.
Distributed Transaction Management in DBMS is a critical aspect of maintaining data integrity and
consistency in modern distributed and cloud-based environments. Proper design, robust protocols,
and careful consideration of failure scenarios are essential for achieving reliable distributed
transactions.
❖ Event Condition Action (ECA) Model
The Event-Condition-Action (ECA) model is a paradigm used in Database Management Systems (DBMS)
and other computing systems to define and manage complex event-driven behaviors. It provides a way to
specify how the system should react to certain events based on predefined conditions, triggering specific
actions as a result. The ECA model is commonly used in rule-based systems and event processing
frameworks. Here's a breakdown of each component:
1. Event:
o An event is a change or occurrence in the system that triggers some response.
o Events can be internal (generated within the system) or external (coming from the
environment).
o Examples of events in a DBMS context include data changes (inserts, updates, deletes),
time-based triggers, user interactions, and more.
2. Condition:
o The condition is a logical expression or criteria that determine when the associated action(s)
should be executed.
o It defines the context under which the action(s) become relevant.
o Conditions can involve comparisons, calculations, and checks on data values, states, and
more.
3. Action:
o An action is a task, operation, or set of operations that should be performed when the
associated event occurs and the condition is satisfied.
o Actions can include data modifications, notifications, invoking procedures, sending
messages, or any other system-specific behavior.
● When an event occurs (either internally or externally), the system evaluates the corresponding
condition(s) associated with that event.
● If the condition is met, the defined action(s) are executed in response to the event and condition
combination.
The ECA model is especially useful in scenarios where you want the system to autonomously react to
certain events and conditions without manual intervention. It's commonly used for tasks like automated
notifications, enforcing business rules, triggering workflows, and more.
Example scenario using the ECA model in a DBMS context:
Event: A new order is placed in an online store.
Condition: The total order amount exceeds a predefined threshold.
Action: Send a discount coupon to the customer's email address.
Advantages of the ECA model:
● Flexibility: Allows the system to respond to complex events and conditions in a dynamic manner.
● Automation: Enables the automation of processes and workflows based on specific triggers.
● Customization: Offers the ability to customize responses based on conditions and events.
● Real-time Processing: Suitable for real-time event processing and reaction.
Disadvantages:
● Complexity: As the system becomes more complex, managing and debugging rules can become
challenging.
● Performance: Poorly designed ECA rules could impact system performance, especially if many
events and rules are involved.
Overall, the ECA model is a powerful approach for creating event-driven behaviors in DBMS and other
systems, but it requires careful design and management to ensure effective and efficient execution of rules
and actions.
Example:
**Scenario**: Consider a university's student registration system. Whenever a student's unpaid tuition fee
exceeds a certain threshold, the system should automatically send a notification to the student.
1. **Event**: The event in this scenario could be a change in the student's tuition fee status. Specifically,
when a new fee record is added or updated in the database.
2. **Condition**: The condition is the criteria that must be met for the action to be triggered. In this case,
the condition might be that the student's unpaid tuition fee exceeds $1,000.
3. **Action**: The action is what happens when the event and condition are satisfied. In this example, the
action is to send a notification email to the student.
- **Action**: Send an email notification to the student with a message like, "Your unpaid tuition fee has
exceeded $1,000. Please make a payment."
This ECA rule ensures that the specified action (sending a notification email) is automatically triggered
when the defined event (tuition fee update) occurs, and the specified condition (exceeding $1,000) is met
in the database system.
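One possible way to express such a rule inside the database is an Oracle-style trigger. The table and column names (student_fee, unpaid_amount, fee_notification) are assumptions, and the actual e-mail would normally be sent by an external process that reads the notification table:
CREATE TABLE fee_notification (
  student_id NUMBER,
  message    VARCHAR2(200),
  created_on DATE
);

CREATE OR REPLACE TRIGGER trg_fee_threshold
AFTER INSERT OR UPDATE ON student_fee          -- Event: a fee record is added or changed
FOR EACH ROW
WHEN (NEW.unpaid_amount > 1000)                -- Condition: unpaid fee exceeds $1,000
BEGIN
  -- Action: record a notification for the student
  INSERT INTO fee_notification (student_id, message, created_on)
  VALUES (:NEW.student_id,
          'Your unpaid tuition fee has exceeded $1,000. Please make a payment.',
          SYSDATE);
END;
/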
❖ Design and Implementation Issues for Active Databases
Active databases are database systems that are capable of proactively reacting to events and conditions
by executing predefined actions. These actions can include triggers, rules, or scripts that are
automatically executed when certain events occur or conditions are met. Designing and
implementing active databases involves addressing several important issues to ensure their
functionality, performance, and maintainability. Here are some key design and implementation
considerations for active databases:
2. **Rule Specification**:
- Define the rules or conditions that should trigger specific actions when events occur. Rules can
involve complex conditions and constraints.
- Express rules using a formal syntax or a rule language that the active database system understands.
3. **Action Execution**:
- Specify the actions that should be taken when a rule's conditions are met and an event occurs.
Actions can involve data modifications, notifications, invoking procedures, etc.
- Ensure that actions are executed efficiently and reliably, considering the potential impact on system
performance.
7. **Performance Optimization**:
- Optimize the system to handle a high volume of events and rule activations efficiently. This might
involve caching, indexing, and query optimization techniques.
Designing and implementing active databases requires a careful balance between reactivity and
system performance, as well as a clear understanding of the events, rules, and actions that will drive
the system's behavior. Proper planning, testing, and ongoing maintenance are crucial to ensure the
success of an active database system.
❖ Open Database Connectivity (ODBC)
Open Database Connectivity (ODBC) is a standard application programming interface (API) that
enables applications to interact with various database management systems (DBMS) using a
consistent and uniform interface. ODBC allows applications to access, manipulate, and manage
data across different database platforms without needing to know the specifics of each database's
underlying architecture.
1. **Standard Interface**: ODBC provides a standard set of function calls and data structures that
applications can use to interact with databases. This standardization makes it easier for developers
to write database-independent applications.
2. **Database Independence**: ODBC abstracts the differences between various DBMS systems,
allowing applications to connect to different databases without needing to modify code
significantly.
3. **Driver Architecture**: ODBC operates based on a driver architecture. Each database vendor
provides an ODBC driver specific to their database system. These drivers translate ODBC function
calls into the appropriate commands for the underlying DBMS.
4. **Data Source Name (DSN)**: ODBC connections are established using Data Source Names,
which are typically configured through an ODBC administrator tool. DSNs store information about
the database server, authentication, and other connection details.
5. **SQL Interface**: ODBC supports the Structured Query Language (SQL), allowing applications to
execute SQL statements and retrieve results from the database.
6. **Connection Pooling**: ODBC drivers often include connection pooling, which allows the reuse of
established connections to improve performance.
7. **Metadata Retrieval**: Applications can retrieve database schema information, such as tables,
columns, and indexes, using ODBC functions.
8. **Error Handling**: ODBC provides error handling mechanisms to help applications diagnose and
handle errors that may occur during database interactions.
9. **Unicode Support**: ODBC offers Unicode support, enabling applications to work with
multilingual and international character sets.
10. **Supported Platforms**: ODBC is available on various platforms, including Windows, Linux,
macOS, and others.
11. **API Layers**: ODBC can be used directly by applications, but it's also often used as a
foundation for other database APIs and tools, such as ADO (ActiveX Data Objects) and JDBC
(Java Database Connectivity).
12. **Performance Considerations**: While ODBC provides database independence, it's important to
be mindful of performance implications, as there might be some overhead involved in translating
ODBC calls into database-specific commands.
13. **ODBC Drivers**: Each DBMS vendor provides its own ODBC driver that is specific to their
database system. These drivers need to be installed on the client machine to enable communication
with the corresponding database.