CP4152-Database Practices-Unit-1,2


CS4102 ADVANCED DATABASE SYSTEMS L T P C

3 0 2 4
COURSE OBJECTIVES:
● Describe the fundamental elements of relational database management systems

● Explain the basic concepts of relational data model, entity-relationship model, relational
database design, relational algebra and SQL.
● Understand query processing in a distributed database system

● Understand the basics of XML and create well-formed and valid XML documents.

● Distinguish the different types of NoSQL databases

● Understand the different models involved in database security and their real-world applications to protect databases and the information associated with them
UNIT I RELATIONAL DATA MODEL 15
Entity Relationship Model – Relational Data Model – Mapping Entity Relationship Model to Relational
Model – Relational Algebra – Structured Query Language – Database Normalization.
Suggested Activities:
Data Definition Language
● Create, Alter and Drop

● Enforce Primary Key, Foreign Key, Check, Unique and Not Null Constraints

● Creating Views
Data Manipulation Language
● Insert, Delete, Update

● Cartesian Product, Equi Join, Left Outer Join, Right Outer Join and Full Outer Join

● Aggregate Functions

● Set Operations

● Nested Queries
Transaction Control Language
● Commit, Rollback and Save Points
UNIT II DISTRIBUTED DATABASES, ACTIVE DATABASES AND OPEN DATABASE CONNECTIVITY 15
Distributed Database Architecture – Distributed Data Storage – Distributed Transactions – Distributed
Query Processing – Distributed Transaction Management – Event Condition Action Model – Design
and Implementation Issues for Active Databases – Open Database Connectivity
Suggested Activities:
● Distributed Database Design and Implementation

● Row Level and Statement Level Triggers

● Accessing a Relational Database using PHP, Python and R


UNIT III XML DATABASES 15
Structured, Semi structured, and Unstructured Data – XML Hierarchical Data Model – XML
Documents – Document Type Definition – XML Schema – XML Documents and Databases – XML
Querying – XPath – XQuery
Suggested Activities:
● Creating XML Documents, Document Type Definition and XML Schema

● Using a Relational Database to store the XML documents as text

● Using a Relational Database to store the XML documents as data elements

● Creating or publishing customized XML documents from pre-existing relational databases
● Extracting XML Documents from Relational Databases

● XML Querying
UNIT IV NOSQL DATABASES AND BIG DATA STORAGE SYSTEMS 15
NoSQL – Categories of NoSQL Systems – CAP Theorem – Document-Based NoSQL Systems and
MongoDB – MongoDB Data Model – MongoDB Distributed Systems Characteristics – NoSQL Key-
Value Stores – DynamoDB Overview – Voldemort Key-Value Distributed Data Store – Wide Column
NoSQL Systems – HBase Data Model – HBase CRUD Operations – HBase Storage and Distributed
System Concepts – NoSQL Graph Databases and Neo4j – Cypher Query Language of Neo4j – Big Data
– MapReduce – Hadoop – YARN
Suggested Activities:
● Creating databases using MongoDB, DynamoDB, Voldemort Key-Value Distributed Data Store, HBase and Neo4j.
● Writing simple queries to access databases created using MongoDB, DynamoDB, Voldemort Key-Value Distributed Data Store, HBase and Neo4j.
UNIT V DATABASE SECURITY 15
Database Security Issues – Discretionary Access Control Based on Granting and Revoking Privileges –
Mandatory Access Control and Role-Based Access Control for Multilevel Security – SQL Injection –
Statistical Database Security – Flow Control – Encryption and Public Key Infrastructures – Preserving
Data Privacy – Challenges to Maintaining Database Security – Database Survivability – Oracle Label-
Based Security.
Suggested Activities:
Implementing Access Control in Relational Databases
TOTAL: 75 PERIODS
COURSE OUTCOMES:
At the end of the course, the students will be able to
● Convert the ER-model to relational tables, populate relational databases and
formulate SQL queries on data.
● Understand and write well-formed XML documents

● Apply methods and techniques for distributed query processing.

● Design and Implement secure database systems.

● Use the data control, definition, and manipulation languages of the NoSQL databases

REFERENCES:
1. R. Elmasri, S.B. Navathe, "Fundamentals of Database Systems", Seventh Edition, Pearson Education, 2016.
2. Henry F. Korth, Abraham Silberschatz, S. Sudharshan, "Database System Concepts", Seventh Edition, McGraw Hill, 2019.
3. C.J. Date, A. Kannan, S. Swamynathan, "An Introduction to Database Systems", Eighth Edition, Pearson Education, 2006.
4. Raghu Ramakrishnan, Johannes Gehrke, "Database Management Systems", Fourth Edition, McGraw Hill Education, 2015.
5. Guy Harrison, "Next Generation Databases: NoSQL and Big Data", First Edition, Apress, 2015.
6. Thomas Connolly and Carolyn Begg, "Database Systems: A Practical Approach to Design, Implementation and Management", Sixth Edition, Pearson Education, 2015.
UNIT-1 RELATIONAL DATA MODEL
Entity Relationship Model – Relational Data Model – Mapping Entity Relationship Model to
Relational Model – Relational Algebra – Structured Query Language – Database Normalization.

Database
A database is a collection of related data organized so that the data can be easily accessed, managed and updated. A database can be software-based or hardware-based, with one sole purpose: storing data.
What is Database?
● A database is a data structure that stores organized information.

● Most databases contain multiple tables, which may each include several different fields.
o For example, a company database may include tables for products, employees, and
financial records.
o Each of these tables would have different fields that are relevant to the information
stored in the table.
DBMS (or) Database Management System

· A DBMS is software that allows the creation, definition and manipulation of a database, allowing users to store, process and analyse data easily.
· DBMS provides us with an interface or a tool, to perform various operations like creating
database, storing data in it, updating data, creating tables in the database and a lot more.
· DBMS also provides protection and security to the databases. It also maintains data
consistency in case of multiple users.
What is DBMS?
· Collection of interrelated data

· Set of programs to access the data

· DBMS contains information about a particular enterprise

· DBMS provides an environment that is both convenient and efficient to use.

Here are some examples of popular DBMSs used these days:

MySQL, Oracle, SQL Server, IBM DB2, PostgreSQL, Amazon SimpleDB (cloud-based), etc.

Database Applications:

· Banking: all transactions


ENTITY – RELATIONSHIP MODEL (E-R MODEL)

1.1 Introduction
 The entity relationship model is a collection of basic objects called entities and
relationship among those objects.
 Entity-Relationship (ER) Model is based on the notion of real-world entities and relationships
among them.
 While formulating real-world scenario into the database model, the ER Model creates entity set,
relationship set, general attributes and constraints.
 ER Model is best used for the conceptual design of a database.

ER Model is based on
a) Entities and their attributes.
b) Relationships among entities.

 An entity is a thing or object in the real world that is distinguishable from other objects.

 The entity-relationship model is a model used for the design and representation of relationships between data.
 The main data objects are termed entities, with their details defined as attributes. Some of these attributes are important and are used to identify the entity, and different entities are related using relationships.

1.2 Basics of ER Model


There are two techniques used for the purpose of data base designing from the system requirements and
they are:

a) Top down Approach known as Entity-Relationship Modeling


b) Bottom Up approach known as Normalization.

• The Entity-Relationship (ER) model is a top down approach of designing database.


• It is a graphical technique, which is used to convert the requirement of the system to graphical
representation, so that it can become well understandable.
• It also provides the framework for designing of database.
• The Entity-Relationship (ER) model was originally proposed by Peter Chen in 1976 as a way to unify
the network and relational database views.
• Simply stated, the ER model is a conceptual data model that views the real world as entities and
relationships.
• A basic component of the model is the Entity-Relationship diagram, which is used to visually
represent data objects.

For the database designer, the utility of the ER model is:

• It maps well to the relational model. The constructs used in the ER model can easily be transformed
into relational tables.

• It is simple and easy to understand with a minimum of training. Therefore, the model can be used by
the database designer to communicate the design to the end user.

• In addition, the model can be used as a design plan by the database developer to implement a data
model in specific database management software.

1.3 Elements of E-R Model


The major elements of an ERD are the components that participate in creating it.

These concepts are explained below.

The ER elements are:


a) Entity and Entity Set
b) Attributes And Types of Attributes.
c) Keys
d) Relationships

a) Entity
An entity in an ER Model is a real-world object having properties called attributes. Every attribute is defined by its set of values, called a domain.
o For example, in a school database, a student is considered an entity. A student has various attributes like name, age, class, etc.
o Entity representation: A simple rectangular box represents an entity.
An entity is generally a real-world object which has characteristics and holds relationships in a DBMS.
If a Student is an entity, then the complete dataset of all the students will be the entity set.
Entity set: The set of all entities of the same type is termed an entity set.

Entity type:
An entity type defines a collection of entities that have the same attributes.

Example:
For a School Management Software, we will have to store Student information, Teacher information, Classes, Subjects taught in each class, etc.

Considering the above example, Student is an entity and Teacher is an entity; similarly, Class, Subject, etc. are also entities.
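The Student example above can be sketched as a table using Python's built-in sqlite3 module: the table definition plays the role of the entity type, and its rows form the entity set. The column names used here are illustrative assumptions, not prescribed by the text.

```python
import sqlite3

# Sketch only: the Student entity type becomes a table definition;
# each row is one entity, and all rows together form the entity set.
# Column names (roll_no, name, age, class) are assumed for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE Student (
        roll_no INTEGER PRIMARY KEY,   -- key attribute
        name    TEXT NOT NULL,
        age     INTEGER,
        class   TEXT
    )
""")
conn.executemany(
    "INSERT INTO Student VALUES (?, ?, ?, ?)",
    [(1, "Asha", 15, "X-A"), (2, "Ravi", 16, "X-B")],
)
# The entity set: every Student entity currently in the database.
entity_set = conn.execute("SELECT * FROM Student").fetchall()
```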

b) Relationship
The logical association among entities is called a relationship. Relationships are mapped with entities in various ways.
c) Mapping Cardinalities
Mapping cardinalities define the number of associations between two entities. Mapping cardinalities:

o one to one
o one to many
o many to one
o many to many

ER- Diagram Notations


ER- Diagram is a visual representation of data that describe how data is related to each other.
 Rectangles: This symbol represents entity types
 Ellipses: This symbol represents attributes
 Diamonds: This symbol represents relationship types
 Lines: These link attributes to entity types and entity types with other relationship types
 Primary key: Its attributes are underlined
 Double Ellipses: Represent multi-valued attributes

ER Model: Attributes

Attributes
An attribute describes a property or characteristic of an entity.
An entity is represented by a set of attributes. Attributes are descriptive properties possessed by each
member of an entity set.

Example:
A possible attributes of customer entity are customer name, customer id, Customer Street,
customer city.

For example, Name, Age, Address etc can be attributes of a Student. An attribute is
represented using eclipse.

Attributes for any Entity


Ellipse is used to represent attributes of any entity. It is connected to the entity.

Key Attribute
A key attribute represents the main characteristic of an entity. It is used to represent a primary key.
An ellipse with the text underlined represents a key attribute.

Single valued and multi valued attributes


Single-valued attributes: Attributes with a single value for a particular entity are called single-valued attributes.
Multi-valued attributes: Attributes with a set of values for a particular entity are called multi-valued attributes.

Stored and derived attributes


Stored attributes: The attributes stored in a data base are called stored attributes.
Derived attributes: The attributes that are derived from the stored attributes are called derived
attributes.
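The stored/derived distinction can be demonstrated with a small sqlite3 sketch (the birth_year column and the sample values are assumptions for illustration): birth_year is stored in the database, while age is derived from it at query time.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Student (name TEXT, birth_year INTEGER)")
conn.executemany("INSERT INTO Student VALUES (?, ?)",
                 [("Asha", 2008), ("Ravi", 2007)])

# birth_year is a stored attribute; age is derived from it at query
# time instead of being stored redundantly.
current_year = 2024
ages = dict(conn.execute(
    "SELECT name, ? - birth_year FROM Student", (current_year,)))
```

Because age is computed on demand, it can never fall out of sync with the stored birth_year, which is the usual reason derived attributes are not stored.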

Example :

If a Student is an Entity, then student's roll no., student's name, student's age,
student's gender etc will be its attributes.
An attribute can be of many types, here are different types of attributes defined in ER database model:
1. Simple attribute: The attributes with values that are atomic and cannot be broken down
further are simple attributes. For example, student's age.
2. Composite attribute: A composite attribute is made up of more than one simple attribute. For
example, student's address will contain, house no., street name, pincode etc.
Composite Attribute for any Entity
A composite attribute is the attribute, which also has attributes.

3. Derived attribute: These are attributes that are not stored in the database but are derived using other attributes. For example, the average age of students in a class.
Derived Attribute for any Entity
Derived attributes are those which are derived based on other attributes, for example, age can be
derived from date of birth.
To represent a derived attribute, another dotted ellipse is created inside the main ellipse.

4. Single-valued attribute: As the name suggests, these have a single value for each entity.

5. Multi-valued attribute: These can have multiple values for a single entity.

Multivalued Attribute for any Entity


Double Ellipse, one inside another, represents the attribute which can have multiple values.

Key Attribute for any Entity


To represent a Key attribute, the attribute name inside the Ellipse is underlined.

Relationships
● A relationship is an association among several entities.

● When an Entity is related to another Entity, they are said to have a relationship.

Example: A depositor relationship associates a customer with each account that he/she has.

Example

For example, A Class Entity is related to Student entity, because students study in classes, hence
this is a relationship.
Depending upon the number of entities involved, a degree is assigned to relationships. For
example, if 2 entities are involved, it is said to be Binary relationship, if 3 entities are involved,
it is said to be Ternary relationship, and so on

There are three types of relationship that exist between Entities.


1. Binary Relationship
2. Recursive Relationship
3. Ternary Relationship

Relationship set
Relationship set : The set of all relationships of the same type is termed as a relationship set.

Relationships between Entities - Weak and Strong


Rhombus is used to setup relationships between two or more entities.

Degree of relationship set


The degree of relationship type is the number of participating entity types.
i) Key attribute
ii) Value set

Key attribute: An entity type usually has an attribute whose values are distinct for each individual entity in the collection. Such an attribute is called a key attribute.

Value set: Each simple attribute of an entity type is associated with a value set that specifies the
set of values that may be assigned to that attribute for each individual entity.

Cardinality
Mapping cardinalities or cardinality ratios express the number of entities to which another entity
can be associated. Mapping cardinalities must be one of the following:
• One to one
• One to many
• Many to one
• Many to many
• While creating a relationship between two entities, we often need to consider cardinality.

• This simply means how many entities of the first set are related to how many entities of the second set.

Cardinality can be of the following four types.

a) One-to-One

• Only one entity of the first set is related to only one entity of the second set. E.g. A teacher
teaches a student.
• Only one teacher is teaching only one student.

This can be expressed in the following diagram as:

b) One-to-Many

• Only one entity of the first set is related to multiple entities of the second set.

• E.g. A teacher teaches students. Only one teacher is teaching many students.

This can be expressed in the following diagram as:

c) Many-to-One

• Multiple entities of the first set are related to only one entity of the second set.
E.g. Teachers teach a student.

• Many teachers are teaching only one student.

This can be expressed in the following diagram as:


d) Many-to-Many

• Multiple entities of the first set are related to multiple entities of the second set.
E.g. Teachers teach students.

• In any school or college many teachers are teaching many students.

• This can be considered as a two way one-to-many relationship.

This can be expressed in the following diagram as:
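One way to see how these cardinalities surface in an actual schema is a sqlite3 sketch of the one-to-many case (table and column names are assumed for illustration): the foreign key sits on the "many" side, so one teacher row can be referenced by many student rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.execute("CREATE TABLE Teacher (tid INTEGER PRIMARY KEY, name TEXT)")
# The foreign key on the "many" side realises one-to-many:
# one Teacher row may be referenced by many Student rows.
conn.execute("""
    CREATE TABLE Student (
        sid        INTEGER PRIMARY KEY,
        name       TEXT,
        teacher_id INTEGER REFERENCES Teacher(tid)
    )
""")
conn.execute("INSERT INTO Teacher VALUES (1, 'Meena')")
conn.executemany("INSERT INTO Student VALUES (?, ?, ?)",
                 [(10, 'Asha', 1), (11, 'Ravi', 1)])
# Only one teacher is teaching many students.
pupil_count = conn.execute(
    "SELECT COUNT(*) FROM Student WHERE teacher_id = 1").fetchone()[0]
```

A many-to-many relationship would instead use a separate linking table holding a pair of foreign keys, as the "two way one-to-many" remark above suggests.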

Weak and strong entity sets


Weak entity set: Entity sets that do not have a key attribute of their own are called weak entity sets.
Strong entity set: Entity set that has a primary key is termed a strong entity set.
• Based on the concept of foreign key, there may arise a situation when we have to relate an entity
having a primary key of its own and an entity not having a primary key of its own.

• In such a case, the entity having its own primary key is called a strong entity and the entity not
having its own primary key is called a weak entity.

• Whenever we need to relate a strong and a weak entity together, the ERD would change just a
little.

Example:

• Say, for example, we have a statement “A Student lives in a Home.” STUDENT is obviously a
strong entity having a primary key Roll.

• But HOME may not have a unique primary key, as its only attribute Address may be shared by
many homes (what if it is a housing estate?).
• HOME is a weak entity in this case.

The ERD of this statement would be like the following

As you can see, the weak entity itself and the relationship linking the strong and weak entities must have a double border.
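The STUDENT/HOME example can be sketched in sqlite3 (attribute names are assumed): the weak entity HOME has no key of its own, so its primary key combines the owner's key (roll) with its partial key (address).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.execute("CREATE TABLE Student (roll INTEGER PRIMARY KEY, name TEXT)")
# Home is weak: it borrows the owner's key (roll) and combines it with
# its own partial key (address) to become uniquely identifiable.
conn.execute("""
    CREATE TABLE Home (
        roll    INTEGER REFERENCES Student(roll),
        address TEXT,
        PRIMARY KEY (roll, address)
    )
""")
conn.execute("INSERT INTO Student VALUES (7, 'Asha')")
conn.execute("INSERT INTO Home VALUES (7, '12 Lake Road')")
try:
    # Same (roll, address) pair again: violates the composite key.
    conn.execute("INSERT INTO Home VALUES (7, '12 Lake Road')")
    duplicate_allowed = True
except sqlite3.IntegrityError:
    duplicate_allowed = False
```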

4.3 Advantages and Disadvantages of E-R Data Model

4.3.1 Advantages of E-R Data Model

Following are advantages of an E-R Model:

a) Straightforward relation representation: Having designed an E-R diagram for a database


application, the relational representation of the database model becomes relatively
straightforward.
b) Easy conversion from E-R to other data models: Conversion from an E-R diagram to a network or hierarchical data model can easily be accomplished.
c) Graphical representation for better understanding: An E-R model gives a graphical and diagrammatic representation of the various entities, their attributes and the relationships between entities. This in turn helps in the clear understanding of the data structure and in minimizing redundancy and other problems.
4.3.2 Disadvantages of E-R Data Model
Following are disadvantages of an E-R Model:

a) No industry standard for notation: There is no industry standard notation for developing an
E-R diagram.
b) Popular only for high-level design: The E-R data model is especially popular for high-level design, and offers little support for lower-level implementation detail.

Exercises:

1) Draw an E-R diagram for a Library Management System.


5. RELATIONAL DATA MODEL

5.1 Introduction
 The Relational Model is a depiction of how each piece of stored information relates to the other
stored information.
 It shows how tables are linked, what type of links are between tables, what keys are used, what
information is referenced between tables.
 It's an essential part of developing a normalized database structure to prevent repeat and
redundant data storage.
 The basic idea behind the relational model is that a database consists of a series of unordered
tables (or relations) that can be manipulated using non-procedural operations that return tables.
 The RELATIONAL database model is based on the Relational Algebra, set theory and predicate
logic.
It is commonly thought that the word relational in the relational model comes from the fact that you
relate together tables in a relational database.

 Relational model stores data in the form of tables.


 This concept was proposed by Dr. E.F. Codd, a researcher at IBM, in 1970.

What is Relational Model?


The relational model uses a collection of tables to represent both data and the relationships among
those data.
 Relational Model represents how data is stored in Relational Databases.
 A relational database stores data in the form of relations (tables).

 The relational model represents the database as a collection of relations.


 A relation is nothing but a table of values. Every row in the table represents a collection of
related data values.
 These rows in the table denote a real-world entity or relationship.
 The table name and column names are helpful to interpret the meaning of values in each row.
 The data are represented as a set of relations. In the relational model, data are stored as tables.
However, the physical storage of the data is independent of the way the data are logically
organized.
 Attribute, Tables, Tuple, Relation Schema, Degree, Cardinality, Column, Relation
instance, are some important components of Relational Model
 Insert, Select, Modify and Delete are operations performed in Relational Model

Basic rules on Relational


 Data need to be represented as a collection of relations
 Each relation should be depicted clearly in the table
 Rows should contain data about instances of an entity
 Columns must contain data about attributes of the entity
 Cells of the table should hold a single value
 Each column should be given a unique name
 No two rows can be identical
 The values of an attribute should be from the same domain

Example:

Consider a set of employees in a company. Each employee has two levels of information:

● The various tables in the database have a set of tuples.

● The columns enumerate the various attributes of the entity (the employee's name, address
or phone number, for example), and a row is an actual instance of the entity (a specific
employee) that is represented by the relation.
● As a result, each tuple of the employee table represents various attributes of a single
employee.

Tuple and attribute.


· Attributes: column headers

· Tuple : Row

5.2 Components of Relational Model

The relational model consists of three major components:

1. The set of relations and set of domains that defines the way data can be represented (data structure).

2. Integrity rules that define the procedure to protect the data (data integrity).

3. The operations that can be performed on data (data manipulation).

Relational Database
A relational model database is defined as a database that allows you to group its data items into one or more independent tables that can be related to one another by using fields common to each related table.

Characteristics of Relational Database

Relational database systems have the following characteristics:

a) The whole data is conceptually represented as an orderly arrangement of data into rows and columns, called a relation or table.
b) All values are scalar. That is, at any given row/column position in the relation there is one and only one value.
c) All operations are performed on an entire relation and the result is an entire relation, a concept known as closure.

● Dr. Codd, when formulating the relational model, chose the term "relation" because it was comparatively free of connotations, unlike, for example, the word "table".
● It is a common misconception that the relational model is so called because relationships are established between tables.
● In fact, the name is derived from the relations on which it is based.

● Notice that the model requires only that data be conceptually represented as a relation, it does
not specify how the data should be physically implemented.
● A relation is a relation provided that it is arranged in row and column format and its values are
scalar.
● Its existence is completely independent of any physical representation.

5.3 Basic Terminology used in Relational Model

The figure shows a relation with the formal names of the basic components marked. The entire structure is, as we have said, a relation.

a) Tuples of a Relation

Each row of data is a tuple. Actually, each row is an n-tuple, but the "n-" is usually dropped.

b) Cardinality of a relation: The number of tuples in a relation determines its cardinality. In this
case, the relation has a cardinality of 4.
c) Degree of a relation: Each column in the tuple is called an attribute. The number of attributes
in a relation determines its degree. The relation in figure has a degree of 3.
d) Domains: A domain definition specifies the kind of data represented by the attribute.

● More particularly, a domain is the set of all possible values that an attribute may validly contain.

● Domains are often confused with data types, but this is inaccurate.

● Data type is a physical concept while domain is a logical one. "Number" is a data type
and "Age" is a domain.
● To give another example "StreetName" and "Surname" might both be represented as text
fields, but they are obviously different kinds of text fields; they belong to different
domains.
● Domain is also a broader concept than data type, in that a domain definition includes a
more specific description of the valid data.

For example, consider the domain DegreeAwarded, which represents the degrees awarded by a university.

In the database schema, this attribute might be defined as Text[3], but it's not just any three-character string; it's a member of the set {BA, BS, MA, MS, PhD, LLB, MD}.

Of course, not all domains can be defined by simply listing their values. Age, for example, contains a
hundred or so values if we are talking about people, but tens of thousands if we are talking about
museum exhibits.

In such instances it's useful to define the domain in terms of the rules, which can be used to determine
the membership of any specific value in the set of all valid values.

For example, PersonAge could be defined as "an integer in the range 0 to 120", whereas ExhibitAge (the age of any object for exhibition) might simply be "an integer equal to or greater than 0."
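The PersonAge rule above maps naturally onto a CHECK constraint. Here is a sqlite3 sketch (table and column names are assumed) showing that the domain is narrower than the underlying INTEGER data type.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# The data type is INTEGER; the domain PersonAge is that type plus a
# rule, "an integer in the range 0 to 120", expressed as a CHECK.
conn.execute("""
    CREATE TABLE Person (
        name TEXT,
        age  INTEGER CHECK (age BETWEEN 0 AND 120)
    )
""")
conn.execute("INSERT INTO Person VALUES ('Asha', 34)")
try:
    conn.execute("INSERT INTO Person VALUES ('Ghost', 300)")  # outside domain
    out_of_domain_accepted = True
except sqlite3.IntegrityError:
    out_of_domain_accepted = False
```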

Body of a Relation: The body of the relation consists of an unordered set of zero or more tuples.

There are some important concepts here.

a) First the relation is unordered. Record numbers do not apply to relations.


b) Second a relation with no tuples still qualifies as a relation.
c) Third, a relation is a set.

The items in a set are, by definition, uniquely identifiable.

Therefore, for a table to qualify as a relation each record must be uniquely identifiable and the table
must contain no duplicate records.

Keys of a Relation

It is a set of one or more columns whose combined values are unique among all occurrences in a given
table.

A key is the relational means of specifying uniqueness. Some different types of keys are:

a) Primary key is an attribute or a set of attributes of a relation which possesses the properties of uniqueness and irreducibility (no subset should be unique).

For example: Supplier Number in the S table is the primary key, Part Number in the P table is the primary key, and the combination of Supplier Number and Part Number in the SP table is the primary key.

b) Foreign key is an attribute (or set of attributes) of a table which refers to the primary key of another table.
 A foreign key permits only those values which appear in the primary key of the table to which it refers, or may be null (unknown value).
 For example: SNO in the SP table refers to SNO of the S table, which is the primary key of the S table, so we can say that SNO in the SP table is a foreign key.
 PNO in the SP table refers to PNO of the P table, which is the primary key of the P table, so we can say that PNO in the SP table is a foreign key.
 The database of Customer-Loan, which we discussed earlier for the hierarchical and network models, is now represented for the relational model as shown.
 It can easily be understood that this model is very simple and has no redundancy.
 The total database is divided into two tables. The Customer table contains the information about the customers, with CNO as the primary key.
 The Customer_Loan table stores the information about CNO, LNO and AMOUNT.
 Its primary key is the combination of CNO and LNO.
 Here, CNO also acts as a foreign key and refers to CNO of the Customer table.

 This means only those customer numbers are allowed in the transaction table Customer_Loan that have an entry in the master Customer table.
Relational View of Sample database

Let us take an example of a sample database consisting of supplier, parts and shipments tables. The
table structure and some sample records for supplier, parts and shipments tables are given as Tables as
shown below:

 We assume that each row in the Supplier table is identified by a unique SNo (Supplier Number), which uniquely identifies the entire row of the table. Likewise each part has a unique PNo (Part Number).
 Also, we assume that no more than one shipment exists for a given supplier/part combination in the Shipments table.
 Note that the relations Parts and Shipments have PNo (Part Number) in common, and the Supplier and Shipments relations have SNo (Supplier Number) in common.
 The Supplier and Parts relations have City in common.
 For example, the fact that supplier S3 and part P2 are located in the same city is represented by the appearance of the same value, Amritsar, in the City column of the two tuples in those relations.
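The supplier/parts/shipments schema described above can be sketched in sqlite3. The exact column lists are assumptions, but the key structure follows the text: SNo and PNo are the primary keys of S and P, and SP's composite primary key doubles as two foreign keys.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.execute("CREATE TABLE S (SNo TEXT PRIMARY KEY, SName TEXT, City TEXT)")
conn.execute("CREATE TABLE P (PNo TEXT PRIMARY KEY, PName TEXT, City TEXT)")
# SP: composite primary key (SNo, PNo), so at most one shipment exists
# per supplier/part combination; each column is also a foreign key,
# so a shipment can only mention known suppliers and parts.
conn.execute("""
    CREATE TABLE SP (
        SNo TEXT REFERENCES S(SNo),
        PNo TEXT REFERENCES P(PNo),
        Qty INTEGER,
        PRIMARY KEY (SNo, PNo)
    )
""")
conn.execute("INSERT INTO S VALUES ('S3', 'Suneet', 'Amritsar')")
conn.execute("INSERT INTO P VALUES ('P2', 'Bolt', 'Amritsar')")
conn.execute("INSERT INTO SP VALUES ('S3', 'P2', 200)")
try:
    # 'S9' has no entry in the master S table, so the FK rejects it.
    conn.execute("INSERT INTO SP VALUES ('S9', 'P2', 50)")
    unknown_supplier_allowed = True
except sqlite3.IntegrityError:
    unknown_supplier_allowed = False
```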
5.4 Operations in Relational Model

The four basic operations in Relational Models and they are as follows:

a) Insert
b) Update
c) Delete
d) Retrieve

The four operations are shown below on the sample database in relational model:

a) Insert Operation:
● The information of a supplier who does not supply any part can be inserted into the S table without any anomaly; e.g. S4 can be inserted into the S table.
● Similarly, the information of a new part that is not supplied by any supplier can be inserted into the P table.
● If a supplier starts supplying any new part, this information can be stored in the shipment table SP with the supplier number, part number and supplied quantity.
● So we can say that insert operations can be performed in all the cases without any anomaly.

b) Update Operation:
● Suppose supplier S1 has moved from Qadian to Jalandhar.

● In that case we need to make changes in the record, so that the supplier table is up-to- date.

● Since supplier number is the primary key in the S (supplier) table, there is only a single entry for S1, which needs a single update, and the problem of data inconsistency would not arise.
● Similarly, part and shipment information can be updated by a single modification in the tables P
and SP respectively without the problem of inconsistency.
● Update operation in relational model is very simple and without any anomaly in case of
relational model.
c) Delete Operation:
● Suppose if supplier S3 stops the supply of part P2, then we have to delete the shipment
connecting part P2 and supplier S3 from shipment table SP.
● This information can be deleted from SP table without affecting the details of supplier of S3 in
supplier table and part P2 information in part table.
● Similarly, we can delete the information of parts in the P table and their shipments in the SP table, and we can delete the information of suppliers in the S table and their shipments in the SP table.

d) Record Retrieval:

Record retrieval methods for relational model are simple and symmetric which can be clarified with
the following queries:

Query1: Find the supplier numbers for suppliers who supply part P2.

Solution: In order to get this information we have to search the information of part P2 in the SP table
(shipment table). For this a loop is constructed to find the records of P2 and on getting the records,
corresponding supplier numbers are printed.

Algorithm

do until no more shipments;

get next shipment where PNO=P2;

print SNO;

end;
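The loop above corresponds to a one-line SQL query. Below is a minimal runnable sketch using Python's sqlite3 module; the SP table contents are assumed sample data, not taken from the text:

```python
import sqlite3

# In-memory shipment table SP(SNO, PNO, QTY) with assumed sample rows
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE SP (SNO TEXT, PNO TEXT, QTY INTEGER)")
conn.executemany("INSERT INTO SP VALUES (?, ?, ?)",
                 [("S1", "P1", 300), ("S1", "P2", 200),
                  ("S2", "P2", 400), ("S3", "P1", 100)])

# Equivalent of the loop: supplier numbers for suppliers who supply part P2
suppliers_of_p2 = [row[0] for row in conn.execute(
    "SELECT SNO FROM SP WHERE PNO = 'P2' ORDER BY SNO")]
print(suppliers_of_p2)
```

The single query `SELECT SNO FROM SP WHERE PNO = 'P2'` replaces the explicit record-at-a-time loop entirely.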

Advantages and Disadvantages of Relational Model


Advantages of using Relational model
a) Simplicity: A relational data model is simpler than the hierarchical and network model.
b) Structural Independence: The relational database is concerned only with data and not with its
physical structure. This can improve the performance of the model.
c) Easy to use: The relational model is easy to use, as tables consisting of rows and columns are
quite natural and simple to understand.
d) Query capability: It makes it possible for a high-level query language like SQL to avoid
complex database navigation.
e) Data independence: The structure of a database can be changed without having to
change any application.
f) Scalable: A database can be enlarged, in both the number of records (rows) and the number
of fields (columns), to enhance its usability.

Disadvantages of Relational Model

● The relational model's disadvantages are very minor compared to its advantages, and its
capabilities far outweigh its shortcomings.
● Also, the drawbacks of relational database systems can be avoided if proper corrective
measures are taken.
● The drawbacks are not due to shortcomings in the database model, but in the way it is
implemented.
6. MAPPING FROM ER MODEL TO RELATIONAL MODEL

6.1 Introduction

• The ER Model can be represented using ER Diagrams which is a great way of designing and
representing the database design in more of a flow chart form.
• It is very convenient to design the database
– using the ER Model by creating an ER diagram and
– later on converting it into relational model to design your tables.
• Not all the ER Model constraints and components can be directly transformed into
relational model, but an approximate schema can be derived.
• The basic idea, converting a real-world scenario into an ER Model and then into a
Relational Model, is depicted as follows:

6.2 Basic rules of Conversion of ER diagrams into relational model schema

1) Entity becomes Table


● Entity in ER Model is changed into tables, or we can say for every Entity in ER
model, a table is created in Relational Model.
2) The attributes of the Entity should be converted to columns of the table.

3) The primary key specified for the entity in the ER model, will become the primary key for the
table in relational model.
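The three rules above can be tried out directly. Here is a hedged sketch using Python's sqlite3, built around the Student entity from the university example later in this section (the attribute names are from that example; the code itself is only illustrative):

```python
import sqlite3

# Rules 1-3: the Student entity becomes a table, its attributes become
# columns, and its key attribute becomes the table's primary key.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE Student (
        Rollno  INTEGER PRIMARY KEY,  -- key attribute -> primary key
        Name    TEXT,                 -- other attributes -> columns
        DeptID  TEXT
    )
""")

# Inspect the resulting schema: one column per entity attribute
columns = [row[1] for row in conn.execute("PRAGMA table_info(Student)")]
print(columns)
```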

Steps to Create an ERD


Following are the steps to create an ERD:
Step 1) Entity Identification
Step 2) Relationship Identification
Step 3) Cardinality Identification
Step 4) Identify Attributes
Step 5) Create the ERD

The following diagram depicts those 5 sequential steps pictorially:

Example:

A real time scenario : UNIVERSITY ENVIRONMENT

In a university, a Student enrolls in Courses. A student must be assigned to at least one Course.
Each course is taught by a single Professor. To maintain instruction quality, a Professor can
deliver only one course.
For example, in a University database we might have entities for Students, Courses, and
Lecturers. The Students entity can have attributes like Rollno, Name, and DeptID, and it might
have relationships with Courses and Lecturers.
Step 1) Entity Identification
We have three entities
 Student
 Course
 Professor

Step 2) Relationship Identification


We have the following two relationships
 The student is assigned a course
 Professor delivers a course
Step 3) Cardinality Identification
From the problem statement we know that,
 A student can be assigned multiple courses
 A Professor can deliver only one course

Step 4) Identify Attributes


 First study the files, forms, reports, data currently maintained by the organization to
identify attributes.
 Conduct interviews with various stakeholders to identify entities. Initially, it's important to
identify the attributes without mapping them to a particular entity.
 Once, you have a list of Attributes, you need to map them to the identified entities.
Ensure an attribute is to be paired with exactly one entity. If you think an attribute should
belong to more than one entity, use a modifier to make it unique.
 Once the mapping is done, identify the primary Keys. If a unique key is not readily
available, create one.
Entity Primary Key Attribute

Student Student_ID StudentName

Professor Employee_ID ProfessorName

Course Course_ID CourseName


For the Course entity, attributes could be Duration, Credits, Assignments, etc. For the sake of ease we
have considered just one attribute.

Step 5) Create the ERD


A more modern representation of ERD Diagram

Mapping Process Guidelines


i. Create tables for all higher-level entities.
ii. Create tables for lower-level entities.
iii. Add primary keys of higher-level entities in the table of lower-level entities.
iv. In lower-level tables, add all other attributes of lower-level entities.
v. Declare primary key of higher-level table and the primary key for lower-level table.
vi. Declare foreign key constraints.

Mapping Process for newly created table(s) from relationship mapping process
i. Create table for a relationship.
ii. Add the primary keys of all participating Entities as fields of table with their respective data
types.
iii. If relationship has any attribute, add each attribute as field of table.
iv. Declare a primary key composing all the primary keys of participating entities.
v. Declare all foreign key constraints.
Note:
· Similarly, we can generate a relational database schema using the ER diagram.

· We cannot import all the ER constraints into relational model, but an approximate schema can
be generated.
· There are several processes and algorithms available to convert ER Diagrams into Relational
Schema.

· Some of them are automated and some of them are manual.

A special scenario

Mapping Process for entity with Weak entities


 Create table for weak entity set.
 Add all its attributes to table as field.
 Add the primary key of identifying entity set.
 Declare all foreign key constraints.
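The weak-entity steps above can be sketched in sqlite3. Employee/Dependent below is a hypothetical weak-entity pair (not from the text): the Dependent table carries the owner's key, forms a composite primary key with the partial key, and declares a foreign key constraint:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")   # SQLite enforces FKs only on request
conn.execute("CREATE TABLE Employee (emp_id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("""
    CREATE TABLE Dependent (
        emp_id   INTEGER,            -- primary key of the identifying entity set
        dep_name TEXT,               -- partial key of the weak entity
        relation TEXT,
        PRIMARY KEY (emp_id, dep_name),
        FOREIGN KEY (emp_id) REFERENCES Employee(emp_id)
    )
""")
conn.execute("INSERT INTO Employee VALUES (1, 'Ada')")
conn.execute("INSERT INTO Dependent VALUES (1, 'Sam', 'son')")

# A dependent without an identifying owner violates the FK constraint
try:
    conn.execute("INSERT INTO Dependent VALUES (99, 'Max', 'son')")
    fk_enforced = False
except sqlite3.IntegrityError:
    fk_enforced = True
print(fk_enforced)
```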
Mapping Hierarchical Entities
ER specialization or generalization comes in the form of hierarchical entity sets.

6.3 Points to Remember

Following are some key points to keep in mind while doing so:
a) Entity gets converted into Table, with all the attributes becoming fields(columns) in the table.
b) Relationship between entities is also converted into table with primary keys of the
related entities also stored in it as foreign keys.
c) Primary Keys should be properly set.
d) For any relationship involving a Weak Entity, if the primary key of any other entity is included in a
table, a foreign key constraint must be defined.
RELATIONAL ALGEBRA:
Relational Algebra is a formal language used to describe operations on relational database tables. It
provides a theoretical foundation for querying and manipulating relational databases. These operations
help users retrieve, filter, combine, and transform data stored in relational database systems. The results of
these operations are also tables, allowing for further manipulation and analysis.

Here are some fundamental operations in Relational Algebra:

1) Selection (σ): This operation selects rows from a table that satisfy a given condition. It is denoted
by the Greek letter σ (sigma). For example, selecting all employees with a certain job title: σ(JobTitle =
'Manager')(Employees).

2) Projection (π): This operation selects specific columns from a table while eliminating duplicate rows.
It is denoted by the Greek letter π (pi). For example, selecting only the names and ages of employees:
π(Name, Age)(Employees).

3) Union (∪): This operation combines the rows of two tables with the same schema, eliminating
duplicates. For example, finding customers who bought product A or product B:
CustomersWhoBoughtA ∪ CustomersWhoBoughtB.

4) Intersection (∩): This operation returns only the rows that are present in both tables, again with the
same schema. For example, finding customers who bought both product A and product B:
CustomersWhoBoughtA ∩ CustomersWhoBoughtB.

5) Difference (− or \): This operation returns the rows present in the first table but not in the second.
For example, finding customers who bought product A but not product B: CustomersWhoBoughtA −
CustomersWhoBoughtB.

6) Cartesian Product (×): This operation combines every row from the first table with every row from
the second table, resulting in a new table with all possible combinations of rows. It is used less frequently
due to its potential for generating large results.

7) Join (⋈): This operation combines rows from two or more tables based on a common attribute.
Different types of joins include the natural join, equi join, theta join, and the left, right and full outer joins.

8) Renaming (ρ): This operation is used to rename relations or attributes. For example, renaming the
"EmployeeName" attribute to "Name" in the Employees table: ρ(Name/EmployeeName)(Employees).

These operations can be combined to form more complex queries. Relational Algebra serves as the
theoretical basis for query languages like SQL (Structured Query Language) that are used to interact with
relational databases. It helps database developers and users understand the underlying principles of
querying and manipulating data in a relational database management system (DBMS).
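As a rough illustration (not how a real DBMS executes queries), selection and projection can be modeled in a few lines of Python over relations represented as lists of dicts; the Employees data is assumed:

```python
# Relations as lists of dicts; σ and π as plain functions
Employees = [
    {"Name": "Ann", "Age": 30, "JobTitle": "Manager"},
    {"Name": "Bob", "Age": 25, "JobTitle": "Clerk"},
    {"Name": "Cid", "Age": 41, "JobTitle": "Manager"},
]

def select(predicate, relation):
    """sigma_predicate(relation): keep tuples satisfying the predicate."""
    return [t for t in relation if predicate(t)]

def project(attrs, relation):
    """pi_attrs(relation): keep only attrs, eliminating duplicate rows."""
    seen, out = set(), []
    for t in relation:
        row = tuple(t[a] for a in attrs)
        if row not in seen:
            seen.add(row)
            out.append(dict(zip(attrs, row)))
    return out

managers = select(lambda t: t["JobTitle"] == "Manager", Employees)
names_ages = project(["Name", "Age"], Employees)
print(len(managers), len(names_ages))
```

Note that the result of each operation is again a relation, so the calls compose, e.g. `project(["Name"], select(..., Employees))`.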

7. SQL (Structured Query Language)

7.1 Introduction

● SQL is a programming language for Relational Databases.

● It is designed over relational algebra and tuple relational calculus.

● SQL comes as a package with all major distributions of RDBMS.

● SQL comprises both data definition and data manipulation languages.

● Using the data definition properties of SQL, one can design and modify database schema,
whereas data manipulation properties allows SQL to store and retrieve data from database.
● Two classes of languages
o Procedural – user specifies what data is required and how to get those data
o Nonprocedural – user specifies what data is required without specifying how to get
those data.
● SQL is the most widely used query language.

7.1.1 SQL- What is SQL?

• SQL is a standard language for storing, manipulating and retrieving data in databases.
• SQL, Structured Query Language, is a programming language designed to manage data stored
in relational databases.
• SQL operates through simple, declarative statements.
7.1.2. Capabilities of SQL

• SQL can
– execute queries against a database
– retrieve data from a database
– insert records in a database
– update records in a database
– delete records from a database
– create new databases
– create new tables in a database
– create stored procedures in a database
– create views in a database
– set permissions on tables, procedures, and views

7.1.3. SQL at a Glance


• SQL: widely used non-procedural language
– E.g. find the name of the customer with customer-id 192-83-7465
select customer.customer-name
from customer
where customer.customer-id = ‘192-83-7465’
– E.g. find the balances of all accounts held by the customer with customer-id 192- 83-
7465
select account.balance
from depositor, account
where depositor.customer-id = ‘192-83-7465’ and
depositor.account-number = account.account-number
• Application programs generally access databases through one of
– Language extensions to allow embedded SQL
– Application program interface (e.g. ODBC/JDBC) which allow SQL queries to be sent
to a database
• This keeps data accurate and secure, and it helps maintain the integrity of databases,
regardless of size.
• SQL became a standard of the American National Standards Institute (ANSI) in 1986,
and of the International Organization for Standardization (ISO) in 1987.
• SQL used in DBMS are:
• MySQL, SQL Server, MS Access, Oracle, Sybase, Informix, Postgres, and other
database systems.

7.1.4 SQL Types

• There are 4 types of SQL statements :


1) DDL ( Data Definition Language)
2) DML ( Data Manipulation Language )
3) TCL ( Transaction Control Language)
4) DCL ( Data Control Language)

1) Data Definition Language (DDL)


• Data Definition Language (DDL) statements are used to define the database structure or
schema.

• DDL compiler generates a set of tables stored in a data dictionary


• Data dictionary contains metadata (i.e., data about data)
– database schema
– Data storage and definition language
• language in which the storage structure and access methods used by the
database system are specified
• Usually an extension of the data definition language

List of DDL commands:


1) CREATE - to create objects (Database/Table/View …) in the database.
2) ALTER - alters the structure of the database (Database/Table/View …).
3) TRUNCATE - remove all records from a table, including all spaces allocated for the
records.
• Deletes all the records in the table, not the structure.
4) DROP - delete objects from the database(Database/Table/View …) .

5) COMMENT - add comments to the data dictionary.


• A Non-executable statement for giving comments
6) RENAME - rename an object.
• Allows to change the name of table /View.

1) CREATE TABLE
• CREATE TABLE creates a new table in the database.
• It allows you to specify the name of the table and the name of each column in the table.
• It is the specification notation for defining the database schema.
• CREATE can also create databases and views in the RDBMS.

Syntax :
• CREATE TABLE table_name ( column_1 datatype, column_2 datatype, column_3
datatype );

Example 1:
Create database tutorials;
Create table article;
Create view for_students;

Example 2:
Create database bank;

create table account (


account-number char(10),
balance integer)
2) ALTER TABLE

Syntax :
• ALTER TABLE table_name
ADD column_name datatype;
or
• ALTER TABLE table_name
DROP COLUMN column_name;

3) TRUNCATE TABLE

• TRUNCATE command will delete all the records of a selected table, while the structure will
remain.
• TRUNCATE TABLE table_name;
4) DROP TABLE

• DROP command will delete the structure and all the records of a selected table.
• It also drops views and databases from the RDBMS.
Syntax :
• DROP TABLE table_name;
• DROP object_type object_name;

Example:

Drop database tutorials;


Drop table article;
Drop view for_students;
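The DDL commands above can be exercised from Python's sqlite3 module. Note that SQLite has no TRUNCATE statement, so an unqualified DELETE stands in for it here; the table and column names are illustrative:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (account_number TEXT, balance INTEGER)")
conn.execute("ALTER TABLE account ADD COLUMN branch TEXT")   # add a column
conn.execute("INSERT INTO account VALUES ('A-101', 500, 'Downtown')")

conn.execute("DELETE FROM account")      # empties the rows, keeps the structure
rows_left = conn.execute("SELECT COUNT(*) FROM account").fetchone()[0]

conn.execute("DROP TABLE account")       # removes the structure as well
tables = conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table'").fetchall()
print(rows_left, tables)
```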

5) COMMENT
Adds additional info and meaning to the statements.
Syntax :
COMMENT text;

Example:

COMMENT 'EB SYSTEM';
6) RENAME
To rename or change the name of the given object.
Syntax:
RENAME old_object_name TO new_object_name;
Example:
RENAME emp TO employee;

This statement changes the table name from emp to employee.


2) Data Manipulation Language
● SQL is equipped with data manipulation language (DML). DML modifies the
database instance by inserting, updating and deleting its data.
● DML is responsible for all forms of data modification in a database.
Language for accessing and manipulating the data organized by the appropriate data model
– DML also known as query language

• Data Manipulation Language (DML) statements are used for managing data within
schema objects.
TYPES:
1) SELECT - retrieve data from a database
2) INSERT - insert data into a table
3) UPDATE - updates existing data within a table
4) DELETE - deletes all records from a table; the space for the records remains

SQL contains the following set of commands in its DML section


a) SELECT/FROM/WHERE
b) INSERT INTO/VALUES
c) UPDATE/SET/WHERE
d) DELETE FROM/WHERE

These basic constructs allow database programmers and users to enter data and information into
the database and retrieve efficiently using a number of filter options.

a) SELECT/FROM/WHERE
The SQL SELECT Statement

• The SELECT statement is used to select data from a database.


• The data returned is stored in a result table, called the result-set.

SELECT Syntax
– SELECT column1, column2, ...
FROM table_name
WHERE condition
ORDER BY column1, column2, ... ASC|DESC
GROUP BY column_name
HAVING condition;
– Here, column1, column2, ... are the field names of the table you want to select data
from. If you want to select all the fields available in the table,

Use the following syntax for simple query:


– SELECT * FROM table_name;

Different Clauses of the SELECT Command

a) SELECT clause: This is one of the fundamental query commands of SQL. It is similar to the
projection operation of relational algebra. It selects the attributes of the relation based on the
condition specified by the WHERE clause.

b) FROM clause: This clause takes a relation name as an argument from which attributes are to be
selected.

c) WHERE clause: This clause defines the predicate or conditions which a tuple must match in order
to qualify for the result.

Example:
Select author_name
From book_author
Where age > 50;
This command will yield the names of authors from the relation book_author whose age is greater
than 50.
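The same query can be run end-to-end with Python's sqlite3 module; the book_author rows below are assumed sample data:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE book_author (author_name TEXT, age INTEGER)")
conn.executemany("INSERT INTO book_author VALUES (?, ?)",
                 [("King", 62), ("Rowling", 48), ("Tolkien", 81)])

# The SELECT/FROM/WHERE query from the text
older = [r[0] for r in conn.execute(
    "SELECT author_name FROM book_author WHERE age > 50 ORDER BY author_name")]
print(older)
```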

b) INSERT INTO/VALUES
This command is used for inserting values into the rows of a table (relation).
• The INSERT INTO statement is used to insert new records in a table.
INSERT INTO Syntax
• It is possible to write the INSERT INTO statement in two ways.
• The first way specifies both the column names and the values to be inserted:
– INSERT INTO table_name (column1, column2, column3, ...)
VALUES (value1, value2, value3, ...);
• If you are adding values for all the columns of the table, you do not need to
specify the column names in the SQL query.
– INSERT INTO table_name
VALUES (value1, value2, value3, ...);

Example:

INSERT INTO tutorials (Author, Subject) VALUES ('anonymous', 'computers');

c) UPDATE/SET/WHERE
This command is used for updating or modifying the values of columns in a table (relation).
• The UPDATE statement is used to modify the existing records in a table.

UPDATE Syntax
• UPDATE table_name
SET column1 = value1, column2 = value2, ...
WHERE condition;
• Note:
– Be careful when updating records in a table! Notice the WHERE clause in the
UPDATE statement.
– The WHERE clause specifies which record(s) that should be updated. If you omit the
WHERE clause, all records in the table will be updated!

Example:
UPDATE tutorials SET Author = 'webmaster' WHERE Author = 'anonymous';

d) DELETE/FROM/WHERE
This command is used for removing one or more rows from a table (relation). The
DELETE statement is used to delete existing records in a table.

DELETE Syntax
• DELETE FROM table_name [WHERE condition];
• Note:
– Be careful when deleting records in a table! Notice the WHERE clause in the
DELETE statement.
– The WHERE clause specifies which record(s) should be deleted. If you omit the
WHERE clause, all records in the table will be deleted!

Example:
DELETE FROM tutorials
WHERE Author = 'unknown';
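Here is a runnable sketch of the three DML statements above using Python's sqlite3 (string literals use standard single quotes; the tutorials rows are assumed):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tutorials (Author TEXT, Subject TEXT)")

# INSERT, UPDATE, DELETE as in the examples above
conn.execute("INSERT INTO tutorials (Author, Subject) "
             "VALUES ('anonymous', 'computers')")
conn.execute("UPDATE tutorials SET Author = 'webmaster' "
             "WHERE Author = 'anonymous'")
conn.execute("INSERT INTO tutorials VALUES ('unknown', 'history')")
conn.execute("DELETE FROM tutorials WHERE Author = 'unknown'")

authors = [r[0] for r in conn.execute("SELECT Author FROM tutorials")]
print(authors)
```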

3) TCL (Transaction Control Language) Commands

• Transaction Control (TCL) statements are used to manage the changes made by DML
statements.
• It allows statements to be grouped together into logical transactions.
• A transaction is a sequence of SQL statements that Oracle treats as a single unit.
• Results of Data Manipulation Language (DML) are not permanently updated to table until
explicit or implicit COMMIT occurs
• Transaction control statements can:
• Commit data through COMMIT command
• Undo data changes through ROLLBACK command

a) COMMIT - save work done


b) SAVEPOINT - identify a point in a transaction to which you can later roll back
c) ROLLBACK - restore database to original since the last COMMIT
d) SET TRANSACTION - Change transaction options like isolation level and what
rollback segment to use

a) COMMIT

• Explicit COMMIT occurs by executing COMMIT;


• Implicit COMMIT occurs when DDL command is executed or user properly exits
system.
• Permanently updates table(s) and allows other users to view changes.
• This statement also erases all savepoints in the transaction and releases the transaction's locks.

Syntax

• COMMIT [ WORK ]
• Where WORK is supported for compliance with standard SQL.
• The statements COMMIT and COMMIT WORK are equivalent.

b) ROLLBACK
• Used to “undo” changes that have not been committed
• Occurs when:
– ROLLBACK; is executed
– System restarts after crash

Syntax

• ROLLBACK [ WORK | TO savepoint ]


• Where WORK is optional.
• If savepoint name is given, rolls back the current transaction to the specified savepoint. If you
omit this clause, the ROLLBACK statement rolls back the entire transaction.

Note:
Using ROLLBACK without the TO SAVEPOINT clause performs the following
operations:
a) Ends the transaction.
b) Undoes all changes in the current transaction
c) Erases all savepoints in the transaction
d) Releases the transaction's locks

Using ROLLBACK with the TO SAVEPOINT clause performs the following operations:
a) Rolls back just the portion of the transaction after the savepoint.
b) Erases all savepoints created after that savepoint. The named savepoint is retained, so you
can roll back to the same savepoint multiple times. Prior savepoints are also retained.
c) Releases all table and row locks acquired since the savepoint. Other transactions that have
requested access to rows locked after the savepoint must continue to wait until the transaction
is committed or rolled back. Other transactions that have not already requested the rows can
request and access the rows immediately.
Examples

Create table temp_table (t1 number(4));
Rollback;
Describe temp_table
Insert into temp_table (t1) values (10);
Select * from temp_table;
Commit;
• A normal exit from most Oracle utilities and tools causes the current transaction to be
committed (e.g. by giving the QUIT command in SQL*Plus).
• If the transaction is not explicitly committed and the program terminates abnormally, the
last uncommitted transaction is automatically rolled back.

c) SAVEPOINT
• Identifies a point in a transaction to which you can later roll back.

Syntax
• SAVEPOINT save_point;
• Where save_point is the name of the savepoint to be created.

Example:

• To update BLAKE's and CLARK's salary, check that the total company salary does not
exceed 27,000, then re-enter CLARK's salary, enter:

UPDATE emp SET sal = 2000 WHERE ename = 'BLAKE';
SAVEPOINT blake_sal;
UPDATE emp SET sal = 1500 WHERE ename = 'CLARK';
SAVEPOINT clark_sal;
SELECT SUM(sal) FROM emp;
ROLLBACK TO SAVEPOINT blake_sal;
UPDATE emp SET sal = 1200 WHERE ename = 'CLARK';
COMMIT;
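The COMMIT / SAVEPOINT / ROLLBACK flow can be reproduced outside Oracle as well. Below is a sketch with Python's sqlite3 (isolation_level=None so that BEGIN and COMMIT are issued explicitly); the emp rows are assumed:

```python
import sqlite3

conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE emp (ename TEXT, sal INTEGER)")
conn.execute("INSERT INTO emp VALUES ('BLAKE', 1000), ('CLARK', 1000)")

conn.execute("BEGIN")
conn.execute("UPDATE emp SET sal = 2000 WHERE ename = 'BLAKE'")
conn.execute("SAVEPOINT blake_sal")
conn.execute("UPDATE emp SET sal = 1500 WHERE ename = 'CLARK'")
conn.execute("ROLLBACK TO SAVEPOINT blake_sal")  # undoes only CLARK's update
conn.execute("UPDATE emp SET sal = 1200 WHERE ename = 'CLARK'")
conn.execute("COMMIT")                           # makes both changes permanent

sals = dict(conn.execute("SELECT ename, sal FROM emp"))
print(sals)
```

Rolling back to blake_sal keeps BLAKE's committed raise while discarding the first CLARK update, exactly as in the Oracle example.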

4) Data Control Language (DCL)

• Data Control Language (DCL) statements gives permission(s) and if not necessary
revokes or collect back those granted permission(s).
TYPES:-
a) GRANT - gives user's access privileges to database
b) REVOKE - withdraw access privileges given with the GRANT command

a) GRANT statement

This command grants access rights to various objects of the DBMS.

Syntax:
GRANT privilege_name TO user;

Example:

GRANT CREATE TO scott;

This command gives the user scott only the CREATE privilege.


b) REVOKE statement
This command revokes privilege(s) previously granted on various objects of the DBMS.

Syntax:
REVOKE privilege_name FROM user;

Example:

REVOKE CREATE FROM scott;

This command withdraws the CREATE privilege from the user scott.

With these 4 types of SQL statements, any DBMS is able to complete its major
tasks.

NORMALIZATION
Introduction
What is Normalization?
Database Normalization is a technique of organizing the data in the database. It is a systematic
approach of decomposing tables to eliminate data redundancy (repetition) and undesirable
characteristics like Insertion, Update and Deletion Anomalies. It is a multi-step process that puts data
into tabular form, removing duplicated data from the relation tables.

● In relational database design, normalization is the process of organizing data to minimize
redundancy.
● Normalization usually involves dividing a database into two or more tables and defining
relationships between the tables.
● "NF" refers to "normal form"; the three main types of normalization are 1NF, 2NF and 3NF.
● Normalization can be viewed as a data compression idea: you do not want to store duplicated
information in a database.

Purpose:
Normalization is used for mainly two purposes:
 Eliminating redundant (useless) data.
 Ensuring data dependencies make sense, i.e. data is logically stored.

7.2 Types of Normalization


● Normalization usually involves dividing a database into two or more tables and defining
relationships between the tables.
● The objective is to isolate data so that additions, deletions, and modifications of a field can be made
in just one table and then propagated through the rest of the database via the defined relationships.
There are three main normal forms, each with increasing levels of normalization:

a) First Normal Form (1NF):


Each field in a table contains only one piece of information. For example, in an employee list, each
record would contain only one birthdate field.
What is 1NF in DBMS?

First Normal Form (1NF)

Rule : A table is said to be in First Normal Form (1NF) if and only if each attribute of the
relation is atomic.
That is, each row in a table should be identified by a primary key (a unique column value or group
of unique column values), and no row should have a repeating group of column values.
An attribute (column) of a table cannot hold multiple values; it should hold only atomic values.

For a table to be in the First Normal Form, it should follow the following 4 rules:
1. It should only have single(atomic) valued attributes/columns.
2. Values stored in a column should be of the same domain
3. All the columns in a table should have unique names.
4. And the order in which data is stored, does not matter.

Example: Suppose a company wants to store the names and contact details of its employees. It creates a
table that looks like this:

emp_id emp_name emp_address emp_mobile

101 Herschel New Delhi 8912312390

102 Jon Kanpur 8812121212, 9900012222

103 Ron Chennai 7778881212

104 Lester Bangalore 9990000123, 8123450987

Two employees (Jon & Lester) have two mobile numbers each, so the company stored them in the same
field, as you can see in the table above.
This table is not in 1NF, as the rule says "each attribute of a table must have atomic (single)
values"; the emp_mobile values for employees Jon & Lester violate that rule.
To make the table comply with 1NF, we should store the data like this:

emp_id emp_name emp_address emp_mobile

101 Herschel New Delhi 8912312390

102 Jon Kanpur 8812121212

102 Jon Kanpur 9900012222

103 Ron Chennai 7778881212

104 Lester Bangalore 9990000123

104 Lester Bangalore 8123450987
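The 1NF fix above, splitting a multi-valued emp_mobile field into one row per value, can be sketched in a few lines of Python (the comma-separated strings model the original non-atomic column; the data is from the tables above):

```python
# Rows with a non-atomic emp_mobile column (violates 1NF)
raw = [
    (101, "Herschel", "New Delhi", "8912312390"),
    (102, "Jon",      "Kanpur",    "8812121212,9900012222"),
    (103, "Ron",      "Chennai",   "7778881212"),
    (104, "Lester",   "Bangalore", "9990000123,8123450987"),
]

# Normalize: one row per atomic mobile-number value
atomic = []
for emp_id, name, addr, mobiles in raw:
    for mobile in mobiles.split(","):
        atomic.append((emp_id, name, addr, mobile))

print(len(atomic))
```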

b) Second Normal Form (2NF):


Each field in a table that is not a determiner of the contents of another field must itself be a function
of the other fields in the table.
For a table to be in the Second Normal Form,
1) It should be in the First Normal form.
2) And, it should not have Partial Dependency.

In other words,
● The table is in 1NF (First Normal Form), and
● No non-prime attribute is dependent on a proper subset of any candidate key of the table.
 Partial Dependency exists when, for a composite primary key, an attribute in the table
depends only on a part of the primary key and not on the complete primary key.
 To remove Partial Dependency, we can divide the table: remove the attribute which is
causing the partial dependency, and move it to some other table where it fits in well.
Example:
Suppose a school wants to store the data of teachers and the subjects they teach.
They create a table that looks like this. Since a teacher can teach more than one subject, the table
can have multiple rows for the same teacher.
teacher_id subject teacher_age

111 Maths 38

111 Physics 38

222 Biology 38

333 Physics 40

333 Chemistry 40

Candidate Keys: {teacher_id, subject}


Non prime attribute: teacher_age

 The table is in 1 NF because each attribute has atomic values.


 However, it is not in 2NF because non prime attribute teacher_age is dependent on
teacher_id alone which is a proper subset of candidate key.
 This violates the rule for 2NF as the rule says “no non-prime attribute is dependent on the
proper subset of any candidate key of the table”.

To make the table comply with 2NF, we can break it into two tables like this:
teacher_details table:
teacher_id teacher_age

111 38

222 38

333 40

teacher_subject table:
teacher_id subject

111 Maths

111 Physics

222 Biology
333 Physics

333 Chemistry

Now the tables comply with Second normal form (2NF).
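That the decomposition loses no information can be checked by joining the two tables back together; here is a sqlite3 sketch using the rows above:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE teacher_details "
             "(teacher_id INTEGER PRIMARY KEY, teacher_age INTEGER)")
conn.execute("CREATE TABLE teacher_subject (teacher_id INTEGER, subject TEXT)")
conn.executemany("INSERT INTO teacher_details VALUES (?, ?)",
                 [(111, 38), (222, 38), (333, 40)])
conn.executemany("INSERT INTO teacher_subject VALUES (?, ?)",
                 [(111, "Maths"), (111, "Physics"), (222, "Biology"),
                  (333, "Physics"), (333, "Chemistry")])

# Joining on teacher_id reproduces the original five-row relation
joined = conn.execute("""
    SELECT s.teacher_id, s.subject, d.teacher_age
    FROM teacher_subject s JOIN teacher_details d USING (teacher_id)
""").fetchall()
print(len(joined))
```

The join yields exactly the original rows, so the decomposition is lossless while teacher_age is now stored once per teacher.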

c) Third Normal Form (3NF):
No duplicate information is permitted.

A table design is said to be in 3NF if both the following conditions hold:


Table must be in 2NF

● Transitive functional dependency of non-prime attribute on any super key should be removed.

● An attribute that is not part of any candidate key is known as non-prime attribute.

In other words 3NF can be explained like this: A table is in 3NF if it is in 2NF and for each functional
dependency X-> Y at least one of the following conditions hold:
● X is a super key of table

● Y is a prime attribute of table

An attribute that is a part of one of the candidate keys is known as prime attribute.

Example: Suppose a company wants to store the complete address of each employee, they create
a table named employee_details that looks like this:
emp_id emp_name emp_zip emp_state emp_city emp_district

1001 John 282005 UP Agra Dayal Bagh

1002 Ajeet 222008 TN Chennai M-City

1006 Lora 282007 TN Chennai Urrapakkam

1101 Lilly 292008 UK Pauri Bhagwan

1201 Steve 222999 MP Gwalior Ratan


Super keys: {emp_id}, {emp_id, emp_name}, {emp_id, emp_name, emp_zip}…so on
Candidate Keys: {emp_id}
Non-prime attributes: all attributes except emp_id are non-prime as they are not part of any
candidate keys.
Here, emp_state, emp_city & emp_district dependent on emp_zip. And, emp_zip is dependent on
emp_id that makes non-prime attributes (emp_state, emp_city & emp_district) transitively dependent
on super key (emp_id). This violates the rule of 3NF.
To make this table comply with 3NF, we have to break it into two tables to remove the transitive
dependency:
employee table:
emp_id emp_name emp_zip

1001 John 282005

1002 Ajeet 222008

1006 Lora 282007

1101 Lilly 292008

1201 Steve 222999


employee_zip table:

emp_zip   emp_state   emp_city   emp_district
282005    UP          Agra       Dayal Bagh
222008    TN          Chennai    M-City
282007    TN          Chennai    Urrapakkam
292008    UK          Pauri      Bhagwan
222999    MP          Gwalior    Ratan

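The decomposition above can be checked in code. The sketch below (using Python's built-in sqlite3 module, with only a subset of the rows) creates the two 3NF tables and shows that an equi join on emp_zip reconstructs the original employee_details rows, so no information is lost:

```python
import sqlite3

# Build the two 3NF tables from the example (a subset of rows) and show
# that joining them reconstructs the original employee_details relation.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("""CREATE TABLE employee_zip (
                 emp_zip TEXT PRIMARY KEY,
                 emp_state TEXT, emp_city TEXT, emp_district TEXT)""")
cur.execute("""CREATE TABLE employee (
                 emp_id INTEGER PRIMARY KEY,
                 emp_name TEXT,
                 emp_zip TEXT REFERENCES employee_zip(emp_zip))""")
cur.executemany("INSERT INTO employee_zip VALUES (?,?,?,?)",
                [("282005", "UP", "Agra", "Dayal Bagh"),
                 ("222008", "TN", "Chennai", "M-City")])
cur.executemany("INSERT INTO employee VALUES (?,?,?)",
                [(1001, "John", "282005"), (1002, "Ajeet", "222008")])

# An equi join on emp_zip recovers the original (pre-decomposition) rows,
# i.e. the decomposition is lossless.
rows = cur.execute("""SELECT e.emp_id, e.emp_name, e.emp_zip,
                             z.emp_state, z.emp_city, z.emp_district
                      FROM employee e
                      JOIN employee_zip z ON e.emp_zip = z.emp_zip
                      ORDER BY e.emp_id""").fetchall()
print(rows[0])  # (1001, 'John', '282005', 'UP', 'Agra', 'Dayal Bagh')
```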
d) Boyce-Codd Normal Form (BCNF)


It is an advanced version of 3NF, which is why it is also referred to as 3.5NF. BCNF is stricter than 3NF. A table complies with BCNF if it is in 3NF and, for every functional dependency X -> Y, X is a super key of the table.

Example: Suppose there is a company wherein employees work in more than one department.
They store the data like this:
emp_id   emp_nationality   emp_dept                       dept_type   dept_no_of_emp
1001     Austrian          Production and planning        D001        200
1001     Austrian          stores                         D001        250
1002     American          design and technical support   D134        100
1002     American          Purchasing department          D134        600

Functional dependencies in the table above:


emp_id -> emp_nationality
emp_dept -> {dept_type, dept_no_of_emp}

Candidate key: {emp_id, emp_dept}


The table is not in BCNF, as neither emp_id nor emp_dept alone is a super key.
To make the table comply with BCNF, we can break it into three tables like this:
emp_nationality table:
emp_id   emp_nationality
1001     Austrian
1002     American

emp_dept table:
emp_dept                       dept_type   dept_no_of_emp
Production and planning        D001        200
Stores                         D001        250
design and technical support   D134        100
Purchasing department          D134        600

emp_dept_mapping table:
emp_id   emp_dept
1001     Production and planning
1001     stores
1002     design and technical support
1002     Purchasing department

Functional dependencies:
emp_id -> emp_nationality
emp_dept -> {dept_type, dept_no_of_emp}
Candidate keys:
For first table: emp_id
For second table: emp_dept
For third table: {emp_id, emp_dept}

This is now in BCNF, as the left-hand side of both functional dependencies is a key. While normalization makes databases more efficient to maintain, it can also make them more complex, because the data is separated into so many different tables.
*******************
UNIT II DISTRIBUTED DATABASES, ACTIVE DATABASES AND OPEN
DATABASE CONNECTIVITY
Distributed Database Architecture – Distributed Data Storage – Distributed Transactions – Distributed
Query Processing – Distributed Transaction Management – Event Condition Action Model – Design and
Implementation Issues for Active Databases – Open Database Connectivity

❖ Introduction:
For the smooth working of any business or organization, a well-organized database management system is required. In the past, databases were centralized in nature, but with the growth of globalization organizations have expanded across the world. For this reason they choose distributed data instead of a centralized system, and this is how the concept of distributed databases came into the picture.

A Distributed Database Management System is a software system that manages a distributed database, which is partitioned and placed at different locations. Its objective is to hide the data distribution so that the database appears as one logical system to its clients.

❖ DISTRIBUTED DATABASE CONCEPT:


A distributed database is a database that is not restricted to one system only. It is a group of several interconnected databases, spread physically across various locations, that communicate through a computer network. A Distributed Database Management System (DDBMS) manages the distributed database and offers mechanisms to make the distribution transparent to the users. In these systems, data is intentionally distributed among multiple sites so that all computing resources of the organization can be optimally used.
Definition of Distributed Databases and Distributed Database Management System
(DDBMS)
The concept that is most important to the DDBMS is location transparency, meaning the user should be unaware of the actual location of data.

“A distributed database management system (DDBMS) can be defined as the software system that permits the
management of the distributed database and makes the distribution transparent to the users.”
-M. Tamer Özsu
A Distributed Database Management System allows end users or application programmers to view a pool of physically separate databases as one logical unit. In other words, a distributed database is one in which the data is stored at multiple locations connected via a network, yet it appears to the user as a single logical unit.
Distributed Database Management System

1.2.1.1 Features of Distributed Database Management System


Some features of Distributed Database Management system are as follows:
• DDBMS software maintains CRUD (Create, Retrieve, Update, Delete) functions.
• It covers all application areas where huge volumes of data are processed and retrieved simultaneously by any number of users.
• It ensures that data modified at any location is updated universally.
• It ensures confidentiality and data integrity, which are important features in transaction management.
• It can handle heterogeneous data platforms.

1.2.1.2 Advantages of Distributed Database Management System:


Some of the advantages of DDBMS are as follows:
• Reliability
• Easy Expansion
• Faster Response

1.2.1.3 Disadvantages of Distributed Database Management System:


• Complex and Expensive
• Overheads
• Integrity

❖ Distributed Database Architecture:


Database systems comprise complex data structures. Thus, to make the system efficient for retrieval of data and to reduce complexity for the users, developers use the method of data abstraction.
Factors for DDBMS Architecture:
DDBMS architectures are commonly developed based on three factors –

1. Distribution – It states the physical distribution of data across the different sites. While autonomy refers to the distribution of control, the distribution dimension of the classification deals with data. The user sees the data as one logical pool. There are a number of ways in which a DBMS may be distributed. We abstract these alternatives into two classes:
● client/server distribution

● peer-to-peer distribution (or full distribution).

2. Autonomy – Autonomy, in this context, refers to the distribution of control, not of data. It indicates the distribution of control of the database system and the degree to which each component DBMS can operate independently. Autonomy is a function of a number of factors, such as whether the component systems exchange information, whether they can independently execute transactions, and whether one is allowed to modify them. The requirements of an autonomous system have been stated as follows:

● The local operations of the individual DBMSs are not affected by their participation in the distributed system.

● The manner in which the individual DBMSs process queries and optimize them should not be affected by the execution of global queries that access multiple databases.

● System consistency or operation should not be compromised when individual DBMSs join or leave the distributed system.

3. Heterogeneity – It refers to the uniformity or dissimilarity of the data models, system components and databases. Heterogeneity may occur in various forms in distributed systems, ranging from hardware heterogeneity and differences in networking protocols to variations in data managers.
ARCHITECTURAL MODELS OF DISTRIBUTED DBMS:

1. Client-Server Architecture:

Client-Server architecture is a two-level architecture where the functionality is divided into servers and clients. The server functions mainly comprise data management, query processing, transaction management and optimization. Client functions mainly contain the user interface; nevertheless, clients also have some functions such as consistency checking and transaction management.

The two different types of client-server architecture are as follows:

● Single Server Multiple Client

● Multiple Server Multiple Client
2. Peer-to-Peer Architecture for Distributed DBMS
In these systems, each peer acts both as a client and a server for providing database services. The peers share their resources with other peers and coordinate their activities.

This architecture in general has four levels of schemas: global conceptual, local conceptual, local internal and external.


● Global Conceptual Schema – Represents the global logical view of data; the logical description of the entire database as if it were not distributed. This level contains definitions of all entities, relationships among entities, and security and integrity information of the whole database stored at all sites in a distributed system.
● Local Conceptual Schema – Represents the logical data organization at each individual site.
● Local Internal Schema – Represents the physical data organization at each site.
● External Schema – Describes the user's view of data.

3. Multi-DBMS Architectures


This is an integrated database system formed by a collection of two or more
autonomous database systems.

Multi-DBMS can be expressed through six levels of schemas:

● Multi-database View Level – Depicts multiple user views comprising subsets of the integrated distributed database.
● Multi-database Conceptual Level – Depicts the integrated multi-database that comprises global logical multi-database structure definitions.
● Multi-database Internal Level – Depicts the data distribution across different sites and the multi-database to local database mapping.
● Local database View Level – Depicts a public view of local data.
● Local database Conceptual Level – Depicts the local data organization at each site.
● Local database Internal Level – Depicts the physical data organization at each site.

There are two design alternatives for multi-DBMS:

● Model with a multi-database conceptual level
● Model without a multi-database conceptual level

❖ Distributed Data Storage

Distributed data storage refers to the practice of storing data across multiple physical or logical
locations, often on different servers, nodes, or data centers. This approach offers benefits such as
improved data availability, fault tolerance, scalability, and performance. Various technologies and
architectures are used to implement distributed data storage systems. Here are some key concepts
and approaches:

1. **Replication:** Data replication involves creating and maintaining multiple copies of the same
data across different nodes. This helps enhance data availability and fault tolerance. Replicated data
can be stored on geographically dispersed servers to minimize the impact of hardware failures or
network outages.

2. **Sharding or Data Partitioning:** Sharding involves dividing a dataset into smaller subsets (shards)
based on certain criteria, such as a range of values or a hash function. Each shard is stored on a
separate node. Sharding improves data distribution and can lead to better performance by allowing
parallel processing of queries.
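As a sketch of the sharding idea, the following hypothetical router maps each key to one of a fixed set of nodes via a hash function; the node names and key format are illustrative only:

```python
import hashlib

# Hypothetical node names; any key scheme works as long as the hash is stable.
NODES = ["node-0", "node-1", "node-2", "node-3"]

def shard_for(key: str) -> str:
    # md5 gives a deterministic hash across runs (Python's built-in hash()
    # is salted per process, so it is unsuitable for routing).
    digest = int(hashlib.md5(key.encode()).hexdigest(), 16)
    return NODES[digest % len(NODES)]

# The same key always routes to the same shard, so a read finds the data
# that a write placed there:
print(shard_for("order:1001") == shard_for("order:1001"))  # True
print(sorted({shard_for(f"order:{i}") for i in range(1000)}))  # keys spread over the nodes
```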

3. **Consistency Models:** Distributed systems need mechanisms to maintain data consistency across
replicas. Consistency models, such as strong consistency, eventual consistency, and causal
consistency, define how and when data changes are propagated to replicas.

4. **Data Synchronization:** When data is replicated, mechanisms are required to keep the replicas
synchronized. Techniques like two-phase commit, three-phase commit, and distributed consensus
protocols (e.g., Paxos, Raft) ensure that all replicas agree on updates.

5. **Distributed File Systems:** Distributed file systems provide a way to store and manage files
across multiple servers. Examples include Hadoop Distributed File System (HDFS) and Ceph.
These systems offer features like data replication, fault tolerance, and high throughput.

6. **NoSQL Databases:** Many NoSQL databases are designed with distributed storage in mind.
These databases, such as Cassandra, MongoDB, and Couchbase, provide horizontal scalability, data
distribution, and support for various consistency models.

7. **Content Delivery Networks (CDNs):** CDNs distribute content, such as web pages, images, and
videos, to multiple servers located at different geographic locations. This reduces latency for users
by serving content from a nearby server.

8. **Distributed Object Storage:** Distributed object storage systems, like Amazon S3 and OpenStack
Swift, allow users to store and retrieve objects (files) through an API. These systems distribute data
across multiple nodes, providing high availability and scalability.

9. **Distributed Database Management Systems (DDBMS):** DDBMSs manage data across multiple
nodes while providing mechanisms for data distribution, replication, and querying. Examples
include Google Spanner and CockroachDB.

10. **Blockchain and Distributed Ledgers:** Blockchain technology provides a distributed and
tamper-resistant ledger for recording transactions. Each block in the chain contains a copy of the
entire ledger, distributed across nodes in a network.

11. **Erasure Coding:** Erasure coding is a technique that breaks data into smaller fragments and
adds redundant data (parity) to enable recovery in case of data loss. It's often used in distributed
storage systems to save space compared to traditional replication.
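A toy illustration of the redundancy idea behind erasure coding, using a single XOR parity fragment rather than the Reed-Solomon codes real systems use:

```python
from functools import reduce

def encode(data: bytes, k: int):
    """Split data into k equal-length fragments plus one XOR parity fragment."""
    frag_len = -(-len(data) // k)  # ceiling division; pad the last fragment
    frags = [data[i * frag_len:(i + 1) * frag_len].ljust(frag_len, b"\0")
             for i in range(k)]
    parity = bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*frags))
    return frags, parity

def recover(frags, parity, lost):
    """Rebuild the fragment at index `lost` by XOR-ing survivors with parity."""
    survivors = [f for i, f in enumerate(frags) if i != lost] + [parity]
    return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*survivors))

frags, parity = encode(b"distributed!", k=3)        # 3 data + 1 parity fragment
assert recover(frags, parity, lost=1) == frags[1]   # any single fragment is recoverable
```

Storing 3 data fragments plus 1 parity fragment survives one loss at 1.33x overhead, versus 2x for keeping a full replica.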

12. **Hybrid Cloud Storage:** Hybrid cloud solutions combine on-premises storage with cloud-based
storage, allowing organizations to maintain a balance between local control and cloud scalability.

Distributed data storage solutions are widely used in various industries and applications, including
cloud computing, big data analytics, IoT (Internet of Things), content delivery, and more. They
address the challenges of data growth, accessibility, and reliability in today's interconnected and
data-intensive world.

❖ Distributed Transactions

A transaction is a program including a collection of database operations, executed as a logical unit of data
processing. The operations performed in a transaction include one or more of database operations like
insert, delete, update or retrieve data. It is an atomic process that is either performed into completion
entirely or is not performed at all. A transaction involving only data retrieval without any data update is
called read-only transaction.

Each high level operation can be divided into a number of low level tasks or operations. For example, a data update operation can be divided into the following three tasks:

● read_item() – reads the data item from storage to main memory.
● modify_item() – changes the value of the item in main memory.
● write_item() – writes the modified value from main memory to storage.
Database access is restricted to read_item() and write_item() operations. Likewise, for all transactions, read
and write forms the basic database operations.
Transaction Operations
The low level operations performed in a transaction are:

● begin_transaction – A marker that specifies the start of transaction execution.
● read_item or write_item – Database operations that may be interleaved with main memory operations as a part of the transaction.
● end_transaction – A marker that specifies the end of the transaction.
● commit – A signal to specify that the transaction has been successfully completed in its entirety and will not be undone.
● rollback – A signal to specify that the transaction has been unsuccessful and so all temporary changes in the database are undone. A committed transaction cannot be rolled back.
Transaction States
A transaction may go through a subset of five states: active, partially committed, committed, failed and aborted.

● Active – The initial state where the transaction enters is the active state. The transaction remains in this state while it is executing read, write or other operations.
● Partially Committed – The transaction enters this state after the last statement of the transaction has been executed.
● Committed – The transaction enters this state after successful completion of the transaction and after the system checks have issued a commit signal.
● Failed – The transaction goes from the partially committed state or the active state to the failed state when it is discovered that normal execution can no longer proceed or that system checks fail.
● Aborted – This is the state after the transaction has been rolled back after failure and the database has been restored to the state it was in before the transaction began.

The following state transition diagram depicts the states in the transaction and the low level transaction
operations that causes change in states.

Desirable Properties of Transactions


Any transaction must maintain the ACID properties, viz. Atomicity, Consistency, Isolation and Durability.

● Atomicity – This property states that a transaction is an atomic unit of processing, that is, either it is performed in its entirety or not performed at all.
● Consistency – A transaction should take the database from one consistent state to another consistent state. It should not adversely affect any data item in the database.
● Isolation – A transaction should be executed as if it is the only one in the system. There should not be any interference from the other concurrent transactions that are running simultaneously.
● Durability – If a committed transaction brings about a change, that change should be durable in the database and not lost in case of any failure.

Schedules and Conflicts


In a system with a number of simultaneous transactions, a schedule is the total order of execution of operations. Given a schedule S comprising of n transactions, say T1, T2, T3, ..., Tn, for any transaction Ti the operations in Ti must execute as laid down in the schedule S.
Types of Schedules
There are two types of schedules:

● Serial Schedules – In a serial schedule, at any point of time only one transaction is active, i.e. there is no overlapping of transactions.
● Parallel Schedules – In parallel schedules, more than one transaction is active simultaneously, i.e. the transactions contain operations that overlap in time.
Conflicts in Schedules

In a schedule comprising of multiple transactions, a conflict occurs when two active transactions perform non-compatible operations. Two operations are said to be in conflict when all of the following three conditions hold simultaneously:

● The two operations are parts of different transactions.


● Both the operations access the same data item.
● At least one of the operations is a write_item() operation, i.e. it tries to modify the data item.
Serializability
A serializable schedule of 'n' transactions is a parallel schedule which is equivalent to a serial schedule comprising of the same 'n' transactions. A serializable schedule combines the correctness of a serial schedule with the better CPU utilization of a parallel schedule.
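The three conflict conditions above translate directly into code. A minimal sketch, with operations modelled as (transaction, kind, data item) tuples:

```python
# Operations are modelled as (transaction_id, kind, data_item) tuples.
def conflicts(op1, op2):
    t1, kind1, item1 = op1
    t2, kind2, item2 = op2
    return (t1 != t2                        # parts of different transactions
            and item1 == item2              # both access the same data item
            and "write" in (kind1, kind2))  # at least one is a write_item()

print(conflicts(("T1", "read", "X"), ("T2", "write", "X")))   # True
print(conflicts(("T1", "read", "X"), ("T2", "read", "X")))    # False: two reads never conflict
print(conflicts(("T1", "write", "X"), ("T1", "write", "X")))  # False: same transaction
print(conflicts(("T1", "write", "X"), ("T2", "write", "Y")))  # False: different items
```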

❖ Distributed Query Processing


Distributed query processing in a Database Management System (DBMS) refers to the process of
executing queries that involve data stored across multiple nodes or sites within a distributed
database system. The goal of distributed query processing is to optimize the execution of queries to
minimize data transfer, network overhead, and overall response time while ensuring correct and
consistent results. Here's how distributed query processing works:

1. **Query Decomposition:** When a query is submitted to a distributed DBMS, the query is first
decomposed into subqueries that can be executed on different nodes. These subqueries are sent to
the appropriate nodes based on data distribution and availability.

2. **Data Localization:** One of the key goals of distributed query processing is to minimize the
amount of data that needs to be transferred across the network. This is achieved by executing
subqueries on nodes where the relevant data is located, reducing the need for extensive data
movement.

3. **Query Optimization:** Each node receives its respective subquery and optimizes it based on local
data. This optimization includes selecting appropriate indexes, joining tables, and applying other
query optimization techniques to improve performance.

4. **Parallel Execution:** Once subqueries are optimized, they can be executed in parallel across
different nodes. This parallelism improves query execution speed by utilizing the computational
resources of multiple nodes simultaneously.

5. **Intermediate Result Exchange:** In queries involving joins or aggregation, intermediate results


might need to be exchanged between nodes. Minimizing the amount of data exchanged and
optimizing data transfer strategies is crucial to avoid network bottlenecks.

6. **Global Optimization:** In some cases, global optimization techniques consider the overall query
plan, considering the costs and benefits of different execution strategies on various nodes. This
helps to make decisions that improve the performance of the entire distributed query.

7. **Distribution Transparency:** Distributed query processing aims to provide distribution


transparency, where users and applications can interact with the database as if it were a single
centralized database, without needing to be aware of data distribution and location.

8. **Data Consistency and Isolation:** Distributed query processing needs to ensure proper transaction
isolation levels and consistency across distributed nodes, especially when multiple transactions are
executing concurrently.

9. **Query Result Integration:** Once subqueries are executed and their results are obtained, they
need to be integrated to produce the final query result. This integration process might involve
additional processing and computations.

10. **Cost-Based Optimization:** Cost-based optimization techniques evaluate different query


execution plans based on estimated costs, taking into account factors like data transfer costs, local
processing costs, and network latency.

11. **Caching and Materialized Views:** Caching intermediate results or using materialized views
can improve performance by reducing the need to recompute certain parts of the query during
subsequent executions.

12. **Query Routing:** Query routing mechanisms determine which node should execute which part
of the query. Load balancing and smart routing strategies contribute to efficient resource utilization.

Distributed query processing is essential in scenarios where data is distributed across multiple
locations or nodes, such as in cloud computing environments or global enterprises. It addresses the
challenges of data distribution, network latency, and optimizing the use of distributed resources to
provide users with efficient and consistent query results.

❖ Distributed Transaction Management


Distributed Transaction Management (DTM) in Database Management Systems (DBMS) involves
handling transactions that span multiple databases or data sources in a distributed computing
environment. The goal of DTM in DBMS is to ensure the consistency, isolation, and atomicity of
transactions across different nodes or databases, despite potential network failures or system
crashes. Here's how DTM is typically implemented in a DBMS context:

1. **Transaction Coordinator**:
- A central coordinator or transaction manager is responsible for initiating, coordinating, and
monitoring distributed transactions.
- It ensures that the ACID properties (Atomicity, Consistency, Isolation, Durability) are maintained
across participating databases.

2. **Participant Databases**:
- Each participating database is a node that can execute a portion of the distributed transaction.
- Participants must support distributed transactions and adhere to the protocols and mechanisms for
transaction coordination.

3. **Two-Phase Commit (2PC)**:


- The most commonly used protocol for distributed transaction coordination is the Two-Phase
Commit protocol.
- **Phase 1 - Prepare Phase**: The coordinator asks each participant if it is ready to commit the
transaction. Participants respond with either a "vote to commit" or a "vote to abort."
- **Phase 2 - Commit Phase**: If all participants vote to commit, the coordinator instructs them to
perform the commit operation. If any participant votes to abort, the coordinator instructs them to
roll back the transaction.
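The two phases can be sketched as follows; the participants here are simulated in-process objects (names are illustrative), whereas a real coordinator would exchange these messages over a network and log each step for recovery:

```python
class Participant:
    """Simulated participant database; a real one would run over a network."""
    def __init__(self, name, can_commit=True):
        self.name, self.can_commit, self.state = name, can_commit, "active"

    def prepare(self):
        # Phase 1: vote "commit" only if the local work can be made durable.
        return "commit" if self.can_commit else "abort"

    def commit(self):
        self.state = "committed"

    def rollback(self):
        self.state = "aborted"

def two_phase_commit(participants):
    votes = [p.prepare() for p in participants]   # Phase 1: prepare/vote
    if all(v == "commit" for v in votes):         # Phase 2: global commit
        for p in participants:
            p.commit()
        return "committed"
    for p in participants:                        # a single "abort" vote aborts all
        p.rollback()
    return "aborted"

print(two_phase_commit([Participant("db1"), Participant("db2")]))                    # committed
print(two_phase_commit([Participant("db1"), Participant("db2", can_commit=False)])) # aborted
```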

4. **Three-Phase Commit (3PC)**:


- To mitigate potential blocking issues in 2PC, the Three-Phase Commit protocol introduces a "pre-
commit" phase.
- **Phase 1 - Prepare Phase**: Similar to 2PC, participants vote to commit or abort.
- **Phase 2 - Pre-Commit Phase**: Participants notify the coordinator of their readiness. The
coordinator waits for all participants to be ready before proceeding.
- **Phase 3 - Commit Phase**: If all participants are ready, the coordinator instructs them to commit.
Otherwise, if any participant is not ready, the coordinator instructs them to abort.

5. **Isolation Levels and Concurrency Control**:


- Ensuring the proper isolation levels (Read Uncommitted, Read Committed, Repeatable Read,
Serializable) is crucial to prevent data anomalies and inconsistencies in distributed transactions.
- Distributed databases must implement appropriate concurrency control mechanisms to handle
multiple transactions accessing the same data concurrently.

6. **Deadlock Detection and Handling**:


- Deadlocks can occur in distributed transactions just as in single-database transactions. Distributed
DBMS must implement deadlock detection and resolution mechanisms to prevent system lockups.

7. **Global vs. Local Transactions**:


- A global transaction spans multiple databases, while a local transaction involves a single database.
- In global transactions, all participating databases must commit or abort for data consistency.

8. **Compensating Transactions**:
- Sometimes, due to unforeseen issues, a distributed transaction might need to be rolled back using
compensating transactions that undo the effects of the original transaction.

9. **Distributed Transaction Monitors**:


- Complex distributed systems may employ dedicated transaction monitors that manage transaction
coordination, monitoring, and recovery.

10. **Data Replication and Distribution**:


- Replicated and distributed data across nodes can impact how transactions are managed, especially
in terms of maintaining consistency and ensuring that changes are propagated correctly.

Distributed Transaction Management in DBMS is a critical aspect of maintaining data integrity and
consistency in modern distributed and cloud-based environments. Proper design, robust protocols,
and careful consideration of failure scenarios are essential for achieving reliable distributed
transactions.

❖ Event Condition Action Model:

The Event-Condition-Action (ECA) model is a paradigm used in Database Management Systems (DBMS)
and other computing systems to define and manage complex event-driven behaviors. It provides a way to
specify how the system should react to certain events based on predefined conditions, triggering specific
actions as a result. The ECA model is commonly used in rule-based systems and event processing
frameworks. Here's a breakdown of each component:

1. Event:
o An event is a change or occurrence in the system that triggers some response.
o Events can be internal (generated within the system) or external (coming from the
environment).
o Examples of events in a DBMS context include data changes (inserts, updates, deletes),
time-based triggers, user interactions, and more.
2. Condition:
o The condition is a logical expression or criteria that determine when the associated action(s)
should be executed.
o It defines the context under which the action(s) become relevant.
o Conditions can involve comparisons, calculations, and checks on data values, states, and
more.
3. Action:
o An action is a task, operation, or set of operations that should be performed when the
associated event occurs and the condition is satisfied.
o Actions can include data modifications, notifications, invoking procedures, sending
messages, or any other system-specific behavior.

The basic flow of the ECA model is as follows:

● When an event occurs (either internally or externally), the system evaluates the corresponding
condition(s) associated with that event.
● If the condition is met, the defined action(s) are executed in response to the event and condition
combination.

The ECA model is especially useful in scenarios where you want the system to autonomously react to
certain events and conditions without manual intervention. It's commonly used for tasks like automated
notifications, enforcing business rules, triggering workflows, and more.

Example scenario using the ECA model in a DBMS context:

● Event: A new order is placed in an online store.
● Condition: The total order amount exceeds a predefined threshold.
● Action: Send a discount coupon to the customer's email address.
Advantages of the ECA model:

● Flexibility: Allows the system to respond to complex events and conditions in a dynamic manner.
● Automation: Enables the automation of processes and workflows based on specific triggers.
● Customization: Offers the ability to customize responses based on conditions and events.
● Real-time Processing: Suitable for real-time event processing and reaction.

Disadvantages:

● Complexity: As the system becomes more complex, managing and debugging rules can become
challenging.
● Performance: Poorly designed ECA rules could impact system performance, especially if many
events and rules are involved.

Overall, the ECA model is a powerful approach for creating event-driven behaviors in DBMS and other
systems, but it requires careful design and management to ensure effective and efficient execution of rules
and actions.

Example:

**Scenario**: Consider a university's student registration system. Whenever a student's unpaid tuition fee
exceeds a certain threshold, the system should automatically send a notification to the student.

**ECA Model Implementation**:

1. **Event**: The event in this scenario could be a change in the student's tuition fee status. Specifically,
when a new fee record is added or updated in the database.

2. **Condition**: The condition is the criteria that must be met for the action to be triggered. In this case,
the condition might be that the student's unpaid tuition fee exceeds $1,000.

3. **Action**: The action is what happens when the event and condition are satisfied. In this example, the
action is to send a notification email to the student.

So, in the ECA model:

- **Event**: A new tuition fee record is added or updated for a student.

- **Condition**: If the unpaid tuition fee exceeds $1,000.

- **Action**: Send an email notification to the student with a message like, "Your unpaid tuition fee has
exceeded $1,000. Please make a payment."
This ECA rule ensures that the specified action (sending a notification email) is automatically triggered
when the defined event (tuition fee update) occurs, and the specified condition (exceeding $1,000) is met
in the database system.
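This rule maps directly onto a row-level trigger. A sketch using SQLite via Python (table, column and trigger names are illustrative, and the email is replaced by inserting a notification row):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tuition_fees (student_id INTEGER, unpaid REAL);
CREATE TABLE notifications (student_id INTEGER, message TEXT);

CREATE TRIGGER fee_alert
AFTER INSERT ON tuition_fees            -- Event: new fee record added
WHEN NEW.unpaid > 1000                  -- Condition: threshold exceeded
BEGIN                                   -- Action: queue a notification
  INSERT INTO notifications VALUES (
    NEW.student_id,
    'Your unpaid tuition fee has exceeded $1,000. Please make a payment.');
END;
""")

conn.execute("INSERT INTO tuition_fees VALUES (1, 500)")   # condition not met: no action
conn.execute("INSERT INTO tuition_fees VALUES (2, 1500)")  # event + condition: trigger fires
notified = conn.execute("SELECT student_id FROM notifications").fetchall()
print(notified)  # [(2,)]
```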

❖ Design and Implementation Issues for Active Databases

Active databases are database systems that are capable of proactively reacting to events and conditions
by executing predefined actions. These actions can include triggers, rules, or scripts that are
automatically executed when certain events occur or conditions are met. Designing and
implementing active databases involves addressing several important issues to ensure their
functionality, performance, and maintainability. Here are some key design and implementation
considerations for active databases:

1. **Event Specification and Detection**:


- Clearly define the events that the database should react to. Events can include data changes (inserts,
updates, deletes), time-based triggers, external notifications, and more.
- Implement mechanisms to detect and capture these events in real-time or near real-time, such as
using event queues or change data capture (CDC) techniques.

2. **Rule Specification**:
- Define the rules or conditions that should trigger specific actions when events occur. Rules can
involve complex conditions and constraints.
- Express rules using a formal syntax or a rule language that the active database system understands.

3. **Action Execution**:
- Specify the actions that should be taken when a rule's conditions are met and an event occurs.
Actions can involve data modifications, notifications, invoking procedures, etc.
- Ensure that actions are executed efficiently and reliably, considering the potential impact on system
performance.

4. **Concurrency Control and Isolation**:


- Address concurrency control issues to ensure that multiple active rules can be executed
concurrently without leading to inconsistencies or conflicts.
- Implement proper isolation mechanisms to maintain data consistency while executing actions
triggered by different events.

5. **Rule Execution Lifecycle**:


- Design the lifecycle of rule execution, including rule activation, condition evaluation, and action
execution.
- Handle scenarios where multiple rules might be activated by the same event, possibly leading to
cascading actions.

6. **Event and Action Logging**:


- Maintain logs of events, rule activations, condition evaluations, and executed actions. This helps
with auditing, debugging, and tracking system behavior.

7. **Performance Optimization**:
- Optimize the system to handle a high volume of events and rule activations efficiently. This might
involve caching, indexing, and query optimization techniques.

8. **Error Handling and Recovery**:
- Implement error handling mechanisms to deal with exceptions that might occur during rule
activation, condition evaluation, or action execution.
- Plan for recovery strategies in case of failures to ensure the system remains consistent.
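
A common recovery pattern is to run each action inside its own transaction and roll back on failure, so a faulty action cannot leave the database half-updated. A sketch under that assumption (SQLite; the `audit` table and `run_action` helper are invented for the example):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE audit(id INTEGER PRIMARY KEY, note TEXT NOT NULL)")

def run_action(conn, note):
    """Execute a rule's action in a transaction; roll back if it fails."""
    try:
        with conn:  # the context manager commits on success, rolls back on exception
            conn.execute("INSERT INTO audit(note) VALUES (?)", (note,))
    except sqlite3.IntegrityError as exc:
        print("action failed, rolled back:", exc)

run_action(conn, "rule fired")   # succeeds and is committed
run_action(conn, None)           # NOT NULL violation: the action is rolled back
remaining = conn.execute("SELECT count(*) FROM audit").fetchone()
print(remaining)  # (1,)
```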

9. **Security and Access Control**:
- Ensure that the active database system enforces appropriate security measures to prevent
unauthorized access to events, rules, and actions.
- Implement role-based access control to limit who can create, modify, or execute active rules.

10. **Testing and Validation**:
- Thoroughly test the active rules, conditions, and actions to ensure they behave as expected in
different scenarios.
- Use simulation or sandbox environments to validate rules and actions before deploying them in a
production environment.

11. **Maintenance and Monitoring**:
- Implement monitoring and management tools to track the execution of rules, events, and actions.
- Plan for maintenance tasks such as rule updates, system upgrades, and performance tuning.

12. **Documentation and Training**:
- Provide documentation and training resources for developers and administrators who will be
working with the active database system.

Designing and implementing active databases requires a careful balance between reactivity and
system performance, as well as a clear understanding of the events, rules, and actions that will drive
the system's behavior. Proper planning, testing, and ongoing maintenance are crucial to ensure the
success of an active database system.

❖ Open Database Connectivity

Open Database Connectivity (ODBC) is a standard application programming interface (API) that
enables applications to interact with various database management systems (DBMS) using a
consistent and uniform interface. ODBC allows applications to access, manipulate, and manage
data across different database platforms without needing to know the specifics of each database's
underlying architecture.

Here are the key aspects of Open Database Connectivity (ODBC):

1. **Standard Interface**: ODBC provides a standard set of function calls and data structures that
applications can use to interact with databases. This standardization makes it easier for developers
to write database-independent applications.

2. **Database Independence**: ODBC abstracts the differences between various DBMS systems,
allowing applications to connect to different databases without needing to modify code
significantly.

3. **Driver Architecture**: ODBC operates based on a driver architecture. Each database vendor
provides an ODBC driver specific to their database system. These drivers translate ODBC function
calls into the appropriate commands for the underlying DBMS.

4. **Data Source Name (DSN)**: ODBC connections are established using Data Source Names,
which are typically configured through an ODBC administrator tool. DSNs store information about
the database server, authentication, and other connection details.
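
For illustration, the two common connection styles look like the strings below; every value (the `SalesDB` DSN name, server, database, user, and password) is a hypothetical example, not a real configuration.

```python
# 1. DSN-based: the details live in the ODBC administrator / odbc.ini entry "SalesDB"
dsn_style = "DSN=SalesDB;UID=report_user;PWD=secret"

# 2. DSN-less: driver and server are spelled out in the connection string itself
dsnless_style = (
    "DRIVER={ODBC Driver 18 for SQL Server};"
    "SERVER=db.example.com,1433;DATABASE=sales;"
    "UID=report_user;PWD=secret"
)

# Either string would be handed to an ODBC connect call,
# e.g. pyodbc.connect(dsn_style) in Python.
print(dsn_style.split(";")[0])  # DSN=SalesDB
```

The DSN-based form keeps connection details out of application code, so administrators can repoint the data source without redeploying the application.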

5. **SQL Interface**: ODBC supports the Structured Query Language (SQL), allowing applications to
execute SQL statements and retrieve results from the database.
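
From Python, ODBC access usually goes through the `pyodbc` package, whose cursor interface follows the same DB-API shape sketched below; `sqlite3` stands in for the ODBC driver here so the example is self-contained, and the `employees` table is invented.

```python
import sqlite3

# pyodbc exposes the same connect/cursor/execute/fetch pattern,
# including '?' placeholders for parameters.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE employees(name TEXT, dept TEXT)")
cur.executemany("INSERT INTO employees VALUES (?, ?)",
                [("Asha", "IT"), ("Ravi", "HR"), ("Mei", "IT")])
conn.commit()

# Parameterized query: the application writes standard SQL,
# the driver translates it for the underlying DBMS
cur.execute("SELECT name FROM employees WHERE dept = ? ORDER BY name", ("IT",))
rows = cur.fetchall()
print(rows)  # [('Asha',), ('Mei',)]
```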

6. **Connection Pooling**: ODBC drivers often include connection pooling, which allows the reuse of
established connections to improve performance.
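
The idea behind connection pooling can be sketched in a few lines: idle connections wait in a queue, callers borrow one instead of opening a new connection, and return it when done. This toy `SimplePool` class is an illustration of the concept, not how any particular ODBC driver manager implements it.

```python
import sqlite3
from queue import Queue

class SimplePool:
    """Toy pool: hand out idle connections, take them back on release."""
    def __init__(self, factory, size):
        self._idle = Queue()
        for _ in range(size):
            self._idle.put(factory())  # connections are opened once, up front

    def acquire(self):
        return self._idle.get()        # blocks if every connection is in use

    def release(self, conn):
        self._idle.put(conn)           # return the connection for reuse

pool = SimplePool(lambda: sqlite3.connect(":memory:"), size=1)
c1 = pool.acquire()
pool.release(c1)
c2 = pool.acquire()
print(c1 is c2)  # True: the connection was reused, not re-established
```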

7. **Metadata Retrieval**: Applications can retrieve database schema information, such as tables,
columns, and indexes, using ODBC functions.
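
With `pyodbc`, this catalog information comes from calls such as `cursor.tables()` and `cursor.columns()`. The sketch below shows the equivalent idea against SQLite's own catalog as a stand-in; the `invoices` table is invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE invoices(id INTEGER PRIMARY KEY, total REAL)")

# List the tables in the database (pyodbc: cursor.tables())
tables = [r[0] for r in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table'")]
print(tables)  # ['invoices']

# List each column's name and declared type (pyodbc: cursor.columns())
columns = [(r[1], r[2]) for r in conn.execute("PRAGMA table_info(invoices)")]
print(columns)  # [('id', 'INTEGER'), ('total', 'REAL')]
```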

8. **Error Handling**: ODBC provides error handling mechanisms to help applications diagnose and
handle errors that may occur during database interactions.

9. **Unicode Support**: ODBC offers Unicode support, enabling applications to work with
multilingual and international character sets.

10. **Supported Platforms**: ODBC is available on various platforms, including Windows, Linux,
macOS, and others.

11. **API Layers**: ODBC can be used directly by applications, but it also underpins other
data-access layers: for example, OLE DB (the basis of ADO, ActiveX Data Objects) provides an
ODBC provider, and Java historically offered a JDBC-ODBC bridge so JDBC applications could
reach ODBC data sources.

12. **Performance Considerations**: While ODBC provides database independence, it's important to
be mindful of performance implications, as there might be some overhead involved in translating
ODBC calls into database-specific commands.

13. **ODBC Drivers**: As noted under the driver architecture above, each DBMS vendor supplies its
own driver. That driver must be installed (and registered with the driver manager) on every
client machine that needs to communicate with the corresponding database.
