
UNIT I: Introduction to DBMS

Objectives:
• To understand the basic concepts and the applications of database systems
• To master the basics of SQL and construct queries using SQL
• To understand relational database design principles
• To become familiar with the basic issues of transaction processing and concurrency control
• To become familiar with database storage structures and access techniques

Outcomes:
• Demonstrate the basic elements of a relational database management system
• Ability to identify the data models for relevant problems
• Ability to design entity-relationship diagrams, convert them into relational database schemas, and formulate SQL queries on the respective data
• Apply normalization for the development of application software

INTRODUCTION TO DBMS:
• Data is nothing but facts and statistics stored or flowing freely over a network; generally, it is raw and unprocessed.
• Data becomes information when it is processed, turning it into something meaningful.
• A database is a collection of inter-related data that can be used to retrieve, insert and delete data efficiently.
• It is also used to organize data in the form of tables, schemas, views, reports, etc.
• Using the database, you can easily retrieve, insert, and delete information.
• For example, a college database organizes data about the administration, staff, students, faculty, etc.

DBMS Vs. File System:

1. DBMS: A DBMS manages a collection of data, and the user is not required to write the procedures for managing it.
   File system: A file system is also a collection of data, but the user has to write the procedures for managing the database.
2. DBMS: Searching data is easy.
   File system: Searching data is difficult.
3. DBMS: Data is structured.
   File system: Files hold unstructured data.
4. DBMS: Data redundancy is controlled and kept to a minimum.
   File system: Data redundancy is common.
5. DBMS: Memory is utilised well.
   File system: Memory utilisation is poor.
6. DBMS: Data inconsistency is avoided.
   File system: Data can become inconsistent.
7. DBMS: It gives an abstract view of data that hides the details.
   File system: It exposes the details of data representation and storage.
8. DBMS: It provides a crash recovery mechanism, i.e., it protects the user from system failure.
   File system: It has no crash recovery mechanism, i.e., if the system crashes while some data is being entered, the content of the file may be lost.
9. DBMS: It provides a good protection mechanism.
   File system: It is very difficult to protect a file under the file system.
10. DBMS: It contains a wide variety of sophisticated techniques to store and retrieve data.
    File system: It cannot store and retrieve data efficiently.
11. DBMS: It handles concurrent access to data using some form of locking.
    File system: Concurrent access causes many problems, such as one user reading a file while another is deleting or updating some of its information.

History of DBMS:

Data is a collection of facts and figures. As the amount of data grew day by day, it needed to be stored in a device or software that was safer.
Charles Bachman developed the Integrated Data Store (IDS), one of the first database systems, which was based on the network data model. IDS was developed in the early 1960s, and Bachman received the Turing Award (the most prestigious award in computer science, often described as its equivalent of the Nobel Prize) for his work.
In the late 1960s, IBM (International Business Machines Corporation) developed the Information Management System (IMS), which is still used in many places today. It was based on the hierarchical database model. In 1970, Edgar Codd proposed the relational database model. Many of the databases we use today are relational, and the relational model has been the standard database model ever since.
During the 1970s, IBM developed the Structured Query Language (SQL) as part of its System R project, and SQL was later adopted as the standard query language by ANSI and ISO. James (Jim) Gray, whose work laid the foundations of transaction processing, also received the Turing Award.

• A DBMS is software that allows the creation, definition and manipulation of a database, allowing users to store, process and analyse data easily.
• A DBMS provides an interface or tool to perform various operations such as creating a database, storing data in it, updating data, creating tables in the database and a lot more.
• A DBMS also provides protection and security for databases.
• It also maintains data consistency when there are multiple users. Some popular DBMSs in use today are:
• MySQL
• Oracle
• SQL Server
• IBM DB2
DATABASE APPLICATIONS:
1. Telecom: A database keeps track of information about calls made, network usage, customer details, etc.
2. Industry: Whether it is a manufacturing unit, a warehouse or a distribution centre, each one needs a database to keep records of what comes in and goes out.
3. Banking System: For storing customer information, tracking day-to-day credit and debit transactions, generating bank statements, etc.
4. Sales: To store customer information, product information and invoice details.
5. Airlines: When we travel by air, we make reservations in advance; this reservation information, along with the flight schedule, is stored in a database.
6. Education sector: Database systems are frequently used in schools and colleges to store and retrieve data about students, staff, courses, exams, payroll, attendance, fees, etc.

PURPOSE OF DATABASE SYSTEMS:


The main purpose of database systems is to manage data. Consider a university that keeps data about students, teachers, courses, books, etc. To manage this data, we need to store it somewhere we can add new data, delete unused data, update outdated data and retrieve data. To perform these operations, we need a database management system that stores the data in such a way that all of them can be performed efficiently.

Characteristics of DBMS:
• Data stored in tables: Data is never stored directly in the database itself; it is stored in tables created inside the database.
• Reduced redundancy: Hard drives are very cheap today, but when they were expensive, unnecessary repetition of data in a database was a big problem. A DBMS follows normalisation, which divides the data in such a way that repetition is minimal.
• Data consistency: On live data, i.e. data that is continuously being updated and added to, maintaining consistency can become a challenge, but the DBMS handles it by itself.
• Support for multiple users and concurrent access: A DBMS allows multiple users to work on it (update, insert and delete data) at the same time and still maintains data consistency.
• Query language: A DBMS provides users with a simple query language, with which data can easily be fetched, inserted, deleted and updated.
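As a small illustration of the query-language point above, the sketch below uses SQLite (through Python's built-in sqlite3 module) as a stand-in DBMS. The student table, its columns and the sample rows are hypothetical; other DBMSs such as MySQL or Oracle accept very similar SQL, but details differ.

```python
import sqlite3

# In-memory SQLite database; table and column names are illustrative assumptions.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (roll_no INTEGER PRIMARY KEY, name TEXT, branch TEXT)")

# Insert several rows with one parameterised statement.
conn.executemany(
    "INSERT INTO student (roll_no, name, branch) VALUES (?, ?, ?)",
    [(1, "Asha", "CSE"), (2, "Ravi", "ECE"), (3, "Meena", "CSE")],
)

# Update one row.
conn.execute("UPDATE student SET branch = ? WHERE roll_no = ?", ("IT", 2))

# Delete one row.
conn.execute("DELETE FROM student WHERE roll_no = ?", (3,))

# Fetch what is left.
rows = conn.execute("SELECT roll_no, name, branch FROM student ORDER BY roll_no").fetchall()
print(rows)  # [(1, 'Asha', 'CSE'), (2, 'Ravi', 'IT')]
```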

Advantages of DBMS:
• Controls database redundancy: It can control data redundancy because it stores all data in a single, integrated database instead of in many separate files.
• Data sharing: In a DBMS, the authorized users of an organization can share data among multiple users.
• Easy maintenance: It is easy to maintain because of the centralized nature of the database system.
• Reduced time: It reduces development time and maintenance needs.
• Backup: It provides backup and recovery subsystems that automatically back up data to guard against hardware and software failures and restore it if required.
• Multiple user interfaces: It provides different types of user interfaces, such as graphical user interfaces and application program interfaces.

Disadvantages of DBMS:
• Cost of hardware and software: Running DBMS software requires a fast processor and a large amount of memory.
• Size: It occupies a large amount of disk space and needs a large memory to run efficiently.
• Complexity: A database system creates additional complexity and requirements.
• Higher impact of failure: In most organizations all the data is stored in a single database, so if the database is damaged by an electrical failure or database corruption, the data may be lost forever.

Disadvantages of file oriented approach:


1) Data redundancy and inconsistency:
The same information may be written in several files. This redundancy leads to higher storage and access costs. It may also lead to data inconsistency, that is, the various copies of the same data may no longer agree. For example, a changed customer address may be reflected in one file but not elsewhere in the system.
2) Difficulty in accessing data:
A conventional file processing system does not allow data to be retrieved conveniently and efficiently according to the user's choice.
3) Data isolation:
Because data are scattered across various files, and the files may be in different formats, writing new application programs to retrieve the appropriate data is difficult.
4) Integrity problems:
Developers enforce data validation in the system by adding appropriate code to the various application programs. However, when new constraints are added, it is difficult to change the programs to enforce them.
5) Atomicity:
It is difficult to ensure atomicity in a file processing system when a transaction fails because of a power failure, networking problems, etc.
(Atomicity: either all operations of the transaction are reflected properly in the database, or none are.)
6) Concurrent access:
In a file processing system, it is difficult to let several transactions access the same file at the same time without leaving the data inconsistent.
7) Security problems:
A file processing system provides little or no security to protect the data from unauthorized access.
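To see what atomicity buys you, here is a minimal sketch, again assuming SQLite via Python's sqlite3 module as the DBMS. The account table and the transfer() helper are hypothetical; the fail_midway flag simply raises an exception to imitate a power failure between the debit and the credit. Because both updates run inside one transaction, the DBMS undoes the debit when the failure occurs, which a plain file would not do for you.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (acc_no INTEGER PRIMARY KEY, balance INTEGER NOT NULL)")
conn.executemany("INSERT INTO account VALUES (?, ?)", [(101, 500), (102, 300)])
conn.commit()

def transfer(conn, src, dst, amount, fail_midway=False):
    """Move `amount` from src to dst as a single transaction.

    `fail_midway` is a hypothetical switch that simulates a crash
    after the debit but before the credit.
    """
    with conn:  # commits if the block succeeds, rolls back if an exception escapes
        conn.execute("UPDATE account SET balance = balance - ? WHERE acc_no = ?", (amount, src))
        if fail_midway:
            raise RuntimeError("simulated failure")
        conn.execute("UPDATE account SET balance = balance + ? WHERE acc_no = ?", (amount, dst))

def balances(conn):
    return dict(conn.execute("SELECT acc_no, balance FROM account"))

try:
    transfer(conn, 101, 102, 200, fail_midway=True)
except RuntimeError:
    pass
after_failure = balances(conn)
print(after_failure)  # {101: 500, 102: 300}: the half-done debit was rolled back

transfer(conn, 101, 102, 200)
after_success = balances(conn)
print(after_success)  # {101: 300, 102: 500}
```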

Database:
A database is an organized collection of related data of an organization, stored in a formatted way and shared by multiple users.
The main features of data in a database are:
1. It must be well organized.
2. It is related.
3. It is accessible in a logical order without any difficulty.
4. It is stored only once.
For example, consider the roll no, name and address of a student stored in a student file. This is a collection of related data with an implicit meaning.
Data in the database may be persistent, integrated and shared.

Persistent:
Data is persistent because it is removed from the database only when a user explicitly requests its removal.
Integrated:
A database can be a collection of data from different files; when any redundancy among those files is removed, the data in the database is said to be integrated.
Sharing Data:
The data stored in the database can be shared by multiple users simultaneously without affecting its correctness.
Why Database over file system:
In order to overcome the limitations of the file system, a new approach was required, and hence the database approach emerged. A database is a persistent collection of logically related data. The initial attempts were to provide a centralized collection of data. A database has a self-describing nature: it contains not only the data but also a description of that data, and it supports sharing and integration of an organization's data in a single database.

A small database can be handled manually, but a large database with multiple users is difficult to maintain by hand; in that case a computerized database is useful. The advantages of a database system over traditional, paper-based methods of record keeping are:
Compactness: There is no need for large amounts of paper files.
Speed: The machine can retrieve and modify data much faster than a human being.
Less drudgery: Much of the manual maintenance of files is eliminated.
Accuracy: Accurate, up-to-date information can be fetched whenever the user requires it.

Database Management System (DBMS):


A database management system consists of a collection of related data and a set of programs for defining, creating, maintaining and manipulating a database.

Functions of DBMS:
1. Defining database schema: It must provide facilities for defining the database structure and for specifying access rights for authorized users.
2. Manipulation of the database: A DBMS must provide functions for inserting records into the database and for updating, deleting and retrieving data.
3. Sharing of database: A DBMS must let multiple users share data items while maintaining the consistency of the data.
4. Protection of database: It must protect the database against unauthorized users.
5. Database recovery: If the system fails for any reason, the DBMS must facilitate database recovery.

View of Data in DBMS:


Abstraction is one of the main features of database systems.
Hiding irrelevant details from users and providing them with an abstract view of data makes user-database interaction easy and efficient.
In the three-level DBMS architecture, the top level is the “view level”. The view level provides the “view of data” to users and hides irrelevant details such as data relationships, the database schema, constraints and security from them.

Data Abstraction in DBMS:

Database systems are made up of complex data structures. To ease user interaction with the database, developers hide internal, irrelevant details from users. This process of hiding irrelevant details from users is called data abstraction.
We have three levels of abstraction:
Physical level: This is the lowest level of data abstraction. It describes how data is actually stored in the database. The complex data structure details are available at this level.
Logical level: This is the middle level of the three-level data abstraction architecture. It describes what data is stored in the database.
View level: This is the highest level of data abstraction. It describes the user's interaction with the database system.
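One way to see the view level in practice is a SQL view. The sketch below is an illustration under stated assumptions, using SQLite through Python's sqlite3 module: the employee table plays the logical level, the hypothetical employee_directory view is what one group of users sees, and the physical level (how SQLite lays the rows out in pages on disk) is never exposed at all.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Logical level: the full table, including a column some users should not see.
conn.execute("CREATE TABLE employee (emp_code TEXT PRIMARY KEY, ename TEXT, dept TEXT, salary INTEGER)")
conn.executemany(
    "INSERT INTO employee VALUES (?, ?, ?, ?)",
    [("e101", "Sumit", "Sales", 40000), ("e102", "Priya", "HR", 45000)],
)

# View level: a view exposes only the columns this group of users needs.
conn.execute("CREATE VIEW employee_directory AS SELECT emp_code, ename, dept FROM employee")

cur = conn.execute("SELECT * FROM employee_directory ORDER BY emp_code")
columns = [d[0] for d in cur.description]  # column names as seen through the view
rows = cur.fetchall()
print(columns)  # ['emp_code', 'ename', 'dept']
print(rows)     # [('e101', 'Sumit', 'Sales'), ('e102', 'Priya', 'HR')]
```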

Instance and schema in DBMS:

• Definition of schema: The design of a database is called its schema. There are three types of schema: physical schema, logical schema and view schema.
• The design of a database at the physical level is called the physical schema; it describes how the data is stored in blocks of storage.
• The design of a database at the logical level is called the logical schema. Programmers and database administrators work at this level. Here data can be described as certain types of data records stored in data structures; however, internal details such as the implementation of those data structures are hidden at this level (they are available at the physical level).
• The design of a database at the view level is called the view schema. It generally describes end-user interaction with the database system.
Definition of instance: The data stored in a database at a particular moment in time is called an instance of the database. The database schema defines the variable declarations in the tables that belong to a particular database; the values of these variables at a moment in time are called the instance of that database.
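The schema/instance distinction can be observed directly. In this sketch (SQLite via Python's sqlite3 module, with a hypothetical student table), PRAGMA table_info reports the schema, a SELECT reports the instance, and inserting a row changes only the latter.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (roll_no INTEGER PRIMARY KEY, name TEXT, address TEXT)")

def schema(conn):
    # PRAGMA table_info yields (cid, name, type, notnull, dflt_value, pk) per column.
    return [(col[1], col[2]) for col in conn.execute("PRAGMA table_info(student)")]

def instance(conn):
    # The rows stored at this moment.
    return conn.execute("SELECT * FROM student ORDER BY roll_no").fetchall()

schema_before, instance_before = schema(conn), instance(conn)
conn.execute("INSERT INTO student VALUES (1, 'Asha', 'Pune')")
schema_after, instance_after = schema(conn), instance(conn)

print(schema_before == schema_after)    # True: the design did not change
print(instance_before, instance_after)  # [] [(1, 'Asha', 'Pune')]
```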

What is Relational Model?

The Relational Model (RM) represents the database as a collection of relations. A relation is nothing but a table of values. Every row in the table represents a collection of related data values, and these rows denote a real-world entity or relationship.
The table name and column names help to interpret the meaning of the values in each row. The data are represented as a set of relations; in the relational model, data are stored as tables. However, the physical storage of the data is independent of the way the data are logically organized.
Relational Model Concepts:
1. Attribute: Each column in a table. Attributes are the properties that define a relation, e.g., Student_Rollno, NAME, etc.
2. Tables: In the relational model, relations are saved in table format, stored along with their entities. A table has two properties, rows and columns: rows represent records and columns represent attributes.
3. Tuple: A single row of a table, which contains a single record.
4. Relation Schema: A relation schema represents the name of the relation together with its attributes.
5. Degree: The total number of attributes in the relation is called the degree of the relation.
6. Cardinality: The total number of rows present in the table.
7. Column: A column represents the set of values for a specific attribute.
8. Relation instance: A relation instance is a finite set of tuples in the RDBMS. Relation instances never have duplicate tuples.
9. Relation key: Each row has one or more attributes whose values identify it uniquely; these are called the relation key.
10. Attribute domain: Every attribute has a predefined set of values and scope, known as the attribute domain.
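Degree and cardinality are easy to compute for a concrete table. The following sketch assumes SQLite via Python's sqlite3 module and a hypothetical student relation with three attributes and four tuples.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (student_rollno INTEGER PRIMARY KEY, name TEXT, age INTEGER)")
conn.executemany(
    "INSERT INTO student VALUES (?, ?, ?)",
    [(1, "Asha", 20), (2, "Ravi", 21), (3, "Meena", 19), (4, "John", 22)],
)

# Degree: number of attributes (columns) in the relation schema.
degree = len(conn.execute("PRAGMA table_info(student)").fetchall())
# Cardinality: number of tuples (rows) in the current relation instance.
(cardinality,) = conn.execute("SELECT COUNT(*) FROM student").fetchone()

print(degree, cardinality)  # 3 4
```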

Keys in DBMS:
KEYS in DBMS are attributes or sets of attributes that help you identify a row (tuple) in a relation (table). They also allow you to find the relation between two tables. Keys help you uniquely identify a row in a table by a combination of one or more columns in that table, and they are helpful for finding a unique record or row in the table.
Why do we need a key?
Here are some reasons for using keys in a DBMS:
• Keys help you identify any row of data in a table. In a real-world application, a table could contain thousands of records, and the records could be duplicated. Keys ensure that you can uniquely identify a table record despite these challenges.
• They allow you to establish relationships between tables and identify how the tables are related.
• They help you enforce identity and integrity in the relationship.

Types of Keys in Database Management System:

There are mainly eight different types of keys in DBMS, and each key has its own functionality:

• Super Key: A group of one or more keys (attributes) that uniquely identifies rows in a table.
• Primary Key: A column or group of columns in a table that uniquely identifies every row in that table.
• Candidate Key: A set of attributes that uniquely identifies tuples in a table. A candidate key is a minimal super key, i.e., a super key with no redundant attributes.
• Alternate Key: A candidate key that is not chosen as the primary key; it also uniquely identifies every row in the table.
• Foreign Key: A column that creates a relationship between two tables. The purpose of foreign keys is to maintain data integrity and allow navigation between two different instances of an entity.
• Compound Key: Has two or more attributes that together allow you to uniquely recognize a specific record. Each column may not be unique by itself within the database.
• Composite Key: A combination of two or more columns that uniquely identifies rows in a table. The combination of columns guarantees uniqueness, though the individual columns do not.
• Surrogate Key: An artificial key that aims to uniquely identify each record. Such keys are created when there is no natural primary key.
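The definitions of super key and candidate key can be expressed as a small check over a relation instance. This is a plain-Python sketch; the attributes and rows are made up for illustration. Keep in mind that keys are really a property of the schema: sample data can prove that an attribute set is not a key, but a set that happens to be unique in one instance is a key only if the design guarantees it for every instance.

```python
from itertools import combinations

# A small, hypothetical relation instance.
ATTRS = ("roll_no", "email", "name", "branch")
ROWS = [
    (1, "asha@example.com", "Asha", "CSE"),
    (2, "ravi@example.com", "Ravi", "ECE"),
    (3, "asha.k@example.com", "Asha", "CSE"),
]

def is_super_key(attrs, rows, key):
    """True if no two rows agree on every attribute in `key`."""
    idx = [attrs.index(a) for a in key]
    projected = [tuple(row[i] for i in idx) for row in rows]
    return len(set(projected)) == len(projected)

def candidate_keys(attrs, rows):
    """Minimal super keys: super keys with no proper subset that is also a super key."""
    supers = [set(k)
              for n in range(1, len(attrs) + 1)
              for k in combinations(attrs, n)
              if is_super_key(attrs, rows, k)]
    return sorted(tuple(a for a in attrs if a in k)
                  for k in supers
                  if not any(other < k for other in supers))

print(is_super_key(ATTRS, ROWS, ("roll_no", "name")))  # True: a super key, but not minimal
print(is_super_key(ATTRS, ROWS, ("name", "branch")))   # False
print(candidate_keys(ATTRS, ROWS))                     # [('email',), ('roll_no',)]
```

Here roll_no and email are the candidate keys; if roll_no is chosen as the primary key, email becomes an alternate key.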

Primary key example:

CREATE TABLE Persons (
    ID int NOT NULL,
    LastName varchar(255) NOT NULL,
    FirstName varchar(255),
    Age int,
    PRIMARY KEY (ID)
);
Create Primary Key (ALTER TABLE statement):

Syntax
The syntax to create a primary key using the ALTER TABLE statement in SQL is:

ALTER TABLE table_name
ADD CONSTRAINT constraint_name
PRIMARY KEY (column1, column2, ... column_n);

FOREIGN KEY on CREATE TABLE

The following SQL creates a FOREIGN KEY on the "PersonID" column when the "Orders" table is created; it references ID, the primary key of the Persons table defined above:
CREATE TABLE Orders (
    OrderID int NOT NULL,
    OrderNumber int NOT NULL,
    PersonID int,
    PRIMARY KEY (OrderID),
    FOREIGN KEY (PersonID) REFERENCES Persons(ID)
);
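The effect of the foreign key can be checked by running the two table definitions. The sketch below assumes SQLite via Python's sqlite3 module, which enforces foreign keys only after PRAGMA foreign_keys = ON is issued on the connection (other DBMSs may enforce them by default). The Orders table references Persons(ID), the primary key of Persons, and the sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite checks foreign keys only when this pragma is on for the connection.
conn.execute("PRAGMA foreign_keys = ON")
conn.execute("""CREATE TABLE Persons (
    ID int NOT NULL,
    LastName varchar(255) NOT NULL,
    FirstName varchar(255),
    Age int,
    PRIMARY KEY (ID)
)""")
conn.execute("""CREATE TABLE Orders (
    OrderID int NOT NULL,
    OrderNumber int NOT NULL,
    PersonID int,
    PRIMARY KEY (OrderID),
    FOREIGN KEY (PersonID) REFERENCES Persons(ID)
)""")

conn.execute("INSERT INTO Persons VALUES (1, 'Sharma', 'Sumit', 30)")
conn.execute("INSERT INTO Orders VALUES (10, 7001, 1)")  # refers to an existing person: accepted

rejected = False
try:
    conn.execute("INSERT INTO Orders VALUES (11, 7002, 99)")  # no person with ID 99
except sqlite3.IntegrityError as exc:
    rejected = True
    print("rejected:", exc)
```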

ER model:
o ER model stands for Entity-Relationship model. It is a high-level data model used to define the data elements and relationships for a specified system.
o It develops a conceptual design for the database, and it gives a very simple and easy-to-design view of data.
o In ER modeling, the database structure is portrayed as a diagram called an entity-relationship diagram.

For example, suppose we design a school database. In this database, the student will be an entity with attributes like address, name, id, age, etc. The address can be another entity with attributes like city, street name, pin code, etc., and there will be a relationship between them.
Components of ER Diagram:

1. Entity:
An entity may be any object, class, person or place. In an ER diagram, an entity is represented as a rectangle.
Consider an organization as an example: manager, product, employee, department, etc. can be taken as entities.

Weak Entity

An entity that depends on another entity is called a weak entity. A weak entity doesn't contain any key attribute of its own. A weak entity is represented by a double rectangle.
2. Attribute

An attribute is used to describe a property of an entity. An ellipse is used to represent an attribute.
For example, id, age, contact number, name, etc. can be attributes of a student.

Key Attribute

The key attribute is used to represent the main characteristics of an entity. It represents a primary key.
The key attribute is represented by an ellipse with the text underlined.

Composite Attribute

An attribute that is composed of many other attributes is known as a composite attribute. A composite attribute is represented by an ellipse, and the ellipses of its component attributes are connected to it.

Multivalued Attribute

An attribute that can have more than one value is known as a multivalued attribute. A double oval is used to represent a multivalued attribute.
For example, a student can have more than one phone number.
Derived Attribute
An attribute that can be derived from another attribute is known as a derived attribute. It is represented by a dashed ellipse.
For example, a person's age changes over time and can be derived from another attribute, such as date of birth.

3. Relationship
A relationship is used to describe the relation between entities. A diamond or rhombus is used to represent a relationship.

Types of relationship are as follows:

One-to-One Relationship
When one instance of an entity is associated with only one instance of another entity through the relationship, it is known as a one-to-one relationship.

For example, a female can marry one male, and a male can marry one female.
One-to-many relationship
When one instance of the entity on the left is associated with more than one instance of the entity on the right through the relationship, it is known as a one-to-many relationship.
For example, a scientist can invent many inventions, but each invention is made by only one specific scientist.

Many-to-one relationship
When more than one instance of the entity on the left is associated with only one instance of the entity on the right, it is known as a many-to-one relationship.
For example, a student enrolls in only one course, but a course can have many students.

Many-to-many relationship
When more than one instance of the entity on the left is associated with more than one instance of the entity on the right, it is known as a many-to-many relationship.

For example, an employee can be assigned to many projects, and a project can have many employees.
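When an ER diagram is converted into tables, a many-to-many relationship usually becomes a table of its own. The sketch below shows one common mapping, assuming SQLite via Python's sqlite3 module; the employee, project and works_on tables are hypothetical. The pair (emp_id, proj_id) forms a composite primary key of the relationship table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE employee (emp_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE project  (proj_id INTEGER PRIMARY KEY, title TEXT);
-- The many-to-many relationship becomes its own table whose primary key
-- combines the keys of both participating entities.
CREATE TABLE works_on (
    emp_id  INTEGER REFERENCES employee(emp_id),
    proj_id INTEGER REFERENCES project(proj_id),
    PRIMARY KEY (emp_id, proj_id)
);
INSERT INTO employee VALUES (1, 'Asha'), (2, 'Ravi');
INSERT INTO project  VALUES (10, 'Payroll'), (20, 'Website');
INSERT INTO works_on VALUES (1, 10), (1, 20), (2, 10);
""")

# One employee, many projects.
projects_of_asha = [t for (t,) in conn.execute(
    "SELECT p.title FROM works_on w JOIN project p ON p.proj_id = w.proj_id "
    "WHERE w.emp_id = 1 ORDER BY p.title")]
# One project, many employees.
staff_on_payroll = [n for (n,) in conn.execute(
    "SELECT e.name FROM works_on w JOIN employee e ON e.emp_id = w.emp_id "
    "WHERE w.proj_id = 10 ORDER BY e.name")]

print(projects_of_asha)  # ['Payroll', 'Website']
print(staff_on_payroll)  # ['Asha', 'Ravi']
```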

Notation of ER diagram:
A database design can be represented using these notations. In an ER diagram, many notations are used to express cardinality. These notations are as follows:
Integrity Constraints
o Integrity constraints are a set of rules used to maintain the quality of information.
o Integrity constraints ensure that data insertion, updating and other processes are performed in such a way that data integrity is not affected.
o Thus, integrity constraints are used to guard against accidental damage to the database.

Types of Integrity Constraints:


1. Domain constraints
o Domain constraints can be defined as the definition of a valid set of values for an attribute.
o The data types of a domain include string, character, integer, time, date, currency, etc. The value of the attribute must be available in the corresponding domain.

Example:
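A minimal sketch of a domain constraint, assuming SQLite via Python's sqlite3 module and a hypothetical person table: a CHECK clause restricts age to non-negative values, so an insert outside the domain is rejected. Note that SQLite does not strictly enforce declared column types unless a table is declared STRICT, so here the CHECK clause is what actually guards the domain.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# The domain of `age` is restricted to non-negative values with a CHECK clause.
conn.execute("""CREATE TABLE person (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL,
    age  INTEGER CHECK (age >= 0)
)""")

conn.execute("INSERT INTO person (id, name, age) VALUES (1, 'Asha', 21)")  # inside the domain

rejected = False
try:
    conn.execute("INSERT INTO person (id, name, age) VALUES (2, 'Ravi', -5)")  # outside it
except sqlite3.IntegrityError as exc:
    rejected = True
    print("rejected:", exc)
```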

2. Entity integrity constraints

o The entity integrity constraint states that a primary key value can't be null.
o This is because the primary key value is used to identify individual rows in a relation, and if the primary key has a null value, we can't identify those rows.
o A table can contain null values in fields other than the primary key.
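The entity integrity rule can be seen in action with the sketch below (SQLite via Python's sqlite3 module; the student table is hypothetical). NOT NULL is written out explicitly because SQLite, for historical reasons, allows NULL in most PRIMARY KEY columns unless it is declared, whereas standard SQL implies it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# NOT NULL is spelled out: SQLite does not imply it for a TEXT primary key.
conn.execute("CREATE TABLE student (roll_no TEXT NOT NULL PRIMARY KEY, name TEXT, phone TEXT)")

conn.execute("INSERT INTO student VALUES ('CS01', 'Asha', NULL)")  # NULL outside the key: allowed

null_key_rejected = False
try:
    conn.execute("INSERT INTO student VALUES (NULL, 'Ravi', '98450')")  # NULL key: not allowed
except sqlite3.IntegrityError as exc:
    null_key_rejected = True
    print("rejected:", exc)
```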

3. Referential Integrity Constraints


A referential integrity constraint is specified between two tables.
In a referential integrity constraint, if a foreign key in Table 1 refers to the primary key of Table 2, then every value of the foreign key in Table 1 must either be null or be available in Table 2.
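The referential integrity rule can be written as a small check. This plain-Python sketch uses hypothetical department (Table 2) and employee (Table 1) rows stored as dictionaries, and reports every foreign-key value that is neither null (None) nor present in the referenced primary key.

```python
def referential_violations(child_rows, fk, parent_rows, pk):
    """Return child rows whose foreign-key value is neither None (NULL)
    nor present among the parent table's primary-key values."""
    parent_keys = {row[pk] for row in parent_rows}
    return [row for row in child_rows
            if row[fk] is not None and row[fk] not in parent_keys]

# Table 2 (referenced) and Table 1 (referencing); the data is illustrative.
department = [{"dept_no": 10, "dname": "Sales"}, {"dept_no": 20, "dname": "HR"}]
employee = [
    {"emp": "e101", "dept_no": 10},    # matches a department: OK
    {"emp": "e102", "dept_no": None},  # NULL foreign key: allowed
    {"emp": "e103", "dept_no": 30},    # no department 30: violation
]

violations = referential_violations(employee, "dept_no", department, "dept_no")
print(violations)  # [{'emp': 'e103', 'dept_no': 30}]
```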
4. Key constraints
o Keys are attributes (or sets of attributes) used to uniquely identify an entity within its entity set.
o An entity set can have multiple keys, but one of them is chosen as the primary key. A primary key must contain unique values and cannot contain null values in the relational table.
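Key constraints are enforced by the DBMS at insert time. In this sketch (SQLite via Python's sqlite3 module, hypothetical student table), roll_no is the primary key and email is a second candidate key declared UNIQUE; any row that would repeat either value is refused.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# roll_no is the primary key; email is another candidate key, enforced with UNIQUE.
conn.execute("""CREATE TABLE student (
    roll_no INTEGER NOT NULL PRIMARY KEY,
    email   TEXT NOT NULL UNIQUE,
    name    TEXT
)""")
conn.execute("INSERT INTO student VALUES (1, 'asha@example.com', 'Asha')")

attempts = [
    (1, "ravi@example.com", "Ravi"),  # repeats the primary key
    (2, "asha@example.com", "Ravi"),  # repeats the candidate key
    (2, "ravi@example.com", "Ravi"),  # unique on both keys
]
accepted = []
for row in attempts:
    try:
        conn.execute("INSERT INTO student VALUES (?, ?, ?)", row)
        accepted.append(row)
    except sqlite3.IntegrityError:
        pass  # the DBMS refuses rows that would make a key value repeat

print(accepted)  # [(2, 'ravi@example.com', 'Ravi')]
```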

Database Basics:
Data item:
The data item, also called a field in data processing, is the smallest unit of data that has meaning to its users.
Eg: “e101”, “sumit”

Entities and attributes:


An entity is a thing or object in the real world that is distinguishable from all other objects.
Eg: Bank, employee, student
Attributes are properties of an entity.
Eg: Empcode, ename, rollno, name
Logical data and physical data:
Logical data is the data of the tables created by the user in primary memory. Physical data refers to the data stored in secondary memory.

Schema and sub-schema:


A schema is a logical database description and is drawn as a chart of the types of data that are used. It gives the
names of the entities and attributes and specifies the relationships between them.

A database schema includes such information as:

• Characteristics of data items such as entities and attributes.
• Logical structures and relationships among these data items.
• Format for storage representation.
• Integrity parameters such as physical authorization and backup policies.

A subschema is a schema derived from an existing schema according to user requirements. There may be more than one subschema for a single conceptual schema.

Three-level architecture of DBMS:

External level:    View (User 1) | View (User 2) | ... | View (User n)
                       ↓ mapping supplied by DBMS
Conceptual level:  Conceptual view
                       ↓ physical mapping supplied by DBMS/OS
Internal level
A database management system that provides three levels of data abstraction is said to follow the three-level architecture:

• External level
• Conceptual level
• Internal level

External level:
The external level is the highest level of database abstraction. At this level there will be many views defined for different users' requirements. A view describes only a subset of the database, and any number of user views may exist for a given global schema.
For example, each student has a different view of the timetable: the view of a BTech (CSE) student is different from that of a BTech (ECE) student. Thus this level of abstraction is concerned with different categories of users. Each external view is described by means of a schema called an external schema or subschema.

Conceptual level:
At this level of database abstraction, all the database entities and the relationships among them are included. One conceptual view represents the entire database, and it is defined by the conceptual schema. The conceptual schema hides the details of physical storage structures and concentrates on describing entities, data types, relationships, user operations and constraints.
It describes all the records and relationships included in the conceptual view. There is only one conceptual schema per database. It includes features that specify checks to retain data consistency and integrity.

Internal level:
It is the lowest level of abstraction, closest to the physical storage method used. It indicates how the data will be stored and describes the data structures and access methods to be used by the database. The internal view is expressed by the internal schema.
The following aspects are considered at this level:
1. Storage allocation, e.g. B-trees, hashing
2. Access paths, e.g. specification of primary and secondary keys, indexes, etc.
3. Miscellaneous, e.g. data compression and encryption techniques, optimization of the internal structures.

Database users:
Naive users:
Users who need not be aware of the presence of the database system or any other system supporting their usage are considered naïve users. A user of an automatic teller machine falls into this category.

Online users:
These are users who may communicate with the database directly via an online terminal or indirectly via a user interface and application program. These users are aware of the database system and also know the data manipulation language.

Application programmers:
Professional programmers who are responsible for developing the application programs or user interfaces used by naïve and online users fall into this category.

Database Administration:
A person who has central control over the system is called the database administrator (DBA).
The functions of the DBA are:
1. Creation and modification of the conceptual schema definition
2. Implementation of storage structures and access methods
3. Schema and physical organization modifications
4. Granting of authorization for data access
5. Specification of integrity constraints
6. Execution of immediate recovery procedures in case of failures
7. Ensuring the physical security of the database

Database languages:

1) Data definition language (DDL):

DDL is used to define database objects. The conceptual schema is specified by a set of definitions expressed in this language. It also gives some details about how to implement this schema on the physical devices used to store the data. The definitions include all the entity sets, their associated attributes and their relationships. The result of compiling DDL statements is a set of tables stored in a special file called the data dictionary.
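A small illustration of DDL and the data dictionary, assuming SQLite through Python's sqlite3 module. SQLite keeps its catalog in a built-in table called sqlite_master, which plays the role of the data dictionary here; other DBMSs use different catalogs (for example, INFORMATION_SCHEMA views in several systems). The table, index and view names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# DDL statements define database objects...
conn.execute("CREATE TABLE student (roll_no INTEGER PRIMARY KEY, name TEXT, branch TEXT)")
conn.execute("CREATE INDEX idx_student_branch ON student(branch)")
conn.execute("CREATE VIEW cse_students AS SELECT roll_no, name FROM student WHERE branch = 'CSE'")

# ...and SQLite records each definition in its catalog table, sqlite_master.
catalog = conn.execute("SELECT type, name FROM sqlite_master ORDER BY type, name").fetchall()
print(catalog)
# [('index', 'idx_student_branch'), ('table', 'student'), ('view', 'cse_students')]
```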

2) Data manipulation language (DML):

A DML is a language that enables users to access or manipulate data stored in the database. Data manipulation involves retrieving data from the database, inserting new data into the database, and deleting or modifying existing data.

There are basically two types of DML:

• Procedural: requires a user to specify what data is needed and how to get it.
• Non-procedural: requires a user to specify what data is needed without specifying how to get it.
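The procedural/non-procedural distinction can be illustrated by computing the same answer two ways. In this sketch the Python loop stands in for a procedural style (it spells out how to find the rows), while the SQL query is non-procedural (it states only what is wanted and leaves the access strategy to the DBMS). SQLite via Python's sqlite3 module and the student data are assumptions for illustration.

```python
import sqlite3

rows = [(1, "Asha", "CSE", 82), (2, "Ravi", "ECE", 74), (3, "Meena", "CSE", 91)]

# Procedural style: the program spells out HOW to find the answer, step by step.
def cse_toppers_procedural(rows, min_marks):
    result = []
    for _roll_no, name, branch, marks in rows:
        if branch == "CSE" and marks >= min_marks:
            result.append(name)
    result.sort()
    return result

# Non-procedural style: SQL states WHAT is wanted; the DBMS decides how to get it.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (roll_no INTEGER PRIMARY KEY, name TEXT, branch TEXT, marks INTEGER)")
conn.executemany("INSERT INTO student VALUES (?, ?, ?, ?)", rows)
declarative = [name for (name,) in conn.execute(
    "SELECT name FROM student WHERE branch = 'CSE' AND marks >= ? ORDER BY name", (80,))]

procedural = cse_toppers_procedural(rows, 80)
print(procedural, declarative)  # ['Asha', 'Meena'] ['Asha', 'Meena']
```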

3) Data control language (DCL):

This language enables users to grant and revoke (cancel) authorization on database objects.

Elements of DBMS:

DML precompiler:
It converts DML statements embedded in an application program into normal procedure calls in the host language. The precompiler must interact with the query processor in order to generate the appropriate code.
DDL compiler:
The DDL compiler converts data definition statements into a set of tables. These tables contain information about the database in a form that can be used by other components of the DBMS.
File manager:
The file manager manages the allocation of space on disk storage and the data structures used to represent information stored on disk.
Database manager:
A database manager is a program module which provides the interface between the low level data stored in the
database and the application programs and queries submitted to the system.
The responsibilities of database manager are:
1. Interaction with file manager: The data is stored on disk using the file system provided by the operating
system. The database manager translates the various DML statements into low-level file-system commands, so it
is responsible for the actual storing, retrieving and updating of data in the database.
2. Integrity enforcement: The data values stored in the database must satisfy certain constraints (e.g., the age
of a person can't be less than zero). These constraints are specified by the DBA. The database manager checks the
constraints and stores the data only if they are satisfied.
3. Security enforcement: The database manager enforces the security measures that protect the database from
unauthorized users.
4. Backup and recovery: The database manager detects failures that occur for various reasons (such as disk
failure, power failure, deadlock or software errors) and restores the database to the consistent state it was in
before the failure.
5. Concurrency control: When several users access the same database file simultaneously, the data may become
inconsistent. The database manager is responsible for controlling the problems that arise from concurrent
transactions.
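The integrity-enforcement and recovery duties above can be observed in a small experiment. This sketch assumes SQLite (via Python's `sqlite3`) as the DBMS. The `person` table, the `try_insert` helper and the simulated failure are illustrative inventions. A `CHECK` constraint plays the role of the DBA-specified rule "age can't be less than zero", and a rolled-back transaction stands in for restoring the consistent state after a failure.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Integrity constraint of the kind a DBA specifies: age can't be negative.
conn.execute(
    "CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT NOT NULL, age INTEGER CHECK (age >= 0))")
conn.execute("INSERT INTO person VALUES (1, 'Asha', 30)")
conn.commit()

def try_insert(pid, name, age):
    """Return True if the DBMS accepted the row, False if a constraint rejected it."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute("INSERT INTO person VALUES (?, ?, ?)", (pid, name, age))
        return True
    except sqlite3.IntegrityError:
        return False

accepted = try_insert(2, "Ravi", 25)
rejected = try_insert(3, "Bad", -5)   # violates CHECK (age >= 0)

# Recovery-style behaviour: a failure before commit leaves no partial update behind.
try:
    with conn:
        conn.execute("UPDATE person SET age = age + 1 WHERE id = 1")
        raise RuntimeError("simulated failure before commit")
except RuntimeError:
    pass
age_after_failure = conn.execute("SELECT age FROM person WHERE id = 1").fetchone()[0]

print(accepted, rejected, age_after_failure)  # True False 30
```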

Query processor:
The query processor interprets an online user's query and converts it into an efficient series of operations in a
form that can be sent to the database manager for execution. The query processor uses the data dictionary to find
the details of the data files, and uses this information to create a query plan (access plan) for executing the query.
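Most DBMSs let you inspect the access plan that the query processor produces. In SQLite (used here as an assumed example DBMS, through Python's `sqlite3`), prefixing a query with `EXPLAIN QUERY PLAN` returns the plan instead of the query result. The table and index names below are hypothetical, and the exact wording of the plan text varies between SQLite versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (rollno INTEGER PRIMARY KEY, name TEXT, city TEXT)")
conn.execute("CREATE INDEX idx_student_city ON student (city)")

query = "SELECT name FROM student WHERE city = ?"
# Ask the query processor for its access plan instead of running the query.
# The last column of each returned row is a human-readable description of one step.
plan = conn.execute("EXPLAIN QUERY PLAN " + query, ("Hyderabad",)).fetchall()
for row in plan:
    print(row[-1])   # e.g. a SEARCH step that uses idx_student_city
```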

Data Dictionary:
The data dictionary is a table (or set of tables) that contains information about database objects. It contains
information such as:
1. External, conceptual and internal database descriptions
2. Descriptions of entities and attributes, as well as the meaning of data elements
3. Synonyms, authorization and security codes
4. Database authorization

The data stored in the data dictionary is called metadata.
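As an illustration of metadata, SQLite (assumed here as the example DBMS) keeps its data dictionary in the `sqlite_master` catalog table, and `PRAGMA table_info` reports column-level metadata. The `department` table and `dept_names` view are hypothetical. Other systems expose similar information under different names, for example the SQL-standard `INFORMATION_SCHEMA` views.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE department (dno INTEGER PRIMARY KEY, dname TEXT NOT NULL)")
conn.execute("CREATE VIEW dept_names AS SELECT dname FROM department")

# sqlite_master is SQLite's catalog: one row per table, index, view or trigger.
objects = conn.execute("SELECT type, name FROM sqlite_master ORDER BY name").fetchall()

# PRAGMA table_info rows are (cid, name, type, notnull, default, pk);
# keep the column name, declared type, NOT NULL flag and primary-key flag.
columns = [(r[1], r[2], bool(r[3]), bool(r[5]))
           for r in conn.execute("PRAGMA table_info(department)")]

print(objects)   # [('table', 'department'), ('view', 'dept_names')]
print(columns)   # [('dno', 'INTEGER', False, True), ('dname', 'TEXT', True, False)]
```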

DBMS STRUCTURE:

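[Figure: DBMS structure. Naïve users, application programmers, online users and the DBA work through application programs (and their object code), system calls, the DML pre-compiler, the query processor and the DDL compiler. These pass requests to the database manager, which uses the file manager to access the data files and the data dictionary.]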

Q. List four significant differences between a file-processing system and a DBMS.

Some main differences between a database management system and a file-processing system are:
• Both systems contain a collection of data and a set of programs that access that data. A database
management system coordinates both the physical and the logical access to the data, whereas a file-processing
system coordinates only the physical access.
• A database management system reduces the amount of data duplication by ensuring that a physical piece of
data is available to all programs authorized to access it, whereas data written by one program in a file-
processing system may not be readable by another program.
• A database management system is designed to allow flexible access to data (i.e., queries), whereas a file-
processing system is designed to allow predetermined access to data (i.e., compiled programs).
• A database management system is designed to coordinate multiple users accessing the same data at the
same time. A file-processing system is usually designed to allow one or more programs to access different data
files at the same time. In a file-processing system, a file can be accessed by two programs concurrently only if
both programs have read-only access to the file.

Q. Explain the difference between physical and logical data independence.


• Physical data independence is the ability to modify the physical scheme without having to rewrite
application programs. Such modifications include changing from unblocked to blocked record storage, or from
sequential to random-access files.
• Logical data independence is the ability to modify the conceptual scheme without making it necessary to
rewrite application programs. Such a modification might be adding a field to a record; an application program’s
view hides this change from the program.
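Both kinds of independence can be sketched in SQLite (an assumed example DBMS, accessed via Python's `sqlite3`). Here a view plays the part of the application's external schema. Adding an index changes the physical level, adding a column changes the conceptual level, and the application's query text stays the same. The table, view and `application_query` function are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (cid INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO customer VALUES (1, 'Asha')")
# External level: the application only ever queries this view.
conn.execute("CREATE VIEW customer_v AS SELECT cid, name FROM customer")

def application_query():
    """Stands in for an application program whose code never changes."""
    return conn.execute("SELECT cid, name FROM customer_v").fetchall()

before = application_query()

# Physical change: add an access path (index). The application is untouched.
conn.execute("CREATE INDEX idx_customer_name ON customer (name)")
# Logical change: add a field to the conceptual schema. The view hides it.
conn.execute("ALTER TABLE customer ADD COLUMN phone TEXT")

after = application_query()
print(before, after)  # [(1, 'Asha')] [(1, 'Asha')]
```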

Q. List five responsibilities of a database management system. For each responsibility, explain the
problems that would arise if the responsibility were not discharged.
A general purpose database manager (DBM) has five responsibilities:
a. Interaction with the file manager.
b. Integrity enforcement.
c. Security enforcement.
d. Backup and recovery.
e. Concurrency control.
If these responsibilities were not met by a given DBM (the text points out that a responsibility is sometimes
omitted by design, such as concurrency control in a single-user DBM for a microcomputer), the following
problems can occur, respectively:

a. No DBMS can do without this: if there is no interaction with the file manager, nothing stored in the files can
be retrieved.
b. Consistency constraints may not be satisfied: account balances could go below the minimum allowed,
employees could earn too much overtime (e.g., hours > 80), or airline pilots may fly more hours than allowed by
law.
c. Unauthorized users may access the database, or users authorized to access part of the database may be able
to access parts of the database for which they lack authority. For example, a high school student could get
access to national defense secret codes, or employees could find out what their supervisors earn.
d. Data could be lost permanently, rather than at least being available in a consistent state that existed prior to
a failure.
e. Consistency constraints may be violated despite proper integrity enforcement in each transaction. For
example, incorrect bank balances might be reflected due to simultaneous withdrawals and deposits, and so on.

Q. What are five main functions of a database administrator?

Five main functions of a database administrator are:


• To create the scheme definition
• To define the storage structure and access methods
• To modify the scheme and/or physical organization when necessary
• To grant authorization for data access
• To specify integrity constraints
Q. List six major steps that you would take in setting up a database for a particular enterprise.
Six major steps in setting up a database for a particular enterprise are:
• Define the high-level requirements of the enterprise (this step generates a document known as the system
requirements specification).
• Define a model containing all appropriate types of data and data relationships.
• Define the integrity constraints on the data.
• Define the physical level.
• For each known problem to be solved on a regular basis (e.g., tasks to be carried out by clerks or Web
users), define a user interface to carry out the task, and write the necessary application programs to
implement the user interface.
• Create/initialize the database.

EXERCISES:

1. What is a database management system?
2. What are the disadvantages of a file-processing system?
3. State the advantages and disadvantages of a database management system.
4. What are the different types of database users?
5. What is a data dictionary, and what are its contents?
6. What are the functions of a DBA?
7. What are the different database languages? Explain with examples.
8. Explain the three-layer architecture of a DBMS.
9. Differentiate between physical data independence and logical data independence.
10. Explain the functions of the database manager.
11. Explain metadata.

ER-MODEL
Data model:
A data model describes the structure of a database. It is a collection of conceptual tools for describing data,
data relationships and consistency constraints. The main types of data model are:
• Object-based logical models
o ER model
o Functional model
o Object-oriented model
o Semantic model
• Record-based logical models
o Hierarchical model
o Network model
o Relational model
• Physical models

The entity-relationship data model perceives the real world as consisting of basic objects, called entities, and
relationships among these objects. It was developed to facilitate database design by allowing the specification of
an enterprise schema, which represents the overall logical structure of a database.

Main features of ER-MODEL:


• The entity-relationship model is a high-level conceptual model.
• It allows us to describe the data involved in a real-world enterprise in terms of objects and their relationships.
• It is widely used to develop an initial design of a database.
• It provides a set of useful concepts that make it convenient for a developer to move from a basic set of
information to a detailed and precise description of information that can be easily implemented in a database system.
• It describes data as a collection of entities, relationships and attributes.

The E-R data model employs three basic notions: entity sets, relationship sets and attributes.

Entity sets:
An entity is a “thing” or “object” in the real world that is distinguishable from all other objects. For example,
each person in an enterprise is an entity. An entity has a set of properties, and the values for some set of
properties may uniquely identify an entity.
For example, BOOK is an entity whose properties (called attributes) are bookcode, booktitle, price, etc.
An entity set is a set of entities of the same type that share the same properties, or attributes. For example, the
set of all persons who are customers at a given bank can be defined as the entity set customer.

Attributes:
An entity is represented by a set of attributes. Attributes are descriptive properties possessed by each member of
an entity set.

For example, Customer is an entity and its attributes are customerid, customername, custaddress, etc.
An attribute, as used in the E-R model, can be characterized by the following attribute types.

Simple and composite attribute:


Simple attributes are attributes that can't be divided into subparts, e.g., customerid, empno.
Composite attributes are attributes that can be divided into subparts, e.g., name, consisting of first name,
middle name and last name; or address, consisting of city, pincode and state.

Single-valued and multi-valued attribute:


An attribute that holds exactly one value for a given entity is a single-valued attribute, e.g., empno, customerid, regdno.
An attribute that can hold more than one value for a given entity is a multi-valued attribute, e.g., phone-no, dependent name, vehicle.

Derived Attribute:
The values of this type of attribute can be derived from the values of existing attributes, e.g., age, which can be
derived as (currentdate - birthdate), or experience_in_years, which can be calculated as (currentdate -
joindate).

NULL valued attribute:


An attribute takes a NULL value when its value is unknown to the user, or when no value applies to that entity.
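The attribute types above map naturally onto a program's data structures. This Python sketch (the `Customer` and `Name` classes and the sample values are hypothetical) models a composite attribute as a nested record, a multi-valued attribute as a list, a NULL-able attribute as `Optional`, and the derived attribute age as a method computed from birthdate rather than stored.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List, Optional

@dataclass
class Name:                          # composite attribute: divisible into parts
    first: str
    middle: Optional[str]            # may be NULL (unknown / not applicable)
    last: str

@dataclass
class Customer:
    customerid: int                  # simple, single-valued attribute
    name: Name                       # composite attribute
    birthdate: date                  # stored attribute that age is derived from
    phone_nos: List[str] = field(default_factory=list)  # multi-valued attribute

    def age(self, today: date) -> int:
        """Derived attribute: computed from birthdate, never stored."""
        had_birthday = (today.month, today.day) >= (self.birthdate.month, self.birthdate.day)
        return today.year - self.birthdate.year - (0 if had_birthday else 1)

c = Customer(101, Name("Ravi", None, "Kumar"), date(2000, 8, 15), ["98480-12345", "040-2345678"])
print(c.age(date(2024, 8, 14)))  # 23 (birthday not yet reached in 2024)
```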

Relationship sets:
A relationship is an association among several entities.
A relationship set is a set of relationships of the same type. Formally, it is a mathematical relation on $n \ge 2$
entity sets. If $E_1, E_2, \ldots, E_n$ are entity sets, then a relationship set $R$ is a subset of
$$\{(e_1, e_2, \ldots, e_n) \mid e_1 \in E_1, e_2 \in E_2, \ldots, e_n \in E_n\}$$
where $(e_1, e_2, \ldots, e_n)$ is a relationship.
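The set definition above can be checked directly with Python sets: a relationship set is any subset of the Cartesian product of its entity sets. The customer names and loan numbers below are made up for illustration.

```python
from itertools import product

customer = {"Asha", "Ravi", "Meena"}      # entity set E1 (hypothetical)
loan = {"L-11", "L-17"}                   # entity set E2 (hypothetical)

all_pairs = set(product(customer, loan))  # E1 x E2: every possible pairing

# The relationship set borrow records only the pairs that actually hold.
borrow = {("Asha", "L-11"), ("Ravi", "L-11"), ("Ravi", "L-17")}

print(borrow <= all_pairs)            # True: borrow is a subset of E1 x E2
print(len(all_pairs), len(borrow))    # 6 3
```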

[Figure: entity sets customer and loan connected by the relationship set borrow.]
Consider the two entity sets customer and loan. We define the relationship set borrow to denote the association
between customers and the bank loans that the customers have.

More about entities and relationships:


Recursive relationships:
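When the same entity type participates more than once in a relationship type in different roles, the relationship
type is called a recursive relationship. For example, in a supervises relationship one employee plays the role of
supervisor and another employee plays the role of subordinate.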

Participation constraints:
The participation constraints specify whether the existence of any entity depends on its being related to another
entity via the relationship. There are two types of participation constraints

Total :
When all the entities of an entity set participate in a relationship type, the participation is called total. For
example, the participation of the entity set student in the relationship set ‘opts’ is total, because every enrolled
student must opt for a course.

Partial:
When it is not necessary for all the entities of an entity set to participate in a relationship type, the participation
is called partial. For example, the participation of the entity set student in ‘represents’ is partial, since not every
student in a class is a class representative.
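A participation constraint is a rule on every legal state of the database, but on a given instance it can at least be checked. The following Python sketch, with hypothetical student and course data and a hypothetical `participation` helper, tests whether every entity appears in some relationship tuple (total) or not (partial).

```python
def participation(entity_set, relationship, position):
    """Return 'total' if every entity in entity_set appears at index `position`
    of some relationship tuple, otherwise 'partial'."""
    participating = {rel[position] for rel in relationship}
    return "total" if set(entity_set) <= participating else "partial"

students = {"S1", "S2", "S3"}
opts = {("S1", "DBMS"), ("S2", "OS"), ("S3", "DBMS")}   # every student opts for a course
represents = {("S2", "CSE-A")}                         # only one class representative

print(participation(students, opts, 0))        # total
print(participation(students, represents, 0))  # partial
```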

Weak Entity:
Entity types that do not have a key attribute of their own, and hence cannot be identified independently, are called
weak entity types. A weak entity can be uniquely identified only by considering some of its attributes in
conjunction with the primary key of another entity, which is called the identifying owner entity.
Generally, a weak entity type has a partial key that is used to uniquely identify the weak entities related to a
particular owner entity. The following restrictions must hold:
• The owner entity set and the weak entity set must participate in a one-to-many relationship set. This relationship
set is called the identifying relationship set of the weak entity set.
• The weak entity set must have total participation in the identifying relationship.

Example:
Consider the entity type dependent, related to the employee entity, which is used to keep track of the dependents
of each employee. The attributes of dependent are: name, birthdate, sex and relationship. Each employee entity is
said to own the dependent entities that are related to it. However, note that a ‘dependent’ entity does not exist on
its own; it depends on the employee entity. In other words, if an employee leaves the organization, all dependents
related to that employee are also removed, because they cannot exist without the ‘employee’ entity. Thus
dependent is a weak entity.

Keys:
Super key:
A super key is a set of one or more attributes that, taken collectively, allow us to uniquely identify an entity in the
entity set.
For example: {customer-id}, {cname, customer-id}, {cname, telno}.

Candidate key:
In a relation R, a candidate key for R is a subset of the set of attributes of R that has the following properties:
Uniqueness: No two distinct tuples in R have the same values for the candidate key.
Irreducibility: No proper subset of the candidate key has the uniqueness property.
E.g., {customer-id} and {cname, telno}. {cname, customer-id} is a super key but not a candidate key, because its
subset {customer-id} is already unique.
Primary key:
The primary key is the candidate key that is chosen by the database designer as the principal means of
identifying entities within an entity set. The remaining candidate keys (if any) are called alternate keys.
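The two candidate-key properties can be tested against a sample relation instance. This Python sketch uses hypothetical customer rows and hypothetical helper names. Note that an instance can only show that a set of attributes is not a key, because keys are a design decision about all possible instances, not just the current one.

```python
from itertools import combinations

def is_superkey(rows, attrs):
    """Uniqueness: no two rows agree on every attribute in attrs."""
    seen = set()
    for row in rows:
        value = tuple(row[a] for a in attrs)
        if value in seen:
            return False
        seen.add(value)
    return True

def is_candidate_key(rows, attrs):
    """Uniqueness plus irreducibility: no proper, non-empty subset is also unique."""
    if not is_superkey(rows, attrs):
        return False
    return not any(
        is_superkey(rows, subset)
        for size in range(1, len(attrs))
        for subset in combinations(attrs, size)
    )

# Hypothetical instance of the customer entity set.
customers = [
    {"customer_id": 1, "cname": "Asha", "telno": "111"},
    {"customer_id": 2, "cname": "Ravi", "telno": "111"},
    {"customer_id": 3, "cname": "Asha", "telno": "222"},
]

print(is_candidate_key(customers, ("customer_id",)))          # True
print(is_candidate_key(customers, ("cname", "customer_id")))  # False: super key, but reducible
print(is_candidate_key(customers, ("cname", "telno")))        # True
```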

Advanced ER-diagram:

Abstraction is the simplification mechanism used to hide superfluous details of a set of objects. It allows one to
concentrate on the properties that are of interest to the application.
There are two main abstraction mechanisms used to model information:

Generalization and specialization:


Generalization is the abstraction process of viewing a set of objects as a single general class by concentrating on
the general characteristics of the constituent sets while suppressing or ignoring their differences. It is the union
of a number of lower-level entity types for the purpose of producing a higher-level entity type. For instance,
student is a generalization of graduate or undergraduate, full-time or part-time students. Similarly, employee is a
generalization of the classes of objects cook, waiter and cashier. Generalization is an IS_A relationship;
therefore, manager IS_AN employee, cook IS_AN employee, waiter IS_AN employee, and so forth.
Specialization is the abstraction process of introducing new characteristics to an existing class of objects to
create one or more new classes of objects. This involves taking a higher-level entity and, using additional
characteristics, generating lower-level entities. The lower-level entities also inherit the characteristics of the
higher-level entity. By applying the characteristic size to car, we can create full-size, mid-size, compact or
subcompact cars. Specialization may be seen as the reverse process of generalization: additional specific
properties are introduced at a lower level in a hierarchy of objects.

Aggregation:
Aggregation is the process of compiling information on an object, thereby abstracting a higher-level object. In
this manner, the entity person is derived by aggregating the characteristics name, address and ssn. Another form
of aggregation is abstracting a relationship between objects and viewing the relationship as an object.

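[Figure: aggregation. The relationship Work among Employee, Branch and Job is treated as a single object that participates in the relationship Manages with Manager.]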
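[Figure: ER diagram for a college database. Entity sets: Student (rollno, name, address), Course (course-id, cname, duration), Department (dno, dname), Faculty (fid, name, address, sal) and Guardian (name, address, relationship). Relationship sets: opts, has, enrolled, taught-by, works-in (1:N) and head (1:1, with attribute date).]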

Conversion of ER-diagram to relational database


Conversion of entity sets:
1. For each strong entity type E in the ER diagram, we create a relation R containing all the simple attributes of
E. The primary key of R is one of the key attributes of E.

STUDENT(rollno (primary key), name, address)
FACULTY(id (primary key), name, address, salary)
COURSE(course-id (primary key), course_name, duration)
DEPARTMENT(dno (primary key), dname)

2. For each weak entity type W in the ER diagram, we create another relation R that contains all simple
attributes of W. If E is the owner entity of W, then the key attribute of E is also included in R and is declared as
a foreign key of R. The combination of the primary key of the owner entity type and the partial key of the weak
entity type forms the key of R.
GUARDIAN((rollno, name) (primary key), address, relationship)
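The mapping rules above can be written as SQL DDL. This is a sketch assuming SQLite (via Python's `sqlite3`). The column types and sample rows are guesses, and `ON DELETE CASCADE` is one possible design choice that mirrors the idea that a guardian cannot exist without its student.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript("""
CREATE TABLE student    (rollno INTEGER PRIMARY KEY, name TEXT, address TEXT);
CREATE TABLE faculty    (id INTEGER PRIMARY KEY, name TEXT, address TEXT, salary REAL);
CREATE TABLE course     (course_id TEXT PRIMARY KEY, course_name TEXT, duration INTEGER);
CREATE TABLE department (dno INTEGER PRIMARY KEY, dname TEXT);

-- Weak entity GUARDIAN: owner's key (rollno) + partial key (name) form the primary key.
CREATE TABLE guardian (
    rollno       INTEGER REFERENCES student (rollno) ON DELETE CASCADE,
    name         TEXT,
    address      TEXT,
    relationship TEXT,
    PRIMARY KEY (rollno, name)
);
""")
conn.execute("INSERT INTO student VALUES (1, 'Asha', 'Hyderabad')")
conn.execute("INSERT INTO guardian VALUES (1, 'Lakshmi', 'Hyderabad', 'mother')")

# A guardian whose owner student does not exist is rejected.
try:
    conn.execute("INSERT INTO guardian VALUES (99, 'X', 'Y', 'father')")
    orphan_accepted = True
except sqlite3.IntegrityError:
    orphan_accepted = False

# Deleting the owner also removes its dependent weak entities.
conn.execute("DELETE FROM student WHERE rollno = 1")
guardians_left = conn.execute("SELECT COUNT(*) FROM guardian").fetchone()[0]
print(orphan_accepted, guardians_left)  # False 0
```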

Conversion of relationship sets:

Binary Relationships:

One-to-one relationship:
For each 1:1 relationship type R in the ER diagram involving two entities E1 and E2, we choose one of the
entities (say E1), preferably one with total participation, and add the primary key attribute of the other entity (E2)
as a foreign key attribute in the table of E1. We also include all the simple attributes of relationship type R, if
any, in E1. For example, the DEPARTMENT relation has been extended to include HEAD_ID and DATE_FROM,
the attribute of the head relationship.

DEPARTMENT(D_NO,D_NAME,HEAD_ID,DATE_FROM)
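One way to express this 1:1 mapping in SQL, sketched here in SQLite with hypothetical data, is to put `head_id` in DEPARTMENT as a foreign key and mark it `UNIQUE`, so no faculty member can head two departments. The `NOT NULL` on `head_id` assumes that every department must have a head (total participation), and the `add_department` helper is illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE faculty (id INTEGER PRIMARY KEY, name TEXT);
-- 1:1 'head' relationship folded into DEPARTMENT: head_id is a foreign key,
-- and UNIQUE stops one faculty member from heading two departments.
CREATE TABLE department (
    d_no      INTEGER PRIMARY KEY,
    d_name    TEXT,
    head_id   INTEGER NOT NULL UNIQUE REFERENCES faculty (id),
    date_from TEXT
);
INSERT INTO faculty VALUES (10, 'Dr. Rao'), (11, 'Dr. Iyer');
INSERT INTO department VALUES (1, 'CSE', 10, '2020-06-01');
""")

def add_department(d_no, d_name, head_id, date_from):
    """Return True if the row satisfies all constraints, False otherwise."""
    try:
        with conn:
            conn.execute("INSERT INTO department VALUES (?, ?, ?, ?)",
                         (d_no, d_name, head_id, date_from))
        return True
    except sqlite3.IntegrityError:
        return False

print(add_department(2, "ECE", 11, "2021-01-01"))  # True
print(add_department(3, "EEE", 10, "2022-01-01"))  # False: Dr. Rao already heads CSE
```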

One-to-many relationship:
For each 1:n relationship type R involving two entities E1 and E2, we identify the entity type (say E1) on the n-
side of R and include the primary key of the entity on the other side (say E2) as a foreign key attribute in the
table of E1. We also include all simple attributes (or simple components of composite attributes) of R, if any, in
the table of E1.
For example:
This applies to the works-in relationship between DEPARTMENT and FACULTY. For this relationship, we choose
the entity on the N side, i.e., FACULTY, and add the primary key attribute of the other entity, DEPARTMENT, i.e.,
DNO, as a foreign key attribute in FACULTY.

FACULTY(ID, NAME, ADDRESS, BASIC_SAL, DNO), where DNO represents the works-in relationship
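A minimal SQLite sketch of the 1:N mapping, with hypothetical rows: because `dno` sits on the FACULTY side, many faculty rows can point at the same department, while each faculty row names at most one department.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE department (dno INTEGER PRIMARY KEY, dname TEXT);
-- N side (FACULTY) carries the key of the 1 side (DEPARTMENT).
CREATE TABLE faculty (
    id        INTEGER PRIMARY KEY,
    name      TEXT,
    address   TEXT,
    basic_sal REAL,
    dno       INTEGER REFERENCES department (dno)
);
INSERT INTO department VALUES (1, 'CSE'), (2, 'ECE');
INSERT INTO faculty VALUES
    (10, 'Rao',  'Hyd', 50000, 1),
    (11, 'Iyer', 'Hyd', 52000, 1),
    (12, 'Khan', 'Sec', 48000, 2);
""")

# Count how many faculty rows point at each department (the 1:N works-in relationship).
per_dept = conn.execute("""
    SELECT d.dname, COUNT(f.id)
    FROM department AS d LEFT JOIN faculty AS f ON f.dno = d.dno
    GROUP BY d.dno ORDER BY d.dno
""").fetchall()
print(per_dept)  # [('CSE', 2), ('ECE', 1)]
```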

Many-to-many relationship:
For each m:n relationship type R, we create a new table (say S) to represent R. We also include the primary key
attributes of both participating entity types as foreign key attributes in S. Any simple attributes of the m:n
relationship type (or simple components of composite attributes) are also included as attributes of S. For
example:
The M:N relationship taught-by between the entities COURSE and FACULTY should be represented as a new
table. The structure of the table includes the primary key of COURSE and the primary key of FACULTY; together
they usually form the primary key of the new table.
TAUGHT-BY(ID (primary key of FACULTY table), course-id (primary key of COURSE table))
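The M:N mapping as a SQLite sketch with hypothetical data. The table is named `taught_by` because an unquoted hyphen is not allowed in SQL identifiers. It holds one row per (faculty, course) pair, and its composite primary key stops the same pair being recorded twice while still letting a course have several teachers and a teacher several courses.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE faculty (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE course  (course_id TEXT PRIMARY KEY, course_name TEXT);
-- M:N relationship becomes its own table; its key combines both foreign keys.
CREATE TABLE taught_by (
    id        INTEGER REFERENCES faculty (id),
    course_id TEXT    REFERENCES course (course_id),
    PRIMARY KEY (id, course_id)
);
INSERT INTO faculty VALUES (10, 'Rao'), (11, 'Iyer');
INSERT INTO course  VALUES ('CS301', 'DBMS'), ('CS302', 'OS');
INSERT INTO taught_by VALUES (10, 'CS301'), (11, 'CS301'), (10, 'CS302');
""")

teachers_of_dbms = [r[0] for r in conn.execute("""
    SELECT f.name FROM taught_by t JOIN faculty f ON f.id = t.id
    WHERE t.course_id = 'CS301' ORDER BY f.name""")]
courses_of_rao = [r[0] for r in conn.execute("""
    SELECT c.course_name FROM taught_by t JOIN course c ON c.course_id = t.course_id
    WHERE t.id = 10 ORDER BY c.course_name""")]
print(teachers_of_dbms, courses_of_rao)  # ['Iyer', 'Rao'] ['DBMS', 'OS']
```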

N-ary relationship:
For each n-ary relationship type R where n > 2, we create a new table S to represent R. We include as foreign
key attributes in S the primary keys of the relations that represent the participating entity types. We also include
any simple attributes of the n-ary relationship type (or simple components of composite attributes) as attributes of
S. The primary key of S is usually a combination of all the foreign keys that reference the relations representing
the participating entity types.
Multi-valued attributes:
For each multivalued attribute A, we create a new relation R that includes an attribute corresponding to A, plus
the primary key attributes K of the relation that represents the entity type or relationship type that has A as an
attribute. The primary key of R is then the combination of A and K.
For example, if a STUDENT entity has rollno, name and phone number, where phone number is a multivalued
attribute, then we create the table PHONE(rollno, phoneno), whose primary key is the combination (rollno,
phoneno). The STUDENT table then need not contain phone number; it can simply be (rollno, name).
PHONE(rollno, phoneno)
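A SQLite sketch of the multivalued-attribute rule, with hypothetical data: each phone number becomes its own row in PHONE, keyed by (rollno, phoneno), so a student can have any number of phone numbers but the same number cannot be recorded twice for one student.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE student (rollno INTEGER PRIMARY KEY, name TEXT);
-- One row per (student, phone number); the pair is the primary key.
CREATE TABLE phone (
    rollno  INTEGER REFERENCES student (rollno),
    phoneno TEXT,
    PRIMARY KEY (rollno, phoneno)
);
INSERT INTO student VALUES (1, 'Asha'), (2, 'Ravi');
INSERT INTO phone VALUES (1, '98480-11111'), (1, '040-2222222'), (2, '99000-33333');
""")

phones_of_asha = [r[0] for r in conn.execute(
    "SELECT phoneno FROM phone WHERE rollno = 1 ORDER BY phoneno")]
print(phones_of_asha)  # ['040-2222222', '98480-11111']
```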

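[Figure: generalization/specialization of accounts. The entity Account (account_no, name, branch) is specialized through an IS-A relationship into Saving (interest) and Current (charges); read upward, the same hierarchy is a generalization.]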

Converting generalization/specialization hierarchy to tables:


A simple rule for conversion is to decompose all the specialized entities into separate tables when they are
disjoint. For example, for the account hierarchy we can create the following three tables:
Account(account_no, name, branch, balance)
Saving_account(account_no, interest)
Current_account(account_no, charges)
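The three account tables can be sketched in SQLite as follows, with hypothetical rows. Each subclass table reuses the superclass key as both its primary key and a foreign key, so a complete saving account is rebuilt with a join.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE account (
    account_no INTEGER PRIMARY KEY,
    name TEXT, branch TEXT, balance REAL
);
-- Each specialized table shares the superclass key and adds its own attributes.
CREATE TABLE saving_account (
    account_no INTEGER PRIMARY KEY REFERENCES account (account_no),
    interest   REAL
);
CREATE TABLE current_account (
    account_no INTEGER PRIMARY KEY REFERENCES account (account_no),
    charges    REAL
);
INSERT INTO account VALUES (1, 'Asha', 'Ameerpet', 5000), (2, 'Ravi', 'Kukatpally', 20000);
INSERT INTO saving_account  VALUES (1, 4.0);
INSERT INTO current_account VALUES (2, 250);
""")

# Rebuild a saving account: inherited attributes from account + its own interest.
savings = conn.execute("""
    SELECT a.account_no, a.name, a.balance, s.interest
    FROM account a JOIN saving_account s ON s.account_no = a.account_no
""").fetchall()
print(savings)  # [(1, 'Asha', 5000.0, 4.0)]
```

These tables do not by themselves stop an account from appearing in both subclass tables; enforcing disjointness would need extra checks (for example, triggers), which this sketch omits.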

Record Based Logical Model

Hierarchical Model:

• A hierarchical database consists of a collection of records that are connected to one another through links.
• A record is a collection of fields, each of which contains only one data value.
• A link is an association between precisely two records.
• The hierarchical model differs from the network model in that the records are organized as collections of
trees rather than as arbitrary graphs.

Tree-Structure Diagrams:
• The schema for a hierarchical database consists of
o boxes, which correspond to record types
o lines, which correspond to links
• Record types are organized in the form of a rooted tree.
o There are no cycles in the underlying graph.
o Relationships in the graph must be such that only
one-to-many or one-to-one relationships exist between a parent and a child.

A database schema is represented as a collection of tree-structure diagrams.
• For every tree-structure diagram there is a single instance of a database tree.
• The root of this tree is a dummy node.
• The children of that node are actual instances of the appropriate record type.
When transforming E-R diagrams to corresponding tree-structure diagrams, we must ensure that the resulting
diagrams are in the form of rooted trees.
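The rooted-tree requirement can be checked mechanically. This Python sketch (the function name and the record-type names are illustrative) treats a tree-structure diagram as a set of record types plus (parent, child) links, and accepts it only if there is a single root, every other record type has exactly one parent, and everything is reachable from the root. It assumes every link names record types that appear in the list.

```python
def is_rooted_tree(record_types, links):
    """links: iterable of (parent, child) pairs over record_types."""
    parents = {}
    for parent, child in links:
        if child in parents:          # a second parent would make it a graph, not a tree
            return False
        parents[child] = parent
    roots = [r for r in record_types if r not in parents]
    if len(roots) != 1:               # exactly one record type without a parent
        return False
    children = {r: [] for r in record_types}
    for parent, child in links:
        children[parent].append(child)
    # Every record type must be reachable from the root; with one parent each,
    # anything caught in a cycle is unreachable and fails this check.
    seen, stack = set(), [roots[0]]
    while stack:
        node = stack.pop()
        if node in seen:
            return False
        seen.add(node)
        stack.extend(children[node])
    return seen == set(record_types)

# One-to-many customer -> account: a valid tree-structure diagram.
print(is_rooted_tree(["customer", "account"], [("customer", "account")]))  # True
# account with two parents (customer and branch): not a tree.
print(is_rooted_tree(["customer", "account", "branch"],
                     [("customer", "account"), ("branch", "account")]))    # False
```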

Single Relationships:

• Example: an E-R diagram with two entity sets, customer and account, related through a binary, one-to-many
relationship depositor.
• The corresponding tree-structure diagram has
o the record type customer with three fields: customer-name, customer-street and customer-city
o the record type account with two fields: account-number and balance
o the link depositor, with an arrow pointing to customer

• If the relationship depositor is one-to-one, then the link depositor has two arrows.

• Only one-to-many and one-to-one relationships can be directly represented in the hierarchical model.

Transforming Many-To-Many Relationships:

• We must consider the type of queries expected and the degree to which the database schema fits the given E-R
diagram.
• In all versions of this transformation, the underlying database tree (or trees) will have replicated records.
• Create two tree-structure diagrams: T1, with the root customer, and T2, with the root account.
• In T1, create depositor, a many-to-one link from account to customer.
• In T2, create account-customer, a many-to-one link from customer to account.

Virtual Records:
• For many-to-many relationships, record replication is necessary to preserve the tree-structure organization
of the database.
o Data inconsistency may result when updates take place.
o Waste of space is unavoidable.
• Virtual record: contains no data value, only a logical pointer to a particular physical record.
• When a record is to be replicated in several database trees, a single copy of that record is kept in one of the
trees and all other copies are replaced with a virtual record.
• Let R be a record type that is replicated in T1, T2, . . ., Tn. Create a new virtual record type virtual-R and
replace R in each of the n – 1 trees with a record of type virtual-R.
• To eliminate data replication in the customer/account trees T1 and T2 described above, create virtual-customer
and virtual-account.
• Replace account with virtual-account in the first tree, and replace customer with virtual-customer in the
second tree.
• Add a dashed line from virtual-customer to customer, and from virtual-account to account, to specify the
association between a virtual record and its corresponding physical record.
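The virtual-record idea resembles storing a key instead of a copy. This Python sketch is a loose analogy, not how any particular hierarchical DBMS stores data; the classes, keys and the `resolve` helper are all hypothetical.

```python
from dataclasses import dataclass

@dataclass
class CustomerRecord:        # physical record: holds the data values
    name: str
    city: str

@dataclass
class VirtualRecord:         # virtual record: no data, only a logical pointer
    target_key: str

# The single physical copy of each customer, kept in one tree.
physical_customers = {"C1": CustomerRecord("Asha", "Hyderabad")}

# Another tree refers to the customer through a virtual record instead of a replica.
account_tree = {"A-101": [VirtualRecord("C1")]}

def resolve(v: VirtualRecord) -> CustomerRecord:
    """Follow the logical pointer to the one physical record."""
    return physical_customers[v.target_key]

# An update to the physical record is seen through every virtual record,
# so there are no replicas that could become inconsistent.
physical_customers["C1"].city = "Secunderabad"
print(resolve(account_tree["A-101"][0]).city)  # Secunderabad
```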
Network Model:
• Data are represented by collections of records.
o A record is similar to an entity in the E-R model.
o Records and their fields are represented as record types:

type customer = record
    customer-name: string;
    customer-street: string;
    customer-city: string;
end

type account = record
    account-number: integer;
    balance: integer;
end

• Relationships among data are represented by links.
o A link is similar to a restricted (binary) form of an E-R relationship.
o Restrictions on links depend on whether the relationship is many-to-many, many-to-one or one-to-one.
Data-Structure Diagrams:
• A data-structure diagram is a schema representing the design of a network database.
• A data-structure diagram consists of two basic components:
o Boxes, which correspond to record types.
o Lines, which correspond to links.
• It specifies the overall logical structure of the database.

For every E-R diagram, there is a corresponding data-structure diagram.

Since a link cannot contain any data value, an E-R relationship with attributes is represented by a new record type
and links.
To represent an E-R relationship of degree 3 or higher, connect the participating record types through a new
record type that is linked directly to each of the original record types.

For example, a ternary E-R relationship among account, customer and branch is converted as follows:
1. Replace the entity sets account, customer and branch with the record types account, customer and branch,
respectively.
2. Create a new record type Rlink (referred to as a dummy record type).
3. Create the following many-to-one links:
o CustRlink from the Rlink record type to the customer record type
o AcctRlnk from the Rlink record type to the account record type
o BrncRlnk from the Rlink record type to the branch record type
The DBTG CODASYL Model:
o All links are treated as many-to-one relationships.
o To model many-to-many relationships, a record type is defined to represent the relationship, and two links are
used.

DBTG Sets:

o The structure consisting of two record types that are linked together is referred to in the DBTG model as a
DBTG set.
o In each DBTG set, one record type is designated as the owner, and the other is designated as the member, of
the set.
o Each DBTG set can have any number of set occurrences (actual instances of linked records).
o Since many-to-many links are disallowed, each set occurrence has precisely one owner, and has zero or more
member records.
o No member record of a set can participate in more than one occurrence of the set at any point.
o A member record can participate simultaneously in several set occurrences of
different DBTG sets.
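The owner/member rules of a DBTG set can be modelled in a few lines of Python. This is a hypothetical in-memory sketch, not the CODASYL data manipulation language: `connect` refuses to put a member into a second occurrence of the same set type, while the same record may still belong to an occurrence of a different set type.

```python
class DBTGSet:
    """One DBTG set type: each occurrence is one owner plus zero or more members."""

    def __init__(self, name):
        self.name = name
        self.members_of = {}   # owner -> list of member records (one set occurrence)
        self.owner_of = {}     # member -> its single owner within this set type

    def connect(self, owner, member):
        # A member may belong to at most one occurrence of this set type.
        if member in self.owner_of:
            raise ValueError(f"{member} already belongs to an occurrence of {self.name}")
        self.owner_of[member] = owner
        self.members_of.setdefault(owner, []).append(member)

# Hypothetical example: customers own accounts through the set type 'depositor',
# and branches own accounts through a different set type 'account_branch'.
depositor = DBTGSet("depositor")
account_branch = DBTGSet("account_branch")
depositor.connect("Asha", "A-101")
depositor.connect("Asha", "A-102")
account_branch.connect("Ameerpet", "A-101")   # same member, different set type: allowed

print(depositor.members_of["Asha"])  # ['A-101', 'A-102']
```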
Class hierarchy
A class hierarchy can be viewed in one of two ways:
• Specialization (top-down approach)
• Generalization (bottom-up approach)

Specialization
Specialization is the process of identifying subsets of an entity set whose members share some distinguishing
characteristics. It breaks an entity into multiple entities, from the higher level (superclass) to the lower level (subclass).
Specialization is the process of defining one or more new entities from an existing entity.

The class Vehicle can be specialized into Car, Truck and Motorcycle (top-down approach).

Hence, vehicle is the superclass and Car, Truck, Motorcycle are subclasses. All three of these inherit
attributes from vehicle. Moreover, these three share those attributes among themselves while containing
some other attributes which make them different.

Generalization
Generalization is the process of combining lower-level entities that share common attributes or properties into a
single, generalized higher-level entity. The entity that is created contains the common features. Generalization is
a bottom-up process: it moves from a lower level of data to a higher level of understanding of the data.

The classes Car, Truck and Motorcycle can be generalized into Vehicle (bottom-up approach). Car, Truck
and Motorcycle are subclasses, while Vehicle is the superclass.

Basically, Vehicle contains the common attributes that were shared between Car, Truck and Motorcycle.

Aggregation

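Aggregation is an abstraction in which a relationship set, together with the entity sets that participate in it, is
treated as a single higher-level entity, so that it can take part in other relationships.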


Sub classes
A subclass is a class derived from the superclass. It inherits the properties of the superclass and also contains
attributes of its own.

Example: Car, Truck and Motorcycle are all subclasses of the superclass Vehicle. They all inherit common
attributes from Vehicle, such as speed and colour, while they also have different attributes; e.g., the number of
wheels is 4 for a Car but 2 for a Motorcycle.

Super classes
A superclass is the class from which many subclasses can be created. The subclasses inherit the characteristics
of a superclass. The superclass is also known as the parent class or base class.

In the above example, Vehicle is the Superclass and its subclasses are Car, Truck and Motorcycle.

Inheritance
Inheritance is the process of basing a class on another class, i.e., building a new class on an existing class.
The new class contains all the features and functionality of the old class in addition to its own.

The class which is newly created is known as the subclass or child class and the original class is the parent
class or the superclass.

• Attribute inheritance: lower-level entities inherit the attributes of higher-level entities.

In the diagram, Car is a lower-level entity of Vehicle, so Car acquires the attributes of Vehicle; for example, Car
acquires the Model attribute of Vehicle.

• Participation inheritance: relationships in which the higher-level entity set participates are also inherited by
the lower-level entity sets.

In the diagram, Vehicle has a relationship with the Cycle entity, so the lower-level entities Car and Bus, which
inherit from Vehicle, also participate in that relationship.
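Attribute and participation inheritance correspond closely to class inheritance in an object-oriented language. In this Python sketch the class names follow the Vehicle example, while the `seats` attribute, the sample values and the `services` relationship function are hypothetical additions.

```python
class Vehicle:
    """Higher-level entity set (superclass)."""
    def __init__(self, model: str, colour: str):
        self.model = model
        self.colour = colour

class Car(Vehicle):
    """Lower-level entity set: inherits model and colour, adds seats."""
    wheels = 4
    def __init__(self, model: str, colour: str, seats: int):
        super().__init__(model, colour)   # attribute inheritance
        self.seats = seats                # attribute specific to Car

class Motorcycle(Vehicle):
    """Lower-level entity set with no extra stored attributes."""
    wheels = 2

def services(vehicle: Vehicle, garage: str) -> tuple:
    """A relationship defined on Vehicle; any subclass can take part in it
    (participation inheritance)."""
    return (vehicle.model, garage)

car = Car("Swift", "red", 5)
bike = Motorcycle("Pulsar", "black")
print(car.model, car.wheels, services(bike, "City Garage"))  # Swift 4 ('Pulsar', 'City Garage')
```

Inheritance runs only downward: a plain `Vehicle` object has no `seats` attribute.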
Examples of Entity vs Attribute:

Example 1: Use of entity sets vs attributes

Using phone as an entity allows extra information about phone numbers to be recorded (and allows multiple phone
numbers).

Example 2:
Should address be an attribute of Employees or an entity (connected to Employees by a relationship)?
• It depends on the use we want to make of address information, and on the semantics of the data.
• If we have several addresses per employee, address must be an entity (since attributes cannot be set-valued).
• If the structure (city, street, etc.) is important, e.g., we want to retrieve employees in a given city, then address
must be modelled as an entity (since attribute values are atomic).

Example3 :
The Works_In2 relationship does not allow an employee to work in a department for two or more periods. This is
similar to the problem of wanting to record several addresses for an employee: we want to record several values
of the descriptive attributes for each instance of this relationship.
Examples of Entity vs. Relationship
The first ER diagram is acceptable if a manager gets a separate discretionary budget (dbudget) for each
department. But what if a manager gets a discretionary budget that covers all managed departments? Then the
design has two problems:
• Redundancy: dbudget is stored for each department managed by the manager.
• Misleading: it suggests that dbudget is associated with the department-manager combination.
Degree of Relationship
In DBMS, the degree of a relationship is the number of entity types that participate in the relationship.

• In a unary relationship, only one entity type is involved, so the degree of the relationship is 1. A unary
relationship is also known as a recursive relationship.

• In a binary relationship, two entity types are involved, so the degree is 2.

• In a ternary relationship, three entity types are involved, so the degree is 3.

• In an n-ary relationship, n entity types are involved, so the degree is n.
Binary Vs Ternary Relationships:
Generally, the relationships described in databases are binary relationships. However,
ternary relationships can be represented by several binary relationships.
It is possible to replace any non-binary (n-ary, for n > 2) relationship set by a number of
distinct binary relationship sets, by introducing an artificial entity set that gives each
relationship instance its own identifier. But an n-ary relationship set shows more clearly that
several entities participate in a single relationship.
Some relationships that appear to be ternary may be better represented using binary
relationships.
But there are some relationships that are naturally non-binary. Example: proj_guide, which relates
a student, the instructor guiding that student, and a project.
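The following Python sketch uses hypothetical proj_guide data to show why a naturally ternary relationship cannot simply be split into its pairwise binary relationships: recombining them produces a triple that was never recorded. It then shows the artificial entity set construction, a standard textbook technique assumed here as the way the replacement is done. An entity set E gives each relationship instance an identifier, and three binary relationship sets link E to each participating entity set, so joining on the identifier recovers the original instances. The names guides, works_on, advises, r_i, r_s and r_p are illustrative.

```python
# Hypothetical proj_guide instances: (instructor, student, project)
proj_guide = {("Rao", "S1", "P1"), ("Rao", "S2", "P2"), ("Das", "S1", "P2")}

# Naive replacement: three binary relationship sets (pairwise projections)
guides = {(i, s) for i, s, p in proj_guide}
works_on = {(s, p) for i, s, p in proj_guide}
advises = {(i, p) for i, s, p in proj_guide}

# Recombining the binaries yields every consistent triple, including spurious ones
naive = {(i, s, p)
         for (i, s) in guides
         for (s2, p) in works_on if s2 == s
         if (i, p) in advises}

# Artificial entity set E: one identifier per relationship instance,
# plus binary relationship sets r_i, r_s, r_p linking E to each entity set
E = dict(enumerate(sorted(proj_guide)))
r_i = {(e, t[0]) for e, t in E.items()}
r_s = {(e, t[1]) for e, t in E.items()}
r_p = {(e, t[2]) for e, t in E.items()}

# Joining on the identifier reconstructs the ternary relationship exactly
exact = {(i, s, p)
         for (e1, i) in r_i
         for (e2, s) in r_s if e2 == e1
         for (e3, p) in r_p if e3 == e1}

print(sorted(naive - proj_guide))  # [('Rao', 'S1', 'P2')]
print(exact == proj_guide)         # True
```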

For example, we can create a ternary relationship parent that relates a child to his or her
father and mother. The same relationship can also be represented by two binary relationships,
father and mother, each relating one parent to the child. Thus, it is possible to represent this
ternary relationship by a set of distinct binary relationships.
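A minimal sketch of the parent example with hypothetical data. The split into father and mother works here because each child has exactly one father and one mother, so joining the two binary relationships on the child rebuilds every parent triple. That assumption is what distinguishes this case from a naturally ternary relationship.

```python
# Hypothetical data: each child has exactly one father and one mother
parent = {("Anu", "Ravi", "Sita"), ("Bala", "Ravi", "Sita"),
          ("Chitra", "Mohan", "Lata")}

# Two binary relationship sets, each relating one parent to the child
father = {(child, f) for child, f, m in parent}
mother = {(child, m) for child, f, m in parent}

# Joining on the child rebuilds the ternary relationship
rebuilt = {(c, f, m)
           for (c, f) in father
           for (c2, m) in mother if c2 == c}

print(rebuilt == parent)  # True
```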
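Example:
If each policy is owned by just one employee, then in a ternary relationship among employees,
policies and dependents, a key constraint on Policies would also mean that a policy can cover
only one dependent.
The second diagram, which uses binary relationships instead, captures these additional constraints.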
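A hedged SQLite sketch of the policy example with hypothetical tables. The ternary Covers table has policyid alone as its key (the key constraint on Policies), whereas the binary alternative uses Purchaser (one owning employee per policy) and Beneficiary (any number of dependents per policy). The table layout, column names and data are illustrative assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Ternary Covers with a key constraint on policies: policyid alone is the key
CREATE TABLE Covers (policyid INTEGER PRIMARY KEY, ssn TEXT, dname TEXT);
-- Binary alternative: one owner per policy, many dependents per policy
CREATE TABLE Purchaser (policyid INTEGER PRIMARY KEY, ssn TEXT NOT NULL);
CREATE TABLE Beneficiary (
    policyid INTEGER REFERENCES Purchaser(policyid),
    dname    TEXT,
    PRIMARY KEY (policyid, dname)
);
""")

def try_insert(sql, params):
    """Return True if the row was stored, False if a key constraint rejected it."""
    try:
        conn.execute(sql, params)
        return True
    except sqlite3.IntegrityError:
        return False

# Policy 7 is owned by employee 111 and should cover two dependents
ternary = [try_insert("INSERT INTO Covers VALUES (?, ?, ?)", (7, "111", d))
           for d in ("Anu", "Bala")]
conn.execute("INSERT INTO Purchaser VALUES (7, '111')")
binary = [try_insert("INSERT INTO Beneficiary VALUES (?, ?)", (7, d))
          for d in ("Anu", "Bala")]

print(ternary)  # [True, False]
print(binary)   # [True, True]
```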
Aggregation Vs Ternary Relationships
This example explains when to use aggregation rather than a ternary relationship. Each Project
entity is sponsored by one or more Department entities, and each Department can sponsor zero,
one or more Projects. Each Sponsorship relationship is monitored through a Monitors relationship,
which connects Employees with the Sponsorship.
This can be expressed with two ER diagrams: one using aggregation and one using a ternary
relationship.
Now suppose we want to express the additional constraint that each Sponsorship relationship is
monitored by at most one Employee. This cannot be done with a ternary relationship.

With aggregation, we can combine a key constraint ("each Sponsorship relationship is monitored
by at most one Employee") with a participation constraint ("each project must be sponsored by at
least one department", indicated by the bold line).
With a ternary relationship, we would need total participation of the Project entity set while
also imposing a one-to-many relationship between Employees and (Project, Department) pairs. The
ER diagram is not equipped to express this for a ternary relationship: how would it know whether
the one-to-many relationship is between Employees and Projects, Employees and Departments, or
Employees and (Project, Department) pairs?

Using aggregation is therefore sometimes better than using a ternary relationship.
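A minimal SQLite sketch of the difference, under assumed table designs. With aggregation, Monitors refers to a sponsorship through a foreign key on (did, pid), and making (did, pid) the key of Monitors allows at most one monitoring employee per sponsorship. Monitors3 is a hypothetical translation of the ternary relationship without a key constraint, keyed on all three participants, so it accepts two monitors for the same sponsorship. The table Sponsors stands in for the Sponsorship relationship; all names and data are illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript("""
CREATE TABLE Sponsors (did INTEGER, pid INTEGER, PRIMARY KEY (did, pid));
-- Aggregation: Monitors relates an employee to a Sponsors instance;
-- keying on (did, pid) allows at most one monitor per sponsorship
CREATE TABLE Monitors (
    did INTEGER, pid INTEGER, ssn TEXT NOT NULL,
    PRIMARY KEY (did, pid),
    FOREIGN KEY (did, pid) REFERENCES Sponsors(did, pid)
);
-- Ternary translation without a key constraint: all three columns form the key
CREATE TABLE Monitors3 (
    did INTEGER, pid INTEGER, ssn TEXT,
    PRIMARY KEY (did, pid, ssn)
);
""")
conn.execute("INSERT INTO Sponsors VALUES (10, 1)")  # department 10 sponsors project 1

def try_insert(table, row):
    """Return True if the row was stored, False if a constraint rejected it."""
    try:
        conn.execute(f"INSERT INTO {table} VALUES (?, ?, ?)", row)
        return True
    except sqlite3.IntegrityError:
        return False

agg = [try_insert("Monitors", (10, 1, s)) for s in ("111", "222")]
tern = [try_insert("Monitors3", (10, 1, s)) for s in ("111", "222")]
orphan = try_insert("Monitors", (20, 1, "333"))  # (20, 1) is not a sponsorship

print(agg)     # [True, False]
print(tern)    # [True, True]
print(orphan)  # False
```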
