Introduction To Database Management System-Notes
Unit 1
Definition of Data: By data, we mean known facts that can be recorded and that have implicit
meaning. For example, consider the names, telephone numbers, and addresses of the people you
know.
The term database management system (DBMS) combines two ideas: a database and a
management system. Putting the two meanings together gives the definition of a DBMS.
A database management system (DBMS) is a collection of programs that enables users to create
and maintain a database. The DBMS is hence a general-purpose software system that facilitates
the processes of defining, constructing, manipulating, and sharing databases among various
users and applications. Defining a database involves specifying the data types, structures, and
constraints for the data to be stored in the database. Constructing the database is the process of
storing the data itself on some storage medium that is controlled by the DBMS. Manipulating a
database includes such functions as querying the database to retrieve specific data, updating the
database to reflect changes in the miniworld, and generating reports from the data. Sharing a
database allows multiple users and programs to access the database concurrently.
Need of Database
• Database systems are developed mainly to handle large amounts of data. When dealing with large amounts of
data, two things require optimization: storage of data and retrieval of data.
• Storage: According to the principles of database systems, data is stored in such a way that it occupies
far less space, because redundant (duplicate) data is removed before storage. A simple example
illustrates this:
In a banking system, suppose a customer has two accounts: a saving account and a salary account.
Say the bank stores saving account data in one place and salary account data in another. If customer
information such as the customer name and address is stored in both places, storage is wasted
(redundancy, or duplication of data). To organize the data better, the customer information should be
stored in one place and both accounts should be linked to that information. This is what a DBMS
achieves.
• Fast Retrieval of data: Along with storing the data in an optimized and systematic manner,
it is also important that we retrieve the data quickly when needed. Database systems
ensure that the data is retrieved as quickly as possible.
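The banking example above can be sketched with Python's built-in sqlite3 module. This is only an illustration under assumptions: the table names, column names, and sample values (customer, account, customer_id, and so on) are invented for the example and do not come from any real banking system.

```python
import sqlite3

# In-memory database; all table and column names are illustrative assumptions.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (
        customer_id INTEGER PRIMARY KEY,
        name        TEXT NOT NULL,
        address     TEXT NOT NULL
    );
    CREATE TABLE account (
        account_number TEXT PRIMARY KEY,
        account_type   TEXT NOT NULL,   -- 'saving' or 'salary'
        customer_id    INTEGER NOT NULL REFERENCES customer(customer_id)
    );
""")

# The customer's details are stored once...
conn.execute("INSERT INTO customer VALUES (1, 'Asha', '12 Park Road')")
# ...and both accounts link to that single record.
conn.execute("INSERT INTO account VALUES ('S-100', 'saving', 1)")
conn.execute("INSERT INTO account VALUES ('P-200', 'salary', 1)")

# A change of address is made in one place only.
conn.execute("UPDATE customer SET address = '7 Lake View' WHERE customer_id = 1")

rows = conn.execute("""
    SELECT a.account_number, c.address
    FROM account AS a JOIN customer AS c ON a.customer_id = c.customer_id
    ORDER BY a.account_number
""").fetchall()
print(rows)
```

Because the address lives in one row, both accounts see the new address after a single update, and no second copy can drift out of date.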
• Manufacturing: For management of supply chain and for tracking production of items in
factories, inventories of items in warehouses/stores, and orders for items.
• Human resources: For information about employees, salaries, payroll taxes and
benefits, and for generation of paychecks.
1) Data redundancy and inconsistency: Since different programmers create the files and
application programs over a long period, the various files are likely to have different formats and
the programs may be written in several programming languages. Moreover, the same information
may be duplicated in several places (files). For example, the address and telephone number of a
particular customer may appear in a file that consists of savings-account records and in a file that
consists of checking-account records. This redundancy leads to higher storage and access cost.
In addition, it may lead to data inconsistency; that is, the
various copies of the same data may no longer agree. For example, a changed customer address
may be reflected in savings-account records but not elsewhere in the system.
2) Difficulty in accessing data: Conventional file-processing environments do not allow
needed data to be retrieved in a convenient and efficient manner. Suppose that one of the bank
officers needs to find out the names of all customers who live within a particular postal-code
area. The officer asks the data-processing department to generate such a list. Because the
designers of the original system did not anticipate this request, there is no application program on
hand to meet it. There is, however, an application program to generate the list of all customers.
The bank officer now has two choices: either obtain the list of all customers and extract
the needed information manually, or ask a system programmer to write the necessary application
program. Both alternatives are obviously unsatisfactory.
3) Data isolation. Because data are scattered in various files, and files may be in different
formats, writing new application programs to retrieve the appropriate data is difficult.
4) Integrity problems. The data values stored in the database must satisfy certain
types of consistency constraints. For example, the balance of a bank account
may never fall below a prescribed amount (say, $25).
5) Atomicity problems. A computer system, like any other mechanical or electrical device, is
subject to failure. In many applications, it is crucial that, if a failure occurs, the data be restored
to the consistent state that existed prior to the failure. Consider a program to transfer $50 from
account A to account B. If a system failure occurs during the execution of the program, it is
possible that the $50 was removed from account A but was not credited to account B, resulting in
an inconsistent database state. That is, the funds transfer must be atomic—it must happen in its
entirety or not at all. It is difficult to ensure atomicity in a conventional file-processing system.
6) Concurrent-access anomalies. For the sake of overall performance of the system and faster
response, many systems allow multiple users to update the data simultaneously. In such an
environment, interaction of concurrent updates may result in inconsistent data. To guard against
this possibility, the system must maintain some form of supervision. But supervision is difficult
to provide because data may be accessed by many different application programs.
7) Security problems. Not every user of the database system should be able to access all the
data. For example, in a banking system, payroll personnel need to see only that part of the
database that has information about the various bank employees. They do not need access to
information about customer accounts. But, since application programs are added to the system in
an ad hoc manner, enforcing such security constraints is difficult.
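Two of the problems listed above, integrity (item 4) and atomicity (item 5), can be illustrated with a short sketch that uses Python's built-in sqlite3 module. It is a minimal example under assumptions, not a description of any real banking system: the account table, the transfer function, and the starting balances are all hypothetical. The credit is deliberately applied before the debit so that a failed debit shows the earlier credit being undone.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# CHECK enforces the consistency constraint from item 4 (balance never below 25).
conn.execute("""
    CREATE TABLE account (
        account_number TEXT PRIMARY KEY,
        balance        INTEGER NOT NULL CHECK (balance >= 25)
    )
""")
conn.executemany("INSERT INTO account VALUES (?, ?)", [("A", 100), ("B", 40)])
conn.commit()

def transfer(conn, source, target, amount):
    """Move amount from source to target; both updates apply or neither does."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE account_number = ?",
                (amount, target),
            )
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE account_number = ?",
                (amount, source),
            )
        return True
    except sqlite3.IntegrityError:
        return False

def balances(conn):
    """Return the current balances as a dictionary keyed by account number."""
    return dict(conn.execute("SELECT account_number, balance FROM account"))

first = transfer(conn, "A", "B", 50)   # succeeds: A has 50 left
second = transfer(conn, "A", "B", 50)  # would leave A at 0, so it is rolled back
print(first, second, balances(conn))
```

In the second call the credit to B has already been executed when the debit from A violates the constraint; the rollback removes that credit, so the database never shows money created out of nothing.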
Advantages of DBMS:
A DBMS offers the following advantages.
(1) Reduced redundancy (duplication): Centralized control of data by the DBA avoids unnecessary
duplication of data and effectively reduces the total amount of data storage required.
(2) Shared data: The database allows the sharing of data under its control by any number of
application programs or users.
(3) Integrity: Centralized control can also ensure that adequate checks are incorporated in the DBMS
to provide data integrity. This matters because the data is held in a single system and accessed by many people.
(4) Security: Data is very important to an organization and may be confidential. Such confidential
data must not be accessible to unauthorized users. The DBA, who has the ultimate responsibility
for the data in the DBMS, can ensure that proper access procedures, including authentication, are followed.
The DBMS checks permissions before providing access to any user.
(5) Conflict resolution: Since the database is under the control of the DBA, users
cannot access any data without permission (for example, a login name and password).
(6) Data independence: Data independence may be physical or logical. In both cases, changes
in hardware or software do not affect access to the data.
Disadvantages of DBMS:
1) The cost of purchasing and developing a DBMS is high; it is more expensive than other
applications.
2) Backup and recovery operations are complex.
3) More workspace is required for its execution and storage.
4) Excessive data entries may corrupt the total data.
Functions of DBMS:
1) Addition of new data.
2) Sorting of data.
3) Searching for particular data.
4) Printing particular data.
5) Editing or changing stored data.
6) Deleting data.
DIFFERENCE BETWEEN FILE SYSTEM & DBMS
FILE SYSTEM DBMS
Data Abstraction:
For the system to be usable, it must retrieve data efficiently. The need for efficiency has led
designers to use complex data structures to represent data in the database. Since many database-
systems users are not computer trained, developers hide the complexity from users through
several levels of abstraction, to simplify users’ interactions with the system:
• Physical level: The lowest level of abstraction describes how the data are actually stored. The
physical level describes complex low-level data structures in detail.
• Logical level: The next-higher level of abstraction describes what data are stored in the
database, and what relationships exist among those data. Database administrators, who must
decide what information to keep in the database, use the logical level of abstraction.
• View level: The highest level of abstraction describes only part of the entire database. Even
though the logical level uses simpler structures, complexity remains because of the variety of
information stored in a large database. Many users of the database system do not need all this
information; instead, they need to access only a part of the database. The view level of
abstraction exists to simplify their interaction with the system. The system may provide many
views for the same database.
Figure 1.1 shows the relationship among the three levels of abstraction.
Example: An analogy to the concept of data types in programming languages may clarify the
distinction among levels of abstraction. Most high-level programming languages support the
notion of a record type. For example, in a Pascal-like language, we may declare a record as
follows:
type customer = record
    customer-id : string;
    customer-name : string;
    customer-street : string;
    customer-city : string;
end;
This code defines a new record type called customer with four fields. Each field has a name and
a type associated with it. A banking enterprise may have several such record types, including
• account, with fields account-number and balance
• employee, with fields employee-name and salary
*At the physical level, a customer, account, or employee record can be described as a block of
consecutive storage locations (for example, words or bytes). The language compiler hides this
level of detail from programmers.
*At the logical level, each such record is described by a type definition, as in the previous code
segment, and the interrelationship of these record types is defined as well. Programmers using a
programming language work at this level of abstraction. Similarly, database administrators
usually work at this level of abstraction.
*Finally, at the view level, computer users see a set of application programs that hide details of
the data types. Similarly, at the view level, several views of the database are defined, and
database users see these views. In addition to hiding details of the logical level of the database,
the views also provide a security mechanism to prevent users from accessing certain parts of the
database. For example, tellers in a bank see only that part of the database that has information on
customer accounts; they cannot access information about salaries of employees.
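The logical and view levels described above can be sketched with Python's built-in sqlite3 module. This is an illustration under assumptions: the employee table, the employee_directory view, and the sample rows are hypothetical. The table stands for the logical level, and the view exposes only part of it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Logical level: the full record type, including a sensitive field.
    CREATE TABLE employee (
        employee_name TEXT,
        branch        TEXT,
        salary        INTEGER
    );
    INSERT INTO employee VALUES ('Ravi', 'Downtown', 52000);
    INSERT INTO employee VALUES ('Meena', 'Uptown', 61000);

    -- View level: a hypothetical view that leaves the salary column out.
    CREATE VIEW employee_directory AS
        SELECT employee_name, branch FROM employee;
""")

cursor = conn.execute("SELECT * FROM employee_directory ORDER BY employee_name")
visible_columns = [column[0] for column in cursor.description]
directory_rows = cursor.fetchall()
print(visible_columns)
print(directory_rows)
```

SQLite has no per-user privileges, so this sketch only shows what the view hides. In a multi-user DBMS, users would typically be given access to the view and not to the underlying table.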
DBMS Architecture
The architecture of a database management system helps us understand the
components of the database system and the relations among them.
The architecture of a DBMS depends on the computer system on which it runs. For
example, in a client-server DBMS architecture, the database system on the server
machine handles requests made by client machines.
It would be time consuming when there is a huge number of users. All the
requests are queued and handled one after another, so the system cannot
respond to multiple users at the same time.
This architecture is also not very cost effective.
Three tier architecture
The 3-tier architecture contains another layer between the client and the
server. In this architecture, the client cannot communicate directly with the
server.
The application on the client end interacts with an application server,
which in turn communicates with the database system.
The end user has no idea about the existence of the database beyond the
application server. The database also has no idea about any user
beyond the application.
The 3-tier architecture is used for large web applications.
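The separation of tiers described above can be sketched in a few lines of Python. This is a toy model under assumptions, not a real deployment: the class names (Client, ApplicationServer, DatabaseTier), their methods, and the sample account are hypothetical, and all three tiers run in one process here, whereas in practice they usually run on separate machines.

```python
import sqlite3

class DatabaseTier:
    """Data tier: the only layer that talks to the DBMS."""
    def __init__(self):
        self._conn = sqlite3.connect(":memory:")
        self._conn.execute(
            "CREATE TABLE account (account_number TEXT PRIMARY KEY, balance INTEGER)"
        )
        self._conn.execute("INSERT INTO account VALUES ('A-101', 500)")
        self._conn.commit()

    def fetch_balance(self, account_number):
        row = self._conn.execute(
            "SELECT balance FROM account WHERE account_number = ?", (account_number,)
        ).fetchone()
        return None if row is None else row[0]

class ApplicationServer:
    """Middle tier: validates requests and forwards them to the data tier."""
    def __init__(self, database):
        self._database = database

    def handle_balance_request(self, account_number):
        if not account_number:
            return {"status": "error", "message": "account number required"}
        balance = self._database.fetch_balance(account_number)
        if balance is None:
            return {"status": "error", "message": "unknown account"}
        return {"status": "ok", "balance": balance}

class Client:
    """Client tier: knows only the application server, never the database."""
    def __init__(self, server):
        self._server = server

    def show_balance(self, account_number):
        reply = self._server.handle_balance_request(account_number)
        if reply["status"] == "ok":
            return f"Balance: {reply['balance']}"
        return f"Error: {reply['message']}"

client = Client(ApplicationServer(DatabaseTier()))
print(client.show_balance("A-101"))
print(client.show_balance("Z-999"))
```

The client object holds no reference to the database, and the database tier never sees a client, which mirrors the statement that neither end knows about the other beyond the application server.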
Advantages of 3-tier architecture:
PC databases
Centralized database
Client/server databases
Distributed databases
Database models
PC Databases
E.g.: Access, FoxPro, Dbase, etc.
Centralized Databases
[Figure: central computer]
Client Server Databases
[Figure: several clients connected through a network to a database server]
Distributed Databases
[Figure: homogeneous distributed databases, with computers at Location A, Location B, and Location C]
Components of Database
Database Models
Three Types of Relationships
One-to-many relationships (1:M)
A painter paints many different paintings, but each one of them is painted by
only that painter.
PAINTER (1) paints PAINTING (M)
Many-to-many relationships (M:N)
An employee might learn many job skills, and each job skill might be learned by
many employees.
EMPLOYEE (M) learns SKILL (N)
One-to-one relationships (1:1)
Each store is managed by a single employee and each store manager
(employee) only manages a single store.
EMPLOYEE (1) manages STORE (1)
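One common way to represent the three relationship types in relational tables is sketched below with Python's built-in sqlite3 module. The table and column names and the sample rows are assumptions made for illustration; other designs are possible.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when enabled
conn.executescript("""
    -- 1:M  each painting has exactly one painter; a painter has many paintings.
    CREATE TABLE painter  (painter_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE painting (
        painting_id INTEGER PRIMARY KEY,
        title       TEXT,
        painter_id  INTEGER NOT NULL REFERENCES painter(painter_id)
    );

    -- M:N  a separate linking table pairs employees with skills.
    CREATE TABLE employee (employee_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE skill    (skill_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE employee_skill (
        employee_id INTEGER REFERENCES employee(employee_id),
        skill_id    INTEGER REFERENCES skill(skill_id),
        PRIMARY KEY (employee_id, skill_id)
    );

    -- 1:1  UNIQUE on manager_id stops one employee from managing two stores.
    CREATE TABLE store (
        store_id   INTEGER PRIMARY KEY,
        manager_id INTEGER NOT NULL UNIQUE REFERENCES employee(employee_id)
    );
""")

conn.execute("INSERT INTO painter VALUES (1, 'Painter One')")
conn.execute("INSERT INTO painting VALUES (1, 'First Canvas', 1)")
conn.execute("INSERT INTO painting VALUES (2, 'Second Canvas', 1)")
painting_count = conn.execute(
    "SELECT COUNT(*) FROM painting WHERE painter_id = 1"
).fetchone()[0]

conn.execute("INSERT INTO employee VALUES (1, 'Lena')")
conn.execute("INSERT INTO store VALUES (10, 1)")
try:
    conn.execute("INSERT INTO store VALUES (11, 1)")  # same manager, second store
    second_store_allowed = True
except sqlite3.IntegrityError:
    second_store_allowed = False

print(painting_count, second_store_allowed)
```

The 1:M side needs only a foreign key, the M:N relationship needs its own table, and the 1:1 relationship is a foreign key with a uniqueness constraint added.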
Database Models
A database model is a collection of logical constructs used
to represent the data structure and the data relationships
found within the database.
Disadvantages
System complexity
Lack of structural independence
Relational Database Model
Basic Structure
RDBMS allows operations in a human logical
environment.
The relational database is perceived as a collection of
tables.
Each table consists of a series of row/column
intersections.
Tables (or relations) are related to each other by sharing
a common entity characteristic.
The relationship type is often shown in a relational
schema.
A table yields complete data and structural
independence.
Linking Relational Tables
Relational Database Model
Advantages
Structural independence
Improved conceptual simplicity
Easier database design, implementation,
management, and use
Ad hoc query capability (SQL)
Powerful database management system
Disadvantages
Substantial hardware and system software
overhead
Possibility of poor design and implementation
Potential “islands of information” problems
Some Important Definitions:
Database Schema: The description of a database is called the database schema, which is
specified during database design and is not expected to change frequently.
Schema Diagrams: Most data models have certain conventions for displaying schemas as
diagrams. A displayed schema is called a schema diagram.
Instances / Database State: The data in the database at a particular moment in time is called a
database state or snapshot. It is also called the current set of occurrences or instances in the
database.
FIGURE 2.2 The three-schema architecture
The goal of the three-schema architecture, illustrated in Figure 2.2, is to separate the user
applications and the physical database. In this architecture, schemas can be defined at the
following three levels:
1. The internal level has an internal schema, which describes the physical storage structure of the
database. The internal schema uses a physical data model and describes the complete details of
data storage and access paths for the database.
2. The conceptual level has a conceptual schema, which describes the structure of the whole
database for a community of users. The conceptual schema hides the details of physical storage
structures and concentrates on describing entities, data types, relationships, user operations, and
constraints. Usually, a representational data model is used to describe the conceptual schema
when a database system is implemented.
3. The external or view level includes a number of external schemas or user views. Each external
schema describes the part of the database that a particular user group is interested in and hides
the rest of the database from that user group.
*MAPPING : The processes of transforming requests and results between levels are called
mappings.
Data Independence:
The three-schema architecture can be used to further explain the concept of data independence.
Data independence is the capacity to change the schema at one level of a database system
without having to change the schema at the next higher level. We can define two types of data
independence:
1. Logical data independence is the capacity to change the conceptual schema without having
to change external schemas or application programs. We may change the conceptual schema to
expand the database (by adding a record type or data item), to change constraints, or to reduce
the database (by removing a record type or data item).
2. Physical data independence is the capacity to change the internal schema without having to
change the conceptual schema. Hence, the external schemas need not be changed either.
Changes to the internal schema may be needed because some physical files had to be
reorganized (for example, by creating additional access structures) to improve the performance of
retrieval or update. If the same data as before remains in the database, we should not have to
change the conceptual schema.
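Physical data independence can be illustrated with a small sketch that uses Python's built-in sqlite3 module. The customer table, its sample rows, and the index name idx_customer_city are assumptions chosen for the example. An access structure (an index) is added at the internal level, and the same query is run before and after.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (customer_name TEXT, customer_city TEXT)")
conn.executemany(
    "INSERT INTO customer VALUES (?, ?)",
    [("Hayes", "Harrison"), ("Jones", "Harrison"), ("Smith", "Rye")],
)

# The query is written once, against the conceptual schema only.
QUERY = (
    "SELECT customer_name FROM customer "
    "WHERE customer_city = ? ORDER BY customer_name"
)

before = conn.execute(QUERY, ("Harrison",)).fetchall()

# Change at the internal level only: add an access structure (an index).
conn.execute("CREATE INDEX idx_customer_city ON customer (customer_city)")

after = conn.execute(QUERY, ("Harrison",)).fetchall()
print(before == after)
```

The query text and its result are unchanged by the new index; only the way the DBMS may locate the rows is different.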
Database Languages:
A database system provides a data definition language to specify the database schema and a data
manipulation language to express database queries and updates.
Data-Definition Language
We specify a database schema by a set of definitions expressed in a special language called a
data-definition language (DDL).
For instance, the following statement in the SQL language defines the account table:
create table account
    (account_number char(10),
     balance integer);
Execution of the above DDL statement creates the account table. In addition, it updates a special
set of tables called the data dictionary or data directory.
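The effect of a DDL statement on the data dictionary can be observed in SQLite, used here through Python's built-in sqlite3 module as one concrete example. Other systems keep their catalogs under different names. The column is spelled account_number in this sketch because a hyphen is not valid in an unquoted SQL identifier.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (account_number CHAR(10), balance INTEGER)")

# SQLite keeps its catalog in a table named sqlite_master.
catalog = conn.execute(
    "SELECT type, name FROM sqlite_master WHERE name = 'account'"
).fetchall()

# PRAGMA table_info lists the columns recorded for a table:
# each row holds (cid, name, declared type, notnull, default, pk).
columns = [(row[1], row[2]) for row in conn.execute("PRAGMA table_info(account)")]

print(catalog)
print(columns)
```

The CREATE TABLE statement stored no account data at all; what it changed was the metadata that describes the account table.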
Data-Manipulation Language
Data manipulation is
• The retrieval of information stored in the database
• The insertion of new information into the database
• The deletion of information from the database
• The modification of information stored in the database
A data-manipulation language (DML) is a language that enables users to access or manipulate
data as organized by the appropriate data model.
There are basically two types:
• Procedural DMLs require a user to specify what data are needed and how to get those data.
• Declarative DMLs (also referred to as nonprocedural DMLs) require a user to specify what data
are needed without specifying how to get those data.
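The difference between the two styles can be sketched in Python with the built-in sqlite3 module. The account table and its rows are assumptions for illustration. SQL is itself declarative, so the procedural half below only imitates a procedural DML by scanning records one by one in application code.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (account_number TEXT, balance INTEGER)")
conn.executemany(
    "INSERT INTO account VALUES (?, ?)",
    [("A-101", 500), ("A-215", 700), ("A-305", 350)],
)

# Declarative: state what is wanted; the DBMS decides how to find it.
declarative = conn.execute(
    "SELECT account_number FROM account WHERE balance > 400 ORDER BY account_number"
).fetchall()

# Procedural style: spell out how to get it, record by record.
procedural = []
for account_number, balance in conn.execute(
    "SELECT account_number, balance FROM account"
):
    if balance > 400:
        procedural.append((account_number,))
procedural.sort()

print(declarative)
print(procedural)
```

Both halves return the same accounts. In the declarative version the choice of access method is left to the DBMS, which is why a declarative query can benefit from a new index without being rewritten.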
Database Users:
There are four different types of database-system users, differentiated by the way they expect to
interact with the system.
1) Naive users are unsophisticated users who interact with the system by invoking one of the
application programs that have been written previously. For example, a bank teller who needs to
transfer $50 from account A to account B invokes a program called transfer. This program asks
the teller for the amount of money to be transferred, the account from which the money is to be
transferred, and the account to which the money is to be transferred.
2) Application programmers are computer professionals who write application programs.
Application programmers can choose from many tools to develop user interfaces. Rapid
application development (RAD) tools are tools that enable an application programmer to
construct forms and reports without writing a program.
3) Sophisticated users interact with the system without writing programs. Instead, they form
their requests in a database query language. They submit each such query to a query processor,
whose function is to break down DML statements into instructions that the storage manager
understands. Analysts who submit queries to explore data in the database fall in this category.
4) Specialized users are sophisticated users who write specialized database applications that do
not fit into the traditional data-processing framework. Among these applications are computer-
aided design systems, knowledge base and expert systems, systems that store data with complex
data types (for example, graphics data and audio data), and environment-modeling systems.
Overall STRUCTURE of DBMS:
i) Storage Manager :
A storage manager is a program module that provides the interface between the low level
data stored in the database and the application programs and queries submitted to the system.
The storage manager is responsible for the interaction with the file manager. The raw data are
stored on the disk using the file system, which is usually provided by a conventional operating
system. The storage manager translates the various DML statements into low-level file-system
commands. Thus, the storage manager is responsible for storing, retrieving, and updating data in
the database.
The storage manager components include:
• Authorization and integrity manager, which tests for
the satisfaction of integrity constraints and checks the authority of users to access data.
• Transaction manager, which ensures that the database remains in a consistent (correct) state
despite system failures, and that concurrent transaction executions proceed without conflicting.
• File manager, which manages the allocation of space on disk storage and the data structures
used to represent information stored on disk.
• Buffer manager, which is responsible for fetching data from disk storage into main memory,
and deciding what data to cache in main memory. The buffer manager is a critical part of the
database system, since it enables the database
to handle data sizes that are much larger than the size of main memory.
The storage manager implements several data structures as part of the physical
system implementation:
• Data files, which store the database itself.
• Data dictionary, which stores metadata about the structure of the database, in
particular the schema of the database.
• Indices, which provide fast access to data items that hold particular values.
Types of Database Language
2. Data Manipulation Language
DML stands for Data Manipulation Language. It is used for accessing and
manipulating data in a database. It handles user requests.
3. Data Control Language
DCL stands for Data Control Language. It is used to control access to the
data stored in the database.
In some systems DCL statements execute within a transaction and can be rolled
back; in others they take effect immediately.
Here are some tasks that come under DCL:
Grant: It is used to give user access privileges to a database.
Revoke: It is used to take back permissions from the user.
4. Transaction Control Language
TCL is used to manage the changes made by DML statements. These statements
can be grouped into a logical transaction.
Here are some tasks that come under TCL:
Commit: It is used to save the transaction on the database.
Rollback: It is used to restore the database to its state as of the last
Commit.
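Commit and rollback can be tried out with Python's built-in sqlite3 module, as in the sketch below. The account table and its values are assumptions for illustration. Grant and Revoke are not shown because SQLite, the engine used here, does not implement user privileges.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (account_number TEXT PRIMARY KEY, balance INTEGER)")
conn.execute("INSERT INTO account VALUES ('A-101', 500)")
conn.commit()  # Commit: the inserted row is now permanent

# A DML change that has not been committed yet.
conn.execute("UPDATE account SET balance = 0 WHERE account_number = 'A-101'")
uncommitted = conn.execute("SELECT balance FROM account").fetchone()[0]

conn.rollback()  # Rollback: undo everything since the last commit
restored = conn.execute("SELECT balance FROM account").fetchone()[0]

print(uncommitted, restored)
```

The update is visible inside the open transaction, but after the rollback the balance is back at the committed value.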