DBMS Unit 1
Prepared By:
Milan Vachhani
Assistant Professor, MCA Department,
B. H. Gardi College of Engineering and Technology, Rajkot
M 9898626213
milan.vachhani@gmail.com
http://milanvachhani.blogspot.com
&
Research Scholar, Department of Computer Science,
Saurashtra University, Rajkot
Unit -1
What is Data?
Data is a collection of facts from which conclusions may be drawn.
In computer science, data is anything in a form suitable for use with a computer. Data is
often distinguished from programs. A program is a set of instructions that detail a task for the
computer to perform. In this sense, data is thus everything that is not program code.
What is Database?
A database is a collection of data that is organized so that its contents can easily be accessed,
managed, and updated.
A database is a collection of data, typically describing the activities of one or more related
organizations.
A database is a structured collection of records or data that is stored in a computer system.
Entity / Table (each column is an Attribute / Field; each row is a Tuple / Record):

Name   | Address       | City   | PIN    | Mobile
Raj    | Lakhsminagar  | Rajkot | 360001 | 9898756214
Deepak | Rudapark      | Baroda | 524413 | 9847562399
Vijay  | Kalawad Road  | Ah.bad | 985542 | 9984325678
Dhaval | Punit society | Surat  | 254412 | 9945678835
...
Table: A table is a collection of data arranged in rows and columns. A database may
contain one or more tables.
Entity: An entity is a distinguishable object of the real world.
E.g. Student, Customer, Employee, etc.
Attributes: Attributes are the set of properties possessed by an entity.
E.g. Name, Address, City, Mobile, etc.
Fields: The title of a column that holds a specific type of data is known as a field. The
maximum number of fields per table depends on the DBMS (for example, 255 in Microsoft Access).
Tuples: Each row in a table is a tuple.
Records: The data stored horizontally across all fields of one row is known as a record. A
record is the complete information about one entity.
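The terms above can be tried out with a small sketch. It uses Python's built-in sqlite3 module and the sample data from the example table; the table name `person` and the lowercase column names are assumptions made for illustration, not part of the original notes.

```python
import sqlite3

# In-memory database, so nothing is written to disk.
conn = sqlite3.connect(":memory:")

# The table (entity) and its fields (attributes). The name "person" is hypothetical.
conn.execute(
    "CREATE TABLE person (name TEXT, address TEXT, city TEXT, pin TEXT, mobile TEXT)"
)

# Each tuple below becomes one record (row) of the table.
rows = [
    ("Raj", "Lakhsminagar", "Rajkot", "360001", "9898756214"),
    ("Deepak", "Rudapark", "Baroda", "524413", "9847562399"),
    ("Vijay", "Kalawad Road", "Ah.bad", "985542", "9984325678"),
    ("Dhaval", "Punit society", "Surat", "254412", "9945678835"),
]
conn.executemany("INSERT INTO person VALUES (?, ?, ?, ?, ?)", rows)

cursor = conn.execute("SELECT * FROM person")
fields = [column[0] for column in cursor.description]  # field names
records = cursor.fetchall()                            # list of tuples

print(fields)
print(records[0])
```

Here `fields` holds the column titles and every element of `records` is one tuple, which matches the vocabulary used in these notes.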
Database,
Database Management System (DBMS),
Database Systems
Database-System Applications
Some representative applications are:
Enterprise Information
Sales: For customer, product, and purchase information.
Accounting: For payments, receipts, account balances, assets and other accounting
information.
Human resources: For information about employees, salaries, payroll taxes, and
benefits, and for generation of paychecks.
Manufacturing: For management of the supply chain and for tracking production of
items in factories, inventories of items in warehouses and stores, and orders for items.
Online retailers: For sales data noted above plus online order tracking, generation of
recommendation lists, and maintenance of online product evaluations.
Banking and Finance
Banking: For customer information, accounts, loans, and banking transactions.
Credit card transactions: For purchases on credit cards and generation of monthly
statements.
Finance: For storing information about holdings, sales, and purchases of financial
instruments such as stocks and bonds; also for storing real-time market data to enable
online trading by customers and automated trading by the firm.
Universities: For student information, course registrations, and grades (in addition to standard
enterprise information such as human resources and accounting).
Airlines: For reservations and schedule information. Airlines were among the first to use
databases in a geographically distributed manner.
Telecommunication: For keeping records of calls made, generating monthly bills, maintaining
balances on prepaid calling cards, and storing information about the communication networks.
Purpose of DBMS
In the early days, database applications were built directly on top of file systems.
A typical file-processing system is supported by a conventional operating system.
The system stores permanent records in various files, and it needs different application
programs to extract records from, and add records to, the appropriate files. Before database
management systems (DBMSs) were introduced, organizations usually stored information in
such systems.
Keeping organizational information in a file-processing system has a number of major
disadvantages:
1) Data redundancy and inconsistency
The same information is stored in multiple files, so information is duplicated.
Changes must be made in every copy; otherwise the data become inconsistent.
2) Difficulty in accessing data
A new program must be written to carry out each new task.
Conventional file-processing environments do not allow needed data to be retrieved in a
convenient and efficient manner. More responsive data-retrieval systems are required for general use.
3) Data isolation (multiple files and formats)
Because data are scattered in various files, and files may be in different formats,
writing new application programs to retrieve the appropriate data is difficult.
4) Integrity problems
Data values must satisfy certain integrity constraints (e.g. account balance > 0).
In a file-processing system, developers enforce these constraints by adding appropriate code
to the various application programs, so the constraints become buried in program code
rather than being stated explicitly.
When new constraints are added, it is difficult to change the programs to
enforce them. The problem is compounded when constraints involve several data
items from different files.
5) Atomicity of updates
Failures may leave the database in an inconsistent state, with only part of an update carried out.
Example: A transfer of funds from one account to another must happen in its entirety or
not at all.
It is difficult to ensure atomicity in a conventional file-processing system.
6) Concurrent access by multiple users
Concurrent access is needed for performance.
Uncontrolled concurrent access can lead to inconsistencies.
Example: Two people read a balance and update it at the same time, so one of the updates is lost.
7) Security problems
Not every user of the database system should be able to access all the data.
For example, in a university, payroll personnel need to see only that part of the
database that has financial information. They do not need access to information about
academic records.
In a file-processing system, enforcing such security constraints is difficult.
These difficulties, among others, prompted the development of database systems.
In what follows, we shall see the concepts and algorithms that enable database systems to
solve the problems with file-processing systems.
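Two of the problems listed above, integrity constraints buried in application code and non-atomic updates, can be illustrated with a minimal sketch. It assumes Python's built-in sqlite3 module; the `account` table, its columns, and the `transfer` and `balances` helpers are hypothetical names chosen for this example. The constraint `balance > 0` is declared once in the schema, and the transfer runs as a single transaction.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# The integrity constraint is stated once, in the schema, not in each program.
conn.execute(
    "CREATE TABLE account ("
    " acc_no INTEGER PRIMARY KEY,"
    " balance INTEGER NOT NULL CHECK (balance > 0))"
)
conn.executemany("INSERT INTO account VALUES (?, ?)", [(1, 100), (2, 50)])
conn.commit()


def transfer(conn, source, target, amount):
    """Move amount between accounts; either both updates happen or neither."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE acc_no = ?",
                (amount, target),
            )
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE acc_no = ?",
                (amount, source),
            )
        return True
    except sqlite3.IntegrityError:
        return False


def balances(conn):
    """Return {acc_no: balance} for all accounts."""
    return dict(conn.execute("SELECT acc_no, balance FROM account"))


print(transfer(conn, 1, 2, 30))   # allowed: both balances stay positive
print(transfer(conn, 1, 2, 500))  # rejected by the CHECK constraint
print(balances(conn))             # the credit from the failed transfer was undone
```

In the rejected transfer the credit to account 2 is applied first and then rolled back when the debit violates the constraint, so no partial update survives.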
2) Defining Attributes: A field whose values are unique within a table can be designated as the
primary key. The primary key identifies each record, and the DBMS rejects duplicate key
values within the same table, thereby reducing data redundancy.
Some tables have a foreign key in addition to the primary key (these notes also call it a
'secondary key'). The foreign key refers to the primary key of another table,
thus establishing a relationship between the two tables.
3) Systematic Storage: The data is stored in the form of tables. A table consists of rows and
columns. The primary and foreign keys help to reduce data redundancy, enabling systematic
storage of data.
4) Changes to schema: The table schema can be changed and it is not platform dependent. Therefore,
the tables in the system can be edited to add new columns and rows without disrupting the
applications that depend on that particular database.
5) No Language Dependence: Database management systems are not language dependent. Therefore,
they can be used with various languages and on various platforms.
6) Table Joins: The data in two or more tables can be combined into a single result table by a
join. Because related data can be stored once in separate tables instead of being repeated, this
helps to reduce the size of the database and also eases retrieval of data.
7) Multiple Simultaneous Usage: The database can be used simultaneously by a number of users.
Various users can retrieve the same data simultaneously. The data in the database can also be
modified, based on the privileges assigned to users.
8) Data Security: Data is the most important asset. Therefore, there is a need for data security.
Database management systems help to keep the data secured.
9) Privileges: Different privileges can be given to different users. For example, some users can
edit the database, but are not allowed to delete the contents of the database.
10) Abstract View of Data and Easy Retrieval: A DBMS enables easy and convenient retrieval of
data. A database user can view only the abstract form of data; the complexities of the internal
structure of the database are hidden from the user. The data fetched is in a user-friendly format.
11) Data Consistency: Data consistency ensures a consistent view of data to every user. It
includes the accuracy, validity and integrity of related data. The data in the database must
satisfy certain consistency constraints, for example, the age of a candidate appearing for an
exam should be of number datatype and in the range of 20-25. When the database is updated,
these constraints are checked by the database system.
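The points about keys and joins in the list above can be sketched as follows. This is an illustration only, using Python's built-in sqlite3 module; the `city` and `customer` tables and the `try_insert` helper are hypothetical. Note that SQLite enforces foreign keys only after `PRAGMA foreign_keys = ON`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when enabled

# city.city_id is a primary key; customer.city_id is a foreign key that refers to it.
conn.execute("CREATE TABLE city (city_id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute(
    "CREATE TABLE customer ("
    " cust_id INTEGER PRIMARY KEY,"
    " name TEXT NOT NULL,"
    " city_id INTEGER NOT NULL REFERENCES city(city_id))"
)
conn.executemany("INSERT INTO city VALUES (?, ?)", [(1, "Rajkot"), (2, "Surat")])
conn.executemany(
    "INSERT INTO customer VALUES (?, ?, ?)",
    [(1, "Raj", 1), (2, "Dhaval", 2)],
)


def try_insert(sql, params):
    """Run an INSERT and report whether the DBMS accepted it."""
    try:
        conn.execute(sql, params)
        return "ok"
    except sqlite3.IntegrityError as exc:
        return f"rejected: {exc}"


# Duplicate primary key value: refused by the DBMS.
dup = try_insert("INSERT INTO customer VALUES (?, ?, ?)", (1, "Vijay", 1))
# Foreign key pointing at a city that does not exist: also refused.
orphan = try_insert("INSERT INTO customer VALUES (?, ?, ?)", (3, "Vijay", 99))

# A join combines the two tables into one result table at query time.
joined = conn.execute(
    "SELECT customer.name, city.name FROM customer"
    " JOIN city ON customer.city_id = city.city_id"
    " ORDER BY customer.cust_id"
).fetchall()

print(dup)
print(orphan)
print(joined)
```

Each city name is stored once in `city` and referenced by key from `customer`, which is the sense in which keys and joins together reduce redundancy.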
View of Data
A database system is a collection of interrelated data and a set of programs that allow users to
access and modify these data.
A major purpose of a database system is to provide users with an abstract view of the data. That is,
the system hides certain details of how the data are stored and maintained.
Data Abstraction
For the system to be usable, it must retrieve data efficiently. The need for efficiency has led
designers to use complex data structures to represent data in the database. Since many
database-system users are not computer trained, developers hide the complexity from users through several
levels of abstraction, to simplify users' interactions with the system:
Physical level
The lowest level of abstraction describes how the data are actually stored. The physical
level describes complex low-level data structures in detail.
Logical level
The next-higher level of abstraction describes what data are stored in the database, and
what relationships exist among those data. The logical level thus describes the entire
database in terms of a small number of relatively simple structures.
Although implementation of the simple structures at the logical level may involve
complex physical-level structures, the user of the logical level does not need to be aware
of this complexity. This is referred to as physical data independence.
Database administrators, who must decide what information to keep in the database, use
the logical level of abstraction.
View level
The highest level of abstraction describes only part of the entire database. Even though
the logical level uses simpler structures, complexity remains because of the variety of
information stored in a large database.
Many users of the database system do not need all this information; instead, they need
to access only a part of the database. The view level of abstraction exists to simplify their
interaction with the system. The system may provide many views for the same database.
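As a rough illustration of the view level, the sketch below defines two views over one table, each exposing only part of the data. It uses Python's built-in sqlite3 module; the `employee` table and the view names `directory` and `payroll` are assumptions made for this example. SQLite has no user accounts, so restricting each user to particular views, as a multi-user DBMS would, is not shown.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Logical level: one table describing all employee data (hypothetical schema).
conn.execute(
    "CREATE TABLE employee ("
    " emp_id INTEGER PRIMARY KEY, name TEXT, dept TEXT, salary INTEGER)"
)
conn.executemany(
    "INSERT INTO employee VALUES (?, ?, ?, ?)",
    [(1, "Raj", "Sales", 30000), (2, "Deepak", "Accounts", 35000)],
)

# View level: two different views of the same database.
conn.execute("CREATE VIEW directory AS SELECT name, dept FROM employee")
conn.execute("CREATE VIEW payroll AS SELECT emp_id, salary FROM employee")

directory_cursor = conn.execute("SELECT * FROM directory ORDER BY name")
directory_columns = [column[0] for column in directory_cursor.description]
directory_rows = directory_cursor.fetchall()

payroll_rows = conn.execute("SELECT * FROM payroll ORDER BY emp_id").fetchall()

print(directory_columns)  # the salary column is not visible through this view
print(directory_rows)
print(payroll_rows)
```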
[Figure: the relationship among the three levels of abstraction (view level, logical level, physical level).]
The physical schema is hidden beneath the logical schema, and can usually be changed easily
without affecting application programs. Application programs are said to exhibit physical
data independence if they do not depend on the physical schema, and thus need not be
rewritten if the physical schema changes.
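Physical data independence can be demonstrated with a small sketch, again assuming Python's built-in sqlite3 module and a hypothetical `customer` table. Adding an index changes how the data is stored and searched, but the application's query text and its result stay the same.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (name TEXT, city TEXT)")
conn.executemany(
    "INSERT INTO customer VALUES (?, ?)",
    [("Raj", "Rajkot"), ("Dhaval", "Surat"), ("Vijay", "Rajkot")],
)

# The application only ever uses this query; it never mentions storage details.
QUERY = "SELECT name FROM customer WHERE city = ? ORDER BY name"


def application_query(city):
    """Return the names of customers in the given city."""
    return [row[0] for row in conn.execute(QUERY, (city,))]


before = application_query("Rajkot")

# Change at the physical level: add an index on the city column.
conn.execute("CREATE INDEX idx_customer_city ON customer (city)")

after = application_query("Rajkot")

print(before == after)  # True: the program did not need to be rewritten
```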
Data Models
A data model is a collection of tools for describing:
Data
Data relationships
Data semantics
Data constraints
The main data models are:
Relational Model
Entity-Relationship Model (for database design)
Object-Based Data Model (object-oriented)
Semistructured Data Model (XML)
Other, older models:
Network Model
Hierarchical Model
1) Relational Data Model
This model uses a collection of tables to represent both data and the relationships
among those data.
Each table has multiple columns, and each column has a unique name.
It is an example of a record-based model.
The database is structured in fixed-format records of several types.
This is the most widely used data model.
Entity / Table (each column is an Attribute / Field; each row is a Tuple / Record):

Name   | Address       | City   | PIN    | Mobile
Raj    | Lakhsminagar  | Rajkot | 360001 | 9898756214
Deepak | Rudapark      | Baroda | 524413 | 9847562399
Vijay  | Kalawad Road  | Ah.bad | 985542 | 9984325678
Dhaval | Punit society | Surat  | 254412 | 9945678835
...

Example of tabular data in the Relational Model
Example:
[Figure: example diagram; legend: Entity, Attribute, Relation, Flow of Relationship]
4) Network Model
This model organizes data using two fundamental constructs, called records and sets.
Records contain fields, and sets define one-to-many relationships between records: one
owner, many members.
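The record and set constructs can be pictured with a minimal in-memory sketch in Python. The class names `Record` and `OwnedSet`, the set name `DEPT_STUDENT`, and the sample values are all hypothetical; this only illustrates the one-owner, many-members idea and is not how any particular network DBMS was implemented.

```python
from dataclasses import dataclass, field


@dataclass
class Record:
    """A record is a collection of named fields."""
    fields: dict


@dataclass
class OwnedSet:
    """A set links one owner record to many member records."""
    name: str
    owner: Record
    members: list = field(default_factory=list)

    def connect(self, member):
        """Attach a member record to this set."""
        self.members.append(member)


# One owner (a department) with many members (its students).
dept = Record({"dept_name": "MCA"})
enrolled = OwnedSet("DEPT_STUDENT", owner=dept)
enrolled.connect(Record({"name": "Raj"}))
enrolled.connect(Record({"name": "Deepak"}))

# Navigational access: start at the owner and walk through its members.
member_names = [member.fields["name"] for member in enrolled.members]
print(enrolled.owner.fields["dept_name"], member_names)
```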
Access to the database was not via SQL query strings, but through a specific set of APIs, typically
for FIND, CREATE, READ, UPDATE and DELETE.
Each API would access only a single table (dataset), so it was not possible to implement a
JOIN that would return data from several tables.
It was not possible to provide a variable WHERE clause. The only selection mechanism
available was
DBMS Architecture
(A) External Level
The users' view of the database.
Consists of a number of different external views of the database.
Each view describes the part of the database relevant to a particular group of users.
Provides a powerful and flexible security mechanism by hiding parts of the database from
certain users. The user is not aware of the existence of any attributes that are
missing from the view.
Permits users to access data in a way that is customized to their needs, so that the
same data can be seen by different users in different ways at the same time.
(B) Conceptual Level
The logical structure of the entire database as seen by the DBA.
What data is stored in the database.
The relationships among the data.
Complete view of the data requirements of the organization, independent of any
storage consideration.
Represents:
entities, attributes, relations
constraints on data
semantic information on data
security, integrity information
Supports each external view: any data available to a user must be contained in, or
derivable from the conceptual level.
(C) Internal Level
Physical representation of the DB on the computer.
How the data is stored in the database.
Physical implementation of the DB to achieve optimal runtime performance and
storage space utilization.
Storage space allocation for data and indexes
Record description for storage
Record placement
Data Compression, encryption
We are now in a position to provide a single picture (see the figure below) of the various
components of a database system and the connections among them.
The architecture of a database system is greatly influenced by the underlying computer system
on which the database system runs. Database systems can be centralized, or client-server,
where one server machine executes work on behalf of multiple client machines. Database
systems can also be designed to exploit parallel computer architectures. Distributed databases
span multiple geographically separated machines.
2) Two-Tier Architecture
Client manages main business and data processing logic and user interface.
Server manages and controls access to database.
3) Three-Tier Architecture
The client side of the two-tier architecture presented two problems preventing true scalability:
A 'fat' client, requiring considerable resources on the client's computer to run effectively.
Significant client-side administration overhead.
By 1995, three layers were proposed, each potentially running on a different platform:
The user interface layer runs on the client.
The business logic and data processing layer (the middle tier) runs on a server (application server).
The DBMS stores the data required by the middle tier. This tier may be on a separate server
(database server).
Advantages:
Thin client, requiring less expensive hardware.
Application maintenance centralized.
Easier to modify or replace one tier without affecting others.
Separating business logic from database functions makes it easier to implement load
balancing.
Maps quite naturally to Web environment.
The storage manager implements several data structures as part of the physical system
implementation:
Components of DBMS
The previous section described the DBMS architecture. The components of that architecture are
the components of a DBMS.
The following are some of the components of a DBMS:
DBMS external interfaces
Database language engines (or processors)
Query optimizer
Database engine
Storage engine
Transaction engine
DBMS management and operation component
Data Languages
A database system provides a data-definition language to specify the database schema and
a data-manipulation language to express database queries and updates.
In practice, the data-definition and data-manipulation languages are not two separate
languages; instead they simply form parts of a single database language, such as the widely
used SQL language.
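The observation that data definition and data manipulation are parts of one language can be seen in a short sketch. It sends both kinds of SQL statement through the same interface, here Python's built-in sqlite3 module; the `student` table and its columns are hypothetical and are not the example from these notes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Data definition (DDL): create a schema object.
conn.execute("CREATE TABLE student (roll_no INTEGER PRIMARY KEY, name TEXT NOT NULL)")

# Data manipulation (DML): insert, update, delete and query the data.
conn.execute("INSERT INTO student VALUES (1, 'Raj')")
conn.execute("INSERT INTO student VALUES (2, 'Vijay')")
conn.execute("UPDATE student SET name = 'Vijay K' WHERE roll_no = 2")
conn.execute("DELETE FROM student WHERE roll_no = 1")
remaining = conn.execute("SELECT roll_no, name FROM student").fetchall()
print(remaining)

# Data definition again: destroy the schema object.
conn.execute("DROP TABLE student")
tables_left = conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table'"
).fetchall()
print(tables_left)
```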
(A) DDL
The Data Definition Language (DDL) is used to create and destroy databases and
database objects. These commands will primarily be used by database administrators
during the setup and removal phases of a database project.
Specific notation for defining the Data schema
Example : Create Table Account ( Acc_No
Dr. E. F. Codd's Rules:
(1) The Information rule: All data should be presented in table form.
(2) The Guaranteed Access rule: All data should be accessible without ambiguity, through a
combination of table name, primary key value and column name.
(3) The Systematic Treatment of Null Values rule: A field should be allowed to remain empty. This
involves the support of null values, which are distinct from an empty string or a number with a
value of zero.
(4) The Dynamic Online Catalog Based on the Relational Model rule: A relational database must
provide access to its structure through the same tools that are used to access the data.
(5) The Comprehensive Data Sublanguage rule: The database must support one clearly defined
language that includes data definition, data manipulation, data integrity and database
transaction control.
(6) The View Updating rule: All views of the data which are theoretically updatable must be
updatable in practice by the DBMS.
(7) The High-level Insert, Update, and Delete rule: The capability of handling a base relation or a
derived relation as a single operand applies not only to the retrieval of data but also to the
insertion, update, and deletion of data.
(8) The Physical Data Independence rule: Application programs and terminal activities remain
logically unimpaired whenever any changes are made in either storage representations or
access methods.
(9) The Logical Data Independence rule: How data is viewed should not change when the
logical structure of the database changes. This rule is particularly difficult to satisfy.
(10) The Integrity Independence rule: Integrity constraints must be definable in the RDBMS itself
(in its data sublanguage and catalog), not in application programs.
(11) The Distribution Independence rule: An RDBMS has distribution independence. Distribution
independence implies that users should not have to be aware of whether a database is
distributed.
(12) The Nonsubversion rule: If the database has any means of handling a single record at a time,
that low-level language must not be able to subvert or avoid the integrity rules which are
expressed in a higher-level language that handles multiple records at a time.
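Rules (3) and (4) lend themselves to a quick check. The sketch below assumes Python's built-in sqlite3 module and a hypothetical `contact` table. It shows that a null is treated differently from an empty string or zero, and that the catalog (in SQLite, the `sqlite_master` table) is queried with the same SELECT statement used for ordinary data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE contact (name TEXT, mobile TEXT, age INTEGER)")
conn.executemany(
    "INSERT INTO contact VALUES (?, ?, ?)",
    [
        ("Raj", None, None),  # unknown mobile and age: stored as NULL
        ("Deepak", "", 0),    # empty string and zero are real values, not NULL
    ],
)

# Rule (3): NULL is distinct from '' and from 0.
missing_mobile = [
    row[0] for row in conn.execute("SELECT name FROM contact WHERE mobile IS NULL")
]
empty_mobile = [
    row[0] for row in conn.execute("SELECT name FROM contact WHERE mobile = ''")
]
zero_age = [row[0] for row in conn.execute("SELECT name FROM contact WHERE age = 0")]

# Rule (4): the database structure is read with an ordinary query.
tables = [
    row[0]
    for row in conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'")
]

print(missing_mobile, empty_mobile, zero_age, tables)
```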
RDBMS