Dbms Units Notes
What is Data?
Raw facts are called data. The word “raw” indicates that they have not been processed.
What is information?
Ex: Suppose I want to find out how much rain falls in a week. Data is what I record: the amount of
rainfall per day.
Information is the knowledge gathered from that data: on average, how much rain fell
during the week.
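The rainfall example can be sketched in a few lines of Python. The daily readings below are made-up values chosen only for illustration, and the function name `weekly_summary` is an assumption, not something from these notes.

```python
# Hypothetical daily rainfall readings in millimetres: this is the raw data.
daily_rainfall_mm = [0.0, 12.5, 3.0, 0.0, 7.5, 20.0, 6.0]


def weekly_summary(readings):
    """Process raw daily readings into information: weekly total and daily average."""
    total = sum(readings)
    average = total / len(readings)
    return total, average


total_mm, average_mm = weekly_summary(daily_rainfall_mm)
print(f"Total: {total_mm} mm, average per day: {average_mm} mm")
```

The list is the data; the two numbers returned by the function are the information derived from it.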
What is Knowledge?
DATA/INFORMATION PROCESSING:
The process of converting data (raw facts) into meaningful information is called
data/information processing.
Data --(when processed)--> Information --(when processed)--> Knowledge
Note: In business processing, knowledge is more useful for making decisions in any organization.
DIFFERENCE BETWEEN DATA AND INFORMATION:
DATA                    INFORMATION
1. Raw facts            1. Processed data
Database -
Database is a collection of interrelated and organized data.
A database is a collection of information that is organized so that it can
be easily accessed, managed and updated.
In general, it is a collection of files (tables).
A database can be of any size and varying complexity.
A database may be generated and manipulated manually or it may be
computerized.
Example:
A customer database consists of fields such as cname, cno, and ccity.
The DBMS is a general-purpose software system that facilitates the process of
defining, constructing, and manipulating databases for various applications.
Goals of DBMS:
The primary goal of a DBMS is to provide a way to store and retrieve database
information that is both convenient and efficient.
Need of DBMS:
1. Before the advent of DBMS, organizations typically stored information using “File
Processing Systems”.
An example of such a system is file handling in high-level languages such as C, BASIC, and
COBOL. These systems have major disadvantages when performing data manipulation, so
DBMSs are now used to overcome those drawbacks.
3. In addition, the database system must ensure the safety of the information stored,
despite system crashes or attempts at unauthorized access. If data are to be shared among
several users, the system must avoid possible anomalous results.
The following are various kinds of applications and organizations that use databases for their
day-to-day business processing activities:
1. Banking: For customer information, accounts, and loans, and banking transactions.
2. Airlines: For reservations and schedule information. Airlines were among the first to
. use
4. Credit Card Transactions: For purchases on credit cards and generation of monthly
statements.
6. Finance: For storing information about holdings, sales, and purchases of financial
instruments such as stocks and bonds.
9. Human resources: For information about employees, salaries, payroll taxes and
benefits, and for generation of paychecks.
11. Web: For accessing bank accounts and checking the balance.
12. E-Commerce: For buying a book or music CD and browsing for items such as watches and
mobiles on the Internet.
The earliest business computer systems were used to process business records and
produce information. They were generally faster and more accurate than equivalent manual
systems. These systems stored groups of records in separate files, and so they were called file
processing systems.
1. A file system is a collection of data. For any data management in the file system, the user has
to write the procedures.
2. A file system exposes the details of data representation and storage.
4. Concurrent access to the data in a file system causes many problems, such as one user reading
a file while another is deleting or updating information in it.
Since files and application programs are created by different programmers over a long
period of time, the files are likely to have different formats and the programs may be written
in several programming languages. Moreover, the same piece of information may be duplicated
in several places. This redundancy leads to higher storage and access costs. In addition, it may
lead to data inconsistency.
Suppose that one of the bank officers needs to find out the names of all customers who
live within a particular postal-code area. The officer asks the data-processing department to
generate such a list. Because there is no application program to generate it, the bank officer
now has two choices: either obtain the list of all customers and extract the needed information
manually, or ask a system programmer to write the necessary application program. Both
alternatives are obviously unsatisfactory.
Conventional file processing environments do not allow needed data to be retrieved
in a convenient and efficient manner. Better data-retrieval systems must be developed for general
use.
Data Isolation:
Since data is scattered in various files, and files may be in different formats, it is difficult
to write new application programs to retrieve the appropriate data.
In order to improve the overall performance of the system and obtain a faster response
time, many systems allow multiple users to update the data simultaneously. In such an
environment, interaction of concurrent updates may result in inconsistent data.
Consider bank account A, containing $500. If two customers withdraw funds (say $50
and $100 respectively) from account A at about the same time, the result of the concurrent
executions may leave the account in an incorrect (or inconsistent) state. Suppose that the
programs executing on behalf of each withdrawal read the old balance, reduce that value by the
amount being withdrawn, and write the result back. If the two programs run concurrently, they
may both read the value $500, and write back $450 and $400, respectively. Depending on which
one writes the value last, the account may contain $450 or $400, rather than the correct value of
$350. To guard against this possibility, the system must maintain some form of supervision. But
supervision is difficult to provide because data may be accessed by many different application
programs that have not been coordinated previously.
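The withdrawal scenario above can be replayed as a small Python sketch. The first function deterministically reproduces the bad schedule (both programs read before either writes); the `Account` class is a hypothetical illustration of one form of supervision, a lock around the read-modify-write step. It is not how any particular DBMS implements concurrency control.

```python
import threading


def unsafe_interleaving(balance, amounts):
    """Replay the problematic schedule: every program reads first, then each writes."""
    reads = [balance for _ in amounts]      # both programs read the old balance
    for old, amount in zip(reads, amounts):
        balance = old - amount              # each write overwrites the previous one
    return balance


class Account:
    """Hypothetical account whose read-modify-write step is protected by a lock."""

    def __init__(self, balance):
        self.balance = balance
        self._lock = threading.Lock()

    def withdraw(self, amount):
        with self._lock:                    # only one withdrawal runs at a time
            current = self.balance
            self.balance = current - amount


lost_update_result = unsafe_interleaving(500, [50, 100])

account = Account(500)
threads = [threading.Thread(target=account.withdraw, args=(a,)) for a in (50, 100)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(lost_update_result)  # 400: the $50 withdrawal was lost
print(account.balance)     # 350: both withdrawals applied
```

With the writes in the other order the unsafe schedule would leave $450 instead; either way one withdrawal is lost.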
Security Problems:
Not every user of the database system should be able to access all the data. For example,
in a banking system, payroll personnel need only that part of the database that has information
about the various bank employees. They do not need access to information about customer accounts.
In a file processing system, it is difficult to enforce such security constraints.
Integrity Problems:
The data values stored in the database must satisfy certain types of consistency constraints.
For example, the balance of a bank account may never fall below a prescribed amount. These
constraints are enforced in the system by adding appropriate code in the various application
programs. When new constraints are added, it is difficult to change the programs to enforce
them. The problem is compounded when constraints involve several data items from different files.
Atomicity Problem:
A computer system, like any other mechanical or electrical device, is subject to failure. In
many applications, it is crucial to ensure that once a failure has occurred and has been detected,
the data are restored to the consistent state that existed prior to the failure.
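As a rough illustration of restoring the prior consistent state, the sketch below uses Python's built-in `sqlite3` module. The funds-transfer scenario, the table name `account`, and the `fail_midway` flag are assumptions made for the example; the simulated crash is just a raised exception.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (acc_no TEXT PRIMARY KEY, balance INTEGER NOT NULL)")
conn.executemany("INSERT INTO account VALUES (?, ?)", [("A", 500), ("B", 200)])
conn.commit()


def transfer(conn, src, dst, amount, fail_midway=False):
    """Move funds from src to dst as one all-or-nothing unit."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE acc_no = ?", (amount, src)
            )
            if fail_midway:
                raise RuntimeError("simulated failure between debit and credit")
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE acc_no = ?", (amount, dst)
            )
    except RuntimeError:
        pass  # failure detected; the partial debit has already been rolled back


def balances(conn):
    """Return the current balances as a dict keyed by account number."""
    return dict(conn.execute("SELECT acc_no, balance FROM account ORDER BY acc_no"))


transfer(conn, "A", "B", 50, fail_midway=True)
after_failure = balances(conn)   # unchanged: the debit did not survive the failure
transfer(conn, "A", "B", 50)
after_success = balances(conn)   # both updates applied together
print(after_failure, after_success)
```

Without the transaction, the failed run would have left account A debited and account B never credited.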
Example:
Consider part of a savings-bank enterprise that keeps information about all customers and
savings accounts. One way to keep the information on a computer is to store it in operating
system files. To allow users to manipulate the information, the system has a number of
application programs that manipulate the files, including:
Data Independence:
A DBMS utilizes a variety of sophisticated techniques to store and retrieve data efficiently.
This feature is especially important if the data is stored on external storage devices.
Data Integrity and Security:
If data is always accessed through the DBMS, the DBMS can enforce integrity
constraints on the data. For example, before inserting salary information for an employee, the
DBMS can check that the department budget is not exceeded. Also, the DBMS can enforce
access controls that govern what data is visible to different classes of users.
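A minimal sketch of a DBMS-enforced integrity constraint, using Python's built-in `sqlite3` module. It uses the minimum-balance rule rather than the department-budget check; the table layout, the `MIN_BALANCE` value, and the `withdraw` helper are assumptions for illustration only.

```python
import sqlite3

MIN_BALANCE = 100  # hypothetical prescribed minimum balance

conn = sqlite3.connect(":memory:")
# The constraint is declared once, in the schema, not in every application program.
conn.execute(
    "CREATE TABLE account ("
    " acc_no TEXT PRIMARY KEY,"
    f" balance INTEGER NOT NULL CHECK (balance >= {MIN_BALANCE}))"
)
conn.execute("INSERT INTO account VALUES ('A', 500)")
conn.commit()


def withdraw(conn, acc_no, amount):
    """Return True if the withdrawal was applied, False if the DBMS rejected it."""
    try:
        with conn:
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE acc_no = ?",
                (amount, acc_no),
            )
        return True
    except sqlite3.IntegrityError:
        return False


ok = withdraw(conn, "A", 300)        # leaves 200, allowed
rejected = withdraw(conn, "A", 150)  # would leave 50, below the minimum
balance = conn.execute("SELECT balance FROM account WHERE acc_no = 'A'").fetchone()[0]
print(ok, rejected, balance)
```

Because every program reaches the data through the DBMS, a new program cannot forget to apply the rule.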
A database system allows several users to access the database concurrently. Answering
different questions from different users with the same (base) data is a central aspect of an
information system. Such concurrent use of data increases the economy of a system.
An example for concurrent use is the travel database of a bigger travel agency. The
employees of different branches can access the database concurrently and book journeys for their
clients. Each travel agent sees on his interface if there are still seats available for a specific
journey or if it is already fully booked.
A DBMS also protects data from failures such as power failures and crashes by means of
recovery schemes such as backup mechanisms and log files.
Data Administration:
When several users share the data, centralizing the administration of data can offer
significant improvements. Experienced professionals, who understand the nature of the data
being managed, and how different groups of users use it, can be responsible for organizing the
data representation to minimize redundancy and fine-tuning the storage of the data to make
retrieval efficient.
A DBMS supports many important functions that are common to many applications
accessing data stored in the DBMS. This, in conjunction with the high-level interface to the data,
facilitates quick development of applications. Such applications are also likely to be more robust
than applications developed from scratch because many important tasks are handled by the
DBMS instead of being implemented by the application.
Abstraction is one of the main features of database systems. Hiding irrelevant details from
users and providing them with an abstract view of the data helps in easy and efficient user-database
interaction. That is, the system hides certain details of how data are stored and maintained.
To understand the view of data, you must have a basic knowledge of data abstraction and of
instance and schema.
1. Data abstraction
2. Instance and schema
Data abstraction means hiding certain details of how the data are stored and maintained. A major
purpose of a database system is to provide users with an “Abstract View” of the data. In a DBMS
there are 3 levels of data abstraction. The goal of abstraction in the DBMS is to separate the
user's request from the physical storage of data in the database.
THREE LEVELS OF DATA ABSTRACTION
Levels of Abstraction:
Physical Level:
The lowest Level of Abstraction describes “How” the data are actually stored.
The physical level describes complex low level data structures in detail.
Logical Level:
This level of data Abstraction describes “What” data are to be stored in the database and
what relationships exist among those data.
View Level:
It is the highest level of data abstraction and describes only part of the entire database.
Different users require different types of data elements from each database.
The system may provide many views of the same database.
Each view describes the part of the database relevant to a particular group of users.
There can be many different views of a database.
E.g. tellers in a bank get a view of customer accounts, but not of payroll data.
Example: Let’s say we are storing customer information in a customer table. At the physical level,
these records can be described as blocks of storage (bytes, gigabytes, terabytes, etc.) in memory.
These details are often hidden from the programmers.
At the logical level, these records can be described as fields and attributes along with their data
types, and the relationships among them can be logically implemented. Programmers
generally work at this level because they are aware of such details of database systems.
At the view level, users just interact with the system through a GUI and enter details on the
screen. They are not aware of how the data is stored or what data is stored; such details are
hidden from them.
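The logical and view levels of the customer example can be sketched with Python's built-in `sqlite3` module. The column names cname, cno and ccity come from the customer example in these notes; the `balance` column, the view name `customer_directory`, and the sample rows are assumptions. The physical level (how SQLite lays out pages on disk or in memory) never appears in the code, which is the point of the abstraction.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Logical level: what data is stored and with which types.
conn.execute(
    "CREATE TABLE customer (cno INTEGER PRIMARY KEY, cname TEXT, ccity TEXT, balance INTEGER)"
)
conn.executemany(
    "INSERT INTO customer VALUES (?, ?, ?, ?)",
    [(1, "Asha", "Pune", 500), (2, "Ravi", "Delhi", 1200)],  # made-up rows
)

# View level: one user group sees only names and cities, not balances.
conn.execute("CREATE VIEW customer_directory AS SELECT cname, ccity FROM customer")

cursor = conn.execute("SELECT * FROM customer_directory ORDER BY cname")
view_columns = [description[0] for description in cursor.description]
view_rows = cursor.fetchall()
print(view_columns, view_rows)
```

A second view with different columns could be defined over the same table for another group of users.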
Schema:
The overall design of the database is called the “Schema” or “Meta Data”. A database
schema corresponds to the programming language type definition.
Schema.
EX:SCHEMA
STUDENT
EX:INSTANCE
STUDENT
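Since a schema corresponds to a type definition in a programming language, the distinction between schema and instance can be sketched in Python. The STUDENT attributes and values below are hypothetical, because the original example tables are not available here.

```python
from dataclasses import dataclass, fields


@dataclass
class Student:
    """Plays the role of the schema: attribute names and types, no data."""
    roll_no: int
    name: str
    city: str


# The attribute list is fixed by the definition above.
schema = [f.name for f in fields(Student)]

# Plays the role of an instance: the data held at one particular moment.
student_instance = [
    Student(1, "Miya", "Pune"),
    Student(2, "Priyanka", "Delhi"),
]

# Adding a record changes the instance; the schema stays the same.
student_instance.append(Student(3, "Arjun", "Chennai"))

print(schema)
print(len(student_instance))
```

The schema changes rarely, while the instance changes with every insert, update, or delete.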
The goal of this architecture is to separate the user applications and the physical
database. In this architecture, schemas can be defined at the following three levels:
1. The internal level has an internal schema, which describes the physical storage structure of
the database. The internal schema uses a physical data model and describes the complete
details of data storage and access paths for the database.
2. The conceptual level has a conceptual schema, which describes the structure of the whole
database for a community of users. The conceptual schema hides the details of physical
storage structures and concentrates on describing entities, data types, relationships, user
operations, and constraints. A high-level data model or an implementation data model can be
used at this level.
3. The external or view level includes a number of external schemas or user views. Each
external schema describes the part of the database that a particular user group is interested in
and hides the rest of the database from that user group. A high-level data model or an
implementation data model can be used at this level.
DATA INDEPENDENCE:
The ability to modify a schema definition at one level without affecting a schema
definition at a higher level is called data independence.
Physical data independence: the ability to modify the physical schema without causing
application programs to be rewritten.
Modifications at this level are usually made to improve performance.
Logical data independence: the ability to modify the conceptual schema without causing
application programs to be rewritten.
This is usually needed when the logical structure of the database is altered.
Logical data independence is harder to achieve as the application programs are usually
heavily dependent on the logical structure of the data.
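A small sketch of the physical kind of independence, using Python's built-in `sqlite3` module: an index is added (a change to physical storage and access paths), and the application query is not rewritten. The table, the index name `idx_customer_city`, and the rows are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (cno INTEGER PRIMARY KEY, cname TEXT, ccity TEXT)")
conn.executemany(
    "INSERT INTO customer VALUES (?, ?, ?)",
    [(1, "Asha", "Pune"), (2, "Ravi", "Delhi"), (3, "Meera", "Pune")],  # made-up rows
)

# The application's query text: it never mentions how the data is stored.
QUERY = "SELECT cname FROM customer WHERE ccity = ? ORDER BY cname"


def customers_in(city):
    """Run the unchanged application query for one city."""
    return [row[0] for row in conn.execute(QUERY, (city,))]


before = customers_in("Pune")

# Physical-level change made to improve performance: add an access path.
conn.execute("CREATE INDEX idx_customer_city ON customer (ccity)")

after = customers_in("Pune")
print(before, after)
```

Renaming or splitting the `customer` table would be a logical-level change, and the query text would then have to change unless a view hid the difference.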
Data models are a collection of conceptual tools for describing data, data relationships, data
semantics and data constraints. There are three different groups:
- Record-based logical models: Like object-based models, they also describe
data at the conceptual and view levels. These models specify the logical structure of the database
with records, fields and attributes.
ER MODEL:
The ER model is a diagrammatic representation of the entire database. It represents a high-level
database design.
Attributes
Entities are represented by means of their properties, called attributes. All attributes have
values. For example, a student entity may have name, class, and age as attributes. Attributes are
represented by means of ellipses. Every ellipse represents one attribute and is directly connected
to its entity (rectangle).
The set of all entities or relationships of the same type is called the entity set or
relationship set.
Example :
Here are the geometric shapes and their meaning in an E-R Diagram –
Rectangle: Represents Entity sets.
Ellipses: Attributes
Diamonds: Relationship Set
Lines: They link attributes to Entity Sets and Entity sets to Relationship Set
Double Ellipses: Multivalued Attributes
Dashed Ellipses: Derived Attributes
Double Rectangles: Weak Entity Sets
Double Lines: Total participation of an entity in a relationship set
A sample E-R Diagram:
Relational Model
The relational model uses a collection of tables to represent both data and the relationships
among those data. Each table has multiple columns, and each column has a unique name.
Each table is a group of columns and rows, where a column represents an attribute of an
entity and a row represents a record.
Sample relational model: Student table with 3 columns and four records.
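The original sample table is not reproduced here, so the following is a stand-in built with Python's built-in `sqlite3` module. The column names and the four records are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Three columns, each with a unique name within the table.
conn.execute(
    "CREATE TABLE student (roll_no INTEGER PRIMARY KEY, name TEXT NOT NULL, age INTEGER)"
)

# Four rows, each one a record describing one student (made-up values).
conn.executemany(
    "INSERT INTO student VALUES (?, ?, ?)",
    [(1, "Miya", 19), (2, "Priyanka", 20), (3, "Arjun", 19), (4, "Kiran", 21)],
)

cursor = conn.execute("SELECT * FROM student ORDER BY roll_no")
column_names = [description[0] for description in cursor.description]
records = cursor.fetchall()

print(column_names)
for record in records:
    print(record)
```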
In the hierarchical model, data elements are linked as an inverted tree structure (root at the top
with branches formed below). Below the single root data element are subordinate elements, each of
which in turn has its own subordinate elements, and so on; the tree can grow to multiple levels.
Data elements have a parent-child relationship, as in a family tree.
For example, in an organization, employees are categorized by their department, and within a
department they are categorized by their job function, such as managers, engineers, technicians
and support staff.
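The organization example can be sketched as a nested Python dictionary, where each key is a data element and its value holds its children. The department names are assumptions; the job functions come from the example above.

```python
# Each element has exactly one parent; the single root is "Organization".
organization = {
    "Organization": {
        "Engineering": {"Managers": {}, "Engineers": {}, "Technicians": {}},
        "Operations": {"Managers": {}, "Support staff": {}},
    }
}


def paths(tree, prefix=()):
    """Yield the single root-to-node path for every element in the tree."""
    for name, children in tree.items():
        path = prefix + (name,)
        yield path
        yield from paths(children, path)


all_paths = list(paths(organization))
for path in all_paths:
    print(" > ".join(path))
```

Every element is reached by exactly one path from the root, which is what distinguishes a hierarchy from a general graph of records.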
Network model
This model is an extension of the hierarchical data model. In this model there is also a
parent-child relationship, but a child data element can have more than one parent element or no
parent at all. The main difference between the network model and the hierarchical model is its
ability to handle many-to-many (n:n) relationships; in other words, it allows a record to have
more than one parent.
An example of the network model is given below, where there are relationships among courses offered
and students enrolled for each course in a college. Each student can be enrolled for several
courses and each course may have a number of students enrolled for it. The students enrolled for
English are Miya and Priyanka, and Miya has taken three courses: English, Math and Science. The
example also shows a child element that has no parent element, that is, a student who has not
taken any course in this semester; this might be a research student.
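The enrollment example can be sketched in Python as a many-to-many mapping. Miya, Priyanka, and the three courses come from the text; the name "Rahul" for the student with no course is an assumption.

```python
from collections import defaultdict

# Each pair links one student record to one course record.
enrollments = [
    ("Miya", "English"),
    ("Miya", "Math"),
    ("Miya", "Science"),
    ("Priyanka", "English"),
]
students = {"Miya", "Priyanka", "Rahul"}  # "Rahul" is a hypothetical unenrolled student

courses_of = defaultdict(set)   # student -> parent course elements
students_in = defaultdict(set)  # course -> child student elements
for student, course in enrollments:
    courses_of[student].add(course)
    students_in[course].add(student)

# A child element with no parent at all.
unenrolled = {s for s in students if s not in courses_of}

print(sorted(students_in["English"]))
print(sorted(courses_of["Miya"]))
print(unenrolled)
```

Miya has three parents (courses), which a strict tree cannot express without duplicating her record.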
Object-oriented models were introduced to overcome the shortcomings of conventional models
such as the relational, hierarchical and network models. An object-oriented database is a collection
of objects whose behavior, state, and relationships are defined in accordance with object-oriented
concepts (such as objects, classes, class hierarchies, etc.).
The following diagram represents an example of an object-oriented database structure. Here class
Vehicle is the root of a class composition hierarchy including classes VehicleSpecs, Company and
Employee. Class Vehicle is also the root of a class hierarchy involving classes TwoWheeler and
FourWheeler. Class Company is, in turn, the root of a class hierarchy with subclasses
DomesticCompany and ForeignCompany. It is also the root of a class composition hierarchy involving
class Employee.
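The structure described above can be sketched with Python classes. The class names follow the text; every attribute (such as `engine_cc` or `model`) and every sample value is an assumption added only to make the sketch runnable.

```python
from dataclasses import dataclass, field


@dataclass
class Employee:
    name: str


@dataclass
class Company:
    """Root of a class hierarchy, and composed of Employee objects."""
    name: str
    employees: list = field(default_factory=list)


class DomesticCompany(Company):
    pass


class ForeignCompany(Company):
    pass


@dataclass
class VehicleSpecs:
    engine_cc: int  # hypothetical attribute


@dataclass
class Vehicle:
    """Root of a composition hierarchy: a vehicle has specs and a manufacturer."""
    model: str
    specs: VehicleSpecs
    manufacturer: Company


class TwoWheeler(Vehicle):
    pass


class FourWheeler(Vehicle):
    pass


maker = DomesticCompany("Acme Motors", [Employee("Asha")])  # made-up values
bike = TwoWheeler("Model X", VehicleSpecs(engine_cc=150), maker)
print(isinstance(bike, Vehicle), bike.manufacturer.employees[0].name)
```

Inheritance (`TwoWheeler` is a `Vehicle`) models the class hierarchy, while attributes that hold other objects (`specs`, `manufacturer`, `employees`) model the class composition hierarchy.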