Unit 4 Part 1
Lesson Structure
7.0 Objectives
7.1 Introduction
7.2 Introduction to Database Design
7.2.1 File System vs. Database
7.2.2 Steps in Database Design
7.2.3 Inputs to Physical Database Design
7.2.4 Guidelines for Database Design
7.3 Design of Database fields
7.4 Design of Physical Records
7.5 Design of Physical Files
7.5.1 Types of Files
7.5.2 File Organization
7.6 Design of Database
7.7 Case Study
7.8 Summary
7.9 Questions for Exercise
7.10 Further Readings
7.0 Objectives
After going through this unit, you will be able to understand the:
• concept of database design;
• advantage of databases over files;
• difference between logical and physical design;
• rules for good database design practices;
• concepts of fields, records and database;
• process to design the fields and records in a database table;
Physical File Design and Data Base Design
7.1 Introduction
A database is a collection of information that is organized so that it can be
easily accessed, managed and updated.
Data is organized into rows, columns and tables, and it is indexed to make
it easier to find relevant information. Data gets updated, expanded and
deleted as new information is added. Databases process workloads to create
and update themselves, querying the data they contain and running
applications against it.
A database is a model of a thing in the real world. Like their physical model
counterparts, data models enable you to get answers about the facts that
make up the objects being modeled. Database design is the craft of relating
things in the real world to data on a computer under the constraints and
affordances of computer technology (read/write; supported data types, storage
and access). Tellingly, popular database data types include Booleans, strings,
numbers and time, but not one explicitly for money. In addition to their use (in
many forms) in what was once known as new media, databases pervade many
parts of the postmodern condition (electronic voting, banking records, known
and unknown government “collection lists”, etc.). The strength and danger
of databases lie in the ability to selectively access and combine entries from
large amounts of structured data. Furthermore, relationships between data
entries can be found (via queries) in a way that reveals additional information.
This is the domain of data mining.
Once the user has finalized the logical system design, the physical design of
the system can begin. Physical database design involves the actual
implementation of the logical database in the DBMS. The physical design
therefore takes the logical design of the system as its input.
The database design involves three levels of design concepts namely:
• Conceptual,
• Logical and
• Physical schemas.
The conceptual model produces a data model which accounts for the relevant
entities and relationships within the target application domain.
The logical model ensures, via normalization procedures and the definition of
integrity rules, that the stored database will be non-redundant and properly
connected.
The physical model specifies how database records are stored, accessed and
related to ensure adequate performance. A good database design supports
efficient storage and retrieval of data.
[Figure: Database with a conceptual schema providing mapping and independence
between external and internal levels.]
File System
As mentioned above, in a typical File System electronic data are stored
directly in a set of files. A file that stores only one table is called a flat
file. Flat files contain the values of each row separated by a special delimiter
such as a comma. To query some arbitrary data, each row must first be parsed
and loaded into an array at run time. For this the file has to be read
sequentially (because files have no control mechanism), which is quite
inefficient and time consuming. The burden of locating the necessary
file, going through the records (line by line), checking for the existence of
certain data and remembering which files/records to edit is on the user. The
user either has to perform each task manually or has to write a script that does
them automatically with the help of the file management capabilities of the
operating system. For these reasons, File Systems are vulnerable to serious
issues such as inconsistency, lack of support for concurrency, data
isolation, threats to integrity and lack of security.
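The sequential scan described above can be sketched in a few lines of Python. This is only an illustration: the file contents, the column names and the helper find_customer are assumptions made for the example, not part of any particular system.

```python
import csv
import io

# Hypothetical flat file: one table, one record per line, comma-delimited.
FLAT_FILE = """customer_id,customer_name,city
1001,Raj,Patna
3002,Abhi,Gaya
"""

def find_customer(flat_file_text, customer_id):
    """Scan the flat file row by row until the matching record is found.

    Returns the matching row (or None) and the number of rows read,
    which shows that the cost grows with the position of the record.
    """
    reader = csv.DictReader(io.StringIO(flat_file_text))
    rows_read = 0
    for row in reader:
        rows_read += 1
        if row["customer_id"] == customer_id:
            return row, rows_read
    return None, rows_read
```

Every lookup starts again from the first line, and a search for a record that does not exist reads the whole file before giving up.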
• Redundancy
• Sharing of information is a cumbersome task
• Slow for huge databases
• Searching process is time consuming
DBMS
A DBMS, sometimes just called a database manager, is a collection of computer
programs dedicated to the management (i.e. organization, storage and
retrieval) of all databases that are installed in a system (i.e. hard drive or
network). There are different types of Database Management Systems,
and some of them are designed for the proper management of
databases configured for specific purposes. The most popular commercial
Database Management Systems are Oracle, DB2 and Microsoft Access. All
these products provide means of allocating different levels of privileges to
different users, making it possible for a DBMS to be controlled centrally by
a single administrator or to be allocated to several different people. There are
four important elements in any Database Management System. They are the
modeling language, data structures, query language and mechanism for
transactions. The modeling language defines the language of each database
hosted in the DBMS. Several approaches, such as hierarchical,
network, relational and object, are currently in practice. Data structures help
organize the data, such as individual records, files, fields and their definitions,
and objects such as visual media. The data query language helps maintain the
database and its security. It monitors login data, the access rights of different
users, and the protocols for adding data to the system. SQL is a popular query
language which is used in Relational Database Management Systems. Finally,
the mechanism that allows for transactions helps concurrency and multiplicity.
That mechanism makes sure that the same record will not be modified by multiple
users at the same time, thus keeping the data integrity intact. Additionally,
DBMSs provide backup and other facilities as well. With all these
facilities in place, a DBMS solves almost all the problems of the File System
mentioned above.
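As a rough illustration of the transaction mechanism, the sketch below uses Python's built-in sqlite3 module. It shows only one aspect, namely that a group of updates either succeeds as a whole or is undone as a whole. The account table, its CHECK rule and the transfer helper are assumptions made for the example; locking between concurrent users is not shown.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account ("
    " id INTEGER PRIMARY KEY,"
    " balance INTEGER NOT NULL CHECK (balance >= 0))"
)
with conn:
    conn.executemany(
        "INSERT INTO account (id, balance) VALUES (?, ?)",
        [(1, 100), (2, 50)],
    )

def transfer(src, dst, amount):
    """Move amount from src to dst as one transaction.

    The connection used as a context manager commits if both updates
    succeed and rolls back both if either one raises an error.
    """
    try:
        with conn:
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE id = ?",
                (amount, dst),
            )
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE id = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        # The CHECK rule rejected a negative balance; the credit is undone too.
        return False

def balances():
    """Return the current balances as a dictionary keyed by account id."""
    return dict(conn.execute("SELECT id, balance FROM account ORDER BY id"))
```

A transfer larger than the source balance fails on the second update, and the first update is rolled back with it, so the data never shows a half-finished transfer.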
Advantages
1. Improved data sharing
The DBMS helps create an environment in which end users have better
access to more and better-managed data. Such access makes it possible for
end users to respond quickly to changes in their environment.
• Although a File System and a DBMS are two ways of managing data,
a DBMS clearly has many advantages over a File System. Typically, when
using a File System, most tasks such as storage, retrieval and search
are done manually, which is quite tedious, whereas a DBMS
provides automated methods to complete these tasks.
• Using a File System leads to problems with data integrity, data
consistency and data security, but these problems can be avoided
by using a DBMS.
• Unlike a File System, a DBMS is efficient because reading line by line
is not required and certain control mechanisms are in place.
Requirement Analysis
In this phase a detailed analysis of the requirements is done. The objective of
this phase is to get a clear understanding of the requirements. It makes use
of various information gathering methods for this purpose. Some of them are:
o Interview
o Analyzing documents
o Survey
o Site visit
o Joint Applications Design (JAD) and Joint Requirements Analysis
(JRA)
o Prototyping
[Figure: Steps in database design. Requirements Analysis (specification of
requirements and results), Conceptual Design (conceptual schema), Logical
Design (logical schema), Physical Design (physical schema).]
While designing database fields, it is required to set the properties of the
fields, which are as follows:
1. Name: A name that uniquely labels the field is used to refer to the attribute
in the DBMS. The name of the attribute in the logical data model and the
name of the field in the physical data model must be the same. For example,
student name in a student table.
2. Data type: It defines the type of data the field is expected to store. This
could be numeric, alphanumeric etc. The data types supported by various
RDBMSs vary to a great extent. For example, employee_name CHAR(25)
indicates that the name of the employee is of character data type; 25 indicates
the maximum size of the data that can be stored in the field. The data type
selected should ensure the following:
• it involves minimum usage of memory and represents all possible values
• it supports all types of data manipulation that are expected from the business
transactions.
3. Size: It indicates the size of the database field. Many RDBMSs support
sizes that are variable. For example, the VARCHAR data type in Oracle.
4. Null or not Null: It specifies whether the field will accept a null value. Not
null constraints applied in the DBMS ensure that null values are not entered into
the respective fields. A null value is a special value distinct from 0 or blank. A
null value indicates that the value is either missing or not yet assigned. We
may specify that customer_name in a customer table is not null. When a
field is declared a primary key, the DBMS automatically ensures that the field
is not null.
5. Domain: It indicates the range of values that are accepted by the field.
For example, Basic_Pay in an employee table can assume any value between
the lowest basic_pay and highest basic_pay existing in the company. In such
cases, the value of the field can be restricted to lie between the highest
and lowest values to avoid entry of a non-existing basic_pay.
6. Default value: It refers to the value that is stored by default in the field.
For example, ship_date in an invoice is most of the time the same as invoice_date
(current date). Assigning a default value to a field saves a lot of
data entry time and reduces the chances of error.
7. Referential integrity: It refers to a set of rules that avoid data inconsistency
and quality problems. Referential integrity ensures that a foreign key value
cannot be entered unless it matches a primary key value in another table.
The RDBMS automatically enforces referential integrity once the database
designer identifies and implements the primary and foreign key relationships.
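The field properties listed above map fairly directly onto a table definition. The following is a minimal sketch using Python's built-in sqlite3 module; the department table, the pay limits of 10000 and 90000, the hire_date column and the try_insert helper are all assumptions made for illustration. Note that SQLite accepts CHAR(25) but does not enforce the length, and it checks foreign keys only after the PRAGMA shown; other RDBMSs behave differently.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when enabled

conn.executescript("""
CREATE TABLE department (
    dept_id   INTEGER PRIMARY KEY,
    dept_name VARCHAR(30) NOT NULL
);
CREATE TABLE employee (
    emp_id        INTEGER PRIMARY KEY,           -- name and data type
    employee_name CHAR(25) NOT NULL,             -- size, not null
    basic_pay     INTEGER NOT NULL
                  CHECK (basic_pay BETWEEN 10000 AND 90000),  -- domain
    hire_date     TEXT DEFAULT CURRENT_DATE,     -- default value
    dept_id       INTEGER NOT NULL
                  REFERENCES department(dept_id) -- referential integrity
);
INSERT INTO department (dept_id, dept_name) VALUES (1, 'Sales');
""")

def try_insert(name, basic_pay, dept_id):
    """Return True if the row is accepted, False if a constraint rejects it."""
    try:
        with conn:
            conn.execute(
                "INSERT INTO employee (employee_name, basic_pay, dept_id)"
                " VALUES (?, ?, ?)",
                (name, basic_pay, dept_id),
            )
        return True
    except sqlite3.IntegrityError:
        return False
```

With this definition a row with a missing name, a basic_pay outside the domain, or a dept_id that does not exist in department is rejected by the DBMS itself, and hire_date is filled in when it is not supplied.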
the starting position of each record, etc. Blocks of data (pages) are normally
read or written by the operating system. A page is the amount
of data read or written in one I/O operation of the operating system.
Blocking factor refers to the number of physical records per page. If the record
size is 1340 bytes and the page size is 2048 bytes, then 708 bytes are wasted
if the DBMS does not allow physical records to span different pages. Selecting a
block size involves a trade-off. In principle, the larger the block size, the fewer
read-write operations need be performed to access a file by the operating
system and therefore the more efficient is the processing.
However, a larger block size requires a correspondingly larger allocation of
buffer space in memory. Since this space is limited (and perhaps shared by many
users), there is, in practice, an upper bound on the block size. Moreover, large
block sizes are primarily advantageous for sequential access.
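The blocking factor arithmetic above can be checked with a short calculation. The function names are illustrative, and the sketch assumes, as in the text, that records may not span pages.

```python
def blocking_factor(page_size, record_size):
    """Number of whole physical records that fit in one page."""
    if record_size <= 0 or record_size > page_size:
        raise ValueError("record must be positive and fit within one page")
    return page_size // record_size

def wasted_bytes(page_size, record_size):
    """Bytes left unused in each page when records cannot span pages."""
    used = blocking_factor(page_size, record_size) * record_size
    return page_size - used
```

For the example in the text, blocking_factor(2048, 1340) is 1 and wasted_bytes(2048, 1340) is 708, whereas a 512-byte record gives a blocking factor of 4 with no wasted space.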
Denormalization is the process of transforming normalized relations into
unnormalized physical record specifications. The motivation behind
denormalization is the poor performance of normalized tables. The following
may be of use for denormalization.
• Combine two entities with a one-to-one relationship into one entity. This avoids
the cost of joining two tables when data are required from both tables.
• Another form of denormalization is to repeat a non-key attribute (field)
of one table in another table to make queries execute faster.
However, it depends on the application at hand.
Customer_ID  Customer_Name  Address   City   State  Zip    Order_ID  Order_Date
1001         Raj            Flat 102  Patna  Bihar  8004   0342      04/01/2012
3002         Abhi           Flat 202  Gaya   Bihar  80011  0441      09/02/2012
Figure 7: Depicts denormalization for optimized query processing
In a particular application it is seen that queries about orders also require the
customer_name. With normalized tables, this would require joining the
Customer table and the Order table each time the query is processed. We have
modified the Order table by adding the customer_name from the
Customer table back into it. Now all such queries require only the Order table,
as all the relevant information is available in this table.
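The effect of this denormalization can be sketched with Python's built-in sqlite3 module. The table names orders and orders_denorm and the reduced set of columns are assumptions made for the example; the data values are taken from Figure 7.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Normalized design: customer_name lives only in the customer table.
CREATE TABLE customer (
    customer_id   INTEGER PRIMARY KEY,
    customer_name TEXT NOT NULL
);
CREATE TABLE orders (
    order_id    TEXT PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customer(customer_id),
    order_date  TEXT NOT NULL
);
-- Denormalized design: customer_name is repeated in the order table.
CREATE TABLE orders_denorm (
    order_id      TEXT PRIMARY KEY,
    customer_id   INTEGER NOT NULL,
    customer_name TEXT NOT NULL,
    order_date    TEXT NOT NULL
);
INSERT INTO customer VALUES (1001, 'Raj'), (3002, 'Abhi');
INSERT INTO orders VALUES ('0342', 1001, '04/01/2012'),
                          ('0441', 3002, '09/02/2012');
INSERT INTO orders_denorm VALUES ('0342', 1001, 'Raj', '04/01/2012'),
                                 ('0441', 3002, 'Abhi', '09/02/2012');
""")

# Normalized: every query about orders needs a join to get the name.
joined = conn.execute(
    "SELECT o.order_id, c.customer_name, o.order_date"
    " FROM orders o JOIN customer c ON c.customer_id = o.customer_id"
    " ORDER BY o.order_id"
).fetchall()

# Denormalized: the same answer comes from a single table.
single = conn.execute(
    "SELECT order_id, customer_name, order_date"
    " FROM orders_denorm ORDER BY order_id"
).fetchall()
```

Both queries return the same rows. The price of the simpler query is that customer_name is now stored in two places, so a change of name has to be applied to both.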
Activities to enhance performance
1. Combining tables to avoid joins
2. Horizontal partitioning refers to placing different rows of a table into
separate files. For example, in an order table, orders pertaining to different
regions can be kept in separate tables for efficient retrieval of records.
3. Vertical partitioning refers to placing different columns of a table into
separate files by repeating the primary key in each of the files.
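The two kinds of partitioning can be sketched with plain Python lists standing in for files. The sample orders, the region column and the helper names are assumptions made for the example.

```python
from collections import defaultdict

# Hypothetical order table, one dictionary per row.
ORDERS = [
    {"order_id": 1, "region": "East", "customer": "Raj", "amount": 250},
    {"order_id": 2, "region": "West", "customer": "Abhi", "amount": 400},
    {"order_id": 3, "region": "East", "customer": "Abhi", "amount": 120},
]

def horizontal_partition(rows, key):
    """Place different rows into separate partitions, one per value of key."""
    parts = defaultdict(list)
    for row in rows:
        parts[row[key]].append(row)
    return dict(parts)

def vertical_partition(rows, primary_key, column_groups):
    """Place different columns into separate partitions.

    The primary key is repeated in every partition so that the original
    rows can be put back together.
    """
    return [
        [{col: row[col] for col in [primary_key, *cols]} for row in rows]
        for cols in column_groups
    ]
```

Calling horizontal_partition(ORDERS, "region") gives one list of rows per region, while vertical_partition(ORDERS, "order_id", [["region"], ["customer", "amount"]]) gives two narrower tables that both carry order_id.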
for legal reasons or to perform trend analysis). Archive files will contain all
information about the past dealings of the business and would normally be
stored on a different site to facilitate recovery in case of a disaster such as fire.
4. Audit file
An Audit file is a file that does not store business data but data related to
the transaction log. For example, the date and time of access, modification etc.
of data, the values of fields before and after modification etc.
5. Work file
A Work file is a file created temporarily to hold intermediate results of data
processing. For example, a sorted file of a list of customers.
7.5.2 File Organization
The physical organization of records on the disk is known as file organization.
The different types of file organization, depending on how records are organized
on the disk and other secondary storage, are as follows:
1. Serial file organization.
2. Sequential file organization.
3. Indexed sequential file organization.
4. Hashed file organization.
Before deciding on a specific file organization, we should ensure that its
application leads to the following:
• Fast retrieval of records
• Reduced disk access time
• Efficient use of disk space.
Serial file organization
A serial file is created by placing each record in the file as it is created. It
leaves no gap between the records that are stored on the disk. The utilization of
space, called packing density, approaches 100 percent in this case. Examples of
serial files are print files, dump files, log files and transaction files. These
files are created once and are not used for addition, deletion or any kind of
record searching operation.
Sequential file organization
In this organization, the records are physically ordered by primary key. To
locate a particular record, the program starts searching from the beginning
of the file until the matching primary key is found. An alphabetical list of
customers is a common example of sequential file organization. Deletion of a
record may cause wastage of space, and adding a new record requires rewriting
the file. This type of file organization is suitable for master files and is not
used where fast response time is required.
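The difference between the serial and sequential organizations described above can be sketched with in-memory lists standing in for disk files. The class and method names are assumptions made for the example.

```python
import bisect

class SerialFile:
    """Records are stored in arrival order, with no gaps between them."""

    def __init__(self):
        self.records = []

    def append(self, record):
        self.records.append(record)

class SequentialFile:
    """Records are kept physically ordered by primary key."""

    def __init__(self):
        self.records = []  # list of (key, data) pairs, sorted by key

    def insert(self, key, data):
        # Inserting in key order shifts every later record, which is the
        # in-memory counterpart of rewriting the file.
        bisect.insort(self.records, (key, data))

    def find(self, key):
        """Search from the beginning; return the data and the records read."""
        reads = 0
        for k, data in self.records:
            reads += 1
            if k == key:
                return data, reads
            if k > key:
                break  # keys are ordered, so the record cannot appear later
        return None, reads
```

The serial file accepts records as fast as they arrive but offers no help in finding one, while the sequential file pays for ordering on every insert and still reads from the start on every search.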
A Schema / Relation
Conversion from ER Model to Relational Model
Basic Ideas:
• Build a table for each entity set
• Build a table for each relationship set if necessary (more on this later)
• Make a column in the table for each attribute in the entity set
• Indivisibility Rule and Ordering Rule
• Primary Key
[Figure: ER model to relational model conversion example. Diagram labels: SID,
NAME, SSN, NAME, STUDENT, TEACHES, DEPT, MAJOR, GPA. Resulting tables:
Professor and Student.]
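The basic ideas above can be sketched as table definitions using Python's built-in sqlite3 module. The assignment of attributes to entities (SID, NAME, MAJOR and GPA to Student; SSN, NAME and DEPT to Professor), the data types and the separate Teaches table are assumptions read off the figure labels, not a definitive conversion.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- One table per entity set, one column per attribute, with a primary key.
CREATE TABLE Student (
    SID   INTEGER PRIMARY KEY,
    NAME  TEXT NOT NULL,
    MAJOR TEXT,
    GPA   REAL
);
CREATE TABLE Professor (
    SSN  TEXT PRIMARY KEY,
    NAME TEXT NOT NULL,
    DEPT TEXT
);
-- One table for the relationship set, built from the keys of both sides.
CREATE TABLE Teaches (
    SSN TEXT    NOT NULL REFERENCES Professor(SSN),
    SID INTEGER NOT NULL REFERENCES Student(SID),
    PRIMARY KEY (SSN, SID)
);
""")

def columns(table):
    """Return the column names of a table in definition order."""
    return [row[1] for row in conn.execute(f"PRAGMA table_info({table})")]
```

Each column holds a single, indivisible value, and the relationship table contains nothing but the primary keys of the entities it connects.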
The aim of database design should be to create a database plan to fit present
and future objectives. A logical data model should be used as the blueprint
for designing and creating a physical database. However, the physical database
cannot be created properly with a simple logical-to-physical mapping.
7.8 Summary
Files are used for keeping data. A database management system (DBMS)
is a computer application program designed for the efficient and effective
storage, access and update of large volumes of information. DBMSs have
become the standard way to store data for today’s information systems due to
their various advantages. Relational databases are mostly used unless the
application has specific requirements. The physical database design process
can be considered as a mapping from the logical model to a physical working
database, which involves design of fields, design of records and finally design
of the database. While transforming the logical model to the physical model, many
implementation issues related to the information system and target DBMS
are to be addressed. Database volume estimation is an important part of
database design. The present size and future growth of the database are to be
estimated before implementing the database. The transformation of the E-R model
to the relational database model is also discussed in this unit.