
Client-server Computing Using Oracle

Chapter 1 introduces the concepts of data, information, and knowledge, emphasizing their interrelationship and importance in decision-making. It discusses the limitations of file processing systems, such as data duplication and inflexibility, and contrasts them with the advantages of database management systems (DBMS) that provide a unified data repository. The chapter outlines the components of DBMS, including hardware, software, users, and procedures, highlighting their roles in managing and accessing data effectively.

Uploaded by

amrit deep kaur

Chapter 1

Database – An Introduction

Data, Information and Knowledge

1.1 Data

Data represents unorganized and unprocessed facts. Usually data is static in nature. It can
represent a set of discrete facts about events. Data is a prerequisite to information. An
organization sometimes has to decide on the nature and volume of data that is required for
creating the necessary information.

1.2 Information

Information can be considered as an aggregation of data (processed data) that makes decision
making easier. Information usually has some meaning and purpose.

Fig 1.2 Information

1.3 Knowledge

By knowledge we mean human understanding of a subject matter that has been acquired
through proper study and experience. Knowledge is usually based on learning, thinking, and
proper understanding of the problem area. Knowledge is not information and information is not
data. Knowledge is derived from information in the same way information is derived from data.
We can view it as an understanding of information based on its perceived importance or
relevance to a problem area. It can be considered as the integration of human perceptive
processes that help people draw meaningful conclusions.
Figure 1.3: Data, Information, Knowledge and Wisdom

Let’s summarize some key points:


• Data constitute the building blocks of information.
• Information is produced by processing data.
• Information is used to reveal the meaning of data.
• Accurate, relevant, and timely information is the key to good decision making.
• Good decision making is the key to organizational survival in a global environment.

1.4 Difference between data and information

1.5 Historical Roots: File and File Systems


File processing systems were an early attempt to computerize the manual filing system that we are
all familiar with. A file system is a method for storing and organizing computer files and the data
they contain to make it easy to find and access them. File systems may use a storage device such
as a hard disk or CD-ROM and involve maintaining the physical location of the files.

In our own home, we probably have some sort of filing system, which contains receipts,
guarantees, invoices, bank statements, and such like. When we need to look something up, we go
to the filing system and search through the system starting from the first entry until we find what
we want. Alternatively, we may have an indexing system that helps to locate what we want more
quickly. For example, we may have divisions in the filing system or separate folders for different
types of items that are in some way logically related.

The manual filing system works well when the number of items to be stored is small. It even
works quite adequately when there are large numbers of items and we have only to store and
retrieve them. However, the manual filing system breaks down when we have to cross-reference
or process the information in the files. For example, a typical real estate agent's office might have
a separate file for each property for sale or rent, each potential buyer and renter, and each
member of staff.

Clearly the manual system is inadequate for this type of work. The file-based system was
developed in response to the needs of industry for more efficient data access. In early processing
systems, an organization's information was stored as groups of records in separate files.

In the traditional approach, information was stored in flat files, which are maintained by the
file system under the operating system's control. Here, flat files are files containing records
that have no structured relationship among them. The file handling that we learn in C/C++ is
an example of a file processing system. Application programs written in languages such as C/C++
go through the file system to access these flat files, as shown.

Fig 1.5 File storage system for University management system

1.5.1 Characteristics of File Processing System

Some important characteristics of a file processing system are listed below:

• It is a group of files storing the data of an organization.

• Each file is independent of the others.

• Each file is called a flat file.

• Each file contained and processed information for one specific function, such as accounting or
inventory.

• Files are designed by using programs written in programming languages such as COBOL, C,
C++.

• The physical implementation and access procedures are written into the application programs;
therefore, physical changes resulted in intensive rework on the part of the programmer.

• As systems became more complex, file processing systems offered little flexibility, presented
many limitations, and were difficult to maintain.

1.5.2 Limitations of the File Processing System / File-Based Approach

The following problems are associated with the file-based approach:
1. Separated and Isolated Data: To make a decision, a user might need data from two separate
files. First, the files were evaluated by analysts and programmers to determine the specific data
required from each file and the relationships between the data and then applications could be
written in a programming language to process and extract the needed data. Imagine the work
involved if data from several files was needed.

2. Duplication of data: Often the same information is stored in more than one file. Uncontrolled
duplication of data is undesirable for several reasons, such as:

• Duplication is wasteful. It costs time and money to enter the data more than once.

• It takes up additional storage space, again with associated costs.

• Duplication can lead to loss of data integrity; in other words, the data is no longer consistent.
For example, consider the duplication of data between the Payroll and Personnel departments. If
a member of staff moves to a new house and the change of address is communicated only to
Personnel and not to Payroll, the person's pay slip will be sent to the wrong address. A more
serious problem occurs if an employee is promoted with an associated increase in salary. Again,
the change is notified to Personnel but does not filter through to Payroll. Now, the
employee is receiving the wrong salary. When this error is detected, it will take time and effort to
resolve. Both these examples illustrate inconsistencies that may result from the duplication of
data. As there is no automatic way for Personnel to update the data in the Payroll files, it is
not difficult to foresee such inconsistencies arising. Even if Payroll is notified of the changes, it is
possible that the data will be entered incorrectly.

3. Data Dependence: In file processing systems, files and records were described by specific
physical formats that were coded into the application program by programmers. If the format of a
certain record was changed, the code in each program that used that format had to be updated.
Furthermore, instructions for data storage and access were written into the application's code.
Therefore, changes in storage structure or access methods could greatly affect the processing or
results of an application.

In other words, in the file-based approach application programs are data dependent. This means
that a change in the physical representation of the data (how it is physically represented on disk)
or in the access technique (how it is physically accessed) also affects the application programs,
which then need modification. Application programs depend on how the data is physically stored
and accessed.
For example, if the physical format of the master/transaction file is changed by modifying the
delimiter of a field or record, the application programs that depend on it must be modified.

Let us consider a student file, where information about students is stored in a text file and each
field is separated by a blank space as shown below:

1 Rahat 35 Thapar

Now, if the delimiter of the field changes from a blank space to a semicolon as shown below:

1; Rahat; 35; Thapar

Then the application programs using this file must be modified, because they must now split the
fields on the semicolon, whereas earlier they split them on the blank space.
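
The dependence described above can be sketched in a few lines of Python. This is only an illustration: the function name `parse_student` and the field names (roll number, name, age, institute) are assumptions, since the text does not say what each field means.

```python
# Hypothetical parser for the student file shown above.
# The field meanings (roll_no, name, age, institute) are assumed for illustration.

def parse_student(line, delimiter=" "):
    """Split one student record into its four fields."""
    fields = [field.strip() for field in line.split(delimiter)]
    roll_no, name, age, institute = fields
    return int(roll_no), name, int(age), institute


old_record = "1 Rahat 35 Thapar"
new_record = "1; Rahat; 35; Thapar"

# The program as originally written expects blank-space delimiters.
print(parse_student(old_record))

# After the file format changes, the program works only once it is modified
# to split on the semicolon.
print(parse_student(new_record, delimiter=";"))

# The unmodified program cannot read the new format.
try:
    parse_student(new_record)
except ValueError as error:
    print("unchanged program fails:", error)
```

Because the delimiter is knowledge held by the program rather than by the data store, every program that reads the file has to be changed in the same way.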

4. Difficulty in representing data from the user's view: To create useful applications for the
user, often data from various files must be combined. In file processing it was difficult to
determine relationships between isolated data in order to meet user requirements.

5. Data Inflexibility: Program-data interdependency and data isolation limited the flexibility of
file processing systems in providing users with ad-hoc information requests.

6. Incompatible file formats: As the structure of files is embedded in the application programs,
the structures are dependent on the application programming language. For example, the
structure of a file generated by a COBOL program may be different from the structure of a file
generated by a 'C' program. The direct incompatibility of such files makes them difficult to
process jointly.

7. Data Security: The security of data is low in a file-based system because the data maintained
in flat files is easily accessible. For example, consider a banking system. The Customer
Transaction file has details about the total available balance of all customers. A customer wants
information about his account balance. In a file system it is difficult to give the customer access
to only his own data in the file. Thus, enforcing security constraints for the entire file or for
certain data items is difficult.

8. Transactional Problems: The file-based approach does not satisfy the transaction properties of
Atomicity, Consistency, Isolation and Durability, commonly known as the ACID properties.

For example, suppose that in a banking system a transaction transfers Rs. 1000 from account A
to account B, with the initial values of A and B being Rs. 5000 and Rs. 10000 respectively. If a
system crash occurs after the withdrawal of Rs. 1000 from account A but before the deposit of the
amount in account B, the system is left in an inconsistent state. A transaction should therefore
execute wholly or not at all, never partially. This concept is known as the atomicity of a
transaction (either 0% or 100% of the transaction). It is difficult to achieve this property in a
file-based system.
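
For contrast, here is a minimal sketch of how a DBMS can provide atomicity for the transfer described above. It uses Python's built-in `sqlite3` module as a stand-in for a DBMS; the table name `account`, the helper functions, and the simulated crash are assumptions made for illustration only.

```python
import sqlite3


def transfer(conn, source, target, amount, crash_midway=False):
    """Move `amount` from `source` to `target` as one atomic transaction."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE name = ?",
                (amount, source),
            )
            if crash_midway:
                # Stand-in for a system crash between withdrawal and deposit.
                raise RuntimeError("simulated crash after the withdrawal")
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE name = ?",
                (amount, target),
            )
    except RuntimeError:
        pass  # the withdrawal has already been rolled back


def balances(conn):
    """Return the current balances as a dictionary."""
    return dict(conn.execute("SELECT name, balance FROM account ORDER BY name"))


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (name TEXT PRIMARY KEY, balance INTEGER)")
conn.executemany("INSERT INTO account VALUES (?, ?)", [("A", 5000), ("B", 10000)])
conn.commit()

transfer(conn, "A", "B", 1000, crash_midway=True)
print(balances(conn))  # the failed transfer leaves both balances unchanged

transfer(conn, "A", "B", 1000)
print(balances(conn))  # the completed transfer updates both balances
```

The interrupted transfer is undone as a whole, so the database never shows money withdrawn from A without being deposited in B.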

9. Concurrency Problems: When multiple users access the same piece of data at the same time,
this is called concurrency. When two or more users read the data simultaneously there is no
problem, but when they try to update a file simultaneously, it may result in a problem.

For example:

Let us consider a scenario in which, in transaction T1, a user transfers an amount of Rs. 1000 from
account A to account B (the initial value of A is Rs. 5000 and of B is Rs. 8000). Meanwhile,
another transaction, T2, which displays the sum of accounts A and B, is also executed. If both
transactions run in parallel, the result may be inconsistent, as shown below:

The above schedule produces an inconsistent result: it shows Rs. 12,000 as the sum of accounts
A and B instead of Rs. 13,000. The problem occurs because the concurrently running
transaction T2 reads A and B at an intermediate point, after the withdrawal from A but before the
deposit into B, and computes the sum from those values.
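
The effect of the interleaving can be reproduced with a small simulation. This is a sketch under stated assumptions: the step names and the `run_schedule` function are invented for illustration, and the "transactions" are plain Python steps applied in a chosen order, not real concurrent processes.

```python
def run_schedule(schedule):
    """Apply the named steps in order and return (displayed sum, final accounts)."""
    accounts = {"A": 5000, "B": 8000}
    displayed = {}

    def t1_withdraw():
        accounts["A"] -= 1000  # T1: take Rs. 1000 out of A

    def t1_deposit():
        accounts["B"] += 1000  # T1: put Rs. 1000 into B

    def t2_sum():
        displayed["sum"] = accounts["A"] + accounts["B"]  # T2: display A + B

    steps = {"t1_withdraw": t1_withdraw, "t1_deposit": t1_deposit, "t2_sum": t2_sum}
    for name in schedule:
        steps[name]()
    return displayed["sum"], accounts


# T2 runs after T1 has finished: the sum is consistent.
SERIAL = ["t1_withdraw", "t1_deposit", "t2_sum"]
# T2 runs between T1's withdrawal and deposit: the sum is inconsistent.
INTERLEAVED = ["t1_withdraw", "t2_sum", "t1_deposit"]

print(run_schedule(SERIAL)[0])       # 13000
print(run_schedule(INTERLEAVED)[0])  # 12000
```

In both orders the final balances are the same; only the value seen by T2 differs, which is why the problem is easy to miss without concurrency control.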

10. Poor data modeling of the real world: The file-based system is not able to represent
complex data and interfile relationships, which results in poor data modeling.

1.6 Database System

The problems inherent in file systems make using a database system very desirable. Unlike the
file system, with its many separate and unrelated files, the database system consists of logically
related data stored in a single logical data repository. (The “logical” label reflects the fact that
the data repository appears to be a single unit to the end user, even though data might be
physically distributed among multiple storage facilities and locations.) Because the database’s
data repository is a single logical unit, the database represents a major change in the way end-
user data are stored, accessed, and managed. The database’s DBMS, shown in Figure 1.6,
provides numerous advantages over file system management, by making it possible to eliminate
most of the file system’s data inconsistency, data anomaly, data dependence, and structural
dependence problems. Better yet, the current generation of DBMS software stores not only the
data structures, but also the relationships between those structures and the access paths to those
structures—all in a central location. The current generation of DBMS software also takes care of
defining, storing, and managing all required access paths to those components.

Fig 1.6 Database System vs File system

1.6.1 Database Management System

A database management system is the software system that allows users to define, create and
maintain a database and provides controlled access to the data.

A Database Management System (DBMS) is basically a collection of programs that enables
users to store, modify, and extract information from a database as per their requirements. The
DBMS is an intermediate layer between programs and the data. Programs access the DBMS, which
then accesses the data. There are different types of DBMS, ranging from small systems that run on
personal computers to huge systems that run on mainframes. The following are some common
examples of database applications:
• Computerized library systems

• Automated teller machines

• Flight reservation systems

• Computerized parts inventory systems

Some DBMS examples include MySQL, PostgreSQL, Microsoft Access, SQL Server,
FileMaker, Oracle, dBASE, Clipper, and FoxPro.

1.6.2 Components of DBMS

Let’s take a closer look at the components of a DBMS:

• Hardware. Hardware refers to all of the system’s physical devices, including computers (PCs,
workstations, servers, and supercomputers), storage devices, printers, network devices (hubs,
switches, routers, fiber optics), and other devices (automated teller machines, ID readers, and so
on).
• Software. Although the most readily identified software is the DBMS itself, three types of
software are needed to make the database system function fully: operating system software,
DBMS software, and application programs and utilities.
-- Operating system software manages all hardware components and makes it possible for all
other software to run on the computers. Examples of operating system software include
Microsoft Windows, Linux, Mac OS, UNIX, and MVS.
-- DBMS software manages the database within the database system. Some examples of DBMS
software include Microsoft’s SQL Server, Oracle Corporation’s Oracle and MySQL, and
IBM’s DB2.
-- Application programs and utility software are used to access and manipulate data in the DBMS
and to manage the computer environment in which data access and manipulation take place.
Application programs are most commonly used to access data within the database to generate
reports, tabulations, and other information to facilitate decision making. Utilities are the software
tools used to help manage the database system’s computer components. For example, all of the
major DBMS vendors now provide graphical user interfaces (GUIs) to help create database
structures, control database access, and monitor database operations.
• People. This component includes all users of the database system. On the basis of primary job
functions, five types of users can be identified in a database system: system administrators,
database administrators, database designers, system analysts and programmers, and end users.
Each user type, described below, performs both unique and complementary functions.

-- System administrators oversee the database system’s general operations.


-- Database administrators, also known as DBAs, manage the DBMS and ensure that the
database is functioning properly.
-- Database designers design the database structure. They are, in effect, the database architects. If
the database design is poor, even the best application programmers and the most dedicated DBAs
cannot produce a useful database environment. Because organizations strive to optimize their
data resources, the database designer’s job description has expanded to cover new dimensions
and growing responsibilities.
-- System analysts and programmers design and implement the application programs. They
design and create the data-entry screens, reports, and procedures through which end users access
and manipulate the database’s data.
-- End users are the people who use the application programs to run the organization’s daily
operations. For example, sales clerks, supervisors, managers, and directors are all classified as
end users. High-level end users employ the information obtained from the database to make
tactical and strategic business decisions.

Database users can also be classified by how they interact with the system:

1. Sophisticated Users - These users write SQL queries to select, insert, delete, or update
data. They do not use application programs to send requests to the database; they
interact with it directly by means of a query language like SQL. These users are
typically scientists, engineers, and analysts who have studied SQL and DBMS concepts
thoroughly and apply them to their own requirements.
2. Specialized Users - These are also sophisticated users, but they write special database
application programs. They are the developers who build complex programs to meet
specific requirements.
3. Stand-alone Users - These users have a stand-alone database for their personal
use. Such databases usually come as ready-made packages with menus and graphical
interfaces.
4. Naive Users - These are the users who use existing applications to interact with the
database. For example, online library systems, ticket booking systems, and ATMs
provide existing applications that users use to interact with the database and fulfill their
requests.

• Procedures. Procedures are the instructions and rules that govern the design and use of the
database system.
Procedures are a critical, although occasionally forgotten, component of the system. Procedures
play an important role in a company because they enforce the standards by which business is
conducted within the organization and with customers. Procedures also help to ensure that
companies have an organized way to monitor and audit the data that enter the database and the
information generated from those data.
• Data. The word data covers the collection of facts stored in the database. Because data are the
raw material from which information is generated, determining what data to enter into the
database and how to organize those data is a vital part of the database designer’s job.

1.6.3 Advantages of DBMS


The database management system has a number of advantages compared to the traditional
computer file-based processing approach. The DBA must keep these benefits and
capabilities in mind while designing databases and monitoring the DBMS.
The main advantages of DBMS are described below.

Controlling Data Redundancy


In non-database systems each application program has its own private files. In this case,
duplicate copies of the same data are created in many places. In a DBMS, all data of an
organization is integrated into a single database. Each data item is ideally recorded in only one
place in the database, so uncontrolled duplication is avoided.

Sharing of Data
In a DBMS, data can be shared by authorized users of the organization. The database administrator
manages the data and gives users rights to access it. Many users can be authorized to
access the same piece of information simultaneously. Remote users can also share the same data.
Similarly, the data of the same database can be shared between different application programs.

Data Consistency
By controlling the data redundancy, the data consistency is obtained. If a data item appears only
once, any update to its value has to be performed only once and the updated value is immediately
available to all users. If the DBMS has controlled redundancy, the database system enforces
consistency.

Integration of Data
In a database management system, data is stored in tables. A single database contains
multiple tables, and relationships can be created between tables (or associated data entities). This
makes it easy to retrieve and update data.

Integrity Constraints
Integrity constraints or consistency rules can be applied to the database so that only correct data
can be entered into it. The constraints may be applied to data items within a single record, or
they may be applied to relationships between records.
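
As a rough illustration of both kinds of constraint, the sketch below uses Python's built-in `sqlite3` module. The tables `department` and `employee`, their columns, and the sample rows are all hypothetical. The `CHECK` clause is a constraint on a data item within a single record, and the `REFERENCES` clause is a constraint on the relationship between records.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite enforces foreign keys only when this setting is switched on.
conn.execute("PRAGMA foreign_keys = ON")
conn.execute("CREATE TABLE department (dept_id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute(
    """
    CREATE TABLE employee (
        emp_id  INTEGER PRIMARY KEY,
        name    TEXT NOT NULL,
        salary  INTEGER CHECK (salary > 0),
        dept_id INTEGER NOT NULL REFERENCES department(dept_id)
    )
    """
)
conn.execute("INSERT INTO department VALUES (1, 'Payroll')")
conn.execute("INSERT INTO employee VALUES (1, 'Rahat', 30000, 1)")


def try_insert(row):
    """Attempt to insert an employee row and report whether the DBMS accepted it."""
    try:
        conn.execute("INSERT INTO employee VALUES (?, ?, ?, ?)", row)
        return "accepted"
    except sqlite3.IntegrityError as error:
        return f"rejected: {error}"


print(try_insert((2, "Asha", -5, 1)))      # violates the CHECK on salary
print(try_insert((3, "Asha", 25000, 99)))  # refers to a department that does not exist
print(try_insert((4, "Asha", 25000, 1)))   # satisfies both constraints
```

The rules live in the database definition, so every application that inserts data is held to them without repeating the checks in its own code.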

Creating Forms
A form is a very important object in a DBMS. Forms can be created easily and quickly.
Once a form is created, it can be used many times and it can be modified very easily.
The created forms are also saved along with the database and behave like software components. A
form provides a very easy (user-friendly) way to enter data into the database, edit data, and display
data from the database. Non-technical users can also perform various operations on the database
through forms without going into the technical details of the database.

Report Writers
Most DBMSs provide report writer tools used to create reports. Users can create reports
easily and quickly. Once a report is created, it can be used many times and it can be modified
very easily. The created reports are also saved along with the database and behave like software
components.

Control Over Concurrency


In a computer file-based system, if two users are allowed to access data simultaneously, it is
possible that they will interfere with each other. For example, if both users attempt to perform
update operation on the same record, then one may overwrite the values recorded by the other.
Most database management systems have sub-systems to control the concurrency so that
transactions are always recorded with accuracy.

Backup and Recovery Procedures


In a computer file-based system, the user creates backups of data regularly to protect the
valuable data from damage due to failures of the computer system or application program. This is
a very time-consuming method if the amount of data is large. Most DBMSs provide 'backup
and recovery' sub-systems that automatically create backups of data and restore data if
required.

Data Independence
The separation of the data structure of the database from the application programs that use the
data is called data independence. In a DBMS, you can change the structure of the database without
modifying the application programs.

1.6.4 Disadvantages of DBMS
Although the DBMS has many advantages, it also has some disadvantages.
These are:
1. Cost of Hardware & Software:
A high-speed processor and a large memory are required to run the DBMS software. This means
that the hardware used for the file-based system may have to be upgraded.
Similarly, DBMS software can also be very costly.
2. Cost of Data Conversion:
When a computer file-based system is replaced with a database system, the data stored in data
files must be converted to database files. Converting the data is a difficult and time-consuming
process. You have to hire a DBA (or database designer) and a system designer
along with application programmers; alternatively, you have to take the services of a
software house. So a lot of money has to be paid for developing the database and related software.
3. Cost of Staff Training:
Most DBMSs are complex systems, so users need training to use them.
Training is required at all levels, including programming, application development, and database
administration. The organization has to spend a large amount on training staff to run the
DBMS.
4. Appointing Technical Staff:
Trained technical persons such as database administrators and application programmers
are required to handle the DBMS. These persons have to be paid high salaries.
Therefore, the system cost increases.
5. Database Failures:
In most organizations, all data is integrated into a single database. If the database is corrupted
due to a power failure or a fault in the storage media, valuable data may be lost or
the whole system may stop.

1.7 Three Level Architecture of DBMS

The logical architecture describes how data in the database is perceived by users. It is not
concerned with how the data is handled and processed by the DBMS, but only with how it looks.
The method of data storage on the underlying file system is not revealed, and the users can
manipulate the data without worrying about where it is located or how it is actually stored. This
results in the database having different levels of abstraction.

The External or View Level:

The external or view level is the highest level of abstraction of database. It provides a window on
the conceptual view, which allows the user to see only the data of interest to them. The user can
be either an application program or an end user. There can be many external views, as any
number of external schemas can be defined, and they can overlap each other. The external level
consists of the definition of logical records and relationships in the external view. It also contains
the method of deriving the objects such as entities, attributes and relationships in the external
view from the conceptual view.

The Conceptual Level or Global Level:

The conceptual level presents a logical view of the entire database as a unified whole. It allows
the user to bring all the data in the database together and see it in a consistent manner. Hence,
there is only one conceptual schema per database. The first stage in the design of a database is to
define the conceptual view, and a DBMS provides a data definition language for this purpose. It
describes all the records and relationships included in the database. The data definition language
used to create the conceptual level must not specify any physical storage considerations; those
should be handled by the physical level. The conceptual level does not provide any storage or
access details, but defines the information content only.

The Internal or Physical Level:

The collection of files permanently stored on secondary storage devices is known as the physical
database. The physical or internal level is the one closest to the physical storage, and it provides
a low-level description of the physical database, and an interface between the operating system
file system and the record structures used in higher levels of abstraction. It is at this level that record
types and methods of storage are defined, as well as how stored fields are represented, what
physical sequence the stored records are in, and what other physical structures exist.
Fig 1.7 Architecture of DBMS
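
As a rough illustration of the external and conceptual levels described above, the sketch below uses Python's built-in `sqlite3` module. The table `student`, its columns, and the two views are hypothetical. The table plays the role of the conceptual schema, and each view is one external schema that exposes only the data of interest to one group of users.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Conceptual level: one logical description of all the student data.
conn.execute(
    "CREATE TABLE student ("
    "roll_no INTEGER PRIMARY KEY, name TEXT, age INTEGER, fees_due INTEGER)"
)
conn.executemany(
    "INSERT INTO student VALUES (?, ?, ?, ?)",
    [(1, "Rahat", 35, 0), (2, "Asha", 22, 1500)],
)

# External level: two overlapping views defined over the same conceptual schema.
conn.execute("CREATE VIEW library_view AS SELECT roll_no, name FROM student")
conn.execute("CREATE VIEW accounts_view AS SELECT roll_no, name, fees_due FROM student")


def columns(view_name):
    """Return the column names visible through the given view."""
    cursor = conn.execute(f"SELECT * FROM {view_name}")
    return [description[0] for description in cursor.description]


print(columns("library_view"))   # ['roll_no', 'name']
print(columns("accounts_view"))  # ['roll_no', 'name', 'fees_due']
```

How the rows are laid out on disk (the internal level) never appears in this code; that is handled entirely by the DBMS.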

1.7.1 Mapping Between Views

We know that the three view levels are described by means of three schemas. These schemas are
stored in the data dictionary. In a DBMS, each user refers only to his or her own external schema.
Hence, the DBMS must transform a request on a specified external schema into a request against
the conceptual schema, and then into a request against the internal schema, to store and retrieve
data to and from the database.

The process of transforming a request (from the external level) and its result between view levels
is called mapping. The mapping defines the correspondence between the three view levels. The
mapping description is also stored in the data dictionary. The DBMS is responsible for mapping
between these three types of schemas. There are two types of mapping.

(i) External-Conceptual mapping (ii) Conceptual-Internal mapping


External-Conceptual Mapping
An external-conceptual mapping defines the correspondence between a particular external view
and the conceptual view. The external-conceptual mapping tells the DBMS which objects on the
conceptual level correspond to the objects requested on a particular user's external view. If
changes are made to either an external view or conceptual view, then mapping must be changed
accordingly.

Conceptual-Internal Mapping
The conceptual-internal mapping defines the correspondence between the conceptual view and
the internal view, i.e. the database stored on the physical storage device. It describes how conceptual
records are stored on and retrieved from the storage device. This means that the conceptual-internal
mapping tells the DBMS how the conceptual records are physically represented. If
the structure of the stored database is changed, then the mapping must be changed accordingly. It
is the responsibility of the DBA to manage such changes.
Fig 1.7 Mapping Between views

1.7.2 Database Schema


A database schema is the skeleton structure that represents the logical view of the entire
database. It defines how the data is organized and how the relationships among the data are
represented. It formulates all the constraints that are to be applied to the data.

A database schema defines its entities and the relationship among them. It contains a descriptive
detail of the database, which can be depicted by means of schema diagrams. It’s the database
designers who design the schema to help programmers understand the database and make it
useful.

Subschema

A subschema is a subset of the schema and inherits the same properties that a schema has. The
plan (or scheme) for a view is often called a subschema. A subschema refers to an application
programmer's (user's) view of the data item types and record types that he or she uses. It gives
users a window through which they can view only that part of the database which is of
interest to them. Therefore, different application programs can have different views of the data.

Data Independence
A major objective for three-level architecture is to provide data independence, which means that
upper levels are unaffected by changes in lower levels.

There are two kinds of data independence:


Logical Data Independence

Logical data independence indicates that the conceptual schema can be changed without
affecting the existing external schemas. The change would be absorbed by the mapping between
the external and conceptual levels. Logical data independence also insulates application
programs from operations such as combining two records into one or splitting an existing record
into two or more records. This would require a change in the external/conceptual mapping so as
to leave the external view unchanged.
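
A minimal sketch of this idea, assuming a view plays the role of the external schema: one record type is split into two, only the view definition (the external/conceptual mapping) is rewritten, and the application's query is left untouched. Table and view names are hypothetical.

```python
import sqlite3

# The application only ever runs this query against its external view.
APP_QUERY = "SELECT name, city FROM student_view ORDER BY name"

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE student (s_id INTEGER PRIMARY KEY, name TEXT, city TEXT);
    INSERT INTO student VALUES (401, 'Adam', 'Noida'), (402, 'Alex', 'Panipat');
    CREATE VIEW student_view AS SELECT s_id, name, city FROM student;
""")
before = conn.execute(APP_QUERY).fetchall()

# Conceptual-level change: split one record type into two.
conn.executescript("""
    DROP VIEW student_view;
    CREATE TABLE student_name (s_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE student_addr (s_id INTEGER PRIMARY KEY, city TEXT);
    INSERT INTO student_name SELECT s_id, name FROM student;
    INSERT INTO student_addr SELECT s_id, city FROM student;
    DROP TABLE student;
    -- Only the mapping (the view definition) is rewritten.
    CREATE VIEW student_view AS
        SELECT n.s_id, n.name, a.city
        FROM student_name AS n JOIN student_addr AS a ON a.s_id = n.s_id;
""")
after = conn.execute(APP_QUERY).fetchall()
```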

Physical Data Independence

Physical data independence indicates that the physical storage structures or devices could be
changed without affecting conceptual schema. The change would be absorbed by the mapping
between the conceptual and internal levels. Physical data independence is achieved by the
presence of the internal level of the database and the mapping or transformation from the
conceptual level of the database to the internal level. Conceptual level to internal level mapping,
therefore, provides a means to go from the conceptual view (conceptual records) to the internal
view and hence to the stored data in the database (physical records).
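
One way to picture this, as a sketch only: adding an index changes how the data is reached on storage, yet the same query returns the same answer. The customer table and index name are hypothetical; sqlite3 is Python's built-in SQLite module.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (cust_id INTEGER PRIMARY KEY, last_name TEXT)")
conn.executemany(
    "INSERT INTO customer VALUES (?, ?)",
    [(1, "Smith"), (2, "Dinamita"), (3, "Strickland")],
)

QUERY = "SELECT cust_id FROM customer WHERE last_name = ?"
result_before = conn.execute(QUERY, ("Smith",)).fetchall()

# Physical-level change: a new access path is added; the table definition
# seen by the query (the conceptual level) is unchanged.
conn.execute("CREATE INDEX idx_customer_last_name ON customer (last_name)")
result_after = conn.execute(QUERY, ("Smith",)).fetchall()
```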
Difference between physical and logical data independence

1.8 Functions of Data Base Administrator

The Data Base Administrator (DBA) is the person or group in charge of implementing the DBMS in
an organization. The Database Administrator's job requires a high degree of technical expertise
and the ability to understand and interpret management requirements at a senior level. In practice
the DBA may consist of a team of people rather than just one person.

• Makes decisions concerning the content of the database: It is the DBA's job to decide exactly
what information is to be held in the database; in other words, to identify the entities of interest
to the enterprise and to identify the information to be recorded about those entities.

• Plans storage structures and access strategies: The DBA must also decide how the data is to
be represented in the database, and must specify the representation by writing the storage
structure definition (using the internal data definition language).

In addition, the associated mapping between the storage structure definition and the conceptual
schema must also be specified.

• Provides support to users: It is the responsibility of the DBA to provide support to the users,
to ensure that the data they require is available, and to write the necessary external schemas
(using the appropriate external data definition language).

In addition, the mapping between any given external schema and the conceptual schema must
also be specified.

• Defines security and integrity checks: The DBA is responsible for providing authorization and
authentication checks so that no malicious user can access the database and it remains
protected. The DBA must also ensure the integrity of the database.

• Defines backup and recovery strategies: In the event of damage to any portion of the
database (caused by human error, say, or a failure in the hardware or supporting operating
system), it is essential to be able to repair the data concerned with a minimum of delay and with
as little effect as possible on the rest of the system.

The DBA must define and implement an appropriate recovery strategy to recover the database
from all types of failures.

• Monitors performance and responds to changes in requirements: The DBA is responsible for
organizing the system so as to get the performance that is "best for the enterprise," and for
making the appropriate adjustments as requirements change.
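
To make the backup and recovery duty concrete, here is a small sketch using SQLite through Python's sqlite3 module (Connection.backup is available from Python 3.7). It is only an illustration under that assumption; a production DBMS such as Oracle has its own backup tooling, and the account table is hypothetical.

```python
import sqlite3

# A small "production" database.
source = sqlite3.connect(":memory:")
source.execute("CREATE TABLE account (acc_no INTEGER PRIMARY KEY, balance REAL)")
source.execute("INSERT INTO account VALUES (1, 500.0)")
source.commit()

# Backup: copy the whole database into a second connection.
backup = sqlite3.connect(":memory:")
source.backup(backup)

# Simulated damage caused by human error.
source.execute("DELETE FROM account")
source.commit()
damaged = source.execute("SELECT COUNT(*) FROM account").fetchone()[0]

# Recovery: restore the saved copy into a fresh database.
recovered = sqlite3.connect(":memory:")
backup.backup(recovered)
restored = recovered.execute("SELECT acc_no, balance FROM account").fetchall()
```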
Chapter 2

A database model is a type of data model that determines the logical structure of a database and
fundamentally determines the manner in which data can be stored, organized, and manipulated.

Data Model Basic Building Blocks


The basic building blocks of all data models are entities, attributes, relationships, and constraints.
An entity is anything (a person, a place, a thing, or an event) about which data are to be collected
and stored. An entity represents a particular type of object in the real world. Because an entity
represents a particular type of object, entities are “distinguishable” that is, each entity occurrence
is unique and distinct. For example, a CUSTOMER entity would have many distinguishable
customer occurrences, such as John Smith, Pedro Dinamita, Tom Strickland, etc.

An attribute is a characteristic of an entity. For example, a CUSTOMER entity would be
described by attributes such as customer last name, customer first name, customer phone,
customer address, and customer credit limit. Attributes are the equivalent of fields in file
systems.

A relationship describes an association among entities. For example, a relationship exists
between customers and agents that can be described as follows: an agent can serve many
customers, and each customer may be served by one agent. Data models use three types of
relationships: one-to-many, many-to-many, and one-to-one. Database designers usually use the
shorthand notations 1:M or 1..*, M:N or *..*, and 1:1 or 1..1, respectively. (Although the M:N
notation is a standard label for the many-to-many relationship, the label M:M may also be used.)
The following examples illustrate the distinctions among the three.

One-to-many (1:M or 1..*) relationship. A painter paints many different paintings, but each
one of them is painted by only one painter. Thus, the painter (the “one”) is related to the
paintings (the “many”). Therefore, database designers label the relationship “PAINTER paints
PAINTING” as 1:M. (Note that entity names are often capitalized as a convention so they are
easily identified.) Similarly, a customer (the “one”) may generate many invoices, but each
invoice (the “many”) is generated by only a single customer. The “CUSTOMER generates
INVOICE” relationship would also be labeled 1:M.

Many-to-many (M:N or *..*) relationship. An employee may learn many job skills, and each
job skill may be learned by many employees. Database designers label the relationship
“EMPLOYEE learns SKILL” as M:N. Similarly, a student can take many classes and each class
can be taken by many students, thus yielding the M:N relationship label for the relationship
expressed by “STUDENT takes CLASS.”

One-to-one (1:1 or 1..1) relationship. A retail company’s management structure may require
that each of its stores be managed by a single employee. In turn, each store manager, who is an
employee, manages only a single store. Therefore, the relationship “EMPLOYEE manages
STORE” is labeled 1:1.
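
The three relationship types can be sketched as table definitions. This is one common way to implement them, not the only one; column names and the sample rows are hypothetical, and SQLite (via Python's sqlite3 module) stands in for a full RDBMS.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    -- 1:M  PAINTER paints PAINTING: the "many" side carries the foreign key.
    CREATE TABLE painter  (painter_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE painting (
        painting_id INTEGER PRIMARY KEY,
        title       TEXT,
        painter_id  INTEGER NOT NULL REFERENCES painter(painter_id)
    );

    -- M:N  EMPLOYEE learns SKILL: a linking table holds one row per pair.
    CREATE TABLE employee (emp_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE skill    (skill_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE employee_skill (
        emp_id   INTEGER REFERENCES employee(emp_id),
        skill_id INTEGER REFERENCES skill(skill_id),
        PRIMARY KEY (emp_id, skill_id)
    );

    -- 1:1  EMPLOYEE manages STORE: UNIQUE limits each manager to one store.
    CREATE TABLE store (
        store_id   INTEGER PRIMARY KEY,
        manager_id INTEGER NOT NULL UNIQUE REFERENCES employee(emp_id)
    );
""")

conn.execute("INSERT INTO employee VALUES (1, 'Meera')")
conn.execute("INSERT INTO store VALUES (10, 1)")
try:
    # A second store for the same manager would break the 1:1 rule.
    conn.execute("INSERT INTO store VALUES (11, 1)")
    second_store_allowed = True
except sqlite3.IntegrityError:
    second_store_allowed = False
```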

Types of Data Model

2.1 Hierarchical Model

The hierarchical model was developed in the 1960s to manage large amounts of data for
complex manufacturing projects such as the Apollo rocket that landed on the moon in 1969. Its
basic logical structure is represented by an upside-down tree. The hierarchical structure contains
levels, or segments. A segment is the equivalent of a file system’s record type. Within the
hierarchy, the top layer (the root) is perceived as the parent of the segment directly beneath it.
For example, in Figure 2.1, the root segment is the parent of the Level 1 segments, which, in
turn, are the parents of the Level 2 segments, etc. The segments below other segments are the
children of the segment above. In short, the hierarchical model depicts a set of one-to-many
(1:M) relationships between a parent and its children segments.
(Each parent can have many children, but each child has only one parent.)
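
A minimal in-memory sketch of this structure follows. The Segment class and the segment names are hypothetical; the point is that each child holds exactly one parent reference and that reaching a segment means following the hierarchical path from the root.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(eq=False)
class Segment:
    name: str
    parent: Optional["Segment"] = None            # at most one parent
    children: List["Segment"] = field(default_factory=list)

    def add_child(self, name: str) -> "Segment":
        child = Segment(name, parent=self)
        self.children.append(child)
        return child

    def path(self) -> List[str]:
        """Hierarchical path from the root down to this segment."""
        node, names = self, []
        while node is not None:
            names.append(node.name)
            node = node.parent
        return names[::-1]


root = Segment("Final assembly")
component_a = root.add_child("Component A")
component_b = root.add_child("Component B")
part_a1 = component_a.add_child("Part A1")
```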

Advantages of Hierarchical Model

1. It promotes data sharing.
2. Parent/Child relationship promotes conceptual simplicity.
3. Database security is provided and enforced by DBMS.
4. Parent/Child relationship promotes data integrity.
5. It is efficient with 1:M relationships.

Disadvantages

1. Complex implementation requires knowledge of physical data storage characteristics.
2. Navigational system yields complex application development, management, and use; requires
knowledge of hierarchical path.
3. Changes in structure require changes in all application programs.
4. There are implementation limitations (no multiparent or M:N relationships).
5. There is no data definition or data manipulation language in the DBMS.
6. There is a lack of standards.

2.2 Network Data Model

In the network model, entities are organized in a graph, in which some entities can be accessed
through several paths. In the network model, the user perceives the network database as a
collection of records in 1:M relationships. However, unlike the hierarchical model, the network
model allows a record to have more than one parent. In network database terminology, a
relationship is called a set. Each set is composed of at least two record types: an owner record
and a member record.

Fig 2.2 Network Data Model
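
The owner/member idea can be sketched with plain dictionaries. The NetworkDB class, the set names GENERATES and SELLS, and the sample records are all hypothetical; the sketch only shows that one member record can belong to sets with different owners.

```python
from collections import defaultdict


class NetworkDB:
    def __init__(self):
        # set name -> owner record -> list of member records
        self.sets = defaultdict(lambda: defaultdict(list))

    def connect(self, set_name, owner, member):
        self.sets[set_name][owner].append(member)

    def members(self, set_name, owner):
        return list(self.sets.get(set_name, {}).get(owner, []))

    def owners_of(self, member):
        """Every (set, owner) pair that this member record participates in."""
        return sorted(
            (set_name, owner)
            for set_name, by_owner in self.sets.items()
            for owner, members in by_owner.items()
            if member in members
        )


db = NetworkDB()
db.connect("GENERATES", "customer Smith", "invoice 1001")
db.connect("GENERATES", "customer Smith", "invoice 1002")
db.connect("SELLS", "agent 7", "invoice 1001")

invoice_owners = db.owners_of("invoice 1001")
```

Here invoice 1001 has two parents, something the hierarchical model does not allow.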

Advantages

1. Conceptual simplicity is at least equal to that of the hierarchical model.
2. It handles more relationship types, such as M:N and multiparent.
3. Data access is more flexible than in hierarchical and file system models.
4. Data Owner/Member relationship promotes data integrity.
5. There is conformance to standards.
6. It includes data definition language (DDL) and data manipulation language (DML) in DBMS.

Disadvantages

1. System complexity limits efficiency; it is still a navigational system.
2. Navigational system yields complex implementation, application development, and
management.
3. Structural changes require changes in all application programs.
2.3 The Relational Model

The relational model was introduced in 1970 by E. F. Codd (of IBM) in his landmark paper “A
Relational Model of Data for Large Shared Data Banks” (Communications of the ACM, June 1970,
pp. 377−387). The relational model represented a major breakthrough for both users and
designers. Its conceptual simplicity set the stage for a genuine database revolution.
The relational model foundation is a mathematical concept known as a relation. To avoid the
complexity of abstract mathematical theory, you can think of a relation (sometimes called a
table) as a matrix composed of intersecting rows and columns. Each row in a relation is called a
tuple. Each column represents an attribute. The relational model also describes a precise set of
data manipulation constructs based on advanced mathematical concepts.
Examples of RDBMSs include Oracle, DB2, Microsoft SQL Server and MySQL.
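
As a rough sketch of "a relation is a set of tuples", the snippet below models a small CUSTOMER relation in plain Python and defines two of the classic operations, selection and projection. The attribute names and values are hypothetical.

```python
from typing import NamedTuple


class Customer(NamedTuple):
    cust_id: int
    last_name: str
    credit_limit: int


# A relation: a set of tuples that all share the same attributes.
customer = {
    Customer(1, "Smith", 500),
    Customer(2, "Dinamita", 1500),
    Customer(3, "Strickland", 1500),
}


def select(relation, predicate):
    """Keep only the tuples (rows) that satisfy the predicate."""
    return {row for row in relation if predicate(row)}


def project(relation, *attributes):
    """Keep only the named attributes (columns); duplicates collapse."""
    return {tuple(getattr(row, a) for a in attributes) for row in relation}


high_credit = select(customer, lambda row: row.credit_limit > 1000)
limits = project(customer, "credit_limit")
```

Because the result of each operation is again a set of tuples, operations can be chained, which is what makes the model easy to reason about.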

Arguably the most important advantage of the RDBMS is its ability to hide the complexities of
the relational model from the user. The RDBMS manages all of the physical details, while the
user sees the relational database as a collection of tables in which data are stored. The user can
manipulate and query the data in a way that seems intuitive and logical. Tables are related to each
other through the sharing of a common attribute (value in a column). For example, the
CUSTOMER table in Figure 2.3 might contain a sales agent’s number that is also contained in
the AGENT table.

Similarly, a relationship between a Teacher table and a Classes table can be created through a
shared teacherID column. The following figure shows this concept.
Fig Linkage between tables in RDBMS
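
The linkage can be tried out with a short sketch using Python's sqlite3 module. Only the table names and the teacherID column come from the text; the other columns and the sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE teacher (teacherID INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE classes (
        classID   INTEGER PRIMARY KEY,
        title     TEXT,
        teacherID INTEGER REFERENCES teacher(teacherID)  -- the common attribute
    );
    INSERT INTO teacher VALUES (1, 'Kaur'), (2, 'Sharma');
    INSERT INTO classes VALUES
        (101, 'Databases', 1), (102, 'Networks', 2), (103, 'SQL Lab', 1);
""")

# The join matches rows whose teacherID values are equal in both tables.
class_teachers = conn.execute("""
    SELECT c.title, t.name
    FROM classes AS c JOIN teacher AS t ON t.teacherID = c.teacherID
    ORDER BY c.classID
""").fetchall()
```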

The Advantages of a Relational Database Management System

Data Structure
 The table format is simple and easy for database users to understand and use. RDBMSs provide
data access using a natural structure and organization of the data. Database queries can search
any column for matching entries.

Multi-User Access
RDBMSs allow multiple database users to access a database simultaneously. Built-in locking
and transaction management functionality allows users to access data while it is being changed,
prevents collisions between two users updating the same data, and keeps users from accessing
partially updated records.
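
The "no partially updated records" guarantee can be sketched with a transaction that is rolled back when one of its steps fails. The account table, the CHECK rule, and the transfer function are hypothetical; sqlite3 is used only because it ships with Python.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account ("
    " acc_no INTEGER PRIMARY KEY,"
    " balance INTEGER CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO account VALUES (?, ?)", [(1, 100), (2, 50)])
conn.commit()


def transfer(conn, src, dst, amount):
    """Move money in one transaction: both updates happen, or neither does."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE acc_no = ?",
                (amount, dst),
            )
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE acc_no = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        return False


first_ok = transfer(conn, 1, 2, 30)     # fits within the balance
second_ok = transfer(conn, 1, 2, 500)   # would overdraw account 1
balances = conn.execute(
    "SELECT acc_no, balance FROM account ORDER BY acc_no"
).fetchall()
```

The failed transfer leaves no trace: the credit to account 2 is undone together with the rejected debit.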

Privileges
 Authorization and privilege control features in an RDBMS allow the database administrator to
restrict access to authorized users, and grant privileges to individual users based on the types of
database tasks they need to perform. Authorization can be defined based on the remote client IP
address in combination with user authorization, restricting access to specific external computer
systems.

Network Access
 RDBMSs provide access to the database through a server daemon, a specialized software
program that listens for requests on a network, and allows database clients to connect to and use
the database. Users do not need to be able to log in to the physical computer system to use the
database, providing convenience for the users and a layer of security for the database. Network
access allows developers to build desktop tools and Web applications to interact with databases.

Speed
The relational database model is not the fastest data structure, but RDBMS advantages such as
simplicity make the slower speed a fair trade-off. Optimizations built into an RDBMS, and the
design of the databases, enhance performance, allowing RDBMSs to perform more than fast
enough for most applications and data sets. Improvements in technology, increasing processor
speeds, and decreasing memory and storage costs allow systems administrators to build
fast systems that can offset many database performance shortcomings.

Maintenance
 RDBMSs feature maintenance utilities that provide database administrators with tools to easily
maintain, test, repair and back up the databases housed in the system. Many of the functions can
be automated using built-in automation in the RDBMS, or automation tools available on the
operating system.

Language
 RDBMSs support a generic language called "Structured Query Language" (SQL). The SQL
syntax is simple, and the language uses standard English language keywords and phrasing,
making it fairly intuitive and easy to learn. Many RDBMSs add non-SQL, database-specific
keywords, functions and features to the SQL language.

Disadvantages of RDBMS

Cost of software/hardware and migration: A significant disadvantage of a DBMS is cost. In
addition to the cost of purchasing or developing the software, the hardware has to be upgraded to allow
for the extensive programs and work spaces required for their execution and storage. The processing
overhead introduced by the DBMS to implement security, integrity, and sharing of the data degrades
response time and throughput. An additional cost is that of migration from a
traditionally separate application environment to an integrated one.

Problem associated with centralization: While centralization reduces duplication, the lack of duplication
requires that the database be adequately backed up so that in the case of failure the data can be
recovered. Centralization also means that the data is accessible from a single source. This increases the
potential severity of security breaches and disruption of the operation of the organization because of
downtimes and failures. The replacement of a monolithic centralized database by a federation of
independent and cooperating distributed databases resolves some of the problems resulting from
failures and downtimes.

2.4 Difference between DBMS & RDBMS

Chapter 3 Normalization
Normalization is a process for evaluating and correcting table structures to minimize data
redundancies, thereby reducing the likelihood of data anomalies. The normalization process involves
assigning attributes to tables.
Normalization is used mainly for two purposes:

• Eliminating redundant (duplicated) data.

• Ensuring that data dependencies make sense, i.e. that data is logically stored.

Problem without Normalization


Without normalization, it becomes difficult to handle and update the database without facing data
loss. Insertion, update and deletion anomalies are very frequent if a database is not normalized. To
understand these anomalies, let us take the example of a Student table.

S_id   S_Name   S_Address   Subject_opted
401    Adam     Noida       Bio
402    Alex     Panipat     Maths
403    Stuart   Jammu       Maths
404    Adam     Noida       Physics

• Update anomaly: To update the address of a student who occurs twice or more in the table, we
have to update the S_Address column in all of those rows, or else the data becomes inconsistent.

• Insertion anomaly: Suppose that for a new admission we have the student id (S_id), name and
address of a student, but the student has not opted for any subject yet. We then have to insert
NULL in the Subject_opted column, which leads to an insertion anomaly.

• Deletion anomaly: If student 401 has only one subject and temporarily drops it, then when we
delete that row the entire student record is deleted along with it.
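
The update and deletion anomalies can be reproduced with a short sketch over the same Student rows. The new address "Delhi" is a hypothetical value, and student 402 is used for the deletion so that the effect is unambiguous.

```python
students = [
    {"S_id": 401, "S_Name": "Adam", "S_Address": "Noida", "Subject_opted": "Bio"},
    {"S_id": 402, "S_Name": "Alex", "S_Address": "Panipat", "Subject_opted": "Maths"},
    {"S_id": 403, "S_Name": "Stuart", "S_Address": "Jammu", "Subject_opted": "Maths"},
    {"S_id": 404, "S_Name": "Adam", "S_Address": "Noida", "Subject_opted": "Physics"},
]


def addresses_of(rows, name):
    """All distinct addresses stored for one student name."""
    return {row["S_Address"] for row in rows if row["S_Name"] == name}


# Update anomaly: a careless update changes only the first matching row.
updated = [dict(row) for row in students]
for row in updated:
    if row["S_Name"] == "Adam":
        row["S_Address"] = "Delhi"   # hypothetical new address
        break                        # the second Adam row is never touched
adam_addresses = addresses_of(updated, "Adam")

# Deletion anomaly: dropping Alex's only subject removes Alex entirely.
after_drop = [row for row in students if row["S_id"] != 402]
alex_still_known = any(row["S_Name"] == "Alex" for row in after_drop)
```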

Normalization Steps
The normal forms (NFs) are discussed one by one below.

3.1 First NF (1NF)

As per First Normal Form, a table must not contain repeating groups of information: each column
of each row must hold a single (atomic) value rather than a list of values. Each table should be
organized into rows, and each row should have a primary key that distinguishes it as unique.

The primary key is usually a single column, but sometimes more than one column can be combined
to create a single primary key. For example, consider a table which is not in First Normal Form:

Student Table :

Student   Age   Subject
Adam      15    Biology, Maths
Alex      14    Maths
Stuart    17    Maths

In First Normal Form, no row may have a column in which more than one value is saved, for example
as a comma-separated list. Instead, we must separate such data into multiple rows.

Student Table following 1NF will be :

Student   Age   Subject
Adam      15    Biology
Adam      15    Maths
Alex      14    Maths
Stuart    17    Maths

Moving to First Normal Form increases data redundancy, because the same values (here, Adam's
age) are repeated in several rows, but each row as a whole is unique.
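
The conversion to 1NF shown above can be sketched in a few lines of Python. The function name to_1nf is hypothetical; the rows are the ones from the Student table.

```python
unnormalized = [
    ("Adam", 15, "Biology, Maths"),
    ("Alex", 14, "Maths"),
    ("Stuart", 17, "Maths"),
]


def to_1nf(rows):
    """Emit one row per (student, subject) so every column holds one value."""
    return [
        (student, age, subject.strip())
        for student, age, subjects in rows
        for subject in subjects.split(",")
    ]


student_1nf = to_1nf(unnormalized)
```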

Second Normal form (2NF)

As per the Second Normal Form there must not be any partial dependency of any column on primary
key. It means that for a table that has concatenated primary key, each column in the table that is not
part of the primary key must depend upon the entire concatenated key for its existence. If any
column depends only on one part of the concatenated key, then the table fails Second normal
form.

In the First Normal Form example there are two rows for Adam, to include the multiple subjects
that he has opted for. While this is searchable and follows First Normal Form, it is an inefficient
use of space. Also, in the above table in First Normal Form, the candidate key is {Student,
Subject}, but the Age of a student depends only on the Student column. This partial dependency
violates Second Normal Form. To achieve Second Normal Form, it helps to split the subjects out
into an independent table and match them up using the student names as foreign keys.

New Student Table following 2NF will be :

Student   Age
Adam      15
Alex      14
Stuart    17

In the Student Table the candidate key is the Student column, because the only other column,
Age, is dependent on it.

New Subject Table introduced for 2NF will be :

Student   Subject
Adam      Biology
Adam      Maths
Alex      Maths
Stuart    Maths

In the Subject Table the candidate key is {Student, Subject}. Now both of the above tables qualify
for Second Normal Form, and the update anomalies caused by partial dependencies are removed.
There are, however, a few more complex cases in which a table in Second Normal Form still suffers
from update anomalies; Third Normal Form handles those scenarios.
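
The 2NF split can be sketched as two projections of the 1NF table, followed by a join on Student to confirm that no information was lost. The variable names are hypothetical; the data is the table from the 1NF example.

```python
student_1nf = [
    ("Adam", 15, "Biology"),
    ("Adam", 15, "Maths"),
    ("Alex", 14, "Maths"),
    ("Stuart", 17, "Maths"),
]

# Student table: Age depends on Student alone, so it is stored once per student.
student_table = sorted({(student, age) for student, age, _ in student_1nf})

# Subject table: the key is the whole pair {Student, Subject}.
subject_table = sorted({(student, subject) for student, _, subject in student_1nf})

# Joining the two tables on Student rebuilds the original rows.
rejoined = sorted(
    (student, age, subject)
    for student, age in student_table
    for other, subject in subject_table
    if student == other
)
```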

Third Normal form (3NF)


Third Normal Form requires that every non-prime attribute of a table depend directly on the
primary key; in other words, no non-prime attribute may be determined by another non-prime
attribute. Such a transitive functional dependency should be removed from the table, and the
table must also be in Second Normal Form. For example, consider a table with the following
fields.

Student_Detail Table :

Student_id Student_name DOB Street city State Zip

In this table Student_id is the primary key, but Street, City and State are taken to depend on Zip.
Because Student_id determines Zip and Zip in turn determines these fields, their dependency on
Student_id is called a transitive dependency. Hence, to apply 3NF, we need to move Street, City
and State to a new table, with Zip as the primary key.

New Student_Detail Table :

Student_id Student_name DOB Zip

Address Table :

Zip Street city state
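
The decomposition above can be sketched as follows. All sample rows are hypothetical, and, as in the text, Zip is simply assumed to determine Street, City and State.

```python
# Columns: Student_id, Student_name, DOB, Street, City, State, Zip
student_detail = [
    (1, "Adam", "2001-04-12", "MG Road", "Noida", "UP", "201301"),
    (2, "Alex", "2002-09-30", "MG Road", "Noida", "UP", "201301"),
    (3, "Stuart", "2000-01-05", "Canal Road", "Jammu", "JK", "180001"),
]

# New Student_Detail table: only attributes that depend directly on Student_id.
new_student_detail = [
    (sid, name, dob, zip_code)
    for sid, name, dob, _street, _city, _state, zip_code in student_detail
]

# Address table: one row per Zip, so shared address data is stored only once.
address = sorted({
    (zip_code, street, city, state)
    for _sid, _name, _dob, street, city, state, zip_code in student_detail
})
```

Two students share Zip 201301, yet the Address table stores that street, city and state a single time.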

The advantages of removing transitive dependency are:

• The amount of data duplication is reduced.

• Data integrity is achieved.

Boyce and Codd Normal Form (BCNF)

Boyce and Codd Normal Form is a stricter version of Third Normal Form. It deals with a certain
type of anomaly that is not handled by 3NF. A 3NF table which does not have multiple
overlapping candidate keys is guaranteed to be in BCNF. For a table R to be in BCNF, the
following conditions must be satisfied:

• R must be in Third Normal Form, and

• for each non-trivial functional dependency ( X -> Y ), X should be a super key.

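
The super key condition of BCNF can be checked mechanically by computing attribute closures. The sketch below is a minimal version under stated assumptions: the relation (Student, Subject, Teacher) and its two functional dependencies are a hypothetical textbook-style example, not one taken from this chapter.

```python
def closure(attrs, fds):
    """All attributes determined by attrs under the functional dependencies."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def bcnf_violations(attributes, fds):
    """Non-trivial dependencies X -> Y whose left side X is not a super key."""
    all_attrs = set(attributes)
    return [
        (lhs, rhs)
        for lhs, rhs in fds
        if not set(rhs) <= set(lhs)             # skip trivial dependencies
        and closure(lhs, fds) != all_attrs      # X does not determine everything
    ]


attributes = ("Student", "Subject", "Teacher")
fds = [
    (("Student", "Subject"), ("Teacher",)),  # a student has one teacher per subject
    (("Teacher",), ("Subject",)),            # each teacher teaches one subject
]
violations = bcnf_violations(attributes, fds)
```

Here Teacher -> Subject is reported because Teacher alone does not determine Student, so Teacher is not a super key.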