Client-server Computing Using Oracle
Database – An Introduction
1.1 Data
Data represents unorganized and unprocessed facts. Usually data is static in nature. It can
represent a set of discrete facts about events. Data is a prerequisite to information. An
organization sometimes has to decide on the nature and volume of data that is required for
creating the necessary information.
1.2 Information
Information can be considered as an aggregation of data (processed data) which makes decision
making easier. Information has usually got some meaning and purpose.
1.3 Knowledge
By knowledge we mean human understanding of a subject matter that has been acquired
through proper study and experience. Knowledge is usually based on learning, thinking, and
proper understanding of the problem area. Knowledge is not information and information is not
data. Knowledge is derived from information in the same way information is derived from data.
We can view it as an understanding of information based on its perceived importance or
relevance to a problem area. It can be considered as the integration of human perceptive
processes that helps us to draw meaningful conclusions.
Figure 1.3: Data, Information, Knowledge and Wisdom
In our own home, we probably have some sort of filing system, which contains receipts,
guarantees, invoices, bank statements, and such like. When we need to look something up, we go
to the filing system and search through the system starting from the first entry until we find what
we want. Alternatively, we may have an indexing system that helps to locate what we want more
quickly. For example we may have divisions in the filing system or separate folders for different
types of item that are in some way logically related.
The manual filing system works well when the number of items to be stored is small. It even
works quite adequately when there are large numbers of items and we have only to store and
retrieve them. However, the manual filing system breaks down when we have to cross-reference
or process the information in the files. For example, a typical real estate agent's office might have
a separate file for each property for sale or rent, each potential buyer and renter, and each
member of staff.
Clearly the manual system is inadequate for this type of work. The file-based system was
developed in response to the needs of industry for more efficient data access. In early processing
systems, an organization's information was stored as groups of records in separate files.
In the traditional approach, we used to store information in flat files which are maintained by the
file system under the operating system's control. Here, flat files are files containing records
having no structured relationship among them. The file handling which we learn under C/C ++ is
the example of file processing system. The Application programs written in C/C ++ like
programming languages go through the file system to access these flat. files as shown.
• Each file contained and processed information for one specific function, such as accounting or
inventory.
• Files were designed using programs written in programming languages such as COBOL, C, and
C++.
• The physical implementation and access procedures were written into the database applications;
therefore, physical changes resulted in intensive rework on the part of the programmer.
• As systems became more complex, file processing systems offered little flexibility, presented
many limitations, and were difficult to maintain.
The following problems are associated with the file-based approach:
1. Separated and Isolated Data: To make a decision, a user might need data from two separate
files. First, the files were evaluated by analysts and programmers to determine the specific data
required from each file and the relationships between the data and then applications could be
written in a programming language to process and extract the needed data. Imagine the work
involved if data from several files was needed.
2. Duplication of data: Often the same information is stored in more than one file. Uncontrolled
duplication of data is undesirable for several reasons:
• Duplication is wasteful. It costs time and money to enter the data more than once.
• Duplication can lead to loss of data integrity; in other words the data is no longer consistent.
For example, consider the duplication of data between the Payroll and Personnel departments. If
a member of staff moves to a new house and the change of address is communicated only to
Personnel and not to Payroll, the person's pay slip will be sent to the wrong address. A more
serious problem occurs if an employee is promoted with an associated increase in salary. Again,
the change is notified to Personnel but the change does not filter through to Payroll. Now, the
employee is receiving the wrong salary. When this error is detected, it will take time and effort to
resolve. Both these examples, illustrate inconsistencies that may result from the duplication of
data. As there is no automatic way for Personnel to update the data in the Payroll files, it is
not difficult to foresee such inconsistencies arising. Even if Payroll is notified of the changes, it is
possible that the data will be entered incorrectly.
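A tiny sketch makes the inconsistency concrete (the employee ID, names, and addresses below are hypothetical; two in-memory dictionaries stand in for the two department files):

```python
# Hypothetical sketch: the same employee address duplicated in two
# department "files". Updating only one copy leaves the data inconsistent.
personnel = {"E101": {"name": "Smith", "address": "12 Old Road"}}
payroll   = {"E101": {"name": "Smith", "address": "12 Old Road"}}

# The employee moves; only Personnel is notified of the change.
personnel["E101"]["address"] = "7 New Street"

def is_consistent(emp_id):
    """True only if every duplicated copy of the address agrees."""
    return personnel[emp_id]["address"] == payroll[emp_id]["address"]

print(is_consistent("E101"))  # False: Payroll still holds the old address
```

Nothing in the file-based approach forces the second copy to be updated, which is exactly the loss of integrity described above.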
3. Data Dependence: In file processing systems, files and records were described by specific
physical formats that were coded into the application program by programmers. If the format of a
certain record was changed, the code in each file containing that format must be updated.
Furthermore, instructions for data storage and access were written into the application's code.
Therefore, changes in storage structure or access methods could greatly affect the processing or
results of an application.
In other words, in the file-based approach application programs are data dependent. This means
that, with a change in the physical representation (how the data is physically represented on disk)
or access technique (how it is physically accessed) of data, application programs are also affected
and need modification. In other words, application programs are dependent on how the data
is physically stored and accessed.
For example, if the physical format of the master/transaction file is changed by making a
modification in the delimiter of the field or record, the application programs
which depend on it must be modified.
Let us consider a student file, where information about students is stored in a text file and each
field is separated by a blank space as shown below:
1 Rahat 35 Thapar
Now, if the delimiter of the field changes from blank space to semicolon as shown below:
1;Rahat;35;Thapar
Then the application programs using this file must be modified, because now they must tokenize
the fields on semicolons, whereas earlier it was blank space.
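The breakage can be sketched in a few lines (the parsing function and field names are hypothetical; only the delimiter change comes from the example above):

```python
# Hypothetical sketch of data dependence: a reader whose parsing logic is
# hard-coded to the blank-space delimiter breaks when the file format changes.
def parse_record(line, delimiter=" "):
    """Split one student record into its four fields."""
    roll, name, age, college = line.strip().split(delimiter)
    return {"roll": roll, "name": name, "age": int(age), "college": college}

old_line = "1 Rahat 35 Thapar"   # the original space-delimited record
new_line = "1;Rahat;35;Thapar"   # the same record after the delimiter change

print(parse_record(old_line))    # works with the original format
try:
    parse_record(new_line)       # old program applied to the new format
except ValueError:
    print("old parser breaks on the new delimiter")

# Only after the program itself is modified does it work again:
print(parse_record(new_line, delimiter=";"))
```

The data did not change at all; yet the program had to change, which is the essence of data dependence.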
4. Difficulty in representing data from the user's view: To create useful applications for the
user, often data from various files must be combined. In file processing it was difficult to
determine relationships between isolated data in order to meet user requirements.
5. Data Inflexibility: Program-data interdependency and data isolation limited the flexibility of
file processing systems in providing users with ad-hoc information requests.
6. Incompatible file formats: As the structure of files is embedded in the application programs,
the structures are dependent on the application programming language. For example, the
structure of a file generated by a COBOL program may be different from the structure of a file
generated by a 'C' program. The direct incompatibility of such files makes them difficult to
process jointly.
7. Data Security: The security of data is low in a file-based system because the data maintained
in the flat file(s) is easily accessible. For example, consider a banking system. The customer
transaction file has details about the total available balance of all customers. A customer wants
information about his account balance. In a file system it is difficult to give the customer access
to only his data in the file. Thus enforcing security constraints for the entire file or for certain
data items is difficult.
8. Transactional Problems: The file-based system approach does not satisfy transaction
properties like Atomicity, Consistency, Isolation and Durability, commonly known as the
ACID properties.
For example, suppose, in a banking system, a transaction transfers Rs. 1000 from account A
to account B, with initial values of A and B being Rs. 5000 and Rs. 10000 respectively. If a
system crash occurred after the withdrawal of Rs. 1000 from account A, but before depositing the
amount in account B, it would result in an inconsistent state of the system. This means that
transactions should not execute partially but wholly. This concept is known as the Atomicity of a
transaction (either 0% or 100% of the transaction). It is difficult to achieve this property in a
file-based system.
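A minimal sketch of this, using the Rs. 5000/10000 balances from the example and a flag to simulate the crash (the undo-log mechanism shown is a simplification of what a real DBMS does):

```python
# Hypothetical sketch: a transfer that crashes between the debit and the
# credit leaves the accounts inconsistent unless both steps commit atomically.
accounts = {"A": 5000, "B": 10000}

def transfer(src, dst, amount, crash_midway=False):
    snapshot = dict(accounts)      # naive "undo log" taken before any change
    accounts[src] -= amount        # step 1: withdraw from the source account
    if crash_midway:               # simulated system crash between the steps
        accounts.update(snapshot)  # rollback restores the consistent state
        return False
    accounts[dst] += amount        # step 2: deposit into the destination
    return True

transfer("A", "B", 1000, crash_midway=True)
print(accounts)  # {'A': 5000, 'B': 10000}: either 0% or 100% of the transfer
```

Without the rollback step, the crashed run would leave A at 4000 and B at 10000, i.e. Rs. 1000 would simply vanish; a file-based system has no built-in mechanism to perform that rollback.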
9. Concurrency Problems: When multiple users access the same piece of data in the same interval
of time, it is called concurrency of the system. When two or more users read the data
simultaneously there is no problem, but when they try to update a file simultaneously, it may
result in a problem.
For example:
Let us consider a scenario where, in transaction T1, a user transfers an amount of Rs. 1000 from
account A to account B (initial value of A is 5000 and B is 8000). Meanwhile, another
transaction, T2, which tries to display the sum of accounts A and B, is also executed. If both
transactions run in parallel, the result may be inconsistent as shown below:
The above schedule results in an inconsistent view of the database: it shows Rs. 12,000 as the
sum of accounts A and B instead of Rs. 13,000. The problem occurs because the second
concurrently running transaction, T2, reads A and B at an intermediate point and computes their
sum, which yields an inconsistent value.
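The interleaving can be reproduced step by step in a short sketch (account values are taken from the example; in a real DBMS, locking or isolation would prevent T2 from reading at the intermediate point):

```python
# Hypothetical sketch of the inconsistent-read schedule described above:
# T2 sums A and B after T1 has debited A but before T1 has credited B.
accounts = {"A": 5000, "B": 8000}

# T1, step 1: withdraw Rs. 1000 from account A
accounts["A"] -= 1000

# T2 runs at this intermediate point and computes the sum
dirty_sum = accounts["A"] + accounts["B"]

# T1, step 2: deposit Rs. 1000 into account B
accounts["B"] += 1000

print(dirty_sum)                      # 12000, the inconsistent value
print(accounts["A"] + accounts["B"])  # 13000, the true total
```

T2's answer is wrong even though each individual read was of a real stored value; the error is purely a product of the interleaving, which is why concurrency control must be handled by the system, not the application.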
10. Poor Data Modeling of the Real World: The file-based system is not able to represent
complex data and inter-file relationships, which results in poor data modeling properties.
The problems inherent in file systems make using a database system very desirable. Unlike the
file system, with its many separate and unrelated files, the database system consists of logically
related data stored in a single logical data repository. (The “logical” label reflects the fact that
the data repository appears to be a single unit to the end user, even though data might be
physically distributed among multiple storage facilities and locations.) Because the database’s
data repository is a single logical unit, the database represents a major change in the way end-
user data are stored, accessed, and managed. The database’s DBMS, shown in Figure 1.6,
provides numerous advantages over file system management, by making it possible to eliminate
most of the file system’s data inconsistency, data anomaly, data dependence, and structural
dependence problems. Better yet, the current generation of DBMS software stores not only the
data structures, but also the relationships between those structures and the access paths to those
structures—all in a central location. The current generation of DBMS software also takes care of
defining, storing, and managing all required access paths to those components.
A database management system is the software system that allows users to define, create and
maintain a database and provides controlled access to the data.
Some DBMS examples include MySQL, PostgreSQL, Microsoft Access, SQL Server,
FileMaker, Oracle, dBASE, Clipper, and FoxPro.
• Hardware. Hardware refers to all of the system’s physical devices, including computers (PCs,
workstations, servers, and supercomputers), storage devices, printers, network devices (hubs,
switches, routers, fiber optics), and other devices (automated teller machines, ID readers, and so
on).
• Software. Although the most readily identified software is the DBMS itself, three types of
software are needed to make the database system function fully: operating system software,
DBMS software, and application programs and utilities.
-- Operating system software manages all hardware components and makes it possible for all
other software to run on the computers. Examples of operating system software include
Microsoft Windows, Linux, Mac OS, UNIX, and MVS.
-- DBMS software manages the database within the database system. Some examples of DBMS
software include Microsoft’s SQL Server, Oracle Corporation’s Oracle, Sun’s MySQL, and
IBM’s DB2.
-- Application programs and utility software are used to access and manipulate data in the DBMS
and to manage the computer environment in which data access and manipulation take place.
Application programs are most commonly used to access data within the database to generate
reports, tabulations, and other information to facilitate decision making. Utilities are the software
tools used to help manage the database system’s computer components. For example, all of the
major DBMS vendors now provide graphical user interfaces (GUIs) to help create database
structures, control database access, and monitor database operations.
• People. This component includes all users of the database system. On the basis of primary job
functions, five types of users can be identified in a database system: system administrators,
database administrators, database designers, system analysts and programmers, and end users.
Each user type, described below, performs both unique and complementary functions.
1. Sophisticated Users - They are database developers, who write SQL queries to
select/insert/delete/update data. They do not use any application programs to access
the database. They directly interact with the database by means of a query language like
SQL. These users may be scientists, engineers, or analysts who thoroughly study SQL and
DBMS to apply the concepts to their requirements. In short, we can say this category
includes designers and developers of DBMS and SQL.
2. Specialized Users - These are also sophisticated users, but they write special database
application programs. They are the developers who develop complex programs to meet
their requirements.
3. Stand-alone Users - These users have stand-alone databases for their personal
use. These kinds of databases use readymade database packages which have
menus and graphical interfaces.
4. Naive Users - These are the users who use existing applications to interact with the
database. For example, an online library system, ticket booking systems, ATMs, etc.,
which have existing applications that users employ to interact with the database to fulfill
their requests.
• Procedures. Procedures are the instructions and rules that govern the design and use of the
database system.
Procedures are a critical, although occasionally forgotten, component of the system. Procedures
play an important role in a company because they enforce the standards by which business is
conducted within the organization and with customers. Procedures also help to ensure that
companies have an organized way to monitor and audit the data that enter the database and the
information generated from those data.
• Data. The word data covers the collection of facts stored in the database. Because data are the
raw material from which information is generated, determining what data to enter into the
database and how to organize those data is a vital part of the database designer’s job.
Sharing of Data
In DBMS, data can be shared by authorized users of the organization. The database administrator
manages the data and gives rights to users to access the data. Many users can be authorized to
access the same piece of information simultaneously. Remote users can also share the same data.
Similarly, the data of the same database can be shared between different application programs.
Data Consistency
By controlling data redundancy, data consistency is obtained. If a data item appears only
once, any update to its value has to be performed only once, and the updated value is immediately
available to all users. If the DBMS has controlled redundancy, the database system enforces
consistency.
Integration of Data
In a database management system, data in the database is stored in tables. A single database
contains multiple tables, and relationships can be created between tables (or associated data
entities). This makes it easy to retrieve and update data.
Integrity Constraints
Integrity constraints or consistency rules can be applied to the database so that only correct data
can be entered into the database. The constraints may be applied to data items within a single
record or they may be applied to relationships between records.
Forms
A form is a very important object of DBMS. You can create forms very easily and quickly in a
DBMS. Once a form is created, it can be used many times and it can be modified very easily.
The created forms are also saved along with the database and behave like a software component.
A form provides a very easy (user-friendly) way to enter data into the database, edit data and
display data from the database. Non-technical users can also perform various operations on the
database through forms without going into the technical details of a database.
Report Writers
Most DBMSs provide report writer tools used to create reports. Users can create reports
very easily and quickly. Once a report is created, it can be used many times and it can be
modified very easily. The created reports are also saved along with the database and behave like
a software component.
1.6.4 Disadvantages of DBMS
Although the DBMS has many advantages, it may also have some minor disadvantages.
These are:
1. Cost of Hardware & Software:
A processor with a high speed of data processing and a memory of large size is required to run
the DBMS software. This means that you have to upgrade the hardware used for the file-based
system. Similarly, DBMS software is also very costly.
2. Cost of Data Conversion:
When a computer file-based system is replaced with a database system, the data stored in the data
files must be converted to database files. It is a difficult and time-consuming process to convert
data from data files into a database. You have to hire a DBA (or database designer) and a system
designer along with application programmers; alternatively, you have to take the services of some
software houses. So a lot of money has to be paid for developing the database and related software.
3. Cost of Staff Training:
Most DBMSs are complex systems, so training is required for users to use the DBMS.
Training is required at all levels, including programming, application development, and database
administration. The organization has to pay a lot of amount on the training of staff to run the
DBMS.
4. Appointing Technical Staff:
Trained technical persons such as database administrators and application programmers are
required to handle the DBMS, and you have to pay handsome salaries to them.
Therefore, the system cost increases.
5. Database Failures:
In most organizations, all data is integrated into a single database. If the database is corrupted
due to a power failure, or is corrupted on the storage media, then our valuable data may be lost or
the whole system may stop.
The logical architecture describes how data in the database is perceived by users. It is not
concerned with how the data is handled and processed by the DBMS, but only with how it looks.
The method of data storage on the underlying file system is not revealed, and the users can
manipulate the data without worrying about where it is located or how it is actually stored. This
results in the database having different levels of abstraction.
The external or view level is the highest level of abstraction of the database. It provides a window
on the conceptual view, which allows users to see only the data of interest to them. The user can
be either an application program or an end user. There can be many external views, as any
number of external schemas can be defined and they can overlap each other. An external view
consists of the definition of the logical records and relationships in the external view. It also
contains the method of deriving objects such as entities, attributes and relationships in the
external view from the conceptual view.
The conceptual level presents a logical view of the entire database as a unified whole. It allows
the user to bring all the data in the database together and see it in a consistent manner. Hence,
there is only one conceptual schema per database. The first stage in the design of a database is to
define the conceptual view, and a DBMS provides a data definition language for this purpose. It
describes all the records and relationships included in the database. The data definition language
used to create the conceptual level must not specify any physical storage considerations; those
should be handled by the physical level. It does not provide any storage or access details, but
defines the information content only.
The collection of files permanently stored on secondary storage devices is known as the physical
database. The physical or internal level is the one closest to physical storage, and it provides a
low-level description of the physical database, and an interface between the operating system's
file system and the record structures used in higher levels of abstraction. It is at this level that
record types and methods of storage are defined, as well as how stored fields are represented,
what physical sequence the stored records are in, and what other physical structures exist.
Fig 1.7 Architecture of DBMS
We know that the three view-levels are described by means of three schemas. These schemas are
stored in the data dictionary. In a DBMS, each user refers only to its own external schema.
Hence, the DBMS must transform a request on a specified external schema into a request against
the conceptual schema, and then into a request against the internal schema, to store and retrieve
data to and from the database.
The process of converting a request (and its result) between view levels is called
mapping. The mapping defines the correspondence between the three view levels. The mapping
description is also stored in the data dictionary. The DBMS is responsible for mapping between
these three types of schemas. There are two types of mapping.
Conceptual-Internal Mapping
The conceptual-internal mapping defines the correspondence between the conceptual view and
the internal view, i.e. the database stored on the physical storage device. It describes how
conceptual records are stored and retrieved to and from the storage device. This means that the
conceptual-internal mapping tells the DBMS how the conceptual records are physically
represented. If the structure of the stored database is changed, then the mapping must be changed
accordingly. It is the responsibility of the DBA to manage such changes.
Fig 1.8 Mapping between views
A database schema defines its entities and the relationship among them. It contains a descriptive
detail of the database, which can be depicted by means of schema diagrams. It’s the database
designers who design the schema to help programmers understand the database and make it
useful.
Subschema
A subschema is a subset of the schema and inherits the same property that a schema has. The
plan (or scheme) for a view is often called subschema. Subschema refers to an application
programmer's (user's) view of the data item types and record types which he or she uses. It gives
the user a window through which he or she can view only that part of the database which is of
interest to him or her. Therefore, different application programs can have different views of the data.
Data Independence
A major objective for three-level architecture is to provide data independence, which means that
upper levels are unaffected by changes in lower levels.
Logical data independence indicates that the conceptual schema can be changed without
affecting the existing external schemas. The change would be absorbed by the mapping between
the external and conceptual levels. Logical data independence also insulates application
programs from operations such as combining two records into one or splitting an existing record
into two or more records. This would require a change in the external/conceptual mapping so as
to leave the external view unchanged.
Physical data independence indicates that the physical storage structures or devices could be
changed without affecting the conceptual schema. The change would be absorbed by the mapping
between the conceptual and internal levels. Physical data independence is achieved by the
presence of the internal level of the database and the mapping or transformation from the
conceptual level of the database to the internal level. Conceptual-level to internal-level mapping,
therefore, provides a means to go from the conceptual view (conceptual records) to the internal
view and hence to the stored data in the database (physical records).
Difference between physical and logical data independence
A Database Administrator (DBA) is a person or group in charge of implementing a DBMS in an
organization. A Database Administrator's job requires a high degree of technical expertise and
the ability to understand and interpret management requirements at a senior level. In practice, the
DBA may consist of a team of people rather than just one person.
• Makes decisions concerning the content of the database: It is the DBA's job to decide exactly
what information is to be held in the database - in other words, to identify the entities of interest
to the enterprise and to identify the information to be recorded about those entities.
• Plans storage structures and access strategies: The DBA must also decide how the data is to
be represented in the database, and must specify the representation by writing the storage
structure definition (using the internal data definition language).
In addition, the associated mapping between the storage structure definition and the conceptual
schema must also be specified.
• Provides support to users: It is the responsibility of the DBA to provide support to the users,
to ensure that the data they require is available, and to write the necessary external schemas
(using the appropriate external data definition language).
In addition, the mapping between any given external schema and the conceptual schema must
also be specified.
• Defines security and integrity checks: The DBA is responsible for providing the authorization
and authentication checks such that no malicious users can access the database and it remains
protected. The DBA must also ensure the integrity of the database.
• Implements backup and recovery strategies: In the event of damage to any portion of the
database - caused by human error, say, or a failure in the hardware or supporting operating
system - it is essential to be able to repair the data concerned with a minimum of delay and with
as little effect as possible on the rest of the system.
The DBA must define and implement an appropriate recovery strategy to recover the database
from all types of failures.
• Monitors performance: The DBA is responsible for organizing the system so as to get the
performance that is "best for the enterprise," and for making the appropriate adjustments as
requirements change.
Chapter 2
A database model is a type of data model that determines the logical structure of a database and
fundamentally determines the manner in which data can be stored, organized, and manipulated.
One-to-many (1:M or 1..*) relationship. A painter paints many different paintings, but each
one of them is painted by only one painter. Thus, the painter (the “one”) is related to the
paintings (the “many”). Therefore, database designers label the relationship “PAINTER paints
PAINTING” as 1:M. (Note that entity names are often capitalized as a convention so they are
easily identified.) Similarly, a customer (the “one”) may generate many invoices, but each
invoice (the “many”) is generated by only a single customer. The “CUSTOMER generates
INVOICE” relationship would also be labeled 1:M.
Many-to-many (M:N or *..*) relationship. An employee may learn many job skills, and each
job skill may be learned by many employees. Database designers label the relationship
“EMPLOYEE learns SKILL” as M:N. Similarly, a student can take many classes and each class
can be taken by many students, thus yielding the M:N relationship label for the relationship
expressed by “STUDENT takes CLASS.”
One-to-one (1:1 or 1..1) relationship. A retail company’s management structure may require
that each of its stores be managed by a single employee. In turn, each store manager, who is an
employee, manages only a single store. Therefore, the relationship “EMPLOYEE manages
STORE” is labeled 1:1.
The hierarchical model was developed in the 1960s to manage large amounts of data for
complex manufacturing projects such as the Apollo rocket that landed on the moon in 1969. Its
basic logical structure is represented by an upside-down tree. The hierarchical structure contains
levels, or segments. A segment is the equivalent of a file system's record type. Within the
hierarchy, the top layer (the root) is perceived as the parent of the segment directly beneath it.
For example, in Figure 2.1, the root segment is the parent of the Level 1 segments, which, in
turn, are the parents of the Level 2 segments, etc. The segments below other segments are the
children of the segment above. In short, the hierarchical model depicts a set of one-to-many
(1:M) relationships between a parent and its children segments. (Each parent can have many
children, but each child has only one parent.)
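The parent-child structure can be sketched as nested dictionaries (the segment names below are hypothetical, standing in for the levels of Figure 2.1):

```python
# Hypothetical sketch: a hierarchy as nested 1:M parent/children segments.
# Each child has exactly one parent; each parent may have many children.
tree = {
    "Root": {
        "Level1-A": {"Level2-A1": {}, "Level2-A2": {}},
        "Level1-B": {"Level2-B1": {}},
    }
}

def parent_of(node, current="Root", children=None):
    """Return the single parent of a segment, or None for the root."""
    children = tree["Root"] if children is None else children
    for name, subtree in children.items():
        if name == node:
            return current
        found = parent_of(node, name, subtree)
        if found:
            return found
    return None

print(parent_of("Level2-B1"))  # Level1-B: exactly one parent per child
```

Because each segment appears under exactly one parent, a record can only be reached by traversing its single path from the root, which is precisely the limitation the network model later relaxed.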
Disadvantages
In the network model, entities are organized in a graph, in which some entities can be accessed
through several paths. In the network model, the user perceives the network database as a
collection of records in 1:M relationships. However, unlike the hierarchical model, the network
model allows a record to have more than one parent. In network database terminology, a
relationship is called a set. Each set is composed of at least two record types: an owner record
and a member record.
Advantages
Disadvantages
The relational model was introduced in 1970 by E. F. Codd (of IBM) in his landmark paper “A
Relational Model of Data for Large Shared Data Banks” (Communications of the ACM, June
1970, pp. 377-387). The relational model represented a major breakthrough for both users and
designers. Its conceptual simplicity set the stage for a genuine database revolution.
The relational model's foundation is a mathematical concept known as a relation. To avoid the
complexity of abstract mathematical theory, you can think of a relation (sometimes called a
table) as a matrix composed of intersecting rows and columns. Each row in a relation is called a
tuple. Each column represents an attribute. The relational model also describes a precise set of
data manipulation constructs based on advanced mathematical concepts.
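As a rough illustration of this terminology (the AGENT relation and its values below are hypothetical):

```python
# Hypothetical sketch: a relation viewed as a matrix of intersecting rows
# and columns - each row is a tuple, each column position is an attribute.
attributes = ("agent_id", "name")   # the column (attribute) names
agent = [                           # the AGENT relation
    (501, "Alex"),                  # one tuple (row)
    (502, "Leena"),                 # another tuple
]

# Pairing each tuple with the attribute names recovers the table view.
rows = [dict(zip(attributes, t)) for t in agent]
print(rows)
```

Every tuple has a value for every attribute, and the intersection of a row and a column always holds a single value, which is what gives the relational table its simple, regular shape.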
Examples of RDBMSs include Oracle, DB2, Microsoft SQL Server and MySQL.
Arguably the most important advantage of the RDBMS is its ability to hide the complexities of
the relational model from the user. The RDBMS manages all of the physical details, while the
user sees the relational database as a collection of tables in which data are stored. The user can
manipulate and query the data in a way that seems intuitive and logical. Tables are related to each
other through the sharing of a common attribute (value in a column). For example, the
CUSTOMER table in Figure 2.3 might contain a sales agent’s number that is also contained in
the AGENT table.
A relationship between the Teacher table and the Classes table can be created through the shared
teacherID attribute. The following figure shows this concept.
Fig Linkage between tables in RDBMS
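A sketch of how such a relationship is resolved (teacherID is the shared attribute from the figure; the table contents below are made up) - this is roughly what a join over a common attribute does:

```python
# Hypothetical sketch: two tables related through the shared teacherID
# attribute, combined the way an RDBMS resolves the relationship.
teachers = [
    {"teacherID": 1, "name": "Rao"},
    {"teacherID": 2, "name": "Mehta"},
]
classes = [
    {"classID": "C101", "teacherID": 1},
    {"classID": "C102", "teacherID": 2},
    {"classID": "C103", "teacherID": 1},
]

# Match each class row to the teacher row carrying the same teacherID value.
joined = [
    {"name": t["name"], "classID": c["classID"]}
    for c in classes
    for t in teachers
    if t["teacherID"] == c["teacherID"]
]
print(joined)
```

In an actual RDBMS the same result would come from a query over the two tables; the point is that the only link between them is the shared attribute value, not any physical pointer.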
Data Structure
The table format is simple and easy for database users to understand and use. RDBMSs provide
data access using a natural structure and organization of the data. Database queries can search
any column for matching entries.
Multi-User Access
RDBMSs allow multiple database users to access a database simultaneously. Built-in locking and transaction management functionality allows users to access data as it is being changed, prevents collisions between two users updating the data, and keeps users from accessing partially updated records.
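Transactions are the mechanism behind this: a change only becomes visible once committed, and a half-finished change can be rolled back. A minimal sketch, using SQLite and an illustrative account table:

```python
import sqlite3

# Changes inside an uncommitted transaction can be rolled back, so
# other users never observe a partially updated record.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (id INTEGER PRIMARY KEY, balance INTEGER)")
conn.execute("INSERT INTO account VALUES (1, 100)")
conn.commit()

conn.execute("UPDATE account SET balance = balance - 40 WHERE id = 1")
conn.rollback()  # abandon the half-finished update

balance = conn.execute(
    "SELECT balance FROM account WHERE id = 1").fetchone()[0]
print(balance)  # 100 -- the rollback restored the original value
```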
Privileges
Authorization and privilege control features in an RDBMS allow the database administrator to
restrict access to authorized users, and grant privileges to individual users based on the types of
database tasks they need to perform. Authorization can be defined based on the remote client IP
address in combination with user authorization, restricting access to specific external computer
systems.
Network Access
RDBMSs provide access to the database through a server daemon, a specialized software
program that listens for requests on a network, and allows database clients to connect to and use
the database. Users do not need to be able to log in to the physical computer system to use the
database, providing convenience for the users and a layer of security for the database. Network
access allows developers to build desktop tools and Web applications to interact with databases.
Speed
The relational database model is not the fastest data structure, but RDBMS advantages such as simplicity make the slower speed a fair trade-off. Optimizations built into the RDBMS, together with careful database design, enhance performance, allowing RDBMSs to perform more than fast enough for most applications and data sets. Improvements in technology, including increasing processor speeds and decreasing memory and storage costs, allow systems administrators to build very fast systems that can overcome most database performance shortcomings.
Maintenance
RDBMSs feature maintenance utilities that provide database administrators with tools to easily
maintain, test, repair and back up the databases housed in the system. Many of the functions can
be automated using built-in automation in the RDBMS, or automation tools available on the
operating system.
Language
RDBMSs support a generic language called "Structured Query Language" (SQL). The SQL
syntax is simple, and the language uses standard English language keywords and phrasing,
making it fairly intuitive and easy to learn. Many RDBMSs add non-SQL, database-specific
keywords, functions and features to the SQL language.
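To illustrate how close SQL reads to plain English, here is a small sketch run against SQLite; the employee schema and values are illustrative assumptions.

```python
import sqlite3

# A representative SELECT ... FROM ... WHERE query.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (name TEXT, dept TEXT, salary INTEGER)")
conn.executemany("INSERT INTO employee VALUES (?, ?, ?)",
                 [("Ann", "Sales", 500),
                  ("Bob", "Sales", 700),
                  ("Cy", "IT", 600)])

# Reads almost like English: select names where the salary exceeds 550.
high_paid = conn.execute(
    "SELECT name FROM employee WHERE salary > 550 ORDER BY name").fetchall()
print(high_paid)  # [('Bob',), ('Cy',)]
```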
Disadvantages of RDBMS
Cost of software/hardware and migration: A significant disadvantage of a DBMS is cost. In addition to the cost of purchasing or developing the software, the hardware may have to be upgraded to accommodate the extensive programs and workspaces required for their execution and storage. The processing overhead introduced by the DBMS to implement security, integrity, and sharing of the data causes some degradation of response and throughput times. An additional cost is that of migration from a traditionally separate application environment to an integrated one.
Problems associated with centralization: While centralization reduces duplication, the lack of duplication requires that the database be adequately backed up so that the data can be recovered in case of failure. Centralization also means that the data is accessible from a single source. This increases the potential severity of security breaches and of disruption to the organization's operation caused by downtimes and failures. Replacing a monolithic centralized database with a federation of independent and cooperating distributed databases resolves some of the problems resulting from failures and downtimes.
Chapter 3 Normalization
Normalization is a process for evaluating and correcting table structures to minimize data redundancies, thereby reducing the likelihood of data anomalies. The normalization process involves assigning attributes to tables.
Normalization is mainly used to avoid the following anomalies:
Update Anomaly: To update the address of a student who occurs twice or more in a table, we have to update the S_Address column in all of those rows; otherwise the data will become inconsistent.
Insertion Anomaly: Suppose for a new admission we have a student's id (S_id), name, and address, but the student has not opted for any subjects yet; then we must either insert NULL in the subject column or postpone adding the record.
Deletion Anomaly: If student (S_id) 401 has only one subject and temporarily drops it, deleting that row deletes the entire student record along with it.
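The deletion anomaly is easy to demonstrate. Below is a sketch using SQLite with an unnormalized student table; the sample data is an illustrative assumption.

```python
import sqlite3

# Student details and subject choices share one table, so deleting a
# student's only subject row removes the student entirely.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student "
             "(S_id INTEGER, S_Name TEXT, S_Address TEXT, Subject TEXT)")
conn.executemany("INSERT INTO student VALUES (?, ?, ?, ?)",
                 [(401, "Adam", "Noida", "Bio"),
                  (402, "Alex", "Panipat", "Maths"),
                  (402, "Alex", "Panipat", "Bio")])

# Student 401 drops his only subject...
conn.execute("DELETE FROM student WHERE S_id = 401 AND Subject = 'Bio'")
remaining = conn.execute("SELECT DISTINCT S_id FROM student").fetchall()
print(remaining)  # [(402,)] -- student 401 has vanished with his subject
```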
Normalization Steps
The common normal forms are First Normal Form (1NF), Second Normal Form (2NF), Third Normal Form (3NF), and Boyce-Codd Normal Form (BCNF). We shall discuss these NFs one by one.
As per First Normal Form, no row of data may contain a repeating group of information; that is, each column must hold a single value. Each table should be organized into rows, and each row should have a primary key that distinguishes it as unique.
The primary key is usually a single column, but sometimes more than one column can be combined to create a single primary key. For example, consider a table which is not in First Normal Form:
Student Table:
Student Age Subject
Adam 15 Biology, Maths
Alex 14 Maths
Stuart 17 Maths
In First Normal Form, no row may have a column in which more than one value is saved, for example separated with commas. Instead, we must separate such data into multiple rows:
Student Age Subject
Adam 15 Biology
Adam 15 Maths
Alex 14 Maths
Stuart 17 Maths
Using First Normal Form, data redundancy increases, as the same data will appear in several columns across multiple rows, but each row as a whole will be unique.
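The conversion to 1NF above can be sketched as a simple transformation: split each comma-separated Subject value into one row per atomic value.

```python
# The non-1NF table from the text: Subject holds a repeating group.
unnormalized = [("Adam", 15, "Biology, Maths"),
                ("Alex", 14, "Maths"),
                ("Stuart", 17, "Maths")]

# One output row per atomic subject value.
first_nf = [(name, age, subject.strip())
            for name, age, subjects in unnormalized
            for subject in subjects.split(",")]
print(first_nf)
# [('Adam', 15, 'Biology'), ('Adam', 15, 'Maths'),
#  ('Alex', 14, 'Maths'), ('Stuart', 17, 'Maths')]
```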
As per the Second Normal Form, there must not be any partial dependency of any column on the primary key. For a table with a concatenated (composite) primary key, each column that is not part of the primary key must depend upon the entire concatenated key for its existence. If any column depends on only one part of the concatenated key, the table fails Second Normal Form.
In the example for First Normal Form there are two rows for Adam, to include the multiple subjects that he has opted for. While this is searchable and follows First Normal Form, it is an inefficient use of space. Also, in the above table in First Normal Form, while the candidate key is {Student, Subject}, the Age of a student depends only on the Student column, which violates Second Normal Form. To achieve Second Normal Form, we split out the subjects into an independent table and match them up using the student names as foreign keys.
Student Age
Adam 15
Alex 14
Stuart 17
In the Student table the candidate key is the Student column, because the only other column, Age, is dependent on it.
Student Subject
Adam Biology
Adam Maths
Alex Maths
Stuart Maths
In the Subject table the candidate key is the {Student, Subject} column pair. Now both of the above tables qualify for Second Normal Form and will not suffer from the update anomalies described earlier. There are, however, a few complex cases in which a table in Second Normal Form still suffers update anomalies; Third Normal Form handles those scenarios.
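The 2NF decomposition above can be sketched end to end: Age lives once per student, subjects live separately, and a join on the shared Student column recovers the combined view. The implementation below uses SQLite for illustration.

```python
import sqlite3

# The two 2NF tables from the text.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Student (Student TEXT PRIMARY KEY, Age INTEGER)")
conn.execute("CREATE TABLE Subject "
             "(Student TEXT, Subject TEXT, PRIMARY KEY (Student, Subject))")
conn.executemany("INSERT INTO Student VALUES (?, ?)",
                 [("Adam", 15), ("Alex", 14), ("Stuart", 17)])
conn.executemany("INSERT INTO Subject VALUES (?, ?)",
                 [("Adam", "Biology"), ("Adam", "Maths"),
                  ("Alex", "Maths"), ("Stuart", "Maths")])

# Age is stored once per student, so an update touches a single row
# no matter how many subjects the student has.
conn.execute("UPDATE Student SET Age = 16 WHERE Student = 'Adam'")
rows = conn.execute("""SELECT s.Student, s.Age, j.Subject
                       FROM Student s JOIN Subject j ON s.Student = j.Student
                       ORDER BY s.Student, j.Subject""").fetchall()
print(rows)
```

Both of Adam's joined rows now show the new age, because it was changed in exactly one place.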
Student_Detail Table:
In this table Student_id is the primary key, but Street, City, and State depend upon Zip. The dependency of these fields on Zip, which in turn depends on the primary key, is called a transitive dependency. Hence, to apply 3NF, we move Street, City, and State to a new table, with Zip as its primary key.
Address Table :
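The 3NF split can be sketched as follows, again using SQLite. The Zip, Street, City, and State columns follow the text; the sample values are illustrative assumptions.

```python
import sqlite3

# Address details depend on Zip, so they live in their own table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Address "
             "(Zip TEXT PRIMARY KEY, Street TEXT, City TEXT, State TEXT)")
conn.execute("""CREATE TABLE Student_Detail (
    Student_id INTEGER PRIMARY KEY,
    Name TEXT,
    Zip TEXT REFERENCES Address(Zip))""")
conn.execute("INSERT INTO Address VALUES ('201301', 'MG Road', 'Noida', 'UP')")
conn.execute("INSERT INTO Student_Detail VALUES (401, 'Adam', '201301')")

# The address is stored once per Zip and joined back on demand.
detail = conn.execute("""SELECT d.Name, a.City, a.State
                         FROM Student_Detail d
                         JOIN Address a ON d.Zip = a.Zip""").fetchone()
print(detail)  # ('Adam', 'Noida', 'UP')
```

Every student sharing a Zip now shares one Address row, so correcting a city name is a single-row update.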
Boyce-Codd Normal Form (BCNF) is a stricter version of the Third Normal Form. This form deals with a certain type of anomaly that is not handled by 3NF. A 3NF table which does not have multiple overlapping candidate keys is guaranteed to be in BCNF. For a table to be in BCNF, the following conditions must be satisfied: