
NATIONAL CERTIFICATE IN SOFTWARE ENGINEERING

SOFTWARE ENGINEERING: I

MODULE: DATABASE CONCEPTS


CONTENT

1. INTRODUCTION TO DATABASES
1. Definitions
2. Structure of a database
3. Advantages and Disadvantages of databases
4. File system comparison
5. Introduction to Database Management Systems
2. TYPES OF DATABASES
1. Introduction to Relational Databases
2. Hierarchical Databases
3. Object-Oriented (Programming) Databases
4. Distributed Databases
5. Network Databases
3. RELATIONAL DATABASE
1. Identification of Entities
2. Mapping Relationship
3. Drawing Entity Relationship (E-R) Diagrams
4. Normalisation (1st Normal Form to BCNF)
5. Database Schema (Table Design)
6. Querying (SQL Query Strings, DDL & DML)
4. DATABASE ADMINISTRATION
1. Database Security
2. Backup and Recovery Plans
3. Database Maintenance
4. Database Monitoring
5. TRANSACTION PROCESSES
1. Define Transaction Processes
2. Outline Properties

NC Software Engineering: Database Concepts Module. Compiled by C. Uta utanoel@gmail.com 2 | 100


CHAPTER 1

INTRODUCTION TO DATABASES



1. INTRODUCTION TO DATABASES

1.1. Objectives
Main Objective: Outline the application of database management systems in
computing.
1. Identify various areas of database systems applications
2. Define and explain key concepts of database systems
3. Explain the advantages of using a database system
4. Explain the disadvantages of using a database system

1.2. Introduction
The traditional / conventional approach to data management relies on files to
store data permanently. A file allows for the storage and searching of data,
but provides only simple mechanisms for access, sharing, and management.
With this approach, the procedures written in a programming language are
completely autonomous; each one defines and uses one or more ‘private’ files. Data
of possible interest to more than one program is replicated as many times as there
are programs that use it, with obvious redundancy and the possibility of
inconsistency.
 Databases were created, for the most part, to overcome this type of
inconvenience. The motivation for databases over files is that there is
integration for easy access and update, non-redundancy, and multi-access.
 The data within a database is structured so as to model real-world
structures and hierarchies, enabling conceptually convenient data storage,
processing, and retrieval mechanisms
 Clients (Services or applications) interact with databases through queries
(remote or otherwise) to Create, Retrieve, Update, and Delete (CRUD) data
within a database. This process is facilitated through a Database Management
System (DBMS)
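The CRUD cycle described above can be sketched with Python's built-in sqlite3 module; the students table and the values in it are hypothetical, purely for illustration:

```python
import sqlite3

# An in-memory database stands in for a real DBMS server.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Create: define a table and insert a row.
cur.execute("CREATE TABLE students (id TEXT PRIMARY KEY, name TEXT)")
cur.execute("INSERT INTO students VALUES ('20200779SE', 'T. Moyo')")

# Retrieve: read the row back.
row = cur.execute(
    "SELECT name FROM students WHERE id = '20200779SE'"
).fetchone()

# Update: change the stored value.
cur.execute("UPDATE students SET name = 'T. Moyo Jr' WHERE id = '20200779SE'")

# Delete: remove the row again.
cur.execute("DELETE FROM students WHERE id = '20200779SE'")
remaining = cur.execute("SELECT COUNT(*) FROM students").fetchone()[0]
conn.close()
```

Each step is a query handed to the DBMS, which carries out the actual storage work on the client's behalf.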

1.3. Definitions
There are many concepts related to databases that anyone studying the subject
needs to be conversant with. This is necessary for understanding the concepts
covered in the course of study. For someone studying I.T., it is important to
develop the habit of acquiring as many I.T. skills, vocabulary, and concepts
as possible. That empowers you and keeps you relevant in an ever-changing I.T.
environment and the global space at large.
The following are some of the terms commonly associated with database systems.
Be aware that the same terms may mean different things in different I.T. and
other contexts.
1. Data: Data are the building blocks of information. The word data can have
varied meanings depending on the context it is used in. Data may mean any of,
and more than, the following:
i. Raw facts, figures, symbols, and/or other representations, standing
singly or in combination, that deliver an incomplete and incomprehensible
message to the receiver, and are thus unable to equip him/her with a full
account of contextual issues for effective decision making.



ii. Data are basic facts or values. Every task a computer carries out works
with data in some way. It is therefore important to understand what
data are and how to represent and organize it.
iii. In the context of a database environment, data are the facts about
entities (both physical and abstract) stored for reference, manipulation
and / or extraction and use by varied stakeholders. Stakeholders are
people or other systems that interact with data in one or many ways for
fulfilling their obligations. Stored data is used to make business
decisions, allowing an organization to function more efficiently and
effectively.
There are basically two types of data that reside in any database:
 Static or historic data is data which is seldom or never modified
once stored in the database. For example, historic data for a
company can be stored offline and accessed only when needed.
Historic data rarely changes once archived. Certain historic data can
be used to track trends or business statistics, and can be used later to
make business decisions.
 Dynamic or transactional data, is data that is frequently modified
once stored in the database. At a minimum, most companies have
dynamic data.
Most companies have a combination of both dynamic and static data. For
example, an online bookstore has mostly transactional data because
customer orders are constantly being processed. An online bookstore,
however, might also need to track statistics, such as book categories that
have had the highest sales in a particular location over the past five years.
2. Information: Information refers to processed and organised data that conveys a
complete meaning to the user or receiver. It enables one to make decisions. In
the context of databases, information is the result of referencing, extracting,
and / or manipulating some or all data in a database repository for purposes of
fulfilling business and / or other societal needs. The facts and figures are
represented as text, tables, graphics, or a combination of these to convey a
meaningful message to the receiver.
3. Database: A database is an organised or logical collection of related data
gathered for a specific purpose. For example, a database on all students enrolled
at a college or of all items of equipment that we have as a company. A database
may be paper-based, but oftentimes it is computerised for purposes of meeting
modern day data processing needs. The following are some other definitions of
a database.
i. A database can also be defined as a collection of persistent data that is
interrelated and serves the needs of multiple users within one or more
organizations.
ii. A database system is a collection of data managed by a Database
Management System (DBMS). In a broader sense, a database system is more
than just the organised data and the DBMS; it includes the hardware,
procedures, and people that are part and parcel of the database
environment.
iii. “A shared collection of logically related data, and a description of
this data, designed to meet the information needs of an organisation.”
(Connolly/Begg). Logically related data comprises the entities, attributes,
and relationships of an organization's information.



1.3.1. Example Uses of Database Systems

Examples of databases include student lists, membership / customer lists,
library catalogues, web page content, account maintenance & access in banking,
lending library systems, airline reservation systems, internet purchasing
systems, and media archives for radio/TV stations. The list is, in fact,
infinite. You can model and design a database to store anything which can be
represented as structured information.
1.3.2. Illustration of a database system

A database system can be pictured as a stack of layers, from the users at the
top down to the stored data at the bottom:

Users / Programs
  Application Programs / Queries
    DBMS Software (Software to Process Queries; Software to Access Stored Data)
      Stored Database Definition (Metadata)    Stored Database (Data items)

 Users / programs are the people or other systems that interact with
the database at the outmost periphery of the database layers. They
input, output, transfer, convert, or perform some operations on data.
 Application Programs / Queries: these perform operations as
required by the users in the layer above or do some preliminary work
for lower layers. An example is a rate conversion program in a foreign
currency trading transaction system.
 Software to Process Queries: Queries from the above layer are
processed by this layer. This includes data manipulation languages, a
component of query languages.
 Software to access stored data: This is special software, generally
termed Database Management Systems (DBMS) of which a
component such as Data Control Language can be used here.
 Stored Database Definition (Metadata): Acts as a reference and
provides the meaning of data stored in the database. Also called the
system catalogue (data dictionary or metadata), it provides the
description of the data that enables program–data independence.
 Stored Database (Data items): This is the actual facts sitting on the
database / on storage media, such as hard disk, and can include
accountholder details in a banking system.



1.4. Structure of a Database
A database is an organized collection of data. Instead of having all the data in a
disordered list, a database provides a structure to organize the data.
The following is one representation of a database structure; other
representations are possible. A structure shows the components that make up a
system and how they fit together, that is, their placement within the scheme
and their relationships to one another:

Database
  Entity ... Entity
    Record ... Record ... Record ... Record
      Attribute ... Attribute ... Attribute

A database holds one or more entities; each entity is made up of records; and
each record is a collection of attributes.

1. Entity: An entity is a “thing” or object about which we can gather data. An
entity can be physical or abstract. In a database context, an entity is a thing
that we can hold data about in the database. An employee, a course, a piece
of equipment, a building, or a computer can be an entity. An entity represents
a category of data. Entities are used to logically separate data.
2. Record: A record is a collection of fields or attributes pertaining to a single
occurrence or instance in a database. For example, a student record in a
college system can be made up of the following attributes: student ID, student
first name, student surname, student Course ID and student DOB.
3. Attributes: An attribute is a characteristic, fact, feature, property, sub-group,
or phenomenon of information that describes an entity. In a college system, a
student is an entity. As an entity, a student has many facts. These can
include student ID, student first name, student surname, student Course ID
and student DOB.
Only relevant characteristics / facts are selected for a particular database
application. Facts stored are an abstraction of the actual entity. Note that
some attributes can be split further, for example, date can be split into day,
month, and year; and name into first and last names.
4. Value / Occurrence: this is the actual data item stored within an attribute /
domain / column when the table is filled up / populated. For example, if the
attribute was student ID, then a value could be something like 20200779SE.
5. Data Type: The various attributes of an entity belong to particular data types.
The types dictate whether the values stored are going to be numbers only,
strings (alphanumeric), dates, money / currency, password, yes (true) / no
(false), and many others. The types may be decided upon arbitrarily or can be
natural types.
6. Data Modelling: The iterative and progressive process of creating a
specific data model for a determined problem domain.
7. Data Models: Simple representations of complex real-world data structures.
They are useful for supporting a specific problem domain. They are also called
abstractions of real-world objects or events.



8. Model: A model is a collection of concepts that describe the structure of a
database. It provides the means to achieve data abstraction and an
understanding of the data.
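The definitions above (entity, attribute, data type, record, value) map directly onto a table definition. A minimal sketch in Python's sqlite3, using the hypothetical student entity from the text:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# The entity "student" becomes a table; each attribute gets a name and a
# data type (SQLite stores dates as text or numbers).
cur.execute("""
    CREATE TABLE student (
        student_id  TEXT PRIMARY KEY,
        first_name  TEXT,
        surname     TEXT,
        course_id   TEXT,
        dob         TEXT
    )
""")

# One record (row) is a single occurrence of the entity; each cell holds a
# value, such as '20200779SE' for the student_id attribute.
cur.execute(
    "INSERT INTO student VALUES "
    "('20200779SE', 'Thabo', 'Moyo', 'SE01', '2001-04-12')"
)
record = cur.execute("SELECT * FROM student").fetchone()
conn.close()
```

The names and values here are invented for illustration; a real schema would carry whatever attributes the application actually needs.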

1.5. Advantages and Disadvantages of Databases


1.5.1. Advantages of Databases

A good database system should provide the following advantages over a
conventional file system:
1. Reduced data redundancy: Data duplication is reduced within a database
system. Data may occur only in one place, instead of many places. This saves
space, and ultimately reduces costs of storing, processing, and managing
data.
2. Reduced updating errors and increased consistency: Data stored in a
database frequently need to be updated. If the same data is stored in
different files, the chances are that the same data in two files may
not receive updates at the same time; this results in different values for
the same things. A DB system minimizes the chances of these inconsistencies.
3. Greater data integrity and independence from applications programs: Data
integrity refers to the accuracy and completeness of stored data. A database
allows integrity to be achieved through storing a single copy of data for all
users and when this copy changes, the change propagates to all affected files
instantly. Data and program independence means that data can be changed
without changing the programs that use the data and vice versa.
4. Improved data access to users through use of host and query languages:
Using a database solution is like having an electronic version of a master
record room which is accessible 24x7x365. All
information about various aspects of business such as sales, purchases,
customers, stock etc. are stored in this highly organised, electronic room for
instant access retrieval. Having all business data stored in bespoke database
applications also provides the flexibility to feed other software applications
and business processes, which can help to reduce costs by eliminating
manual data validation and retrieval.
5. Reduced data entry, storage, and retrieval costs: Having a database
application also allows the business to link data stored with other business
entities. For example, linking sales data with marketing will allow the
management to analyse if the investment in marketing activities are resulting
in an increase in revenue. After a significant amount of time, it is also
possible to view trends and analyses over a period of time to see whether a
particular business function, department, or employee is performing as
expected or if any interventions are required. This is only possible when data
is stored in a database application as correlating and analysing data stored in
disparate/manual systems will make this a near impossible task.
6. Improved data security: Just like controlling access to a master record room
via lock and key, having a database application allows the business to
exercise access control at a granular level. A database solution allows the
business to divide the information at the lowest granular level and then
exercise control over who can add, edit, view, and delete. Data stored in an
electronic format allows for encryption of confidential information.
7. Standards enforcement is made easier; concurrency (the simultaneous
processing of multiple transactions for multiple users / applications) is
made possible; and data sharing is enabled. Distance barriers are broken if
the database is online.



8. Faster Development of New Applications: since most of the data required is
already in the system.
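The program–data independence advantage (item 3 above) can be illustrated with a small sketch: a program that asks for columns by name keeps working after the schema gains a new column. Python's sqlite3 and a hypothetical student table are assumed:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (student_id TEXT, name TEXT)")
conn.execute("INSERT INTO student VALUES ('S1', 'Anna')")

def report(db):
    # "Application program": asks for columns by name, not by position,
    # so it is independent of the physical layout of the table.
    return db.execute("SELECT name FROM student").fetchall()

before = report(conn)

# Schema change: add a column without touching the program above.
conn.execute("ALTER TABLE student ADD COLUMN dob TEXT")
after = report(conn)
conn.close()
```

The same report function returns identical results before and after the schema change; in a file-based system, a record-layout change would typically force every reading program to be rewritten.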

1.5.2. Disadvantages of Databases

The following can be viewed as some of the limitations of a database:


1. Database systems are complex, difficult, and time-consuming to design
and develop.
2. Substantial hardware and software start-up costs are incurred in building
the entire database infrastructure.
3. Damage to database affects virtually all applications / programs. This is
because data may be stored centrally or as single copies for the entire
database. Once this data is corrupted or destroyed, all those dependent
on it are affected.
4. Extensive conversion costs in moving from a file-based system to a
database system as well as conversion between different models, and or
data / file systems if exchanges are to be effective. Processing overheads
are also common.
5. There is initial training required for all programmers and users, and this
costs time, effort and resources that may be difficult to harness as well as
obtain value from once the system is implemented.
6. Scalability - Web applications can face unpredictable and potentially
enormous peak loads. This requires the development of a high
performance server architecture that is highly scalable. To improve
scalability, web farms have been introduced with two or more servers
hosting the same site. HTTP requests are usually routed to each server in
the farm in a round-robin fashion, to distribute load, and allow the site to
handle more requests. However, this can make maintaining state
information more complex.
7. Limited functionality of HTML – if the DB is driven through a web
application. HTML is not a full-fledged programming language and may therefore
impose limitations on the functional versatility of the database.
8. Statelessness - The statelessness of the web environment makes the
management of database connections and user transactions a difficult
requirement for applications to maintain additional information. The
connect-break-reconnect process is an overhead.
9. Bandwidth - The internet is currently an unreliable and slow
communication medium: when a request is carried across the internet there is
no real guarantee of delivery because of things like heavy traffic, faults, or
sabotage.
10. Performance - Many parts of complex web database clients are centred
around interpreted languages, making them slower than traditional
database clients, which are natively compiled.
11. Immaturity of development tools: Various tools that are used in
modelling, design, development, maintenance, and management are not
standardized, and some are still new for the users making them not very
useful in database related work.
12. Opens up system to Security Breaches: A database is shared by many
applications and users, with different intentions, some devious, on the
database. If the database runs over wide area networks or even any
network, the surface area for attack is widened. Already, there are many
cases of data security being compromised across global corporations.



1.6. File System Comparison
1.6.1. File Processing Evolution (Traditional to Database Approach)
The comparison covers six approaches: Flat File, Hierarchical, Network,
Relational, Object-Oriented, and Hybrid.

1. Flat File (1st generation; acronym DBF; late 1950s to early 1960s)
Description: file orientation & navigation; file structures & proprietary
program interfaces.
Physical structure: flat, one-dimensional file, frequently in tabular format;
oftentimes multiple copies of the same data were maintained, each copy sorted
in a different way.
Programming languages: Assembler, Fortran, COBOL; spreadsheets use a
non-algorithmic programming language.
Structural changes: if new fields were added to the file, every program that
accessed that file had to be changed & data files would have to be converted.
Relationships: no structured interrelationship between data records.

2. Hierarchical (2nd generation; acronym DBMS; mid 1960s to early 1970s)
Description: hierarchical orientation & navigation; hierarchies of related
records & standard interfaces.
Physical structure: tree of parent-child relationships; a single table acts as
the “root” of the database from which other tables “branch” out; a child can
have only one parent, but a parent can have multiple children.
Programming languages: commands embedded in programming languages; COBOL, PL/1,
Fortran, ADS & Assembler.
Structural changes: inflexible (once data is organized in a particular way, it
is difficult to change); data reorganization is complicated; requires careful
design.
Relationships: linked lists using pointers stored in the parent/child records
to navigate through the records; a pointer can be a disk address, the key
field, or another random-access technique; access starts at the root and works
down the tree to reach the target data; supports one-to-one & one-to-many
relationships.

3. Network (2nd generation; acronym DBMS; late 1960s to mid 1970s)
Description: network orientation & navigation; hierarchically arranged data,
with the exception that child tables can have more than one parent; standard
interfaces.
Physical structure: network of interrelated lists; looks like several trees
that share branches; children can have multiple parents and parents multiple
children.
Programming languages: commands embedded in programming languages; COBOL, PL/1,
Fortran, ADS & Assembler.
Structural changes: inflexible (once data is organized in a particular way, it
is difficult to change); data reorganization is complicated; requires careful
design.
Relationships: a series of linked lists implements relationships between
records; each list has an owner record & possibly many member records; a single
record can be the owner or a member of several lists of various types; supports
one-to-one, one-to-many, & many-to-many relationships.

4. Relational (3rd generation; acronym RDBMS; early 1970s to present)
Description: relational orientation; data retrieved by unique keys;
relationships expressed through matching keys; the physical organization of
data is managed by the RDBMS.
Physical structure: data is stored in relations (tables); relationships are
maintained by placing the key field value of one record as an attribute in the
related record.
Programming languages: SQL, ODBC.
Structural changes: flexible; because tables are subject-specific & key fields
relate one entity to another, both the data & the database structure can be
easily modified & manipulated; programs are independent of the data format,
which yields flexibility when modifications are needed.
Relationships: key fields link data in many different ways; supports
one-to-one, one-to-many & many-to-many relationships.

5. Object-Oriented (4th generation; acronym OODBS, ODBMS or OODBMS; late 1980s
to present)
Description: object orientation; active, distributed processing & more
powerful operators; uses object-oriented programming to combine data
structures with functions to create re-usable objects.
Physical structure: data is modelled & created as objects; stores objects and
the operations to be performed on the data; applications interact with object
managers, which work through object servers to gain access to object stores.
Programming languages: Java, C++, Smalltalk, Ada, Object Pascal, Objective-C,
DRAGOON, BETA, Emerald, POOL, Eiffel, Self, Oblog, ESP, Loops, Visual Basic,
POLKA & Python.
Structural changes: flexible; programs are built using chunks or modules
consisting of preassembled code & data, which makes programming easier &
faster; changes are made in the underlying code rather than in the design or
structure of the database.
Relationships: defines software pieces, object types, actions / methods & the
interrelationships between these objects; allows objects to be re-used for
different purposes.

6. Hybrid (4th generation; acronym O-R, ORDB or ORDBMS; Nested: late 1960s to
early 1970s; Object-Relational: late 1990s to present)
Description: Nested: a database/operating system with a tool for data
retrieval built in. Object-Relational: relational databases that have evolved
to add object-oriented features.
Physical structure: organizes information common to relational tabular
structures; subsumes the relational model.
Programming languages: Relational: SQL3, ODBC, JDBC; Object-Oriented: Java,
C++, Smalltalk, etc.
Structural changes: Cartridges, Data Blades, & Extenders are modules that
build on the object / relational infrastructure; they consist of types, data
structures, functions, & data, and often include special developer interfaces
or prebuilt applications.
Relationships: primarily a relational structure with object-oriented features
included.



1.7. Introduction to Database Management System
A database management system (DBMS) is a collection of components that supports
the creation, use, and maintenance of databases. Initially, DBMSs provided efficient
storage and retrieval of data. Due to marketplace demands and product innovation,
DBMSs have evolved to provide a broad range of features for data acquisition,
storage, dissemination, maintenance, retrieval, and formatting. The evolution of
these features has made DBMSs rather complex.
A Database Management System (DBMS) is a generalized software system for
manipulating databases. It includes a logical view (schema, sub-schema), a
physical view (access methods, clustering), a data manipulation language, a
data definition language, and utilities for security, recovery, integrity, etc.
Example DBMSs include Oracle, Sybase, MySQL, DB2, SQL Server, Informix,
MS Access, FileMaker, and Firebird. These may fall under the categories
of Relational, Hierarchical, Network, Object-Oriented, or some other
categorisation.
Most DBMSs provide several tools to define databases. The Structured Query
Language (SQL) is an industry-standard language supported by most DBMSs. SQL can
be used to define tables, relationships among tables, integrity constraints (rules that
define allowable data), and authorization rights (rules that restrict access to data).
Database languages, which often come as components of a DBMS, are
special-purpose languages that support one or more of the following tasks,
sometimes distinguished as sublanguages:
1. Data control language (DCL) – controls access to data;
2. Data definition language (DDL) – defines data structures (creating,
altering, or dropping them) and the relationships among them;
3. Data manipulation language (DML) – performs tasks such as inserting,
updating, or deleting data occurrences;
4. Data query language (DQL) – allows searching for information and computing
derived information.
5. Data Storage Language (DSL) – manages the storage of data on storage media.
This responsibility is now mainly carried out by the OS.
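The sublanguages can be illustrated with Python's sqlite3. Note that SQLite does not implement DCL statements such as GRANT and REVOKE, so only DDL, DML, and DQL appear below; the course table and its values are hypothetical:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# DDL: define the structure.
cur.execute("CREATE TABLE course (course_id TEXT PRIMARY KEY, title TEXT)")

# DML: insert and update data occurrences.
cur.execute("INSERT INTO course VALUES ('SE01', 'Software Enginering')")
cur.execute(
    "UPDATE course SET title = 'Software Engineering' "
    "WHERE course_id = 'SE01'"
)

# DQL: search for information.
title = cur.execute(
    "SELECT title FROM course WHERE course_id = 'SE01'"
).fetchone()[0]

# DDL again: drop the structure.
cur.execute("DROP TABLE course")
conn.close()
```

In SQL the sublanguages share one syntax; the distinction is in what each statement class is allowed to do.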
The most important feature of a DBMS is the ability to answer queries. A query
is a request for data that answers a question. For example, a user may want to
know which students have large fee balances or which products have strong
sales in a particular region.
A Database Management System (DBMS), as a collection of programs, enables
users to perform actions on databases, such as the following:
1. define the structure of database information (descriptive attributes, data types,
constraints, etc), storing this as metadata
2. populate the database with appropriate information
3. manipulate the database (for retrieval/update/removal/insertion of data)
4. protect the database contents against accidental or deliberate corruption of
contents (involves secure access by users and automatic recovery in the case of
user/hardware faults)
5. share the database among multiple users, possibly concurrently
Basically, a DBMS does the following:
1. add, remove, and update records
2. retrieve data that match certain criteria
3. cross-reference data in different tables



4. perform complex aggregate calculations
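As a sketch of cross-referencing tables and aggregate calculation (items 3 and 4 above), the following joins two hypothetical tables on a shared key field and counts enrolments, assuming Python's sqlite3:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.executescript("""
    CREATE TABLE course (course_id TEXT PRIMARY KEY, title TEXT);
    CREATE TABLE student (student_id TEXT PRIMARY KEY,
                          name TEXT, course_id TEXT);
    INSERT INTO course VALUES ('SE01', 'Software Engineering'),
                              ('DB01', 'Databases');
    INSERT INTO student VALUES ('S1', 'Anna', 'SE01'),
                               ('S2', 'Ben',  'SE01'),
                               ('S3', 'Carl', 'DB01');
""")

# Cross-reference the two tables (join on the shared key field) and
# aggregate: how many students are enrolled per course?
rows = cur.execute("""
    SELECT c.title, COUNT(*) AS enrolled
    FROM student s JOIN course c ON s.course_id = c.course_id
    GROUP BY c.title
    ORDER BY enrolled DESC
""").fetchall()
conn.close()
```

The key field course_id, stored in each student record, is what lets the DBMS relate the two tables without any duplication of course details.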
1.7.1. DBMS Component Modules

The database and the DBMS catalogue are usually stored on disk. Access to the
disk is controlled primarily by the operating system (OS), which schedules disk
input/output. A higher-level stored data manager module of the DBMS controls
access to DBMS information that is stored on disk, whether it is part of the
database or the catalogue. The stored data manager may use basic OS services for
carrying out low-level data transfer between the disk and computer main storage,
but it controls other aspects of data transfer, such as handling buffers in main
memory. Once the data is in main memory buffers, it can be processed by other
DBMS modules, as well as by application programs.
1. The DDL compiler processes schema definitions, specified in the DDL, and
stores descriptions of the schemas (meta-data) in the DBMS catalogue.
The catalogue includes information such as the names of files, data items,
storage details of each file, mapping information among schemas, and
constraints, in addition to many other types of information that are
needed by the DBMS modules. DBMS software modules then look up the
catalogue information as needed.
2. The run-time database processor handles database accesses at run time;
it receives retrieval or update operations and carries them out on the
database. Access to disk goes through the stored data manager.
3. The query compiler handles high-level queries that are entered
interactively. It parses, analyzes, and compiles or interprets a query by
creating database access code, and then generates calls to the run-time
processor for executing the code.
4. The pre-compiler extracts DML commands from an application program
written in a host programming language. These commands are sent to the
DML compiler for compilation into object code for database access. The
rest of the program is sent to the host language compiler. The object
codes for the DML commands and the rest of the program are linked,
forming a canned transaction whose executable code includes calls to the
runtime database processor.
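The query compiler's work can be observed directly in SQLite, which exposes its chosen access plan through EXPLAIN QUERY PLAN. The student table is hypothetical, and the exact plan wording varies between SQLite versions:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE student (student_id TEXT PRIMARY KEY, name TEXT)")

# Ask the query compiler for its access plan instead of running the query.
plan = cur.execute(
    "EXPLAIN QUERY PLAN SELECT name FROM student WHERE student_id = ?",
    ("20200779SE",),
).fetchall()

# The last column of each plan row is a human-readable description.
plan_text = " ".join(row[-1] for row in plan)
conn.close()
```

Because student_id is a primary key, the plan typically reports an index search rather than a full table scan, showing the compiled access code the run-time processor will execute.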
The DBMS interacts with the operating system when disk accesses either to the
database or to the catalogue are needed. If the computer system is shared by
many users, the OS will schedule DBMS disk access requests and DBMS processing
along with other processes. The DBMS also interfaces with compilers for general-
purpose host programming languages. User-friendly interfaces to the DBMS can
be provided to help any of the user types to specify their requests.

1.7.2. Database System Utilities

In addition to possessing the software modules just described, most DBMSs have
database utilities that help the DBA in managing the database system. Common
utilities have the following types of functions:
1. Loading: A loading utility is used to load existing data files—such as text
files or sequential files—into the database. Usually, the current (source)
format of the data file and the desired (target) database file structure are
specified to the utility, which then automatically reformats the data and
stores it in the database. With the proliferation of DBMSs, transferring
data from one DBMS to another is becoming common in many
organizations. Some vendors are offering products that generate the

NC Software Engineering: Database Concepts Module. Compiled by C. Uta utanoel@gmail.com 12 | 100


appropriate loading programs, given the existing source and target
database storage descriptions (internal schemas). Such tools are also
called conversion tools.
2. Backup: A backup utility creates a backup copy of the database, usually by
dumping the entire database onto tape. The backup copy can be used to
restore the database in case of catastrophic failure. Incremental backups
are also often used, where only changes since the previous backup are
recorded. Incremental backup is more complex but it saves space.
3. File reorganization: This utility can be used to reorganize a database file
into a different file organization to improve performance.
4. Performance monitoring: Such a utility monitors database usage and
provides statistics to the DBA. The DBA uses the statistics in making
decisions such as whether or not to reorganize files to improve
performance.
5. Other utilities may be available for sorting files, handling data
compression, monitoring access by users, and performing other functions.
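A loading utility of the kind described in point 1 can be sketched in a few lines. The following is a minimal, hypothetical example using Python's standard csv and sqlite3 modules; the file layout and the names (students.csv, the students table) are assumptions for illustration, not features of any particular DBMS:

```python
import csv
import sqlite3

def load_csv(db_path, table, csv_path):
    """Load an existing data file (CSV) into a database table,
    reformatting each source row to fit the target table."""
    con = sqlite3.connect(db_path)
    with open(csv_path, newline="") as f:
        reader = csv.reader(f)
        header = next(reader)                 # source format: column names
        cols = ", ".join(header)
        marks = ", ".join("?" * len(header))
        con.execute(f"CREATE TABLE IF NOT EXISTS {table} ({cols})")
        con.executemany(
            f"INSERT INTO {table} ({cols}) VALUES ({marks})", reader)
    con.commit()
    con.close()
```

A real loading utility would also be told the desired target file structure (the internal schema) rather than deriving it from the CSV header, but the reformat-and-store flow is the same.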
1.7.3. Tools, Application Environments, and Communications Facilities

Other tools are often available to database designers, users, and DBAs. CASE tools
are used in the design phase of database systems. Another tool that can be quite
useful in large organizations is an expanded data dictionary (or data repository)
system. In addition to storing catalogue information about schemas and
constraints, the data dictionary stores other information, such as design decisions,
usage standards, application program descriptions, and user information. Such a
system is also called an information repository. This information can be accessed
directly by users or the DBA when needed. A data dictionary utility is similar to the
DBMS catalogue, but it includes a wider variety of information and is accessed
mainly by users rather than by the DBMS software.
Application Development Environments, such as the PowerBuilder system, are
becoming quite popular. These systems provide an environment for developing
database applications and include facilities that help in many facets of database
systems, including database design, GUI development, querying and updating, and
application program development. These environments fall under IDEs, SDKs, or
frameworks that provide templating mechanisms for the design, development, and
deployment of database systems.
The DBMS also needs to interface with communications software, whose function
is to allow users at locations remote from the database system site to access the
database through computer terminals, workstations, or their local personal
computers. These are connected to the database site through data
communications hardware such as phone lines, long-haul networks, local-area
networks, or satellite communication devices. Many commercial database systems
have communication packages that work with the DBMS. The integrated DBMS
and Data Communications system is called a DB/DC system. In addition, some
distributed DBMSs are physically distributed over multiple machines. In this case,
communications networks are needed to connect the machines. These are often
local area networks (LANs) but they can also be other types of networks.



1.8. Chapter 1 Questions
1. Define the term “File System” [2]
2. Explain two limitations of the File System [2]
3. Define the term database [2]
4. Differentiate a database from a database system [4]
5. Explain the following terms in the context of database systems:
a. Data [2]
b. Static data [2]
c. Dynamic data [2]
d. Transactional data [2]
e. Information [2]
f. DBMS [2]
6. Identify and explain five (5) applications where databases are used. [10]
7. Draw a diagram to show an illustration of a database system
highlighting the following elements and their inter-relationships: [10]
a. Users / Programs
b. Application Programs / Queries
c. DBMS
d. Data Dictionary
e. Database
8. Explain the role played by each component list in a. - e. in
question 7 above. [20]
9. The structure of a database can be shown as a hierarchy of elements.
Using the following elements, show a structural representation of a
database: attribute, entity, value, database, record. [10]
10. Why is it important to model the database before building the actual
database? [5]
11. Outline:
a. Five (5) advantages brought about by adopting a database
approach to organising, using, and managing enterprise data. [10]
b. Five (5) disadvantages brought about by adopting a database
approach to organising, using, and managing enterprise data. [10]
12. Using diagrammatic representations, describe the following database
structures:
a. File System [5]
b. Hierarchical System [5]
c. Network System [5]
d. Relational System [5]



13. List five (5) examples of DBMSes [5]
14. Explain each of the following components of a DBMS:
a. Data Control Language [5]
b. Data definition Language [5]
c. Data Manipulation Language [5]
d. Data Query Language [5]
e. Data Storage Language [5]
15. Identify and elaborate on each of the five (5) main functions of a DBMS. [25]
16. Explain the following utilities offered by a DBMS:
a. Loading [3]
b. Compilation [3]
c. Backup [3]
d. File reorganisation [3]
e. Handling data compression [3]
f. Encryption and decryption [3]
g. Performance Monitoring [3]
17. Identify and explain any three (3) tools that are useful in a database
environment. [9]
18. What purpose do CASE Tools serve in database environments? [10]
19. What educational qualifications, skills, knowledge, and aptitudes should
an aspiring DBA possess? [15]
20. In your opinion, will databases still be useful two decades on
from the year 2020? Support your position. [20]
21. What are the database trending technologies? [20]
22. A company needs to implement a database system. As a database
consultant, what advice can you give them before, during, and after
the implementation of a database system in their organisation? [30]
23. Discuss ethical issues that may arise in database environments and
how a balance can be struck between organisational and
stakeholder needs. [25]



CHAPTER 2

TYPES OF DATABASES



2. TYPES OF DATABASES

2.1. Objectives
Main Objective: Be able to compare and contrast different types of database
management system:
Have an understanding of the following database management systems:
1. Relational Database
2. Hierarchical Database
3. Object oriented Database
4. Distributed Database
5. Network Database

2.2. Introduction
There are many types of databases. These include hierarchical, network, relational,
object-oriented, graph, NoSQL, and NewSQL. We will focus on the first five.
A hierarchical model organizes data in a tree-like structure using parent-child
relationships, while the network model allows multiple records to be linked to the
same owner file. A relational model, on the other hand, manages data as tuples
grouped into relations (tables). An object-oriented database encapsulates data and
procedures together as objects.

2.3. Introduction to Relational Databases


A relational database organizes data in tables (or relations). A table is made up of
rows and columns, similar to a spreadsheet. However, the relationships that can be
created among the tables enable a relational database to efficiently store huge
amounts of data and effectively retrieve selected data.
A language called SQL (Structured Query Language) was developed to work with
relational databases.
2.3.1. Relational Model Concepts

 Attribute: Each column in a table. Attributes are the properties which define
a relation, e.g., Student_Rollno, NAME, etc.
 Tables – In the relational model, relations are saved in table format, stored
along with their entities. A table has two properties: rows and columns.
Rows represent records and columns represent attributes.
 Tuple – A single row of a table, which contains a single record.
 Relation Schema: A relation schema represents the name of the relation with
its attributes.
 Degree: The total number of columns / attributes in the relation is called
the degree of the relation.
 Cardinality: The total number of rows present in the table.
 Column: The column represents the set of values for a specific attribute.
 Relation instance – A relation instance is a finite set of tuples in the RDBMS
system. Relation instances never have duplicate tuples.
 Relation key – Every row has one, two or multiple attributes, which is called
the relation key.



 Attribute domain – Every attribute has some pre-defined value and scope,
which is known as the attribute domain.
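These concepts can be illustrated concretely. The sketch below, assuming a small Student relation held in Python's built-in sqlite3 engine (the table and its values are invented for the example), shows how attributes, tuples, degree, and cardinality line up with a real table:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE Student (Student_Rollno INTEGER, Name TEXT, Course TEXT)")
con.executemany("INSERT INTO Student VALUES (?, ?, ?)",
                [(1, "Ann", "SE"), (2, "Ben", "SE"), (3, "Eve", "CS")])

cur = con.execute("SELECT * FROM Student")
attributes = [d[0] for d in cur.description]   # the relation schema's attributes
tuples = cur.fetchall()                        # each row is one tuple

degree = len(attributes)      # total number of columns / attributes
cardinality = len(tuples)     # total number of rows
print(degree, cardinality)    # prints: 3 3
```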
2.3.2. Common characteristics of relational databases include:

 The relational model is the organisation of data into collections of two-dimensional
tables called ‘relations’.
 A table has columns and rows; standard column types such as numeric,
string, datetime, or Boolean can be defined.
 Stores information in tables using a simple and versatile method.
Each intersection of a row and a column forms a cell where attribute values are contained.
 Uses Keys for a table (set of attributes) whose values uniquely determine
the values of a row of the table. Each table has a primary key, a unique
identifier constructed from one or more columns. A proper relational table
contains no duplicate rows.
 A table is linked to another by including the other table's primary key. Such
an included column is called a foreign key.
 Indexes are data structures that help to retrieve or change information in
tables quickly by grouping similar information / data together
 Powerful (allows retrievals from, as well as writes to many tables). Supports
and manages concurrency, and rollback / rollforward.
 Expressing queries through relational algebra, notation for expressing
queries without giving details about how the operations are to be carried
out. Allows the definition of “What” (4GL feature), without specifying the
“How”.
 Relational model consists of three components: structure (table/relation),
manipulation (high level operations which act upon and produce whole
tables) and a set of rules to maintain the integrity of the database
 Relational model often used / applied in large-scale applications such as
airline reservations systems, library systems, student / college systems,
ecommerce systems, etc.
There are many commercial Relational Database Management System (RDBMS),
such as Oracle, IBM DB2 and Microsoft SQL Server. There are also many free and
open-source RDBMS, such as MySQL, and mSQL (mini-SQL).
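As an illustration of keys and declarative querying, here is a hedged sketch using Python's bundled sqlite3 module as a stand-in RDBMS; the Customer and Purchase tables and their columns are invented for the example:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("PRAGMA foreign_keys = ON")   # enforce the link between tables
con.execute("""CREATE TABLE Customer (
                   cust_id INTEGER PRIMARY KEY,   -- unique row identifier
                   name    TEXT NOT NULL)""")
con.execute("""CREATE TABLE Purchase (
                   purch_id INTEGER PRIMARY KEY,
                   cust_id  INTEGER REFERENCES Customer(cust_id),  -- foreign key
                   item     TEXT)""")
con.execute("INSERT INTO Customer VALUES (1, 'Melusi')")
con.execute("INSERT INTO Purchase VALUES (10, 1, 'Laptop')")

# Declarative query: we state *what* is wanted, not *how* to retrieve it
rows = con.execute("""SELECT c.name, p.item
                      FROM Customer c
                      JOIN Purchase p ON c.cust_id = p.cust_id""").fetchall()
print(rows)   # [('Melusi', 'Laptop')]
```

The PRIMARY KEY clause ensures no duplicate rows by identifier, and the REFERENCES clause is the foreign-key link from Purchase back to Customer described above.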
The Relational Model



2.3.3. Advantages and Disadvantages of using Relational model

Advantages of using Relational model


1. Simplicity: A relational data model is simpler than the hierarchical and
network model.
2. Structural Independence: The relational database is only concerned with
data and not with a structure. This can improve the performance of the
model.
3. Easy to use: The relational model is easy as tables consisting of rows and
columns is quite natural and simple to understand
4. Query capability: It makes possible for a high-level query language like
SQL to avoid complex database navigation.
5. Data independence: The structure of a database can be changed without
having to change any application.
6. Scalable: A relational database can be enlarged, in both the number of
records (rows) and the number of fields (columns), to enhance its usability.
Disadvantages of using Relational model
1. Some relational databases have limits on field lengths which cannot be
exceeded.
2. Relational databases can sometimes become complex as the amount of
data grows, and the relations between pieces of data become more
complicated.
3. Complex relational database systems may lead to isolated databases
where the information cannot be shared from one system to another.
2.3.4. Summary

 The relational database model represents the database as a collection of
relations (tables)
 Attribute, Tables, Tuple, Relation Schema, Degree, Cardinality, Column,
Relation instance, are some important components of Relational Model
 Relational Integrity constraints are referred to conditions which must be
present for a valid relation
 Domain constraints can be violated if an attribute value is not appearing in
the corresponding domain or it is not of the appropriate data type
 Create / Insert, Retrieve / Select, Update / Modify and Delete (CRUD) are
operations performed in Relational Model
 The relational database is only concerned with data and not with a structure
which can improve the performance of the model
 Advantages of relational model are simplicity, structural independence, ease
of use, query capability, data independence, and scalability.
 Few relational databases have limits on field lengths which can't be
exceeded.



2.4. Hierarchical Databases
In a hierarchical DBMS, the relationships among data in the database are established
so that one data element exists as a subordinate of another. The data elements have
parent-child relationships and are modelled using the “tree” data structure. A parent
element can have many child elements, but a child element cannot have more than one
parent element. These databases are very fast and simple.
In the hierarchical data model, records are linked with other superior records on
which they are dependent and also on the records, which are dependent on them.
Hierarchical model cannot represent many to many relationships among records.
2.4.1. Examples include:

 file systems in Windows
 a family tree
 the management hierarchy in an organisation
2.4.2. Hierarchical Structure—Characteristics

 The data element or record at the highest level of the hierarchy is called the root
element. Any data element can be accessed by moving progressively
downward from the root and along the branches of the tree until the desired
record is located.
 Each parent can have many children. Records are dependent and arranged in
multilevel structures, consisting of one root record and any number of
subordinate levels.
 Each child has only one parent.
 The tree is defined by a path that traces parent segments to child segments,
beginning from the left. Relationships among the records are one-to-many,
since each data element is related only to one element above it.
 Hierarchical path: an ordered sequencing of segments tracing the
hierarchical structure.
 Accessing data: Although it is difficult to access data in the hierarchical model,
it is easier to access data in the network model and the relational model.
 Flexibility: The hierarchical model is less flexible, but the network model and
relational model are flexible.
2.4.3. Hierarchical Model Diagram



2.4.4. Advantages and Disadvantages of Hierarchical Model

Advantages
1. Data can be retrieved easily due to the explicit links present between the
table structures.
2. Referential integrity is always maintained, i.e. any changes made in the
parent table are automatically updated in a child table.
3. Promotes data sharing.
4. It is conceptually simple due to the parent-child relationship.
5. Database security is enforced.
6. Efficient with 1:N relationships.
7. A clear chain of command or authority.
8. Increases specialization.
9. High performance.
10. Clear results.

Disadvantages
1. If the parent table and child table are unrelated, then adding a new entry
in the child table is difficult because an additional entry must be added in
the parent table.
2. Complex relationships are not supported.
3. Redundancy, which results in inaccurate information.
4. Change in structure leads to change in all application programs.
5. M:N relationships are not supported.
6. No data manipulation or data definition language.
7. Lack of standards impacts compatibility and portability.
8. Communication barriers.
9. Organizational disunity.
10. Rigid structure / poor flexibility.

2.5. Object Oriented (Programming) Databases


Object-oriented DBMSs are derived from the model of the object-oriented
programming paradigm. They are helpful in representing both consistent data as
stored in databases, and transient / transitory data (data in transition) as found
in executing programs. They use small, reusable elements called objects. Each object
contains a data part and a set of operations which work upon the data. The object
and its attributes are accessed through message passing instead of being stored in
relational table models.
Objects in an object-oriented database refer to the ability to develop a product, then
define and name it. The object can then be referenced, or called later, as a unit
without having to go into its complexities.
Data in an OODB is stored as objects. Thus, if a person's data were in a database, that
person's attributes, such as their address, phone number, and age, are considered
to belong to that person instead of being extraneous data. This allows relations
between data to be relations to objects and their attributes, not to individual fields.
2.5.1. OODB organisation

 Models both data and their relationships in a single structure called an object
 The object-oriented data model (OODM) becomes the basis for the object-oriented
database management system (OODBMS)
 An object is described by its factual content: like the relational model’s entity
 Includes information about relationships between facts within the object and
relationships with other objects: unlike the relational model’s entity
 The object becomes the basic building block for autonomous structures



2.5.2. Object Oriented Data Model — Basic Structure

 Object: abstraction of a real-world entity
 Attributes describe the properties of an object
 Classes: objects that share similar characteristics are grouped in classes,
and classes are organized in a class hierarchy
 Inheritance is the ability of an object within the class hierarchy to inherit
the attributes and methods of classes above it
Object Model aims to reduce the overhead of converting information
representation in the database to an application specific representation. An object
model allows for data persistence and storage by storing objects in the databases.
The relationships between various objects are inherent in the structure of the
objects.
2.5.3. Summary of Object Oriented Database Model

The object-oriented model is based on a collection of objects, like the E-R model.
 An object contains values stored in instance variables within the object.
o Unlike the record-oriented models, these values are themselves
objects.
o Thus objects contain objects to an arbitrarily deep level of nesting.
 An object also contains bodies of code that operate on the object.
o These bodies of code are called methods.
 Objects that contain the same types of values and the same methods are
grouped into classes.
o A class may be viewed as a type definition for objects.
o Analogy: the programming language concept of an abstract data
type.
 The only way in which one object can access the data of another object is
by invoking the method of that other object.
o This is called sending a message to the object.
o Internal parts of the object, the instance variables and method
code, are not visible externally.
o Result is two levels of data abstraction.
2.5.4. For example, consider an object representing a bank account.

 The object contains instance variables number and balance.
 The object contains a method pay-interest which adds interest to the
balance.
 Under most data models, changing the interest rate entails changing code in
application programs.
 In the object-oriented model, this only entails a change within the
pay-interest method.
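A minimal sketch of this bank-account object, written in Python purely for illustration (the class name and the 5% rate are assumptions), shows why a rate change touches only the pay-interest method:

```python
class BankAccount:
    """An object: instance variables plus the methods that operate on them."""
    INTEREST_RATE = 0.05          # changing the rate changes only this class

    def __init__(self, number, balance):
        self.number = number      # instance variable
        self.balance = balance    # instance variable

    def pay_interest(self):
        """Invoked by 'sending a message' to the object; the internal
        variables are not visible or modifiable from outside."""
        self.balance += self.balance * self.INTEREST_RATE

acct = BankAccount("ACC-001", 1000.0)
acct.pay_interest()
print(acct.balance)   # 1050.0
```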



Unlike entities in the E-R model, each object has its own unique identity
independent of the values it contains:
 Two objects containing the same values are distinct.
 Distinction is created and maintained in physical level by assigning distinct
object identifiers.
2.5.5. Object v/s Relational Models

Object-Oriented Model        Relational Model
Class                        Relation
Object Instance              Tuple
Attribute                    Column
Method                       Stored Procedure (closest analogue; semantics differ)

2.5.6. Advantages of OODB

Object-oriented databases make the promise of reduced maintenance, code
reusability, real-world modelling, and improved reliability and flexibility.
1. An integrated repository of information that is shared by multiple users,
multiple products, multiple applications on multiple platforms.
2. It also solves the following problems:
a. The semantic gap: The real world and the Conceptual model are
very similar.
b. Impedance mismatch: Programming languages and database
systems must be interfaced to solve application problems. But the
language style, data structures, of a programming language (such
as C) and the DBMS (such as Oracle) are different. The OODB
supports general purpose programming in the OODB framework.
3. Navigate between tables with pointers that link related objects
4. Reduced Maintenance: The primary goal of object-oriented development
is the assurance that the system will enjoy a longer life while having far
smaller maintenance costs. Because most of the processes within the
system are encapsulated, the behaviours may be reused and incorporated
into new behaviours.
5. Real-World Modelling: Object-oriented systems tend to model the real
world in a more complete fashion than do traditional methods. Objects
are organized into classes of objects, and objects are associated with
behaviours. The model is based on objects, rather than on data and
processing.
6. Improved Reliability and Flexibility: Object-oriented system promise to
be far more reliable than traditional systems, primarily because new



behaviours can be "built" from existing objects. Because objects can be
dynamically called and accessed, new objects may be created at any time.
The new objects may inherit data attributes from one, or many other
objects. Behaviours may be inherited from super-classes, and novel
behaviours may be added without effecting existing systems functions.
7. High Code Reusability: When a new object is created, it will automatically
inherit the data attributes and characteristics of the class from which it
was spawned. The new object will also inherit the data and behaviours
from all super classes in which it participates. When a user creates a new
type of widget, the new object behaves “widgety”, while having new
behaviours which are defined to the system.
2.5.7. Disadvantages of the Object Technology

There are several major misconceptions which must be addressed when
considering the use of an object-oriented method:
1. Object-oriented Development is not a panacea - Object-oriented
Development is best suited for dynamic, interactive environments, as
evidenced by its widespread acceptance in CAD/CAM and engineering design
systems. Wide-scale object-oriented corporate systems are still unproved, and
many bread-and-butter information systems applications (i.e. payroll,
accounting), may not benefit from the object-oriented approach.
2. Object-oriented Development is not a technology - Although many advocates
are zealous in their support for object-oriented systems, remember that all the
"HOOPLA" is directed at the object-oriented approach to problem solving, and
not at any specific technology.
3. Object-oriented Development is not yet completely accepted by major
vendors - Object-oriented Development has gained some market
respectability, and vendors have gone from catering to a "lunatic fringe" to a
respected market. Still, there are major reservations as to whether Object-
oriented development will become a major force, or fade into history.
4. Lack of standards: There is a general lack of standards for OODBMSs. We have
already mentioned that there is no universally agreed data model. Similarly,
there is no standard object-oriented query language.
5. Lack of universal data model: There is no universally agreed data model for an
OODBMS, and most models lack a theoretical foundation. This disadvantage
is seen as a significant drawback, and is comparable to that of pre-relational systems.
6. Lack of experience: In comparison to RDBMSs the use of OODBMS is still
relatively limited. This means that we do not yet have the level of experience
that we have with traditional systems. OODBMSs are still very much geared
towards the programmer, rather than the naïve end-user. Also there is a
resistance to the acceptance of the technology.
7. Lack of support for views: Currently, most OODBMSs do not provide a view
mechanism, which, as we have seen previously, provides many advantages
such as data independence, security, reduced complexity, and customization.

2.5.8. Conclusion

Object-oriented databases are what we call navigational. This means that access
to related objects must follow the predefined linkages created by the containers
for related objects. For example, to find all the purchases made by a customer, a
program in an object-oriented database environment would do the following:



1. Find the customer object, perhaps using an aggregate object that collects
all the customer objects.
2. Retrieve the first related-object identifier from the customer object.
3. Use the purchase object's identifier to locate the purchase object and
process it as needed.
4. Retrieve the next related-object identifier from the customer object.
5. Repeat steps 3 and 4 until all purchase objects have been processed.
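The five steps above can be sketched as follows. The object store, identifiers, and field names are invented; the point is that access follows the predefined linkages (object identifiers) rather than a declarative query:

```python
# Hedged sketch of navigational access: each customer object stores the
# object identifiers of its related purchase objects. All names invented.
purchases = {                       # object store keyed by object identifier
    "p1": {"item": "Book"},
    "p2": {"item": "Pen"},
}
customers = {
    "c1": {"name": "Ann", "purchase_ids": ["p1", "p2"]},
}

def purchases_of(cust_oid):
    processed = []
    customer = customers[cust_oid]            # step 1: find the customer object
    for oid in customer["purchase_ids"]:      # steps 2 and 4: next identifier
        obj = purchases[oid]                  # step 3: locate the purchase object
        processed.append(obj["item"])         # ...and process it as needed
    return processed                          # step 5: repeated until done

print(purchases_of("c1"))   # ['Book', 'Pen']
```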
Relationships in an object-oriented database are all two-way (inverse). It is also
possible to take a single entity at the “many” end of a relationship and find its
parent entity. However, not all OODBMSs require inverse relationships. When the
database developer creates the schema, he or she must indicate which
relationships will be inverse, and which will be one-way.

2.6. Distributed Databases

A distributed database is a set of interconnected databases that is distributed over
a computer network or the internet. It is located on various sites that do not share
physical components; that is, the database is not limited to one system but is spread
over different sites, on multiple computers or over a network of computers. These
multiple interconnected databases are spread physically across various locations
and communicate via a computer network.
A Distributed Database Management System (DDBMS) consists of a single logical
database that is split into a number of fragments. Each fragment is stored on one or
more computers under the control of a separate DBMS, with the computers
connected by a communications network. Each site is capable of independently
processing user requests that require access to local data (that is, each site has some
degree of local autonomy) and is also capable of processing data stored on other
computers in the network.
Users access the distributed database via applications, which are classified as those
that do not require data from other sites (local applications) and those that do
require data from other sites (global applications). We require a DDBMS to have at
least one global application.
A DDBMS therefore has the following characteristics:
 a collection of logically related shared data;
 the data is split into a number of fragments;
 fragments may be replicated;
 fragments/replicas are allocated to sites;
 the sites are linked by a communications network;
 the data at each site is under the control of a DBMS;
 the DBMS at each site can handle local applications autonomously;
 each DBMS participates in at least one global application.
2.6.1. Features of Distributed Database System

 Databases in the collection are logically interrelated with each other. Often
they represent a single logical database.
 Data is physically stored across multiple sites. Data in each site can be
managed by a DBMS independent of the other sites.
 The processors in the sites are connected via a network. They do not have any
multiprocessor configuration.
 A distributed database is not a loosely connected file system.
 A distributed database incorporates transaction processing, but it is not
synonymous with a transaction processing system.

A Distributed Database Management System (DDBMS) manages the distributed
database and provides mechanisms to make the databases transparent to the users.
It is a centralized software system that manages a distributed database as if it
were all stored in a single location. It performs the following functions:
 It is used to create, retrieve, update and delete distributed databases.
 It synchronizes the database periodically and provides access mechanisms
by the virtue of which the distribution becomes transparent to the users.
 It ensures that the data modified at any site is universally updated.
 It is used in application areas where large volumes of data are processed
and accessed by numerous users simultaneously.
 It is designed for heterogeneous database platforms.
 It maintains confidentiality and data integrity of the databases.
In these systems, data is intentionally distributed among multiple nodes so that all
computing resources of the organization can be optimally used. This may be
required when a particular database needs to be accessed by various users
globally. It needs to be managed such that for the users it looks like one single
database.

2.6.2. Types of Distributed Databases

1. Homogeneous Database: In a homogeneous database, all sites store the
database identically. The operating system, database management system,
and the data structures used are all the same at all sites.
2. Heterogeneous Database: In a heterogeneous distributed database, different
sites can use different schema, DBMSs, data models, and software that can
lead to problems in query processing and transactions. A particular site might
be completely unaware of the other sites.
3. Distributed Data Storage: There are two ways in which data can be stored on
different sites:
(i) Replication: The entire relation is stored redundantly at 2 or more sites.
If the entire database is available at all sites, it is a fully redundant
database. Thus systems maintain copies of data. It increases the
availability of data at different sites. Query requests can be processed in
parallel.
Challenges include the need to constantly update changes across all
sites. This is a lot of overhead. Concurrency control becomes way more
complex as concurrent access now needs to be checked over a number
of sites.
(ii) Fragmentation: The relations are fragmented (i.e., divided into smaller
parts) and each fragment is stored at the different sites where it is
required. It must be ensured that the fragments can be reconstructed
into the original relation (i.e., there is no loss of data).
Fragmentation is advantageous as it does not create copies of data, so
consistency is not a problem. Fragmentation of relations can be done in two
ways:
1) Horizontal fragmentation – Splitting by rows – The relation is
fragmented into groups of tuples so that each tuple is assigned to at
least one fragment.
2) Vertical fragmentation – Splitting by columns – The schema of the
relation is divided into smaller schemas. Each fragment must contain a
common candidate key so as to ensure lossless join. In certain cases, an
approach that is hybrid of fragmentation and replication is used.
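The two fragmentation strategies can be illustrated with a small sketch. This is a simplified illustration, not how a DDBMS stores fragments internally; the Employee relation, its column layout, and the site names are all hypothetical, with plain Python lists standing in for stored fragments:

```python
# A tiny Employee relation: each tuple is (emp_no, name, dept, salary).
employees = [
    (1, "Moyo",    "HR",  900),
    (2, "Ncube",   "IT", 1200),
    (3, "Dube",    "IT", 1100),
    (4, "Sibanda", "HR",  950),
]

# Horizontal fragmentation: split by rows, e.g. one fragment per department site.
hr_site = [row for row in employees if row[2] == "HR"]
it_site = [row for row in employees if row[2] == "IT"]

# Vertical fragmentation: split by columns; every fragment keeps the
# candidate key (emp_no) so the original relation can be rebuilt by a join.
personal = [(e[0], e[1]) for e in employees]        # (emp_no, name)
payroll  = [(e[0], e[2], e[3]) for e in employees]  # (emp_no, dept, salary)

# Lossless reconstruction: join the vertical fragments on emp_no.
pay = {p[0]: p[1:] for p in payroll}
rebuilt = [(no, name) + pay[no] for (no, name) in personal]
assert rebuilt == employees                          # no loss of data
assert sorted(hr_site + it_site) == sorted(employees)
```

The final assertions demonstrate the lossless-join requirement: both fragmentations can be recombined into the original relation.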

2.6.3. Advantages & Disadvantages of Distributed Database Systems

Below are the advantages and disadvantages of a Distributed System:


Advantages of Distributed Database System
1. Increased reliability and availability – A distributed database system is
robust to failure to some extent. Hence, it is reliable when compared to a
Centralized database system.
2. Local control – The data is distributed in such a way that every portion of
it is local to some sites (servers). The site in which the portion of data is
stored is the owner of the data.
3. Modular growth (resilient) – Growth is easier. We do not need to
interrupt any of the functioning sites to introduce (add) a new site. Hence,
the expansion of the whole system is easier. Removal of site also does not
necessarily have ripple effects.
4. Lower communication costs (More Economical) – Data are distributed in
such a way that they are available near to the location where they are
needed more. This reduces the communication cost.
5. They are cost-effective and can drastically reduce database management
costs since setup and administrative operations are small scale and local.
6. Faster response – Most of the data are local and in close proximity to
where they are needed, thereby requests can be answered quickly.
7. Reflects the organizational structure – Normally, database is fragmented
into various locations wherever we have controls, allowing some degree
of independence.
8. Secured management of distributed data – Various transparencies, such as
network, fragmentation, and replication transparency, are implemented
to hide the actual implementation details of the whole distributed system.
In this way, a distributed database provides security for data.
9. Robust – The system continues to work in case of failures. For example,
replicated distributed database performs in spite of failure of other sites.
10. Complies with ACID properties – Distributed transactions demand
Atomicity, Consistency, Isolation, and Durability, thereby strengthening
data integrity guarantees.
11. It supports both OLTP (Online Transaction Processing) and OLAP (Online
Analytical processing) upon diversified systems that may have common
data, and in so doing, addressing various stakeholder needs.
12. Improved performance and Parallelism in executing transactions. This is
achieved through task distribution and simultaneous processing across the
networked database nodes.
Disadvantages of Distributed Database Systems
1. Complex software – Complex implementation. The software to manage
and run the system is sophisticated and thus requires specialised skills.
2. Higher software costs compared to a centralized system. Additional
specialised software, including middleware and security software, might
be needed in most cases.
3. Increased processing overhead – Many messages must be exchanged
between sites to complete a distributed transaction.
4. Data integrity – Data integrity is difficult to maintain as it moves across
different network resources.
5. Different data formats might be used – This costs time, as there is an
overhead of data conversions for synchronisation among nodes.
6. Deadlock is difficult to handle compared to a centralized system.
Possibility for contention on resources is real – i.e., for data, applications,
processor, network and storage.
7. Network congestion can occur during read and write operations in a
replicated distributed database.
8. An operating system that supports distributed processing is required to
implement a distributed database system. Such software can be complex
and expensive, and carries its own inherent challenges.
9. The data shared between sites over networks are vulnerable to attack.
There is an increase in the level of exposure to security threats because of
increased attack-surface-area. Robust security provisioning is therefore a
necessity and it comes at a cost.
10. More complex database design – Depending on the applications, we may
need to fragment the database, replicate it, or both.
11. Handling failures is a difficult task. In some cases, it may not be easy to
distinguish whether it is site failure, network partition failure, or link
failure. Such failures impact negatively on the ability of the system to
serve user expectations.

2.7. Conclusion
A distributed database is a collection of multiple, logically interrelated databases
distributed over a computer network. It may also be a single database divided into
chunks and distributed over several locations. The database is scattered over various
locations which provide local access to data and thus reduces communication costs
and increases availability.
Most of today's business applications have shifted from traditional processing to
online processing. This has also changed the database needs of applications.
Today, the role of databases in organising voluminous data has grown compared
to previous eras. Large companies need to distribute their data for many reasons,
including economic and competitive ones.
However, the main motivation behind data distribution is the efficient
management of huge amounts of data with increased availability and reduced
communication cost. As a result, it has become a very attractive solution for
areas such as online banking, e-commerce, HR departments, the
telecommunications industry, and airline ticketing.
Complexity, security exposure and data integrity are the common associated
challenges.

2.8. Network Databases


A network database model is a database model that allows multiple records
to be linked to the same owner file. The model can be seen as an upside down
tree where the branches (bottom of the tree) are the member information
linked to the owner.
A Network Database Model is similar to the hierarchical model in the representation
of data but allows for greater flexibility in data access. In addition, relationships
in the network database model can be many-to-many: one owner file can be
linked to many member files and vice versa. These multiple linkages make the
network database model very flexible.

A network database consists of a collection of records connected to one another
through links. A record is in many respects similar to an entity in the E-R model. Each
record is a collection of fields (attributes), each of which contains only one data value.
A link is an association between precisely two records. Thus, a link can be viewed as a
restricted (binary) form of relationship in the sense of the E-R model.
As an illustration, consider a database representing a customer-account relationship in
a banking system. There are two record types, customer and account, so the
customer-account database consists of two entity sets. We can define the
customer record type using Pascal-like notation:
type customer = record
customer name: string;
customer street: string;
customer city: string;
end
The account record type can be defined as:
type account = record

account number: string;
balance: integer;
end
Account is the record type corresponding to the entity set account. It includes the two
fields account number and balance. A database corresponding to the described
schema may thus contain a number of customer records linked to a number of
account records.
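The customer–account example above can be sketched as linked records in Python. This is a simplified illustration of the owner/member set idea only; a real network DBMS (e.g. a CODASYL-style system) manages these links internally, and the sample data here is invented:

```python
# Owner record type: customer; member record type: account.
# The list of links plays the role of a network-model "set"
# connecting one owner record to its member records.
class Customer:
    def __init__(self, name, street, city):
        self.name, self.street, self.city = name, street, city
        self.accounts = []            # links to member records

class Account:
    def __init__(self, number, balance):
        self.number, self.balance = number, balance

def link(owner, member):
    """Insert a member record into the owner's set."""
    owner.accounts.append(member)

c = Customer("Hayes", "Main St", "Harare")
link(c, Account("A-102", 400))
link(c, Account("A-305", 350))

# Navigate from the owner record to its members via the links.
total = sum(a.balance for a in c.accounts)
assert total == 750
```

Because the links are explicit, access is navigational: a program follows them from record to record rather than declaring what data it wants.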
The data access in this database model is either sequential or follows a
circular linked-list pattern, and there can be multiple paths to access any
particular record.
An owner record type can also be a member or owner in
another set. The data model is a simple network, and link and
intersection record types may exist, as well as sets between
them. Thus, the complete network of relationships is
represented by several pair-wise sets; in each set some (one)
record type is owner (at the tail of the network arrow) and
one or more record types are members (at the head of the
relationship arrow). Usually, a set defines a 1:M relationship,
although 1:1 is permitted.
Thus the structure of a network database is complicated because of these many-to-
many relationships in which one record can be used as a key of the entire database.
 Network database is a collection of records connected to each other through
links
 A record similar to entity in an Entity-Relationship (E-R) model
 A record is a collection of fields, each of which contains only one data value
 Link is association between two records
 Viewed as a restricted (binary) form of relationship in the sense of an E-R model.
 Data-structure diagram is a schema representing the design of a network
database, having two components: boxes and lines
 Serves the same purpose as an E-R diagram, it specifies the logical structure of
the DB.
 Can create relationships that span more entity sets through many-to-one.
2.8.1. Advantages and Disadvantages of Network Database
Advantages
 Conceptual simplicity
 Handles more relationship types
 Data owner/member relationship promotes data integrity
 Conformance to standards
 Data access is flexible
 Includes data definition language (DDL) and data manipulation language (DML)
Disadvantages
 Structural changes require changes in all application programs
 Navigational system yields complex implementation, application development, and management
 System complexity limits efficiency
 Contains redundancy among the records, meaning one record can appear more than once

2.8.2. Qualities of a Good Database Design


 Reflects the real-world structure of the problem
 Can represent all expected data over time
 Avoids redundant storage of data items
 Provides efficient access to data
 Supports the maintenance of data integrity over time
 Clean, consistent, and easy to understand
Note: These objectives are sometimes contradictory!

2.9. Chapter 2 Questions


1. Explain the following concepts related to relational databases:
a. Attribute [2]
b. Table [2]
c. Relation [2]
d. Tuple [2]
e. Relation Schema [2]
f. Relation Instance [2]
g. Relation Key [2]
h. Relationship [2]
i. Data type [2]
j. Foreign key [2]
2. How does a database help an application such as airline
reservation systems? [4]
3. Give four (4) examples of commercial relational database
management systems (RDBMS) [4]
4. Using a student management system, draw a table to represent
the entity student. The table must have a name, three attributes
(including one unique identifier), and three different occurrences.
Put labels against the main elements / aspects of the table. [20]
5. Spell out:
a. Four (4) advantages of implementing a relational model. [8]
b. Four (4) disadvantages of implementing a relational model. [8]
6. What do the following terms mean in the context of relational
database models?
a. Cardinality [2]
b. Constraint [2]
c. CRUD [2]
d. Data Independence [2]
e. Query [2]
7. Describe the Hierarchical Database Model. [10]
8. Explain the following terms related to the hierarchical model:
a. Node [2]
b. Segment [2]
c. Root element [2]
d. Child element [2]
e. One-to-many relationship [2]
9. Cite with reasons, any two applications where a hierarchical model is ideal
for implementation. [6]
10. Why is it difficult to access data in a hierarchical model, especially if there is
need to consolidate data from three child elements belonging to three
different grandparents? [5]

11. Explain 4 advantages and 4 disadvantages of a Hierarchical database model. [16]
12. Highlight four main features of an Object-Oriented Database model. [8]

13. Represent the following objects’ relationships using OODBMS notation:


a. Student
b. Lecturer
c. Course
d. Class [10]
14. Define the term Interface in relation to Object Oriented Database Model. [3]
15. Describe one aspect of an OODBMS that differentiates it from an RDBMS. [3]
16. Explain the following concepts associated with OODBMSes:
a. Inheritance [2]
b. Polymorphism [2]
c. Association [2]
d. Class Hierarchy [2]
e. Methods [2]
f. Message Passing [2]
g. Impedance Mismatch [2]
17. Explain any three (3) advantages and any three (3) disadvantages of
OODBMS driven database systems. [12]
18. Explain any two (2) challenges that may arise in porting OODBMSes
to different platforms. [6]
19. Explain why object-oriented databases are called “Navigational databases”? [2]
20. Draw a diagram of a distributed database system. [10]
21. What benefits can be gained by adopting a distributed database for
organisational operations? [8]
22. How does “scattering of data” in some distributed systems impact
performance of applications dependent on the scattered data? [6]
23. Explain the following terms related to distributed database systems:
a. Replication [2]
b. Heterogeneous Database [2]
c. Horizontal Fragmentation [2]
d. Growth Resilience [2]
e. Parallelism [2]
f. Processing Overhead [2]
g. Security Vulnerability [2]
24. Explain four (4) functions of a DDBMS.
25. Is a DDBMS a core type of a DBMS, like RDBMS? Support your answer. [5]
26. Why is it difficult to handle failures in a Distributed Database system? [6]
27. How does a bank with branches spread across the country benefit from
using a Distributed Database System? [6]
28. What makes a Network Database Model different and similar to a
Hierarchical Database Model? [4]
29. Draw a diagram of a Network Database Model containing three (3) related
entities, with one of the entities being the independent entity and the
other two being dependent. [8]

30. Outline two (2) advantages and two (2) disadvantages of a network database. [8]
31. Explain five (5) characteristics of good database design. [10]

CHAPTER 3

RELATIONAL DATABASES

3. RELATIONAL DATABASE

3.1. Objectives
1. Design a relational database
1.1. Identify Entities
1.2. Map Relationships
1.3. Draw Entity Relationship (E-R) Diagrams
1.4. Normalise tables (1st Normal Form to BCNF)
1.5. Create Database Schema (Table Design)
1.6. Write and interpret Queries (SQL Query Strings, DDL & DML)

3.2. Introduction
A relational database organises data into two-dimensional tables (or relations),
each made up of rows and columns. A table typically represents one entity type.
Tables can be linked together to generate consolidated results. Relational
databases are common in most business organisations because of their general
simplicity and versatility.
A row is also called a record (or tuple); a column is also called a field (or
attribute). A database table is similar to a spreadsheet. However, the
relationships that can be created among the tables enable a relational database to
efficiently store huge amounts of data, and effectively retrieve selected data.
A language called SQL (Structured Query Language) was developed to work with
relational databases.
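As a minimal illustration of these ideas, the sketch below uses Python's built-in sqlite3 module to create a table, insert rows (records), and retrieve a selected column (field). The student table and its columns are invented for the example:

```python
import sqlite3

conn = sqlite3.connect(":memory:")   # throwaway in-memory database
cur = conn.cursor()

# Each column is a field (attribute); each inserted row is a record (tuple).
cur.execute("""CREATE TABLE student (
    student_id INTEGER PRIMARY KEY,
    name       TEXT NOT NULL,
    course     TEXT
)""")
cur.executemany(
    "INSERT INTO student VALUES (?, ?, ?)",
    [(1, "Melusi", "Software Engineering"),
     (2, "Tariro", "Networking")],
)

# SQL retrieves selected data by naming columns and filtering rows.
result = cur.execute(
    "SELECT name FROM student WHERE course = ?",
    ("Software Engineering",)).fetchall()
assert result == [("Melusi",)]
conn.close()
```

The SELECT statement is declarative: it names the desired fields and a condition, and the DBMS decides how to fetch the matching records.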

3.3. Identification of Entities


An entity is a person, place, thing, event, or concept of interest to the business or
organization about which data is likely to be kept. For example, in a school possible
entities might be Student, Instructor / Lecturer, Course and Class.
3.3.1. Entities

An entity is a business object and can be either tangible (such as a person or an
item) or intangible (such as an event or a reservation). Every entity in a database
must have a different name. It is common practice (but not required) to name
entities in the singular.
 Entity type refers to a generic class of things, such as Company, Student, or
Teacher; Entity is the short form of entity type.
 Entity occurrence refers to specific instances or examples of a type. For
example, one occurrence of the entity Car is Honda CRV.
An entity usually has attributes (i.e., data elements) that further describe it. Each
attribute is a characteristic of the entity. An entity must possess a set of one or
more attributes that uniquely identify it (called a primary key).
The entities on an Entity-Relationship Diagram are represented by boxes (i.e.,
rectangles). The name of the entity is placed inside the box.
An entity–relationship model (or ER model) describes interrelated things of
interest in a specific domain of knowledge. A basic ER model is composed of entity

types (which classify the things of interest) and specifies relationships that can
exist between entities (instances of those entity types).
Identifying entities is the first step in Data Modelling. Start by gathering existing
information about the organization.
Use documentation that describes the information and functions of the subject
area being analyzed, and interview subject matter specialists (i.e., end-users).
Derive the preliminary entity-relationship diagram from the information gathered
by identifying objects (i.e., entities) for which information is kept.
Entities are easy to find. Look for the people, places, things, organizations,
concepts, and events that an organization needs to capture, store, or retrieve
information about.
Name each entity using a noun in the singular form (e.g., Employee not
Employees). Use a word that is precise and clearly identifies the object. When
appropriate and needed to distinguish similar entities, use an adjective to further
describe the noun (e.g., Permanent Employee, Temporary Employee).
Use a term that is familiar to the business or is commonly used in everyday
language. The entity name is representative of the characteristics or attributes of
the entity. Take this into account when naming entities. For example, use a term
such as Inventory Item rather than Item.
Step 1: The first step in the logical design stage of the (DBLC) database life cycle is
to create a conceptual model. This involves converting business objects (and their
characteristics) identified during requirements analysis into the language
of entities and attributes for use in an ER diagram.
Step 2: Entities become tables in a database. Special types of entities, discussed in
a later module, are sometimes created to represent the relationship between
other entities.

Example: a database contains a PERSON entity, and the corresponding table
consists of the attributes of PERSON.
3.3.2. Attributes

Just as business objects have characteristics that describe them, entities are
described by their attributes. When we represent an entity in a database, what we
actually store are that entity’s attributes. In a nutshell, attributes store data
values that either 1) describe or 2) identify entities.
Attributes become fields in a table.
Attributes that describe a person (for instance, customer, employee, student, etc.)
would include such things as name, address, and telephone number. Attributes
that identify a person would include such things as social security number or any
combination of letters and numbers that uniquely identify a person.
Attributes that describe entities are called non-key attributes.
Attributes that identify entities (entity identifiers) are called key attributes.
Each entity is completely characterized by the values of all its attributes.
 Similar entities can be combined into entity types.
 Similarity requires at least identical attribute structure. (Attribute names
and corresponding value domains are identical.)
 Entity types are graphically represented by rectangles. Attributes label the
line connecting an entity type and a value domain (often symbolized by an
oval)
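The distinction between key and non-key attributes can be sketched in SQL (via sqlite3). The EMPLOYEE attributes used here are hypothetical examples, not part of any particular system:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# ssn is the key attribute (it identifies the entity occurrence);
# name and phone are non-key attributes (they merely describe it).
conn.execute("""CREATE TABLE employee (
    ssn   TEXT PRIMARY KEY,   -- key attribute: unique identifier
    name  TEXT,               -- non-key attribute
    phone TEXT                -- non-key attribute
)""")
conn.execute("INSERT INTO employee VALUES ('63-123456', 'Chipo', '0771')")

# The PRIMARY KEY constraint rejects a duplicate identifier,
# enforcing that the key attribute uniquely identifies each entity.
try:
    conn.execute("INSERT INTO employee VALUES ('63-123456', 'Tendai', '0772')")
    duplicate_allowed = True
except sqlite3.IntegrityError:
    duplicate_allowed = False
assert duplicate_allowed is False
conn.close()
```

Two people may share a name or phone number, but never an ssn; that is what makes it the entity identifier.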

3.3.3. Categories of Entities

There are two general categories of entities:


 Physical entities are tangible and easily understood. They generally fall into
one of the following categories:
o People, for example, doctor, patient, employee, customer,
o Property, for example, equipment, land and buildings, furniture and
fixtures, supplies,
o Products, such as goods and services.
 Conceptual entities are not tangible and are less easily understood. They are
often defined in terms of other entity-types. They generally fall into one of the
following categories:
o Organizations, for example, corporation, church, government,
o Agreements, for example, lease, warranty, mortgage,
o Abstractions, such as strategy and blueprint.
o Event/State entities are typically incidents that happen. They are very
abstract and are often modelled in terms of other entity-types as
an associative entity. Examples of events are purchase, negotiation,
service call, and deposit. Examples of states are ownership,
enrolment, and employment.

3.3.4. Types of Entities

Different types of entities are required to provide a complete and accurate
representation of an organization's data and to enable the analyst to use the
Entity-Relationship Diagram as a starting point for physical database design. Types
of entities include:
1. Fundamental where the entity is a base entity that depends on no other
for its existence. A fundamental entity has a primary key that is
independent of any other entity and is typically composed of a single
attribute. Fundamental entities are real-world, tangible objects, such as,
Employee, Customer, or Product. They are also called kernels.
2. Attributive where the entity depends on another for its existence, for
example, Employee Hobby depends on Employee. An attributive entity
depends on another entity for parts of its primary key. It can result from
breaking out a repeating group, the first rule of normalization, or from an
optional attribute.
3. Associative where the entity describes a connection between two
entities with an otherwise many-to-many relationship, for example,
assignment of Employee to Project (an Employee can be assigned to more
than one Project and a Project can be assigned to more than one
Employee). If information exists about the relationship, this information is
kept in an associative entity. For example, the number of hours the
Employee worked on a particular Project is an attribute of the relationship
between Employee and Project, not of either Employee or Project. An
associative entity is uniquely identified by concatenating the primary keys
of the two entities it connects.
4. Subtype/Supertype where one entity (the subtype) inherits the
attributes of another entity (the supertype).
A supertype entity is used to represent two or more entities when they
are viewed as the same entity by some other entities.
A subtype entity is a special case or refined version of another entity.
Subtype entities are created when attributes or relationships apply only
to some occurrences of an entity; the subsets of occurrences to which
those attributes or relationships apply are separated into entity subtypes.
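An associative entity such as the Employee–Project assignment described above can be sketched as a table whose primary key concatenates the keys of the two entities it connects. The table and column names below are illustrative, not prescribed:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employee (emp_no  INTEGER PRIMARY KEY, name  TEXT);
CREATE TABLE project  (proj_no INTEGER PRIMARY KEY, title TEXT);

-- Associative entity: resolves the many-to-many relationship and
-- holds hours_worked, an attribute of the relationship itself
-- (it belongs to neither Employee nor Project alone).
CREATE TABLE assignment (
    emp_no       INTEGER REFERENCES employee(emp_no),
    proj_no      INTEGER REFERENCES project(proj_no),
    hours_worked INTEGER,
    PRIMARY KEY (emp_no, proj_no)     -- concatenated key
);

INSERT INTO employee VALUES (1, 'Nyasha'), (2, 'Farai');
INSERT INTO project  VALUES (10, 'Payroll'), (20, 'Intranet');
INSERT INTO assignment VALUES (1, 10, 12), (1, 20, 5), (2, 10, 8);
""")
# One employee on many projects, one project with many employees.
rows = conn.execute(
    "SELECT COUNT(*) FROM assignment WHERE emp_no = 1").fetchone()
assert rows == (2,)
conn.close()
```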

3.4. Mapping Relationships
The steps in mapping entities are as follows:
1. Identify Entities: The entities in this system are Department, Employee,
Supervisor, and Project. One is tempted to make Company an entity, but it is a
false entity because it has only one instance in this problem. True entities must
have more than one instance.
2. Find Relationships: We construct the following Entity Relationship Matrix:

               Department    Employee       Supervisor    Project
   Department                is assigned    run by
   Employee    belongs to                                 works on
   Supervisor  runs
   Project                   uses

3. Draw Rough ERD: We connect the entities wherever a relationship is shown in
the Entity Relationship Matrix.
4. Fill in Cardinality: From the description of the problem we see that:


 Each department has exactly one supervisor.
 A supervisor is in charge of one and only one department.
 Each department is assigned at least one employee.
 Each employee works for at least one department.
 Each project has at least one employee working on it.
 An employee is assigned to 0 or more projects.

5. Define Primary Keys: The primary keys are Department Name, Supervisor
Number, Employee Number, Project Number.

6. Draw Key-Based ERD: There are two many-to-many relationships in the rough
ERD above, between Department and Employee and between Employee and
Project. Thus we need the associative entities Department-Employee and
Employee-Project. The primary key for Department-Employee is the
concatenated key Department Name and Employee Number. The primary key
for Employee-Project is the concatenated key Employee Number and Project
Number.
7. Identify Attributes: The only attributes indicated are the names of the
departments, projects, supervisors and employees, as well as the supervisor and
employee NUMBER and a unique project number.
8. Map Attributes

   Attribute            Entity        Attribute            Entity
   Department Name      Department    Supervisor Number    Supervisor
   Employee Number      Employee      Supervisor Name      Supervisor
   Employee Name        Employee      Project Name         Project
   Project Number       Project

9. Draw Fully Attributed ERD

10. Check Results


The final ERD appears to model the data in this system well.
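The fully attributed model above could be realised as tables along these lines. This is a sketch only: the data types and exact table and column spellings are assumptions, but the keys follow steps 5 and 6, including the two associative entities with concatenated primary keys:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE supervisor (supervisor_number INTEGER PRIMARY KEY,
                         supervisor_name   TEXT);
-- UNIQUE captures the one-to-one Department/Supervisor cardinality.
CREATE TABLE department (department_name   TEXT PRIMARY KEY,
                         supervisor_number INTEGER UNIQUE
                             REFERENCES supervisor(supervisor_number));
CREATE TABLE employee   (employee_number   INTEGER PRIMARY KEY,
                         employee_name     TEXT);
CREATE TABLE project    (project_number    INTEGER PRIMARY KEY,
                         project_name      TEXT);

-- Associative entities with concatenated primary keys resolve the
-- two many-to-many relationships identified in step 6.
CREATE TABLE department_employee (
    department_name TEXT    REFERENCES department(department_name),
    employee_number INTEGER REFERENCES employee(employee_number),
    PRIMARY KEY (department_name, employee_number));
CREATE TABLE employee_project (
    employee_number INTEGER REFERENCES employee(employee_number),
    project_number  INTEGER REFERENCES project(project_number),
    PRIMARY KEY (employee_number, project_number));
""")
tables = {r[0] for r in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table'")}
assert {"department_employee", "employee_project"} <= tables
conn.close()
```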

3.4.1. Mapping between Views

A conceptual schema or conceptual data model is a map of concepts and their
relationships used for databases.
The DBMS is responsible for mapping between the three types of schema. Two
mappings are required in a database system with three different views.
3.4.2. External/Conceptual Mapping

Each external schema is related to the conceptual schema by the
external/conceptual mapping. This mapping gives the correspondence among the
records and relationships of the external and conceptual views. The external view
is an abstraction of the conceptual view, which in turn is an abstraction of the
internal view. It describes the contents of the database as perceived by the user
or application program of that view. The user of the external view sees and
manipulates records corresponding to the external view. There is a mapping from
a particular logical record in the external view to one (or more) conceptual
record(s) in the conceptual view.
3.4.3. Differences between External/Conceptual Views

The following differences can exist between the two views:
Names of fields and records, for instance, may be different. A number of
conceptual fields can be combined into a single external field; for example,
Last_Name and First_Name at the conceptual level may appear as Name at the
external level. A given external record could also be derived from a number of
conceptual records.
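The Last_Name/First_Name example can be sketched as an SQL view, which is one common way a DBMS exposes an external schema over the conceptual schema. The person table and view names here are invented for the illustration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Conceptual level: separate first_name and last_name fields.
conn.execute("""CREATE TABLE person (
    person_id  INTEGER PRIMARY KEY,
    first_name TEXT,
    last_name  TEXT
)""")
conn.execute("INSERT INTO person VALUES (1, 'Melusi', 'Demadema')")

# External level: a view presenting a single combined name field.
conn.execute("""CREATE VIEW person_external AS
    SELECT person_id, first_name || ' ' || last_name AS name
    FROM person""")

# The user of the external view queries 'name' without knowing that
# it is derived from two conceptual fields.
row = conn.execute(
    "SELECT name FROM person_external WHERE person_id = 1").fetchone()
assert row == ("Melusi Demadema",)
conn.close()
```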
3.4.4. Conceptual/Internal Mapping

The conceptual schema is related to the internal schema by the
conceptual/internal mapping. This enables the DBMS to find the actual record, or
combination of records, in physical storage that constitutes a logical record in the
conceptual schema. The mapping between the conceptual and internal levels
specifies the method of deriving the conceptual records from the physical
database.

3.5. Drawing Entity Relationship (E-R) Diagrams
3.5.1. ER Notation

There is no standard for representing data objects in ER diagrams. Each modelling
methodology uses its own notation. All notational styles represent entities as
rectangular boxes and relationships as lines connecting boxes. Each style uses a
special set of symbols to represent the cardinality of a connection. The notation
used in this document is from Martin. The symbols used for the basic ER
constructs are:
 Entities (objects) are represented by labelled rectangles. The label is the
name of the entity. Entity names should be singular nouns.
 Relationships are represented by a solid line connecting two entities. The
name of the relationship is written above the line. Relationship names
should be verbs. Four types of table relationships that can be derived are
as follows:
o One-to-one - One record in a table is related to only one record in
another table.
o One-to-many - One record in a table can be related to many
records in another table.
o Many-to-many - One record in a table can be related to one or
more records in another table, and one or more records in the
second table can be related to one or more records in the first
table.
o Many-to-one - Many records in a table can be related to one
record in another table.
o Often not mentioned explicitly, but important and basic:
 Values: printable symbols as values of attributes; play a
subordinate role (characterizing objects)
 Roles: Names for the special meaning an entity has within a
relationship
 Attributes, when included, are listed inside the entity rectangle. Attributes
which are identifiers are underlined. Attribute names should be singular
nouns.
 Cardinality of many is represented by a line ending in a crow's foot. If the
crow's foot is omitted, the cardinality is one.
 Existence is represented by placing a circle / perpendicular bar on the line.
o Mandatory existence is shown by the bar (| looks like a 1) next to the
entity for which an instance is required. Optional existence is shown by
placing a circle next to the entity that is optional.
Examples of these symbols are shown in Figure 3.1 below:

Figure 3.1 ER Notation

Entity Relationship Diagram Symbols — Chen notation

Symbol Name – Description

Entities

Entity – An entity is represented by a rectangle which contains the entity's name.

Weak Entity – An entity that cannot be uniquely identified by its attributes alone.
The existence of a weak entity is dependent upon another entity called the owner
entity. The weak entity's identifier is a combination of the identifier of the owner
entity and the partial key of the weak entity.

Associative Entity – An entity used in a many-to-many relationship (it represents
an extra table). All relationships for the associative entity should be many.

Attributes

Attribute – In the Chen notation, each attribute is represented by an oval
containing the attribute's name.

Key attribute – An attribute that uniquely identifies a particular entity. The name
of a key attribute is underscored.

Multivalued attribute – An attribute that can have many values (there are many
distinct values entered for it in the same column of the table). A multivalued
attribute is depicted by a dual oval.

Derived attribute – An attribute whose value is calculated (derived) from other
attributes. The derived attribute may or may not be physically stored in the
database. In the Chen notation, this attribute is represented by a dashed oval.

Relationships

Strong relationship – A relationship where the entity is existence-independent of
other entities, and the PK of the child does not contain a PK component of the
parent entity. A strong relationship is represented by a single rhombus.

Weak (identifying) relationship – A relationship where the child entity is
existence-dependent on the parent, and the PK of the child entity contains a PK
component of the parent entity. This relationship is represented by a double
rhombus.



3.5.2. E-R Modelling Process

1. Identify the entities that your database must represent
2. Determine the cardinality relationships among the entities and classify
them as one of:
o One-to-one (e.g., a parcel has one address)
o One-to-many (e.g., a parcel may be involved in many files)
o Many-to-many (e.g., parcel sales: a parcel may be sold many times by
different owners, and an individual owner may sell many parcels)
o Many-to-one (e.g., many parcels may be sent to one person)
3. Draw the entity-relationship diagram
4. Determine the attributes of each entity
5. Define the (unique) primary key of each entity
6. Define the relationships between primary keys in one table and foreign
keys in another
3.5.3. E-R Model Symbols and their Meanings

Entity Relationship Diagram Symbols — Crow’s Foot notation

Relationships (Cardinality and Modality)
 Zero or One
 One or More
 One and only One
 Zero or More

Many-to-One
 a one through many notation on one side of a relationship and
a one and only one on the other
 a zero through many notation on one side of a relationship and
a one and only one on the other
 a one through many notation on one side of a relationship and
a zero or one notation on the other
 a zero through many notation on one side of a relationship and
a zero or one notation on the other

Many-to-Many
 a zero through many on both sides of a relationship
 a zero through many on one side and a one through many on
the other
 a one through many on both sides of a relationship

One-to-One
 a one and only one notation on one side of a relationship and a
zero or one on the other
 a one and only one notation on both sides


3.5.4. From E-R Model to Database Design

 Entities with one-to-one relationships can (and often should) be merged
into a single entity
 Each remaining entity is modelled by a table with a primary key and
attributes, some of which may be foreign keys
 One-to-many relationships are modelled by a foreign key attribute in the
table representing the entity on the "many" side of the relationship (e.g.,
the FIRES table has a foreign key that refers to the PARCELS table)
 Many-to-many relationships among two entities are modelled by defining
a third table that has foreign keys that refer to the entities in each original
table. These foreign keys should be included in the third table's primary
key, if appropriate
 Commercially available tools can automate the process of converting an
E-R model to a database schema.
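The mapping rules above can be sketched with SQL DDL. Below is a minimal, illustrative sketch in Python using SQLite, building on the PARCELS / FIRES illustration mentioned earlier; the owners table and all column names are assumptions added for the example, not part of the original model:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

cur.executescript("""
-- Each entity becomes a table with a primary key.
CREATE TABLE parcels (
    parcel_id INTEGER PRIMARY KEY,
    address   TEXT NOT NULL
);

-- One-to-many: the "many" side (fires) carries the foreign key
-- that refers back to parcels.
CREATE TABLE fires (
    fire_id   INTEGER PRIMARY KEY,
    fire_date TEXT,
    parcel_id INTEGER NOT NULL REFERENCES parcels(parcel_id)
);

CREATE TABLE owners (
    owner_id INTEGER PRIMARY KEY,
    name     TEXT NOT NULL
);

-- Many-to-many (parcel sales): a third table whose foreign keys refer
-- to both original tables and are included in its primary key.
CREATE TABLE parcel_sales (
    parcel_id INTEGER REFERENCES parcels(parcel_id),
    owner_id  INTEGER REFERENCES owners(owner_id),
    sale_date TEXT,
    PRIMARY KEY (parcel_id, owner_id, sale_date)
);
""")
conn.commit()
```

The exact DDL syntax varies between database products; the structure (one table per entity, foreign key on the "many" side, a link table for many-to-many) is the part that carries over.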



3.6. The Entity Relationship Model — Basic Structure
 Entity relationship diagram (ERD)
o Uses graphic representations to model database components
o Entity is mapped to a relational table
 Entity instance (or occurrence) is row in table
 Entity set is collection of like entities
 Connectivity labels types of relationships
o Diamond connected to related entities through a relationship line
Relationships: The Basic Chen ERD
Relationships: The Basic Crow’s Foot ERD



3.6.1. Comparisons of ER Notations



3.6.2. Further Comparisons of ER Notations

3.6.3. Advantages and Disadvantages of the Entity Relationship Model

Advantages Disadvantages
 Exceptional conceptual simplicity  Limited constraint representation
 Visual representation  Limited relationship representation
 Effective communication tool  No data manipulation language
 Integrated with the relational data model  Loss of information content

3.6.4. Best Practices for creating a Relational Model

1. Data needs to be represented as a collection of relations
2. Each relation should be depicted clearly in the table
3. Rows should contain data about instances of an entity
4. No two rows can be identical
5. The values of an attribute should be from the same domain
6. Columns must contain data about attributes of the entity
7. Cells of the table should hold a single value
8. Each column should be given a unique name



3.7. Normalisation (1st Normal Form to BCNF)
3.7.1. Normalization

Most databases are divided into many tables, most of which are related to one
another. In most modern databases, such as the relational database, relationships
are established through the use of primary and foreign keys. The purpose of
separating data into tables and establishing table relationships is to reduce data
redundancy. The process of reducing data redundancy in a relational database is
called normalization.
Normalization provides a set of rules and patterns that can be applied to any
database to avoid common logical inconsistencies. Normalizing a database design
will typically improve:
 Consistency, since errors that could be made with the database would be
structurally impossible
 Extensibility, since changes to the database structure will only affect parts
of the database they are logically dependent on
 Efficiency, since redundant information will not need to be stored
In other words, the non-key columns are dependent on the primary key, only on the
primary key and nothing else. For example, suppose that we have
a Products table with columns productID (primary key), name, and
unitPrice. The column discountRate should not belong to the Products table if it
is also dependent on the unitPrice, which is not part of the primary key.
3.7.2. Purpose and Utilization

Normalization helps in achieving resource optimization through improvements in
database design.
1. Database normalization is a process by which an existing schema is
modified to bring its component tables into compliance through a series
of progressive normal forms. It is a technique of organizing the data in the
database.
2. Formal process of decomposing relations with anomalies to produce
smaller, well structured, and stable relations. Primarily a tool to validate
and improve a logical design so that it satisfies certain constraints that
avoid unnecessary duplication of data
3. Normalization is a systematic approach of decomposing tables to
eliminate data redundancy (repetition) and undesirable characteristics like
Insertion, Update, and Deletion Anomalies. It is a multi-step process that
puts data into tabular form, removing duplicated data from the relation /
tables.
Normalization is used mainly for three purposes:
1. Eliminating redundant (useless) data.
2. Ensuring data dependencies make sense, i.e., data is logically stored.
3. Ensuring relations / tables are independent from each other.



3.7.3. Forms (Degrees) of Normalization

3.7.3.1. First Normal Form (1NF)

For a database to be in first normal form, every value of every column of
every table should be atomic.
What does atomic mean? Loosely speaking, atomic means that the value
represents a “single thing”.
For example, if you have a table like this:
first_name last_name age areas
John Doe 27 {“Website design”, “Customer research”}
Mary Jane 33 {“Long term strategy planning”,”Hiring”}
Tom Smith 35 {“Marketing”}

Then the “areas” column has values that aren’t atomic. Just look at John’s
row to see that the areas field stores two things: “Website design” and
“Customer research”.
So this table is not in first normal form.
To be in first normal form you should store a single value per field.
In summary, for a table to be in the First Normal Form (1NF), it should follow the
following 4 rules:
1. It should only have single (atomic) valued attributes / columns.
2. Values stored in a column should be of the same domain.
3. All the columns in a table should have unique names.
4. And the order in which data is stored, does not matter.
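Bringing the example table above into 1NF means emitting one row per atomic value of the areas column. A small illustrative sketch in Python (the variable names are invented for the example):

```python
# Non-atomic: the "areas" field stores a set of values per person.
unnormalised = [
    ("John", "Doe", 27, ["Website design", "Customer research"]),
    ("Mary", "Jane", 33, ["Long term strategy planning", "Hiring"]),
    ("Tom",  "Smith", 35, ["Marketing"]),
]

# 1NF: one atomic value per field -- one row per (person, area) pair.
normalised = [
    (first, last, age, area)
    for (first, last, age, areas) in unnormalised
    for area in areas
]

for row in normalised:
    print(row)
```

John now appears in two rows, one per area, and every cell holds a single value.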

3.7.3.2. Second Normal Form (2NF)

For a table to be in second normal form, every column that is not part
of the primary key (or could act as part of another primary key) shouldn’t
be able to be inferred from a smaller part of the primary key.
What does this mean?
Say you have the following design (I have underlined the fields that
make up the primary key in this table):

employee_id project_id Hours employee_name project_name
1 1 10 John “website design”
2 1 20 Mary “website design”

In this design employee_name can be directly inferred from employee_id,
because the idea is that the name of an employee is uniquely defined by
its id.
Similarly, the project_name is uniquely defined by the project_id.
So we have two examples of columns where the columns can be inferred
from only part of the primary key.
Each one of these examples would be enough to throw this table out of
second normal form.



Another takeaway is that if a table is in first normal form and all the
primary keys are single columns, then the table is already in second
normal form.
In summary, for a table to be in the Second Normal Form,
1. It should be in the First Normal form.
2. And, it should not have Partial Dependency.
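The 2NF fix for the design above is to move the partially dependent columns into their own tables. A sketch of that decomposition in Python (the variable names are illustrative):

```python
# Rows keyed by the composite primary key (employee_id, project_id);
# employee_name and project_name each depend on only part of that key.
assignments = [
    (1, 1, 10, "John", "website design"),
    (2, 1, 20, "Mary", "website design"),
]

# 2NF decomposition: the partially dependent columns get their own tables,
# while the hours stay with the full composite key.
employees = {emp_id: name for (emp_id, _, _, name, _) in assignments}
projects  = {proj_id: pname for (_, proj_id, _, _, pname) in assignments}
hours     = {(emp_id, proj_id): h for (emp_id, proj_id, h, _, _) in assignments}
```

The project name “website design” is now stored once, instead of once per assignment.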

3.7.3.3. Third Normal Form (3NF)

For a table to be in third normal form, it must be in second normal
form and there should not be a non-prime attribute* that depends
transitively on the primary key.
*a prime attribute is an attribute / column that is part of a group of attributes /
columns that can act as a primary key

What does that mean?
Say you have the following design (which is far from ideal):

employee_name employee_id age department_number department_name
John 1 27 123 “Marketing”
Mary 2 33 456 “Operations”
Tom 3 35 123 “Marketing”

In this table the department_number can be inferred from the employee_id,
but the department_name can be inferred from the department_number!
Then department_name depends transitively on employee_id!
That is, there exists a transitive dependency employee_id ->
department_number -> department_name, which means this table is not in
third normal form.
What problems does this bring?
Well, in some way if the department name can be inferred from the
department number, storing it each time for every employee introduces
redundancy.
Imagine now that the marketing department is changing its name to
“Marketing & Sales”. To keep consistency, you should update that in every
row of the table for each employee that belongs to that department!
Also, think what happens if Mary decides to leave the company: we should
delete her row from the table, but if she was our only employee belonging
to the “operations” department the department will also get deleted!

In summary, a table is said to be in the Third Normal Form when:
1. It is in the Second Normal form.
2. And, it doesn't have any Transitive Functional Dependency of a
non-prime attribute. An attribute that is not part of any candidate key is
known as a non-prime attribute. In other words, 3NF can be explained like
this: a table is in 3NF if it is in 2NF and for each functional dependency
X -> Y at least one of the following conditions holds:
a. X is a super key of the table
b. Y is a prime attribute of the table
An attribute that is part of one of the candidate keys is known as a prime
attribute.
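The 3NF decomposition for the employee / department example above can be sketched in the same style (illustrative names):

```python
# department_name depends on department_number, which depends on
# employee_id: a transitive dependency.
staff = [
    ("John", 1, 27, 123, "Marketing"),
    ("Mary", 2, 33, 456, "Operations"),
    ("Tom",  3, 35, 123, "Marketing"),
]

# 3NF decomposition: department_name now lives in one place, so renaming
# "Marketing" means updating a single row, and deleting Mary's row no
# longer deletes the "Operations" department.
employees   = [(emp_id, name, age, dept_no)
               for (name, emp_id, age, dept_no, _) in staff]
departments = {dept_no: dept_name
               for (_, _, _, dept_no, dept_name) in staff}
```

The employees table keeps only the department_number as a reference into the departments table.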



3.7.3.4. Boyce and Codd Normal Form (BCNF)

Boyce and Codd Normal Form is a higher version of the Third Normal
Form. This form deals with a certain type of anomaly that is not handled by
3NF. A 3NF table which does not have multiple overlapping candidate keys
is said to be in BCNF. For a table R to be in BCNF, the following conditions
must be satisfied:
1. R must be in Third Normal Form
2. And, for each functional dependency (X → Y), X should be a super
key.

3.7.4. Advantages and Disadvantages of Normalisation

Advantages of Normalisation
 Avoids data modification (INSERT/DELETE/UPDATE) anomalies, as each
data item lives in one place
 Greater flexibility in getting the expected data at an atomic, granular level
 Normalization is conceptually cleaner and easier to maintain and change
as your needs change
 Fewer null values and less opportunity for inconsistency
 A better handle on database security
 Increased storage efficiency
 The normalization process helps maximize the use of clustered indexes,
which is the most powerful and useful type of index available. As more
data is separated into multiple tables because of normalization, the more
clustered indexes become available to help speed up data access
Disadvantages of Normalisation
 More tables to join: by spreading data out into more tables, the need to
join tables increases and the task becomes more tedious. The database
also becomes harder to realize.
 Tables will contain codes rather than real data as the repeated data will
be stored as lines of codes rather than the true data. Therefore, there is
always a need to go to the lookup table.
 Data model becomes extremely difficult to query against as the data
model is optimized for applications, not for ad hoc querying. (Ad hoc
query is a query that cannot be determined before the issuance of the
query. It consists of an SQL that is constructed dynamically and is usually
constructed by desktop friendly query tools.). Hence it is hard to model
the database without knowing what the customer desires.
 As the normal form type progresses, the performance becomes slower
and slower. Requires much more CPU, memory, and I/O to process thus
normalized data gives reduced database performance
 Requires more joins to get the desired result. A poorly-written query can
bring the database down
 Maintenance overhead. The higher the level of normalization, the greater
the number of tables in the database.
 Proper knowledge is required on the various normal forms to execute the
normalization process efficiently. Careless use may lead to terrible design
filled with major anomalies and data inconsistency.



3.7.5. Normalization Example

The following example will illustrate how database normalization helps achieve a good design.
The table below presents data that needs to be captured in the database:
Title: Beginning MySQL Database Design and Optimization
Author: Chad Russell, Jon Stephens
Bio: Chad Russell is a programmer and system administrator who owns his own
internet hosting company. Jon Stephens is a member of the MySQL AB
documentation team.
ISBN: 1590593324
Subject: MySQL, Database Design
Pages: 520
Publisher: Apress

In the example shown above, a lot of storage space will be wasted if any one criterion (author or publisher) is considered as the identification
key, therefore database normalization is essential.
Normalization is a step-by-step process that cannot be carried out haphazardly. The following steps will help in attaining database
normalization.
3.7.5.1. Step 1: Create first normal form (1NF)
The database normalization process involves getting data to conform to progressive normal forms, and a higher level of database
normalization cannot be achieved unless the previous levels have been satisfied. First normal form is the basic level of database
normalization.
For 1NF, ensure that the values in each column of a table are atomic, i.e., each cell holds a single value, with no sets of values. In our
case, Author and Subject do not comply.
One method for bringing a table into 1NF is to separate the entities contained in the table into separate tables. In our case, this would
result in Book, Author, Subject, and Publisher tables.
Book’s table:
ISBN Title Pages
1590593324 Beginning MySQL Database Design and Optimization 520

Author’s table:
Author_ID First Name Last Name
1 Chad Russell
2 Jon Stephens
3 Mike Hilyer

Subject’s table:
Subject_ID Subject_name
1 MySQL
2 Database Design

Publisher’s table:
Publisher_ID Name Address City State Zip
1 Apress 2580, Ninth Street, Station 219 Berkeley California 94710



Step 2: Define relationships
Three types of relations can be established:
1. One-to-(Zero or)-one (Example: marriage)
2. One-to-(Zero or)-many (Example: kids)
3. Many-to-many (Example: book)
The Book’s table may have many to many relations with the Author’s table. Author’s table may have many books and a book may have
more than one author. The Book’s table may have many to many relations with the Subject table. The books may fit in many subjects
and the subjects may have many books. Many-to-many relations have to be represented by “link” tables.

Book_Author table:
ISBN Author_ID
1590593324 1
1590593324 2

Book_Subject table:
ISBN Subject_ID
1590593324 1
1590593324 2

One-to-many in our example will be Books to Publisher. Each book has only one Publisher but one Publisher may have many books.
We can achieve one-to-many relationships with a foreign key. A foreign key is a mechanism in database management systems (DBMS)
that defines relations and creates constraints between data segments. It is not possible to review what is not related to the specific
book. It is not possible to have a book without an author or publisher.
When deleting a publisher, all the related books may need to be deleted along with the reviews of those books. The authors would not
need to be deleted.
The foreign key is introduced in the table that represents the “many” side, pointing to the primary key of the “one” table. Since the Book
table represents the many portion of the one-to-many relationship, the primary key of the Publisher is added to it as a foreign key, in a
Publisher_ID column.

ISBN Title Pages Publisher_ID
1590593324 Beginning MySQL Database Design and Optimization 520 1



3.7.5.2. Step 3: Make second normal form (2NF)
Second normal form (2NF) cuts down the redundant / superfluous data in a table by extracting it, putting it in new tables and by
establishing relations among them.
In database normalization, 2NF is about the relations between the composite key columns and non-key columns. That means the non-
key columns have to depend on the whole composite key.
Here, the primary key is composite to eliminate the possibility of the same person writing more than one review of the book. Reviewer
URL is dependent only on the Reviewer ID which is only part of the whole composite key.
This table does not comply with the 2NF:
ISBN Reviewer ID Summary Reviewer_URL
1590593324 3 A great book! http://www.openwin.org

3.7.5.3. Step 4: Third Normal Form (3NF)

This requires that all columns depend directly on the primary key. Tables violate the 3NF when one column depends on another
column which, in turn, depends on the primary key – a transitive dependency.
In the publisher table, the City and State are dependent on the zip code, not the Publisher_ID.

Publisher_ID Name Address City State Zip
1 Apress 2580, Ninth Street, Station 219 Berkeley California 94710

To comply with 3NF we have to move these outside the publisher’s table:

Zip City State
94710 Berkeley California

Through the process of database normalization, we bring our schema’s tables into conformance with progressive normal forms. As a
result, the tables each represent a single entity – a book, an author or a subject, for example – and we benefit from decreased
redundancy, fewer anomalies, and improved efficiency.



3.7.5.4. Boyce-Codd Normal Form (BCNF) Example

Even when a database is in Third Normal Form, anomalies may still result if it has more than one candidate key. For a
relation schema which is in 3NF, the presence of modification anomalies which are not handled well by 3NF is due to one of the
following reasons:
1. Reason 1: A relation schema might contain more than one candidate key.
2. Reason 2: When more than one candidate key is present, all of them might be composite.
3. Reason 3: If the above two reasons exist, then there is a possibility of overlap between the candidate keys.
A relation schema R is in BCNF if and only if:
1. For all the Functional Dependencies (FDs) that hold in the relation R, if the FD is non-trivial then the determinant (LHS of the
FD) should be a super key.
Through this definition, BCNF insists that all the determinants of any Functional Dependency must be a candidate key. For this
reason, BCNF is sometimes referred to as strict 3NF.
Note: A relation schema R which is in BCNF is also in 3NF automatically. But, a relation schema R which is in 3NF need not be in BCNF.
Explanation: If we have a set of FDs in R such that X → Y, then X must be a super key. In other words, if X is not a key, then the relation R
is not in BCNF.
BCNF is also sometimes referred to as 3.5 Normal Form.
Practical Example of BCNF:
Suppose there is a company wherein employees work in more than one department. They store the data like this:

emp_id emp_nationality emp_dept dept_type dept_no_of_emp
1001 Zimbabwean Production and planning D001 200
1001 Zimbabwean stores D001 250
1002 Zambian design and technical support D134 100
1002 Zambian Purchasing department D134 600



Functional dependencies in the table above:
emp_id -> emp_nationality
emp_dept -> {dept_type, dept_no_of_emp}
Candidate key: {emp_id, emp_dept}
The table is not in BCNF as neither emp_id nor emp_dept alone are keys.
To make the table comply with BCNF we can break the table in three tables like this:
emp_nationality table:
emp_id emp_nationality
1001 Zimbabwean
1002 Zambian

emp_dept table:
emp_dept dept_type dept_no_of_emp
Production and planning D001 200
stores D001 250
design and technical support D134 100
Purchasing department D134 600

emp_dept_mapping table:
emp_id emp_dept
1001 Production and planning
1001 stores
1002 design and technical support
1002 Purchasing department

Functional dependencies:
emp_id -> emp_nationality
emp_dept -> {dept_type, dept_no_of_emp}
Candidate keys:
 For first table: emp_id
 For second table: emp_dept
 For third table: {emp_id, emp_dept}



This is now in BCNF, as in both functional dependencies the left-hand side is a key.
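A quick way to convince yourself that this decomposition loses no information is to join the three tables back together and compare with the original relation. An illustrative sketch in Python, using the same data:

```python
# The three BCNF tables from the example above, keyed by their
# respective candidate keys.
emp_nationality = {1001: "Zimbabwean", 1002: "Zambian"}
emp_dept = {
    "Production and planning":      ("D001", 200),
    "stores":                       ("D001", 250),
    "design and technical support": ("D134", 100),
    "Purchasing department":        ("D134", 600),
}
emp_dept_mapping = [
    (1001, "Production and planning"),
    (1001, "stores"),
    (1002, "design and technical support"),
    (1002, "Purchasing department"),
]

# Joining on emp_id and emp_dept reconstructs the original relation
# exactly: the decomposition is lossless.
rebuilt = [
    (emp_id, emp_nationality[emp_id], dept, *emp_dept[dept])
    for (emp_id, dept) in emp_dept_mapping
]
```

Each rebuilt row matches a row of the original table, so no facts were lost by splitting it, while each nationality and each department description is now stored exactly once.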



3.7.6. Comparison of Normal Forms

Aspect: Properties to hold
 1NF: All the attributes of the relation are atomic (indivisible into
meaningful sub-parts); every attribute contains a single value (per record).
 2NF: In the first place the table is in 1NF; all the non-key attributes of the
table are fully functionally dependent on the primary key of the table.
 3NF: The table is in 2NF; there is no functional dependency such that both
the left-hand-side and right-hand-side attributes of the FD are non-key
attributes. In other words, no transitive dependency is allowed.
 BCNF: For all the Functional Dependencies (FDs) that hold in the relation R,
if the FD is non-trivial then the determinant (LHS of the FD) should be a
super key.

Aspect: Achievability
 1NF, 2NF, 3NF: Always achievable. BCNF: Not always.

Aspect: Anomalies
 1NF, 2NF, 3NF: May allow some anomalies. BCNF: Always eliminates
anomalies.

Aspect: What is eliminated?
 1NF: Eliminates repeating groups. 2NF: Eliminates redundant data.
 3NF: Eliminates columns not dependent on the key. BCNF: Eliminates
multiple candidate keys.

Aspect: Identification of Functional Dependencies
 1NF: Not necessary. 2NF, 3NF, BCNF: Must.

Aspect: Attribute Domain
 1NF, 2NF, 3NF, BCNF: Should be atomic.

Aspect: Handling of Update Anomalies
 1NF: Does not handle. 2NF, 3NF, BCNF: Handle.

Aspect: Composite Primary Key
 1NF: Allowed. 2NF: Allowed (if no partial dependency exists). 3NF: Allowed.
BCNF: Not allowed.

Aspect: Partial key dependencies (if AB → C, and C can be fully determined by
either A or B alone, then this is a partial key dependency)
 1NF: Permitted. 2NF, 3NF, BCNF: Not permitted.

Aspect: Transitive dependencies (if A → B, and B → C, then A → C)
 1NF, 2NF: Can be permitted. 3NF, BCNF: Cannot be permitted.

Aspect: Overview
 1NF: It is about the shape of a record type. 2NF, 3NF: They are about the
relationship between key and non-key fields. BCNF: It is about the
determinant; it should be a superkey.



3.8. Database Schema (Table Design)
3.8.1. Database Schema

A schema is a group of related objects in a database. A schema is a skeleton
structure that represents the logical view of the complete database. Within a
schema, objects that are related have relationships to one another.
schema, objects that are related have relationships to one another. There is one
owner of a schema, who has access to manipulate the structure of any object in
the schema. A schema does not represent a person, although the schema is
associated with a user account that resides in the database. A schema may include
constraints on the entities or attributes of the entities. It contains a descriptive
detail of the database, which may be pictured by means of schema diagrams.
Database designers design the schema to assist programmers to perceive the
database and use it to create application programs.
A database schema defines the organization of the data, the relationships among
them, and the constraints associated with them. In a relational database, the
schema defines the tables, the fields in each table, and the relationships between
fields and tables. Schemas are generally stored in a data dictionary.
A schema is defined as an outline or a plan that describes the records and
relationships existing at a particular level.
3.8.2. Types of Schema

The three models associated with a schema are as follows:
1. The conceptual model, also called the logical model, is the basic database
model, which deals with organizational structures that are used to define
database structures such as tables and constraints. A Conceptual View is
defined by Conceptual Schema, which describes all the entities, attributes,
and relationship together with integrity constraints. This is where the
database is designed from the ERD. Often this is achieved by a process of
data normalisation (to 3rd normal form is usually sufficient) and entity
modelling to develop the tables and relationships.
The result is an entity relationship diagram (ERD) which represents the real-
world domain that the database is going to model, and this model will be
checked for redundancy and validated to determine if it meets
user/business requirements
2. The internal model, also called the physical model, deals with the physical
storage of the database, as well as access to the data, such as through data
storage in tables and the use of indexes to expedite data access. The internal
model separates the physical requirements of the hardware and the
operating system from the data model. It defines how the information will
be stored in secondary (auxiliary) storage. The Internal View is defined by the



Internal Schema, which is a complete description of the internal model,
containing definition of stored records, the methods of representation, the
data fields, and the indexes used. This stage specifies the physical
implementation of the database in the selected RDBMS, e.g., MySQL, SQL
Server, DB2 and so on, or another database paradigm if the database is not
relational.
Base relations, file organisations, indexes of primary or foreign keys in tables
are defined. In addition, entity and referential integrity constraints and
security measures are defined. This is where we get to use the DDL
commands such as CREATE TABLE statements.
3. The external model, or application interface, deals with methods through
which users may access the schema, such as through the use of a data input
form. The external model allows relationships to be created between the
user application and the data model. The External View is described by
means of a schema called External Schema that corresponds to different
views of the data.
These three stages can also be iterative in their implementation, and frequently
overlap.
3.8.3. Objects in a database schema

It is important to note that the data in the database changes frequently, while the
plans or schemas remain the same over long periods of time. The users' view of
the data (also called logical organization of data) should be in a form that is most
convenient for the users and they should not be concerned about the way data is
physically organized. Therefore, a DBMS should do the translation between the
logical (users' view) organization and the physical organization of the data in the
database.
The data in the database at any particular point in time is called a database
instance. Therefore, many database instances can correspond to the same
database schema. The schema is sometimes called the intension of the database,
while an instance is called an extension (or state) of the database.
Database Instance:
A database instance is the state of an operational database, with the data it
holds at a given point in time. It is a snapshot of the database. Database
instances change over time. A DBMS ensures that every instance (state) is
valid by diligently enforcing all the validations, constraints, and conditions
that the database designers have imposed.



Four types of tables are described below:
1. Data tables store most of the data found in a database.
2. Join tables are tables used to create a relationship between two tables that
would otherwise be unrelated.
3. Subset tables contain a subset of data from a data table.
4. Validation tables, often referred to as code tables, are used to validate data
entered into other database tables.
Logical Design - Physical Design Comparison
Physical design is where you translate the expected
schemas into actual database structures. At this
time, you have to map:
 Entities to Tables
 Relationships to Foreign Keys
 Attributes to Columns
 Primary Unique Identifiers to the Primary
Key
 Unique Identifiers to Unique Keys.
3.8.4. Conceptual, Logical, and Physical Data Models Differences

Conceptual, logical, and physical data models are different in their objectives,
goals, and content. Key differences noted below:
Conceptual Data Model (CDM):
 Includes high-level data constructs.
 Uses non-technical names, so that executives and managers at all levels can
understand the data basis of the Architectural Description.
 Uses general high-level data constructs from which Architectural Descriptions
are created in non-technical terms.
 May not be normalized.

Logical Data Model (LDM):
 Includes entities (tables), attributes (columns/fields), and relationships (keys).
 Uses business names for entities and attributes.
 Is independent of technology (platform, DBMS).
 Is normalized to fourth normal form (4NF).

Physical Data Model (PDM):
 Includes tables, columns, keys, data types, validation rules, database triggers,
stored procedures, domains, and access constraints.
 Uses more specific, less generic names for tables and columns, such as
abbreviated column names, limited by the database management system (DBMS)
and any company-defined standards.
 Includes primary keys and indices for fast data access.
 May be de-normalized to meet performance requirements, depending on the
nature of the database. Online Transaction Processing (OLTP) and Operational
Data Store (ODS) databases are usually not de-normalized; de-normalization is
common in data warehouses.



3.9. Querying (SQL Query Strings, DDL & DML)
SQL statements fall into major categories: data definition language (DDL), data
manipulation language (DML), data control language (DCL), and transaction
control language (TCL).
1. Data Definition Language (DDL)
Data Definition Language is a standard for commands through which data
structures are defined. It is a computer language used for creating and
modifying the structure of database objects, such as schemas, tables, views,
and indexes. Additionally, it assists in storing the metadata details in the
database.
DDL statements are used to define the database structure or schema. Some
examples include CREATE, ALTER, DROP, TRUNCATE, COMMENT, and
RENAME. These are explained below:
(i) CREATE - to create objects in the database, such as tables
The main use of the create command is to build a new table and it comes
with a predefined syntax.
The general syntax for the create command in DDL is mentioned below:
CREATE TABLE tablename (Column1 DATATYPE, Column2 DATATYPE,
Column3 DATATYPE, …ColumnN DATATYPE)
For Example:
CREATE TABLE student_tbl (studentID INT(10), student_Name CHAR(40));
This DDL statement creates the student table with student ID and student
name columns.
Generally, the data types most often used when creating a table include
numbers, strings, and dates. Every system varies in how data types are
specified.
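Although the syntax above is MySQL-flavoured, the same statement can be tried out anywhere SQL runs. The sketch below uses Python's standard sqlite3 module purely as a convenient test bed (an assumption of convenience; SQLite uses INTEGER and TEXT where MySQL would use INT(10) and CHAR(40)):

```python
import sqlite3

# In-memory database: nothing is written to disk.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Equivalent of the student_tbl example above, using SQLite type names.
cur.execute("CREATE TABLE student_tbl (studentID INTEGER, student_Name TEXT)")

# Confirm the table now exists in the schema catalogue.
cur.execute("SELECT name FROM sqlite_master WHERE type = 'table'")
table_name = cur.fetchone()[0]
print(table_name)  # student_tbl
conn.close()
```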

(ii) ALTER - alters the structure of the database, such as changing column
arrangement
The syntax to add a column in a table in MySQL (using the ALTER TABLE
statement) is:
ALTER TABLE table_name
ADD new_column_name column_definition
[ FIRST | AFTER column_name ];

table_name: The name of the table to modify.


new_column_name: The name of the new column to add to the table.
column_definition: The datatype and definition of the column (NULL or NOT
NULL, etc).
FIRST | AFTER column_name: Optional. It tells MySQL where in the table to
create the column. If this parameter is not specified, the new column will be
added to the end of the table.
For example:
ALTER TABLE contacts
ADD last_name varchar(40) NOT NULL
AFTER contact_id;
Many other ALTER statements can be created



(iii) DROP - delete objects from the database, such as deleting a particular table
Syntax
The syntax for the DROP TABLE statement in SQL is:
DROP TABLE table_name;
Example:
DROP TABLE suppliers;
This DROP TABLE statement example would drop the table called suppliers.
This would both remove the records associated with the suppliers table as
well as its table definition.
Once you have dropped the table, you can recreate the suppliers table
without getting an error that the table already exists.

(iv) TRUNCATE - removes all records from a table, including all spaces allocated
for the records.
Warning: If you truncate a table, the TRUNCATE TABLE statement cannot be
rolled back.
Syntax
The syntax for the TRUNCATE TABLE statement in MySQL is:
TRUNCATE TABLE [database_name.]table_name;
Example:
TRUNCATE TABLE customers;
This example would truncate the table called customers and remove all
records from that table.
It would be equivalent to the following DELETE statement in MySQL:
DELETE FROM customers;
Both of these statements would result in all data from the customers table
being deleted. The main difference between the two is that you can roll
back the DELETE statement if you choose, but you can't roll back the
TRUNCATE TABLE statement.
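The rollback difference can be observed directly. As a small sketch using Python's standard sqlite3 module (SQLite has no TRUNCATE TABLE statement, so only the DELETE side is shown; the table and rows are invented for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE customers (id INTEGER, name TEXT)")
cur.executemany("INSERT INTO customers VALUES (?, ?)",
                [(1, "Sue"), (2, "Kudzie"), (3, "Sam")])
conn.commit()

# DELETE all rows inside a transaction...
cur.execute("DELETE FROM customers")
cur.execute("SELECT COUNT(*) FROM customers")
inside_txn = cur.fetchone()[0]
print(inside_txn)  # 0 - rows gone within the open transaction

# ...then roll the transaction back: the DELETE is undone.
conn.rollback()
cur.execute("SELECT COUNT(*) FROM customers")
after_rollback = cur.fetchone()[0]
print(after_rollback)  # 3 - all rows restored
conn.close()
```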
(v) COMMENT - add comments to the data dictionary
Syntax (comment support varies by DBMS; some systems use the form below):
COMMENT ON <object_type> <object_name> IS '<string_literal>';
COMMENT ON COLUMN <table_name>.<column_name> IS '<string_literal>';
In MySQL, a comment is attached through ALTER TABLE instead. Example:
ALTER TABLE user MODIFY id INT(11) COMMENT 'id of user';

(vi) RENAME - renames an object


Syntax:
RENAME TABLE <OldTableName> TO <NewTableName>;
Example:
RENAME TABLE Student TO Stu;



3.9.1. Data Manipulation Language (DML)

Data Manipulation Language (DML) statements are used for managing data
within schema objects.
A data-manipulation language (DML) is a language that enables users to access or
manipulate data as per the appropriate data model. There are basically two types:
 Procedural DMLs require a user to specify what data are needed and how
to get those data.
 Declarative DMLs (also referred to as nonprocedural DMLs) require a user
to specify what data are needed without specifying how to get those data.
Declarative DMLs are usually easier to learn and use than are procedural DMLs.
However, since a user does not have to specify how to get the data, the database
system has to figure out an efficient means of accessing data.
A query is a statement requesting the retrieval of information. The portion of a
DML that involves information retrieval is called a query language. The following
query in the SQL language finds the name and date of birth of the student whose
student-id is 5565:
SELECT students.student-name, students.dob
FROM students
WHERE students.student-id = 5565
The query specifies that those rows from the table students where the student-id
is 5565 must be retrieved, and the students.student-name and students.dob
attributes of these rows must be displayed.
Queries may involve information from more than one table.
Some examples:
 SELECT - retrieves data from a database
Syntax:
SELECT * FROM Table_name;
Example:
SELECT * FROM Student; -- shows all the table's records
SELECT First_name, DOB FROM STUDENT WHERE Reg_no = 'S101';
Enclose the value in single quotation marks if its data type is VARCHAR or CHAR.
Eliminating Duplicates: A table could hold duplicate rows. In such a case, you can
eliminate duplicates.
Syntax:
SELECT DISTINCT col, col, .., FROM table_name;
Example:
SELECT DISTINCT * FROM Student;
SELECT DISTINCT first_name, city, pincode FROM Student;
It scans through all the rows and eliminates those that have exactly the same
contents in each selected column.
Sorting Data: The rows retrieved from the table are sorted in ascending or
descending order as specified in the SELECT statement, using the keyword
ORDER BY.
SELECT * FROM Student
ORDER BY First_Name;
The above statement will show records in ascending order from A to Z.
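The SELECT, DISTINCT, and ORDER BY behaviours described above can be sketched with Python's standard sqlite3 module (the Student rows here are invented sample data):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE Student (Reg_no TEXT, First_name TEXT, city TEXT)")
cur.executemany("INSERT INTO Student VALUES (?, ?, ?)",
                [("S101", "Fadzai", "Harare"),
                 ("S102", "Tariro", "Gweru"),
                 ("S103", "Fadzai", "Harare")])

# WHERE with a quoted string literal, as described above.
cur.execute("SELECT First_name FROM Student WHERE Reg_no = 'S101'")
matched = cur.fetchone()[0]
print(matched)  # Fadzai

# DISTINCT eliminates the duplicate (First_name, city) pair.
cur.execute("SELECT DISTINCT First_name, city FROM Student")
distinct_rows = cur.fetchall()
print(len(distinct_rows))  # 2

# ORDER BY sorts in ascending (A to Z) order by default.
cur.execute("SELECT First_name FROM Student ORDER BY First_name")
ordered = [row[0] for row in cur.fetchall()]
print(ordered)  # ['Fadzai', 'Fadzai', 'Tariro']
conn.close()
```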



 INSERT - inserts data into a table
Syntax:
INSERT INTO table_name (column1, column2, column3, ...)
VALUES (value1, value2, value3, ...);
Example:
INSERT INTO students (studentName, ContactName, Address, City, Country)
VALUES ('Fadzai', 'Chido Moyo', '143 5th Ave', 'Harare', 'Zimbabwe');

In this syntax,
 First, specify the table name and a list of comma-separated columns
inside parentheses after the INSERT INTO clause.
 Then, put a comma-separated list of values of the corresponding
columns inside the parentheses following the VALUES keyword.
The number of columns and values must be the same. In addition, the
positions of columns must be corresponding with the positions of their
values.
INSERT statements that use VALUES syntax can insert multiple rows. To do
this, include multiple lists of comma-separated column values, with lists
enclosed within parentheses and separated by commas. Example:
INSERT INTO tbl_name (a,b,c)
VALUES(1,2,3), (4,5,6), (7,8,9);

There are many other variations of the INSERT STATEMENT
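Multi-row VALUES can be verified the same way. The following sketch, again using Python's sqlite3 module as a stand-in SQL engine, mirrors the tbl_name example above:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE tbl_name (a INTEGER, b INTEGER, c INTEGER)")

# Multi-row VALUES syntax: three comma-separated value lists in one statement.
cur.execute("INSERT INTO tbl_name (a, b, c) VALUES (1,2,3), (4,5,6), (7,8,9)")
inserted = cur.rowcount
print(inserted)  # 3

cur.execute("SELECT * FROM tbl_name ORDER BY a")
rows = cur.fetchall()
print(rows)  # [(1, 2, 3), (4, 5, 6), (7, 8, 9)]
conn.close()
```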

 UPDATE - updates data existing in a table


The following illustrates the basic syntax of the UPDATE statement:
UPDATE table_name
SET
column_name1 = expr1,
column_name2 = expr2,
...
[WHERE
condition];
In this syntax:
 First, specify the name of the table that you want to update data
after the UPDATE keyword.
 Second, specify which column you want to update and the new
value in the SET clause. To update values in multiple columns, you
use a list of comma-separated assignments by supplying a value in
each column’s assignment in the form of a literal value, an
expression, or a subquery.
 Third, specify which rows to be updated using a condition in
the WHERE clause. The WHERE clause is optional. If you omit it,
the UPDATE statement will modify all rows in the table.
Notice that the WHERE clause is so important that you should not forget it.
Sometimes you may want to update just one row; if you forget the WHERE
clause, you will accidentally update all rows of the table.



Example:
UPDATE employees
SET
email = 'uta.noel@tcflonline.ac.zw'
WHERE
employeeNumber = 1056;
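The danger of omitting WHERE can be demonstrated directly. A hedged sketch with Python's sqlite3 module (the employee numbers and example.com addresses are placeholders):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE employees (employeeNumber INTEGER, email TEXT)")
cur.executemany("INSERT INTO employees VALUES (?, ?)",
                [(1056, "old1@example.com"), (1057, "old2@example.com")])

# With WHERE: only the matching row changes.
cur.execute("UPDATE employees SET email = 'new@example.com' "
            "WHERE employeeNumber = 1056")
changed_one = cur.rowcount
print(changed_one)  # 1

# Without WHERE: every row changes - the pitfall described above.
cur.execute("UPDATE employees SET email = 'everyone@example.com'")
changed_all = cur.rowcount
print(changed_all)  # 2
conn.close()
```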
 DELETE - deletes records from a table; the space allocated for the records remains
The following illustrates the syntax of the DELETE statement:
DELETE FROM table_name
WHERE condition;
In this statement:
 First, specify the table from which you delete data.
 Second, use a condition in the WHERE clause to specify which rows
to delete. The DELETE statement will delete the rows that match the
condition.
Notice that the WHERE clause is optional. If you omit the WHERE clause,
the DELETE statement will delete all rows in the table.
Besides deleting data from a table, the DELETE statement returns the
number of deleted rows.
Example:
Suppose you want to delete employees whose officeCode is 4. You use
the DELETE statement with the WHERE clause as shown in the
following query:
DELETE FROM employees
WHERE officeCode = 4;
To delete all rows from the employees table, you use
the DELETE statement without the WHERE clause as follows:
DELETE FROM employees;
All rows in the employees table are deleted.
Many other variations of the DELETE statement exist.

 LOCK TABLE – concurrency control
Locking a table prevents other sessions from interfering while a critical
sequence of statements runs. For example:
LOCK TABLES trans READ, customer WRITE;
SELECT SUM(value) FROM trans WHERE customer_id=some_id;
UPDATE customer SET total_value=sum_from_previous_statement
WHERE customer_id=some_id;
UNLOCK TABLES;
Example (READ LOCK):
LOCK TABLES table_name READ;
Example (WRITE LOCK):
LOCK TABLES table_name WRITE;
To see whether a lock is applied, use the following command:
SHOW OPEN TABLES;
To release all locks, use the following command:
UNLOCK TABLES;
EXAMPLE:
LOCK TABLES products WRITE;
INSERT INTO products(id,product_name) SELECT id,old_product_name
FROM old_products;
UNLOCK TABLES;
In the example above, no other connection can read from or write to the
products table until it is unlocked; the session holding the WRITE lock can
both read and write.
EXAMPLE:
LOCK TABLES products READ;
SELECT COUNT(*) FROM products;
UNLOCK TABLES;
With a READ lock, other connections can still read the products table but
cannot write to it; the session holding the READ lock can also only read.

3.9.2. Use of SQL in Data Manipulation and Definition

We use SQL (Structured Query Language) for manipulating and defining data in the
database. Common operations are summarised by the acronym CRUD: Create,
Read, Update, and Delete. SQL supports these and many other operations on a
database.
 Creating is the process of generating new tables and inserting
(populating) data into them. The CREATE command is used for realizing
objects such as databases, tables, and indexes.
 Read means retrieving data from the database for viewing or for
performing other operations.
 Update makes changes to existing data in the database; for example,
when a transaction is processed, a previous value is replaced by a new
value.
 Delete erases contents of the database, such as data held in the
tables.
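The CRUD cycle above can be walked through end to end. A minimal sketch with Python's standard sqlite3 module, using an invented products table:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Create: define the table and populate it.
cur.execute("CREATE TABLE products (id INTEGER, name TEXT)")
cur.execute("INSERT INTO products VALUES (1, 'Bread')")

# Read: retrieve the row just stored.
cur.execute("SELECT name FROM products WHERE id = 1")
read_name = cur.fetchone()[0]
print(read_name)  # Bread

# Update: replace the existing value with a new one.
cur.execute("UPDATE products SET name = 'Milk' WHERE id = 1")
cur.execute("SELECT name FROM products WHERE id = 1")
updated_name = cur.fetchone()[0]
print(updated_name)  # Milk

# Delete: erase the row.
cur.execute("DELETE FROM products WHERE id = 1")
cur.execute("SELECT COUNT(*) FROM products")
remaining = cur.fetchone()[0]
print(remaining)  # 0
conn.close()
```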
3.9.3. Data Control Language

Data Control Language (DCL) statements: Some examples:


 GRANT - gives users access privileges to the database
Syntax:
GRANT <object privileges>
ON <object_name>
TO <User_Name>
[WITH GRANT OPTION]
Example:
GRANT ALL
ON Student TO Noel
WITH GRANT OPTION
The WITH GRANT OPTION allows the grantee to in turn grant object privileges to
other users.



Example:
GRANT SELECT, UPDATE
ON Student TO Noel
WITH GRANT OPTION
The user Noel has been given permission to view and modify records in the
table Student.
 REVOKE - withdraws access privileges given with the GRANT command.
Privileges once given can be denied to a user using the REVOKE command.
Syntax:
REVOKE <Object_Privileges>
ON <Object_Name>
FROM <User_Name>
Example:
REVOKE UPDATE
ON Student
FROM Noel;

3.9.4. Transaction Control Language

Transaction Control (TCL) statements are used to manage the changes made by
DML statements. It allows statements to be grouped together into logical
transactions.
 COMMIT - saves work done
Syntax for SQL Commit
COMMIT;
Let us consider the following table for understanding Commit in a better way.
Example: Sample table 1
Student
Rol_No Name Address Phone Age
1 Sue Harare 0766353535 18
2 Kudzie Chitungwiza 0784464441 18
3 Sam Norton 0756577557 20
4 Farai Marondera 0764987863 18
3 Sam Norton 0756577557 20
2 Kudzie Chitungwiza 0784464441 18

Following is an example which would delete those records from the table which
have age = 20 and then COMMIT the changes in the database.
Queries:
DELETE FROM Student WHERE AGE = 20;
COMMIT;



Output:
Thus, two rows would be deleted from the table, and the SELECT statement
would then show:
Student
Rol_No Name Address Phone Age
1 Sue Harare 0766353535 18
2 Kudzie Chitungwiza 0784464441 18
4 Farai Marondera 0764987863 18
2 Kudzie Chitungwiza 0784464441 18

 SAVEPOINT – identifies a point in a transaction to which you can later


rollback.
A SAVEPOINT marks a point within a transaction to which you can roll
back without rolling back the entire transaction.
Syntax for Savepoint command:
SAVEPOINT SAVEPOINT_NAME;

This command creates a SAVEPOINT within the current transaction.
In general, ROLLBACK is used to undo a group of statements.
Syntax for rolling back to Savepoint command:
ROLLBACK TO SAVEPOINT_NAME;
you can ROLLBACK to any SAVEPOINT at any time to return the appropriate
data to its original state.
Example:
From the above example Sample table1,
Delete those records from the table which have age = 20 and then
ROLLBACK the changes in the database by keeping Savepoints.
Queries:
SAVEPOINT SP1; -- Savepoint created
DELETE FROM Student WHERE AGE = 20; -- rows deleted
SAVEPOINT SP2; -- Savepoint created
Here SP1 is first SAVEPOINT created before deletion. In this example one
deletion has taken place.
After deletion again SAVEPOINT SP2 is created.
Output:
Student
Rol_No Name Address Phone Age
1 Sue Harare 0766353535 18
2 Kudzie Chitungwiza 0784464441 18
4 Farai Marondera 0764987863 18
2 Kudzie Chitungwiza 0784464441 18

The deletion has taken place. Now assume that you have changed your
mind and decided to ROLLBACK to the SAVEPOINT that you identified as
SP1, which was created before the deletion.



The deletion is undone by this statement:
ROLLBACK TO SP1; -- Rollback completed
Student
Rol_No Name Address Phone Age
1 Sue Harare 0766353535 18
2 Kudzie Chitungwiza 0784464441 18
3 Sam Norton 0756577557 20
4 Farai Marondera 0764987863 18
3 Sam Norton 0756577557 20
2 Kudzie Chitungwiza 0784464441 18

RELEASE SAVEPOINT: This command is used to remove a SAVEPOINT that


you have created.
Syntax:
RELEASE SAVEPOINT SAVEPOINT_NAME;
Once a SAVEPOINT has been released, you can no longer use the ROLLBACK
command to undo transactions performed since the last SAVEPOINT.

 ROLLBACK - restores database to original state since the last COMMIT


If any error occurs with any of the grouped SQL statements, all changes
need to be aborted. The process of reversing changes is called rollback. This
command can only be used to undo transactions since the last COMMIT or
ROLLBACK command was issued.
Syntax:
ROLLBACK;
Example:
From the above example Sample table1,
Delete those records from the table which have age = 20 and then
ROLLBACK the changes in the database.
Queries:
DELETE FROM Student WHERE AGE = 20;
ROLLBACK;
Output:
Student
Rol_No Name Address Phone Age
1 Sue Harare 0766353535 18
2 Kudzie Chitungwiza 0784464441 18
3 Sam Norton 0756577557 20
4 Farai Marondera 0764987863 18
3 Sam Norton 0756577557 20
2 Kudzie Chitungwiza 0784464441 18
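SQLite also supports SAVEPOINT and ROLLBACK TO, so the scenario above can be replayed with Python's standard sqlite3 module (only the relevant columns of the sample Student table are reproduced):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.isolation_level = None  # manage transactions explicitly
cur = conn.cursor()
cur.execute("CREATE TABLE Student (Rol_No INTEGER, Name TEXT, Age INTEGER)")
cur.executemany("INSERT INTO Student VALUES (?, ?, ?)",
                [(1, "Sue", 18), (3, "Sam", 20), (4, "Farai", 18)])

cur.execute("BEGIN")
cur.execute("SAVEPOINT SP1")  # savepoint created before the deletion
cur.execute("DELETE FROM Student WHERE Age = 20")
cur.execute("SELECT COUNT(*) FROM Student")
after_delete = cur.fetchone()[0]
print(after_delete)  # 2 - the age-20 row is gone

cur.execute("ROLLBACK TO SP1")  # undo only the work done after SP1
cur.execute("SELECT COUNT(*) FROM Student")
after_rollback = cur.fetchone()[0]
print(after_rollback)  # 3 - the deletion is undone
conn.close()
```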



3.10. Chapter 3 Questions
1. Describe a relational database system. [6]
2. Name any five entities (two of which are abstract) that can be found in a
particular organisation other than a college, university, or school. [5]
3. Draw an ER-Model for the entities identified in number 2 above. [20]
4. How is a conceptual model different from a physical model? [6]
5. Differentiate an entity type from an entity occurrence. [4]
6. Explain three ways that can help you identify entities for an organisation
which you want to design a database for. [6]
7. Explain the importance of semantic language in designing and
development of a database system. [3]
8. Using examples, relate and differentiate non-key attributes from
key-attributes. [6]
9. Describe the following types of entities:
a. Fundamental [2]
b. Attributive [2]
c. Associative [2]
d. Subtype [2]
10. What is the role of an Entity Relationship Matrix in the design
of a database? [4]
11. Explain the concept of Cardinality as used in relational databases. [3]
12. Draw an ERD for a supermarket like OK, or P’n’P to show the relationships
among, Customer, Till Operator, Supervisor and Manager. [15]
13. An external view is mainly an abstraction of the underlying data in
the database. Deliberate. [5]
14. Explain the messages conveyed by each of the following symbols:

a [2]
b [2]
c [2]

d [2]
e [2]
f [2]

g [2]



15. Describe any three (3) Best Practices for creating a Relational Model Design. [9]
16. Explain four (4) advantages and three (3) disadvantages of a Relational
Database Model. [14]
17. What purpose does normalization serve? [5]
18. Give a full explanation of what it means to be in:
a. First Normal Form (1NF) [4]
b. Second Normal Form (2NF) [4]
c. Third Normal Form (3NF) [4]
d. Boyce-Codd Normal Form (BCNF) [4]
19. Explain the concept of database schema, the types, and purpose. [15]
20. Practise normalising tables from 1NF to BCNF.
21. Write simple database-statement examples using the following commands:
a. CREATE [8]
b. DROP [4]
c. ALTER [8]
d. COMMIT [5]
e. ROLLBACK [4]
f. SELECT
i. Simple SELECT from one table [4]
ii. SELECT ALL records from a table [3]
iii. Complex SELECT from two tables, on condition. [10]
g. INSERT
i. Simple Insert [6]
ii. Insert on Condition [8]
h. UPDATE
i. Simple Update [4]
ii. CASCADE Update [8]
i. DELETE records that meet specific criteria [5]
j. GRANT
i. Partial rights to a particular user [6]
ii. Full rights to a particular user [6]
k. REVOKE [6]
l. VIEW
i. Simple view [5]
ii. View from two tables on condition [8]
m. RENAME a particular column [4]
n. REPLACE values meeting a specific condition. [6]



CHAPTER 4

DATABASE ADMINISTRATION



4. DATABASE ADMINISTRATION

4.1. Objectives
Main Objective: Manage databases
Sub-objectives
1. Spell out reasons for and importance of securing database systems
2. Explain methods, tools and processes required to secure database systems
3. Explain backup and recovery plans for database systems
4. Explain and demonstrate database maintenance procedures
5. Explain the reasons, how database monitoring is done and spell out the
significance of monitoring databases

4.2. Introduction
Database administration refers to the whole set of activities performed by
a database administrator to ensure that a database is always available as needed.
Other closely related tasks and roles are database security, database monitoring and
troubleshooting, and planning for future growth.
Database administrators use specialized software to store and organize data. The
role may include capacity planning, installation, configuration, database design,
migration, performance monitoring, security, troubleshooting, as well as backup and
data recovery.

4.3. Database Security


Data has become one of the most important resources in an organisation. Therefore
every effort must be put in place to ensure its security.
Database security should provide controlled and protected access to the users and
should also maintain the overall quality of the data. The threats related to database
security are evolving every day, so it is required to come up with strong security
techniques, strategies, and tools that can safeguard databases from potential attacks.
The pillars of security revolve around CIAN: Confidentiality, Integrity, Availability,
and Non-repudiation.
4.3.1. Objectives of Database Management

There are several objectives that are sought to be addressed, and include:
1. Data Availability - make an integrated collection of data available to a wide
variety of users whenever they need to use the data.
2. Data Integrity - ensure correctness and validity of data held in the database
3. Privacy (Confidentiality) (the goal) and security (the means)
4. Management Control (Non-Repudiation, Consistency, Integrity) – role
mainly played by the DBA.
5. Data Independence (a relative term) - avoids reprogramming of
applications, allows easier conversion and reorganization
i. Physical data independence - program unaffected by changes in the
storage structure or access methods and vice-versa.
ii. Logical Data Independence - program unaffected by changes in the
schema and vice-versa.



4.3.2. What are the Causes for Database Failure?

There are many different types of failure that can affect database processing, each
of which has to be dealt with in a different manner. Some failures affect main
memory only, while others involve non-volatile (secondary) storage. Among the
causes of failure are:
1. System Crashes: In a system crash, the system hangs and needs to be
rebooted. These failures occur due to hardware malfunction or a bug in
the database software or the operating system itself. A crash causes the
loss of the content of volatile storage and brings transaction processing
to a halt; the content of non-volatile storage is not affected by this type
of failure. The assumption that hardware errors and bugs bring the
system to a halt, but do not corrupt the non-volatile storage contents, is
known as the Fail-Stop Assumption.
2. User Error: An example of a user error is a user inadvertently deleting a
row or dropping a table.
3. Carelessness: Carelessness is the destruction of data by operators or users
because they were not concentrating on the task at hand.
4. Sabotage (intentional corruption of data): Sabotage is the intentional
corruption or destruction of data, hardware, or software facilities, by
employees, competitors, hackers, and governments.
5. Statement Failure: A statement failure can be defined as the inability of
the database to execute an SQL statement. While running a user program,
a transaction might have multiple statements and one of the statements
might fail due to various reasons. Typical examples are selecting from a
table that does not exist, or trying to do an insert and having the
statement fail due to lack of space. Recovery from such failures is
automatic. Upon detection, the database usually will roll back the
statement, returning control to the user or user program.
6. Application software errors: Application software errors include logical
errors in the program that is accessing the database, which causes one or
more transactions to fail.
7. Network Failure: Network failures can occur while using a client-server
configuration or a distributed database system where multiple database
servers are connected by communication networks. Communication
software, line and hardware failures will interrupt the normal operations
of the database system.
8. Media Failure: Media failures are the most dangerous failures. Not only
is there a potential to lose data if proper backup procedures are not
followed, but recovery usually takes more time than with other kinds
of failures. A typical example of a media failure is a disk-head crash,
which causes all databases residing on that disk or disks to be lost.
9. Natural Physical Disasters: Natural and physical disasters are the damage
caused to data, hardware and software due to natural disasters like fires,
floods, earthquakes, power failures, and excessive heat.



4.3.3. Methods Required to Secure Database Systems

Technology can be used to ensure a secure computing environment for the


organization. Most security issues could be resolved using appropriate technology.
The basic security standards which technology can ensure are confidentiality,
integrity, availability, and non-repudiation.
1. Confidentiality: A secure system ensures the confidentiality of data. This
means that it allows individuals to see only the data they are supposed to see.
Confidentiality has several aspects like privacy of communications, secure
storage of sensitive data, authenticated users, and authorization of users.
1. Schema / sub-schema, passwords
2. Encryption (cryptography)
3. Assigning privileges
4. Firewalling
5. Physical security controls
2. Management Control / Non-Repudiation: Non-repudiation is a mechanism
that will confirm that a specific user, program, system, or device cannot deny
the actions that it carried out. A system should have a tracing mechanism that
can identify the origin, time, user / program / device ID that would have
performed or attempted to perform actions on a particular system. Thus this
system carries out logging of system activities / operations. Some operations
related to management control include:
1. Lifecycle Control
2. Auditing
3. Compliance
4. Training
5. Maintenance
3. Privacy of Communications: The DBMS should be capable of controlling the
spread of confidential personal information such as health, employment, and
credit records. It should also keep the corporate data such as trade secrets,
proprietary information about products and processes, competitive analyses,
as well as marketing and sales plans secure and away from the unauthorized
people.
4. Secure Storage of Sensitive Data: Once confidential data has been entered, its
integrity and privacy must be protected on the databases and servers wherein
it resides.
5. Authentication: One of the most basic concepts in database security is
authentication, which is quite simply the process by which a system verifies a
user's identity. A user can respond to a request to authenticate by providing a
proof of identity, or an authentication token. User ID and a password are
commonly used. The user ID and password are used to ascertain that you are
whom you claim to be.
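A common way to check such an authentication token is to store only a salted hash of each password, never the password itself. The following is an illustrative sketch using Python's standard library (the PBKDF2 parameters here are arbitrary assumptions; production systems should use a vetted password-hashing scheme such as bcrypt or Argon2):

```python
import hashlib
import hmac
import os

def hash_password(password, salt=None):
    """Return (salt, digest); only these two values are ever stored."""
    salt = salt if salt is not None else os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 100_000)
    return salt, digest

def verify(password, salt, stored_digest):
    """Recompute the digest and compare in constant time."""
    candidate = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 100_000)
    return hmac.compare_digest(candidate, stored_digest)

# At registration time, store only the salt and digest.
salt, stored = hash_password("s3cret")

# At login time, recompute from the supplied password and compare.
ok = verify("s3cret", salt, stored)
bad = verify("wrong-password", salt, stored)
print(ok, bad)  # True False
```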
6. Authorization: An authenticated user goes through the second layer of
security, authorization. Authorization is permission given to user, program, or
process to access an object or set of objects. The type of data access granted
to a user can vary but falls within read-only, or read and write. Privileges
specify the type of Data Manipulation Language (DML) operations like SELECT,
INSERT, UPDATE, DELETE.



A user may be assigned all, none, or a combination of these types of
authorization. In addition to these forms of authorization for access to data, a
user may be granted authorization to modify the database schema:
1. Read authorization allows reading, but not modification, of data.
2. Insert authorization allows insertion of new data, but not modification
of existing data.
3. Update authorization allows modification, but not deletion of data.
4. Index authorization allows the creation and deletion of indexes.
5. Resource authorization allows the creation of new relations.
6. Alteration authorization allows the addition or deletion of attributes in
a relation.
7. Drop authorization allows the deletion of relations. The drop and
delete authorization differ in that delete authorization allows deletion
of tuples only. If a user deletes all tuples of a relation, the relation still
exists, but it is empty. If a relation is dropped it no longer exists.
Two methods by which the access control is done are by using privileges and
roles.
1. Privileges: A privilege is permission to access a named object in a
prescribed manner; for example, permission to query a table.
Privileges can be granted to enable a particular user to connect to
the database (create a session); select rows from someone else's
table; or execute someone else's stored procedure.
a. Database Privileges: A privilege is a right to execute a
particular type of SQL statement or to access another user's
object. Some examples of privileges include:
i. The right to connect to the database (create a session)
ii. The right to create a table
iii. The right to select rows from another user's table
iv. The right to execute another user's stored procedure

Grant the least privileges (minimal) just to enable the user to


accomplish necessary work.
i. System Privileges: A system privilege is the right to
perform a particular action, on a particular type of
object. For example, the privileges to create tables
and to delete the table in a database are system
privileges.
ii. Object Privileges: An object privilege is a privilege or
right to perform a particular action on a specific table,
view, sequence, procedure, function, or package. For
example, the privilege to delete rows from the table
STUDENT is an object privilege.
2. Roles: A role is a mechanism that can be used to provide
authorization. Roles are named groups of related privileges that you
grant to users or other roles. A single person or a group of people can
be granted a role or a group of roles. By defining different types of
roles, administrators can manage access privileges much more easily.
Roles are designed to ease the administration of end-user system and
object privileges.
NC Software Engineering: Database Concepts Module. Compiled by C. Uta utanoel@gmail.com 78 | 100
The following properties of roles allow for easier privilege
management within a database:
i. Reduced privilege administration - Rather than explicitly
granting the same set of privileges to several users, you can
grant the privileges for a group of related users to a role.
Then, only the role needs to be granted to each member of
the group.
ii. Dynamic privilege management - If the privileges of a group
must change, only the privileges of the role need to be
modified.
iii. Selective availability of privileges - You can selectively enable
or disable the roles granted to a user. This allows specific
control of a user's privileges in any given situation.
iv. Application-specific security - you can protect role use with a
password. Applications can be created specifically to enable a
role when supplied the correct password. Users cannot enable
the role if they do not know the password.
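In Oracle-style SQL, the privileges and roles described above could be set up as follows (the user names ann and bob, the STUDENT table, the enrol_student procedure, and the clerk role are all hypothetical):

```sql
-- System privileges: connect to the database and create tables.
GRANT CREATE SESSION TO ann;
GRANT CREATE TABLE TO ann;

-- Object privileges: query another user's table, run a stored procedure,
-- and delete rows from the STUDENT table.
GRANT SELECT ON bob.STUDENT TO ann;
GRANT EXECUTE ON bob.enrol_student TO ann;
GRANT DELETE ON STUDENT TO ann;

-- A role bundles related privileges; granting the role grants them all.
CREATE ROLE clerk IDENTIFIED BY clerk_pw;   -- password-protected role
GRANT CREATE SESSION, CREATE TABLE TO clerk;
GRANT clerk TO ann;
GRANT clerk TO bob;

-- An application can enable the role only when the password is supplied.
SET ROLE clerk IDENTIFIED BY clerk_pw;
```

If the clerks' privileges must change, only the role is modified; the grants to ann and bob stay untouched.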
7. Integrity: A secure system ensures that the data it contains is valid. Data
integrity means that data is protected from deletion and corruption, both
while it resides within the database and while it is being transmitted over the
network. Supporting integrity requires that:
a. checkpoint/restart/recovery procedures be incorporated
b. concurrency control and multi-user updates be managed
c. accounting and audit trails (financial, legal) be maintained
8. Availability: A secure system makes data available to authorized users,
without delay. Denial-of-service (DOS) attacks are attempts to block
authorized users’ ability to access and use the system when needed.
Availability encompasses making the system and its data available:
a. at reasonable cost—performance in query update, eliminate or
control data redundancy
b. in meaningful format—data definition language, data dictionary
9. Easy Access—query language (4GL, SQL, forms, windows, menus);
embedded SQL, etc.; utilities for editing, report generation, sorting can be
used to facilitate database operations.
4.3.4. Tools Required to Secure Database Systems
1. Configuration Tools: Uncover configuration mistakes, identification and
access control issues, missing patches, or any toxic combination of settings
that could lead to escalation-of-privilege or denial-of-service attacks, data
leakage or unauthorized modification of data.
2. Data Mask: For example, MSSQL Data Mask provides tools for data
masking and is used to protect data that is classified as personally
identifiable data, sensitive personal data, or commercially sensitive data.
3. Network discovery and security auditing: For example, Nmap (“Network
Mapper”) is useful for tasks such as network discovery and network
inventory, managing service upgrade schedules, and monitoring host or
service uptime.
4. Auditing Tools: The Oracle Auditing Tools is a toolkit that could be used to
audit security within Oracle database servers. This open-source toolkit
includes password-attack tools, command-line query tools, and query tools
to test the security of Oracle database configurations.
5. Encryption Tools: For example, DbDefence is an effective security solution
for encrypting complete databases and protecting its schema within the MS
SQL Server. It allows database administrators and developers to encrypt
databases completely.
4.3.5. Processes Required to Secure Database Systems
A few best practices can help even the smallest of businesses secure their
database from potential risks.
1. Separate the Database and Web Servers: A database should reside on a
separate database server located behind a firewall, not in the DMZ with
the web server. A tiered system is an ideal option for securing a database.
2. Encrypt Stored Files: Stored files of a web application often contain
information about the databases the software needs to connect to. This
information, if stored in plain text, provides the keys an attacker needs to
access sensitive data.
3. Encrypt Your Backups Too: Data theft may happen as a result of an
outside attack, but oftentimes it is the people we trust most who are the
attackers. Also encrypt data in transmission.
4. Use a WAF: Employ web application firewalls. In addition to protecting a
site against cross-site scripting vulnerabilities and web site vandalism, a
good application firewall can thwart SQL injection and other attacks as
well.
5. Keep Patches Current: Web sites that are rich with third-party
applications, widgets, components and various other plug-ins and add-ons
can easily find themselves the target of an exploit that should have been
patched.
6. Minimize Use of 3rd Party Apps: Keep third-party applications to a
minimum. Many of these applications are created by hobbyists or
programmers who discontinue support for them. Unless they are
absolutely necessary, don't install them. If you do install them, keep them
patched.
7. Don't Use a Shared Server: Avoid using a shared web server if your
database holds sensitive information. A shared host exposes one to
security threats. If you have no other choice, make sure to review security
policies and speak with the host about what their responsibilities are should
your data become compromised.
8. Enable Security Controls: Enable security controls on your database.
While most databases nowadays enable security controls by default, it
never hurts to go through them and verify that this was done.
4.4. Backup and Recovery Plans
4.4.1. Backup and Recovery Plans for Database Systems
Database backup means a duplicate (redundant) copy of data is saved during a
backup session with the help of a database management system (DBMS). This copy
is available as a replacement for damaged or lost primary data. Backups represent a
mechanism of protecting and retaining important information for use whenever a
need arises.
Recovery is a process of getting system back into operation after a system
problem or failure. It may mean rebuilding, repairing, or resuscitating the
database system after a disruption, corruption, or attack on the system.
4.4.2. Plan for Backup and Recovery
Planning for database backup and recovery is a continuous process that takes
much time and effort but provides great benefits. First of all you need to figure out
what information to duplicate, how often to make backups, who should do this
task, what equipment to use, and what kind of backup you ought to do.
A specific backup procedure / policy has to be put in place to regularly copy
databases and retain them in secure digital storages.
Some basic considerations that help create database backup & recovery plan
include:
1. Data Importance: For more important and business-critical data (e.g.
client base) you will need to create a plan that involves making extra
copies of your database over the same period it is running, and ensure
that the copies can be easily restored when required. For less important
data (e.g. daily log files), you can schedule a simple plan that does not
require frequent database backup and recovery.
2. Frequency of Change: The frequency of changes influences the decision
on how often to back up and recover the database. If critical data is
modified daily then you should make a daily backup schedule. Your final
decision would also depend on hardware and software capabilities.
3. Speed: Recovery speed is an important time-related factor that
determines the maximum possible time period that could be spent
on database backup and recovery.
4. Equipment: To perform timely backups and recoveries, appropriate
software and hardware (perhaps, several sets of backup media and
devices), including optical drives, removable disk drives, special file
management utilities are needed.
5. Responsibility: Ideally, one person (e.g. IT department head) should be
appointed to control and supervise the backup and recovery plan, and
several IT specialists (e.g. system administrators) should be responsible
for performing the actual backup and recovery of data.
6. Storing: Where do you plan to store database duplicates? Options include
off-site, on-site, and cloud-based storage.
Database security requires extensive experience handling sensitive data and
current knowledge of new cyber threats.
4.4.3. Database Backup Approaches / Methods
Three common types of database backups can be run on a desired system: normal
(full), incremental and differential. A customized backup plan can minimize
downtime and maximize efficiency.
Whenever a file is created or updated, an archive bit is set in that file's
attributes; one can actually view the archive bit in the file's properties. The
archive bit receives a check mark any time the file has been updated, and
backup software uses this marker to track which files on a system are due for
archiving.
1. Normal or Full Backups: When a normal or full backup runs on a selected
drive, all the files on that drive are backed up. This, of course, includes
system files, application files, user data — everything. Those files are then
copied to the selected destination (backup tapes, a secondary drive or the
cloud), and all the archive bits are then cleared.
Normal backups are the fastest way to restore lost data because all the
data on a drive is saved in one location. The downside of normal backups
is that they take a very long time to run, and in some cases this is more
time than a company can allow. Drives that hold a lot of data may not be
capable of a full backup, even if they run overnight. In these cases,
incremental and differential backups can be added to the backup schedule
to save time.
2. Incremental Backups: A common way to deal with the long running times
required for full backups is to run them only on weekends. Many
businesses then run incremental backups throughout the week since they
take far less time. An incremental backup will grab only the files that have
been updated since the last backup, whether full or incremental. Once the
incremental backup has run, a file will not be backed up again unless it
changes or until the next full backup.
While incremental database backups do run faster, the recovery process is
a bit more complicated. If the normal backup runs on Saturday and a file is
then updated Monday morning, should something happen to that file on
Tuesday, one would need to access the Monday night backup to restore it.
For one file, that’s not too complicated. However, should an entire drive
be lost, one would need to restore the normal backup, plus each and
every incremental backup run since the normal backup.
3. Differential Backups: An alternative to incremental database backups that
has a less complicated restore process is a differential backup. Differential
backups and recovery are similar to incremental in that these backups
grab only files that have been updated since the last normal backup.
However, differential backups do not clear the archive bit. So a file that is
updated after a normal backup will be archived every time a differential
backup is run until the next normal backup runs and clears the archive bit.
In a differential backup, only selected files and folders that have a marker
are backed up. Because a differential backup does not clear markers, if
you did two differential backups in a row on a file, the file would be
backed up each time. This backup type is moderately fast at backing up
and restoring data.
Similar to our last example, if a normal backup runs on Saturday night and
a file gets changed on Monday, that file would then be backed up when
the differential backup runs Monday night. Since the archive bit will not be
cleared, even with no changes, that file will continue to be copied on the
Tuesday night differential backup and the Wednesday night differential
backup and every additional night until a normal backup runs again
capturing all the drive’s files and resetting the archive bit.
A restore of that file, if needed, could be found in the previous night’s
tape. In the event of a complete drive failure, one would need to restore
the last normal backup and only the latest differential backup. This is less
time consuming than an incremental backup restore. However, each night
that a differential backup runs, the backup files get larger and the time it
takes to run the backup lengthens.
Put another way, a differential backup stores only the data changes that
have occurred since the last full database backup. When the same data has
changed many times since the last full backup, a differential backup stores
only the most recent version of the changed data. To restore from a
differential backup, the full database backup must be restored first.
4. Daily Backups: There is a fourth, less common form of backup, known as
daily backups. This is usually reserved for mission-critical files (very
important data sets). If files that are updated constantly cannot wait a full
twenty-four hours for the nightly backup to run and capture them, daily
backups are the best choice. This type of backup uses the file's timestamp
(a time record attached to a backup indicating when the last backup was
done), not the archive bit, to decide whether a file has changed. This type
of database backup runs during business hours, and having too many of
these files can impact network speeds and overall performance.
5. Transaction Log Backup: This backs up all events that have occurred in
the database, that is, a record of every single statement executed. It is a
backup of the transaction log entries and contains all transactions that
have happened to the database. Through this, the database can be
recovered to a specific point in time. It is even possible to perform a
backup from a transaction log if the data files are destroyed, so that not
even a single committed transaction is lost.
                     Full                  Differential                      Incremental
Storage Space        High                  Medium to High                    Low
Backup Speed         Slowest               Fast                              Fastest
Restoration Speed    Fastest               Fast                              Slowest
Media Required for   Most recent backup    Most recent full backup and       Most recent full backup and
Recovery             only                  most recent differential backup   all incremental backups
                                                                             since full backup
Duplication          Stores a lot of       Stores duplicate files            No duplicate files
                     duplicate files
4.5. Database Maintenance
As well as supervising the regime that is designed to provide resilience and high
availability, and ensuring that scheduled processes complete successfully,
maintenance involves fixing application defects, applying hot fixes for missing or
inappropriate indexes, reporting software bugs, determining temporary
workarounds, and managing the addition of new functions to the application.
Once the database system is implemented, the operational maintenance phase of
the database system begins. The operational maintenance is the process of
monitoring and maintaining the database system. Maintenance includes activities
such as adding new fields, changing the size of existing fields, adding new tables,
and so on. As the database system requirements change, it becomes necessary to
add new tables or remove existing tables and to reorganize some files by changing
primary access methods or by dropping old indexes and constructing new ones.
Some queries or transactions may be rewritten for better performance. Database
tuning or reorganization continues throughout the life of database and while the
requirements keep changing.
4.5.1. Configuration Management
CM applied over the life cycle of a system provides visibility and control of its
performance, functional, and physical attributes. CM verifies that a system
performs as intended, and is identified and documented in sufficient detail to
support its projected life cycle. The CM process facilitates orderly management of
system information and system changes for such beneficial purposes as to revise
capability; improve performance, reliability, or maintainability; extend life; reduce
cost; reduce risk and liability; or correct defects. The relatively minimal cost of
implementing CM is returned many fold in cost avoidance. The lack of CM, or its
ineffectual implementation, can be very expensive and sometimes can have
catastrophic consequences such as failure of equipment or loss of life.
It identifies the functional and physical attributes of software at various points in
time, and performs systematic control of changes to the identified attributes for
the purpose of maintaining software integrity and traceability throughout the
software development life cycle.
4.5.2. Types of Maintenance
Various forms of database maintenance procedures are required to ensure that
the database continues to provide expected services.
1. Corrective Maintenance: This type of maintenance implies removing errors
in a program, which might have crept into the system due to faulty design or
wrong assumptions. Corrective measures aim to restore a system
after a disaster or otherwise unwanted event takes place. These measures
focus on fixing or restoring the systems after a disaster. Corrective measures
may include keeping critical documents in the Disaster Recovery Plan or
securing proper insurance policies, after a "lessons learned" brainstorming
session.
2. Detective Measures: Detective measures are taken to discover the presence
of any unwanted events within the IT infrastructure. Their aim is to uncover
new potential threats. They may detect or uncover unwanted events. These
measures include installing fire alarms, using up-to-date antivirus software,
holding employee training sessions, and installing server and network
monitoring software.
3. Preventive Measures: Preventive measures will try to prevent a disaster
from occurring. These measures seek to identify and reduce risks. They are
designed to mitigate or prevent an event from happening. These measures
may include keeping data backed up and off site, using surge protectors,
installing generators and conducting routine inspections.
4. Adaptive Maintenance: In adaptive maintenance, program functions are
changed to enable the information system to satisfy the information needs
of the user. This type of maintenance may become necessary because of
organizational changes which may include:
i) Change in the organizational procedures,
ii) Change in organizational objectives, goals, policies, etc.
iii) Change in forms,
iv) Change in information needs of managers.
v) Change in system controls and security needs, etc.
5. Perfective Maintenance: Perfective maintenance means adding new
programs or modifying existing programs to enhance the performance
of the information system. This type of maintenance is undertaken to respond
to users' additional needs, which may be due to changes within or
outside the organization. Outside changes are primarily environmental
changes which may, in the absence of system maintenance, render the
information system ineffective and inefficient. These environmental changes
include:
i) Changes in governmental policies, laws, etc.,
ii) Economic and competitive conditions, and
iii) New technology.
4.6. Database Monitoring
Database Activity Monitors capture and record, at a minimum, all Structured Query
Language (SQL) activity in real time or near real time, including database
administrator activity, across multiple database platforms; and can generate alerts
on policy violations.
The most significant database security component is activity monitoring, or what are
commonly called database activity monitoring (DAM) platforms. They capture all
SQL activity to the database -- including administrative actions -- and analyze the
statements for behavioural, contextual or security misuse. These tools can detect
and alert on a wide variety of threats, and most have the capability of blocking
certain statements -- though few organizations use this blocking capability.
While a number of tools can monitor various levels of database activity, Database
Activity Monitors are distinguished by five features:
1. The ability to independently monitor and audit all database activity, including
administrator activity and SELECT transactions. Tools can record all SQL
transactions: DML, DDL, DCL, (and sometimes TCL) activity.
2. The ability to store this activity securely outside the database.
3. The ability to aggregate and correlate activity from multiple heterogeneous
Database Management Systems (DBMSs). Tools can work with multiple
DBMSs (e.g., Oracle, Microsoft, IBM) and normalize transactions from
different DBMSs despite differences between SQL flavours.
4. The ability to enforce separation of duties on database administrators.
5. The ability to generate alerts on policy violations.
The reason most organizations roll out DAM into their security arsenal is not just for
the ability to detect threats, but because it is the best way to collect an accurate trail
of events for regulatory reporting and to provide data and data filtering options not
available with built-in database audit logs.
Proactively monitoring databases is one of the best ways to ensure a smooth long-
term operation with minimal downtime and predictable costs. By using monitoring
tools, businesses can track the performance of both hardware and software by
taking frequent snapshots of performance indicators. This highlights any changes,
identifies bottlenecks, and pinpoints the exact moments at which problems started
to occur. With this information to hand, businesses are then able to rule out
potential causes, and get to the real root of the issue.
Broadly speaking, database monitoring consists of three layers:
1. The operating system and the hardware: This layer inspects input,
output, memory utilisation, network utilisation, Central Processing Unit
(CPU), physical disk space, and component status.
2. The server software: This examines Data Manipulation Language (DML)
and Data Definition Language (DDL) statements in logs, wait time, users,
objects, whether backups run successfully, and whether replication is
running.
3. Individual SQL statements or queries: This examines throughput,
latency, concurrency/load, and how errors are being dealt with.
4.6.1. Types of monitoring tools
Many different database monitoring tools exist, but the best solution depends
entirely on an organisation’s needs. For example, some businesses might want to
put a focus on addressing database problems in real-time, some might want a
monitoring tool that boasts a sizeable SQL monitoring ability, and others might be
looking for a combination of network, server and applications monitoring. Before
investing in these tools, businesses should consider their specific environment and
install the applications that best suit their requirements.
Database monitoring tools often fall into four categories:
1. General purpose monitoring: General purpose monitoring tools offer a
little bit of everything. These tools monitor a wide range of components:
servers, databases, services, and sometimes networking. General purpose
monitoring capabilities typically scrape status metrics from the server and
store them in a time series database for charting and trending. They often
have the ability to send alerts based on thresholds, and shine when you
want to use one tool to monitor as many things as possible.
2. Reporting and administration: Reporting and administration tools focus
on the needs of database administrators or operations teams, who need
to report on database activity or manipulate databases. Needs can vary
widely, such as trying to develop a specific report or make administrative
changes that can result in adding users, data types, or modifying objects
like tables, stored procedures, schema and indexes. These tools allow for
ad hoc interactions, like querying the database, to run a report or edit the
data directly in a table.
3. Query performance: Query performance monitoring is important for
optimising queries/SQL statements and the broader user experience
associated with an application. The totality of the different types of
requests that users and applications send to the database, together with
background activity such as vacuuming and backups, adds up to the
resulting performance of the database.
Different tools specialise in different aspects of query performance. Some
tools are useful for proactively diagnosing performance problems. Others
can diagnose slow running queries or analyse log files.
4. Health checks and alerting: Running regular health checks and setting up
alerts is a critical database monitoring function. This can be done by
installing programmes with plugins that execute quick status checks and
report on whether something is okay, whether it has a warning, or if it’s in
a critical condition. Some health-check tools also incorporate general
purpose monitoring with graphing and charting to provide further insight
into the ‘health’ of the database.
4.6.2. What Database Metrics should be monitored?
We will now list a set of generic categories. Under each category, we will list a few
types of database metrics you should consider monitoring. This is not an
exhaustive list, but we emphasize these because together, they paint a complete
picture of the database environment.
4.6.2.1. Infrastructure
Infrastructure should be part of any database monitoring. The metrics
should include:
 Percent CPU time used by the database process
 Available memory
 Available disk space
 Disk queue for waiting IO
 Percent virtual memory use
 Network bandwidth for inbound and outbound traffic
If the metrics are above or below the acceptable threshold, we recommend
relating the figures to database metrics as well. That is because hardware or
network-related events like a full disk or a saturated network can manifest as
poor query performance. Similarly, a database-specific issue like a blocked
database query can show up as high CPU use.
4.6.2.2. Availability
The next thing to monitor is database availability. That is because you want
to ensure the database is available and accessible before looking at any
other counters. It also saves you from customers complaining before you
find out about an outage. Metrics can include:
 Accessibility of the database node(s) using common protocols like
Ping or Telnet
 Accessibility of the database endpoint and port (e.g. 3306 for
MySQL, 5432 for PostgreSQL, etc.)
 Failover events for master nodes or upgrade events for slave/peer
nodes in multi-node clusters
4.6.2.3. Throughput
Throughput should be measured to create normal production performance
baselines. The actual metrics for this category will vary between different
database platforms, but there are some common ones.
 Connection wait time for database endpoints
 Number of active database connections
 Number of read queries received or in progress
 Number of insert, update, or delete commands received or in
progress
 Average time to complete a read query
 Average time to complete insert, update or delete commands
 Replication lag between primary and secondary nodes
 Number of completed transactions
 Percent growth of data and transaction log size
 Percent of times in-memory data cache is accessed
 Heap memory used
To create performance baselines, throughput metrics should be collected
during different workload periods and reported on a specific time scale
(e.g., per minute). The collection process should repeat a number of times.
For example, collecting metrics during month-end batch processing or Black
Friday sale events over three to four cycles can provide insight into a
system’s health during those periods. These may be different from after-
hours operations or weekday sales events.
As baselines are built over time, they can be used to create acceptable
thresholds for alarms. Any large deviation from usual values would then
need investigation.
As an example, a dashboard for an RDS MySQL cluster displaying some of
the throughput metrics makes it easy to see any sudden spikes or dips from
the normally trending values.
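On MySQL, for instance, several of these throughput counters can be sampled directly from server status variables (a sketch; the exact variable names vary between versions):

```sql
-- Number of active database connections.
SHOW GLOBAL STATUS LIKE 'Threads_connected';

-- Read queries and write commands received since startup; sampling these
-- periodically and taking differences yields per-minute throughput.
SHOW GLOBAL STATUS LIKE 'Com_select';
SHOW GLOBAL STATUS LIKE 'Com_insert';
SHOW GLOBAL STATUS LIKE 'Com_update';
SHOW GLOBAL STATUS LIKE 'Com_delete';

-- Replication lag on a replica node (inspect the Seconds_Behind_Source column).
SHOW REPLICA STATUS;
```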
4.6.2.4. Performance
Like throughput, performance counters will vary between different
databases and should be reported in a specific time scale. These metrics can
indicate potential bottlenecks and we recommend creating baselines for
these as well. The common ones include:
 Number of read or write queries currently waiting or blocked
 Percent of times disk-based virtual memory is accessed
 Number of database lock timeouts
 Number of deadlocks
 Queries running slower than a set threshold
 Warnings raised for out-of-date statistics or unusable indexes
 Skewed data distribution in nodes
 Application traces
A good monitoring tool should allow you to drill down on reported metrics.
For example, a query plan should be “clickable” to further expose the
indexes or joins chosen by the query optimizer. This type of performance
drill down is typically best done by monitoring tools that ship with the
database product.
Sometimes third-party monitoring tools can also show these in an easy-to-
understand fashion.
A good monitoring tool should allow you to create composite metrics from
those that are available.
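As an illustration, MySQL again exposes several of these performance counters, and EXPLAIN provides the kind of drill-down described above (counter names are version-dependent; the STUDENT table is hypothetical):

```sql
-- Queries running slower than the configured threshold.
SHOW GLOBAL STATUS LIKE 'Slow_queries';
SHOW VARIABLES LIKE 'long_query_time';  -- the threshold itself, in seconds

-- Lock waits and deadlocks (InnoDB).
SHOW GLOBAL STATUS LIKE 'Innodb_row_lock_waits';

-- Drill down on a specific statement's plan: which indexes and joins
-- the query optimizer has chosen.
EXPLAIN SELECT * FROM STUDENT WHERE surname = 'Moyo';
```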
4.6.2.5. Scheduled Tasks
Databases often run repetitive tasks as scheduled “jobs”. Some systems like
Microsoft SQL Server or Oracle have built-in job scheduling facilities, while
others use cron or third-party schedulers. Some examples of scheduled jobs
include:
 Full and incremental database backups
 Database maintenance tasks like vacuuming, reindexing, analyzing
and updating statistics, database integrity checks, log rotation,
compaction, etc.
 Application-specific tasks like nightly data loads and exports,
archiving, etc.
Regardless of function, scheduled tasks’ outcomes (success or failure) need
to be monitored.
4.6.2.6. Security
Database security monitoring has to be aligned with enterprise-wide
security initiatives and goals. At a minimum, we recommend monitoring the
following:
 Number of failed login attempts
 Database configuration change events
 New user account creation
 Password changes
DBAs don’t need to monitor each individual event, but it’s still important to
look at the aggregated values of these events. All these metrics should be
very small during normal operation. It should become a concern only when
there are large spikes in those aggregated figures. For example, hundreds of
failed login attempts should trigger an alarm. While this may not necessarily
mean an intrusion, it still requires attention.
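For example, MySQL keeps an aggregated count of failed connection attempts as a status counter, and finer-grained events can be pulled from an audit table where one is configured (the audit_log table and its columns below are hypothetical):

```sql
-- Aggregate count of failed connection attempts since server start.
SHOW GLOBAL STATUS LIKE 'Aborted_connects';

-- Hypothetical audit table: failed logins per account over the last day;
-- large spikes in these figures warrant an alarm.
SELECT user_name, COUNT(*) AS failed_attempts
FROM audit_log
WHERE event_type = 'LOGIN_FAILED'
  AND event_time > NOW() - INTERVAL 1 DAY
GROUP BY user_name
HAVING COUNT(*) > 100;
```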
4.6.2.7. Logs
Every database engine has some type of log where it records information.
This log can be made up of one or more physical files. A monitoring tool
should be able to collect, parse, and store these logs and create metrics and
dashboards from the events they expose. Log management is one of the
core requirements of database monitoring because logs can contain
invaluable information like:
 Database system events (startup, shutdown, errors, etc.)
 All user and system queries
 Scheduled jobs’ outputs



4.7. Chapter 4 Questions
1. List four (4) activities that are done in database administration. [4]
2. Explain the four (4) activities listed in 1 above. [8]
3. What is needed for one to be an effective DBA? [10]
4. How does a DBMS assist the DBA in his / her work? [10]
5. Give four (4) reasons why data must be secured. [12]
6. Explain the following pillars of security:
a. Confidentiality [3]
b. Integrity [3]
c. Availability [3]
d. Non-repudiation [3]
7. Describe three (3) factors that make it difficult to implement foolproof
security systems for enterprise data. [9]
8. Briefly write on the following terms:
a. Privacy [2]
b. Management Control [2]
c. Logical data independence [2]
d. Physical data independence [2]
e. Data Consistency [2]
9. Explain five (5) causes or factors that may result in database failure. [15]
10. How do the following help to secure database systems?
a. Assignment of privileges [5]
b. Encryption [5]
c. Firewalling [5]
d. Auditing [5]
e. Training [5]
f. Lifecycle-control [5]
g. Physical Controls [5]
h. Maintenance [5]
i. Compliance to regulations [5]
j. Authentication and Authorization [5]
11. Explain any two (2) tools that can be used to secure a database system. [10]
12. Separation of concerns, patching systems, making regular backups, enabling
security controls and minimizing the use of third party applications are
additional processes that can be used to secure database systems.
Explain the underlined terms. [20]



13. Discuss the following three (3) database backup procedures.
a. Online, off-site, once-a-week backup [6]
b. Online, on-site end-of-day backup [6]
c. Real-time cloud-based backup [6]
14. Relate and differentiate database backup and database recovery. [8]
15. Explain the following database backup methods:
a. Normal / Full backup [4]
b. Incremental backup [4]
c. Differential backup [4]
d. Transaction log backup [4]
16. What purpose does configuration management serve in database
environments? [6]
17. Explain the following types of maintenance that can be applied to a
database system:
a. Corrective Maintenance [4]
b. Preventive Maintenance [4]
c. Adaptive Maintenance [4]
d. Perfective Maintenance [4]
18. Give and explain three reasons why database monitoring is important. [9]
19. Explain what is involved in the following monitoring procedures:
a. Infrastructural monitoring [4]
b. Availability monitoring [4]
c. Throughput monitoring [4]
d. Performance monitoring [4]
e. Security monitoring [4]
f. Log monitoring [4]
20. Security of information systems must only be the responsibility of the IT
Department. Discuss. [25]



CHAPTER 5

TRANSACTION PROCESSES



5. TRANSACTION PROCESSES

Objectives:
1. Define Transaction Processes
2. Outline (desirable) Properties of a transaction

5.1. Define Transaction Processes


The concept of a transaction has become central to many database applications.
5.1.1. A Transaction:

1. A transaction identifies an elementary unit of work carried out by an
application. A system that makes available mechanisms for the definition
and execution of transactions is called a transaction processing system.
2. A transaction can be defined syntactically: each transaction, irrespective of
the language in which it is written, is enclosed within two commands:
 Begin transaction (abbreviated to bot) and
 End transaction (abbreviated to eot)
A withdrawal, deposit, transfer, balance enquiry, and opening an account are all
transactions that can be carried out in a banking system.
A transaction (an activity, task, or piece of work) occurs through a number of sub-
transactions or sub-processes or sub-activities.
5.1.2. Transaction Process

A transaction process can be variously defined as:


 A transaction process is an executing program or process that includes
one or more database accesses, such as reading or updating of database
records.
 A transaction process is a set of logically related operations. For example,
transferring money from your bank account to your friend’s account is a
set of operations that constitute a transaction process.
 A transaction process is a series of activities done to complete a unit of
work that should be processed reliably without interference from other
users and without loss of data due to failures.
Within the transaction code (program statements), two particular instructions can
appear:
 Commit work and
 Rollback work,
to which we will make frequent reference using the two terms commit and abort,
which indicate the action associated with the respective instructions.
The effect of these two commands is crucial for the outcome of the transaction:
 The transaction will be completed successfully only following a commit
command.
 No tangible effect will be shown on the database as the result of an abort
command.
A DBMS ensures that transaction processes are free of interference from other
users, that parts of a transaction are not lost due to a failure, and that
transaction processes do not leave the database inconsistent.
Transaction processing and database tuning are most prominent on DBMSs that
support large (enterprise) databases with many simultaneous users.
For example a transaction to transfer money from one account to another is given in
the following code:
begin transaction
  x := x - 10;
  y := y + 10;
  if (error)
    abort work;
  else
    commit work;
end transaction
We can interpret the above transaction as a bank operation to transfer a sum from
account x to account y. The transaction code shown in the example provides an
abstract description of the transaction, which in reality corresponds to a much more
complex section of code, and which could be written, for example, in SQL.
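As one concrete sketch of the same transfer, using Python's built-in sqlite3 module (the accounts table and its starting balances are invented for illustration; real banking code would be far more involved):

```python
import sqlite3

# Sketch of the abstract transfer transaction above, expressed with
# Python's built-in sqlite3 module. The accounts table and its
# contents are invented for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER)")
conn.execute("INSERT INTO accounts VALUES ('x', 100), ('y', 50)")
conn.commit()

try:
    # begin transaction: sqlite3 opens one implicitly on the first write
    conn.execute("UPDATE accounts SET balance = balance - 10 WHERE name = 'x'")
    conn.execute("UPDATE accounts SET balance = balance + 10 WHERE name = 'y'")
    conn.commit()      # commit work: make both updates permanent
except sqlite3.Error:
    conn.rollback()    # rollback work: no tangible effect on the database

print(dict(conn.execute("SELECT name, balance FROM accounts")))
# {'x': 90, 'y': 60}
```

Either both updates become permanent at the commit, or, if anything fails before it, the rollback leaves both balances untouched.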
A transaction must be well-formed. A transaction is described as well-formed if it
fulfils the following conditions:
 it begins its execution with begin transaction,
 ends with end transaction,
 and includes in every possible execution only one of the two commands,
commit work or rollback work.
Further, no update or modification operations are carried out following the
execution of the commit work or rollback work command.
In some transactional interfaces, a pair of commands:
 begin transaction
 and end transaction,
are immediately and implicitly carried out after each commit or abort, to render
all the transactional computations well-formed.
From now on, we will assume that all the programs for the modification of the
contents of a database are well-formed.
Example: Simple Transaction
1. Read your account balance
2. Deduct the amount from your balance
3. Write the remaining balance to your account
4. Read your friend’s account balance
5. Add the amount to his account balance
6. Write the new updated balance to his account
This set of operations is called a transaction. The transaction above contains
read, write, and update operations, but in general a transaction may include
read, write, insert, update, and delete operations.



In DBMS notation, we write the above 6-step transaction like this:
Let’s say your account is A and your friend’s account is B, you are transferring 10000
from A to B, the steps of the transaction are:
1. R(A); read (retrieve) account A
2. A = A - 10000; subtract 10000 from account A
3. W(A); write (save) the new balance of A
4. R(B); read (retrieve) account B
5. B = B + 10000; Current B value plus 10000 (value) from A
6. W(B); write (save) B
In the above transaction R refers to the read operation and W refers to the write
operation.
Transaction failure in between the operations
Transactions can have problems associated with them.
The main problem that can happen during a transaction is that the transaction can
fail before finishing all the operations in the set.
This can happen due to power failures and system crashes. This is a serious
problem that can leave the database in an inconsistent state. Assume the
transaction fails after the third operation (refer to the example above): the
amount would be deducted from your account, but your friend would not receive it.
To solve this problem, we have the following two operations:
1. Commit: If all the operations in a transaction are completed successfully,
then commit those changes to the database permanently.
2. Rollback: If any of the operations fails, then roll back all the changes
made by previous operations.
From the expressive power aspect, the rollback work instruction is very powerful, in
that through this, the database user can cancel the effects of the work carried out
during the transaction, irrespective of its complexity.
For transactions to be able to maintain a database in a consistent state, each
transaction must satisfy ACID properties.
The next section looks at the ACID properties to which a transaction must conform.



5.2. Outline Properties (ACID)
Each transaction is supposed to execute a logically correct database access if
executed in its entirety without interference from other transactions.
The DBMS must enforce several transaction properties. The enforcement of the
transaction properties aims to ensure database integrity.
5.2.1. DBMS Transaction States

A transaction in DBMS can be in one of the following states.


DBMS Transaction States Diagram

1. Active State: As discussed in the introduction, a transaction is a
sequence of operations. While a transaction is executing, it is said to be
in the active state; it does not matter which step is currently in
execution, for as long as the transaction is executing it remains in the
active state.
2. Failed State: If a failure occurs while a transaction is executing,
whether a hardware failure or a software failure, the transaction goes
into the failed state from the active state.
3. Partially Committed State: As the diagram above shows, a transaction
moves from the active state into the "partially committed" state once all
of its read and write operations have executed successfully. At this point
the read and write operations have been performed on main memory (local
memory) rather than on the actual database.
The reason for this state is that a transaction can fail during execution;
if the changes were made directly to the actual database instead of local
memory, a failure could leave the database in an inconsistent state. This
state makes it possible to roll back the changes in case of a failure
during execution.
4. Committed State: If a transaction completes the execution successfully
then all the changes made in the local memory during partially
committed state are permanently stored in the database. You can also see
in the above diagram that a transaction goes from partially committed
state to committed state when everything is successful.



5. Aborted State: As we have seen above, if a transaction fails during
execution it goes into the failed state. The changes made in local memory
(or buffer) are rolled back to the previous consistent state, and the
transaction moves from the failed state into the aborted state. Refer to
the diagram above to see the interaction between the failed and aborted
states.
Even though these operations can help us in avoiding several issues that
may arise during a transaction process, they are not sufficient to cater for
various situations when two transactions are running concurrently. To
handle those problems we need to understand database ACID Properties.
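The legal transitions between the states described above can be sketched as a small adjacency map. The state names follow the text; the inclusion of a partially-committed-to-failed edge follows the usual textbook diagram, and the encoding itself is only an illustration:

```python
# Sketch: transaction state transitions as an adjacency map.
# State names follow the text; the partially_committed -> failed edge
# is assumed from the standard textbook diagram.
TRANSITIONS = {
    "active": {"partially_committed", "failed"},
    "partially_committed": {"committed", "failed"},
    "failed": {"aborted"},
    "committed": set(),   # terminal: commits are final
    "aborted": set(),     # terminal
}

def can_move(src, dst):
    """True if a transaction may move directly from src to dst."""
    return dst in TRANSITIONS.get(src, set())

print(can_move("active", "partially_committed"))  # True
print(can_move("committed", "aborted"))           # False: commits are final
```

The two empty sets capture the key guarantee: once a transaction is committed or aborted, no further transition is possible.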
5.2.2. ACID Properties (Atomicity, Consistency, Isolation, and Durability)

In computer science, ACID (Atomicity, Consistency, Isolation, and Durability) is a


set of properties of database transactions. In the context of databases, a sequence
of database operations that satisfies the ACID properties and, thus, can be
perceived as a single logical operation on the data, is called a transaction. For
example, a transfer of funds from one bank account to another, even involving
multiple changes such as debiting one account and crediting another, is a single
transaction.
The characteristics of these four properties as defined by Reuter and Härder are as
follows:
1. Atomicity: Atomicity requires that each transaction be "all or nothing": if
one part of the transaction fails, then the entire transaction fails, and the
database state is left unchanged. An atomic system must guarantee
atomicity in each and every situation, including power failures, errors, and
crashes. To the outside world, a committed transaction appears (by its
effects on the database) to be indivisible ("atomic") and an aborted
transaction does not happen.
2. Consistency: The consistency property ensures that any transaction will
bring the database from one valid state to another. Any data written to
the database must be valid according to all defined rules, including
constraints, cascades, triggers, and any combination thereof. This does not
guarantee correctness of the transaction in all ways the application
programmer might have wanted (that is the responsibility of application-
level code), but merely that any programming errors cannot result in the
violation of any defined rules.
3. Isolation: The isolation property ensures that the concurrent execution of
transactions results in a system state that would be obtained if
transactions were executed sequentially, i.e., one after the other.
Providing isolation is the main goal of concurrency control. Depending on
the concurrency control method (i.e., if it uses strict - as opposed to
relaxed - serializability), the effects of an incomplete transaction might not
even be visible to another transaction.
4. Durability: The durability property ensures that once a transaction has
been committed, it will remain so, even in the event of power loss,
crashes, or errors. In a relational database, for instance, once a group of
SQL statements execute, the results need to be stored permanently (even
if the database crashes immediately thereafter). To defend against power
loss, transactions (or their effects) must be recorded in a non-volatile
memory.



5.2.3. Example illustration on ACID properties

The following examples further illustrate the ACID properties. In these examples,
the database table has two columns, A and B. An integrity constraint requires that
the values in A and in B must sum to 100. The following SQL code creates a table
as described above:
CREATE TABLE acidtest (A INTEGER, B INTEGER, CHECK (A + B = 100));
1. Atomicity Failure: In database systems, atomicity is one of the ACID
transaction properties. In an atomic transaction, a series of database
operations either all occur, or nothing occurs. The series of operations
cannot be divided apart and executed partially from each other, which
makes the series of operations "indivisible", hence the name. A guarantee of
atomicity prevents updates to the database occurring only partially, which
can cause greater problems than rejecting the whole series outright. In
other words, atomicity means indivisibility and irreducibility.
2. Consistency Failure: Consistency is a very general term, which demands that
the data must meet all validation rules. In the previous example, the
validation is a requirement that A + B = 100. Also, it may be inferred that
both A and B must be integers. A valid range for A and B may also be
inferred. All validation rules must be checked to ensure consistency. Assume
that a transaction attempts to subtract 10 from A without altering B. Because
consistency is checked after each transaction, it is known that A + B = 100
before the transaction begins. If the transaction removes 10 from A
successfully, atomicity will be achieved. However, a validation check will
show that A + B = 90, which is inconsistent with the rules of the database.
The entire transaction must be cancelled and the affected rows rolled back to
their pre-transaction state. If there had been other constraints, triggers,
or cascades, every single change operation would have been checked in the
same way as above before the transaction was committed.
3. Isolation Failure: To demonstrate isolation, we assume two transactions
execute at the same time, each attempting to modify the same data. One of
the two must wait until the other completes in order to maintain isolation.
Consider two transactions. T1 transfers 10 from A to B. T2 transfers 10 from
B to A. Combined, there are four actions:
T1 subtracts 10 from A.
T1 adds 10 to B.
T2 subtracts 10 from B.
T2 adds 10 to A.
If these operations are performed in order, isolation is maintained, although
T2 must wait.
Consider what happens if T1 fails halfway through. The database eliminates
T1's effects, and T2 sees only valid data.
By interleaving the transactions, the actual order of actions might be:
T1 subtracts 10 from A.
T2 subtracts 10 from B.



T2 adds 10 to A.
T1 adds 10 to B.
Again, consider what happens if T1 fails halfway through. By the time T1
fails, T2 has already modified A; it cannot be restored to the value it had
before T1 without leaving an invalid database. This is known as a write-write
failure, because two transactions attempted to write to the same data
field. In a typical system, the problem would be resolved by reverting to the
last known good state, cancelling the failed transaction T1, and restarting
the interrupted transaction T2 from the good state.
4. Durability Failure: Consider a transaction that transfers 10 from A to B. First
it removes 10 from A, then it adds 10 to B. At this point the user is told the
transaction was a success; however, the changes are still queued in the disk
buffer waiting to be committed to disk. Power fails and the changes
are lost. The user assumes (understandably) that the changes persist, yet
they have in fact been lost: a durability failure.
5. Implementation: Processing a transaction often requires a sequence of
operations that is subject to failure for a number of reasons. For instance,
the system may have no room left on its disk drives, or it may have used up
its allocated CPU time. There are two popular families of techniques,
write-ahead logging and shadow paging. In both cases, locks must be
acquired on all information to be updated and, depending on the level of
isolation, possibly on all data that will be read as well.
a. In write-ahead logging, atomicity is guaranteed by copying the
original (unchanged) data to a log before changing the database.
That allows the database to return to a consistent state in the event
of a crash, because original data was not changed in the first
instance.
b. In shadowing, updates are applied to a partial copy of the database,
and the new copy is activated when the transaction commits.
6. Locking versus Multi-versioning: Many databases rely upon locking to provide
ACID capabilities. Locking means that the transaction marks the data that it
accesses. This results in the DBMS not allowing other transactions to modify it
until the first transaction succeeds or fails. The lock must always be acquired
before processing data, including data that is read but not modified.
Non-trivial transactions typically require a large number of locks, resulting in
substantial overhead as well as blocking other transactions.
For example, if user A is running a transaction that has to read a row of data
that user B wants to modify, user B must wait until user A's transaction
completes. Two phase locking is often applied to guarantee full isolation. An
alternative to locking is multi-version concurrency control, in which the
database provides each reading transaction the prior, unmodified version of
data that is being modified by another active transaction. This allows readers
to operate without acquiring locks, i.e. writing transactions do not block
reading transactions, and readers do not block writers. Going back to the
example, when user A's transaction requests data that user B is modifying, the
database provides A with the version of that data that existed when user B
started his transaction. User A gets a consistent view of the database even if
other users are changing data. One implementation, namely snapshot
isolation, relaxes the isolation property.



7. Distributed transactions: Guaranteeing ACID properties in a distributed
transaction across a distributed database, where no single node is responsible
for all data affecting a transaction, presents additional complications. Network
connections might fail, or one node might successfully complete its part of the
transaction and then be required to roll back its changes because of a failure
on another node. The two-phase commit protocol (not to be confused with
two-phase locking) provides atomicity for distributed transactions to ensure
that each participant in the transaction agrees on whether the transaction
should be committed or not. Briefly, in the first phase, one node (the
coordinator) interrogates the other nodes (the participants) and only when all
reply that they are prepared does the coordinator, in the second phase,
formalize the transaction.
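The acidtest table from the example above can be exercised with Python's built-in sqlite3 module: a transaction that subtracts 10 from A without altering B violates the CHECK rule and is rolled back, leaving A + B = 100 (the starting values of 40 and 60 are invented for illustration):

```python
import sqlite3

# Sketch: exercising the A + B = 100 constraint from the example above
# with Python's built-in sqlite3 module. Starting values are invented.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE acidtest (A INTEGER, B INTEGER, CHECK (A + B = 100))")
conn.execute("INSERT INTO acidtest VALUES (40, 60)")
conn.commit()

try:
    # Subtract 10 from A without altering B: violates the CHECK rule.
    conn.execute("UPDATE acidtest SET A = A - 10")
    conn.commit()
except sqlite3.IntegrityError:
    conn.rollback()  # the entire transaction is cancelled

a, b = conn.execute("SELECT A, B FROM acidtest").fetchone()
print(a, b)  # 40 60: the database was left in a consistent state
```

The failed update has no tangible effect: atomicity rejects the partial change outright, and consistency guarantees the constraint still holds afterwards.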
5.2.4. Transactional Integrity

Transactional integrity implies that a transaction exhibits the ACID properties:
any transaction should be atomic, consistent, isolated, and durable. The quality of
a database product is measured by its transactions’ adherence to the ACID
properties:
1. Atomic — all or nothing. Atomic transactions are such that the
transaction is either entirely completed or makes no change to the
database; even if an error or a hardware fault occurs mid-transaction the
database will not be left with a half-completed transaction.
2. Consistent — the database begins and ends the transaction in a
consistent state. Consistent transactions ensure that the database is left
in a consistent state after the transaction is complete, meaning that any
integrity constraints (unique keys, foreign keys, and CHECK constraints)
must be satisfied or the transaction will be rejected.
3. Isolated — one transaction does not affect another transaction.
Isolated transactions are invisible to other users of the database while
they are being processed.
4. Durable — once committed always committed. Durable transactions
guarantee that they will not be rolled back after the caller has committed
them.

ACID – a set of properties that define the quality of database transactions

5.3. Chapter 5 Questions
1. Define a transaction process. [3]
2. Describe a “well-formed” transaction. [5]
3. Explain the following terms:
a. Commit Transaction [2]
b. Rollback transaction [2]
c. Abort work [2]
d. Atomicity [2]
e. Consistency [2]
f. Isolation [2]
g. Durability [2]
4. List four (4) examples of simple transactions. [4]
5. Explain two (2) reasons why a transaction may fail. [4]
6. Using examples, explain the following terms:
a. Atomicity failure [3]
b. Consistency failure [3]
c. Isolation failure [3]
d. Durability failure [3]
7. Differentiate write-ahead logging from shadowing. [4]
8. Explain three (3) advantages and four (4) challenges that face an
organisation that has implemented a database system for its operations. [14]
9. Cite two situations that may result in an Abort of a transaction process. [4]
10. How can a multi-user database system control concurrency? [6]
11. Differentiate referential integrity from domain integrity. [6]
12. Compare and contrast the two database management process:
a. Logical controls [5]
b. Physical controls [5]
13. Discuss the impact of data breaches. [25]
14. How is Availability guaranteed in a database system? [6]
15. Differentiate a “cold backup” from “hot backup”. [6]
16. Explain the limitations of manual database monitoring. [5]
17. Explain two consequences of developing a database system without
using any specific model. [6]
18. Is it ideal to implement either an automated or a manual database
monitoring system exclusively? Support your answer. [6]
19. Describe three (3) tests that must be carried out before the
implementation of a database system. [6]
20. List and explain the type of relationships that can be created
between entities. [8]

6. References
1. Date, C, J. (2004). An Introduction to database systems (8th ed.). Harlow: Pearson
Education.
2. Ramakrishnan, R. & Gehrke, J. (2003). Database Management Systems (3rd ed.). New York: McGraw-Hill.
3. Kroenke, D. M. & Auer, D. J. (2015). Database Concepts (7th ed.). Harlow: Pearson
Education.
4. Silberschatz, A., Korth, H.F. & Sudarshan, S. (2011). Database System Concepts. New
York: McGraw-Hill.
5. Garcia-Molina, H., Ullman, J. D. & Widom, J. Database Systems: The Complete Book.
6. Data Security - Data Admin and Database Admin
7. Database Backup and Recovery
8. Database Activity Monitoring Whitepaper
9. Database Concepts
10. Introduction to Relational Databases
11. Database Normalization - Mohua Sarkar
12. Transaction processing systems

i
Network transparency is the situation in which an operating system or other service allows a user to access a
resource (such as an application program or data) without the user needing to know, and usually not being aware
of, whether the resource is located on the local machine (i.e., the computer which the user is currently using) or
on a remote machine (i.e., a computer elsewhere on the network).
ii
Fragmentation transparency enables users to query any table as if it were unfragmented. Thus, it hides
the fact that the table the user is querying is actually a fragment or a union of fragments. It also conceals
the fact that the fragments are located at diverse sites.
iii
Replication transparency is the term used to describe the fact that the user should be unaware that data
is replicated.

The Data Dictionary / System Catalogue


A data dictionary is a software module and database containing descriptions and definitions concerning the
structure, data elements, interrelationships, and other characteristics of an organization's database.
Data dictionaries store the following information about the data maintained in databases:
1. Schema, subschemas, and physical schema
2. Which applications and users may retrieve the specific data and which applications and users are able
to modify the data
3. Cross-reference information, such as which programs use what data and which users receive what
reports
4. Where individual data elements originate, and who is responsible for maintaining the data
5. What the standard naming conventions are for database entities
6. What the integrity rules are for the data
7. Where the data are stored in geographically distributed databases.
A data dictionary:
1. Contains all the data definitions, and the information necessary to identify data ownership
2. Ensures security and privacy of the data, as well as the information used during the development and
maintenance of applications which rely on the database.
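Most database products expose the system catalogue as ordinary queryable tables. As a minimal sketch, SQLite stores its schema definitions in the built-in sqlite_master table (the example table is invented; other products offer similar catalogues, e.g. INFORMATION_SCHEMA in standard SQL):

```python
import sqlite3

# Sketch: querying a system catalogue. SQLite exposes its schema
# definitions through the built-in sqlite_master table; the employees
# table is invented for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (id INTEGER PRIMARY KEY, name TEXT)")

rows = conn.execute(
    "SELECT name, type FROM sqlite_master WHERE type = 'table'"
).fetchall()
print(rows)  # [('employees', 'table')]
```

Because the catalogue is itself a set of tables, the same query tools used for application data can answer data-dictionary questions such as which tables, indexes, and views exist.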

