National Certificate in Software Engineering Software Engineering: I
SOFTWARE ENGINEERING: I
1. INTRODUCTION TO DATABASES
1. Definitions
2. Structure of a database
3. Advantages and Disadvantages of databases
4. File system comparison
5. Introduction to Database Management Systems
2. TYPES OF DATABASES
1. Introduction to Relational Databases
2. Hierarchical Databases
3. Object Oriented (Programming) Databases
4. Distributed Databases
5. Network Databases
3. RELATIONAL DATABASE
1. Identification of Entities
2. Mapping Relationship
3. Drawing Entity Relationship (E-R) Diagrams
4. Normalisation (1st Normal Form to BCNF)
5. Database Schema (Table Design)
6. Querying (SQL Query Strings, DDL & DML)
4. DATABASE ADMINISTRATION
1. Database Security
2. Backup and Recovery Plans
3. Database Maintenance
4. Database Monitoring
5. TRANSACTION PROCESSES
1. Define Transaction Processes
2. Outline Properties
INTRODUCTION TO DATABASES
1.1. Objectives
Main Objective: Outline the application of database management systems in
computing.
1. Identify various areas of database systems applications
2. Define and explain key concepts of database systems
3. Explain advantages of using database system
4. Explain disadvantages of using a database system
1.2. Introduction
The traditional / conventional approach to data management exploits the presence
of files to store data permanently. A file allows for the storage and searching of data,
but provides only simple mechanisms for access, sharing, and management.
With this approach, the procedures written in a programming language are
completely autonomous; each one defines and uses one or more ‘private’ files. Data
of possible interest to more than one program is replicated as many times as there
are programs that use it, with obvious redundancy and the possibility of
inconsistency.
Databases were created, for the most part, to overcome this type of
inconvenience. The motivation for databases over files is that there is
integration for easy access and update, non-redundancy, and multi-access.
The data within a database is structured so as to model real-world structures
and hierarchies, enabling conceptually convenient data storage,
processing and retrieval mechanisms.
Clients (services or applications) interact with databases through queries
(remote or otherwise) to Create, Retrieve, Update, and Delete (CRUD) data
within a database. This process is facilitated through a Database Management
System (DBMS).
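As a preview of how clients issue CRUD operations, the following is a minimal SQL sketch against a hypothetical students table (the table and column names are illustrative assumptions):
INSERT INTO students (student_id, first_name) VALUES (5565, 'Tariro');   -- Create
SELECT first_name FROM students WHERE student_id = 5565;                 -- Retrieve
UPDATE students SET first_name = 'Taurai' WHERE student_id = 5565;       -- Update
DELETE FROM students WHERE student_id = 5565;                            -- Delete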
1.3. Definitions
There are many concepts related to databases that one studying the subject needs to
be conversant with. This is necessary for understanding the concepts that are
covered in the course of study. For someone studying I.T., it is important to develop
the habit of enriching oneself with as many I.T. skills, vocabulary and concepts as
possible. That empowers you and makes you relevant in an ever-changing I.T.
environment and the global space at large.
The following are some of the terms commonly associated with database systems. It
is also necessary to be aware that the same terms may mean different things in
different I.T., and other contexts.
1. Data: Data are the building blocks of information. The word data can have
varied meanings depending on the context it is used in. Data may mean any of
the following, and more:
i. Raw facts, figures, symbols and/or other representations, standing singly
or in combination, but delivering an incomplete and incomprehensible
message to the receiver, thus unable to equip him/her with a full-scale
account of contextual issues for effective decision making.
The layers of a simplified database system environment, from the users down to the stored data, are:
Users / Programs: the people or other systems that interact with
the database at the outermost periphery of the database layers. They
input, output, transfer, convert, or perform some operations on data.
Application Programs / Queries: these perform operations as
required by the users in the layer above, or do some preliminary work
for lower layers. An example is a rate conversion program in a foreign
currency trading transaction system.
Software to Process Queries: queries from the layer above are
processed by this layer. This includes data manipulation languages, a
component of query languages.
Software to Access Stored Data: this is special software, generally
termed a Database Management System (DBMS), of which a
component such as a Data Control Language can be used here.
Stored Database Definition (Metadata): acts as a reference and
provides the meanings of data stored in the database. Also called the
system catalogue (data dictionary or metadata), it provides the description of
the data to enable program–data independence.
Stored Database (Data Items): the actual facts sitting on the
database / on storage media, such as a hard disk; this can include
accountholder details in a banking system.
2. Entity: An entity is a “thing” or object about which we can gather data. An
entity can be physical or abstract. In a database context, an entity is a thing
that we can hold data about, in the database. An employee, a course, a piece
of equipment, a building, or a computer can be an entity. An entity represents
a category of data. Entities are used to logically separate data.
3. Record: A record is a collection of fields or attributes pertaining to a single
occurrence or instance in a database. For example, a student record in a
college system can be made up of the following attributes: student ID, student
first name, student surname, student Course ID and student DOB.
4. Attributes: An attribute is a characteristic, fact, feature, property, sub-group,
or phenomenon of information that describes an entity. In a college system, a
student is an entity. As an entity, a student has many facts. These can include
student ID, student first name, student surname, student Course ID and student DOB.
Only relevant characteristics / facts are selected for a particular database
application. Facts stored are an abstraction of the actual entity. Note that
some attributes can be split further, for example, date can be split into day,
month, and year; and name into first and last names.
5. Value / Occurrence: this is the actual data item stored within an attribute /
domain / column when the table is filled up / populated. For example, if the
attribute was student ID, then a value could be something like 20200779SE.
6. Data Type: The various attributes of an entity belong to particular data types.
The types dictate whether the values stored are going to be numbers only,
strings (alphanumeric), dates, money / currency, password, yes (true) / no
(false), and many others. The types may be decided upon arbitrarily or can be
natural types.
7. Data Modelling: The iterative and progressive process of creating a specific
data model for a determined problem domain.
8. Data Models: Simple representations of complex real-world data structures.
They are useful for supporting a specific problem domain. They are also called
abstractions of real-world objects or events.
The comparison table below (flattened here into lists) contrasts the main database models.
1. File System (flat files)
Physical structure: Flat file; one dimensional; frequently in tabular format; oftentimes multiple copies of the same data were maintained, each copy sorted in a different way.
Programming languages used: Assembler, Fortran, COBOL; spreadsheets use non-algorithmic programming languages.
Structural changes: If new fields were added to the file, every program that accessed that file had to be changed, and data files would have to be converted.
Relationships: No structured interrelationship between its data records.
2. Hierarchical
Physical structure: Tree; parent-child relationships; a single table acts as the “root” of the database from which other tables “branch” out; a child can only have one parent, but a parent can have multiple children.
Programming languages used: Commands embedded in programming languages; COBOL, PL1, Fortran, ADS and Assembler.
Structural changes: Inflexible (once data is organized in a particular way, it is difficult to change); data reorganization is complicated; requires careful design.
Relationships: Linked lists using pointers stored in the parent/child records to navigate through the records; pointers could be a disk address, the key field, or another random access technique; start at the root and work down the tree to reach target data; supports one-to-one and one-to-many relationships.
3. Network
Physical structure: Network of interrelated lists; looks like several trees that share branches; children have multiple parents and parents have multiple children.
Programming languages used: Commands embedded in programming languages; COBOL, PL1, Fortran, ADS and Assembler.
Structural changes: Inflexible (once data is organized in a particular way, it is difficult to change); data reorganization is complicated; requires careful design.
Relationships: Uses a series of linked lists to implement relationships between records; each list has an owner record and possibly many member records; a single record can be either the owner or a member of several lists of various types; supports one-to-one, one-to-many, and many-to-many relationships.
4. Relational
Physical structure: Data is stored in relations (tables); relationships are maintained by placing a key field value of one record as an attribute in the related record.
Programming languages used: SQL, ODBC.
Structural changes: Flexible; because tables are subject specific and key fields relate one entity to another, both the data and the database structure can be easily modified and manipulated; programs are independent of data format, which yields flexibility when modifications are needed.
Relationships: Uses key fields to link data in many different ways; supports one-to-one, one-to-many and many-to-many relationships.
5. Object-Oriented
Physical structure: Modelling and creation of data as objects; stores objects and the operations to be performed on data; applications interact with object managers, which work through object servers to gain access to object stores.
Programming languages used: Java, C++, Smalltalk, Ada, Object Pascal, Objective-C, DRAGOON, BETA, Emerald, POOL, Eiffel, Self, Oblog, ESP, Loops, Visual Basic, POLKA and Python.
Structural changes: Flexible; programs are built using chunks or modules consisting of preassembled code and data, which makes programming easier and faster; changes are made in the underlying code rather than in the design or structure of the database.
Relationships: Defines software pieces, object types, actions / methods and the interrelationships between these objects; allows objects to be re-used for different purposes.
6. Object/Relational
Physical structure: Organizes information common to relational tabular structures; subsumes the relational database model.
Programming languages used: Relational: SQL3, ODBC, JDBC; Object-Oriented: Java, C++, Smalltalk, etc.
Structural changes: Cartridges, Data Blades and Extenders are modules that build on the object / relational infrastructure; they consist of types, data structures, functions and data, and often include special developer interfaces or prebuilt applications.
Relationships: Primarily a relational structure with object-oriented features included.
The database and the DBMS catalogue are usually stored on disk. Access to the
disk is controlled primarily by the operating system (OS), which schedules disk
input/output. A higher-level stored data manager module of the DBMS controls
access to DBMS information that is stored on disk, whether it is part of the
database or the catalogue. The stored data manager may use basic OS services for
carrying out low-level data transfer between the disk and computer main storage,
but it controls other aspects of data transfer, such as handling buffers in main
memory. Once the data is in main memory buffers, it can be processed by other
DBMS modules, as well as by application programs.
1. The DDL compiler processes schema definitions, specified in the DDL, and
stores descriptions of the schemas (meta-data) in the DBMS catalogue.
The catalogue includes information such as the names of files, data items,
storage details of each file, mapping information among schemas, and
constraints, in addition to many other types of information that are
needed by the DBMS modules. DBMS software modules then look up the
catalogue information as needed (see the catalogue query sketch after this list).
2. The run-time database processor handles database accesses at run time;
it receives retrieval or update operations and carries them out on the
database. Access to disk goes through the stored data manager.
3. The query compiler handles high-level queries that are entered
interactively. It parses, analyzes, and compiles or interprets a query by
creating database access code, and then generates calls to the run-time
processor for executing the code.
4. The pre-compiler extracts DML commands from an application program
written in a host programming language. These commands are sent to the
DML compiler for compilation into object code for database access. The
rest of the program is sent to the host language compiler. The object
codes for the DML commands and the rest of the program are linked,
forming a canned transaction whose executable code includes calls to the
runtime database processor.
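To make item 1 concrete: in MySQL, for example, the catalogue (metadata) can be inspected through the INFORMATION_SCHEMA views; a minimal sketch, assuming a schema named college:
SELECT TABLE_NAME, COLUMN_NAME, DATA_TYPE
FROM INFORMATION_SCHEMA.COLUMNS
WHERE TABLE_SCHEMA = 'college';   -- lists every column and its data type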
The DBMS interacts with the operating system when disk accesses either to the
database or to the catalogue are needed. If the computer system is shared by
many users, the OS will schedule DBMS disk access requests and DBMS processing
along with other processes. The DBMS also interfaces with compilers for general-
purpose host programming languages. User-friendly interfaces to the DBMS can
be provided to help any of the user types to specify their requests.
In addition to possessing the software modules just described, most DBMSs have
database utilities that help the DBA in managing the database system. Common
utilities have the following types of functions:
1. Loading: A loading utility is used to load existing data files—such as text
files or sequential files—into the database. Usually, the current (source)
format of the data file and the desired (target) database file structure are
specified to the utility, which then automatically reformats the data and
stores it in the database. With the proliferation of DBMSs, transferring
data from one DBMS to another is becoming common in many
organizations. Some vendors are offering products that generate the
appropriate loading programs, given the existing source and target
database storage descriptions.
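MySQL’s LOAD DATA statement is one example of such a loading facility; a minimal sketch, assuming a comma-separated source file and an existing students table:
LOAD DATA INFILE '/tmp/students.csv'
INTO TABLE students
FIELDS TERMINATED BY ','
LINES TERMINATED BY '\n'
IGNORE 1 LINES;   -- skip the header row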
Other tools are often available to database designers, users, and DBAs. CASE tools
are used in the design phase of database systems. Another tool that can be quite
useful in large organizations is an expanded data dictionary (or data repository)
system. In addition to storing catalogue information about schemas and
constraints, the data dictionary stores other information, such as design decisions,
usage standards, application program descriptions, and user information. Such a
system is also called an information repository. This information can be accessed
directly by users or the DBA when needed. A data dictionary utility is similar to the
DBMS catalogue, but it includes a wider variety of information and is accessed
mainly by users rather than by the DBMS software.
Application Development Environments, such as the PowerBuilder system, are
becoming quite popular. These systems provide an environment for developing
database applications and include facilities that help in many facets of database
systems, including database design, GUI development, querying and updating, and
application program development. These environments fall under IDEs or SDKs or
Frameworks that provide templating mechanisms for design, development, and
deployment of database systems
The DBMS also needs to interface with communications software, whose function
is to allow users at locations remote from the database system site to access the
database through computer terminals, workstations, or their local personal
computers. These are connected to the database site through data
communications hardware such as phone lines, long-haul networks, local-area
networks, or satellite communication devices. Many commercial database systems
have communication packages that work with the DBMS. The integrated DBMS
and Data Communications system is called a DB/DC system. In addition, some
distributed DBMSs are physically distributed over multiple machines. In this case,
communications networks are needed to connect the machines. These are often
local area networks (LANs) but they can also be other types of networks.
TYPES OF DATABASES
2.1. Objectives
Main Objective: Be able to compare and contrast different types of database
management system:
Have an understanding of the following database management systems:
1. Relational Database
2. Hierarchical Database
3. Object oriented Database
4. Distributed Database
5. Network Database
2.2. Introduction
There are many types of databases. These include hierarchical, network, relational,
object-oriented, graph, NoSql, and NewSql. We will focus on the first five.
A hierarchical model is a structure of data organized in a tree-like model using
parent-child relationships while network model is a database model that allows
multiple records to be linked to the same owner file. A relational model, on the other
hand, is a database model to manage data as tuples grouped into relations (tables).
An object-oriented database encapsulates data and procedures.
Attribute: Each column in a Table. Attributes are the properties which define
a relation. e.g., Student_Rollno, NAME, etc.
Tables – In the relational model, relations are saved in the table format. It
is stored along with its entities. A table has two properties: rows and columns.
Rows represent records and columns represent attributes.
Tuple – It is a single row of a table, which contains a single record.
Relation Schema: A relation schema represents the name of the relation with
its attributes.
Degree: The total number of columns / attributes which is in the relation is
called the degree of the relation.
Cardinality: Total number of rows present in the Table.
Column: The column represents the set of values for a specific attribute.
Relation instance – Relation instance is a finite set of tuples in the RDBMS
system. Relation instances never have duplicate tuples.
Relation key - Every row has one, two or multiple attributes that identify it;
these are called the relation key.
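To tie these terms together, consider a minimal sketch of a Student relation (column names and types are assumptions). Its degree is 4 (four attributes), and its cardinality is the number of rows it holds:
CREATE TABLE Student (
  Student_Rollno INT PRIMARY KEY,   -- relation key
  First_name VARCHAR(50),           -- attribute
  City VARCHAR(50),
  DOB DATE
);
INSERT INTO Student VALUES (1, 'Sue', 'Harare', '2004-03-15');   -- one tuple
-- All rows present at a given moment form a relation instance.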
The data element or record at the highest level of the hierarchy is called the root
element. Any data element can be accessed by moving progressively
downward from the root and along the branches of the tree until the desired
record is located.
Each parent can have many children. Records are dependent and arranged in
multilevel structures, consisting of one root record & any number of
subordinate levels.
Each child has only one parent
Tree is defined by path that traces parent segments to child segments,
beginning from the left. Relationships among the records are one-to-many,
since each data element is related only to one element above it.
Hierarchical path: an ordered sequencing of segments tracing the hierarchical structure.
Accessing data: Although it is difficult to access data in the hierarchical model,
it is easier to access data in the network model and the relational model.
Flexibility: The hierarchical model is less flexible, but the network model, and
relational model are flexible.
2.4.3. Hierarchical Model Diagram
Advantages
1. Data can be retrieved easily due to the explicit links present between the table structures.
2. Referential integrity is always maintained i.e. any changes made in the parent table are automatically updated in a child table.
3. Promotes data sharing.
4. It is conceptually simple due to the parent-child relationship.
5. Database security is enforced.
6. Efficient with 1:N relationships.
7. A clear chain of command or authority.
8. Increases specialization.
9. High performance.
10. Clear results.
Disadvantages
1. If the parent table and child table are unrelated, then adding a new entry in the child table is difficult because an additional entry must be added in the parent table.
2. Complex relationships are not supported.
3. Redundancy, which results in inaccurate information.
4. Change in structure leads to change in all application programs.
5. M:N relationships are not supported.
6. No data manipulation or data definition language.
7. Lack of standards impacts compatibility and portability.
8. Communication barriers.
9. Organizational disunity.
10. Rigid structure / poor flexibility.
The object-oriented model is based on a collection of objects, like the E-R model.
An object contains values stored in instance variables within the object.
o Unlike the record-oriented models, these values are themselves
objects.
o Thus objects contain objects to an arbitrarily deep level of nesting.
An object also contains bodies of code that operate on the object.
o These bodies of code are called methods.
Objects that contain the same types of values and the same methods are
grouped into classes.
o A class may be viewed as a type definition for objects.
o Analogy: the programming language concept of an abstract data
type.
The only way in which one object can access the data of another object is
by invoking the method of that other object.
o This is called sending a message to the object.
o Internal parts of the object, the instance variables and method
code, are not visible externally.
o Result is two levels of data abstraction.
2.5.4. For example, consider an object representing a bank account.
2.5.8. Conclusion
Object-oriented databases are what we call navigational. This means that access
to related objects must follow the predefined linkages created by the containers
for related objects. For example, to find all the purchases made by a customer, a
program in an object-oriented database environment would do the following:
Databases in the collection are logically interrelated with each other. Often
they represent a single logical database.
Data is physically stored across multiple sites. Data in each site can be
managed by a DBMS independent of the other sites.
The processors in the sites are connected via a network. They do not have any
multiprocessor configuration.
A distributed database is not a loosely connected file system.
A distributed database incorporates transaction processing, but it is not
synonymous with a transaction processing system.
2.7. Conclusion
A distributed database is a collection of multiple, logically interrelated databases
distributed over a computer network. It may also be a single database divided into
chunks and distributed over several locations. The database is scattered over various
locations which provide local access to data and thus reduces communication costs
and increases availability.
Most of today’s business applications have shifted from traditional processing to
online processing. This has also changed the database needs of the applications.
Today, the role of databases to organize voluminous data has increased compared to
RELATIONAL DATABASES
3.1. Objectives
1. Design a relational database
1.1. Identify Entities
1.2. Map Relationships
1.3. Draw Entity Relationship (E-R) Diagrams
1.4. Normalise tables (1st Normal Form to BCNF)
1.5. Create Database Schema (Table Design)
1.6. Write and interpret Queries (SQL Query Strings, DDL & DML)
3.2. Introduction
A relational database organises data into two dimensional tables, with rows and
columns. A table may consist of one entity type. Tables can be linked together to
generate consolidated results. Relational databases are a common phenomenon in
most business organisations. This is because of their general simplicity and versatility.
A relational database organizes data in tables (or relations). A table is made up of
rows and columns. A row is also called a record (or tuple). A column is also called
a field (or attribute). A database table is similar to a spreadsheet. However, the
relationships that can be created among the tables enable a relational database to
efficiently store huge amounts of data, and effectively retrieve selected data.
A language called SQL (Structured Query Language) was developed to work with
relational databases.
Just as business objects have characteristics that describe them, entities are
described by their attributes. When we represent an entity in a database, what we
actually store are that entity’s attributes. In a nutshell, attributes store data
values that either 1) describe or 2) identify entities.
Attributes become fields in a table.
Attributes that describe a person (for instance, customer, employee, student, etc.)
would include such things as name, address, and telephone number. Attributes
that identify a person would include such things as social security number or any
combination of letters and numbers that uniquely identify a person.
Attributes that describe entities are called non-key attributes.
Attributes that identify entities (entity identifiers) are called key attributes.
Each entity is completely characterized by the values of all its attributes.
Similar entities can be combined into entity types.
Similarity requires at least identical attribute structure. (Attribute names
and corresponding value domains are identical.)
Entity types are graphically represented by rectangles. Attributes label the
line connecting an entity type and a value domain (often symbolized by an
oval)
5. Define Primary Keys: The primary keys are Department Name, Supervisor
Number, Employee Number, Project Number.
Entities and Attributes
Multivalued attribute: an attribute that can have many values (there are many distinct values entered for it in the same column of the table). A multivalued attribute is depicted by a dual (double) oval.
Relationships (Cardinality and Modality)
The relationship symbols denote: Zero or One; One or More; Zero or More; Many-to-One; and Many-to-Many.
Advantages
1. Exceptional conceptual simplicity
2. Visual representation
3. Effective communication tool
4. Integrated with the relational data model
Disadvantages
1. Limited constraint representation
2. Limited relationship representation
3. No data manipulation language
4. Loss of information content
1. Data need to be represented as a collection of relations
2. Each relation should be depicted clearly in the table
3. Rows should contain data about instances of an entity
4. No two rows can be identical
5. The values of an attribute should be from the same domain
6. Columns must contain data about attributes of the entity
7. Cells of the table should hold a single value
8. Each column should be given a unique name
Most databases are divided into many tables, most of which are related to one
another. In most modern databases, such as the relational database, relationships
are established through the use of primary and foreign keys. The purpose of
separating data into tables and establishing table relationships is to reduce data
redundancy. The process of reducing data redundancy in a relational database is
called normalization
Normalization provides a set of rules and patterns that can be applied to any
database to avoid common logical inconsistencies. Normalizing a database design
will typically improve:
Consistency, since errors that could be made with the database would be
structurally impossible
Extensibility, since changes to the database structure will only affect parts
of the database they are logically dependent on
Efficiency, since redundant information will not need to be stored
In other words, the non-key columns are dependent on the primary key, only on the
primary key, and nothing else. For example, suppose that we have
a Products table with columns productID (primary key), name, and
unitPrice. The column discountRate should not belong to the Products table if it
is dependent on unitPrice, which is not part of the primary key.
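A minimal sketch of the fix described above, with discountRate moved out of Products into a hypothetical Discounts table keyed on the value it actually depends on:
CREATE TABLE Products (
  productID INT PRIMARY KEY,
  name VARCHAR(100),
  unitPrice DECIMAL(10,2)
);
CREATE TABLE Discounts (
  unitPrice DECIMAL(10,2) PRIMARY KEY,   -- the real determinant of the rate
  discountRate DECIMAL(4,2)
);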
3.7.2. Purpose and Utilization
For a database to be in first normal form, every value of every column of
every table should be atomic.
What does atomic mean? Loosely speaking, atomic means that the value
represents a “single thing”.
For example, if you have a table like this:
first_name last_name age areas
John Doe 27 {“Website design”, “Customer research”}
Mary Jane 33 {“Long term strategy planning”,”Hiring”}
Tom Smith 35 {“Marketing”}
Then the “areas” column has values that aren’t atomic. Just look at John’s row
to see that the areas field is storing two things: “Website design” and
“Customer research”.
So this table is not in first normal form.
To be in first normal form, you should store a single value per field.
In summary, for a table to be in the First Normal Form (1NF), it should follow the
following 4 rules:
1. It should only have single (atomic) valued attributes / columns.
2. Values stored in a column should be of the same domain.
3. All the columns in a table should have unique names.
4. And the order in which data is stored, does not matter.
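A sketch of one way to bring the example table above into 1NF: the multivalued areas column is split into its own table, one atomic value per row (table and column names are assumptions):
CREATE TABLE employee (
  employee_id INT PRIMARY KEY,
  first_name VARCHAR(50),
  last_name VARCHAR(50),
  age INT
);
CREATE TABLE employee_area (
  employee_id INT,
  area VARCHAR(100),                -- one atomic value per row
  PRIMARY KEY (employee_id, area),
  FOREIGN KEY (employee_id) REFERENCES employee (employee_id)
);
INSERT INTO employee VALUES (1, 'John', 'Doe', 27);
INSERT INTO employee_area VALUES (1, 'Website design'), (1, 'Customer research');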
For a table to be in second normal form, every column that is not part
of the primary key (or could act as part of another primary key) should not
be inferable from a smaller part of the primary key.
What does this mean?
Say you have the following design (the fields that form the primary key
in the table are underlined).
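The original design example was not reproduced in these notes; the following is a hypothetical illustration. In an order_items table with composite primary key (order_id, product_id), a product_name column would depend on product_id alone, a smaller part of the key, violating 2NF; the fix moves it to a products table:
CREATE TABLE products (
  product_id INT PRIMARY KEY,
  product_name VARCHAR(100)         -- depends only on product_id
);
CREATE TABLE order_items (
  order_id INT,
  product_id INT,
  quantity INT,                     -- depends on the whole key
  PRIMARY KEY (order_id, product_id),
  FOREIGN KEY (product_id) REFERENCES products (product_id)
);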
Boyce and Codd Normal Form is a higher version of the Third Normal
Form. This form deals with certain types of anomalies that are not handled by
3NF. A 3NF table which does not have multiple overlapping candidate keys
is said to be in BCNF. For a table to be in BCNF, the following conditions must
be satisfied:
1. R must be in 3rd Normal Form
2. And, for each functional dependency (X → Y), X should be a super
Key.
Advantages of Normalisation
Avoids data modification (INSERT/DELETE/UPDATE) anomalies as each
data item lives in One place
Greater flexibility in getting the expected data in atomic granular
Normalization is conceptually cleaner and easier to maintain and change
as your needs change
Fewer null values and less opportunity for inconsistency
A better handle on database security
Increased storage efficiency
The normalization process helps maximize the use of clustered indexes,
which are the most powerful and useful type of index available. As more
data is separated into multiple tables because of normalization, more
clustered indexes become available to help speed up data access.
Disadvantages of Normalisation
More tables to join: by spreading data out into more tables, the need to
join tables increases and the task becomes more tedious. The database
also becomes harder to realize.
Tables will contain codes rather than real data as the repeated data will
be stored as lines of codes rather than the true data. Therefore, there is
always a need to go to the lookup table.
Data model becomes extremely difficult to query against as the data
model is optimized for applications, not for ad hoc querying. (Ad hoc
query is a query that cannot be determined before the issuance of the
query. It consists of an SQL that is constructed dynamically and is usually
constructed by desktop friendly query tools.). Hence it is hard to model
the database without knowing what the customer desires.
As the normal form type progresses, performance becomes slower and
slower. Processing requires much more CPU, memory, and I/O; thus
highly normalized data can give reduced database performance.
Requires more joins to get the desired result. A poorly-written query can
bring the database down
Maintenance overhead. The higher the level of normalization, the greater
the number of tables in the database.
Proper knowledge is required on the various normal forms to execute the
normalization process efficiently. Careless use may lead to terrible design
filled with major anomalies and data inconsistency.
The following example will illustrate how database normalization helps achieve a good design.
The table below presents data that needs to be captured in the database:
Title: Beginning MySQL Database Design and Optimization
Author: Chad Russell, Jon Stephens
Bio: Chad Russell is a programmer and system administrator who owns his own internet hosting company. Jon Stephens is a member of the MySQL AB documentation team.
ISBN: 1590593324
Subject: MySQL Database Design
Pages: 520
Publisher: Apress
In the example shown above, a lot of storage space will be wasted if any one criterion (author or publisher) is considered as the identification
key, therefore database normalization is essential.
Normalization is a step-by-step process that cannot be carried out haphazardly. The following steps will help in attaining database
normalization.
3.7.5.1. Step 1: Create first normal form (1NF)
The database normalization process involves getting data to conform to progressive normal forms, and a higher level of database
normalization cannot be achieved unless the previous levels have been satisfied. First normal form is the basic level of database
normalization.
For 1NF, ensure that the values in each column of a table are atomic, meaning each cell holds a single value and contains no sets of values. In our
case, Author and Subject do not comply.
One method for bringing a table into 1NF is to separate the entities contained in the table into separate tables. In our case, this would
result in Book, Author, Subject, and Publisher tables.
Book’s table:
ISBN: 1590593324; Title: Beginning MySQL Database Design and Optimization; Pages: 520
Author’s table:
Author_ID 1: Chad Russell
Author_ID 2: Jon Stephens
Author_ID 3: Mike Hilyer
One-to-many in our example will be Books to Publisher. Each book has only one Publisher but one Publisher may have many books.
We can achieve one-to-many relationships with a foreign key. A foreign key is a mechanism in database management systems (DBMS)
that defines relations and creates constraints between data segments. It is not possible to review what is not related to the specific
book. It is not possible to have a book without an author or publisher.
When deleting a publisher, all the related books may need to be deleted along with the reviews of those books. The authors would not
need to be deleted.
The foreign key is introduced in the table that represents the “many”, pointing to the primary key on the “one” table. Since the Book
table represents the many portion of the one-to-many relationship, the primary key value of the Publisher is added to the Book table
as a foreign key, in a Publisher_ID column.
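A minimal sketch of this foreign key in MySQL (column sizes are assumptions); the ON DELETE CASCADE clause implements the behaviour described above, where deleting a publisher deletes its related books:
CREATE TABLE Publisher (
  Publisher_ID INT PRIMARY KEY,
  Name VARCHAR(100)
);
CREATE TABLE Book (
  ISBN VARCHAR(13) PRIMARY KEY,
  Title VARCHAR(200),
  Pages INT,
  Publisher_ID INT,                 -- the "many" side points to the "one" side
  FOREIGN KEY (Publisher_ID) REFERENCES Publisher (Publisher_ID)
    ON DELETE CASCADE
);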
To comply with 3NF we have to move these outside the publisher’s table:
Through the process of database normalization, we bring our schema’s tables into conformance with progressive normal forms. As a
result, the tables each represent a single entity – a book, an author or a subject, for example – and we benefit from decreased
redundancy, fewer anomalies, and improved efficiency.
Even when a database is in 3rd Normal Form, there can still be anomalies if it has more than one candidate key. For a
relation schema which is in 3NF, the presence of modification anomalies which could not be treated well by 3NF is due to one of the
following reasons:
1. Reason 1: A relation schema might contain more than one candidate key.
2. Reason 2: Where more than one candidate key is present, all of them might be composite.
3. Reason 3: If the above two reasons exist, then there is a possibility of overlap between the candidate keys.
A relation schema R is in BCNF if and only if,
1. For all the Functional Dependencies (FDs) hold in the relation R, if the FD is non-trivial then the determinant (LHS of FD) of that
FD should be a Super key.
Through this definition, BCNF insists that all the determinants of any Functional Dependency must be a candidate key. For this
reason, BCNF is sometimes referred to as strict 3NF.
Note: A relation schema R, which is in BCNF, is also in 3NF automatically. But, a relation schema R which is in 3NF need not be in BCNF.
Explanation: If we have set of FDs in R such that X → Y, then X must be a super key. In other words, if X is not a key, then the relation R
is not in BCNF.
BCNF is also sometimes referred to as 3.5 Normal Form.
Practical Example of BCNF:
Suppose there is a company wherein employees work in more than one department. To satisfy BCNF, the data is decomposed into three tables: an emp_nationality table (emp_id, emp_nationality), an emp_dept table (emp_dept, dept_type, dept_no_of_emp), and an emp_dept_mapping table, shown below:
emp_dept_mapping table:
emp_id emp_dept
1001 Production and planning
1001 stores
1002 design and technical support
1002 Purchasing department
Functional dependencies:
emp_id -> emp_nationality
emp_dept -> {dept_type, dept_no_of_emp}
Candidate keys:
For first table: emp_id
For second table: emp_dept
For third table: {emp_id, emp_dept}
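A sketch of the three tables as MySQL DDL (data types are assumptions); in each table every determinant is now a key, so each is in BCNF:
CREATE TABLE emp_nationality (
  emp_id INT PRIMARY KEY,
  emp_nationality VARCHAR(50)
);
CREATE TABLE emp_dept (
  emp_dept VARCHAR(100) PRIMARY KEY,
  dept_type VARCHAR(50),
  dept_no_of_emp INT
);
CREATE TABLE emp_dept_mapping (
  emp_id INT,
  emp_dept VARCHAR(100),
  PRIMARY KEY (emp_id, emp_dept),   -- candidate key {emp_id, emp_dept}
  FOREIGN KEY (emp_id) REFERENCES emp_nationality (emp_id),
  FOREIGN KEY (emp_dept) REFERENCES emp_dept (emp_dept)
);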
It is important to note that the data in the database changes frequently, while the
plans or schemas remain the same over long periods of time. The users' view of
the data (also called logical organization of data) should be in a form that is most
convenient for the users and they should not be concerned about the way data is
physically organized. Therefore, a DBMS should do the translation between the
logical (users' view) organization and the physical organization of the data in the
database.
The data in the database at any particular point in time is called a database
instance. Therefore, many database instances can correspond to the same
database schema. The schema is sometimes called the intension of the database,
while an instance is called an extension (or state) of the database.
Database Instance:
A database instance is the state of an operational database, with its information
at any given time: a snapshot of the database. Database instances tend to
change over time. A DBMS ensures that every instance (state) is in a
valid state, by diligently following all the validations, constraints, and
conditions that the database designers have imposed.
Conceptual, logical, and physical data models are different in their objectives,
goals, and content. Key differences noted below:
Conceptual Data Model (CDM):
- Includes high-level data constructs.
- Uses non-technical names, so that executives and managers at all levels can understand the data basis of the Architectural Description.
- Uses general high-level data constructs from which Architectural Descriptions are created in non-technical terms.
Logical Data Model (LDM):
- Includes entities (tables), attributes (columns / fields) and relationships (keys).
- Uses business names for entities and attributes.
- Is independent of technology (platform, DBMS).
Physical Data Model (PDM):
- Includes tables, columns, keys, data types, validation rules, database triggers, stored procedures, domains, and access constraints.
- Uses more defined and less generic specific names for tables and columns, such as abbreviated column names, limited by the database management system (DBMS) and any company defined standards.
- Includes primary keys and indices for fast data access.
created in non-technical terms
(ii) ALTER - alters the structure of the database, such as changing column
arrangement
The syntax to add a column in a table in MySQL (using the ALTER TABLE
statement) is:
ALTER TABLE table_name
ADD new_column_name column_definition
[ FIRST | AFTER column_name ];
(iv) TRUNCATE - removes all records from a table, including all spaces allocated
for the records.
Warning: If you truncate a table, the TRUNCATE TABLE statement cannot be
rolled back.
Syntax
The syntax for the TRUNCATE TABLE statement in MySQL is:
TRUNCATE TABLE [database_name.]table_name;
Example:
TRUNCATE TABLE customers;
This example would truncate the table called customers and remove all
records from that table.
It would be equivalent to the following DELETE statement in MySQL:
DELETE FROM customers;
Both of these statements would result in all data from the customers table
being deleted. The main difference between the two is that you can roll
back the DELETE statement if you choose, but you can't roll back the
TRUNCATE TABLE statement.
(v) COMMENT - add comments to the data dictionary
Syntax
COMMENT [IF EXISTS] ON <object_type> <object_name> IS
'<string_literal>';
COMMENT [IF EXISTS] ON COLUMN <table_name>.<column_name> IS
'<string_literal>';
Example
ALTER TABLE user MODIFY id INT(11) COMMENT 'id of user';
Data Manipulation Language (DML) statements are used for managing data
within schema objects.
A data-manipulation language (DML) is a language that enables users to access or
manipulate data as per the appropriate data model. There are basically two types:
Procedural DMLs require a user to specify what data are needed and how
to get those data.
Declarative DMLs (also referred to as nonprocedural DMLs) require a user
to specify what data are needed without specifying how to get those data.
Declarative DMLs are usually easier to learn and use than are procedural DMLs.
However, since a user does not have to specify how to get the data, the database
system has to figure out an efficient means of accessing data.
A query is a statement requesting the retrieval of information. The portion of a
DML that involves information retrieval is called a query language. The following
query in the SQL language finds the name and date of birth of the student whose student_id is 5565:
SELECT students.student_name, students.dob
FROM students
WHERE students.student_id = 5565;
The query specifies that those rows from the table students where the student_id
is 5565 must be retrieved, and that the student_name and dob attributes of
these rows must be displayed.
Queries may involve information from more than one table.
Some examples:
SELECT - retrieves data from a database
Syntax:
SELECT * FROM Table_name;
Example:
SELECT * FROM Student; (shows all the table records)
SELECT First_name, DOB FROM STUDENT WHERE Reg_no = 'S101';
Enclose the value in single quotes if its data type is varchar or char.
Eliminating Duplicates: A table could hold duplicate rows. In such a case, you can
eliminate duplicates.
Syntax:
SELECT DISTINCT col, col, .., FROM table_name;
Example:
SELECT DISTINCT * FROM Student;
SELECT DISTINCT first_name, city, pincode FROM Student;
It scans through entire rows, and eliminates rows that have exactly the same
contents in each column.
Sorting Data: The rows retrieved from the table will be sorted in either
ascending or descending order depending on the condition specified in the
SELECT statement; the keyword used is ORDER BY.
SELECT * FROM Student
ORDER BY First_Name;
The above statement will show records in ascending order from A to Z.
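INSERT - adds new rows of data to a table.
Syntax (standard MySQL form):
INSERT INTO table_name (column1, column2, ...)
VALUES (value1, value2, ...);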
In this syntax,
First, specify the table name and a list of comma-separated columns
inside parentheses after the INSERT INTO clause.
Then, put a comma-separated list of values of the corresponding
columns inside the parentheses following the VALUES keyword.
The number of columns and values must be the same. In addition, the
positions of columns must be corresponding with the positions of their
values.
INSERT statements that use VALUES syntax can insert multiple rows. To do
this, include multiple lists of comma-separated column values, with lists
enclosed within parentheses and separated by commas. Example:
INSERT INTO tbl_name (a,b,c)
VALUES(1,2,3), (4,5,6), (7,8,9);
We use SQL (Structured Query Language) for manipulating and defining data in the
database. Common operations use the acronym CRUD – standing for Create, Read,
Update and Delete. SQL allows these and many other operations on a
database to be carried out.
Creating is the process of generating or originating new tables, and
inserting (populating) data into the created tables. The CREATE command
is used for actuating or realizing objects such as databases, tables and
indexes (see the sketch after this list).
Read means retrieving data from the database for viewing or performing
other operations.
Update makes changes to existing data in the database, for example when a
transaction is processed a previous value is replaced by a new value.
Delete erases the contents of the database, such as data held in the
tables.
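A minimal sketch of the CREATE command alongside the other CRUD operations, reusing the Student table from the earlier examples (column types are assumptions):
CREATE DATABASE college;
USE college;
CREATE TABLE Student (
  Reg_no VARCHAR(10) PRIMARY KEY,
  First_name VARCHAR(50),
  DOB DATE
);
INSERT INTO Student VALUES ('S101', 'Sue', '2004-03-15');      -- Create
SELECT * FROM Student;                                         -- Read
UPDATE Student SET First_name = 'Susan' WHERE Reg_no = 'S101'; -- Update
DELETE FROM Student WHERE Reg_no = 'S101';                     -- Delete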
3.9.3. Transaction Control Language
Transaction Control (TCL) statements are used to manage the changes made by
DML statements. It allows statements to be grouped together into logical
transactions.
COMMIT - saves work done
Syntax for SQL Commit
COMMIT;
Let us consider the following table for understanding Commit in a better way.
Example: Sample table 1
Student
Rol_No Name Address Phone Age
1 Sue Harare 0766353535 18
2 Kudzie Chitungwiza 0784464441 18
3 Sam Norton 0756577557 20
4 Farai Marondera 0764987863 18
3 Sam Norton 0756577557 20
2 Kudzie Chitungwiza 0784464441 18
Following is an example which would delete those records from the table which
have age = 20 and then COMMIT the changes in the database.
Queries:
DELETE FROM Student WHERE AGE = 20;
COMMIT;
SAVEPOINT - creates points within a group of transactions to which you can
later roll back.
In general, ROLLBACK is used to undo a group of transactions.
Syntax for rolling back to Savepoint command:
ROLLBACK TO SAVEPOINT_NAME;
you can ROLLBACK to any SAVEPOINT at any time to return the appropriate
data to its original state.
Example:
From the above example Sample table1,
Delete those records from the table which have age = 20 and then
ROLLBACK the changes in the database by keeping Savepoints.
Queries:
SAVEPOINT SP1; //Savepoint created.
DELETE FROM Student WHERE AGE = 20; //deleted
SAVEPOINT SP2; //Savepoint created.
Here SP1 is first SAVEPOINT created before deletion. In this example one
deletion has taken place.
After deletion again SAVEPOINT SP2 is created.
Output:
Student
Rol_No Name Address Phone Age
1 Sue Harare 0766353535 18
2 Kudzie Chitungwiza 0784464441 18
4 Farai Marondera 0764987863 18
2 Kudzie Chitungwiza 0784464441 18
The deletion has taken place; let us assume that you have changed your
mind and decided to ROLLBACK to the SAVEPOINT that you identified as SP1,
which is before the deletion.
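Query:
ROLLBACK TO SP1;
After this rollback, the deletion is undone and the Student table again contains
all of its original rows, including those with age = 20.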
DATABASE ADMINISTRATION
4.1. Objectives
Main Objective: Manage databases
Sub-objectives
1. Spell out reasons for and importance of securing database systems
2. Explain methods, tools and processes required to secure database systems
3. Explain backup and recovery plans for database systems
4. Explain and demonstrate database maintenance procedures
5. Explain the reasons, how database monitoring is done and spell out the
significance of monitoring databases
4.2. Introduction
Database administration refers to the whole set of activities performed by
a database administrator to ensure that a database is always available as needed.
Other closely related tasks and roles are database security, database monitoring and
troubleshooting, and planning for future growth.
Database administrators use specialized software to store and organize data. The
role may include capacity planning, installation, configuration, database design,
migration, performance monitoring, security, troubleshooting, as well as backup and
data recovery.
Database administration seeks to address several objectives, which include:
1. Data Availability - make an integrated collection of data available to a wide
variety of users whenever they need to use the data.
2. Data Integrity - ensure correctness and validity of data held in the database
3. Privacy (Confidentiality) (the goal) and security (the means)
4. Management Control (Non-Repudiation, Consistence, Integrity) – role
mainly played by the DBA.
5. Data Independence (a relative term) - avoids reprogramming of
applications, allows easier conversion and reorganization
i. Physical data independence - program unaffected by changes in the
storage structure or access methods and vice-versa.
ii. Logical Data Independence - program unaffected by changes in the
schema and vice-versa.
There are many different types of failure that can affect database processing, each
of which has to be dealt with in a different manner. Some failures affect main
memory only, while others involve non-volatile (secondary) storage. Among the
causes of failure are:
1. System Crashes: In case of system crash, the systems hang up and need to
be rebooted. These failures occur due to hardware malfunction or a bug in
the database software or the operating system itself. It causes the loss of
the content of volatile storage and brings transaction processing to a halt.
The content of non-volatile storage is not affected by this type of
failure. The assumption that hardware errors and bugs bring the system to
a halt, but do not corrupt the non-volatile storage contents, is known as
the Fail-Stop Assumption.
2. User Error: An example of a user error is a user inadvertently deleting a
row or dropping a table.
3. Carelessness: Carelessness is the destruction of data by operators or users
because they were not concentrating on the task at hand.
4. Sabotage (intentional corruption of data): Sabotage is the intentional
corruption or destruction of data, hardware, or software facilities, by
employees, competitors, hackers, and governments.
5. Statement Failure: A statement failure can be defined as the inability of
the database to execute an SQL statement. While running a user program,
a transaction might have multiple statements and one of the statements
might fail due to various reasons. Typical examples are selecting from a
table that does not exist, or trying to do an insert and having the
statement fail due to lack of space. Recovery from such failures is
automatic. Upon detection, the database usually will roll back the
statement, returning control to the user or user program.
6. Application software errors: Application software errors include logical
errors in the program that is accessing the database, which causes one or
more transactions to fail.
7. Network Failure: Network failures can occur while using a client-server
configuration or a distributed database system where multiple database
servers are connected by communication networks. Communication
software, line and hardware failures will interrupt the normal operations
of the database system.
8. Media Failure: Media failures are the most dangerous failures. Not only
is there a potential to lose data if proper backup procedures are not
followed, but it usually takes more time to recover than with other kinds
of failures. A typical example of a media failure is a disk-head crash, which
causes all databases residing on that disk or disks to be lost.
9. Natural Physical Disasters: Natural and physical disasters are the damage
caused to data, hardware and software due to natural disasters like fires,
floods, earthquakes, power failures, and excessive heat.
A few best practices can help even the smallest of businesses secure their
database from potential risks.
1. Separate the Database and Web Servers: A database should reside on a
separate database server located behind a firewall, not in the DMZ with
the web server. A tiered system is an ideal option for securing a database.
2. Encrypt Stored Files: Encrypt stored files. Stored files of a web application
often contain information about the databases the software needs to
connect to. This information, if stored in plain text, provides keys an
attacker needs to access sensitive data.
3. Encrypt Your Backups Too: Encrypt back-up files. Data theft may happen
as a result of an outside attack, but oftentimes it is the people we trust
most who are the attackers. Also encrypt data in transmission.
4. Use a WAF: Employ web application firewalls. In addition to protecting a
site against cross-site scripting vulnerabilities and web site vandalism, a
good application firewall can thwart SQL injection and other attacks as
well.
5. Keep Patches Current: Web sites that are rich with third-party
applications, widgets, components and various other plug-ins and add-ons
can easily find themselves a target to an exploit that should have been
patched.
6. Minimize Use of 3rd Party Apps: Keep third-party applications to a
minimum. Many of these applications are created by hobbyists or
programmers who discontinue support for them. Unless they are
absolutely necessary, don’t install them. If you do install them, keep them
patched.
7. Don't Use a Shared Server: Avoid using a shared web server if your
database holds sensitive information. A shared host exposes one to
security threats. If you have no other choice, make sure to review security
policies and speak with hosts about what their responsibilities are should
your data become compromised.
8. Enable Security Controls: Enable security controls on your database.
While most databases nowadays will enable security controls by default, it
never hurts to go through them and verify that this was done (a sketch of
such controls in MySQL follows this list).
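As one concrete illustration, MySQL access can be restricted with a dedicated least-privilege account; a minimal sketch (the account name, host pattern, and database name are assumptions):
CREATE USER 'app_user'@'10.0.0.%' IDENTIFIED BY 'ChangeMe!2024';
GRANT SELECT, INSERT, UPDATE, DELETE ON college.* TO 'app_user'@'10.0.0.%';
SHOW GRANTS FOR 'app_user'@'10.0.0.%';   -- verify the controls took effect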
Planning for database backup and recovery is a continuous process that takes
much time and effort but provides great benefits. First of all you need to figure out
what information to duplicate, how often to make backups, who should do this
task, what equipment to use, and what kind of backup you ought to do.
A specific backup procedure / policy has to be put in place to regularly copy
databases and retain them in secure digital storages.
Some basic considerations that help create database backup & recovery plan
include:
1. Data Importance: For more important and business-critical data (e.g. a
client base) you will need to create a plan that involves making extra
copies of your database while it is in operation, and to ensure
that the copies can be easily restored when required. For less important
data (e.g. daily log files), you can schedule a simple plan that does not
require frequent database backup and recovery.
2. Frequency of Change: The frequency of changes influences the decision
on how often to back up and recover the database. If critical data is
modified daily then you should make a daily backup schedule. Your final
decision would also depend on hardware and software capabilities.
3. Speed: Recovery speed is an important time-related factor that
determines the maximum possible time period that could be spent
on database backup and recovery.
4. Equipment: To perform timely backups and recoveries, appropriate
software and hardware (perhaps, several sets of backup media and
devices), including optical drives, removable disk drives, special file
management utilities are needed.
5. Responsibility: Ideally, one person (e.g. IT department head) should be
appointed to control and supervise the backup and recovery plan, and
several IT specialists (e.g. system administrators) should be responsible
for performing the actual backup and recovery of data.
6. Storing: Where do you plan to store database duplicates? Options include
off-site; on-site, and cloud-based.
Database security requires extensive experience handling sensitive data and
current knowledge of new cyber threats.
Three common types of database backups can be run on a desired system: normal
(full), incremental and differential. A customized backup plan can minimize
downtime and maximize efficiency.
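For MySQL databases, a full (normal) backup is commonly taken with the bundled mysqldump utility; a minimal sketch from the shell, assuming a database named college:
mysqldump -u root -p --single-transaction college > college_full.sql
# to restore from the dump:
mysql -u root -p college < college_full.sql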
Whenever a file is created or updated, an archive bit is attached to that file.
One can view the archive bit in the file’s properties. The
archive bit receives a check mark any time the file is updated, and the
backup software uses this marker to track which files on a system are due for
archiving.
1. Normal or Full Backups: When a normal or full backup runs on a selected
drive, all the files on that drive are backed up. This, of course, includes
system files, application files, user data — everything. Those files are then
copied to the selected destination (backup tapes, a secondary drive or the
cloud), and all the archive bits are then cleared.
Normal backups are the fastest to restore lost data from, because all the
data on a drive is saved in one location. The downside of normal backups
is that they take a very long time to run, and in some cases this is more
time than a company can allow. Drives that hold a lot of data may not be
capable of a full backup, even if they run overnight. In these cases,
incremental and differential backups can be added to the backup schedule
to save time.
2. Incremental Backups: A common way to deal with the long running times
required for full backups is to run them only on weekends. Many
businesses then run incremental backups throughout the week since they
take far less time. An incremental backup will grab only the files that have
been updated since the last normal backup. Once the incremental backup
has run, that file will not be backed up again unless it changes or during
the next full backup.
While incremental database backups do run faster, the recovery process is
a bit more complicated. If the normal backup runs on Saturday and a file is
then updated Monday morning, should something happen to that file on
Tuesday, one would need to access the Monday night backup to restore it.
For one file, that’s not too complicated. However, should an entire drive
be lost, one would need to restore the normal backup, plus each and
every incremental backup run since the normal backup.
3. Differential Backups: An alternative to incremental database backups that
has a less complicated restore process is a differential backup. Differential
backups and recovery are similar to incremental in that these backups
grab only files that have been updated since the last normal backup.
However, differential backups do not clear the archive bit. So a file that is
updated after a normal backup will be archived every time a differential
backup is run until the next normal backup runs and clears the archive bit.
In other words, only files and folders that carry the marker (the archive bit) are backed up, and because a differential backup does not clear markers, two differential backups in a row would back up a changed file each time. This backup type is moderately fast at backing up and restoring data.
Similar to our last example, if a normal backup runs on Saturday night and a file gets changed on Monday, that file would then be backed up by every differential backup that runs until the next normal backup clears its archive bit. The sketch below contrasts the three backup types.
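A toy sketch (plain Python, not a real backup tool) contrasting how the three backup types treat the archive bit; file names and flags are illustrative:

# name -> archive bit (True means changed since the bit was last cleared)
files = {"a.db": True, "b.log": True, "c.cfg": False}

def full_backup(files):
    backed_up = list(files)            # copy everything
    for name in files:
        files[name] = False            # a full backup clears every archive bit
    return backed_up

def incremental_backup(files):
    backed_up = [n for n, bit in files.items() if bit]
    for name in backed_up:
        files[name] = False            # an incremental backup also clears the bit
    return backed_up

def differential_backup(files):
    # copies changed files but leaves the bit set, so the same file is
    # captured again by every differential run until the next full backup
    return [n for n, bit in files.items() if bit]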
4.5.1. Configuration Management
Configuration Management (CM) applied over the life cycle of a system provides visibility and control of its
performance, functional, and physical attributes. CM verifies that a system
performs as intended, and is identified and documented in sufficient detail to
support its projected life cycle. The CM process facilitates orderly management of
system information and system changes for such beneficial purposes as to revise
capability; improve performance, reliability, or maintainability; extend life; reduce
cost; reduce risk and liability; or correct defects. The relatively minimal cost of
implementing CM is returned many fold in cost avoidance. The lack of CM, or its
ineffectual implementation, can be very expensive and sometimes can have catastrophic consequences such as failure of equipment or loss of life.
Applied to software, CM identifies the functional and physical attributes of software at various points in
time, and performs systematic control of changes to the identified attributes for
the purpose of maintaining software integrity and traceability throughout the
software development life cycle.
4.5.2. Types of Maintenance
4.6. Database Monitoring
Many different database monitoring tools exist, but the best solution depends
entirely on an organisation’s needs. For example, some businesses might want to
put a focus on addressing database problems in real-time, some might want a
monitoring tool that boasts a sizeable SQL monitoring ability, and others might be
looking for a combination of network, server and applications monitoring. Before
investing in these tools, businesses should consider their specific environment and
install the applications that best suit their requirements.
Database monitoring tools often fall into four categories:
1. General purpose monitoring: General purpose monitoring tools offer a
little bit of everything. These tools monitor a wide range of components:
servers, databases, services, and sometimes networking. General purpose
monitoring capabilities typically scrape status metrics from the server and
store them in a time series database for charting and trending. They often
have the ability to send alerts based on thresholds, and shine when you
want to use one tool to monitor as many things as possible.
2. Reporting and administration: Reporting and administration tools focus
on the needs of database administrators or operations teams, who need
to report on database activity or manipulate databases. Needs can vary
widely, such as trying to develop a specific report or make administrative
changes that can result in adding users, data types, or modifying objects
like tables, stored procedures, schemas and indexes. These tools allow administrators to carry out such reporting and changes from a single interface.
We will now list a set of generic categories. Under each category, we will list a few
types of database metrics you should consider monitoring. This is not an
exhaustive list, but we emphasize these because together, they paint a complete
picture of the database environment.
4.6.2.1. Infrastructure
Infrastructure should be part of any database monitoring. The metrics
should include:
Percent CPU time used by the database process
Available memory
Available disk space
Disk queue for waiting IO
Percent virtual memory use
Network bandwidth for inbound and outbound traffic
If a metric is above or below its acceptable threshold, we recommend relating the figure to the database metrics as well. That is because hardware or network-related events like a full disk or a saturated network can show up as poor query performance, and, similarly, a database-specific issue like a blocked database query can show up as high CPU use.
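One possible way to collect these metrics is the third-party psutil library (an assumption: install it with pip install psutil); the threshold below is illustrative:

import psutil

def collect_metrics():
    return {
        "cpu_percent": psutil.cpu_percent(interval=1),
        "available_memory_mb": psutil.virtual_memory().available / 2**20,
        "free_disk_gb": psutil.disk_usage("/").free / 2**30,
        "net_bytes_sent": psutil.net_io_counters().bytes_sent,
        "net_bytes_recv": psutil.net_io_counters().bytes_recv,
    }

metrics = collect_metrics()
if metrics["free_disk_gb"] < 5:        # illustrative threshold
    print("ALERT: low disk space; relate this to database growth and logs")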
4.6.2.2. Availability
The next thing to monitor is database availability, because you want to ensure the database is up and accessible before looking at any other counters. It also saves you from hearing about an outage from your customers first. Metrics can include (a minimal probe sketch follows this list):
Accessibility of the database node(s) using common protocols like
Ping or Telnet
Accessibility of the database endpoint and port (e.g. 3306 for
MySQL, 5432 for PostgreSQL, etc.)
Failover events for master nodes or upgrade events for slave/peer
nodes in multi-node clusters
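A minimal availability probe (the host name is hypothetical) that checks whether a database endpoint accepts TCP connections on its port:

import socket

def is_port_open(host, port, timeout=3):
    # True if a TCP connection to host:port succeeds within the timeout
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

print(is_port_open("db.example.com", 5432))   # default PostgreSQL port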
TRANSACTION PROCESSES
Objectives:
1. Define Transaction Processes
2. Outline (desirable) Properties of a transaction
The following examples further illustrate the ACID properties. In these examples,
the database table has two columns, A and B. An integrity constraint requires that
the values in A and in B must sum to 100. The following SQL code creates a table
as described above:
CREATE TABLE acidtest (A INTEGER, B INTEGER, CHECK (A + B = 100));
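A sketch using Python's built-in sqlite3 module to show this constraint at work; an update that would break A + B = 100 is rejected and rolled back:

import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE acidtest (A INTEGER, B INTEGER, CHECK (A + B = 100))")
con.execute("INSERT INTO acidtest VALUES (60, 40)")    # 60 + 40 = 100: accepted
try:
    con.execute("UPDATE acidtest SET A = A - 10")      # would leave A + B = 90
except sqlite3.IntegrityError as err:
    con.rollback()                                     # restore the consistent state
    print("rejected:", err)
print(con.execute("SELECT A, B FROM acidtest").fetchone())   # (60, 40)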
1. Atomicity Failure: In database systems, atomicity is one of the ACID
transaction properties. In an atomic transaction, a series of database
operations either all occur, or nothing occurs. The series of operations
cannot be split apart and executed only partially, which makes the series of operations "indivisible", hence the name. A guarantee of
atomicity prevents updates to the database occurring only partially, which
can cause greater problems than rejecting the whole series outright. In
other words, atomicity means indivisibility and irreducibility.
2. Consistency Failure: Consistency is a very general term, which demands that the data must meet all validation rules. In the previous example, the validation is a requirement that A + B = 100. It may also be inferred that both A and B must be integers, and a valid range for A and B may be inferred as well. All validation rules must be checked to ensure consistency. Assume that a transaction attempts to subtract 10 from A without altering B. Because consistency is checked after each transaction, it is known that A + B = 100 before the transaction begins. If the transaction removes 10 from A successfully, atomicity will be achieved. However, a validation check will show that A + B = 90, which is inconsistent with the rules of the database. The entire transaction must be cancelled and the affected rows rolled back to their pre-transaction state. If there had been other constraints, triggers, or cascades, every single change operation would have been checked in the same way as above before the transaction was committed.
3. Isolation Failure: To demonstrate isolation, we assume two transactions
execute at the same time, each attempting to modify the same data. One of
the two must wait until the other completes in order to maintain isolation.
Consider two transactions. T1 transfers 10 from A to B. T2 transfers 10 from
B to A. Combined, there are four actions:
T1 subtracts 10 from A.
T1 adds 10 to B.
T2 subtracts 10 from B.
T2 adds 10 to A.
If these operations are performed in order, isolation is maintained, although
T2 must wait.
Consider what happens if T1 fails halfway through. The database eliminates
T1's effects, and T2 sees only valid data.
By interleaving the transactions, the actual order of actions might be:
T1 subtracts 10 from A.
T2 subtracts 10 from B.
T2 adds 10 to A.
T1 adds 10 to B.
Now suppose T1 fails while adding 10 to B. By that point T2 has already added 10 to the modified A, so T1's change cannot be undone without also invalidating T2's work. To maintain isolation, one transaction must therefore wait for the other, as the sketch below shows.
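A toy sketch (plain Python threads, not a real DBMS) of the serialized schedule: a lock forces T2 to wait for T1, so A + B remains 100:

import threading

balance = {"A": 50, "B": 50}
lock = threading.Lock()

def transfer(source, dest, amount):
    with lock:                 # one "transaction" at a time: T2 waits for T1
        balance[source] -= amount
        balance[dest] += amount

t1 = threading.Thread(target=transfer, args=("A", "B", 10))
t2 = threading.Thread(target=transfer, args=("B", "A", 10))
t1.start(); t2.start()
t1.join(); t2.join()
print(balance, "sum =", balance["A"] + balance["B"])   # sum is still 100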
Transaction integrity implies that a transaction should have the ACID properties: any transaction should be atomic, consistent, isolated, and durable. The quality of
a database product is measured by its transactions’ adherence to the ACID
properties:
1. Atomic — all or nothing. Atomic transactions are such that the
transaction is either entirely completed or makes no change to the
database; even if an error or a hardware fault occurs mid-transaction the
database will not be left with a half-completed transaction.
2. Consistent — the database begins and ends the transaction in a
consistent state. Consistent transactions ensure that the database is left
in a consistent state after the transaction is complete, meaning that any
integrity constraints (unique keys, foreign keys, and CHECK constraints)
must be satisfied or the transaction will be rejected.
3. Isolated — one transaction does not affect another transaction.
Isolated transactions are invisible to other users of the database while
they are being processed.
4. Durable — once committed always committed. Durable transactions
guarantee that they will not be rolled back after the caller has committed
them.
5.3. Chapter 5 Questions
1. Define a transaction process. [3]
2. Describe a “well-formed” transaction. [5]
3. Explain the following terms:
a. Commit Transaction [2]
b. Rollback transaction [2]
c. Abort work [2]
d. Atomicity [2]
e. Consistency [2]
f. Isolation [2]
g. Durability [2]
4. List four (4) examples of simple transactions. [4]
5. Explain two (2) reasons why a transaction may fail. [4]
6. Using examples, explain the following terms:
a. Atomicity failure [3]
b. Consistency failure [3]
c. Isolation failure [3]
d. Durability failure [3]
7. Differentiate write-ahead logging from shadowing. [4]
8. Explain three (3) advantages and four (4) challenges that face an
organisation that has implemented a database system for its operations. [14]
9. Cite two situations that may result in an Abort of a transaction process. [4]
10. How can a multi-user database system control concurrency? [6]
11. Differentiate referential integrity from domain integrity. [6]
12. Compare and contrast the two database management processes:
a. Logical controls [5]
b. Physical controls [5]
13. Discuss the impact of data breaches. [25]
14. How is Availability guaranteed in a database system? [6]
15. Differentiate a "cold backup" from a "hot backup". [6]
16. Explain the limitations of manual database monitoring. [5]
17. Explain two consequences of developing a database system without
using any specific model. [6]
18. Is it ideal to implement automated and manual database monitoring systems exclusively? Support your answer. [6]
19. Describe three (3) tests that must be carried out before the
implementation of a database system. [6]
20. List and explain the type of relationships that can be created
between entities. [8]
i. Network transparency is the situation in which an operating system or other service allows a user to access a resource (such as an application program or data) without the user needing to know, and usually not being aware of, whether the resource is located on the local machine (i.e., the computer which the user is currently using) or on a remote machine (i.e., a computer elsewhere on the network).
ii. Fragmentation transparency enables users to query any table as if it were unfragmented. It hides the fact that the table being queried is actually a fragment or a union of fragments, and it conceals the fact that the fragments are located at diverse sites.
iii. Replication transparency is the term used to describe the fact that the user should be unaware that data is replicated.