
Database and SQL


General Introduction to Relational Database Conception

Welcome to the study of Relational Database Conception Principles! As second-year software engineering
students, you are now delving deeper into the core principles that will enable you to design, optimize, and
maintain robust databases. In this course, you will explore the theoretical and practical aspects of how modern databases are structured and how data is managed efficiently.

Relational Databases are foundational to many software applications, providing a structured way to store and
retrieve large amounts of information. They are based on the relational model, which organizes data into
tables (or relations) composed of rows and columns. This model has proven to be powerful for handling a
wide variety of data and queries in business, web applications, and beyond.

Key Concepts You Will Explore:

1. Functional Dependence
This principle helps ensure that data is stored in a way that prevents redundancy and maintains integrity. You will learn how certain attributes (columns) depend on others and how to leverage this for
better database design.
2. Algorithms and Normalization
Database normalization is a systematic way of organizing data to reduce redundancy and improve
integrity. By applying certain algorithms, we can transform complex, inefficient databases into
simpler, more efficient ones. You will study different normal forms, which are guidelines for how
tables should be structured to achieve this efficiency.

3. Normal Forms
Normal forms, such as 1NF, 2NF, 3NF, BCNF, and beyond, are rules used to assess whether a
database is well-structured. You will learn how to decompose large, complex tables into smaller,
well-defined ones to avoid anomalies during data operations.
4. Integrity Constraints (Static and Dynamic)
These constraints ensure that the data in the database remains accurate and consistent over time. Static constraints involve rules that must be true for any data in the database, while dynamic constraints
enforce rules during database updates or transactions. These constraints play a crucial role in
maintaining the reliability of the data.

Why This is Important:



Understanding relational database principles is crucial for building reliable, scalable, and maintainable
software systems. Whether you're building a web application, a mobile app, or a large enterprise system, having a well-designed database ensures that the system performs well and that data remains secure
and consistent.

DB & SQL by Mr. Ndorrh Oswald E. 2024/2023


1) Functional Dependence

Functional Dependence (FD) is a fundamental concept in the theory of relational databases, crucial for
ensuring the accuracy and efficiency of a database's structure. It describes the relationship between two
attributes (or sets of attributes) in a relational database table. Essentially, one attribute (or a set of attributes) is
said to be functionally dependent on another if the value of the first attribute determines the value of the
second.

Definition

In a relation (table) R, an attribute B is functionally dependent on attribute A if, for every valid instance of A in the table, the corresponding value of B is always the same. This is written as:

A → B

This notation reads as: A determines B. In other words, if you know the value of A, you can always determine the value of B.

Example:

Consider a table of employees where each employee has a unique employee ID. The relationship between the Employee ID and Employee Name would be an example of functional dependence because knowing the Employee ID allows you to uniquely determine the Employee Name.

Here, Employee Name is functionally dependent on Employee ID because for each Employee ID, there is a
unique corresponding Employee Name. Mathematically, we express this as:

Employee ID → Employee Name
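The dependency Employee ID → Employee Name can be checked mechanically: a relation satisfies A → B exactly when no value of A is paired with two different values of B. A minimal sketch in Python, using hypothetical sample rows:

```python
# Illustrative check of the functional dependency Employee ID → Employee Name.
# The sample rows are hypothetical; any table with the same shape works.
rows = [
    (101, "Alice"),   # (Employee ID, Employee Name)
    (102, "Bob"),
    (101, "Alice"),   # a repeated pair is fine: same ID, same name
]

def holds(rows):
    """Return True if the first column functionally determines the second."""
    seen = {}
    for a, b in rows:
        if seen.setdefault(a, b) != b:
            return False  # the same A value mapped to two different B values
    return True

print(holds(rows))                        # True: ID determines name
print(holds(rows + [(101, "Carol")]))     # False: 101 now maps to two names
```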



Types of Functional Dependence:



1. Full Functional Dependence


An attribute B is fully functionally dependent on a set of attributes A if it is functionally dependent on A and not on any subset of A. In other words, all parts of A are necessary to determine B.
Example: In a table with Student_ID and Course_Code, a student's final grade is fully functionally
dependent on both Student_ID and Course_Code, since both are needed to uniquely determine the
grade.
2. Partial Functional Dependence
In a partial functional dependence, an attribute B is functionally dependent on only a part of a
composite key (a key that consists of more than one attribute).

Example: If a table has a composite primary key made up of Student_ID and Course_Code, and the
Student_Name only depends on Student_ID but not on Course_Code, this is partial dependence.
3. Transitive Functional Dependence
This occurs when one attribute depends on a second attribute, which in turn depends on a third
attribute. In this case, the third attribute is transitively dependent on the first one.
Example: If Employee_ID → Department_ID and Department_ID → Manager_ID, then we can infer
that Employee_ID → Manager_ID through transitive dependence.

Importance in Database Design:

● Redundancy Reduction: Understanding functional dependencies helps reduce redundancy in the
database. If attributes are functionally dependent on others, there is no need to repeat data
unnecessarily.
● Normalization: Functional dependence is essential for the process of normalization, which is the process of structuring a relational database to minimize redundancy and dependency.
● Data Integrity: Properly identifying and using functional dependencies ensures that the database maintains integrity, meaning the data remains consistent and accurate.

2) Algorithms and Normalization

Normalization is a systematic process of organizing a relational database to minimize redundancy and improve data integrity. It involves dividing large, complex tables into smaller, well-structured ones without losing data relationships. The goal of normalization is to make the database efficient by reducing data
redundancy and eliminating undesirable characteristics such as update, insert, and delete anomalies.

Normalization follows a series of steps known as normal forms, with each step progressively organizing the data to meet stricter requirements. Various algorithms are used to achieve these forms and ensure that the
database structure is optimized.

Key Concepts

1. Normalization: The process of organizing data to reduce redundancy and improve integrity.
2. Normal Forms: The levels of normalization (1NF, 2NF, 3NF, BCNF, etc.) that a database can meet, each ensuring higher levels of organization and reduced redundancy.

Algorithms in Normalization

The normalization process involves using specific algorithms or steps to transform a database into various normal forms. These steps are iterative, with each stage building on the previous one.

Step-by-Step Algorithms for Normalization

1. First Normal Form (1NF): Removing Repeating Groups

To achieve 1NF, ensure that the data in each column of a table is atomic (i.e., indivisible). This means no
repeating groups or arrays are allowed within columns.

Algorithm for 1NF:

● Identify tables that have repeating groups or sets of values in a single column.
● Split those columns into separate rows, ensuring that each value is atomic.

Example: Consider the following table where a student can have multiple courses in a single row:

To transform this into 1NF:

● Split the "Courses" column into multiple rows.


Now the table is in 1NF, as each column contains atomic values.
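The 1NF algorithm above can be sketched in a few lines of Python. The un-normalized rows below are hypothetical: a comma-separated Courses column (a repeating group) is split into atomic rows:

```python
# A hypothetical un-normalized table: one row per student, with courses
# packed into a single comma-separated column (violates 1NF).
unnormalized = [
    (1, "Alice", "Math, Physics"),
    (2, "Bob",   "Chemistry"),
]

# 1NF transformation: split the repeating group so every value is atomic.
first_normal_form = [
    (student_id, name, course.strip())
    for (student_id, name, courses) in unnormalized
    for course in courses.split(",")
]

for row in first_normal_form:
    print(row)
# (1, 'Alice', 'Math'), (1, 'Alice', 'Physics'), (2, 'Bob', 'Chemistry')
```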

2. Second Normal Form (2NF): Eliminating Partial Dependencies



For a table to be in 2NF, it must first be in 1NF, and all non-key attributes must be fully functionally
dependent on the entire primary key (no partial dependencies). This is particularly relevant when dealing with
composite keys (keys made up of more than one attribute).

Algorithm for 2NF:



● Identify any partial dependencies, where a non-key attribute is dependent on only part of a composite
key.
● Remove those attributes and place them in a separate table.

Example: Consider the following table where Course_ID and Student_ID together form the primary key:


In this case, Course_Name depends only on Course_ID, not on the combination of Course_ID and
Student_ID. To eliminate this partial dependency, split the table:

● Course Table:

● Enrollment Table:

Now, both tables are in 2NF.

3. Third Normal Form (3NF): Removing Transitive Dependencies



To achieve 3NF, the table must be in 2NF, and there should be no transitive dependencies. A transitive
dependency exists when one non-key attribute depends on another non-key attribute.

Algorithm for 3NF:



● Identify transitive dependencies (if any attribute depends on a non-key attribute).


● Remove those attributes into a separate table.

Example: Consider the following table where Student_ID determines Advisor_Name, and Advisor_Name
determines Advisor_Office:

Here, Advisor_Office is transitively dependent on Student_ID through Advisor_Name. To eliminate this, split
the table:

● Student Table:

● Advisor Table:

Now the tables are in 3NF.
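The 3NF decomposition above can be sketched the same way with Python's sqlite3 module (the rows are hypothetical). The advisor's office is stored once per advisor, not once per student, and is reached via a join:

```python
import sqlite3

# Sketch of the 3NF split above with hypothetical rows.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE Advisor (
    Advisor_Name   TEXT PRIMARY KEY,
    Advisor_Office TEXT
);
CREATE TABLE Student (
    Student_ID   INTEGER PRIMARY KEY,
    Advisor_Name TEXT REFERENCES Advisor(Advisor_Name)
);
INSERT INTO Advisor VALUES ('Dr. Smith', 'Room 12');
INSERT INTO Student VALUES (1, 'Dr. Smith'), (2, 'Dr. Smith');
""")
# A student's advisor office is reached by joining through the advisor:
office = con.execute("""
    SELECT a.Advisor_Office
    FROM Student s JOIN Advisor a ON s.Advisor_Name = a.Advisor_Name
    WHERE s.Student_ID = 2
""").fetchone()[0]
print(office)  # Room 12
```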

4. Boyce-Codd Normal Form (BCNF): Stricter Version of 3NF


BCNF is a stricter form of 3NF. It deals with situations where a table may have no partial or transitive dependencies but still violates normalization principles because some determinant is not a candidate key.

Algorithm for BCNF:



● Check if every determinant (an attribute or set of attributes that determine others) is a candidate key.

● If any determinant is not a candidate key, split the table.

Integrity Constraints in Normalization:

During the normalization process, integrity constraints are enforced to ensure data accuracy and consistency.

These include:

● Static Constraints: Rules that must hold true at all times (e.g., uniqueness of primary keys).

● Dynamic Constraints: Rules that apply during updates or insertions (e.g., foreign key constraints).

Importance of Algorithms and Normalization:

● Eliminate Redundancy: Reduce duplicate data to save storage space and simplify data management.
● Improve Data Integrity: Ensure data accuracy and consistency by organizing it logically.
● Prevent Anomalies: Minimize update, insert, and delete anomalies that can arise in poorly structured
databases.

● Optimize Query Performance: By splitting data into well-structured tables, queries become more
efficient.

3) Normal Forms in Database Normalization


Normal forms are the guidelines or steps used to organize data in a relational database to reduce redundancy
and improve data integrity. Each normal form builds upon the previous one, adding stricter rules to eliminate
anomalies and ensure data consistency. Understanding these forms is crucial to ensuring an efficient and effective database design.

There are several normal forms (NFs), each addressing specific issues in database design. The most
commonly used ones are 1NF, 2NF, 3NF, and BCNF.

1. First Normal Form (1NF): Eliminate Repeating Groups

A table is in First Normal Form (1NF) if:

● Each column contains only atomic (indivisible) values.
● There are no repeating groups or arrays in the table.

Key Requirement: All values in a column must be of the same type, and there should be no multiple values
within a single field.
Example: Consider a table where a student can enroll in multiple courses:

This table violates 1NF because the Courses column contains multiple values. To bring it into 1NF, we split the multiple values into separate rows:



Now, the table is in 1NF, as each column contains atomic values.

2. Second Normal Form (2NF): Eliminate Partial Dependencies

A table is in Second Normal Form (2NF) if:

● It is already in 1NF.
● There are no partial dependencies; i.e., all non-key attributes are fully dependent on the primary key.

Key Requirement: Every non-key attribute should depend on the whole primary key, not just part of it. This is especially important when dealing with composite keys (keys made up of multiple columns).

Example: Consider a table with a composite primary key of Student_ID and Course_ID:

Here, Student_Name depends only on Student_ID, not on the combination of Student_ID and
Course_ID. To remove this partial dependency, we split the table:

● Student Table:

● Enrollment Table:

Now, both tables are in 2NF.



3. Third Normal Form (3NF): Eliminate Transitive Dependencies

A table is in Third Normal Form (3NF) if:

● It is already in 2NF.
● There are no transitive dependencies; i.e., non-key attributes do not depend on other non-key
attributes.

Key Requirement: Every non-key attribute should depend only on the primary key, not on any other non-key
attributes.

Example: Consider the following table:

Here, Advisor_Office depends on Advisor_Name, which is itself dependent on Student_ID. This is
a transitive dependency. To eliminate it, we split the table:

● Student Table:

● Advisor Table:

Now, both tables are in 3NF, as there are no transitive dependencies.

4. Boyce-Codd Normal Form (BCNF): Stricter Version of 3NF



A table is in Boyce-Codd Normal Form (BCNF) if:

● It is already in 3NF.

● Every determinant (an attribute that determines others) is a candidate key.

Key Requirement: If a non-trivial functional dependency exists, the left side of the dependency must be a super key (a unique identifier).

Example: Consider the following table where Course_ID determines the Instructor, but
Instructor is not a candidate key:


Here, Course_ID → Instructor is fine, but Instructor → Room violates BCNF because
Instructor is not a candidate key. To satisfy BCNF, we must split the table:

● Course Table:

● Instructor Table:

Now, both tables meet the BCNF requirements.
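A sketch of the BCNF decomposition, again using Python's sqlite3 module and hypothetical data. The determinant Instructor becomes the key of its own table, so Instructor → Room no longer violates BCNF:

```python
import sqlite3

# Sketch of the BCNF split above (hypothetical data): each determinant
# is now the key of its own table.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE Instructor (
    Instructor TEXT PRIMARY KEY,
    Room       TEXT
);
CREATE TABLE Course (
    Course_ID  TEXT PRIMARY KEY,
    Instructor TEXT REFERENCES Instructor(Instructor)
);
INSERT INTO Instructor VALUES ('Dr. Lee', 'B204');
INSERT INTO Course VALUES ('CS101', 'Dr. Lee'), ('CS202', 'Dr. Lee');
""")
# The room is stored once per instructor and reached through a join:
room = con.execute("""
    SELECT i.Room FROM Course c
    JOIN Instructor i ON c.Instructor = i.Instructor
    WHERE c.Course_ID = 'CS202'
""").fetchone()[0]
print(room)  # B204
```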

Other Normal Forms:



● Fourth Normal Form (4NF): Removes multi-valued dependencies, where one attribute can have
multiple independent values.
● Fifth Normal Form (5NF): Deals with join dependencies, ensuring that any combination of data can
be reconstructed from the smaller tables without redundancy.

Importance of Normal Forms:



● Reduces Redundancy: By applying these rules, redundant data is minimized, which saves space and
reduces the risk of data inconsistency.
● Eliminates Anomalies: Insert, update, and delete anomalies are avoided by organizing data into
well-structured tables.
● Improves Data Integrity: Ensures that the database remains consistent and accurate over time.

● Optimizes Performance: Well-structured tables improve query performance and make database
management easier.

4) Integrity Constraints in Database Systems


Integrity constraints are rules applied to a database to ensure the accuracy and consistency of data. These
constraints help maintain the quality of the data by restricting the types of operations that can be performed.
Without integrity constraints, data can become unreliable and inconsistent, leading to a breakdown in the utility of the database.

There are two major types of integrity constraints:

● Static Constraints: Rules that apply to the structure and content of the data at all times.
● Dynamic Constraints: Rules that apply during updates, insertions, and deletions of data (during transactions).

1. Static Integrity Constraints

Static integrity constraints ensure that the data in a database satisfies certain conditions at all times. These constraints are generally enforced by the database management system (DBMS) and are checked before any data is added to or modified in the database.
Types of Static Integrity Constraints:

● Primary Key Constraint: Ensures that every row in a table has a unique identifier, and the values in
the primary key field must be unique and non-null.

Example: In a table of Employees, each employee must have a unique Employee_ID:


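A minimal sketch of this constraint in action, using Python's sqlite3 module and a hypothetical Employees schema: inserting a duplicate Employee_ID raises an integrity error.

```python
import sqlite3

# Sketch (hypothetical schema): the primary key rejects a duplicate ID.
con = sqlite3.connect(":memory:")
con.execute("""
    CREATE TABLE Employees (
        Employee_ID   INTEGER PRIMARY KEY,
        Employee_Name TEXT NOT NULL
    )
""")
con.execute("INSERT INTO Employees VALUES (1, 'Alice')")
try:
    con.execute("INSERT INTO Employees VALUES (1, 'Bob')")  # duplicate key
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True
print(duplicate_rejected)  # True
```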

The primary key constraint ensures that no two rows can have the same Employee_ID, and it cannot be null.

● Unique Constraint:

Ensures that all the values in a column are unique, but unlike a primary key, it allows for null values.
Example: A table of Customers where the Email address must be unique for each customer:


● Foreign Key Constraint:

Ensures that a value in one table corresponds to a valid value in another table, maintaining referential
integrity.
Example: In a table of Orders, the Customer_ID must reference a valid Customer_ID in the Customers table:

If a value is entered in Customer_ID that does not exist in the Customers table, the database will reject
the entry.
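A runnable sketch of this behavior with Python's sqlite3 module (hypothetical schema). Note that SQLite only enforces foreign keys after PRAGMA foreign_keys = ON:

```python
import sqlite3

# Sketch of the Orders → Customers foreign key (hypothetical schema).
con = sqlite3.connect(":memory:")
con.execute("PRAGMA foreign_keys = ON")  # SQLite-specific: enable enforcement
con.executescript("""
CREATE TABLE Customers (Customer_ID INTEGER PRIMARY KEY);
CREATE TABLE Orders (
    Order_ID    INTEGER PRIMARY KEY,
    Customer_ID INTEGER REFERENCES Customers(Customer_ID)
);
INSERT INTO Customers VALUES (10);
""")
con.execute("INSERT INTO Orders VALUES (1, 10)")      # valid reference
try:
    con.execute("INSERT INTO Orders VALUES (2, 99)")  # customer 99 does not exist
    rejected = False
except sqlite3.IntegrityError:
    rejected = True
print(rejected)  # True
```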

● Not Null Constraint:

Ensures that a column cannot have a null value.


Example: In a table of Products, the Product_Name column cannot be left empty (null).

● Check Constraint: Ensures that values in a column meet a specific condition.



Example: A Salary column in the Employees table could have a check constraint to ensure that salaries
are always greater than zero:
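A sketch of such a check constraint with Python's sqlite3 module (hypothetical schema): a negative salary is rejected.

```python
import sqlite3

# Sketch of a CHECK constraint keeping Salary positive (hypothetical schema).
con = sqlite3.connect(":memory:")
con.execute("""
    CREATE TABLE Employees (
        Employee_ID INTEGER PRIMARY KEY,
        Salary      REAL CHECK (Salary > 0)
    )
""")
con.execute("INSERT INTO Employees VALUES (1, 2500.0)")     # accepted
try:
    con.execute("INSERT INTO Employees VALUES (2, -50.0)")  # violates CHECK
    negative_rejected = False
except sqlite3.IntegrityError:
    negative_rejected = True
print(negative_rejected)  # True
```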

2. Dynamic Integrity Constraints

Dynamic integrity constraints are rules that govern the behavior of data when it is modified through inserts,
updates, or deletions. These constraints ensure that transactions follow the business rules or data validation
rules of the system.

Types of Dynamic Integrity Constraints:

● Referential Integrity: Ensures that changes to one table do not invalidate relationships with another
table. It prevents situations where an entry in a table refers to non-existent data in a related table.
Example: If an Order references a Customer_ID, deleting that customer from the Customers
table should either be prevented (restrict) or cascade the delete to the Orders table (cascade delete).
● Transaction Integrity (ACID properties): Ensures that transactions in a database follow the ACID
properties—Atomicity, Consistency, Isolation, and Durability:

○ Atomicity: Ensures that a transaction is either fully completed or fully rolled back, preventing
partial updates.

○ Consistency: Ensures that the database transitions from one valid state to another.
○ Isolation: Ensures that concurrent transactions do not affect each other.
○ Durability: Ensures that once a transaction is committed, it remains in the system even in case
of a system crash.

● Example: A bank transaction that transfers money between two accounts should ensure that both the
debit from one account and the credit to another account occur together. If one fails, both should fail.
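A sketch of such an atomic transfer using Python's sqlite3 module. The account rows are hypothetical, and a CHECK constraint stands in for the business rule that balances may not go negative; when the rule is violated, the whole transaction is rolled back:

```python
import sqlite3

# Sketch of an atomic transfer (hypothetical accounts): either both the
# debit and the credit happen, or neither does.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE Accounts (Id INTEGER PRIMARY KEY, Balance REAL CHECK (Balance >= 0))")
con.execute("INSERT INTO Accounts VALUES (1, 100.0), (2, 0.0)")
con.commit()

def transfer(con, src, dst, amount):
    try:
        with con:  # one transaction: commits on success, rolls back on error
            con.execute("UPDATE Accounts SET Balance = Balance - ? WHERE Id = ?", (amount, src))
            con.execute("UPDATE Accounts SET Balance = Balance + ? WHERE Id = ?", (amount, dst))
        return True
    except sqlite3.IntegrityError:
        return False  # e.g. the debit would overdraw: both updates are undone

print(transfer(con, 1, 2, 60.0))    # True: balances become 40 / 60
print(transfer(con, 1, 2, 500.0))   # False: would overdraw, nothing changes
balances = [row[0] for row in con.execute("SELECT Balance FROM Accounts ORDER BY Id")]
print(balances)  # [40.0, 60.0]
```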

● Triggers: A form of dynamic constraint that automatically performs actions when specific conditions occur in the database. Triggers can be used to enforce business rules dynamically.
Example: A trigger that automatically updates the Total_Salary when a new Bonus is added to the
Employees table:
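A sketch of such a trigger using Python's sqlite3 module. The column names follow the example, but the schema and rows are hypothetical: whenever Bonus changes, Total_Salary is recomputed automatically.

```python
import sqlite3

# Sketch of the trigger described above (hypothetical Employees schema).
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE Employees (
    Employee_ID  INTEGER PRIMARY KEY,
    Salary       REAL,
    Bonus        REAL DEFAULT 0,
    Total_Salary REAL
);
CREATE TRIGGER update_total_salary
AFTER UPDATE OF Bonus ON Employees
BEGIN
    UPDATE Employees
    SET Total_Salary = NEW.Salary + NEW.Bonus
    WHERE Employee_ID = NEW.Employee_ID;
END;
INSERT INTO Employees VALUES (1, 3000.0, 0, 3000.0);
UPDATE Employees SET Bonus = 500.0 WHERE Employee_ID = 1;
""")
total = con.execute(
    "SELECT Total_Salary FROM Employees WHERE Employee_ID = 1"
).fetchone()[0]
print(total)  # 3500.0
```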

● Assertions:

Constraints that can involve multiple tables and are checked dynamically when data changes. Assertions are used to express complex business rules that go beyond simple column-based constraints.
Example: An assertion that checks that the number of Managers in a department does not exceed a certain
limit:


Static vs Dynamic Integrity Constraints:

● Static Constraints apply at all times and are enforced when data is inserted or modified, ensuring data structure rules are followed (e.g., primary keys, unique constraints).
● Dynamic Constraints are rules that apply to database modifications and transactions, ensuring that updates, inserts, and deletes follow certain logic (e.g., referential integrity, triggers).

Importance of Integrity Constraints:

1. Maintain Data Accuracy and Consistency: Integrity constraints ensure that data entered into the
database follows the predefined rules, preventing the introduction of inaccurate or inconsistent data.

2. Prevent Anomalies: They help avoid anomalies such as incorrect updates, invalid deletions, and
erroneous insertions.

3. Enforce Business Rules: Integrity constraints enforce the rules that are critical for the application or
organization’s business logic, such as ensuring salaries are never negative or that certain relationships
between entities always hold.
4. Ensure Database Reliability: With well-defined integrity constraints, the database remains reliable
and performs as expected even in complex transactional environments.

SQL Language: An Overview

SQL (Structured Query Language) is the standard language used to interact with relational databases. It
allows users to create, query, update, and manage data in a database system. As a software engineer, mastering
SQL is crucial, as it enables you to perform a variety of operations on a database with efficiency and
precision.

Key Areas of SQL:

1. Data Definition Language (DDL): Deals with the structure of the database schema.

○ CREATE: Used to create a new database, table, index, or view.
○ ALTER: Used to modify an existing database structure.

○ DROP: Used to delete objects such as tables, indexes, or databases.
○ TRUNCATE: Used to remove all rows from a table without logging individual row deletions.
2. Data Manipulation Language (DML): Deals with the manipulation of data stored in the database.
○ SELECT: Retrieves data from the database.

○ INSERT: Adds new data to a table.
○ UPDATE: Modifies existing data in a table.

○ DELETE: Removes data from a table.
3. Data Control Language (DCL): Manages access to the database.

○ GRANT: Provides privileges to users.
○ REVOKE: Removes privileges from users.
4. Transaction Control Language (TCL): Manages the changes made by DML operations.
○ COMMIT: Saves the changes made by a transaction permanently.
○ ROLLBACK: Undoes changes made in the current transaction.
○ SAVEPOINT: Sets a point in a transaction to which a rollback can revert.

Important SQL Concepts



1. Creating Tables (DDL)

SQL allows you to define the structure of your data with the CREATE statement. This includes defining
tables, columns, and the data types for each column.

Example:


CREATE TABLE Employees (
    Employee_ID INT PRIMARY KEY,
    Employee_Name VARCHAR(100),
    Department VARCHAR(50),
    Salary DECIMAL(10, 2)
);

2. Selecting Data (DML)

The SELECT statement is used to retrieve data from a database. You can filter, sort, and aggregate data using various SQL clauses.

Basic SELECT Query:


SELECT Employee_Name, Salary
FROM Employees
WHERE Department = 'IT';

● WHERE Clause: Filters records based on conditions.
● ORDER BY: Sorts results.
● GROUP BY: Groups data for aggregation.
● HAVING: Filters records after aggregation.

Example with Aggregation:



SELECT Department, AVG(Salary) AS Avg_Salary
FROM Employees
GROUP BY Department
HAVING AVG(Salary) > 5000;


3. Inserting Data (DML)

The INSERT INTO statement is used to add new data to a table.

Example:

INSERT INTO Employees (Employee_ID, Employee_Name, Department, Salary)
VALUES (1001, 'John Doe', 'Sales', 5500.00);

4. Updating Data (DML)

The UPDATE statement allows you to modify existing data in a table.

Example:
UPDATE Employees
SET Salary = 6000
WHERE Employee_ID = 1001;



5. Deleting Data (DML)



The DELETE statement removes data from a table.



Example:

DELETE FROM Employees
WHERE Employee_ID = 1001;

6. Joins

Joins are used to combine data from multiple tables. Common types of joins include:

● INNER JOIN: Returns records with matching values in both tables.

● LEFT JOIN (LEFT OUTER JOIN): Returns all records from the left table and the matched records
from the right table.

● RIGHT JOIN (RIGHT OUTER JOIN): Returns all records from the right table and the matched
records from the left table.
● FULL OUTER JOIN: Returns all records when there is a match in either table.

Example:


SELECT Employees.Employee_Name, Departments.Department_Name
FROM Employees
INNER JOIN Departments ON Employees.Department = Departments.Department_ID;

7. Subqueries

A subquery is a query within another query. It is used to perform operations in stages, retrieving intermediate results that can be used in a larger query.

Example:


SELECT Employee_Name
FROM Employees
WHERE Salary > (SELECT AVG(Salary) FROM Employees);


8. Constraints

SQL supports various constraints to enforce rules on data columns, such as:

● NOT NULL: Ensures that a column cannot have NULL values.
● UNIQUE: Ensures all values in a column are different.
● PRIMARY KEY: Uniquely identifies each row in a table.
● FOREIGN KEY: Ensures the integrity of references between tables.
● CHECK: Ensures that the values in a column meet a specific condition.

Example:

CREATE TABLE Orders (

Order_ID INT PRIMARY KEY,

Customer_ID INT,
Order_Date DATE,

CONSTRAINT fk_customer FOREIGN KEY (Customer_ID)



REFERENCES Customers(Customer_ID)

);

9. Indexes

An index is used to speed up queries by creating a quick lookup table for data.

Example:


CREATE INDEX idx_employee_name ON Employees(Employee_Name);


10. Views

A view is a virtual table based on the result set of an SQL query. It does not store data itself, but rather
retrieves data stored in other tables.

Example:


CREATE VIEW EmployeeView AS

SELECT Employee_Name, Department

FROM Employees

WHERE Salary > 5000;

SQL for Transaction Management

1. COMMIT and ROLLBACK



● COMMIT: Saves all changes made during the transaction.


● ROLLBACK: Reverts changes to the last COMMIT or SAVEPOINT.

Example:


BEGIN TRANSACTION;

UPDATE Employees

SET Salary = 5000

WHERE Employee_ID = 1001;


-- If something goes wrong, rollback the changes

ROLLBACK;

-- If everything is fine, commit the changes

COMMIT;

2. SAVEPOINT

Allows you to define points within a transaction to which you can roll back if necessary.

Example:

SAVEPOINT SavePoint1;

UPDATE Employees

SET Salary = 6000

WHERE Employee_ID = 1002;



ROLLBACK TO SavePoint1; -- Undo changes after SavePoint1



Practical Uses of SQL

1. Data Analysis: SQL is frequently used for data analysis, extracting meaningful insights from large
datasets by querying the database.

2. Database Management: SQL allows database administrators (DBAs) to manage user access, security,
backups, and other administrative tasks.
3. Reporting: SQL is used to generate reports by selecting and summarizing data from one or more
tables.

Questions for SQL Language

1. Data Definition (DDL) (20 Marks)

a) Write an SQL statement to create a Customers table with the following fields: Customer_ID (Primary
Key), Customer_Name, Phone, and Email. (5 Marks)

b) Modify the Customers table by adding a new column Address of type VARCHAR(255). (3 Marks)

c) Drop the Phone column from the Customers table. (2 Marks)

d) Write the SQL command to remove all rows from the Orders table without removing the table itself. (3
Marks)

e) Define the difference between TRUNCATE and DELETE in SQL. (7 Marks)
2. Data Manipulation (DML) (20 Marks)

a) Write an SQL query to insert a new record into the Orders table with the following data: Order_ID =

101, Customer_ID = 501, Order_Date = '2024-10-10'. (5 Marks)



b) Update the Salary of all employees in the IT department by 10%. (5 Marks)

c) Write a query to delete all orders placed before the year 2023 from the Orders table. (5 Marks)

d) Explain how the GROUP BY clause is used with an example. (5 Marks)



3. Joins and Subqueries (20 Marks)



a) Write a query to retrieve the names of all employees and their department names using an INNER JOIN
between Employees and Departments tables. (5 Marks)

b) Explain the difference between LEFT JOIN and RIGHT JOIN with examples. (5 Marks)

c) Write a subquery to find the employees whose salary is greater than the average salary of all employees. (5
Marks)

d) Write an SQL query to find the second-highest salary from the Employees table. (5 Marks)

Database Administration
Database Administration involves the tasks and responsibilities required to manage, maintain, and protect a
database system. Database administrators (DBAs) ensure that databases are secure, available, and efficiently
managed, allowing users and applications to access and manipulate data as needed.

Databases are at the heart of modern information systems, supporting business operations, analytics, and
decision-making. The role of a DBA is crucial in ensuring that these systems run smoothly, are optimized for

performance, and are protected against risks such as data corruption, unauthorized access, and system failures.

Key responsibilities of database administrators include:

● Installation and Configuration: Setting up the database software and configuring it according to the
system’s requirements.

● Data Backup and Recovery: Ensuring that data is backed up regularly and can be restored in case of
data loss.

● Performance Monitoring and Optimization: Regularly monitoring the performance of the database
and making adjustments to improve efficiency.
● Security Management: Protecting the database from unauthorized access and ensuring compliance
with data security standards.

● Managing Data Integrity: Ensuring that data within the database remains accurate and consistent.
● Handling User Access: Managing user permissions and ensuring that only authorized users can access
specific data.
● Troubleshooting: Identifying and resolving database issues to minimize downtime.

Physical Implementation of Data in Database Administration


The physical implementation of data refers to the way in which data is stored and organized on the
physical storage medium (such as hard drives, SSDs, or cloud storage). This level of database administration
deals with how the database system physically manages data storage and retrieval, optimizing it for

performance and ensuring data availability.

Key Concepts in Physical Data Implementation:



1. Storage Structures:
○ Tablespaces and Datafiles: Databases use storage structures called tablespaces, which are logical units that group together datafiles, the physical files on disk that store the actual data. Each datafile belongs to a specific tablespace.
○ Example: In Oracle databases, tablespaces such as USER_DATA or TEMP_DATA consist of datafiles located on physical storage devices.
2. Data Block Management:
○ Data is stored in blocks (also called pages), which are the smallest unit of data that a database can read or write. A block consists of rows and columns and is the foundation of data storage.
○ Block Size: The size of each block (e.g., 8KB, 16KB) can affect performance, and DBAs may adjust this parameter based on workload requirements.
3. Partitioning:
○ Partitioning divides large tables or indexes into smaller, more manageable pieces, which can be stored across different disks or locations. This improves performance by allowing faster data retrieval and more efficient management of large datasets.
○ Example: A large sales database may partition data by date range (e.g., monthly or yearly partitions) so that queries targeting specific dates run faster.
4. RAID (Redundant Array of Independent Disks):
○ RAID technology is used to combine multiple physical disks into a single logical unit for redundancy and performance. Different RAID levels (RAID 0, RAID 1, RAID 5, etc.) offer varying levels of data redundancy (backup) and performance enhancement.
○ Example: RAID 1 mirrors data across two disks, ensuring that if one disk fails, the other retains an exact copy of the data.
5. Indexes:
○ Indexes are auxiliary structures that help speed up data retrieval by allowing the database to find data quickly without scanning the entire table. Indexes are stored physically, and DBAs must manage their creation and maintenance for optimal performance.
○ Types of Indexes:
■ B-Tree Indexes: Used for fast, ordered retrieval of data.
■ Bitmap Indexes: Ideal for columns with low cardinality (few distinct values, such as gender or status).
6. Data Compression:
○ Compression reduces the physical space required to store data by removing redundancy. This can improve disk usage and performance, especially in read-heavy databases.
○ Example: Data in a historical records table could be compressed to save storage space while maintaining access speed.
7. Buffer Cache Management:
○ Databases use a buffer cache to temporarily store data blocks in memory for faster access. DBAs manage how much of the system's RAM is allocated to cache, which greatly impacts database performance.
8. Backup and Archiving:
○ Physical implementation includes setting up reliable backup solutions (full, incremental, or differential) and configuring archiving mechanisms to store older data offsite or on secondary storage.
○ Example: A daily incremental backup system can save only the data that has changed since the last backup, reducing the time and storage space needed.
9. Data Encryption:
○ Physical-level encryption ensures that data stored on disk is encrypted, providing security even if the physical storage device is compromised. This is often required for compliance with data protection regulations.
○ Example: Encrypting sensitive customer data at rest using Transparent Data Encryption (TDE).
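The date-range partitioning described above can be sketched in SQL. The table, columns, and partition names below are hypothetical, and the syntax shown is MySQL-style (Oracle and other systems use slightly different keywords):

```sql
-- Hypothetical Sales table split into one partition per year
CREATE TABLE Sales (
    Sale_ID INT,
    Sale_Date DATE,
    Amount DECIMAL(10, 2)
)
PARTITION BY RANGE (YEAR(Sale_Date)) (
    PARTITION p2022 VALUES LESS THAN (2023),
    PARTITION p2023 VALUES LESS THAN (2024),
    PARTITION pmax VALUES LESS THAN MAXVALUE
);
```

A query that filters on Sale_Date can then scan only the matching partition instead of the whole table.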


Importance of Physical Implementation:

● Performance Optimization: Efficient physical implementation of data helps in improving query


performance, data retrieval times, and overall system responsiveness.
● Data Availability and Reliability: Properly configuring physical storage ensures that data remains
available even during hardware failures or storage issues (e.g., through RAID, partitioning, and

backup strategies).
● Storage Management: By organizing how data is physically stored, DBAs ensure that the database
uses storage space effectively and that expanding data storage needs can be met without disrupting

operations.
● Security: Physical encryption and secure storage mechanisms protect data from unauthorized
access or theft, ensuring compliance with security policies and regulations.

Structure of the File and Index in Database Administration

In database administration, understanding the structure of files and indexes is crucial for optimizing data
storage and retrieval. Both files and indexes play vital roles in how databases physically store, organize, and
access data. Proper management of these structures can significantly improve database performance and
efficiency.

1. Structure of Files

Files in a database system refer to the physical storage of data on disk. The database uses several types of
files to store various elements, such as data, indexes, transaction logs, and more. The structure of files
determines how the data is organized, stored, and accessed efficiently.

Types of Database Files:



1. Data Files:
○ Data files store the actual data that is inserted into the database. They contain the records, tables, and indexes that users interact with.
○ Data is stored in a structured format, often in blocks or pages (fixed-size units, such as 8KB or 16KB).
○ These blocks are the smallest units of data storage that the database can read or write.
○ Example: In Oracle databases, data is stored in files with extensions like .dbf, while in SQL Server, data files have .mdf and .ndf extensions.
2. Log Files:
○ Log files, or transaction logs, record all changes made to the database. These logs ensure the database can recover from failures by replaying transactions that were committed before the crash.
○ Transaction logs also play a role in ensuring ACID properties (Atomicity, Consistency, Isolation, Durability).
○ Example: In SQL Server, log files typically have the .ldf extension.
3. Control Files:
○ Control files contain metadata about the database structure, such as the names and locations of data files, log files, and other important configuration information.
4. Temporary Files:
○ These files are used to store temporary data during operations such as sorting and hashing. They are cleared once the operation is completed.
5. Redo/Undo Log Files:
○ Redo logs capture all transactions for recovery purposes. Undo logs help roll back uncommitted changes in case of failure, ensuring data integrity.

File Organization:

● Heap Files: This is a simple, unordered file structure where records are placed in the order they are
inserted. There is no specific organization of data, making retrieval slower for large datasets.
● Sorted Files: In sorted files, records are stored in sorted order based on one or more attributes. This
organization improves search performance for queries involving the sorted attribute.

● Clustered Files: In a clustered file, data is physically stored based on a clustering index, which
groups related data together for faster retrieval.

Importance of File Structure:

● Efficiency: Well-organized files help in efficient data retrieval and management, improving database
performance.
● Data Integrity: Proper structuring of files, especially log files, ensures data integrity during
transactions and system crashes.
● Storage Optimization: Effective file organization optimizes disk space usage and ensures that large
datasets can be stored and accessed without unnecessary overhead.

2. Structure of Indexes

Indexes in a database are auxiliary structures that help speed up data retrieval. They act like a "lookup table"
for the database, allowing it to quickly find the desired data without scanning the entire table.

Types of Indexes:

1. B-Tree Index:
○ The B-Tree (Balanced Tree) is the most commonly used indexing structure. It stores data in a sorted, hierarchical tree structure, allowing efficient searching, inserting, and deleting operations.
○ B-Tree indexes are ideal for range queries, such as finding values between a specific range of dates or numbers.
○ Example: If you have a table with employee records and want to create an index on the Employee_ID column:
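A minimal sketch of such a statement (the index name idx_employee_id is an arbitrary choice; most systems build a B-Tree index by default):

```sql
CREATE INDEX idx_employee_id ON Employees(Employee_ID);
```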

2. Bitmap Index:

● Bitmap indexes are particularly useful for columns with low cardinality (few distinct values), such as
gender or status columns.
● In a bitmap index, each distinct value is associated with a bitmap, and each bit represents whether a
record contains that value.

Example: A bitmap index on a Gender column would have two bitmaps: one for Male and one for Female.

3. Hash Index:

● A hash index uses a hash function to distribute values across buckets. This provides constant-time
access for equality-based searches but is not suitable for range queries.
● Hash indexes are best used when you frequently query for specific values (e.g., searching for a
customer by ID).

Example: Hash indexes are commonly used in NoSQL databases like MongoDB, where queries typically
search for specific document keys.

4. Clustered Index:

● A clustered index sorts the physical order of the data in the table based on the indexed column(s). A table can have only one clustered index because the data can be sorted in only one way.
● When you create a clustered index, the rows are stored on disk in the order of the index. This can significantly improve performance for range queries.

Example: In SQL Server, creating a clustered index on a column:
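A minimal T-SQL sketch (the index name is illustrative):

```sql
-- Physically orders the Employees rows by Employee_ID
CREATE CLUSTERED INDEX idx_employees_id ON Employees(Employee_ID);
```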



5. Non-Clustered Index:

● A non-clustered index does not affect the physical order of the data in the table. Instead, it creates a separate structure that points to the actual data.


● You can create multiple non-clustered indexes on a table, allowing faster access for different types of
queries.
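A non-clustered index might be created like this (SQL Server syntax; the index name and column choice are illustrative):

```sql
-- Separate lookup structure; the table's row order is unchanged
CREATE NONCLUSTERED INDEX idx_employees_dept ON Employees(Department);
```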

6. Composite Index:

● A composite index is an index that includes more than one column. These are useful when queries
involve filtering or sorting by multiple columns.

Example: Creating a composite index on Employee_Name and Department:
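A sketch of such an index (the index name is an arbitrary choice):

```sql
-- Useful for queries that filter or sort on both columns
CREATE INDEX idx_name_dept ON Employees(Employee_Name, Department);
```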

Index Structure Components:

● Index Key: The column or columns on which the index is created. It defines how the data is sorted
and organized in the index.
● Pointers: An index contains pointers to the actual location of the data in the table. These pointers
allow the database to quickly navigate to the correct row(s).
● Leaf Nodes: In B-Tree indexes, the leaf nodes store the actual data or pointers to the data.

Index Storage:

● Sparse Index: Only stores entries for some records, typically pointing to the block where the data can

be found.
● Dense Index: Stores an entry for every record in the table, making it faster to locate data but
requiring more storage space.

Index Maintenance:

● Index Rebuilding: Over time, indexes may become fragmented as data is inserted, updated, or

deleted. Rebuilding the index reorganizes the data, improving performance.
● Index Statistics: These are metadata that provide information about the distribution of data in

indexed columns. They help the query optimizer choose the best index for a query.
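In SQL Server, for example, rebuilding an index and refreshing its statistics might look like this (the index name reuses the earlier idx_employee_name example):

```sql
-- Rebuild the physical index structure to remove fragmentation
ALTER INDEX idx_employee_name ON Employees REBUILD;

-- Refresh the optimizer's statistics for the table
UPDATE STATISTICS Employees;
```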

Importance of Index Structure:


● Performance Improvement: Indexes significantly improve the speed of data retrieval operations,
especially for large datasets.
● Efficient Query Execution: Proper indexing allows the database to quickly locate the required data
without scanning the entire table.

● Storage Optimization: Indexes use additional disk space but reduce the load on the database by
improving query execution time, especially for complex queries.

Summary: Structure of Files and Indexes



● Files: Handle the physical storage of data, including data files, log files, control files, and temporary
files. The organization of files (heap, sorted, or clustered) impacts how efficiently data can be
retrieved and stored.

● Indexes: Help speed up data retrieval by providing quick access paths to the data. Different types of
indexes (B-tree, bitmap, hash, clustered, etc.) are used based on the nature of the data and the types

of queries.

Efficient management of file structures and indexes is critical in database administration to ensure high
performance, optimized storage, and data integrity.

Control of Concurrent Access in Database Administration
Concurrent access refers to the situation where multiple users or processes attempt to access or modify the
same database at the same time. In a multi-user database environment, controlling concurrent access is
crucial to ensure that transactions are executed accurately, data integrity is maintained, and performance is
optimized. Database administrators (DBAs) use various techniques and mechanisms to manage concurrent
access and prevent issues like data corruption, deadlocks, or race conditions.

Challenges of Concurrent Access

When multiple transactions are executed simultaneously in a database, several problems can arise if

concurrency is not controlled:

1. Lost Updates: Occurs when two transactions read the same data and then modify it, leading to one
transaction overwriting the changes of the other.

○ Example: If two users read the same bank balance and both attempt to update it (e.g.,
withdrawing money), one update might be lost.

2. Dirty Reads: Occurs when a transaction reads data that has been modified by another transaction
but not yet committed.
○ Example: A transaction might read a value that is later rolled back by another transaction,
leading to incorrect results.

3. Non-repeatable Reads: Occurs when a transaction reads the same data multiple times and gets
different results because another transaction has modified the data in the meantime.
○ Example: A transaction reads a customer’s address, but another transaction updates the
address before the first transaction reads it again.
4. Phantom Reads: Occurs when a transaction reads a set of rows that match a condition but then finds
different rows when it reads again because another transaction has inserted or deleted rows.

○ Example: A transaction reads a list of orders, but another transaction inserts a new order
before the list is read again, causing the first transaction to see a different result.

Techniques to Control Concurrent Access



To prevent these issues, databases use various techniques to manage concurrent access:

1. Locks

Locking is the most common method used by databases to control concurrent access to data. A lock is a
mechanism that prevents other transactions from accessing the same data until the current transaction has

completed its work.

Types of Locks:

1. Shared Lock (S):


○ A shared lock allows multiple transactions to read the same data simultaneously, but it
prevents any transaction from modifying the data while the shared lock is held.

○ Example: Multiple users can read the same product information at the same time, but none
can update it.
2. Exclusive Lock (X):
○ An exclusive lock allows only one transaction to modify the data, and it prevents all other
transactions from reading or modifying the data while the lock is held.
○ Example: When a user updates an employee’s salary, no other user can read or modify that
employee’s data until the update is complete.
3. Row-Level Lock:

○ Locks an individual row in a table. This allows multiple transactions to work on different rows
of the same table without interfering with each other.
○ Example: Multiple transactions can update different rows of an Orders table concurrently,

each locking only the row it is working on.
4. Table-Level Lock:
○ Locks an entire table, preventing any other transaction from reading or modifying any rows in
that table until the lock is released.

○ Example: A bulk update operation might lock the entire Customers table to ensure that no
one else can modify it during the update.

5. Intent Locks:
○ Intent locks are used by the DBMS to signal that a transaction intends to acquire a more

N
restrictive lock (e.g., an exclusive lock). These locks are used to prevent conflicts at higher
levels (such as a table-level lock) when lower-level locks (such as row-level locks) are in use.
By
Locking Example:
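A sketch of explicit row locking using SELECT ... FOR UPDATE, which is supported by systems such as MySQL, PostgreSQL, and Oracle (the data values are illustrative):

```sql
BEGIN TRANSACTION;

-- Acquire an exclusive row-level lock on this employee's row;
-- other transactions that try to modify it must wait
SELECT Salary
FROM Employees
WHERE Employee_ID = 1001
FOR UPDATE;

UPDATE Employees
SET Salary = Salary + 500
WHERE Employee_ID = 1001;

COMMIT; -- the lock is released here
```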

2. Locking Protocols

● Two-Phase Locking (2PL):


○ In two-phase locking, a transaction acquires all the locks it needs before it starts releasing
any. This ensures that the transaction does not interfere with other transactions once it has
begun executing.
○ Growing Phase: The transaction acquires locks.
○ Shrinking Phase: The transaction releases locks.

○ Strict 2PL: The transaction holds all locks until it commits, ensuring serializability and
preventing dirty reads.
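The two phases can be annotated on an ordinary transaction. This is a conceptual sketch over a hypothetical Accounts table; the DBMS acquires and releases the locks implicitly:

```sql
BEGIN TRANSACTION;

-- Growing phase: locks are acquired as rows are touched
UPDATE Accounts SET Balance = Balance - 100 WHERE Account_ID = 1; -- locks row 1
UPDATE Accounts SET Balance = Balance + 100 WHERE Account_ID = 2; -- locks row 2

-- Shrinking phase: under strict 2PL, every lock is held until this point
COMMIT;
```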

3. Isolation Levels

Database management systems (DBMS) offer isolation levels to control the visibility of changes made by
one transaction to other concurrent transactions. Higher isolation levels provide greater data consistency but
can reduce concurrency. The four standard isolation levels are:

1. Read Uncommitted:
○ The lowest isolation level, where transactions can read uncommitted changes made by other

transactions (allowing dirty reads).
○ Advantage: Maximum concurrency and performance.
○ Disadvantage: May lead to dirty reads and inconsistencies.
2. Read Committed:

○ A transaction can only read data that has been committed by other transactions. This prevents
dirty reads but allows non-repeatable reads and phantom reads.
○ Advantage: Balances data consistency and concurrency.

3. Repeatable Read:
○ Ensures that if a transaction reads a value once, it will read the same value again, even if

other transactions modify the data in the meantime (no dirty or non-repeatable reads).
○ Disadvantage: Phantom reads are still possible, as new rows can be inserted by other
transactions.
4. Serializable:
○ The highest isolation level, ensuring complete isolation between transactions. It prevents dirty
reads, non-repeatable reads, and phantom reads by executing transactions as if they were
serialized (executed one after another).

○ Disadvantage: Significantly reduces concurrency and may result in performance bottlenecks.



Isolation Level Example:
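A sketch using the standard SET TRANSACTION ISOLATION LEVEL syntax (exact session behavior varies by DBMS):

```sql
SET TRANSACTION ISOLATION LEVEL REPEATABLE READ;

BEGIN TRANSACTION;

SELECT Salary FROM Employees WHERE Employee_ID = 1001;

-- Under REPEATABLE READ, this second read returns the same value,
-- even if another transaction updates the row in between
SELECT Salary FROM Employees WHERE Employee_ID = 1001;

COMMIT;
```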



4. Timestamps and Versioning

Instead of locks, some databases use timestamp-based concurrency control or multi-version
concurrency control (MVCC), allowing transactions to execute without waiting for locks to be released.

1. Timestamp-Based Concurrency:
○ Each transaction is assigned a unique timestamp. Transactions are executed in order of their
timestamps, ensuring consistency without the need for locks.
2. Multi-Version Concurrency Control (MVCC):
○ In MVCC, the database maintains multiple versions of a record. Each transaction sees the
version of the record that was current when the transaction began, thus avoiding conflicts with

H
other concurrent transactions.
○ This method eliminates locking conflicts and improves performance in read-heavy workloads.

RR
○ Example: PostgreSQL and Oracle implement MVCC, allowing readers to see consistent
snapshots of data even as it is being updated by other transactions.

MVCC Example:
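A two-session sketch of MVCC behavior, using a hypothetical Accounts table. The snapshot behavior shown matches, for example, PostgreSQL at the Repeatable Read level:

```sql
-- Session A
BEGIN TRANSACTION;
SELECT Balance FROM Accounts WHERE Account_ID = 1; -- reads its snapshot, say 500

-- Session B (running concurrently)
UPDATE Accounts SET Balance = 400 WHERE Account_ID = 1;
COMMIT;

-- Session A, still inside its transaction
SELECT Balance FROM Accounts WHERE Account_ID = 1; -- still sees 500, its snapshot
COMMIT;
```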

5. Deadlock Detection and Prevention

Deadlocks occur when two or more transactions are waiting for each other’s locks, creating a cycle of
dependencies that cannot be resolved. To manage deadlocks, DBMSs use:

1. Deadlock Detection:
○ The system periodically checks for deadlocks and, if one is detected, it selects one of the
transactions to roll back, allowing the others to proceed.
2. Deadlock Prevention:

○ By enforcing a strict ordering of resource acquisition or using timeout-based mechanisms,


deadlocks can be prevented. For example, a transaction that takes too long to acquire a lock
may be aborted.

Deadlock Detection Example:



-- Deadlock detection mechanism would identify a deadlock if two transactions are waiting for each other
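Such a cycle can be sketched with two concurrent transactions on a hypothetical Accounts table:

```sql
-- Transaction 1 locks row 1, Transaction 2 locks row 2:
-- T1: UPDATE Accounts SET Balance = Balance - 50 WHERE Account_ID = 1;
-- T2: UPDATE Accounts SET Balance = Balance - 50 WHERE Account_ID = 2;

-- Each then requests the row the other already holds:
-- T1: UPDATE Accounts SET Balance = Balance + 50 WHERE Account_ID = 2; -- waits for T2
-- T2: UPDATE Accounts SET Balance = Balance + 50 WHERE Account_ID = 1; -- waits for T1

-- Deadlock: the DBMS detects the cycle and rolls one transaction back
-- so that the other can proceed.
```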

6. Optimistic and Pessimistic Concurrency Control

1. Optimistic Concurrency Control:

○ Assumes that conflicts between transactions are rare and allows multiple transactions to
execute without acquiring locks. When a transaction attempts to commit, the system checks
for conflicts. If a conflict is detected, the transaction is rolled back.
○ Advantage: Higher concurrency, suitable for read-heavy systems.
2. Pessimistic Concurrency Control:
○ Assumes that conflicts are likely and acquires locks to prevent conflicts from occurring.
○ Advantage: Ensures greater consistency in write-heavy systems.
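A common application-level sketch of optimistic control uses a version column (the Accounts table and its Version column are hypothetical):

```sql
-- Read the row together with its current version number
SELECT Balance, Version FROM Accounts WHERE Account_ID = 1;
-- suppose this returns Balance = 500, Version = 7

-- Write back only if no other transaction has changed the row since
UPDATE Accounts
SET Balance = 450, Version = Version + 1
WHERE Account_ID = 1 AND Version = 7;
-- if zero rows were affected, a conflict occurred: re-read and retry
```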

Summary: Control of Concurrent Access

● Locks: Shared, exclusive, row-level, table-level, and intent locks are used to control access to data
during transactions.
● Locking Protocols: Techniques like two-phase locking (2PL) ensure transactions execute safely

without interfering with one another.
● Isolation Levels: Control the degree to which the changes made by one transaction are visible to

other concurrent transactions. Higher isolation levels ensure more data consistency but reduce
concurrency.
● MVCC and Timestamps: Allow transactions to see consistent snapshots of data without locking,


improving performance for read-heavy operations.

● Deadlock Management: Techniques like deadlock detection and prevention are used to handle
cycles of waiting transactions.
By
● Optimistic vs. Pessimistic Concurrency Control: Optimistic control assumes low conflict rates,
while pessimistic control assumes conflicts are likely and uses locking to prevent them.

Controlling concurrent access is critical for ensuring data consistency, integrity, and system performance in

multi-user environments.

Breakdown Resistance in Database Administration

Breakdown resistance, also known as fault tolerance, refers to the ability of a database system to continue
operating smoothly in the event of hardware or software failures. Ensuring breakdown resistance is crucial for
maintaining database availability, data integrity, and minimizing downtime. In modern database systems,

administrators must design for fault tolerance by implementing various strategies that protect against data loss
and ensure system resilience during unexpected breakdowns.

Key Concepts of Breakdown Resistance



1. Redundancy:
Redundancy involves creating multiple copies of critical components, such as data, hardware, or
system processes, to ensure that if one component fails, another can take over. This prevents complete
system breakdown and helps maintain continuous service.
Examples of Redundancy:

○ RAID (Redundant Array of Independent Disks): RAID technology combines multiple
physical disks into a single logical storage system to provide redundancy and fault tolerance.
RAID configurations like RAID 1 (mirroring) and RAID 5 (striping with parity) ensure that
data remains available even if a disk fails.
○ Replication: Data replication involves copying data from one server to another in real-time or
near-real-time. If the primary server goes down, the secondary server can immediately take
over.
■ Master-Slave Replication: The master server handles all write operations, while the

slave server mirrors the data. In case of failure, the slave can be promoted to master.
2. High Availability (HA):

High availability refers to a system's ability to remain accessible for the maximum possible time. This
is achieved through hardware and software configurations that reduce the risk of downtime caused by
failures.
Techniques for High Availability:

○ Failover Clustering: In failover clustering, multiple database servers are configured in a
cluster, where one server is actively serving requests, while others stand by. If the primary

server fails, another server in the cluster takes over (failover), ensuring minimal disruption.
■ Example: In an SQL Server failover cluster, if one node fails, another node

automatically takes over database operations without requiring manual intervention.
○ Load Balancing: Load balancing distributes incoming requests across multiple servers to
prevent any one server from being overloaded. This not only improves performance but also
adds redundancy. If one server fails, the load balancer directs traffic to the other servers.
■ Example: Load balancers are used in cloud database services like AWS RDS or Google
Cloud SQL to ensure that database traffic is evenly distributed.
3. Backups:
Regularly backing up data is a fundamental strategy for ensuring that data can be restored in the event

of a breakdown. Backup strategies vary depending on the database system, workload, and criticality
of data.
Types of Backups:
○ Full Backups: A complete backup of all data in the database. Full backups provide a
comprehensive snapshot but can be time-consuming and require a lot of storage.

○ Incremental Backups: Only backs up the data that has changed since the last backup (whether
full or incremental). This saves time and storage space.
○ Differential Backups: Backs up the data that has changed since the last full backup. A

differential backup is larger than an incremental one but faster to restore.


4. Backup Scheduling:

○ Databases often implement automated backup schedules (e.g., nightly or weekly full backups,
hourly incremental backups) to ensure data is protected without manual intervention.
5. Disaster Recovery:
Disaster recovery (DR) refers to a set of strategies and tools used to recover from major failures, such
as hardware breakdowns, natural disasters, or cyber-attacks, that cause significant downtime.
Disaster Recovery Techniques:

○ Offsite Backups: Storing backups at a location separate from the primary data center ensures
that data is recoverable even in case of physical damage to the main site.
○ Cloud Backups: Cloud storage solutions like Amazon S3 or Google Cloud Storage provide
scalable, offsite backup options that can be used to recover data after a disaster.
6. Disaster Recovery Plans:
○ A Disaster Recovery Plan (DRP) defines the procedures for recovering data and resuming
operations after a breakdown. It typically includes the location of backups, recovery timelines,
responsible personnel, and testing procedures to ensure recovery readiness.

7. Recovery Point Objective (RPO): Defines how much data loss is acceptable, measuring the
maximum period during which data might be lost.

Recovery Time Objective (RTO): Defines the maximum acceptable downtime after a failure before
operations must be restored.
8. Database Replication and Mirroring:
Replication and mirroring are advanced techniques to ensure breakdown resistance by maintaining

real-time or near-real-time copies of the database on different servers or locations.
○ Synchronous Replication: In synchronous replication, changes made to the primary database

are immediately applied to the replica, ensuring that both databases remain identical. This
technique is ideal for disaster recovery but may introduce some latency.

○ Asynchronous Replication: Changes made to the primary database are copied to the replica
with a slight delay. This reduces latency but carries a higher risk of data loss if a failure occurs
before the changes are fully replicated.
9. Database Mirroring: In database mirroring, two copies of a database (primary and mirror) are
maintained on different servers. The mirror database automatically takes over if the primary database
fails. This is commonly used for high availability and disaster recovery.
10. Data Integrity Checks:

To ensure that data remains uncorrupted, database systems use data integrity checks, such as

checksums or parity bits, to detect and correct data corruption.


○ Example: In MySQL, the InnoDB storage engine performs automatic checks to ensure that
data written to disk matches the data being read.
11. Fault-Tolerant Hardware:
Using fault-tolerant hardware is a strategy to ensure that even hardware failures don't lead to system

downtime.
○ Hot-swappable Components: Hardware components like disks, power supplies, or network
cards can be replaced without shutting down the system, maintaining uptime.

○ Dual Power Supplies: Having multiple power supplies ensures that the system stays online
even if one fails.

12. Automated Monitoring and Alerts:


Automated monitoring tools continuously check the health of the database and its underlying
infrastructure, triggering alerts if potential issues are detected. This helps DBAs act before a failure
occurs.
Example Tools:
○ Nagios: Used for monitoring databases and hardware health.

○ New Relic: Provides real-time performance monitoring for databases and applications.
13. Hot Standby and Warm Standby:
○ Hot Standby: Involves having an exact replica of the primary database constantly running and
synchronized. In the event of a failure, the system can immediately switch over to the hot
standby without any significant downtime.
○ Warm Standby: Similar to hot standby, but the standby server is not fully synchronized in
real-time. A warm standby requires a few minutes to get up-to-date before taking over, leading
to slightly more downtime compared to hot standby.

Practical Breakdown Resistance Techniques

Example: Database Replication for Breakdown Resistance
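A minimal sketch of source/replica replication in MySQL (pre-8.0.22 syntax); the host name, replication account, and binary log coordinates are placeholders:

```sql
-- On the primary server: create a dedicated replication account
CREATE USER 'repl_user'@'%' IDENTIFIED BY 'StrongPassword!';
GRANT REPLICATION SLAVE ON *.* TO 'repl_user'@'%';

-- On the replica: point it at the primary and start replicating
CHANGE MASTER TO
    MASTER_HOST     = 'primary.example.com',
    MASTER_USER     = 'repl_user',
    MASTER_PASSWORD = 'StrongPassword!',
    MASTER_LOG_FILE = 'mysql-bin.000001',
    MASTER_LOG_POS  = 4;
START SLAVE;

-- Check that the replica is applying changes from the primary
SHOW SLAVE STATUS\G
```

If the primary fails, the replica already holds a near-real-time copy of the data and can be promoted to serve traffic.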


Example: Backup and Restore in SQL Server
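A basic full backup and restore cycle in SQL Server; the database name and file path are placeholders:

```sql
-- Take a full backup with a checksum to detect corruption
BACKUP DATABASE SalesDB
TO DISK = 'D:\Backups\SalesDB_Full.bak'
WITH INIT, CHECKSUM;

-- Restore the database from that backup
RESTORE DATABASE SalesDB
FROM DISK = 'D:\Backups\SalesDB_Full.bak'
WITH RECOVERY;
```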



Summary: Breakdown Resistance in Databases

● Redundancy: Using techniques like RAID and replication to ensure multiple copies of data exist,
preventing data loss during hardware failure.

● High Availability: Techniques like failover clustering, load balancing, and database mirroring ensure
continuous access to data even in the event of a server failure.
● Backup and Disaster Recovery: Regular backups, offsite/cloud backups, and disaster recovery plans
ensure that data can be restored quickly after a major breakdown.
● Replication and Mirroring: Real-time replication of databases allows systems to failover to backup
servers with minimal data loss.
● Monitoring and Fault-Tolerant Hardware: Using automated monitoring tools and fault-tolerant
hardware to prevent breakdowns and minimize downtime during hardware failures.

By implementing these strategies, database administrators can create systems that resist breakdowns, recover

quickly from failures, and ensure data availability and integrity at all times.

Security and Protection of Data in Database Administration

Data security is one of the most critical aspects of database administration. With increasing threats from
cyberattacks, data breaches, and insider threats, database administrators (DBAs) must ensure that sensitive

data is protected from unauthorized access, corruption, and theft. Implementing robust security measures
safeguards both the integrity and confidentiality of data, ensuring compliance with data protection regulations

such as GDPR, HIPAA, and CCPA.

Key Concepts in Database Security and Data Protection


1. Authentication
Authentication is the process of verifying the identity of users attempting to access the database.
Strong authentication mechanisms ensure that only authorized users can log in to the database system.

Common Authentication Methods:


○ Username and Password: The most basic form of authentication where users must provide

valid credentials to access the database. However, weak passwords can be easily compromised.
○ Multi-factor Authentication (MFA): MFA adds an extra layer of security by requiring users
to provide two or more verification factors, such as a password and a one-time code sent to a
mobile device.
○ Single Sign-On (SSO): Allows users to authenticate once and gain access to multiple systems,

including the database, without logging in separately each time.


○ OAuth and OpenID Connect: Modern authentication standards that integrate with third-party

identity providers for secure and seamless authentication.

Example:
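A simple sketch in MySQL, with a placeholder user name and password policy:

```sql
-- Create a user who must authenticate with a password
CREATE USER 'app_user'@'localhost' IDENTIFIED BY 'S3cure!Passw0rd';

-- Force the password to expire every 90 days
ALTER USER 'app_user'@'localhost' PASSWORD EXPIRE INTERVAL 90 DAY;
```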

2. Authorization and Access Control
Authorization determines what actions authenticated users are allowed to perform. Access control
mechanisms ensure that users have the right level of access to database resources.

Role-Based Access Control (RBAC):

● In RBAC, access rights are assigned to roles, and users are assigned to roles based on their job
functions. This simplifies access management, as administrators need only manage roles instead of

individual users.

Example:
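A possible RBAC setup in MySQL 8.0, using a hypothetical read_only role and analyst account:

```sql
CREATE ROLE 'read_only';
GRANT SELECT ON company_db.* TO 'read_only';

CREATE USER 'analyst'@'localhost' IDENTIFIED BY 'An@lyst123';
GRANT 'read_only' TO 'analyst'@'localhost';
SET DEFAULT ROLE 'read_only' TO 'analyst'@'localhost';
```

Granting or revoking a privilege on the role updates every analyst at once, instead of managing each user individually.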

Least Privilege Principle:

● The least privilege principle ensures that users only have the minimum permissions required to
perform their tasks, reducing the risk of misuse or exploitation of access rights.
Granular Permissions:

● Permissions can be defined at various levels of granularity, including the database level, table level,
row level, or even column level.

Example:
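A sketch of table-level and column-level grants in MySQL; the table and user names are illustrative:

```sql
-- Table level: read and update rights on one table only
GRANT SELECT, UPDATE ON company_db.Employees TO 'hr_user'@'localhost';

-- Column level: expose only non-sensitive columns
GRANT SELECT (employee_id, name, department)
    ON company_db.Employees TO 'intern'@'localhost';
```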

3. Encryption
Encryption protects data by converting it into a format that can only be read by those who have the
decryption key. Encryption ensures that even if data is intercepted, it remains unreadable without proper
authorization.

Types of Encryption:

● Encryption at Rest: This refers to encrypting data when it is stored on disk (e.g., in datafiles or
backups). This protects data from physical theft or unauthorized access to storage.
○ Transparent Data Encryption (TDE): A database feature that automatically encrypts data
stored in datafiles.

Example in SQL Server:
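A typical TDE setup sequence (the database, key password, and certificate names are placeholders):

```sql
-- Create a master key and certificate in the master database
USE master;
CREATE MASTER KEY ENCRYPTION BY PASSWORD = 'M@sterKeyP@ss1';
CREATE CERTIFICATE TDECert WITH SUBJECT = 'TDE Certificate';

-- Create the encryption key and enable TDE for the user database
USE SalesDB;
CREATE DATABASE ENCRYPTION KEY
    WITH ALGORITHM = AES_256
    ENCRYPTION BY SERVER CERTIFICATE TDECert;
ALTER DATABASE SalesDB SET ENCRYPTION ON;
```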

● Encryption in Transit: Data is encrypted as it travels over networks, protecting it from being intercepted by
attackers. This is achieved through protocols like SSL/TLS (Secure Sockets Layer/Transport Layer Security).

● Example: Configuring SSL/TLS for MySQL or PostgreSQL to encrypt communication between the
database and client applications.
● Column-Level Encryption: Sensitive columns, such as social security numbers or credit card details, can be
encrypted to protect specific fields of data.

Example:
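One way to encrypt a single column in MySQL, assuming the column is stored as VARBINARY and the key is managed outside the database:

```sql
-- Encrypt the card number before storing it
INSERT INTO Customers (name, credit_card)
VALUES ('Alice', AES_ENCRYPT('4111111111111111', 'encryption_key'));

-- Decrypt only where authorized code needs the clear value
SELECT name, AES_DECRYPT(credit_card, 'encryption_key') AS card_number
FROM Customers;
```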


4. Data Masking
Data masking hides sensitive information by replacing it with obfuscated or anonymized data. This is

particularly useful for testing and development environments where production data should not be exposed.

Types of Data Masking:

● Static Data Masking: Replaces sensitive data with realistic but fictitious data for non-production

environments.
● Dynamic Data Masking: Masks sensitive data on the fly when it is queried by unauthorized users,
showing obfuscated values instead of the actual data.

Example in SQL Server:



ALTER TABLE Customers ALTER COLUMN Credit_Card_Number ADD MASKED WITH (FUNCTION =
'partial(4, "XXXX-XXXX-XXXX-", 4)');


5. Database Auditing
Auditing refers to tracking and logging all database activities, such as user logins, changes to data,
modifications to the schema, and permission changes. Auditing is essential for identifying suspicious

activities and ensuring regulatory compliance.

Key Components of Auditing:

● Login Auditing: Tracks when and how users access the database, identifying any unauthorized access
attempts.

● Data Access Auditing: Logs all data retrieval, modification, and deletion activities to provide an audit
trail of who accessed sensitive data.
● Schema Change Auditing: Captures all changes made to the database structure, such as adding or

dropping tables, columns, or indexes.

Example in MySQL:
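A simple way to capture activity in MySQL is the general query log (dedicated audit plugins offer finer control); the output settings below are illustrative:

```sql
-- Record every statement the server receives
SET GLOBAL general_log = 'ON';
SET GLOBAL log_output  = 'TABLE';   -- entries go to mysql.general_log

-- Review the most recent activity
SELECT event_time, user_host, argument
FROM mysql.general_log
ORDER BY event_time DESC
LIMIT 10;
```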

6. Data Integrity and Consistency


Data integrity ensures that data remains accurate, consistent, and reliable throughout its lifecycle. Database

administrators enforce data integrity through various mechanisms.

Key Data Integrity Mechanisms:

● Constraints: Enforce rules on the data, such as primary keys, foreign keys, unique constraints, and

check constraints, to prevent invalid data from being entered.


● Triggers: Automated actions that occur when specific changes are made to the database, often used to
maintain integrity between related data.

Example:
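A sketch combining a constraint and a trigger (table and column names are illustrative):

```sql
-- A CHECK constraint rejects invalid data outright
ALTER TABLE Employees
    ADD CONSTRAINT chk_salary CHECK (salary > 0);

-- A trigger keeps a related table consistent automatically (MySQL)
CREATE TRIGGER trg_update_headcount
AFTER INSERT ON Employees
FOR EACH ROW
    UPDATE Departments
    SET headcount = headcount + 1
    WHERE department_id = NEW.department_id;
```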

7. Database Vulnerability Assessments
Regularly conducting vulnerability assessments is essential for identifying potential security risks in the
database. This includes scanning for weak passwords, unpatched vulnerabilities, misconfigurations, and
unnecessary privileges.
Example Tools:

● SQLMap: An open-source tool for detecting SQL injection vulnerabilities.


● Microsoft Baseline Security Analyzer (MBSA): Used to assess the security of Microsoft databases.

8. Intrusion Detection and Prevention Systems (IDPS)

Intrusion Detection and Prevention Systems (IDPS) monitor the database for abnormal behavior and potential
attacks. They can detect and prevent threats such as SQL injection, brute force attacks, and privilege escalation
attempts.
SQL Injection Prevention: SQL injection is a common attack where malicious code is injected into an SQL

query to gain unauthorized access. Preventing SQL injection involves:

● Input Validation: Ensuring that user inputs are properly validated before being used in SQL queries.

● Parameterized Queries: Using prepared statements and parameterized queries to prevent malicious
input from being executed as code.

Example in MySQL:
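A server-side prepared statement in MySQL; whatever value is bound to the parameter is treated as data, never parsed as SQL:

```sql
PREPARE stmt FROM 'SELECT * FROM Users WHERE username = ?';
SET @name = 'alice';        -- even a malicious string stays inert here
EXECUTE stmt USING @name;
DEALLOCATE PREPARE stmt;
```

Application code should use the equivalent parameterized API of its database driver rather than building queries by string concatenation.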

9. Physical Security
Physical security protects the database's hardware, such as servers and storage devices, from physical threats like
theft, vandalism, or natural disasters.

Best Practices for Physical Security:

○ Controlled Access: Restrict physical access to data centers and server rooms to authorized

personnel only.
○ Environmental Monitoring: Install temperature, humidity, and fire detection systems to

prevent damage to servers.


○ Data Destruction: Use secure methods for data destruction, such as degaussing or shredding
hard drives, to ensure that data cannot be recovered after disposal.

10. Disaster Recovery and Backup Security


Even with strong security measures, data loss can still occur due to hardware failures, natural disasters, or
cyber-attacks. Therefore, having a secure backup and disaster recovery plan is essential.

Secure Backup Practices:

● Encrypt Backups: Ensure that backup files are encrypted so that even if they are stolen, the data
remains protected.
● Backup Integrity Checks: Regularly test backups to ensure that they are valid and can be restored
when needed.
● Offsite and Cloud Storage: Store backups offsite or in secure cloud environments to protect against
physical disasters.

Example in PostgreSQL:
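One way to produce an encrypted PostgreSQL backup is to pipe pg_dump through OpenSSL; the database name, file names, and cipher options below are placeholders:

```shell
# Dump the database and encrypt the backup with AES-256
pg_dump -U postgres mydb | \
  openssl enc -aes-256-cbc -salt -pbkdf2 -out mydb_backup.sql.enc

# Decrypt and restore when needed
openssl enc -d -aes-256-cbc -pbkdf2 -in mydb_backup.sql.enc | \
  psql -U postgres mydb_restored
```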

Summary: Security and Protection of Data in Database Systems

● Authentication: Verifies user identities with methods like multi-factor authentication (MFA) and
Single Sign-On (SSO).

● Authorization and Access Control: Manages user privileges and enforces the least privilege principle
using role-based access control (RBAC).
● Encryption: Secures data at rest, in transit, and at the column level to prevent unauthorized access.
● Data Masking: Obscures sensitive data by replacing it with obfuscated or anonymized values, keeping production data out of test and development environments.
● Auditing: Tracks logins, data access, and schema changes to detect suspicious activity and support compliance.
● Vulnerability Assessments and IDPS: Regular scans and intrusion detection guard against attacks such as SQL injection.
● Physical and Backup Security: Controlled access to hardware plus encrypted, regularly tested backups complete the defense.

Parameter Setting, Start, Stop, Save, and Restoration in Database Administration



Database administrators (DBAs) manage various operational aspects of databases, including configuring
system parameters, starting and stopping database services, and ensuring data is saved and restored efficiently.

These tasks are crucial for maintaining the performance, security, and availability of the database system.
Let’s explore each of these components in more detail.

1. Parameter Setting

Parameter setting refers to configuring the database system's behavior by adjusting various settings that
affect performance, security, storage management, and other critical operations. Database systems come with
default settings, but DBAs often fine-tune these parameters based on the specific requirements of the

application and hardware resources.



Common Types of Parameters:

1. Memory Parameters:
○ Buffer Cache Size: Controls the amount of memory allocated for caching frequently accessed
data. Increasing the buffer size can improve query performance, but it requires careful
management of available memory.

■ Example: In MySQL, the parameter innodb_buffer_pool_size controls the
size of the InnoDB buffer pool, which caches data and indexes.

2. Connection Parameters:

○ Max Connections: Limits the number of concurrent user connections to the database. Setting
this parameter ensures that the database does not become overloaded with too many

connections, which could slow down performance or cause crashes.
■ Example: In PostgreSQL, max_connections sets the maximum number of
concurrent connections.

3. Logging Parameters:

○ Log File Size and Rotation: Configures how logs are stored and rotated. These settings are
crucial for auditing, debugging, and ensuring that log files do not consume excessive disk
space.
■ Example: In SQL Server, you can configure the size and retention policy for
transaction logs.
4. Timeout Parameters:
○ Query Timeout: Limits how long a query can run before the system automatically terminates

it, preventing resource-intensive queries from degrading system performance.


■ Example: In Oracle, the RESOURCE_LIMIT parameter controls session-level resource

limits, including query timeout.


5. Security Parameters:
○ Password Expiration: Sets the rules for password strength, expiration, and lockout policies to
ensure that only authorized users have access to the database.

■ Example: In Oracle, the PASSWORD_LIFE_TIME parameter enforces password


expiration policies.
6. Performance Tuning Parameters:

○ Parallel Query Execution: Allows the database to execute queries in parallel, improving
performance for large or complex queries.

○ Cache Size for SQL Plans: Controls how many SQL execution plans are cached, reducing
parsing time for frequently executed queries.
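Most of the parameters above can be inspected and adjusted at runtime; a MySQL sketch with illustrative values:

```sql
-- Inspect a parameter
SHOW VARIABLES LIKE 'innodb_buffer_pool_size';

-- Adjust parameters for the running server (values are examples only)
SET GLOBAL innodb_buffer_pool_size = 2147483648;  -- 2 GB
SET GLOBAL max_connections = 500;
```

Changes made with SET GLOBAL last until restart; persistent values belong in the configuration file (or SET PERSIST in MySQL 8.0).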

Importance of Parameter Setting:

● Performance Optimization: Fine-tuning parameters like memory allocation, cache size, and connection
limits can significantly improve database performance.

● Security: Configuring security parameters such as password policies, connection timeouts, and
logging helps protect the database from unauthorized access.
● Stability: Properly set parameters ensure that the database runs efficiently, even under heavy loads,
reducing the risk of downtime.

2. Start and Stop Database Services

The ability to start and stop the database is a fundamental administrative task. Starting the database service

means initializing the processes required for the database to run, while stopping involves shutting down those
processes in an orderly fashion.

Starting the Database:

When starting a database, the system performs several actions:

1. Initialize Memory Structures: The database allocates memory structures like the buffer cache, shared
memory, and process memory.

2. Start Background Processes: Background processes like log writers, checkpoint processes, and
database writers are initialized to handle essential tasks.
3. Mount Datafiles: The database loads necessary datafiles and control files to ensure the data is
accessible.
4. Open Database: The database becomes available for connections, allowing users to perform
read/write operations.

Example:
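In Oracle, these steps are driven from SQL*Plus (MySQL and PostgreSQL typically start through the operating system's service manager instead):

```sql
-- Connect with administrative privileges and start the instance
CONNECT / AS SYSDBA;
STARTUP;
-- Equivalent staged form:
--   STARTUP NOMOUNT;        -- initialize memory and background processes
--   ALTER DATABASE MOUNT;   -- load control files and datafiles
--   ALTER DATABASE OPEN;    -- accept user connections
```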

Stopping the Database:

Stopping the database safely ensures that no data is lost, and all active transactions are properly handled. The
system typically follows these steps:

1. Flush Data to Disk: The database writes all data held in memory to disk to prevent data loss.

2. Shutdown Background Processes: Active background processes are stopped.
3. Close Connections: The database terminates all active user sessions and closes network connections.
4. Close Datafiles: The database releases all datafiles and logs that were in use.

Example:
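The matching Oracle shutdown commands:

```sql
CONNECT / AS SYSDBA;
SHUTDOWN NORMAL;     -- wait for users to disconnect
-- or: SHUTDOWN IMMEDIATE;  -- roll back active transactions and stop
-- or: SHUTDOWN ABORT;      -- forced stop, last resort
```

On Linux, MySQL is typically stopped with "sudo systemctl stop mysql", which performs the same flush-and-close sequence internally.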

Modes of Shutdown:

● Normal: Waits for all active transactions to complete and users to log off before shutting down.
● Immediate: Rolls back all active transactions and forces an immediate shutdown without waiting for
user sessions to end.
● Abort: Performs a forced shutdown, bypassing the normal shutdown process. This is usually used as a
last resort in emergency situations.

3. Save

Saving in database terms refers to ensuring that changes made to data are permanently recorded. This is done
through the use of transactions, which allow a set of database operations to be treated as a single unit of
work. Saving is crucial for ensuring data consistency and integrity.

Key Concepts of Saving:

1. Transactions: A transaction is a group of one or more SQL operations that are treated as a single,

indivisible unit. Either all operations in the transaction are successfully applied, or none of them are
(in the case of a failure).

ACID Properties:
○ Atomicity: Ensures that all operations within a transaction are completed, or none are.
○ Consistency: Ensures that a transaction leaves the database in a valid state.
○ Isolation: Transactions are isolated from each other until they are complete.
○ Durability: Once a transaction is committed, its changes are permanent and survive system
failures.

2. COMMIT: The COMMIT command is used to save changes made in a transaction. Once a transaction is
committed, the changes are made permanent in the database.
Example:
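A sketch of a committed transaction (the table and values are illustrative):

```sql
START TRANSACTION;
UPDATE Accounts SET balance = balance - 100 WHERE account_id = 1;
UPDATE Accounts SET balance = balance + 100 WHERE account_id = 2;
COMMIT;   -- both updates are now permanent
```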

3. ROLLBACK: The ROLLBACK command undoes all changes made in a transaction, restoring the

database to its previous state.
Example:
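Undoing an uncommitted change (illustrative table):

```sql
START TRANSACTION;
DELETE FROM Orders WHERE order_date < '2020-01-01';
-- The deletion was a mistake, so undo the whole transaction
ROLLBACK;
```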

4. SAVEPOINT: A savepoint allows you to set a point within a transaction to which you can roll back. This
is useful for complex transactions where only a portion of the work may need to be undone.
Example:
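A savepoint inside a transaction (order IDs are illustrative):

```sql
START TRANSACTION;
INSERT INTO Orders (order_id, customer_id) VALUES (1001, 42);
SAVEPOINT after_order;
INSERT INTO OrderItems (order_id, product_id, qty) VALUES (1001, 7, 3);
-- Undo only the item insert; the order insert is kept
ROLLBACK TO SAVEPOINT after_order;
COMMIT;
```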

4. Restoration

Restoration refers to the process of recovering a database after a failure, corruption, or other disaster. DBAs

must be able to restore data from backups or transaction logs to bring the system back to its operational state.

Types of Restoration:

1. Full Database Restoration:



○ Restores the entire database from a full backup. This is typically done in case of severe
corruption or hardware failure where all data must be recovered.

Example in SQL Server:
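A full restore that overwrites the damaged copy (the path and database name are placeholders):

```sql
RESTORE DATABASE SalesDB
FROM DISK = 'D:\Backups\SalesDB_Full.bak'
WITH REPLACE, RECOVERY;
```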

2. Point-in-Time Recovery:
○ Restores the database to a specific point in time, allowing recovery from accidental data loss or
corruption. This is achieved by using a combination of full backups, differential backups, and
transaction logs.

Example:
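A point-in-time restore in SQL Server, stopping just before an accidental change (the timestamp and paths are placeholders):

```sql
-- Restore the full backup but leave the database in a restoring state
RESTORE DATABASE SalesDB
FROM DISK = 'D:\Backups\SalesDB_Full.bak'
WITH NORECOVERY;

-- Apply the transaction log up to the chosen moment
RESTORE LOG SalesDB
FROM DISK = 'D:\Backups\SalesDB_Log.trn'
WITH STOPAT = '2024-05-10 14:30:00', RECOVERY;
```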

3. Partial Restoration:
○ In cases where only specific tables or files need to be recovered, partial restoration is used.

This is useful for large databases where restoring the entire database would take too long.
4. Transaction Log Restoration:

○ Transaction logs are used to recover uncommitted transactions and apply them to the database
after a crash, ensuring no data is lost.
Example:
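Applying a transaction log backup after a crash (SQL Server syntax, placeholder path):

```sql
RESTORE LOG SalesDB
FROM DISK = 'D:\Backups\SalesDB_Log.trn'
WITH RECOVERY;   -- redo committed work, undo incomplete transactions
```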

Summary: Parameter Setting, Start, Stop, Save, and Restoration

● Parameter Setting: Involves configuring database settings for memory, security, connections, and
logging to optimize performance and ensure security.

● Start and Stop: DBAs must properly start and stop database services to initialize processes, allocate
resources, and ensure data is saved before shutting down.
● Save: Ensures that all changes made in a transaction are permanently written to the database using COMMIT, with ROLLBACK and SAVEPOINT available to undo uncommitted work.
● Restoration: Recovers the database after a failure or corruption using full, point-in-time, partial, or transaction log restores from backups.
Distributed Database and Distributed Processing in Database Administration



In modern database systems, especially in large-scale or global environments, distributed databases and
distributed processing play a key role in ensuring performance, scalability, fault tolerance, and high
availability. These concepts allow data and computational tasks to be spread across multiple locations or
systems, providing significant advantages over traditional centralized databases.

1. Distributed Database

A distributed database is a collection of multiple, logically interrelated databases distributed across different
locations (nodes). These nodes can be located on different servers within the same physical location (data
center) or across geographically distant sites. Despite being physically distributed, the system operates as a
single, unified database to the end user.

Characteristics of Distributed Databases:

1. Data Distribution:

○ Data is divided and stored across multiple sites (nodes), each potentially managed by its own
local database system. Each site can operate independently but is part of the overall distributed

database system.
2. Transparency:
○ Location Transparency: Users interact with the distributed database without needing to know
where the data is physically stored.

○ Replication Transparency: Users do not need to know if data is replicated across multiple
nodes; the system automatically handles replication behind the scenes.

○ Fragmentation Transparency: If data is fragmented and stored across different locations, the
system ensures that users can access it as if it were stored in one place.

3. Replication:
○ Data can be replicated across multiple sites to improve availability and fault tolerance. In the
event of a failure at one site, another site can continue to provide access to the replicated data.
4. Fragmentation:
○ Horizontal Fragmentation: Data is divided by rows, with each fragment containing a subset
of rows from a table. For example, sales data for different regions can be stored in different
locations.

○ Vertical Fragmentation: Data is divided by columns, with each fragment containing a subset
of columns from a table. For example, customer names may be stored at one site, and customer

addresses at another.
5. Autonomy:
○ Each node in the distributed system can function independently. In some systems, nodes can
execute queries or updates without needing to communicate with the entire network.

6. High Availability:
○ A distributed database system provides better availability, since if one node fails, other nodes
can continue to operate. This is especially useful in systems requiring 24/7 uptime.

Example of Distributed Database:



● Amazon DynamoDB: A NoSQL distributed database used by Amazon to manage data across multiple
geographical regions. It ensures low latency and high availability by distributing data across multiple
servers.

2. Distributed Processing

Distributed processing refers to the use of multiple computing nodes to divide and execute tasks
simultaneously. In the context of distributed databases, this means that the workload (query execution, data
analysis, transaction management, etc.) is distributed across multiple nodes, improving efficiency and
scalability.

Characteristics of Distributed Processing:

1. Parallel Execution:

○ In distributed processing, tasks are divided into smaller units that can be executed in parallel on
multiple nodes, thus reducing the overall processing time.

○ Example: A complex query can be split into smaller subqueries, each processed by a different
node in the distributed system. The results are then combined and presented to the user.
2. Data Localization:
○ By processing data locally on the node where it is stored, distributed processing reduces the

need to transfer large amounts of data across the network, thus improving performance.
3. Load Balancing:

○ Distributed processing ensures that the workload is evenly distributed across all nodes,
preventing any one node from becoming a bottleneck.

○ Example: In a distributed system, queries can be routed to the node with the least load or the
node closest to the user, reducing latency and improving performance.
4. Fault Tolerance:
○ In a distributed system, if one node fails, the processing tasks can be reassigned to other nodes,
ensuring that the system continues to function without interruption.
5. Scalability:
○ Distributed processing allows systems to scale horizontally, meaning additional nodes can be

added to the system as the workload increases. This makes it possible to handle large datasets
and high volumes of transactions without degrading performance.

6. Concurrency:
○ Distributed systems handle multiple tasks simultaneously, often involving concurrent updates
or queries on different parts of the distributed database. Concurrency control mechanisms are
essential to ensure that simultaneous transactions do not lead to inconsistencies.

Distributed Processing Frameworks:

● Apache Hadoop: A distributed processing framework that allows for the distributed storage and

processing of large datasets. It uses the MapReduce programming model to divide tasks across
multiple nodes and process them in parallel.

● Apache Spark: A distributed processing engine that supports large-scale data processing and
analytics. It is known for its ability to handle both batch and real-time processing tasks across
distributed environments.

Example of Distributed Processing in Action:

● Google BigQuery: A cloud-based data warehouse that supports distributed processing for large-scale
data analytics. Queries submitted to BigQuery are divided into smaller tasks, each executed on
different nodes, with the results aggregated and returned to the user.

Benefits of Distributed Databases and Distributed Processing

1. Improved Performance:
○ By distributing data and processing tasks across multiple nodes, the system can handle more

queries and transactions simultaneously, significantly improving performance for large datasets
or high-traffic applications.
2. Fault Tolerance and High Availability:
○ Distributed databases and distributed processing systems are designed to continue operating

even if individual nodes fail, ensuring that data is always available.
3. Scalability:

○ As the data grows, more nodes can be added to the system to maintain performance, making
distributed systems highly scalable. This is essential for applications with dynamic and

growing workloads.
4. Data Localization and Reduced Latency:
○ By storing and processing data closer to the users or applications, distributed systems reduce
latency and minimize the need to transfer large amounts of data across the network.
5. Geographical Distribution:
○ Distributed databases allow data to be stored across different geographical locations, ensuring
compliance with local regulations, faster access for users in different regions, and improved

disaster recovery capabilities.


6. Concurrent Processing:

○ Distributed processing enables multiple tasks to be executed in parallel, speeding up complex


data processing jobs and analytics.

Challenges of Distributed Databases and Processing

1. Complexity:

○ Distributed systems are more complex to design, manage, and maintain than centralized
systems. Issues such as data consistency, synchronization, and communication between nodes

need to be carefully handled.


2. Data Consistency:
○ Maintaining data consistency across distributed nodes is challenging, especially when there are
network failures or delays. Distributed databases must implement consistency models (e.g.,
eventual consistency, strong consistency) to address this issue.
3. Network Latency and Bandwidth:

○ The performance of distributed systems depends on the quality of the network connecting the
nodes. Poor network performance can lead to delays in data processing and communication
between nodes.
4. Concurrency Control:
○ Managing concurrent access to data in distributed databases requires sophisticated concurrency
control mechanisms to avoid conflicts, such as deadlocks or race conditions.
5. Data Security:
○ With data stored across multiple locations, ensuring security and compliance becomes more

difficult. Encryption, access control, and auditing are essential in distributed environments.

Example of Distributed Database and Processing Setup:

Example of Setting Up Database Replication (MySQL):
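A sketch of the configuration side of MySQL replication, using MySQL 8.0 syntax and placeholder names; each server needs a unique server-id and the primary must have binary logging enabled:

```sql
-- my.cnf on the primary:          -- my.cnf on the replica:
--   [mysqld]                      --   [mysqld]
--   server-id = 1                 --   server-id = 2
--   log-bin   = mysql-bin

-- On the replica, after creating a replication user on the primary:
CHANGE REPLICATION SOURCE TO
    SOURCE_HOST     = 'primary.example.com',
    SOURCE_USER     = 'repl_user',
    SOURCE_PASSWORD = 'StrongPassword!',
    SOURCE_LOG_FILE = 'mysql-bin.000001',
    SOURCE_LOG_POS  = 4;
START REPLICA;
```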


Example of Distributed Query in Apache Hadoop (MapReduce Model):

// Mapper class in Hadoop MapReduce (word-count style job)
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class TokenizerMapper extends Mapper<Object, Text, Text, IntWritable> {

    private final static IntWritable one = new IntWritable(1);
    private final Text word = new Text();

    public void map(Object key, Text value, Context context)
            throws IOException, InterruptedException {
        // Split each input line into tokens and emit a (word, 1) pair per token
        StringTokenizer itr = new StringTokenizer(value.toString());
        while (itr.hasMoreTokens()) {
            word.set(itr.nextToken());
            context.write(word, one);
        }
    }
}
Summary: Distributed Database and Distributed Processing

● Distributed Database: A system where data is stored across multiple locations, with replication and
fragmentation improving performance, availability, and fault tolerance.
○ Benefits: High availability, scalability, fault tolerance, and data localization.
○ Challenges: Complexity, data consistency, network issues, and security.
● Distributed Processing: Tasks are divided across multiple nodes for parallel execution, reducing
processing time and improving system scalability.

○ Benefits: Faster query execution, concurrent task handling, and improved performance.
○ Challenges: Managing concurrency, network latency, and ensuring consistency.

Distributed databases and processing are essential for building scalable, fault-tolerant systems in today's
distributed and cloud environments.

Auditing and Optimization in Database Administration

Auditing and optimization are two crucial aspects of database administration that ensure the database

operates securely, efficiently, and with high performance. Auditing focuses on tracking and recording database
activities to monitor security and compliance, while optimization deals with enhancing the database's
performance, ensuring that queries, storage, and resources are used efficiently.
1. Auditing

Auditing in database administration refers to the process of tracking and recording database operations and activities to ensure security, compliance, and transparency. Auditing helps detect unauthorized access, unusual behavior, and potential security breaches, providing a trail of activities that can be reviewed in the event of a security incident or system failure.

Key Elements of Database Auditing:

1. User Activity Monitoring:


○ Auditing tracks who accessed the database, when, and what actions they performed. This can include login attempts, queries executed, changes to data, and updates to database structures.
○ Example: Monitoring which users modified sensitive financial records in the database.

2. Data Access Auditing:


○ This type of auditing tracks which users accessed specific data, especially sensitive information
like personal details, credit card numbers, or confidential business records. It ensures that only
authorized users are accessing protected data.

Example:


3. Schema Change Auditing:

● Audits changes made to the database structure, such as adding or dropping tables, columns, indexes, or constraints. This is essential for ensuring that changes to the schema are authorized and correctly executed.

Example: Tracking who added or removed columns from the Employees table in a payroll database.

4. Transaction Auditing:

● Audits capture detailed information about data modification activities, such as inserts, updates, and deletes. This helps maintain an audit trail of all changes made to the data, including who made the change, when it was made, and what was changed.

Example:
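A minimal sketch of a trigger-based audit trail, using SQLite for portability. The table and column names (Employees, Employees_Audit) are illustrative; a production DBMS would typically rely on its native auditing features rather than hand-written triggers.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Employees (Emp_ID INTEGER PRIMARY KEY, Name TEXT, Salary REAL);
CREATE TABLE Employees_Audit (
    Audit_ID   INTEGER PRIMARY KEY AUTOINCREMENT,
    Emp_ID     INTEGER,
    Old_Salary REAL,
    New_Salary REAL,
    Changed_At TEXT DEFAULT CURRENT_TIMESTAMP
);
-- Log every salary change: which row changed, the old and new values, and when
CREATE TRIGGER trg_salary_audit AFTER UPDATE OF Salary ON Employees
BEGIN
    INSERT INTO Employees_Audit (Emp_ID, Old_Salary, New_Salary)
    VALUES (OLD.Emp_ID, OLD.Salary, NEW.Salary);
END;
""")
conn.execute("INSERT INTO Employees VALUES (1, 'Ada', 50000)")
conn.execute("UPDATE Employees SET Salary = 55000 WHERE Emp_ID = 1")
trail = conn.execute(
    "SELECT Emp_ID, Old_Salary, New_Salary FROM Employees_Audit").fetchall()
print(trail)  # [(1, 50000.0, 55000.0)]
```

The trigger fires automatically on every update, so the audit record cannot be skipped by application code.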

5. Compliance Auditing:

● Ensures that the database adheres to regulations such as GDPR, HIPAA, or PCI-DSS by tracking data access and management activities related to sensitive or regulated data. Auditing can prove compliance by showing that policies are being followed.

6. Security Auditing:

● Monitors activities like login attempts, role or privilege changes, and any actions performed by
privileged users (e.g., database administrators). This type of auditing is crucial for detecting potential
security risks, such as unauthorized attempts to access or modify sensitive data.

Example:

● Auditing failed login attempts to detect potential brute-force attacks.
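The brute-force check above can be sketched as a simple aggregate over a login log. The Login_Audit table and its columns are hypothetical, shown here with SQLite:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Login_Audit (Username TEXT, Success INTEGER, Attempt_Time TEXT)")
rows = [("alice", 1, "2024-05-01 09:00"),
        ("bob",   0, "2024-05-01 09:01"),
        ("bob",   0, "2024-05-01 09:01"),
        ("bob",   0, "2024-05-01 09:02"),
        ("bob",   0, "2024-05-01 09:02")]
conn.executemany("INSERT INTO Login_Audit VALUES (?, ?, ?)", rows)

# Flag accounts with more than 3 failed attempts -- a possible brute-force attack
suspects = conn.execute("""
    SELECT Username, COUNT(*) AS Failures
    FROM Login_Audit
    WHERE Success = 0
    GROUP BY Username
    HAVING COUNT(*) > 3
""").fetchall()
print(suspects)  # [('bob', 4)]
```

In practice the threshold would also be limited to a short time window, which this sketch omits.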

7. Audit Trails:

● An audit trail is a chronological record of all actions performed within the database, providing detailed information for post-incident investigations. Audit logs can include details like the user’s IP address, timestamps, the SQL statement executed, and the outcome (success or failure).

Example:
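A possible layout for such an audit trail, capturing the fields mentioned above (user, client IP, timestamp, SQL text, outcome). The names are illustrative, and real systems usually populate this table automatically rather than by hand:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# One row per audited action: who, from where, when, what, and the outcome
conn.execute("""
CREATE TABLE Audit_Trail (
    Audit_ID   INTEGER PRIMARY KEY AUTOINCREMENT,
    Username   TEXT NOT NULL,
    Client_IP  TEXT,
    Event_Time TEXT DEFAULT CURRENT_TIMESTAMP,
    Sql_Text   TEXT,
    Outcome    TEXT CHECK (Outcome IN ('SUCCESS', 'FAILURE'))
)""")
conn.execute("""INSERT INTO Audit_Trail (Username, Client_IP, Sql_Text, Outcome)
                VALUES ('dba1', '10.0.0.5', 'DROP TABLE Orders', 'FAILURE')""")
# An investigator can then replay the chronology of failed actions
failures = conn.execute(
    "SELECT Username, Sql_Text FROM Audit_Trail WHERE Outcome = 'FAILURE'").fetchall()
print(failures)  # [('dba1', 'DROP TABLE Orders')]
```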

Benefits of Auditing:

● Security and Compliance: Helps track unauthorized activities and ensures that regulatory requirements
(e.g., GDPR, HIPAA) are met by maintaining detailed logs of who accessed or modified data.

● Fraud Detection: By monitoring changes to financial records or sensitive data, auditing can detect potential fraud or data tampering.


● Incident Investigation: Audit trails are invaluable for investigating and understanding the sequence of
events leading to a security breach or system failure.
● Accountability: Users are held accountable for their actions in the database, ensuring that all
modifications or accesses are traceable to a specific individual or process.

2. Optimization

Optimization in database administration involves improving the performance, efficiency, and scalability of the database. DBAs focus on query optimization, storage optimization, and resource management to ensure the database runs smoothly and can handle growing workloads without degradation in performance.

Key Aspects of Database Optimization:

1. Query Optimization:

○ Query optimization is the process of improving the performance of SQL queries by ensuring
that they are executed in the most efficient way possible. This involves analyzing query
execution plans, indexing strategies, and writing efficient SQL code.

Techniques for Query Optimization:

○ Indexing: Properly indexing the database tables allows the query optimizer to quickly locate
the data, reducing the need to perform full table scans.

○ Execution Plans: Examining the query execution plan helps DBAs understand how the database engine is executing a query and identify inefficiencies such as unnecessary joins or scans.
Example:
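A sketch of execution-plan inspection, using SQLite's EXPLAIN QUERY PLAN as a stand-in for the plan viewers other engines provide (PostgreSQL's EXPLAIN, MySQL's EXPLAIN, etc.). The table and index names are illustrative:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Orders (Order_ID INTEGER PRIMARY KEY, Order_Date TEXT)")

query = "SELECT * FROM Orders WHERE Order_Date = '2024-06-01'"

# Without an index, the plan is a full-table scan
before = conn.execute("EXPLAIN QUERY PLAN " + query).fetchall()[0][3]
conn.execute("CREATE INDEX idx_orders_date ON Orders (Order_Date)")
# With the index, the plan becomes a direct index search
after = conn.execute("EXPLAIN QUERY PLAN " + query).fetchall()[0][3]
print(before)
print(after)
```

Comparing the plan text before and after adding the index is exactly the workflow a DBA follows when tuning a slow query.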

● Joins Optimization: Optimizing how joins are handled (e.g., using INNER JOIN instead of LEFT
JOIN when appropriate) can significantly reduce query processing time.

● Avoiding SELECT *: Using SELECT * returns all columns, which can be resource-intensive. Selecting only the required columns reduces the amount of data being processed.
Example:
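A small illustration of the difference (the schema and data are made up for the demonstration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE Customers (
    Customer_ID INTEGER PRIMARY KEY, Customer_Name TEXT, City TEXT, Notes TEXT)""")
conn.execute(
    "INSERT INTO Customers VALUES (1, 'Acme', 'Douala', 'a very long notes field ...')")

# SELECT * drags every column (including wide ones like Notes) through the engine
wide = conn.execute("SELECT * FROM Customers").fetchone()
# Naming only the columns the application needs reduces the data transferred
narrow = conn.execute("SELECT Customer_Name, City FROM Customers").fetchone()
print(len(wide), len(narrow))  # 4 2
```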

2. Index Optimization:

● Indexing improves the speed of data retrieval by allowing the database to find rows more efficiently.
However, creating too many indexes or poorly designed indexes can slow down write operations
(insert, update, delete).

Best Practices:

● Use Composite Indexes: Create composite indexes on columns frequently used together in WHERE clauses.
● Avoid Redundant Indexes: Index only the necessary columns to avoid excessive maintenance overhead.
● Periodically Rebuild Indexes: Rebuilding fragmented indexes improves performance by reorganizing
the data stored in them.

Example:
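A sketch of a composite index on two columns that are filtered together, shown with SQLite (the index name is illustrative; this matches the Customers scenario used in the exercises):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE Customers (
    Customer_ID INTEGER PRIMARY KEY, Customer_Name TEXT, City TEXT)""")

# Composite index on the two columns the WHERE clause filters together
conn.execute(
    "CREATE INDEX idx_customers_name_city ON Customers (Customer_Name, City)")

plan = conn.execute("""EXPLAIN QUERY PLAN
    SELECT Customer_ID FROM Customers
    WHERE Customer_Name = 'Acme' AND City = 'Douala'""").fetchall()[0][3]
print(plan)
```

The plan output names the composite index, confirming that both predicates are resolved by a single index search rather than a scan.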


3. Normalization and Denormalization:

● Normalization: Ensures that the database schema is designed to minimize data redundancy and improve data integrity. However, excessive normalization can sometimes lead to performance issues, especially with complex joins.

● Denormalization: In some cases, denormalizing the database (introducing some redundancy) can
improve performance by reducing the need for joins.

Example:

● A highly normalized database may require several joins for a simple query. Denormalizing by
duplicating certain fields could simplify queries and improve performance.
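The trade-off can be sketched as follows: the Customer_Name column duplicated inside Orders is the deliberate redundancy, letting a query skip the join (the schema is illustrative):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Normalized: order rows reference customers, so queries need a join
CREATE TABLE Customers (Customer_ID INTEGER PRIMARY KEY, Customer_Name TEXT);
CREATE TABLE Orders (Order_ID INTEGER PRIMARY KEY, Customer_ID INTEGER,
                     Customer_Name TEXT);  -- denormalized copy of the name
INSERT INTO Customers VALUES (1, 'Acme');
INSERT INTO Orders VALUES (10, 1, 'Acme');
""")

# Normalized access path: a join is required
joined = conn.execute("""SELECT o.Order_ID, c.Customer_Name
                         FROM Orders o JOIN Customers c USING (Customer_ID)""").fetchone()
# Denormalized access path: a single-table read, at the cost of duplicated data
flat = conn.execute("SELECT Order_ID, Customer_Name FROM Orders").fetchone()
print(joined == flat)  # True
```

The cost is that every customer rename must now update the copies in Orders as well, which is why denormalization is applied selectively.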

4. Caching:

● Caching involves storing the results of frequently executed queries or parts of queries in memory,
reducing the need to retrieve the same data from disk repeatedly.
● Database-level Caching: Some databases offer built-in caching mechanisms for frequently queried
data.
● Application-level Caching: Tools like Redis or Memcached can be used to cache query results at the
application layer, reducing the load on the database.

Example: Caching the result of a frequent query in a distributed cache like Redis to avoid querying the

database multiple times.
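A minimal application-level cache, using a plain dictionary as a stand-in for Redis or Memcached. Note that a real cache also needs an invalidation or expiry policy, which this sketch omits:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Products (Product_ID INTEGER PRIMARY KEY, Price REAL)")
conn.execute("INSERT INTO Products VALUES (1, 9.99)")

cache = {}    # stand-in for Redis/Memcached
db_hits = 0   # counts how often we actually touch the database

def get_price(product_id):
    global db_hits
    if product_id in cache:               # cache hit: no database round trip
        return cache[product_id]
    db_hits += 1                          # cache miss: query, then remember
    row = conn.execute("SELECT Price FROM Products WHERE Product_ID = ?",
                       (product_id,)).fetchone()
    cache[product_id] = row[0]
    return row[0]

prices = [get_price(1) for _ in range(5)]
print(prices, db_hits)  # five results, but only one database query
```

Five lookups produce only one database hit; the other four are served from memory.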

5. Partitioning:

● Partitioning involves dividing large tables into smaller, more manageable pieces, which can improve query performance, reduce disk space usage, and simplify management.


● Horizontal Partitioning: Divides a table into multiple partitions based on row data. For example,
partitioning a sales table by year.

● Vertical Partitioning: Divides a table into partitions based on columns. For example, storing
frequently queried columns separately from less frequently queried ones.

Example:
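A sketch of horizontal partitioning by year, emulated with per-year tables and a UNION ALL view. Native partitioning syntax differs per DBMS (e.g. PARTITION BY RANGE in MySQL and PostgreSQL); the names here are illustrative:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Horizontal partitions of a Sales table, split by year
CREATE TABLE Sales_2023 (Sale_ID INTEGER, Amount REAL, Sale_Date TEXT);
CREATE TABLE Sales_2024 (Sale_ID INTEGER, Amount REAL, Sale_Date TEXT);
INSERT INTO Sales_2023 VALUES (1, 100.0, '2023-03-15');
INSERT INTO Sales_2024 VALUES (2, 250.0, '2024-07-01');
-- A view presents the partitions as one logical table
CREATE VIEW Sales AS
    SELECT * FROM Sales_2023
    UNION ALL
    SELECT * FROM Sales_2024;
""")
# A query restricted to one year only needs to read that partition
total_2024 = conn.execute("SELECT SUM(Amount) FROM Sales_2024").fetchone()[0]
# While the view still answers queries over the whole data set
all_rows = conn.execute("SELECT COUNT(*) FROM Sales").fetchone()[0]
print(total_2024, all_rows)  # 250.0 2
```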


6. Resource Management:

● Memory Allocation: Ensuring that the database has adequate memory (e.g., buffer cache, sort area) to
handle queries efficiently.

● Connection Pooling: Reduces the overhead of establishing new connections by reusing existing
database connections.

● CPU and Disk Usage: Monitoring and optimizing CPU and disk usage to prevent bottlenecks in
database performance.
● Load Balancing: Distributing query and transaction load across multiple database servers or shards to
avoid overwhelming a single server.

7. Query Execution Plan Analysis:



● DBAs can analyze query execution plans to identify inefficiencies in how queries are processed. Most
modern databases provide tools to examine and optimize execution plans.

Example in PostgreSQL: EXPLAIN ANALYZE SELECT * FROM Orders WHERE Order_Date > '2024-01-01'; displays the chosen execution plan together with actual run times.

8. Database Maintenance Tasks:



● Regular database maintenance is essential for optimizing performance. This includes tasks such as:

○ Vacuuming (PostgreSQL): Removes dead rows left behind by updates and deletes, reclaiming disk space.

Auditing Questions

1. Define database auditing and explain its importance in database administration. (5 Marks)

2. What are the key elements of database auditing? Explain each element briefly. (10 Marks)

3. Consider a database where sensitive customer data, such as credit card numbers, is stored. Describe how you
would implement auditing to monitor who accesses this sensitive data. (5 Marks)

4. Write an SQL query to create an audit trail that logs any UPDATE or DELETE operation performed on the
Employees table. (5 Marks)

5. Explain the concept of compliance auditing and provide an example of a scenario where it would be essential.
(5 Marks)

6. Discuss how auditing helps in detecting fraud and ensuring accountability in a database system. (5 Marks)

7. Explain the difference between data access auditing and schema change auditing. Provide an example where
each would be useful. (5 Marks)

8. Write an SQL statement to enable general query logging for auditing purposes in MySQL. (5 Marks)

9. How does database auditing contribute to meeting regulatory requirements such as GDPR and HIPAA? (10
Marks)

10. Describe the concept of audit trail and explain its importance in security incident investigations. (5 Marks)

Optimization Questions

1. What is query optimization? Explain why it is important in database performance. (5 Marks)

2. What role does indexing play in query optimization? Provide an example of how creating an index on a table can improve query performance. (5 Marks)



3. Consider the following query:

SELECT * FROM Orders
WHERE Order_Date BETWEEN '2024-01-01' AND '2024-12-31';

Explain how you would optimize this query to improve its performance. (5 Marks)

4. What is the purpose of analyzing a query execution plan? Provide an example of how it can help identify
inefficiencies in query processing. (5 Marks)

5. Describe the difference between horizontal partitioning and vertical partitioning in database optimization.
Provide an example scenario where each would be appropriate. (10 Marks)

6. Write an SQL query that creates a composite index on the Customers table to improve the performance of
queries that filter by both Customer_Name and City. (5 Marks)

7. Discuss the trade-offs between normalization and denormalization in terms of database optimization. (5
Marks)

8. What is caching in the context of database optimization? How does caching improve database performance? (5
Marks)

9. Explain the importance of regularly rebuilding fragmented indexes. What are the potential effects of not
maintaining indexes in a heavily used database? (5 Marks)

10. How does connection pooling improve database performance? Explain how it works in terms of reducing the
overhead of database connections. (5 Marks)
