Database For Exit Exam
Tutorials Module
Prepared by
March, 2023
Manual Approach
In the manual approach, data storage and retrieval follow the primitive and traditional way of
information handling, where cards and paper are used for the purpose. Storage and retrieval are
performed by human labor.
Files for as many events and objects as the organization has are used to store information.
Each of the files containing various kinds of information is labeled and stored in one or more
cabinets.
The cabinets could be kept in safe places for security purposes, based on the sensitivity of the
information contained in them.
Insertion and retrieval are done by searching first for the right cabinet, then for the right file,
then for the information.
One could have an indexing system to facilitate access to the data.
Limitations of the Manual approach
✓ Prone to error.
✓ Difficult to update, retrieve, integrate.
✓ You have the data but it is difficult to compile the information.
✓ Limited to small size information.
✓ Cross-referencing is difficult.
An alternative approach to data handling is a computerized way of dealing with information. The
computerized approach could be either decentralized or centralized, based on where the data
resides in the system.
Traditional File Based Approach
There were, and still are, several computer applications with file-based processing used for the
purpose of data handling. Even though the approach evolved over time, the basic structure is still
similar, if not identical.
File based systems were an early attempt to computerize the manual filing system.
This approach is the decentralized computerized data handling method.
A collection of application programs perform services for the end-users. In such systems, every
application program that provides service to end users defines and manages its own data.
Such systems have a number of programs for each of the different applications in the
organization.
Since every application defines and manages its own data, the system is subject to serious
data duplication problems.
A file, in the traditional file-based approach, is a collection of records containing logically related
data.
[Figure: file-based processing. The Sales and Contracts applications each have their own data entry routines and reports, file handling routines, file definitions, and data files.]
Limitations of the Traditional File Based Approach
As business applications become more complex, demanding more flexible and reliable data handling
methods, the shortcomings of the file-based system became evident. These shortcomings include, but
are not limited to:
Separation or Isolation of Data: information available in one application may not be known to other applications.
Limited data sharing.
Lengthy development and maintenance time.
Duplication or redundancy of data.
Data dependency on the application.
Incompatible file formats between different applications and programs creating inconsistency.
Fixed query processing which is defined during application development.
The limitations of the traditional file-based data handling approach arise from two basic reasons.
1. The definition of the data is embedded in the application program, which makes it difficult to
modify the database definition easily.
2. There is no control over the access and manipulation of the data beyond that imposed by the
application programs.
The most significant problems experienced by the traditional file-based approach of data handling are
the "update anomalies". There are three types of update anomalies:
1. Modification Anomalies: a problem experienced when one or more data values are modified in
one application program but not in others containing the same data set.
2. Deletion Anomalies: a problem encountered when one record set is deleted from one
application but remains untouched in other application programs.
3. Insertion Anomalies: a problem experienced whenever there is a new data item to be recorded
and the recording is not made in all the applications. And when the same data item is inserted
in different applications, there could be encoding errors that cause the new data item to be
considered a totally different object.
Database Approach
Queries could be expressed in a very high-level language, which greatly increased the efficiency of
database programmers. The database approach emphasizes the integration and sharing of data
throughout the organization.
Thus in Database Approach:
✓ Database is just a computerized record keeping system or a kind of electronic filing cabinet.
✓ Database is a repository for collection of computerized data files.
✓ Database is a shared collection of logically related data designed to meet the information
needs of an organization. Since it is a shared corporate resource, the database is integrated
with minimum amount of or no duplication.
✓ Database is a collection of logically related data where these logically related data comprises
entities, attributes, relationships, and business rules of an organization's information.
Data Dictionary:
Because a database is a self-describing system, this tool, the Data Dictionary, is used to store
and organize information about the data stored in the database.
The DBMS environment has five components. To design and use a database, there will be the
interaction or integration of Hardware, Software, Data, Procedure and People.
1. Hardware: components that one can touch and feel.
2. Software: collections of commands and programs used to manipulate the hardware to
perform a function.
3. Data: operational data and metadata.
4. Procedure: the rules and regulations on how to design and use a database.
5. People: the people that are responsible for, or play a role in, designing, implementing,
managing, administering and using the resources in the database.
Roles in Database Design and Use
1. Database Administrator (DBA)
Responsible for overseeing, controlling and managing the database resources (the database itself,
the DBMS and other related software).
Authorizing access to the database.
Coordinating and monitoring the use of the database.
Responsible for determining and acquiring hardware and software resources.
Accountable for problems like poor security or poor performance of the system.
Involved in all steps of database development.
In big organizations with huge amounts of data and user requirements, this role can be classified
further.
1. Data Administrator (DA): responsible for the management of data resources. Involved in
database planning, development, and maintenance of standards, policies and procedures at the
conceptual and logical design phases.
2. Database Administrator (DBA): a more technically oriented role, responsible for the
physical realization of the database. Involved in physical design, implementation,
security and integrity control of the database.
2. Database Designer (DBD)
Identifies the data to be stored and chooses the appropriate structures to represent and store the
data.
Should understand the user requirements and should choose how the user views the database.
Involved in the design phase, before the implementation of the database system.
We distinguish two kinds of database designers: one involved in the logical and conceptual design
and another involved in the physical design.
1. Logical and Conceptual DBD
Identifies data (entities, attributes and relationships) relevant to the organization.
Identifies constraints on each data item.
Understands the data and business rules in the organization.
Sees the database independent of any data model at the conceptual level and considers
one specific data model at the logical design phase.
2. Physical DBD
Takes the logical design specification as input and decides how it should be physically
realized.
Maps the logical data model onto the specified DBMS with respect to tables and
integrity constraints (DBMS-dependent design).
Selects specific storage structures and access paths to the database.
Designs the security measures required on the database.
3. Application Programmer and Systems Analyst
The system analyst determines the user requirements and how the user wants to view the database.
The application programmer implements these specifications as programs: codes, tests, debugs,
documents and maintains the application program.
Determines the interface for how to retrieve, insert, update and delete data in the database.
The application could use any high-level programming language according to availability,
facilities and the required service.
4. End Users
Workers whose job requires accessing the database frequently for various purposes. There are
different groups of users in this category.
1. Naive Users:
Sizable proportion of users.
Unaware of the DBMS.
Only access the database based on their access level and demand.
Use standard and pre-specified types of queries.
2. Sophisticated Users
Are familiar with the structure of the database and the facilities of the DBMS.
Have complex requirements.
Have higher level queries.
Are most of the time engineers, scientists, business analysts, etc.
3. Casual Users
Users who access the database occasionally.
Need different information from the database each time.
Use sophisticated database queries to satisfy their needs.
Are most of the time middle to high level managers.
Chapter Two
ANSI-SPARC Architecture
The database-planning phase begins when a customer requests the development of a database project.
During the planning phase, four major activities are performed.
• Review and approve the database project request.
• Prioritize the database project request.
• Allocate resources such as money, people and tools.
• Arrange a development team to develop the database project.
Database planning should also include the development of standards that govern how data will be
collected, how the format should be specified, and what documentation will be needed.
Requirements Analysis
Requirements analysis is done in order to understand the problem that is to be solved. It is a very
important activity for the development of a database system. The person responsible for the
requirements analysis is often called the "Analyst".
There are two major activities in requirements analysis.
• Problem understanding or analysis
• Requirement specifications.
Design
Database design is the major phase of information engineering. In this phase, the information
models that were developed during analysis are used to design a conceptual schema for the database
and to design transactions and applications.
• In conceptual schema design, the data requirements collected in Requirement Analysis phase
are examined and a conceptual database schema is produced.
• In transaction and application design, the database applications analyzed in Requirement
Analysis phase are examined and specifications of these applications are produced. There are
two major steps in design phase:
Database Design
Process Design
[Figure: the ANSI-SPARC architecture. External views map onto the conceptual-level schema, which maps onto the internal-level schema and the physical data organization of the database.]
External Level: the users' view of the database. It describes the part of the database that is relevant to a
particular user. Different users have their own customized view of the database, independent of
other users.
Conceptual Level: Community view of the database. It describes what data is stored in database and
relationships among the data.
Internal Level: Physical representation of the database on the computer. It describes how the data is
stored in the database.
Conceptual level
Staff_No FName LName DOB Salary Bno
Internal level
struct STAFF {
    int   Staff_No;
    int   Branch_No;
    char  FName[15];
    char  LName[15];
    Date  Date_of_Birth;   /* assumes a user-defined Date type */
    float Salary;
    struct STAFF *next;
};
Data Independence
Logical Data Independence:
Refers to the immunity of external schemas to changes in the conceptual schema.
Conceptual schema changes (e.g. addition/removal of entities) should not require changes to the
external schemas or rewrites of application programs.
The capacity to change the conceptual schema without having to change the external schemas
and their application programs.
Physical Data Independence:
Refers to the immunity of the conceptual schema to changes in the internal schema.
The capacity to change the internal schema without having to change the conceptual schema.
The ability to modify the physical schema without changing the logical schema; applications
depend on the logical schema.
Internal schema changes (e.g. using different file organizations, storage structures or devices)
should not require changes to the conceptual or external schemas.
In general, the interfaces between the various levels and components should be well defined so
that changes in some parts do not seriously influence others.
[Figure: the external/conceptual mapping provides logical data independence; the conceptual/internal mapping provides physical data independence.]
Examples: CREATE DATABASE database_name; This query will create a new database in
SQL with the provided name.
❖ ALTER: The ALTER command is used to modify existing database objects.
✓ TCL commands are used with DML commands only, because DDL commands automatically save
the state of the data.
✓ COMMIT command: a TCL command used to save changes invoked by a transaction to the
database. It saves all the transactions or changes to the database since the last COMMIT or
ROLLBACK command. After COMMIT, the changes cannot be undone.
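As a small illustration, here is a minimal transaction sketch; the accounts table and its columns are assumptions for the example, and the exact transaction-start keyword varies by DBMS:

BEGIN TRANSACTION;  -- some engines use START TRANSACTION instead

-- Move 100 from account 1 to account 2 as a single unit of work.
UPDATE accounts SET balance = balance - 100 WHERE account_id = 1;
UPDATE accounts SET balance = balance + 100 WHERE account_id = 2;

COMMIT;             -- both updates become permanent and can no longer be undone
-- ROLLBACK;        -- issued instead of COMMIT, this would discard both updates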
To perform any operation in the database, such as creating tables, sequences or views, a user
needs privileges. Privileges are of two types:
1) System: This includes permissions for creating sessions, tables, etc and all types of other
system privileges.
2) Object: This includes permissions for any command or query to perform any operation on
the database tables.
❖ GRANT: This command is used to give access or permission to specific users or roles on
specific database objects like table, view, etc.
Chapter Three
Data Model
Data Model: a set of concepts to describe the structure of a database, and certain constraints that the
database should obey.
A data model is a description of the way data is stored in a database. A data model helps to
understand the relationships between entities and to create the most effective structure to hold data.
Data Model is a collection of tools or concepts for describing:
✓ Data
✓ Data relationships
✓ Data semantics
✓ Data constraints
The main purpose of Data Model is to represent the data in an understandable way.
Categories of data models include:
Object-based
Record-based
Physical
[Figure: hierarchical and network data model diagrams relating the record types Department, Employee, Job, Time Card and Activity.]
Advantages of Network Data Model
Network Model is able to model complex relationships and represents semantics of add/delete
on the relationships.
Can handle most situations for modeling using record types and relationship types.
Language is navigational; uses constructs like FIND, FIND member, FIND owner, FIND
NEXT within set, GET etc. Programmers can do optimal navigation through the database.
Disadvantages of Network Data Model
Navigational and procedural nature of processing
Database contains a complex array of pointers that thread through a set of records.
Little scope for automated "query optimization”
Relational Data Model
Can define more flexible and complex relationships.
Viewed as a collection of tables called "Relations", equivalent to a collection of record types.
Relation: a two-dimensional table.
Stores information or data in the form of tables, as rows and columns.
Records are related by the data stored jointly in the fields of records in two tables or files. The
related tables contain information that creates the relation.
Alternative terminologies
Relation = Table = File
Tuple = Row = Record
Attribute = Column = Field
Chapter Four
Types of Attributes
(1) Simple (atomic) vs. Composite attributes
✓ Simple : contains a single value (not divided into sub parts)
E.g. Age, gender
✓ Composite: Divided into sub parts (composed of other attributes)
E.g. Name, address
(2) Single-valued vs. Multi-valued attributes
✓ Single-valued: have only single value (the value may change but has only one value at
one time).
E.g. Name, Sex, Id. No., color_of_eyes
✓ Multi-Valued: have more than one value.
E.g. Address, dependent-name
Person may have several college degrees
(3) Stored vs. Derived Attribute
✓ Stored: not possible to derive or compute.
E.g. Name, Address
✓ Derived: the value may be derived or computed from the values of other attributes.
E.g. Age (current year – year of birth)
Length of employment (current date – start date)
Profit (earnings – cost)
G.P.A. (grade points / credit hours)
(4) Null Values
✓ NULL applies to attributes which are not applicable or which do not have values.
✓ You may enter the value NA (meaning not applicable).
✓ The value of a key attribute cannot be null.
Default value - assumed value if no explicit value.
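To make these attribute types concrete, here is a hedged sketch of how they might surface in table definitions; the person table, its columns and the 'NA' default are assumptions for the example:

CREATE TABLE person (
    person_id  INT PRIMARY KEY,           -- key attribute: cannot be null
    first_name VARCHAR(30) NOT NULL,      -- composite Name stored as simple parts
    last_name  VARCHAR(30) NOT NULL,
    gender     CHAR(1),                   -- simple, single-valued attribute
    birth_year INT,                       -- stored attribute; Age is derived from it
    eye_color  VARCHAR(10) DEFAULT 'NA'   -- default value assumed when none is given
);

-- A multi-valued attribute (e.g. several college degrees) is kept in its own table.
CREATE TABLE person_degree (
    person_id INT REFERENCES person(person_id),
    degree    VARCHAR(40),
    PRIMARY KEY (person_id, degree)
);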
Degree of a Relationship
An important point about a relationship is how many entities participate in it. The number of
entities participating in a relationship is called the Degree of the relationship.
Among the Degrees of relationship, the following are the basic:
✓ Unary/Recursive Relationship: tuples/records of a single entity are related with each
other.
✓ Binary Relationships: Tuples/records of two entities are associated in a relationship.
✓ Ternary Relationship: Tuples/records of three different entities are associated.
✓ And a generalized one, the N-ary Relationship: tuples from an arbitrary number of entity sets
participate in a relationship.
Cardinality of a Relationship
Another important concept about relationship is the number of instances/tuples that can be
associated with a single instance from one entity in a single relationship. The number of
instances participating or associated with a single instance from an entity in a relationship is
called the Cardinality of the relationship. The major cardinalities of a relationship are:
✓ ONE-TO-ONE: one tuple is associated with only one other tuple.
➢ E.g. Building – Location → as a single building will be located in a single location and
as a single location will only accommodate a single Building.
✓ ONE-TO-MANY, one tuple can be associated with many other tuples, but not the reverse.
➢ E.g. Department-Student→ as one department can have multiple students.
✓ MANY-TO-ONE, many tuples are associated with one tuple but not the reverse.
➢ E.g. Employee – Department → as many employees belong to a single department.
✓ MANY-TO-MANY: one tuple is associated with many other tuples and, from the other side
with a different role name, one tuple is likewise associated with many tuples.
➢ E.g. Student – Course → as a student can take many courses and a single course can be
attended by many students.
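These cardinalities can be sketched directly in SQL; the tables and columns below are assumptions for illustration. A foreign key on the "many" side captures one-to-many, while many-to-many needs a third, linking table:

-- ONE-TO-MANY: one department has many students; each student has one department.
CREATE TABLE department (
    dept_id   INT PRIMARY KEY,
    dept_name VARCHAR(40)
);

CREATE TABLE student (
    stud_id INT PRIMARY KEY,
    name    VARCHAR(40),
    dept_id INT REFERENCES department(dept_id)   -- many students point to one department
);

-- MANY-TO-MANY: Student - Course is resolved through a linking table.
CREATE TABLE course (
    course_id INT PRIMARY KEY,
    title     VARCHAR(40)
);

CREATE TABLE takes (
    stud_id   INT REFERENCES student(stud_id),
    course_id INT REFERENCES course(course_id),
    PRIMARY KEY (stud_id, course_id)             -- one row per student-course pair
);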
4. Relational Constraints/Integrity Rules
Relational Integrity
✓ Domain Integrity: No value of the attribute should be beyond the allowable limits.
✓ Entity Integrity: In a base relation, no attribute of a Primary Key can assume a value of NULL.
✓ Referential Integrity: If a Foreign Key exists in a relation, either the Foreign Key value must
match a Candidate Key value in its home relation or the Foreign Key value must be NULL.
✓ Enterprise Integrity: Additional rules specified by the users or database administrators of a
database are incorporated.
Key constraints
If tuples need to be unique in the database, then we need to make each tuple distinct. To do
this we need relational keys that uniquely identify each tuple.
✓ Super Key: an attribute or set of attributes that uniquely identifies a tuple within a relation.
✓ Candidate Key: a super key such that no proper subset of it is itself a super key
within the relation.
A candidate key has two properties:
1. Uniqueness
2. Irreducibility
If a super key has only one attribute, it is automatically a candidate key.
If a candidate key consists of more than one attribute it is called Composite Key.
✓ Primary Key: the candidate key that is selected to identify tuples uniquely within the relation.
In the worst case, the entire set of attributes in a relation can serve as the primary key.
✓ Foreign Key: an attribute, or set of attributes, within one relation that matches the candidate
key of some relation.
A foreign key is a link between different relations, used to create views or unnamed relations.
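As a hedged sketch of the key terminology (the staff and branch tables and their columns are assumptions), note how the primary key is one chosen candidate key, while other candidate keys remain declared UNIQUE:

CREATE TABLE branch (
    branch_no INT PRIMARY KEY,
    city      VARCHAR(30)
);

CREATE TABLE staff (
    staff_no    INT PRIMARY KEY,                   -- the chosen candidate key
    national_id CHAR(10) NOT NULL UNIQUE,          -- another candidate key, kept unique
    fname       VARCHAR(15),
    branch_no   INT REFERENCES branch(branch_no)   -- foreign key into its home relation
);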
4.3. Relational Views
Relations are perceived as tables from the users' perspective. Actually, there are two kinds of
relations in a relational database. The two categories or types of relations are Named and Unnamed
Relations. The basic difference is in how the relation is created, used and updated:
Base Relation
A Named Relation corresponding to an entity in the conceptual schema, whose tuples are
physically stored in the database.
View (Unnamed Relation)
A View is the dynamic result of one or more relational operations operating on the base
relations to produce another virtual relation that does not actually exist as presented. So a view
is a virtually derived relation that does not necessarily exist in the database but can be produced
upon request by a particular user at the time of request. The virtual table or relation can be
created from a single relation or from different relations by extracting some attributes and
records, with or without conditions.
Purpose of a view
➢ Hides unnecessary information from users: since only part of the base relation (Some collection
of attributes, not necessarily all) are to be included in the virtual table.
➢ Provide powerful flexibility and security: since unnecessary information will be hidden from
the user there will be some sort of data security.
➢ Provide customized view of the database for users: each user is going to be interfaced with their
own preferred data set and format by making use of the Views.
➢ A view of one base relation can be updated.
➢ Update on views derived from various relations is not allowed since it may violate the integrity
of the database.
➢ Update on view with aggregation and summary is not allowed. Since aggregation and summary
results are computed from a base relation and does not exist actually.
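A minimal sketch of a view in SQL follows; the employee base relation and the rule that ordinary users should not see Salary are assumptions for the example:

CREATE TABLE employee (
    emp_id INT PRIMARY KEY,
    name   VARCHAR(40),
    dept   VARCHAR(20),
    salary DECIMAL(10,2)
);

-- The view exposes only some attributes (salary is hidden) and some rows.
CREATE VIEW sales_staff AS
SELECT emp_id, name, dept
FROM employee
WHERE dept = 'Sales';

-- Users query the view exactly as if it were a table:
SELECT name FROM sales_staff;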
Schemas and Instances and Database State
When a database is designed using a Relational data model, all the data is represented in a form of a
table. In such definitions and representation, there are two basic components of the database. The
two components are the definition of the Relation or the Table and the actual data stored in each
table. The data definition is what we call the Schema or the skeleton of the database and the
Relations with some information at some point in time is the Instance or the flesh of the database.
Schemas
✓ Schema describes how data is to be structured, defined at setup/Design time (also called
"metadata").
✓ Since it is used during the database development phase, the schema rarely changes unless
system maintenance demands a change to the definition of a relation.
✓ Database Schema (Intension): specifies name of relation and the collection of the attributes
(specifically the Name of attributes).
➢ Refer to a description of database (or intention).
➢ Specified during database design.
➢ Should not be changed unless during maintenance.
✓ Schema Diagrams
➢ Convention to display some aspect of a schema visually.
✓ Schema Construct
➢ Refers to each object in the schema (e.g. STUDENT).
➢ E.g. STUDENT (FName, LName, Id, Year, Dept, Sex)
Instances
✓ Instance: is the collection of data in the database at a particular point of time (snap-shot).
➢ Also called State or Snap Shot or Extension of the database.
➢ Refers to the actual data in the database at a specific point in time.
➢ State of database is changed any time we add, delete or update an item.
➢ Valid state: the state that satisfies the structure and constraints specified in the schema
and is enforced by DBMS.
✓ Since Instance is actual data of database at some point in time, changes rapidly.
✓ To define a new database, we specify its database schema to the DBMS (database is empty).
✓ Database is initialized when we first load it with data.
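The schema/instance split maps directly onto SQL statements. A hedged sketch, reusing the STUDENT construct above with assumed data types:

-- Schema (intension): fixed at design time.
CREATE TABLE student (
    id    INT PRIMARY KEY,
    fname VARCHAR(20),
    lname VARCHAR(20),
    year  INT,
    dept  VARCHAR(20),
    sex   CHAR(1)
);

-- Instance (extension): every DML statement moves the database to a new state.
INSERT INTO student VALUES (1, 'Abebe', 'Kebede', 2, 'Info Sc', 'M');
UPDATE student SET year = 3 WHERE id = 1;
DELETE FROM student WHERE id = 1;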
Chapter Five
Database design
Database design is the process of coming up with different kinds of specification for the data to be
stored in the database.
An information system with a database application consists of several tasks, which include:
✓ Planning of Information systems Design
✓ Requirements Analysis,
✓ Design (Conceptual, Logical and Physical Design)
✓ Testing
✓ Implementation
✓ Operation and Support
From these different phases, the prime interest of a database system is the Design part, which is
again subdivided into three sub-phases.
These sub-phases are:
➢ Conceptual Design
➢ Logical Design, and
➢ Physical Design
✓ In general, one has to go back and forth between these tasks to refine a database design, and
decisions in one task can influence the choices in another task.
[Figure: the three design sub-phases in sequence: Conceptual Design, then Logical Design, then Physical Design.]
[Figure: ER notation examples: symbols for attributes, composite attributes, multi-valued attributes and keys; an Age attribute; an Enrolled_in relationship carrying Academic_year, Semester and Grade attributes; and an Employee Manages Branch relationship with 1..1 and 0..1 cardinalities.]
One-To-Many Relationships
✓ In a one-to-many relationship, a loan is associated with at most one customer via borrower,
while a customer is associated with several (including zero) loans via borrower.
Many-To-Many Relationship
✓ A customer is associated with several (possibly 0) loans via borrower.
✓ A loan is associated with several (possibly 0) customers via borrower.
[Figure: Instructor Teaches Course, a many-to-many (0..* to 0..*) relationship.]
Normalization
Normalization is the process of identifying the logical associations between data items and
designing a database that will represent such associations without suffering the update
anomalies, which are:
1. Insertion Anomalies
2. Deletion Anomalies
3. Modification Anomalies
Normalization may reduce system performance since data will be cross referenced from many
tables. Thus de-normalization is sometimes used to improve performance, at the cost of reduced
consistency guarantees.
Normalization is normally considered good if it is a lossless decomposition.
Insertion anomalies
An "insertion anomaly" is a failure to place information about a new database entry into all the
places in the database where information about that new entry needs to be stored. In a properly
normalized database, information about a new entry needs to be inserted into only one place in
the database; in an inadequately normalized database, information about a new entry may need
to be inserted into more than one place and, human fallibility being what it is, some of the
needed additional insertions may be missed.
Deletion anomalies
A "deletion anomaly" is a failure to remove information about an existing database entry when
it is time to remove that entry. In a properly normalized database, information about an old, to-
be-gotten-rid-of entry needs to be deleted from only one place in the database; in an
inadequately normalized database, information about that old entry may need to be deleted from
more than one place, and, human fallibility being what it is, some of the needed additional
deletions may be missed.
Modification anomalies
A modification of a database involves changing some value of the attribute of a table. In a
properly normalized database table, whatever information is modified by the user, the change
will be effected and used accordingly.
The purpose of normalization is to reduce the chances for anomalies to occur in a database.
Data Dependency
The logical associations between data items that point the database designer in the direction of a
good database design are referred to as determinant or dependent relationships.
Two data items A and B are said to be in a determinant or dependent relationship if certain values
of data item B always appear with certain values of data item A. If data item A is the
determinant and B the dependent data item, then the direction of the association is from A
to B and not vice versa.
X → Y holds if whenever two tuples have the same value for X, they must have the same value for Y
The notation is A → B, which is read as: B is functionally dependent on A.
In general, a functional dependency is a relationship among attributes. In relational databases, we
can have a determinant that governs one other attribute or several other attributes.
FDs are derived from the real-world constraints on the attributes.
Example
Dinner Type of Wine
Meat Red
Fish White
Cheese Rose
Since the type of Wine served depends on the type of Dinner, we say Wine is functionally
dependent on Dinner.
Dinner → Wine
Dinner Type of Wine Type of Fork
Meat Red Meat fork
Fish White Fish fork
Cheese Rose Cheese fork
Since both Wine type and Fork type are determined by the Dinner type, we say Wine is
functionally dependent on Dinner and Fork is functionally dependent on Dinner.
Dinner → Wine
Dinner → Fork
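These dependencies suggest storing each determinant value once, together with everything it determines. A hedged sketch (table and column names are assumptions):

-- Dinner -> Wine and Dinner -> Fork: Dinner is the determinant of both,
-- so it can serve as the key of a single table holding its dependents.
CREATE TABLE dinner_setting (
    dinner VARCHAR(20) PRIMARY KEY,   -- determinant
    wine   VARCHAR(20),               -- functionally dependent on dinner
    fork   VARCHAR(20)                -- functionally dependent on dinner
);

INSERT INTO dinner_setting VALUES ('Meat',   'Red',   'Meat fork');
INSERT INTO dinner_setting VALUES ('Fish',   'White', 'Fish fork');
INSERT INTO dinner_setting VALUES ('Cheese', 'Rose',  'Cheese fork');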
Partial Dependency
If an attribute which is not a member of the primary key is dependent on some part of the primary
key (if we have composite primary key) then that attribute is partially functionally dependent on the
primary key.
Let {A, B} be the primary key and C a non-key attribute.
Then if {A, B} → C and (B → C or A → C),
C is partially functionally dependent on {A, B}.
Full Dependency
If an attribute which is not a member of the primary key is not dependent on some part of the
primary key but the whole key (if we have composite primary key) then that attribute is fully
functionally dependent on the primary key.
Let {A, B} be the primary key and C a non-key attribute.
Then if {A, B} → C holds, but neither A → C nor B → C holds (neither A alone nor B alone
determines C),
C is fully functionally dependent on {A, B}.
Transitive Dependency
In mathematics and logic, a transitive relationship is a relationship of the following form: "If A
implies B, and if also B implies C, then A implies C."
Example:
If Mr X is a Human, and if every Human is an Animal, then Mr X must be an Animal.
Generalized way of describing transitive dependency is that:
If A functionally governs B, AND
If B functionally governs C
THEN A functionally governs C
Provided that neither C nor B determines A i.e. (B /→ A and C /→ A)
In the normal notation:
{(A→B) AND (B→C)} ==> A→C provided that B /→ A and C /→ A
Steps of Normalization:
We have various levels or steps in normalization called Normal Forms. The level of complexity,
strength of the rule and decomposition increases as we move from one lower level Normal Form to
the higher.
A table in a relational database is said to be in a certain normal form if it satisfies certain
constraints.
Each normal form represents a stronger condition than the previous one.
Normalization towards a logical design consists of the following steps:
Un-Normalized Form:
Identify all data elements
First Normal Form:
Find the key with which you can find all data
Second Normal Form:
Remove part-key dependencies. Make all data dependent on the whole key.
Third Normal Form
Remove non-key dependencies. Make all data dependent on nothing but the key.
For most practical purposes, databases are considered normalized if they adhere to third normal
form.
First Normal Form (1NF) requires that all column values in a table be atomic (e.g., a number is
an atomic value, while a list or a set is not). We have two ways of achieving this:
1. Putting each repeating group into a separate table and connecting them with a primary key-
foreign key relationship.
2. Moving these repeating groups to new rows by repeating the common attributes, then
finding the key with which you can find all data.
Definition: a table (relation) is in 1NF
If
There are no duplicated rows in the table. Unique identifier.
Each cell is single-valued (i.e., there are no repeating groups).
Entries in a column (attribute, field) are of the same kind.
Example for First Normal form (1NF)
UNNORMALIZED
EmpID FName LName SkillID Skill SkillType School SchoolAdd SkillLevel
12 tamiru tessema 2 SQL Database AU Sidist_killo 5
6 VB.6 Programming Helico Piazza 8
16 Lemma Alemu 5 C++ Programming NAC Saris 6
1 IP Programming Jimma Jimma_city 4
28 Mesfin Taye 2 SQL Database AAU Sidist_killo 10
65 Almaz Abera 2 SQL Database Helico Piazza 9
4 Prolog Programming Jimma Jimma_city 8
7 Java Programming AAU Sidist_killo 6
24 Teddy Tamiru 8 Oracle Database NAC Saris 5
94 Taye Gizaw 3 Cisco Networking AAU Sidist_killo 7
FIRST NORMAL FORM (1NF)
Remove all repeating groups. Distribute the multi-valued attributes into different rows and identify
a unique identifier for the relation, so that it can be said to be a relation in a relational database.
EmpID FName LName SkillID Skill SkillType School SchoolAdd Skill level
12 tamiru tessema 2 SQL Database AU Sidist_killo 5
12 tamiru tessema 6 VB.6 Programming Helico Piazza 8
16 Lemma Alemu 5 C++ Programming NAC Saris 6
16 Lemma Alemu 1 IP Programming Jimma Jimma_city 4
28 Mesfin Taye 2 SQL Database AAU Sidist_killo 10
65 Almaz Abera 2 SQL Database Helico Piazza 9
65 Almaz Abera 4 Prolog Programming Jimma Jimma_city 8
65 Almaz Abera 7 Java Programming AAU Sidist_killo 6
24 Teddy Tamiru 8 Oracle Database NAC Saris 5
94 Taye Gizaw 3 Cisco Networking AAU Sidist_killo 7
SECOND NORMAL FORM (2NF)
No partial dependency of a non-key attribute on part of the primary key. This will result in a set of
relations in Second Normal Form.
Any table that is in 1NF and has a single-attribute (i.e., a non-composite) primary key is
automatically in 2NF.
Definition: a table (relation) is in 2NF
If
It is in 1NF and
If all non-key attributes are dependent on the entire primary key. i.e. no partial
dependency.
Example for 2NF:
EMP_PROJ
EmpID EmpName ProjNo ProjName ProjLoc ProjFund ProjMangID Incentive
EMP_PROJ rearranged
EmpID ProjNo EmpName ProjName ProjLoc ProjFund ProjMangID Incentive
Business rule: Whenever an employee participates in a project, he/she will be entitled for an
incentive.
This schema is in 1NF since we do not have any repeating groups or attributes with the multi-valued
property. To convert it to 2NF we need to remove all partial dependencies of non-key attributes
on part of the primary key.
{EmpID, ProjNo}→ EmpName, ProjName, ProjLoc, ProjFund, ProjMangID, Incentive
But in addition to this we have the following dependencies
FD1: {EmpID}→EmpName
FD2: {ProjNo}→ProjName, ProjLoc, ProjFund, ProjMangID
FD3: {EmpID, ProjNo}→ Incentive
As we can see, some non-key attributes are partially dependent on part of the primary key.
This can be witnessed by analyzing the first two functional dependencies (FD1 and FD2). Thus,
each functional dependency, with its dependent attributes, should be moved to a new relation
where the determinant will be the primary key.
EMPLOYEE
EmpID EmpName
PROJECT
ProjNo ProjName ProjLoc ProjFund ProjMangID
EMP_PROJ
EmpID ProjNo Incentive
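Expressed as table definitions, the 2NF decomposition above might look like the following sketch; the data types are assumptions:

CREATE TABLE employee (
    emp_id   INT PRIMARY KEY,
    emp_name VARCHAR(40)              -- depends on EmpID alone
);

CREATE TABLE project (
    proj_no      INT PRIMARY KEY,
    proj_name    VARCHAR(40),         -- these all depend on ProjNo alone
    proj_loc     VARCHAR(40),
    proj_fund    DECIMAL(12,2),
    proj_mang_id INT
);

CREATE TABLE emp_proj (
    emp_id    INT REFERENCES employee(emp_id),
    proj_no   INT REFERENCES project(proj_no),
    incentive DECIMAL(10,2),          -- depends on the whole key {EmpID, ProjNo}
    PRIMARY KEY (emp_id, proj_no)
);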
THIRD NORMAL FORM (3NF)
Consider a STUDENT relation with attributes StudID, Stud_F_Name, Stud_L_Name, Dept, Year
and Dormitory. This schema is in 2NF since the primary key, StudID, is a single attribute.
Let's take StudID, Year and Dormitory and see the dependencies.
StudID→Year AND Year→Dormitory
And Year cannot determine StudID and Dormitory cannot determine StudID
Then transitively StudID→Dormitory
To convert it to 3NF we need to remove all transitive dependencies of non-key attributes on
other non-key attributes.
The non-primary-key attributes that depend on each other will be moved to another table and linked
with the main table using a candidate key - foreign key relationship.
STUDENT
StudID Stud_F_Name Stud_L_Name Dept Year
125/97 tamiru tessema Info Sc 1
654/95 Lemma Alemu Geog 3
842/95 Mesfin Taye Comp. Sc 3
165/97 Abera Belay Info Sc 1
985/95 Almaz Abera Geog 3
DORM
Year Dormitory
1 401
3 403
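The decomposition is lossless: joining STUDENT and DORM back together on Year reproduces the original relation, including each student's dormitory. A hedged sketch of that join:

SELECT s.StudID, s.Stud_F_Name, s.Stud_L_Name, s.Dept, s.Year, d.Dormitory
FROM   STUDENT s
JOIN   DORM d ON d.Year = s.Year;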
Generally, even though there are other four additional levels of Normalization, a table is said to be
normalized if it reaches 3NF. A database with all tables in the 3NF is said to be Normalized
Database.
Mnemonic for remembering the rationale for normalization up to 3NF could be the following:
1. No Repeating or Redundancy: - no repeating fields in the table.
2. The Fields Depend Upon the Key: - the table should solely depend on the key.
3. The Whole Key: - no partial key dependency.
4. And Nothing But the Key: - no inter data dependency.
5. So Help Me Codd: - since Codd came up with these rules.
5.1.3. Other Levels of Normalization
Boyce-Codd Normal Form (BCNF)
Def: A table is in BCNF if it is in 3NF and if every determinant is a candidate key.
Fourth Normal Form (4NF)
Isolate Independent Multiple Relationships - no table may contain two or more 1:N or M:N
relationships that are not directly related. The correct solution, to cause the model to be in fourth
normal form, is to ensure that all M:N relationships are resolved independently if they are indeed
independent.
Def: A table is in 4NF if it is in BCNF and if it has no multi-valued dependencies.
Fifth Normal Form (5NF)
Isolate Semantically Related Multiple Relationships - there may be practical constraints on
information that justify separating logically related many-to-many relationships.
A model limited to only simple (elemental) facts, as expressed in ORM.
Def: A table is in 5NF, also called "Projection-Join Normal Form" (PJNF), if it is in 4NF and if
every join dependency in the table is a consequence of the candidate keys of the table.
Domain-Key Normal Form (DKNF)
Models are free from all modification anomalies.
Def: A table is in DKNF if every constraint on the table is a logical consequence of the definition of
keys and domains.
The underlying ideas in normalization are simple enough. Through normalization we want to
design for our relational database a set of tables that;
(1) Contain all the data necessary for the purposes that the database is to serve,
(2) Have as little redundancy as possible,
(3) Accommodate multiple values for types of data that require them,
(4) Permit efficient updates of the data in the database, and
(5) Avoid the danger of losing data unknowingly.
Pitfalls of Normalization
As noted earlier, normalization may reduce system performance, since data will be cross-referenced
from many tables; de-normalization is sometimes used to win performance back at the cost of weaker
consistency guarantees.
Constraints
Every relation has some conditions that must hold for it to be a valid relation. These conditions are
called Relational Integrity Constraints. There are three main integrity constraints −
➢ Key constraints
➢ Domain constraints
➢ Referential integrity constraints
Key Constraints
There must be at least one minimal subset of attributes in the relation, which can identify a tuple
uniquely. This minimal subset of attributes is called key for that relation. If there is more than one
such minimal subset, these are called candidate keys.
Key constraints enforce that −
➢ In a relation with a key attribute, no two tuples can have identical values for key attributes.
➢ A key attribute cannot have NULL values.
Key constraints are also referred to as Entity Constraints.
Domain Constraints
Attributes have specific values in real-world scenarios. For example, age can only be a positive
integer. Similar constraints are imposed on the attributes of a relation. Every attribute is bound
to have a specific range of values. For example, age cannot be less than zero and telephone
numbers cannot contain a digit outside 0-9.
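In SQL, domain constraints are usually enforced through data types and CHECK constraints. A minimal sketch; the subscriber table is an assumption for the example:

CREATE TABLE subscriber (
    subscriber_id INT PRIMARY KEY,
    age           INT CHECK (age >= 0),  -- age cannot be less than zero
    phone         CHAR(10)               -- length fixed by the type; a digits-only
                                         -- CHECK needs DBMS-specific pattern syntax
);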
File organization: the organization of the data of a file into records, blocks and access structures.
This includes the way records and blocks are stored on the storage medium and linked.
Access method: provides a group of operations that can be applied to a file.
File Organization
➢ The File is a collection of records. Using the primary key, we can access the records. The
type and frequency of access can be determined by the type of file organization which was
used for a given set of records.
➢ File organization is a logical relationship among various records. This method defines how
file records are mapped onto disk blocks.
➢ File organization is used to describe the way in which the records are stored in terms of
blocks, and the blocks are placed on the storage medium.
➢ The first approach to mapping the database to files is to use several files and store only
fixed-length records in any given file. An alternative approach is to structure our files so
that they can accommodate records of multiple lengths.
➢ Files of fixed-length records are easier to implement than files of variable-length
records.
File organization offers various methods. These methods have pros and cons on the basis of
access or selection; the programmer decides the best-suited file organization method according
to the requirements.
Sequential File Organization
This is the easiest method of file organization: files are stored sequentially. It can be implemented
in two ways:
1. Pile file method: suppose we have records R1, R3 and so on up to R9 and R8 in a sequence
(a record here is simply a row in a table). If we want to insert a new record R2, it is placed at
the end of the file.
2. Sorted file method: suppose there is a preexisting sorted sequence of records R1, R3 and so
on up to R6 and R7. If a new record R2 has to be inserted, it is inserted at the end of the file
and the sequence is then sorted again.
Pros of sequential file organization
o It is a fast and efficient method for huge amounts of data.
o In this method, files can be easily stored on cheaper storage mechanisms like magnetic tape.
o It is simple in design and requires little effort to store the data.
o This method is used when most of the records have to be accessed, as in grade calculation
for students, generating salary slips, etc.
o This method is used for report generation and statistical calculations.
Heap File Organization
Suppose we have five records R1, R3, R6, R4 and R5 in a heap and we want to insert a new
record R2. If data block 3 is full, R2 is inserted into whichever data block the DBMS selects,
say data block 1.
If we want to search, update or delete data in heap file organization, we need to traverse the
data from the start of the file until we get the requested record.
If the database is very large, then searching, updating or deleting a record is time-consuming,
because there is no sorting or ordering of records; in heap file organization we need to check
all the data until we get the requested record.
Hash File Organization uses the computation of hash function on some fields of the records. The
hash function's output determines the location of disk block where the records are to be placed.
When a record has to be retrieved using the hash key columns, the address is generated, and
the whole record is retrieved using that address. In the same way, when a new record has to be
inserted, the address is generated using the hash key and the record is directly inserted. The same
process applies to delete and update operations.
In this method, there is no effort for searching and sorting the entire file. In this method, each
record will be stored randomly in the memory.
B+ File Organization
➢ B+ tree file organization is an advanced form of the indexed sequential access method. It
uses a tree-like structure to store records in a file.
➢ It uses the same concept of key-index where the primary key is used to sort the records. For
each primary key, the value of the index is generated and mapped with the record.
➢ The B+ tree is similar to a binary search tree (BST), but it can have more than two children.
In this method, all the records are stored only at the leaf node. Intermediate nodes act as a
pointer to the leaf nodes. They do not contain any records.
An example B+ tree with root value 25 shows that:
o There is one root node of the tree, i.e., 25.
o There is an intermediary layer of nodes that do not store the actual records; they hold
only pointers to the leaf nodes.
o The nodes to the left of the root contain values prior to the root and the nodes to the
right contain values following the root, i.e., 15 and 30 respectively.
o The leaf nodes contain only the actual values, i.e., 10, 12, 17, 20, 24, 27 and 29.
o Searching for any record is easier as all the leaf nodes are balanced.
o In this method, searching any record can be traversed through the single path and accessed
easily.
ISAM (Indexed Sequential Access Method)
ISAM is an advanced sequential file organization method. In this method, records are stored in the
file using the primary key. An index value is generated for each primary key and mapped with the
record. This index contains the address of the record in the file.
If any record has to be retrieved based on its index value, then the address of the data block is
fetched and the record is retrieved from the memory.
Pros of ISAM:
o Since each record has the address of its data block, searching for a record in a huge
database is quick and easy.
o This method supports range retrieval and partial retrieval of records. Since the index is
based on the primary key values, we can retrieve the data for a given range of values. In
the same way, partial values can be searched easily, e.g., student names starting
with 'JA'.
Cons of ISAM
o This method requires extra space in the disk to store the index value.
o When the new records are inserted, then these files have to be reconstructed to maintain the
sequence.
o When the record is deleted, then the space used by it needs to be released. Otherwise, the
performance of the database will slow down.
Cluster File Organization
In this method, we can directly insert, update or delete any record. Data is sorted based on the key
with which searching is done. The cluster key is the key with which joining of tables is
performed.
1. Indexed Clusters:
In an indexed cluster, records are grouped based on the cluster key and stored together. An
EMPLOYEE and DEPARTMENT relationship is a typical example: all the records are grouped
and stored based on the cluster key, DEP_ID.
2. Hash Clusters:
It is similar to the indexed cluster. In hash cluster, instead of storing the records based on the cluster
key, we generate the value of the hash key for the cluster key and store the records with the same
hash key value.
o primary index
▪ specified on the ordering key field of ordered file of records
▪ index file is an ordered file with two keys
▪ primary key
▪ pointer to a disk block
▪ one index entry in the index file for each block in the data file
▪ disadvantage: insertion and deletion of records
▪ move records around and change index values
▪ solutions: use unordered overflow file, use linked list of overflow records
o clustering index
▪ used if numerous records can have the same value for the ordering field
▪ file records are physically ordered on a non-key field without a distinct value for
each record
▪ index file is an ordered file with two keys
▪ clustering field value
▪ pointer to a disk block of the first appearance of the field value
o secondary index
▪ can be specified on any non-ordering field
▪ index file is file with two keys
▪ indexing field
▪ block pointer or record pointer
▪ usually needs more storage and longer search time than primary index (because
it’s dense)
• indexes may be dense or sparse
o dense index has an index entry for every search key value in the data file
o sparse index has entries for only some search values
▪ applicable when records are sequentially ordered on search-key
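In SQL, a primary index typically comes with the primary key, while a secondary index on a non-ordering field is created explicitly. A hedged sketch, assuming a staff table keyed on staff_no with an lname column:

-- Secondary index on the non-ordering field lname; it speeds up name lookups
-- at the price of extra storage and extra work on every insert and delete.
CREATE INDEX idx_staff_lname ON staff (lname);

-- Queries like this can now use the index instead of scanning the file:
SELECT * FROM staff WHERE lname = 'Alemu';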
Types of SQL
Here are five types of widely used SQL queries.
1) Data Definition Language (DDL)
2) Data Manipulation Language (DML)
3) Data Control Language(DCL)
4) Transaction Control Language(TCL)
5) Data Query Language (DQL)
Data Definition Language
Data Definition Language helps you to define the database structure or schema. Let’s learn about
DDL commands with syntax.
Four types of DDL commands in SQL are:
CREATE
The CREATE statement is used to define the database structure or schema:
Syntax:
CREATE TABLE TABLE_NAME (COLUMN_NAME DATATYPES [,....]);
For example:
➢ Create database university;
➢ Create table students (student_id int primary key, name varchar(50));
➢ Create view for_students as select * from students;
DROP
The DROP command removes tables and databases from the RDBMS.
Syntax
DROP TABLE table_name;
For example:
➢ Drop object_type object_name;
➢ Drop database university;
➢ Drop table student;
ALTER
The ALTER command allows you to alter the structure of the database.
Syntax:
To add a new column in the table
ALTER TABLE table_name ADD column_name COLUMN-definition;
To modify an existing column in the table:
ALTER TABLE table_name MODIFY (column_definition ...);
For example:
Alter table guru99 add subject varchar(30);
TRUNCATE:
This command is used to delete all the rows from the table and free the space occupied by the table.
Syntax:
TRUNCATE TABLE table_name;
Example:
TRUNCATE table students;
Data Control Language
The Data Control Language is a subset of the Structured Query Language. Database
administrators use DCL to configure security access to relational databases. It complements
the Data Definition Language, which adds and deletes database objects, and the Data
Manipulation Language, which retrieves, inserts, and modifies the contents of a database.
DCL is the simplest of the SQL subsets, as it consists of only three commands: GRANT,
REVOKE, and DENY. Combined, these three commands provide administrators with the
flexibility to set and remove database permissions in granular fashion.
The GRANT command adds new permissions to a database user. It has a very simple syntax,
defined as follows:
GRANT [privilege]
ON [object]
TO [user]
[WITH GRANT OPTION]
Here's the rundown on each of the parameters you can supply with this command:
• Privilege — can be either the keyword ALL (to grant a wide variety of permissions) or a
specific database permission or set of permissions. Examples include CREATE
DATABASE, SELECT, INSERT, UPDATE, DELETE, EXECUTE and CREATE VIEW.
• Object — can be any database object. The valid privilege options vary based on the type
of database object you include in this clause. Typically, the object will be either a
database, function, stored procedure, table or view.
• User — can be any database user. You can also substitute a role for the user in this clause
if you wish to make use of role-based database security.
• If you include the optional WITH GRANT OPTION clause at the end of the GRANT
command, you not only grant the specified user the permissions defined in the SQL
statement but also give the user permission to further grant those same permissions
to other database users. For this reason, use this clause with care.
For example, assume you wish to grant the user Joe the ability to retrieve information from
the employee table in a database called HR. Use the following SQL command:
GRANT SELECT
ON HR.employees
TO Joe
Joe can now retrieve information from the employees table. He will not, however, be able to grant
other users permission to retrieve information from that table, because the DCL script did not
include the WITH GRANT OPTION clause.
The REVOKE command removes database access from a user previously granted such access.
The syntax for this command is defined as follows:
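REVOKE [GRANT OPTION FOR] [permission]
ON [object]
FROM [user]
[CASCADE]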
➢ Permission — specifies the database permissions to remove from the identified user. The
command revokes both GRANT and DENY assertions previously made for the identified
permission.
➢ Object — can be any database object. The valid privilege options vary based on the type of
database object you include in this clause. Typically, the object will be either a database,
function, stored procedure, table, or view.
➢ User — can be any database user. You can also substitute a role for the user in this clause if
you wish to make use of role-based database security.
➢ The GRANT OPTION FOR clause removes the specified user's ability to grant the specified
permission to other users. If you include the GRANT OPTION FOR clause in a REVOKE
statement, the primary permission is not revoked. This clause revokes only the granting
ability.
➢ The CASCADE option also revokes the specified permission from any users that the specified
user granted the permission.
The following command revokes the permission granted to Joe in the previous example:
REVOKE SELECT
ON HR.employees
FROM Joe
The DENY command explicitly prevents a user from receiving a particular permission. This
feature is helpful when a user is a member of a role or group that is granted a permission, and you
want to prevent that individual user from inheriting the permission by creating an exception. The
syntax for this command is as follows:
DENY [permission]
ON [object]
TO [user]
The parameters for the DENY command are identical to those used for the GRANT command.
For example, if you wished to ensure that Matthew would never receive the ability to delete
information from the employees table, issue the following command:
DENY DELETE
ON HR.employees
TO Matthew