Paper 2 Database-Chapter A.2
DBMS stands for "Database Management System." In short, a DBMS is a database program.
Technically speaking, it is a software system that uses a standard method of cataloging,
retrieving, and running queries on data. The DBMS manages incoming data, organizes it, and
provides ways for the data to be modified or extracted by users or other programs.
RDBMS is short for relational database management system and is pronounced as separate letters. It is a
type of database management system that stores data in the form of related tables. Relational
databases are powerful because they require few assumptions about how data is related or how it
will be extracted from the database. As a result, the same database can be viewed in many
different ways.
An important feature of relational systems is that a single database can be spread across several
tables. This differs from flat-file databases, in which a database is self-contained in a single table.
Almost all full-scale database systems are RDBMSs. Small database systems, however, use
other designs that provide less flexibility in posing queries.
DBMS Functions
There are several functions that a DBMS performs to ensure data integrity and
consistency of data in the database. The ten functions in the DBMS are: data dictionary
management, data storage management, data transformation and presentation, security
management, multiuser access control, backup and recovery management, data integrity
management, database access languages and application programming interfaces, database
communication interfaces, and transaction management.
1. Data Dictionary Management
The data dictionary is where the DBMS stores definitions of the data elements and their
relationships (metadata). The DBMS uses this function to look up the required data component
structures and relationships. When programs access data in a database, they go
through the DBMS. This function removes structural and data dependency and provides the user
with data abstraction, which makes things much easier for the end user. The data dictionary
is often hidden from the user and is used by database administrators and programmers.
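As a rough illustration of a DBMS keeping its own metadata, the sketch below uses SQLite through Python's sqlite3 module. SQLite is only one possible DBMS, and the account table and its columns are hypothetical names chosen for the example. SQLite exposes its catalog through the sqlite_master table and the table_info pragma.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table, defined only so the catalog has something to describe.
conn.execute(
    "CREATE TABLE account (id INTEGER PRIMARY KEY, holder TEXT NOT NULL, credit REAL)"
)

# sqlite_master is SQLite's catalog: one row per table, index, view, or trigger.
tables = [row[0] for row in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table'")]

# PRAGMA table_info yields (cid, name, type, notnull, dflt_value, pk) per column.
columns = [(row[1], row[2]) for row in conn.execute("PRAGMA table_info(account)")]

print(tables)
print(columns)
```

Other systems expose the same kind of metadata under different names, so the queries above should be read as SQLite-specific.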
2. Data Storage Management
This function is used for the storage of data and any related data entry forms or screen
definitions, report definitions, data validation rules, procedural code, and structures that can
handle video and picture formats. Users do not need to know how data is stored or manipulated.
Related to this function is performance tuning, which concerns a database's
efficiency in terms of storage and access speed.
3. Data Transformation and Presentation
This function exists to transform any data entered into the required data structures. By using the
data transformation and presentation function, the DBMS can distinguish between
logical and physical data formats.
4. Security Management
This is one of the most important functions in the DBMS. Security management sets
rules that determine which users are allowed to access the database. Users authenticate with a
username and password, or sometimes through biometric authentication (such as a fingerprint or
retina scan), although biometric methods tend to be more costly. This function also sets
restrictions on what specific data any user can see or manage.
5. Multiuser Access Control
Data integrity and data consistency are the basis of this function. Multiuser access
control is a very useful tool in a DBMS: it enables multiple users to access the database
simultaneously without affecting the integrity of the database.
6. Backup and Recovery Management
Backup and recovery come into play whenever there are potential outside threats to a
database. For example, if there is a power outage, recovery management concerns how the database
is restored after the outage and how long that takes. Backup management refers to data safety and
integrity; for example, backing up all your mp3 files on a disk.
7. Data Integrity Management
The DBMS enforces integrity rules to reduce data redundancy, which is
when data is stored in more than one place unnecessarily, and to maximize data consistency,
which means the database returns the same, correct answer each time the same question is asked.
9. Database Communication Interfaces
This refers to how a DBMS can accept different end user requests through different
network environments. An example of this can be easily related to the internet. A DBMS can
provide access to the database using the Internet through Web Browsers (Mozilla Firefox,
Internet Explorer, Netscape).
10. Transaction Management
This refers to how a DBMS must supply a method that guarantees that either all the updates in a
given transaction are made or none of them are. All transactions must follow what are called the ACID
properties (atomicity, consistency, isolation, and durability).
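The all-or-nothing behaviour of a transaction can be sketched with SQLite via Python's sqlite3 module. The account table, the names alice and bob, and the transfer helper are assumptions made for the example; the CHECK constraint is only there to force a failure halfway through the second transfer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account ("
    " name TEXT PRIMARY KEY,"
    " balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO account VALUES (?, ?)", [("alice", 100), ("bob", 50)])
conn.commit()

def transfer(conn, src, dst, amount):
    """Move amount from src to dst; either both updates apply or neither does."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE name = ?", (amount, dst))
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE name = ?", (amount, src))
        return True
    except sqlite3.IntegrityError:
        return False

ok = transfer(conn, "alice", "bob", 30)       # both updates are committed
failed = transfer(conn, "alice", "bob", 500)  # debit violates the CHECK, credit is undone
balances = dict(conn.execute("SELECT name, balance FROM account"))
print(ok, failed, balances)
```

In the failed transfer the credit to bob has already been executed when the debit is rejected, and the rollback removes it again, which is the atomicity part of ACID.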
Database Schema
There are three forms of schema:
Physical
Conceptual
External
The physical schema describes details of how data is stored: files, indices, etc. on the random
access disk system. It also typically describes the record layout of files and the type of files (hash,
B-tree, flat).
Early applications worked at this level and dealt explicitly with these details, e.g., minimizing physical
distances between related data and organizing the data structures within the file (blocked records,
linked lists of blocks, etc.).
Problems:
In the relational model, the conceptual schema presents data as a set of tables.
The DBMS maps data access between the conceptual and physical schemas automatically.
In the relational model, the external schema also presents data as a set of relations. An external
schema specifies a view of the data in terms of the conceptual level. It is tailored to the needs of
a particular category of users. Hiding portions of the stored data from some users
provides a basic level of security and simplifies the view for those users.
E.g.
Students should not see faculty salaries.
Faculty should not see billing or payment data.
Information that can be derived from stored data might be viewed as if it were stored.
Applications are written in terms of an external schema. The external view is computed when
accessed. It is not stored. Different external schemas can be provided to different categories of
users. Translation from external level to conceptual level is done automatically by DBMS at run
time. The conceptual schema can be changed without changing the application.
Conceptual/logical and external schema described by the data definition language (DDL)
Integrity constraints, domains described by DDL
Operations on data described by the data manipulation language (DML)
Directives that influence the physical schema (affects performance, not semantics) are
described by the storage definition language (SDL)
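An external schema can be sketched as a view defined over a base table. The example below uses SQLite through Python's sqlite3 module; the faculty table, the faculty_public view, and the sample row are hypothetical. It follows the earlier example of hiding faculty salaries from students.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# DDL: a base table (conceptual schema) and a view over it (external schema).
conn.execute(
    "CREATE TABLE faculty (id INTEGER PRIMARY KEY, name TEXT, dept TEXT, salary INTEGER)"
)
conn.execute("CREATE VIEW faculty_public AS SELECT id, name, dept FROM faculty")

# DML: insert a made-up row into the base table, then read through the view.
conn.execute("INSERT INTO faculty (name, dept, salary) VALUES ('A. Lecturer', 'CS', 1000)")
cur = conn.execute("SELECT * FROM faculty_public")
view_columns = [d[0] for d in cur.description]
view_rows = cur.fetchall()

print(view_columns)
print(view_rows)
```

The view is computed when it is queried and never stored, so an application written against faculty_public does not see the salary column at all.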
Data Independence
Logical data independence: immunity of external models to changes in the logical model.
Occurs at the user interface level.
Entity-Relationship Model
A semantic model that captures meanings
E-R modeling is a conceptual-level model
Proposed by P.P. Chen in the 1970s
Relational Model
Record- and table-based model
Relational database modeling is a logical-level model
Proposed by E.F. Codd
Object-oriented Model
Uses E-R modeling as a basis, extended to include encapsulation and inheritance
Objects have both state and behavior
Data Dictionary
A data dictionary is a collection of descriptions of the data objects or items in a data model for
the benefit of programmers and others who need to refer to them. A first step in analyzing a
system of objects with which users interact is to identify each object and its relationship to other
objects. This process is called data modeling and results in a picture of object relationships. After
each data object or item is given a descriptive name, its relationship is described (or it becomes
part of some structure that implicitly describes relationship), the type of data (such as text or
image or binary value) is described, possible predefined values are listed, and a brief textual
description is provided. This collection can be organized for reference into a book called a data
dictionary.
When developing programs that use the data model, a data dictionary can be consulted to
understand where a data item fits in the structure, what values it may contain, and basically what
the data item means in real-world terms. For example, a bank or group of banks could model the
data objects involved in consumer banking. They could then provide a data dictionary for a
bank's programmers. The data dictionary would describe each of the data items in its data model
for consumer banking (for example, "Account holder" and "Available credit").
DDL: Data Definition Language (DDL) is a standard for commands that define the different
structures in a database. DDL statements create, modify, and remove database objects such as
tables, indexes, and users. Common DDL statements are CREATE, ALTER, and DROP.
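The three common DDL statements can be tried out against an in-memory SQLite database using Python's sqlite3 module. This is a minimal sketch: the customer table, its columns, and the column_names helper are hypothetical. SQLite has no user accounts, so only tables and indexes are shown.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

def column_names(table):
    """Return the column names of a table, read from SQLite's table_info pragma."""
    return [row[1] for row in conn.execute(f"PRAGMA table_info({table})")]

# CREATE: define a new table and an index on it.
conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("CREATE INDEX idx_customer_name ON customer (name)")
after_create = column_names("customer")

# ALTER: change the structure of an existing table.
conn.execute("ALTER TABLE customer ADD COLUMN email TEXT")
after_alter = column_names("customer")

# DROP: remove the table; SQLite removes its indexes along with it.
conn.execute("DROP TABLE customer")
remaining = conn.execute("SELECT count(*) FROM sqlite_master").fetchone()[0]

print(after_create, after_alter, remaining)
```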
Database Terms:
Table: In relational databases and flat file databases, a table is an organized set of data elements
(values) using a model of vertical columns (which are identified by their name) and horizontal
rows, the cell being the unit where a row and column intersect. A table has a specified number of
columns, but can have any number of rows. Each row is identified by the values appearing in a
particular column subset which has been identified as a unique key index.
Table is equivalent to relation/file.
Field: In the context of a relational database table, a column is a set of data values of a particular
simple type, one for each row of the table. The columns provide the structure according to which
the rows are composed.
The term field is often used interchangeably with column, although many consider it more
correct to use field (or field value) to refer specifically to the single item that exists at the
intersection between one row and one column.
Field is equivalent to attribute/column.
Primary Key:
The primary key of a relational table uniquely identifies each record in the table. It can either be
a normal attribute that is guaranteed to be unique (such as Social Security Number in a table with
no more than one record per person) or it can be generated by the DBMS (such as a globally
unique identifier, or GUID, in Microsoft SQL Server). Primary keys may consist of a single
attribute or multiple attributes in combination.
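A short sketch of how a DBMS enforces primary keys, using SQLite via Python's sqlite3 module. The person and enrollment tables and all values are made up for illustration; the first table has a single-attribute key, the second a key made of two attributes in combination.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Single-attribute primary key.
conn.execute("CREATE TABLE person (ssn TEXT PRIMARY KEY, name TEXT)")
# Primary key made of two attributes in combination.
conn.execute(
    "CREATE TABLE enrollment ("
    " student_id INTEGER, course_id INTEGER,"
    " PRIMARY KEY (student_id, course_id))"
)

conn.execute("INSERT INTO person VALUES ('000-00-0001', 'Ann')")
try:
    # Same key value again: the DBMS refuses the row.
    conn.execute("INSERT INTO person VALUES ('000-00-0001', 'Bob')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

conn.execute("INSERT INTO enrollment VALUES (1, 10)")
conn.execute("INSERT INTO enrollment VALUES (1, 11)")  # same student, other course: allowed
enrollment_count = conn.execute("SELECT count(*) FROM enrollment").fetchone()[0]

print(duplicate_rejected, enrollment_count)
```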
The differences between a primary key and a secondary key are stated below:
Primary Key
3) A primary key is one of the candidate keys, that is, one of the irreducible superkeys;
the database designer chooses which one to use.
Secondary Key
3) The attributes used for a secondary key are not the ones used for a superkey, i.e. a secondary key is
not even one of the superkeys.
Candidate Key:
In the relational model of databases, a candidate key of a relation is a minimal superkey for that
relation; that is, a set of attributes such that
1. the relation does not have two distinct tuples (i.e. rows or records in common database
language) with the same values for these attributes (which means that the set of attributes
is a superkey)
2. there is no proper subset of these attributes for which (1) holds (which means that the set
is minimal).
Definition: A candidate key is a combination of attributes that can be uniquely used to identify a
database record without any extraneous data. Each table may have one or more candidate keys.
One of these candidate keys is selected as the table primary key.
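The two conditions in the definition (uniqueness and minimality) can be checked mechanically on a sample of rows. The sketch below is plain Python; the function names and the sample data are hypothetical. It only tests the rows it is given, whereas a real candidate key must hold for every legal state of the table, so treat the result as a hint rather than proof.

```python
from itertools import combinations

def is_superkey(rows, attrs):
    """True if no two rows share the same values for all attributes in attrs."""
    seen = set()
    for row in rows:
        key = tuple(row[a] for a in attrs)
        if key in seen:
            return False
        seen.add(key)
    return True

def candidate_keys(rows, attributes):
    """Return the minimal superkeys of the sample, smallest attribute sets first."""
    keys = []
    for size in range(1, len(attributes) + 1):
        for attrs in combinations(attributes, size):
            if any(set(k) <= set(attrs) for k in keys):
                continue  # a subset is already a key, so attrs is not minimal
            if is_superkey(rows, attrs):
                keys.append(attrs)
    return keys

rows = [
    {"student_id": 1, "email": "a@example.org", "name": "Ann"},
    {"student_id": 2, "email": "b@example.org", "name": "Bob"},
    {"student_id": 3, "email": "c@example.org", "name": "Ann"},
]
keys = candidate_keys(rows, ["student_id", "email", "name"])
print(keys)
```

Here student_id and email each identify a row on their own, while name does not, so the designer would pick one of the two as the primary key.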
One-to-one relationship:
In a one-to-one relationship each item of entity A can be associated with 0 or 1 item of entity B.
An employee, for example, is usually linked to only 1 office, and a beer brand has 1 country of
origin.
Another example is people and their passports. However, this only counts if you look at their current
passports: each person has one current, valid passport and each current, valid passport
belongs to one person.
A relational database design is a collection of database tables that are interlinked by primary
keys and foreign keys. The relational model comprises a number of rules that help you discover
the correct relations between data. These rules are called 'normal forms'. In the following
chapters of this database design tutorial I will show you how to normalize a database.
One-to-many relationship
Another example of a one-to-many relationship is the relationship that exists between a mother
and her children. A mother can have many children and each child has only one mother.
(Technically it would be better to speak of a woman and her children instead of a mother and her
children, because in a one-to-many relationship a mother can have 0, 1 or many children and a
mother with 0 children isn't technically a mother. But let's just play along, ok?)
When one record in table A can be linked to 0, 1 or many records in table B, you are dealing
with a one-to-many relationship. In the relational model a one-to-many relationship is modelled
using two tables.
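The two-table layout for a one-to-many relationship might look like the sketch below, again using SQLite via Python's sqlite3 module. The mother and child tables mirror the example in the text; the column names and rows are assumptions. Note that SQLite only enforces foreign keys after the foreign_keys pragma is switched on.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.execute("CREATE TABLE mother (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute(
    "CREATE TABLE child ("
    " id INTEGER PRIMARY KEY, name TEXT,"
    " mother_id INTEGER NOT NULL REFERENCES mother(id))"  # the 'many' side holds the key
)
conn.execute("INSERT INTO mother VALUES (1, 'Ann'), (2, 'Eve')")
conn.execute("INSERT INTO child (name, mother_id) VALUES ('Bob', 1), ('Cid', 1)")

# One record in mother can be linked to 0, 1 or many records in child.
children_per_mother = conn.execute(
    "SELECT m.name, count(c.id)"
    " FROM mother m LEFT JOIN child c ON c.mother_id = m.id"
    " GROUP BY m.id ORDER BY m.id"
).fetchall()

try:
    conn.execute("INSERT INTO child (name, mother_id) VALUES ('Dan', 99)")
    orphan_rejected = False
except sqlite3.IntegrityError:  # there is no mother with id 99
    orphan_rejected = True

print(children_per_mother, orphan_rejected)
```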
Many-to-many relationship
The many-to-many relationship is a relationship where multiple rows from table A can
correspond to multiple rows in table B. An example of such a relationship is a school where
teachers teach students. In most schools each teacher teaches multiple students and each student
can be taught by multiple teachers.
A many-to-many relationship is modelled with three tables: two 'source' tables and one junction
table. The primary key of the junction table A_B is a composite key. It is made up of two fields:
the two foreign key fields that refer to the primary keys of table A and table B.
All primary keys must be unique. This implies that the combination of field A and B must be
unique in table A_B.
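The junction-table design can be sketched for the teacher and student example with SQLite via Python's sqlite3 module. The table name teacher_student plays the role of A_B; all column names and rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE teacher (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE student (id INTEGER PRIMARY KEY, name TEXT);
    -- Junction table: two foreign keys that together form the primary key.
    CREATE TABLE teacher_student (
        teacher_id INTEGER REFERENCES teacher(id),
        student_id INTEGER REFERENCES student(id),
        PRIMARY KEY (teacher_id, student_id)
    );
    INSERT INTO teacher VALUES (1, 'T1'), (2, 'T2');
    INSERT INTO student VALUES (1, 'S1'), (2, 'S2');
    INSERT INTO teacher_student VALUES (1, 1), (1, 2), (2, 1);
""")

students_of_t1 = [row[0] for row in conn.execute(
    "SELECT s.name FROM student s"
    " JOIN teacher_student ts ON ts.student_id = s.id"
    " WHERE ts.teacher_id = 1 ORDER BY s.id")]

try:
    conn.execute("INSERT INTO teacher_student VALUES (1, 1)")  # pair already exists
    duplicate_pair_rejected = False
except sqlite3.IntegrityError:
    duplicate_pair_rejected = True

print(students_of_t1, duplicate_pair_rejected)
```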
Database Normalization
The guidelines for proper relational database design are laid out in the relational model. They
are grouped into 5 groups called normal forms. The first normal form represents the lowest
form of database normalization, the fifth represents the highest form of database normalization.
The normal forms are guidelines for good database design. You are not obliged to adhere to all
five normal forms when designing a database. Nevertheless, you are advised to normalize your
database to some extent, because normalization has some significant advantages in terms of the
efficiency and maintainability of your database.
In a normalized database structure you can make complex data selections with relatively
simple SQL queries.
Data integrity. A normalized database allows for reliable data storage.
Database normalization avoids redundant (duplicate) storage of data. Data are always
stored in only one location which makes it easy to insert, update or delete data. There is
one exception to this rule. The keys themselves are stored in multiple locations, because
they are copied as foreign keys to other tables. If you want to state it correctly you should
say that logical data is not duplicated.
Scalability is the ability of a system to deal with future growth. For a database this means
that it must still be able to perform quickly when the number of users and the amount of
data grows. Scalability is a very important characteristic of any database model and for
database management systems.
These are some of the general tasks that are associated with database normalization.
The first normal form states that a database table is a representation of an entity in the system
you are building. Examples of entities are order, customer, booking, hotel, product, etc. Each
row in the database table represents one instance of an entity. For example in a customer table
each row represents one customer.
In order for a database to be normalized according to the second normal form it must first be
normalized according to the rules of the first normal form. The second normal form deals with data
redundancy.
Data redundancy
Duplication of data across rows in the store field.
The table above could belong to a company that sells cars and has multiple stores in the
Netherlands.
When looking at the table above you might see multiple examples of duplication across rows.
The brand field could be split off into a separate table. Also, the type field could be split off into a
table that has a many-to-one relationship with the brand table, because a brand has multiple
types.
The store column contains the store where the car is currently located. Store is an obvious case
of data redundancy and a good candidate for a separate entity that should be referenced from the
car table with a foreign key relationship.
Below is an example of how you could better model the car situation by avoiding data
redundancy in the car table.
In the setup above the car table has a foreign key reference to the type and store tables. The
brand column is gone, because brand is implicitly referenced through the type reference. When
a type is referenced, a brand is also referenced, because a type belongs to a brand.
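Since the original diagram is not reproduced here, the following is one possible reading of the normalized car model, written with SQLite via Python's sqlite3 module. Only the brand, type, store and car tables and the color and country_of_origin fields come from the text; the table name car_type (used instead of type), the other column names, and all rows are placeholders.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE brand (id INTEGER PRIMARY KEY, name TEXT, country_of_origin TEXT);
    CREATE TABLE car_type (id INTEGER PRIMARY KEY, name TEXT,
                           brand_id INTEGER REFERENCES brand(id));
    CREATE TABLE store (id INTEGER PRIMARY KEY, name TEXT);
    -- car has no brand column: the brand is reached through the type.
    CREATE TABLE car (id INTEGER PRIMARY KEY, color TEXT,
                      type_id INTEGER REFERENCES car_type(id),
                      store_id INTEGER REFERENCES store(id));
    INSERT INTO brand VALUES (1, 'BrandA', 'CountryA');
    INSERT INTO car_type VALUES (1, 'TypeX', 1), (2, 'TypeY', 1);
    INSERT INTO store VALUES (1, 'Store1');
    INSERT INTO car VALUES (1, 'red', 1, 1), (2, 'blue', 2, 1);
""")

cars = conn.execute("""
    SELECT car.id, brand.name, car_type.name, store.name
    FROM car
    JOIN car_type ON car_type.id = car.type_id
    JOIN brand ON brand.id = car_type.brand_id
    JOIN store ON store.id = car.store_id
    ORDER BY car.id
""").fetchall()
print(cars)
```

The brand and store names are each stored once, and the join reassembles the flat view of the original table when it is needed.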
Data redundancy has been largely removed from the model. If you are strict perhaps you still
aren't satisfied with this solution. What about the country_of_origin field in the brand table?
There is no duplication yet, because there are only four brands from different countries. A strict
database designer might split off the country names into a separate country table.
And even then you might still not be satisfied, because you could also split off the color field in
the car table.
How strictly you design your tables is up to you and depends on the situation. If you are going to
have huge quantities of cars in your system and you want to be able to search cars by color, it
might be wise to split off colors into a separate table, so they are not duplicated.
The third normal form deals with transitive dependencies. A transitive dependency between
database fields exists when the value of a non-key field is determined by the value of another
non-key field. For a database to be in the third normal form it must first be in the second normal
form.
Transitive dependencies
In this table not all fields are solely dependent on the primary key. There exists a separate
relationship between the postal_code field and the city and province fields. In the Netherlands,
city and province are both determined by the postal code, so there is no need to store city and
province in the clients table. If you know the postal code, you already know the city and
province.
Such transitive relationships should be avoided if you want to model your database to the third
normal form.
In this case, removing the transitive relationship from the table can be achieved by removing the
city and province fields from the table and storing them in a separate table, containing the postal
code (primary key), the province name and the city name. Figuring out the postal code - city -
province combinations for an entire country is really hard work. That is why such tables are sold
commercially.
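The split described above might be sketched as follows, using SQLite via Python's sqlite3 module. The clients and postal_code tables follow the text; the remaining column names and the placeholder rows are assumptions, not real postal data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- City and province depend on the postal code, so they live with it.
    CREATE TABLE postal_code (postal_code TEXT PRIMARY KEY, city TEXT, province TEXT);
    CREATE TABLE clients (id INTEGER PRIMARY KEY, name TEXT,
                          postal_code TEXT REFERENCES postal_code(postal_code));
    INSERT INTO postal_code VALUES ('0000AA', 'CityA', 'ProvinceA');
    INSERT INTO clients VALUES (1, 'Client1', '0000AA'), (2, 'Client2', '0000AA');
""")

# City and province are looked up through the postal code instead of being
# repeated in every client row.
addresses = conn.execute("""
    SELECT c.name, p.city, p.province
    FROM clients c JOIN postal_code p ON p.postal_code = c.postal_code
    ORDER BY c.id
""").fetchall()
print(addresses)
```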
Another example of the application of the third normal form is this (way too) simple order table
from an online shop.
Value Added Tax is a percentage that is added to the price of a product (19% in the table above).
This means that the total_ex_vat amount can be calculated from the total_inc_vat amount and
vice versa. You should store either one of these fields, but not both. You should leave the task of
calculating total_inc_vat from total_ex_vat or vice versa to the program that uses the database.
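Leaving the calculation to the program could look like this minimal Python sketch. The 19% rate and the field names total_ex_vat and total_inc_vat come from the text; the function names, the rounding to whole cents, and the sample amount are assumptions.

```python
from decimal import Decimal, ROUND_HALF_UP

VAT_RATE = Decimal("0.19")  # 19%, as in the example table
CENT = Decimal("0.01")

def inc_vat(total_ex_vat):
    """Derive the amount including VAT from the stored amount excluding VAT."""
    return (total_ex_vat * (1 + VAT_RATE)).quantize(CENT, rounding=ROUND_HALF_UP)

def ex_vat(total_inc_vat):
    """Derive the amount excluding VAT from the stored amount including VAT."""
    return (total_inc_vat / (1 + VAT_RATE)).quantize(CENT, rounding=ROUND_HALF_UP)

price = Decimal("100.00")  # hypothetical stored total_ex_vat
gross = inc_vat(price)
net = ex_vat(gross)
print(gross, net)
```

Because of rounding to cents, converting back and forth is not exact for every amount, which is one more reason to store only one of the two fields and always derive the other in the same direction.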
The third normal form basically says you should not store data in fields that can be derived
from other (non-key) fields in a table. Especially in the client table example, applying the third
normal form requires either a lot of work or the purchase of a commercial postal code-city-
province table.
The third normal form is not always adhered to in database design. When designing a database
you should always compare the advantages of a higher normal form to the work it takes to apply
and maintain that normal form. In the case of the client table I would personally choose not to
normalize to the third normal form. In the case of the order table, I would. Storing derived data, like the
result of a calculation that is based on existing data, is usually a bad idea.
On the last page you learned how records from different tables are linked to each other in a
relational database. Before you start creating and linking tables it is important that you think
about the entities that exist in your system and that you decide how these entities are related. In
database design, the entities and their relationships are represented in an entity-relationship
diagram (ERD). The ERD is the result of the database design process.
Entities
You may be wondering what an entity is. Well, it's a "thing". In a system. There. My mother
always wanted me to become a teacher, because I explain things so well.
In the context of database design an entity is anything that might deserve its own table in your
database model. When you design a database you should identify the entities in the system you
are creating. This is mostly a matter of talking to your client or to yourself and figuring out what
data your system will be working with.
Let's take a webshop system as an example. A webshop sells products. A product would be a
very obvious entity in a webshop system. Products are ordered by customers. There, we have two more
obvious entities: orders and customers.
An order is paid for by the customer... that's interesting. Are we going to have a payment table in
our webshop database? Possibly. But isn't the payment a "single piece of information" that
belongs to an order? That's also possible.
If you are not sure just think about what information you want to store about a payment. You
might want to store the payment method and the payment date. These are still single pieces of
information that could belong to an order. You could interpret these as the payment method of
an order and the payment date of an order, so I don't see the necessity to model payment as a
separate table, although conceptually, you could see a payment as an "entity", because you might
regard it as a separate container of information (payment date, payment method).
Let's not get too academic
As you can see there is a difference between an entity and an actual table in your database. IT
specialists can become VERY academic about this difference. I am not such an IT specialist.
This difference depends on the way you look at your data. If you look at data modelling from a
software perspective you might come up with a lot of entities that don't translate directly to
tables. In this tutorial we are looking at data strictly from a database perspective and in our little
world an entity translates to a table.
Hang in there, you are really close to attaining your database design degree.
As you can see, deciding what entities your system has is a bit of an intellectual process that
takes some experience and that is often subject to some changing and tinkering and
reconsideration, but it's certainly not rocket science.
An entity-relationship diagram can get pretty large if you are building a complex application.
Some contain hundreds or even thousands of tables.
Relationships
The second step of database design is deciding what relationships exist between the entities in
your system. Now this may get a little complicated sometimes, but again, it's no rocket science.
With some experience and some (re)consideration you usually end up with a database model that
is right or almost right.
I already introduced you to the one-to-many relationship and I will tell you a lot more about
relationships in the coming pages of this tutorial, so I am not going to go into them here. Just
remember that deciding what relationships your entities have is an important part of database
design and the relationships are represented in the entity-relationship diagram.