Unit-1 Database Management System
Prepared By:
Amrit Parajuli
Lecturer: Waling Multiple Campus
Lecturer: Andhikhola Polytechnic Institute
Lecturer: Pioneers Higher Education Academy
Owner: Integrated IT Solutions
Email: amritparazuli@gmail.com
Contact: 9841954069
INTRODUCTION
Data: Data are simply facts or figures or bits of information, but not
information itself. When data are processed, interpreted, organized,
structured or presented so as to make them meaningful or useful, they
are called information. Information provides context for data. For
example, the history of temperature readings all over the world for
the past 100 years is data. If this data is organized and analyzed to
find that global temperature is rising, then that is information.
uniquely. An entity can contain multiple keys, as we saw in the PERSON table. The
key that is most suitable from that list becomes the primary key.
In the EMPLOYEE table, ID can be the primary key since it is unique for each
tuple.
The remaining attributes that can also uniquely identify a tuple, such as SSN,
Passport_Number, and License_Number, are considered candidate keys.
SUPER KEYS
A super key is a set of attributes that can uniquely identify a tuple. A super key is a
superset of a candidate key.
For example: In the EMPLOYEE table, for (EMPLOYEE_ID,
EMPLOYEE_NAME) the names of two employees can be the same, but their
EMPLOYEE_ID can't be the same. Hence, this combination can also be a key.
The super keys would be EMPLOYEE_ID, (EMPLOYEE_ID,
EMPLOYEE_NAME), etc.
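The idea of a super key can be checked on sample data. The sketch below is only an illustration: the function name `is_super_key` and the two sample rows are assumptions, and a check on sample rows can only show that a set of attributes is not contradicted by the data, not that it is a key for every possible table state.

```python
def is_super_key(rows, attrs):
    """Return True if no two rows share the same values for attrs."""
    seen = set()
    for row in rows:
        key = tuple(row[a] for a in attrs)
        if key in seen:
            return False  # two tuples agree on attrs, so attrs cannot identify a tuple
        seen.add(key)
    return True

# Hypothetical sample rows: two employees with the same name.
employees = [
    {"EMPLOYEE_ID": 1, "EMPLOYEE_NAME": "Ram"},
    {"EMPLOYEE_ID": 2, "EMPLOYEE_NAME": "Ram"},
]

id_is_key = is_super_key(employees, ["EMPLOYEE_ID"])
id_and_name_is_key = is_super_key(employees, ["EMPLOYEE_ID", "EMPLOYEE_NAME"])
name_is_key = is_super_key(employees, ["EMPLOYEE_NAME"])
print(id_is_key, id_and_name_is_key, name_is_key)
```

EMPLOYEE_ID alone is already unique, so adding EMPLOYEE_NAME to it gives a super key that is not minimal, while EMPLOYEE_NAME alone fails.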
FOREIGN KEYS
A foreign key is a column of a table that is used to point to the primary key
of another table.
In a company, every employee works in a specific department, and employee and
department are two different entities. So we can't store the information of the
department in the employee table. That's why we link these two tables through the
primary key of one table.
We add the primary key of the DEPARTMENT table, Department_Id as a new
attribute in the EMPLOYEE table.
Now in the EMPLOYEE table, Department_Id is the foreign key, and both the
tables are related.
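The EMPLOYEE and DEPARTMENT link described above can be sketched with Python's built-in sqlite3 module. This is a minimal sketch: the columns other than Department_Id and the sample rows are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when this is on

conn.execute("""
    CREATE TABLE DEPARTMENT (
        Department_Id   INTEGER PRIMARY KEY,
        Department_Name TEXT NOT NULL
    )
""")
conn.execute("""
    CREATE TABLE EMPLOYEE (
        Employee_Id   INTEGER PRIMARY KEY,
        Employee_Name TEXT NOT NULL,
        Department_Id INTEGER REFERENCES DEPARTMENT(Department_Id)  -- foreign key
    )
""")

conn.execute("INSERT INTO DEPARTMENT VALUES (10, 'Accounts')")
conn.execute("INSERT INTO EMPLOYEE VALUES (1, 'Sita', 10)")  # department 10 exists

try:
    conn.execute("INSERT INTO EMPLOYEE VALUES (2, 'Ram', 99)")  # department 99 does not exist
    fk_rejected = False
except sqlite3.IntegrityError:
    fk_rejected = True

# The foreign key lets us join each employee to the department it belongs to.
rows = conn.execute("""
    SELECT e.Employee_Name, d.Department_Name
    FROM EMPLOYEE e JOIN DEPARTMENT d ON e.Department_Id = d.Department_Id
""").fetchall()
print(fk_rejected, rows)
```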
DATABASE LANGUAGES
Database languages are used to define, read, store and update the data in a database. There are
four different types of database languages. They are:
1. Data Definition Language (DDL): It is used to define the database structure or
pattern. It is used to create schemas, tables, indexes, constraints, etc. in the
database. Using DDL statements, you can create the skeleton of the database. The
tasks that come under DDL are CREATE, ALTER, DROP, TRUNCATE, RENAME,
COMMENT etc.
2. Data Manipulation Language (DML): It is used for accessing and manipulating
data in a database. It handles user requests. The tasks that come under DML are
SELECT, INSERT, UPDATE, DELETE, MERGE etc.
3. Data Control Language (DCL): It is used to control access to the data stored in
the database by giving privileges to users and taking them back. The tasks that
come under DCL are GRANT and REVOKE.
4. Transaction Control Language (TCL): TCL is used to manage the changes made by
DML statements. DML statements can be grouped into a logical transaction. Here are
some tasks that come under TCL:
Commit: It is used to save the changes of the current transaction permanently in the database.
Rollback: It is used to restore the database to its state as of the last Commit.
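The difference between DDL, DML and TCL can be sketched with Python's built-in sqlite3 module. The STUDENT table and its values are assumptions made for illustration. DCL is not shown because SQLite has no GRANT or REVOKE statements; those belong to multi-user servers.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# DDL: define the structure of the database.
conn.execute("CREATE TABLE STUDENT (Roll_No INTEGER PRIMARY KEY, Name TEXT, Marks INTEGER)")
conn.commit()

# DML followed by TCL: insert a row and make it permanent with a commit.
conn.execute("INSERT INTO STUDENT VALUES (1, 'Asha', 70)")
conn.commit()

# DML followed by TCL: change the row, then undo the change with a rollback.
conn.execute("UPDATE STUDENT SET Marks = 0 WHERE Roll_No = 1")
conn.rollback()

# DML: read the data back. The rolled-back update left no trace.
marks = conn.execute("SELECT Marks FROM STUDENT WHERE Roll_No = 1").fetchone()[0]
print(marks)
```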
DATA MODELS
Data models define how the logical structure of a database is modeled. Data
Models are fundamental entities to introduce abstraction in a DBMS. Data
models define how data is connected to each other and how they are processed
and stored inside the system.
Hierarchical Data Model:
The Hierarchical model was essentially born from the first mainframe database
management system. It uses an upside-down tree to structure data. The top of
the tree is the parent and the branches are children. Each child can only have
one parent but a parent can have many children.
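The one-parent, many-children rule can be sketched as a small tree in Python. The class name `Node` and the college example are assumptions made for illustration; a real hierarchical DBMS stores such records very differently.

```python
class Node:
    """A record in a hierarchy: exactly one parent, any number of children."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent      # a single reference, so a child cannot have two parents
        self.children = []
        if parent is not None:
            parent.children.append(self)

    def path(self):
        """Return the names from the root of the tree down to this node."""
        names = []
        node = self
        while node is not None:
            names.append(node.name)
            node = node.parent
        return list(reversed(names))

# Hypothetical data: one root with two children, one of which has a child of its own.
college = Node("College")
science = Node("Science Department", parent=college)
management = Node("Management Department", parent=college)
student = Node("Student A", parent=science)

print(student.path())
```

Reaching a record always means walking down from the root, which is why users of this model need to know how the data is laid out.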
HIERARCHICAL MODEL
Advantages:
Structures data in an upside-down tree. (Simplifies data
overview)
Manages large amounts of data.
Disadvantages:
Complex (users require physical representation of database)
model.
Promotes data integrity.
Disadvantages:
Data relationships must be predefined.
Users are still required to know the physical representation of the database.
OBJECT ORIENTED MODEL
In the Object Oriented Data Model, data and their
relationships are contained in a single structure which
is referred to as an object. In this model, real world problems are
represented as objects with different attributes. Objects
can have multiple relationships between them.
Basically, it is a combination of Object Oriented
programming and the Relational Database Model.
OBJECT ORIENTED DATABASE MODEL
Advantages:
The object-oriented data model allows the ‘real world’ to be modeled more
closely.
OODBMSs allow new data types to be built from existing types.
Object-oriented databases are capable of storing different types of data, for
example, pictures, voice and video, as well as text, numbers and so on.
OODBMSs use a different protocol to handle the types of long-duration
Disadvantages:
Few relational databases have limits on field lengths which can't be exceeded.
grows, and the relations between pieces of data become more complicated.
Complex relational database systems may lead to isolated databases where the
RELATIONSHIP
A relationship is used to describe the relation between entities.
Diamond or rhombus is used to represent the relationship.
1. One-to-one relationship
2. One-to-many relationship
3. Many-to-one relationship
4. Many-to-many relationship
MANY-TO-ONE RELATIONSHIP
When more than one instance of the entity on the left, and only
one instance of an entity on the right associates with the
relationship then it is known as a many-to-one relationship.
For example, a student enrolls in only one course, but a course
can have many students.
MANY TO MANY RELATIONSHIP
When more than one instance of the entity on the left, and more than
one instance of an entity on the right associates with the relationship
then it is known as a many-to-many relationship.
For example, an employee can be assigned to many projects and a project can
have many employees.
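In a relational database, a many-to-many relationship is usually stored in a third table that holds one row per pairing. The sketch below uses Python's built-in sqlite3 module; the table name WORKS_ON and all sample rows are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE EMPLOYEE (Employee_Id INTEGER PRIMARY KEY, Name TEXT);
    CREATE TABLE PROJECT  (Project_Id  INTEGER PRIMARY KEY, Title TEXT);

    -- One row per (employee, project) pair links the two entities.
    CREATE TABLE WORKS_ON (
        Employee_Id INTEGER REFERENCES EMPLOYEE(Employee_Id),
        Project_Id  INTEGER REFERENCES PROJECT(Project_Id),
        PRIMARY KEY (Employee_Id, Project_Id)
    );

    INSERT INTO EMPLOYEE VALUES (1, 'Sita'), (2, 'Ram');
    INSERT INTO PROJECT  VALUES (100, 'Payroll'), (200, 'Website');
    INSERT INTO WORKS_ON VALUES (1, 100), (1, 200), (2, 100);
""")

# One employee, many projects.
projects_of_sita = [row[0] for row in conn.execute("""
    SELECT p.Title FROM WORKS_ON w JOIN PROJECT p ON p.Project_Id = w.Project_Id
    WHERE w.Employee_Id = 1 ORDER BY p.Title
""")]

# One project, many employees.
staff_on_payroll = [row[0] for row in conn.execute("""
    SELECT e.Name FROM WORKS_ON w JOIN EMPLOYEE e ON e.Employee_Id = w.Employee_Id
    WHERE w.Project_Id = 100 ORDER BY e.Name
""")]
print(projects_of_sita, staff_on_payroll)
```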
ER DIAGRAM
ER Diagram:
An ER diagram shows the relationship among entity sets. An entity set
is a group of similar entities and these entities can have attributes. In
terms of DBMS, an entity is a table or attribute of a table in database,
so by showing relationship among tables and their attributes, ER
diagram shows the complete logical structure of a database.
ADVANTAGES OF ER DIAGRAM
Conceptually it is very simple: ER model is very simple because if
we know relationship between entities and attributes, then we can
easily draw an ER diagram.
Better visual representation: ER model is a diagrammatic
representation of any logical structure of database. By seeing an ER
diagram, we can easily understand the relationships among entities and
their attributes.
Effective communication tool: It is an effective communication
tool for database designer.
Highly integrated with relational model: ER model can be easily
converted into relational model by simply converting ER model into
tables.
Easy conversion to any data model: ER model can be easily
converted into another data model like hierarchical data model,
network data model and so on.
NORMALIZATION
Normalization is the process of organizing the data in the database.
Normalization is used to minimize the redundancy from a relation or set of
relations. It is also used to eliminate the undesirable characteristics like Insertion,
Update and Deletion Anomalies.
Normalization divides larger tables into smaller tables and links them using
relationships.
The normal form is used to reduce redundancy from the database table.
Types of Normal Forms
There are four commonly used normal forms, described below: 1NF, 2NF, 3NF and BCNF.
ADVANTAGES/NEED OF NORMALIZATION
Advantages:
Greater overall database organization
Need:
It is used to remove the duplicate data and database anomalies from
KEY TERMS
Prime attribute: An attribute, which is a part of the candidate-key is
known as a prime attribute.
Non-prime attribute: An attribute, which is not a part of any candidate key
is said to be a non-prime attribute.
PRIMARY KEY in DBMS is a column or group of columns in a table
that uniquely identify every row in that table.
CANDIDATE KEY in SQL is a set of attributes that uniquely identify
tuples in a table. A candidate key is a super key with no redundant attributes.
FOREIGN KEY is a column that creates a relationship between two
tables. The purpose of Foreign keys is to maintain data integrity and allow
navigation between two different instances of an entity.
A nonkey attribute does not uniquely identify an instance of an entity.
For example, a database can have multiple instances of the same customer
name, which means that “customer name” is not unique.
A functional dependency is a constraint that specifies the relationship
between two sets of attributes where one set can accurately determine the
value of the other set.
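A functional dependency can be tested against sample rows. The sketch below is only an illustration: the function name `holds` and the customer rows are assumptions, and passing the test on sample data does not prove that the dependency is a rule of the schema.

```python
def holds(rows, lhs, rhs):
    """Return True if the functional dependency lhs -> rhs holds in rows.

    rows is a list of dicts; lhs and rhs are lists of attribute names.
    """
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        value = tuple(row[a] for a in rhs)
        # setdefault stores the first value seen for this key and returns it afterwards.
        if seen.setdefault(key, value) != value:
            return False  # same left-side values, different right-side values
    return True

# Hypothetical rows: two different customers share the name "Hari".
customers = [
    {"customer_id": 1, "customer_name": "Hari", "city": "Waling"},
    {"customer_id": 2, "customer_name": "Hari", "city": "Pokhara"},
    {"customer_id": 3, "customer_name": "Gita", "city": "Waling"},
]

id_determines_name = holds(customers, ["customer_id"], ["customer_name"])
name_determines_city = holds(customers, ["customer_name"], ["city"])
print(id_determines_name, name_determines_city)
```

customer_id determines customer_name, but customer_name does not determine city, which matches the point that a customer name is a non-key attribute.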
1NF
The table will be in First Normal Form (1NF) if all the attributes of the table
contain only atomic values. We can also say that if a table holds the multivalued
data items in attributes or composite values, the relation cannot be in the first
normal form. So, we need to make it first normal form by making the entries of
the table atomic.
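Making the entries atomic can be sketched in Python as follows. The employee rows, the column names and the phone numbers are made up for illustration; the only point is that a cell holding several values is split into one row per value.

```python
# Not in 1NF: the "phones" attribute holds more than one value in a single cell.
unnormalized = [
    {"emp_id": 1, "name": "Sita", "phones": "9800000001, 9800000002"},
    {"emp_id": 2, "name": "Ram", "phones": "9800000003"},
]

def to_1nf(rows):
    """Return one row per phone number so that every attribute value is atomic."""
    result = []
    for row in rows:
        for phone in row["phones"].split(","):
            result.append({
                "emp_id": row["emp_id"],
                "name": row["name"],
                "phone": phone.strip(),
            })
    return result

atomic_rows = to_1nf(unnormalized)
for row in atomic_rows:
    print(row)
```

After the split, emp_id alone no longer identifies a row; the pair (emp_id, phone) does.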
2NF
A relation will be in 2NF if it satisfies the following conditions:
The table or relation should be in 1NF or First Normal Form.
All the non-prime attributes should be fully functionally dependent on
every candidate key.
The table should not contain any partial dependency.
3NF
The table will be in Third Normal Form (3NF) if it follows the given
conditions:
The table or relation should be in 2NF.
It should not contain any transitive dependency. A transitive dependency
exists when a non-prime attribute depends on another non-prime attribute
instead of depending directly on a key.
A relation is in 3NF if every non-trivial FD X determines Y ('X' -> 'Y')
satisfies at least one of the following conditions:
If X -> Y is a trivial FD, i.e., Y is a subset of X.
If X -> Y, where X is a Super key.
If X -> Y, every attribute in (Y - X) is a prime attribute.
BCNF
It stands for Boyce Codd Normal Form, which is a
stricter version of 3NF. Sometimes, it is also referred
to as 3.5NF. A relation is said to be in BCNF if it
follows the given conditions:
A table or relation must be in 3NF.
For every non-trivial functional dependency (FD)
A -> B in a relation R, A must be a super key of R.
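The 3NF and BCNF conditions can be checked mechanically from a list of functional dependencies. The sketch below is a teaching aid under stated assumptions: the function names and the two example relations are made up, candidate keys are found by brute force (fine only for a handful of attributes), and only the FDs that are listed are tested.

```python
from itertools import combinations

def closure(attrs, fds):
    """Return every attribute determined by attrs under the given FDs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def candidate_keys(all_attrs, fds):
    """Brute-force search for minimal attribute sets whose closure is all_attrs."""
    keys = []
    for size in range(1, len(all_attrs) + 1):
        for combo in combinations(sorted(all_attrs), size):
            candidate = set(combo)
            if any(key <= candidate for key in keys):
                continue  # contains a smaller key, so it is not minimal
            if closure(candidate, fds) == set(all_attrs):
                keys.append(candidate)
    return keys

def is_bcnf(all_attrs, fds):
    """Every non-trivial FD must have a super key on its left side."""
    return all(rhs <= lhs or closure(lhs, fds) == set(all_attrs)
               for lhs, rhs in fds)

def is_3nf(all_attrs, fds):
    """Like BCNF, but an FD also passes if (Y - X) holds only prime attributes."""
    prime = set().union(*candidate_keys(all_attrs, fds))
    return all(rhs <= lhs
               or closure(lhs, fds) == set(all_attrs)
               or (rhs - lhs) <= prime
               for lhs, rhs in fds)

# Hypothetical relation 1: (student, course) -> teacher and teacher -> course.
enrol_attrs = {"student", "course", "teacher"}
enrol_fds = [({"student", "course"}, {"teacher"}), ({"teacher"}, {"course"})]

# Hypothetical relation 2: emp_id -> dept_id and dept_id -> dept_name (transitive).
emp_attrs = {"emp_id", "dept_id", "dept_name"}
emp_fds = [({"emp_id"}, {"dept_id"}), ({"dept_id"}, {"dept_name"})]

print(is_3nf(enrol_attrs, enrol_fds), is_bcnf(enrol_attrs, enrol_fds))
print(is_3nf(emp_attrs, emp_fds), is_bcnf(emp_attrs, emp_fds))
```

The first relation is in 3NF but not in BCNF, because teacher is not a super key even though course is a prime attribute. The second fails 3NF because dept_name depends on the non-prime attribute dept_id.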
CENTRALIZED DATABASE SYSTEM
A centralized database is stored at a single location such as a mainframe computer.
It is maintained and modified from that location only and usually accessed using
a network connection such as a LAN or WAN. The centralized database is used
by organizations such as colleges, companies, banks etc.
As can be seen from the above diagram, all the information for the organization is
stored in a single database. This database is known as the centralized database.
ADVANTAGES/DISADVANTAGES OF CENTRALIZED DATABASE SYSTEM
Advantages:
The data integrity is maximized as the whole database is stored at a single physical
location.
The data redundancy is minimal in the centralized database. All the data is stored
together and not scattered across different locations.
Since all the data is in one place, there can be stronger security measures around it. So,
the centralized database is much more secure.
The centralized database is cheaper than other types of databases as it requires less power
and maintenance.
All the information in the centralized database can be easily accessed from the same
location and at the same time.
Disadvantages:
Since all the data is at one location, it takes more time to search and
access it.
There is a lot of data access traffic for the centralized database.
Since all the data is at the same location, if multiple users try to access
it simultaneously it creates a problem.
If there are no database recovery measures in place and a system
failure occurs, then all the data in the database will be destroyed.
DISTRIBUTED DATABASE SYSTEM
In a distributed database management system, the database is not stored at a single location.
Rather, it may be stored in multiple computers at the same place or geographically spread far
away. Despite all this, the distributed database appears as a single database to the user. A diagram
to better explain this is as follows:
As seen in the figure, the components of the distributed database can be in multiple locations
such as India, Canada, Australia, etc. However, this is transparent to the user, i.e. the database
appears as a single entity.
ADVANTAGES/DISADVANTAGES OF DISTRIBUTED DATABASE SYSTEM
Advantages:
If there were a natural catastrophe such as a fire or an earthquake, all the data would
normal functions.
Disadvantages:
The distributed database is quite complex and it is difficult to make sure that a user gets
secured at all the locations it is stored. Moreover, the infrastructure connecting all the
nodes in a distributed database also needs to be secured.
It is difficult to maintain data integrity in the distributed database because of its nature.
There can also be data redundancy in the database as it is stored at multiple locations.
CENTRALIZED VS DISTRIBUTED DATABASE SYSTEM
Centralized Database System:
1. It is a database that is stored, located as well as maintained at a single
location only.
2. The data access time in the case of multiple users is more in a centralized
database.
3. The management, modification, and backup of this database are easier as the
entire data is present at the same location.
4. This database provides a uniform and complete view to the user.
5. This database has more data consistency in comparison to distributed database.
6. Centralized database is less costly.
Distributed Database System:
1. It is a database which consists of multiple databases which are connected
with each other and are spread across different physical locations.
2. The data access time in the case of multiple users is less in a distributed
database.
3. The management, modification, and backup of this database are very difficult
as it is spread across different physical locations.
4. Since it is spread across different locations, it is difficult to provide a
uniform view to the user.
5. This database may have some data replications, thus data consistency is less.
6. This database is very expensive.
DATA SECURITY
Data security is the practice of protecting digital information from unauthorized
access, corruption, or theft throughout its entire lifecycle. It’s a concept that
encompasses every aspect of information security from the physical security of
hardware and storage devices to administrative and access controls, as well as the
logical security of software applications. It also includes organizational policies
and procedures.
Accidental loss of data may result from
Crashes during transaction processing
Logical errors in the program
Anomalies caused by the distribution of data over several computers.
Intentional Loss of data may result from
Unauthorized reading of data.
Unauthorized modification of data.
Unauthorized destruction of data.
SECURITY MEASURES AT DIFFERENT LEVELS
To protect the database, we must take security measures at several levels:
Physical: The sites containing the computer systems must be secured against
only a limited portion of the database. Other users may be allowed to issue
queries, but may be forbidden to modify the data. It is the responsibility of the
database system to ensure that these authorization restrictions are not
violated.
GUIDELINE FOR DATA SECURITY
1. Use encryption to protect confidential data.
2. Backup important data and test the backup regularly.
3. Use strong passwords, keep them private and change
them regularly.
4. Activate password protection for unattended
computing devices.
5. Beware of suspicious e-mails.
6. Configure your computer securely.
7. Turn off unnecessary wireless connections.
8. Observe and comply with the “Data Protection
Principles”.
9. Report information security incidents immediately.
10. Configure a firewall.
DATA ABSTRACTION
Database systems are made up of complex data structures. To ease user
interaction with the database, the developers hide irrelevant internal details from
users. This process of hiding irrelevant details from users is called data abstraction.
LEVELS OF DATA ABSTRACTION
We have three levels of abstraction:
Physical level: This is the lowest level of data abstraction. It describes how
data is actually stored in database. You can get the complex data structure
details at this level.
Logical level: This is the middle level of 3-level data abstraction
architecture. It describes what data is stored in database.
View level: Highest level of data abstraction. This level describes the user
interaction with database system.
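The logical and view levels can be sketched with Python's built-in sqlite3 module. The CUSTOMER table, the CUSTOMER_CONTACT view and the sample row are assumptions made for illustration. The physical level does not appear in the code at all, because how SQLite arranges the bytes is hidden from the program.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Logical level: what data is stored and with which data types.
    CREATE TABLE CUSTOMER (
        Customer_Id INTEGER PRIMARY KEY,
        Name        TEXT NOT NULL,
        Phone       TEXT,
        Balance     REAL
    );
    INSERT INTO CUSTOMER VALUES (1, 'Hari', '9800000001', 1500.0);

    -- View level: one group of users sees only the part of the data it needs.
    CREATE VIEW CUSTOMER_CONTACT AS
        SELECT Name, Phone FROM CUSTOMER;
""")

cursor = conn.execute("SELECT * FROM CUSTOMER_CONTACT")
view_columns = [column[0] for column in cursor.description]
view_rows = cursor.fetchall()
print(view_columns, view_rows)
```

A user who works only with CUSTOMER_CONTACT never sees the Balance column or the Customer_Id.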
Example: Let’s say we are storing customer information in a customer
table. At physical level these records can be described as blocks of storage
(bytes, gigabytes, terabytes etc.) in memory. These details are often hidden
from the programmers.
At the logical level these records can be described as fields and attributes
along with their data types, their relationship among each other can be
logically implemented. The programmers generally work at this level
because they are aware of such things about database systems.
At the view level, users just interact with the system with the help of a GUI and enter
the details at the screen, they are not aware of how the data is stored and
DATABASE ADMINISTRATOR
A database administrator (DBA) is a specialized computer systems administrator
who maintains a successful database environment by directing or performing all
related activities to keep the data secure. The top responsibility of a DBA
professional is to maintain data integrity. This means the DBA will ensure that
data is secure from unauthorized access but is available to users.
Roles or functions of DBA:
1. Installing, configuring and upgrading database server software such as
Microsoft SQL Server, MySQL and Oracle.
2. Evaluating the features of various databases.
3. Establishing and maintaining sound backup and recovery policies and
procedures.
4. Taking care of database design and implementation.
5. Implementing and maintaining the database security.
6. Database tuning, application tuning and performance monitoring.
7. Maintaining documentation and standards.
8. Providing technical troubleshooting and consultation to development
teams.
DATA INTEGRITY
Data Integrity is the overall completeness, accuracy and consistency of data. This
can be indicated by the absence of alteration between two instances or between
two updates of a data record, meaning data is intact and unchanged. Data integrity
is usually imposed during the database design phase through the use of standard
procedures and rules. It is maintained through the use of various error-checking
methods and validation procedures.
Types of Data Integrity:
1. Entity Integrity: The entity integrity constraint states that primary key value
can't be null. This is because the primary key value is used to identify individual
rows in relation and if the primary key has a null value, then we can't identify
those rows. A table can contain null values in fields other than the primary key.
2. Referential Integrity: A referential integrity constraint is specified between two
tables. In the Referential integrity constraints, if a foreign key in Table 1 refers to
the Primary Key of Table 2, then every value of the Foreign Key in Table 1 must
be null or be available in Table 2.
3. Domain Integrity: Domain constraints can be defined as the definition of a
valid set of values for an attribute. The data type of domain includes string,
character, integer, time, date, currency, etc. The value of the attribute must be
available in the corresponding domain.
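The three kinds of integrity can be sketched as table constraints using Python's built-in sqlite3 module. The table layout, the age range of 18 to 65 and the sample values are assumptions made for illustration. The primary keys are declared NOT NULL explicitly because SQLite does not always reject a null primary key on its own.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when this is on
conn.executescript("""
    CREATE TABLE DEPARTMENT (
        Department_Id TEXT NOT NULL PRIMARY KEY
    );
    CREATE TABLE EMPLOYEE (
        Employee_Id   TEXT NOT NULL PRIMARY KEY,                  -- entity integrity
        Age           INTEGER CHECK (Age BETWEEN 18 AND 65),      -- domain integrity
        Department_Id TEXT REFERENCES DEPARTMENT(Department_Id)   -- referential integrity
    );
    INSERT INTO DEPARTMENT VALUES ('D1');
""")

def rejected(sql):
    """Run one statement and report whether the database refused it."""
    try:
        conn.execute(sql)
        return False
    except sqlite3.IntegrityError:
        return True

entity_violation = rejected("INSERT INTO EMPLOYEE VALUES (NULL, 30, 'D1')")       # null primary key
referential_violation = rejected("INSERT INTO EMPLOYEE VALUES ('E1', 30, 'D9')")  # no department D9
domain_violation = rejected("INSERT INTO EMPLOYEE VALUES ('E2', 12, 'D1')")       # age outside domain
null_fk_allowed = not rejected("INSERT INTO EMPLOYEE VALUES ('E3', 30, NULL)")    # null foreign key is fine
print(entity_violation, referential_violation, domain_violation, null_fk_allowed)
```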