Database Management System

Cloud IT Solution
Data:
Data are raw, unprocessed facts and statistics, whether stored or flowing freely over a network.
Example: “The price of oil is 40 tk per litre” is a piece of data. It is a raw value.
Database:
A Database is a collection of related data organised in a way that data can be easily accessed,
managed and updated.
Database Management System
• A database management system (DBMS) is software used to manage a database. For
example, MySQL and Oracle are popular database systems used in many different
applications.
• A DBMS provides an interface for operations such as creating a database, creating
tables in it, storing data, and updating data.
• It provides protection and security for the database. When there are multiple users, it
also maintains data consistency.
Characteristics of Database Management System
• Provides security and removes redundancy
• Self-describing nature of a database system
• Insulation between programs and data, and data abstraction
• Support for multiple views of the data
• Sharing of data and multiuser transaction processing
• A DBMS allows entities and the relations among them to form tables.
• It follows the ACID concept (Atomicity, Consistency, Isolation, and Durability).
• A DBMS supports a multi-user environment that allows users to access and manipulate
data in parallel.
File System [ICT Ministry (AP)-2017]

A file management system allows access to single files or tables at a time. In a
file system, data is stored directly in a set of files. It contains flat files that have no relation to
other files (when only one table is stored in a single file, that file is known as a flat file).
Advantages of DBMS over File system
• Data redundancy and inconsistency – Redundancy means repetition of data,
i.e. the same data may exist in more than one copy. A file system cannot control
redundancy because each user defines and maintains the files needed for a specific
application. Two users may maintain the same data in separate files
for different applications, so changes made by one user are not reflected in the files
used by the other, which leads to inconsistent data. A DBMS controls
redundancy by maintaining a single repository of data that is defined once and accessed
by many users. With little or no redundancy, the data remains consistent.
• Data isolation – Because data is scattered across various files, and the files may be in different
formats, writing new application programs to retrieve the appropriate data is difficult.
• Data sharing – A file system does not allow data sharing, or makes it too complex.
In a DBMS, data can be shared easily because the system is centralized.
• Data concurrency – Concurrent access means that more than one user accesses
the same data at the same time. Anomalies occur when changes made by one user are
lost because of changes made by another user. A file system provides no mechanism
to prevent such anomalies, whereas a DBMS provides a locking system to prevent them.
• Data searching – In a file system, a different application program has to be written
for every search operation. A DBMS provides built-in searching
operations; the user only has to write a short query to retrieve data from the database.
• Security problems – Not every user of the database system should be able to access all
the data. For example, in a banking system, payroll personnel need to see only the part of
the database that has information about the bank's employees. They do not need
access to information about customer accounts. Because application programs are
added to a file-processing system in an ad hoc manner, enforcing such security
constraints is difficult.
• Data integrity – Some constraints may need to be applied to the
data before it is inserted into the database. A file system provides no mechanism to
check these constraints automatically, whereas a DBMS maintains data integrity by
enforcing user-defined constraints on the data itself.
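The data integrity point can be illustrated with a minimal sketch using Python's built-in sqlite3 module. The `account` table, its columns, and the `try_insert` helper are hypothetical names chosen for this illustration; the point is only that the DBMS itself rejects rows that violate declared constraints.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table: the constraints are declared once, in the schema.
conn.execute(
    "CREATE TABLE account ("
    " id INTEGER PRIMARY KEY,"
    " owner TEXT NOT NULL,"
    " balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.execute("INSERT INTO account (id, owner, balance) VALUES (1, 'A', 400)")

def try_insert(row):
    """Return True if the DBMS accepts the row, False if a constraint rejects it."""
    try:
        conn.execute(
            "INSERT INTO account (id, owner, balance) VALUES (?, ?, ?)", row
        )
        return True
    except sqlite3.IntegrityError:
        return False

accepted = try_insert((2, "B", 700))           # satisfies every constraint
rejected_negative = try_insert((3, "C", -50))  # violates CHECK (balance >= 0)
rejected_null = try_insert((4, None, 10))      # violates NOT NULL
rejected_duplicate = try_insert((1, "D", 10))  # violates PRIMARY KEY uniqueness
row_count = conn.execute("SELECT COUNT(*) FROM account").fetchone()[0]
```

With plain files, each application program would have to repeat these checks itself.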
RDBMS
A relational database management system (RDBMS) is a database management system (DBMS)
that is based on the relational model.
Students
ID    Name   Phone      DOB
500   Matt   555-4141   01/09/1989
501   Jery   867-5309   3/15/1981
502   Sean   876-9123   10/31/1982

Takes_Course
ID    ClassID   Sem
500   1001      Fall02
501   1002      Fall02
501   1002      Spr03
502   1003      S203

Courses
ClassID   Title                   Class Num
1001      Intro to Informatics    1101
1002      Data mining             1400
1003      Internet and society    1400
DBMS Vs RDBMS
DBMS | RDBMS
DBMS applications store data as files. | RDBMS applications store data in tabular form.
In DBMS, data is generally stored in either a hierarchical form or a navigational form. | In RDBMS, the tables have an identifier called a primary key, and the data values are stored in the form of tables.
Normalization is not present in DBMS. | Normalization is present in RDBMS.
DBMS does not apply any security with regard to data manipulation. | RDBMS defines integrity constraints for the purpose of the ACID (Atomicity, Consistency, Isolation and Durability) properties.
DBMS uses the file system to store data, so there is no relation between the tables. | In RDBMS, data values are stored in the form of tables, so a relationship between these data values is stored in the form of a table as well.
DBMS has to provide some uniform methods to access the stored information. | RDBMS supports a tabular structure of the data and relationships between tables to access the stored information.
DBMS does not support distributed databases. | RDBMS supports distributed databases.
DBMS is meant for small organizations and deals with small amounts of data. It supports a single user. | RDBMS is designed to handle large amounts of data. It supports multiple users.
Examples of DBMS are file systems, XML, etc. | Examples of RDBMS are MySQL, PostgreSQL, SQL Server, Oracle, etc.
Data Abstraction in DBMS
The process of hiding irrelevant details from the user is called data abstraction. There are three
levels of abstraction:
Physical level: This is the lowest level of data abstraction. It describes how data is actually stored
in the database. The complex data structure details are found at this level.
Logical level: This is the middle level of the 3-level data abstraction architecture. It describes what
data is stored in the database.
View level: This is the highest level of data abstraction. It describes the user's interaction with
the database system.
Example: Let’s say we are storing customer information in a customer table. At the physical
level these records can be described as blocks of storage (bytes, gigabytes, terabytes etc.) in
memory. These details are often hidden from the programmers.
At the logical level these records can be described as fields and attributes along with their data
types, and the relationships among them can be logically implemented. Programmers
generally work at this level because they are aware of such things about database systems.
At the view level, users just interact with the system through a GUI and enter details on the
screen. They are not aware of how the data is stored or what data is stored; such details are
hidden from them.
DBMS Architecture
Types of DBMS Architecture
There are three types of DBMS architecture:
1. Single tier architecture
2. Two tier architecture
3. Three tier architecture
1. Single tier architecture
In this type of architecture, the database is available on the client machine itself; a request
made by the client does not require a network connection to act on the database.
For example, suppose you want to fetch employee records and the
database is on your own computer. The request to fetch employee details is
made by your computer, and the records are fetched from the database by your computer as
well. This type of system is generally referred to as a local database system.
2. Two tier architecture
In two-tier architecture, the database system is on the server machine and the DBMS
application is on the client machine. The two machines are connected
through a reliable network.
Whenever the client machine makes a request to access the database on the server using a query
language such as SQL, the server performs the request on the database and returns the result to the
client. Application connection interfaces such as JDBC and ODBC are used for the interaction
between server and client.
3. Three tier architecture
In three-tier architecture, another layer is present between the client machine and the server machine.
The client application does not communicate directly with the database system
on the server machine. Instead, the client application communicates with a server application,
and the server application communicates internally with the database system on the server.
Table
In the relational database model, a table is a collection of data elements organized in rows
and columns. A table is also considered a convenient representation of a relation, but a table
can have duplicate rows while a true relation cannot. A table is the
simplest form of data storage. Below is an example of an Employee table.
ID Name Age Salary
1 Adam 34 13000
2 Alex 28 15000
3 Stuart 20 18000
4 Ross 42 19020
SQL term | Relational database term | Description
Row | Tuple or record | A data set representing a single item
Column | Attribute or field | A labeled element of a tuple, e.g. Address or Date of birth
Table | Relation or base relvar | A set of tuples sharing the same attributes; a set of columns and rows
View or result set | Derived relvar | Any set of tuples; a data report from the RDBMS in response to a query
SQL Command
SQL defines the following data languages to manipulate the data of an RDBMS.
DDL: Data Definition Language
In many systems (for example Oracle and MySQL), DDL commands are auto-committed, which means
they save their changes permanently in the database. Some systems, such as PostgreSQL, allow most
DDL to be rolled back inside a transaction.
Command Description
create to create a new table or database
alter to alter the structure of a table
truncate to delete all rows from a table
drop to drop a table
rename to rename a table
DML: Data Manipulation Language
DML commands are not auto-committed, which means the changes are not permanent in the database
until they are committed; they can be rolled back.
Command Description
insert to insert a new row
update to update existing rows
delete to delete rows
merge to merge data from two tables (insert or update)
TCL: Transaction Control Language
These commands keep a check on other commands and their effect on the database. They
can undo changes made by other commands by rolling back to the original state, and they can
also make changes permanent.
Command Description
commit to permanently save
rollback to undo change
savepoint to save temporarily
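The three TCL commands can be tried out with Python's built-in sqlite3 module. This is a sketch under stated assumptions: the `employee` table and the savepoint name `after_adam` are made up for the illustration, and SQLite's behaviour stands in for other systems, which may differ in detail.

```python
import sqlite3

# isolation_level=None puts the sqlite3 module in autocommit mode, so every
# BEGIN / SAVEPOINT / ROLLBACK / COMMIT below is issued explicitly.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE employee (id INTEGER PRIMARY KEY, name TEXT)")

conn.execute("BEGIN")
conn.execute("INSERT INTO employee VALUES (1, 'Adam')")
conn.execute("SAVEPOINT after_adam")     # temporary marker inside the transaction
conn.execute("INSERT INTO employee VALUES (2, 'Alex')")
conn.execute("ROLLBACK TO after_adam")   # undoes only the insert of Alex
conn.execute("COMMIT")                   # Adam is now saved permanently

conn.execute("BEGIN")
conn.execute("INSERT INTO employee VALUES (3, 'Stuart')")
conn.execute("ROLLBACK")                 # undoes the whole transaction

names = [row[0] for row in conn.execute("SELECT name FROM employee ORDER BY id")]
```

Only the row committed before the rollbacks survives, so `names` holds a single entry.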
DCL: Data Control Language
Data control language provides commands to grant and take back authority.
Command Description
grant to grant a permission or right
revoke to take back a permission
DQL: Data Query Language
Command Description
select to retrieve records from one or more tables
SQL Commands
DDL: Create, Alter, Drop, Truncate, Comment, Rename
DML: Select, Insert, Update, Delete, Merge, Call, Explain Plan, Lock Table
DCL: Grant, Revoke
TCL: Commit, Rollback, Save Point, Set Transaction
Previous year question:
1. Create table employee (name varchar,id integer). What type of statement
is this? [Combined(AP)-2018]
a) DML b) DDL c) View d) Integrity constraint
Ans.: b
2. The SQL statement that queries or reads data from a table is -- [ICB(AP)-2017]
a) SELECT b) READ c) QUERY d) None of the above
Ans.: a
3. To remove the duplicate rows from the result of an SQL Select statement,
the ------- qualifier specified must be included. [Combined(AME)-2018]
a) Only b) Distinct c) Unique d) Single
Ans.: b
4. What are DDL and DML commands? Explain.

SQL Command:
CREATE TABLE table_name (
    column1 datatype,
    column2 datatype,
    column3 datatype
);
CREATE TABLE Persons (
    ID int NOT NULL,
    LastName varchar(255) NOT NULL,
    FirstName varchar(255),
    Age int,
    CONSTRAINT PK_Person PRIMARY KEY (ID, LastName)
);
ALTER TABLE Persons
ADD CONSTRAINT PK_Person PRIMARY KEY (ID, LastName);
ALTER TABLE Persons
DROP CONSTRAINT PK_Person;
-- The Orders examples assume a Persons table whose primary key is a single PersonID column.
CREATE TABLE Orders (
    OrderID int NOT NULL,
    OrderNumber int NOT NULL,
    PersonID int,
    PRIMARY KEY (OrderID),
    CONSTRAINT FK_PersonOrder FOREIGN KEY (PersonID)
    REFERENCES Persons(PersonID)
);
ALTER TABLE Orders
ADD FOREIGN KEY (PersonID) REFERENCES Persons(PersonID);
ALTER TABLE Orders
DROP CONSTRAINT FK_PersonOrder;
DROP TABLE Shippers;
TRUNCATE TABLE table_name;
ALTER TABLE table_name
ADD column_name datatype;
ALTER TABLE table_name
DROP COLUMN column_name;
-- Changing a column's data type: SQL Server uses ALTER COLUMN, MySQL uses MODIFY COLUMN.
ALTER TABLE table_name
ALTER COLUMN column_name datatype;
ALTER TABLE table_name
MODIFY COLUMN column_name datatype;
INSERT INTO table_name (column1, column2, column3, ...)
VALUES (value1, value2, value3, ...);
UPDATE table_name
SET column1 = value1, column2 = value2, ...
WHERE condition;
Default Method
DECODE: DECODE (supplier_id, 10000, 'IBM',
10001, 'Microsoft',
10002, 'Hewlett Packard',
'Gateway') result
CASE: CASE supplier_id
WHEN 10000 THEN 'IBM'
WHEN 10001 THEN 'Microsoft'
WHEN 10002 THEN 'Hewlett Packard'
ELSE 'Gateway'
END
NVL: NVL ('commission_pct', ' ') or NVL (commission_pct, 1)
LENGTH: LENGTH ('CANDIDE') returns the length in characters
TO_DATE: TO_DATE ('11-10-2001', 'dd-mm-yyyy')
TO_NUMBER: TO_NUMBER ('100')
TO_CHAR: TO_CHAR (199)
TRIM: TRIM ('my name is mehedi')
LOWER: LOWER ('MehedI')
UPPER: UPPER ('MehedI')
SUBSTR: SUBSTR (string, start_position, length) or SUBSTR ('This is a test', 6, 2)
REPLACE: REPLACE (string1, string_to_replace, replacement_string) or
REPLACE ('222tech', '2', '3');
LIKE Operator | Description
WHERE CustomerName LIKE 'a%' | Finds any values that start with "a"
WHERE CustomerName LIKE '%a' | Finds any values that end with "a"
WHERE CustomerName LIKE '%or%' | Finds any values that have "or" in any position
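The three LIKE patterns can be checked against a few sample rows with Python's built-in sqlite3 module. The customer names and the `names_like` helper below are invented for this sketch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Customers (CustomerName TEXT)")
# Hypothetical sample data chosen so that each pattern matches exactly one row.
conn.executemany(
    "INSERT INTO Customers VALUES (?)",
    [("adam",), ("maria",), ("victor",), ("bob",)],
)

def names_like(pattern):
    """Return the customer names matching a LIKE pattern, in alphabetical order."""
    rows = conn.execute(
        "SELECT CustomerName FROM Customers "
        "WHERE CustomerName LIKE ? ORDER BY CustomerName",
        (pattern,),
    )
    return [row[0] for row in rows]

starts_with_a = names_like("a%")   # % matches any run of characters after 'a'
ends_with_a = names_like("%a")     # any run of characters, then a final 'a'
contains_or = names_like("%or%")   # 'or' anywhere in the value
```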
Normalization of Database [Pubali Bank(SO)-2018]
• Normalization is a database design technique which organizes tables in a manner that
reduces redundancy and dependency of data.
• It divides larger tables into smaller tables and links them using relationships.
• In other words, database normalization is a technique for organizing the data in the
database. It is a systematic approach of decomposing tables to eliminate data
redundancy and undesirable characteristics such as insertion, update and deletion
anomalies. It is a multi-step process that puts data into tabular form by removing
duplicated data from the relation tables.
Normalization is used mainly for two purposes:
• Eliminating redundant (useless) data.
• Ensuring data dependencies make sense, i.e. data is logically stored.
Problem without Normalization
• Update anomaly: To update the address of a student who occurs two or more times
in a table, we have to update the S_Address column in all of those rows, or else the data will
become inconsistent.
• Insertion anomaly: Suppose that for a new admission we have the student id (S_id), name
and address of a student, but the student has not opted for any subjects yet. We then have to
insert NULL for the subject, which is an insertion anomaly.
• Deletion anomaly: If student (S_id) 401 has only one subject and temporarily drops it, then when
we delete that row, the entire student record is deleted along with it.
Normalization Rule
• The inventor of the relational model, Edgar Codd, proposed the theory of normalization
with the introduction of First Normal Form, and he continued to extend the theory with
Second and Third Normal Form. Later he joined with Raymond F. Boyce to develop the
theory of Boyce-Codd Normal Form.
• The theory of data normalization is still being developed further. For example, there
are discussions even on a 6th Normal Form. However, in most practical applications,
normalization achieves its best in 3rd Normal Form.
1. First Normal Form
2. Second Normal Form
3. Third Normal Form
4. BCNF (Boyce-Codd Normal Form)
5. Fourth Normal Form
6. Fifth Normal Form
7. Sixth Normal Form
First Normal Form (1NF)
The following criteria must be satisfied for 1NF.
• Each table cell should contain a single value.
• No composite values
• All entries in any column must be of the same kind
• Each column must have a unique name
• No two rows are identical
Student Table
Student Age Subject

Adam 15 Biology, Maths
Alex 14 Maths
Stuart 17 Maths
In First Normal Form, no row may have a column that holds more than one value, such as values
separated by commas. Instead, we must separate such data into multiple rows.
The Student table following 1NF will be:
Student Age Subject

Adam 15 Biology
Adam 15 Maths
Alex 14 Maths
Stuart 17 Maths
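The conversion from the first Student table to the 1NF version can be sketched in a few lines of Python. The function name `to_1nf` is hypothetical; the rows are the ones from the tables above.

```python
# The unnormalized Student table: the Subject column may hold several values.
unnormalized = [
    ("Adam", 15, "Biology, Maths"),
    ("Alex", 14, "Maths"),
    ("Stuart", 17, "Maths"),
]

def to_1nf(rows):
    """Emit one row per (student, subject) so that every cell holds a single value."""
    result = []
    for student, age, subjects in rows:
        for subject in subjects.split(","):
            result.append((student, age, subject.strip()))
    return result

first_nf = to_1nf(unnormalized)
```

Adam's row is split into two rows, one for Biology and one for Maths, while the other rows are unchanged.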
Second Normal Form (2NF)
• Rule 1 - Be in 1NF
• Rule 2 - No non-key column may depend on only part of the primary key. A table with a
single-column primary key satisfies this automatically (a primary key is a unique key that
identifies a row, such as a student roll, mobile number or social voter id number).
As per the Second Normal Form, there must not be any partial dependency of any column on the
primary key. This means that for a table with a concatenated primary key, each column in the table
that is not part of the primary key must depend upon the entire concatenated key for its existence.
If any column depends on only one part of the concatenated key, then the table fails Second
Normal Form.
New Student Table following 2NF will be:
Id Student Age
1 Adam 15
2 Alex 14
3 Stuart 17
In the Student table, the primary key is the Id column. In the Stu_subject
table, Id is a foreign key that references the Student table.
Stu_subject Table:
Id Subject
1 Biology
1 Maths
2 Maths
3 Maths
Now, if somebody tries to insert a row into the Stu_subject table with an Id that does not exist in
the Student table, an error will be shown.
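The error mentioned above can be reproduced with Python's built-in sqlite3 module. In this sketch, Stu_subject is assumed to hold just Id and Subject, with Id declared as a foreign key to Student; the rejected row (Id 4, 'Physics') is invented for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite only enforces foreign keys when this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")
conn.execute(
    "CREATE TABLE Student (Id INTEGER PRIMARY KEY, Student TEXT, Age INTEGER)"
)
conn.execute(
    "CREATE TABLE Stu_subject ("
    " Id INTEGER REFERENCES Student(Id),"
    " Subject TEXT,"
    " PRIMARY KEY (Id, Subject))"
)
conn.executemany(
    "INSERT INTO Student VALUES (?, ?, ?)",
    [(1, "Adam", 15), (2, "Alex", 14), (3, "Stuart", 17)],
)
conn.executemany(
    "INSERT INTO Stu_subject VALUES (?, ?)",
    [(1, "Biology"), (1, "Maths"), (2, "Maths"), (3, "Maths")],
)

try:
    # There is no student with Id 4, so this row has nothing to reference.
    conn.execute("INSERT INTO Stu_subject VALUES (4, 'Physics')")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

subject_rows = conn.execute("SELECT COUNT(*) FROM Stu_subject").fetchone()[0]
```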
Example 1: If a relation is in 2NF then:
(a) Every candidate key is a primary key
(b) Every non-prime attribute is fully functionally dependent on each relation key
(c) Every attribute is functionally independent
(d) Every relational key is a primary key
Ans.: A table is in 2NF if it is in 1NF and every non-prime attribute of the table depends on the
complete candidate key, which means no non-prime attribute is partially dependent on any key.
Partially dependent means dependent on a proper subset of a key.
So, the answer is (b).
Third Normal Form (3NF)
• Rule 1 - Be in 2NF
• Rule 2 - Has no transitive functional dependencies
Transitive functional dependency
A transitive dependency is a functional dependency that is indirectly formed by two
functional dependencies.
Example:
Company CEO Age
Microsoft Satya Nadella 51
Google Sundar Pichai 46
Alibaba Jack Ma 54
{Company} -> {CEO} (if we know the company, we know its CEO's name)
{CEO} -> {Age} (if we know the CEO, we know the CEO's age)
Therefore, according to the rule of transitive dependency:
{Company} -> {Age} should hold. That makes sense, because if we know the company name, we
can find out its CEO's age.
In mathematical form: a functional dependency is said to be transitive if it is
indirectly formed by two functional dependencies. For example,
X -> Z is a transitive dependency if the following three conditions hold true:
• X -> Y
• Y does not -> X
• Y -> Z
Note: A transitive dependency can only occur in a relation of three or
more attributes.
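Whether a set of rows is consistent with a functional dependency can be checked mechanically. The sketch below uses a hypothetical helper `holds` on the Company/CEO/Age rows from the example; the extra row in `broken` is invented to show a violation. Note that sample rows can only refute a dependency: a dependency is a rule about every possible row, so passing the check does not prove it.

```python
rows = [
    {"Company": "Microsoft", "CEO": "Satya Nadella", "Age": 51},
    {"Company": "Google", "CEO": "Sundar Pichai", "Age": 46},
    {"Company": "Alibaba", "CEO": "Jack Ma", "Age": 54},
]

def holds(rows, lhs, rhs):
    """True if rows that agree on the lhs attributes always agree on the rhs attributes."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        value = tuple(row[a] for a in rhs)
        # setdefault stores the first value seen for this key and returns it.
        if seen.setdefault(key, value) != value:
            return False
    return True

company_to_ceo = holds(rows, ["Company"], ["CEO"])   # X -> Y
ceo_to_age = holds(rows, ["CEO"], ["Age"])           # Y -> Z
company_to_age = holds(rows, ["Company"], ["Age"])   # X -> Z, the transitive one

# Invented counter-example: the same company listed with two different CEOs.
broken = rows + [{"Company": "Microsoft", "CEO": "Hypothetical Person", "Age": 40}]
broken_company_to_ceo = holds(broken, ["Company"], ["CEO"])
```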
To reach 3NF, transitive functional dependencies should be removed from the table, and the table
must also be in Second Normal Form. For example, consider a table with the following fields.
Student_Detail Table
Student_id Student_name DOB Street City State Zip
In this table Student_id is the primary key, but Street, City and State depend on Zip. Because Zip
in turn depends on Student_id, these fields depend on the primary key only transitively. Hence, to
apply 3NF, we need to move Street, City and State to a new table, with Zip as the primary key.
New Student_Detail Table:
Student_id Student_name DOB Zip
Address Table:
Zip Street City State
The advantages of removing transitive dependency are:
• The amount of data duplication is reduced.
• Data integrity is achieved.
The Boyce-Codd Normal Form
A relational schema R is in Boyce–Codd normal form (BCNF) if, for every one of its
dependencies X → Y, at least one of the following conditions holds true:
• X → Y is a trivial functional dependency (i.e., Y is a subset of X)
• X is a super key for schema R
1. Why do we need to normalize a database? [Combined(AP)-2018]
a) To remove redundancy b) To make data meaningful
c) To make database secure d) To make database consistency
Ans.: a
2. Repeated data exist at-- [BB(AP)-2016]
a) unnormalized b) 1NF c) 2NF d) 3NF
Ans.: a
3. What is normalization? [JBL AEO(IT)-2015]
a) To Remove Redundancy b) To make Database
c) To make data meaningful d) To make database Consistency
Ans.: a
4. In the ------- normal form, a composite attribute is converted to
individual attributes. [Combined(AP)-2018]
a) First b) Second c) Third d) Fourth
Ans.: a
Explanation:
1NF (First Normal Form) Rules
• Each table cell should contain a single value.
• Each record needs to be unique.
ACID Properties [ICB(AP) -2017,BDBL(IT) -2017]
A transaction can be defined as a group of tasks. A single task is the minimum processing
unit, which cannot be divided further. A database system handles many different types of
transactions, and every transaction has certain characteristics. These characteristics are known
as the ACID properties. The ACID properties guarantee that database transactions accomplish all
their tasks reliably.
1. Atomicity: This property ensures that either all the operations of a transaction are reflected in
the database or none are, i.e. “either commit all or nothing”. Let’s take a banking
example to understand this:
Suppose account A has a balance of tk 400 and B has tk 700. Account A is
transferring tk 100 to account B. This transaction has two operations:
a) Debiting tk 100 from A’s balance
A = old_balance - 100
b) Crediting tk 100 to B’s balance
B = old_balance + 100
Let’s say the first operation succeeds while the second fails. In this case A’s balance
would be tk 300 while B would have tk 700 instead of tk 800. This is unacceptable
in a banking system. Either the transaction should fail without executing any of the
operations, or it should process both operations. The atomicity property ensures that.
2. Consistency: Every attribute in the database has rules that ensure the stability of the
database. The constraints placed on the data values must hold both before and after the
execution of the transaction. If the system fails because of invalid data during an
operation, the system is reverted to its previous state.
Example: The total amount in accounts A and B should be the same before and after the
transaction. The sum of the money in A and B before the transaction is 400 + 700 = tk 1100, and
after the transaction it is 300 + 800 = tk 1100. So this transaction preserves consistency.
3. Isolation: If multiple transactions are performed on a single database, operations
from one transaction should not interfere with operations in another transaction. The
execution of each transaction should be isolated from the others (that is, even when transactions
run concurrently, the result should be the same as if they had run one after another). For
example, account A has a balance of tk 400 and is transferring tk 100 each to accounts
B and C.
So we have two transactions here. Let’s say these transactions run concurrently and both
read the tk 400 balance. In that case the final balance of A would be tk 300
instead of tk 200. This is wrong. If the transactions ran in isolation, the second
transaction would read the correct balance of tk 300 (before debiting tk 100) once the
first transaction had completed successfully.
4. Durability: “committed data is stored forever”. The database should be durable enough to
hold all its latest updates even if the system fails or restarts. If a transaction updates a
chunk of data in a database and commits, then the database will hold the modified data. If
a transaction commits but the system fails before the data could be written to the disk,
then that data will be updated once the system comes back into action.
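The atomicity and consistency examples (A has tk 400, B has tk 700, tk 100 is transferred) can be sketched with Python's built-in sqlite3 module. The `account` table, the `transfer` function and its `fail_after_debit` flag are hypothetical; the flag simulates a failure between the debit and the credit.

```python
import sqlite3

# Autocommit mode, so BEGIN / COMMIT / ROLLBACK are issued explicitly.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute(
    "CREATE TABLE account (name TEXT PRIMARY KEY, balance INTEGER CHECK (balance >= 0))"
)
conn.execute("INSERT INTO account VALUES ('A', 400), ('B', 700)")

def transfer(src, dst, amount, fail_after_debit=False):
    """Debit src and credit dst as one transaction; return True if it committed."""
    conn.execute("BEGIN")
    try:
        conn.execute(
            "UPDATE account SET balance = balance - ? WHERE name = ?", (amount, src)
        )
        if fail_after_debit:
            raise RuntimeError("simulated failure between debit and credit")
        conn.execute(
            "UPDATE account SET balance = balance + ? WHERE name = ?", (amount, dst)
        )
        conn.execute("COMMIT")
        return True
    except Exception:
        conn.execute("ROLLBACK")  # the debit is undone as well: all or nothing
        return False

def balances():
    return dict(conn.execute("SELECT name, balance FROM account"))

failed = transfer("A", "B", 100, fail_after_debit=True)
after_failure = balances()
succeeded = transfer("A", "B", 100)
after_success = balances()
total_after = sum(after_success.values())
```

After the failed attempt both balances are unchanged, and after the successful one the total is still tk 1100, which matches the consistency example.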
Key types
• Candidate Key
• Primary Key
• Unique Key
• Alternate Key
• Composite Key
• Super Key
• Foreign Key
• Surrogate Key
Example: Let’s see the STUDENT table
STUDENT
SID FNAME LNAME COURSEID
Here in the STUDENT table the keys are:
Super keys: SID, FNAME+LNAME, SID+COURSEID, FNAME+LNAME+COURSEID
Candidate keys: SID or FNAME+LNAME
Primary Key: SID
Foreign Key: COURSEID
Alternate Key: FNAME+LNAME
Composite Key: FNAME+LNAME
Primary key
A primary key (PK) is a column or set of columns that uniquely identifies each row in the table.
To create a primary key, define a PRIMARY KEY constraint when you
create or modify a table.
When multiple columns are used as a primary key, it is known as a composite primary key.
Points to remember for primary key
• A primary key enforces the entity integrity of the table.
• A primary key always has unique data.
• In SQL Server, the total length of a primary key cannot exceed 900 bytes.
• A primary key cannot have a null value.
• There can be no duplicate value for a primary key.
• A table can contain only one primary key constraint.
Main advantage of primary key
• The main advantage of this uniqueness is fast access to rows.
Foreign key
In relational databases, a foreign key is a field or column that is used to establish a link
between two tables.
In simple words, a foreign key in one table points to the primary key in another
table.
Super key
• Super key = candidate key + zero or more attributes.
• Every candidate key is a super key.
• But not every super key is a candidate key.
• A super key is an attribute or set of attributes that uniquely identifies a tuple within a
relation. However, a super key may contain additional attributes that are not necessary for unique
identification.
Example 1: A super key for an entity consists of
a) One attribute only
b) At least two attributes
c) At most two attributes
d) One or more attributes
Answer: d
Candidate key
A candidate key is a super key such that no proper subset of it is a super key within the relation.
It has two properties: each candidate key uniquely identifies a tuple in the relation, and no proper
subset of the key has the uniqueness property.
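The super key and candidate key definitions can be tested with the standard attribute-closure procedure. The sketch below assumes two functional dependencies for the STUDENT table that the text implies but does not state: SID determines every other column, and FNAME+LNAME determines SID. All function names are hypothetical.

```python
from itertools import combinations

def closure(attrs, fds):
    """All attributes determined by attrs under the functional dependencies fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def is_super_key(attrs, all_attrs, fds):
    return closure(attrs, fds) == set(all_attrs)

def is_candidate_key(attrs, all_attrs, fds):
    """A candidate key is a super key with no proper subset that is a super key."""
    if not is_super_key(attrs, all_attrs, fds):
        return False
    return not any(
        is_super_key(subset, all_attrs, fds)
        for size in range(1, len(attrs))
        for subset in combinations(attrs, size)
    )

ATTRS = ["SID", "FNAME", "LNAME", "COURSEID"]
# Assumed dependencies (not given in the text).
FDS = [
    (["SID"], ["FNAME", "LNAME", "COURSEID"]),
    (["FNAME", "LNAME"], ["SID"]),
]

sid_is_candidate = is_candidate_key(["SID"], ATTRS, FDS)
name_is_candidate = is_candidate_key(["FNAME", "LNAME"], ATTRS, FDS)
sid_course_is_super = is_super_key(["SID", "COURSEID"], ATTRS, FDS)
sid_course_is_candidate = is_candidate_key(["SID", "COURSEID"], ATTRS, FDS)
fname_course_is_super = is_super_key(["FNAME", "COURSEID"], ATTRS, FDS)
```

Under these assumptions SID+COURSEID is a super key but not a candidate key, because SID alone already identifies the row.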
Alternate key
• An alternate key is any candidate key that has not been selected as the primary key.
Composite key
• A key that consists of more than one attribute.
• It may be a candidate key or a primary key.
Surrogate Key
Surrogate keys are keys that have no business meaning and are used solely to identify a record in
the table. A surrogate key is not derived from application data. It is generated internally
by the system and is invisible to the user or application.
What are the integrity rules in DBMS?
Data integrity is a significant aspect of maintaining a database. It is
enforced in the database system by imposing a series of rules, known as
the integrity rules.
There are two integrity rules in DBMS:
Entity Integrity: It specifies that "the primary key cannot have a NULL value."
Referential Integrity: It specifies that "a foreign key must either be NULL or match a
primary key value of the related relation."
1. Which of the following is a group of one or more attributes that uniquely
identifies a row? [ICB(AP)-2017]
a) Key b) Determinant
c) Tuple d) Relation
Ans.: a
2. In an Entity-Relationship model, a many-to-many relationship corresponds to a ----
in the actual database. [JBL AEO(IT)-2015]
a) Table b) Field c) Row d) Primary key
Ans.: a
3. A primary key must also be -------------- [Combined(IT/ICT)-2018]
a) Foreign key b) Unique
c) Identical d) Case sensitive
Ans.: b
4. To remove the duplicate rows from the result of an SQL Select statement,
the ------- qualifier specified must be included. [Combined(AME)-2018]
a) Only b) Distinct c) Unique d) Single
Ans.: b
5. The subset of super key is a candidate key under what condition?
a) No proper subset is a super key b) All subsets are super keys
c) Subset is a super key d) Each subset is a super key
Ans.: a
Explanation: A candidate key is a minimal super key: it uniquely identifies a row, and
no proper subset of it is itself a super key.
6. Difference between primary key, foreign key and candidate key. [Combined(AP-HBFC,KB)-2018, ICB(AP)-2017]
7. Difference between primary key and super key. [Palli Sanchaya Bank (Programmer)-2018, EPB-2018]
8. Difference between super key and unique constraint key. [Pubali Bank (SO)-2018]
Deadlock in DBMS
A deadlock is a condition in which two or more tasks are each waiting for another to
finish, but none of the tasks is willing to give up the resources that another task needs. In this
situation no task ever finishes, and all remain in a waiting state forever.
Coffman conditions
Coffman stated four conditions for a deadlock to occur. A deadlock may occur only if all of the
following conditions hold true.
• Mutual exclusion condition: There must be at least one resource that cannot be used by
more than one process at a time.
• Hold and wait condition: A process that is holding a resource can request additional
resources that are being held by other processes in the system.
• No preemption condition: A resource cannot be forcibly taken from a process. Only the
process itself can release a resource that it holds.
• Circular wait condition: One process is waiting for a resource that is
being held by a second process, the second process is waiting for a third process, and so on, and
the last process is waiting for the first process, thus forming a circular chain of waiting.
Deadlock detection
The resource scheduler keeps track of the resources allocated to and requested by
processes. Thus, if there is a deadlock, it is known to the resource scheduler. This is how a
deadlock is detected.
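The text does not say how the scheduler recognises a deadlock. One common technique, used here as an assumption rather than as the method the text has in mind, is to build a wait-for graph (task X points to task Y if X is waiting for a resource held by Y) and look for a cycle, which corresponds to the circular wait condition. The function name and task labels are hypothetical.

```python
def find_deadlock(wait_for):
    """Return one cycle in the wait-for graph as a list of tasks, or None."""
    WHITE, GREY, BLACK = 0, 1, 2   # unvisited, on the current path, finished
    colour = {}
    path = []

    def visit(task):
        colour[task] = GREY
        path.append(task)
        for other in wait_for.get(task, ()):
            state = colour.get(other, WHITE)
            if state == GREY:
                # 'other' is already on the current path: a circular wait.
                return path[path.index(other):]
            if state == WHITE:
                cycle = visit(other)
                if cycle:
                    return cycle
        path.pop()
        colour[task] = BLACK
        return None

    for task in list(wait_for):
        if colour.get(task, WHITE) == WHITE:
            cycle = visit(task)
            if cycle:
                return cycle
    return None

# T1 waits for T2, T2 waits for T3, T3 waits for T1: a circular chain.
deadlocked = find_deadlock({"T1": ["T2"], "T2": ["T3"], "T3": ["T1"]})
# The same chain without the last edge has no cycle.
no_deadlock = find_deadlock({"T1": ["T2"], "T2": ["T3"], "T3": []})
```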
Once a deadlock is detected, it can be corrected by the following methods:
• Terminating processes involved in the deadlock: Terminating all the processes involved in
the deadlock, or terminating processes one by one until the deadlock is resolved, are possible
solutions, but neither approach is good. Terminating all processes has a high cost,
and the partial work done by the processes is lost. Terminating one by one takes a lot of time,
because each time a process is terminated, the system needs to check whether the deadlock is
resolved. Thus, the best approach is to consider process age and priority when
terminating processes during a deadlock.
• Resource preemption: Another approach is to preempt resources and
allocate them to other processes until the deadlock is resolved.
Deadlock prevention
A deadlock can occur only if all four Coffman conditions hold true, so
preventing one or more of them prevents the deadlock.
• Removing mutual exclusion: All resources must be sharable, which means more
than one process can hold a resource at a time. This approach is practically
impossible.
• Removing hold and wait condition: This can be removed if the process acquires all the
resources it needs before starting out. Another way is to enforce a
rule that a process may request resources only when it holds none.
• Preemption of resources: Preempting resources from a process can result in rollback,
and thus this needs to be avoided in order to maintain the consistency and stability of the
system.
• Avoid circular wait condition: This can be avoided if the resources are maintained in a
hierarchy and a process can acquire resources only in increasing order of precedence.
Another way is to enforce a one-resource-per-process rule: a
process can request a resource only after it releases the resource it currently holds.
Deadlock Avoidance
Deadlock can be avoided if resources are allocated in such a way that it avoids the deadlock
occurrence. There are two algorithms for deadlock avoidance.
 Wait/Die
 Wound/Wait
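The two schemes can be sketched as decision rules on transaction timestamps. This is a simplified illustration, assuming that a smaller timestamp means an older transaction; the function names are hypothetical.

```python
def wait_die(requester_ts: int, holder_ts: int) -> str:
    """Wait/Die: an older requester waits; a younger requester dies (is rolled back)."""
    return "wait" if requester_ts < holder_ts else "die"


def wound_wait(requester_ts: int, holder_ts: int) -> str:
    """Wound/Wait: an older requester wounds (preempts) the holder; a younger one waits."""
    return "wound holder" if requester_ts < holder_ts else "wait"


# T1 started at time 5 (older), T2 started at time 9 (younger).
print(wait_die(5, 9))    # older T1 requests a resource held by younger T2
print(wait_die(9, 5))    # younger T2 requests a resource held by older T1
print(wound_wait(5, 9))
print(wound_wait(9, 5))
```

In both schemes it is always the younger transaction that is rolled back, so an old transaction is never starved.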
JOIN
A JOIN clause is used to combine rows from two or more tables or views, based on a related
column between them.
SELECT Orders.OrderID, Customers.CustomerName, Orders.OrderDate
FROM Orders
INNER JOIN Customers ON Orders.CustomerID = Customers.CustomerID;
Different Types of SQL JOINs
(INNER) JOIN: Returns records that have matching values in both tables
LEFT (OUTER) JOIN: Returns all records from the left table, and the matched records from the
right table
RIGHT (OUTER) JOIN: Returns all records from the right table, and the matched records from
the left table
FULL (OUTER) JOIN: Returns all records when there is a match in either the left or the right table
SELECT column_name(s)
FROM table1
INNER JOIN table2 ON table1.column_name = table2.column_name;

SELECT column_name(s)
FROM table1
LEFT JOIN table2 ON table1.column_name = table2.column_name;

SELECT column_name(s)
FROM table1
RIGHT JOIN table2 ON table1.column_name = table2.column_name;

SELECT column_name(s)
FROM table1
FULL OUTER JOIN table2 ON table1.column_name = table2.column_name;
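The difference between an inner and a left join is easiest to see on a small data set. The sketch below uses Python's built-in sqlite3 module with an in-memory database; the sample rows are invented for illustration, and only INNER and LEFT joins are shown because older SQLite versions do not support RIGHT or FULL OUTER joins.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Customers (CustomerID INTEGER PRIMARY KEY, CustomerName TEXT);
    CREATE TABLE Orders (OrderID INTEGER PRIMARY KEY, CustomerID INTEGER, OrderDate TEXT);
    INSERT INTO Customers VALUES (1, 'Asha'), (2, 'Bilal'), (3, 'Chitra');
    INSERT INTO Orders VALUES (10, 1, '2020-01-05'), (11, 1, '2020-02-09'), (12, 2, '2020-03-01');
""")

# INNER JOIN: only customers that have at least one matching order.
inner_rows = conn.execute("""
    SELECT Customers.CustomerName, Orders.OrderID
    FROM Customers
    INNER JOIN Orders ON Customers.CustomerID = Orders.CustomerID
    ORDER BY Orders.OrderID
""").fetchall()

# LEFT JOIN: every customer, with NULL (None) where no order matches.
left_rows = conn.execute("""
    SELECT Customers.CustomerName, Orders.OrderID
    FROM Customers
    LEFT JOIN Orders ON Customers.CustomerID = Orders.CustomerID
    ORDER BY Customers.CustomerID, Orders.OrderID
""").fetchall()

print(inner_rows)
print(left_rows)
```

The customer without orders appears only in the left join result, paired with a NULL order id.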
Self Join
A self-join is a query in which a table is joined (compared) to itself. Self-joins are used to
compare values in a column with other values in the same column in the same table. One
practical use for self-joins: obtaining running counts and running totals in an SQL query.
Example
SELECT a.ID, b.NAME, a.SALARY
FROM CUSTOMERS a, CUSTOMERS b
WHERE a.SALARY < b.SALARY;
Aggregate functions
The GROUP BY statement is often used with aggregate functions (COUNT, MAX, MIN, SUM,
AVG) to group the result-set by one or more columns.
Sequence of Clause
1. where
2. group by
3. having
4. order by
Example
SELECT COUNT(CustomerID), Country
FROM Customers
GROUP BY Country
ORDER BY COUNT(CustomerID) DESC;

SELECT COUNT(CustomerID), Country
FROM Customers
GROUP BY Country
HAVING COUNT(CustomerID) > 5
ORDER BY COUNT(CustomerID) DESC;
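The two queries above can be tried on a small sample with Python's sqlite3 module. The rows are invented for illustration, and the HAVING threshold is lowered to 1 so that the tiny data set still shows the filter at work.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Customers (CustomerID INTEGER PRIMARY KEY, Country TEXT)")
conn.executemany(
    "INSERT INTO Customers (Country) VALUES (?)",
    [("Bangladesh",)] * 3 + [("India",)] * 2 + [("Nepal",)],
)

# GROUP BY collapses the rows of each country into one row with its count.
counts = conn.execute("""
    SELECT COUNT(CustomerID), Country
    FROM Customers
    GROUP BY Country
    ORDER BY COUNT(CustomerID) DESC
""").fetchall()

# HAVING filters the groups after aggregation (WHERE would filter rows before it).
filtered = conn.execute("""
    SELECT COUNT(CustomerID), Country
    FROM Customers
    GROUP BY Country
    HAVING COUNT(CustomerID) > 1
    ORDER BY COUNT(CustomerID) DESC
""").fetchall()

print(counts)
print(filtered)
```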
SQL View
A view in SQL is a logical subset of data from one or more tables. A view is often used to
restrict data access.
Syntax for creating a view
CREATE OR REPLACE VIEW view_name AS
SELECT column_name(s)
FROM table_name
WHERE condition;
Types of view
There are two types of view:
1. Simple View
2. Complex View
Simple View                      | Complex View
Created from one table           | Created from one or more tables
Does not contain functions       | Contains functions
Does not contain groups of data  | Contains groups of data
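A simple view and a complex view can be compared side by side. The sketch below uses Python's sqlite3 module; the Employee table, its rows and the view names are assumptions made for illustration. SQLite does not accept OR REPLACE in CREATE VIEW, so plain CREATE VIEW is used.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Employee (Id INTEGER PRIMARY KEY, Name TEXT, Dept TEXT, Salary INTEGER);
    INSERT INTO Employee VALUES (1, 'A', 'Sales', 300), (2, 'B', 'HR', 200), (3, 'C', 'Sales', 500);

    -- Simple view: one table, no functions, no grouping. It hides the Salary column.
    CREATE VIEW SalesStaff AS
    SELECT Id, Name FROM Employee WHERE Dept = 'Sales';

    -- Complex view: uses an aggregate function and groups of data.
    CREATE VIEW DeptPayroll AS
    SELECT Dept, SUM(Salary) AS Total FROM Employee GROUP BY Dept;
""")

simple_rows = conn.execute("SELECT * FROM SalesStaff ORDER BY Id").fetchall()
complex_rows = conn.execute("SELECT * FROM DeptPayroll ORDER BY Dept").fetchall()

print(simple_rows)
print(complex_rows)
```

A user who is granted access only to the SalesStaff view never sees salaries, which is how a view restricts data access.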
SQL Sequence
A sequence is a feature supported by some database systems to produce unique values on demand.
Some DBMSs, such as MySQL, support AUTO_INCREMENT in place of sequences.
AUTO_INCREMENT is applied to a column; it automatically increments the column value by 1
each time a new record is inserted into the table. A sequence is similar to
AUTO_INCREMENT but has some extra features.
Creating Sequence
The syntax to create a sequence is:
CREATE SEQUENCE sequence-name
START WITH initial-value
INCREMENT BY increment-value
MAXVALUE maximum-value
CYCLE | NOCYCLE;
Query Example
Second highest salary
SELECT MAX(Salary) FROM Employee
WHERE Salary NOT IN (SELECT MAX(Salary) FROM Employee);

SELECT MAX(Salary) FROM Employee
WHERE Salary < (SELECT MAX(Salary) FROM Employee);

Nth highest salary (N-1 = 1 gives the second highest)
SELECT Id, Salary FROM Employee e
WHERE (N-1) = (SELECT COUNT(DISTINCT Salary)
FROM Employee p WHERE e.Salary < p.Salary);
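The correlated-subquery approach works because a salary is the Nth highest exactly when N-1 distinct salaries are greater than it. A small sketch with Python's sqlite3 module, using invented sample data and a hypothetical helper nth_highest:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Employee (Id INTEGER PRIMARY KEY, Salary INTEGER)")
conn.executemany(
    "INSERT INTO Employee VALUES (?, ?)",
    [(1, 100), (2, 300), (3, 200), (4, 300)],
)

# Second highest salary: the maximum of everything below the overall maximum.
second = conn.execute("""
    SELECT MAX(Salary) FROM Employee
    WHERE Salary < (SELECT MAX(Salary) FROM Employee)
""").fetchone()[0]


def nth_highest(n):
    """Return the nth highest distinct salary, or None if there is none."""
    rows = conn.execute("""
        SELECT DISTINCT Salary FROM Employee e
        WHERE ? = (SELECT COUNT(DISTINCT Salary)
                   FROM Employee p WHERE e.Salary < p.Salary)
    """, (n - 1,)).fetchall()
    return rows[0][0] if rows else None


print(second)
print([nth_highest(n) for n in (1, 2, 3, 4)])
```

Note that the duplicate top salary (300 appears twice) does not disturb the result, because the subquery counts distinct salaries.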
Duplicate Values
SELECT NAME, COUNT(NAME) FROM STUD GROUP BY NAME
HAVING COUNT(NAME) > 1;
Positive/Negative Value
SELECT
(SELECT COUNT(roll_no) FROM stud WHERE roll_no > 0) AS PositiveValue,
(SELECT COUNT(roll_no) FROM stud WHERE roll_no < 0) AS NegativeValue;
Current Date
SELECT CURDATE();
Difference between Truncate and Delete
Truncate                                         | Delete
We can't roll back after TRUNCATE.               | We can roll back after DELETE.
Example:                                         | Example:
BEGIN TRAN                                       | BEGIN TRAN
TRUNCATE TABLE tranTest;                         | DELETE FROM tranTest;
SELECT * FROM tranTest;                          | SELECT * FROM tranTest;
ROLLBACK;                                        | ROLLBACK;
SELECT * FROM tranTest;                          | SELECT * FROM tranTest;
TRUNCATE resets the identity of the table.       | DELETE does not reset the identity of the table.
It locks the entire table.                       | It locks the table row.
It is a DDL (Data Definition Language) command.  | It is a DML (Data Manipulation Language) command.
We can't use a WHERE clause with it.             | We can use WHERE to filter the data to delete.
Triggers are not fired by TRUNCATE.              | Triggers are fired.
Syntax:                                          | Syntax:
TRUNCATE TABLE tablename                         | 1. DELETE FROM tablename
                                                 | 2. DELETE FROM tablename WHERE columnname = condition
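The DELETE column can be demonstrated with Python's sqlite3 module. SQLite has no TRUNCATE statement, so this sketch shows only the DELETE side: a filtered delete inside a transaction, followed by a rollback. The table name tranTest follows the example above; the rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tranTest (id INTEGER)")
conn.executemany("INSERT INTO tranTest VALUES (?)", [(1,), (2,), (3,)])
conn.commit()

# DELETE accepts a WHERE clause and runs inside a transaction.
conn.execute("DELETE FROM tranTest WHERE id > 1")
during = conn.execute("SELECT COUNT(*) FROM tranTest").fetchone()[0]

# Rolling back restores the deleted rows.
conn.rollback()
after = conn.execute("SELECT COUNT(*) FROM tranTest").fetchone()[0]

print(during, after)
```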
Difference between Primary Key and Unique Key
Primary Key                                           | Unique Key
Primary key cannot accept a null value.               | Unique key can accept only one null value.
We can have only one primary key in a table.          | We can have more than one unique key.
Primary key can be made a foreign key in another table. | Unique key can be made a foreign key in another table.
By default it adds a clustered index.                 | By default it adds a unique non-clustered index.
Difference among Delete, Drop and Truncate
Type                         | Delete                                 | Drop                                                    | Truncate
Usage                        | Removes rows from a table.             | Deletes a table from the database/data dictionary.      | Removes all rows from a table.
Type of command              | DML                                    | DDL                                                     | DDL
Rollback                     | Can be rolled back                     | Can't be rolled back                                    | Can't be rolled back
Rows, indexes and privileges | Only table rows are deleted.           | Table rows, indexes and privileges are deleted.         | Only table rows are deleted.
DML trigger firing           | Trigger is fired                       | No triggers are fired                                   | No triggers are fired
Performance                  | Slower than truncate                   | Quick but could lead to complications                   | Faster than delete
Undo space                   | Uses undo space                        | Does not use undo space                                 | Uses undo space, but not as much as delete
Permanent deletion           | Does not remove the record permanently | Removes all records, indexes and privileges permanently | Removes the record permanently
Where clause                 | Yes                                    | No                                                      | No
Row deletion                 | Deletes all rows or some rows          | Deletes all rows                                        | Deletes all rows
Difference between Procedure and Function
Procedure | Function
A procedure does not return a value through a return statement. | A function returns a value through a return statement.
A return statement may or may not be present in a procedure. | A return statement must be present in a function.
In a procedure the return statement carries no expression; it is used only to transfer control back to the calling program. | The return statement in a function must contain an expression; the expression can be a variable, a hard-coded value or an arithmetic expression involving a column.
The return data type is not required in a procedure. | The return data type must be declared in a function.
A procedure is called as a standalone statement, like a command, from another procedure, function or trigger only. | A function has to be called as part of an SQL statement or as part of an expression.
A procedure may return no value, or multiple values through OUT-mode parameters. | A function returns a single value.
A procedure does not have a purity level. | A function has a purity level.
Procedures are mainly written to manipulate and process the data from tables. | A function normally should not be used to manipulate data.
Advantage of subprogram (Procedures & Functions)
 Extensibility
 Modularity
 Reusability
 Maintainability
 Abstraction & Data Hiding
 Security
PLSQL
Cursor
Oracle creates a memory area, known as the context area, for processing an SQL statement. A
cursor is a pointer to this context area. PL/SQL controls the context area through a cursor. A
cursor holds the rows (one or more) returned by a SQL statement. The set of rows the cursor
holds is referred to as the active set.
You can name a cursor so that it could be referred to in a program to fetch and process the rows
returned by the SQL statement, one at a time. There are two types of cursors −
1. Implicit cursors
2. Explicit cursors
Attribute  | Description
%FOUND     | Returns TRUE if an INSERT, UPDATE, or DELETE statement affected one or more rows or a SELECT INTO statement returned one or more rows. Otherwise, it returns FALSE.
%NOTFOUND  | The logical opposite of %FOUND. It returns TRUE if an INSERT, UPDATE, or DELETE statement affected no rows, or a SELECT INTO statement returned no rows. Otherwise, it returns FALSE.
%ISOPEN    | Always returns FALSE for implicit cursors, because Oracle closes the SQL cursor automatically after executing its associated SQL statement.
%ROWCOUNT  | Returns the number of rows affected by an INSERT, UPDATE, or DELETE statement, or returned by a SELECT INTO statement.
To work with an explicit cursor you need to follow 5 steps.
1. Declare
2. Open
3. Fetch
4. Close
5. Deallocate
Triggers
Triggers are stored programs, which are automatically executed or fired when some events occur.
Triggers are, in fact, written to be executed in response to any of the following events –
 A database manipulation (DML) statement (DELETE, INSERT, or UPDATE)
 A database definition (DDL) statement (CREATE, ALTER, or DROP)
 A database operation (SERVERERROR, LOGON, LOGOFF, STARTUP, or SHUTDOWN)
[Figure: an application issues UPDATE, INSERT and DELETE statements against a table; in the
database, each statement fires the corresponding UPDATE, INSERT or DELETE trigger.]
Example
CREATE OR REPLACE TRIGGER orders_after_insert
AFTER INSERT
ON orders
FOR EACH ROW
DECLARE
   v_username varchar2(10);
BEGIN
   -- Find username of person performing the INSERT into the table
   SELECT user INTO v_username
   FROM dual;
   -- Insert record into audit table
   INSERT INTO orders_audit
   ( order_id,
     quantity,
     cost_per_item,
     total_cost,
     username )
   VALUES
   ( :new.order_id,
     :new.quantity,
     :new.cost_per_item,
     :new.total_cost,
     v_username );
END;
Triggers can be defined on the table, view, schema, or database with which the event is
associated.
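The audit idea from the example above can be tried without an Oracle installation. The sketch below is an approximation using Python's sqlite3 module: SQLite's trigger syntax differs from PL/SQL (it uses NEW.column instead of :new.column and has no DECLARE section), and the username column is left out because SQLite has no database users.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (
        order_id INTEGER PRIMARY KEY,
        quantity INTEGER,
        cost_per_item REAL,
        total_cost REAL
    );
    CREATE TABLE orders_audit (order_id INTEGER, quantity INTEGER, total_cost REAL);

    -- Fires once for every row inserted into orders.
    CREATE TRIGGER orders_after_insert
    AFTER INSERT ON orders
    FOR EACH ROW
    BEGIN
        INSERT INTO orders_audit (order_id, quantity, total_cost)
        VALUES (NEW.order_id, NEW.quantity, NEW.total_cost);
    END;
""")

# The application only inserts into orders; the trigger fills the audit table.
conn.execute("INSERT INTO orders VALUES (1, 2, 5.0, 10.0)")
audit_rows = conn.execute("SELECT * FROM orders_audit").fetchall()
print(audit_rows)
```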
Benefits of Triggers
Triggers can be written for the following purposes –
 Generating some derived column values automatically
 Enforcing referential integrity
 Event logging and storing information on table access
 Auditing
 Synchronous replication of tables
 Imposing security authorizations
 Preventing invalid transactions
Package
 A package is a schema object that groups logically related PL/SQL types, variables,
constants, subprograms, cursors, and exceptions. A package is compiled and stored in the
database, where many applications can share its contents.
Advantage of Package
 Less I/O, More efficiency
 Program overloading is available only for package subprograms, whereas standalone
subprograms cannot be overloaded.
 Avoid dependencies.
 Variable declared in the package specification is global.
E-R Diagram
ER-Diagram is a visual representation of data that describes how data is related to each other.
Symbols and Notations
Components of E-R Diagram
The E-R diagram has three main components: entity, attribute and relationship.
Entity/Data Object
An Entity can be any object, place, person or class. In E-R Diagram, an entity is represented
using rectangles. Consider an example of an Organization. Employee, Manager, Department,
Product and many more can be taken as entities from an Organization.
Weak Entity
A weak entity is an entity that depends on another entity. A weak entity doesn't have a key
attribute of its own. A double rectangle represents a weak entity.
Attribute
An Attribute describes a property or characteristic of an entity. For example, Name, Age,
Address etc. can be attributes of a Student. An attribute is represented using an ellipse.
Key Attribute
Key attribute represents the main characteristic of an Entity. It is used to represent the
primary key. An ellipse with the attribute name underlined represents a key attribute.
Composite Attribute
An attribute can also have its own attributes. Such an attribute is known as a composite
attribute.
Relationship
A Relationship describes relations between entities. A relationship is represented using a
diamond. There are three types of relationship that exist between entities.
1. Binary Relationship
2. Recursive Relationship
3. Ternary Relationship
Binary Relationship
Binary Relationship means a relation between two entities. This is further divided into four types.
1. One to One: This type of relationship is rarely seen in the real world.
 For example, one student can enroll in only one course and a course has only one
student. This is not what you will usually see in a relationship.
2. One to Many: It reflects a business rule that one entity is associated with many instances
of another entity. The example for this relation might sound a little odd: one student can
enroll in many courses, but each course has only one student.
 The arrows in the diagram describes that one student can enroll for only one
course.
3. Many to One: It reflects a business rule that many entities can be associated with just one
entity. For example, a Student enrolls in only one Course but a Course can have many
Students.
4. Many to Many: Many students can enroll in more than one course, and each course can have
many students.
Recursive Relationship
When an Entity is related with itself it is known as Recursive Relationship.
Ternary Relationship
Relationship of degree three is called Ternary relationship.
Generalization
Generalization is a bottom-up approach in which two lower level entities combine to form a
higher-level entity. In generalization, the higher-level entity can also combine with other lower
level entity to make further higher level entity.
Specialization
Specialization is opposite to Generalization. It is a top-down approach in which one higher level
entity can be broken down into two lower level entities. In specialization, some higher level
entities may not have lower-level entity sets at all.
Aggregation
Aggregation is a process when relation between two entities is treated as a single entity. Here the
relation between Center and Course is acting as an Entity in relation with Visitor.
Weak entity
An entity set that does not possess sufficient attributes to form a primary key is called a weak
entity set.
Strong entity set

One that does have a primary key is called a strong entity set.
Data Flow Diagram

Data flow diagram is graphical representation of flow of data in an information system. It is
capable of depicting incoming data flow, outgoing data flow and stored data. The DFD does not
mention anything about how data flows through the system.
Types of DFD
Logical DFD - This type of DFD concentrates on the system process and flow of data in the
system. For example in a Banking software system, how data is moved between different entities.
Physical DFD - This type of DFD shows how the data flow is actually implemented in the
system. It is more specific and close to the implementation.
DFD Components
 Entities - Entities are source and destination of information data. Entities are represented
by rectangles with their respective names.
 Process - Activities and action taken on the data are represented by Circle or Round-
edged rectangles.
 Data Storage - There are two variants of data storage - it can either be represented as a
rectangle with absence of both smaller sides or as an open-sided rectangle with only one
side missing.
 Data Flow - Movement of data is shown by pointed arrows. Data movement is shown
from the base of arrow as its source towards head of the arrow as destination.
[Figure: DFD symbols for Entity, Process, Data Store and Data Flow]
Name                   | Meaning
Data flow              | Represents flows of data
Process                | Represents activities and processes, including data processing/conversion
Data store             | Represents stored data (e.g. ledgers, files, databases)
Data source (external) | Represents the origin (i.e. source) or destination (i.e. sink) of data
UML (Unified Modeling Language)

UML is a standard unified modeling language approved by the OMG (Object Management
Group), a standardization body for object-oriented technologies. It is used in the notation of
deliverables (e.g., specification documents) in object-oriented development, from analysis to
design, implementation, and testing. UML diagrams include:
1. Class diagram
2. Use case diagram
3. Sequence diagram
4. Communication diagram (collaboration diagram)
5. State machine diagram (state chart diagram)
6. Activity diagram
7. Component diagram
8. Object diagram
9. Package diagram
10. Timing diagram
Use case diagram
A use case diagram at its simplest is a representation of a user's interaction with the system that
shows the relationship between the user and the different use cases in which the user is involved.
A use case diagram can identify the different types of users of a system and the different use
cases and will often be accompanied by other types of diagrams as well.
ER-Diagram ATM:
Indexing
Indexing is a data structure technique to efficiently retrieve records from the database files based
on some attributes on which the indexing has been done. Indexing in database systems is similar
to what we see in books.
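Benefits
 Improves search efficiency.
 Each index entry consists of two fields (key and block pointer).
 The index is an ordered file.
 Searching can be binary.
The average number of block accesses needed to find a record is log2 B, where B is the number
of blocks in the ordered file being searched.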
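The log2 B figure can be checked with a short calculation. This is a rough sketch, assuming B is the number of blocks searched by binary search; the function name is hypothetical, and the result is rounded up because a block access cannot be fractional.

```python
import math


def binary_search_block_accesses(num_blocks: int) -> int:
    """Approximate block accesses for a binary search over num_blocks ordered blocks."""
    return max(1, math.ceil(math.log2(num_blocks)))


# An ordered file of 1024 blocks needs about 10 accesses with binary search,
# compared with 512 on average for a sequential scan (half of the blocks).
print(binary_search_block_accesses(1024))
print(1024 // 2)
```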
Indexing is defined based on its indexing attributes. Indexing can be of the following types −
 Primary Index − Primary index is defined on an ordered data file. The data file is
ordered on a key field. The key field is generally the primary key of the relation.
 Secondary Index − Secondary index may be generated from a field which is a candidate
key and has a unique value in every record, or a non-key with duplicate values.
 Clustering Index − Clustering index is defined on an ordered data file. The data file is
ordered on a non-key field.
Ordered Indexing is of two types −
1. Dense Index
2. Sparse Index
Employee Id | Employee Name | Block
1           | A             | B1
2           | B             | B1
3           | C             | B2
4           | D             | B2
5           | E             | B3
6           | F             | B3
Dense Index
In a dense index, there is an index record for every search key value in the database. This makes
searching faster but requires more space to store the index records themselves. Index records
contain the search key value and a pointer to the actual record on the disk.
Search Key | Block Pointer
1          | B1
2          | B1
3          | B2
4          | B2
5          | B3
6          | B3
Sparse Index
In a sparse index, index records are not created for every search key. An index record here
contains a search key and a pointer to the data on the disk. To search for a record, we first
follow the index record with the largest search key not greater than the key we want, which takes
us to a location in the data. If the record we are looking for is not exactly there, the system
searches sequentially from that point until the desired data is found.
Search Key | Block Pointer
1          | B1
3          | B2
5          | B3
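The two lookups can be sketched in a few lines, using the employee blocks and index tables shown above. The in-memory dictionaries stand in for disk blocks, and the function names are hypothetical.

```python
from bisect import bisect_right

# Data file: three blocks of two records each, ordered by employee id.
blocks = {
    "B1": [(1, "A"), (2, "B")],
    "B2": [(3, "C"), (4, "D")],
    "B3": [(5, "E"), (6, "F")],
}

# Dense index: one entry per search key.
dense_index = {1: "B1", 2: "B1", 3: "B2", 4: "B2", 5: "B3", 6: "B3"}

# Sparse index: one entry per block (the first key stored in that block).
sparse_keys = [1, 3, 5]
sparse_blocks = ["B1", "B2", "B3"]


def scan_block(block_id, key):
    """Sequentially scan one block for the key."""
    for k, name in blocks[block_id]:
        if k == key:
            return name
    return None


def dense_lookup(key):
    block_id = dense_index.get(key)
    return scan_block(block_id, key) if block_id else None


def sparse_lookup(key):
    # Find the last index entry whose key is <= the search key, then scan from there.
    pos = bisect_right(sparse_keys, key) - 1
    if pos < 0:
        return None
    return scan_block(sparse_blocks[pos], key)


print(dense_lookup(4), sparse_lookup(4))
```

The sparse index holds half as many entries here, at the cost of a short sequential scan inside the block.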
Multilevel Index
Index records comprise search-key values and data pointers. A multilevel index is stored on the
disk along with the actual database files. As the size of the database grows, so does the size of
the indices. Keeping the index records in main memory speeds up search operations, but a large
single-level index cannot be kept in memory, which leads to multiple disk accesses. A multilevel
index breaks the index into several smaller levels so that the outermost level is small enough
to be kept in main memory.
Hash Organization
Bucket − A hash file stores data in bucket format. Bucket is considered a unit of storage. A
bucket typically stores one complete disk block, which in turn can store one or more records.
Hash Function − A hash function, h, is a mapping function that maps all the set of search-keys
K to the address where actual records are placed. It is a function from search keys to bucket
addresses.
There are two types of hash file organizations –
1. Static Hashing.
2. Dynamic Hashing
Static Hashing
In this method of hashing, the resulting data bucket address is always the same for a given key.
For example, if we generate the address for EMP_ID = 103 using a mod 5 hash function, it always
results in the same bucket address, 3. The bucket address never changes. Hence the number of
data buckets in memory remains constant throughout static hashing. In our example, there are
five data buckets in memory used to store the data.
Operation
Insertion − When a record is required to be entered using static hash, the hash function h
computes the bucket address for search key K, where the record will be stored.
Bucket address = h(K)
Search − When a record needs to be retrieved, the same hash function can be used to retrieve the
address of the bucket where the data is stored.
Delete − This is simply a search followed by a deletion operation.
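The three operations can be sketched with the mod 5 hash function from the example above. Python lists stand in for the five data buckets; the function names and the sample record are assumptions for illustration.

```python
NUM_BUCKETS = 5
buckets = [[] for _ in range(NUM_BUCKETS)]


def h(key: int) -> int:
    """Static hash function: the bucket address never changes for a given key."""
    return key % NUM_BUCKETS


def insert(key, record):
    buckets[h(key)].append((key, record))


def search(key):
    for k, record in buckets[h(key)]:
        if k == key:
            return record
    return None


def delete(key):
    """A search followed by removal of the matching record."""
    bucket = buckets[h(key)]
    for i, (k, _) in enumerate(bucket):
        if k == key:
            del bucket[i]
            return True
    return False


insert(103, "employee 103")
print(h(103), search(103))
```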

Bucket Overflow
The condition of bucket-overflow is known as collision. This is a fatal state for any static hash
function. In this case, overflow chaining can be used.
Overflow Chaining − When buckets are full, a new bucket is allocated for the same hash result
and is linked after the previous one. This mechanism is called Closed Hashing.
Linear Probing − When a hash function generates an address at which data is already stored, the
next free bucket is allocated to it. This mechanism is called Open Hashing.
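Both overflow strategies can be sketched briefly. The bucket capacity of 2, the class name Bucket and the function names are assumptions for illustration; a real DBMS works with disk blocks rather than Python lists.

```python
class Bucket:
    """A fixed-capacity bucket with an optional link to an overflow bucket."""

    def __init__(self, capacity=2):
        self.capacity = capacity
        self.records = []
        self.overflow = None


def chained_insert(bucket, record):
    """Overflow chaining: walk the chain, adding a new bucket when all are full."""
    while len(bucket.records) >= bucket.capacity:
        if bucket.overflow is None:
            bucket.overflow = Bucket(bucket.capacity)
        bucket = bucket.overflow
    bucket.records.append(record)


def probe_insert(slots, key):
    """Linear probing: if the home slot is taken, use the next free slot."""
    n = len(slots)
    start = key % n
    for step in range(n):
        pos = (start + step) % n
        if slots[pos] is None:
            slots[pos] = key
            return pos
    raise RuntimeError("hash file is full")


home = Bucket()
for rec in ("r1", "r2", "r3"):
    chained_insert(home, rec)
print(home.records, home.overflow.records)

slots = [None] * 5
print([probe_insert(slots, k) for k in (103, 108, 113)])
```

Keys 103, 108 and 113 all hash to slot 3 under mod 5, so the second and third are pushed to the following free slots, wrapping around to the start of the table.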
Dynamic Hashing
The problem with static hashing is that it does not expand or shrink dynamically as the size of the
database grows or shrinks. Dynamic hashing provides a mechanism in which data buckets are
added and removed dynamically and on-demand. Dynamic hashing is also known as extended
hashing.
Hash function, in dynamic hashing, is made to produce a large number of values and only a few
are used initially.
Data center
A data center is a facility used to house computer systems and associated components, such as
telecommunications and storage systems. It generally includes redundant or backup power
supplies, redundant data communications connections, environmental controls (e.g. air
conditioning, fire suppression) and various security devices. A large data center is an industrial
scale operation using as much electricity as a small town.
The Tier 1 to Tier 4 classification is a standardized methodology used to define the uptime of a
data center. This is useful for measuring:
a) Data center performance
b) Investment
c) ROI (return on investment)
A Tier 4 data center is considered the most robust and the least prone to failures. Tier 4 is
designed to host mission-critical servers and computer systems, with fully redundant subsystems
(cooling, power, network links, storage etc.) and compartmentalized security zones controlled by
biometric access control methods. The simplest is a Tier 1 data center, used by small businesses
or shops.
 Tier 1 = Non-redundant capacity components (single uplink and servers).
 Tier 2 = Tier 1 + Redundant capacity components.
 Tier 3 = Tier 1 + Tier 2 + Dual-powered equipment and multiple uplinks.
 Tier 4 = Tier 1 + Tier 2 + Tier 3 + all components are fully fault-tolerant including
uplinks, storage, chillers, HVAC systems, servers etc. Everything is dual-powered.
Data Center Availability According To Tiers
The levels also describe the availability of data from the hardware at a location as follows:
 Tier 1: Guaranteeing 99.671% availability.
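An availability percentage is easier to judge when converted into downtime per year. The sketch below does that conversion for the Tier 1 figure, assuming a 365-day year; the function name is hypothetical.

```python
HOURS_PER_YEAR = 365 * 24  # 8760 hours, ignoring leap years


def annual_downtime_hours(availability_percent: float) -> float:
    """Hours per year a site may be unavailable at the given availability."""
    return HOURS_PER_YEAR * (1 - availability_percent / 100)


tier1_downtime = annual_downtime_hours(99.671)
print(round(tier1_downtime, 1))  # roughly 28.8 hours per year
```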
Blade Server
A blade server is a stripped-down server computer with a modular design optimized to minimize
the use of physical space and energy. Blade servers have many components removed to save
space, minimize power consumption and other considerations, while still having all the functional
components to be considered a computer. Unlike a rack-mount server, a blade server needs a
blade enclosure, which can hold multiple blade servers, providing services such as power,
cooling, networking, various interconnects and management. Together, the blades and the blade
enclosure form a blade system. Different blade providers have differing principles regarding
what to include in the blade itself, and in the blade system as a whole.
RAID
RAID is a technology that is used to increase the performance and/or reliability of data storage.
The abbreviation stands for Redundant Array of Inexpensive Disks. A RAID system consists of
two or more drives working in parallel.
This section covers the following RAID levels:
 RAID 0 – striping
 RAID 1 – mirroring
 RAID 5 – striping with parity
 RAID 6 – striping with double parity
 RAID 10 – combining mirroring and striping
RAID level 0 – Striping
Advantages
 RAID 0 offers great performance in both read and write operations. There is no
overhead caused by parity controls.
 All storage capacity is used, there is no overhead.
 The technology is easy to implement.
Disadvantages
 RAID 0 is not fault-tolerant. If one drive fails, all data in the RAID 0 array are lost. It
should not be used for mission-critical systems.
RAID level 1 – Mirroring
Advantages
 RAID 1 offers excellent read speed and a write-speed that is comparable to that of a
single drive.
 In case a drive fails, data do not have to be rebuilt, they just have to be copied to the
replacement drive.
 RAID 1 is a very simple technology.
Disadvantages
 The main disadvantage is that the effective storage capacity is only half of the total drive
capacity because all data get written twice.
 Software RAID 1 solutions do not always allow a hot swap of a failed drive. That means
the failed drive can only be replaced after powering down the computer it is attached to.
For servers that are used simultaneously by many people, this may not be acceptable.
Such systems typically use hardware controllers that do support hot swapping.
RAID level 5 – Striping with parity
Advantages
 Read data transactions are very fast while write data transactions are somewhat slower
(due to the parity that has to be calculated).
 If a drive fails, you still have access to all data, even while the failed drive is being
replaced and the storage controller rebuilds the data on the new drive.
Disadvantages
 Drive failures have an effect on throughput, although this is still acceptable.
 This is complex technology. If one of the disks in an array using 4TB disks fails and is
replaced, restoring the data (the rebuild time) may take a day or longer, depending on the
load on the array and the speed of the controller. If another disk goes bad during that
time, data are lost forever.
RAID level 6 – Striping with double parity
Advantages
 Like with RAID 5, read data transactions are very fast.
 If two drives fail, you still have access to all data, even while the failed drives are being
replaced. So RAID 6 is more secure than RAID 5.
Disadvantages
 Write data transactions are slower than RAID 5 due to the additional parity data that
have to be calculated. One report put the write performance about 20% lower.
 Drive failures have an effect on throughput, although this is still acceptable.
 This is complex technology. Rebuilding an array in which one drive failed can take a
long time.
RAID level 10 – combining RAID 1 & RAID 0
Advantages
 If something goes wrong with one of the disks in a RAID 10 configuration, the rebuild
time is very fast since all that is needed is copying all the data from the surviving mirror
to a new drive. This can take as little as 30 minutes for drives of 1 TB.
Disadvantages
 Half of the storage capacity goes to mirroring, so compared to large RAID 5 or RAID 6
arrays, this is an expensive way to have redundancy.
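The capacity trade-offs between the levels can be compared with a small calculation. This is a sketch under stated assumptions: all drives have the same size, mirrored levels use two-way mirroring, and the minimum drive counts are the commonly quoted ones; the function name is hypothetical.

```python
def usable_capacity_tb(level: int, drives: int, drive_tb: float) -> float:
    """Usable capacity for the RAID levels covered above (two-way mirroring assumed)."""
    minimum = {0: 2, 1: 2, 5: 3, 6: 4, 10: 4}
    if level not in minimum:
        raise ValueError(f"unsupported RAID level: {level}")
    if drives < minimum[level]:
        raise ValueError(f"RAID {level} needs at least {minimum[level]} drives")
    if level in (1, 10) and drives % 2:
        raise ValueError("mirrored levels need an even number of drives")
    if level == 0:
        return drives * drive_tb          # striping only, no redundancy
    if level in (1, 10):
        return drives * drive_tb / 2      # every block is written twice
    if level == 5:
        return (drives - 1) * drive_tb    # one drive's worth of parity
    return (drives - 2) * drive_tb        # RAID 6: two drives' worth of parity


# Four 4 TB drives under each level that accepts four drives.
for level in (0, 5, 6, 10):
    print(level, usable_capacity_tb(level, 4, 4))
```

With four drives, RAID 6 and RAID 10 give the same usable capacity; the advantage of parity-based levels over mirroring only shows with larger arrays.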
Big Data
 Big data is a term that describes the large volume of data – both structured and
unstructured – that inundates a business on a day-to-day basis.
Why Big Data
 Increase of storage capacities
 Increase of processing power
 Availability of data
 Every day we create 2.5 quintillion bytes of data; 90% of the data in the world today has
been created in the last two years alone
Sources of Big Data
 Social networking sites: Facebook, Google and LinkedIn all generate huge amounts of
data every day, as they have billions of users worldwide.
 E-commerce sites: Sites like Amazon, Flipkart and Alibaba generate huge volumes of logs
from which users' buying trends can be traced.
 Weather stations: Weather stations and satellites produce very large volumes of data, which
are stored and processed to forecast the weather.
 Telecom companies: Telecom giants like Airtel and Vodafone study user trends and publish
their plans accordingly; for this they store the data of their millions of users.
 Share market: Stock exchanges across the world generate huge amounts of data through
their daily transactions.
3V's of Big Data

 Velocity: The data is increasing at a very fast rate. It is estimated that the volume of data
will double every 2 years.
 Variety: Nowadays data is not stored only in rows and columns. Data is structured as well
as unstructured. Log files and CCTV footage are unstructured data. Data that can be saved
in tables, like the transaction data of a bank, is structured data.
 Volume: The amount of data we deal with is very large, on the order of petabytes.
Issues
 Huge amount of unstructured data which needs to be stored, processed and analyzed
Solution
 Storage: To store this huge amount of data, Hadoop uses HDFS (Hadoop Distributed File
System), which uses commodity hardware to form clusters and stores data in a
distributed fashion. It works on the write once, read many times principle.
 Processing: The MapReduce paradigm is applied to the data distributed over the network
to find the required output.
 Analyze: Pig and Hive can be used to analyze the data.
 Cost: Hadoop is open source, so cost is no longer an issue.
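The MapReduce paradigm mentioned under Processing can be illustrated with a word count. This is a single-machine sketch in plain Python, not the Hadoop API: the two strings stand in for chunks of a file stored on different nodes, and the function names are hypothetical.

```python
from collections import defaultdict


def map_phase(chunk):
    """Map: emit a (word, 1) pair for every word in one chunk of input."""
    return [(word.lower(), 1) for word in chunk.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Reduce: combine the values of each key into one result."""
    return {key: sum(values) for key, values in groups.items()}


chunks = ["big data big", "data cluster"]
pairs = [pair for chunk in chunks for pair in map_phase(chunk)]
word_counts = reduce_phase(shuffle(pairs))
print(word_counts)
```

In a real cluster the map calls run in parallel on the nodes that hold each chunk, and only the grouped pairs travel over the network.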
SQL Vs NoSQL
SQL | NoSQL
SQL databases are categorized as Relational Database Management Systems (RDBMS). | NoSQL databases are categorized as non-relational or distributed database systems.
SQL databases have a fixed, static or predefined schema. | NoSQL databases have a dynamic schema.
SQL databases display data in the form of tables, so they are known as table-based databases. | NoSQL databases display data as collections of key-value pairs, documents, graph databases or wide-column stores.
SQL databases are vertically scalable. | NoSQL databases are horizontally scalable.
SQL databases use a powerful language, Structured Query Language, to define and manipulate the data. | In NoSQL databases, collections of documents are used to query the data. It is also called unstructured query language. It varies from database to database.
SQL databases are best suited for complex queries. | NoSQL databases are not so good for complex queries because their queries are not as powerful as SQL queries.
SQL databases are not best suited for hierarchical data storage. | NoSQL databases are best suited for hierarchical data storage.
MySQL, Oracle, SQLite, PostgreSQL, MS-SQL etc. are examples of SQL databases. | MongoDB, BigTable, Redis, RavenDB, Cassandra, HBase, Neo4j, CouchDB etc. are examples of NoSQL databases.
Oracle Vs Mysql
Feature | Oracle | MySQL
Strengths | Aircraft carrier database capable of running large OLTP and VLDBs. | Price/performance. Great performance when applications leverage the architecture.
Database Product | Enterprise ($$$$); Standard ($$); Standard One ($); Express (free), up to 4GB | Enterprise ($), supported, more stable; Community (free)
Application Perspective | The more you do in the database, the more you will love Oracle, with compiled PL/SQL, XML, APEX, Java, etc. | Web applications often don't leverage database server functionality. Web apps are more concerned with fast reads.
Administration | Requires lots of in-depth knowledge and skill to manage large environments. Can get extremely complex but also very powerful. | Can be trivial to get it set up and running. Large and advanced configurations can get complex.
Popularity | Extremely popular in Fortune 100, medium/large enterprise business applications and medium/large data warehouses. | Extremely popular with web companies, startups, small/medium businesses and small/medium projects.
Application Domain | Medium/large OLTP and enterprise applications. Oracle excels in large business applications. Medium/large data warehouses. | Web (MySQL excels); data warehouse; gaming; small/medium OLTP environments.
Development Environments | 1) Java 2) .NET 3) APEX 4) Ruby on Rails 5) PHP | 1) PHP 2) Java 3) Ruby on Rails 4) .NET 5) Perl
Database Server (Instance) | Database instance has numerous background processes dependent on configuration. System Global Area is shared memory for SMON, PMON, DBWR, LGWR, ARCH, RECO, etc. Sessions are managed through server processes. | Database instance stores global memory in the mysqld background process. User sessions are managed through threads.
Database Server | Uses tablespaces for system metadata, user data and indexes. Common tablespaces include | Made up of database schemas.
Partitioning | $$$ with lots of options | Free, basic features
Replication | $$$, lots of features and options. Much higher complexity with a lot of features. Allows a lot of data filtering and manipulation. | Free, relatively easy to set up and manage. Basic features but works great. Great horizontal scalability.
Transactions | Regular and index-only tables support transactions. | InnoDB and upcoming Falcon and Maria storage engines
Backup/Recovery | Recovery Manager (RMAN) supports hot backups and runs as a separate central repository for multiple Oracle database servers. | No online backup built in.
Export/Import | More features. | Easy, very basic.
Data Dictionary (catalog) | Data dictionary offers lots of detailed information for tuning. Oracle starting to charge for use of new metadata structures. | Information_schema and mysql database schemas offer basic metadata.
Management/Monitoring | $$$$, Grid Control offers lots of functionality. Lots of 3rd party options such as Quest. | $, MySQL Enterprise Monitor offers basic functionality. Additional open source solutions. May also use admin scripts.
Storage | Tables managed in tablespaces. ASM offers striping and mirroring using cheap fast disks. | Each storage engine uses different storage. Varies from individual files to tablespaces.
Stored Procedures | Advanced features, runs interpreted or compiled. Lots of built-in packages add significant functionality. Extremely scalable. | Very basic features, runs interpreted in session threads. Limited scalability.
Comparison between Rack and Blade Servers
Definition
  Rack: Rack servers are also known as traditional servers. They are essentially stand-alone computers on which applications are run. All the components, like hard drives, a network card, etc., are contained in a case.
  Blade: A blade server is a stripped-down computer server that is based on a modular design. It minimizes the use of physical space.
Origin
  Rack: Rack servers are specially designed to be stored in racks, hence the name rack server.
  Blade: The name comes from the word “blade”, indicating the restricted format.
Focus
  Rack: Rack servers are very expandable.
  Blade: Comparatively less expandable.
Power Demand
  Rack: More
  Blade: Less
Maintenance
  Rack: More
  Blade: Less
Cost
  Rack: More
  Blade: Less
Size
  Rack: Comparatively large
  Blade: Compact
Cabling
  Rack: More
  Blade: Less
Suitable for
  Rack: Small business
  Blade: Extended organizations
Benefits
  Rack:
    • Make it easy to keep things neat and orderly (most include some kind of cable management)
    • Known to be very expandable
    • Many rack servers support large amounts of RAM
  Blade:
    • Lower acquisition cost
    • Lower operational cost for deployment
    • Lower cost for troubleshooting and repair
    • Lower power requirements
    • Lower space and cooling requirements
    • Reduces the cabling requirements
    • Very efficient on out-of-band management
    • Allows faster server-to-server communication
    • They offer greater flexibility
Configurations
  Rack: Available in multiple U iterations.
  Blade: Only available in 2U configurations.
Example
  Rack: Dell PowerEdge R320, R420 and R520.
  Blade: Dell PowerEdge M series.
Design
  Rack: Stand-alone
  Blade: Modular
Disadvantages
  Rack: Consumes more physical rack space.
  Blade: Dependence on the chassis.
Mount inside a
  Rack: Special rack
  Blade: Chassis
Model Test
1. Which one of the following attributes can be taken as a primary key?
a) Name
b) Street
c) Id
d) Department
2. Which one of the following is a set of one or more attributes taken collectively to
uniquely identify a record?
a) Candidate key
b) Sub key
c) Super key
d) Foreign key
3. In SQL, which command is used to add a column/integrity constraint to a table-
a. ADD COLUMN b. INSERT COLUMN c. MODIFY TABLE d. ALTER TABLE
4. In SQL, which command is used to enable/disable a database trigger?
a. ALTER TRIGGER b. ALTER DATABASE c. ALTER TABLE d. MODIFY TRIGGER
5. In a relational schema, each tuple is divided into fields called-
a. Relations b. Domains c. Queries d. All of the above
6. In SQL, which command is used to change data in a table?
a. UPDATE b. INSERT c. BROWSE d. APPEND
7. The _____ operation allows the combining of two relations by merging pairs of
tuples, one from each relation, into a single tuple.
a) Select
b) Join
c) Union
d) Intersection
8. In a large DBMS-
a. each user can “see” only a small part of the entire database
b. each user can access every sub-schema
c. each subschema contains every field in the logical schema
d. All of the above
9. Which of the following command(s) is used to recompile a stored procedure in
SQL?
a. COMPILE PROCEDURE b. ALTER PROCEDURE
c. MODIFY PROCEDURE d. All of the above
10. Internal auditors should review data system design before they are-
a. developed b. implemented c. modified d. All of the above
11. A ____ means that one record in a particular record type may be related to more
than one record of another record type.
a. One-to-one relationship b. One-to-many relationship
c. Many-to-one relationship d. Many-to-many relationship
12. Which command is used to redefine a column of the table in SQL?
a. ALTER TABLE b. DEFINE TABLE c. MODIFY TABLE d. All of the above
13. Which command is used to enable/disable/drop an integrity constraint in SQL?
a. DEFINE TABLE b. MODIFY TABLE c. ALTER TABLE d. All of the above
14. An attribute A of datatype varchar(20) has the value “Avi”. The attribute B of
datatype char(20) has value “Reed”. Here attribute A has ____ spaces and attribute
B has ____ spaces.
a) 3, 20
b) 20, 4
c) 20, 20
d) 3, 4
15. The language used in application programs to request data from the DBMS is
referred to as the-
a. DML b. DDL c. query language d. All of the above
16. A database management system might consist of application programs and a
software package called
a. FORTRAN b. AUTOFLOW c. BPL d. TOTAL
17. An audit trail
a. is used to make back-up copies b. is the recorded history of operations performed on a file
c. can be used to restore lost information d. All of the above
18. A race condition occurs when
a. Two concurrent activities interact to cause a processing error
b. two users of the DBMS are interacting with different files at the same time
c. both (a) and (b)
d. None of the above
19. An indexing operation
a. sorts a file using a single key b. sorts using two keys
c. establishes an index for a file d. both (b) and (c)
20. The on-line softcopy display of a customer’s charge account to respond to an inquiry
is an example of a
a. forecasting report b. exception report
c. regularly scheduled report d. on demand report
21. In SQL, which command is used to create a synonym for a schema object?
a. CREATE SCHEMA b. CREATE SYNONYM
c. CREATE SAME d. All of the above
22. If you want your database to include methods, you should use a _______database.
a. Network b. Distributed c. Hierarchical d. Object-Oriented
23. In SQL, which of the following is not a Data Manipulation Language command?
a. DELETE b. SELECT c. UPDATE d. CREATE
24. Which of the following is not characteristic of a relational database model?
a. tables b. treelike structure c. complex logical relationships d. records
25. A computer file contains several records. What does each record contain?
a. Bytes b. Words c. Fields d. Database
26. In SQL, the CREATE VIEW command is used
a. to recompile view b. to define a view of one or more tables or views
c. to recompile a table d. to create a trigger
27. A ______ contains the smallest unit of meaningful data, so you might call it the
basic building block for a data file.
a. File Structure b. Records c. Fields d. Database
28. CREATE TABLE employee (name VARCHAR, id INTEGER)
What type of statement is this?
a) DML
b) DDL
c) View
d) Integrity constraint
29. In SQL, which command is used to create a database user?
a. ADD USER TO DATABASE b. MK USER
c. CREATE USER d. All of the above
30. A _____ means that one record in a particular record type is related to only one
record of another record type.
a. One-to-one relationship b. One-to-many relationship
c. Many-to-one relationship d. Many-to-many relationship
Model Test Answer
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
c c d a b a b a b d b a c a a
16 17 18 19 20 21 22 23 24 25 26 27 28 29 30
c b a c d b d d b c b c b c a
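
Several of the SQL answers above (questions 3, 6, 7, 12, 26 and 28) can be checked by running the statements. The following is a minimal sketch, assuming Python's built-in sqlite3 module and an in-memory database. The department table, the salary column and the sample rows are hypothetical and added only for illustration. SQLite's dialect is narrower than Oracle's, so statements such as ALTER TRIGGER, CREATE SYNONYM and CREATE USER are not covered here.

```python
import sqlite3


def run_demo():
    """Run a few statements from the model test against an in-memory SQLite database."""
    conn = sqlite3.connect(":memory:")
    cur = conn.cursor()

    # Q28: CREATE TABLE is DDL, because it defines structure rather than data.
    # Q1: id is the natural choice of primary key.
    cur.execute("CREATE TABLE employee (name VARCHAR, id INTEGER PRIMARY KEY)")
    # Hypothetical second table, used only to demonstrate a join.
    cur.execute("CREATE TABLE department (emp_id INTEGER, dept VARCHAR)")

    # Q3 / Q12: ALTER TABLE changes an existing table, here by adding a column.
    cur.execute("ALTER TABLE employee ADD COLUMN salary INTEGER")

    # DML: INSERT adds rows; UPDATE (Q6) changes data already in the table.
    cur.executemany(
        "INSERT INTO employee (name, id, salary) VALUES (?, ?, ?)",
        [("Avi", 1, 100), ("Reed", 2, 200)],
    )
    cur.executemany(
        "INSERT INTO department (emp_id, dept) VALUES (?, ?)",
        [(1, "IT"), (2, "HR")],
    )
    cur.execute("UPDATE employee SET salary = 150 WHERE id = 1")

    # Q26: CREATE VIEW defines a view over one or more tables.
    # Q7: JOIN merges pairs of tuples, one from each relation, into one tuple.
    cur.execute(
        """
        CREATE VIEW employee_dept AS
        SELECT e.id, e.name, e.salary, d.dept
        FROM employee AS e
        JOIN department AS d ON e.id = d.emp_id
        """
    )

    rows = cur.execute("SELECT * FROM employee_dept ORDER BY id").fetchall()
    # PRAGMA table_info is SQLite-specific; index 1 of each row is the column name.
    columns = [info[1] for info in cur.execute("PRAGMA table_info(employee)")]
    conn.close()
    return rows, columns


if __name__ == "__main__":
    rows, columns = run_demo()
    print(columns)
    print(rows)
```

The view returns one merged row per employee, and the column list shows that ALTER TABLE added salary to the existing table without recreating it.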
