Database Management System
Data:
Data consists of facts and statistics, either stored or flowing over a network. It is generally raw
and unprocessed. Example: “The price of oil is 40 tk per litre” is a piece of data; it is a raw value.
Database:
A Database is a collection of related data organised in a way that data can be easily accessed,
managed and updated.
Database Management System
A database management system (DBMS) is software used to manage a database. For
example, MySQL and Oracle are popular database management systems used in many
different applications.
A DBMS provides an interface for operations such as creating a database, creating tables in it,
storing data, updating data and much more.
It also provides protection and security for the database. When multiple users access the data, it
maintains data consistency.
RDBMS
A relational database management system (RDBMS) is a database management system (DBMS)
that is based on the relational model.
Hiding irrelevant details from the user is called data abstraction. There are three
levels of abstraction:
Physical level: This is the lowest level of data abstraction. It describes how data is actually stored
in the database. The details of complex data structures are found at this level.
Logical level: This is the middle level of the three-level data abstraction architecture. It describes what
data is stored in the database.
View level: This is the highest level of data abstraction. It describes the user's interaction with
the database system.
Cloud IT Solution Page 228
Example: Suppose we are storing customer information in a customer table. At the physical
level, these records can be described as blocks of storage (bytes, gigabytes, terabytes, etc.) in
memory. These details are usually hidden from programmers.
At the logical level, these records can be described as fields and attributes along with their data
types, and the relationships among them can be logically implemented. Programmers
generally work at this level because they are aware of such details of the database system.
At the view level, users simply interact with the system through a GUI and enter details on the
screen. They are not aware of how the data is stored or what data is stored; such details are
hidden from them.
DBMS Architecture
1. Single tier architecture
In single-tier architecture, the database is directly available on the client machine; a request
made by the client does not need a network connection to act on the database.
For example, suppose you want to fetch employee records and the database is on your own
computer. The request is made by your computer and the records are fetched from the
database by your computer as well. This type of system is generally referred to as a local
database system.
2. Two tier architecture
In two-tier architecture, the database system is on the server machine and the DBMS
application is on the client machine. The two machines are connected to each other
through a reliable network.
Whenever the client machine requests access to the database on the server using a query
language such as SQL, the server performs the request on the database and returns the result to the
client. Application connection interfaces such as JDBC and ODBC are used for the interaction
between server and client.
3. Three tier architecture
In three-tier architecture, another layer sits between the client machine and the server machine.
The client application does not communicate directly with the database system
on the server machine. Instead, the client application communicates with a server application,
and the server application communicates internally with the database system on the server.
Table
In the relational database model, a table is a collection of data elements organized in rows
and columns. A table is also considered a convenient representation of a relation, but a table
can contain duplicate rows while a true relation cannot. A table is the
simplest form of data storage. Below is an example of an Employee table.
ID   Name     Age   Salary
1    Adam     34    13000
2    Alex     28    15000
3    Stuart   20    18000
4    Ross     42    19020
In systems such as Oracle and MySQL, all DDL (Data Definition Language) commands are
auto-committed. That means they save all changes permanently in the database.
Command    Description
create     creates a new table or database
alter      alters the structure of an existing table
truncate   deletes all data from a table
drop       drops a table
rename     renames a table
SQL Commands
1. Create table employee (name varchar, id integer). What type of statement is this?
[Combined(AP)-2018]
a) DML b) DDL c) View d) Integrity constraint
Ans.: b
2. The SQL statement that queries or reads data from a table is -- [ICB(AP)-2017]
a) SELECT b) READ c) QUERY d) None of the above
Ans.: a
3. To remove duplicate rows from the result of an SQL SELECT statement, the -------
qualifier must be included. [Combined(AME)-2018]
a) Only b) Distinct c) Unique d) Single
Ans.: b
SQL Command:
CREATE TABLE table_name (
    column1 datatype,
    column2 datatype,
    column3 datatype
);
CREATE TABLE Persons (
    ID int NOT NULL,
    LastName varchar(255) NOT NULL,
    FirstName varchar(255),
    Age int,
    CONSTRAINT PK_Person PRIMARY KEY (ID, LastName)
);
ALTER TABLE Persons
ADD CONSTRAINT PK_Person PRIMARY KEY (ID, LastName);
ALTER TABLE Persons
DROP CONSTRAINT PK_Person;
-- The foreign key examples assume a Persons table whose key column is PersonID.
CREATE TABLE Orders (
    OrderID int NOT NULL,
    OrderNumber int NOT NULL,
    PersonID int,
    PRIMARY KEY (OrderID),
    CONSTRAINT FK_PersonOrder FOREIGN KEY (PersonID)
    REFERENCES Persons(PersonID)
);
ALTER TABLE Orders
ADD FOREIGN KEY (PersonID) REFERENCES Persons(PersonID);
ALTER TABLE Orders
DROP CONSTRAINT FK_PersonOrder;
DROP TABLE Shippers;
TRUNCATE TABLE table_name;
ALTER TABLE table_name
ADD column_name datatype;
UPDATE table_name
SET column1 = value1, column2 = value2, ...
WHERE condition;
Default Method
DECODE:
DECODE (supplier_id, 10000, 'IBM',
                     10001, 'Microsoft',
                     10002, 'Hewlett Packard',
                     'Gateway') result
CASE:
CASE supplier_id
    WHEN 10000 THEN 'IBM'
    WHEN 10001 THEN 'Microsoft'
    WHEN 10002 THEN 'Hewlett Packard'
    ELSE 'Gateway'
END
Student Table
In the Student table, the primary key is the Id column. The Id column of the Stu_subject
table is a foreign key that references the Student table.
Id   Student   Subject
1    Adam      Biology
1    Adam      Maths
2    Alex      Maths
3    Stuart    Maths
Now, if somebody tries to insert a row into the Stu_subject table with an Id that does not exist
in the Student table, an error will be shown.
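The foreign key check described above can be tried out with a small sketch. It uses Python's built-in sqlite3 module rather than any particular server DBMS, and the column layout (Id, Name, Subject) is an assumption made for illustration. Note that SQLite only enforces foreign keys after `PRAGMA foreign_keys = ON`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled

# Parent table: Id is the primary key.
conn.execute("CREATE TABLE Student (Id INTEGER PRIMARY KEY, Name TEXT)")
# Child table: Id must match an existing Student.Id.
conn.execute(
    "CREATE TABLE Stu_subject ("
    " Id INTEGER REFERENCES Student(Id),"
    " Subject TEXT)"
)
conn.executemany(
    "INSERT INTO Student VALUES (?, ?)",
    [(1, "Adam"), (2, "Alex"), (3, "Stuart")],
)

# Accepted: a student with Id 1 exists.
conn.execute("INSERT INTO Stu_subject VALUES (1, 'Biology')")

# Rejected: there is no student with Id 9.
try:
    conn.execute("INSERT INTO Stu_subject VALUES (9, 'Maths')")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

row_count = conn.execute("SELECT COUNT(*) FROM Stu_subject").fetchone()[0]
```

After running this, `rejected` is True and only the valid row remains in Stu_subject.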
Example 1: If a relation is in 2NF then:
(a) Every candidate key is a primary key
(b) Every non-prime attribute is fully functionally dependent on each relation key
(c) Every attribute is functionally independent
(d) Every relational key is a primary key
Ans.: A table is in 2NF if it is in 1NF and every non-prime attribute of the table depends on the
complete candidate key, that is, no non-prime attribute is partially dependent on any key.
Partially dependent means dependent on a proper subset of a key.
So the answer is (b).
Company   CEO       Age
Alibaba   Jack Ma   54
{Company} -> {CEO} (if we know the company, we know its CEO's name)
{CEO} -> {Age} (if we know the CEO, we know the age)
Therefore, according to the rule of transitive dependency,
{Company} -> {Age} should hold. That makes sense: if we know the company name, we
can know its CEO's age.
In mathematical form: a functional dependency is said to be transitive if it is
indirectly formed by two functional dependencies. For example,
X -> Z is a transitive dependency if the following three conditions hold true:
X -> Y
Y does not -> X
Y -> Z
Note: A transitive dependency can only occur in a relation of three or
more attributes.
For a table to be in Third Normal Form (3NF), it must be in Second Normal Form and its
transitive functional dependencies must be removed. For example, consider a table with the
following fields.
Student_Detail Table
Student_id   Student_name   DOB   Street   City   State   Zip
In this table Student_id is the primary key, but Street, City and State depend on Zip. Because
Student_id determines Zip and Zip determines these fields, they depend on Student_id only
transitively. Hence, to apply 3NF, we need to move Street, City and State to a new table,
with Zip as the primary key.
New Student_Detail Table:
Student_id   Student_name   DOB   Zip
Address Table:
Zip   Street   City   State
The advantages of removing transitive dependency are:
The amount of data duplication is reduced.
Data integrity is achieved.
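As a rough illustration of the decomposition above, the following sketch checks functional dependencies on a few sample rows and then splits the table on Zip. The rows, the zip codes and the helper name `fd_holds` are made up for the example, and only the City column is carried along to keep it short. A check on sample rows can only show that a dependency is not violated by that data; the real dependencies come from the business rules.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if the functional dependency lhs -> rhs holds in rows."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        value = tuple(row[a] for a in rhs)
        # The same left-hand side must always map to the same right-hand side.
        if seen.setdefault(key, value) != value:
            return False
    return True

# Hypothetical sample data for the Student_Detail table.
students = [
    {"Student_id": 1, "Student_name": "Adam", "Zip": "1205", "City": "Dhaka"},
    {"Student_id": 2, "Student_name": "Alex", "Zip": "1205", "City": "Dhaka"},
    {"Student_id": 3, "Student_name": "Stuart", "Zip": "4000", "City": "Chittagong"},
]

# Student_id -> Zip and Zip -> City, so City depends on Student_id transitively.
id_to_zip = fd_holds(students, ["Student_id"], ["Zip"])
zip_to_city = fd_holds(students, ["Zip"], ["City"])
zip_to_id = fd_holds(students, ["Zip"], ["Student_id"])

# 3NF decomposition: keep Zip in Student_Detail, move City to an Address table.
student_detail = [
    {k: row[k] for k in ("Student_id", "Student_name", "Zip")} for row in students
]
address = {row["Zip"]: row["City"] for row in students}
```

The city "Dhaka" is stored twice in the original rows but only once in `address`, which is the reduction in duplication mentioned above.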
The Boyce-Codd Normal Form
A relational schema R is in Boyce–Codd normal form (BCNF) if, for every one of its
dependencies X → Y, one of the following conditions holds true:
X → Y is a trivial functional dependency (i.e., Y is a subset of X)
X is a super key for schema R
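The two BCNF conditions can be checked mechanically. The sketch below is one possible way to do it, not a standard library routine: it computes the attribute closure of each left-hand side and treats it as a super key if the closure covers the whole schema. The function names and the Company/CEO/Age schema (borrowed from the transitive dependency example) are illustrative assumptions.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) set pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # If we already have every attribute of lhs, we also get rhs.
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def is_bcnf(schema, fds):
    """True if every given dependency is trivial or has a super key on the left."""
    for lhs, rhs in fds:
        trivial = rhs <= lhs
        super_key = closure(lhs, fds) >= schema
        if not (trivial or super_key):
            return False
    return True

schema = {"Company", "CEO", "Age"}
fds = [({"Company"}, {"CEO"}), ({"CEO"}, {"Age"})]

# CEO -> Age violates BCNF because CEO alone is not a super key of the schema.
whole_table_ok = is_bcnf(schema, fds)

# After splitting into (Company, CEO) and (CEO, Age), each part passes.
part1_ok = is_bcnf({"Company", "CEO"}, [({"Company"}, {"CEO"})])
part2_ok = is_bcnf({"CEO", "Age"}, [({"CEO"}, {"Age"})])
```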
1. Atomicity: This property ensures that either all the operations of a transaction are reflected in
the database or none are, i.e. “commit all or nothing”. Consider a banking
example:
Suppose account A has a balance of tk 400 and account B has tk 700. Account A is
transferring tk 100 to account B. This transaction has two operations:
a) Debiting tk 100 from A’s balance
A = old_balance - 100
b) Crediting tk 100 to B’s balance
B = old_balance + 100
Suppose the first operation succeeds while the second fails. In this case A’s balance
would be tk 300 while B would have tk 700 instead of tk 800. This is unacceptable
in a banking system. Either the transaction should fail without executing any of its
operations or it should process both operations. The atomicity property ensures that.
2. Consistency: Every attribute in the database has rules that ensure the stability of the
database. The constraints placed on data values must hold both before and after the
execution of a transaction. If the system fails because of invalid data during an
operation, it reverts to its previous state.
Example: The total amount in accounts A and B should be the same before and after the
transaction. The sum of the money in A and B before the transaction is 400 + 700 = tk 1100, and
after the transaction it is 300 + 800 = tk 1100. So this transaction preserves consistency.
3. Isolation: When multiple transactions run on a single database, the operations
of one transaction should not interfere with the operations of another. Each
transaction should execute as if it were isolated from the others (that is, running
transactions concurrently should give the same result as running them one after another). For
example, account A has a balance of tk 400 and is transferring tk 100 to each of accounts
B and C.
So we have two transactions here. Suppose these transactions run concurrently and both
read the balance tk 400. In that case the final balance of A would be tk 300
instead of tk 200. This is wrong. If the transactions were run in isolation, the second
transaction would have read the correct balance of tk 300 (before debiting tk 100) once the
first transaction had completed successfully.
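The atomicity and consistency examples above can be reproduced with a short sketch. It uses Python's sqlite3 module; the table name `account`, the helper `transfer` and the non-existent account "Z" are assumptions for illustration. The `with conn:` block commits if both updates succeed and rolls back if an exception escapes, so a failed credit also undoes the debit.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (name TEXT PRIMARY KEY, balance INTEGER)")
conn.executemany("INSERT INTO account VALUES (?, ?)", [("A", 400), ("B", 700)])
conn.commit()

def transfer(conn, src, dst, amount):
    """Move amount from src to dst as one all-or-nothing transaction."""
    try:
        with conn:  # commit on success, roll back if an exception escapes
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
            cur = conn.execute(
                "UPDATE account SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
            if cur.rowcount != 1:
                # The credit did not happen, so the debit must be undone too.
                raise ValueError("unknown destination account")
        return True
    except ValueError:
        return False

def balances(conn):
    return dict(conn.execute("SELECT name, balance FROM account"))

ok_first = transfer(conn, "A", "B", 100)    # both operations succeed
ok_second = transfer(conn, "A", "Z", 100)   # credit fails, debit is rolled back
final = balances(conn)
```

After the failed second transfer, A still has tk 300 and the total remains tk 1100.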
Key types
Candidate Key
Primary Key
Unique Key
Alternate Key
Composite Key
Super Key
Foreign Key
Surrogate Key
Primary key
A primary key (PK) is a column or set of columns that uniquely identifies each row in the table.
To create a primary key, define a PRIMARY KEY constraint when you
create or modify a table.
When multiple columns are used as a primary key, it is known as a composite primary key.
Points to remember for primary key
Primary key enforces the entity integrity of the table.
Primary key always has unique data.
In some systems, such as SQL Server, the length of a primary key cannot exceed 900 bytes.
A primary key cannot have a null value.
There can be no duplicate value for a primary key.
A table can contain only one primary key constraint.
Main advantage of primary key
Because primary key values are unique (and are usually indexed), rows can be accessed quickly.
Foreign key
In relational databases, a foreign key is a field or column that is used to establish a link
between two tables.
In simple words, a foreign key in one table points to the primary key in another
table.
Super key
Super key = candidate key + zero or more attributes.
Every candidate key is a super key,
but not every super key is a candidate key.
A super key is an attribute or set of attributes that uniquely identifies a tuple within a relation.
However, a super key may contain additional attributes that are not necessary for unique
identification.
Example 1: A super key for an entity consists of
a) One attribute only
b) At least two attributes
c) At most two attributes
d) One or more attributes
Answer: d
Candidate key
A candidate key is a super key such that no proper subset of it is a super key within the relation.
It therefore has two properties: each candidate key uniquely identifies a tuple in the relation, and
no proper subset of the candidate key has the uniqueness property.
Alternate key
An alternate key is any candidate key that has not been selected as the primary key.
Composite key
A key that consists of more than one attribute is a composite key.
It may be a candidate key or a primary key.
Surrogate Key
Surrogate keys are keys that have no business meaning and are used solely to identify a record in
the table. A surrogate key is not derived from application data. It is generated internally
by the system and is invisible to the user or application.
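The relationship between super keys and candidate keys can be illustrated with a small sketch that searches for minimal unique attribute sets in some sample rows. The rows, the attribute names and the function names are invented for the example. Keep in mind that real keys are declared from the meaning of the data; sample rows can only show which attribute sets happen to be unique in that sample.

```python
from itertools import combinations

def is_super_key(rows, attrs):
    """True if no two rows agree on all attributes in attrs."""
    projected = [tuple(row[a] for a in attrs) for row in rows]
    return len(set(projected)) == len(projected)

def candidate_keys(rows, attributes):
    """Minimal super keys, found by trying smaller attribute sets first."""
    keys = []
    for size in range(1, len(attributes) + 1):
        for attrs in combinations(attributes, size):
            if any(set(k) <= set(attrs) for k in keys):
                continue  # contains a smaller key: a super key, but not minimal
            if is_super_key(rows, attrs):
                keys.append(attrs)
    return keys

# Hypothetical rows: ID and Email are unique, Name and Age are not.
rows = [
    {"ID": 1, "Email": "adam@example.com", "Name": "Adam", "Age": 34},
    {"ID": 2, "Email": "alex@example.com", "Name": "Alex", "Age": 28},
    {"ID": 3, "Email": "adam2@example.com", "Name": "Adam", "Age": 28},
    {"ID": 4, "Email": "adam3@example.com", "Name": "Adam", "Age": 34},
]
attributes = ["ID", "Email", "Name", "Age"]

keys = candidate_keys(rows, attributes)
id_name_is_super_key = is_super_key(rows, ("ID", "Name"))
```

Here (ID, Name) is a super key but not a candidate key, because its proper subset (ID) is already unique. If ID is chosen as the primary key, Email becomes an alternate key.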
What are the integrity rules in DBMS?
Data integrity is a significant aspect of maintaining a database. It is
enforced in the database system by imposing a series of rules, known as
the integrity rules.
There are two integrity rules in DBMS:
Entity Integrity: It specifies that "Primary key cannot have a NULL value."
Referential Integrity: It specifies that "Foreign key can be either a NULL value or should be the
primary key value of another relation."
1. Which of the following is a group of one or more attributes that uniquely
identifies a row? [ICB(AP)-2017]
a) Key b) Determinant
c) Tuple d) Relation
Ans.: a
2. In an Entity-Relationship model, a many-to-many relationship corresponds to a ----
in the actual database. [JBL AEO(IT)-2015]
a) Table b) Field c) Row d) Primary key
Ans.: a
3. A primary key must also be -------------- [Combined(IT/ICT)-2018]
a) Foreign key b) Unique
c) Identical d) Case sensitive
Ans.: b
4. To remove duplicate rows from the result of an SQL SELECT statement, the -------
qualifier must be included. [Combined(AME)-2018]
a) Only b) Distinct c) Unique d) Single
Ans.: b
5. The subset of a super key is a candidate key under what condition?
a) No proper subset is a super key b) All subsets are super keys
c) Subset is a super key d) Each subset is a super key
Ans.: a
Explanation: A candidate key is a minimal super key. If any attribute is removed from it,
the remaining attributes no longer identify a tuple uniquely.
6. Difference between primary key, foreign key and candidate key. [Combined(AP-HBFC,KB)-2018,
ICB(AP)-2017]
7. Difference between primary key and super key. [Palli Sanchaya Bank (Programmer)-2018, EPB-2018]
8. Difference between super key and unique key constraint. [Pubali Bank (SO)-2018]
Deadlock in DBMS
A deadlock is a condition in which two or more tasks are waiting for each other to
finish, but none of the tasks is willing to give up the resources that another task needs. In this
situation no task ever finishes; all remain in a waiting state forever.
Coffman conditions
Coffman stated four conditions for a deadlock to occur. A deadlock may occur if all the
following conditions hold true.
Mutual exclusion condition: There must be at least one resource that cannot be used by
more than one process at a time.
Hold and wait condition: A process that is holding a resource can request additional
resources that are being held by other processes in the system.
No preemption condition: A resource cannot be forcibly taken from a process. Only the
process itself can release a resource that it is holding.
Circular wait condition: One process is waiting for a resource that is
being held by a second process, the second process is waiting for a third process, and so on,
and the last process is waiting for the first process, making a circular chain of waiting.
Deadlock detection
The resource scheduler keeps track of the resources allocated to and requested by
processes. Thus, if there is a deadlock, it is known to the resource scheduler. This is how a
deadlock is detected.
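One common way for a scheduler to detect a deadlock from its allocation and request records is to build a wait-for graph and look for a cycle. The text does not name this technique, so the sketch below is an assumption about how detection could be done; the transaction names T1 to T3 and the function name are illustrative.

```python
def find_deadlock(wait_for):
    """Return a list of tasks forming a waiting cycle, or None if there is none.

    wait_for maps each task to the tasks whose resources it is waiting for.
    """
    WHITE, GREY, BLACK = 0, 1, 2          # unvisited, on current path, finished
    color = {task: WHITE for task in wait_for}
    path = []

    def visit(task):
        color[task] = GREY
        path.append(task)
        for other in wait_for.get(task, ()):
            state = color.get(other, WHITE)
            if state == GREY:
                # other is already on the current path: circular wait found.
                return path[path.index(other):]
            if state == WHITE:
                cycle = visit(other)
                if cycle:
                    return cycle
        color[task] = BLACK
        path.pop()
        return None

    for task in list(wait_for):
        if color[task] == WHITE:
            cycle = visit(task)
            if cycle:
                return cycle
    return None

# T1 waits for T2, T2 waits for T3, T3 waits for T1: a circular wait.
deadlocked = find_deadlock({"T1": ["T2"], "T2": ["T3"], "T3": ["T1"]})
# T1 waits for T2, but T2 waits for nobody: no deadlock.
healthy = find_deadlock({"T1": ["T2"], "T2": []})
```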
Once a deadlock is detected, it is corrected by the following methods:
Deadlock prevention
We have learnt that a deadlock occurs if all four Coffman conditions hold true, so
preventing one or more of them can prevent the deadlock.
Removing mutual exclusion: All resources must be sharable, which means more
than one process can hold a resource at the same time. That approach is practically
impossible.
Removing hold and wait condition: This can be removed if a process acquires all the
resources it needs before starting out. Another way to remove it is to enforce a
rule that a process may request resources only when it holds none.
Preemption of resources: Preemption of resources from a process can result in rollback,
and thus this needs to be avoided in order to maintain the consistency and stability of the
system.
Avoid circular wait condition: This can be avoided if the resources are maintained in a
hierarchy and a process can acquire resources only in increasing order of precedence.
Another way of doing this is to enforce a one-resource-per-process rule: a
process can request a resource only after it releases the resource it currently holds.
This avoids the circular wait.
Deadlock Avoidance
Deadlock can be avoided if resources are allocated in a way that prevents deadlock from
occurring. There are two algorithms for deadlock avoidance:
Wait/Die
Wound/Wait
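The two schemes differ only in what happens when a transaction requests a resource held by another. Both compare transaction timestamps, where a smaller timestamp means an older transaction. The sketch below encodes that decision; the function names and return strings are illustrative, not part of any DBMS API.

```python
def wait_die(requester_ts, holder_ts):
    """Wait/Die (non-preemptive): an older requester waits for the holder,
    a younger requester is aborted ("dies") and restarts later."""
    return "requester waits" if requester_ts < holder_ts else "requester aborts"

def wound_wait(requester_ts, holder_ts):
    """Wound/Wait (preemptive): an older requester aborts ("wounds") the
    holder and takes the resource, a younger requester waits."""
    return "holder aborts" if requester_ts < holder_ts else "requester waits"

# Transaction with timestamp 5 is older than the one with timestamp 9.
old_requests_wd = wait_die(requester_ts=5, holder_ts=9)
young_requests_wd = wait_die(requester_ts=9, holder_ts=5)
old_requests_ww = wound_wait(requester_ts=5, holder_ts=9)
young_requests_ww = wound_wait(requester_ts=9, holder_ts=5)
```

In both schemes a transaction only ever waits in one direction of age (older for younger in Wait/Die, younger for older in Wound/Wait), so a circular chain of waiting cannot form.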
JOIN
A JOIN clause is used to combine rows from two or more tables or views, based on a related column
between them.
SELECT Orders.OrderID, Customers.CustomerName, Orders.OrderDate
FROM Orders
INNER JOIN Customers ON Orders.CustomerID = Customers.CustomerID;
Different Types of SQL JOINs
(INNER) JOIN: Returns records that have matching values in both tables
LEFT (OUTER) JOIN: Returns all records from the left table, and the matched records from the
right table
RIGHT (OUTER) JOIN: Returns all records from the right table, and the matched records from
the left table
FULL (OUTER) JOIN: Returns all records when there is a match in either the left or the right table
SELECT column_name(s)
FROM table1
INNER JOIN table2 ON table1.column_name = table2.column_name;
SELECT column_name(s)
FROM table1
LEFT JOIN table2 ON table1.column_name = table2.column_name;
SELECT column_name(s)
FROM table1
RIGHT JOIN table2 ON table1.column_name = table2.column_name;
SELECT column_name(s)
FROM table1
FULL OUTER JOIN table2 ON table1.column_name = table2.column_name;
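To see the difference between an inner and a left join on actual rows, here is a small sketch using Python's sqlite3 module. The Customers and Orders rows are made up for the example. Only INNER and LEFT joins are shown because older SQLite versions do not support RIGHT and FULL OUTER joins.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Customers (CustomerID INTEGER PRIMARY KEY, CustomerName TEXT)")
conn.execute("CREATE TABLE Orders (OrderID INTEGER PRIMARY KEY, CustomerID INTEGER)")
conn.executemany("INSERT INTO Customers VALUES (?, ?)",
                 [(1, "Adam"), (2, "Alex"), (3, "Stuart")])
# Stuart (CustomerID 3) has placed no orders.
conn.executemany("INSERT INTO Orders VALUES (?, ?)",
                 [(10, 1), (11, 1), (12, 2)])

# INNER JOIN: only rows with a match in both tables.
inner_rows = conn.execute(
    "SELECT Orders.OrderID, Customers.CustomerName "
    "FROM Orders "
    "INNER JOIN Customers ON Orders.CustomerID = Customers.CustomerID "
    "ORDER BY Orders.OrderID"
).fetchall()

# LEFT JOIN: every customer, with NULL (None) where no order matches.
left_rows = conn.execute(
    "SELECT Customers.CustomerName, Orders.OrderID "
    "FROM Customers "
    "LEFT JOIN Orders ON Customers.CustomerID = Orders.CustomerID "
    "ORDER BY Customers.CustomerID, Orders.OrderID"
).fetchall()
```

Stuart appears only in `left_rows`, paired with None, because the left join keeps unmatched rows of the left table.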
Self Join
A self-join is a query in which a table is joined (compared) to itself. Self-joins are used to
compare values in a column with other values in the same column in the same table. One
practical use for self-joins: obtaining running counts and running totals in an SQL query.
Example
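A running total is one use of a self-join mentioned above. The sketch below, using Python's sqlite3 module, joins the Employee table from the Table section to itself: each row `a` is paired with every row `b` whose ID is less than or equal to its own, and the paired salaries are summed. The aliases `a` and `b` and the column name RunningTotal are choices made for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Employee (ID INTEGER PRIMARY KEY, Name TEXT, Age INTEGER, Salary INTEGER)"
)
conn.executemany(
    "INSERT INTO Employee VALUES (?, ?, ?, ?)",
    [(1, "Adam", 34, 13000), (2, "Alex", 28, 15000),
     (3, "Stuart", 20, 18000), (4, "Ross", 42, 19020)],
)

# Self-join: the same table appears twice under two aliases.
running_totals = conn.execute(
    "SELECT a.ID, a.Salary, SUM(b.Salary) AS RunningTotal "
    "FROM Employee a "
    "INNER JOIN Employee b ON b.ID <= a.ID "
    "GROUP BY a.ID, a.Salary "
    "ORDER BY a.ID"
).fetchall()
```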
Aggregate functions
The GROUP BY statement is often used with aggregate functions (COUNT, MAX, MIN, SUM,
AVG) to group the result-set by one or more columns.
Sequence of clauses
1. where
2. group by
3. having
4. order by
Example
SELECT COUNT(CustomerID), Country
FROM Customers
GROUP BY Country
ORDER BY COUNT(CustomerID) DESC;
SELECT COUNT(CustomerID), Country
FROM Customers
GROUP BY Country
HAVING COUNT(CustomerID) > 5
ORDER BY COUNT(CustomerID) DESC;
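The clause order can be checked on a few rows with Python's sqlite3 module. The customer data below is invented, and the HAVING threshold is lowered from 5 to 1 so that the tiny sample still returns something.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Customers (CustomerID INTEGER PRIMARY KEY, Country TEXT)")
conn.executemany(
    "INSERT INTO Customers (Country) VALUES (?)",
    [("Bangladesh",), ("Bangladesh",), ("Bangladesh",),
     ("India",), ("India",), ("Nepal",)],
)

# GROUP BY forms one group per country, HAVING filters whole groups,
# and ORDER BY sorts the groups that remain.
rows = conn.execute(
    "SELECT COUNT(CustomerID), Country "
    "FROM Customers "
    "GROUP BY Country "
    "HAVING COUNT(CustomerID) > 1 "
    "ORDER BY COUNT(CustomerID) DESC"
).fetchall()
```

Nepal is dropped by HAVING because its group contains only one customer.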
SQL View
A view in SQL is a logical subset of data from one or more tables. Views are used to restrict data
access.
Syntax for creating a view
CREATE OR REPLACE VIEW view_name AS
SELECT column_name(s)
FROM table_name
WHERE condition;
Types of view
There are two types of view:
1. Simple View
2. Complex View
Simple View                        Complex View
Created from one table             Created from one or more tables
Does not contain functions         Contains functions
Does not contain groups of data    Contains groups of data
SQL Sequence
Sequence is a feature supported by some database systems to produce unique values on demand.
Some DBMSs, such as MySQL, support AUTO_INCREMENT in place of sequences.
AUTO_INCREMENT is applied to a column; by default it automatically increments the column value by 1
each time a new record is entered into the table. A sequence is somewhat like
AUTO_INCREMENT but has some extra features.
Creating Sequence
The syntax to create a sequence is:
CREATE SEQUENCE sequence-name
START WITH initial-value
INCREMENT BY increment-value
MAXVALUE maximum-value
CYCLE | NOCYCLE;
Query Example
Second highest salary
SELECT MAX(Salary) FROM Employee
WHERE Salary NOT IN (SELECT MAX(Salary) FROM Employee);
SELECT MAX(Salary) FROM Employee
WHERE Salary < (SELECT MAX(Salary) FROM Employee);
-- Nth highest salary: replace N with the rank wanted (for the second highest, N-1 = 1)
SELECT Id, Salary FROM Employee e
WHERE N-1 = (SELECT COUNT(DISTINCT Salary)
FROM Employee p WHERE e.Salary < p.Salary);
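These salary queries can be run against the Employee table from the Table section using Python's sqlite3 module. In the sketch below the rank N is passed as a bound parameter; the helper name `nth_highest` is an assumption for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Employee (ID INTEGER PRIMARY KEY, Name TEXT, Age INTEGER, Salary INTEGER)"
)
conn.executemany(
    "INSERT INTO Employee VALUES (?, ?, ?, ?)",
    [(1, "Adam", 34, 13000), (2, "Alex", 28, 15000),
     (3, "Stuart", 20, 18000), (4, "Ross", 42, 19020)],
)

def nth_highest(n):
    """Rows whose salary has exactly n-1 distinct salaries above it."""
    return conn.execute(
        "SELECT ID, Salary FROM Employee e "
        "WHERE ? - 1 = (SELECT COUNT(DISTINCT Salary) "
        "               FROM Employee p WHERE e.Salary < p.Salary)",
        (n,),
    ).fetchall()

# Second highest via MAX below the overall MAX.
second_by_max = conn.execute(
    "SELECT MAX(Salary) FROM Employee "
    "WHERE Salary < (SELECT MAX(Salary) FROM Employee)"
).fetchone()[0]

second = nth_highest(2)
first = nth_highest(1)
```

Both approaches agree that the second highest salary is 18000 (Stuart).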
Only Distinct Value
-- Lists each NAME that occurs more than once; use HAVING COUNT(NAME) = 1
-- to list the names that occur exactly once.
SELECT NAME, COUNT(NAME) FROM STUD GROUP BY NAME
HAVING COUNT(NAME) > 1;
Positive/Negative Value
SELECT
(SELECT COUNT(roll_no) FROM stud WHERE roll_no > 0) Positivevalue,
(SELECT COUNT(roll_no) FROM stud WHERE roll_no < 0) Negativevalue;
Current Date (MySQL)
SELECT CURDATE();
Difference between Truncate and Delete
Truncate                                      Delete
Cannot be rolled back in systems where it     Can be rolled back after delete.
commits implicitly (e.g. Oracle, MySQL).
Resets the identity of the table.             Does not reset the identity of the table.
Locks the entire table.                       Locks the table rows.
It is a DDL (Data Definition Language)        It is a DML (Data Manipulation Language)
command.                                      command.
A WHERE clause cannot be used with it.        A WHERE clause can be used to filter the
                                              data to delete.
Triggers are not fired.                       Triggers are fired.
Syntax:                                       Syntax:
TRUNCATE TABLE tablename;                     1. DELETE FROM tablename;
                                              2. DELETE FROM tablename
                                                 WHERE columnname = condition;
Example:                                      Example:
BEGIN TRAN                                    BEGIN TRAN
TRUNCATE TABLE tranTest;                      DELETE FROM tranTest;
SELECT * FROM tranTest;                       SELECT * FROM tranTest;
ROLLBACK;                                     ROLLBACK;
SELECT * FROM tranTest;                       SELECT * FROM tranTest;
Procedure                                         Function
Does not return a value through a return          Returns a value through a return
statement.                                        statement.
A return statement may or may not be present.     A return statement must be present.
The return statement carries no expression;       The return statement must contain an
it is used only to transfer control back to       expression, which can be a variable, a
the calling program.                              hard-coded value or an arithmetic
                                                  expression involving a column.
A return data type is not required.               A return data type must be declared.
Called as a standalone call, like a command,      Has to be called as part of an SQL
in any other procedure, function or trigger       statement or as part of an expression.
only.
May return no value, or multiple values at a      Returns a single value.
time through OUT mode parameters.
Has no purity level.                              Has a purity level.
Mainly written to manipulate and process          Normally should not be used to
data from tables.                                 manipulate data.
Advantage of subprogram (Procedures & Functions)
Extensibility
Modularity
Reusability
Maintainability
Abstraction & Data Hiding
Security
PLSQL
Cursor
Oracle creates a memory area, known as the context area, for processing an SQL statement. A
cursor is a pointer to this context area. PL/SQL controls the context area through a cursor. A
cursor holds the rows (one or more) returned by an SQL statement. The set of rows the cursor
holds is referred to as the active set.
You can name a cursor so that it can be referred to in a program to fetch and process the rows
returned by the SQL statement, one at a time. There are two types of cursors −
1. Implicit cursors
2. Explicit cursors
Attribute     Description
%FOUND        Returns TRUE if an INSERT, UPDATE, or DELETE statement
              affected one or more rows or a SELECT INTO statement returned
              one or more rows. Otherwise, it returns FALSE.
%NOTFOUND     The logical opposite of %FOUND. It returns TRUE if an INSERT,
              UPDATE, or DELETE statement affected no rows, or a SELECT
              INTO statement returned no rows. Otherwise, it returns FALSE.
%ISOPEN       Always returns FALSE for implicit cursors, because Oracle closes the
              SQL cursor automatically after executing its associated SQL
              statement.
%ROWCOUNT     Returns the number of rows affected by an INSERT, UPDATE, or
              DELETE statement, or returned by a SELECT INTO statement.
[Figure: triggers. Labels: Application, UPDATE TABLE SET…, INSERT INTO TABLE…, TABLE, UPDATE TRIGGER, INSERT TRIGGER.]
Example
CREATE OR REPLACE TRIGGER orders_after_insert
AFTER INSERT
ON orders
FOR EACH ROW
DECLARE
    v_username varchar2(10);
BEGIN
    NULL; -- the trigger body is not shown in this example
END;
Triggers can be defined on the table, view, schema, or database with which the event is
associated.
Benefits of Triggers
Triggers can be written for the following purposes –
Generating some derived column values automatically
Enforcing referential integrity
Event logging and storing information on table access
Auditing
Synchronous replication of tables
Imposing security authorizations
Preventing invalid transactions
Package
A package is a schema object that groups logically related PL/SQL types, variables,
constants, subprograms, cursors, and exceptions. A package is compiled and stored in the
database, where many applications can share its contents.
Advantage of Package
Less I/O, More efficiency
Program overloading is available only for package subprograms, whereas standalone
subprograms cannot be overloaded.
Avoids dependencies.
A variable declared in the package specification is global.
E-R Diagram
ER-Diagram is a visual representation of data that describes how data is related to each other.
Symbols and Notations
Attribute
Key Attribute
A key attribute represents the main characteristic of an entity. It is used to represent the primary key.
An ellipse with the attribute name underlined represents a key attribute.
Composite Attribute
An attribute can also have its own attributes. Such an attribute is known as a composite
attribute.
Relationship
A relationship describes relations between entities. A relationship is represented using a diamond.
1. One to One: In the example, one student can enroll for only one course and a course
will also have only one student. This is not what you will usually see in a relationship.
2. One to Many: It reflects the business rule that one entity is associated with many instances of
another entity. The example for this relation might sound a little odd, but it means that
one student can enroll in many courses, while one course will have only one student.
The arrows in the diagram indicate that each course is associated with only one
student.
3. Many to One: It reflects the business rule that many entities can be associated with just one
entity. For example, a student enrolls for only one course but a course can have many
students.
4. Many to Many: The diagram represents that many students can enroll for more than one course.
Recursive Relationship
When an Entity is related with itself it is known as Recursive Relationship.
Ternary Relationship
Relationship of degree three is called Ternary relationship.
Generalization
Generalization is a bottom-up approach in which two lower level entities combine to form a
higher-level entity. In generalization, the higher-level entity can also combine with other lower
level entity to make further higher level entity.
Specialization
Specialization is opposite to Generalization. It is a top-down approach in which one higher level
entity can be broken down into two lower level entities. In specialization, some higher level
entities may not have lower-level entity sets at all.
Aggregation
Aggregation is a process when relation between two entities is treated as a single entity. Here the
relation between Center and Course is acting as an Entity in relation with Visitor.
Weak entity
An entity set that does not possess sufficient attributes to form a primary key is called a weak
entity set.
ER-Diagram ATM:
Indexing
Indexing is a data structure technique for efficiently retrieving records from database files based
on the attributes on which the index has been built. Indexing in database systems is similar
to the index we see in books.
Benefits
Improves search efficiency.
An index entry consists of two fields (key and block pointer).
The index is an ordered file.
Searching can be binary.
The average number of block accesses needed to reach a record is log2 B, where B is the number
of index blocks.
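Because the index is an ordered file of (key, block pointer) entries, it can be searched with binary search. The sketch below models a sparse index with one entry per data block and uses Python's bisect module; the records, the block size of two and the function name `lookup` are assumptions for illustration.

```python
from bisect import bisect_right

# Data file: blocks of records, ordered on the key field.
blocks = [
    [(1, "Adam"), (2, "Alex")],
    [(3, "Stuart"), (4, "Ross")],
    [(7, "Mona"), (9, "Rafi")],
]

# Sparse index: one entry per block, holding the first key of that block.
# The position in the list plays the role of the block pointer.
index_keys = [block[0][0] for block in blocks]

def lookup(key):
    """Binary-search the index, then scan the single block it points to."""
    pos = bisect_right(index_keys, key) - 1  # last index entry with key <= search key
    if pos < 0:
        return None                          # smaller than every key in the file
    for k, value in blocks[pos]:
        if k == key:
            return value
    return None

found = lookup(4)
missing = lookup(5)
```

Only the index and one data block are touched per lookup, which is where the logarithmic number of block accesses comes from.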
Indexing is defined based on its indexing attributes. Indexing can be of the following types −
Primary Index − Primary index is defined on an ordered data file. The data file is
ordered on a key field. The key field is generally the primary key of the relation.
Secondary Index − Secondary index may be generated from a field which is a candidate
key and has a unique value in every record, or a non-key with duplicate values.
Clustering Index − Clustering index is defined on an ordered data file. The data file is
ordered on a non-key field.
Ordered Indexing is of two types −
1. Dense Index
2. Sparse Index
Hash Organization
Bucket − A hash file stores data in bucket format. Bucket is considered a unit of storage. A
bucket typically stores one complete disk block, which in turn can store one or more records.
Hash Function − A hash function, h, is a mapping function that maps the set of all search keys
K to the addresses where the actual records are placed. It is a function from search keys to bucket
addresses.
There are two types of hash file organizations –
1. Static Hashing.
2. Dynamic Hashing
Static Hashing
In this method of hashing, the resultant data bucket address is always the same. That means, if
we generate an address for EMP_ID = 103 using the mod (5) hash function, it always results in
the same bucket address 3. The bucket address never changes. Hence the
number of data buckets in memory for static hashing remains constant throughout. In our
example, we will have five data buckets in memory to store the data.
Operation
Insertion − When a record is to be entered using static hashing, the hash function h
computes the bucket address for search key K, where the record will be stored.
Bucket address = h(K)
Search − When a record needs to be retrieved, the same hash function can be used to retrieve the
address of the bucket where the data is stored.
Linear Probing − When a hash function generates an address at which data is already stored, the
next free bucket is allocated to it. This mechanism is called Open Hashing (data-structures texts
usually call it open addressing).
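The insertion, search and linear probing steps can be sketched in a few lines. The example keeps the mod 5 hash function and five buckets from the text, but simplifies each bucket to hold a single record, which is an assumption made to keep the probing visible; the employee names are invented.

```python
NUM_BUCKETS = 5
buckets = [None] * NUM_BUCKETS  # each bucket holds one (key, record) pair here

def h(key):
    """Static hash function: the bucket address never changes for a key."""
    return key % NUM_BUCKETS

def insert(key, record):
    """Store the record at h(key), or at the next free bucket if occupied."""
    start = h(key)
    for step in range(NUM_BUCKETS):
        addr = (start + step) % NUM_BUCKETS
        if buckets[addr] is None:
            buckets[addr] = (key, record)
            return addr
    raise RuntimeError("all buckets are full")

def search(key):
    """Probe from h(key) until the key or an empty bucket is found."""
    start = h(key)
    for step in range(NUM_BUCKETS):
        addr = (start + step) % NUM_BUCKETS
        if buckets[addr] is None:
            return None
        if buckets[addr][0] == key:
            return buckets[addr][1]
    return None

addr_103 = insert(103, "Adam")   # 103 mod 5 = 3
addr_108 = insert(108, "Alex")   # 108 mod 5 = 3 is taken, so bucket 4 is used
addr_113 = insert(113, "Ross")   # 3 and 4 are taken, wraps around to bucket 0
```

Because the number of buckets is fixed, a sixth insert would fail here, which is the limitation that dynamic hashing addresses.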
Dynamic Hashing
The problem with static hashing is that it does not expand or shrink dynamically as the size of the
database grows or shrinks. Dynamic hashing provides a mechanism in which data buckets are
added and removed dynamically and on demand. Dynamic hashing is also known as extendible
hashing.
In dynamic hashing, the hash function is made to produce a large number of values, of which
only a few are used initially.
Data center
A data center is a facility used to house computer systems and associated components, such as
telecommunications and storage systems. It generally includes redundant or backup power
supplies, redundant data communications connections, environmental controls (e.g. air
conditioning, fire suppression) and various security devices. A large data center is an industrial
scale operation using as much electricity as a small town.
Tiers 1 to 4 are a standardized methodology used to define the uptime of a data
center. This is useful for measuring:
a) Data center performance
b) Investment
c) ROI (return on investment)
A Tier 4 data center is considered the most robust and the least prone to failures. Tier 4 is designed
to host mission-critical servers and computer systems, with fully redundant subsystems (cooling, power,
network links, storage etc.) and compartmentalized security zones controlled by biometric access
control methods. Naturally, the simplest is a Tier 1 data center, used by small businesses or shops.
Tier 1 = Non-redundant capacity components (single uplink and servers).
Tier 2 = Tier 1 + Redundant capacity components.
Tier 3 = Tier 1 + Tier 2 + Dual-powered equipment and multiple uplinks.
Tier 4 = Tier 1 + Tier 2 + Tier 3 + all components are fully fault-tolerant including
uplinks, storage, chillers, HVAC systems, servers etc. Everything is dual-powered.
The levels also describe the availability of data from the hardware at a location as follows:
Tier 1: Guaranteeing 99.671% availability.
Tier 2: Guaranteeing 99.741% availability.
Tier 3: Guaranteeing 99.982% availability.
Tier 4: Guaranteeing 99.995% availability.
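To make these percentages concrete, the short calculation below converts each availability figure into the maximum downtime it allows per year. It assumes a 365-day year (8,760 hours); the variable and function names are illustrative.

```python
HOURS_PER_YEAR = 365 * 24  # 8760 hours, ignoring leap years

# Availability percentages as listed above.
tier_availability = {1: 99.671, 2: 99.741, 3: 99.982, 4: 99.995}


def annual_downtime_hours(availability_percent):
    """Hours per year the site may be unavailable at the given availability."""
    return (1 - availability_percent / 100) * HOURS_PER_YEAR


downtime = {tier: annual_downtime_hours(p) for tier, p in tier_availability.items()}

for tier, hours in downtime.items():
    print(f"Tier {tier}: {hours:.1f} hours of downtime per year")
```

Under this assumption Tier 1 allows roughly 28.8 hours of downtime per year, while Tier 4 allows less than half an hour.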
Blade Server
A blade server is a stripped-down server computer with a modular design optimized to minimize
the use of physical space and energy. Blade servers have many components removed to save
space, minimize power consumption and other considerations, while still having all the functional
components to be considered a computer. Unlike a rack-mount server, a blade server needs a
blade enclosure, which can hold multiple blade servers, providing services such as power,
cooling, networking, various interconnects and management. Together, blades and the blade
enclosure form a blade system. Different blade providers have differing principles regarding
what to include in the blade itself, and in the blade system as a whole.
RAID
RAID is a technology that is used to increase the performance and/or reliability of data storage.
The abbreviation stands for Redundant Array of Inexpensive Disks. A RAID system consists of
two or more drives working in parallel.
RAID level 0 – Striping
Advantages
RAID 0 offers great performance in both read and write operations. There is no
overhead caused by parity controls.
All storage capacity is used; there is no overhead.
The technology is easy to implement.
Disadvantages
RAID 0 is not fault-tolerant. If one drive fails, all data in the RAID 0 array are lost. It
should not be used for mission-critical systems.
RAID level 1 – Mirroring
Advantages
RAID 1 offers excellent read speed and a write-speed that is comparable to that of a
single drive.
In case a drive fails, data do not have to be rebuilt; they just have to be copied to the
replacement drive.
RAID 1 is a very simple technology.
Disadvantages
The main disadvantage is that the effective storage capacity is only half of the total drive
capacity because all data get written twice.
Software RAID 1 solutions do not always allow a hot swap of a failed drive. That means
the failed drive can only be replaced after powering down the computer it is attached to.
For servers that are used simultaneously by many people, this may not be acceptable.
Such systems typically use hardware controllers that do support hot swapping.
RAID level 5 – Striping with parity
Advantages
Read data transactions are very fast while write data transactions are somewhat slower
(due to the parity that has to be calculated).
If a drive fails, you still have access to all data, even while the failed drive is being
replaced and the storage controller rebuilds the data on the new drive.
Disadvantages
Drive failures have an effect on throughput, although this is still acceptable.
This is complex technology. If one of the disks in an array using 4TB disks fails and is
replaced, restoring the data (the rebuild time) may take a day or longer, depending on the
load on the array and the speed of the controller. If another disk goes bad during that
time, data are lost forever.
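The parity that RAID 5 calculates is typically a byte-wise XOR of the data blocks in a stripe, which is why one lost block can be recomputed from the survivors. The sketch below shows this for a single stripe only; real controllers also rotate the parity block across drives. The block contents and the helper name xor_blocks are made up for illustration.

```python
from functools import reduce


def xor_blocks(blocks):
    """Byte-wise XOR of equal-length byte strings."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# One stripe spread across three data drives, plus its parity block.
data_blocks = [b"DISK", b"RAID", b"DATA"]
parity = xor_blocks(data_blocks)

# Simulate losing the second drive: XOR the surviving blocks with the parity.
survivors = [data_blocks[0], data_blocks[2], parity]
rebuilt = xor_blocks(survivors)
```

The same XOR cannot recover two missing blocks at once, which matches the warning above about a second disk failing during the rebuild.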
RAID level 6 – Striping with double parity
Advantages
Like with RAID 5, read data transactions are very fast.
If two drives fail, you still have access to all data, even while the failed drives are being
replaced. RAID 6 therefore tolerates more drive failures than RAID 5.
Disadvantages
Write data transactions are slower than with RAID 5 due to the additional parity data that
have to be calculated. One report found the write performance to be 20% lower.
Drive failures have an effect on throughput, although this is still acceptable.
This is complex technology. Rebuilding an array in which one drive failed can take a
long time.
RAID level 10 – combining RAID 1 & RAID 0
Advantages
If something goes wrong with one of the disks in a RAID 10 configuration, the rebuild
time is very fast since all that is needed is copying all the data from the surviving mirror
to a new drive. This can take as little as 30 minutes for drives of 1 TB.
Disadvantages
Half of the storage capacity goes to mirroring, so compared to large RAID 5 or RAID 6
arrays, this is an expensive way to have redundancy.
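The capacity trade-offs described for the different RAID levels can be compared with a small calculator. This sketch assumes identical drives and uses the usual rules of thumb: RAID 0 uses all capacity, RAID 1 and RAID 10 lose half to mirroring, RAID 5 loses one drive's worth to parity and RAID 6 loses two. The function name and the minimum drive counts are assumptions added for illustration.

```python
MIN_DRIVES = {0: 2, 1: 2, 5: 3, 6: 4, 10: 4}  # common minimums, assumed here


def usable_capacity_tb(level, drives, drive_tb):
    """Usable capacity in TB for an array of identical drives."""
    if level not in MIN_DRIVES:
        raise ValueError(f"unsupported RAID level: {level}")
    if drives < MIN_DRIVES[level]:
        raise ValueError(f"RAID {level} needs at least {MIN_DRIVES[level]} drives")
    if level == 0:
        return drives * drive_tb        # striping only, no redundancy
    if level in (1, 10):
        return drives * drive_tb / 2    # every block is stored twice
    if level == 5:
        return (drives - 1) * drive_tb  # one drive's worth of parity
    return (drives - 2) * drive_tb      # RAID 6: two drives' worth of parity


# Eight 4 TB drives under each level.
capacities = {level: usable_capacity_tb(level, 8, 4) for level in (0, 5, 6, 10)}
```

With eight 4 TB drives, RAID 10 yields 16 TB while RAID 5 yields 28 TB and RAID 6 yields 24 TB, which is the cost difference the last disadvantage refers to.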
Big Data
Big data is a term that describes the large volume of data – both structured and
unstructured – that inundates a business on a day-to-day basis.
Why Big Data
Increase of storage capacities
Increase of processing power
Availability of data
Every day we create 2.5 quintillion bytes of data; 90% of the data in the world today has
been created in the last two years alone
Sources of Big Data
Social networking sites: Facebook, Google and LinkedIn all generate huge
amounts of data on a day-to-day basis, as they have billions of users worldwide.
E-commerce sites: Sites like Amazon, Flipkart and Alibaba generate huge amounts of logs
from which users' buying trends can be traced.
Weather stations: Weather stations and satellites produce very large volumes of data, which are
stored and processed to forecast the weather.
Telecom companies: Telecom giants like Airtel and Vodafone study user trends and
publish their plans accordingly; for this they store the data of their millions of users.
Share market: Stock exchanges across the world generate huge amounts of data through
their daily transactions.
Issues
Huge amount of unstructured data which needs to be stored, processed and analyzed
Solution
Storage: To store this huge amount of data, Hadoop uses HDFS (Hadoop Distributed File
System), which uses commodity hardware to form clusters and stores data in a
distributed fashion. It works on the write-once, read-many-times principle.
Processing: The MapReduce paradigm is applied to data distributed over the network to find
the required output.
Analysis: Pig and Hive can be used to analyze the data.
Cost: Hadoop is open source, so licensing cost is no longer an issue.
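The MapReduce paradigm mentioned under Processing can be illustrated with the classic word count. The sketch below runs in a single Python process and does not use the Hadoop API; it only imitates the map, shuffle and reduce phases. The function names and the two sample chunks are hypothetical.

```python
from collections import defaultdict


def map_phase(chunk):
    """Map: emit a (word, 1) pair for every word in one chunk of input."""
    return [(word.lower(), 1) for word in chunk.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine the values of one key into a single result."""
    return key, sum(values)


# Each string stands in for a block of a file stored on a different node.
chunks = ["big data needs storage", "big data needs processing"]

mapped = [pair for chunk in chunks for pair in map_phase(chunk)]
counts = dict(reduce_phase(key, values) for key, values in shuffle(mapped).items())
```

In a real cluster the map calls would run on the nodes that hold each block, and only the grouped intermediate pairs would travel over the network.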
SQL Vs NoSQL
SQL | NoSQL
SQL databases are categorized as Relational Database Management Systems (RDBMS). | NoSQL databases are categorized as non-relational or distributed database systems.
SQL databases have a fixed (static, predefined) schema. | NoSQL databases have a dynamic schema.
SQL databases store data in the form of tables, so they are known as table-based databases. | NoSQL databases store data as collections of key-value pairs, documents, graphs or wide-column stores.
SQL databases are vertically scalable. | NoSQL databases are horizontally scalable.
SQL databases use a powerful language, Structured Query Language, to define and manipulate the data. | In NoSQL databases, collections of documents are used to query the data. This is sometimes called an unstructured query language, and it varies from database to database.
SQL databases are best suited for complex queries. | NoSQL databases are not so good for complex queries because their query facilities are not as powerful as SQL.
SQL databases are not best suited for hierarchical data storage. | NoSQL databases are best suited for hierarchical data storage.
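The schema difference in the comparison above can be seen in a few lines. This sketch uses Python's built-in sqlite3 module for the SQL side and a plain list of dictionaries as a stand-in for a document store, since no particular NoSQL product is named in the text. The table, column and field names are made up for illustration.

```python
import sqlite3

# SQL side: the schema is fixed when the table is created.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT NOT NULL, city TEXT)")
conn.execute("INSERT INTO customer (id, name, city) VALUES (?, ?, ?)", (1, "Rahim", "Dhaka"))

try:
    # 'phone' is not part of the predefined schema, so this insert is rejected.
    conn.execute("INSERT INTO customer (id, name, phone) VALUES (?, ?, ?)", (2, "Karim", "0123"))
    schema_rejected = False
except sqlite3.OperationalError:
    schema_rejected = True

sql_names = [row[0] for row in conn.execute("SELECT name FROM customer WHERE city = ?", ("Dhaka",))]

# NoSQL side (stand-in): each document carries its own fields, including nested data.
documents = [
    {"_id": 1, "name": "Rahim", "city": "Dhaka"},
    {"_id": 2, "name": "Karim", "phone": "0123", "orders": [{"item": "oil", "litres": 2}]},
]
with_phone = [doc["name"] for doc in documents if "phone" in doc]
```

The nested orders list in the second document is the kind of hierarchical data that the last row of the comparison refers to; in the relational version it would normally go into a separate table.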
Oracle Vs MySQL
Model Test
1. c  2. c  3. d  4. a  5. b  6. a  7. b  8. a  9. b  10. d  11. b  12. a  13. c  14. a  15. a
16. c  17. b  18. a  19. c  20. d  21. b  22. d  23. d  24. b  25. c  26. b  27. c  28. b  29. c  30. a