ADBT Unit-1
LECTURE PLAN
R/TP/03
ISSUE:C:REV:01
Examples:
The entity book is identified by its ISBN; the entity
vacation is identified by its travel agency booking number.
Relationship:
A relation (not in the strict relational model sense)
between pairs of entities (a binary relationship).
http://csetube.co.nr/
GKMCET
PRIMARY KEY:
An entity may be defined as a thing which is recognized as being capable of an
independent existence and which can be uniquely identified.
An entity is an abstraction from the complexities of some domain.
NORMALIZATION:
Database normalization is the process of removing redundant data from your
tables in order to improve storage efficiency, data integrity, and scalability.
In the relational model, methods exist for quantifying how efficient a database is.
These classifications are called normal forms (or NF), and there are algorithms
for converting a given database between them.
Normalization generally involves splitting existing tables into multiple ones,
which must be re-joined or linked each time a query is issued.
The Purpose of Normalization:
Definition of 1NF:
First Normal Form is a relation in which the intersection of each
row and column contains one and only one value.
There are two approaches to removing repeating groups from
unnormalized tables:
1. Remove the repeating groups by entering appropriate data
in the empty columns of rows containing the repeating data.
2. Remove the repeating group by placing the repeating data,
along with a copy of the original key attribute(s), in a separate
relation. A primary key is identified for the new relation.
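The two approaches can be sketched in Python. The customer table, its repeating `phones` group, and all function names are hypothetical, chosen only for illustration. Approach 1 is modelled here as producing one row per repeated value with the non-repeating data copied in, which gives the same result as filling the empty columns.

```python
# Hypothetical unnormalized table: each row carries a repeating group of phones.
unnormalized = [
    {"cust_no": 1, "name": "Ann", "phones": ["111", "222"]},
    {"cust_no": 2, "name": "Bob", "phones": ["333"]},
]

def flatten(rows):
    """Approach 1: one row per repeated value, copying the non-repeating data."""
    return [
        {"cust_no": r["cust_no"], "name": r["name"], "phone": p}
        for r in rows
        for p in r["phones"]
    ]

def split(rows):
    """Approach 2: move the repeating data, plus a copy of the key, to a new relation."""
    customers = [{"cust_no": r["cust_no"], "name": r["name"]} for r in rows]
    # (cust_no, phone) serves as the primary key identified for the new relation.
    phones = [{"cust_no": r["cust_no"], "phone": p} for r in rows for p in r["phones"]]
    return customers, phones

flat = flatten(unnormalized)
customers, phones = split(unnormalized)
```

Approach 1 keeps a single relation but repeats the customer name on every row; approach 2 stores each name once at the cost of a join when the phones are needed.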
Second normal form (2NF):
It is a relation that is in first normal form and every non-primary-key attribute is fully
functionally dependent on the primary key.
Boyce-Codd normal form (BCNF):
A relation is in BCNF, if and only if, every determinant is a
candidate key.
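A minimal sketch of how the BCNF condition could be checked mechanically, assuming the functional dependencies of the relation are known. It uses the standard attribute-closure computation and tests whether each determinant is a superkey, which is the usual equivalent formulation of "every determinant is a candidate key". The relation and dependencies in the example are hypothetical.

```python
def closure(attrs, fds):
    """Return the set of attributes functionally determined by attrs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # If we already have the whole left side, we gain the right side.
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def is_bcnf(relation, fds):
    """True if every non-trivial determinant determines the whole relation."""
    for lhs, rhs in fds:
        if set(rhs) <= set(lhs):
            continue  # trivial dependency, ignored
        if not closure(lhs, fds) >= set(relation):
            return False
    return True

# Hypothetical example: teacher -> course holds, but teacher is not a key.
enrolment = ("student", "course", "teacher")
enrolment_fds = [(("student", "course"), ("teacher",)), (("teacher",), ("course",))]
simple = ("a", "b", "c")
simple_fds = [(("a",), ("b", "c"))]

enrolment_ok = is_bcnf(enrolment, enrolment_fds)
simple_ok = is_bcnf(simple, simple_fds)
```

Here `enrolment_ok` is false because the determinant `teacher` does not determine `student`, while `simple_ok` is true because the only determinant, `a`, determines every attribute.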
Multi-valued dependency (MVD) :
It represents a dependency between attributes (for example, A,
B and C) in a relation, such that for each value of A there is a
set of values for B and a set of values for C. However, the sets of
values for B and C are independent of each other.
[Figure: real-world customer data (cust#, name, address) represented in the database as separate or combined relations]
QUERY PROCESSING:
Validate and translate the query
    Good syntax.
    All referenced relations exist.
    Translate the SQL to relational algebra.
Optimize
    Make it run faster.
Evaluate
Three Steps of Query Processing
1) Parsing and translation first translates the query into its internal form, then
translates it into relational algebra and verifies that the referenced relations exist.
2) Optimization finds the most efficient evaluation plan for the query, because there can
be more than one way to evaluate it.
3) Evaluation executes the chosen evaluation plan and returns the answer to the query.
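The validation and planning steps can be observed with Python's built-in `sqlite3` module. SQLite is used here only as a convenient stand-in (the lecture does not name a DBMS), and the `customer` table and the helper `try_query` are hypothetical. `EXPLAIN QUERY PLAN` asks SQLite to describe its chosen plan instead of running the query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (cust_no INTEGER PRIMARY KEY, name TEXT)")

def try_query(sql):
    """Return the plan rows if the query is valid, else the error message."""
    try:
        return conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    except sqlite3.OperationalError as exc:
        return str(exc)

# Rejected during parsing: the syntax is not good.
bad_syntax = try_query("SELEC name FROM customer")
# Rejected during validation: the referenced relation does not exist.
missing_relation = try_query("SELECT name FROM orders")
# Accepted: the optimizer reports the evaluation plan it selected.
plan = try_query("SELECT name FROM customer WHERE cust_no = 1")
```

The exact wording of the plan rows varies between SQLite versions, so the sketch only distinguishes a returned plan from an error message.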
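Retrieve multiple records (each may be on a different block) if the search key is not a candidate key: $E_{A5} = HT_i + SC(A, r)$, where $HT_i$ is the height (number of levels) of index $i$ and $SC(A, r)$ is the average number of records of r that satisfy an equality condition on attribute A.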
Join operation:
Compute the theta join, r ⋈θ s:
    for each tuple tr in r do begin
        for each tuple ts in s do begin
            test pair (tr, ts) to see if they satisfy the join condition θ
            if they do, add tr · ts to the result.
        end
    end
r is called the outer relation and s the inner relation of the join.
Requires no indices and can be used with any kind of join condition.
Expensive since it examines every pair of tuples in the two relations. If the
smaller relation fits entirely in main memory, use that relation as the inner
relation.
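The nested-loop join described above can be written as a short runnable sketch. Relations are modelled as lists of tuples and the join condition as an arbitrary predicate; the sample `customers` and `orders` data are hypothetical.

```python
def nested_loop_join(r, s, theta):
    """Theta join of r (outer) and s (inner); theta is any predicate on a pair."""
    result = []
    for tr in r:                 # outer relation
        for ts in s:             # inner relation, rescanned for every outer tuple
            if theta(tr, ts):    # works with any kind of join condition
                result.append(tr + ts)   # concatenate the two tuples
    return result

customers = [(1, "Ann"), (2, "Bob")]          # (cust_no, name)
orders = [(10, 1), (11, 1), (12, 3)]          # (order_no, cust_no)

equi = nested_loop_join(customers, orders, lambda c, o: c[0] == o[1])
less_than = nested_loop_join(customers, orders, lambda c, o: c[0] < o[1])
```

The predicate is evaluated for all 2 x 3 = 6 pairs in both calls, which is why the method is expensive on large relations.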
QUERY OPTIMIZATION:
A sight-seeing trip:
    Start: a SQL query
    End: an execution plan
    Intermediate stopovers:
        query trees
        logical tree transforms
        strategy selection
What happens after the journey?
    The execution plan is executed
    The query answer is returned
Query Trees
Transaction processing:
If some of the operations are completed but errors occur when the others are
attempted, the transaction-processing system rolls back all of the operations of
the transaction (including the successful ones), thereby erasing all traces of the
transaction and restoring the system to the consistent, known state that it was in
before processing of the transaction began.
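A minimal sketch of this rollback behaviour, using Python's built-in `sqlite3` module as a stand-in transaction-processing system. The `account` table, the `transfer` helper, and the simulated error are all hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (id INTEGER PRIMARY KEY, balance INTEGER)")
conn.executemany("INSERT INTO account VALUES (?, ?)", [(1, 100), (2, 50)])
conn.commit()

def transfer(conn, src, dst, amount, fail=False):
    """Debit src and credit dst as one transaction; roll back on any error."""
    try:
        conn.execute("UPDATE account SET balance = balance - ? WHERE id = ?", (amount, src))
        if fail:
            raise RuntimeError("simulated error after the first operation")
        conn.execute("UPDATE account SET balance = balance + ? WHERE id = ?", (amount, dst))
        conn.commit()
        return True
    except Exception:
        conn.rollback()   # erases the successful debit as well
        return False

def balances(conn):
    """Return {account id: balance} for all accounts."""
    return dict(conn.execute("SELECT id, balance FROM account ORDER BY id"))

failed = transfer(conn, 1, 2, 30, fail=True)
after_failure = balances(conn)
succeeded = transfer(conn, 1, 2, 30)
after_success = balances(conn)
```

After the failed transfer the balances are exactly what they were before it began, even though the debit had already been applied when the error occurred.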
Transaction processing guards against hardware and software errors that might leave a
transaction partially completed, with the system left in an unknown, inconsistent state. If
the computer system crashes in the middle of a transaction, the transaction processing
system guarantees that all operations in any uncommitted (i.e., not completely processed)
transactions are cancelled.
Transactions are processed in a strict chronological order. If transaction n+1 intends to
touch the same portion of the database as transaction n, transaction n+1 does not begin
until transaction n is committed. Before any transaction is committed, all other
transactions affecting the same part of the system must also be committed; there can be
no holes in the sequence of preceding transactions.
Methodology:
The basic principles of all transaction-processing systems are the same. However, the
terminology may vary from one transaction-processing system to another, and the terms
used below are not necessarily universal.
Rollback:
Rollforward:
If the database fails entirely, it must be restored from the most recent backup. The backup will not reflect transactions committed since the backup was made.
However, once the database is restored, the journal of after images can be applied to the
database (rollforward) to bring the database up to date. Any transactions in progress at
the time of the failure can then be rolled back.
The result is a database in a consistent, known state that includes the results of all
transactions committed up to the moment of failure.
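Rollforward can be illustrated with a toy in-memory model. The journal format (tuples of transaction id, entry kind, key, and after image) is an assumption made for this sketch, not a real journal layout. As a simplification, the after images of transactions that never committed are skipped instead of being applied and then rolled back; the final state is the same.

```python
def recover(backup, journal):
    """Rebuild the database from a backup plus a journal of after images.

    Journal entries are (txn_id, kind, key, after_value), where kind is
    'write' or 'commit'. This layout is hypothetical.
    """
    # Transactions whose commit record reached the journal before the failure.
    committed = {txn for txn, kind, _, _ in journal if kind == "commit"}
    db = dict(backup)                      # start from the restored backup
    for txn, kind, key, after in journal:  # replay in journal order
        if kind == "write" and txn in committed:
            db[key] = after
    return db

backup = {"x": 1, "y": 2}
journal = [
    ("T1", "write", "x", 10),
    ("T1", "commit", None, None),
    ("T2", "write", "y", 20),   # T2 was still in progress at the failure
]
recovered = recover(backup, journal)
```

The recovered state contains T1's update to `x` but not T2's update to `y`, matching a database that includes all transactions committed up to the moment of failure.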
Deadlocks:
In some cases, two transactions may, in the course of their processing, attempt to access
the same portion of a database at the same time, in a way that prevents them from
proceeding. For example, transaction A may access portion X of the database, and
transaction B may access portion Y of the database.
If, at that point, transaction A then tries to access portion Y of the database while
transaction B tries to access portion X, a deadlock occurs, and neither transaction can
move forward.
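One common way to detect such a situation is to look for a cycle in a wait-for graph, in which an edge from one transaction to another means the first is waiting for something the second holds. The lecture does not describe this technique; the sketch below, with its hypothetical `find_deadlock` function, is offered only as an illustration of the A/B example.

```python
def find_deadlock(waits_for):
    """Return a list of transactions forming a cycle in the wait-for graph, or None."""
    visiting, done = set(), set()

    def visit(txn, path):
        if txn in visiting:                 # reached a transaction already on the path
            return path[path.index(txn):]
        if txn in done:                     # already explored, no cycle through it
            return None
        visiting.add(txn)
        for other in waits_for.get(txn, ()):
            cycle = visit(other, path + [txn])
            if cycle:
                return cycle
        visiting.discard(txn)
        done.add(txn)
        return None

    for txn in waits_for:
        cycle = visit(txn, [])
        if cycle:
            return cycle
    return None

# A holds X and waits for Y (held by B); B holds Y and waits for X (held by A).
deadlocked = find_deadlock({"A": ["B"], "B": ["A"]})
# A waits for B, but B is not waiting for anything.
no_deadlock = find_deadlock({"A": ["B"], "B": []})
```

A system that finds such a cycle would typically cancel one of the transactions in it so that the others can proceed.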
Concurrency control:
Atomicity - Either the effects of all or none of its operations remain ("all or nothing"
semantics) when a transaction is completed (committed or aborted respectively).
In other words, to the outside world a committed transaction appears (by its
effects on the database) to be indivisible, atomic, and an aborted transaction does
not leave effects on the database at all, as if it never existed.
Consistency - Every transaction must leave the database in a consistent (correct)
state, i.e., maintain the predetermined integrity rules of the database (constraints
upon and among the database's objects).
A transaction must transform a database from one consistent state to another
consistent state. It is the responsibility of the transaction's programmer to make
sure that the transaction itself is correct, i.e., that it performs correctly what it
intends to perform from the application's point of view, while the predefined
integrity rules are enforced by the DBMS. Thus, since a database can normally be
changed only by transactions, all the database's states are consistent. An aborted
transaction does not change the database state it started from, as if it never
existed (atomicity above).
Isolation - Transactions cannot interfere with each other (as an end result of their
executions). Moreover, usually (depending on concurrency control method) the
effects of an incomplete transaction are not even visible to another transaction.
Providing isolation is the main goal of concurrency control.
Durability - Effects of successful (committed) transactions must persist through
crashes (typically by recording the transaction's effects and its commit event in a
non-volatile memory).
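The consistency property can be illustrated with an integrity rule declared to the DBMS. This is a sketch using Python's built-in `sqlite3` module; the `account` table, the rule that a balance may not be negative, and the `withdraw` helper are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account (id INTEGER PRIMARY KEY, "
    "balance INTEGER NOT NULL CHECK (balance >= 0))"   # predefined integrity rule
)
conn.execute("INSERT INTO account VALUES (1, 100)")
conn.commit()

def withdraw(conn, account_id, amount):
    """Return True if committed, False if the integrity rule aborted the transaction."""
    try:
        conn.execute(
            "UPDATE account SET balance = balance - ? WHERE id = ?",
            (amount, account_id),
        )
        conn.commit()
        return True
    except sqlite3.IntegrityError:
        conn.rollback()   # the aborted transaction leaves no effects
        return False

rejected = withdraw(conn, 1, 150)   # would leave the balance below zero
accepted = withdraw(conn, 1, 40)
balance = conn.execute("SELECT balance FROM account WHERE id = 1").fetchone()[0]
```

The DBMS enforces the declared rule, but it cannot tell whether withdrawing 40 was the right thing to do; that part of correctness remains with the programmer.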
RECOVERY:
Failure can occur through:
    Site failure
        [Figure: network of sites S1, S2, S3, S4, S5]
    Loss of message
        Handled by the network protocol
        DDBMS deals with it transparently
Independent Recovery
A recovering site makes a transition directly to a final state without
communicating with other sites.
Lemma
For a protocol, if a local state's concurrency set contains both an abort
and a commit, it is not resilient to an arbitrary failure of a single site:
    Si cannot commit, because other sites may be in abort.
    Si cannot abort, because other sites may be in commit.
Rule 1: For every intermediate state S, where C(S) is the concurrency set of S:
    If C(S) contains a commit, assign a failure transition from S to commit.
    Otherwise, assign a failure transition from S to abort.
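The lemma and Rule 1 reduce to two small set tests, sketched below. Concurrency sets are modelled as Python sets of state names, and the state names used in the examples are hypothetical, not taken from a specific commit protocol.

```python
def violates_lemma(concurrency_set):
    """True if the concurrency set contains both an abort and a commit state."""
    return "commit" in concurrency_set and "abort" in concurrency_set

def failure_transition(concurrency_set):
    """Rule 1: target of the failure transition for an intermediate state."""
    return "commit" if "commit" in concurrency_set else "abort"

# Hypothetical concurrency sets for three intermediate states.
c_wait = {"wait", "abort"}
c_prepared = {"prepared", "commit"}
c_ambiguous = {"wait", "abort", "commit"}

wait_target = failure_transition(c_wait)
prepared_target = failure_transition(c_prepared)
ambiguous = violates_lemma(c_ambiguous)
```

For the ambiguous set, Rule 1 would still return a target, but the lemma says no choice is safe: a recovering site cannot decide on its own when other sites may have either committed or aborted.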
Database Tuning:
I/O tuning:
Hardware and software configuration of disk subsystems are examined: RAID levels and
configuration [1], block and stripe size allocation, and the configuration of disks,
controller cards, storage cabinets, and external storage systems such as a SAN.
Transaction logs and temporary spaces are heavy consumers of I/O, and affect
performance for all users of the database. Placing them appropriately is crucial.
Frequently joined tables and indexes are placed so that as they are requested from file
storage, they can be retrieved in parallel from separate disks simultaneously. Frequently
accessed tables and indexes are placed on separate disks to balance I/O and prevent read
queuing.
DBMS tuning:
DBMS tuning refers to tuning of the DBMS and the configuration of the memory and
processing resources of the computer running the DBMS. This is typically done through
configuring the DBMS, but the resources involved are shared with the host system.
Tuning the DBMS can involve setting the recovery interval (time needed to restore the
state of data to a particular point in time), assigning parallelism (the breaking up of work
from a single query into tasks assigned to different processing resources), and network
protocols used to communicate with database consumers.
Memory is allocated for data, execution plans, procedure cache, and work space. It is
much faster to access data in memory than data on storage, so maintaining a sizable cache
of data makes activities perform faster. The same consideration is given to work space.
Caching execution plans and procedures means that they are reused instead of recompiled
when needed. It is important to take as much memory as possible, while leaving enough
for other processes and the OS to use without excessive paging of memory to storage.
Processing resources are sometimes assigned to specific activities to improve
concurrency. On a server with eight processors, six could be reserved for the DBMS to
maximize available processing resources for the database.
Database maintenance:
Database maintenance includes backups, column statistics updates, and defragmentation
of data inside the database files.[2]
On a heavily used database, the transaction log grows rapidly. Transaction log entries
must be removed from the log to make room for future entries. Frequent transaction log
backups are smaller, so they interrupt database activity for shorter periods of time.
A DBMS uses statistics histograms to find data in a range against a table or index. Statistics
updates should be scheduled frequently and sample as much of the underlying data as
possible. Accurate and up-to-date statistics allow query engines to make good decisions
about execution plans, as well as efficiently locate data.
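To show how a histogram supports range estimates, here is a minimal sketch of an equi-width histogram. Real systems use their own histogram formats and sampling; the bucket layout, the function names, and the assumption that values are spread uniformly inside each bucket are all simplifications made for this example.

```python
def build_histogram(values, lo, hi, buckets):
    """Equi-width histogram: count of values falling in each bucket of [lo, hi)."""
    width = (hi - lo) / buckets
    counts = [0] * buckets
    for v in values:
        # Clamp so that a value equal to hi lands in the last bucket.
        idx = min(int((v - lo) / width), buckets - 1)
        counts[idx] += 1
    return counts, width

def estimate_range(counts, width, lo, start, end):
    """Estimate how many values lie in [start, end), assuming uniform buckets."""
    total = 0.0
    for i, count in enumerate(counts):
        b_lo, b_hi = lo + i * width, lo + (i + 1) * width
        # Portion of this bucket covered by the requested range.
        overlap = max(0.0, min(end, b_hi) - max(start, b_lo))
        total += count * overlap / width
    return total

column = list(range(100))                       # hypothetical column values 0..99
counts, width = build_histogram(column, 0, 100, 10)
estimate = estimate_range(counts, width, 0, 20, 45)
actual = sum(1 for v in column if 20 <= v < 45)
```

With evenly spread data the estimate matches the true count; when the data changes and the histogram is not refreshed, the two drift apart, which is why statistics updates need to be scheduled.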
Defragmentation of table and index data increases efficiency in accessing data. The
amount of fragmentation depends on the nature of the data, how it is changed over time,
and the amount of free space in database pages to accept inserts of data without creating
additional pages.