
SBM COLLEGE OF ENGINEERING AND TECHNOLOGY

STUDY MATERIALS

III SEMESTER - B.E – COMPUTER SCIENCE AND ENGINEERING

QUESTION BANK - CS6302 DATABASE MANAGEMENT SYSTEMS

UNIT I

1. What are the disadvantages of a file processing system?


 Data Redundancy
 Data Inconsistency
 Difficulty in Accessing Data
 Data Isolation
 Integrity Problems
 Security and access control
 Concurrency Problems

2. Explain the basic structure of a relational database with an example.


A relational database consists of a collection of tables. Each row of a table
represents a relationship among a set of values; thus a table represents a
collection of relationships. There is a direct correspondence between the concept
of a table and the mathematical concept of a relation, and a substantial theory has
been developed for relational databases.

Bname      Account   Ename   Balance
Downtown   101       John    500
SFU        102       Smith   790

Ename   Street    Ecity
John    Pender    Vancouver
Hayes   North     Burnaby

3.What do you mean by weak entity set?

In a relational database, a weak entity is an entity that cannot be uniquely
identified by its attributes alone; therefore, it must use a foreign key in
conjunction with its attributes to create a primary key. The foreign key is
typically a primary key of an entity it is related to.

4. Give example for one to one and one to many relationships.

Database relationships create a link between two tables by associating the
primary key of one table with a foreign key of another table; a SQL sketch
follows the example below.
Example
Book table (pk_book_id,title,ISBN) is associated with Author
(pk_author_id,author_name,phone_no,fk_book_id).
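
As a hedged sketch of the example above (the column types are assumptions, not
part of the question), the association can be declared with a foreign key. Leaving
fk_book_id as an ordinary foreign key gives a one-to-many link (one book, many
authors); a UNIQUE constraint on it would restrict the link to one-to-one.

create table book (
    pk_book_id  int primary key,
    title       varchar(100),
    isbn        varchar(20)
);

create table author (
    pk_author_id int primary key,
    author_name  varchar(100),
    phone_no     varchar(15),
    fk_book_id   int references book(pk_book_id)  -- many authors may point to one book
    -- unique (fk_book_id)  -- uncomment to force a one-to-one relationship
);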

5.What is the need of normalization?

 Database normalization is the process of removing redundant data from
your tables in order to improve storage efficiency, data integrity, and scalability.
 In the relational model, methods exist for quantifying how efficient a
database is. These classifications are called normal forms (or NF), and there
are algorithms for converting a given database between them.
 Normalization generally involves splitting existing tables into multiple
ones, which must be rejoined or linked each time a query is issued.

6. Write a note on functional dependencies.


Functional dependency (FD) is a constraint between two sets of attributes
from the database.
 A functional dependency is a property of the semantics or meaning of the
attributes. In every relation R(A1, A2, …, An) there is an FD PK -> A1, A2, …, An.
 Formally, an FD X -> Y is defined as follows: if X and Y are two sets of
attributes that are subsets of R, then X -> Y holds if, for any two tuples t1 and t2
in r, whenever t1[X] = t2[X], we must also have t1[Y] = t2[Y]. A small checking
query is sketched below.
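
As a small illustrative sketch (the relation emp and its columns eno and ename
are assumed purely for this example), a grouping query can reveal violations of
an FD such as eno -> ename: any eno value associated with more than one ename
breaks the dependency.

-- eno values that violate the assumed FD eno -> ename
select eno
from emp
group by eno
having count(distinct ename) > 1;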

7.Who is a DBA? What are the responsibilities of a DBA?


A database administrator (short form DBA) is a person responsible for the
design, implementation, maintenance and repair of an organization's database.
They are also known by the titles Database Coordinator or Database
Programmer, and the role is closely related to those of the Database Analyst, Database Modeller,
Programmer Analyst, and Systems Manager.
The role includes the development and design of database strategies, monitoring
and improving database performance and capacity, and planning for future
expansion requirements. They may also plan, co-ordinate and implement security
measures to safeguard the database.

8. What is a data model? List the types of data models used.

A data model is a collection of conceptual tools for describing data, data
relationships, data semantics and consistency constraints.
A database model is the theoretical foundation of a database and
fundamentally determines in which manner data can be stored, organized, and
manipulated in a database system. It thereby defines the infrastructure offered by
a particular database system. The most popular example of a database model is
the relational model.
Types of data model used
 Hierarchical model
 Network model
 Relational model
 Entity-relationship
 Object-relational model
 Object model

9. What are the components of storage manager?

 File manager
 Buffer manager
 Authorization and integrity manager
 Transaction manager

10. What is a data dictionary?


A data dictionary is a file or a set of files that contains a database's
metadata. The data dictionary contains records about other objects in the
database, such as data ownership, data relationships to other objects, and
other data. The data dictionary is a crucial component of any relational
database

11. What is an entity relationship model?


An entity-relationship model (ERM) is a theoretical and conceptual way
of showing data relationships in software development. ERM is a
database modeling technique that generates an abstract diagram or visual
representation of a system's data that can be helpful in designing a relational
database.
12. Define single valued and multi valued attributes.
Single Valued attribute: Attributes that can have single value at a
particular instance of time are called single valued. A person can’t have more
than one age value. Therefore, age of a person is a single-valued attribute.
Multi valued attributes: A multi-valued attribute can have more than one
value at one time. For example, degree of a person is a multi-valued attribute
since a person can have more than one degree. Where appropriate, upper and
lower bounds may be placed on the number of values in a multi-valued attribute

13. What are stored, composite and derived attributes?


Stored Attributes: Attribute that cannot be derived from other attributes
are called as stored attributes. Example: Birth date of an employee is a stored
attribute.
Composite attribute : This attribute can be further divided into more
attributes.
Derived attribute : These attributes are derived from other attributes. It
can be derived from multiple attributes and also from a separate table

14. What is meant by the degree of relationship set?


The degree of relationship (also known as cardinality) is the number of
occurrences in one entity which are associated (or linked) to the number of
occurrences in another.

There are three degrees of relationship, known as:

1. one-to-one (1:1)
2. one-to-many (1:M)
3. many-to-many (M:N)

15. Define weak and strong entity sets?


Weak entity set: Entity sets that do not have a key attribute of their own are
called weak entity sets.
Strong entity set: An entity set that has a primary key is termed a strong entity
set.

16. Explain the two types of participation constraints.

There are three Types of Relationship Constraints-

1. Structural Constraints
o Participation Constraints
o Cardinality Ratio
2. Overlap Constraints
3. Covering Constraints

Structural Constraints are applicable for binary relationships and Overlap and
Covering Constraints are applicable for EERD(Extended ER Diagrams).

Participation (or) Optionality Constraints-

Participation concerns the involvement of entities in a relationship. It
specifies whether the existence of an entity depends on another entity. There are
two types of participation constraints:

1. Total/Mandatory Participation
2. Partial/Optional Participation

17. Define the two levels of data independence.

 Physical data independence


o The ability to modify the physical scheme without causing application
programs to be rewritten
o Modifications at this level are usually to improve performance
 Logical data independence
o The ability to modify the conceptual scheme without causing
application programs to be rewritten
o Usually done when logical structure of database is altered

18.Write down any two major responsibilities of a database administrator.

1.Selection of hardware and software

 Keep up with current technological trends


 Predict future changes
 Emphasis on established off the shelf products

2. Managing data security and privacy

 Protection of data against accidental or intentional loss, destruction, or
misuse
 Firewalls
 Establishment of user privileges

19. Define irreducible sets of dependencies.


A functional dependency set S is irreducible if the set has the following three
properties:

1. Each right set of a functional dependency of S contains only one attribute.


2. Each left set of a functional dependency of S is irreducible. It means that
reducing any one attribute from left set will change the content of S (S will
lose some information).
3. Reducing any functional dependency will change the content of S.
Sets of Functional Dependencies(FD) with these properties are also
called canonical or minimal.

20. Define the normal forms.


Database normalization is a technique of organizing the data in the
database. Normalization is a systematic approach of decomposing tables to
eliminate data redundancy and undesirable characteristics like insertion, update
and deletion anomalies. The normal forms are 1NF, 2NF, 3NF, BCNF and DKNF.

21.Give the levels of data abstraction?


a) Physical level
b) Logical level
c) View level

22. Define instance and schema?

Instance: The collection of data stored in the database at a particular moment is
called an instance of the database.

Schema: The overall design of the data base is called the data base schema.

23.Define the terms

1) Physical schema
2) logical schema.
Physical schema: The physical schema describes the database design at the
physical level, which is the lowest level of abstraction describing how the data
are actually stored.
Logical schema: The logical schema describes the database design at the logical
level, which describes what data are stored in the database and what relationship
exists among the data.

24. Mention the actors on the scene.


 Database administrator
 Database designer
 End users

16/10/8 Marks Questions

1. Drawbacks in file processing system. (8)


To allow users to manipulate the information, the system has a number of
application programs that manipulate the files, including
• A program to debit or credit an account
• A program to add a new account
• A program to find the balance of an account
• A program to generate monthly statements
System programmers wrote these application programs to meet the needs of
the bank.
This typical file-processing system is supported by a conventional
operating system. The system stores permanent records in various files, and it
needs different application programs to extract records from, and add records to,
the appropriate files. Before database management systems (DBMSs) came
along, organizations usually stored information in such systems.
Data redundancy and inconsistency
 Multiple file formats, duplication of information in different files

 Data redundancy leads to higher storage and data base access cost.
 Updation is not done properly in the various copies of the same data.
Difficulty in accessing data
 Required data cannot be retrieved in a convenient and efficient manner
Ex.
Requirement: find out the names of the customers who live in a particular
postal code area.
Given: An application program to generate the list of all customers.
Solution:
 Obtain the list of all customers and extract the requirements manually
 Ask a system programmer to write the necessary application program
 Need to write a new program to carry out each new task
Data isolation
 Because data are stored in various files may be in different formats writing
new application programs to retrieve the appropriate data is difficult.
Integrity problems
 Integrity constraints (e.g. account balance > 0) become “buried” in
program code rather than being stated explicitly Hard to add new
constraints or change existing ones
Atomicity of updates
 Failures may leave database in an inconsistent state with partial updates
carried out
Example:
 Transfer of funds from one account to another should either complete or not
happen at all
Concurrent access by multiple users
 Concurrent access is needed for performance

 Uncontrolled concurrent accesses can lead to inconsistencies
 Example: Two people reading a balance and updating it at the same time
Security problems
 Hard to provide user access to some, but not all, data
 Database systems offer solutions to all the above problems

2. Explain DBMS over all System Architecture. (16)


A database system is partitioned into modules that deal with each of
the responsibilities of the overall system. The functional components of a
database system can be broadly divided into the storage manager and the
query processor components.
The storage manager is important because databases typically require
a large amount of storage space. Corporate databases range in size from
hundreds of gigabytes to, for the largest databases, terabytes of data.
The query processor is important because it helps the database system
simplify and facilitate access to data.
Storage Manager

A storage manager is a program module that provides the interface
between the low-level data stored in the database and the application
programs and queries submitted to the system.
Thus, the storage manager is responsible for storing, retrieving, and
updating data in the database. The storage manager components include:

Authorization and integrity manager, which tests for the satisfaction of
integrity constraints and checks the authority of users to access data.

Transaction manager, which ensures that the database remains in a
consistent (correct) state despite system failures, and that concurrent
transaction executions proceed without conflicting.
File manager, which manages the allocation of space on disk storage and
the data structures used to represent information stored on disk.
Buffer manager, which is responsible for fetching data from disk storage
into main memory, and deciding what data to cache in main memory. The
buffer manager is a critical part of the database system, since it enables the
database to handle data sizes that are much larger than the size of main
memory. The storage manager implements several data structures as part of
the physical system implementation:
Data files, which store the database itself.
Data dictionary, which stores metadata about the structure of the
database, in particular the schema of the database.
Indices, which provide fast access to data items that hold particular values.
The Query Processor
The query processor components include DDL interpreter, which
interprets DDL statements and records the definitions in the data dictionary.
DML compiler, which translates DML statements in a query language into
an evaluation plan consisting of low-level instructions that the query
evaluation engine understands. A query can usually be translated into any
of a number of alternative evaluation plans that all give the same result. The
DML compiler also performs Query optimization, that is, it picks the
lowest cost evaluation plan from among the alternatives.
Query evaluation engine, which executes low-level instructions generated
by the DML compiler.
Application Architectures

 Most users of a database system today are not present at the site of the
database system, but connect to it through a network.
 We can therefore differentiate between client machines, on which
remote database users work, and server machines, on which the database
system runs. Database applications are usually partitioned into two or three
parts, as in Figure 1.5. In a two-tier architecture, the application is
partitioned into a component that resides at the client machine, which
invokes database system functionality at the server machine through
query language statements.
 In contrast, in a three-tier architecture, the client machine acts as
merely a front end and does not contain any direct database calls.
Instead, the client end communicates with an application server,
usually through a forms interface.

3. Define data model. Explain the different types of data models with
relevant examples. (16)
Data Models

Data model: a collection of conceptual tools for describing data, data
relationships, data semantics, and consistency constraints.
The various data models that have been proposed fall into the three
different groups
1. Object based logical models
2. Record based logical models
3. Physical models
Object based logical models
object based logical models are used in describing data at the logical and
view levels
There are many different models and more are likely to come
 Entity –relationship model
 object oriented model
 semantic data model
 functional data model
Entity –relationship model
The entity-relationship (E-R) data model is based on a perception of a
real world that Consists of a collection of basic objects, called entities, and
of relationships among these objects.
ENTITY
An entity is a “thing” or “object” in the real world that is distinguishable
from other objects
For example,
Each person is an entity, and bank accounts can be considered as entities.
ATTRIBUTES
Entities are described in a database by a set of attributes
For example,

The attributes account-number and balance may describe one particular
account in a bank, and they form attributes of the account entity set. Similarly,
attributes customer-name, customer-street address and customer-city may
describe a customer entity.
RELATIONSHIP
A relationship is an association among several entities
For example,
A depositor relationship associates a customer with each account that she
has. The set of all entities of the same type and the set of all relationships of the
same type are termed an entity set and relationship set, respectively
E-R diagram
Rectangles, which represent entity sets
Ellipses, which represent attributes
Diamonds, which represent relationships among entity sets
Lines, which link attributes to entity sets and entity sets to
relationships

Object oriented model


 The object oriented model is based on a collection of objects. An object
contains values stored in instance variables within the object.
 Objects that contain the same types of values and the same methods are
grouped together into classes.

Record based logical model


Record based logical model are used in describing data at the logical and
view levels.

Record based logical models are so named because the database is structured
in fixed-format records of several types.
Each record type defines a fixed number of fields, or attributes, and each
field is usually of a fixed length.
The three most widely accepted record based models are the relational,
network and hierarchical models.

Relational Model

The relational model uses a collection of tables to represent both data and
the relationships among those data. Each table has multiple columns, and
each column has a unique name.
Figure 1.3 presents a sample relational database comprising three tables:
One shows details of bank customers, the second shows accounts, and the
third shows which accounts belong to which customers.

The first table, the customer table, shows, for example, that the customer
identified by customer-id 192-83-7465 is named Johnson and lives at 12 Alma
St. in Palo Alto.

The second table, account, shows, for example, that account A-101 has a
balance of $500, and A-201 has a balance of $900.

The third table shows which accounts belong to which customers. For
example, account number A-101 belongs to the customer whose customer-id
is 192-83-7465, namely Johnson, and customers 192-83-7465 (Johnson) and

019-28-3746 (Smith) share account number A-201 (they may share a business
venture).
Other Data Models
 The object-oriented data model is another data model that has seen
increasing attention. The object-oriented model can be seen as extending
the E-R model with notions of encapsulation, methods (functions), and
object identity.

 The object-relational data model combines features of the object-oriented


data model and relational data model.

 Semistructured data models permit the specification of data where


individual data items of the same type may have different sets of attributes.
This is in contrast with the data models mentioned earlier, where every data
item of a particular type must have the same set of attributes.

 The extensible markup language (XML) is widely used to represent


semistructured data.
 Historically, two other data models, the network data model and the
hierarchical data model, preceded the relational data model. These
models were tied closely to the underlying implementation, and
complicated the task of modeling data. As a result they are little used now,
except in old database code that is still in service.
Network model
Data in the network model are represented by collections of records, and
relationships among data are represented by links, which can be viewed as pointers.
Hierarchical model

Data in the hierarchical model are represented by collections of trees rather
than arbitrary graphs.

4. Explain View of Data? (8)


A database system is a collection of interrelated files and a set of
programs that allow users to access and modify these files. A major
purpose of a database system is to provide users with an abstract view of
the data. That is, the system hides certain details of how the data are stored
and maintained.
Data Abstraction
Since many database-systems users are not computer trained,
developers hide the complexity from users through several levels of
abstraction, to simplify users’ interactions with the system:

Physical level. The lowest level of abstraction describes how the data are
actually stored. The physical level describes complex low-level data
structures in detail.

Logical level. The next-higher level of abstraction describes what data are
stored in the database, and what relationships exist among those data. The
logical level thus describes the entire database in terms of a small number
of relatively simple structures.

View level. The highest level of abstraction describes only part of the entire
database. Even though the logical level uses simpler structures, complexity
remains because of the variety of information stored in a large database

Instances and Schemas
The collection of information stored in the database at a particular
moment is called an instance of the database. The overall design of the
database is called the database schema. Schemas are changed infrequently,
if at all.

Types of schema
Database systems have several schemas, partitioned according to the
levels of abstraction. The physical schema describes the database design at
the physical level, while the logical schema describes the database design
at the logical level. A database may also have several schemas at the view
level, sometimes called subschemas, that describe different views of the
database.

Data independence
The ability to modify a schema definition in one level without
affecting a schema definition to the next higher level is called Data
independence

Two levels of Data independence
Physical Data independence
Logical Data independence

Physical Data independence

The ability to modify the physical schema definition at one level
without causing the application programs to be rewritten. Modifications at
this level are usually made to improve performance.

Logical Data independence

The ability to modify the logical schema definition at one level without
causing the application programs to be rewritten. Modifications are
necessary whenever the logical structure of the database is altered.
5. Explain E-R Model in detail with suitable example. (16)
The entity-relationship (E-R) data model perceives the real world as
consisting of basic objects, called entities, and relationships among these
objects. It was developed to facilitate database design by allowing
specification of an enterprise schema, which represents the overall logical
structure of a database.
The E-R data model is one of several semantic data models; the semantic
aspect of the model lies in its representation of the meaning of the data. The
E-R model is very useful in mapping the meanings and interactions of real-
world enterprises onto a conceptual schema. Because of this usefulness,
many database-design tools draw on concepts from the E-R model.

Basic Concepts
Entity Sets
 An entity is a “thing” or “object” in the real world that is distinguishable
from all other objects.

 An entity set is a set of entities of the same type that share the same
properties, or attributes. The set of all persons who are customers at a given
bank, for example, can be defined as the entity set customer

 An entity is represented by a set of attributes. Attributes are descriptive


properties possessed by each member of an entity set. The designation of an
attribute for an entity set expresses that the database stores similar
information concerning each entity in the entity set; however, each entity
may have its own value for each attribute.

 For each attribute, there is a set of permitted values, called the domain, or
value set, of that attribute. The domain of attribute customer-name might be
the set of all text strings of a certain length.
Types of Attributes
 Simple and composite attributes.
 Single-valued and multivalued attributes
 Derived attributes
 Null attributes

Simple and composite attributes.


Simple attributes.

The attributes have been simple; that is, they are not divided into subparts.

Eg: customer id

Composite attributes,

On the other hand, can be divided into subparts (that is, other attributes).
For example,

An attribute name could be structured as a composite attribute consisting of first-


name, middle-initial, and last-name.
Single-valued and multivalued attributes
Single-valued attributes
The attributes in our examples all have a single value for a particular entity.

For example: Register no

Multivalued attributes
An employee may have zero, one, or several phone numbers, and different
employees may have different numbers of phones. This type of attribute is said to
be multivalued

For Example : Hobbies


Derived attribute
The value for this type of attribute can be derived from the values of other
related attributes or
entities.
For example
age is a derived attribute
Dob & current date are the given attributre.
Null attributes:
An attribute takes a null value when an entity does not have a value for it. A null
value may mean:
1. Not applicable – the value does not exist for the entity.
2. Unknown – either the value exists but the information is missing, or it is not
known whether the value exists.
Relationship Sets
Consider, for example, a relationship that associates customer Hayes with loan
L-15. This relationship specifies that Hayes is a customer with loan number L-15. A
relationship set is a set of relationships of the same type. Formally, it is a
mathematical relation on n ≥ 2 (possibly nondistinct) entity sets. If E1, E2, . .
.,En are entity sets, then a relationship set R is a subset of

{(e1, e2, . . . , en) | e1 ∈ E1, e2 ∈ E2, . . . , en ∈ En} where (e1, e2, . . . , en) is a
relationship
Consider the two entity sets customer and loan in Figure 2.1. We define the
relationship set borrower to denote the association between customers and the
bank loans that the customers have.
Constraints
An E-R enterprise schema may define certain constraints to which the contents of
a database must conform. In this section, we examine mapping cardinalities and
participation constraints, which are two of the most important types of
constraints.

Mapping Cardinalities

Mapping cardinalities, or cardinality ratios, express the number of entities to


which another entity can be associated via a relationship set.
Mapping cardinalities are most useful in describing binary relationship sets,
although they can contribute to the description of relationship sets that involve
more than two entity sets.
In this section, we shall concentrate on only binary relationship sets. For a binary
relationship set R between entity sets A and B, the mapping cardinality must be
one of the following:
• One to one. An entity in A is associated with at most one entity in B, and an
entity in B is associated with at most one entity in A. (See Figure 2.4a.)

• One to many. An entity in A is associated with any number (zero or more) of
entities in B. An entity in B, however, can be associated with at most one entity
in A. (See Figure 2.4b.)
• Many to one. An entity in A is associated with at most one entity in B. An
entity in B, however, can be associated with any number (zero or more) of
entities in A. (See Figure 2.5a.)
• Many to many. An entity in A is associated with any number (zero or more) of
entities in B, and an entity in B is associated with any number (zero or more) of
entities in A.
Participation Constraints

The participation of an entity set E in a relationship set R is said to be total


if every entity in E participates in at least one relationship in R. If only some
entities in E participate in relationships in R, the participation of entity set E in
relationship R is said to be partial.

For example, we expect every loan entity to be related to at least one


customer through the borrower relationship. Therefore the participation of loan in
the relationship set borrower is total
Keys

The values of the attributes of an entity must be such that they can
uniquely identify the entity. In other words, no two entities in an entity set are
allowed to have exactly the same value for all attributes.
A key allows us to identify a set of attributes that suffice to distinguish
entities from each other. Keys also help uniquely identify relationships, and thus
distinguish relationships from each other
Types of key
 Super key
 Candidate key
 Primary key
Super key
A super key is a set of one or more attributes that, taken collectively, allows us
to uniquely identify an entity in an entity set.
Ex: (cus_id,cus_name)
Candidate key
A candidate key is a minimal super key; no proper subset of it is itself a
super key.
Ex: (cus_name,Cus_street)
Primary key
A primary key is a candidate key chosen by the database designer to uniquely
identify entities in an entity set.
Ex: (cus_id)

Weak Entity Sets


An entity set may not have sufficient attributes to form a primary key. Such
an entity set is termed a weak entity set. An entity set that has a primary key is
termed a strong entity set.

Entity-Relationship Diagram
 Rectangles, which represent entity sets
 Ellipses, which represent attributes
 Diamonds, which represent relationship sets
 Lines, which link attributes to entity sets and entity sets to relationship sets
 Double ellipses, which represent multivalued attributes
 Dashed ellipses, which denote derived attributes
 Double lines, which indicate total participation of an entity in a
relationshipset
 Double rectangles, which represent weak entity sets

Extended E-R Features


Although the basic E-R concepts can model most database features, some
aspects of a database may be more aptly expressed by certain extensions to the
basic E-R model. In this section, we discuss the extended E-R features of
specialization, generalization, higher- and lower-level entity sets, attribute
inheritance, and aggregation.
Specialization

The process of designating subgroupings within an entity set is called


specialization. The specialization of person allows us to distinguish among
persons according to whether they are employees or customers
In terms of an E-R diagram, specialization is depicted by a triangle
component labeled ISA, as Figure 2.17 shows. The label ISA stands for “is a”
and represents, for example, that a customer “is a” person. The ISA relationship
may also be referred to as a superclass-subclass relationship. Higher- and
lower-level entity sets are depicted in this way.
Generalization
The refinement from an initial entity set into successive levels of entity
subgroupings represents a top-down design process in which distinctions are
made explicit.
The design process may also proceed in a bottom-up manner, in which
multiple entity sets are synthesized into a higher-level entity set on the basis of
common features.
The database designer may have first identified a customer entity set with
the attributes name, street, city, and customer-id, and an employee entity set with
the attributes name, street, city, employee-id, and salary.
Attribute Inheritance
A crucial property of the higher- and lower-level entities created by
specialization and generalization is attribute inheritance. The attributes of the
higher-level entity sets are said to be inherited by the lower-level entity sets.

Constraints on Generalizations
Condition-defined.
In condition-defined lower-level entity sets, membership is evaluated on the basis
of whether or not an entity satisfies an explicit condition or predicate
User-defined.
User-defined lower-level entity sets are not constrained by a membership
condition; rather, the database user assigns entities to a given entity set
Disjoint.
A disjointness constraint requires that an entity belong to no more than one
lower-level entity set.
Overlapping.
In overlapping generalizations, the same entity may belong to more than
one lower-level entity set within a single generalization
Total generalization or specialization
Each higher-level entity must belong to a lower-level entity set.
Partial generalization or specialization
Some higher-level entities may not belong to any lower-level entity set
Aggregation
The best way to model a situation such as the one just described is to use
aggregation. Aggregation is an abstraction through which relationships are
treated as higher-level entities. Thus, for our example, we regard the relationship
set works-on (relating the entity sets employee, branch, and job) as a higher-level
entity set called works-on.
Such an entity set is treated in the same manner as is any other entity set.
We can then create a binary relationship manages between works-on and
manager to represent who manages what tasks.

6. Draw an E – R Diagram for Banking, University, Company y, Airlines,
ATM, Hospital, Library, Super market, Insurance Company.
(16)

7. Explain in details about the various database languages. (16)


Database Languages

A database system provides a data definition language to specify the
database schema and a data manipulation language to express database queries
and updates. In practice, the data definition and data manipulation languages are
not two separate languages; instead they simply form parts of a single database
language, such as the widely used SQL language.
Data-Definition Language
 We specify a database schema by a set of definitions expressed by a special
language called a data-definition language (DDL).
For instance, the following statement in the SQL language defines the account
table:
create table account (account-number char(10), balance integer)
 Execution of the above DDL statement creates the account table. In
addition, it updates a special set of tables called the data dictionary or
data directory.
 A data dictionary contains metadata—that is, data about data. The schema
of a table is an example of metadata. A database system consults the data
dictionary before reading or modifying actual data.
 We specify the storage structure and access methods used by the database
system by a set of statements in a special type of DDL called a data
storage and definition language.
 The data values stored in the database must satisfy certain consistency
constraints. For example, suppose the balance on an account should not fall
below $100. The DDL provides facilities to specify such constraints, as sketched
below. The database system checks these constraints every time the database is
updated.
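
A minimal sketch of such a constraint, reusing the account table above (the
column name is written with an underscore because a hyphen is not a legal SQL
identifier):

create table account (
    account_number char(10) primary key,
    balance        integer check (balance >= 100)  -- every update is checked against this rule
);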
Data-Manipulation Language
Data manipulation is

• The retrieval of information stored in the database
• The insertion of new information into the database
• The deletion of information from the database
• The modification of information stored in the database
A data-manipulation language (DML) is a language that enables users to
access or manipulate data as organized by the appropriate data model. There are
basically two types:
• Procedural DMLs require a user to specify what data are needed and how to
get those data.
• Declarative DMLs (also referred to as nonprocedural DMLs) require a user to
specify what data are needed without specifying how to get those data.
Database Access from Application Programs
Application programs are programs that are used to interact with the database.
Application programs are usually written in a host language, such as Cobol, C,
C++, or Java.
To access the database, DML statements need to be executed from the host
language. There are two ways to do this:
• By providing an application program interface (set of procedures) that can be
used to send DML and DDL statements to the database, and retrieve the results.
The Open Database Connectivity (ODBC) standard defined by Microsoft for use
with the C language is a commonly used application program interface standard.
The Java Database Connectivity (JDBC) standard provides corresponding
features to the Java language.
• By extending the host language syntax to embed DML calls within the host
language program. Usually, a special character prefaces DML calls, and a
preprocessor, called the DML precompiler, converts the DML statements to
normal procedure calls in the host language.

8. Discuss about database users and administrators. (8)
Database Users and Administrators
A primary goal of a database system is to retrieve information from and
store new information in the database. People who work with a database can be
categorized as database users or database administrators.
Database Users and User Interfaces
Naive users
Naive users are unsophisticated users who interact with the system by
invoking one of the application programs that have been written previously. For
example, a bank teller who needs to transfer $50 from account A to account B
invokes a program called transfer
Application programmers
Application programmers are computer professionals who write
application programs. Application programmers can choose from many tools to
develop user interfaces. Rapid application development (RAD) tools are tools
that enable an application programmer to construct forms and reports without
writing a program.
Sophisticated users
Sophisticated users interact with the system without writing programs.
Instead, they form their requests in a database query language. They submit each
such query to a query processor, whose function is to break down DML
statements into instructions that the storage manager understands. Analysts who
submit queries to explore data in the database fall in this category.

Online analytical processing (OLAP)
Online analytical processing (OLAP) tools simplify analysts’ tasks by
letting them view summaries of data in different ways. For instance, an analyst
can see total sales by region (for example, North, South, East, andWest), or by
product, or by a combination of region and product (that is, total sales of each
product in each region).
Another class of tools for analysts is data mining tools, which help them find
certain kinds of patterns in data
Specialized users
Specialized users are sophisticated users who write specialized database
applications that do not fit into the traditional data-processing framework.
Among these applications are computer-aided design systems, knowledge base
and expert systems, systems that store data with complex data types (for
example, graphics data and audio data), and environment-modeling systems.
Database Administrator
One of the main reasons for using DBMSs is to have central control of both
the data and the programs that access those data. A person who has such central
control over the system is called a database administrator (DBA). The
functions of a DBA include:
• Schema definition. The DBA creates the original database schema by executing
a set of data definition statements in the DDL.
• Storage structure and access-method definition.
• Schema and physical-organization modification. The DBA carries out
changes to the schema and physical organization to reflect the changing needs of
the organization, or to alter the physical organization to improve performance.
• Granting of authorization for data access. By granting different types of
authorization, the database administrator can regulate which parts of the database
various users can access. The authorization information is kept in a special

system structure that the database system consults whenever someone attempts to
access the data in the system.
• Routine maintenance. Examples of the database administrator’s routine
maintenance activities are:
Periodically backing up the database, either onto tapes or onto remote servers,
to prevent loss of data in case of disasters such as flooding.
Ensuring that enough free disk space is available for normal operations, and
upgrading disk space as required.
Monitoring jobs running on the database and ensuring that performance is not
degraded by very expensive tasks submitted by some users.

9. What is a normal form? Explain the different types of normal forms.


First Normal Form:
*Tables are said to be in first normal form when:
* The table has a primary key.
* No single attribute (column) has multiple values.
*The non-key attributes (columns) depend on the primary key.
Some examples of placing a table in first normal form are:
author_id: stories:

000024 novelist, playwright // multiple values


000034 magazine columnist // multiple values
002345 novella, newspaper columnist // multiple values

In first normal form the table would look like:


author_id: stories:

000024 novelist
000024 playwright
000034 magazine columnist
002345 novella
002345 newspaper columnist
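
As a hedged sketch (the table and column names are assumed from the example
above), the 1NF version can be declared with a composite primary key, so that no
column ever holds more than one value per row:

create table author_story (
    author_id char(6),
    story     varchar(40),
    primary key (author_id, story)  -- one atomic story value per row
);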

Second Normal Form:
=================
*Tables are said to be in second normal form when:
*The tables meet the criteria for first normal form.
*If the primary key is a composite of attributes (contains multiple
columns), the non key attributes (columns) must depend on the whole
key.
Note: Any table with a primary key that is composed of a single
attribute (column) is automatically in second normal form.

Third Normal Form:


================
*Tables are said to be in third normal form when:
* The tables meet the criteria for second normal form.
*Each non-key attribute does not depend on another non-key attribute;
non-key attributes depend only on the key.

Dependency preservation
A decomposition is dependency preserving if every functional dependency of the
original relation can be checked on one of the decomposed relations alone,
without having to join them.

Databases can be described as all of the following:


Information – a sequence of symbols that can be interpreted as a message.
Information can be recorded as signs, or transmitted as signals.
Data – values of qualitative or quantitative variables, belonging to a set of items.
Data in computing (or data processing) are often represented by a combination of
items organized in rows and multiple variables organized in columns. Data are
typically the results of measurements and can be visualized using graphs or
images.

Computer data – information in a form suitable for use with a computer. Data is
often distinguished from programs. A program is a sequence of instructions that
detail a task for the
computer to perform. In this sense, data is everything in software that is not
program code.

Boyce/codd normal form


Boyce–Codd normal form (or BCNF or 3.5NF) is a normal form used in
database normalization. It is a slightly stronger version of the third normal form
(3NF). BCNF was developed in 1974 by Raymond F. Boyce and Edgar F. Codd
to address certain types of anomaly not dealt with by 3NF as originally defined
Chris Date has pointed out that a definition of what we now know as BCNF
appeared in a paper by Ian Heath in 1971. Date writes:
"Since that definition predated Boyce and Codd's own definition by some three
years, it seems to me that BCNF ought by rights to be called Heath normal form.
But it isn't."

If a relational scheme is in BCNF then all redundancy based on functional


dependency has been removed, although other types of redundancy may still
exist. A relational schema R is in

Boyce–Codd normal form if and only if, for every one of its dependencies X →
Y, at least one of the following conditions holds:
X → Y is a trivial functional dependency (Y ⊆ X), or
X is a superkey for schema R.

Fourth normal form


Fourth normal form (4NF) is a normal form used in database normalization.
Introduced by Ronald Fagin in 1977, 4NF is the next level of normalization after
Boyce–Codd normal form (BCNF). Whereas the second, third, and Boyce–Codd
normal forms are concerned with functional dependencies, 4NF is concerned with
a more general type of dependency known as a multivalued dependency. A table
is in 4NF if and only if, for every one of its non-trivial multivalued dependencies
X →→ Y, X is a superkey (that is, X is either a candidate key or a superset
thereof).
Join dependencies and fifth normal form.
A join dependency is a constraint on the set of legal relations over a database
scheme. A table T is subject to a join dependency if T can always be recreated
by joining multiple tables each having a subset of the attributes of T. If one of
the tables in the join has all the attributes of the
table T, the join dependency is called trivial.

Fifth normal form

Fifth normal form (5NF), also known as project-join normal form (PJ/NF), is
a level of database normalization designed to reduce redundancy in relational
databases recording multi-valued facts by isolating semantically related multiple
relationships. A table is said to be in the 5NF if and only if every join
dependency in it is implied by the candidate keys.

A join dependency *{A, B, … Z} on R is implied by the candidate key(s) of R if


and only if each of A, B, …, Z is a superkey for R.

Only in rare situations does a 4NF table not conform to 5NF. These are situations
in which a
complex real-world constraint governing the valid combinations of attribute
values in the 4NF table is not implicit in the structure of that table. If such a
table is not normalized to 5NF, the burden of maintaining the logical consistency
of the data within the table must be carried partly by the application responsible
for insertions, deletions, and updates to it; and there is a heightened risk that the
data within the table will become inconsistent. In contrast, the 5NF design
excludes the possibility of such inconsistencies.

Notation for Query Trees and Query Graphs


A query tree is a tree data structure that corresponds to a relational algebra
expression. It represents the input relations of the query as leaf nodes of the tree,
and represents the relational algebra operations as internal nodes. An execution
of the query tree consists of executing an internal node operation whenever its
operands are available and then replacing that internal node by the relation that
results from executing the operation. The order of execution of operations starts
at the leaf nodes, which represents the input database relations for the query, and
ends at the root node, which represents the final operation of the query. The
execution terminates when the root node operation is executed and produces the
result relation for the query.

10) Explain the purpose of DBMS


Purpose of Database System:

Database management systems were developed to handle the following


difficulties of typical file-processing systems supported by conventional
operating systems.
(i) Data redundancy and Consistency
(ii) Self Describing Nature (of Database System)
(iii) Data isolation or Abstraction
(iv) Integrity
(v) Atomicity
(vi) Concurrent Access or sharing of Data
(vii) Security
(viii) Support for multiple views of Data

(i) Data redundancy and Consistency:

In file System, each user maintains separate files and programs to manipulate
these files because each requires some data not available from other user‘s files.
This redundancy in defining and storage of data results in
 wasted storage space,
 redundant efforts to maintain common update,
 higher storage and access cost and

 Leads to inconsistency of data, i.e., various copies of the same data may not
agree.

In the database approach, a single repository of data is maintained; it is defined
once and then accessed by various users. Thus redundancy is controlled and the
data is consistent.

(ii) Self Describing Nature of Database System In File System, the structure
of the data file is embedded in the access programs. A database system contains
not only the database itself but also a complete definition or description of
database structure and constraints. This definition is stored in System catalog
which contains information such as structure of each file, type and storage format
of each data item and various constraints on the data. Information stored in the
catalog is called Meta-Data. DBMS is not written for specific applications, hence
it must refer to catalog to know structure of file etc., and hence it can work
equally well with any number of database applications.

(iii) Data Isolation or Abstraction: Conventional File processing systems do


not allow data to be retrieved in convenient and efficient manner. More
responsive data retrieval systems are required for general use. The structure of
the data file is embedded in the access programs. So any changes to the structure
of a file may require changing all programs that access this file. Because data are
scattered in various files and files may be in different formats, writing new
application programs to retrieve appropriate data is difficult. But the DBMS
access programs do not require such changes in most cases. The structure of data
files is stored in DBMS catalog separately from the access programs. This
property is known as program data independence. Operations are separately
specified and can be changed without affecting the interface. User application
programs can operate on data by invoking these operations regardless of how
they are implemented. This property is known as program operation
independence. Program data independence and program operation independence
are together known as data independence.

(iv) Enforcing Integrity Constraints: The data values must satisfy certain types
of consistency constraints. In File System, Developers enforce constraints by
adding appropriate code in application program. When new constraints are
added, it is difficult to change the programs to enforce them. In data base system,
DBMS provide capabilities for defining and enforcing constraints. The
constraints are maintained in system catalog. Therefore application programs
work independently with addition or modification of constraints. Hence integrity
problems are avoided.

(v) Atomicity: A Computer system is subjected to failure. If failure occurs, the


data has to be restored to the consistent state that existed prior to failure. The
transactions must be atomic – it must happen in entirety or not at all. It is difficult
to ensure atomicity in File processing System. In DB approach, the DBMS
ensures atomicity using the Transaction manager inbuilt in it. DBMS supports
online transaction processing and recovery techniques to maintain atomicity.

(vi) Concurrent Access or sharing of Data: When multiple users update the
data simultaneously, it may result in inconsistent data. The system must maintain
supervision which is difficult because data may be accessed by many different
application programs that may have not been coordinated previously. The
database (DBMS) include concurrency control software to ensure that several
programs /users trying to update the same data do so in controlled manner, so
that the result of update is correct.

(vii) Security: Not every user of the database system should be able to access all
the data. But since the application programs are added to the system in an ad hoc
manner, enforcing such security constraints is difficult in file system. DBMS
provide security and authorization subsystem, which the DBA uses to create
accounts and to specify account restrictions.

(viii) Support for multiple views of Data: Database approach support multiple
views of data. A database has many users each of whom may require a different
view of the database. View may be a subset of database or virtual data retrieved
from database which is not explicitly stored. DBMS provide multiple views of
the data or DB. Different application programs are to be written for different
views of data.

UNIT II
1. List the string operations supported by SQL?

BIT_LENGTH( ) - Returns the length of the argument in bits
CHAR( ) - Returns the character for each integer passed
CHAR_LENGTH( ) - Returns the number of characters in the argument
etc.,

2. List the set operations of SQL?

Queries combined with set operators are called compound queries; a short
example follows the list below.


UNION,
UNION ALL,
INTERSECT,
and MINUS.
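
A short sketch using the borrower and depositor relations that appear elsewhere
in this material (assumed to share a customer_name column): UNION returns
customers with a loan or an account, INTERSECT those with both, and MINUS
(EXCEPT in standard SQL) those with a loan but no account.

select customer_name from borrower
union
select customer_name from depositor;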

3. What are aggregate functions? And list the aggregate functions supported
by SQL?

An aggregate function computes a single summary value from a set of values;
an example follows the list below.

AVG() - Returns the average value.
COUNT() - Returns the number of rows.
FIRST() - Returns the first value.
LAST() - Returns the last value.
MAX() - Returns the largest value.
MIN() - Returns the smallest value.
SUM() - Returns the sum.
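
A small example, assuming an account table with a balance column as used in
the earlier bank examples:

select count(*), avg(balance), max(balance), min(balance), sum(balance)
from account;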

4. What is the use of group by clause?


SQL - Group By. The SQL GROUP BY clause is used in collaboration
with the SELECT statement to arrange identical data into groups. The GROUP

BY clause follows the WHERE clause in a SELECT statement and precedes the
ORDER BY clause.
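
A sketch, assuming an account table with branch_name and balance columns;
each branch appears once with its aggregated values:

select branch_name, count(*) as num_accounts, sum(balance) as total_balance
from account
group by branch_name      -- follows WHERE (if present) and precedes ORDER BY
order by branch_name;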

5. What is the use of sub queries?


A Subquery or Inner query or Nested query is a query within another SQL query
and embedded within the WHERE clause. A subquery is used to return data that
will be used in the main query as a condition to further restrict the data to be
retrieved.
An example is given below.
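
A sketch reusing the borrower and depositor relations from the bank examples
(names of customers who have both a loan and an account); the inner query runs
first and its result restricts the outer query:

select customer_name
from borrower
where customer_name in (select customer_name from depositor);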

6. What is view in SQL? How is it defined?


In SQL, a view is a virtual table based on the result-set of
an SQL statement. A view contains rows and columns, just like a real table. The
fields in a view are fields from one or more real tables in the database.
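
A minimal sketch, assuming the loan table from the bank examples:

create view chennai_loans as
select loan_number, branch_name, amount
from loan
where branch_name = 'Chennai';

The view can then be queried like a real table, e.g. select * from chennai_loans;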

7. Write a SQL statement to find the names and loan numbers of all
customers who have a loan at Chennai branch.

select distinct customer-name, loan-number
from borrower, loan
where borrower.loan-number = loan.loan-number
and branch-name = 'Chennai'

8. Consider the following relation :


EMP (ENO, NAME, DATE_OF_BIRTH, SEX, DATE_OF_JOINING,
BASIC_PAY, DEPT)
Develop an SQL query that will find and display the average BASIC_PAY
in each DEPT.
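
A query of the following form produces the required result (the column alias is
only illustrative):

SELECT DEPT, AVG(BASIC_PAY) AS AVG_BASIC_PAY
FROM EMP
GROUP BY DEPT;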

9. What is the use of with clause in SQL?


The WITH clause, also called the subquery factoring clause, lets a query define a
named subquery and reuse it. The WITH clause may be processed as an inline
view or resolved as a temporary table. The advantage of the latter is that repeated
references to the subquery may be more efficient, as the data is easily retrieved
from the temporary table rather than being requeried by each reference.
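A small sketch, assuming the account(branch-name, balance) attributes used elsewhere in this material:

with branch-total (branch-name, total) as
     (select branch-name, sum(balance)
      from account
      group by branch-name)
select branch-name
from branch-total
where total > 100000;

The factored subquery branch-total is defined once and can then be referenced, even more than once, in the main query.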

10. List the table modification commands in SQL?

The commands that change the structure of an existing SQL table are ALTER
TABLE (with its ADD, MODIFY and DROP clauses for columns and constraints),
RENAME, TRUNCATE TABLE and DROP TABLE. The commands that access the
content with SELECT or change rows with INSERT, UPDATE or DELETE are not
addressed here.

11. What is transaction?


A database transaction is a sequence of actions that are treated as a single
unit of work. These actions should either complete entirely or take no effect at
all. Transaction management is an important part of RDBMS-oriented
enterprise applications, ensuring data integrity and consistency.

12. List the SQL domain Types?


An SQL domain is a user-defined, named set of values. Built-in domain types include:
char(n) (or character(n)): fixed-length character string, with user-specified length n.
varchar(n) (or character varying(n)): variable-length character string, with user-specified maximum length n.
int or integer: an integer (length is machine-dependent).

13. Describe a circumstance in which you would choose to use Embedded


SQL rather than using SQL alone.
Embedded SQL is a method of combining the power of a general-purpose
programming language with the database; it is chosen when an application needs
procedural logic (loops, conditions, user interaction) around its database access,
which SQL alone cannot express. Embedded SQL statements are processed by a
special SQL precompiler: the embedded SQL statements are parsed by this
preprocessor and replaced by host-language calls.
The output from the preprocessor is then compiled by the host compiler. This
allows programmers to embed SQL statements in programs written in any
number of languages such as the C programming language family, COBOL,
FORTRAN and Java.

14. What are the four broad categories of constraints?


Domain integrity constraints (data type and CHECK constraints)
Entity integrity constraints (primary key / unique, NOT NULL)
Referential integrity constraints (foreign key)
User-defined integrity constraints
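An illustrative pair of table definitions (the DEPARTMENT table is assumed only for this example) combining these constraint categories:

create table DEPARTMENT
  (DEPT      varchar(20) primary key);

create table EMP
  (ENO       int not null,
   NAME      varchar(50) not null,
   BASIC_PAY numeric(10,2) check (BASIC_PAY >= 0),
   DEPT      varchar(20) references DEPARTMENT(DEPT),
   primary key (ENO));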

15. What is meant by Cost Estimation?


In a DBMS, cost estimation is the approximation of the cost of evaluating a query
or a query-evaluation plan. The cost is usually measured in terms of the number
of disk block transfers and seeks, CPU time, and, in distributed systems,
communication cost. The optimizer compares the estimated costs of alternative
plans and chooses the cheapest one; the estimate has a single total value and may
have identifiable component values.
16. List the SQL statements used for transaction control.
A Transaction Control Language (TCL) is a computer language and a subset
of SQL, used to control transactional processing in a database. A transaction is
a logical unit of work that comprises one or more SQL statements, usually a
group of Data Manipulation Language (DML) statements.
 COMMIT: to save the changes.
 ROLLBACK: to undo the changes.
 SAVEPOINT: creates points within groups of transactions to which a ROLLBACK can later be issued.
 SET TRANSACTION: places a name on a transaction.
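A short usage sketch (the account numbers are only illustrative):

update account set balance = balance - 50 where account-number = 'A-101';
savepoint before_credit;
update account set balance = balance + 50 where account-number = 'A-102';
rollback to savepoint before_credit;   -- undoes only the second update
commit;                                -- makes the remaining change permanent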

17. With an example explain referential integrity.


Referential integrity is a database concept that ensures that relationships
between tables remain consistent. When one table has a foreign key to another
table, referential integrity states that you may not add a record to the table
containing the foreign key unless the referenced record exists, and you may not
delete the referenced record while dependent records still point to it. For
example, if one deletes a donor from the Donor table without also deleting the
corresponding donations from the Donation table, the DonorID field in the
Donation record would refer to a non-existent donor.
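A hedged sketch of how this is declared for the Donor/Donation example (the column names are assumed):

create table Donor
  (DonorID   int primary key,
   DonorName varchar(50));

create table Donation
  (DonationID int primary key,
   DonorID    int,
   foreign key (DonorID) references Donor(DonorID));

With these definitions the DBMS rejects a Donation row whose DonorID does not exist in Donor, and rejects deleting a Donor row that still has Donations (unless an ON DELETE action such as CASCADE is specified).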

18. What is domain integrity? Give example.


Domain integrity specifies that all columns in a relational database must be
declared upon a defined domain. The primary unit of data in the relational data
model is the data item. Such data items are said to be non-decomposable or
atomic. A domain is a set of values of the same type. Domains are therefore
pools of values from which actual values appearing in the columns of a table are
drawn
For example, if you define the attribute of Age, of an Employee entity, is an
integer, the value of every instance of that attribute must always be numeric and
an integer.
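A minimal sketch of enforcing such a domain rule (the column names and range are only examples):

create table Employee
  (EmpID int primary key,
   Age   int check (Age >= 18));

Any insert or update that supplies a non-integer or out-of-range Age is rejected by the DBMS.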

19. List SQL Data types.


Data type - Description
VARCHAR(n) or CHARACTER VARYING(n) - Character string, variable length, maximum length n.
BINARY(n) - Binary string, fixed length n.
BOOLEAN - Stores TRUE or FALSE values.
VARBINARY(n) or BINARY VARYING(n) - Binary string, variable length, maximum length n.

20. Mention SQL Database Objects.


The main SQL database objects are tables, views, indexes, sequences, synonyms
and stored procedures/triggers. A view is a subset of a database and a
personalized model of it. A view can hide data that a user does not need to see,
which simplifies the usage of the system and enhances security. A user who is not
allowed to directly access a relation may be allowed to access a part of that
relation through a view. Views are therefore also called virtual tables.

16/10/8 Marks Questions

1.What is query processing technique?

Query processing refers to the range of activities involved in extracting data
from a database. The activities include translation of queries in high-level
database languages into expressions that can be used at the physical level of the
file system, a variety of query-optimizing transformations, and actual
evaluation of queries.
A database query is the vehicle for instructing a DBMS to update or
retrieve specific data to/from the physically stored medium. The actual updating
and retrieval of data is performed through various "low-level" operations.
Examples of such operations for a relational DBMS are relational algebra
operations such as project, join, select, Cartesian product, etc.

While the DBMS is designed to process these low-level operations efficiently,
it can be quite a burden for a user to submit requests to the DBMS in these
formats. There are three phases that a query passes through during the DBMS'
processing of that query:
1. Parsing and translation
2. Optimization
3. Evaluation
The first step in processing a query submitted to a DBMS is to convert the
query into a form usable by the query processing engine. High-level query
languages such as SQL represent a query as a string, or sequence, of characters.
Certain sequences of characters represent various types of tokens such as
keywords, operators, operands, literal strings, etc. Like all languages, there are
rules (syntax and grammar) that govern how the tokens can be combined into
understandable (i.e. valid) statements.

2. What is a view? How can it be created? Explain with an example.


View
A View is subset of part of a database. It is a personalized model of a database. A
view can hide data that a user does not need to see. Simplifies the usage of the
system and enhance security. A user who is not allowed to directly access a
relation may be allowed to access a part of a relation through view. Views may
also be called as a virtual table.
Provide a mechanism to hide certain data from the view of certain users. To
create a view we use the command:
create view v as <query expression>
where <query expression> is any legal query expression and v is the name of the
view. A common form is:
create view <view name> as select <fields> from <table name>;
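For instance, over the loan relation used below:

create view chennai-loan as
    select loan-number, amount
    from loan
    where branch-name = 'Chennai';

Users can then query chennai-loan like an ordinary table while the rest of the loan relation stays hidden.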
Update of a View
 Create a view of all loan data in loan relation, hiding the amount attribute
 create view branch-loan as select branch-name, loan-number from loan

Add a new tuple to branch-loan


 insert into branch-loan values ('Perryridge', 'L-307')
 This insertion must be represented by the insertion of the tuple
 ('L-307', 'Perryridge', null)
 into the loan relation
 Updates on more complex views are difficult or impossible to translate, and
hence are disallowed.
 Most SQL implementations allow updates only on simple views (without
aggregates) defined on a single relation
Three types:
a) Read-only view
b) Updatable view
c) Destroy view

Advantages of Views:
1. Provide automatic security for hidden data.
2. Different views of same data for different users.
3. Provide logical data independence.
4. Provides the principle of interchangeability and principle of database
relativity.

3. Discuss in detail the operators SELECT, PROJECT, UNION with


suitable examples.

The Select Operation

The select operation selects tuples that satisfy a given predicate. We use the
lowercase Greek letter sigma (σ) to denote selection. The predicate appears as a
subscript to σ.
The argument relation is in parentheses after the σ. Thus, to select those
tuples of the loan relation where the branch is "Perryridge," we write

σ branch-name = "Perryridge" (loan)

The relation that results from the preceding query is as shown in Figure 3.10.

We can find all tuples in which the amount lent is more than $1200 by writing

σ amount > 1200 (loan)

The Project Operation

Suppose we want to list all loan numbers and the amount of the loans, but
do not care about the branch name.

The project operation allows us to produce this relation. The project


operation is a unary operation that returns its argument relation, with certain
attributes left out.
Since a relation is a set, any duplicate rows are eliminated. Projection is
denoted by the uppercase Greek letter pi (Π). We list those attributes that we wish
to appear in the result as a subscript to Π.

The argument relation follows in parentheses. Thus, we write the query to
list all loan numbers and the amount of the loan as

Π loan-number, amount (loan)

The Union Operation

Consider a query to find the names of all bank customers who have either
an account or a loan or both. Note that the customer relation does not contain the
information, since a customer does not need to have either an account or a loan at
the bank.

To answer this query, we need the information in the depositor relation (Figure
3.5) and in the borrower relation (Figure 3.7). We know how to find the names of
all customers with a loan in the bank:

Π customer-name (borrower)

We also know how to find the names of all customers with an account in the
bank:

Π customer-name (depositor)

To answer the query, we need the union of these two sets; that is, we need all
customer names that appear in either or both of the two relations. We find these
data by the binary operation union, denoted, as in set theory, by ∪. So the
expression needed is

Π customer-name (borrower) ∪ Π customer-name (depositor)

4. Explain static and dynamic SQL in detail.

Static Vs Dynamic SQL


Static (embedded) SQL vs Dynamic (interactive) SQL:

1. In static SQL, how the database will be accessed is predetermined (hard coded) in the embedded SQL statement. In dynamic SQL, how the database will be accessed is determined at run time.
2. Static SQL statements are compiled at compile time, so they are faster and more efficient. Dynamic SQL statements are compiled at run time, so they are slower and less efficient.
3. In static SQL, parsing, validation, optimization, and generation of the application plan are done at compile time. In dynamic SQL, they are done at run time.
4. Static SQL is generally used for situations where data is distributed uniformly; dynamic SQL is generally used where data is distributed non-uniformly.
5. Static SQL does not use the EXECUTE IMMEDIATE, EXECUTE and PREPARE statements; dynamic SQL does.
6. Static SQL is less flexible; dynamic SQL is more flexible.

5. Diagrammatically illustrate and discuss the steps involved in processing a


query.
1. Parsing and translation
2. Optimization
3. Evaluation

Parsing and Translation:
 Translates the query into its internal form. This is then translated into
relational algebra
 Parser checks syntax and verifies relations

Evaluation
 The query-execution engine takes a query-evaluation plan, executes that
plan, and returns the answers to the query.

Optimization
 A relational algebra expression may have many equivalent expressions
 E.g., σ balance<2500 (Π balance (account)) is equivalent to Π balance (σ balance<2500 (account))
Each relational algebra operation can be evaluated using one of several
different algorithms

6. Give briefly about Query evaluations cost & Selection operation

Heuristics and Cost Estimates in Query Optimization

A drawback of cost-based optimization is the cost of optimization itself.


Although the cost of query processing can be reduced by clever optimizations,
cost-based optimization is still expensive. Hence, many systems use heuristics to
reduce the number of choices that must be made in a cost-based fashion. Some
systems even choose to use only heuristics, and do not use cost-based
optimization at all.
An example of a heuristic rule is the following rule for transforming
relational algebra queries:

Perform selection operations as early as possible.


A heuristic optimizer would use this rule without finding out whether the
cost is reduced by this transformation. In the first transformation the selection
operation was pushed into a join.
We say that the preceding rule is a heuristic because it usually, but not
always, helps to reduce the cost. For an example of where it can result in an
increase in cost, consider an expression σθ(r ⋈ s), where the condition θ refers
only to attributes in s. The selection can certainly be performed before the join.
However, if r is extremely small compared to s, and if there is an index on the
join attributes of s, but no index on the attributes used by θ, then it is probably a
bad idea to perform the selection early.
Performing the selection early—that is, directly on s—would require doing
a scan of all tuples in s. It is probably cheaper, in this case, to compute the join
by using the index, and then to reject tuples that fail the selection.
The projection operation, like the selection operation, reduces the size of
relations. Thus, whenever we need to generate a temporary relation, it is
advantageous to apply immediately any projections that are possible. This
advantage suggests a companion to the "perform selections early" heuristic:

Perform projections early

It is usually better to perform selections earlier than projections, since


selections have the potential to reduce the sizes of relations greatly, and
selections enable the use of indices to access tuples.
An example similar to the one used for the selection heuristic should
convince you that this heuristic does not always reduce the cost. A heuristic
optimization algorithm will reorder the components of an initial query tree to
achieve improved query execution.
Heuristics can be understood by visualizing a query expression as a tree.

7. How does a DBMS represent a relational query evaluation plan?

QUERY EVALUATION PLAN:

 A query evaluation plan (or simply plan) consists of an extended
relational algebra tree, with additional annotations at
each node indicating:
 The access methods to use for each table.
 The implementation method to use for each relational operator.
 Consider the query: SELECT S.sname FROM Reserves R, Sailors S WHERE
R.sid = S.sid AND R.bid = 100 AND S.rating > 5
This query can be expressed in relational algebra as:
π sname (σ bid=100 ∧ rating>5 (Reserves ⋈ Sailors))
 When the query involves several operators, sometimes the result of one
is pipelined into next.
 In this case, no temporary relation is written to disk (materialized).
 The result is fed to the next operator as soon as it is available.
 It is cheaper.
 When the input table to a unary operator is pipelined into it, we say it
is applied on-the-fly.

8. Since indices speed query processing, why might they not be kept on
several search keys? List as many reasons as possible.
Reasons for not keeping several search indices include:
a. Every index requires additional CPU time and disk I/O overhead
during inserts and deletions.
b. Indices on non-primary keys might have to be changed on updates,
although an index on the primary key might not (this is because updates
typically do not modify the primary-key attributes).
c. Each extra index requires additional storage space.
d. For queries which involve conditions on several search keys, efficiency
might not be bad even if only some of the keys have indices on
them. Therefore, database performance is improved less by adding indices
when many indices already exist.

9. Consider the database given by the following schemes.


Customer (Cust_No, Sales_ Person_No ,City)
Sales_ Person(Sales_ Person_No ,Sales_ Person_Name,
Common_Prec,Year_of_Hire)
Give an expression in SQL for each of the following queries:
Display the list of all customers by Cust_No with the city in which each is
located.
Select Cust_No, City from Customer;
List the names of the sales persons who have accounts in Delhi.
Select Sales_Person_Name from Sales_Person where Sales_Person_No in
(select Sales_Person_No from Customer where City = 'Delhi');

10. Write short notes on the following:


Data Manipulation Language (DML), Data Definition Language (DDL)
Transaction Control Statements (TCS),Data Control Language (DCL)

Data Definition Language (DDL):


 Data definition language is used to create, alter and delete database
objects.
 The commands used are CREATE, ALTER and DROP.
 The principal logical data definition statements are CREATE
TABLE,CREATE VIEW,CREATE INDEX,ALTER TABLE,DROP
TABLE, DROP VIEW and DROP INDEX.
 The SQL DDL allows specification of not only a set of relations, but also
information about each relation, including:
 The schema for each relation.
 The domain of values associated with each attribute.
 The integrity constraints.
 The set of indices to be maintained for each relation.
 The security and authorization information for each relation.
 The physical storage structure of each relation on disk.
Data Manipulation Language (DML):
 Data manipulation language commands let users insert, delete and
modify the data in the database.
 SQL provides three data manipulation statements: INSERT, UPDATE
and DELETE.
Data Control Language (DCL):
 The data control language consists of commands that control user
access to the database objects.
 Thus DCL is mainly related to security issues, i.e., determining who has
access to the database objects and what operations they can perform on
them.
 The task of the DCL is to prevent unauthorized access to data.
 The database administrator has the power to grant and revoke
privileges to a specific user, thus giving or denying access to the data.
 The DCL commands are GRANT and REVOKE.
Transaction Control Language (TCL):
 Transaction control language commands manage the changes made by
DML statements.
 For example, COMMIT makes the changes of a transaction permanent.
 The transaction control commands are COMMIT, ROLLBACK,
SAVEPOINT and SET TRANSACTION.
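One illustrative statement from each category (the student table is only an example):

-- DDL
create table student (roll_no int primary key, name varchar(30));
-- DML
insert into student values (1, 'Anita');
-- DCL
grant select on student to clerk;
-- TCL
commit;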

11. Consider the employee database , where the primary keys are
Underlined.
employee(empname,street,city)
works(empname,companyname,salary)
company(companyname,city)
manages(empname,management)
Give an expression in the relational algebra for each request.
1) Find the names of all employees who work for First Bank
Corporation.
2) Find the names, street addresses and cities of residence of all
employees who work for First Bank Corporation and earn
more than 200000 per annum.
3) Find the names of all employees in this database who live in
the same city as the company for which they work.
4) Find the names of all employees who earn more than every
employees of small Bank Corporation.

a. Π empname (σ companyname = "First Bank Corporation" (works))

b. Π empname, street, city (σ companyname = "First Bank Corporation" ∧ salary > 200000 (works ⋈ employee))

c. Π empname (σ employee.city = company.city (employee ⋈ works ⋈ company))

d. Π empname (works) − Π works.empname (works ⋈ works.salary <= works2.salary (ρ works2 (σ companyname = "Small Bank Corporation" (works))))
(If employees who do not work for any company are also to be considered, Π empname (employee) can be used in place of Π empname (works) in the first term.)

12) Discuss about triggers. How do triggers offer a powerful mechanism?


A trigger is a statement that the system executes automatically as a side
effect of a modification to the database. To design a trigger mechanism, we must
meet two requirements:

1. Specify when a trigger is to be executed. This is broken up into an
event that causes the trigger to be checked and a condition
that must be satisfied for trigger execution to proceed.
2. Specify the actions to be taken when the trigger
executes.
The above model of triggers is referred to as the event-condition-action
model for triggers.
The database stores triggers just as if they were regular data, so that they
are persistent and are accessible to all database operations. Once we enter a
trigger into the database, the database system takes on the responsibility of
executing it whenever the specified event occurs and the corresponding
condition is satisfied.

Need for Triggers

Suppose that, instead of allowing negative account balances, the bank handles an
overdraft by setting the account balance to zero and creating a loan in the amount
of the overdraft. For each tuple t in the account relation whose balance has
become negative, the required actions are:
• Insert a new tuple s in the loan relation with
  s[loan-number] = t[account-number]
  s[branch-name] = t[branch-name]
  s[amount] = −t[balance]
(Note that, since t[balance] is negative, we negate t[balance]
to get the loan amount, a positive number.)
• Insert a new tuple u in the borrower relation with
  u[customer-name] = "Jones"
  u[loan-number] = t[account-number]
• Set t[balance] to 0.
Triggers in SQL
SQL-based database systems use triggers widely, although before SQL:1999
they were not part of the SQL standard. Unfortunately, each database system
implemented its own trigger syntax, leading to incompatibilities.
create trigger overdraft-trigger after update on account
referencing new row as nrow
for each row
when nrow.balance < 0
begin atomic
insert into borrower
(select customer-name, account-number
from depositor
where nrow.account-number = depositor.account-
number);
insert into loan values
(nrow.account-number, nrow.branch-name, −
nrow.balance);
update account set balance = 0
where account.account-number = nrow.account-number
end

UNIT III

1. What do you mean by a transaction?


A transaction is a unit of program execution that accesses and possibly
updates various data items. A transaction consists of collection of operations
used to perform a particular task. Each transaction begins with BEGIN
TRANSACTION statement and ends with END TRANSACTION statement.

An action, or series of actions, carried out by a user or application, which
accesses or changes the contents of the database. It transforms the database from
one consistent state to another, although consistency may be violated during the
transaction.

2. Define the term ACID properties.


ACID (Atomicity, Consistency, Isolation, Durability) ACID is a set
of properties of database transactions. In the context of databases, a single
logical operation on the data is called a transaction.

3. What are the three kinds of intent locks?


There are three types of intent locks: Intent share (IS), intent exclusive
(IX), and share with intent exclusive (SIX). Intent share: The transaction intends
to read but not update data pages and, therefore, takes S locks on them; it
tolerates concurrent transactions taking S, IS, SIX, IX, or U locks.

4. What are two pitfalls (problems) of lock-based protocols?

 Deadlock
 Starvation

5. What is recovery management component?


Ensuring durability is the responsibility of a software component of the
database system called the recovery management component.

6. When is a transaction rolled back?


Any changes that the aborted transaction made to the database must be
undone. Once the changes caused by an aborted transaction have been undone,
then the transaction has been rolled back.

7. What are the states of transaction?

 Active
 Partially committed
 Failed
 Aborted
 Committed
 Terminated

8. What is a shadow copy scheme?
It is a simple scheme called the shadow copy scheme. It is based on making
copies of the database, called shadow copies, and it assumes that only one
transaction is active at a time. The scheme also assumes that the database is
simply a file on disk.

9. Give the reasons for allowing concurrency?


The reason for allowing concurrency is that if transactions run serially, a
short transaction may have to wait for a preceding long transaction to complete,
which can lead to unpredictable delays in running a transaction. Concurrent
execution reduces these unpredictable delays in running transactions.

10. What is average response time?


The average response time is that the average time for a transaction to be
completed after it has been submitted.

11. What are the different modes of lock?


The modes are
 SHARED
 EXCLUSIVE

12. Define deadlock?


Neither of the transactions can ever proceed with its normal execution. This
situation is called a deadlock.

13. Define upgrade and downgrade?


It provides a mechanism for converting a shared lock to an exclusive lock;
this is known as an upgrade.
It provides a mechanism for converting an exclusive lock to a shared lock;
this is known as a downgrade.

14. What is a database graph?


The partial ordering implies that the set D may now be viewed as a directed
acyclic graph, called a database graph.

15. What are the two methods for dealing deadlock problem?

The two methods for dealing deadlock problem is


 Deadlock detection and
 Deadlock recovery
16. What is a recovery scheme?
An integral part of a database system is a recovery scheme that can restore
the database to the consistent state that existed before the failure.

17. What are the two types of errors?

The two types of errors are:


 Logical error
 System error

18. Explain current page table and shadow page table.


The key idea behind the shadow paging technique is to maintain two page
tables during the life of the transaction: the current page table and the shadow
page table. Both page tables are identical when the transaction starts. The
current page table may be changed when a transaction performs a write
operation.

19. What are the drawbacks of shadow-paging technique?

 Commit Overhead
 Data fragmentation
 Garbage collection

20.How the time stamps are implemented


Use the value of the system clock as the timestamp; that is, a transaction's
timestamp is equal to the value of the clock when the transaction enters the
system.
Use a logical counter that is incremented after a new timestamp has been
assigned; that is, the timestamp is equal to the value of the counter.

21.What is meant by log-based recovery?


The most widely used structure for recording database modifications is the
log. The log is a sequence of log records, recording all the update activities in the
database. There are several types of log records.

22.What are the storage types?

The storage types are:


 Volatile storage
 Nonvolatile storage

23.Define blocks?
The database system resides permanently on nonvolatile storage, and is
partitioned into fixed-length storage units called blocks

24. What is meant by Physical blocks?


The input and output operations are done in block units. The blocks
residing on the disk are referred to as physical blocks.

26. What is meant by buffer blocks?


The blocks residing temporarily in main memory are referred to as buffer
blocks

27.Define garbage collection.


Garbage may be created also as a side effect of crashes. Periodically, it is
necessary to find all the garbage pages and to add them to the list of free pages.
This process is called garbage collection.

28.Differentiate strict two phase locking protocol and rigorous two phase
locking
protocol.
In strict two phase locking protocol all exclusive mode locks taken by a
transaction is held until that transaction commits.
Rigorous two phase locking protocol requires that all locks be held until the
transaction commits.

29. Define shadow paging.


An alternative to log-based crash recovery technique is shadow paging. This
technique needs fewer disk accesses than do the log-based methods

30.Explain current page table and shadow page table.


The key idea behind the shadow paging technique is to maintain two page tables
during the life of the transaction: the current page table and the shadow page
table. Both page tables are identical when the transaction starts.

31. Define page.


The database is partitioned into some number of fixed-length blocks, which are
referred to as pages.

32.How the time stamps are implemented
• Use the value of the system clock as the timestamp; that is, a transaction's
timestamp is equal to the value of the clock when the transaction enters the system.
• Use a logical counter that is incremented after a new timestamp has been
assigned; that is, the timestamp is equal to the value of the counter.

33. What are the time stamps associated with each data item?
• W-timestamp (Q) denotes the largest timestamp of any transaction that
executed WRITE (Q) successfully.
• R-timestamp (Q) denotes the largest timestamp of any transaction that executed
READ (Q) successfully.

UNIT:3

16/10/8 Marks Questions

1. Describe about testing of Serializability.


Basic assumption:
 Each transaction preserves database consistency.
 Thus serial execution of a set of transactions preserves database consistency
 A (possibly concurrent) schedule is serializable if it is equivalent to a serial
schedule.
Different forms of schedule equivalence give rise to the notions of:
1. Conflict Serializability
2. View Serializability
Conflict Serializability
A schedule is conflict serializable if it can be transformed into a serial schedule
by a series of swaps of adjacent non-conflicting actions.
Instructions Ii and Ij, of transactions Ti and Tj respectively, conflict if
and only if there exists some item Q accessed by both Ii and Ij, and at least
one of these instructions wrote Q.
1. Ii = read(Q), Ij = read(Q). Ii and Ij don't conflict.
2. Ii = read(Q), Ij = write(Q). They conflict.
3. Ii = write(Q), Ij = read(Q). They conflict.
4. Ii = write(Q), Ij = write(Q). They conflict.
Intuitively, a conflict between Ii and Ij forces a temporal order
between them. If Ii and Ij are consecutive in a schedule and they do not
conflict, their results would remain the same even if they had been
interchanged in the schedule.

If a schedule S can be transformed into a schedule S′ by a series of
swaps of non-conflicting instructions, we say that S and S′ are conflict
equivalent.
A schedule S is conflict serializable if it is conflict equivalent to
a serial schedule.

View Serializability
Let S and S′ be two schedules with the same set of transactions. S and S′ are view
equivalent if the following three conditions are met:
1. For each data item Q, if transaction Ti reads the initial value of Q in
schedule S, then transaction Ti must, in schedule S′, also read the initial value of Q.
2. For each data item Q, if transaction Ti executes read(Q) in schedule S, and
that value was produced by transaction Tj (if any), then transaction Ti must in
schedule S′ also read the value of Q that was produced by transaction Tj.
3. For each data item Q, the transaction (if any) that performs the final write(Q)
operation in schedule S must perform the final write(Q) operation in schedule S′.
A schedule S is view serializable if it is view equivalent to a serial schedule.
Every conflict serializable schedule is also view serializable, but there exist
schedules that are view serializable yet not conflict serializable.

2. Explain the deferred and immediate modification versions of the log


based recovery scheme.
Log based recovery:
Log is the most widely used structure for recording database
modifications. The log is a sequence of log records, recording all the update
activities in the database.
There are several types of log records such as:
a) Update log record: It describes a single database write. It has the following
fields:
• Transaction identifier is the unique identifier of the transaction that
performed the write operation.
• Data item identifier is the unique identifier of the data item written.
Typically, it is the location on the disk of the data item.
• Old value is the value of the data item prior to the write.
• New value is the value of the data item that it will have after the write.
1) Deferred database modification
When a transaction partially commits, the information in the
log associated with the transaction is used in executing the deferred
writes. If a system crashes before the transaction completes its execution
or if transaction aborts, then the information on the log is simply
ignored.
EXAMPLE: Consider two transactions T0 and T1. Transaction T0 transfers $50
from account A to account B, i.e.
T0: read (A);
A := A - 50;
write (A);
read (B);
B := B + 50;
write (B).
Let transaction T1 withdraw $100 from account C, i.e.
T1: read (C);
C := C - 100;
write (C).

2) Immediate database modification
The immediate modification technique allows database
modifications to be output to the database while the transaction is still in
the active state.
The recovery scheme uses two recovery procedures:
i) Undo (Ti): It restores the value of all data items updated by
transaction Ti to the old values.
ii) Redo (Ti): It sets the value of all data items updated by transaction
Ti to the new values.
After a failure has occurred, the recovery scheme consults the log to determine
which transactions need to be redone and which need to be undone.
• Transaction Ti needs to be undone if the log contains the record <Ti
start>, but does not contain the record <Ti commit>.
• Transaction Ti needs to be redone if the log contains both the record
<Ti start> and the record <Ti commit>.
Let us reconsider the banking system, in which transactions T0 and T1 are
executed in the order T0 followed by T1. Suppose that the system crashes before
the completion of the transactions.

3. What are different types of schedules are acceptable for recoverability.

Recoverability
If a transaction Ti fails, we need to undo the effect of this transaction to
ensure the atomicity property of the transaction. In a system that allows
concurrent execution, it is necessary to ensure that any transaction T j that is
dependent on Ti should also be aborted. To achieve this surety, we need to place
restrictions on the type of schedules permitted in the system.
Types of schedules that are acceptable from the viewpoint of recovery from
transaction failure are:
• Recoverable schedules
• Cascadeless schedules
1) Recoverable schedules
A recoverable schedule is one where, for each pair of transactions
Ti and T j such that T j reads a data item previously written by Ti, the
commit operation of Ti appears before the commit operation of T j.
Consider schedule 11 in Fig. 4.29, in which T9 is a transaction that
performs only one instruction: read (A). Suppose that the system allows
T9 to commit immediately after executing the read (A) instruction. Thus,
T9 commits before T8 does. Suppose that T8 fails before it commits.
Since T9 has read the value of data item. A written by T8, we must abort
T9 to ensure transaction atomicity. However, T9 has already committed
and cannot be aborted. Thus, it is impossible to recover correctly from
the failure of T8. Thus, schedule 11 is non-recoverable schedule, which
should not be allowed. Most database system requires that all schedules
be recoverable.
T8                    T9
read (A)
write (A)
                      read (A)
read (B)
2) Cascadeless schedules
Even if a schedule is recoverable, to recover correctly from the
failure of a transaction Ti, we may have to roll back several transactions.
Such situations occur if transactions have read data written by Ti.
Consider schedule 12 of Fig. 4.30. Transaction T1 writes a
value of A that is read by transaction T2. Transaction T2 writes a value
of A that is read by T3. Suppose that, at this point, T1 fails, T1 must be
rolled back. Since T2 is dependent on T1, T2 must be rolled back.
Similarly as T3 is dependent on T2, T3 should also be rolled back. This

phenomenon, in which a single transaction failure leads to a series of
transaction rollbacks, is called cascading rollback.
Cascading rollback is undesirable, since it leads to the
undoing of a significant amount of work. Therefore, schedules should
not contain cascading rollbacks. Such schedules are called cascadeless
schedules.
A cascadeless schedule is one where, for each pair of
transactions Ti and Tj such that Tj reads a data item previously written
by Ti, the commit operation of Ti appears before the read operation of Tj.
T1                T2                T3
read (A)
read (B)
write (A)
                  read (A)
                  write (A)
                                    read (A)

Fig. 4.30 Schedule 12

4. Discuss on strict, two-phase locking protocol and time stamp-based


protocol.

Strict two-phase locking protocol: This protocol requires that locking should be
two-phase, and that all exclusive-mode locks taken by a transaction be held until
the transaction commits. This requirement prevents any transaction from reading
data written by any uncommitted transaction.
Timestamp based protocols
Timestamp based protocols ensure serializability. They select an ordering
among transactions in advance using timestamps.
Timestamps
With each transaction in the system, a unique fixed timestamp is
associated. It is denoted by TS (Ti). This timestamp is assigned by the database
system before the transaction Ti starts execution. If a transaction Ti has been
assigned timestamp TS (Ti), and a new transaction Tj enters the system, then
TS (Ti) < TS (Tj).
Two methods are used for implementing timestamp:
i) Use the value of the system clock as the timestamp, that is, a
transactions timestamp is equal to the value of the clock when the
transaction enters the system.
ii) Use a logical counter, that is a transactions timestamp is equal to
the value of logical counter, when transaction enters the system.
After assigning a new timestamp, value of timestamp is increased.
The timestamps of the transactions determine the serializability order. Thus, if
TS (Ti) > TS (Tj), then the system must ensure that in the produced schedule
transaction Ti appears before transaction Tj.
To implement this scheme, two timestamps are associated with each data
item Q.
i) W-timestamp (Q) denotes the largest timestamp of any transaction
that executed write (Q) successfully.
ii) R-timestamp (Q) denotes the largest timestamp of any transaction
that executed read (Q) successfully.
These timestamps are updated whenever a new read (Q) or write (Q)
instruction is executed.

5. Explain Time stamp-Based Concurrency Control protocol and the


modifications implemented in it.
The timestamp ordering protocol ensures that any conflicting read and write
operations are executed in timestamp order. This protocol operates as follows:
1. Suppose that transaction Ti issues read (Q).
a) If TS (Ti) < W-timestamp (Q), then Ti needs a value of Q that
was already overwritten. Hence, read operation is rejected, and Ti
is rolled back.

b) If TS (Ti) ≥ W-timestamp (Q), then the read operation is
executed, and R-timestamp (Q) is set to the maximum of R-
timestamp (Q) and TS (Ti).
2. Suppose that transaction Ti issues write (Q).
a) If TS (Ti) < R-timestamp (Q), then the value of Q that Ti is
producing was needed previously, and the system assumed that
the value would never be produced. Hence, the system rejects the write
operation and rolls Ti back.
b) If TS (Ti) < W-timestamp (Q), then Ti is attempting to write an
obsolete value of Q. Hence, the system rejects this write
operation and rolls back Ti.
c) Otherwise, the system executes the write operation and sets W-
timestamp (Q) to TS (Ti).
If a transaction Ti is rolled back by the concurrency control scheme, the system
assigns it a new timestamp and restarts it.
Advantages
1) The timestamp ordering protocol ensures conflict serializability. This
is because conflicting operations are processed in timestamp order.
2) The protocol ensures freedom from deadlock, since no transaction
ever waits.
Disadvantage
1) There is a possibility of starvation of long transactions if a sequence of
conflicting short transactions causes repeated restarting of the long
transaction. If a transaction is found to be getting restarted repeatedly,
conflicting transactions need to be temporarily blocked to enable the
transaction to finish.
2) The protocol can generate schedules that are not recoverable.

6. Describe shadow paging recovery techniques.


Shadow paging
An alternative to log-based crash-recovery technique is shadow paging.
The database is partitioned into some number of fixed-length
blocks, which are referred to as pages. The pages are stored in any random order
on disk. Therefore, there should be some way to find the ith page of the database
for any given i. For this purpose, a page table is used. The page table has n entries
which point to n different pages on the disk. Each entry of the page table
contains a pointer to one page on the disk.
The key idea behind the shadow paging technique is to maintain two page
tables during the life of a transaction:
1) Current page table
2) Shadow page table
When a transaction starts, both page tables are identical. The shadow page
table is never changed over the duration of the transaction. The current page table
may be changed when a transaction performs a write operation. All input and
output operations use the current page table to locate database pages on disk.
Suppose that the transaction Tj performs a write(X) operation, and that X
resides on the first page. The system executes the write operation as follows:
• If the first page is not already in main memory, then the system issues
input(X).
• If this is the first write performed on the first page by this transaction, the
system modifies the current page table as follows:
a. It finds an unused page on disk.
b. It deletes the page found in step 2a from the list of free pages, and
copies the contents of the first page to the page found in step 2a.
c. It modifies the current page table so that the first entry points to
the page found in step 2a.
• It assigns the value of Xj to X in the buffer page.
The shadow page approach stores the shadow page table in nonvolatile
storage, so that the state of the database prior to the execution of the transaction
can be recovered in the event of a crash, or transaction abort. When the
transaction commits, the system writes the current page table to nonvolatile
storage. The current page table then becomes the new shadow page table and the
next transaction is allowed to begin execution.
To commit a transaction, do the following:
1. All buffer pages in main memory that have been changed by the
transaction should be output to disk.
2. Output the current page table to disk.
3. Output the disk address of the current page table to the fixed location
in stable storage containing the address of the shadow page table.
Therefore, the current page table has become the shadow page table
and the transaction is committed.
If a crash occurs prior to the completion of step 3, we revert to the state just
before the execution of the transaction. If the crash occurs after the completion of
step 3, the effects of the transaction will be preserved.
Advantages
1) Shadow paging requires fewer disk accesses than do the log-based
recovery methods.
2) The overhead of log-record output is eliminated.
3) Recovery from crashes is significantly faster, as no undo or redo
operation is required.

7. How can you implement atomicity in transactions? Explain.

* Atomicity: Transactions are atomic.

Consider the following example


Transaction to transfer $50 from account A to account B:
read(A)
A := A - 50
write(A)
read(B)
B := B + 50
write(B)
read(X), which transfers the data item X from the database to a local
buffer belonging to the transaction that executed the read operation.
write(X), which transfers the data item X from the local buffer of the
transaction that executed the write back to the database.

Before the execution of transaction Ti the values of accounts A and
B are $1000 and $2000, respectively.
Suppose the transaction fails due to a power failure, hardware
failure or system error, so that transaction Ti does not execute successfully.
If the failure happens after the write(A) operation but before the write(B)
operation, the database will have the values $950 and $2000, which is an
inconsistent state.
The system destroys $50 as a result of the failure and leaves the database
inconsistent.
The basic idea of atomicity is: The database system keeps track of the
old values of any data on which a transaction performs a write, if the
transaction does not terminate successfully then the database system
restores the old values.
Atomicity is handled by transaction-management component.

9. How is concurrency performed? Explain the protocols that are used to
maintain concurrency.

Concurrency Control

In a multiprogramming environment, where more than one transaction can
be executed concurrently, there is a need for protocols to control the
concurrency of transactions to ensure the atomicity and isolation properties of
transactions.

Concurrency control protocols, which ensure serializability of transactions,


are most desirable. Concurrency control protocols can be broadly divided into
two categories:

 Lock based protocols

 Time stamp based protocols

Lock based protocols

Database systems equipped with lock-based protocols use a
mechanism by which any transaction cannot read or write data until it acquires
an appropriate lock on it first. Locks are of two kinds:

 Binary locks: a lock on a data item can be in two states; it is either locked
or unlocked.
 Shared/exclusive: this type of locking mechanism differentiates locks based
on their use. If a lock is acquired on a data item to perform a write
operation, it is an exclusive lock, because allowing more than one transaction
to write on the same data item would lead the database into an inconsistent
state. Read locks are shared because no data value is being changed.

There are four types of lock protocols available:

 Simplistic

Simplistic lock-based protocols allow a transaction to obtain a lock on
every object before a 'write' operation is performed. As soon as the 'write' has
been done, the transaction may unlock the data item.

Pre-claiming

In this protocol, a transaction evaluates its operations and creates a
list of data items on which it needs locks. Before starting the execution,
the transaction requests the system for all the locks it needs beforehand. If all the
locks are granted, the transaction executes and releases all the locks when
all its operations are over. If all the locks are not granted, the
transaction rolls back and waits until all the locks are granted.

Two Phase Locking - 2PL

This locking protocol divides the transaction execution phase into three
parts. In the first part, when the transaction starts executing, it seeks permission
for the locks it needs as it executes. The second part is where the transaction
acquires all the locks and no other lock is required; the transaction keeps
executing its operations. As soon as the transaction releases its first lock, the
third phase starts. In this phase the transaction cannot demand any new lock, but
only releases the acquired locks.

Two-phase locking has two phases: one is growing, where all locks are
being acquired by the transaction, and the second one is shrinking, where the
locks held by the transaction are being released.

To claim an exclusive (write) lock, a transaction must first acquire a shared


(read) lock and then upgrade it to exclusive lock.

Time stamp based protocols

The most commonly used concurrency protocol is the timestamp-based
protocol. This protocol uses either system time or a logical counter as a
timestamp.

Lock-based protocols manage the order between conflicting pairs among
transactions at the time of execution, whereas timestamp-based protocols start
working as soon as a transaction is created.

Every transaction has a timestamp associated with it, and the ordering is
determined by the age of the transaction. A transaction created at clock time
0002 would be older than all other transactions that come after it. For example,
any transaction 'y' entering the system at 0004 is two seconds younger, and
priority may be given to the older one.

In addition, every data item is given the latest read and write timestamps. This lets
the system know when the last read and write operations were made on the data item.

Time-stamp ordering protocol

The timestamp-ordering protocol ensures serializability among transactions
in their conflicting read and write operations. It is the responsibility of the
protocol system that the conflicting pair of tasks is executed according to
the timestamp values of the transactions.

 Time-stamp of Transaction Ti is denoted as TS(Ti).


 Read time-stamp of data-item X is denoted by R-timestamp(X).
 Write time-stamp of data-item X is denoted by W-timestamp(X).

Timestamp ordering protocol works as follows:

 If a transaction Ti issues read(X) operation:


o If TS(Ti) < W-timestamp(X)
 Operation rejected.
o If TS(Ti) >= W-timestamp(X)
 Operation executed.
o All data-item Timestamps updated.

 If a transaction Ti issues write(X) operation:


o If TS(Ti) < R-timestamp(X)
 Operation rejected.
o If TS(Ti) < W-timestamp(X)
 Operation rejected and Ti rolled back.
o Otherwise, operation executed.

UNIT IV

1. What is an index?
An index is a data structure which enables a query to run in sublinear time.
Instead of having to go through all records one by one to identify those which
match its criteria, the query uses the index to filter out those which don't and
focus on those which do.
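A minimal illustration, reusing the EMP relation from Unit II (the index name is assumed):

create index emp_dept_idx on EMP (DEPT);

A query such as select * from EMP where DEPT = 'CSE' can then locate the matching rows through the index instead of scanning the entire table.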

2. What is called remapping of bad sectors?


During the test phases of a hard disk at the factory, the platters are scanned
and all the bad sectors are mapped out into a table or list usually called the
'primary defect list'. The primary defect list is stored within the firmware zone,
or in some cases the ROM, of the hard disk.

3. What is a block and a block number?


A block is a contiguous sequence of sectors from a single track of one
platter. Each request specifies the address on the disk to be referenced. That
address is in the form of a block number.

4. What is called mirroring?


The simplest approach to introducing redundancy is to duplicate every disk. This
technique is called mirroring or shadowing.

5. What is called bit and block -level striping?


Data striping consists of splitting the bits of each byte across multiple disks.
This is called bit level striping
Block-level striping stripes blocks across multiple disks. It treats the array of
disks as a single large disk and gives blocks logical numbers.

6. What are the two main goals of parallelism?


1. Load balance multiple small accesses to increase throughput
2. Parallelize large accesses to reduce response time.
7. What are the factors to be taken into account when choosing a RAID
level?
 Monetary cost
 Performance: Number of I/O operations per second, and bandwidth during
normal operation
 Performance during failure
 Performance during rebuild of failed disk
 Including time taken to rebuild failed disk

8. What are the ways in which the variable-length records arise in database
systems?
 Storage of multiple record types in a file. Record types that allow variable
lengths for one or more fields.
 Record types that allow repeating fields (used in some older data models).
 Byte string representation
 Attach an end-of-record () control character to the end of each record.
 Difficulty with deletion.
 Difficulty with growth.
 Variable-Length Records: Slotted Page Structure

9. What is known as heap, sequential and hashing file organization?


Heap – a record can be placed anywhere in the file where there is space
Sequential – store records in sequential order, based on the value of the search
key of each record.
Hashing – a hash function computed on some attribute of each record; the result
specifies in which block of the file the record should be placed.

10. What are the techniques to be evaluated for both ordered indexing and
hashing?
Ordered indices: search keys are stored in sorted order.
Hash indices: search keys are distributed uniformly across "buckets" using a
"hash function".

11. Describe flash memory.

 Data survives power failure


 Data can be written at a location only once, but location can be erased and
written to again.
 Can support only a limited number of write/erase cycles.
 Erasing of memory has to be done to an entire bank of memory.
 Reads are roughly as fast as main memory.
 Widely used in embedded devices such as digital cameras. Also known as
EEPROM

12. List out the physical storage media.


Volatile storage: loses its contents when power is switched off.
Non-volatile storage: contents persist even when power is switched off.

13. How does B-tree differ from a B+ - tree? Why is a B+ - tree usually
preferred as an access structure to a data file?
In a B+-tree, all search-key values appear in the leaf nodes, and nonleaf nodes
contain only keys used for routing; in a B-tree, search-key values (with their
record pointers) may also appear in nonleaf nodes, so keys are not duplicated.
The B+-tree is usually preferred as an access structure because every search
follows a path of the same length to a leaf, the leaf nodes can be chained for
efficient sequential (range) access, and the index automatically reorganizes itself
with small, local changes in the face of insertions and deletions; reorganization
of the entire file is not required to maintain performance.
These advantages of B+-trees outweigh their disadvantages, and they are used extensively.

14. What are the types of transparencies that a distributed database must
support? Why?
Fragmentation transparency
Replication transparency
Location transparency

15. Give the measures of the quality of a disk.


 Capacity
 Access time
 Seek time
 Data transfer rate
 Reliability
 Rotational latency time.

16. What are the two types of ordered indices?


 Primary index
 Secondary index

17. What are structured data types? What are collection types in particular?
Structured (user-defined) types allow composite attributes, such as those in E-R
designs, to be represented directly as attributes of a relation. Collection types are
types whose values are collections of other values; examples are arrays, sets and
multisets.

18. State the advantages of distributed systems.


Data Replication
 Availability
 Parallelism
 Reduced data transfer
Data Fragmentation
 Data can be stored close to where it is most frequently used
The price paid for these advantages is an increased cost of updates and increased
complexity of concurrency control.

19. What is Data Warehousing?


 Large organizations have complex internal organizations, and have data
stored at different locations, on different operational (transaction
processing) systems, under different schemas
 Data sources often store only current data, not historical data
 Corporate decision making requires a unified view of all organizational
data, including historical data
 A data warehouse is a repository (archive) of information gathered from
multiple sources, stored under a unified schema, at a single site.
 Greatly simplifies querying, permits study of historical trends
 Shifts decision support query load away from transaction processing
systems.

20. What is data mining?


 Data mining is the process of semi-automatically analyzing large databases
to find useful patterns.
 Differs from machine learning in that it deals with large volumes of data
stored primarily on disk
 Pre-processing of data, choice of which type of pattern to find, post
processing to find novel patterns

16/10/8 Marks Questions

1. What is RAID? Explain in detail.

RAID: Redundant Arrays of Independent Disks


a. Disk organization techniques that manage a large number of disks, providing a view of a single disk of
i. high capacity and high speed, by using multiple disks in parallel, and
ii. high reliability, by storing data redundantly, so that data can be recovered even if a disk fails.
» The chance that some disk out of a set of N disks will fail is much higher than the chance that
a specific single disk will fail.
» RAID was originally a cost-effective alternative to large, expensive disks.
RAID Levels
» Schemes to provide redundancy at lower cost by using disk striping combined with parity bits

a. Different RAID organizations, or RAID levels, have differing cost, performance


and reliability characteristics
» RAID Level 0: Block striping; non-redundant.
a. Used in high-performance applications where data loss is not critical.

» RAID Level 1: Mirrored disks with block striping.
a. Offers best write performance.
b. Popular for applications such as storing log files in a database system.

» RAID Level 2: Memory-Style Error-Correcting-Codes (ECC) with bit striping.

» RAID Level 3: Bit-Interleaved Parity


a. a single parity bit is enough for error correction, not just detection, since we know
which disk has failed
i. When writing data, corresponding parity bits must also be computed and written to
a parity bit disk
ii. To recover data in a damaged disk, compute XOR of bits from other disks
(including parity bit disk)

b. Faster data transfer than with a single disk, but fewer I/Os per second since every disk has
to participate in every I/O.
c. Subsumes Level 2 (provides all its benefits, at lower cost).

» RAID Level 4: Block-Interleaved Parity; uses block-level striping, and keeps a parity block on
a separate disk for corresponding blocks from N other disks.
a. Provides higher I/O rates for independent block reads than Level 3
i. block read goes to a single disk, so blocks stored on different disks can be read
in parallel
b. Provides higher transfer rates for reads of multiple blocks than with no striping.
c. Before writing a block, parity data must be computed and written; recomputing the parity
from all the blocks it covers is more efficient when large amounts of data are written sequentially.

» RAID Level 5: Block-Interleaved Distributed Parity; partitions data and parity among all N +
1 disks, rather than storing data in N disks and parity in 1 disk.
a. E.g., with 5 disks, parity block for nth set of blocks is stored on disk (n mod 5) + 1, with
the data blocks stored on the other 4 disks.
b. Higher I/O rates than Level 4.
i. Block writes occur in parallel if the blocks and their parity blocks are on different
disks.
c. Subsumes Level 4: provides same benefits, but avoids bottleneck of parity disk.

» RAID Level 6: P+Q Redundancy scheme; similar to Level 5, but stores extra redundant
information to guard against multiple disk failures.
a. Better reliability than Level 5 at a higher cost; not used as widely.

Choice of RAID Level

» Factors in choosing a RAID level:
a. Monetary cost
b. Performance: number of I/O operations per second, and bandwidth during normal operation
c. Performance during failure
d. Performance during rebuild of a failed disk

i. Including time taken to rebuild failed disk
» RAID 0 is used only when data safety is not important
a. E.g. data can be recovered quickly from other sources
» Levels 2 and 4 are never used, since they are subsumed by levels 3 and 5
» Level 3 is not used anymore since bit-striping forces single block reads to access all disks,
wasting disk arm movement, which block striping (level 5) avoids
» Level 6 is rarely used since levels 1 and 5 offer adequate safety for almost all applications
» So competition is between 1 and 5 only
» Level 1 provides much better write performance than level 5
» Level 1 has a higher storage cost than level 5
» Level 5 is preferred for applications with low update rate, and large amounts of data
» Level 1 is preferred for all other applications.
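
As a hedged illustration of the parity mechanism used by levels 3-5 (this sketch is not part of the original notes; the block contents are made up), the parity block is the bitwise XOR of the corresponding data blocks, so any one lost block can be rebuilt by XOR-ing the surviving blocks with the parity block:

# Minimal sketch of RAID parity (levels 3-5): parity = XOR of data blocks,
# and a lost block is recovered by XOR-ing the survivors with the parity.
def xor_blocks(blocks):
    result = bytearray(len(blocks[0]))
    for block in blocks:
        for i, byte in enumerate(block):
            result[i] ^= byte
    return bytes(result)

data = [b"\x0f\xf0", b"\x33\x33", b"\x55\xaa"]   # blocks on three data disks
parity = xor_blocks(data)                        # stored on the parity disk

# Suppose disk 1 fails: rebuild its block from the other disks plus parity.
recovered = xor_blocks([data[0], data[2], parity])
assert recovered == data[1]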

2. Describe static hashing and dynamic hashing.

3. Describe the different types of file organization? Explain using a sketch of


each of them with their advantages and disadvantages.

» Heap – a record can be placed anywhere in the file where there is space

» Sequential – store records in sequential order, based on the value of the search key of each record

» Hashing – a hash function is computed on some attribute of each record; the result specifies in
which block of the file the record should be placed (see the sketch below).
» Records of each relation may be stored in a separate file. In a clustering file organization,
records of several different relations can be stored in the same file.
o Motivation: store related records on the same block to minimize I/O
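
A small sketch of the hashing organization mentioned above (not from the original notes; it assumes a fixed number of blocks and ignores bucket overflow): the search-key attribute is hashed to choose the block in which a record is stored.

# Sketch of a hashed file organization.
NUM_BLOCKS = 8
blocks = [[] for _ in range(NUM_BLOCKS)]         # each block holds some records

def block_for(value):
    return hash(value) % NUM_BLOCKS              # hash function on the chosen attribute

def insert(record, key_attr):
    blocks[block_for(record[key_attr])].append(record)

def lookup(key_attr, value):
    return [r for r in blocks[block_for(value)] if r[key_attr] == value]

insert({"account_no": "A-102", "balance": 400}, "account_no")
print(lookup("account_no", "A-102"))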

Sequential File Organization


» Suitable for applications that require sequential processing of the entire file
» The records in the file are ordered by a search-key

Figure: Sequential file for account records
Clustering File Organization :
» Simple file structure stores each relation in a separate file
» Can instead store several relations in one file using a clustering file organization

Example: Consider two relations:

The depositor relation and the customer relation

» E.g., clustering organization of customer and depositor:

o good for queries involving the join of depositor and customer, and for queries involving one single customer
and his accounts
o bad for queries involving only customer
o results in variable size records
Clustering File Structure with Pointer Chains

FigureClustering file structure with pointer
chains

4. Briefly write the overall process of data ware housing.

Data Warehouse:

 Large organizations have complex internal organizations, and have data stored at
different locations, on different operational (transaction processing) systems, under
different schemas.
 Data sources often store only current data, not historical data.
 Corporate decision making requires a unified view of all organizational data, including historical data.
 A data warehouse is a repository (archive) of information gathered from
multiple sources, stored under a unified schema, at a single site
 Greatly simplifies querying, permits study of historical trends
 Shifts decision support query load away from transaction processing systems

Components of a Datawarehouse

 When and how to gather data


 Source driven architecture: data sources transmit new information to warehouse, either
continuously or periodically (e.g. at night)
 Destination driven architecture: warehouse periodically requests new information
from data sources
 Keeping warehouse exactly synchronized with data sources (e.g. using two-phase
commit) is too expensive

 Usually OK to have slightly out-of-date data at warehouse


 Data/updates are periodically downloaded from online transaction processing (OLTP) systems.
What schema to use
 Schema integration
Data cleansing
 E.g. correct mistakes in addresses (misspellings, zip code errors)
 Merge address lists from different sources and purge duplicates
 Keep only one address record per household ("householding")
How to propagate updates
 Warehouse schema may be a (materialized) view of schema from data sources
 Efficient techniques for update of materialized views
What data to summarize
 Raw data may be too large to store on-line
 Aggregate values (totals/subtotals) often suffice
 Queries on raw data can often be transformed by query optimizer to use aggregate
values.

WAREHOUSE SCHEMA

 Typically warehouse data is multidimensional, with very large fact tables


 Examples of dimensions: item-id, date/time of sale, store where sale was made,
customer identifier
 Examples of measures: number of items sold, price of items.
 Dimension values are usually encoded using small integers and mapped to full values via dimension tables.
 The resultant schema is called a star schema. More complicated schema structures:
 Snowflake schema: multiple levels of dimension tables
 Constellation: multiple fact tables
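
A small illustrative sketch of the star-schema idea (all table contents are invented): a large fact table stores measures keyed by small integer dimension codes, and dimension tables decode those codes, so totals can be computed per dimension value.

# Tiny star-schema sketch: a fact table of sales keyed by dimension ids.
item_dim  = {1: "shirt", 2: "shoes"}                  # dimension table
store_dim = {10: "Chennai", 20: "Madurai"}            # dimension table

fact_sales = [                                        # fact table (item, store, qty, price)
    {"item": 1, "store": 10, "qty": 3, "price": 500},
    {"item": 2, "store": 10, "qty": 1, "price": 1200},
    {"item": 1, "store": 20, "qty": 2, "price": 500},
]

# Aggregate a measure (total revenue) grouped by the store dimension.
revenue_by_store = {}
for row in fact_sales:
    name = store_dim[row["store"]]
    revenue_by_store[name] = revenue_by_store.get(name, 0) + row["qty"] * row["price"]
print(revenue_by_store)   # {'Chennai': 2700, 'Madurai': 1000}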

5. Illustrate the issues to implement distributed databases.


Definition :
 A distributed database is a database in which portions of the database
are stored on multiple computers within a network.
 Users have access to the portion of the database at their location so that
they can access the data relevant to their tasks without interfering with
the work of others.
Concepts of distributed database:

Distributed Computing System


 Consists of a number of processing elements interconnected by a
computer network that cooperate in processing certain tasks
Distributed Database
 Collection of logically interrelated databases over a computer
network
Distributed DBMS
 Software system that manages a distributed DB

Design of distributed database:

 Fragmentation: Breaking up the database into logical units called


fragments and assigned for storage at various sites.
 Data replication: The process of storing fragments in more than one
site
 Data Allocation: The process of assigning a particular fragment to a
particular site in a distributed system.
 The information concerning the data fragmentation, allocation and
replication is stored in a global directory.
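
A minimal sketch of horizontal fragmentation and allocation (the relation, predicate and site names are invented for illustration): tuples are split into fragments by a predicate on an attribute, each fragment is assigned to a site, and the fragment-to-site mapping plays the role of the global directory.

# Horizontal fragmentation of an 'employee' relation by city,
# with each fragment allocated to a site near that city.
employees = [
    {"eno": 1, "name": "John",  "city": "Chennai"},
    {"eno": 2, "name": "Hayes", "city": "Madurai"},
    {"eno": 3, "name": "Smith", "city": "Chennai"},
]

fragments = {}
for row in employees:                       # fragmentation predicate: city = <value>
    fragments.setdefault(row["city"], []).append(row)

allocation = {"Chennai": "site1", "Madurai": "site2"}   # global directory (fragment -> site)

for city, frag in fragments.items():
    print(allocation[city], "stores", frag)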

Issues on distributed database

Distributed networks have disadvantages, and these must be considered before a system
is decentralized. They include:
COMPLEXITY: Distributed databases that hide the distributed nature from the
user and provide an acceptable level of performance, reliability and availability are more
complex than centralized DBMSs. Data replication, failure recovery, network
management, etc. make the system more complex.
COST: Increased complexity means increased manpower (skilled professional)
requirements, and complex and costly hardware with high procurement and
maintenance costs. Since a distributed DBMS needs more people and more hardware, both of
which are costly, running and maintaining the system can be more expensive than
a centralized system.

TECHNICAL PROBLEMS OF CONNECTING DISSIMILAR MACHINES:
Technical problems can sometimes be overwhelming for a distributed system.
Additional layers of operating system software are needed to translate and
coordinate the flow of data between machines. Sometimes a link between mainframes
and microcomputers can be difficult to establish.
NEED FOR A SOPHISTICATED COMMUNICATION SYSTEM: Distributed
processing requires the development of a data communication system. This system
can be costly to develop and use. In addition, its maintenance can be a costly
affair.
DATA INTEGRITY AND SECURITY PROBLEMS: Because data maintained
by a distributed system can be accessed at many locations in the network, controlling
the integrity of the database can be difficult.
LACK OF PROFESSIONAL SUPPORT: Finally, distributed computers are often
placed at locations where little or no data processing support is available. Consequently,
they will be run by non-professionals. Another aspect is that the communication
system also requires highly trained personnel for its maintenance.

6. Describe the structure of B+ tree and give the algorithm for search in the
B+ tree with example.

TREE INDEXES:
In the case of an index-sequential file, performance degrades as the file grows. This is
because the index lookups and sequential scans take more time as more records are
added to the file. Although this performance degradation can be overcome (to a
certain extent) by reorganizing the file, frequent file reorganizations are undesirable
and add to the file maintenance overheads. One of the index structures that
maintains its efficiency even with the insertion and deletion of data is the B+-tree
index structure. A B+-tree index takes the form of a balanced tree in which every
path from the root to a leaf of the tree is of the same length. Each non-leaf node in the
tree has between n/2 and n children (n/2 <= c <= n), where n is fixed for a
particular tree.

The B+-tree structure creates performance overhead on insertion and deletion
and adds space overhead. This performance overhead is acceptable even for files
with a high modification frequency, because the cost of file reorganization is
eliminated. Also, there will be some amount of wasted space, as some nodes will be
half empty (nodes with n/2 children). This space overhead is also acceptable
when we consider the performance benefits of the B+-tree structure. The figure shows a
node of a B+-tree index. It contains up to n-1 search-key values k1, k2, ..., kn-1, and
n pointers p1, p2, ..., pn. The search-key values within a node are kept in sorted order.

P1 | K1 | P2 | K2 | ... | Pn-1 | Kn-1 | Pn

The pointers p1, p2, ..., pn-1 point either to a file record with the corresponding search-key
value, or to a bucket of pointers, each of which points to a file record with that search-key
value; the bucket structure is used if the search key does not form a primary key or if the
file is not sorted in search-key order. Pointer pn is used to chain together the leaf nodes in
search-key order, allowing efficient sequential processing of the file.
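
Since the answer above describes the node layout but not the lookup itself, here is a hedged sketch of the standard B+-tree search: starting at the root, follow the child pointer for the sub-range containing the search key until a leaf is reached, then look for the key in that leaf. The node representation (plain dictionaries) is simplified for illustration only.

# Simplified B+-tree search sketch.
# Internal node: {"leaf": False, "keys": [...], "children": [...]}  (len(children) == len(keys) + 1)
# Leaf node:     {"leaf": True,  "keys": [...], "records": [...], "next": right sibling or None}
from bisect import bisect_right

def bplus_search(node, key):
    while not node["leaf"]:
        i = bisect_right(node["keys"], key)     # child whose key range may contain the key
        node = node["children"][i]
    for k, rec in zip(node["keys"], node["records"]):
        if k == key:
            return rec
    return None                                 # key not present

leaf1 = {"leaf": True, "keys": [5, 10],  "records": ["r5", "r10"], "next": None}
leaf2 = {"leaf": True, "keys": [20, 30], "records": ["r20", "r30"], "next": None}
leaf1["next"] = leaf2                           # pn-style chaining of leaves
root  = {"leaf": False, "keys": [20], "children": [leaf1, leaf2]}

print(bplus_search(root, 10))    # r10
print(bplus_search(root, 25))    # None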

7. What are the types of Knowledge discovered data mining? Explain with
suitable example.

Data Mining

 Broadly speaking, data mining is the process of semi-automatically analyzing large


databases to find useful patterns.
 Like knowledge discovery in artificial intelligence data mining discovers statistical
rules and patterns
 Differs from machine learning in that it deals with large volumes of data stored
primarily on disk.
 Some types of knowledge discovered from a database can be represented by a set of
rules, e.g.: "Young women with annual incomes greater than $50,000 are most likely
to buy sports cars."
 Other types of knowledge are represented by equations, or by prediction functions.
 Some manual intervention is usually required
 Pre-processing of data, choice of which type of pattern to find, postprocessing to find
novel patterns

Applications of Data Mining

 Prediction based on past history


 Predict if a credit card applicant poses a good credit risk, based on some attributes
(income, job type, age, ..) and past history
 Predict if a customer is likely to switch brand loyalty
 Predict if a customer is likely to respond to "junk mail"
 Predict if a pattern of phone calling card usage is likely to be fraudulent
 Some examples of prediction mechanisms:

Classification
 Given a training set consisting of items belonging to different classes, and a new item
whose class is unknown, predict which class it belongs to.

Regression
 Given a set of parameter-value to function-result mappings for an unknown function,
predict the function result for a new parameter value (e.g. by fitting a formula or curve to the known mappings).

Descriptive Patterns
Associations


 Find books that are often bought by the same customers. If a new customer buys one
such book, suggest that he buys the others too.
 Other similar applications: camera accessories, clothes, etc.
 Associations may also be used as a first step in detecting causation
 E.g. association between exposure to chemical X and cancer, or new medicine and
cardiac problems
Clusters
E.g. typhoid cases were clustered in an area surrounding a contaminated well
Detection of clusters remains important in detecting epidemics
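
As a small hedged sketch of the classification idea above (the training data is invented for illustration): given labelled past examples, predict the class of a new item, here with a simple 1-nearest-neighbour rule on (income, age).

# 1-nearest-neighbour classification sketch: predict credit risk
# for a new applicant from labelled past applicants.
training = [
    ((60000, 30), "good"),
    ((25000, 22), "bad"),
    ((80000, 45), "good"),
    ((20000, 50), "bad"),
]

def classify(income, age):
    def dist(example):
        (inc, ag), _ = example
        return ((income - inc) / 1000.0) ** 2 + (age - ag) ** 2
    return min(training, key=dist)[1]           # class of the closest past example

print(classify(55000, 28))   # 'good' -- closest to the (60000, 30) example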

8. Briefly write the overall process of Multidimensional and Parallel


databases.
MULTIDIMENSIONAL DATABASES

A multidimensional database is a specific type of database that has been optimized for data
warehousing and OLAP (online analytical processing). A multi-dimensional database is
structured by a combination of data from various sources that work amongst databases
simultaneously and that offer networks, hierarchies, arrays, and other data formatting
methods. In a multidimensional database, the data is presented to its users through
multidimensional arrays, and each individual value of data is contained within a cell which
can be accessed by multiple indexes.

A multidimensional database uses the concept of a data cube (also referred to as a


hypercube) to represent the dimensions of data currently available to its user(s). The
multidimensional database concept is designed to assist with decision support systems. This
detailed organization of the data allows for advanced and complex query generation while
providing outstanding performance in certain cases when compared to traditional relational
structures and databases. This type of database is usually structured in an order that
optimizes OLAP and data warehouse applications.
Fig. Multidimensional Database
Dimensions and Members

This section introduces the concepts of outlines, dimensions, and members within a
multidimensional database. If you understand dimensions and members, you are well on
your way to understanding the power of a multidimensional database.
A dimension represents the highest consolidation level in the database outline.

The database outline presents dimensions and members in a tree structure to indicate a
consolidation relationship. Standard dimensions represent the core components of a
business plan and often relate to departmental functions. Typical standard dimensions: Time,
Accounts, Product Line, Market, and Division. Dimensions change less frequently than
members.
Attribute dimensions are associated with standard dimensions. Members are the individual
components of a dimension. For example, Product A, Product B, and Product C might be
members of the Product dimension. Each member has a unique name. Essbase can store the
data associated with a member (referred to as a stored member), or it can
dynamically calculate the data when a user retrieves it.

PARALLEL DATABASES

Data can be partitioned across multiple disks for parallel I/O. Individual relational operations
(e.g., sort, join, aggregation) can be executed in parallel: data can be partitioned and each
processor can work independently on its own partition.
Queries are expressed in a high-level language (SQL, translated to relational algebra), which makes
parallelization easier. Different queries can be run in parallel with each other; concurrency
control takes care of conflicts. Thus, databases naturally lend themselves to parallelism.
Partitioning reduces the time required to retrieve relations from disk by spreading the relations over
multiple disks. Horizontal partitioning – tuples of a relation are divided among many disks
such that each tuple resides on one disk. Partitioning techniques (number of disks = n):

Round-robin: Send the ith tuple inserted in the relation to disk i mod n.

Hash partitioning: Choose one or more attributes as the partitioning attributes. Choose a hash
function h with range 0 ... n-1. Let i denote the result of the hash function h applied to the
partitioning attribute value of a tuple. Send the tuple to disk i.
Range partitioning: Choose an attribute as the partitioning attribute. A partitioning vector
[v0, v1, ..., vn-2] is chosen. Let v be the partitioning attribute value of a tuple. Tuples such
that vi <= v < vi+1 go to disk i + 1. Tuples with v < v0 go to disk 0 and tuples with v >= vn-2 go
to disk n-1. E.g., with a partitioning vector [5,11], a tuple with a partitioning attribute value of
2 will go to disk 0, a tuple with value 8 will go to disk 1, while a tuple with value 20 will go
to disk 2.
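
The three partitioning techniques above can be sketched as follows (a hedged illustration in plain Python, not any vendor's partitioning syntax); the range example reproduces the [5, 11] vector used in the text.

# Sketch of the three data-partitioning techniques across n disks.
n = 3

def round_robin(i):                      # i = insertion order of the tuple
    return i % n

def hash_partition(value):               # value of the partitioning attribute
    return hash(value) % n

def range_partition(value, vector=(5, 11)):
    # vector [v0, v1]: v < 5 -> disk 0, 5 <= v < 11 -> disk 1, v >= 11 -> disk 2
    for i, v in enumerate(vector):
        if value < v:
            return i
    return len(vector)

print(range_partition(2), range_partition(8), range_partition(20))   # 0 1 2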

INTERQUERY PARALLELISM Queries/transactions execute in parallel with one


another. Increases transaction throughput; used primarily to scale up a transaction processing
system to support a larger number of transactions per second. Easiest form of parallelism to
support, particularly in a shared-memory parallel database, because even sequential database
systems support concurrent processing. More complicated to implement on shared-disk or
shared-nothing architectures Locking and logging must be coordinated by passing messages
between processors. Data in a local buffer may have been updated at another processor.

Cache-coherency has to be maintained — reads and writes of data in buffer must find latest
version of data.

INTRAQUERY PARALLELISM Execution of a single query in parallel on multiple


processors/disks; important for speeding up long-running queries. Two complementary
forms of intraquery parallelism :

Intraoperation Parallelism – parallelize the execution of each individual operation in the


query.

Interoperation Parallelism – execute the different operations in a query expression in
parallel.

The first form scales better with increasing parallelism, because the number of tuples
processed by each operation is typically larger than the number of operations in a query.

9. Describe the structure of multimedia databases.


MULTIMEDIA DATABASE

Multimedia databases provide features that allow users to store and query different types of
multimedia information, which includes images (such as photos or drawings), video clips (such as movies,
newsreels, or home videos), audio clips (such as songs, phone messages, or speeches), and documents (such
as books or articles). The main types of database queries that are needed involve locating multimedia sources
that contain certain objects of interest.
For example, one may want to locate all video clips in a video database that include a certain person,
say Michael Jackson. One may also want to retrieve video clips based on certain activities included in

them, such as video clips where a soccer goal is scored by a certain player or team. The above types of
queries are referred to as content-based retrieval, because the multimedia source is being retrieved based
on its containing certain objects or activities.
Hence, a multimedia database must use some model to organize and index the multimedia sources
based on their contents. Identifying the contents of multimedia sources is a difficult and time-consuming
task. There are two main approaches.
 The first is based on automatic analysis of the multimedia sources to identify certain
mathematical characteristics of their contents. This approach uses different techniques
depending on the type of multimedia source (image, video, audio, or text).
 The second approach depends on manual identification of the objects and activities of
interest in each multimedia source and on using this information to index the sources. This
approach can be applied to all multimedia sources
An image is typically stored either in raw form as a set of pixel or cell values, or in compressed form
to save space. The image shape descriptor describes the geometric shape of the raw image, which is typically
a rectangle of cells of a certain width and height. Hence, each image can be represented by an m by n grid of
cells, where each cell contains a pixel value that describes the cell contents.

4.14.1 Automatic Analysis of Images

Analysis of multimedia sources is critical to support any type of query or search interface. We need to
represent multimedia source data such as images in terms of features that would enable us to define
similarity. The work done so far in this area uses low-level visual features such as color, texture, and shape,
which are directly related to the perceptual aspects of image content. These features are easy to extract and
represent, and it is convenient to design similarity measures based on their statistical properties.
o Color is one of the most widely used visual features in content-based image retrieval since it
does not depend upon image size or orientation.
o Retrieval based on color similarity is mainly done by computing a color histogram for each
image that identifies the proportion of pixels within an image for the three color channels
(red, green, blue—RGB).
o However, RGB representation is affected by the orientation of the object with respect to
illumination and camera direction.
o Therefore, current image retrieval techniques compute color histograms using competing
invariant representations such as HSV (hue, saturation, value).

o HSV describes colors as points in a cylinder whose central axis ranges from black at the
bottom to white at the top with neutral colors between them.
o The angle around the axis corresponds to the hue, the distance from the axis corresponds to
the saturation, and the distance along the axis corresponds to the value (brightness).
o Texture refers to the patterns in an image that present the properties of homogeneity that do
not result from the presence of a single color or intensity value.
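
A minimal sketch of the color-histogram idea described above (a real system would typically work in HSV and on actual image arrays via an image library; here each pixel is simply an (r, g, b) tuple and the pixel data is made up):

# Coarse RGB colour histogram for content-based image retrieval:
# quantize each channel into 4 bins and record the proportion of pixels per bin.
def color_histogram(pixels, bins=4):
    hist = {}
    for r, g, b in pixels:
        key = (r * bins // 256, g * bins // 256, b * bins // 256)
        hist[key] = hist.get(key, 0) + 1
    total = len(pixels)
    return {k: v / total for k, v in hist.items()}

def histogram_distance(h1, h2):
    keys = set(h1) | set(h2)
    return sum(abs(h1.get(k, 0) - h2.get(k, 0)) for k in keys)

img_a = [(250, 10, 10), (240, 20, 15), (10, 200, 10)]     # mostly red image
img_b = [(245, 5, 5), (235, 30, 20), (12, 210, 25)]       # a similar image
print(histogram_distance(color_histogram(img_a), color_histogram(img_b)))   # small value => similar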

4.14.2 .Object Recognition in Images:

o Object recognition is the task of identifying real-world objects in an image or a video


sequence. The system must be able to identify the object even when the images of the object
vary in viewpoints, size, scale, or even when they are rotated or translated. Some approaches
have been developed to divide the original image into regions based on similarity of
contiguous pixels.
o Thus, in a given image showing a tiger in the jungle, a tiger subimage may be detected
against the background of the jungle, and when compared with a set of training images, it
may be tagged as a tiger. The representation of the multimedia object in an object model is
extremely important.
o One approach is to divide the image into homogeneous segments using a homogeneous
predicate. For example, in a colored image, adjacent cells that have similar pixel values are
grouped into a segment.
o The homogeneity predicate defines conditions for automatically grouping those cells.
Segmentation and compression can hence identify the main characteristics of an image.
o Another approach finds measurements of the object that are invariant to transformations. It is
impossible to keep a database of examples of all the different transformations of an image.
o To deal with this, object recognition approaches find interesting points (or features) in an
image that are invariant to transformations.

4.14.3 .Semantic tagging of Images:

o The notion of implicit tagging is an important one for image recognition and comparison.
Multiple tags may attach to an image or a subimage: for instance, in the example we referred
to above, tags such as “tiger,” “jungle,” “green,” and “stripes” may be associated with that
image.
o Most image search techniques retrieve images based on user-supplied tags that are often not
very accurate or comprehensive. To improve search quality, a number of recent systems aim
at automated generation of these image tags.

o In case of multimedia data, most of its semantics is present in its content. These systems use
image-processing and statistical-modeling techniques to analyze image content to generate
accurate annotation tags that can then be used to retrieve images by content. Since different
annotation schemes will use different vocabularies to annotate images, the quality of image
retrieval will be poor.
o To solve this problem, recent research techniques have proposed the use of concept
hierarchies, taxonomies, or ontologies using OWL (Web Ontology Language), in which
terms and their relationships are clearly defined.
o These can be used to infer higher-level concepts based on tags. Concepts like “sky” and
“grass” may be further divided into “clear sky” and “cloudy sky” or “dry grass” and “green
grass” in such a taxonomy.
o These approaches generally come under semantic tagging and can be used in conjunction
with the above feature-analysis and object-identification strategies.

4.14.4 .Analysis of Audio Data Sources

 Audio sources are broadly classified into speech, music, and other audio data. Each of these
is significantly different from the others; hence different types of audio data are treated
differently.
 Audio data must be digitized before it can be processed and stored. Indexing and retrieval of
audio data is arguably the toughest among all types of media, because like video, it is
continuous in time and does not have easily measurable characteristics such as text.
 Clarity of sound recordings is easy to perceive humanly but is hard to quantify for machine
learning. Interestingly, speech data often uses speech recognition techniques to aid the actual
audio content, as this can make indexing this data a lot easier and more accurate.
 This is sometimes referred to as text-based indexing of audio data. The speech metadata is
typically content dependent, in that the metadata is generated from the audio content, for
example, the length of the speech, the number of speakers, and so on.
 However, some of the metadata might be independent of the actual content, such as the
length of the speech and the format in which the data is stored.
 Music indexing, on the other hand, is done based on the statistical analysis of the audio
signal, also known as content-based indexing. Content-based indexing often makes use of the
key features of sound: intensity, pitch, timbre, and rhythm.
 It is possible to compare different pieces of audio data and retrieve information from them
based on the calculation of certain features, as well as application of certain transforms.

10. Explain the architecture of mobile and web database with neat sketch.

Mobile Computing Architecture

The general architecture of a mobile platform

It is a distributed architecture where a number of computers, generally referred to as Fixed


Hosts and Base Stations are interconnected through a high-speed wired network.
 Fixed hosts are general purpose computers configured to manage mobile units.
 Base stations function as gateways to the fixed network for the mobile units.

Wireless Communications –
 The wireless medium have bandwidth significantly lower than those of a wired network.
 The current generation of wireless technology has data rates range from the tens to
hundreds of kilobits per second (2G cellular telephony) to tens of megabits per second
(wireless Ethernet, popularly known as WiFi).
 Modern (wired) Ethernet, by comparison, provides data rates on the order of hundreds of
megabits per second.
 The other characteristics distinguish wireless connectivity options:
 interference,
 locality of access,
 range,
 support for packet switching,
 seamless roaming throughout a geographical region.
 Some wireless networks, such as WiFi and Bluetooth, use unlicensed areas of the
frequency spectrum, which may cause interference with other appliances, such as cordless
telephones.
 Modern wireless networks can transfer data in units called packets, that are used in wired
networks in order to conserve bandwidth.

Client/Network Relationships –

 Mobile units can move freely in a geographic mobility domain, an area that is
circumscribed by wireless network coverage.
 To manage it, the entire mobility domain is divided into one or more smaller domains, called
cells, each of which is supported by at least one base station.
 Mobile units can move unrestricted throughout the cells of the domain, while maintaining
information access contiguity.
 The communication architecture described earlier is designed to give the mobile unit the
impression that it is attached to a fixed network, emulating a traditional client-server
architecture.
 Wireless communications, however, make other architectures possible.
 In a MANET, co-located mobile units do not need to communicate via a fixed network,
but instead form their own network using cost-effective technologies such as Bluetooth.
 In a MANET, mobile units are responsible for routing their own data, effectively acting as
base stations as well as clients.
 Moreover, they must be robust enough to handle changes in the network topology, such as
the arrival or departure of other mobile units.
 MANET applications can be considered as peer-to-peer, meaning that a mobile unit is
simultaneously a client and a server.
 Transaction processing and data consistency control become more difficult since there is
no central control in this architecture.
 Resource discovery and data routing by mobile units make computing in a MANET even
more complicated.
 Sample MANET applications are multi-user games, shared whiteboard, distributed
calendars, and battle information sharing.

Characteristics of Mobile Environments

 The characteristics of mobile computing include:

 Communication latency
Intermittent connectivity
 Limited battery life
 Changing client location
 The server may not be able to reach a client.
 A client may be unreachable because it is dozing – in an energy-conserving state in which
many subsystems are shut down – or because it is out of range of a base station.
 In either case, neither client nor server can reach the other, and modifications must be
made to the architecture in order to compensate for this case.
 Proxies for unreachable components are added to the architecture.
 For a client (and symmetrically for a server), the proxy can cache updates intended for the
server.
 Mobile computing poses challenges for servers as well as clients.
 The latency involved in wireless communication makes scalability a problem.

 Since latency due to wireless communications increases the time to service each client
request, the server can handle fewer clients.
 One way servers relieve this problem is by broadcasting data whenever possible.
 A server can simply broadcast data periodically.
 Broadcast also reduces the load on the server, as clients do not have to maintain active
connections to it. Client mobility also poses many data management challenges.
 Servers must keep track of client locations in order to efficiently route messages to them.
 Client data should be stored in the network location that minimizes the traffic necessary
to access it.
 The act of moving between cells must be transparent to the client.
 The server must be able to gracefully divert the shipment of data from one base station to another,
without the client noticing.
 Client mobility also allows new applications that are location-based.

WEB DATABASES

A web database is a system for storing information that can then be accessed via a website.
For example, an online community may have a database that stores the username, password,
and other details of all its members.

The most commonly used database system for the internet is MySQL due to its integration
with PHP — one of the most widely used server side programming languages.

At its most simple level, a web database is a set of one or more tables that contain data. Each
table has different fields for storing information of various types. These tables can then be
linked together in order to manipulate data in useful or interesting ways. In many cases, a
table will use a primary key, which must be unique for each entry and allows for
unambiguous selection of data.

A web database can be used for a range of different purposes. Each field in a table has to
have a defined data type. For example, numbers, strings, and dates can all be inserted into a
web database. Proper database design involves choosing the correct data type for each field
in order to reduce memory consumption and increase the speed of access. Although for small
databases this often isn't so important, big web databases can grow to millions of entries and
need to be well designed to work effectively.
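
As a hedged illustration of the table/primary-key idea above (using Python's built-in sqlite3 instead of MySQL/PHP, with made-up column names):

# Minimal web-database style table: members identified by a primary key.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE members (
                  id INTEGER PRIMARY KEY,
                  username TEXT NOT NULL UNIQUE,
                  password_hash TEXT NOT NULL,
                  joined DATE)""")
conn.execute("INSERT INTO members (username, password_hash, joined) VALUES (?, ?, ?)",
             ("hayes", "e3b0c442...", "2024-01-01"))
row = conn.execute("SELECT id, username FROM members WHERE username = ?",
                   ("hayes",)).fetchone()
print(row)          # (1, 'hayes')
conn.close()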

UNIT V

1. Define mobile database with an example.

A mobile database is a database that resides on mobile device such as a
PDA, a smart phone, or a laptop. Such devices are often limited in resources such
as memory, computing power and battery power

2. List the markup languages which are suitable for web databases.
Markup languages commonly used with web databases include HTML/XHTML for
presenting data in web pages and XML (with DTD or XML Schema) for exchanging
structured data. A web database itself is a system for storing information that can then
be accessed via a website; for example, an online community may have a database
that stores the username, password, and other details of all its members. The
most commonly used database system for the internet is MySQL, due to its
integration with PHP — one of the most widely used server-side programming
languages.

3. Write two examples of multimedia databases and multimedia structure.


 Digital library
 Geographic information system

4. Define spatial database.


A spatial database is a database that is optimized to store query data that
represents objects defined in geometric space. Most spatial databases allow
representing simple geometric objects such as points, lines and polygons. Some
spatial databases handle more complex structures such as 3D objects, topological
coverages, etc.

5. Differentiate distributed database and normal database.


Distributed Database:
A distributed database is a single logical database whose data is stored on multiple,
independent, networked and physically separate computers (sites), managed so that the
distribution is largely transparent to users.
Normal (Centralized) Database:
A normal database is a database in which data is stored and maintained at a
single location.

6. How transaction is performed in Object oriented database?


These transactions can access and update the same persistent databases
and/or the same persistent data objects. The DBMS must guarantee the
consistency of the persistent database and the transaction result.
7. What is versioning in terms of object oriented database?
Versioning allows maintaining multiple versions of an object , and
OODBMS provide capabilities for dealing with all versions of the objects .

8. Specify the advantages of Data warehousing.
A data warehouse is a repository (archive) of information gathered from
multiple sources, stored under a unified schema at a single site.
 Greatly simplifies querying, permits study of historical trends
 Shifts decision support query load away from transaction processing
systems

9. How spatial databases are more helpful than active database?


Whereas most databases store and retrieve information numerically, spatial
databases do so in relation to space. One field that uses spatial
databases to store, retrieve, and analyze information is geography.

10. What is deductive database?


A deductive database is a database system that can make deductions (i.e.,
conclude additional facts) based on rules and facts stored in the
(deductive)database. Datalog is the language typically used to specify facts,
rules and queries in deductive databases

11. Briefly explain about applications of data warehousing.

Marketing
Finance
Resource optimization
Image Analysis

11. Define Database Security.


Database security concerns the use of a broad range of information
security controls to protect databases (potentially including the data, the database
applications or stored functions, the database systems, the database servers and
the associated network links) against compromises of their confidentiality,
integrity and availability

12. Illustrate about Data Classification.


It involves various types or categories of controls, such as technical,
procedural / administrative and physical. Database security is a specialist topic
within the broader realms of computer security, information security and risk
management.
13. Define Threats and risks.

 LOSS OF INTEGRITY
 LOSS OF AVAILABILITY
 LOSS OF CONFIDENTIALITY

14. How Database access Control.

A DBMS typically includes a database security and authorization subsystem
that is responsible for securing portions of a database against
unauthorized access.

18. Give types of Privileges.


 System privileges
 Object privileges

19. Define Cryptography.


A DBMS can use encryption to protect information in certain situations
where the normal security mechanisms of the DBMS are not adequate.
Example: hackers may access our data without our permission.

16/10/8 Marks Questions

1. Give XML representation of bank management system and also explain


about Document Type Definition and XML schema.

1.Data-centric XML documents:


These documents have many small data items that follow a
specific structure, and hence may be extracted from a structured database. They
are formatted as XML documents in order to exchange them or display them over
the Web.
2.Document-centric XML documents:
These are documents with large amounts of text, such as
news articles or books. There is little or no structured data elements in these
documents.
3.Hybrid XML documents:
These documents may have parts that contains structured
data and other parts that are predominantly textual or unstructured.

STRUCTURE OF XML

Tag: label for a section of data


Element: section of data beginning with <tagname> and ending with the matching
</tagname>. Elements must be properly nested.
Proper nesting
<account> … <balance> …. </balance> </account>
Improper nesting
<account> … <balance> …. </account> </balance>
Formally: every start tag must have a unique matching end tag that is in the
context of the same parent element. Every document must have a single top-level
element.

Example
<bank-1>
  <customer>
    <customer_name> Hayes </customer_name>
    <customer_street> Main </customer_street>
    <customer_city> Harrison </customer_city>
    <account>
      <account_number> A-102 </account_number>
      <branch_name> Perryridge </branch_name>
      <balance> 400 </balance>
    </account>
    <account> … </account>
  </customer>
</bank-1>
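
A hedged sketch of how such a document could be queried programmatically (using Python's standard xml.etree.ElementTree; the tag names follow the example above, and the document here is a shortened copy of it):

# Parse the bank XML shown above and print account balances per customer.
import xml.etree.ElementTree as ET

doc = """<bank-1>
  <customer>
    <customer_name> Hayes </customer_name>
    <account>
      <account_number> A-102 </account_number>
      <balance> 400 </balance>
    </account>
  </customer>
</bank-1>"""

root = ET.fromstring(doc)
for customer in root.findall("customer"):
    name = customer.findtext("customer_name").strip()
    for acct in customer.findall("account"):
        print(name, acct.findtext("account_number").strip(),
              acct.findtext("balance").strip())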

2)Data Warehousing and Mining


Data sources often store only current data, not historical data.
Corporate decision making requires a unified view of all
organizational data, including historical data. A data warehouse is a
repository (archive) of information gathered from multiple sources,
stored under a unified schema, at a single site. Greatly simplifies
querying , permits study of historical trends Shifts decision support
query load away from transaction processing systems.

Figure: Data warehouse architecture – data from data source 1 ... data source n flows through
data loaders into the data warehouse DBMS, which is then accessed by query and analysis tools.
DESIGN ISSUES
When and how to gather data: In a source-driven architecture, data sources
transmit new information to the warehouse, either continuously or periodically (e.g.
at night). In a destination-driven architecture, the warehouse periodically requests new
information from data sources. Keeping the warehouse exactly synchronized with
data sources (e.g. using two-phase commit) is too expensive.
What schema to use
Schema integration
Data cleansing
E.g. correct mistakes in addresses (misspellings, zip errors)
Merge address lists from different sources and purge duplicates
How to propagate updates
Warehouse schema may be a (materialized) view of schema from data sources
What data to summarize
Raw data may be too large to store on-line. Aggregate values (totals/subtotals)
often suffice, and queries on raw data can often be transformed by the query optimizer
to use aggregate values.
Dimension values are usually encoded using small integers and mapped to full
values via dimension tables. Resultant schema is called a star schema. More
complicated schema structures

 Snowflake schema: multiple levels of dimension tables


 Constellation: multiple fact tables

DATA MINING
Data mining is the process of semi-automatically analyzing large databases to
find useful patterns, e.g. to predict if a credit card applicant poses a good credit risk
based on some attributes (income, job type, age, ...) and past history, or to predict if a
pattern of phone calling card usage is likely to be fraudulent.

3)Information Retrieval

Definition:
Information Retrieval is a problem-oriented discipline, concerned
with the problem of the effective and efficient transfer of desired information
between human generator and human user.
Components of IR:
Three major components
1. Document Subsystem
a) Acquisition
b) Representation
c) File Organization
2. User Subsystem
a) Problem
b) Representation
c) Query
3. Searching / Retrieval Subsystem
a) Matching
b) Retrieved Object

Traditional IR System

Figure: Traditional IR system – on the system side: acquisition, representation and file
organization; on the user side: the information problem, its representation and the query;
the query is matched against the stored representations, and the matching step returns the
retrieved objects.
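
A small sketch of the matching step (the document collection is invented for illustration): both the stored documents and the query are represented as sets of terms, and documents are ranked by how many query terms they contain.

# Toy term-overlap matching between a query and document representations.
docs = {
    "d1": "database management systems store and retrieve data",
    "d2": "information retrieval matches user queries against documents",
    "d3": "web crawlers download pages for the search engine index",
}

def terms(text):
    return set(text.lower().split())

def match(query):
    q = terms(query)
    scored = [(len(q & terms(text)), doc_id) for doc_id, text in docs.items()]
    return sorted(scored, reverse=True)        # highest overlap first

print(match("retrieve data from database"))    # d1 ranks first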

Crawling
Overview
 A Web Crawler is software for downloading pages from the web.
 Also known as Web Spider , Web Robot or simply Bot.
 The crawler starts by downloading a set of seed pages that are parsed
and scanned for new links.
Features a crawler must provide
Robustness
The web contains servers that create spider traps, which are
generators of web pages that mislead crawlers into getting stuck
fetching an infinite number of pages in a particular domain. Crawlers
must be designed to be resilient to such traps.
Politeness
Web servers have both implicit and explicit policies regulating the
rate at which a crawler can visit them. These politeness policies must
be respected.

Features a Crawler should provide


Distributed
The crawler should have the ability to execute in a distributed fashion
across multiple machines.
Scalable
The crawler architecture should permit scaling up the crawler rate by
adding extra machines and bandwidth.
Performance and Efficiency
The crawl system should make efficient use of various system
resources including processor, storage and network bandwidth.
Quality
Given that a significant fraction of all web pages are poor utility for
serving user query needs, the crawler should be biased towards fetching “useful”
pages first.

Freshness
In many applications, the crawler should operate in continuous mode.
It should obtain fresh copies of previously fetched pages. A search engine
crawler, for instance, can thus ensure that the search engine’s index contains a
fairly current representation of each indexed web page.
Extensible
Crawlers should be designed to be extensible in many ways – to cope
with new data formats, new fetch protocols, and so on. This demands that the
crawler architecture be modular.

4)Data Classification
Database security concerns the use of a broad range of information
security controls to protect databases (potentially including the data, the database
applications or stored functions, the database systems, the database servers and
the associated networks links) against compromises of their confidentiality,
integrity and availability.
It involves the various types or categories of controls, such as technical,
procedural/administrative and physical. Database security is a specialist topic
within the broader realms of computer security, information security, and risk
management. Security risks to database systems include, for example:
 Unauthorized or unintended activity or misuse by authorized database
users, database administrators, or network/systems managers, or by
unauthorized users or hackers (e.g. inappropriate access to sensitive
data, metadata or functions within databases, or inappropriate changes
to the database programs, structures or security configurations);

 Malware infections causing incidents such as unauthorized access,


leakage or disclosure of personal or proprietary data, deletion of or
damage to the data or programs, interruption or denial of authorized
access to the database, attacks on other systems and the unanticipated
failure of database services;

 Overloads, performance constraints and capacity issues resulting in the
inability of authorized users to use databases as intended;

 Physical damage to database servers caused by computer room fires or


floods, overheating, lightning, accidental liquid spills, static discharge,
electronic breakdowns/equipment failures and obsolescence;

 Design flaws and programming bugs in database and the associated


programs and systems, creating various security vulnerabilities (e.g.
unauthorized privilege escalation), data loss/corruption, performance
degradation etc.;
 Data corruption and /or loss caused by the entry of invalid data or
commands, mistakes in database or system administration processes,
sabotage/criminal damage etc.

Types of security
 Legal and ethical issues
 Policy issues
 System-related issues

Threats and risks


Types of threats to database security
1. Privilege abuse: When database users are provided with privileges that
exceed their day-to-day job requirements, these privileges may be abused
intentionally or unintentionally.
2. Operating System vulnerabilities: Vulnerabilities in underlying operating
systems like Windows, UNIX, Linux, etc.
3. Database rootkits: A database rootkit is a program or a procedure that is
hidden inside the database and that provides administrator-level privileges to
gain access to the data in the database. These rootkits may even turn off alerts
triggered by Intrusion Prevention Systems (IPS).

4. Weak authentication: Weak authentication models allow attackers to employ
strategies such as social engineering and brute force to obtain database login
credentials and assume the identity of legitimate database users.
5. Weak audit trails: A weak audit logging mechanism in a database server
represents a critical risk to organizations, especially in the retail, financial, healthcare,
and other industries with stringent regulatory compliance requirements.

5)Cryptography
A DBMS can use encryption to protect information in certain situations where the
normal security mechanisms of the DBMS are not adequate. For example,
hackers may access our data without our permission.

Figure: Encryption in a DBMS – the plaintext is transformed by an encryption method
using an encryption key into ciphertext, which is what an intruder would see on the network;
the receiver applies the decryption method with the decryption key to recover the plaintext.
In encryption, the message to be encrypted is known as the plaintext. The plaintext is
transformed by a function that is parameterized by a key. The output of the
encryption is known as the ciphertext.
The ciphertext is then transmitted over the network. The process of converting the
plaintext to ciphertext is called encryption, and the process of converting the
ciphertext back to plaintext is called decryption.
Techniques used for encryption: The following techniques are used for the
encryption process: * Substitution Ciphers * Transposition Ciphers
Substitution Ciphers: In a substitution cipher, each letter or group of letters is
replaced by another letter or group of letters to mask them. For example: a is
replaced with D, b is replaced with E, c with F and z with C. In this way, attack
becomes DWWDFN. Substitution ciphers are not very secure because an
intruder can easily guess the substitution characters.
Transposition Ciphers: Substitution ciphers preserve the order of the plaintext
symbols but mask them. The transposition cipher, in contrast, reorders the letters
but does not mask them. For this process a key is used. For example: iliveinqadian
may be coded as divienaniqnli. Transposition ciphers are more secure as
compared to substitution ciphers.
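
A hedged sketch of the shift-by-3 substitution cipher used in the example above (a → D, b → E, ..., so "attack" becomes "DWWDFN"); it is illustrative only and, as the text notes, not secure in practice:

# Caesar-style substitution cipher: shift each letter by 3 and uppercase it.
def encrypt(plaintext, shift=3):
    out = []
    for ch in plaintext.lower():
        if ch.isalpha():
            out.append(chr((ord(ch) - ord('a') + shift) % 26 + ord('A')))
        else:
            out.append(ch)
    return "".join(out)

def decrypt(ciphertext, shift=3):
    return "".join(chr((ord(c) - ord('A') - shift) % 26 + ord('a')) if c.isalpha() else c
                   for c in ciphertext)

print(encrypt("attack"))            # DWWDFN
print(decrypt(encrypt("attack")))   # attack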
Data Encryption Standard (DES): It uses both a substitution of characters
and a rearrangement of their order on the basis of an encryption key. The main
weakness of this approach is that authorized users must be told the encryption
key, and the mechanism for communicating this transformation is vulnerable to
clever intruders.
Public Key Encryption: Each authorized user has a public encryption key,
known to everyone and a private decryption key (used by the decryption
algorithm), chosen by the user and known only to him or her.
Disadvantages of encryption:
There are following problems of Encryption:

 Key management (i.e. keeping keys secret) is a problem. Even in


public-key encryption the decryption key must be kept secret.

 Even in a system that supports encryption, data must often be
processed in plaintext form. Thus sensitive data may still be accessible
to transaction programs.
