
Rdbms Unit I


UNIT I

INTRODUCTION TO DBMS

Databases holding hundreds of gigabytes of data and serving thousands of users are common in current
corporate systems, and for such systems DBMS support becomes indispensable.
What is Data?

Data consists of stored raw facts (real-world facts) that can be processed by a computing machine.

Data is a collection of facts which is initially unorganized but can be organized into a useful form. Data
may be numerical (integers or floating-point numbers) or non-numerical (characters, dates, and so on).
Data is of two types:

Raw data : data collected from different sources, which by itself has no meaning.
Derived data : data extracted from raw data and used to obtain useful information.

Example: A bare number such as 238 may be anything: a distance in kilometres, an amount in rupees,
a number of days, or marks in a subject.

Information:

Information is data that has been converted into a more useful or intelligible form; an example is a
student mark sheet.

Information is RELATED DATA. The data (information) used by an organization (a college, a
library, a bank, a manufacturing company) is one of its most valuable resources.

Knowledge:

The human mind purposefully organizes information and evaluates it to produce knowledge.

Example: 238 is data,
the marks of a student are information, and
the hard work required to get those marks is knowledge.

1. Fact-based Knowledge:
It is knowledge gained from fundamentals and through experiment.
The result is guaranteed.

2. Heuristic-based Knowledge:
It is the knowledge of good practice and good judgment, such as a hypothesis.
The result is not guaranteed.

Database:

Databases and database systems have become an essential component of everyday life in modern
society. In the course of a day, most of us encounter several activities that involve some interaction
with a database.
For example, if we go to the bank to deposit or withdraw funds, if we make a hotel or airline
reservation, if we access a computerized library catalog to search for a bibliographic item, or if we buy
some item (such as a book, toy, or computer) from an Internet vendor through its Web page, chances are
that our activities will involve someone or some computer program accessing a database. Even
purchasing items from a supermarket nowadays in many cases involves an automatic update of the
database that keeps the inventory of supermarket items.
• These interactions are examples of what we may call traditional database applications,
in which most of the information that is stored and accessed is either textual or numeric.
• In the past few years, advances in technology have been leading to exciting new
applications of database systems. Multimedia databases can now store pictures, video
clips, and sound messages.
• Geographic information systems (GIS) can store and analyze maps, weather data, and
satellite images.
• Data warehouses and online analytical processing (OLAP) systems are used in many
companies to extract and analyze useful information from very large databases for
decision making.
• Real-time and active database technology is used in controlling industrial and
manufacturing processes.
• And database search techniques are being applied to the World Wide Web to improve
the search for information that is needed by users browsing the Internet.
Databases and database technology are having a major impact on the growing use of computers.
It is fair to say that databases play a critical role in almost all areas where computers are used, including
business, electronic commerce, engineering, medicine, law, education, and library science, to name a
few.

Database
A database is a collection of related data. By data, we mean known facts that can be recorded
and that have implicit and useful meaning.
For example, consider the names, telephone numbers, and addresses of the people you know.
You may have recorded this data in an indexed address book, or you may have stored it on a hard drive,
using a personal computer and software such as Microsoft Access, or Excel. This is a collection of
related data with an implicit meaning and hence is a database.

Characteristics of the database in the DBMS:


1. Sharing of data takes place among different types of users and applications.
2. Data exists permanently.
3. Data must be correct and must accurately reflect the real-world entities it represents.
4. Data can live beyond the scope of the process that created it.
5. Data is not unnecessarily repeated.
6. Changes made in the schema at one level should not affect the other levels.
7. The database should also provide security.

Database-Management System:
A database-management system (DBMS) is a collection of interrelated data and a set of
programs to access those data. The collection of data, usually referred to as the database, contains
information relevant to an enterprise. The primary goal of a DBMS is to provide a way to store and
retrieve database information that is both convenient and efficient.
Database systems are designed to manage large bodies of information. Management of data
involves both defining structures for storage of information and providing mechanisms for the
manipulation of information.
In addition, the database system must ensure the safety of the information stored, despite system
crashes or attempts at unauthorized access. If data are to be shared among several users, the system must
avoid possible anomalous results.

There are several Database Management Systems (DBMS), such as:

• Microsoft SQL Server


• Oracle
• Sybase
• DBase
• Microsoft Access
• MySQL, from Sun Microsystems (now Oracle)
• DB2 from IBM etc.

What is the need of DBMS?

Database systems are basically developed for large amounts of data. When dealing with a huge
amount of data, two things require optimization: storage of data and retrieval of data.

Storage: According to the principles of database systems, the data is stored in such a way
that it occupies far less space, because redundant (duplicate) data is removed before
storage. Let's take a simple example to understand this:

In a banking system, suppose a customer has two accounts: a saving account and a salary
account. Say the bank stores the saving account data at one place (these places are called
tables; we will learn about them later) and the salary account data at another place. If
customer information such as the customer's name and address is stored at both places, that is
a waste of storage (redundancy/duplication of data). To organize the data in a better way, the
information should be stored at one place, and both accounts should be linked to that
information. This is exactly what we achieve in a DBMS.
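This idea can be sketched in SQL (the table and column names here are illustrative, not taken from
any particular bank's system):

    create table customer (
        customer_id  char(6) primary key,
        name         varchar(50),
        address      varchar(100)
    );

    create table account (
        account_no   char(6) primary key,
        account_type varchar(10),     -- e.g. 'SAVING' or 'SALARY'
        balance      numeric(12,2),
        customer_id  char(6) references customer(customer_id)
    );

The customer's name and address are stored once in customer; both the saving and the salary account
simply refer to that row through customer_id, removing the duplication described above.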

Fast Retrieval of data: Along with storing the data in an optimized and systematic
manner, it is also important that we retrieve the data quickly when needed. Database systems
ensure that the data is retrieved as quickly as possible.

Database-System Applications
Databases are widely used. Here are some representative applications:
Enterprise Information
Sales: For customer, product, and purchase information.
Accounting: For payments, receipts, account balances, assets and other accounting information.
Human resources: For information about employees, salaries, payroll taxes, and benefits, and
for generation of paychecks.
Manufacturing: For management of the supply chain and for tracking production of items in
factories, inventories of items in warehouses and stores, and orders for items.
Online retailers: For sales data noted above plus online order tracking, generation of
recommendation lists, and maintenance of online product evaluations.
Banking and Finance
Banking: For customer information, accounts, loans, and banking transactions.
Credit card transactions: For purchases on credit cards and generation of monthly statements.
Finance: For storing information about holdings, sales, and purchases of financial instruments
such as stocks and bonds; also for storing real-time market data to enable online trading by
customers and automated trading by the firm.
Universities: For student information, course registrations, and grades (in addition to standard enterprise
information such as human resources and accounting).
Airlines: For reservations and schedule information. Airlines were among the first to use databases in a
geographically distributed manner.
Telecommunication: For keeping records of calls made, generating monthly bills, maintaining balances
on prepaid calling cards, and storing information about the communication networks.

PURPOSE OF DATA BASE SYSTEMS:


Before the DBMS was invented, information was stored using the File Processing System. In this
system, data is stored in permanent system files (secondary storage). Different application programs are
written to extract data from these files and to add records to them. But there are a number of
disadvantages in using a File Processing System to store data.
One way to keep information on a computer is to store it in permanent system files. To allow
users to manipulate the stored information, the system has a number of application programs that
manipulate the organized files. These application programs are written by system programmers in
response to the needs of the organization. New application programs are added to the system as the need
arises. Thus, as time goes by, more files and more application programs are added to the system. A
typical file processing system as described above is the system that was used to store information before
the advent of the DBMS.
Characteristics of Traditional File Processing System:

• It stores the data of an organization in groups of files.


• Files carrying data are independent of each other.
• COBOL, C, C++ programming languages were used to design the files.
• Each file contains data for some specific area or department like library, student fees, and student
examinations.
• It is less flexible and has many limitations.
• It is very difficult to maintain a file processing system.
• Any change in one file affects all related files, which creates a burden on the programmer.
• Files in Traditional File Processing Systems are called flat files.

Overall, the Traditional File Processing System was, in many cases, better than a manual,
non-computer-based system, but it still had many disadvantages, which were overcome by the
Database Management System.
Keeping the information of an organization in a file processing system has a number of
disadvantages, namely
FILE MANAGEMENT SYSTEM PROBLEMS
• Data Redundancy and Inconsistency: Since the files and application programs are created by
different programmers over a long period, the various files are likely to have different formats
and the programs may be written in several programming languages. Moreover, the same
information may be duplicated in several places. This redundancy leads to higher storage and
access cost. In addition, it may lead to data inconsistency.
• Difficulty in Accessing Data: The file processing system does not allow needed data to be
retrieved in a convenient and efficient manner.
• Data Isolation: In a file processing system, the data are scattered in various files, and the files
may be in different formats. It is therefore very difficult to write new application programs to
retrieve the appropriate data.
• Integrity problems: The data values stored in the database must satisfy certain types of
consistency Constraints (Conditions).
For example, the minimum balance in a bank account may never fall
below an amount of Rs. 500. Developers enforce these constraints in the system by
adding appropriate code in the application programs. However, when new
constraints are added, it is difficult to change the application programs to enforce
them.
• Atomicity problems. A computer system, like any other device, is subject to failure. In many
applications, it is crucial that, if a failure occurs, the data be restored to the consistent state that
existed prior to the failure.
Consider a program to transfer Rs.500 from the account balance of
department A to the account balance of department B. If a system failure occurs
during the execution of the program, it is possible that the Rs.500 was removed
from the balance of department A but was not credited to the balance of department
B, resulting in an inconsistent database state. Clearly, it is essential to database
consistency that either both the credit and debit occur, or that neither occurs. That is,
the funds transfer must be atomic: it must happen in its entirety or not at all. It is
difficult to ensure atomicity in a conventional file-processing system (see the
transaction sketch after this list).
• Concurrent-access anomalies. For the sake of overall performance of the system and faster
response, many systems allow multiple users to update the data simultaneously. If such
concurrent updates are not coordinated, they may interfere with one another and leave the data
in an inconsistent state.
• Security problems: Not every user of the database system should be able to access all the data.
For example, in a banking system, payroll personnel need to see only that
part of the database that has information about various bank employees. They do
not need access to information about customer accounts.

In file processing systems, as application programs are added to the system in an ad hoc
manner, it is difficult to enforce security.
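To illustrate the atomicity problem above: a DBMS solves it with transactions. A minimal SQL sketch,
assuming an account table with account_no and balance columns (the account numbers here are
hypothetical):

    start transaction;                                                       -- some systems use BEGIN
    update account set balance = balance - 500 where account_no = 'A-101';  -- debit department A
    update account set balance = balance + 500 where account_no = 'A-215';  -- credit department B
    commit;  -- both updates become permanent together; a failure before commit undoes both

A conventional file-processing system offers no such all-or-nothing guarantee.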

The above disadvantages can be overcome by the use of a DBMS, which provides the following
advantages.
1. Provides for mass storage of relevant data.
2. Makes access to the data easy for the user.
3. Allows for the modification of data in a consistent manner.
4. Allows multiple users to be active at a time.
5. Eliminates or reduces redundant data.
6. Provides prompt response to users' requests for data.
7. Supports backup and recovery of data.
8. Protects data from physical hardware failure and unauthorized access.
9. Allows constraints to be set on the database to maintain data integrity.

ADVANTAGES AND DISADVANTAGES OF A DBMS

Using a DBMS to manage data has many advantages:


Reduction of Redundancy: This is perhaps the most significant advantage of using a DBMS.
Redundancy is the problem of storing the same data item in more than one place. Redundancy creates
several problems, such as requiring extra storage space, entering the same data more than once during
insertion, and deleting data from more than one place during deletion. Anomalies may occur in the
database if insertion, deletion, etc. are not done properly.

Sharing of Data: In paper-based record keeping, data cannot be shared among many users. But in a
computerized DBMS, many users can share the same database if they are connected via a network.

Data Integrity: We can maintain data integrity by specifying integrity constraints, which are rules
and restrictions about what kind of data may be entered or manipulated within the database. This
increases the reliability of the database, as it can be guaranteed that no wrong data can exist within
the database at any point of time.

Data independence: Application programs should be as independent as possible from details of data
representation and storage. The DBMS can provide an abstract view of the data to insulate
application code from such details.

Efficient data access: A DBMS utilizes a variety of sophisticated techniques to store and retrieve
data efficiently. This feature is especially important if the data is stored on external storage devices.

Data integrity and security: If data is always accessed through the DBMS, the DBMS can enforce
integrity constraints on the data. For example, before inserting salary information for an employee,
the DBMS can check that the department budget is not exceeded. Also, the DBMS can enforce
access controls that govern what data is visible to different classes of users.

Data administration: When several users share the data, centralizing the administration of data can
offer significant improvements. Experienced professionals who understand the nature of the
data being managed, and how different groups of users use it, can be responsible for
organizing the data representation to minimize redundancy and fine-tuning the storage of the data to
make retrieval efficient.

Concurrent access and crash recovery: A DBMS schedules concurrent accesses to the data in such
a manner that users can think of the data as being accessed by only one user at a time. Further, the
DBMS protects users from the effects of system failures.

Reduced application development time: Clearly, the DBMS supports many important functions
that are common to many applications accessing data stored in the DBMS. This, in conjunction with
the high-level interface to the data, facilitates quick development of applications. Such
applications are also likely to be more robust than applications developed from scratch because many
important tasks are handled by the DBMS instead of being implemented by the application.

DISADVANTAGES OF A DBMS
Danger of Overkill: For small and simple applications for single users, a database system is often not
advisable.

Complexity: A database system creates additional complexity and requirements. The supply and
operation of a database management system with several users and databases is quite costly and
demanding.

Qualified Personnel: The professional operation of a database system requires appropriately trained staff.
Without a qualified database administrator nothing will work for long.

Costs: Through the use of a database system, new costs are generated for the system itself, but also for
additional hardware and the more complex handling of the system.

Lower Efficiency: A database system is multi-purpose software, which is often less efficient than
specialised software that is produced and optimised for exactly one problem.

VIEW OF DATA:
A DBMS is a collection of interrelated files and a set of programs which allow the users to access
and modify these files.

Data Abstraction
A major purpose of a database system is to provide users with an abstract view of the data. That
is, the system hides certain details of how the data are stored and maintained. This is called data
abstraction.
Levels of Abstraction: Abstraction can be divided into three levels:
1. Physical Level : The lowest level of abstraction describes how the data are actually stored. At the
physical level, complex low-level data structures are described in detail.

2. Logical Level (Conceptual Level) : This next higher level of abstraction describes what
data are stored in the database and what relationships exist among those data. This level of abstraction
is used by Database Administrators (DBAs), who must decide what information is to be kept in the
database.

3. View Level : This highest level of abstraction describes only part of the entire database. Even
with the use of simpler structures at the logical level, some complexity remains because of the size of
large databases. Many users of the database system will not be concerned with all this information;
such users need to access only a part of the database. So that their interaction with the system is
simplified, the view level of abstraction is defined. The system may provide many views for the same
database.

An analogy to the concept of data types in programming languages may clarify the distinction
among levels of abstraction. Many high-level programming languages support the notion of a structured
type. For example, we may describe a record as follows:
type instructor = record
ID : char (5);
name : char (20);
dept name : char (20);
salary : numeric (8,2);
end;

This code defines a new record type called instructor with four fields. Each field has a name and
a type associated with it. A university organization may have several such record types, including
department, with fields dept name, building, and budget

course, with fields course id, title, dept name, and credits

student, with fields ID, name, dept name, and tot cred
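The same instructor record could also be declared as a table in SQL; a minimal sketch based on the
fields above:

    create table instructor (
        id        char(5),        -- unique instructor identifier
        name      varchar(20),
        dept_name varchar(20),
        salary    numeric(8,2)
    );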

At the physical level, an instructor, department, or student record can be described as a block of
consecutive storage locations. The compiler hides this level of detail from programmers. Similarly, the
database system hides many of the lowest-level storage details from database programmers. Database
administrators, on the other hand, may be aware of certain details of the physical organization of the data.
At the logical level, each such record is described by a type definition, as in the previous code
segment, and the interrelationship of these record types is defined as well. Programmers using a
programming language work at this level of abstraction. Similarly, database administrators usually work
at this level of abstraction.
Finally, at the view level, computer users see a set of application programs that hide details of
the data types. At the view level, several views of the database are defined, and a database user sees
some or all of these views. In addition to hiding details of the logical level of the database, the views
also provide a security mechanism to prevent users from accessing certain parts of the database. For
example, clerks in the university registrar's office can see only that part of the database that has
information about students; they cannot access information about salaries of instructors.
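Such a restricted view can be sketched in SQL (assuming a student table like the one above; the view
and user names are hypothetical):

    -- Registrar clerks see student information but nothing about instructor salaries:
    create view registrar_view as
        select id, name, dept_name, tot_cred
        from student;

    grant select on registrar_view to registrar_clerk;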

INSTANCES AND SCHEMAS:


Databases change over time as information is inserted and deleted.

Instances
The collection of information stored in the database at a particular moment is called an instance of the
database. It is also called as snapshot or set of occurrence or current state of the database.

Example: Instance of the employee schema


Eno   Ename   Salary   Address
1     A       10000    1st street
2     B       20000    2nd street
3     C       30000    3rd street

Schemas
The overall design of the database is called the database schema. Schemas are changed infrequently, if
at all.
The concept of database schemas and instances can be understood by analogy to a program
written in a programming language. A database schema corresponds to the variable declarations (along
with associated type definitions) in a program. Each variable has a particular value at a given instant. The
values of the variables in a program at a point in time correspond to an instance of a database schema. In
general, a database system supports one physical schema, one logical schema, and several subschemas.
Database systems have several schemas, partitioned according to the levels of abstraction.

• The physical schema describes the database design at the physical level,
• The logical schema describes the database design at the logical level.
• A database may also have several schemas at the view level, sometimes called
subschemas, that describe different views of the database.

Database schema diagram for a company:

EMPLOYEE:   Eno, Ename, Salary, Address
DEPARTMENT: Dno, Dname, Dlocation

DATA INDEPENDENCE
The three-schema architecture can be used to further explain the concept of data independence,
which can be defined as the capacity to change the schema at one level of a database system without
having to change the schema at the next higher level.
We can define two types of data independence:
1. Logical data independence is the capacity to change the conceptual schema without having
to change external schemas or application programs. We may change the conceptual schema to
expand the database (by adding a record type or data item), to change constraints, or to reduce
the database (by removing a record type or data item).
Only the view definition and the mappings need be changed in a DBMS that supports logical
data independence. After the conceptual schema undergoes a logical reorganization, application
programs that reference the external schema constructs must work as before. Changes to
constraints can be applied to the conceptual schema without affecting the external schemas or
application programs.
2. Physical data independence is the capacity to change the internal schema without having to
change the conceptual schema. Hence, the external schemas need not be changed either.
Changes to the internal schema may be needed because some physical files have to be
reorganized.
For example, by creating additional access structures to improve the performance of
retrieval or update. If the same data as before remains in the database, we should not have to
change the conceptual schema.
Whenever we have a multiple-level DBMS, its catalog must be expanded to include information
on how to map requests and data among the various levels. The DBMS uses additional software to
accomplish these mappings by referring to the mapping information in the catalog.
Data independence occurs because when the schema is changed at some level, the schema at the
next higher level remains unchanged; only the mapping between the two levels is changed. Hence,
application programs referring to the higher-level schema need not be changed. The three-schema
architecture can make it easier to achieve true data independence, both physical and logical. However, the
two levels of mappings create an overhead during compilation or execution of a query or program,
leading to inefficiencies in the DBMS. Because of this, few DBMSs have implemented the full three-
schema architecture.
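A small SQL sketch of logical data independence, using the employee schema from the earlier instance
example (the view name and new column are hypothetical):

    create view emp_public as
        select eno, ename, address from employee;

    -- Later the conceptual schema is expanded:
    alter table employee add column phone varchar(15);

    -- The view emp_public, and every application that queries it, continues to work unchanged.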

DATA MODELS:
A data model is a collection of tools for describing data, data relationships, data semantics, and data constraints.
The data models can be classified into four different categories:
Relational model
Entity-Relationship data model (mainly for database design)
Object-based data models (Object-oriented and Object-relational)
Semi-structured data model (XML)
Other older models:
• Network model
• Hierarchical model

Relational Model
The relational model is currently the most popular data model in the database management
systems. The popularity is because of simplicity and understandability. This data model is developed by
E.F.Codd in 1970 which is based on relation, two dimensional table.
The relational data model uses a collection of tables (also called as relation) to both data and the
relationships among those data. Each table has multiple columns and each column has unique name. A
relation consists of rows and columns. The row in table (relation) is called as Tuple and column name
are known as attribute.

Ex: Customer Table

Customer Name   UID       Address     Account No
Tanveer         A12345    Hyd         A-101
Ramesh          B23456    Sec'bad     A-215
Ravi            C34567    Charminar   A-305
Prasad          A345789   Bangalore   A-201
Smith           Z459087   Delhi       A-405

Advantages
1. In this model, data redundancy is controlled to a great extent.
2. The relational data model allows many-to-many relationships.
3. The relational data model structures are very simple and easy to build.
4. Faster access of data is possible and the storage space required is greatly
reduced.
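As a sketch, the customer relation above could be declared and populated in SQL (column sizes are
assumptions):

    create table customer (
        customer_name varchar(30),
        uid           varchar(10) primary key,
        address       varchar(30),
        account_no    varchar(10)
    );

    insert into customer values ('Tanveer', 'A12345', 'Hyd', 'A-101');

Each insert adds one tuple (row); the column names are the attributes of the relation.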

Entity-Relationship Model
The entity-relationship (E-R) data model uses a collection of basic objects, called entities,
and relationships among these objects. An entity is a “thing” or “object” in the real world that is
distinguishable from other objects. The entity-relationship model is widely used in database design.
It is a high-level conceptual data model that describes the structure of a database in terms of
entities, relationships among entities, and constraints on them.
Basic Concepts of the E-R Model:
Entity
Entity Set
Attributes
Relationship
Relationship Set
Identifying Relationship
An E-R diagram is made up of components, some of which are:
• Rectangles : represent entity sets.
• Ellipses : represent attributes.
• Diamonds : represent relationship sets.
• Lines : link attributes to entity sets and entity sets to relationship sets.
Object-Based Data Model
Object-oriented programming (especially in Java, C++, or C#) has become the dominant
software-development methodology. This led to the development of an object-oriented data model that
can be seen as extending the E-R model with notions of encapsulation, methods (functions), and object
identity. The object-relational data model combines features of the object-oriented data model and
relational data model.

Semi-structured Data Model


The semi-structured data model permits the specification of data where individual data
items of the same type may have different sets of attributes. This is in contrast to the data models
mentioned earlier, where every data item of a particular type must have the same set of attributes. The
Extensible Markup Language (XML) is widely used to represent semi-structured data.

Historically, the network data model and the hierarchical data model preceded the
relational data model. These models were tied closely to the underlying implementation, and
complicated the task of modeling data. As a result they are used little now, except in old database
code that is still in service in some places.
Network Data Model:
Data in the network model are represented by collections of records, and relationships among
data are represented by links, which can be viewed as pointers. The records in the database are
organized as collections of arbitrary graphs. The network data model is similar to the hierarchical
model, except that a record can have more than one parent. Any record in the database is allowed to
own sets of other types of records.
Advantages
o It can be used to represent many-to-many relationships
o It offers integration of data
o The storage space is reduced considerably due to less redundancy
o It provides faster access of data.

Hierarchical Data Model :

A hierarchical database model is a data model in which the data is organized into a tree-like
structure. The data is stored as records which are connected to one another through links. A record is a
collection of fields, with each field containing only one value.
In this model the relationships among the data are represented by records and links. It consists of
records which are connected to one another through links. A link can be defined as an association
between two records. The hierarchical data model can be considered as an upside-down tree, with the
highest level of the tree kept as the root.
Advantages
o The hierarchical model allows one-to-one and one-to-many relationships.
o The model has the ability to handle large amounts of data.
Disadvantages
o The model involves complicated querying.
o As duplication of data takes place, there is wastage of storage space.
o Inconsistency may arise during updating of data.
o The model does not allow many-to-many relationships.
DATA BASE LANGUAGES:
A database system provides several types of languages: one to specify the schema, and others to
express database queries, updates, and access control. They are

• Data-Definition Languages (DDL)


• Data-Manipulation Language (DML)
• Data Control language (DCL)

1. Data-Definition Language (DDL) : A database schema is specified by a set of definitions
expressed in a special language called the Data Definition Language (DDL). The result of
compilation of DDL statements is a set of tables that is stored in a special file called the 'data
dictionary' or 'data directory'.
A data dictionary is a file that contains metadata, i.e., data about data. This file is consulted
before actual data are read or modified in the database system.
The storage structure and access methods used by the database system are specified by a set of
definitions in a special type of DDL called a 'data storage and definition language'. The result of
compiling these definitions is a set of instructions that specify the implementation details of the
database schemas, which are usually hidden from the users.
Database systems implement integrity constraints that can be tested with minimal overhead:
Domain Constraints: A domain of possible values must be associated with every attribute (for
example, integer types, character types, date/time types). Declaring an attribute to be of a
particular domain acts as a constraint on the values that it can take. Domain constraints are the
most elementary form of integrity constraint. They are tested easily by the system whenever a
new data item is entered into the database.

Referential Integrity: There are cases where we wish to ensure that a value that appears in one
relation for a given set of attributes also appears in a certain set of attributes in another relation
(referential integrity).

Assertions: An assertion is any condition that the database must always satisfy. Domain
constraints and referential-integrity constraints are special forms of assertions.

For example, “Every department must have at least five courses offered every
semester” must be expressed as an assertion. When an assertion is created, the system tests it for
validity. If the assertion is valid, then any future modification to the database is allowed only if it
does not cause that assertion to be violated.
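These constraints can be sketched in SQL. Note that create assertion is part of the SQL standard but is
not supported by every DBMS; the table and constraint names here are illustrative:

    create table department (
        dept_name varchar(20) primary key,
        building  varchar(15),
        budget    numeric(12,2) check (budget > 0)              -- domain constraint
    );

    create table course (
        course_id varchar(8) primary key,
        title     varchar(50),
        dept_name varchar(20) references department(dept_name), -- referential integrity
        credits   numeric(2,0)
    );

    -- Assertion: every department offers at least five courses.
    create assertion min_courses check (
        not exists (
            select * from department d
            where (select count(*) from course c
                   where c.dept_name = d.dept_name) < 5
        )
    );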

Authorization: We may want to differentiate among the users as far as the type of access they are
permitted on various data values in the database. These differentiations are expressed in terms of
authorization, the most common being:
read authorization, which allows reading, but not modification, of data;
insert authorization, which allows insertion of new data, but not modification of
existing data;
update authorization, which allows modification, but not deletion, of data; and
delete authorization, which allows deletion of data. We may assign the user all,
none, or a combination of these types of authorization.
The DDL commands are
• To create the database instance – CREATE
• To alter the structure of database – ALTER
• To drop database instances – DROP
• To delete all records from a table – TRUNCATE
• To rename database instances – RENAME

All these commands specify or update the database schema; that is why they come under the Data
Definition Language.
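A sketch of the five DDL commands in SQL (the table names are illustrative; RENAME syntax varies
between DBMSs):

    create table student (id char(5), name varchar(20));    -- CREATE: define a new table
    alter table student add column dept_name varchar(20);   -- ALTER: change its structure
    truncate table student;                                 -- TRUNCATE: remove all rows, keep the table
    alter table student rename to learner;                  -- RENAME: give the table a new name
    drop table learner;                                     -- DROP: remove the table entirely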

o Used by the DBA and database designers to specify the conceptual schema of a database.
o In many DBMSs, the DDL is also used to define internal and external schemas (views).
o In some DBMSs, separate storage definition language (SDL) and view definition language
(VDL) are used to define internal and external schemas.
o SDL is typically realized via DBMS commands provided to the DBA and database designers
o DDL compiler generates a set of tables stored in a data dictionary
o Data dictionary contains metadata (i.e., data about data)

2. Data-Manipulation Language (DML) : A DML is a language that enables users to access or
manipulate data as organized by the appropriate data model. The goal is to provide efficient human
interaction with the system. The DML allows the following:
(a) The retrieval of information from the database
(b) The insertion of new information into the existing database
(c) The deletion of existing information from the database
(d) The modification of information stored in the database.
The DML commands are

• To read records from table(s) – SELECT


• To insert record(s) into the table(s) – INSERT
• To update the data in table(s) – UPDATE
• To delete record(s) from the table – DELETE

o Used to specify database retrievals and updates


o DML commands (data sublanguage) can be embedded in a general-purpose
programming language (AKA host language), such as COBOL, C, C++, or Java.
o Alternatively, stand-alone DML commands can be applied directly (called a query
language).
o Language for accessing and manipulating the data organized by the appropriate
data model
o DML also known as query language
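The four DML commands, sketched in SQL against the employee table from the earlier instance
example:

    insert into employee values (4, 'D', 25000, '4th street');   -- INSERT a new record
    select ename, salary from employee where salary > 15000;     -- SELECT (read) records
    update employee set salary = salary * 1.10 where eno = 2;    -- UPDATE existing data
    delete from employee where eno = 4;                          -- DELETE records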

A DML is a language that enables users to access or manipulate data. There are basically two types:
• Procedural DML: requires a user to specify what data are needed and how to get those
data from the existing database.
• Non-procedural DML: requires a user to specify what data are needed without
specifying how to get those data.

Non-procedural DMLs are usually easier to learn and use than procedural DMLs. Since a user does
not have to specify how to obtain the data, these languages may generate code that is not as efficient as
that produced by a procedural DML; this difficulty can be remedied by various optimization techniques.
A query is a statement requesting the retrieval of information. The portion of a DML that involves
information retrieval is called a 'query language'.
This query in the SQL language finds the name of the customer whose customer-id is 192-83-7465:

    select customer.customer-name
    from customer
    where customer.customer-id = '192-83-7465'

The query specifies that those rows from the table customer where the customer-id is 192-83-7465
must be retrieved, and that the customer-name attribute of these rows must be displayed.
Queries may involve information from more than one table. For instance, the following query finds the
balance of all accounts owned by the customer with customer-id 192-83-7465:

    select account.balance
    from depositor, account
    where depositor.customer-id = '192-83-7465'
    and depositor.account-number = account.account-number

There are a number of database query languages in use, either commercially or experimentally.
The levels of abstraction apply not only to defining or structuring data, but also to manipulating data.
At the physical level, we must define algorithms that allow efficient access to data. At higher levels of
abstraction, we emphasize ease of use. The goal is to allow humans to interact efficiently with the
system. The query processor component of the database system translates DML queries into sequences
of actions at the physical level of the database system.

• Data Control Language (DCL): DCL is used for granting and revoking user access on a
database.

• To grant access to user – GRANT


• To revoke access from user – REVOKE
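A short SQL sketch of the two DCL commands (the user name clerk is hypothetical):

    grant select, insert on employee to clerk;   -- GRANT read and insert access to a user
    revoke insert on employee from clerk;        -- REVOKE the insert privilege later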

In practice, the data definition language, data manipulation language, and data control language are
not separate languages; rather, they are parts of a single database language such as SQL.

DATA DICTIONARY
We can define a data dictionary as a DBMS component that stores the definitions of data
characteristics and relationships. You may recall that such "data about data" were labeled metadata.
The DBMS data dictionary provides the DBMS with its self-describing characteristic. In effect, the data
dictionary resembles an X-ray of the company's entire data set, and is a crucial element in the data
administration function.
Two main types of data dictionary exist: integrated and stand-alone.
An integrated data dictionary is included with the DBMS. For example, all relational
DBMSs include a built-in data dictionary or system catalog that is frequently accessed and
updated by the RDBMS.
Other DBMSs, especially older types, do not have a built-in data dictionary;
instead, the DBA may use third-party stand-alone data dictionary systems.
Data dictionaries can also be classified as active or passive.
An active data dictionary is automatically updated by the DBMS with every database
access, thereby keeping its access information up-to-date.
A passive data dictionary is not updated automatically and usually requires a batch
process to be run. Data dictionary access information is normally used by the DBMS for query
optimization purposes.
The data dictionary's main function is to store the description of all objects that interact with the
database. Integrated data dictionaries tend to limit their metadata to the data managed by the DBMS.
Stand-alone data dictionary systems are usually more flexible and allow the DBA to describe and
manage all of the organization's data, whether or not they are computerized. Whatever the data
dictionary's format, its existence provides database designers and end users with a much improved
ability to communicate. In addition, the data dictionary is the tool that helps the DBA to resolve data
conflicts.
Although there is no standard format for the information stored in the data dictionary, several
features are common. For example, the data dictionary typically stores descriptions of all:

• Data elements that are defined in all tables of all databases. Specifically, the data dictionary stores
the names, data types, display formats, internal storage formats, and validation rules. The data
dictionary tells where an element is used, by whom it is used, and so on.
• Tables defined in all databases. For example, the data dictionary is likely to store the name of the
table creator, the date of creation, access authorizations, the number of columns, and so on.
• Indexes defined for each database table. For each index, the DBMS stores at least the index name,
the attributes used, the location, specific index characteristics, and the creation date.
• Defined databases: who created each database, the date of creation, where the database is located,
who the DBA is, and so on.
• End users and administrators of the database.
• Programs that access the database, including screen formats, report formats, application formats,
SQL queries, and so on.
• Access authorizations for all users of all databases.
• Relationships among data elements: which elements are involved, whether the relationships
are mandatory or optional, the connectivity and cardinality, and so on.

If the data dictionary can be organized to include data external to the DBMS itself, it becomes an
especially flexible tool for more general corporate resource management. The management of such an
extensive data dictionary thus makes it possible to manage the use and allocation of all of the
organization's information, regardless of whether it has its roots in the database.
RELATIONAL DATABASES:
A relational database is based on the relational model and uses a collection of tables to represent
both data and the relationships among those data. It also includes a DML and DDL.
The purpose of the relational model is to provide a declarative method for specifying data and
queries: users directly state what information the database contains and what information they want from
it, and let the database management system software take care of describing data structures for storing
the data and retrieval procedures for answering queries.
Relational Database: One of the major advantages of using a relational database is its structural
flexibility. It allows users to retrieve the data in any combination.
A relation is a two-dimensional array consisting of horizontal rows and vertical columns. Each
cell (the intersection of a row and a column) contains a single value, and no two rows are identical.
Columns are self-consistent, in the sense that a column has the same meaning in every row. The
order of the columns is immaterial to the DBMS; the table is processed the same way regardless of the
order of its columns.
Relations are commonly referred to as tables. Every column in a database table acts as an attribute,
since the meaning of the column is the same for every row of the database. A row consists of a set of
fields and hence is commonly referred to as a record.
Properties of Relational Database: The important properties of a relational database are listed
below:
1. A relational database is a collection of relations.
2. The database tables have a row column format.
3. Operators are available either to join or separate columns of the database table.
4. Relations are formed with respect to data only.
5. The tables can be accessed by using simple non-procedural statements.
6. The data is fully independent; that is, it will be the same irrespective of the access path used.

Database Access from Application Programs


SQL is not as powerful as a universal Turing machine; that is, there are some computations that
are possible using a general-purpose programming language but are not possible using SQL. SQL also
does not support actions such as input from users, output to displays, or communication over the
network. Such computations and actions must be written in a host language, such as C, C++, or Java,
with embedded SQL queries that access the data in the database. Application programs are programs
that are used to interact with the database in this fashion.
Examples in a university system are programs that allow students to register for courses,
generate class rosters, calculate student GPA, generate payroll checks, etc. To access the database, DML
statements need to be executed from the host language. There are two ways to do this:
By providing an application program interface (set of procedures) that can be used to send DML
and DDL statements to the database and retrieve the results.
The Open Database Connectivity (ODBC) standard for use with the C language is a commonly
used application program interface standard. The Java Database Connectivity (JDBC) standard provides
corresponding features to the Java language.
By extending the host language syntax to embed DML calls within the host language program.
Usually, a special character prefaces DML calls, and a preprocessor, called the DML precompiler,
converts the DML statements to normal procedure calls in the host language.

DATABASE DESIGN:
Database systems are designed to manage large bodies of information. These large bodies of
information do not exist in isolation. They are part of the operation of some enterprise whose end
product may be information from the database or may be some device or service for which the database
plays only a supporting role.

Database design is a collection of processes that facilitate the designing, development,
implementation, and maintenance of enterprise data management systems.
It helps produce database systems
1. that meet the requirements of the users, and
2. that have high performance.

A high-level data model provides the database designer with a conceptual frame-work in which to
specify the data requirements of the database users, and how the database will be structured to fulfill
these requirements. The initial phase of database design, then, is to characterize fully the data needs of
the prospective database users. The database designer needs to interact extensively with domain experts
and users to carry out this task. The outcome of this phase is a specification of user requirements.

Design Process:

The database development life cycle has a number of stages that are followed when developing
database systems. The steps in the development life cycle do not necessarily have to be followed
strictly in a sequential manner.

On small database systems, the database system development life cycle is usually very simple
and does not involve a lot of steps.

Let's look at the individual components involved in each step of the life cycle.

Requirements analysis

• Planning - This stage is concerned with planning the entire Database Development Life
Cycle. It takes into consideration the Information Systems strategy of the
organization.
• System definition - This stage defines the scope and boundaries of the proposed database
system.

Database designing
The process of moving from an abstract data model to the implementation of the database proceeds in
two final design phases.

In the logical-design phase, the designer maps the high-level conceptual schema onto the
implementation data model of the database system that will be used.

The designer uses the resulting system-specific database schema in the subsequent physical-design
phase, in which the physical features of the database are specified.

Implementation

A fully developed conceptual schema indicates the functional requirements of the enterprise. In a
specification of functional requirements, users describe the kinds of operations (or transactions) that
will be performed on the data. Example operations include modifying or updating data, searching for
and retrieving specific data, and deleting data. At this stage of conceptual design, the designer can
review the schema to ensure it meets functional requirements.

• Data conversion and loading - this stage is concerned with importing and converting data from
the old system into the new database.
• Testing - this stage is concerned with the identification of errors in the newly implemented
system. It checks the database against the requirement specifications.

Database Design for a University Organization

To illustrate the design process, let us examine how a database for a university could be designed.
The initial specification of user requirements may be based on interviews with the database users, and
on the designer’s own analysis of the organization. The description that arises from this design phase
serves as the basis for specifying the conceptual structure of the database. Here are the major
characteristics of the university.

The university is organized into departments. Each department is identified by a unique name (dept
name), is located in a particular building, and has a budget.

Each department has a list of courses it offers. Each course has associated with it a course id, title,
dept name, and credits, and may also have associated prerequisites.

Instructors are identified by their unique ID. Each instructor has name, associated department (dept
name), and salary.

Students are identified by their unique ID. Each student has a name, an associated major department
(dept name), and tot cred (total credit hours the student earned thus far).

The university maintains a list of classrooms, specifying the name of the building, room number, and
room capacity.

The university maintains a list of all classes (sections) taught. Each section is identified by a course
id, sec id, year, and semester, and has associated with it a building, room number, and time slot id
(the time slot when the class meets).
The department has a list of teaching assignments specifying, for each instructor, the sections the
instructor is teaching.

The university has a list of all student course registrations, specifying, for each student, the courses
and the associated sections that the student has taken (registered for).
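As a sketch, two of these requirements translate directly into SQL tables (the column sizes are
assumptions):

    create table classroom (
        building    varchar(15),
        room_number varchar(7),
        capacity    numeric(4,0),
        primary key (building, room_number)
    );

    create table student (
        id        char(5) primary key,   -- students identified by their unique ID
        name      varchar(20),
        dept_name varchar(20),           -- associated major department
        tot_cred  numeric(3,0)           -- total credit hours earned so far
    );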

Two Types of Database Techniques:

1. Normalization
2. ER Modeling

NORMALIZATION:
Another method for designing a relational database is to use a process commonly known as
normalization. The goal is to generate a set of relation schemas that allows us to store information
without unnecessary redundancy, yet also allows us to retrieve information easily. The approach is to
design schemas that are in an appropriate normal form. To determine whether a relation schema is in one
of the desirable normal forms, we need additional information about the real-world enterprise that we
are modeling with the database. The most common approach is to use functional dependencies.
To understand the need for normalization, let us look at what can go wrong in a bad database
design. Among the undesirable properties that a bad design may have are:

• Repetition of information
• Inability to represent certain information

Normalization is a process of organizing the data in a database to avoid data redundancy, insertion
anomalies, update anomalies, and deletion anomalies. Let's discuss anomalies first; then we will
discuss normal forms with examples.

Anomalies in DBMS: There are three types of anomalies that occur when the database is not
normalized: insertion, update, and deletion anomalies. Let's take an example to understand
this.

Example: Suppose a manufacturing company stores the employee details in a table named employee
that has four attributes: emp_id for storing employee’s id, emp_name for storing employee’s name,
emp_address for storing employee’s address and emp_dept for storing the department details in which
the employee works. At some point of time the table looks like this:

emp_id   emp_name   emp_address   emp_dept

101      Rick       Delhi         D001
101      Rick       Delhi         D002
123      Maggie     Agra          D890
166      Glenn      Chennai       D900
166      Glenn      Chennai       D004

The above table is not normalized. We will see the problems that we face when a table is not
normalized.
Update anomaly: In the above table we have two rows for employee Rick as he belongs to two
departments of the company. If we want to update the address of Rick then we have to update the same
in two rows or the data will become inconsistent. If the correct address gets updated in one
department's row but not in the other, then as per the database Rick would have two different
addresses, which is not correct and would lead to inconsistent data.

Insert anomaly: Suppose a new employee joins the company, who is under training and currently not
assigned to any department then we would not be able to insert the data into the table if emp_dept field
doesn’t allow nulls.

Delete anomaly: Suppose that at some point in time the company closes department D890; then
deleting the rows having emp_dept as D890 would also delete the information of employee Maggie,
since she is assigned only to this department.

To overcome these anomalies we need to normalize the data.
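One way to remove all three anomalies, sketched in SQL using the names from the example, is to split
the table so that employee details are stored once and department membership is stored separately:

    create table employee (
        emp_id      varchar(5) primary key,
        emp_name    varchar(20),
        emp_address varchar(20)
    );

    create table emp_department (
        emp_id   varchar(5) references employee(emp_id),
        emp_dept varchar(5),
        primary key (emp_id, emp_dept)
    );

    -- Rick's address is now stored in one row only; a new trainee can be inserted into
    -- employee without any department; closing D890 deletes rows only from emp_department.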

The Entity-Relationship Model


The entity-relationship (E-R) data model uses a collection of basic objects, called entities, and
relationships among these objects. An entity is a “thing” or “object” in the real world that is
distinguishable from other objects. For example, each person is an entity, and bank accounts can be
considered as entities.
Entities are described in a database by a set of attributes. For example, the attributes dept name,
building, and budget may describe one particular department in a university, and they form attributes of
the department entity set. Similarly, attributes ID, name, and salary may describe an instructor entity.
The extra attribute ID is used to identify an instructor uniquely (since it may be possible to have two
instructors with the same name and the same salary). A unique instructor identifier must be assigned to
each instructor. In the United States, many organizations use the social-security number of a person (a
unique number the U.S. government assigns to every person in the United States) as a unique identifier.
A relationship is an association among several entities. For example, a member relationship
associates an instructor with her department. The set of all entities of the same type and the set of all
relationships of the same type are termed an entity set and relationship set, respectively.
The overall logical structure (schema) of a database can be expressed graphically by an entity-
relationship (E-R) diagram. There are several ways in which to draw these diagrams. One of the most
popular is to use the Unified Modeling Language (UML). In the notation we use, which is based on
UML, an E-R diagram is represented as follows:

[E-R diagram: entity set Instructor (ID, Name, Salary) and entity set Department (Dept_Name,
Building, Budget), connected by the Member relationship set.]

Entity sets are represented by a rectangular box with the entity set name in the header and
the attributes listed below it.
Relationship sets are represented by a diamond connecting a pair of related entity sets. The
name of the relationship is placed inside the diamond.
As an illustration, consider part of a university database consisting of instructors and the
departments with which they are associated. The figure above shows the corresponding E-R diagram.
The E-R diagram indicates that there are two entity sets, instructor and department, with attributes as
outlined earlier. The diagram also shows a relationship member between instructor and department.
In addition to entities and relationships, the E-R model represents certain constraints to which the
contents of a database must conform. One important constraint is mapping cardinalities, which
express the number of entities to which another entity can be associated via a relationship set. For
example, if each instructor must be associated with only a single department, the E-R model can
express that constraint.

DATA ARCHITECTURE:

Three important characteristics of the database approach are


(1) Insulation of programs and data (program-data and program-operation independence);
(2) Support of multiple user views; and
(3) Use of a catalog to store the database description (schema).
In this section we specify an architecture for database systems, called the three-schema
architecture, which was proposed to help achieve and visualize these characteristics.
The goal of the three-schema architecture, illustrated in Figure 1.1, is to separate the user
applications and the physical database. In this architecture, schemas can be defined at the following
three levels:
The internal level has an internal or Physical schema, which describes the physical storage
structure of the database. The internal schema uses a physical data model and describes the
complete details of data storage and access paths for the database.

The conceptual level has a conceptual or Logical schema, which describes the structure of the
whole database for a community of users. The conceptual schema hides the details of physical
storage structures and concentrates on describing entities, data types, relationships, user
operations, and constraints. A high-level data model or an implementation data model can be
used at this level.

The external or view level includes a number of external or View schemas or user views. Each
external schema describes the part of the database that a particular user group is interested in and
hides the rest of the database from that user group. A high-level data model or an implementation
data model can be used at this level.

Hence, the DBMS must transform a request specified on an external schema into a request against
the conceptual schema, and then into a request on the internal schema for processing over the stored
database. If the request is a database retrieval, the data extracted from the stored database must be
reformatted to match the user’s external view.
The processes of transforming requests and results between levels are called mappings. These
mappings may be time-consuming, so some DBMSs—especially those that are meant to support small
databases—do not support external views. Even in such systems, however, a certain amount of mapping
is necessary to transform requests between the conceptual and internal levels.
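
The three levels can be made concrete with a short sketch in Python's sqlite3 module, where CREATE TABLE defines part of a conceptual schema, CREATE VIEW defines one external schema that hides the salary column, and CREATE INDEX is an internal-level choice of access path (all names are illustrative):

    import sqlite3

    conn = sqlite3.connect(":memory:")

    # Conceptual level: the logical structure of the data.
    conn.execute("""CREATE TABLE instructor (
        id INTEGER PRIMARY KEY, name TEXT, salary REAL, dept_name TEXT)""")

    # External level: a user view exposing only part of the database.
    conn.execute("""CREATE VIEW instructor_public AS
        SELECT id, name, dept_name FROM instructor""")

    # Internal level: a storage/access-path decision invisible to view users.
    conn.execute("CREATE INDEX idx_instructor_name ON instructor(name)")

    conn.execute("INSERT INTO instructor VALUES (1, 'Wu', 90000, 'Finance')")

    # A request against the external schema; the DBMS maps it down to the
    # conceptual table and then to the stored data.
    print(conn.execute("SELECT * FROM instructor_public").fetchall())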
The DBMS accepts SQL commands generated from a variety of user interfaces, produces query evaluation plans, executes these plans against the database, and returns the answers. (This is a simplification: SQL commands can also be embedded in host-language application programs, e.g., Java or COBOL programs. We ignore these issues to concentrate on the core DBMS functionality.)

When a user issues a query, the parsed query is presented to a query optimizer, which uses information about how the data is stored to produce an efficient execution plan for evaluating the query. An execution plan is a blueprint for evaluating a query, and is usually represented as a tree of relational operators.
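
SQLite, for example, exposes its chosen plan through the EXPLAIN QUERY PLAN command. The small sketch below (illustrative schema) shows the optimizer switching from a full table scan to an index search once an index becomes available:

    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE instructor (id INTEGER, name TEXT, salary REAL)")

    query = "SELECT salary FROM instructor WHERE name = 'Einstein'"

    # Without an index the chosen plan is a full table scan.
    print(conn.execute("EXPLAIN QUERY PLAN " + query).fetchall())

    # After an index is created, the optimizer picks an index search instead.
    conn.execute("CREATE INDEX idx_name ON instructor(name)")
    print(conn.execute("EXPLAIN QUERY PLAN " + query).fetchall())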

The files and access methods layer sits on top of the buffer manager, which brings pages in from disk to main memory as needed in response to read requests.

The lowest layer of the DBMS software deals with management of space on disk, where the data
is stored. Higher layers allocate, de-allocate, read, and write pages through (routines provided by) this
layer, called the disk space manager.

The DBMS supports concurrency and crash recovery by carefully scheduling user requests and maintaining a log of all changes to the database. DBMS components associated with concurrency control and recovery include the transaction manager, which ensures that transactions request and release locks according to a suitable locking protocol and schedules the execution of transactions; the lock manager, which keeps track of requests for locks and grants locks on database objects when they become available; and the recovery manager, which is responsible for maintaining a log and restoring the system to a consistent state after a crash. The disk space manager, buffer manager, and file and access method layers must interact with these components.

The architecture of a database system is greatly influenced by the underlying computer system on which the database is running:
• Centralized
• Client-server
• Parallel (multi-processor)
• Distributed

The database and the DBMS catalog are usually stored on disk. Access to the disk is controlled
primarily by the operating system (OS), which schedules disk read/write. Many DBMSs have their own
buffer management module to schedule disk read/write, because this has a considerable effect on
performance. Reducing disk read/write improves performance considerably. A higher-level stored data
manager module of the DBMS controls access to DBMS information that is stored on disk, whether it is
part of the database or the catalog.

The following figure shows, in simplified form, the typical DBMS components. The figure is
divided into two parts. The top part of the figure refers to the various users of the database environment
and their interfaces. The lower part shows the internals of the DBMS responsible for storage of data and
processing of transactions.
The architecture of a database system is greatly influenced by the underlying computer system on
which the database system runs. Database systems can be centralized, or client-server, where one server
machine executes work on behalf of multiple client machines. Database systems can also be designed to
exploit parallel computer architectures. Distributed databases span multiple geographically separated
machines.

Most users of a database system today are not present at the site of the database system, but
connect to it through a network. We can therefore differentiate between client machines, on which
remote database users work, and server machines, on which the database system runs.

One-tier architecture

Imagine a person on a desktop computer who uses Microsoft Access to load up a list of personal
addresses and phone numbers that he or she has saved in MS Windows' “My Documents” folder.
This is an example of a one-tier database architecture. The program (Microsoft Access) runs on
the user's local machine, and references a file that is stored on that machine's hard drive, thus using a
single physical resource to access and process information.

Two-tier architecture

Database applications are usually partitioned into two or three parts, as in Figure (a).
In a Two-tier architecture, the application resides at the client machine, where it invokes database
system functionality at the server machine through query language statements. Application program
interface standards like ODBC and JDBC are used for interaction between the client and the server.
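
As a hedged sketch of the two-tier pattern in Python: the client-side application issues SQL to the server through an ODBC connection. This assumes the third-party pyodbc package and a pre-configured data source named CollegeDSN, both purely illustrative:

    import pyodbc  # third-party ODBC bridge; pip install pyodbc

    # Tier 1 (client application) connects to tier 2 (the database server).
    # 'CollegeDSN' is an assumed, pre-configured ODBC data source name.
    conn = pyodbc.connect("DSN=CollegeDSN;UID=app_user;PWD=secret")
    cursor = conn.cursor()

    # The application invokes database functionality through query language
    # statements; all query processing happens on the server machine.
    cursor.execute("SELECT name, salary FROM instructor WHERE dept_name = ?",
                   ("Physics",))
    for name, salary in cursor.fetchall():
        print(name, salary)

    conn.close()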

Advantages:

1. Easy to maintain; modifications are relatively simple.

2. Communication is faster.

Disadvantages:

1. In a two-tier architecture, application performance degrades as the number of users increases.
2. Cost-ineffective.

Three-tier architecture

In contrast (Figure (b)), in a Three-tier architecture, the client machine acts merely as a front end and does not contain any direct database calls. Instead, the client end communicates with an intermediate layer called the application server, usually through a forms interface. The application server in turn communicates with a database system to access data. It is a commonly used architecture for web applications.
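
A minimal three-tier sketch, assuming the third-party Flask package: the browser (presentation tier) calls an HTTP endpoint on the application server (middle tier), and only the middle tier talks to the database (data tier). The route and schema names are illustrative:

    import sqlite3
    from flask import Flask, jsonify  # third-party; pip install flask

    app = Flask(__name__)

    def get_db():
        # Data tier: only the application server opens database connections;
        # the client never issues SQL directly.
        return sqlite3.connect("college.db")

    @app.route("/instructors/<dept>")
    def instructors(dept):
        # Middle tier: handle the client's request, query the data tier,
        # and return a result the presentation tier can render.
        rows = get_db().execute(
            "SELECT name FROM instructor WHERE dept_name = ?", (dept,)).fetchall()
        return jsonify([name for (name,) in rows])

    if __name__ == "__main__":
        app.run()  # clients (e.g., browsers) form the presentation tier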

Advantages
1. High performance, lightweight persistent objects.
2. Scalability – each tier can scale horizontally.
3. Performance – because the presentation tier can cache requests, network utilization is minimized and the load on the application and data tiers is reduced.
4. Better reusability.
5. Improved data integrity.
6. Improved security – the client does not access the database directly.
7. Easy to maintain, manage, and scale; loosely coupled.

Disadvantages

1. Increased complexity/effort.

DATA STORAGE AND QUERYING:

A database system is partitioned into modules that deal with each of the responsibilities of the
overall system. The functional components of a database system can be broadly divided into the storage
manager and the query processor components.
The storage manager is important because databases typically require a large amount of storage
space. Corporate databases range in size from hundreds of gigabytes to, for the largest databases,
terabytes of data. A gigabyte is approximately 1000 megabytes (actually 1024 megabytes), or about 1 billion bytes, and a terabyte is about 1 million megabytes, or 1 trillion bytes.
Since the main memory of computers cannot store this much information, the information is
stored on disks. Data are moved between disk storage and main memory as needed. Since the movement
of data to and from disk is slow relative to the speed of the central processing unit, it is imperative that
the database system structure the data so as to minimize the need to move data between disk and main
memory.
The query processor is important because it helps the database system to simplify and facilitate
access to data. The query processor allows database users to obtain good performance while being able
to work at the view level and not be burdened with understanding the physical-level details of the
implementation of the system. It is the job of the database system to translate updates and queries
written in a nonprocedural language, at the logical level, into an efficient sequence of operations at the
physical level.

Storage Manager

The storage manager is the component of a database system that provides the interface between
the low-level data stored in the database and the application programs and queries submitted to the
system. The storage manager is responsible for the interaction with the file manager. The raw data are
stored on the disk using the file system provided by the operating system. The storage manager
translates the various DML statements into low-level file-system commands.

Thus, the storage manager is responsible for storing, retrieving, and updating data in the
database.

The storage manager components include:


• Authorization and integrity manager, which tests for the satisfaction of integrity constraints
and checks the authority of users to access data.
• Transaction manager, which ensures that the database remains in a consistent (correct) state
despite system failures, and that concurrent transaction executions proceed without
conflicting.
• File manager, which manages the allocation of space on disk storage and the data structures
used to represent information stored on disk.
• Buffer manager, which is responsible for fetching data from disk storage into main memory,
and deciding what data to cache in main memory. The buffer manager is a critical part of the
database system, since it enables the database to handle data sizes that are much larger than
the size of main memory.
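
The buffer manager's core replacement policy can be sketched in a few lines. Below is a toy LRU (least-recently-used) page cache in Python, illustrating only the idea of keeping hot pages in memory and evicting cold ones; a real buffer manager also handles pinning, dirty pages, and write-back:

    from collections import OrderedDict

    class ToyBufferManager:
        """Holds at most `capacity` pages in memory, evicting the LRU page."""

        def __init__(self, capacity, read_page_from_disk):
            self.capacity = capacity
            self.read_page_from_disk = read_page_from_disk  # simulated disk I/O
            self.pages = OrderedDict()  # page_id -> page contents

        def get_page(self, page_id):
            if page_id in self.pages:
                self.pages.move_to_end(page_id)   # hit: now most recently used
                return self.pages[page_id]
            if len(self.pages) >= self.capacity:  # miss on a full buffer:
                self.pages.popitem(last=False)    # evict least recently used
            page = self.read_page_from_disk(page_id)
            self.pages[page_id] = page
            return page

    # Usage: a fake "disk" that records how often it is actually read.
    reads = []
    buf = ToyBufferManager(2, lambda pid: reads.append(pid) or f"page-{pid}")
    buf.get_page(1); buf.get_page(2); buf.get_page(1); buf.get_page(3)
    print(reads)  # [1, 2, 3] -- the second request for page 1 was a buffer hit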

The storage manager implements several data structures as part of the physical system implementation:

• Data files, which store the database itself.


• Data dictionary, which stores metadata about the structure of the database, in particular the
schema of the database.
• Indices, which can provide fast access to data items. Like the index in a textbook, a database index provides pointers to those data items that hold a particular value. For example, we could use an index to find the instructor record with a particular ID, or all instructor records with a particular name. Hashing is an alternative to indexing that is faster in some but not all cases.
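
In miniature, an index is simply an auxiliary structure mapping attribute values to records. The Python sketch below (invented data) contrasts a full scan with a hash-based index; a real DBMS would use disk-resident B+-trees or hash tables instead of an in-memory dictionary:

    from collections import defaultdict

    instructors = [
        (10101, "Srinivasan", 65000),
        (22222, "Einstein", 95000),
        (33456, "Gold", 87000),
    ]

    # Without an index: a full scan inspects every record.
    scan_hit = [r for r in instructors if r[1] == "Einstein"]

    # With an index: one hash lookup finds the matching records directly.
    name_index = defaultdict(list)        # name -> records with that name
    for record in instructors:
        name_index[record[1]].append(record)
    index_hit = name_index["Einstein"]

    assert scan_hit == index_hit
    print(index_hit)  # [(22222, 'Einstein', 95000)]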

The Query Processor

The query processor components include:

• DDL interpreter, which interprets DDL statements and records the definitions in the data dictionary.
• DML compiler, which translates DML statements in a query language into an evaluation plan consisting of low-level instructions that the query evaluation engine understands.
A query can usually be translated into any of a number of alternative evaluation plans
that all give the same result. The DML compiler also performs query optimization; that is, it
picks the lowest cost evaluation plan from among the alternatives.
• Query evaluation engine, which executes low-level instructions generated by the DML
compiler.
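
The notion of an evaluation plan as a tree of relational operators can be sketched with Python generators, where each operator consumes the output of its child. This is a toy illustration of the idea, not how any particular engine is implemented:

    # Toy relational operators, each producing a stream of rows (tuples).
    def table_scan(rows):
        yield from rows

    def select(pred, child):      # relational selection (filtering)
        return (row for row in child if pred(row))

    def project(cols, child):     # relational projection
        return (tuple(row[c] for c in cols) for row in child)

    instructor = [("Einstein", "Physics", 95000), ("Wu", "Finance", 90000)]

    # Plan tree for: SELECT name FROM instructor WHERE salary > 92000
    plan = project([0], select(lambda r: r[2] > 92000, table_scan(instructor)))
    print(list(plan))  # [('Einstein',)]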

TRANSACTION MANAGEMENT:

A transaction is a collection of operations that performs a single logical function in a database application. Each transaction is a unit of both atomicity and consistency. A transaction can be defined as a group of tasks. A single task is the minimum processing unit which cannot be divided further.

Let’s take an example of a simple transaction. Suppose a bank employee transfers Rs 500 from A's
account to B's account. This very simple and small transaction involves several low-level tasks.

A’s Account B’s Account


Open_Account(A) Open_Account(B)
Old_Balance = A.balance Old_Balance = B.balance
New_Balance = Old_Balance – 500 New_Balance = Old_Balance + 500
A.balance = New_Balance B.balance = New_Balance
Close_Account(A) Close_Account(B)
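
The transfer is correct only if these low-level tasks execute as one atomic unit. A minimal sketch with Python's sqlite3 module (illustrative account table): the two updates either both commit or, if anything fails in between, both roll back:

    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE account (name TEXT PRIMARY KEY, balance INTEGER)")
    conn.executemany("INSERT INTO account VALUES (?, ?)",
                     [("A", 1000), ("B", 200)])

    try:
        with conn:  # opens a transaction; commits on success, rolls back on error
            conn.execute("UPDATE account SET balance = balance - 500 WHERE name='A'")
            conn.execute("UPDATE account SET balance = balance + 500 WHERE name='B'")
    except sqlite3.Error:
        pass  # after a rollback, neither account has changed

    print(conn.execute("SELECT * FROM account ORDER BY name").fetchall())
    # [('A', 500), ('B', 700)]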

ACID Properties

A transaction is a very small unit of a program and it may contain several low-level tasks. A transaction in a database system must maintain Atomicity, Consistency, Isolation, and Durability − commonly known as ACID properties − in order to ensure accuracy, completeness, and data integrity.

• Atomicity − This property states that a transaction must be treated as an atomic unit, that is,
either all of its operations are executed or none. There must be no state in a database where a
transaction is left partially completed. States should be defined either before the execution of the
transaction or after the execution/abortion/failure of the transaction.
• Consistency − The database must remain in a consistent state after any transaction. No transaction should have any adverse effect on the data residing in the database. If the database was in a consistent state before the execution of a transaction, it must remain consistent after the execution of the transaction as well.
• Durability − The database should be durable enough to hold all its latest updates even if the system fails or restarts. If a transaction updates a chunk of data in a database and commits, then the database will hold the modified data. If a transaction commits but the system fails before the data could be written on to the disk, then that data will be updated once the system springs back into action.
• Isolation − In a database system where more than one transaction is being executed simultaneously and in parallel, the property of isolation states that each transaction will be carried out and executed as if it is the only transaction in the system. No transaction will affect the existence of any other transaction.

States of Transactions

A transaction in a database can be in one of the following states −

Active − In this state, the transaction is being executed. This is the initial state of every transaction.

Partially Committed − When a transaction executes its final operation, it is said to be in a partially committed
state.

Failed − A transaction is said to be in a failed state if any of the checks made by the database recovery system
fails. A failed transaction can no longer proceed further.
Aborted − If any of the checks fails and the transaction has reached a failed state, then the recovery manager rolls
back all its write operations on the database to bring the database back to its original state where it was prior to
the execution of the transaction. Transactions in this state are called aborted. The database recovery module can
select one of the two operations after a transaction aborts −

o Re-start the transaction


o Kill the transaction

Committed − If a transaction executes all its operations successfully, it is said to be committed. All its effects are
now permanently established on the database system.
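
These states and the legal moves between them form a small state machine. The Python sketch below simply encodes the transitions described above (a toy illustration):

    # Toy encoding of transaction states and their legal transitions.
    TRANSITIONS = {
        "active":              {"partially_committed", "failed"},
        "partially_committed": {"committed", "failed"},
        "failed":              {"aborted"},
        "aborted":             set(),  # recovery may then restart or kill it
        "committed":           set(),  # effects are now permanent
    }

    def advance(state, new_state):
        if new_state not in TRANSITIONS[state]:
            raise ValueError(f"illegal transition {state} -> {new_state}")
        return new_state

    s = "active"
    s = advance(s, "partially_committed")
    s = advance(s, "committed")
    print(s)  # committed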

As noted above, each transaction is a unit of both atomicity and consistency. Thus, we require that
transactions do not violate any database consistency constraints. That is, if the database was consistent
when a transaction started, the database must be consistent when the transaction successfully terminates.
However, during the execution of a transaction, it may be necessary temporarily to allow inconsistency,
since either the debit of A or the credit of B must be done before the other. This temporary
inconsistency, although necessary, may lead to difficulty if a failure occurs.
It is the programmer’s responsibility to define properly the various transactions, so that each
preserves the consistency of the database. For example, the transaction to transfer funds from the
account of department A to the account of department B could be defined to be composed of two
separate programs: one that debits account A, and another that credits account B. The execution of these
two programs one after the other will indeed preserve consistency. However, each program by itself
does not transform the database from a consistent state to a new consistent state. Thus, those programs
are not transactions. Ensuring the atomicity and durability properties is the responsibility of the database system itself, specifically of the recovery manager. In the absence of failures, all transactions
complete successfully, and atomicity is achieved easily.
However, because of various types of failure, a transaction may not always complete its execution
successfully. If we are to ensure the atomicity property, a failed transaction must have no effect on the
state of the database. Thus, the database must be restored to the state in which it was before the
transaction in question started executing. The database system must therefore perform failure recovery,
that is, detect system failures and restore the database to the state that existed prior to the occurrence of
the failure.
Finally, when several transactions update the database concurrently, the consistency of data may no
longer be preserved, even though each individual transaction is correct. It is the responsibility of the
concurrency-control manager to control the interaction among the concurrent transactions, to ensure the
consistency of the database. The transaction manager consists of the concurrency-control manager
and the recovery manager.

DATA MINING AND INFORMATION RETRIEVAL:

The term data mining refers loosely to the process of semi-automatically analyzing large
databases to find useful patterns. Like knowledge discovery in artificial intelligence (also called
machine learning) or statistical analysis, data mining attempts to discover rules and patterns from data.
However, data mining differs from machine learning and statistics in that it deals with large volumes of
data, stored primarily on disk. That is, data mining deals with “knowledge discovery in databases.”
Data mining is the practice of examining large pre-existing databases in order to generate new information or patterns.

Data mining applications analyze large amounts of data, searching for occurrences of specific patterns or relationships and identifying unusual patterns in areas such as credit card usage.
It was quickly apparent that basic relational systems were not very suitable for many of these
applications, usually for one or more of the following reasons:

• More complex data structures were needed for modeling the application than the simple relational representation.
• New data types were needed in addition to the basic numeric and character string types.
• New operations and query language constructs were necessary to manipulate the new data types.
• New storage and indexing structures were needed for efficient searching on the new data types.

This led DBMS developers to add functionality to their systems. Some functionality was general
purpose, such as incorporating concepts from object-oriented databases into relational systems. Other
functionality was special purpose, in the form of optional modules that could be used for specific
applications.

Data mining is a process used by companies to turn raw data into useful information. By using software to look for patterns in large batches of data, businesses can learn more about their customers and develop more effective marketing strategies, as well as increase sales and decrease costs.

The major steps involved in a data mining process are:

• Extract, transform and load data into a data warehouse


• Store and manage data in a multidimensional database
• Provide data access to business analysts using application software
• Present analyzed data in easily understandable forms, such as graphs.
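
As a toy illustration of the kind of pattern such a process looks for, the sketch below counts co-occurring item pairs in purchase records, a much simplified form of frequent-itemset analysis (the data is invented):

    from collections import Counter
    from itertools import combinations

    # Invented purchase records: each basket is a set of items.
    baskets = [
        {"bread", "milk"},
        {"bread", "milk", "butter"},
        {"milk", "butter"},
        {"bread", "milk"},
    ]

    # Count how often each pair of items is bought together.
    pair_counts = Counter()
    for basket in baskets:
        for pair in combinations(sorted(basket), 2):
            pair_counts[pair] += 1

    # Pairs appearing in at least 3 baskets count as "frequent" patterns here.
    print([p for p, n in pair_counts.items() if n >= 3])  # [('bread', 'milk')]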

The data mining process depends on effective data collection and warehousing as well as computer processing. When companies centralize their data into one database or program, it is called data warehousing. Data warehouses support efficient analysis and data mining algorithms, facilitating business decision making and other information requirements, ultimately cutting costs and increasing sales.

Databases versus Information Retrieval

Textual data, too, has grown explosively. Textual data is unstructured, unlike the rigidly
structured data in relational databases. Querying of unstructured textual data is referred to as
information retrieval.

Traditionally, database technology applies to structured and formatted data that arises in routine
applications in government, business and industry. Database technology is heavily used in
manufacturing, retail, banking, insurance, finance, and health care industries, where structured data is
collected through forms, such as invoices or patient registration documents. An area related to database
technology is Information Retrieval (IR), which deals with books, manuscripts, and various forms of
library-based articles. Data is indexed, cataloged and annotated using keywords.
Information retrieval, as the name implies, concerns the retrieving of relevant information from
databases. It is basically concerned with facilitating the user's access to large amounts of (predominantly
textual) information.

The process of information retrieval involves the following stages:

1. Representing Collections of Documents


- how to represent, identify and process the collection of documents.
2. User-initiated querying
- understanding and processing of the queries.
3. Retrieval of the appropriate documents
- the searching mechanism used to obtain and retrieve the relevant documents
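
The core data structure behind most text retrieval systems is the inverted index, which maps each term to the documents containing it. A minimal sketch over toy documents:

    from collections import defaultdict

    docs = {
        1: "database systems store structured data",
        2: "information retrieval searches unstructured text",
        3: "text data in databases",
    }

    # Build the inverted index: term -> set of document ids containing it.
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)

    # Query: documents containing both "text" and "data" (boolean AND).
    print(sorted(index["text"] & index["data"]))  # [3]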

Applications of Information Retrieval:


1. Text Information Retrieval

Terabytes of data are being accumulated on the Internet, including Facebook and Twitter data as well as Instagram and other social networking sites. This vast repository may be mined, and controlled to some extent, to sway public opinion in a candidate's favor (election strategy) or to evaluate a product's performance (marketing and sales strategy).
2. Multimedia Information Retrieval

Storage, indexing, search, and delivery of multimedia data such as images, videos, sounds,
3D graphics or their combination. By definition, it includes works on, for example,
extracting descriptive features from images, reducing high-dimensional indexes into low-
dimensional ones, defining new similarity metrics, efficient delivery of the retrieved data,
and so forth. Systems that provide all or part of the above functionalities are multimedia
retrieval systems.
The Google image search engine is a typical example of such a system. A video-on-demand site that allows people to search movies by their titles is another example.

DATABASE USERS AND ADMINISTRATORS:

A primary goal of a database system is to retrieve information from and store new information
into the database. People who work with a database can be categorized as database users or database
administrators.
Database Users and User Interfaces
There are four different types of database-system users, differentiated by the way they expect to
interact with the system. Different types of user interfaces have been designed for the different types of
users.
• Naive users are unsophisticated users who interact with the system by invoking one of the
application programs that have been written previously.
o Bank tellers check account balances and post withdrawals and deposits.
o Reservation agents for airlines, hotels, and car rental companies check availability for a
given request and make reservations.
o Employees at receiving stations for shipping companies enter package identifications via
bar codes and descriptive information through buttons to update a central database of
received and in-transit packages.
o As another example, consider a student, who during class registration period, wishes to
register for a class by using a Web interface. Such a user connects to a Web application
program that runs at a Web server. The application first verifies the identity of the user,
and allows her to access a form where she enters the desired information. The form
information is sent back to the Web application at the server, which then determines if
there is room in the class (by retrieving information from the database) and if so adds the
student information to the class roster in the database.
• Application programmers are computer professionals who write application programs.
Application programmers can choose from many tools to develop user interfaces. Rapid
application development (RAD) tools are tools that enable an application programmer to
construct forms and reports with minimal programming effort.
• Sophisticated users interact with the system without writing programs. Instead, they form their
requests either using a database query language or by using tools such as data analysis software.
Analysts who submit queries to explore data in the database fall in this category.
• Specialized users are sophisticated users who write specialized database applications that do not
fit into the traditional data-processing framework. Among these applications are computer-aided
design systems, knowledge-base and expert systems, systems that store data with complex data
types (for example, graphics data and audio data), and environment-modeling systems.

Database Administrator

One of the main reasons for using DBMSs is to have central control of both the data and the programs that access those data. A person who has such central control over the system is called a database administrator (DBA). The functions of a DBA include:
• Schema definition. The DBA creates the original database schema by executing a set of
data definition statements in the DDL.
• Storage structure and access-method definition.
• Schema and physical-organization modification. The DBA carries out changes to the
schema and physical organization to reflect the changing needs of the organization, or to
alter the physical organization to improve performance.
• Granting of authorization for data access. By granting different types of authorization, the database administrator can regulate which parts of the database various users can access. The authorization information is kept in a special system structure that the database system consults whenever someone attempts to access the data in the system. (A small example follows this list.)
• Routine maintenance. Examples of the database administrator’s routine maintenance
activities are:
o Periodically backing up the database, either onto tapes or onto remote servers, to
prevent loss of data in case of disasters such as flooding.
o Ensuring that enough free disk space is available for normal operations, and
upgrading disk space as required.
o Monitoring jobs running on the database and ensuring that performance is not
degraded by very expensive tasks submitted by some users.
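
As a sketch of granting authorization, the snippet below issues standard SQL GRANT and REVOKE statements through the same illustrative pyodbc connection used earlier; the user and table names are invented, and exact privilege syntax varies slightly across systems:

    import pyodbc  # assumes the illustrative 'CollegeDSN' data source

    conn = pyodbc.connect("DSN=CollegeDSN;UID=dba;PWD=secret")
    cur = conn.cursor()

    # Let an analyst read, but not modify, the instructor table.
    cur.execute("GRANT SELECT ON instructor TO analyst")

    # A clerk may insert new rows but gains no read privilege from this.
    cur.execute("GRANT INSERT ON instructor TO clerk")

    # Privileges can later be withdrawn.
    cur.execute("REVOKE INSERT ON instructor FROM clerk")
    conn.commit()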

As a whole, the DBA jobs are


• Creating primary database storage structures
• Modifying the structure of the database
• Monitoring database performance and efficiency
• Transferring data between the database and external file
• Monitoring and reestablishing database consistency
• Controlling and monitoring user access to the database
• Manipulating the physical location of the database.
HISTORY OF DATABASE SYSTEMS:

1950s and early 1960s:

Data processing using magnetic tapes for storage


-Tapes provide only sequential access
Punched cards for input

Late 1960s and 1970s:


Hard disks allow direct access to data
Network and hierarchical data models in widespread use
Ted Codd defines the relational data model
-Would win the ACM Turing Award for this work
-IBM Research begins System R prototype
-UC Berkeley begins Ingres prototype
High-performance (for the era) transaction processing

1980s:
Research relational prototypes evolve into commercial systems
-SQL becomes industrial standard
Parallel and distributed database systems
Object-oriented database systems
1990s:
Large decision support and data-mining applications
Large multi-terabyte data warehouses
Emergence of Web commerce
2000s:
XML and XQuery standards
Automated database administration

Information processing drives the growth of computers, as it has from the earliest days of
commercial computers. In fact, automation of data processing tasks predates computers. Punched cards,
invented by Herman Hollerith, were used at the very beginning of the twentieth century to record U.S. census data, and mechanical systems were used to process the cards and tabulate results. Punched cards
were later widely used as a means of entering data into computers. Techniques for data storage and
processing have evolved over the years:
• 1950s and early 1960s:
Magnetic tapes were developed for data storage. Data processing tasks such as payroll were
automated, with data stored on tapes. Processing of data consisted of reading data from one or more
tapes and writing data to a new tape. Data could also be input from punched card decks, and output to
printers. For example, salary raises were processed by entering the raises on punched cards and reading
the punched card deck in synchronization with a tape containing the master salary details. The records
had to be in the same sorted order. The salary raises would be added to the salary read from the master
tape, and written to a new tape; the new tape would become the new master tape. Tapes (and card decks)
could be read only sequentially, and data sizes were much larger than main memory; thus, data
processing programs were forced to process data in a particular order, by reading and merging data from
tapes and card decks.
• Late 1960s and 1970s:
Widespread use of hard disks in the late 1960s changed the scenario for data processing greatly,
since hard disks allowed direct access to data. The position of data on disk was immaterial, since any
location on disk could be accessed in just tens of milliseconds. Data were thus freed from the tyranny of sequentiality. With disks, network and hierarchical databases could be created that allowed data
structures such as lists and trees to be stored on disk. Programmers could construct and manipulate these
data structures.
A landmark paper by Codd [1970] defined the relational model and nonprocedural ways of
querying data in the relational model, and relational databases were born. The simplicity of the relational
model and the possibility of hiding implementation details completely from the programmer were
enticing indeed. Codd later won the prestigious Association of Computing Machinery Turing Award for
his work.
•1980s:
Although academically interesting, the relational model was not used in practice initially,
because of its perceived performance disadvantages; relational databases could not match the
performance of existing network and hierarchical databases. That changed with System R, a groundbreaking project at IBM Research that developed techniques for the construction of an efficient
relational database system. Excellent overviews of System R are provided by Astrahan et al. [1976] and
Chamberlin et al. [1981]. The fully functional System R prototype led to IBM’s first relational database
product, SQL/DS. At the same time, the Ingres system was being developed at the University of
California at Berkeley. It led to a commercial product of the same name. Initial commercial relational
database systems, such as IBM DB2, Oracle, Ingres, and DEC Rdb, played a major role in advancing
techniques for efficient processing of declarative queries.
By the early 1980s, relational databases had become competitive with network and hierarchical
database systems even in the area of performance. Relational databases were so easy to use that they
eventually replaced network and hierarchical databases; programmers using those older databases were forced
to deal with many low-level implementation details, and had to code their queries in a procedural
fashion. Most importantly, they had to keep efficiency in mind when designing their programs, which
involved a lot of effort.
In contrast, in a relational database, almost all these low-level tasks are carried out automatically
by the database, leaving the programmer free to work at a logical level. Since attaining dominance in the
1980s, the relational model has reigned supreme among data models. The 1980s also saw much research
on parallel and distributed databases, as well as initial work on object-oriented databases.
• Early 1990s:
The SQL language was designed primarily for decision support applications, which are query-
intensive, yet the mainstay of databases in the 1980s was transaction-processing applications, which are
update-intensive. Decision support and querying re-emerged as a major application area for databases.
Tools for analyzing large amounts of data saw large growths in usage. Many database vendors
introduced parallel database products in this period. Database vendors also began to add object-
relational support to their databases.
• 1990s:
The major event of the 1990s was the explosive growth of the World Wide Web. Databases were
deployed much more extensively than ever before. Database systems now had to support very high
transaction-processing rates, as well as very high reliability and 24×7 availability (availability 24 hours a
day, 7 days a week, meaning no downtime for scheduled maintenance activities).Database systems also
had to support Web interfaces to data.
• 2000s:
The first half of the 2000s saw the emergence of XML and the associated query language XQuery
as a new database technology. Although XML is widely used for data exchange, as well as for storing
certain complex data types, relational databases still form the core of a vast majority of large-scale
database applications. In this time period we have also witnessed the growth in “autonomic-
computing/auto-admin” techniques for minimizing system administration effort. This period also saw a
significant growth in use of open-source database systems, particularly PostgreSQL and MySQL. The
latter part of the decade has seen growth in specialized databases for data analysis, in particular column-
stores, which in effect store each column of a table as a separate array, and highly parallel database
systems designed for analysis of very large data sets. Several novel distributed data-storage systems
have been built to handle the data management requirements of very large Web sites such as Amazon,
Facebook, Google, Microsoft and Yahoo!, and some of these are now offered as Web services that can
be used by application developers. There has also been substantial work on management and analysis of
streaming data, such as stock-market ticker data or computer network monitoring data. Data-mining
techniques are now widely deployed; example applications include Web-based product-recommendation
systems and automatic placement of relevant advertisements on Web pages.
-------
When Not to Use a DBMS
In spite of the advantages of using a DBMS, there are a few situations in which a DBMS may
involve unnecessary overhead costs that would not be incurred in traditional file processing. The
overhead costs of using a DBMS are due to the following:
• High initial investment in hardware, software, and training
• The generality that a DBMS provides for defining and processing data
• Overhead for providing security, concurrency control, recovery, and integrity functions
Therefore, it may be more desirable to use regular files under the following
circumstances:
• Simple, well-defined database applications that are not expected to change at all
• Stringent, real-time requirements for some application programs that may not be met
because of DBMS overhead.
• Embedded systems with limited storage capacity, where a general-purpose DBMS would
not fit.
• No multiple-user access to data
Certain industries and applications have elected not to use general-purpose DBMSs.
For example, many computer-aided design (CAD) tools used by mechanical and civil
engineers have proprietary file and data management software that is geared for the internal
manipulations of drawings and 3D objects. Similarly, communication and switching systems
designed by companies like AT&T were early manifestations of database software that was
made to run very fast with hierarchically organized data for quick access and routing of calls.
Similarly, GIS implementations often implement their own data organization schemes for
efficiently implementing functions related to processing maps, physical contours, lines,
polygons, and so on. General-purpose DBMSs are inadequate for their purpose.