RDBMS Unit I
Data consists of raw facts (or real-world facts) that can be processed by any computing machine.
Data is a collection of facts which are unorganized but can be organized into a useful form. Data may be
numerical (integers or floating-point numbers) or non-numerical (characters, dates, etc.). Data is of two types:
Raw data : data collected from different sources, which by itself has no meaning.
Derived data : data extracted from raw data and used for getting useful information.
Example: a bare list of numbers may represent anything: distances in km, amounts in rupees, numbers of days,
or marks in each subject.
Information:
Information is data that has been converted into a more useful or intelligible form. An example is a Student
Mark Sheet.
Information is RELATED DATA. The data (information) which is used by an organization – a college, a
library, a bank, a manufacturing company – is one of its most valuable resources.
Knowledge:
The human mind purposefully organizes information and evaluates it to produce knowledge.
Database :
Databases and database systems have become an essential component of everyday life in modern
society. In the course of a day, most of us encounter several activities that involve some interaction
with a database.
For example, if we go to the bank to deposit or withdraw funds, if we make a hotel or airline
reservation, if we access a computerized library catalog to search for a bibliographic item, or if we buy
some item-such as a book, toy, or computer-from an Internet vendor through its Web page, chances are
that our activities will involve someone or some computer program accessing a database. Even
purchasing items from a supermarket nowadays in many cases involves an automatic update of the
database that keeps the inventory of supermarket items.
• These interactions are examples of what we may call traditional database applications,
in which most of the information that is stored and accessed is either textual or numeric.
• In the past few years, advances in technology have been leading to exciting new
applications of database systems. Multimedia databases can now store pictures, video
clips, and sound messages.
• Geographic information systems (GIS) can store and analyze maps, weather data, and
satellite images.
• Data warehouses and online analytical processing (OLAP) systems are used in many
companies to extract and analyze useful information from very large databases for
decision making.
• Real-time and active database technology is used in controlling industrial and
manufacturing processes.
• And database search techniques are being applied to the World Wide Web to improve
the search for information that is needed by users browsing the Internet.
Databases and database technology are having a major impact on the growing use of computers.
It is fair to say that databases play a critical role in almost all areas where computers are used, including
business, electronic commerce, engineering, medicine, law, education, and library science, to name a
few.
Database
A database is a collection of related data. By data, we mean known facts that can be recorded
and that have implicit and useful meaning.
For example, consider the names, telephone numbers, and addresses of the people you know.
You may have recorded this data in an indexed address book, or you may have stored it on a hard drive,
using a personal computer and software such as Microsoft Access, or Excel. This is a collection of
related data with an implicit meaning and hence is a database.
Database-Management System:
A database-management system (DBMS) is a collection of interrelated data and a set of
programs to access those data. The collection of data, usually referred to as the database, contains
information relevant to an enterprise. The primary goal of a DBMS is to provide a way to store and
retrieve database information that is both convenient and efficient.
Database systems are designed to manage large bodies of information. Management of data
involves both defining structures for storage of information and providing mechanisms for the
manipulation of information.
In addition, the database system must ensure the safety of the information stored, despite system
crashes or attempts at unauthorized access. If data are to be shared among several users, the system must
avoid possible anomalous results.
Database systems are basically developed for large amounts of data. When dealing with huge
amounts of data, there are two things that require optimization: storage of data and retrieval of data.
Storage: According to the principles of database systems, data is stored in such a way
that it occupies far less space, because redundant data (duplicate data) is removed before
storage. Let's take a layman's example to understand this:
In a banking system, suppose a customer has two accounts, one a savings account
and the other a salary account. Say the bank stores the savings account data at one place (these
places are called tables; we will learn about them later) and the salary account data at another place. In that
case, if customer information such as name and address is stored at both places,
that is simply a waste of storage (redundancy/duplication of data). To organize the data in a
better way, the information should be stored at one place, and both accounts should be linked
to that information. This is exactly what we achieve in a DBMS, as the sketch below shows.
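A minimal sketch of this idea in SQL (table names, column types, and data are illustrative, not an actual banking schema): the customer details are stored once, and each account row links back to them through a foreign key.

create table customer (
    customer_id   char(5) primary key,
    customer_name varchar(30),
    address       varchar(50)
);

create table account (
    account_no   char(10) primary key,
    account_type varchar(10),          -- e.g. 'SAVINGS' or 'SALARY'
    balance      numeric(12,2),
    customer_id  char(5) references customer(customer_id)
);

-- One customer row serves both accounts; the name and address are not duplicated:
insert into customer values ('C1001', 'Ravi', 'Chennai');
insert into account values ('A5001', 'SAVINGS', 15000.00, 'C1001');
insert into account values ('A5002', 'SALARY',  42000.00, 'C1001');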
Fast Retrieval of data: Along with storing the data in an optimized and systematic
manner, it is also important that we retrieve the data quickly when needed. Database systems
ensure that the data is retrieved as quickly as possible.
Database-System Applications
Databases are widely used. Here are some representative applications:
Enterprise Information
Sales: For customer, product, and purchase information.
Accounting: For payments, receipts, account balances, assets and other accounting information.
Human resources: For information about employees, salaries, payroll taxes, and benefits, and
for generation of paychecks.
Manufacturing: For management of the supply chain and for tracking production of items in
factories, inventories of items in warehouses and stores, and orders for items.
Online retailers: For sales data noted above plus online order tracking, generation of
recommendation lists, and maintenance of online product evaluations.
Banking and Finance
Banking: For customer information, accounts, loans, and banking transactions.
Credit card transactions: For purchases on credit cards and generation of monthly statements.
Finance: For storing information about holdings, sales, and purchases of financial instruments
such as stocks and bonds; also for storing real-time market data to enable online trading by
customers and automated trading by the firm.
Universities: For student information, course registrations, and grades (in addition to standard enterprise
information such as human resources and accounting).
Airlines: For reservations and schedule information. Airlines were among the first to use databases in a
geographically distributed manner.
Telecommunication: For keeping records of calls made, generating monthly bills, maintaining balances
on prepaid calling cards, and storing information about the communication networks.
Overall, the traditional file processing system was better in many respects than a manual, non-computer-based
system, but it still had many disadvantages that were later overcome by the Database
Management System.
Keeping the information of an organization in a file processing system has a number of
disadvantages, namely
FILE MANAGEMENT SYSTEM PROBLEMS
• Data Redundancy and Inconsistency: Since the files and applications programs are created by
different programmers over a long period, the various files are likely to have different formats
and the programs may be written in several programming languages. Moreover, the same
information may be duplicated in several places. This redundancy leads to higher storage and
access cost. In addition, it may lead to data inconsistency.
• Difficulty in Accessing Data: The file processing system does not allow needed data to be
retrieved in a convenient and efficient manner.
• Data Isolation: In a file processing system, the data are scattered in various files, and the files may
be in different formats. It is therefore very difficult to write new application programs to retrieve the
appropriate data.
• Integrity problems: The data values stored in the database must satisfy certain types of
consistency Constraints (Conditions).
For example, the minimum balance in a bank account may never fall
below an amount of Rs. 500. Developers enforce these constraints in the system by
adding appropriate code in the application programs. However, when new
constraints are added, it is difficult to change the application programs to enforce
them.
• Atomicity problems. A computer system, like any other device, is subject to failure. In many
applications, it is crucial that, if a failure occurs, the data be restored to the consistent state that
existed prior to the failure.
Consider a program to transfer Rs.500 from the account balance of
department A to the account balance of department B. If a system failure occurs
during the execution of the program, it is possible that the Rs.500 was removed
from the balance of department A but was not credited to the balance of department
B, resulting in an inconsistent database state. Clearly, it is essential to database
consistency that either both the credit and debit occur, or that neither occur. That is,
the funds transfer must be atomic — it must happen in its entirety or not at all. It is
difficult to ensure atomicity in a conventional file-processing system.
• Concurrent-access anomalies. For the sake of overall system performance and faster
response, many systems allow multiple users to update the data simultaneously. Uncontrolled
concurrent updates, however, can leave the data inconsistent, as the sketch below illustrates.
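A sketch of the kind of anomaly uncontrolled concurrency can cause, written as SQL comments over a hypothetical account table (the interleaving of the two sessions is what matters):

-- Account A starts with balance 500.
-- Session 1 (withdraw 50):            Session 2 (withdraw 100):
-- read balance -> 500
--                                     read balance -> 500
-- write balance = 500 - 50 = 450
--                                     write balance = 500 - 100 = 400
-- Final balance is 400 instead of the correct 350: Session 2 overwrote
-- Session 1's update (a "lost update"). A DBMS prevents this by scheduling
-- the two updates so that they do not interleave in this way.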
• Security problems: Not every user of the database system should be able to access all the data.
For example, in a banking system, payroll personnel need to see only that
part of the database that has information about various bank employees. They do
not need access to information about customer accounts.
In file processing systems, as application programs are added to the system in an ad hoc
manner, it is difficult to enforce security.
The above disadvantages can be overcome by use of DBMS and it provides the following
advantages.
1. Provides mass storage for relevant data.
2. Makes data easily accessible to users.
3. Allows modification of data in a consistent manner.
4. Allows multiple users to be active at a time.
5. Eliminates or reduces redundant data.
6. Provides prompt responses to users' requests for data.
7. Supports backup and recovery of data.
8. Protects data from physical hardware failure and unauthorized access.
9. Allows constraints to be set on the database to maintain data integrity.
Sharing of Data: In a paper-based record keeping, data cannot be shared among many users. But in
computerized DBMS, many users can share the same database if they are connected via a network.
Data Integrity: We can maintain data integrity by specifying integrity constraints, which are rules
and restrictions about what kind of data may be entered or manipulated within the database. This
increases the reliability of the database as it can be guaranteed that no wrong data can exist within
the database at any point of time.
Data independence: Application programs should be as independent as possible from details of data
representation and storage. The DBMS can provide an abstract view of the data to insulate
application code from such details.
Efficient data access: A DBMS utilizes a variety of sophisticated techniques to store and retrieve
data efficiently. This feature is especially important if the data is stored on external storage devices.
Data integrity and security: If data is always accessed through the DBMS, the DBMS can enforce
integrity constraints on the data. For example, before inserting salary information for an employee,
the DBMS can check that the department budget is not exceeded. Also, the DBMS can enforce
access controls that govern what data is visible to different classes of users.
Data administration: When several users share the data, centralizing the administration of data can
offer significant improvements. Experienced professionals who understand the nature of the
data being managed, and how different groups of users use it, can be responsible for
organizing the data representation to minimize redundancy and fine-tuning the storage of the data to
make retrieval efficient.
Concurrent access and crash recovery: A DBMS schedules concurrent accesses to the data in such
a manner that users can think of the data as being accessed by only one user at a time. Further, the
DBMS protects users from the effects of system failures.
Reduced application development time: Clearly, the DBMS supports many important functions
that are common to many applications accessing data stored in the DBMS. This, in conjunction with
the high-level interface to the data, facilitates quick development of applications. Such
applications are also likely to be more robust than applications developed from scratch because many
important tasks are handled by the DBMS instead of being implemented by the application.
DISADVANTAGES OF A DBMS
Danger of Overkill: For small and simple applications for single users, a database system is often not
advisable.
Complexity: A database system creates additional complexity and requirements. The supply and
operation of a database management system with several users and databases is quite costly and
demanding.
Qualified Personnel: The professional operation of a database system requires appropriately trained staff.
Without a qualified database administrator nothing will work for long.
Costs: The use of a database system generates new costs, not only for the system itself but also for
additional hardware and the more complex handling of the system.
Lower Efficiency: A database system is multi-use software, which is often less efficient than specialised
software that is produced and optimised for exactly one problem.
VIEW OF DATA:
A DBMS is a collection of interrelated files and a set of programs that allow users to access
and modify these files.
Data Abstraction
A major purpose of a database system is to provide users with an abstract view of the data. That
is, the system hides certain details of how the data are stored and maintained. It is called data abstraction.
Levels of Abstraction: Abstraction can basically be divided into three levels. They are:
1. Physical Level : The lowest level of abstraction describes how the data are actually stored. At the
physical level, complex low-level data structures are described in detail.
2. Logical Level (Conceptual Level) : The next higher level of abstraction describes what
data are stored in the database, and what relationships exist among those data. This level of abstraction is
used by Database Administrators (DBAs), who must decide what information is to be kept in the
database.
3. View Level : The highest level of abstraction describes only part of the entire database. Even
though the logical level uses simpler structures, some complexity remains because of the variety of
information stored in a large database. Many users of the database system will not be concerned with all
of this information; such users need to access only a part of the database. The view level of abstraction is
defined to simplify their interaction with the system. The system may provide many views for the same
database.
An analogy to the concept of data types in programming languages may clarify the distinction
among levels of abstraction. Many high-level programming languages support the notion of a structured
type. For example, we may describe a record as follows:
type instructor = record
    ID : char(5);
    name : char(20);
    dept_name : char(20);
    salary : numeric(8,2);
end;
This code defines a new record type called instructor with four fields. Each field has a name and
a type associated with it. A university organization may have several such record types, including
• department, with fields dept name, building, and budget
• course, with fields course id, title, dept name, and credits
• student, with fields ID, name, dept name, and tot cred
At the physical level, an instructor, department, or student record can be described as a block of
consecutive storage locations. The compiler hides this level of detail from programmers. Similarly, the
database system hides many of the lowest-level storage details from database programmers. Database
administrators, on the other hand, may be aware of certain details of the physical organization of the data.
At the logical level, each such record is described by a type definition, as in the previous code
segment, and the interrelationship of these record types is defined as well. Programmers using a
programming language work at this level of abstraction. Similarly, database administrators usually work
at this level of abstraction.
Finally, at the view level, computer users see a set of application programs that hide details of
the data types. At the view level, several views of the database are defined, and a database user sees
some or all of these views. In addition to hiding details of the logical level of the database, the views
also provide a security mechanism to prevent users from accessing certain parts of the database. For
example, clerks in the university registrar office can see only that part of the database that has
information about students; they cannot access information about salaries of instructors.
Instances
The collection of information stored in the database at a particular moment is called an instance of the
database. It is also called a snapshot, a set of occurrences, or the current state of the database.
Schemas
The overall design of the database is called the database schema. Schemas are changed infrequently, if
at all.
The concept of database schemas and instances can be understood by analogy to a program
written in a programming language. A database schema corresponds to the variable declarations (along
with associated type definitions) in a program. Each variable has a particular value at a given instant. The
values of the variables in a program at a point in time correspond to an instance of a database schema. In
general, a database system supports one physical schema, one logical schema, and several subschemas.
Database systems have several schemas, partitioned according to the levels of abstraction.
• The physical schema describes the database design at the physical level,
• The logical schema describes the database design at the logical level.
• A database may also have several schemas at the view level, sometimes called
subschemas, that describe different views of the database.
DATA INDEPENDENCE
The three-schema architecture can be used to further explain the concept of data independence,
which can be defined as the capacity to change the schema at one level of a database system without
having to change the schema at the next higher level.
We can define two types of data independence:
1. Logical data independence is the capacity to change the conceptual schema without having
to change external schemas or application programs. We may change the conceptual schema to
expand the database (by adding a record type or data item), to change constraints, or to reduce
the database (by removing a record type or data item).
Only the view definitions and the mappings need be changed in a DBMS that supports logical
data independence. After the conceptual schema undergoes a logical reorganization, application
programs that reference the external schema constructs must work as before. Changes to
constraints can be applied to the conceptual schema without affecting the external schemas or
application programs (see the SQL sketch after this list).
2. Physical data independence is the capacity to change the internal schema without having to
change the conceptual schema. Hence, the external schemas need not be changed as well.
Changes to the internal schema may be needed because some physical files had to be
reorganized.
For example, by creating additional access structures to improve the performance of
retrieval or update. If the same data as before remains in the database, we should not have to
change the conceptual schema.
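Both kinds of independence can be illustrated with a small SQL sketch (names are hypothetical). An external schema is defined as a view over the conceptual schema; the base table can then gain a column (a logical change) or a new access structure (a physical change) without breaking programs that use the view:

-- Conceptual schema:
create table instructor (
    ID        char(5) primary key,
    name      varchar(20),
    dept_name varchar(20),
    salary    numeric(8,2)
);

-- External schema: clerks see instructors without salaries.
create view instructor_public as
    select ID, name, dept_name from instructor;

-- Logical change: the view and its users are unaffected.
alter table instructor add phone varchar(15);

-- Physical change: only storage and access paths change.
create index instructor_dept_idx on instructor (dept_name);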
Whenever we have a multiple-level DBMS, its catalog must be expanded to include information
on how to map requests and data among the various levels. The DBMS uses additional software to
accomplish these mappings by referring to the mapping information in the catalog.
Data independence occurs because when the schema is changed at some level, the schema at the
next higher level remains unchanged; only the mapping between the two levels is changed. Hence,
application programs referring to the higher-level schema need not be changed. The three-schema
architecture can make it easier to achieve true data independence, both physical and logical. However, the
two levels of mappings create an overhead during compilation or execution of a query or program,
leading to inefficiencies in the DBMS. Because of this, few DBMSs have implemented the full three-
schema architecture.
DATA MODELS:
A data model is a collection of tools for describing data, data relationships, data semantics, and data constraints.
The data models can be classified into four different categories:
• Relational model
• Entity-Relationship data model (mainly for database design)
• Object-based data models (Object-oriented and Object-relational)
• Semi-structured data model (XML)
Other older models:
• Network model
• Hierarchical model
Relational Model
The relational model is currently the most popular data model in database management
systems. Its popularity is due to its simplicity and understandability. This data model was developed by
E. F. Codd in 1970 and is based on the relation, a two-dimensional table.
The relational data model uses a collection of tables (also called relations) to represent both data and the
relationships among those data. Each table has multiple columns, and each column has a unique name. A
relation consists of rows and columns: a row in a table (relation) is called a tuple, and a column name
is known as an attribute.
Example: a Customer table, sketched below.
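A hedged SQL sketch of such a table, since the original figure is not reproduced here (names and data are illustrative):

create table customer (
    customer_id   char(5),
    customer_name varchar(30),
    city          varchar(20)
);

insert into customer values ('C101', 'Anita',  'Delhi');   -- one tuple (row)
insert into customer values ('C102', 'Suresh', 'Mumbai');  -- another tuple
-- customer_id, customer_name, and city are the attributes (columns).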
Advantages
1. In this model, data redundancy is controlled to a greater extent
2. The relational data model allows many-to-many relationships.
3. The relational data model structures are very simple and easy to build
4. Faster access of data is possible and storage space required is greatly
reduced.
Entity-Relationship Model
The entity-relationship (E-R) data model uses a collection of basic objects, called entities,
and relationships among these objects. An entity is a “thing” or “object” in the real world that is
distinguishable from other objects. The entity-relationship model is widely used in database design.
It is a high-level conceptual data model that describes the structure of a database in terms of
entities, relationships among entities, and constraints on them.
Basic Concepts of E-R Model:
Entity
Entity Set
Attributes
Relationship
Relationship set
Identifying Relationship
The overall logical structure of a database can be expressed graphically by an E-R diagram, which is made up of components. Some of them are:
• Rectangles : Which represent entity sets.
• Ellipses : Which represent attributes
• Diamonds : Which represent relationship sets
• Lines : Which link attributes to entity sets and entity sets to relationship sets.
Object-Based Data Model
Object-oriented programming (especially in Java, C++, or C#) has become the dominant
software-development methodology. This led to the development of an object-oriented data model that
can be seen as extending the E-R model with notions of encapsulation, methods (functions), and object
identity. The object-relational data model combines features of the object-oriented data model and
relational data model.
Historically, the network data model and the hierarchical data model preceded the
relational data model. These models were tied closely to the underlying implementation, and
complicated the task of modeling data. As a result they are used little now, except in old database
code that is still in service in some places.
Network Data Model:
Data in the network model are represented by collections of records and relationships among
data are represented by links, which can be viewed as pointers. The records in the database are
organized as collections of arbitrary graphs. The network data model is similar to the hierarchical model,
except that a record can have more than one parent. Any record in the database is allowed to own sets of
records of other types.
Advantages
o It can be used to represent many-to-many relationships
o It offers integration of data
o The storage space is reduced considerably due to less redundancy
o It provides faster access of data.
Hierarchical Data Model:
A hierarchical database model is a data model in which the data is organized into a tree-like
structure. The data is stored as records which are connected to one another through links. A record is a
collection of fields, with each field containing only one value.
In this model the relationships among the data are represented by records and links. It consists of
records which are connected to one another through links. A link can be defined as an association between
two records. The hierarchical data model can be considered as an upside-down tree, with the highest
level of the tree kept as the root.
Advantages
o The hierarchical model allows one-to-one and one-to-many relationships.
o The model has the ability to handle large amounts of data.
Disadvantages
o Querying the model is complicated.
o As duplication of data takes place, there is wastage of storage space.
o During updating of data inconsistency exists.
o The model does not allow many-to-many relationships.
DATA BASE LANGUAGES:
A database system provides two different types of languages: a data-definition language (DDL) to specify
the database schema, and a data-manipulation language (DML) to express database queries and updates. As
part of the DDL, the database system also provides facilities for specifying consistency constraints such as
the following:
Referential Integrity: There are cases where we wish to ensure that a value that appears in one
relation for a given set of attributes also appears in a certain set of attributes in another relation
(referential integrity).
Assertions: An assertion is any condition that the database must always satisfy. Domain
constraints and referential-integrity constraints are special forms of assertions.
For example, “Every department must have at least five courses offered every
semester” must be expressed as an assertion. When an assertion is created, the system tests it for
validity. If the assertion is valid, then any future modification to the database is allowed only if it
does not cause that assertion to be violated.
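Standard SQL even provides a create assertion statement for such conditions, although few systems implement it fully (check constraints or triggers are common substitutes). A simplified sketch, assuming a course table like the one described later in this unit and ignoring the per-semester detail:

create assertion min_courses_per_dept check (
    not exists (
        select dept_name
        from course
        group by dept_name
        having count(course_id) < 5
    )
);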
Authorization: We may want to differentiate among the users as far as the type of access they are
permitted on various data values in the database. These differentiations are expressed in terms of
authorization, the most common being:
read authorization, which allows reading, but not modification, of data;
insert authorization, which allows insertion of new data, but not modification of
existing data;
update authorization, which allows modification, but not deletion, of data; and
delete authorization, which allows deletion of data. We may assign the user all,
none, or a combination of these types of authorization.
The DDL commands are
• To create the database instance – CREATE
• To alter the structure of database – ALTER
• To drop database instances – DROP
• To remove all rows from a table, keeping its structure – TRUNCATE
• To rename database objects such as tables – RENAME
All these commands specify or update the database schema, which is why they come under the Data
Definition Language. A brief sketch of them follows below.
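A hedged sketch of these commands in SQL (names are illustrative; exact syntax, especially for RENAME, varies by DBMS):

create table student (
    ID       char(5),
    name     varchar(20),
    tot_cred numeric(3,0)
);                                               -- CREATE: define a new table

alter table student add dept_name varchar(20);   -- ALTER: change the structure
truncate table student;                          -- TRUNCATE: remove all rows, keep the table
alter table student rename to learner;           -- RENAME (some systems use RENAME TABLE)
drop table learner;                              -- DROP: remove the table definition itself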
o Used by the DBA and database designers to specify the conceptual schema of a database.
o In many DBMSs, the DDL is also used to define internal and external schemas (views).
o In some DBMSs, separate storage definition language (SDL) and view definition language
(VDL) are used to define internal and external schemas.
o SDL is typically realized via DBMS commands provided to the DBA and database designers
o DDL compiler generates a set of tables stored in a data dictionary
o Data dictionary contains metadata (i.e., data about data)
A DML is a language that enables users to access or manipulate data. There are basically two types:
• Procedural DML: This requires a user to specify what data are needed and how to get those
data from the existing database.
• Non-procedural DML: This requires a user to specify what data are needed without
specifying how to get those data.
Non-procedural DMLs are usually easier to learn and use than procedural DMLs. However, since a user does
not have to specify how to obtain the data, these languages may generate code that is not as efficient as that
produced by a procedural DML. This difficulty can be remedied by various query-optimization techniques.
A query is a statement requesting the retrieval of information. The portion of a DML that involves
information retrieval is called a query language.
This query in the SQL language finds the name of the customer whose customer-id is 192-83-7465:
select customer.customer-name
from customer
where customer.customer-id = '192-83-7465'
The query specifies that those rows from the table customer where the customer-id is 192-83-
7465 must be retrieved, and the customer-name attribute of these rows must be displayed.
Queries may involve information from more than one table. For instance, the following query finds the
balance of all accounts owned by the customer with customer-id 192-83-7465.
select account.balance
from depositor, account
where depositor.customer-id = '192-83-7465'
and depositor.account-number = account.account-number
There are a number of database query languages in use, either commercially or experimentally.
The levels of abstraction apply not only to defining or structuring data, but also to manipulating data.
At the physical level, we must define algorithms that allow efficient access to data. At higher levels of
abstraction, we emphasize ease of use. The goal is to allow humans to interact efficiently with the
system. The query processor component of the database system translates DML queries into sequences
of actions at the physical level of the database system.
• Data Control language (DCL): DCL is used for granting and revoking user access on a
database
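For example, GRANT and REVOKE are the usual DCL statements (user and table names are illustrative):

-- Give a user read and insert authorization on a table:
grant select, insert on account to clerk;
-- Later, take the insert authorization away:
revoke insert on account from clerk;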
In practice, the data definition language, data manipulation language, and data control language are
not separate languages; rather, they are parts of a single database language such as SQL.
DATA DICTIONARY
We can define a data dictionary as a DBMS component that stores the definition of data
characteristics and relationships. You may recall that such “data about data” were labeled metadata. The
DBMS data dictionary provides the DBMS with its self-describing characteristic. In effect, the data
dictionary resembles an X-ray of the company's entire data set, and is a crucial element in the data
administration function.
Two main types of data dictionary exist: integrated and stand-alone.
An integrated data dictionary is included with the DBMS. For example, all relational
DBMSs include a built in data dictionary or system catalog that is frequently accessed and
updated by the RDBMS.
Other DBMSs, especially older, stand-alone types, do not have a built-in data dictionary;
instead, the DBA may use third-party stand-alone data dictionary systems.
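As a concrete illustration, many relational DBMSs expose their integrated data dictionary through catalog views such as the SQL-standard information_schema (support and naming vary by product):

-- List the tables visible in a given schema (schema name is product-specific):
select table_name
from information_schema.tables
where table_schema = 'public';

-- Describe the columns of one table:
select column_name, data_type
from information_schema.columns
where table_name = 'customer';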
Data dictionaries can also be classified as active or passive.
An active data dictionary is automatically updated by the DBMS with every database
access, thereby keeping its access information up-to-date.
A passive data dictionary is not updated automatically and usually requires a batch
process to be run. Data dictionary access information is normally used by the DBMS for query
optimization purpose.
The data dictionary’s main function is to store the description of all objects that interact with the
database. Integrated data dictionaries tend to limit their metadata to the data managed by the DBMS.
Stand-alone data dictionary systems are usually more flexible and allow the DBA to describe and
manage all the organization’s data, whether or not they are computerized. Whatever the data dictionary’s
format, its existence provides database designers and end users with a much improved ability to
communicate. In addition, the data dictionary is the tool that helps the DBA to resolve data conflicts.
Although there is no standard format for the information stored in the data dictionary, several
features are common. For example, the data dictionary typically stores descriptions of all:
• Data elements that are defined in all tables of all databases. Specifically, the data dictionary stores
the name, data types, display formats, internal storage formats, and validation rules. The data
dictionary tells where an element is used, by whom it is used, and so on.
• Tables defined in all databases. For example, the data dictionary is likely to store the name of the
table creator, the date of creation, access authorizations, the number of columns, and so on.
• Indexes defined for each database table. For each index, the DBMS stores at least the index name,
the attributes used, the location, specific index characteristics, and the creation date.
• Defined databases: who created each database, the date of creation, where the database is located,
who the DBA is, and so on.
• End users and the administrators of the database.
• Programs that access the database, including screen formats, report formats, application formats,
SQL queries, and so on.
• Access authorization for all users of all databases.
• Relationships among data elements: which elements are involved, whether the relationships
are mandatory or optional, the connectivity and cardinality, and so on.
If the data dictionary can be organized to include data external to the DBMS itself, it becomes an
especially flexible tool for more general corporate resource management. The management of such an
extensive data dictionary thus makes it possible to manage the use and allocation of all of the
organization's information, regardless of whether it has its roots in the database data.
RELATIONAL DATABASES :
A relational database is based on the relational model and uses a collection of tables to represent
both data and the relationships among those data. It also includes a DML and DDL.
The purpose of the relational model is to provide a declarative method for specifying data and
queries: users directly state what information the database contains and what information they want from
it, and let the database management system software take care of describing data structures for storing
the data and retrieval procedures for answering queries.
Relational Database: One of the major advantages of using a relational database is its structural
flexibility. It allows the users to retrieve the data in any combination
A relation is a two-dimensional array consisting of horizontal rows and vertical columns. Each
row-column intersection (a cell) contains a single value, and no two rows are identical.
Columns are always self-consistent, in the sense that a column has the same meaning in every row.
Row and column order is immaterial to the database management system (DBMS): the table is
processed the same way regardless of the order of the columns.
Relations are commonly referred to as tables. Every column in a database table acts as an attribute,
since the meaning of the column is the same for every row of the database. A row consists of a set of fields
and hence is commonly referred to as a record.
Properties of Relational Database: The important properties of a relational database are listed
below:
1. A relational database is a collection of relations.
2. The database tables have a row column format.
3. Operators are available either to join or separate columns of the database table.
4. Relations are formed with respect to data only.
5. The tables can be accessed by using simple non-procedural statements.
6. The data is fully independent, that is it will be the same irrespective of the access path used.
DATABASE DESIGN:
Database systems are designed to manage large bodies of information. These large bodies of
information do not exist in isolation. They are part of the operation of some enterprise whose end
product may be information from the database or may be some device or service for which the database
plays only a supporting role.
A high-level data model provides the database designer with a conceptual frame-work in which to
specify the data requirements of the database users, and how the database will be structured to fulfill
these requirements. The initial phase of database design, then, is to characterize fully the data needs of
the prospective database users. The database designer needs to interact extensively with domain experts
and users to carry out this task. The outcome of this phase is a specification of user requirements.
Design Process:
The database development life cycle has a number of stages that are followed when developing
database systems. The steps in the development life cycle do not necessarily have to be followed
strictly in a sequential manner.
On small database systems, the database system development life cycle is usually very simple
and does not involve a lot of steps.
In order to fully appreciate these stages, let's look at the individual components involved in each step.
Requirements analysis
• Planning - This stage concerns the planning of the entire Database Development Life
Cycle. It takes into consideration the Information Systems strategy of the
organization.
• System definition - This stage defines the scope and boundaries of the proposed database
system.
Database designing
The process of moving from an abstract data model to the implementation of the database proceeds in
two final design phases.
In the logical-design phase, the designer maps the high-level conceptual schema onto the
implementation data model of the database system that will be used.
The designer uses the resulting system-specific database schema in the subsequent physical-design
phase, in which the physical features of the database are specified.
Implementation
A fully developed conceptual schema indicates the functional requirements of the enterprise. In a
specification of functional requirements, users describe the kinds of operations (or transactions) that
will be performed on the data. Example operations include modifying or updating data, searching for
and retrieving specific data, and deleting data. At this stage of conceptual design, the designer can
review the schema to ensure it meets functional requirements.
• Data conversion and loading - this stage is concerned with importing and converting data from
the old system into the new database.
• Testing - this stage is concerned with the identification of errors in the newly implemented
system. It checks the database against the requirement specifications.
To illustrate the design process, let us examine how a database for a university could be designed.
The initial specification of user requirements may be based on interviews with the database users, and
on the designer’s own analysis of the organization. The description that arises from this design phase
serves as the basis for specifying the conceptual structure of the database. Here are the major
characteristics of the university.
• The university is organized into departments. Each department is identified by a unique name (dept
name), is located in a particular building, and has a budget.
• Each department has a list of courses it offers. Each course has associated with it a course id, title,
dept name, and credits, and may also have associated prerequisites.
• Instructors are identified by their unique ID. Each instructor has a name, an associated department (dept
name), and a salary.
• Students are identified by their unique ID. Each student has a name, an associated major department
(dept name), and tot cred (total credit hours the student has earned thus far).
• The university maintains a list of classrooms, specifying the name of the building, room number, and
room capacity.
• The university maintains a list of all classes (sections) taught. Each section is identified by a course
id, sec id, year, and semester, and has associated with it a building, room number,
and time slot id (the time slot when the class meets).
• The department has a list of teaching assignments specifying, for each instructor, the sections the
instructor is teaching.
• The university has a list of all student course registrations, specifying, for each student, the courses
and the associated sections that the student has taken (registered for).
Two common techniques for designing a relational database schema are:
1. Normalization
2. ER Modeling
NORMALIZATION :
Another method for designing a relational database is to use a process commonly known as
normalization. The goal is to generate a set of relation schemas that allows us to store information
without unnecessary redundancy, yet also allows us to retrieve information easily. The approach is to
design schemas that are in an appropriate normal form. To determine whether a relation schema is in one
of the desirable normal forms, we need additional information about the real-world enterprise that we
are modeling with the database. The most common approach is to use functional dependencies.
To understand the need for normalization, let us look at what can go wrong in a bad database
design. Among the undesirable properties that a bad design may have are:
• Repetition of information
• Inability to represent certain information
Normalization is a process of organizing the data in a database to avoid data redundancy and insertion,
update, and deletion anomalies. Let's discuss anomalies first; we will then discuss
normal forms with examples.
Anomalies in DBMS : There are three types of anomalies that occur when the database is not
normalized. These are – Insertion, update and deletion anomaly. Let’s take an example to understand
this.
Example: Suppose a manufacturing company stores the employee details in a table named employee
that has four attributes: emp_id for storing employee’s id, emp_name for storing employee’s name,
emp_address for storing employee’s address and emp_dept for storing the department details in which
the employee works. At some point in time the table might look like this (illustrative data):

emp_id   emp_name   emp_address   emp_dept
101      Rick       Delhi         D001
101      Rick       Delhi         D002
123      Maggie     Agra          D890
The above table is not normalized. We will see the problems that we face when a table is not
normalized.
Update anomaly: In the above table we have two rows for employee Rick, as he belongs to two
departments of the company. If we want to update the address of Rick, we have to update it in both
rows, or the data will become inconsistent. If the correct address gets updated in one row but not
in the other, then as per the database Rick would have two different addresses, which is not correct
and would lead to inconsistent data.
Insert anomaly: Suppose a new employee joins the company who is under training and not yet
assigned to any department. We would not be able to insert the data into the table if the emp_dept field
doesn't allow nulls.
Delete anomaly: Suppose at some point in time the company closes the department D890. Deleting
the rows that have emp_dept as D890 would also delete the information of employee Maggie, since
she is assigned only to this department.
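A hedged sketch of how normalization removes these anomalies: split the single employee table into two relations, so that each fact is stored exactly once (names follow the example above):

create table employee (
    emp_id      char(4) primary key,
    emp_name    varchar(30),
    emp_address varchar(50)
);

create table employee_department (
    emp_id   char(4) references employee(emp_id),
    emp_dept char(4),
    primary key (emp_id, emp_dept)
);

-- Rick's address is now stored once, so an update cannot become inconsistent;
-- a trainee can be inserted into employee with no employee_department row;
-- and deleting the D890 rows no longer deletes Maggie's own details.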
Figure: an E-R diagram with entity set instructor (attributes ID, name, salary) and entity set
department (attributes dept_name, building, budget), connected by the relationship set member.
Entity sets are represented by a rectangular box with the entity set name in the header and
the attributes listed below it.
Relationship sets are represented by a diamond connecting a pair of related entity sets. The
name of the relationship is placed inside the diamond.
As an illustration, consider part of a university database consisting of instructors and the
departments with which they are associated. The figure above shows the corresponding E-R diagram.
The E-R diagram indicates that there are two entity sets, instructor and department, with attributes as
outlined earlier. The diagram also shows a relationship member between instructor and department.
In addition to entities and relationships, the E-R model represents certain constraints to which the
contents of a database must conform. One important constraint is mapping cardinalities, which
express the number of entities to which another entity can be associated via a relationship set. For
example, if each instructor must be associated with only a single department, the E-R model can
express that constraint.
DATA ARCHITECTURE:
In the three-schema architecture, schemas can be defined at the following three levels. The internal
level has an internal schema, which describes the physical storage structure of the database using a
physical data model.
The conceptual level has a conceptual or Logical schema, which describes the structure of the
whole database for a community of users. The conceptual schema hides the details of physical
storage structures and concentrates on describing entities, data types, relationships, user
operations, and constraints. A high-level data model or an implementation data model can be
used at this level.
The external or view level includes a number of external or View schemas or user views. Each
external schema describes the part of the database that a particular user group is interested in and
hides the rest of the database from that user group. A high-level data model or an implementation
data model can be used at this level.
Hence, the DBMS must transform a request specified on an external schema into a request against
the conceptual schema, and then into a request on the internal schema for processing over the stored
database. If the request is a database retrieval, the data extracted from the stored database must be
reformatted to match the user’s external view.
The processes of transforming requests and results between levels are called mappings. These
mappings may be time-consuming, so some DBMSs—especially those that are meant to support small
databases—do not support external views. Even in such systems, however, a certain amount of mapping
is necessary to transform requests between the conceptual and internal levels.
The DBMS accepts SQL commands generated from a variety of user interfaces, produces query
evaluation plans, executes these plans against the database, and returns the answers. (This is a
simplification: SQL commands can be embedded in host-language application programs, e.g., Java or
COBOL programs. We ignore these issues to concentrate on the core DBMS functionality.)
When a user issues a query, the parsed query is presented to a query optimizer, which uses
information about how the data is stored to produce an efficient execution plan for evaluating the query.
An execution plan is a blueprint for evaluating a query, and is usually represented as a tree of relational
operators
The files and access methods layer code sits on top of the buffer manager, which brings pages in
from disk to main memory as needed in response to read requests.
The lowest layer of the DBMS software deals with management of space on disk, where the data
is stored. Higher layers allocate, de-allocate, read, and write pages through (routines provided by) this
layer, called the disk space manager.
The DBMS supports concurrency and crash recovery by carefully scheduling user requests and
maintaining a log of all changes to the database. DBMS components associated with concurrency
control and recovery include the transaction manager, which ensures that transactions request and
release locks according to a suitable locking protocol and schedules the execution of transactions; the lock
manager, which keeps track of requests for locks and grants locks on database objects when they
become available; and the recovery manager, which is responsible for maintaining a log, and restoring
the system to a consistent state after a crash. The disk space manager, buffer manager, and file and
access method layers must interact with these components.
The architecture of a database system is greatly influenced by the underlying computer system
on which the database is running:
• Centralized
• Client-server
• Parallel (multi-processor)
• Distributed
The database and the DBMS catalog are usually stored on disk. Access to the disk is controlled
primarily by the operating system (OS), which schedules disk read/write. Many DBMSs have their own
buffer management module to schedule disk read/write, because this has a considerable effect on
performance. Reducing disk read/write improves performance considerably. A higher-level stored data
manager module of the DBMS controls access to DBMS information that is stored on disk, whether it is
part of the database or the catalog.
The following figure shows, in simplified form, the typical DBMS components. The figure is
divided into two parts. The top part of the figure refers to the various users of the database environment
and their interfaces. The lower part shows the internals of the DBMS responsible for storage of data and
processing of transactions.
The architecture of a database system is greatly influenced by the underlying computer system on
which the database system runs. Database systems can be centralized, or client-server, where one server
machine executes work on behalf of multiple client machines. Database systems can also be designed to
exploit parallel computer architectures. Distributed databases span multiple geographically separated
machines.
Most users of a database system today are not present at the site of the database system, but
connect to it through a network. We can therefore differentiate between client machines, on which
remote database users work, and server machines, on which the database system runs.
One-tier architecture
Imagine a person on a desktop computer who uses Microsoft Access to load up a list of personal
addresses and phone numbers that he or she has saved in MS Windows' “My Documents” folder.
This is an example of a one-tier database architecture. The program (Microsoft Access) runs on
the user's local machine, and references a file that is stored on that machine's hard drive, thus using a
single physical resource to access and process information.
Two-tier architecture
Database applications are usually partitioned into two or three parts, as in Figure (a).
In a Two-tier architecture, the application resides at the client machine, where it invokes database
system functionality at the server machine through query language statements. Application program
interface standards like ODBC and JDBC are used for interaction between the client and the server.
Advantages:
Disadvantages:
1. In a two-tier architecture, application performance degrades as the number of users increases.
2. Cost-ineffective.
Three-tier architecture
In contrast (Figure (b)), in a Three-tier architecture, the client machine acts as merely a front
end and does not contain any direct database calls. Instead, the client end communicates with an
Intermediate layer called application server, usually through a forms interface. The application server
in turn communicates with a database system to access data. This is a commonly used architecture for web
applications.
Advantages
1. High performance, lightweight persistent objects.
2. Scalability – Each tier can scale horizontally.
3. Performance – Because the Presentation tier can cache requests, network utilization is
minimized, and the load is reduced on the Application and Data tiers.
4. Better Re-usability.
5. Improve Data Integrity.
6. Improved Security – Clients do not access the database directly.
7. Easy to maintain, to manage, to scale, loosely coupled etc.
Disadvantages
1. Increase Complexity/Effort
A database system is partitioned into modules that deal with each of the responsibilities of the
overall system. The functional components of a database system can be broadly divided into the storage
manager and the query processor components.
The storage manager is important because databases typically require a large amount of storage
space. Corporate databases range in size from hundreds of gigabytes to, for the largest databases,
terabytes of data. A gigabyte is approximately 1000 megabytes (actually 1024) (1 billion bytes), and a
terabyte is 1 million megabytes (1 trillion bytes).
Since the main memory of computers cannot store this much information, the information is
stored on disks. Data are moved between disk storage and main memory as needed. Since the movement
of data to and from disk is slow relative to the speed of the central processing unit, it is imperative that
the database system structure the data so as to minimize the need to move data between disk and main
memory.
The query processor is important because it helps the database system to simplify and facilitate
access to data. The query processor allows database users to obtain good performance while being able
to work at the view level and not be burdened with understanding the physical-level details of the
implementation of the system. It is the job of the database system to translate updates and queries
written in a nonprocedural language, at the logical level, into an efficient sequence of operations at the
physical level.
Storage Manager
The storage manager is the component of a database system that provides the interface between
the low-level data stored in the database and the application programs and queries submitted to the
system. The storage manager is responsible for the interaction with the file manager. The raw data are
stored on the disk using the file system provided by the operating system. The storage manager
translates the various DML statements into low-level file-system commands.
Thus, the storage manager is responsible for storing, retrieving, and updating data in the
database.
The main components of the query processor include:
• DDL interpreter, which interprets DDL statements and records the definitions in the data
dictionary.
• DML compiler,which translates DML statements in a query language into an evaluation plan
consisting of low-level instructions that the query evaluation engine understands.
A query can usually be translated into any of a number of alternative evaluation plans
that all give the same result. The DML compiler also performs query optimization; that is, it
picks the lowest cost evaluation plan from among the alternatives.
• Query evaluation engine, which executes low-level instructions generated by the DML
compiler.
TRANSACTION MANAGEMENT:
Let's take an example of a simple transaction. Suppose a bank employee transfers Rs 500 from A's
account to B's account. This very simple and small transaction involves several low-level tasks, as the
sketch below shows.
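Expressed in SQL, the transfer might look like the following (the account table is illustrative; transaction-start syntax varies by DBMS). The ACID properties discussed next describe what the DBMS guarantees for such a unit of work:

begin;   -- or START TRANSACTION, depending on the DBMS
update account set balance = balance - 500 where account_no = 'A';
update account set balance = balance + 500 where account_no = 'B';
commit;  -- both updates become permanent together, or neither does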
ACID Properties
A transaction is a very small unit of a program and it may contain several low-level tasks. A
transaction in a database system must maintain Atomicity, Consistency, Isolation, and Durability −
commonly known as the ACID properties − in order to ensure accuracy, completeness, and data integrity.
• Atomicity − This property states that a transaction must be treated as an atomic unit, that is,
either all of its operations are executed or none. There must be no state in a database where a
transaction is left partially completed. States should be defined either before the execution of the
transaction or after the execution/abortion/failure of the transaction.
• Consistency − The database must remain in a consistent state after any transaction. No
transaction should have any adverse effect on the data residing in the database. If the database
was in a consistent state before the execution of a transaction, it must remain consistent after the
execution of the transaction as well.
• Durability − The database should be durable enough to hold all its latest updates even if the
system fails or restarts. If a transaction updates a chunk of data in a database and commits, then
the database will hold the modified data. If a transaction commits but the system fails before the
data could be written on to the disk, then that data will be updated once the system springs back
into action.
• Isolation − In a database system where more than one transaction is being executed
simultaneously and in parallel, the property of isolation states that each transaction will be
carried out and executed as if it were the only transaction in the system. No transaction will affect
the existence of any other transaction.
States of Transactions
A transaction in a database can be in one of the following states:
• Active − In this state, the transaction is being executed. This is the initial state of every transaction.
• Partially Committed − When a transaction executes its final operation, it is said to be in a partially
committed state.
• Failed − A transaction is said to be in a failed state if any of the checks made by the database recovery
system fails. A failed transaction can no longer proceed further.
• Aborted − If any of the checks fails and the transaction has reached a failed state, then the recovery manager
rolls back all its write operations on the database to bring the database back to its original state where it was
prior to the execution of the transaction. Transactions in this state are called aborted. The database recovery
module can select one of two operations after a transaction aborts: re-start the transaction, or kill it.
• Committed − If a transaction executes all its operations successfully, it is said to be committed. All its effects
are now permanently established on the database system.
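Relating these states to the earlier transfer sketch: a transaction is active while executing, enters the failed state if a check fails, and is aborted once its writes are rolled back. A minimal illustration, again with the assumed account table:

    BEGIN;                                       -- transaction is now active
    UPDATE account SET balance = balance - 500 WHERE account_no = 'A';
    -- Suppose a constraint check fails here: the transaction enters the
    -- failed state and cannot proceed.
    ROLLBACK;                                    -- writes undone; transaction aborted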
DATA MINING:
The term data mining refers loosely to the process of semi-automatically analyzing large
databases to find useful patterns. Like knowledge discovery in artificial intelligence (also called
machine learning) or statistical analysis, data mining attempts to discover rules and patterns from data.
However, data mining differs from machine learning and statistics in that it deals with large volumes of
data, stored primarily on disk. That is, data mining deals with “knowledge discovery in databases.”
Data mining is the practice of examining large pre-existing databases in order to generate new
information or patterns.
Data mining applications analyze large amounts of data, searching for occurrences of specific
patterns or relationships and identifying unusual patterns in areas such as credit card usage.
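Real data-mining systems use specialized algorithms, but the flavor of such a pattern search can be hinted at in plain SQL. In this deliberately simple sketch, the card_transaction table, its columns, and the threshold are all assumptions made for illustration:

    -- Flag cards with an unusually high number of transactions in a day,
    -- a crude heuristic for suspicious credit card usage.
    SELECT card_no, txn_date, COUNT(*) AS txns_per_day
    FROM card_transaction
    GROUP BY card_no, txn_date
    HAVING COUNT(*) > 20;    -- threshold chosen arbitrarily for illustration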
It was quickly apparent that basic relational systems were not very suitable for many of these
applications, usually for one or more of the following reasons:
• More complex data structures were needed for modeling the application than the simple relational
representation.
• New data types were needed in addition to the basic numeric and character string types.
• New operations and query language constructs were necessary to manipulate the new data types.
• New storage and indexing structures were needed for efficient searching on the new data types.
This led DBMS developers to add functionality to their systems. Some functionality was general
purpose, such as incorporating concepts from object-oriented databases into relational systems. Other
functionality was special purpose, in the form of optional modules that could be used for specific
applications.
Data mining is a process used by companies to turn raw data into useful information. By using
software to look for patterns in large batches of data, businesses can learn more about their customers and
develop more effective marketing strategies, as well as increase sales and decrease costs.
The data mining process depends on effective data collection and warehousing, as well as computer
processing. When companies centralize their data into one database or program, it is called data
warehousing. Data warehouses support efficient analysis and data-mining algorithms, facilitating
business decision making and other information requirements, helping to cut costs and increase sales.
Textual data, too, has grown explosively. Textual data is unstructured, unlike the rigidly
structured data in relational databases. Querying of unstructured textual data is referred to as
information retrieval.
Traditionally, database technology applies to structured and formatted data that arises in routine
applications in government, business and industry. Database technology is heavily used in
manufacturing, retail, banking, insurance, finance, and health care industries, where structured data is
collected through forms, such as invoices or patient registration documents. An area related to database
technology is Information Retrieval (IR), which deals with books, manuscripts, and various forms of
library-based articles. Data is indexed, cataloged and annotated using keywords.
Information retrieval, as the name implies, concerns the retrieving of relevant information from
databases. It is basically concerned with facilitating the user's access to large amounts of (predominantly
textual) information.
Terabytes of data are being accumulated on the internet, including Facebook and Twitter
data as well as Instagram and other social networking sites. This vast repository may be
mined, and controlled to some extent, to sway public opinion in a candidate's favor
(election strategy) or evaluate a product's performance (marketing and sales strategy).
2. Multimedia Information Retrieval
This area covers the storage, indexing, search, and delivery of multimedia data such as
images, videos, sounds, 3D graphics, or their combination. By definition, it includes work
on, for example, extracting descriptive features from images, reducing high-dimensional
indexes into low-dimensional ones, defining new similarity metrics, efficient delivery of the
retrieved data, and so forth. Systems that provide all or part of the above functionalities are
multimedia retrieval systems.
The Google image search engine is a typical example of such a system. A video-on-demand
site that allows people to search movies by their titles is another example.
A primary goal of a database system is to retrieve information from and store new information
into the database. People who work with a database can be categorized as database users or database
administrators.
Database Users and User Interfaces
There are four different types of database-system users, differentiated by the way they expect to
interact with the system. Different types of user interfaces have been designed for the different types of
users.
• Naive users are unsophisticated users who interact with the system by invoking one of the
application programs that have been written previously.
o Bank tellers check account balances and post withdrawals and deposits.
o Reservation agents for airlines, hotels, and car rental companies check availability for a
given request and make reservations.
o Employees at receiving stations for shipping companies enter package identifications via
bar codes and descriptive information through buttons to update a central database of
received and in-transit packages.
o As another example, consider a student who, during the class registration period, wishes to
register for a class by using a Web interface. Such a user connects to a Web application
program that runs at a Web server. The application first verifies the identity of the user,
and allows her to access a form where she enters the desired information. The form
information is sent back to the Web application at the server, which then determines if
there is room in the class (by retrieving information from the database) and, if so, adds the
student information to the class roster in the database.
• Application programmers are computer professionals who write application programs.
Application programmers can choose from many tools to develop user interfaces. Rapid
application development (RAD) tools enable an application programmer to construct forms
and reports with minimal programming effort.
• Sophisticated users interact with the system without writing programs. Instead, they form their
requests either using a database query language or by using tools such as data analysis software.
Analysts who submit queries to explore data in the database fall in this category.
• Specialized users are sophisticated users who write specialized database applications that do not
fit into the traditional data-processing framework. Among these applications are computer-aided
design systems, knowledge-base and expert systems, systems that store data with complex data
types (for example, graphics data and audio data), and environment-modeling systems.
• Database Administrator. One of the main reasons for using DBMSs is to have central control of
both the data and the programs that access those data. A person who has such central control
over the system is called a database administrator (DBA). The functions of a DBA include:
• Schema definition. The DBA creates the original database schema by executing a set of
data definition statements in the DDL.
• Storage structure and access-method definition.
• Schema and physical-organization modification. The DBA carries out changes to the
schema and physical organization to reflect the changing needs of the organization, or to
alter the physical organization to improve performance.
• Granting of authorization for data access. By granting different types of authorization,
the database administrator can regulate which parts of the database various users can
access (a short SQL sketch follows this list). The authorization information is kept in a
special system structure that the database system consults whenever someone attempts to
access the data in the system.
• Routine maintenance. Examples of the database administrator’s routine maintenance
activities are:
o Periodically backing up the database, either onto tapes or onto remote servers, to
prevent loss of data in case of disasters such as flooding.
o Ensuring that enough free disk space is available for normal operations, and
upgrading disk space as required.
o Monitoring jobs running on the database and ensuring that performance is not
degraded by very expensive tasks submitted by some users.
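The schema-definition and authorization functions above can be sketched briefly in SQL; the table and role names here are assumed for illustration only:

    -- Schema definition: the DBA creates the original schema in the DDL.
    CREATE TABLE customer (
        customer_id   INT PRIMARY KEY,
        customer_name VARCHAR(50)
    );

    -- Granting of authorization: tellers may read, managers may also update.
    GRANT SELECT         ON customer TO teller;
    GRANT SELECT, UPDATE ON customer TO manager;
    REVOKE UPDATE ON customer FROM manager;  -- authorization can be withdrawn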
HISTORY OF DATABASE SYSTEMS:
1980s:
• Research relational prototypes evolve into commercial systems
• SQL becomes the industrial standard
• Parallel and distributed database systems
• Object-oriented database systems
1990s:
• Large decision support and data-mining applications
• Large multi-terabyte data warehouses
• Emergence of Web commerce
2000s:
• XML and XQuery standards
• Automated database administration
Information processing drives the growth of computers, as it has from the earliest days of
commercial computers. In fact, automation of data processing tasks predates computers. Punched cards,
invented by Herman Hollerith, were used at the very beginning of the twentieth century to record U.S.
census data, and mechanical systems were used to process the cards and tabulate results. Punched cards
were later widely used as a means of entering data into computers. Techniques for data storage and
processing have evolved over the years:
• 1950s and early 1960s:
Magnetic tapes were developed for data storage. Data processing tasks such as payroll were
automated, with data stored on tapes. Processing of data consisted of reading data from one or more
tapes and writing data to a new tape. Data could also be input from punched card decks, and output to
printers. For example, salary raises were processed by entering the raises on punched cards and reading
the punched card deck in synchronization with a tape containing the master salary details. The records
had to be in the same sorted order. The salary raises would be added to the salary read from the master
tape, and written to a new tape; the new tape would become the new master tape. Tapes (and card decks)
could be read only sequentially, and data sizes were much larger than main memory; thus, data
processing programs were forced to process data in a particular order, by reading and merging data from
tapes and card decks.
• Late 1960s and 1970s:
Widespread use of hard disks in the late 1960s changed the scenario for data processing greatly,
since hard disks allowed direct access to data. The position of data on disk was immaterial, since any
location on disk could be accessed in just tens of milliseconds. Data were thus freed from the tyranny of
sequentiality. With disks, network and hierarchical databases could be created that allowed data
structures such as lists and trees to be stored on disk. Programmers could construct and manipulate these
data structures.
A landmark paper by Codd [1970] defined the relational model and nonprocedural ways of
querying data in the relational model, and relational databases were born. The simplicity of the relational
model and the possibility of hiding implementation details completely from the programmer were
enticing indeed. Codd later won the prestigious Association for Computing Machinery (ACM) Turing
Award for his work.
• 1980s:
Although academically interesting, the relational model was not used in practice initially,
because of its perceived performance disadvantages; relational databases could not match the
performance of existing network and hierarchical databases. That changed with System R, a
groundbreaking project at IBM Research that developed techniques for the construction of an efficient
relational database system. Excellent overviews of System R are provided by Astrahan et al. [1976] and
Chamberlin et al. [1981]. The fully functional System R prototype led to IBM’s first relational database
product, SQL/DS. At the same time, the Ingres system was being developed at the University of
California at Berkeley. It led to a commercial product of the same name. Initial commercial relational
database systems, such as IBM DB2, Oracle, Ingres, and DEC Rdb, played a major role in advancing
techniques for efficient processing of declarative queries.
By the early 1980s, relational databases had become competitive with network and hierarchical
database systems even in the area of performance. Relational databases were so easy to use that they
eventually replaced network and hierarchical databases; programmers using those older databases had
been forced to deal with many low-level implementation details, and had to code their queries in a procedural
fashion. Most importantly, they had to keep efficiency in mind when designing their programs, which
involved a lot of effort.
In contrast, in a relational database, almost all these low-level tasks are carried out automatically
by the database, leaving the programmer free to work at a logical level. Since attaining dominance in the
1980s, the relational model has reigned supreme among data models. The 1980s also saw much research
on parallel and distributed databases, as well as initial work on object-oriented databases.
• Early 1990s:
The SQL language was designed primarily for decision support applications, which are query-
intensive, yet the mainstay of databases in the 1980s was transaction-processing applications, which are
update-intensive. Decision support and querying re-emerged as a major application area for databases.
Tools for analyzing large amounts of data saw large growths in usage. Many database vendors
introduced parallel database products in this period. Database vendors also began to add object-
relational support to their databases.
• 1990s:
The major event of the 1990s was the explosive growth of the World Wide Web. Databases were
deployed much more extensively than ever before. Database systems now had to support very high
transaction-processing rates, as well as very high reliability and 24×7 availability (availability 24 hours a
day, 7 days a week, meaning no downtime for scheduled maintenance activities). Database systems also
had to support Web interfaces to data.
• 2000s:
The first half of the 2000s saw the emergence of XML and the associated query language XQuery
as a new database technology. Although XML is widely used for data exchange, as well as for storing
certain complex data types, relational databases still form the core of a vast majority of large-scale
database applications. In this time period we have also witnessed the growth in “autonomic-
computing/auto-admin” techniques for minimizing system administration effort. This period also saw a
significant growth in use of open-source database systems, particularly PostgreSQL and MySQL. The
latter part of the decade has seen growth in specialized databases for data analysis, in particular column-
stores, which in effect store each column of a table as a separate array, and highly parallel database
systems designed for analysis of very large data sets. Several novel distributed data-storage systems
have been built to handle the data management requirements of very large Web sites such as Amazon,
Facebook, Google, Microsoft and Yahoo!, and some of these are now offered as Web services that can
be used by application developers. There has also been substantial work on management and analysis of
streaming data, such as stock-market ticker data or computer network monitoring data. Data-mining
techniques are now widely deployed; example applications include Web-based product-recommendation
systems and automatic placement of relevant advertisements on Web pages.
-------
When Not to Use a DBMS
In spite of the advantages of using a DBMS, there are a few situations in which a DBMS may
involve unnecessary overhead costs that would not be incurred in traditional file processing. The
overhead costs of using a DBMS are due to the following:
• High initial investment in hardware, software, and training
• The generality that a DBMS provides for defining and processing data
• Overhead for providing security, concurrency control, recovery, and integrity functions
Therefore, it may be more desirable to use regular files under the following
circumstances:
• Simple, well-defined database applications that are not expected to change at all
• Stringent, real-time requirements for some application programs that may not be met
because of DBMS overhead.
• Embedded systems with limited storage capacity, where a general-purpose DBMS would
not fit.
• No multiple-user access to data
Certain industries and applications have elected not to use general-purpose DBMSs.
For example, many computer-aided design (CAD) tools used by mechanical and civil
engineers have proprietary file and data management software that is geared for the internal
manipulations of drawings and 3D objects. Similarly, communication and switching systems
designed by companies like AT&T were early manifestations of database software that was
made to run very fast with hierarchically organized data for quick access and routing of calls.
Similarly, GIS implementations often implement their own data organization schemes for
efficiently implementing functions related to processing maps, physical contours, lines,
polygons, and so on. General-purpose DBMSs are inadequate for their purpose.