Chapter-1 Notes - Introduction
Application programs to access the data in different files may be written in different
languages (C or C++).
Data inconsistency (a change in a customer's address may not be reflected in every file that
stores it).
3. Data isolation
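In a file-processing system, data are scattered across various files, and the files may be in
different formats, so it is difficult to write new application programs that retrieve the
appropriate data. (This is distinct from transaction isolation, the property that determines
when and how changes made by one operation become visible to other concurrent users and
systems.)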
4. Integrity problems
Data integrity refers to the maintenance and assurance that the data in a database are correct
and consistent.
• Data values must satisfy certain consistency constraints that are specified in
the application programs. For example - account balance > 0
• It is difficult to make changes to the application programs in order to enforce
new constraints. For example - eligibility for home loans in bank
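By contrast, a DBMS lets such a constraint be declared once in the schema instead of in every
application program. The following is a minimal sketch, assuming SQLite through Python's
standard sqlite3 module; the account table and its columns are hypothetical names chosen for
illustration.

```python
import sqlite3

# Hypothetical schema: the table and column names are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account ("
    " id INTEGER PRIMARY KEY,"
    " balance REAL NOT NULL CHECK (balance > 0))"
)

# A row that satisfies the constraint is accepted.
conn.execute("INSERT INTO account (id, balance) VALUES (1, 500.0)")

# A row that violates the constraint is rejected by the DBMS itself,
# regardless of which application program submitted it.
try:
    conn.execute("INSERT INTO account (id, balance) VALUES (2, -10.0)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

row_count = conn.execute("SELECT COUNT(*) FROM account").fetchone()[0]
```

Because the rule lives in the schema, changing it means altering one table definition, not
editing each program that writes to the table.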
5. Atomicity of updates
A failure to maintain atomicity in a transaction may leave the database in an inconsistent
(incorrect) state, with only part of the updates carried out.
Example - A transfer of funds from account A to account B should either complete or not
happen at all.
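The funds-transfer example can be sketched as follows. This is only an illustration, assuming
SQLite through Python's sqlite3 module; the account table, the transfer function, and the
non-negative balance rule are assumptions, not part of the notes. The credit is applied before
the debit on purpose, so that a failed debit would leave a partial update if the transaction
were not rolled back.

```python
import sqlite3

# Hypothetical schema: a balance may never go negative.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account (name TEXT PRIMARY KEY, balance REAL CHECK (balance >= 0))"
)
conn.execute("INSERT INTO account VALUES ('A', 100.0), ('B', 50.0)")
conn.commit()


def transfer(conn, src, dst, amount):
    """Move amount from src to dst; either both updates apply or neither does."""
    try:
        with conn:  # commits if the block succeeds, rolls back if it raises
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE name = ?", (amount, dst)
            )
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE name = ?", (amount, src)
            )
        return True
    except sqlite3.IntegrityError:
        return False


def balances(conn):
    """Return the current balances as a {name: balance} dictionary."""
    return dict(conn.execute("SELECT name, balance FROM account"))


ok = transfer(conn, "A", "B", 30.0)       # succeeds: both updates are committed
after_ok = balances(conn)
failed = transfer(conn, "A", "B", 500.0)  # debit violates the CHECK: rolled back
after_failed = balances(conn)
```

After the failed transfer, the credit to B has been undone along with the rejected debit, so the
balances are the same as before the attempt.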
• Concurrency is the ability to allow multiple users access to the same record
simultaneously without adversely affecting transaction processing.
• Typically, in a file-based system, when an application opens a file, that file is locked. This
means that no one else has access to the file at the same time.
• MySQL
• Oracle
• SQL Server
• IBM DB2
• PostgreSQL
• Amazon SimpleDB (cloud based) etc.
Characteristics of databases
Traditionally, data was organized in files. The DBMS was a new concept then, and research
focused on making it overcome the deficiencies of the traditional style of data
management.
Real-world entity
A modern DBMS models real-world entities in its design, along with their attributes and
behaviour. For example, a college database may use students as an entity and their age as an
attribute.
Relation-based tables
A DBMS allows entities and the relations among them to be represented as tables. A user can
understand the architecture of a database just by looking at the table names.
A database system is entirely different from its data. A database is an active entity, whereas
data is passive: it is what the database works on and organizes. A DBMS also stores
metadata, which is data about data, to ease its own processing.
Less redundancy
A DBMS follows the rules of normalization, which split a relation when any of its attributes
holds redundant values. Normalization is a mathematically rich and scientific process
that reduces data redundancy.
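As a small illustration of splitting a relation, the sketch below uses plain Python tuples
rather than a real DBMS. The student and department data are made up for the example. Each
department's building is stored once after the split, and joining the two relations recovers
the original rows.

```python
# Hypothetical unnormalized rows: the building is repeated for every student
# in the same department.
flat = [
    ("Asha", "Physics", "Block A"),
    ("Ravi", "Physics", "Block A"),
    ("Meena", "Chemistry", "Block B"),
]

# Split into two relations: student(name, dept) and department(dept, building).
student = [(name, dept) for name, dept, _ in flat]
department = sorted({(dept, building) for _, dept, building in flat})

# Joining the two relations on dept reconstructs the original rows,
# so no information was lost by the split.
building_of = dict(department)
rejoined = [(name, dept, building_of[dept]) for name, dept in student]
```

If the Physics department moves, only one row of department changes, so the copies cannot
drift out of step with each other.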
Consistency
Consistency is a state in which every relation in a database remains consistent. There are
methods and techniques that can detect an attempt to leave the database in an inconsistent
state. A DBMS can provide greater consistency than earlier forms of data storage
applications, such as file-processing systems.
Query Language
A DBMS is equipped with a query language, which makes it more efficient to retrieve and
manipulate data. A user can apply as many different filtering options as required to
retrieve a set of data. This was not possible with traditional file-processing systems.
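A brief sketch of such filtering, assuming SQL on SQLite through Python's sqlite3 module; the
student table and its rows are invented for illustration. The filters are stated in the query
itself, not in hand-written file-scanning code.

```python
import sqlite3

# Hypothetical table and data, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (name TEXT, age INTEGER, dept TEXT)")
conn.executemany(
    "INSERT INTO student VALUES (?, ?, ?)",
    [("Asha", 19, "Physics"), ("Ravi", 22, "Physics"), ("Meena", 21, "Chemistry")],
)

# Two filters combined in one declarative query: department and minimum age.
rows = conn.execute(
    "SELECT name FROM student WHERE dept = ? AND age > ? ORDER BY name",
    ("Physics", 20),
).fetchall()
```

Adding or removing a filter only changes the WHERE clause; the stored data and the rest of the
program stay as they are.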
ACID Properties
DBMS follows the concepts of Atomicity, Consistency, Isolation, and Durability (normally
shortened as ACID). These concepts are applied on transactions, which manipulate data in a
database. ACID properties help the database stay healthy in multi-transactional environments
and in case of failure.
A DBMS supports a multi-user environment and allows users to access and manipulate data in
parallel. There are restrictions on transactions when users attempt to handle the same
data item, but users are generally unaware of them.
Multiple views
A DBMS offers multiple views for different users. A user in the Sales department will
have a different view of the database than a person working in the Production department. This
feature gives users a concentrated view of the database according to their
requirements.
Security
Features like multiple views offer security to some extent, since users are unable to access
the data of other users and departments. A DBMS offers methods to impose constraints while
entering data into the database and when retrieving it at a later stage. A DBMS offers many
different levels of security features, which enable multiple users to have different views with
different features. For example, a user in the Sales department cannot see the data that belongs
to the Purchase department. Additionally, it can be controlled how much of the Sales
department's data is displayed to the user. Because a DBMS does not store its data on disk as
plain files the way a traditional file system does, it is harder for an intruder to read the
data directly.
View of Data
A database system is a collection of interrelated data and a set of programs that allow users to
access and modify these data. A major purpose of a database system is to provide users with
an abstract view of the data. That is, the system hides certain details of how the data are stored
and maintained.
Data Abstraction
For the system to be usable, it must retrieve data efficiently. The need for efficiency has led
designers to use complex data structures to represent data in the database. Since many database-
system users are not computer trained, developers hide the complexity from users through
several levels of abstraction, to simplify users’ interactions with the system:
• Physical level
The lowest level of abstraction describes how the data are actually stored. The physical level
describes complex low-level data structures in detail.
• Logical level
The next-higher level of abstraction describes what data are stored in the database, and what
relationships exist among those data. The logical level thus describes the entire database in
terms of a small number of relatively simple structures. Although implementation of the simple
structures at the logical level may involve complex physical-level structures, the user of the
logical level does not need to be aware of this complexity. This is referred to as physical data
independence. Database administrators, who must decide what information to keep in the
database, use the logical level of abstraction.
• View level
The highest level of abstraction describes only part of the entire database. Even though the
logical level uses simpler structures, complexity remains because of the variety of information
stored in a large database. Many users of the database system do not need all this information;
instead, they need to access only a part of the database. The view level of abstraction exists to
simplify their interaction with the system. The system may provide many views for the same
database.
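The view level can be illustrated with an SQL view. This is a minimal sketch, assuming SQLite
through Python's sqlite3 module; the employee table and the sales_staff view are hypothetical
names. The view exposes only part of the logical-level table: a subset of its rows and columns.

```python
import sqlite3

# Logical level: one hypothetical table describing all employees.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (name TEXT, dept TEXT, salary REAL)")
conn.executemany(
    "INSERT INTO employee VALUES (?, ?, ?)",
    [
        ("Asha", "Sales", 52000.0),
        ("Ravi", "Production", 48000.0),
        ("Meena", "Sales", 61000.0),
    ],
)

# View level: Sales users see only the names of Sales staff, with no salaries.
conn.execute(
    "CREATE VIEW sales_staff AS SELECT name FROM employee WHERE dept = 'Sales'"
)

sales_names = [row[0] for row in conn.execute("SELECT name FROM sales_staff ORDER BY name")]
view_columns = [col[0] for col in conn.execute("SELECT * FROM sales_staff").description]
```

Several such views can be defined over the same table, one per group of users, without
duplicating the stored data.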
Figure below shows the relationship among the three levels of abstraction.
Data Models
Underlying the structure of a database is the data model: a collection of conceptual tools for
describing data, data relationships, data semantics, and consistency constraints. A data model
provides a way to describe the design of a database at the physical, logical, and view levels.
There are a number of different data models that we shall cover in the text.
Categories of data models are
• Relational Model
The relational model uses a collection of tables to represent both data and the relationships
among those data. Each table has multiple columns, and each column has a unique name.
Tables are also known as relations. The relational model is an example of a record-based
model.
Record-based models are so named because the database is structured in fixed-format records
of several types. Each table contains records of a particular type. Each record type defines a
fixed number of fields, or attributes. The columns of the table correspond to the attributes of
the record type. The relational data model is the most widely used data model, and a vast
majority of current database systems are based on the relational model.
• Entity-Relationship Model
The entity-relationship (E-R) data model uses a collection of basic objects, called entities, and
relationships among these objects. An entity is a “thing” or “object” in the real world that is
distinguishable from other objects. The entity-relationship model is widely used in database
design.
• Object-Based Data Model
Object-oriented programming (especially in Java, C++, or C#) has become the dominant
software-development methodology. This led to the development of an object-oriented data
model that can be seen as extending the E-R model with notions of encapsulation, methods
(functions), and object identity. The object-relational data model combines features of the
object-oriented data model and relational data model.
• Semi-structured Data Model
The semi-structured data model permits the specification of data where individual data items
of the same type may have different sets of attributes. This is in contrast to the data models
mentioned earlier, where every data item of a particular type must have the same set of
attributes. The Extensible Markup Language (XML) is widely used to represent semi-
structured data.
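As an illustration, the sketch below parses a small XML fragment with Python's standard
xml.etree.ElementTree module. The document and its element names are made up for the example.
Both items are of the same type (person), yet each carries a different set of attributes.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML data: two items of the same type with different attributes.
doc = """
<people>
  <person><name>Asha</name><email>asha@example.com</email></person>
  <person><name>Ravi</name><phone>555-0100</phone><city>Pune</city></person>
</people>
""".strip()

root = ET.fromstring(doc)

# Collect the attribute (child element) names present on each person.
attribute_sets = [
    sorted(child.tag for child in person) for person in root.findall("person")
]
```

In a relational table, both rows would need the same columns, with missing values left null;
here each item simply lists what it has.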
Historically, the network data model and the hierarchical data model preceded the relational
data model. These models were tied closely to the underlying implementation, and complicated
the task of modeling data. As a result, they are used little now, except in old database code that
is still in service in some places.
Database Architecture
Database Architecture represents the various components of a database system and the
connections among them.
A database system has several subsystems like
• storage manager subsystem
• query processor subsystem
• transaction management and
• disk storage
DML compiler - It translates DML statements in a query language into an evaluation plan
consisting of low-level instructions that the query evaluation engine understands.
The DML compiler performs query optimization; that is, it picks the lowest cost evaluation
plan from among the various alternatives.
Query evaluation engine - It executes low-level instructions generated by the DML compiler.
Storage manager
It provides the interface between the data stored in the database and the application programs
and queries submitted to the system.
Disk storage
Disk storage holds the following data structures, which are part of the physical system
implementation and are implemented by the storage manager:
• Data files - store the database itself
• Data dictionary - stores metadata about the structure of the database, in particular the
schema of the database.
• Indices - can provide fast access to data items. A database index provides pointers to
those data items that hold a particular value.
• Statistical data - maintains statistics about different query execution plans and time
required for execution
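Two of these structures can be inspected in a small experiment. The sketch below assumes SQLite
through Python's sqlite3 module, where the sqlite_master table plays the role of the data
dictionary; the student table and the idx_student_dept index are hypothetical names. The exact
wording of the query plan differs between SQLite versions, so the code only collects the text
and does not rely on a fixed format.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (id INTEGER PRIMARY KEY, name TEXT, dept TEXT)")
conn.execute("CREATE INDEX idx_student_dept ON student (dept)")

# Data dictionary: SQLite records schema metadata in the sqlite_master table.
catalog = conn.execute(
    "SELECT type, name FROM sqlite_master ORDER BY name"
).fetchall()

# Index: ask SQLite how it would evaluate a lookup on the indexed column.
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT name FROM student WHERE dept = 'Physics'"
).fetchall()

# The last field of each plan row is a human-readable description of the step.
plan_text = " ".join(str(row[-1]) for row in plan)
```

Printing catalog shows the table and the index as metadata entries, and plan_text names the
index that the lookup on dept would use.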
The architecture of a database system is greatly influenced by the underlying computer system
on which the database system runs. Database systems can be centralized, or client-server,
where one server machine executes work on behalf of multiple client machines. Database
systems can also be designed to exploit parallel computer architectures. Distributed databases
span multiple geographically separated machines.
Database applications are typically broken up into a front-end part that runs at client machines
and a part that runs at the back end. In two-tier architectures, the front end directly
communicates with a database running at the back end. In three-tier architectures, the back-end
part is itself broken up into an application server and a database server.
Database Administrator
One of the main reasons for using DBMSs is to have central control of both the data and the
programs that access those data. A person who has such central control over the system is called
a database administrator (DBA).
Functions/duties/responsibilities of a DBA
• Schema definition. The DBA creates the original database schema by executing a set of
data definition statements in the DDL.
• Storage structure and access-method definition.
• Schema and physical-organization modification. The DBA carries out changes to the
schema and physical organization to reflect the changing needs of the organization, or to
alter the physical organization to improve performance.
• Granting of authorization for data access. By granting different types of authorization,
the database administrator can regulate which parts of the database various users can
access. The authorization information is kept in a special system structure that the database
system consults whenever someone attempts to access the data in the system.
• Routine maintenance. Examples of the database administrator’s routine maintenance
activities are:
• Periodically backing up the database, either onto tapes or onto remote servers, to
prevent loss of data in case of disasters such as flooding.
• Ensuring that enough free disk space is available for normal operations, and upgrading
disk space as required.
• Monitoring jobs running on the database and ensuring that performance is not degraded
by very expensive tasks submitted by some users.