
Chapter-1 Notes - Introduction

The document provides an overview of file processing systems and their disadvantages compared to database management systems (DBMS). It discusses how file processing systems store data in separate files that can result in data redundancy, inconsistencies, difficulty of access and updating. In contrast, DBMS store data in normalized tables to reduce redundancy and support features like transactions, queries, security and concurrent access that address the shortcomings of file processing systems. Examples of popular DBMS and common applications that utilize databases are also outlined.

Uploaded by

Adfar Rashid

Chapter 1.

Introduction to Database Concepts

File Processing System


In earlier days, data was stored manually, using pen and paper. After the invention of the
computer, the same task came to be done using files.
A computer file is a resource that records data on a storage device in a computer.
Data can be stored in various formats, e.g. text files as .txt and pictures as
.png files.
In Computer Science, a file processing system (FPS) is a way of storing, retrieving and
manipulating data stored permanently in various files in a computer. An FPS may use files such as
.txt, .jpg, .docx, or even structured formats such as .html or .xml. To manipulate data stored
in these files, a number of application programs need to be written at the request of the users
in the organization, for tasks like arranging monthly sales data or printing monthly sales reports.
New applications are added to the system as the need arises.
Example- File processing system for traditional bank

Major disadvantages of file-processing system:


In the early days of their use, file processing systems were major advances in data
management. Even though FPS are inexpensive and easy to understand and use, their major
disadvantages are:

1. Data redundancy and inconsistency

2. Difficulty in accessing data


3. Data isolation

4. Integrity problems

5. Atomicity of updates

6. Concurrent access by multiple users

1. Data redundancy and inconsistency:

Data is stored in multiple file formats (.csv, .txt or .doc).

Application programs to access the data in different files may be written in different
languages (C or C++).

Duplication of information in different files in different formats (DOB as dd-mm-yy or
mm-dd-yyyy) increases storage cost.

Data inconsistency: a change in a customer's address may not be reflected in every file that
stores it.
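The inconsistency problem can be sketched in a few lines of Python. The two in-memory "files", their column names and the customer record are hypothetical, chosen only to illustrate two applications each keeping their own copy of an address.

```python
import csv
import io

# Two hypothetical "files" kept by separate applications. Each stores its own
# copy of the customer's address (redundancy).
accounts_file = io.StringIO("customer_id,name,address\n101,Asha,12 Park Road\n")
loans_file = io.StringIO("customer_id,address,loan_amount\n101,12 Park Road,50000\n")


def load(f):
    """Read a CSV file-like object into a list of dict rows."""
    f.seek(0)
    return list(csv.DictReader(f))


def update_address(rows, customer_id, new_address):
    """Change the address in one file's rows only."""
    for row in rows:
        if row["customer_id"] == customer_id:
            row["address"] = new_address


accounts = load(accounts_file)
loans = load(loans_file)

# The accounts application records the move; the loans application is never told.
update_address(accounts, "101", "7 Lake View")

# The two copies now disagree about where the customer lives.
inconsistent = accounts[0]["address"] != loans[0]["address"]
print(inconsistent)
```

Nothing in the file system links the two copies, so keeping them in step is left entirely to the application programs.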

2. Difficulty in accessing data

Need to write a new program to carry out each new task.

3. Data isolation

In a file processing system, data isolation means that data is scattered across multiple files
(account and loan in different files) and in multiple formats.

This is a problem because writing new application programs to retrieve the appropriate data
is difficult.

Note that this is different from the isolation property of transactions, which determines when
and how changes made by one operation become visible to other concurrent users and systems.

4. Integrity problems

Data integrity refers to the maintenance and assurance that the data in a database are correct
and consistent.

Factors to consider when addressing this issue are:

• Data values must satisfy certain consistency constraints that are specified in
the application programs. For example - account balance > 0
• It is difficult to make changes to the application programs in order to enforce
new constraints. For example - eligibility for home loans in bank
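For contrast, a DBMS lets the constraint be declared once with the data instead of being repeated in every application program. The following is a minimal sketch using Python's built-in sqlite3 module; the table name, column names and sample values are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# The rule "account balance > 0" is declared once, as part of the table definition.
conn.execute(
    "CREATE TABLE account ("
    " account_no TEXT PRIMARY KEY,"
    " balance INTEGER NOT NULL CHECK (balance > 0))"
)
conn.execute("INSERT INTO account VALUES ('A-101', 500)")


def try_insert(account_no, balance):
    """Return True if the row was accepted, False if a constraint rejected it."""
    try:
        conn.execute("INSERT INTO account VALUES (?, ?)", (account_no, balance))
        return True
    except sqlite3.IntegrityError:
        return False


accepted = try_insert("A-102", 250)   # satisfies the constraint
rejected = try_insert("A-103", -40)   # violates balance > 0
row_count = conn.execute("SELECT COUNT(*) FROM account").fetchone()[0]
print(accepted, rejected, row_count)
```

Every program that writes to the table is held to the same rule, whichever language it is written in.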
5. Atomicity of updates
Failure to maintain atomicity in transactions may leave the database in an inconsistent
(incorrect) state, with only part of the updates carried out.

Example - Transfer of funds from account A to account B should either complete or not happen
at all.
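The funds-transfer example can be sketched with Python's sqlite3 module, which rolls a transaction back when an error escapes the `with conn:` block. The account names, amounts and the `fail_midway` switch are hypothetical and exist only to simulate a failure between the two updates.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (name TEXT PRIMARY KEY, balance INTEGER NOT NULL)")
conn.executemany("INSERT INTO account VALUES (?, ?)", [("A", 100), ("B", 50)])
conn.commit()


def transfer(conn, src, dst, amount, fail_midway=False):
    """Move `amount` from src to dst as one all-or-nothing transaction."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE name = ?", (amount, src)
            )
            if fail_midway:
                raise RuntimeError("simulated crash between the two updates")
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE name = ?", (amount, dst)
            )
        return True
    except RuntimeError:
        return False


def balances(conn):
    """Return the current balances as a dict keyed by account name."""
    return dict(conn.execute("SELECT name, balance FROM account"))


completed = transfer(conn, "A", "B", 30)
interrupted = transfer(conn, "A", "B", 30, fail_midway=True)
final_balances = balances(conn)
print(completed, interrupted, final_balances)
```

The interrupted transfer leaves no trace: the debit from A is undone, so the balances are the same as after the first, completed transfer.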

6. Concurrent access by multiple users

• Concurrency is the ability to allow multiple users access to the same record
simultaneously without adversely affecting transaction processing.

• Typically, in a file-based system, when an application opens a file, that file is locked. This
means that no one else has access to the file at the same time.

Data, Database and Database Management System


Data
Data can be facts related to any object in consideration. It is a set of different symbols and
characters.
Example: name, age, height, weight, etc. are some data related to person. A picture, image, file,
pdf, etc. can also be considered data.
Database
A database is an organized collection of inter-related data which represents some aspect of
the real world.
Example: Student database, Employee database, Patient database, Book database
Database Management System
A database management system (DBMS) is a collection of interrelated data and a set of
programs to access those data. The collection of data, usually referred to as the database,
contains information relevant to an enterprise. The primary goal of a DBMS is to provide a
way to store and retrieve database information that is both convenient and efficient.
Database systems are designed to manage large bodies of information. Management of data
involves both defining structures for storage of information and providing mechanisms for the
manipulation of information. In addition, the database system must ensure the safety of the
information stored, despite system crashes or attempts at unauthorized access. If data are to be
shared among several users, the system must avoid possible anomalous results.
Examples of popular DBMS:

• MySQL
• Oracle
• SQL Server
• IBM DB2
• PostgreSQL
• Amazon SimpleDB (cloud-based), etc.

Characteristics of databases

Traditionally, data was organized in file formats. DBMS was a new concept then, and research
focused on overcoming the deficiencies of the traditional style of data management.

The characteristics of a modern DBMS are as follows:

Real-world entity

A modern DBMS is more realistic and uses real-world entities to design its architecture. It
uses their behaviour and attributes too. For example, a college database may use students as an
entity and their age as an attribute.

Relation-based tables

DBMS allows entities and relations among them to form tables. A user can understand the
architecture of a database just by looking at the table names.

Isolation of data and application

A database system is entirely different from its data. The database system is an active entity,
whereas data is passive: it is what the database system works on and organizes. A DBMS also
stores metadata, which is data about data, to ease its own processing.

Less redundancy

A DBMS follows the rules of normalization, which splits a relation when any of its attributes
has redundancy in its values. Normalization is a mathematically rich and scientific process
that reduces data redundancy.
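As a rough illustration of splitting a relation, the sketch below uses plain Python lists and dicts in place of tables. The student and department data are hypothetical, and the split shown is only the basic idea, not a full normalization procedure.

```python
# One flat relation: the department's office is repeated for every student.
flat = [
    {"student": "Ravi", "dept": "CS", "dept_office": "Block A"},
    {"student": "Meena", "dept": "CS", "dept_office": "Block A"},
    {"student": "John", "dept": "EE", "dept_office": "Block B"},
]

# Split into two relations: each department's office is now stored exactly once.
departments = {row["dept"]: row["dept_office"] for row in flat}
students = [{"student": row["student"], "dept": row["dept"]} for row in flat]


def join(students, departments):
    """Rebuild the flat relation by looking up each student's department."""
    return [
        {"student": s["student"], "dept": s["dept"], "dept_office": departments[s["dept"]]}
        for s in students
    ]


rejoined = join(students, departments)   # no information was lost by the split

# Moving the CS office is now a single change instead of one change per student.
departments["CS"] = "Block C"
after_move = join(students, departments)
print(after_move)
```

Because the office is stored once, it cannot end up with two different values for the same department.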
Consistency

Consistency is a state where every relation in a database remains consistent. There exist
methods and techniques that can detect an attempt to leave the database in an inconsistent
state. A DBMS can provide greater consistency than earlier forms of data storage
applications, such as file-processing systems.

Query Language

A DBMS is equipped with a query language, which makes it more efficient to retrieve and
manipulate data. A user can apply as many different filtering options as required to
retrieve a set of data. This was not possible with traditional file-processing systems.
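A small sketch of filtering with a query language, using SQL through Python's sqlite3 module. The student table and its rows are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE student (roll_no INTEGER PRIMARY KEY, name TEXT, age INTEGER, dept TEXT)"
)
conn.executemany(
    "INSERT INTO student VALUES (?, ?, ?, ?)",
    [(1, "Ravi", 19, "CS"), (2, "Meena", 21, "CS"), (3, "John", 22, "EE"), (4, "Sara", 20, "CS")],
)

# Two filters (department and minimum age) and an ordering, all stated in the
# query itself; no new application program is needed for a new combination.
rows = conn.execute(
    "SELECT name FROM student WHERE dept = ? AND age >= ? ORDER BY name",
    ("CS", 20),
).fetchall()
names = [name for (name,) in rows]
print(names)
```

Changing the question, for example to all EE students under 25, means changing only the query text and its parameters.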

ACID Properties

DBMS follows the concepts of Atomicity, Consistency, Isolation, and Durability (normally
shortened as ACID). These concepts are applied on transactions, which manipulate data in a
database. ACID properties help the database stay healthy in multi-transactional environments
and in case of failure.

Multiuser and Concurrent Access

A DBMS supports a multi-user environment and allows users to access and manipulate data in
parallel. Although there are restrictions on transactions when users attempt to handle the same
data item, users are generally unaware of them.

Multiple views

A DBMS offers multiple views for different users. A user who is in the Sales department will
have a different view of the database than a person working in the Production department. This
feature gives users a focused view of the database according to their
requirements.
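A view can be sketched in SQL as a stored query over the underlying table. The example below uses Python's sqlite3 module; the employee table, the view name `sales_directory` and the sample rows are assumptions for illustration. SQLite itself has no per-user permissions, so this shows only how a view narrows what is presented, not how access is enforced.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employee (emp_id INTEGER PRIMARY KEY, name TEXT, dept TEXT, salary INTEGER)"
)
conn.executemany(
    "INSERT INTO employee VALUES (?, ?, ?, ?)",
    [(1, "Anil", "Sales", 40000), (2, "Bina", "Production", 45000), (3, "Chen", "Sales", 42000)],
)

# A view for Sales users: only Sales rows, and no salary column.
conn.execute(
    "CREATE VIEW sales_directory AS "
    "SELECT emp_id, name FROM employee WHERE dept = 'Sales'"
)

cursor = conn.execute("SELECT * FROM sales_directory ORDER BY emp_id")
columns = [d[0] for d in cursor.description]
rows = cursor.fetchall()
print(columns, rows)
```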

Security

Features like multiple views offer security to some extent, since users are unable to access
the data of other users and departments. A DBMS offers methods to impose constraints while
entering data into the database and when retrieving it at a later stage. A DBMS offers many
different levels of security features, which enable multiple users to have different views with
different features. For example, a user in the Sales department cannot see the data that belongs
to the Purchase department. Additionally, it can also be controlled how much of the Sales
department's data is displayed to the user. Because data is reached through the DBMS rather
than read directly as ordinary files, it is harder for unauthorized users to access or tamper with it.

Database System Applications


Databases are widely used. Here are some representative applications:
• Enterprise Information
   • Sales: For customer, product, and purchase information.
   • Accounting: For payments, receipts, account balances, assets and other accounting
     information.
   • Human resources: For information about employees, salaries, payroll taxes, and benefits,
     and for generation of pay-checks.
   • Manufacturing: For management of the supply chain and for tracking production of items
     in factories, inventories of items in warehouses and stores, and orders for items.
   • Online retailers: For sales data noted above plus online order tracking, generation of
     recommendation lists, and maintenance of online product evaluations.
• Banking and Finance
   • Banking: For customer information, accounts, loans, and banking transactions.
   • Credit card transactions: For purchases on credit cards and generation of monthly
     statements.
   • Finance: For storing information about holdings, sales, and purchases of financial
     instruments such as stocks and bonds; also for storing real-time market data to enable
     online trading by customers and automated trading by the firm.
• Universities: For student information, course registrations, and grades (in addition to
  standard enterprise information such as human resources and accounting).
• Airlines: For reservations and schedule information. Airlines were among the first to use
  databases in a geographically distributed manner.
• Telecommunication: For keeping records of calls made, generating monthly bills, maintaining
  balances on prepaid calling cards, and storing information about the communication networks.
As the list illustrates, databases form an essential part of every enterprise today, storing not
only types of information that are common to most enterprises, but also information that is
specific to the category of the enterprise.
The importance of database systems can be judged in another way: today, database system
vendors like Oracle are among the largest software companies in the world, and database
systems form an important part of the product line of Microsoft and IBM.

File system vs Database Management System (DBMS)

Each point below compares the file system with the database management system (DBMS).

1. File system: It is a software system that manages and controls the data files in a
   computer system.
   DBMS: It is a software system used for creating and managing databases. A DBMS provides
   a systematic way to access, update, and delete data.
2. File system: Does not support multi-user access.
   DBMS: Supports multi-user access.
3. File system: Data consistency is lower.
   DBMS: Data consistency is higher due to the use of normalization.
4. File system: Is not secured.
   DBMS: Is highly secured.
5. File system: Is used for storing unstructured data.
   DBMS: Is used for storing structured data.
6. File system: Data redundancy is high.
   DBMS: Data redundancy is low.
7. File system: No data backup and recovery process is present.
   DBMS: Backup and recovery of data are provided.
8. File system: Handling is easy.
   DBMS: Handling is complex.
9. File system: Costs less than a DBMS.
   DBMS: Costs more than a file system.
10. File system: If one application fails, it does not affect other applications in the system.
    DBMS: If the database fails, it affects all applications that depend on it.
11. File system: Data cannot be shared because it is distributed across different files.
    DBMS: Data can be shared because it is stored in one place, in a database.
12. File system: Does not provide a concurrency facility.
    DBMS: Provides a concurrency facility.
13. File system: Example: NTFS (New Technology File System).
    DBMS: Examples: Oracle, MySQL, MS SQL Server, DB2, Microsoft Access, etc.
(https://www.tutorialandexample.com/difference-between-file-system-and-dbms/)

View of Data
A database system is a collection of interrelated data and a set of programs that allow users to
access and modify these data. A major purpose of a database system is to provide users with
an abstract view of the data. That is, the system hides certain details of how the data are stored
and maintained.
Data Abstraction
For the system to be usable, it must retrieve data efficiently. The need for efficiency has led
designers to use complex data structures to represent data in the database. Since many database-
system users are not computer trained, developers hide the complexity from users through
several levels of abstraction, to simplify users’ interactions with the system:
• Physical level
The lowest level of abstraction describes how the data are actually stored. The physical level
describes complex low-level data structures in detail.
• Logical level
The next-higher level of abstraction describes what data are stored in the database, and what
relationships exist among those data. The logical level thus describes the entire database in
terms of a small number of relatively simple structures. Although implementation of the simple
structures at the logical level may involve complex physical-level structures, the user of the
logical level does not need to be aware of this complexity. This is referred to as physical data
independence. Database administrators, who must decide what information to keep in the
database, use the logical level of abstraction.
• View level
The highest level of abstraction describes only part of the entire database. Even though the
logical level uses simpler structures, complexity remains because of the variety of information
stored in a large database. Many users of the database system do not need all this information;
instead, they need to access only a part of the database. The view level of abstraction exists to
simplify their interaction with the system. The system may provide many views for the same
database.
Figure below shows the relationship among the three levels of abstraction.

Instances and Schemas


Databases change over time as information is inserted and deleted. The collection of
information stored in the database at a particular moment is called an instance of the database.
The overall design of the database is called the database schema. Schemas are changed
infrequently, if at all.
The concept of database schemas and instances can be understood by analogy to a program
written in a programming language. A database schema corresponds to the variable
declarations (along with associated type definitions) in a program.
Each variable has a particular value at a given instant. The values of the variables in a program
at a point in time correspond to an instance of a database schema.
Database systems have several schemas, partitioned according to the levels of abstraction.
The physical schema describes the database design at the physical level, while the logical
schema describes the database design at the logical level.
A database may also have several schemas at the view level, sometimes called subschemas,
that describe different views of the database.
Of these, the logical schema is by far the most important, in terms of its effect on application
programs, since programmers construct applications by using the logical schema. The physical
schema is hidden beneath the logical schema, and can usually be changed easily without
affecting application programs. Application programs are said to exhibit physical data
independence if they do not depend on the physical schema, and thus need not be rewritten if
the physical schema changes.
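The difference between a schema and an instance can be seen directly in a small database. In the sketch below (Python's sqlite3 module, with a hypothetical instructor table and row), the schema is read from SQLite's `sqlite_master` catalog and the instance is simply the current set of rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE instructor (id TEXT PRIMARY KEY, name TEXT, salary INTEGER)")


def schema(conn):
    """The design of the database: the stored table definitions."""
    return [sql for (sql,) in conn.execute("SELECT sql FROM sqlite_master WHERE type = 'table'")]


def instance(conn):
    """The contents of the database at this moment."""
    return conn.execute("SELECT * FROM instructor ORDER BY id").fetchall()


schema_before, instance_before = schema(conn), instance(conn)

# Inserting data changes the instance but leaves the schema untouched.
conn.execute("INSERT INTO instructor VALUES ('10101', 'Srinivasan', 65000)")

schema_after, instance_after = schema(conn), instance(conn)
print(schema_before == schema_after, instance_before, instance_after)
```

In the programming-language analogy, the `CREATE TABLE` statement plays the role of the variable declaration and the rows play the role of the variable's current value.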

Data Models
Underlying the structure of a database is the data model: a collection of conceptual tools for
describing data, data relationships, data semantics, and consistency constraints. A data model
provides a way to describe the design of a database at the physical, logical, and view levels.
There are a number of different data models that we shall cover in the text.
Categories of data models are
• Relational Model
The relational model uses a collection of tables to represent both data and the relationships
among those data. Each table has multiple columns, and each column has a unique name.
Tables are also known as relations. The relational model is an example of a record-based
model.
Record-based models are so named because the database is structured in fixed-format records
of several types. Each table contains records of a particular type. Each record type defines a
fixed number of fields, or attributes. The columns of the table correspond to the attributes of
the record type. The relational data model is the most widely used data model, and a vast
majority of current database systems are based on the relational model.
• Entity-Relationship Model
The entity-relationship (E-R) data model uses a collection of basic objects, called entities, and
relationships among these objects. An entity is a “thing” or “object” in the real world that is
distinguishable from other objects. The entity-relationship model is widely used in database
design.
• Object-Based Data Model
Object-oriented programming (especially in Java, C++, or C#) has become the dominant
software-development methodology. This led to the development of an object-oriented data
model that can be seen as extending the E-R model with notions of encapsulation, methods
(functions), and object identity. The object-relational data model combines features of the
object-oriented data model and relational data model.
• Semi-structured Data Model
The semi-structured data model permits the specification of data where individual data items
of the same type may have different sets of attributes. This is in contrast to the data models
mentioned earlier, where every data item of a particular type must have the same set of
attributes. The Extensible Markup Language (XML) is widely used to represent semi-
structured data.
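A brief sketch of semi-structured data, parsed with Python's standard xml.etree.ElementTree module. The XML document and its element names are invented for the example; the point is that two items of the same type carry different sets of attributes.

```python
import xml.etree.ElementTree as ET

# Two <person> items of the same type with different sets of attributes.
doc = """
<people>
  <person><name>Ravi</name><age>19</age></person>
  <person><name>Meena</name><email>meena@example.com</email><phone>555-0100</phone></person>
</people>
"""

root = ET.fromstring(doc.strip())

# Collect the attribute (child element) names present on each person.
attribute_sets = [sorted(child.tag for child in person) for person in root.findall("person")]
print(attribute_sets)
```

In a relational table both people would need the same columns, with missing values left empty.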
Historically, the network data model and the hierarchical data model preceded the relational
data model. These models were tied closely to the underlying implementation, and complicated
the task of modeling data. As a result, they are used little now, except in old database code that
is still in service in some places.
Database Architecture
Database Architecture represents the various components of a database system and the
connections among them.
A database system has several subsystems like
• storage manager subsystem
• query processor subsystem
• transaction management and
• disk storage

Query processor subsystem


Query processor compiles and executes DDL and DML statements.
DDL interpreter – It interprets DDL statements and records the definitions in the data
dictionary.

DML compiler - It translates DML statements in a query language into an evaluation plan
consisting of low-level instructions that the query evaluation engine understands.

The DML compiler performs query optimization; that is, it picks the lowest cost evaluation
plan from among the various alternatives.

Query evaluation engine – It executes low-level instructions generated by the DML compiler.
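One way to observe plan selection is SQLite's `EXPLAIN QUERY PLAN` statement, used here through Python's sqlite3 module. This is a sketch of one system's behaviour, not of query optimizers in general; the table, the index name `idx_account_branch` and the data are assumptions, and the exact wording of the plan text varies between SQLite versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (account_no TEXT, branch TEXT, balance INTEGER)")
conn.executemany(
    "INSERT INTO account VALUES (?, ?, ?)",
    [("A-101", "Downtown", 500), ("A-215", "Mianus", 700), ("A-102", "Downtown", 400)],
)


def plan(conn, sql, params=()):
    """Return the human-readable detail column of SQLite's chosen plan."""
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql, params)]


query = "SELECT balance FROM account WHERE branch = ?"

# Without an index the only available plan is a scan of the whole table.
plan_without_index = plan(conn, query, ("Downtown",))

# Once an index exists, a cheaper alternative is available for the same query.
conn.execute("CREATE INDEX idx_account_branch ON account (branch)")
plan_with_index = plan(conn, query, ("Downtown",))

print(plan_without_index)
print(plan_with_index)
```

The query text is identical in both cases; only the evaluation plan chosen for it changes.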

Storage manager
The storage manager provides the interface between the low-level data stored in the database
and the application programs and queries submitted to the system.

The storage manager is responsible for the following tasks:

• Interaction with the OS file manager
• Efficient storing, retrieving and updating of data

The storage manager components are:


• Authorization and integrity manager
• Transaction manager
• File manager
• Buffer manager
Transaction management
The transaction manager performs two activities:

• Transaction management ensures that the database remains in a consistent (correct) state
despite system failures (e.g., power failures and operating system crashes) and transaction
failures.
• Concurrency-control management controls the interaction among concurrent
transactions, to ensure the consistency of the database.

Disk storage
Disk storage holds the following data structures, which are part of the physical system
implementation and are implemented by the storage manager:
• Data files - store the database itself
• Data dictionary - stores metadata about the structure of the database, in particular the
schema of the database.
• Indices - can provide fast access to data items. A database index provides pointers to
those data items that hold a particular value.
• Statistical data - maintains statistics about different query execution plans and time
required for execution
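The idea of an index as a set of pointers to the data items that hold a particular value can be sketched with a Python dictionary. This is a deliberately simplified, in-memory illustration with made-up account rows; real systems use on-disk structures such as B+-trees or hash files.

```python
from collections import defaultdict

# A hypothetical "data file": rows of (account_no, branch, balance).
data_file = [
    ("A-101", "Downtown", 500),
    ("A-215", "Mianus", 700),
    ("A-102", "Downtown", 400),
    ("A-305", "Round Hill", 350),
]


def build_index(rows, column):
    """Map each value in `column` to the positions of the rows that hold it."""
    index = defaultdict(list)
    for position, row in enumerate(rows):
        index[row[column]].append(position)
    return index


def lookup(rows, index, value):
    """Follow the index's pointers instead of scanning every row."""
    return [rows[position] for position in index.get(value, [])]


branch_index = build_index(data_file, column=1)
downtown_accounts = lookup(data_file, branch_index, "Downtown")
print(downtown_accounts)
```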
The architecture of a database system is greatly influenced by the underlying computer system
on which the database system runs. Database systems can be centralized, or client-server,
where one server machine executes work on behalf of multiple client machines. Database
systems can also be designed to exploit parallel computer architectures. Distributed databases
span multiple geographically separated machines.
Database applications are typically broken up into a front-end part that runs at client machines
and a part that runs at the back end. In two-tier architectures, the front end directly
communicates with a database running at the back end. In three-tier architectures, the back-end
part is itself broken up into an application server and a database server.

Database applications architectures


Most users of a database system today are not present at the site of the database system, but
connect to it through a network. We can therefore differentiate between client machines, on
which remote database users work, and server machines, on which the database system runs.
Database applications are usually partitioned into two or three parts.
In a two-tier architecture, the application resides at the client machine, where it invokes
database system functionality at the server machine through query language statements.
Application program interface standards like ODBC and JDBC are used for interaction
between the client and the server.
In contrast, in a three-tier architecture, the client machine acts as merely a front end and does
not contain any direct database calls. Instead, the client end communicates with an application
server, usually through a forms interface. The application server in turn communicates with a
database system to access data. The business logic of the application, which says what actions
to carry out under what conditions, is embedded in the application server, instead of being
distributed across multiple clients. Three-tier applications are more appropriate for large
applications, and for applications that run on the World Wide Web (WWW).
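The three-tier split can be sketched in a single Python file, with the caveat that in a real deployment the three tiers run on separate machines and talk over a network. The class name `ApplicationServer`, the function `client_form_submit`, the withdrawal rule and the account data are all hypothetical.

```python
import sqlite3


# Database tier: stores the data and answers SQL statements.
def open_database():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE account (account_no TEXT PRIMARY KEY, balance INTEGER NOT NULL)")
    conn.execute("INSERT INTO account VALUES ('A-101', 500)")
    conn.commit()
    return conn


# Application-server tier: holds the business logic and is the only tier
# that issues database calls.
class ApplicationServer:
    def __init__(self, conn):
        self._conn = conn

    def balance(self, account_no):
        row = self._conn.execute(
            "SELECT balance FROM account WHERE account_no = ?", (account_no,)
        ).fetchone()
        return None if row is None else row[0]

    def withdraw(self, account_no, amount):
        current = self.balance(account_no)
        if current is None:
            return "unknown account"
        if amount <= 0 or amount > current:   # business rule lives here, not in the client
            return "rejected"
        with self._conn:
            self._conn.execute(
                "UPDATE account SET balance = balance - ? WHERE account_no = ?",
                (amount, account_no),
            )
        return "ok"


# Client tier: a front end that only forwards form input to the application server.
def client_form_submit(server, account_no, amount):
    return server.withdraw(account_no, amount)


server = ApplicationServer(open_database())
first = client_form_submit(server, "A-101", 200)
second = client_form_submit(server, "A-101", 1000)
print(first, second, server.balance("A-101"))
```

In a two-tier design the client function would contain the SQL itself, so a change to the withdrawal rule would have to be rolled out to every client.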

Database Users and Administrators


A primary goal of a database system is to retrieve information from and store new information
into the database. People who work with a database can be categorized as database users or
database administrators.
Database User types
There are four different types of database-system users, differentiated by the way they expect
to interact with the system. Different types of user interfaces have been designed for the
different types of users.
• Naive users are unsophisticated users who interact with the system by invoking one of the
application programs that have been written previously.
For example, a clerk in the university who needs to add a new instructor to department A
invokes a program called new hire. This program asks the clerk for the name of the new
instructor, her new ID, the name of the department (that is, A), and the salary.
The typical user interface for naive users is a forms interface, where the user can fill in
appropriate fields of the form. Naive users may also simply read reports generated from the
database.
• Application programmers are computer professionals who write application programs.
Application programmers can choose from many tools to develop user interfaces. Rapid
application development (RAD) tools are tools that enable an application programmer to
construct forms and reports with minimal programming effort.
• Sophisticated users interact with the system without writing programs. Instead, they form
their requests either using a database query language or by using tools such as data analysis
software. Analysts who submit queries to explore data in the database fall in this category.
• Database Administrator (DBA)
DBA is a type of user who has central control over the database system.

Database Administrator
One of the main reasons for using DBMSs is to have central control of both the data and the
programs that access those data. A person who has such central control over the system is called
a database administrator (DBA).
Functions/duties/responsibilities of a DBA
• Schema definition. The DBA creates the original database schema by executing a set of
data definition statements in the DDL.
• Storage structure and access-method definition.
• Schema and physical-organization modification. The DBA carries out changes to the
schema and physical organization to reflect the changing needs of the organization, or to
alter the physical organization to improve performance.
• Granting of authorization for data access. By granting different types of authorization,
the database administrator can regulate which parts of the database various users can
access. The authorization information is kept in a special system structure that the database
system consults whenever someone attempts to access the data in the system.
• Routine maintenance. Examples of the database administrator’s routine maintenance
activities are:
• Periodically backing up the database, either onto tapes or onto remote servers, to
prevent loss of data in case of disasters such as flooding.
• Ensuring that enough free disk space is available for normal operations, and upgrading
disk space as required.
• Monitoring jobs running on the database and ensuring that performance is not degraded
by very expensive tasks submitted by some users.

Reference: Database System Concepts, Abraham Silberschatz, Henry F. Korth, S. Sudarshan,
Sixth Edition, McGraw-Hill Publication
