DB01 - Introduction To Database Systems
Chapter 1
Introduction to Database Systems
Contents
• Data
• Information
• Data Processing
• Manual File-based System
• File Processing System
• Database
• Database Management System (DBMS)
• Relationship between DBMS & Application Program
• Brief History of Database Systems
Data
• The word "Data" is the plural of "Datum"
• "Datum" means a single piece of information, statistic, or code
• Data → a collection of raw facts and figures related to an object
• Object → a person, an organization, an event, or any other thing that is significant in a system
• Data may be in form of text, numbers, images, sounds, and videos
• Collected for different purposes
• Processed to produce meaningful information, such as reports, charts, and web pages
• In an organization, data is as important as blood in the human body
• Gives a view of current and past activities
• Records the history of the rise and fall of an organization
• Helps an organization make decisions about future activities
Data
• Example
• "Student.dat" → created and maintained in the admission office; used by the "Admission" program
• also forwarded to the registrar's office
• "Course.dat" → created in the registrar's office; contains information about courses of different subjects
• "Course.dat" + "Student.dat" → used by the "Schedule" program to generate individual student schedules and class lists
• "Pay.dat" → holds the information required for calculating pay and preparing payroll sheets
• used by the "Payroll" application program to create the payroll of the employees
Disadvantages of File Processing System
- Data Redundancy
• means multiple copies of the same data
• File processing system → each application program has its own set of data files
• Same data may exist in multiple files
• Problems caused by duplication of data
• To update a specific record → the same data must be updated in all files; otherwise different files may hold different information about that record
• Valuable storage space is wasted due to duplication of records
• For example, in a university environment
• "student.dat" of the admission office → roll numbers, names, and addresses, etc.
• "result.dat" of the registrar's office → roll numbers, names, addresses, obtained marks of subjects, and results of all the examinations, etc.
• both files contain roll numbers, names, and addresses of the students
• Same data is stored in multiple files
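The duplication described above can be sketched in a few lines of Python. This is only an illustration: the field names and sample values are hypothetical, and the two lists stand in for the "student.dat" and "result.dat" files.

```python
# Hypothetical records standing in for the two data files.
student_dat = [
    {"roll_no": 1, "name": "Ali", "address": "Lahore"},
    {"roll_no": 2, "name": "Sara", "address": "Karachi"},
]
result_dat = [
    {"roll_no": 1, "name": "Ali", "address": "Lahore", "marks": 82},
    {"roll_no": 2, "name": "Sara", "address": "Karachi", "marks": 91},
]

def redundant_fields(file_a, file_b):
    """Return the field names that are stored in both files."""
    return sorted(set(file_a[0]) & set(file_b[0]))

duplicated = redundant_fields(student_dat, result_dat)
print(duplicated)  # fields kept twice, once per office
```

Every field reported here occupies storage in both files and must be updated in both whenever it changes.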
- Data Inconsistency
• means that different files may contain different information about a particular object or person
• Redundancy leads to inconsistency
• When the same data is stored at multiple locations → inconsistency may occur
• Example
• Address of a student is updated in "result.dat" of the registrar's office
• other departments maintain their own data files, which are not updated
• some data files then contain the old address, while others contain the new address
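A minimal Python sketch of the address example follows. The roll number, addresses, and the helper name inconsistent_rolls are made up for illustration; the dictionaries stand in for the two data files.

```python
# Both files start with the same (duplicated) address for roll number 101.
student_dat = {101: {"name": "Ali", "address": "Old Street 1"}}
result_dat = {101: {"name": "Ali", "address": "Old Street 1", "marks": 82}}

# The registrar's office updates only its own file.
result_dat[101]["address"] = "New Street 9"

def inconsistent_rolls(file_a, file_b, field):
    """List roll numbers whose value for `field` differs between two files."""
    return [
        roll
        for roll in file_a
        if roll in file_b and file_a[roll][field] != file_b[roll][field]
    ]

conflicts = inconsistent_rolls(student_dat, result_dat, "address")
print(conflicts)  # roll numbers that now have two different addresses
```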
- Data Isolation
• File processing system → data is stored (isolated) in various files
• It becomes very difficult to access desired information that is spread across data files
• For example
• "student.dat" stores addresses of students
• "result.dat" stores examination marks
• A report needs data from both files
• This is a difficult job in a file processing system
• Programmer must write a new program to extract the required data from both data files
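The kind of one-off program a programmer has to write can be sketched as follows. The comma-separated layout, the sample rows, and the function names are assumptions made for this example; the strings stand in for the two files on disk.

```python
import csv
import io

# In-memory stand-ins for the two data files; contents are made up.
STUDENT_DAT = "roll_no,name,address\n1,Ali,Lahore\n2,Sara,Karachi\n"
RESULT_DAT = "roll_no,marks\n1,82\n2,91\n"

def load(text):
    """Parse one file into a dict keyed by roll number."""
    return {row["roll_no"]: row for row in csv.DictReader(io.StringIO(text))}

def address_and_marks(student_text, result_text):
    """Combine the address from one file with the marks from the other."""
    students = load(student_text)
    results = load(result_text)
    return [
        (roll, students[roll]["address"], int(results[roll]["marks"]))
        for roll in students
        if roll in results
    ]

report = address_and_marks(STUDENT_DAT, RESULT_DAT)
print(report)
```

Each new question that spans files needs another program of this kind, which is the cost that data isolation imposes.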
- Data Atomicity
• Transaction → collection of all the steps required to complete an operation on data
• Atomicity → a transaction should either take place as a whole or not take place at all
• Suppose we want to transfer Rs. 15000/- from account X to account Y
• Step-1: Deduct Rs. 15000/- from account X
• Step-2: Add Rs. 15000/- to account Y
• If the system fails (e.g. due to power failure) after Step-1
• Rs. 15000/- has been deducted from account X but has not been added to account Y
• Data atomicity problem → occurs in transaction-based file processing systems, e.g. banking
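For contrast, the sketch below shows how a database transaction keeps the two steps together, using Python's built-in sqlite3 module. The table name, starting balances, and the fail_after_step1 flag are assumptions for illustration, and the raised exception only simulates a failure; it is not a real power cut.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (name TEXT PRIMARY KEY, balance INTEGER)")
conn.executemany("INSERT INTO account VALUES (?, ?)", [("X", 50000), ("Y", 10000)])
conn.commit()

def transfer(conn, source, target, amount, fail_after_step1=False):
    """Run Step-1 and Step-2 as a single transaction."""
    try:
        with conn:  # commits if the block succeeds, rolls back on an exception
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE name = ?",
                (amount, source),
            )
            if fail_after_step1:
                raise RuntimeError("simulated failure after Step-1")
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE name = ?",
                (amount, target),
            )
    except RuntimeError:
        pass  # the rollback has already undone Step-1

def balances(conn):
    """Return the current balances as {account name: balance}."""
    return dict(conn.execute("SELECT name, balance FROM account"))

transfer(conn, "X", "Y", 15000, fail_after_step1=True)
after_failure = balances(conn)   # neither account has changed

transfer(conn, "X", "Y", 15000)
after_success = balances(conn)   # both steps have been applied
```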
- Program Data Dependency
• Program data dependency → relationship between the data stored in a file and the specific program required to process that file
• File processing system → data stored in a data file depends upon the application program through which the data file was created
• Structure of the data file is defined in the application program
• Difficult to change the structure of a data file
• If the structure of a data file changes → the application program must be modified as well
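A small sketch of this dependency, assuming a hypothetical fixed-width record layout that is hard-coded in the program with Python's struct module:

```python
import struct

# Record layout hard-coded in the "application program" (hypothetical):
# a 4-byte roll number followed by a 10-byte name.
OLD_LAYOUT = struct.Struct("<i10s")
# The file structure changes: the name field is widened to 20 bytes.
NEW_LAYOUT = struct.Struct("<i20s")

def read_record(layout, raw):
    """Decode one record using the layout the program was written for."""
    roll_no, name = layout.unpack(raw)
    return roll_no, name.rstrip(b"\x00").decode("ascii")

old_record = OLD_LAYOUT.pack(7, b"Ali")
new_record = NEW_LAYOUT.pack(7, b"Ali")

ok = read_record(OLD_LAYOUT, old_record)

try:
    read_record(OLD_LAYOUT, new_record)   # old program, new file structure
    old_program_still_works = True
except struct.error:
    old_program_still_works = False
```

Because the layout lives inside the program, widening one field is enough to stop the unmodified program from reading the file.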
- Difficult Program Maintenance
• File processing system → program maintenance is difficult
• Data file depends upon the application program
• Any modification in a data file (size or data type of a data field) → requires redesigning the application program
• Organization has to pay a lot of money for program maintenance
- Limited Data Sharing
• File processing system → each application program uses its own data files
• Difficult to access data from a data file created by another application program
• Provides very limited facilities to share data among multiple users
- Security Problem
• File processing system does not provide proper security against illegal access to data files
• In most situations → different levels of security are required for different groups of users
• Customers should be allowed only to view and purchase products
• Sales persons should be allowed to view products and enter sales data
• Sales officers should also be able to modify or delete sales data
• File processing system does not provide such types of advanced security options
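The levels of access listed above can be expressed as a simple permission table. This is a hedged sketch only: the role and action names are hypothetical, and a real system would enforce such rules in the data store rather than in application code.

```python
# Hypothetical mapping of user groups to the actions they may perform.
PERMISSIONS = {
    "customer": {"view_products", "purchase"},
    "sales_person": {"view_products", "enter_sales"},
    "sales_officer": {"view_products", "enter_sales", "modify_sales", "delete_sales"},
}

def is_allowed(role, action):
    """Return True if the given role may perform the given action."""
    return action in PERMISSIONS.get(role, set())

print(is_allowed("customer", "delete_sales"))       # customers cannot delete sales data
print(is_allowed("sales_officer", "delete_sales"))  # sales officers can
```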
Database
• A database is an organized collection of related data stored in an efficient and compact manner
• The word "organized" → means that data is stored in such a way that it can easily be accessed and updated
• The phrase "related data" → means that a database contains data or information about a particular area, such as:
• Database of employees that contains data of employees of an organization or department
• Database of students that contains data of students of a college/university, etc.
• The word "efficient" → means that required data can be searched for very easily and quickly
• The word "compact" → means that stored data takes up as little space as possible, without any duplication of data
Examples of Databases
• NADRA
• In Pakistan, NADRA maintains a database holding information on all citizens of Pakistan
• Record of any citizen of Pakistan can be accessed very easily and quickly through the CNIC number
• Library
• In a local library → computerized database containing details of the books in the library
• Computerized index (automated catalog) allows a book to be found by its title, its author's name, etc.
• College/University
• Database containing information about the present and previous students
• Bank Accounts
• When an amount is withdrawn with an ATM card → a database of the customers of a particular bank is accessed
• Bank account is accessed through an Automated Teller Machine (ATM)
• After the withdrawal → the bank record is updated immediately through a software application
• E-mail Accounts
• Popular websites "hotmail.com", "gmail.com" and "yahoo.com" → contain online databases
• These hold the free e-mail accounts of users all over the world
• Manual and computerized databases
• Manual Database → library card catalog
• Computerized database → created and maintained by a set of programs (a database management system)
Database Table
• Database contains various objects used for different purposes
• Most important object → "table"
• Database may consist of many tables
• Data stored in tables
• Table is made up of columns and rows
• Rows represent records
• Each row is divided into fields, one for each column
• Fields contain different data values of a particular record
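A table with rows (records) and columns (fields) can be sketched with Python's built-in sqlite3 module. The student table, its columns, and the sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# One table; each column is a field.
conn.execute("CREATE TABLE student (roll_no INTEGER, name TEXT, address TEXT)")
# Each inserted tuple is one row, i.e. one record.
conn.executemany(
    "INSERT INTO student VALUES (?, ?, ?)",
    [(1, "Ali", "Lahore"), (2, "Sara", "Karachi")],
)

cursor = conn.execute("SELECT * FROM student ORDER BY roll_no")
field_names = [column[0] for column in cursor.description]
records = cursor.fetchall()

print(field_names)
for record in records:
    print(record)
```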
Metadata
• Database holds related data + description of that data
• database is also defined as → a self-describing collection of integrated records
• Description of data → metadata or system catalog or data dictionary
• Metadata → data about data
• When a table is designed → data type, size, format, and other descriptions of its fields are specified
• These descriptions are the metadata of the table
• Metadata describes properties or characteristics of actual data in database
• Describes logical structure of database
• field names i.e. data item names
• data type of each data item
• length or width of data items
• rules and constraints about data
• a brief description of each data item
• Helps database designers and users to understand data in database
• Metadata is saved in a data dictionary file
• Consulted before actual data is read or modified in the database
• DDL (Data Definition Language) is used to define metadata
• It defines a data dictionary of tables in the database
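As an illustration, the sketch below defines a table with a DDL statement and then reads its description back from SQLite's catalog through the PRAGMA table_info statement. The table and field names are hypothetical, and SQLite is used only because it ships with Python; other DBMSs expose their data dictionary in different ways.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# DDL statement: defines the structure (metadata), not the data itself.
conn.execute(
    "CREATE TABLE student ("
    " roll_no INTEGER PRIMARY KEY,"
    " name VARCHAR(30) NOT NULL,"
    " address VARCHAR(60))"
)

# PRAGMA table_info returns one row per field:
# (cid, name, type, notnull, dflt_value, pk)
metadata = [
    {"field": name, "type": col_type, "not_null": bool(notnull), "primary_key": bool(pk)}
    for _cid, name, col_type, notnull, _default, pk in conn.execute(
        "PRAGMA table_info(student)"
    )
]

for entry in metadata:
    print(entry)
```

No rows have been inserted at this point, yet the field names, declared types, and constraints can already be queried.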
Types of Database
• Types based on database architecture
• Centralized Databases
• Personal computer databases
• Client/Server databases
• Distributed Databases
• Homogeneous databases
• Heterogeneous databases
• Object-Oriented Databases
Centralized Databases
• All data (complete database) is stored and maintained in one location
• Location is most often a central computer or a server
• Data is managed, updated, and accessed at the central site
• Multiple users can access centralized database
• Centralized database systems are mostly used in colleges, banks, hospitals, and small organizations
• Examples of centralized databases
• Personal computer databases
• Client/Server databases
• Personal Computer Databases
• Normally created and maintained by a single user on his/her personal computer
• Commonly used in small businesses or organizations
• Used for simple accounting, inventory management, and customer billing systems,
etc.
• Relatively simple to develop and use
- data cannot easily be shared among different users
• Client / Server Databases
• Client/server architecture → one computer acts as a server that stores all data, while clients access the data
• Server → a computer together with DBMS software
• provides back-end functions requested by clients
• Back-end functions → database management, communication management, printing, etc.
• Clients → provide front-end functions
• send requests to the server and receive results from the server
• Objective → to allow multiple users in a network to access or share data
• Usually, the database processing functions are performed on the database server
• Often used for work-group computing
• More secure than central computer databases
• server computer allows access to the database only to authenticated users on client computers
+ Data integrity is maximized and data redundancy is minimized
+ Data can be accessed by many users simultaneously
+ Easier to maintain and keep updated, since all data is stored in a single location
+ Easier to create a backup of data
+ Helps in maintaining data in an accurate and consistent state, which also enhances data reliability
+ Gives strong, centralized security, i.e. data is protected at a single, central site
+ Easier for end users because of the simplicity of a single database design
+ Provides data portability and better database administration
+ Usually more cost-effective (i.e., cheaper) than other types of database systems, as maintenance costs are low
+ Data kept at a single location is easier to change, re-organize, and analyze
+ Updates to any given set of data are immediately visible to every end user
- Since all data is at one location, it can take more time to search and access
- If the network is slow, this process takes even more time
- A lot of data access traffic may create a bottleneck
- Data access then becomes very slow, so data availability is not as efficient as in a distributed database
- Highly dependent on network connectivity: a slow network connection (e.g. internet connection) makes the database hard to access
- In case of any hardware failure → data availability within the entire network is affected
- If there is any problem at the central site → the complete database system fails
- If a set of data is accidentally lost → it is difficult to recover
- Deadlocks can occur while attempting to update shared tables that are already in use
- Needs trained and experienced staff for its administration
Distributed Databases
• Data is stored across different physical locations
• portions of a database are physically distributed across different sites or locations in a computer network
• System administrator can distribute collections of data across multiple physical locations
• Can reside on network servers on the Internet, on corporate intranets or extranets, or on other company networks
• Managed by a centralized distributed database management system (DDBMS)
• Access data through a computer network
• Some big and multi-national organizations/departments use distributed databases
• Processes to keep a distributed database up to date
• Replication → process identifies changes in the distributed databases
• Once changes have been identified → replication process applies those changes so that all distributed databases look the same
• Complex and time-consuming process, depending on the size and number of distributed databases
• Requires a lot of time and computing resources
• Duplication → process identifies one database as a master database and duplicates it at different locations
• Less complicated process that still makes sure all distributed databases have the same data
• Users may change only the master database
• Ensures that local data will not be overwritten
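The two processes can be contrasted with a deliberately simplified Python sketch. Each site's database is modeled as a plain dictionary; the function names, site names, and sample values are hypothetical, and the replication step assumes that no two sites change the same record and that records are never deleted.

```python
import copy

def replicate(sites, reference):
    """Replication: identify changes at every site, then apply them everywhere.

    `reference` is the last state on which all sites agreed (an assumption
    of this sketch).
    """
    changes = {}
    for database in sites.values():
        for key, value in database.items():
            if reference.get(key) != value:
                changes[key] = value
    for database in sites.values():
        database.update(changes)
    return changes

def duplicate(master, site_names):
    """Duplication: copy the master database to every site."""
    return {name: copy.deepcopy(master) for name in site_names}

# Replication: two sites change different records independently.
reference = {1: "Lahore", 2: "Karachi"}
sites = {"site_a": dict(reference), "site_b": dict(reference)}
sites["site_a"][1] = "Multan"   # change made at site A
sites["site_b"][3] = "Quetta"   # new record added at site B
changes = replicate(sites, reference)

# Duplication: only the master is changed, then copied out again.
master = {1: "Lahore", 2: "Karachi"}
master[2] = "Peshawar"
copies = duplicate(master, ["site_a", "site_b"])
```

Replication has to inspect every site to find the changes, while duplication only copies one database, which is why the former is the more expensive process.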
• Homogeneous Databases
• means that the database technology is the same at each of the locations (or sites)
• Data at various locations are also compatible
• All nodes use the same hardware and software for the database system
• Comparatively easier to design and manage
• Conditions that must be satisfied
• Operating system used at each location must be the same or compatible
• Data structures used at each location must be the same or compatible
• Database application (or DBMS) used at each location must be the same or compatible
• Heterogeneous Databases
• Different sites or locations may have different hardware and software
• Data structures at various sites may also be incompatible
• Different computers, operating systems, and database applications (or data models) may be used at each of the locations
• For example
• One location → has the latest relational database management technology
• Another location → stores data using a network database or an old version of the DBMS
• One location → runs the Windows NT operating system
• Another location → runs UNIX
• Usually used when individual sites use their own hardware and software
• Translations are required to allow communication between different sites
• Users must be able to make requests in a database language at their local sites
• Usually, the SQL database language is used
• A user at one location may be able to read but unable to update data at another location
• Often not technically or economically feasible
Heterogeneous Databases
+ Local data management
+ Improved performance
+ Reliability and availability
+ Modularity
+ Protection of data
+ Independence
+ Low communication cost (more economical)
- Complexity
- Higher cost of installation and maintenance
- Security
- Difficult to maintain integrity
- Lack of standards
- Database design more complex
PM Series
Database Management
by
CM Aslam & Aqsa Aslam