Unit I
systems, database approach, range of database applications, advantages, costs and risks,
components, Database Development process, IS development, three schema Architecture,
Database Analysis: E-R Model - Entities, attributes, Relationships, degree and
cardinality - case studies.
Introduction to Databases
Introduction:
Good decisions require good information that is derived from the data. Data are likely to be
managed most efficiently when they are stored in a database.
In this “information age,” production of accurate, relevant, and timely information is the
key to good decision making. Good decision making is the key to business survival in a
global market.
Database Terminology
A database management system (DBMS) is the software that manages and controls
access to the database.
A file-based system is a collection of application programs that perform services for the
end-users. Each program defines and manages its own data. File-based systems are the older
approach: they were introduced to computerize the manual filing system and gave more
efficient data access than the manual system. A file-based system uses a decentralized
storage model for its operational data. In this model, each department manages its data with
its own Data Processing (DP) components.
For example, consider a college where a student's examination record is stored in one file
and his or her library record is stored in a different file. This creates many duplicate
values, such as roll number, name, and father's name.
The following figure shows file-based processing:
1. Separation and isolation of data:
When data is isolated in separate files, it is more difficult to access. Processing also
becomes harder, because the separate files must be synchronized to make sure that the correct
data is retrieved. This becomes very difficult when more than two files are involved.
2. Duplication of data:
In a file-based system, each department maintains its own files through its own application
programs (a decentralized approach). This leads to duplication of data across departments.
Uncontrolled duplication of data leads to several problems, including:
o It costs time and money to enter the data more than once.
o It takes up additional storage space.
o It can lead to loss of data integrity.
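The loss of integrity can be sketched in a few lines of Python. The two lists below stand in
for the separate examination and library files of the college example; the field names and
sample values are assumptions made purely for illustration.

```python
# Two departments keep their own copy of the same student details.
# The records and field names are hypothetical.
exam_file = [{"roll_no": 101, "name": "Asha Rao", "marks": 82}]
library_file = [{"roll_no": 101, "name": "Asha Rao", "books_issued": 2}]

# The examination section corrects the name in its own file only.
exam_file[0]["name"] = "Asha Rao-Kumar"


def find_name_conflicts(file_a, file_b):
    """Return roll numbers whose names differ between the two files."""
    names_b = {rec["roll_no"]: rec["name"] for rec in file_b}
    return [
        rec["roll_no"]
        for rec in file_a
        if rec["roll_no"] in names_b and names_b[rec["roll_no"]] != rec["name"]
    ]


conflicts = find_name_conflicts(exam_file, library_file)
print(conflicts)
```

Because nothing forces the two copies to change together, the same student now has two
different names, and neither file can be trusted without checking the other.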
3. Data dependence:
In a file-based system, the physical structure and storage of the data files are defined in the
application code. Making any change to the existing structure can be very time-consuming
and error-prone. This characteristic of file-based systems is known as program–data
dependence.
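A minimal sketch of program–data dependence, assuming a hypothetical fixed-length student
record: the layout string is written into the program itself, so a change to the file
structure breaks a reader that has not been changed with it.

```python
import struct

# Record layout hard-coded in the application (hypothetical):
# 4-byte roll number followed by a 20-byte name.
OLD_LAYOUT = "<i20s"


def read_student(record: bytes):
    """Decode one record using the layout the program was written for."""
    roll_no, raw_name = struct.unpack(OLD_LAYOUT, record)
    return roll_no, raw_name.rstrip(b"\x00").decode("ascii")


old_record = struct.pack(OLD_LAYOUT, 101, b"Asha Rao")
print(read_student(old_record))

# The file structure changes: a 10-byte phone field is added to every record.
NEW_LAYOUT = "<i20s10s"
new_record = struct.pack(NEW_LAYOUT, 101, b"Asha Rao", b"5550100")

try:
    read_student(new_record)
    reader_still_works = True
except struct.error:
    # The unchanged program can no longer decode the file.
    reader_still_works = False
print(reader_still_works)
```

Every program that reads the file would have to be found, edited, and retested after such a
change, which is why structural changes in a file-based system are costly.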
4. Incompatible file formats:
The structure of a file depends on the application programming language. For example,
the structure of a file generated by a COBOL program may be different from the structure of a
file generated by a C program. Such incompatibility makes it difficult to process the files
jointly.
5. Fixed queries and proliferation of application programs:
In a file-based system, the types of queries and reports are fixed; there is no facility for
ad hoc queries. A file-based system may also contain a huge number of files and application
programs. This puts tremendous pressure on the DP staff and results in inefficient programs,
limited documentation, and difficult maintenance.
It also causes the following:
Database Approach
The efforts to overcome the limitations of file-based systems resulted in a new approach:
the database and the Database Management System (DBMS).
Database
A Database is a shared collection of logically related data and its description, designed to
meet the information needs of an organization.
A database is a single, possibly large repository of data that can be used simultaneously by
many departments and users.
It holds the organization’s operational data and also holds a description of this data. For
this reason, a database is also defined as a self-describing collection of integrated
records.
The description of the data is known as the system catalog, data dictionary, or
metadata.
The users of a database object see only the external definition and are unaware of how
the object is defined and how it functions. This approach is known as data abstraction.
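The self-describing nature of a database can be seen by asking the DBMS for its own
catalog. The sketch below uses SQLite through Python's sqlite3 module as one convenient
example of a DBMS; the student table is an assumption made for illustration, and other
products expose their catalogs under different names.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE student (roll_no INTEGER PRIMARY KEY, name TEXT NOT NULL)"
)

# SQLite keeps its catalog in a table called sqlite_master.
tables = [
    row[0]
    for row in conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'")
]

# PRAGMA table_info returns one row per column; index 1 holds the column name.
columns = [row[1] for row in conn.execute("PRAGMA table_info(student)")]

print(tables, columns)
```

The description of the data is stored in the database alongside the data, so a program can
discover the structure at run time instead of having it written into its own code.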
A Database Management System (DBMS) is a software system that enables users to define,
create, maintain, and control access to the database.
The DBMS is the software that interacts with the user’s application programs and the database.
It allows users to define the database by using a Data Definition Language (DDL).
It allows users to insert, update, delete, and retrieve data from the database by using a
Data Manipulation Language (DML).
It supports a query language. The most common query language is the Structured
Query Language (SQL, pronounced “S-Q-L”, or sometimes “See-Quel”).
It provides controlled access to the database. For example, it may provide:
In the database approach, different departments use their own application programs to access
the database through the DBMS. The physical structure and storage of the data are managed
by the DBMS.
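The DDL and DML facilities described above can be tried out with SQLite through Python's
sqlite3 module. This is only a small sketch; the student table and its rows are
hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# DDL: define the structure of the data.
conn.execute(
    "CREATE TABLE student (roll_no INTEGER PRIMARY KEY, name TEXT, class TEXT)"
)

# DML: insert, update, and delete data.
conn.executemany(
    "INSERT INTO student VALUES (?, ?, ?)",
    [(101, "Asha", "BSc-I"), (102, "Ravi", "BSc-I"), (103, "Meena", "BSc-II")],
)
conn.execute("UPDATE student SET class = 'BSc-II' WHERE roll_no = 102")
conn.execute("DELETE FROM student WHERE roll_no = 103")

# Query: retrieve data with SQL.
rows = conn.execute(
    "SELECT roll_no, name, class FROM student ORDER BY roll_no"
).fetchall()
print(rows)
```

The application states what it wants in SQL; where and how the rows are stored is left to
the DBMS.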
Views
A view is a subset of the database that allows each user to have his or her own view of
the database. Views have several other benefits:
Views provide a level of security. Views can be set up to exclude data that some users
should not see.
Views provide a mechanism to customize the appearance of the database. For example,
a user in one department may see a field under a more meaningful name of his or her choice.
A view can present a consistent, unchanging picture of the structure of the database,
even if the underlying database is changed. Thus, a view helps provide program–data
independence.
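The security and customization benefits of views can be sketched with SQLite. The employee
table, the cryptic column name emp_nm, and the view name staff_directory are all
assumptions chosen for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employee (emp_no INTEGER PRIMARY KEY, emp_nm TEXT, salary REAL)"
)
conn.execute("INSERT INTO employee VALUES (1, 'Asha', 52000.0)")

# The view leaves out the salary column (security) and exposes emp_nm
# under a more readable name (customization).
conn.execute(
    "CREATE VIEW staff_directory AS "
    "SELECT emp_no, emp_nm AS employee_name FROM employee"
)

cursor = conn.execute("SELECT * FROM staff_directory")
view_columns = [description[0] for description in cursor.description]
view_rows = cursor.fetchall()
print(view_columns, view_rows)
```

A user who is given access only to staff_directory never sees the salary column and works
with the friendlier column name.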
1. Hardware
2. Software
3. Data
4. Procedures
5. People
1. Hardware
The DBMS and the applications require hardware to run. The hardware can range from a single
personal computer to a single mainframe or a network of computers. The particular hardware
depends on the organization’s requirements and the DBMS used.
2. Software
The software component includes the DBMS, application programs, operating system, network
software etc.
3. Data
Data is the most important component of the DBMS environment. It acts as a bridge between
the machine components and the human components. The database contains both the
operational data and the metadata.
4. Procedures
Procedures are the instructions and rules that govern the design and use of the database. The
users of the system and the staff require the procedures on how to use or run the system.
5. People
People refers to the different types of users involved in the database environment. There are
four types of people: data and database administrators, database designers, application
developers, and end-users.
When designing a database, the organization has to think of the data first and the application
second. This approach is sometimes referred to as a paradigm shift. The database design
activity is crucial. A poorly designed database will generate errors that may lead to bad
decisions, which may have serious effects on the organization. A well-designed database
provides the correct information for the decision-making process to succeed in an efficient
way.
Unfortunately, many organizations and designers do not use proper database design
methodologies. This is a major cause of failure in the development of database systems. The
lack of a structured approach makes database design and maintenance very difficult.
Q.6). List out and explain various roles involved in the database environment.
1. Data and Database Administrators
Data and database administration are the roles concerned with the management and control of
a DBMS and its data.
The Data Administrator (DA) is responsible for the management of the data resources. This
includes database planning; development and maintenance of standards, policies and
procedures; and conceptual/logical database design.
The role of the DBA is more technically oriented than the role of the DA, requiring detailed
knowledge of the target DBMS and the system environment.
2. Database Designers
There may be two types of designers: logical database designers and physical database
designers.
The logical database designer is concerned with identifying the data, the relationships
between the data, and the constraints on the data to be stored in the database.
The physical database designer decides how the logical database design is to be
physically realized.
It involves:
Mapping the logical database design into a set of tables and integrity constraints.
Selecting specific storage structures and access methods for the data to achieve good
performance.
Logical database design is concerned with the what; physical database design is concerned
with the how.
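The split between the what and the how can be sketched with SQLite. The table below is the
logical design; adding an index is a physical design decision. The table, the index name,
and the data are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Logical design: which data is stored and how it is related.
conn.execute(
    "CREATE TABLE student (roll_no INTEGER PRIMARY KEY, name TEXT, class TEXT)"
)
conn.executemany(
    "INSERT INTO student VALUES (?, ?, ?)",
    [(101, "Asha", "BSc-I"), (102, "Ravi", "BSc-I"), (103, "Meena", "BSc-II")],
)

query = "SELECT name FROM student WHERE class = ? ORDER BY name"
before = conn.execute(query, ("BSc-I",)).fetchall()

# Physical design: choose an access method to speed up lookups by class.
conn.execute("CREATE INDEX idx_student_class ON student (class)")
after = conn.execute(query, ("BSc-I",)).fetchall()

print(before == after)
```

The index may change how quickly the answer is found, but not the answer itself, so
application queries do not have to be rewritten when the physical design changes.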
3. Application Developers
Application developers are the people who implement the application programs. They
work from a specification produced by systems analysts. Each program
contains statements that request the DBMS to perform some operation on the database, which
includes retrieving, inserting, updating, and deleting data.
4. End-Users
The end-users are the “clients” of the database. End-users can be classified according to the
way they use the system:
Naive users are typically unaware of the DBMS. They access the database through
specially written application programs. They invoke database operations by entering
simple commands or choosing options from a menu.
Sophisticated users are familiar with the structure of the
database and the facilities offered by the DBMS. Sophisticated end-users may use a
high-level query language such as SQL to perform the required operations.
A database management system offers promising potential advantages, but it also has
disadvantages.
Advantages
Control of data redundancy: The database approach controls the amount of redundancy
inherent in the database. Sometimes it is necessary to maintain redundancy to model
relationships and to improve performance.
More information from the same amount of data: With the integration of the operational
data, it may be possible for the organization to derive additional information from the same
data.
Sharing of data: The database belongs to the entire organization and can be shared by all
authorized users. In this way, more users share more of the data.
Improved data integrity: Database integrity refers to the validity and consistency of stored
data. Integrity is usually expressed in terms of constraints. A Constraint is a consistency rule
that the database is not permitted to violate.
Improved security: Database security is the protection of the database from unauthorized
users. Security may take the form of user names and passwords to identify people authorized
to use the database.
Enforcement of standards: Database integration allows the DBMS to enforce the necessary
standards. These may include exchange of data between systems, naming conventions,
documentation standards, update procedures, and access rules.
Economy of scale: Combining all the organization’s operational data into one database and
creating applications that work on this one database can result in cost savings.
Balance of conflicting requirements: As the database is under the control of the DBA, the
DBA can make decisions about the design and use of the database without conflicts.
Improved data accessibility and responsiveness: Because the data is integrated, it is
directly accessible to the end-users. This provides more functionality and better services
to the end-users.
Improved maintenance through data independence: DBMS separates the data descriptions
from the applications. It makes the applications independent of the changes to the data
descriptions. This is known as data independence. It simplifies the database application
maintenance.
Increased concurrency: A DBMS allows concurrent database access while ensuring database
integrity. It employs various concurrency control techniques to do so.
Improved backup and recovery services: Modern DBMSs require less processing to restore
a database. They also provide automatic backup facilities.
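The integrity advantage listed above can be made concrete with a short sketch: constraints
are declared once in the database, and the DBMS rejects any row that violates them. SQLite
is used here as an example DBMS; the table, the constraints chosen, and the helper
try_insert are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE student (
        roll_no INTEGER PRIMARY KEY,      -- no two students share a roll number
        name    TEXT NOT NULL,            -- a name must be supplied
        age     INTEGER CHECK (age >= 0)  -- an age cannot be negative
    )
    """
)
conn.execute("INSERT INTO student VALUES (101, 'Asha', 19)")


def try_insert(row):
    """Attempt an insert; return True if the DBMS accepted the row."""
    try:
        conn.execute("INSERT INTO student VALUES (?, ?, ?)", row)
        return True
    except sqlite3.IntegrityError:
        return False


negative_age_ok = try_insert((102, "Ravi", -5))    # violates the CHECK constraint
duplicate_key_ok = try_insert((101, "Meena", 20))  # violates the PRIMARY KEY
valid_ok = try_insert((103, "Meena", 20))          # satisfies every constraint
print(negative_age_ok, duplicate_key_ok, valid_ok)
```

Because the rules live in the database rather than in each application program, every
program that updates the table is held to the same rules.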
Disadvantages
Complexity: The functionality expected of a good DBMS makes it an extremely complex piece of
software. Failure to understand the system can lead to bad design decisions.
Size: An enterprise DBMS occupies many megabytes of disk space and requires extensive
amounts of memory.
Cost of DBMSs: The cost of DBMSs varies significantly. A large mainframe multi-user
DBMS can be extremely expensive. It also involves annual maintenance cost.
Additional hardware costs: The disk storage requirements for the DBMS may call for
additional storage space. Sometimes a larger machine must be purchased to achieve better
performance.
Cost of conversion: Sometimes the cost of converting older applications (legacy system) to
run on the new DBMS and hardware is too high.
Performance: A modern DBMS might need to run many applications. As a result, some
applications may not run as fast as they used to.
Greater impact of a failure: The centralization of resources increases the vulnerability of the
system.
In 1975, the American National Standards Institute (ANSI) Standards Planning and
Requirements Committee (SPARC), or ANSI/X3/SPARC, produced a three-level architecture.
This three-level architecture contains:
1. External Level,
2. Conceptual Level
3. Internal Level
1. External Level
This is the users’ view of the database. This level describes that part of the database that is
relevant to each user. Different views may have different representations of the same data. For
example, one user may view dates in the form (day, month, year), while another may view
dates as (year, month, day).
2. Conceptual Level
This is the community view of the database. This level describes what data is stored in the
database and the relationships among the data. This level contains the logical structure of the
entire database as seen by the DBA. It is a complete view of the data requirements of the
organization that is independent of any storage considerations. The conceptual model is
independent of both software and hardware.
3. Internal Level
This is the physical representation of the database on the computer. This level describes how
the data is stored in the database. The internal level covers the physical implementation of the
database. It covers the data structures and file organizations used to store data on storage
devices. The internal model is hardware-independent.
There is a physical level below the internal level. It may be managed by the operating system
under the direction of the DBMS. It describes how the data are saved on storage media.
The physical model is both software- and hardware-dependent.
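The date example given for the external level can be sketched with SQLite views: the date
is stored once, and each external view presents it in the form its users prefer. The table,
the view names, and the sample row are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Conceptual level: one definition of the data, with a single stored date.
conn.execute(
    "CREATE TABLE student ("
    "roll_no INTEGER PRIMARY KEY, name TEXT, date_of_birth TEXT)"
)
conn.execute("INSERT INTO student VALUES (101, 'Asha', '2005-03-14')")

# External level: two user views with different representations of the same date.
conn.execute(
    "CREATE VIEW student_dmy AS "
    "SELECT name, strftime('%d/%m/%Y', date_of_birth) AS dob FROM student"
)
conn.execute(
    "CREATE VIEW student_ymd AS "
    "SELECT name, strftime('%Y/%m/%d', date_of_birth) AS dob FROM student"
)

dmy = conn.execute("SELECT dob FROM student_dmy").fetchone()[0]
ymd = conn.execute("SELECT dob FROM student_ymd").fetchone()[0]
print(dmy, ymd)
```

How SQLite lays the row out in its file (the internal level) is not visible in any of
these statements, which is the separation the three-level architecture aims for.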
1. Personal Computer Database: Personal computer (PC) databases are designed to support
one user with a standalone personal computer (for example, a desktop or a laptop computer).
For example, consider a company that has a number of salespersons who call on actual or
prospective customers. Each salesperson might carry a laptop computer with a simple database
application to record customer information and the details of contacts with each customer.
Some of the key decisions that must be made in developing personal computer databases
are the following:
Should the application be purchased from an outside vendor or developed within the
organization?
What data are required by the user and how should the database be designed?
Who is responsible for accuracy of the data in the personal computer database?
In establishing a workgroup database, the organization must answer the same questions that
apply to personal computer databases. In addition, the following database management
questions arise:
How can the design of the database be optimized for a variety of group members’
information requirements?
How can the various members use the database concurrently without compromising the
integrity of the database?
Typical questions that must be addressed when designing and implementing department
databases include the following:
How can the database and its environment be designed to produce adequate
performance, given the large number of users and user transactions?
What database and application development tools should be used in this complex
environment?
Do other departments maintain the same type of data, and if so, how can data
redundancy and consistency best be managed?
4. Enterprise Database: An enterprise database is one whose scope is the entire organization
or enterprise. Such databases are intended to support organization-wide operations and
decision making. Arguably the most important type of enterprise database today is the data
warehouse. A data warehouse is an integrated decision support database whose content is
derived from the various operational databases (such as personal computer, workgroup, and
department databases).
Consider a large health care organization that operates a group of medical centers, including
hospitals, clinics, and nursing homes. As shown in the above figure, each of these medical
centers has a separate database. These databases contain data concerning patients, physicians,
medical services, business operations, etc.
Several questions that often arise in the context of an enterprise database are the
following:
How should the data be distributed among the various locations in the corporate
structure?
How can the organization develop and maintain standards concerning data names,
definitions, formats & related issues?
Database System Application:
Databases are widely used. Some representative applications are as follows:
1. Banking: For customer information, accounts, loans, and banking transactions.
3. Universities: For student information, courses, registration, and results with grades.
4. Credit Card Transaction: For purchases on credit cards and generation of monthly
statements.
8. Online Retailers: For sales data noted above plus online order tracking, generation of
recommendation lists and maintenance of online product evaluations.
9. Manufacturing: For management of the supply chain and tracking production of items
in factories, inventories of items in warehouse and stores and orders for items.
10. Human Resources: For information about employees, salaries, payroll taxes, benefits
and for generation of pay checks.
The database approach carries costs and risks: developing or maintaining a database requires
an investment of money, time, and effort. When developing a new database or maintaining an
existing one, the following points should be considered:
New, specialized personnel.
Installation and management cost and complexity.
Conversion costs.
Need for explicit backup and recovery.
Organizational conflict.
New, specialized personnel: An organization that adopts the database approach needs to hire
or train people to design and implement databases, provide database administration (DBA)
services, and manage a staff of new people. Because the technology changes rapidly, these
people will have to be retrained or have their knowledge upgraded regularly. The organization
therefore requires persons with specialized skills to maintain the database.
Installation and management cost and complexity: A multi-user database management system is
large and complex. It has a high initial installation cost and requires a staff of trained
persons to install and operate it. It also incurs substantial annual maintenance costs.
Installing or upgrading such a system may also require changes to the hardware and data
communications in the organization, which adds further risk.
Components of a DBMS
A DBMS is partitioned into several software components. The major software components
include the following:
File manager: The file manager manipulates the underlying storage files and manages
the allocation of storage space on disk. It establishes and maintains the list of structures
and indexes defined in the internal schema.
DDL compiler: The DDL compiler converts DDL statements into a set of tables
containing metadata. These tables are then stored in the system catalog while control
information is stored in data file headers.
Catalog manager: The catalog manager manages access to and maintains the system
catalog. The system catalog is accessed by most DBMS components.
(a) A system is a set of related components that work together to perform a task or achieve a
specific goal. Goals or objectives are accomplished by interacting with the environment and
performing functions.
(b) An information system manipulates data rather than physical objects. It accepts data from
its environment, processes it, and produces the output required for decision making.
(c) A database is an essential component of an information system. It provides the long-term
memory of the information system. This long-term memory is organized in terms of entities and
relationships.
(d) An information system comprises the database together with people, places, procedures,
input data, output data, software, hardware, etc.
Normally, developing a system means creating, updating, and maintaining that system.
Likewise, the information system development process is the procedure through which the whole
information system is created, updated, and maintained. The development process consists of
several phases through which the system is developed. It may include some backtracking, which
helps to correct errors discovered during the development process.
Information System Development Life Cycle
A life cycle is the diagrammatic representation of the development process. It is easy for
both the customer and the programmer to understand. The information system
development life cycle consists of the following phases:
It is the first phase of the life cycle. It produces the problem statement and the feasibility
study. The problem statement includes the objectives, constraints, and scope of the system.
The feasibility study identifies the costs and benefits of the system. If the system is
feasible, then the system analysis phase will start.
The system analysis phase is the second phase in the life cycle model. It comes after the
completion of the feasibility study. This phase produces requirements describing processes,
data, and environment interactions. Diagrammatic techniques are used to document these
processes, data, and environment interactions. To produce the requirements, the current
system is studied.
The system design phase is the third phase in the information system development life cycle.
It comes after the system analysis phase. This phase produces the plans to implement the
requirements efficiently. Design specifications are created for processes, data, and
environment interactions. The design specifications focus on choices that optimize resources
under the given constraints.
Goals of Database Development
Normally, the goal of database development is to create a database that contains high-quality
data and provides the organisation with the access it needs. Besides this, there are certain
other goals of database development. These are:
To produce an operational database, we need to define the external, conceptual, and internal
schemas and supply the data to the database. To create these schemas, we follow a series of
phases. The first two phases are concerned with the information content of the database,
whereas the last two phases are concerned with efficient implementation. The phases are
listed below.
(a) Conceptual data modelling
A database designer needs the following two types of skills in database development.
(a) Soft skills: These skills are qualitative, subjective, and people-oriented. Qualitative
skills emphasize the generation of feasible alternatives.
(b) Hard skills: These skills are quantitative, objective, and data-oriented. The relevant
quantitative disciplines include statistics, operational management, and mathematical
modelling.
ER Model - Basic Concepts
The ER model defines the conceptual view of a database. It works around real-world entities
and the associations among them. At view level, the ER model is considered a good option
for designing databases.
Entity
An entity is a real-world object, either animate or inanimate, that is easily
identifiable. For example, in a school database, students, teachers, classes, and courses
offered can be considered as entities. All these entities have some attributes or properties
that give them their identity.
An entity set is a collection of similar types of entities. An entity set may contain entities
whose attributes share similar values. For example, a Students set may contain all the
students of a school; likewise, a Teachers set may contain all the teachers of a school from
all faculties. Entity sets need not be disjoint.
Attributes
Entities are represented by means of their properties, called attributes. All attributes have
values. For example, a student entity may have name, class, and age as attributes.
There exists a domain or range of values that can be assigned to attributes. For example, a
student's name cannot be a numeric value. It has to be alphabetic. A student's age cannot be
negative, etc.
Types of Attributes
Simple attribute − Simple attributes are atomic values, which cannot be divided
further. For example, a student's phone number is an atomic value of 10 digits.
Composite attribute − Composite attributes are made of more than one simple
attribute. For example, a student's complete name may have first_name and last_name.
Derived attribute − Derived attributes are attributes that do not exist in the
physical database; their values are derived from other attributes present in the
database. For example, average_salary in a department should not be saved directly in
the database; instead, it can be derived. As another example, age can be derived from
date_of_birth.
Single-value attribute − Single-value attributes contain a single value. For example −
Social_Security_Number.
Multi-value attribute − Multi-value attributes may contain more than one value. For
example, a person can have more than one phone number, email_address, etc.
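The attribute types above can be sketched as a small Python data model. This is only an
illustration; the class names, field names, and sample values are assumptions, and a real
design would map these attributes to tables rather than to classes.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List


@dataclass
class FullName:
    """Composite attribute: made of two simple attributes."""
    first_name: str
    last_name: str


@dataclass
class Student:
    roll_no: int                # simple, single-value attribute
    name: FullName              # composite attribute
    date_of_birth: date         # stored attribute
    phone_numbers: List[str] = field(default_factory=list)  # multi-value attribute

    def age_on(self, today: date) -> int:
        """Derived attribute: computed from date_of_birth, never stored."""
        had_birthday = (today.month, today.day) >= (
            self.date_of_birth.month,
            self.date_of_birth.day,
        )
        return today.year - self.date_of_birth.year - (0 if had_birthday else 1)


student = Student(101, FullName("Asha", "Rao"), date(2005, 3, 14),
                  ["5550100", "5550101"])
print(student.age_on(date(2024, 3, 13)), student.age_on(date(2024, 3, 14)))
```

Storing age directly would make it wrong after the next birthday; deriving it from
date_of_birth keeps it correct without any update.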
These attribute types can come together in a way like −
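Degree of a relationship: the number of entity sets participating in a relationship defines
its degree.
Binary = degree 2
Ternary = degree 3
n-ary = degree n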
Mapping Cardinalities
Cardinality defines the number of entities in one entity set that can be associated with
entities of the other set via a relationship set.
One-to-one − One entity from entity set A can be associated with at most one entity of
entity set B and vice versa.
One-to-many − One entity from entity set A can be associated with more than one
entity of entity set B; however, an entity from entity set B can be associated with at
most one entity of entity set A.
Many-to-one − More than one entity from entity set A can be associated with at most
one entity of entity set B; however, an entity from entity set B can be associated with
more than one entity from entity set A.
Many-to-many − One entity from A can be associated with more than one entity from
B and vice versa.
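The four mapping cardinalities can be checked mechanically for a given set of associations.
The function below is a minimal sketch: it takes a relationship set as (a, b) pairs and
reports which pattern the pairs exhibit. The function name and sample data are assumptions.
Note that it describes the data it is given, whereas a cardinality in an ER model is a rule
about what the data is allowed to be.

```python
from collections import defaultdict


def classify_cardinality(pairs):
    """Classify a relationship set given as (a, b) pairs, A on the left, B on the right."""
    b_per_a = defaultdict(set)  # entities of B associated with each entity of A
    a_per_b = defaultdict(set)  # entities of A associated with each entity of B
    for a, b in pairs:
        b_per_a[a].add(b)
        a_per_b[b].add(a)

    some_a_has_many_b = any(len(bs) > 1 for bs in b_per_a.values())
    some_b_has_many_a = any(len(as_) > 1 for as_ in a_per_b.values())

    if some_a_has_many_b and some_b_has_many_a:
        return "many-to-many"
    if some_a_has_many_b:
        return "one-to-many"
    if some_b_has_many_a:
        return "many-to-one"
    return "one-to-one"


heads = [("head1", "dept1"), ("head2", "dept2")]
employs = [("dept1", "emp1"), ("dept1", "emp2")]
works_in = [("emp1", "dept1"), ("emp2", "dept1")]
enrols = [("stu1", "course1"), ("stu1", "course2"), ("stu2", "course1")]

for relationship in (heads, employs, works_in, enrols):
    print(classify_cardinality(relationship))
```

The same associations read in the opposite direction swap one-to-many and many-to-one, as
the employs and works_in examples show.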
ER Diagram Representation
Let us now learn how the ER Model is represented by means of an ER diagram. Any object,
for example, entities, attributes of an entity, relationship sets, and attributes of relationship
sets, can be represented with the help of an ER diagram.
Entity
Entities are represented by means of rectangles. Rectangles are named with the entity set they
represent.
Attributes
Attributes are the properties of entities. Attributes are represented by means of ellipses. Every
ellipse represents one attribute and is directly connected to its entity (rectangle).
If an attribute is composite, it is further divided in a tree-like structure. Every node is
then connected to its attribute. That is, the component attributes are represented by
ellipses that are connected to the ellipse of the composite attribute.
Multivalued attributes are depicted by a double ellipse.
Relationship
Relationships are represented by a diamond-shaped box. The name of the relationship is written
inside the diamond. All the entities (rectangles) participating in a relationship are
connected to it by a line.
Binary Relationship and Cardinality
A relationship in which two entities participate is called a binary relationship. Cardinality
is the number of instances of an entity that can be associated with the relationship.
One-to-one − When only one instance of each entity is associated with the relationship,
it is marked as '1:1'. The following image reflects that only one instance of each entity
should be associated with the relationship. It depicts a one-to-one relationship.
Many-to-one − When more than one instance of an entity is associated with the
relationship, it is marked as 'N:1'. The following image reflects that more than one
instance of the entity on the left and only one instance of the entity on the right can be
associated with the relationship. It depicts a many-to-one relationship.
Many-to-many − The following image reflects that more than one instance of the entity
on the left and more than one instance of the entity on the right can be associated with
the relationship. It depicts a many-to-many relationship.
Participation Constraints
Total Participation − Each entity is involved in the relationship. Total participation is
represented by double lines.
Partial participation − Not all entities are involved in the relationship. Partial
participation is represented by single lines.
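The difference between total and partial participation can be checked with a few lines of
Python. This is a sketch under assumed names: the entity sets, the relationships works_in
and manages, and the helper participation are hypothetical.

```python
def participation(entity_set, pairs, position):
    """Return 'total' if every entity takes part in the relationship, else 'partial'.

    position is 0 for entities on the left of each pair and 1 for the right.
    """
    participating = {pair[position] for pair in pairs}
    return "total" if set(entity_set) <= participating else "partial"


employees = {"e1", "e2", "e3"}
departments = {"d1", "d2"}

works_in = [("e1", "d1"), ("e2", "d1"), ("e3", "d2")]  # every employee has a department
manages = [("e1", "d1")]                               # only one employee is a manager

print(participation(employees, works_in, 0))
print(participation(employees, manages, 0))
print(participation(departments, manages, 1))
```

In an ER diagram, the employee side of works_in would be drawn with a double line, while
both sides of manages would be drawn with single lines.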