1 Unit
A database is a collection of data, typically describing the activities of one or more related
organizations. For example, a university database might contain information about the following:
DBMS
A HISTORICAL PERSPECTIVE
The first general-purpose DBMS, designed by Charles Bachman at General Electric in the early
1960s, was called the Integrated Data Store.
In the late 1960s, IBM developed the Information Management System (IMS) DBMS, used even
today in many major installations.
In 1970, Edgar Codd, at IBM's San Jose Research Laboratory, proposed a new data
representation framework called the relational data model.
In the 1980s, the relational model consolidated its position as the dominant DBMS paradigm,
and database systems continued to gain widespread use.
SQL was standardized in the late 1980s and remains the standard query language for relational DBMSs.
APPLICATIONS:
◦ Sales: For customer, product, and purchase information.
◦ Accounting: For payments, receipts, account balances, assets, and other accounting
information.
◦ Human resources: For information about employees, salaries, payroll taxes, and benefits, and
for generation of paychecks.
◦ Manufacturing: For management of the supply chain and for tracking production of items in
factories, inventories of items in warehouses and stores, and orders for items.
◦ Online retailers: For sales data noted above plus online order tracking, generation of
recommendation lists, and maintenance of online product evaluations.
◦ Credit card transactions: For purchases on credit cards and generation of monthly statements.
◦ Finance: For storing information about holdings, sales, and purchases of financial instruments
such as stocks and bonds; also for storing real-time market data to enable online trading by
customers and automated trading by the firm.
• Universities: For student information, course registrations, and grades (in addition to
the enterprise information listed above).
• Airlines: For reservations and schedule information. Airlines were among the first to use
databases in a geographically distributed manner.
• Telecommunication: For maintaining balances on prepaid calling cards and storing information
about the communication networks.
Advantages of DBMS:
Controlling of Redundancy: Data redundancy refers to the duplication of data (i.e., storing the
same data multiple times). In a database system, a centralized database under the centralized
control of the DBA avoids unnecessary duplication of data. This saves storage space and
eliminates the extra time needed to process the duplicated data.
Improved Data Sharing : A DBMS allows the same data to be shared by any number of users and
application programs.
Data Integrity : Integrity means that the data in the database is accurate. Centralized control of
the data lets the administrator define integrity constraints on the data in the database. For
example, in a customer database we can enforce an integrity constraint that accepts customers
only from the cities of Noida and Meerut.
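The city rule above can be sketched as a declarative constraint. This is a minimal illustration, assuming Python's built-in sqlite3 module as a stand-in DBMS; the customer table, its columns, and the add_customer helper are hypothetical names chosen for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical customer table; the CHECK clause is the integrity constraint.
conn.execute(
    """
    CREATE TABLE customer (
        id   INTEGER PRIMARY KEY,
        name TEXT NOT NULL,
        city TEXT NOT NULL CHECK (city IN ('Noida', 'Meerut'))
    )
    """
)


def add_customer(name, city):
    """Return True if the DBMS accepted the row, False if it rejected it."""
    try:
        with conn:  # commit on success, roll back on error
            conn.execute(
                "INSERT INTO customer (name, city) VALUES (?, ?)", (name, city)
            )
        return True
    except sqlite3.IntegrityError:
        return False


accepted = add_customer("Asha", "Noida")   # satisfies the constraint
rejected = add_customer("Ravi", "Delhi")   # violates the constraint
stored = conn.execute("SELECT name, city FROM customer").fetchall()
```

Because the rule lives in the schema rather than in each application program, every program that inserts customers is checked in the same way.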
Security : Having complete authority over the operational data enables the DBA to ensure
that the only means of access to the database is through proper channels. The DBA can define
authorization checks to be carried out whenever access to sensitive data is attempted.
Data Consistency : By eliminating data redundancy, we greatly reduce the opportunities for
inconsistency. For example, if a customer address is stored only once, we cannot have
disagreement on the stored values. Updating data values is also greatly simplified when each
value is stored in one place only. Finally, we avoid the wasted storage that results from
redundant data storage.
Efficient Data Access : In a database system, the data is managed by the DBMS, and all access
to the data goes through the DBMS, which is a key to effective data processing.
Data Independence : In a database system, the database management system provides the
interface between the application programs and the data. When changes are made to the data
representation, the metadata maintained by the DBMS is changed, but the DBMS continues to
provide the data to application programs in the previously used way. The DBMS handles the task
of transforming the data wherever necessary.
Data Administration: When several users share the data, centralizing the administration of
data can offer significant improvements. Experienced professionals who understand the nature
of the data being managed, and how different groups of users use it, can be responsible for
organizing the data representation to minimize redundancy and for fine-tuning the storage of the
data to make retrieval efficient.
Concurrent Access and Crash Recovery: A DBMS schedules concurrent accesses to the data
in such a manner that users can think of the data as being accessed by only one user at a time.
Further, the DBMS protects users from the effects of system failures.
Disadvantages of DBMS
1) It is complex. Because a DBMS supports many functions, the underlying software is complex.
Designers and developers need thorough knowledge of the software to get the most out of it.
2) Because of its complexity and functionality, a DBMS needs a large amount of memory to run
efficiently.
3) A DBMS is typically a centralized system: all users, wherever they are, access the same
database. Hence any failure of the DBMS affects all the users.
4) A DBMS is general-purpose software: it is written to work for all kinds of systems rather than
for one specific application. Hence some applications may run slowly.
DATA MODELS:
A Database model defines the logical design and structure of a database and defines how data
will be stored, accessed and updated in a database management system.
Relational model
The most common model, the relational model organizes data into tables, also known as relations,
each of which consists of columns and rows. Each column lists an attribute of the entity, and
each row holds the attribute values of one entity.
Hierarchical Model
This model organizes data into a tree-like-structure, with a single root, to which all the other
data is linked. The hierarchy starts from the Root data, and expands like a tree, adding child
nodes to the parent nodes.
Network Model
This is an extension of the hierarchical model. In this model, data is organized in the form of a
graph, and a child node is allowed to have more than one parent node. Because more
relationships can be established between records, the data is more closely related, and related
data can often be reached more directly than in a strict hierarchy.
Entity-Relationship Model
Entity-Relationship (ER) Model is based on the notion of real-world entities and relationships
among them.
ER Model is based on −
Entity − An entity in an ER model is a real-world object that has properties
called attributes. Every attribute is defined by its set of permitted values, called its domain.
For example, in a school database, a student is considered an entity. A student has various
attributes such as name, age, and id.
LEVELS OF ABSTRACTION
• Physical level (or Internal View / Schema): The lowest level of abstraction describes how
the data are actually stored. The physical level describes complex low-level data structures. It
explains various types of stored records, what indexes exist, and so on.
• Logical level (or Conceptual View / Schema): The next-higher level of abstraction describes
what data are stored in the database, and what relationships exist among those data. The logical
level thus describes the entire database in terms of a small number of relatively simple structures.
This level of abstraction is used by the database administrator (DBA).
• View level (or External View / Schema): The highest level of abstraction describes only the part
of the database requested by the user, not the entire database. Many users of the
database system do not need all the information in the database; instead, they need to access
only a part of it. The view level of abstraction exists to simplify their interaction with the
system. The system may provide many views for the same database. Any given database has
exactly one conceptual schema and one physical schema, but it can have more than one external
schema. Each external schema defines the records that its users are permitted to see and retrieve.
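As a rough illustration of the view level, the sketch below defines one external view over a base table. It assumes Python's built-in sqlite3 module; the student table, its columns, and the view name student_directory are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Logical level: the full (hypothetical) student relation.
conn.execute(
    "CREATE TABLE student ("
    " roll_no INTEGER PRIMARY KEY, name TEXT, dob TEXT, fee_due INTEGER)"
)
conn.execute("INSERT INTO student VALUES (1, 'Asha', '2004-05-17', 1200)")

# View level: one external schema that exposes only part of the data.
conn.execute(
    "CREATE VIEW student_directory AS SELECT roll_no, name FROM student"
)

cursor = conn.execute("SELECT * FROM student_directory")
directory_columns = [col[0] for col in cursor.description]
directory_rows = cursor.fetchall()
```

A user who queries student_directory sees roll numbers and names only; the date of birth and fee columns stay hidden behind the view.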
DATA INDEPENDENCE
Data independence is the property of a DBMS that lets you change the database schema at one
level of a database system without having to change the schema at the next higher level.
o Physical data independence is the capacity to change the internal schema without having to
change the conceptual schema. It separates the conceptual level from the internal level.
o For example, if we change the storage size of the database system server, the conceptual
structure of the database is not affected.
o Logical data independence is the capacity to change the conceptual schema without having to
change the external schemas. It separates the external level from the conceptual level.
o For example, if we change the conceptual view of the data, the user view of the data is not
affected.
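Both kinds of independence can be sketched with a small experiment, assuming Python's built-in sqlite3 module. The table, index, and view names are hypothetical. Adding an index stands in for a physical-level change, and adding a column stands in for a conceptual-level change.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE student (roll_no INTEGER PRIMARY KEY, name TEXT, city TEXT)"
)
conn.executemany(
    "INSERT INTO student VALUES (?, ?, ?)",
    [(1, "Asha", "Noida"), (2, "Ravi", "Meerut"), (3, "Meena", "Noida")],
)

# Physical data independence: an index changes how data is stored and
# accessed, but the query text and its answer stay the same.
query = "SELECT name FROM student WHERE city = ? ORDER BY roll_no"
before_index = conn.execute(query, ("Noida",)).fetchall()
conn.execute("CREATE INDEX idx_student_city ON student (city)")
after_index = conn.execute(query, ("Noida",)).fetchall()

# Logical data independence: the conceptual schema gains a column,
# but users of the external view see exactly what they saw before.
conn.execute("CREATE VIEW student_names AS SELECT roll_no, name FROM student")
view_query = "SELECT * FROM student_names ORDER BY roll_no"
view_before = conn.execute(view_query).fetchall()
conn.execute("ALTER TABLE student ADD COLUMN email TEXT")
view_after = conn.execute(view_query).fetchall()
```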
STRUCTURE OF DBMS:
Query Processor:
The query processor takes commands from the users, generates and evaluates execution plans,
executes the chosen plan, and returns the answer.
DML Compiler : It translates DML statements in a query language into low-level
instructions that the query evaluation engine understands. It also attempts to transform the
user's request into an equivalent but more efficient form.
Embedded DML Pre-compiler : It converts DML statements embedded in an
application program to normal procedure calls in the host language. The Pre-compiler
must interact with the DML compiler to generate the appropriate code.
DDL Interpreter : It interprets the DDL statements and records them in a set of tables
containing metadata, known as the data dictionary.
Query Evaluation Engine : It executes low-level instructions generated by the DML
compiler.
Application program object code: These are the low-level instructions of the programs
written by the users, which the query evaluation engine understands and executes.
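To get a feel for the plan that a query processor chooses, the sketch below asks SQLite for its plan before running the query. It assumes Python's built-in sqlite3 module; EXPLAIN QUERY PLAN is an SQLite-specific statement, its output format differs between versions, and other DBMSs expose plans through their own commands. The employee table and index are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (ssn TEXT PRIMARY KEY, name TEXT, dept INTEGER)")
conn.execute("CREATE INDEX idx_employee_dept ON employee (dept)")
conn.executemany(
    "INSERT INTO employee VALUES (?, ?, ?)",
    [("A1", "Asha", 51), ("B2", "Ravi", 56), ("C3", "Meena", 51)],
)

query = "SELECT name FROM employee WHERE dept = ?"

# Ask the query processor which plan it would use for this query.
plan = conn.execute("EXPLAIN QUERY PLAN " + query, (51,)).fetchall()
for step in plan:
    print(step[-1])  # human-readable description of one plan step

# Execute the query itself; the engine runs the chosen plan.
names = sorted(row[0] for row in conn.execute(query, (51,)))
```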
Storage Manager
A storage manager is a program module that provides the interface between the low level data
stored in the database and the application programs and queries submitted to the system. The
storage manager is responsible for the interaction with the file manager. The storage manager
components include:
Authorization and integrity manager, which tests for the satisfaction of integrity
constraints and checks the authority of users to access data.
Transaction manager, which ensures that the database remains in a consistent (correct)
state despite system failures, and that concurrent transaction executions proceed without
conflicting.
File manager, which manages the allocation of space on disk storage and the data
structures used to represent information stored on disk.
Buffer manager, which is responsible for fetching data from disk storage into main
memory, and deciding what data to cache in main memory. The buffer manager is a
critical part of the database system, since it enables the database to handle data sizes that
are much larger than the size of main memory.
Lock Manager: Keeps track of requests for locks and grants locks on data.
Recovery manager: Responsible for maintaining a log and restoring system to
consistent state after a crash.
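The role of the transaction manager can be sketched with a funds transfer that must happen completely or not at all. This is a minimal illustration, assuming Python's built-in sqlite3 module; the account table and the transfer and balances helpers are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account ("
    " id INTEGER PRIMARY KEY,"
    " balance INTEGER NOT NULL CHECK (balance >= 0))"
)
with conn:
    conn.executemany("INSERT INTO account VALUES (?, ?)", [(1, 100), (2, 50)])


def transfer(src, dst, amount):
    """Move amount from src to dst; both updates happen or neither does."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE id = ?",
                (amount, dst),
            )
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE id = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        return False


def balances():
    return conn.execute("SELECT id, balance FROM account ORDER BY id").fetchall()
```

If the debit would make the source balance negative, the credit that was already applied is undone as well, so the database never shows a half-finished transfer.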
The following data structures are required as part of the physical system implementation.
• Data Dictionary : It stores meta data (data about data) about the structure of the database.
• Indices : Provide fast access to data items that hold particular values.
• Statistical Data : It stores statistical information about the data in the database. This
information is used by query processor to select efficient ways to execute query.
Database users are the people who interact with the system. There are four types of users,
distinguished by the way they interact with the system, and a different kind of user interface is
designed for each type. The four types of users are:
1. Naive users
2. Application Programmers
3. Sophisticated Users
4. Specialized users
1) Naive Users
Naive users, also called unsophisticated users, interact with the system by invoking one of the
application programs that have been written previously.
For example, to transfer funds from one account to another, a naive user invokes an application
program called transfer.
The user interface required for naive users is a forms interface, in which the user fills in
the required fields.
Naive users can also easily read the reports that are generated from the database.
2) Application programmers
Application programmers are the people who write application programs. They can choose from
many tools to develop user interfaces. Rapid Application Development (RAD) tools let them
construct forms and reports with minimal programming effort.
Some special programming languages combine control structures such as for loops and while
loops with the statements of the data manipulation language. These languages are called
fourth-generation languages, and they often include special features to help generate forms and
display data on the screen.
3) Sophisticated users
Sophisticated users interact with the system without writing programs. Instead, they form their
requests in a database query language.
Sophisticated users submit their queries to a query processor. The query processor breaks the
DML statements down into instructions that the storage manager understands.
Analysts are one among the sophisticated users. They use the tools to perform their task such as:
4) Specialized users
Specialized users are sophisticated users who write specialized database applications that do
not fit into the traditional data-processing framework.
Such applications include computer-aided design systems, knowledge-base systems, systems that
store complex data types, and environment-modeling systems.
FUNCTIONS OF DBA
Functions of a DBA include
Schema definition: The DBA creates the original database schema by executing a set
of data definition statements in the DDL.
Storage structure and access-method definition: The DBA creates storage structures and access
methods by writing a set of definitions, which are translated by the DDL compiler.
Schema and physical-organization modification: The DBA carries out changes to the
schema and physical organization to reflect the changing needs of the organization, or
to alter the physical organization to improve performance.
Granting of authorization for data access: By granting different types of
authorization, the database administrator can regulate which parts of the database various
users can access. The authorization information is kept in a special system structure that
the database system consults whenever someone attempts to access the data in the
system.
Integrity Constraints: The data stored in the database must satisfy the
constraints specified by the DBA.
Routine maintenance. Examples of the database administrator’s routine maintenance
activities are:
1. Periodically backing up the database, either onto tapes or onto remote servers, to
prevent loss of data in case of disasters.
2. Ensuring that enough free disk space is available for normal operations, and
upgrading disk space as required.
3. Monitoring jobs running on the database and ensuring that performance is not degraded
by very expensive tasks submitted by some users.
DATABASE DESIGN
The database design process can be divided into six steps. The ER model is most relevant to the
first three steps.
3. Logical Database Design: The conceptual database design is converted into a data
model. We will consider only relational DBMSs, and therefore, the task in the logical
design step is to convert an ER schema into a relational database schema.
Beyond ER Design
4. Schema Refinement: The fourth step in database design is to analyze the collection of
relations in our relational database schema to identify potential problems, and to refine it.
5. Physical Database Design: This step may simply involve building indexes on some
tables and clustering some tables, or it may involve a substantial redesign of parts of the
database schema obtained from the earlier design steps.
6. Application and Security Design: Any software project that involves a DBMS must
consider the applications that use it, which involves describing the role of each entity in
those applications. For security, we must identify the parts of the database that must be
accessible and the parts of the database that must not be accessible, and we must take steps
to ensure that these access rules are enforced.
An entity is an object or a thing that is distinguishable from other objects. For example, in
a college database, the entities can be Professor, Student, Course, etc.
Types of Entities:
Strong Entity
A strong entity has a primary key. Its existence does not depend on any other entity, and weak
entities depend on a strong entity.
Strong Entity is represented by a single rectangle:
Weak Entity
A weak entity does not have a primary key of its own and depends on a parent (owner) entity
for its identification.
Weak Entity is represented by double rectangle:
Attribute(s)
Entities are represented by means of their properties, called attributes. For example, Roll_No,
Name, DOB, Age, Address, Mobile_No are the attributes of Student. In ER diagram, attribute is
represented by an oval.
There exists a domain or range of values that can be assigned to attributes. For example, a
student's name cannot be a numeric value. It has to be alphabetic. A student's age cannot be
negative, etc.
1. Key Attribute
An attribute that uniquely identifies each entity in the entity set is called a key attribute.
For example, Roll_No is unique for each student. In an ER diagram, a key attribute is
represented by an oval with the attribute name underlined.
2. Composite Attribute
An attribute that can be divided into smaller independent attributes is known as a composite
attribute. For example, the Address attribute of the Student entity type consists of Street,
City, State, and Country. In an ER diagram, a composite attribute is represented by an oval
connected to the ovals of its component attributes.
3. Single-Valued Attribute
An attribute that has only a single value for an entity is known as a single-valued attribute.
4. Multivalued Attribute
An attribute that can have multiple values for an entity is known as a multivalued attribute.
For example, Phone_No (a given student can have more than one). In an ER diagram, a
multivalued attribute is represented by a double oval.
5. Derived Attribute
An attribute that can be derived from other attributes is known as a derived attribute,
e.g., Age (can be derived from DOB). In an ER diagram, a derived attribute is represented by
a dashed oval.
6. Null-Valued-Attribute
An attribute that has no value for a particular entity is known as a null-valued attribute.
For example, assume Student is an entity and its attributes are Name, Age, Address, and
Phone_No. A student may have no phone number; in that case, Phone_No is a null-valued
attribute for that student.
7. Descriptive Attribute
An attribute used to describe a relationship is called a descriptive attribute.
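The attribute kinds listed above can be sketched in a few lines of Python. This is only an illustration of the concepts, not a database definition; the class layout, the email attribute, and the age_on method are assumptions made for the example.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List, Optional


@dataclass
class Address:
    """Composite attribute: made up of smaller independent attributes."""
    street: str
    city: str
    state: str
    country: str


@dataclass
class Student:
    roll_no: int                  # key attribute: unique for each student
    name: str                     # single-valued attribute
    dob: date
    address: Address              # composite attribute
    phone_nos: List[str] = field(default_factory=list)  # multivalued attribute
    email: Optional[str] = None   # hypothetical attribute that may be null

    def age_on(self, today: date) -> int:
        """Derived attribute: computed from dob instead of being stored."""
        had_birthday = (today.month, today.day) >= (self.dob.month, self.dob.day)
        return today.year - self.dob.year - (0 if had_birthday else 1)


student = Student(
    roll_no=1,
    name="Asha",
    dob=date(2004, 5, 17),
    address=Address("MG Road", "Noida", "Uttar Pradesh", "India"),
    phone_nos=["98100-00001", "98100-00002"],
)
```

Age is never stored here, so it cannot drift out of step with DOB, which is the usual reason for treating it as a derived attribute.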
ER Diagram
The ER (Entity-Relationship) model is a high-level conceptual data model. It is based on the
notion of real-world entities and the relationships between them.
An ER diagram is a visual representation of data that describes how the data items are related
to each other.
INSTANCE: Databases change over time as information is inserted and deleted. The collection
of information stored in the database at a particular moment is called an instance of the database.
Eg: Each variable in a program has a particular value at a given instant. The values of the
variables in a program at a point in time correspond to an instance of a database schema.
SCHEMA: The overall design of the database is called the database schema.
Database systems have several schemas, partitioned according to the levels of abstraction. The
physical schema describes the database design at the physical level, the logical schema describes
the database design at the logical level, and the external schemas describe the views in which
only a portion of the database is displayed.
$\{(e_1, e_2, \ldots, e_n) \mid e_1 \in E_1, e_2 \in E_2, \ldots, e_n \in E_n\}$ where $(e_1, e_2, \ldots, e_n)$ is a relationship.
Each n-tuple denotes a relationship involving n entities $e_1$ through $e_n$, where entity $e_i$ is in
entity set $E_i$.
We show the relationship set Works_In, in which each relationship indicates a department in
which an employee works.
Note: several relationship sets might involve the same entity sets. For example, we could also
have a Manages relationship set involving Employees and Departments.
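The idea that a relationship set is a set of tuples drawn from the participating entity sets can be sketched in plain Python. The employee identifiers, department numbers, and the helper is_relationship_set are hypothetical values and names chosen for the illustration.

```python
from itertools import product

# Entity sets (hypothetical instances).
employees = {"231-31-5368", "131-24-3650"}
departments = {51, 56, 60}

# Two different relationship sets over the same pair of entity sets.
works_in = {("231-31-5368", 51), ("231-31-5368", 56), ("131-24-3650", 51)}
manages = {("131-24-3650", 51)}


def is_relationship_set(relationships, *entity_sets):
    """True if every tuple takes its i-th component from the i-th entity set."""
    return set(relationships) <= set(product(*entity_sets))


valid_works_in = is_relationship_set(works_in, employees, departments)
valid_manages = is_relationship_set(manages, employees, departments)
invalid = is_relationship_set({("999-99-9999", 51)}, employees, departments)
```

Department 60 takes part in no tuple at all, which is allowed: a relationship set is any subset of the possible combinations.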
INSTANCE OF RELATIONSHIP
Eg:
MAPPING CARDINALITIES
4. many-to-many - an entity in A is related to any number of entities in B, and an
entity in B is related to any number of entities in A.
2. Binary Relationship
A binary relationship set is a relationship set in which two entity sets participate.
3. Ternary Relationship
A ternary relationship set is a relationship set in which three entity sets participate.
Example-
4. N-ary Relationship Set-
An n-ary relationship set is a relationship set in which n entity sets participate.
ROLE INDICATOR
If an entity set plays more than one role in a relationship set, role indicators describe the
different purposes it serves in the relationship.
1. Key Constraints
An employee can work in several departments, and a department can have several employees, as
illustrated in the Works_In instance. Employee 231-31-5368 has worked in Department 51 since
3/3/93 and in Department 56 since 2/2/92. Department 51 has two employees.
Now consider another relationship set called Manages between the Employees and Departments
entity sets such that each department has at most one manager, although a single employee is
allowed to manage more than one department. The restriction that each department has at most
one manager is an example of a key constraint.
Intuitively, the arrow states that given a Departments entity, we can uniquely determine the
Manages relationship in which it appears.
To indicate a key constraint on entity set E in relationship set R, we draw an arrow from E to R
(here, from Departments to Manages).
A relationship set like Manages is sometimes said to be one-to-many, to indicate that one
employee can be associated with many departments (in the capacity of a manager), whereas each
department can be associated with at most one employee as its manager. In contrast, the
Works_In relationship set, in which an employee is allowed to work in several departments and a
department is allowed to have several employees, is said to be many-to-many.
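One common way to carry these two cases into tables is to choose the primary key of the relationship table accordingly. The sketch below assumes Python's built-in sqlite3 module; the column names ssn, did, and since and the try_insert helper are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Many-to-many: the key is the (employee, department) pair.
conn.execute(
    "CREATE TABLE works_in (ssn TEXT, did INTEGER, since TEXT,"
    " PRIMARY KEY (ssn, did))"
)
# One-to-many with a key constraint on Departments: did alone is the key,
# so each department can appear in at most one Manages row.
conn.execute(
    "CREATE TABLE manages (did INTEGER PRIMARY KEY, ssn TEXT NOT NULL, since TEXT)"
)


def try_insert(sql, params):
    """Return True if the row was stored, False if a key was violated."""
    try:
        with conn:
            conn.execute(sql, params)
        return True
    except sqlite3.IntegrityError:
        return False


works_sql = "INSERT INTO works_in (ssn, did) VALUES (?, ?)"
manages_sql = "INSERT INTO manages (ssn, did) VALUES (?, ?)"

works_results = [
    try_insert(works_sql, ("E1", 51)),
    try_insert(works_sql, ("E1", 56)),   # same employee, second department
    try_insert(works_sql, ("E2", 51)),   # same department, second employee
]
manages_results = [
    try_insert(manages_sql, ("E1", 51)),
    try_insert(manages_sql, ("E1", 56)),  # one employee may manage two departments
    try_insert(manages_sql, ("E2", 51)),  # rejected: department 51 has a manager
]
```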
Each employee works in at most one department and at a single location. Note that each
department can be associated with several employees and locations and each location can be
associated with several departments and employees; however, each employee is associated with
a single department and location.
3. Participation Constraints
1. Total participation
2. Partial participation
1. Total Participation
It specifies that each entity in the entity set must compulsorily participate in at least one
relationship instance in that relationship set.
Total participation is represented using a double line between the entity set and relationship
set.
2. Partial Participation
It specifies that each entity in the entity set may or may not participate in a
relationship instance in that relationship set.
Partial participation is represented using a single line between the entity set and relationship
set.
A double line between the entity set “Student” and the relationship set “Enrolled In” signifies
total participation. It specifies that each student must be enrolled in at least one course.
A single line between the entity set “Course” and the relationship set “Enrolled In” signifies
partial participation. It specifies that there might exist some courses for which no enrollments
have been made.
4. WEAK ENTITIES
An entity type should have a key attribute that uniquely identifies each entity in the entity set,
but there exist some entity types for which a key attribute cannot be defined. These are called
weak entity types. Because weak entities do not have a primary key, they cannot be identified on
their own, so they depend on some other entity (known as the owner entity).
A weak entity is represented by a double rectangle. The relationship between a strong entity and
a weak entity is represented by a double diamond.
Eg: The existence of rooms is entirely dependent on the existence of a hotel. So room can be
seen as the weak entity of the hotel.
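The hotel and room example can be sketched as tables, assuming Python's built-in sqlite3 module. The column names are hypothetical, and using ON DELETE CASCADE to express existence dependence is one design choice among several.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled

# Strong (owner) entity: identified by its own key.
conn.execute("CREATE TABLE hotel (hotel_id INTEGER PRIMARY KEY, name TEXT NOT NULL)")

# Weak entity: room_no alone is not unique, so the key borrows hotel_id
# from the owner, and rooms are deleted together with their hotel.
conn.execute(
    """
    CREATE TABLE room (
        hotel_id INTEGER NOT NULL REFERENCES hotel (hotel_id) ON DELETE CASCADE,
        room_no  INTEGER NOT NULL,
        beds     INTEGER,
        PRIMARY KEY (hotel_id, room_no)
    )
    """
)

conn.executemany("INSERT INTO hotel VALUES (?, ?)", [(1, "Grand"), (2, "Plaza")])
conn.executemany(
    "INSERT INTO room VALUES (?, ?, ?)",
    [(1, 101, 2), (2, 101, 1), (2, 102, 2)],  # room 101 exists in both hotels
)

rooms_before = conn.execute("SELECT COUNT(*) FROM room").fetchone()[0]
conn.execute("DELETE FROM hotel WHERE hotel_id = 2")
rooms_after = conn.execute("SELECT hotel_id, room_no FROM room").fetchall()
```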
5. CLASS HIERARCHIES
Generalization
Generalization is the process of extracting common properties from a set of entities and creating
a generalized entity from them. It is a bottom-up approach in which two or more entities can be
generalized to a higher-level entity if they have some attributes in common. For example,
STUDENT and FACULTY can be generalized to a higher-level entity called PERSON. In this
case, common attributes like P_NAME and P_ADD become part of the higher-level entity (PERSON),
and specialized attributes like S_FEE become part of the specialized entity (STUDENT).
Specialization
In specialization, an entity is divided into sub-entities based on their characteristics. It is a
top-down approach in which a higher-level entity is specialized into two or more lower-level
entities. For example, the EMPLOYEE entity in an employee management system can be specialized
into DEVELOPER, TESTER, etc., as shown in the figure. In this case, common attributes like
E_NAME and E_SAL become part of the higher-level entity (EMPLOYEE).
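Class hierarchies in an ER model resemble inheritance in a programming language, which gives a compact way to picture them. The sketch below uses Python dataclasses and the PERSON, STUDENT, and FACULTY example from the generalization discussion; the f_salary attribute is a hypothetical addition.

```python
from dataclasses import dataclass, fields


@dataclass
class Person:
    """Higher-level entity holding the common attributes."""
    p_name: str
    p_add: str


@dataclass
class Student(Person):
    """Lower-level entity: inherits the common attributes, adds its own."""
    s_fee: int


@dataclass
class Faculty(Person):
    f_salary: int  # hypothetical specialized attribute


student = Student(p_name="Asha", p_add="Noida", s_fee=1200)
student_attributes = [f.name for f in fields(student)]
```

Read bottom-up, Person is obtained by generalizing Student and Faculty; read top-down, Student and Faculty are specializations of Person.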
6. Aggregation:
Aggregation allows us to indicate that a relationship set participates in another relationship set.
CONCEPTUAL DESIGN WITH ER
An alternative is to create an entity set called Addresses. This is useful when we have to record
more than one address for an employee.
Treating address as an entity better models a situation where one may want to keep extra
information about an address, such as city, pin, etc.
Consider an example of a bank loan.
If this loan is held by one customer and the customer is associated with one branch, then this is
satisfactory. But what if customers hold a loan jointly? Then two problems arise:
1. The date and id are stored multiple times, wasting storage.
2. Updates can leave the data inconsistent.
So in such cases it is better to model loan as an entity rather than as a relationship.
Ternary relationships are required when binary relationships are not sufficient to
accurately describe the semantics of an association among three entities.
Suppose there is a database for a company that contains the entities, PRODUCT, SUPPLIER,
and CUSTOMER. The usual relationships might be PRODUCT/ SUPPLIER where the company
buys products from a supplier — a normal binary relationship. The intersection attribute for
PRODUCT/SUPPLIER is wholesale_price (as shown in A). Now consider the CUSTOMER
entity, and that the customer buys products. If all customers pay the same price for a product,
regardless of supplier, then you have a simple binary relationship between CUSTOMER and
PRODUCT. For the CUSTOMER/ PRODUCT relationship, the intersection attribute is
retail_price (as shown in B).
A: A Binary Relationship of PRODUCT and SUPPLIER and an Intersection Attribute, wholesale_price
Now consider a different scenario. Suppose the customer buys products but the price depends
not only on the product, but also on the supplier. Suppose you needed a customerID, a
productID, and a supplierID to identify a price. Now you have an attribute that depends on three
things and hence you have a relationship between three entities (a ternary relationship) that will
have the intersection attribute, price. This situation is depicted in fig. C.
Fig:C- Represents the entities PRODUCT, SUPPLIER, and CUSTOMER, and a relationship,
buy, among all three entities, shown by a single relationship diamond attached to all three
entities.
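The difference between the binary and ternary cases can be checked mechanically. In the sketch below, prices are stored in a plain Python dictionary keyed by (customer, product, supplier); the identifiers, the prices, and the helper fits_binary are hypothetical.

```python
# Price keyed by all three entities, as in the ternary "buy" relationship.
buy_price = {
    ("C1", "P1", "S1"): 10.0,
    ("C1", "P1", "S2"): 11.5,  # same customer and product, different supplier
}

# A case where the supplier does not affect what the customer pays.
uniform_price = {
    ("C1", "P1", "S1"): 10.0,
    ("C1", "P1", "S2"): 10.0,
}


def fits_binary(ternary):
    """True if price is fully determined by (customer, product) alone."""
    seen = {}
    for (customer, product, _supplier), price in ternary.items():
        if seen.setdefault((customer, product), price) != price:
            return False
    return True


needs_ternary = not fits_binary(buy_price)
binary_is_enough = fits_binary(uniform_price)
```

When fits_binary returns False, a CUSTOMER/PRODUCT relationship with a single price attribute would have to drop one of the prices, which is why the ternary relationship is needed in that scenario.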
The choice between using aggregation or a ternary relationship is mainly determined by the
existence of a relationship that relates a relationship set to an entity set.
A project can be sponsored by any number of departments, a department can sponsor one or
more projects, and each sponsorship is monitored by one or more employees. Now consider the
constraint that each sponsorship (of a project by a department) be monitored by at most one
employee. We cannot express this constraint with a ternary relationship, whereas aggregation
lets us state it on the Monitors relationship.
If we don't need to record the until attribute of Monitors, then we might reasonably use a ternary
relationship,