DBMS and Data Model
KARTHICK SELVAM
What is data?

 Data is a collection of distinct small units of information. It can take a
variety of forms, such as text, numbers, media, or bytes, and it can be stored
on paper, in electronic memory, and so on.
 The word 'data' originates from the word 'datum', which means 'a single
piece of information'. 'Data' is the plural of 'datum'.
 In computing, data is information that has been translated into a form
that is efficient for movement and processing. Data is interchangeable.
What is database?

 A database is an organized collection of data, structured so that it can be easily
accessed and managed.
 You can organize data into tables, rows, and columns, and index it to make it easier
to find relevant information.
 Database designers build a database so that a single set of software programs
provides all users with access to the data.
 The main purpose of a database is to handle a large amount of information by
storing, retrieving, and managing data.
 Many dynamic websites on the World Wide Web are backed by databases. For example,
a site that checks the availability of rooms in a hotel is a dynamic website that
uses a database.
 There are many databases available, such as MySQL, Sybase, Oracle, MongoDB, Informix,
PostgreSQL, and SQL Server.
 Modern databases are managed by a database management system (DBMS).
 SQL, or Structured Query Language, is used to operate on the data stored in a
database. SQL is based on relational algebra and tuple relational calculus.
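As a minimal sketch of the ideas above (a table with rows and columns, an index, and an SQL query), the hotel example could look like this in Python with the standard sqlite3 module. The table name, column names, and sample rows are assumptions made up for illustration; they do not come from the slides.

```python
import sqlite3

# In-memory database; the "rooms" schema below is a hypothetical example.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE rooms (room_no INTEGER PRIMARY KEY, room_type TEXT, available INTEGER)"
)
# An index on the column we filter by makes lookups cheaper on large tables.
conn.execute("CREATE INDEX idx_rooms_available ON rooms (available)")
conn.executemany(
    "INSERT INTO rooms (room_no, room_type, available) VALUES (?, ?, ?)",
    [(101, "single", 1), (102, "double", 0), (103, "double", 1)],
)
conn.commit()


def available_rooms(connection, room_type):
    """Return the numbers of free rooms of the given type, lowest first."""
    rows = connection.execute(
        "SELECT room_no FROM rooms "
        "WHERE available = 1 AND room_type = ? ORDER BY room_no",
        (room_type,),
    ).fetchall()
    return [room_no for (room_no,) in rows]
```

Calling available_rooms(conn, "double") returns only room 103, because room 102 is marked as occupied.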
DBMS

 A Database Management System (DBMS) is software for storing and
retrieving users' data while applying appropriate security
measures. It consists of a group of programs that manipulate the
database. The DBMS accepts a request for data from an
application and instructs the operating system to provide the
specific data. In large systems, a DBMS helps users and other
third-party software to store and retrieve data.
 A DBMS allows users to create their own databases as per their
requirements. Used loosely, the term “DBMS” also covers the users of
the database and the application programs that work with it. The DBMS
provides an interface between the data and the software application.
History of DBMS

• 1960 - Charles Bachman designed the first DBMS.
• 1970 - E. F. Codd of IBM proposed the relational model. (IBM's Information
Management System, IMS, a hierarchical DBMS, dates from the late 1960s.)
• 1976 - Peter Chen coined and defined the entity-relationship model, also
known as the ER model.
• 1980 - The relational model becomes a widely accepted database model.
• 1985 - Object-oriented DBMSs are developed.
• 1990s - Object orientation is incorporated into relational DBMSs.
• 1992 - Microsoft ships Access, a personal DBMS that goes on to displace most
other personal DBMS products.
• 1995 - First Internet database applications.
• 1997 - XML is applied to database processing. Many vendors begin to
integrate XML into DBMS products.
Characteristics of DBMS

• Provides security and removes redundancy
• Self-describing nature of a database system
• Insulation between programs and data, and data abstraction
• Support for multiple views of the data
• Sharing of data and multiuser transaction processing
• A DBMS allows entities and the relations among them to form tables.
• It follows the ACID concept (Atomicity, Consistency, Isolation, and
Durability).
• A DBMS supports a multi-user environment that allows users to access
and manipulate data in parallel.
Users in a DBMS environment

Application Programmers: Application programmers write programs in various
programming languages to interact with databases.
Database Administrators: The database administrator (DBA) is responsible for
managing the entire DBMS.
End-Users: End users are the people who interact with the database management
system. They perform operations on the database such as retrieving, updating,
and deleting data.
Popular DBMS Software

• MySQL
• Microsoft Access
• Oracle
• PostgreSQL
• dBASE
• FoxPro
• SQLite
• IBM DB2
• LibreOffice Base
• MariaDB
• Microsoft SQL Server, etc.
Application of DBMS
Banking: For customer information, account activities, payments, deposits,
loans, etc.
Airlines: For reservations and schedule information.
Universities: For student information, course registrations, colleges, and grades.
Telecommunication: For keeping call records, monthly bills, maintaining balances,
etc.
Finance: For storing information about holdings, sales, and purchases of
financial instruments like stocks and bonds.
Sales: For storing customer, product, and sales information.
Manufacturing: For managing the supply chain and for tracking the production of
items and the status of inventories in warehouses.
HR Management: For information about employees, salaries, payroll, deductions,
generation of paychecks, etc.
Types of DBMS

• Hierarchical database
• Network database
• Relational database
• Object-Oriented database
 Hierarchical DBMS
 In the hierarchical database model, data is organized in a tree-like structure.
Data is stored hierarchically (top-down or bottom-up) and is represented using
parent-child relationships. A parent may have many children, but each child
has only one parent.
 Network Model
 The network database model allows each child to have multiple parents. It
helps you model more complex relationships, such as the many-to-many
relationship between orders and parts. In this model, entities are
organized in a graph that can be accessed through several paths.
 Relational Model
 The relational DBMS is the most widely used DBMS model because it is one of
the easiest to use. This model is based on normalizing data into the rows and
columns of tables. Relational data is stored in fixed structures and
manipulated using SQL.
 Object-Oriented Model
 In the object-oriented model, data is stored in the form of objects. The
structures, called classes, hold the data within them. The model defines a
database as a collection of objects that store both data member
values and operations.
Advantages of DBMS

• A DBMS offers a variety of techniques to store and retrieve data.
• A DBMS serves as an efficient handler to balance the needs of multiple
applications using the same data.
• Uniform administration procedures for data.
• Application programmers are not exposed to the details of data representation
and storage.
• A DBMS uses various powerful functions to store and retrieve data efficiently.
• Offers data integrity and security.
• The DBMS enforces integrity constraints to provide a high level of protection
against prohibited access to data.
• A DBMS schedules concurrent access to the data so that conflicting changes
to the same data are applied one at a time.
• Reduced application development time.
Disadvantages of DBMS

 A DBMS offers plenty of advantages, but it has certain flaws:
• The cost of the hardware and software for a DBMS is quite high, which
increases the budget of your organization.
• Most database management systems are complex, so users need
training to use the DBMS.
• In some organizations, all data is integrated into a single database,
which can be damaged by an electrical failure or by corruption
of the storage media.
• Simultaneous use of the same program by many users can sometimes lead
to the loss of some data.
• A DBMS is not well suited to sophisticated calculations.
When not to use a DBMS?

 Although a DBMS is useful, it is not suited to the situation
described below:
 It is not recommended when you do not have the budget or the
expertise to operate a DBMS. In such cases, Excel, CSV, or flat files
could do just fine.
DBMS Architecture

• 1-Tier
• 2-Tier
• 3-Tier


1-Tier Architecture

• In this architecture, the database is directly available to the user: the
user works directly on the DBMS.
• Any changes made here are applied directly to the database itself.
It doesn't provide a handy tool for end users.
• The 1-Tier architecture is used for developing local
applications, where programmers can communicate directly with
the database for a quick response.
2-Tier Architecture

• The 2-Tier architecture is the same as a basic client-server setup. In the
two-tier architecture, applications on the client end can communicate
directly with the database on the server side. APIs such as ODBC and JDBC
are used for this interaction.
• The user interfaces and application programs run on the client
side.
• The server side is responsible for providing functionality such as query
processing and transaction management.
• To communicate with the DBMS, the client-side application establishes a
connection with the server side.
3-Tier Architecture

• The 3-Tier architecture contains another layer between the client
and the server. In this architecture, the client cannot communicate
directly with the database server.
• The application on the client end interacts with an application
server, which in turn communicates with the database system.
• The end user has no idea about the existence of the database beyond
the application server. The database also has no idea about any
user beyond the application.
• The 3-Tier architecture is used for large web applications.
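The layering can be sketched in a few lines of Python. This is only an illustration under stated assumptions: all three tiers run in one process here, the class names ApplicationServer and Client are hypothetical, and an in-memory SQLite database stands in for the database server.

```python
import sqlite3


class ApplicationServer:
    """Middle tier: the only component that holds a database connection."""

    def __init__(self):
        # Data tier (stand-in): an in-memory SQLite database.
        self._db = sqlite3.connect(":memory:")
        self._db.execute("CREATE TABLE greeting (lang TEXT PRIMARY KEY, text TEXT)")
        self._db.execute("INSERT INTO greeting VALUES ('en', 'hello')")
        self._db.commit()

    def handle(self, request):
        """Translate a client request into a query and wrap the result."""
        row = self._db.execute(
            "SELECT text FROM greeting WHERE lang = ?", (request["lang"],)
        ).fetchone()
        if row is None:
            return {"status": 404, "body": None}
        return {"status": 200, "body": row[0]}


class Client:
    """Presentation tier: talks to the application server, never to the database."""

    def __init__(self, server):
        self._server = server

    def greet(self, lang):
        return self._server.handle({"lang": lang})["body"]


client = Client(ApplicationServer())
```

The client only ever sees request and response dictionaries, so the database could be replaced without changing the Client class.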
Types of Databases

 Centralized Database
 Distributed Database
 Relational Database
 NoSQL Database
 Cloud Database
 Object-oriented Databases
 Hierarchical Databases
 Network Databases
 Personal Database
 Operational Database
 Enterprise Database
Centralized Database

 A centralized database stores data in one central database system. It lets
users access the stored data from different locations through several
applications. These applications include an authentication process so that users
access data securely. An example is a central library that holds a central
database for every library in a college or university.
 Advantages of Centralized Database
• It reduces the risk in data management, i.e., manipulation of data will not affect the
core data.
• Data consistency is maintained because the data is managed in a central repository.
• It provides better data quality, which enables organizations to establish data standards.
• It is less costly because fewer vendors are required to handle the data sets.
 Disadvantages of Centralized Database
• A centralized database is large, which increases the response time for fetching
the data.
• It is not easy to update such an extensive database system.
• If the server fails, all of the data may be lost, which could be a huge loss.
Distributed Database

 Unlike in a centralized database system, in a distributed system the data is
distributed among different database systems of an organization.
These database systems are connected via communication links,
which help end users to access the data easily. Examples of
distributed databases are Apache Cassandra, HBase, Ignite, etc.
• Homogeneous DDB: Database systems that run on the same operating
system, use the same application process, and run on the same
hardware devices.
• Heterogeneous DDB: Database systems that run on different
operating systems, under different application procedures,
and on different hardware devices.
 Advantages of Distributed Database
• Modular development is possible in a distributed database, i.e., the
system can be expanded by adding new computers and connecting
them to the distributed system.
• One server failure will not affect the entire data set.
Relational Database

 This database is based on the relational data model, which stores
data in rows (tuples) and columns (attributes) that
together form a table (relation).
 A relational database uses SQL for storing, manipulating, and
maintaining the data. E. F. Codd proposed the relational model in 1970.
Each table in the database carries a key that makes each row
unique from the others.
 Examples of relational databases are MySQL, Microsoft SQL Server,
Oracle, etc.
 Properties of Relational Database
 Transactions in a relational database have four commonly known properties,
called the ACID properties, where:
 A means Atomicity: This ensures that a data operation either completes entirely
or has no effect at all. It follows the 'all or nothing' strategy. For example, a
transaction will either be committed or will abort.
 C means Consistency: Any operation performed on the data must take the database
from one valid state to another. For example, when money is transferred between
accounts, the total balance before and after the transaction should remain conserved.
 I means Isolation: Several users can access data from the database at the same
time, so concurrent transactions should remain isolated from one another. For
example, when multiple transactions occur at the same time, the effects of one
transaction should not be visible to the other transactions until it commits.
 D means Durability: It ensures that once an operation completes and commits the
data, the data changes remain permanent.
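The 'all or nothing' behaviour can be sketched with Python's sqlite3 module. This is an illustration only: the accounts table, the names alice and bob, and the CHECK constraint that forbids negative balances are assumptions made up for the example. The connection is used as a context manager, which commits on success and rolls back when an exception escapes the block.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema: the CHECK constraint is the consistency rule.
conn.execute(
    "CREATE TABLE accounts ("
    "name TEXT PRIMARY KEY, "
    "balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)])
conn.commit()


def transfer(connection, source, target, amount):
    """Move amount between accounts; True if committed, False if rolled back."""
    try:
        with connection:  # commit on success, rollback on exception
            connection.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, target),
            )
            # If this debit violates the CHECK constraint, the credit above
            # is rolled back together with it.
            connection.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, source),
            )
        return True
    except sqlite3.IntegrityError:
        return False


def balances(connection):
    """Return the current balances as a name -> balance dictionary."""
    return dict(connection.execute("SELECT name, balance FROM accounts"))
```

A transfer of 30 from alice to bob commits, while a transfer of 500 fails the constraint and leaves both balances exactly as they were, so the total is conserved either way.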
NoSQL Database

 NoSQL (Non-SQL or Not Only SQL) is a type of database used for storing
a wide range of data sets. It is not a relational database, as it stores
data not only in tabular form but in several other ways. It came
into existence as the demand for building modern applications
increased, and it now covers a wide variety of database
technologies developed in response to that demand. We can divide
NoSQL databases into the following four types:
1. Key-value storage: The simplest type of database storage, in which
every item is stored as a key (or attribute name) together with its
value.
2. Document-oriented database: A type of database used to store
data as JSON-like documents. It helps developers store data
using the same document-model format as used in the application
code.
3. Graph databases: Used for storing vast amounts of data in a
graph-like structure. Most commonly, social networking websites use
graph databases.
4. Wide-column stores: Similar to the data represented in relational
databases, except that data is stored together in large columns
instead of in rows.
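The four shapes of data can be pictured with plain Python structures. This is only a rough sketch of the data layouts, not of any particular NoSQL product; all keys, names, and values are made up for illustration.

```python
# 1. Key-value: each item is a key holding an opaque value.
key_value_store = {"session:42": "user=ana;cart=3"}

# 2. Document: a JSON-like nested record, close to how application code models it.
document = {"_id": 1, "name": "Ana", "orders": [{"sku": "A1", "qty": 2}]}

# 3. Graph: nodes plus edges; here an adjacency list of "follows" relationships.
graph = {"ana": ["ben", "cy"], "ben": ["cy"], "cy": []}

# 4. Wide-column: a row key maps to column families, each with its own columns.
wide_column = {
    "user:1": {
        "profile": {"name": "Ana"},
        "activity": {"last_login": "2024-01-01"},
    }
}


def followers_of(adjacency, person):
    """Return everyone with an edge pointing at person, in sorted order."""
    return sorted(src for src, targets in adjacency.items() if person in targets)
```

For example, followers_of(graph, "cy") walks the edges and returns ana and ben, the kind of relationship query that graph databases are built for.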
Advantages of NoSQL Database

• It enables good productivity in application development, as data does
not have to be stored in a structured format.
• It is a better option for managing and handling large data sets.
• It provides high scalability.
• Users can quickly access data from the database by key-value lookup.
Cloud Database

 A cloud database is a database whose data is stored in a virtual environment
and which runs on a cloud computing platform. It provides users
with various cloud computing services (SaaS, PaaS, IaaS, etc.) for
accessing the database. There are numerous cloud platforms; some
well-known options are:
• Amazon Web Services (AWS)
• Microsoft Azure
• Kamatera
• phoenixNAP
• ScienceSoft
• Google Cloud SQL, etc.
Object-oriented Databases

 An object-oriented database uses the object-based data model
approach for storing data in the database system. The data is
represented and stored as objects, similar to the objects
used in object-oriented programming languages.
Hierarchical Databases

 A hierarchical database stores data in the form of parent-child
relationship nodes, organizing the data in a tree-like
structure.
 Data is stored in the form of records that are connected via links.
Each child record in the tree has only one parent. On the
other hand, each parent record can have multiple child records.
Network Databases

 A network database follows the network data model.
Here, data is represented as nodes connected
via links. Unlike the hierarchical database, it allows
each record to have multiple child and parent nodes, forming a
generalized graph structure.
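The difference between the two models comes down to how many parents a record may have. A small sketch in Python, using made-up record names (the department tree and the order, customer, and part records are assumptions for illustration):

```python
# Hierarchical model: every child record stores exactly one parent.
hierarchy_parent = {
    "Sales": "Company",
    "Engineering": "Company",
    "Backend": "Engineering",
}

# Network model: a record may have several parents (owners).
network_parents = {
    "Order-1": ["Customer-A", "Part-X"],
    "Order-2": ["Customer-A", "Part-Y"],
}


def path_to_root(parent_of, record):
    """Follow single-parent links upward until a record with no parent is reached."""
    path = [record]
    while path[-1] in parent_of:
        path.append(parent_of[path[-1]])
    return path


def children_of(parents_of, owner):
    """Return every record that lists owner among its parents, in sorted order."""
    return sorted(child for child, owners in parents_of.items() if owner in owners)
```

In the hierarchy there is exactly one path from Backend up to Company, while in the network structure Order-1 can be reached both from Customer-A and from Part-X.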
Personal Database

 A personal database collects and stores data on the user's own
system. It is designed for a single user.
 Advantages of Personal Database
• It is simple and easy to handle.
• It occupies less storage space as it is small in size.
Operational Database

 An operational database creates and updates data in
real time. It is designed for executing and handling the
daily data operations of a business. For example, an
organization uses operational databases for managing its daily
transactions.
Enterprise Database

 Large organizations or enterprises use this database for managing a
massive amount of data. It helps organizations to increase and
improve their efficiency. Such a database allows simultaneous
access by many users.
 Advantages of Enterprise Database:
• Multiple processes are supported on an enterprise database.
• It allows parallel queries to be executed on the system.
What is Data Modelling?

 Data modeling in software engineering is the process of creating a
data model for an information system by applying certain formal
techniques.
 A data model describes the structure of the database.
 Most data models also include a set of basic operations for
specifying data retrieval and data updating.
 The two types of data modeling techniques are:
1. Entity Relationship (E-R) Model
2. UML (Unified Modelling Language)
Data modelling process

 The process of designing a database involves producing three types of
schemas: conceptual, logical, and physical. The database design documented in these
schemas is converted through a Data Definition Language, which can then be used to
generate a database. A fully attributed data model contains detailed attributes (descriptions) for
every entity within it.
 The term "database design" can describe many different parts of the design of an overall
database system. Principally, and most correctly, it can be thought of as the logical design of the
base data structures used to store the data.
 In the relational model these are the tables and views. In an object database the entities and
relationships map directly to object classes and named relationships. However, the term
"database design" could also be used to apply to the overall process of designing, not just the
base data structures, but also the forms and queries used as part of the overall database
application within the Database Management System or DBMS.
 In the process, system interfaces account for 25% to 70% of the development and support costs of
current systems. The primary reason for this cost is that these systems do not share a common
data model. If data models are developed on a system by system basis, then not only is the same
analysis repeated in overlapping areas, but further analysis must be performed to create the
interfaces between them. Most systems within an organization contain the same basic data,
redeveloped for a specific purpose. Therefore, an efficiently designed basic data model can
minimize rework with minimal modifications for the purposes of different systems within the
organization.
Why use Data Model?

 The primary goals of using a data model are:
• It ensures that all data objects required by the database are accurately
represented. Omission of data will lead to faulty reports and
incorrect results.
• A data model helps design the database at the conceptual, logical,
and physical levels.
• The data model structure helps to define the relational tables, primary and
foreign keys, and stored procedures.
• It provides a clear picture of the base data and can be used by
database developers to create a physical database.
• It is also helpful for identifying missing and redundant data.
• Though the initial creation of a data model costs labor and time,
in the long run it makes upgrading and maintaining your IT
infrastructure cheaper and faster.
Types of Data Models

 There are mainly three different types of data models:
1. Conceptual: This data model defines WHAT the system contains. It is
typically created by business stakeholders and data architects.
The purpose is to organize, scope, and define business concepts and
rules.
2. Logical: This data model defines HOW the system should be implemented
regardless of the DBMS. It is typically created by data architects and
business analysts. The purpose is to develop a technical map of rules
and data structures.
3. Physical: This data model describes HOW the system will be
implemented using a specific DBMS. It is typically
created by DBAs and developers. The purpose is the actual implementation
of the database.
Conceptual Model

 The main aim of this model is to establish the entities, their attributes,
and their relationships. At this data modeling level, there is hardly any
detail available about the actual database structure.
 The three basic tenets of a data model are:
 Entity: A real-world thing
 Attribute: Characteristics or properties of an entity
 Relationship: Dependency or association between two entities
 For example:
• Customer and Product are two entities. Customer number and name
are attributes of the Customer entity.
• Product name and price are attributes of the Product entity.
• Sale is the relationship between the customer and the product.
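The Customer, Product, and Sale example can be written down directly as Python dataclasses. This is a sketch of the concepts only, not of a database structure; the quantity field and the total method are assumptions added to make the relationship usable.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Customer:
    """Entity with the attributes named in the example."""
    number: int
    name: str


@dataclass(frozen=True)
class Product:
    """Entity with the attributes named in the example."""
    name: str
    price: float


@dataclass(frozen=True)
class Sale:
    """Relationship that associates one customer with one product."""
    customer: Customer
    product: Product
    quantity: int  # assumption: not part of the original example

    def total(self) -> float:
        return self.product.price * self.quantity


sale = Sale(Customer(1, "Ana"), Product("Notebook", 2.5), 4)
```

Here sale.total() evaluates to 10.0, and the Sale object is the only place where a customer and a product are connected.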
Characteristics of a conceptual data model
 Offers organization-wide coverage of the business concepts.
 This type of data model is designed and developed for a
business audience.
 The conceptual model is developed independently of hardware
specifications like data storage capacity or location, and of software
specifications like DBMS vendor and technology. The focus is to
represent data as a user will see it in the "real world."
 Conceptual data models, also known as domain models, create a
common vocabulary for all stakeholders by establishing basic
concepts and scope.
Logical Data Model

 Logical data models add further information to the conceptual
model elements. They define the structure of the data elements and
set the relationships between them.
 The advantage of the logical data model is that it provides a
foundation for the physical model. However, the
modeling structure remains generic.
 At this data modeling level, no primary or secondary key is defined.
You need to verify and adjust the
connector details that were set earlier for relationships.
Characteristics of Logical Data Model
 Describes the data needs of a single project but could integrate with
other logical data models, depending on the scope of the project.
 Designed and developed independently of the DBMS.
 Data attributes have datatypes with exact precision and
length.
 Normalization is typically applied to the model up to 3NF.
Physical Data Model

 A physical data model describes the database-specific
implementation of the data model. It offers an abstraction of the
database and helps generate the schema, thanks to the
richness of metadata that a physical data model offers.
 This type of data model also helps to visualize the database structure. It
helps to model database columns, keys, constraints, indexes,
triggers, and other RDBMS features.
Characteristics of a physical data model:
 The physical data model describes the data needs of a single project or
application, though it may be integrated with other physical data
models, depending on project scope.
 The data model contains relationships between tables, addressing
the cardinality and nullability of the relationships.
 It is developed for a specific version of a DBMS, location, data storage,
or technology to be used in the project.
 Columns should have exact datatypes, lengths, and
default values assigned.
 Primary and foreign keys, views, indexes, access profiles,
authorizations, etc. are defined.
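As a sketch of what these characteristics look like once written for one specific DBMS, here is the Customer, Product, and Sale model expressed as SQLite DDL and loaded through Python's sqlite3 module. The table names, column names, and default values are assumptions chosen for illustration; another DBMS would need different types and syntax.

```python
import sqlite3

# Hypothetical physical schema: keys, nullability, defaults, and an index.
DDL = """
CREATE TABLE customer (
    customer_no INTEGER PRIMARY KEY,
    name        TEXT NOT NULL
);
CREATE TABLE product (
    product_id INTEGER PRIMARY KEY,
    name       TEXT NOT NULL,
    price      NUMERIC NOT NULL DEFAULT 0
);
CREATE TABLE sale (
    sale_id     INTEGER PRIMARY KEY,
    customer_no INTEGER NOT NULL REFERENCES customer (customer_no),
    product_id  INTEGER NOT NULL REFERENCES product (product_id),
    quantity    INTEGER NOT NULL DEFAULT 1
);
CREATE INDEX idx_sale_customer ON sale (customer_no);
"""

conn = sqlite3.connect(":memory:")
# SQLite enforces foreign keys only when this pragma is enabled per connection.
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript(DDL)
conn.execute("INSERT INTO customer (customer_no, name) VALUES (1, 'Ana')")
conn.execute("INSERT INTO product (product_id, name) VALUES (10, 'Notebook')")
conn.execute("INSERT INTO sale (customer_no, product_id) VALUES (1, 10)")
conn.commit()


def try_insert_sale(connection, customer_no, product_id):
    """Insert a sale; return False if a key constraint rejects it."""
    try:
        with connection:
            connection.execute(
                "INSERT INTO sale (customer_no, product_id) VALUES (?, ?)",
                (customer_no, product_id),
            )
        return True
    except sqlite3.IntegrityError:
        return False
```

A sale that refers to a customer number that does not exist is rejected by the foreign key, and a sale inserted without a quantity picks up the default value of 1.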
Advantages of Data model

 The main goal of designing a data model is to make certain that the
data objects offered by the functional team are represented
accurately.
 The data model should be detailed enough to be used for building
the physical database.
 The information in the data model can be used for defining the
relationships between tables, primary and foreign keys, and stored
procedures.
 A data model helps the business to communicate within and across
organizations.
 A data model helps to document data mappings in the ETL process.
 It helps to recognize the correct sources of data to populate the model.
Disadvantages of Data model

 To develop a data model, one must know the characteristics of the
physically stored data.
 It is a navigational system, which makes application
development and management complex. Thus, it requires detailed
knowledge of how the data is actually stored.
 Even a small change made to the structure requires modification of the
entire application.
 There is no standard data manipulation language across DBMSs.
What is OLTP?

 OLTP (Online Transaction Processing) is an operational system that
supports transaction-oriented applications in a 3-tier architecture. It
administers the day-to-day transactions of an organization. OLTP focuses
on query processing and on maintaining data integrity in multi-access
environments, and its effectiveness is measured by the total number of
transactions per second.
Characteristics of OLTP

• OLTP uses transactions that involve small amounts of data.
• Indexed data in the database can be accessed easily.
• OLTP has a large number of users.
• It has fast response times.
• Databases are directly accessible to end users.
• OLTP uses a fully normalized schema for database consistency.
• It strictly performs only predefined operations on a small number of
records.
• OLTP stores the records of the last few days or a week.
• It supports complex data models and tables.
Types of queries that an OLTP system can process
 An OLTP system is an online database modifying system. Therefore, it
supports database operations such as inserting, updating, and deleting
information in the database.
 Consider the point-of-sale system of a supermarket. The following are
sample queries that this system can process:
 Retrieving the description of a particular product.
 Filtering all products related to a supplier.
 Searching for the record of a customer.
 Listing products with a price less than an expected amount.
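Three of these sample queries can be sketched against a small product table using Python's sqlite3 module. The schema, the supplier names, and the prices are hypothetical and exist only to make the queries concrete; a customer search would follow the same pattern on a customer table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical point-of-sale product table with made-up rows.
conn.executescript(
    """
    CREATE TABLE product (
        product_id  INTEGER PRIMARY KEY,
        description TEXT NOT NULL,
        supplier    TEXT NOT NULL,
        price       REAL NOT NULL
    );
    INSERT INTO product VALUES
        (1, 'Whole milk 1L', 'DairyCo',   1.20),
        (2, 'Butter 250g',   'DairyCo',   2.80),
        (3, 'Rye bread',     'BakeHouse', 3.10);
    """
)


def description_of(connection, product_id):
    """Retrieve the description of one product, or None if it does not exist."""
    row = connection.execute(
        "SELECT description FROM product WHERE product_id = ?", (product_id,)
    ).fetchone()
    return row[0] if row else None


def products_from(connection, supplier):
    """Filter all products related to one supplier."""
    rows = connection.execute(
        "SELECT description FROM product WHERE supplier = ? ORDER BY description",
        (supplier,),
    ).fetchall()
    return [description for (description,) in rows]


def cheaper_than(connection, limit):
    """List products whose price is below the given amount."""
    rows = connection.execute(
        "SELECT description FROM product WHERE price < ? ORDER BY description",
        (limit,),
    ).fetchall()
    return [description for (description,) in rows]
```

Each function touches only a handful of rows through a simple predicate, which is the typical shape of an OLTP query.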
Architecture of OLTP

 Business / Enterprise Strategy: Enterprise strategy deals with the issues that affect the
organization as a whole. In OLTP, it is typically developed at a high level within the
firm, by the board of directors or the top management.
 Business Process: An OLTP business process is a set of activities and tasks that, once
completed, will accomplish an organizational goal.
 Customers, Orders, and Products: OLTP databases store information about products,
orders (transactions), customers (buyers), suppliers (sellers), and employees.
 ETL Processes: ETL extracts the data from various RDBMS source systems, then
transforms the data (for example by applying concatenations and calculations) and loads
the processed data into the data warehouse system.
 Data Mart and Data Warehouse: A data mart is a structure/access pattern specific to
data warehouse environments. It is used by OLAP to store processed data.
 Data Mining, Analytics, and Decision Making: Data stored in the data mart and data
warehouse can be used for data mining, analytics, and decision making.
 This data helps you to discover data patterns, analyze raw data, and make analytical
decisions for your organization's growth.
Example of OLTP Transaction

 An example of an OLTP system is an ATM center. Assume that a couple has a
joint account with a bank. One day both reach different ATM
centers at precisely the same time and want to withdraw the total amount
present in their bank account.
 The person who completes the authentication process first will be able
to get the money. In this case, the OLTP system makes sure that the withdrawn
amount is never more than the amount present in the account. The key point to
note here is that OLTP systems are optimized for transaction processing rather
than data analysis.
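One common way to guarantee that a withdrawal never exceeds the balance is to check and debit in a single conditional UPDATE. The sketch below uses Python's sqlite3 module with a made-up account table; the two "simultaneous" withdrawals are simulated one after the other on a single connection, which is an assumption of this example rather than a model of a real ATM network.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical joint account holding 500 units.
conn.execute(
    "CREATE TABLE account (account_id INTEGER PRIMARY KEY, balance INTEGER NOT NULL)"
)
conn.execute("INSERT INTO account VALUES (1, 500)")
conn.commit()


def withdraw(connection, account_id, amount):
    """Debit the account only if the balance covers the amount."""
    with connection:  # one transaction per withdrawal
        cursor = connection.execute(
            "UPDATE account SET balance = balance - ? "
            "WHERE account_id = ? AND balance >= ?",
            (amount, account_id, amount),
        )
    # rowcount is 1 when the row matched the condition and was debited.
    return cursor.rowcount == 1


def balance_of(connection, account_id):
    """Return the current balance of the account."""
    return connection.execute(
        "SELECT balance FROM account WHERE account_id = ?", (account_id,)
    ).fetchone()[0]
```

The first withdraw(conn, 1, 500) succeeds and the second one fails, because the condition balance >= 500 no longer holds once the first transaction has committed.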
 Other examples of OLTP systems are:
 Online banking
 Online airline ticket booking
 Sending a text message
 Order entry
 Adding a book to a shopping cart
OLTP vs. OLAP

OLTP | OLAP
OLTP is an online transactional system. | OLAP is an online analysis and data retrieval process.
It is characterized by large numbers of short online transactions. | It is characterized by a large volume of data.
OLTP is an online database modifying system. | OLAP is an online database query management system.
OLTP uses a traditional DBMS. | OLAP uses a data warehouse.
Insert, update, and delete information in the database. | Mostly select operations.
OLTP and its transactions are the sources of data. | Different OLTP databases become the source of data for OLAP.
An OLTP database must maintain data integrity constraints. | An OLAP database is not modified frequently. Hence, data integrity is not an issue.
Its response time is in milliseconds. | Its response time is in seconds to minutes.
The data in an OLTP database is always detailed and organized. | The data in the OLAP process might not be organized.
Allows read/write operations. | Mostly reads, rarely writes.
It is a customer-oriented process. | It is a market-oriented process.
Queries in this process are standardized and simple. | Complex queries involving aggregations.
Complete backup of the data combined with incremental backups. | OLAP only needs a backup from time to time. Backup is less important than for OLTP.
DB design is application-oriented. Example: database design changes with the industry, like retail, airline, banking, etc. | DB design is subject-oriented. Example: database design changes with subjects like sales, marketing, purchasing, etc.
It is used by data-critical users like clerks, DBAs, and database professionals. | It is used by data knowledge users like workers, managers, and CEOs.
It is designed for real-time business operations. | It is designed for analysis of business measures by category and attributes.
Transaction throughput is the performance metric. | Query throughput is the performance metric.
This kind of database allows thousands of users. | This kind of database allows only hundreds of users.
It helps to increase users' self-service and productivity. | It helps to increase the productivity of business analysts.
Data warehouses historically have been a development project which may prove costly to build. | An OLAP cube is not an open SQL server data warehouse. Therefore, technical knowledge and experience are essential to managing the OLAP server.
It provides a fast result for daily used data. | It ensures that the response to a query is consistently quicker.
It is easy to create and maintain. | It lets the user create a view with the help of a spreadsheet.
OLTP is designed to have fast response time and low data redundancy, and is normalized. | A data warehouse is created uniquely so that it can integrate different data sources for building a consolidated database.
Advantages of OLTP

 OLTP offers accurate forecasts for revenue and expenses. It provides a solid
foundation for a stable business or organization thanks to the timely modification
of all transactions.
 OLTP makes transactions much easier for customers.
 It broadens the client base of an organization by speeding up and simplifying
individual processes.
 OLTP provides support for bigger databases.
 Partitioning data for data manipulation is easy.
 OLTP suits tasks that the system performs frequently.
 It suits work that needs only a small number of records.
 It suits tasks that involve the insertion, updating, or deletion of data.
 It is used when you need consistency and concurrency in order to perform tasks
that ensure greater availability.
Disadvantages of OLTP

 If the OLTP system faces hardware failures, then online transactions get severely
affected.
 OLTP systems allow multiple users to access and change the same data at the
same time, which can often lead to conflicting or unexpected situations.
 If the server hangs for seconds, it can affect a large number of transactions.
 OLTP requires a lot of staff working in groups in order to maintain inventory.
 Online Transaction Processing systems do not have proper methods of
transferring products to buyers by themselves.
 OLTP makes the database much more susceptible to hackers and intruders.
 In B2B transactions, there are chances that both buyers and suppliers miss out
on the efficiency advantages that the system offers.
 Server failure may wipe out large amounts of data from the database.
 Only a limited number of queries and updates can be performed.
Challenges of an OLTP System

 It allows more than one user to access and change the same data
simultaneously. Therefore, it requires concurrency control and
recovery techniques in order to avoid unexpected situations.
 OLTP system data are not suitable for decision making. You have to
use data of OLAP systems for "what if" analysis or decision
making.
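As a rough illustration of the recovery side of the first challenge, the sketch below wraps two updates in one transaction so that a failure in the second undoes the first. It uses Python's built-in `sqlite3` module; the `account` table, its `CHECK` constraint, and the `transfer` helper are hypothetical names chosen for this example, not part of any particular OLTP product.

```python
import sqlite3

def transfer(conn, src, dst, amount):
    """Move `amount` from account `src` to `dst` as one atomic transaction."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE id = ?",
                (amount, dst),
            )
            # The CHECK constraint rejects an overdraft with IntegrityError,
            # and the rollback then undoes the credit made just above.
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE id = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        return False

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account ("
    "id INTEGER PRIMARY KEY, "
    "balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO account VALUES (?, ?)", [(1, 100), (2, 50)])
conn.commit()

first_ok = transfer(conn, 1, 2, 30)    # fits within the balance
second_ok = transfer(conn, 1, 2, 500)  # would overdraw account 1
balances = dict(conn.execute("SELECT id, balance FROM account ORDER BY id"))
```

After the rejected transfer, neither account has changed, which is the behaviour a recovery technique is meant to guarantee.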
What is OLAP?

 Online Analytical Processing (OLAP) is a category of software tools which
provide analysis of data for business decisions. OLAP systems allow
users to analyze database information from multiple database
systems at one time.
 The primary objective is data analysis and not data processing.
Example of OLAP

 Any data warehouse system is an OLAP system. Uses of OLAP are as
follows:
 A company might compare its mobile phone sales in September with
sales in October, then compare those results with another location
which may be stored in a separate database.
 Amazon analyzes purchases by its customers to come up with a
personalized homepage with products likely to interest each
customer.
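The month-over-month comparison in the first use case can be sketched as a single aggregate query. This is a minimal example using Python's built-in `sqlite3` module; the `sales` table, its columns, and the sample figures are made up for illustration, and both locations are kept in one database for simplicity.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (location TEXT, month TEXT, units INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [
        ("Chennai", "2023-09", 120),
        ("Chennai", "2023-10", 150),
        ("Chennai", "2023-10", 30),
        ("Mumbai", "2023-09", 200),
        ("Mumbai", "2023-10", 180),
    ],
)

# One row per location, with September and October totals side by side.
rows = conn.execute("""
    SELECT location,
           SUM(CASE WHEN month = '2023-09' THEN units ELSE 0 END) AS september,
           SUM(CASE WHEN month = '2023-10' THEN units ELSE 0 END) AS october
    FROM sales
    GROUP BY location
    ORDER BY location
""").fetchall()

# Change in units sold from September to October, per location.
change = {loc: october - september for loc, september, october in rows}
```

The query reads many rows and writes none, which is the typical shape of an analytical workload.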
Example of OLTP system

 An example of an OLTP system is an ATM center. Assume that a couple has a joint
account with a bank. One day both reach different ATM centers
at precisely the same time and want to withdraw the total amount present in their
bank account.
 However, the person that completes the authentication process first will be able to
get the money. In this case, the OLTP system makes sure that the withdrawn amount will
never be more than the amount present in the account. The key to note here is that
OLTP systems are optimized for transactional superiority instead of data analysis.
 Other examples of OLTP systems are:
 Online banking
 Online airline ticket booking
 Sending a text message
 Order entry
 Adding a book to a shopping cart
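The joint-account scenario above can be sketched with a guarded update: the balance is only reduced if it still covers the requested amount. This is an illustrative example using Python's built-in `sqlite3` module, with a hypothetical `account` table and `withdraw` helper. The two withdrawals run one after the other on a single connection, so the sketch shows the guard itself rather than true simultaneous access, which a real DBMS handles through locking.

```python
import sqlite3

def withdraw(conn, account_id, amount):
    """Withdraw only if the balance covers the amount; return True on success."""
    with conn:  # one transaction per withdrawal
        cur = conn.execute(
            "UPDATE account SET balance = balance - ? "
            "WHERE id = ? AND balance >= ?",
            (amount, account_id, amount),
        )
    # rowcount is 0 when the WHERE clause matched no row (insufficient funds)
    return cur.rowcount == 1

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account (id INTEGER PRIMARY KEY, balance INTEGER NOT NULL)"
)
conn.execute("INSERT INTO account VALUES (1, 1000)")
conn.commit()

first = withdraw(conn, 1, 1000)   # first person to authenticate
second = withdraw(conn, 1, 1000)  # second person, moments later
remaining = conn.execute(
    "SELECT balance FROM account WHERE id = 1"
).fetchone()[0]
```

Because the check and the update happen in the same statement, the second request cannot take money that is no longer there.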
KEY DIFFERENCE:

 Online Analytical Processing (OLAP) is a category of software tools
that analyze data stored in a database, whereas Online Transaction
Processing (OLTP) supports transaction-oriented applications in a
3-tier architecture.
 OLAP creates a single platform for all types of business analysis needs,
which include planning, budgeting, forecasting, and analysis, while
OLTP is useful to administer the day-to-day transactions of an
organization.
 OLAP is characterized by a large volume of data, while OLTP is
characterized by large numbers of short online transactions.
 In OLAP, a data warehouse is created uniquely so that it can integrate
different data sources for building a consolidated database,
whereas OLTP uses a traditional DBMS.
Benefits of using OLAP services

 OLAP creates a single platform for all types of business analytical
needs, which include planning, budgeting, forecasting, and
analysis.
 The main benefit of OLAP is the consistency of information and
calculations.
 Security restrictions can easily be applied to users and objects to comply with
regulations and protect sensitive data.
Benefits of OLTP method

 It administers the daily transactions of an organization.
 OLTP widens the customer base of an organization by simplifying
individual processes.
Drawbacks of OLAP service

 Implementation and maintenance are dependent on IT professionals
because the traditional OLAP tools require a complicated modeling
procedure.
 OLAP tools need cooperation between people of various
departments to be effective, which might not always be possible.
Drawbacks of OLTP method

 If the OLTP system faces hardware failures, then online transactions get
severely affected.
 OLTP systems allow multiple users to access and change the same
data at the same time, which can often lead to conflicting or unexpected
situations.
What is Dimensional Modeling?

 DIMENSIONAL MODELING (DM) is a data structure technique optimized for data
storage in a data warehouse. The purpose of the dimensional model is to optimize
the database for fast retrieval of data. The concept of dimensional modelling
was developed by Ralph Kimball and consists of "fact" and "dimension" tables.
 A dimensional model is designed to read, summarize, and analyze numeric
information like values, balances, counts, weights, etc. in a data warehouse. In
contrast, relational models are optimized for addition, updating, and deletion of
data in a real-time Online Transaction System.
 These dimensional and relational models have their unique way of data storage
that has specific advantages.
 For instance, in the relational model, normalization and ER models reduce
redundancy in data. On the contrary, the dimensional model arranges data in such
a way that it is easier to retrieve information and generate reports.
 Hence, dimensional models are used in data warehouse systems and are not a
good fit for transaction-oriented relational systems.
Elements of Dimensional Data Model
 Fact
 Facts are the measurements or metrics from your business process.
For a Sales business process, a measurement would be the quarterly sales
number.
 Dimension
 A dimension provides the context surrounding a business process event. In
simple terms, dimensions give the who, what, and where of a fact. In the Sales business
process, for the fact quarterly sales number, dimensions would be:
 Who – Customer Names
 Where – Location
 What – Product Name
 In other words, a dimension is a window to view information in the facts.
 Attributes
 The Attributes are the various characteristics of the dimension.
 In the Location dimension, the attributes can be
 State
 Country
 Zipcode etc.
 Attributes are used to search, filter, or classify facts. Dimension Tables
contain Attributes
 Fact Table
 A fact table is a primary table in a dimensional model.
 A Fact Table contains
1. Measurements/facts
2. Foreign key to dimension table
 Dimension table
 A dimension table contains the dimensions of a fact.
 Dimension tables are joined to the fact table via a foreign key.
 Dimension tables are de-normalized tables.
 The dimension attributes are the various columns in a dimension table.
 Dimensions offer descriptive characteristics of the facts with the help of
their attributes.
 There is no set limit on the number of dimensions.
 A dimension can also contain one or more hierarchical relationships.
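The elements described above (a fact table holding measurements plus foreign keys, and a dimension table holding attributes) can be sketched as follows. This is only an illustration using Python's built-in `sqlite3` module; the table names `dim_location` and `fact_sales`, the column names, and the sample rows are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when asked
conn.executescript("""
-- Dimension table: one row per location, columns are its attributes
CREATE TABLE dim_location (
    location_key INTEGER PRIMARY KEY,
    state        TEXT,
    country      TEXT,
    zipcode      TEXT
);
-- Fact table: a measurement plus a foreign key to the dimension
CREATE TABLE fact_sales (
    location_key INTEGER NOT NULL REFERENCES dim_location(location_key),
    sales_amount REAL NOT NULL
);
INSERT INTO dim_location VALUES (1, 'Tamil Nadu', 'India', '600001');
INSERT INTO dim_location VALUES (2, 'Texas', 'USA', '73301');
INSERT INTO fact_sales VALUES (1, 250.0), (1, 100.0), (2, 400.0);
""")

# Attributes classify the facts: total sales grouped by the country attribute.
total_by_country = dict(conn.execute("""
    SELECT d.country, SUM(f.sales_amount)
    FROM fact_sales AS f
    JOIN dim_location AS d ON d.location_key = f.location_key
    GROUP BY d.country
"""))

# A fact that points at a non-existent dimension row is rejected.
try:
    conn.execute("INSERT INTO fact_sales VALUES (99, 10.0)")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True
```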
Steps of Dimensional Modelling

 The accuracy of your dimensional model determines
the success of your data warehouse implementation. Here are the
steps to create a dimensional model:
1. Identify Business Process
2. Identify Grain (level of detail)
3. Identify Dimensions
4. Identify Facts
5. Build Schema
Step 1) Identify the business process

 Identify the actual business process that a data warehouse should
cover. This could be Marketing, Sales, HR, etc., as per the data
analysis needs of the organization. The selection of the business
process also depends on the quality of data available for that
process. It is the most important step of the data modelling process,
and a failure here would cause cascading and irreparable defects.
 To describe the business process, you can use plain text, basic
Business Process Modelling Notation (BPMN), or Unified Modelling
Language (UML).
Step 2) Identify the grain

 The grain describes the level of detail for the business problem/solution. Identifying
the grain means identifying the lowest level of information for any table in your data
warehouse. If a table contains sales data for every day, then it has daily
granularity. If a table contains total sales data for each month, then it has
monthly granularity.
 During this stage, you answer questions like:
 Do we need to store all the available products or just a few types of products? This
decision is based on the business processes selected for the data warehouse.
 Do we store the product sale information on a monthly, weekly, daily, or hourly basis?
This decision depends on the nature of reports requested by executives.
 How do the above two choices affect the database size?
 Example of Grain:
 The CEO at an MNC wants to find the sales for specific products in different locations
on a daily basis.
 So, the grain is "product sale information by location by the day."
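To make the grain decision more concrete, the sketch below stores sample rows at the daily grain from the CEO example and rolls them up to a monthly grain. It is a plain-Python illustration; the sample data and the `roll_up_to_month` helper are made up for this example.

```python
from collections import defaultdict

# Daily grain: one row per day, product, and location.
daily_sales = [
    ("2024-01-01", "Phone", "Chennai", 5),
    ("2024-01-02", "Phone", "Chennai", 7),
    ("2024-02-01", "Phone", "Chennai", 4),
    ("2024-01-01", "Phone", "Mumbai", 3),
]

def roll_up_to_month(rows):
    """Aggregate daily rows into one total per month, product, and location."""
    monthly = defaultdict(int)
    for day, product, location, units in rows:
        month = day[:7]  # "YYYY-MM-DD" -> "YYYY-MM"
        monthly[(month, product, location)] += units
    return dict(monthly)

monthly_sales = roll_up_to_month(daily_sales)
```

A finer grain can always be summed up to a coarser one, but monthly totals cannot be split back into days. The finer grain also means more rows, which is the database-size trade-off raised in the questions above.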
Step 3) Identify the dimensions

 Dimensions are nouns like date, store, inventory, etc. These
dimensions are where the descriptive data should be stored. For example,
the date dimension may contain data like year, month, and
weekday.
 Example of Dimensions:
 The CEO at an MNC wants to find the sales for specific products in
different locations on a daily basis.
 Dimensions: Product, Location, and Time
 Attributes: For Product: Product key (primary key, referenced as a
foreign key from the fact table), Name, Type, Specifications
 Hierarchies: For Location: Country, State, City, Street Address, Name
Step 4) Identify the Fact

 This step is closely associated with the business users of the system
because this is where they get access to data stored in the data
warehouse. Most fact table columns hold numerical values like
price or cost per unit, etc.
 Example of Facts:
 The CEO at an MNC wants to find the sales for specific products in
different locations on a daily basis.
 The fact here is the sum of sales by product by location by time.
Step 5) Build Schema

 In this step, you implement the dimensional model. A schema is nothing
but the database structure (arrangement of tables). There are two
popular schemas:
 Star Schema
 The star schema architecture is easy to design. It is called a star schema
because the diagram resembles a star, with points radiating from a center. The
center of the star consists of the fact table, and the points of the star are
the dimension tables.
 The fact table in a star schema is in third normal form, whereas
dimension tables are de-normalized.
 Snowflake Schema
 The snowflake schema is an extension of the star schema. In a snowflake
schema, each dimension is normalized and connected to more dimension
tables.
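The difference between the two schemas can be sketched by modelling the same location dimension both ways. This is an illustrative example using Python's built-in `sqlite3` module; all table names, column names, and sample rows are assumptions made for the sketch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Star: one de-normalized location dimension
CREATE TABLE star_dim_location (
    location_key INTEGER PRIMARY KEY, city TEXT, state TEXT, country TEXT
);
-- Snowflake: the same dimension, with country normalized into its own table
CREATE TABLE snow_dim_country (
    country_key INTEGER PRIMARY KEY, country TEXT
);
CREATE TABLE snow_dim_location (
    location_key INTEGER PRIMARY KEY, city TEXT, state TEXT,
    country_key INTEGER REFERENCES snow_dim_country(country_key)
);
CREATE TABLE fact_sales (location_key INTEGER, sales_amount REAL);

INSERT INTO star_dim_location VALUES
    (1, 'Chennai', 'Tamil Nadu', 'India'), (2, 'Austin', 'Texas', 'USA');
INSERT INTO snow_dim_country VALUES (10, 'India'), (20, 'USA');
INSERT INTO snow_dim_location VALUES
    (1, 'Chennai', 'Tamil Nadu', 10), (2, 'Austin', 'Texas', 20);
INSERT INTO fact_sales VALUES (1, 100.0), (1, 50.0), (2, 70.0);
""")

# Star schema: one join from the fact table to the dimension.
star_result = conn.execute("""
    SELECT d.country, SUM(f.sales_amount)
    FROM fact_sales AS f
    JOIN star_dim_location AS d ON d.location_key = f.location_key
    GROUP BY d.country ORDER BY d.country
""").fetchall()

# Snowflake schema: the same question needs an extra join.
snow_result = conn.execute("""
    SELECT c.country, SUM(f.sales_amount)
    FROM fact_sales AS f
    JOIN snow_dim_location AS l ON l.location_key = f.location_key
    JOIN snow_dim_country AS c ON c.country_key = l.country_key
    GROUP BY c.country ORDER BY c.country
""").fetchall()
```

Both queries return the same totals; the snowflake version stores each country name once but pays for it with an additional join.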
Rules for Dimensional Modelling

 Load atomic data into dimensional structures.
 Build dimensional models around business processes.
 Ensure that every fact table has an associated date
dimension table.
 Ensure that all facts in a single fact table are at the same grain or
level of detail.
 Store report labels and filter domain values in
dimension tables.
 Ensure that dimension tables use a surrogate key.
 Continuously balance requirements and realities to deliver a business
solution that supports decision-making.
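The surrogate-key rule can be illustrated with a small sketch: the dimension table gets its own generated integer key, separate from the natural key that comes from the source system. The example uses Python's built-in `sqlite3` module; the `dim_product` table, the `product_code` column, and the `add_product_version` helper are hypothetical names for this sketch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE dim_product (
        product_key  INTEGER PRIMARY KEY AUTOINCREMENT,  -- surrogate key
        product_code TEXT NOT NULL,  -- natural key from the source system
        name         TEXT NOT NULL
    )
""")

def add_product_version(conn, product_code, name):
    """Insert a new dimension row and return its generated surrogate key."""
    cur = conn.execute(
        "INSERT INTO dim_product (product_code, name) VALUES (?, ?)",
        (product_code, name),
    )
    return cur.lastrowid

# The same natural key can appear twice, each version with its own surrogate key.
key_v1 = add_product_version(conn, "P-100", "Phone")
key_v2 = add_product_version(conn, "P-100", "Phone (renamed)")
versions = conn.execute(
    "SELECT COUNT(*) FROM dim_product WHERE product_code = ?", ("P-100",)
).fetchone()[0]
```

One reason this design is often chosen is that fact rows can keep pointing at the version of the product that was valid when the sale happened.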
Benefits of dimensional modeling

 Standardization of dimensions allows easy reporting across areas of the business.
 Dimension tables store the history of the dimensional information.
 It allows an entirely new dimension to be introduced without major disruptions to the fact table.
 Dimensional modeling also stores data in such a fashion that it is easier to retrieve the information
once the data is stored in the database.
 Compared to the normalized model, dimensional tables are easier to understand.
 Information is grouped into clear and simple business categories.
 The dimensional model is very understandable by the business. This model is based on business terms, so
that the business knows what each fact, dimension, or attribute means.
 Dimensional models are denormalized and optimized for fast data querying. Many relational database
platforms recognize this model and optimize query execution plans to aid in performance.
 Dimensional modeling creates a schema which is optimized for high performance. It means fewer joins
at query time, at the cost of some controlled redundancy in the de-normalized dimension tables.
 The dimensional model also helps to boost query performance. It is more denormalized and is therefore
optimized for querying.
 Dimensional models can comfortably accommodate change. Dimension tables can have more columns
added to them without affecting existing business intelligence applications using these tables.
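The last point can be checked with a small sketch: a report query is run, a new column is added to the dimension table, and the same query is run again. The example uses Python's built-in `sqlite3` module; the tables, the `category` column, and the sample rows are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE fact_sales (product_key INTEGER, sales_amount REAL);
INSERT INTO dim_product VALUES (1, 'Phone'), (2, 'Laptop');
INSERT INTO fact_sales VALUES (1, 100.0), (2, 900.0), (1, 50.0);
""")

# An existing report: total sales per product name.
report_sql = """
    SELECT d.name, SUM(f.sales_amount)
    FROM fact_sales AS f
    JOIN dim_product AS d ON d.product_key = f.product_key
    GROUP BY d.name
    ORDER BY d.name
"""
before = conn.execute(report_sql).fetchall()

# Extend the dimension with a new attribute; the fact table is untouched.
conn.execute("ALTER TABLE dim_product ADD COLUMN category TEXT")
conn.execute("UPDATE dim_product SET category = 'Electronics'")

after = conn.execute(report_sql).fetchall()
```

The report names its columns explicitly, so the extra attribute does not change its result.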
What is the ER Model?

 The ENTITY RELATIONSHIP (ER) MODEL is a high-level conceptual data
model diagram. ER modeling helps you to analyze data
requirements systematically to produce a well-designed database.
The Entity-Relationship model represents real-world entities and the
relationships between them. It is considered a best practice to
complete ER modeling before implementing your database.
History of ER models

 ER diagrams are a visual tool which is helpful to represent the ER
model. The ER model was proposed by Peter Chen in 1976 to create a uniform
convention which can be used for relational databases and
networks. He aimed to use the ER model as a conceptual modeling
approach.
What is an ER Diagram?

 An ENTITY-RELATIONSHIP DIAGRAM (ERD) displays the relationships of
entity sets stored in a database. In other words, we can say that ER
diagrams help you to explain the logical structure of databases. At
first look, an ER diagram looks very similar to a flowchart. However,
an ER diagram includes many specialized symbols, and their meanings
make this model unique. The purpose of an ER diagram is to represent
the entity framework infrastructure.
Facts about ER Diagram Model

 The ER model allows you to draw a database design.
 It is an easy-to-use graphical tool for modeling data.
 It is widely used in database design.
 It is a graphical representation of the logical structure of a database.
 It helps you to identify the entities which exist in a system and the
relationships between those entities.
Why use ER Diagrams?

 They help you to define terms related to entity relationship modeling.
 They provide a preview of how all your tables should connect and what fields are
going to be on each table.
 They help to describe entities, attributes, and relationships.
 ER diagrams are translatable into relational tables, which allows you to
build databases quickly.
 ER diagrams can be used by database designers as a blueprint for
implementing data in specific software applications.
 The database designer gains a better understanding of the information
to be contained in the database with the help of an ER diagram.
 An ERD allows you to communicate the logical structure of the
database to users.
Components of the ER Diagram

 This model is based on three basic concepts:
 Entities
 Attributes
 Relationships
 For example, in a University database, we might have entities for
Students, Courses, and Lecturers. The Students entity can have attributes
like Rollno, Name, and DeptID. Students might have relationships with
Courses and Lecturers.
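One possible translation of the University example into relational tables is sketched below: entities become tables, attributes become columns, and relationships become foreign keys or a linking table. The example uses Python's built-in `sqlite3` module. The specific relationships chosen here (a student enrolls in many courses, a lecturer teaches a course) and all names beyond Rollno, Name, and DeptID are assumptions made for the sketch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Entities become tables; attributes become columns
CREATE TABLE student (
    rollno  INTEGER PRIMARY KEY,
    name    TEXT NOT NULL,
    dept_id INTEGER
);
CREATE TABLE lecturer (
    lecturer_id INTEGER PRIMARY KEY,
    name        TEXT NOT NULL
);
-- One-to-many relationship: a lecturer teaches many courses
CREATE TABLE course (
    course_id   INTEGER PRIMARY KEY,
    title       TEXT NOT NULL,
    lecturer_id INTEGER REFERENCES lecturer(lecturer_id)
);
-- Many-to-many relationship: students enroll in courses
CREATE TABLE enrollment (
    rollno    INTEGER REFERENCES student(rollno),
    course_id INTEGER REFERENCES course(course_id),
    PRIMARY KEY (rollno, course_id)
);
INSERT INTO student VALUES (1, 'Asha', 10), (2, 'Ravi', 10);
INSERT INTO lecturer VALUES (1, 'Dr. Rao');
INSERT INTO course VALUES (101, 'Databases', 1), (102, 'Networks', 1);
INSERT INTO enrollment VALUES (1, 101), (1, 102), (2, 101);
""")

# Follow the relationships: which student takes which course, taught by whom?
rows = conn.execute("""
    SELECT s.name, c.title, l.name
    FROM enrollment AS e
    JOIN student  AS s ON s.rollno = e.rollno
    JOIN course   AS c ON c.course_id = e.course_id
    JOIN lecturer AS l ON l.lecturer_id = c.lecturer_id
    ORDER BY s.name, c.title
""").fetchall()
```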
WHAT IS ENTITY?

 An entity is a real-world thing, either living or non-living, that
may or may not be easily recognizable. It is anything in the enterprise
that is to be represented in our database. It may be a physical thing
or simply a fact about the enterprise or an event that happens in
the real world.
 An entity can be a place, person, object, event, or concept about which
data is stored in the database. Every entity must
have attributes and a unique key. Every entity is made up of
some 'attributes' which represent that entity.
Examples of entities:

 Person: Employee, Student, Patient
 Place: Store, Building
 Object: Machine, Product, Car
 Event: Sale, Registration, Renewal
 Concept: Account, Course
