Business Analytics Process and Data Exploration
Course Overview
• This chapter covers data exploration, validation, and cleaning
required for data analysis. You’ll learn the purpose of data
cleaning, why you need data preparation, how to go about
handling missing values, and some of the data-cleaning
techniques used in the industry.
Course Contents
• Business Analytics Life Cycle
• Understanding the Business Problem
• Collecting and Integrating the Data
• Preprocessing the Data
• Exploring and Visualizing the Data
• Using Modeling Techniques and Algorithms
• Evaluating the Model
• Presenting a Management Report and Review
• Deploying the Model
Business Analytics Life Cycle
Understanding the Business Problem
• The key purpose is to solve a business problem.
• You need to thoroughly understand the problem from a business
perspective before solving it.
Collecting and Integrating the Data
• The most important factor determining the accuracy of the results
is the quality of the data.
• Data can come from either a primary source or a secondary source.
• Most organizations have data spread across various databases.
• Sampling: selecting a smaller collection of units from a population,
used to determine truths about that population (Field, 2005)
• There are several variations of sampling:
• Random sampling: A sample is picked randomly, and every
member has an equal opportunity to be selected.
• Stratified sampling: The population is divided into groups, and
data is selected randomly from a group, or strata.
• Systematic sampling: You select members systematically, say
every tenth member, within a particular time frame or event.
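The three sampling variations above can be sketched in a few lines of Python (the chapter's own examples use R; the population here is a hypothetical list of 100 member IDs):

```python
import random

population = list(range(1, 101))  # hypothetical population of member IDs
random.seed(42)  # fixed seed so the sketch is reproducible

# Random sampling: every member has an equal chance of selection
random_sample = random.sample(population, 10)

# Stratified sampling: split the population into strata,
# then sample randomly within each stratum (5 from each here)
strata = {"low": [x for x in population if x <= 50],
          "high": [x for x in population if x > 50]}
stratified_sample = [random.choice(group)
                     for group in strata.values() for _ in range(5)]

# Systematic sampling: pick every tenth member
systematic_sample = population[::10]
print(systematic_sample)  # IDs 1, 11, 21, ..., 91
```

Fixing the seed keeps the random draws repeatable; in practice you would omit it or manage it explicitly.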
• n = (z × sigma / E)² if the standard deviation is known
• n = (z / E)² × p(1 − p) if the standard deviation is unknown
where z is the z-value for the desired confidence level and E is the
margin of error
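As a sketch, the two sample-size formulas can be computed directly, assuming z is the z-value for the chosen confidence level and E the margin of error (function names are illustrative):

```python
import math

def sample_size_known_sd(z, sigma, E):
    """n = (z * sigma / E)**2, when the population SD is known."""
    return math.ceil((z * sigma / E) ** 2)

def sample_size_unknown_sd(z, E, p=0.5):
    """n = (z / E)**2 * p * (1 - p); p = 0.5 gives the most conservative n."""
    return math.ceil((z / E) ** 2 * p * (1 - p))

# 95% confidence (z ~ 1.96), margin of error E = 0.05
print(sample_size_unknown_sd(1.96, 0.05))  # 385, a common survey sample size
```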
• Variable Selection
• The more predictor variables you have, the more records you need
• The more records you have, the better the prediction results
Preprocessing the Data
• Data type: qualitative or quantitative
• Qualitative data is not numerical (e.g., type of car, favorite color)
• Quantitative data is numeric and can be divided into discrete data
or continuous data
• Discrete data: a variable that can take only certain values that are
separate and distinct
• Continuous data: a variable that can take any numeric value
within a specific range or interval
• Handling Missing Values
• Methods used to resolve missing values:
a. Ignore the values (not a very effective method)
b. Fill in the values with the average value or the mode (the simplest
method)
c. Fill in the values with the mean of the attribute for records
belonging to the same bin
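Method (b) can be sketched in Python (the chapter's own examples use R; the income column below is made up for illustration, with None marking missing values):

```python
from statistics import mean, mode

# Hypothetical income column with missing values (None)
incomes = [1200, None, 1500, 1800, None, 1500]

# Method (b): fill missing entries with the mean of the observed values
observed = [v for v in incomes if v is not None]
fill_mean = mean(observed)  # mean of 1200, 1500, 1800, 1500 = 1500
filled = [v if v is not None else fill_mean for v in incomes]

# Alternatively, fill with the mode (most frequent observed value)
fill_mode = mode(observed)  # 1500
```

Method (c) works the same way, except the mean is computed only over records in the same bin as the record with the missing value.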
• Handling Duplicates, Junk, and Null Values
• Duplicates, junk, and null values should be cleaned from the
database before the analytics process.
• The process is the same as for handling missing values.
• The challenge is identifying the junk characters.
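A minimal Python sketch of this cleanup, assuming a hypothetical list of name records where "junk" means any non-alphanumeric character:

```python
import re

records = ["Alice", "Bob##", "Alice", None, "C@rol?"]

# Drop nulls and exact duplicates while preserving order
seen, cleaned = set(), []
for r in records:
    if r is not None and r not in seen:
        seen.add(r)
        cleaned.append(r)

# Strip junk (non-alphanumeric) characters with a regex
cleaned = [re.sub(r"[^A-Za-z0-9 ]", "", r) for r in cleaned]
print(cleaned)  # ['Alice', 'Bob', 'Crol']
```

Note that a blunt regex can mangle legitimate values ("C@rol?" becomes "Crol"), which is exactly why identifying what counts as junk is the hard part.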
• Data preprocessing with R covers these methods:
a. Understanding the variable types
b. Changing the variable types
c. Finding missing and null values
d. Cleaning missing values with appropriate methods
• The following are the basic data types in R:
a. Numeric: real numbers
b. Integer: whole numbers
c. Factor: categorical data used to define categories
d. Character: strings of characters
Exploring and Visualizing the Data
• Tables: the View() command displays the data in a spreadsheet-style
viewer
• Summary Tables
• Box plots
• Scatter plots
• Scatter plot matrices: use the pairs() function
> hou <- read.table("housing.data", header=TRUE, sep="\t")
• Scatter plot matrices: Trellis plot
• Scatter plot matrices: correlation plot
• Scatter plot matrices: density by class
• Normalization techniques:
a. Z-score normalization: the new value is created from the mean
and standard deviation: A' = (A − mean_A) / SD_A
b. Min-max normalization: values are transformed into a specified
range: A' = ((A − min_A) / (max_A − min_A)) × (range of A') + new min_A,
where range of A' = new max_A − new min_A
c. Data aggregation: sometimes a new variable may be required to
better understand the data: A' = (A^λ − 1) / λ, λ > 1
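The z-score and min-max formulas above can be sketched in Python (the values are illustrative; the chapter's own examples use R):

```python
from statistics import mean, stdev

A = [10.0, 20.0, 30.0, 40.0, 50.0]  # illustrative attribute values

# Z-score normalization: A' = (A - mean_A) / SD_A
m, s = mean(A), stdev(A)
z_scored = [(a - m) / s for a in A]

# Min-max normalization into the new range [0, 1]
lo, hi = min(A), max(A)
min_max = [(a - lo) / (hi - lo) for a in A]
print(min_max)  # [0.0, 0.25, 0.5, 0.75, 1.0]
```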
Using Modeling Techniques and Algorithms
• Descriptive Analytics explains the patterns hidden in data.
• Patterns such as the number of market segments or sales numbers
by region are based purely on historical data.
• RMSE = √( (1/n) Σₖ (ŷₖ − yₖ)² ), where ŷₖ is the predicted value and
yₖ the actual value, summed over the n predictions
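A minimal implementation of the RMSE formula above:

```python
import math

def rmse(actual, predicted):
    """Root mean square error: sqrt of the mean squared prediction error."""
    n = len(actual)
    return math.sqrt(sum((p - a) ** 2 for a, p in zip(actual, predicted)) / n)

print(rmse([3, 5, 7], [2, 5, 9]))  # sqrt((1 + 0 + 4) / 3) ~ 1.29
```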
Presenting a Management Report and Review
• Problem Description
• Data Set Used
• Data Cleaning Carried Out
• Method Used to Create The Model
• Model Deployment Prerequisites
• Model Deployment and Usage
• Issues Handling
Deploying the Model
• A challenging phase of the project.
• The model is now deployed for end users and runs in a production
environment, analyzing live data.
• Success of the deployment depends on the following:
a. Proper sizing of the hardware, ensuring required performance
b. Proper programming to handle the capabilities of the hardware
c. Proper data integration and cleaning
d. Effective reports, dashboards, views, decisions, and
interventions to be used by end users or end-user systems
e. Effective training to the users of the model
Question & Answers