Business Analytics, Volume II - A Data Driven Decision Making Approach For Business
VOLUME II
A Data-Driven Decision-Making
Approach for Business
Copyright © 2016. Business Expert Press. All rights reserved.
Sahay, Amar. Business Analytics, Volume II : A Data Driven Decision Making Approach for Business, Business Expert Press,
2016. ProQuest Ebook Central, http://ebookcentral.proquest.com/lib/rmit/detail.action?docID=5975610.
Created from rmit on 2021-09-17 07:59:14.
Business Analytics
Praise for Business Analytics
“In this second volume on business analytics, Dr. Sahay provides a
useful overview of analytics in general with an emphasis on predictive
analytics. Given the booming interest in analytics and data science, his
book is timely and informative. It brings many terms, tools, and methods
together in a meaningful way. It is common for practitioners and even
scholars to conflate terms such as business intelligence, data analytics,
and data mining. Dr. Sahay clarifies such terms and helps differentiate
their meanings. I found the glossaries at the end of the early chapters to
be especially useful in making sense of all of the terms that have emerged
recently and are often used interchangeably.
Being an expert on quality management and Six Sigma, Dr. Sahay also
incorporated quality tools into the analytics process, something that is
rare, but in my opinion extremely important and helpful. Moreover, his
treatment of the tools for predictive analytics not only explains the tools,
but goes a step further in clarifying when each should be used and how
the tools fit together. Such clarification is often presented in tabular form,
which makes it easy to refer back to whenever the information is needed.
Business Analytics
A Data-Driven Decision-Making
Approach for Business
Business Analytics: A Data-Driven Decision-Making Approach for Business,
Volume II (Predictive Analytics)
Copyright © Business Expert Press, LLC, 2020.
Business Expert Press Big Data, Business Analytics, and Smart Technology
Collection
Chennai, India
Cover image licensed by Ingram Image, StockPhotoSecrets.com
10 9 8 7 6 5 4 3 2 1
Dedication
This book is dedicated to
Priyanka Nicole
Our Love and Joy
Abstract
This business analytics (BA) text discusses models that use fact-based
data to measure past business performance and to guide an organization
in visualizing and predicting future business performance and outcomes.
It provides a comprehensive overview of analytics in general, with an
emphasis on predictive analytics. Given the booming interest in analytics
and data science, this book is timely and informative. It brings many
terms, tools, and methods of analytics together. The first three chapters
introduce BA, the importance of analytics, and the types of BA
(descriptive, predictive, and prescriptive), along with their tools and
models. Business intelligence (BI) and a case on descriptive analytics are
also discussed. Additionally, the book covers the most widely used
predictive models, including regression analysis, forecasting, and data
mining, and introduces recent applications of predictive analytics:
machine learning, neural networks, and artificial intelligence. The
concluding chapter discusses the current state, job outlook, and
certifications in analytics.
Keywords
analytics; business analytics; business intelligence; data analysis; decision
making; descriptive analytics; predictive analytics; prescriptive analytics;
statistical analysis; quantitative techniques; data mining; predictive modeling;
regression analysis; modeling; time series forecasting; optimization;
Contents
Preface
Acknowledgments
Chapter 1 Business Analytics at a Glance
Chapter 2 Business Analytics and Business Intelligence
Chapter 3 Analytics, Business Analytics, Data Analytics, and How They Fit into the Broad Umbrella of Business Intelligence
Chapter 4 Descriptive Analytics: Overview, Applications, and a Case
Chapter 5 Descriptive versus Predictive Analytics
Chapter 6 Key Predictive Analytics Models (Predicting Future Business Outcomes Using Analytic Models)
Chapter 7 Regression Analysis and Modeling
Chapter 8 Time Series Analysis and Forecasting
Chapter 9 Data Mining: Tools and Applications in Predictive Analytics
Chapter 10 Wrap-Up, Overview, Notes on Implementation, and Current State of Business Analytics
Appendices
Additional Readings
Preface
This book deals with business analytics (BA), an emerging area in modern
business decision making.
BA tools are also used to visualize and explore the patterns and trends
in the data to predict future business outcomes with the help of forecasting
and predictive modeling.
In this age of technology, companies collect massive amounts of data.
Successful companies view their data as an asset and use them to gain
a competitive advantage. These companies use BA tools as part of an
organizational commitment to data-driven decision making. BA helps
businesses make informed decisions. It is also critical in automating
and optimizing business processes.
BA makes extensive use of data, statistical analysis, mathematical and
• Descriptive analytics
• Predictive analytics
• Prescriptive analytics
Each of the above categories uses different tools, and the use of these
analytics depends on the type of business and the operations a company
is involved in. For example, one organization may use only descriptive
analytics tools, whereas another may use a combination of descriptive
and predictive modeling and analytics to predict future business
performance and drive business decisions.
The different types of analytics and the tools used in these analytics
are described below:
The analytics tools come under the broad area of Business Intelligence
(BI), which incorporates Business Analytics (BA), data analytics, and
advanced analytics. All of these areas use a number of visual and
mathematical models.
Modeling is one of the most important parts of BA. Models are of
different types, and understanding these types is critical in selecting
and applying the right model or models to solve business problems. The
widely used models are: (a) graphical models, (b) quantitative models,
(c) algebraic models, (d) spreadsheet models, and (e) other analytic
tools.
Most of the tools in descriptive, predictive, and prescriptive analytics
are described using one type of model or another, usually graphical,
mathematical, or computer models. Besides these models, simulation and
a number of other mathematical models are used in analytics.
BA is a vast area. It is not possible to provide a complete and in-depth
treatment of all the BA topics in one concise book; therefore, the book is
divided into two parts:
and the role and importance of these in modern business decision
making. It introduces the different areas of BA: (1) descriptive analytics,
(2) predictive analytics, and (3) prescriptive analytics. The tools and
topics covered under each of these areas, along with their applications
in the decision-making process, are discussed in the first volume.
The main focus of the first volume is descriptive analytics and its
applications.
The focus of this second volume is predictive analytics. The introductory
chapters of this volume outline the broad view of BI, which encompasses not
only BA but also data analytics and advanced analytics. An overview of all
these areas is presented in the first two chapters, followed by predictive analytics
topics, which are the focus of this text. The specific topics and chapters
covered in this second volume are outlined below:
amar@realleansixsigmaquality.com
Acknowledgments
I would like to thank the reviewers who took the time to provide excellent
insights, which helped shape this book.
I would especially like to thank Mr. Karun Mehta, a friend and
engineer. I greatly appreciate the numerous hours he spent correcting,
formatting, and supplying distinctive comments. The book would not
have been possible without his tireless effort.
I would like to express my gratitude to Prof. Susumu Kasai, Professor of
CSIS, for reviewing and providing invaluable suggestions.
I am very thankful to Prof. Edward Engh for his thoughtful advice
and counsel. Ed has been a wonderful friend and colleague.
Special thanks to Dr. Don Wardell, Professor of Operations Management
at the University of Utah. His comments and suggestions greatly
helped shape this book.
Special thanks are due to Mr. Anand Kumar, Domain Transformation
Leader at Tata Consultancy Services (TCS), for reviewing and providing
invaluable suggestions.
Thanks to all of my students for their input in making this book possible.
They have helped me pursue a dream filled with lifelong learning.
This book couldn’t have been a reality without them.
I am indebted to senior acquisitions editor, Scott Isenberg; Charlene
CHAPTER 1
Business Analytics at a Glance
Chapter Highlights
• Introduction to Business Analytics—What Is It?
• Analytics and Business Analytics
• Business Analytics and Its Importance in Modern Business Decisions
• Types of Business Analytics
  ◦ Tools of Business Analytics
• Predictive Analytics
  ◦ Most Widely Used Predictive Analytics Models
goes well beyond simply presenting data and creating visuals, crunching
numbers, and computing statistics. The essence of analytics lies in the
application: making sense of the data using prescribed methods of
statistical analysis, mathematical and statistical models, and logic to draw
meaningful conclusions from the data. It uses methods, logic, intelligence,
algorithms, and models that enable us to reason, plan, organize, analyze,
solve problems, understand, innovate, and make data-driven decisions,
including decisions based on dynamic real-time data.
BA covers a vast area. It is a complex field that encompasses visualization,
statistics and modeling, optimization, simulation-based modeling,
from reporting in business intelligence (BI). Analytics models use the data
with a view to drawing out new, useful insights to improve business
planning and boost future performance. BA helps the company adapt to
changes and take advantage of future developments.
One of the major tools of analytics is data mining, which is a part
of predictive analytics. In business, data mining is used to analyze huge
amounts of business data. Business transaction data, along with other
customer and product-related data, are continuously stored in databases.
Data mining software is used to analyze the vast amount
of customer data to reveal hidden patterns, trends, and other customer
• Descriptive analytics
• Predictive analytics
• Prescriptive analytics
Each of the above-mentioned categories uses different tools, and the
use of these analytics depends on the type of business and the operations
a company is involved in. For example, one organization may use only
descriptive analytics tools, while another may use a combination of
descriptive and predictive modeling and analytics to predict future
business performance and drive business decisions. Other companies may
use prescriptive analytics to optimize business processes.
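To make the contrast between descriptive and predictive analytics concrete, here is a minimal Python sketch, assuming a short series of hypothetical monthly sales figures. The data and the function names `describe`, `linear_trend`, and `forecast` are illustrative only and do not come from the book. The descriptive step summarizes what already happened; the predictive step fits a straight-line trend by ordinary least squares and extrapolates it forward. A linear trend is just one of many forecasting models and is used here purely as an illustration.

```python
"""Contrast a descriptive summary with a simple predictive forecast.

The sales figures below are hypothetical, made up for illustration only.
"""
from statistics import mean


def describe(sales):
    """Descriptive analytics: summarize what already happened."""
    return {
        "total": sum(sales),
        "average": mean(sales),
        "min": min(sales),
        "max": max(sales),
    }


def linear_trend(sales):
    """Fit y = a + b*t by ordinary least squares, with periods t = 1..n."""
    n = len(sales)
    t = list(range(1, n + 1))
    t_bar, y_bar = mean(t), mean(sales)
    # Slope b = sum of cross-deviations / sum of squared t-deviations
    sxy = sum((ti - t_bar) * (yi - y_bar) for ti, yi in zip(t, sales))
    sxx = sum((ti - t_bar) ** 2 for ti in t)
    b = sxy / sxx
    a = y_bar - b * t_bar  # the fitted line passes through (t_bar, y_bar)
    return a, b


def forecast(sales, periods_ahead=1):
    """Predictive analytics: extrapolate the fitted trend beyond the data."""
    a, b = linear_trend(sales)
    n = len(sales)
    return [a + b * (n + k) for k in range(1, periods_ahead + 1)]


if __name__ == "__main__":
    sales = [100, 104, 110, 113, 119, 124]  # hypothetical units sold per month
    print(describe(sales))
    print(forecast(sales, periods_ahead=2))
```

With these made-up figures the fitted slope is 4.8 units per month, so the next-month forecast is about 128.5 units. A real forecast would also need to check whether a straight line actually fits the data, which is the subject of the regression and time series chapters.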
include the commonly used graphs and charts along with some newly
developed graphical tools such as bullet graphs, tree maps, and data
dashboards. Dashboards are becoming very popular with big data; they are
used to display multiple views of business data graphically.
The other aspect of descriptive analytics is an understanding of numerical
methods, including measures of central tendency, measures of position,
measures of variation, and measures of shape, and of how different measures
and statistics are used to draw conclusions and make decisions from the
data. Other topics of interest are the empirical rule and the relationship
between two variables, measured by the covariance and the correlation
coefficient. The tools of descriptive analytics are helpful in understanding
the data, identifying trends or patterns in the data, and making sense of
the data contained in company databases. An understanding of databases, data
warehouses, web search and query, and big data concepts is important in
extracting data and applying descriptive analytics tools. A number of
statistical software packages are used for statistical analysis; widely used
ones are SAS, MINITAB, and R, a programming language for statistical
computing. Volume I of this book covers descriptive analytics, with a number
of applications and a detailed case that explain and implement these tools.
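The numerical measures named above can be illustrated with a short sketch. Python is used here only for illustration (the text itself mentions SAS, MINITAB, and R), and the sales and advertising figures are invented. The helper names `describe`, `covariance`, `correlation`, and `share_within` are hypothetical, not part of any package, and the skewness shown is the simple moment-based version without small-sample correction.

```python
import statistics

# Hypothetical monthly figures (in $1,000s), invented for this illustration.
sales = [42, 45, 47, 50, 51, 53, 55, 58, 60, 95]
ad_spend = [4, 5, 5, 6, 6, 7, 7, 8, 8, 12]

def describe(data):
    """Common numerical measures of a sample."""
    mean = statistics.mean(data)
    q1, _, q3 = statistics.quantiles(data, n=4)  # quartiles (measures of position)
    n = len(data)
    # Moment-based skewness (measure of shape), without small-sample correction.
    m2 = sum((x - mean) ** 2 for x in data) / n
    m3 = sum((x - mean) ** 3 for x in data) / n
    return {
        "mean": mean,                          # central tendency
        "median": statistics.median(data),     # central tendency
        "stdev": statistics.stdev(data),       # variation (n - 1 divisor)
        "range": max(data) - min(data),        # variation
        "q1": q1, "q3": q3, "iqr": q3 - q1,    # position and spread
        "skewness": m3 / m2 ** 1.5,            # shape
    }

def covariance(x, y):
    """Sample covariance (n - 1 divisor)."""
    mx, my = statistics.mean(x), statistics.mean(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (len(x) - 1)

def correlation(x, y):
    """Pearson correlation coefficient: covariance scaled to lie in [-1, 1]."""
    return covariance(x, y) / (statistics.stdev(x) * statistics.stdev(y))

def share_within(data, k):
    """Fraction of observations within k standard deviations of the mean."""
    mean, sd = statistics.mean(data), statistics.stdev(data)
    return sum(abs(x - mean) <= k * sd for x in data) / len(data)

print(describe(sales))
print("r(sales, ad_spend) =", round(correlation(sales, ad_spend), 3))
for k in (1, 2, 3):
    # Empirical rule for bell-shaped data: roughly 68%, 95%, and 99.7%.
    print(f"share within {k} sd: {share_within(sales, k):.0%}")
```

With only ten points and one unusually large month, the comparison with the empirical rule is rough at best; the positive skewness value reflects that single large observation.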
Tools of descriptive analytics: Figure 1.1 outlines the tools and
methods used in descriptive analytics. These tools are explained in
subsequent chapters.
Predictive Analytics
Machine learning and data mining are similar in some ways and often
overlap in applications. Machine learning is used for prediction based on
known properties learned from training data, whereas data mining
algorithms are used to discover previously unknown patterns. Data mining
is concerned with knowledge discovery in databases (KDD).
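To make the "discovery" side of this contrast concrete, here is a minimal sketch of k-means clustering, one common unsupervised technique chosen here for illustration (the text does not prescribe it). No labels are supplied; the algorithm finds the groups on its own. The function name `kmeans_1d`, the deterministic initialization, and the customer-spend figures are all assumptions for the example.

```python
import statistics

def kmeans_1d(values, k, max_iterations=100):
    """Minimal one-dimensional k-means (k >= 2): groups values into k clusters
    without being told which group any value belongs to."""
    data = sorted(values)
    # Deterministic start: k evenly spaced values taken from the sorted data.
    centroids = [data[i * (len(data) - 1) // (k - 1)] for i in range(k)]
    clusters = []
    for _ in range(max_iterations):
        # Assignment step: each value joins the cluster with the nearest centroid.
        clusters = [[] for _ in range(k)]
        for v in data:
            nearest = min(range(k), key=lambda i: abs(v - centroids[i]))
            clusters[nearest].append(v)
        # Update step: move each centroid to the mean of its cluster.
        new_centroids = [statistics.mean(c) if c else centroids[i]
                         for i, c in enumerate(clusters)]
        if new_centroids == centroids:  # assignments have stabilized
            break
        centroids = new_centroids
    return centroids, clusters

annual_spend = [120, 135, 150, 900, 950, 1000, 4000, 4200]  # hypothetical customers
centroids, clusters = kmeans_1d(annual_spend, k=3)
print(centroids)
print(clusters)
```

The result depends on the number of clusters and the starting centroids; library implementations such as scikit-learn's `KMeans` use several random restarts, whereas this sketch keeps a single deterministic start for readability.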
Data mining uses many machine learning methods. On the other hand,
machine learning also employs data mining methods as “unsupervised
learning” or as a preprocessing step to improve learner accuracy. The
goals are somewhat different. The performance of machine learning is
usually evaluated with respect to the ability to reproduce known
Machine learning tasks are typically classified into the following three
broad categories, depending on the nature of the learning “signal” or
“feedback” available to a learning system [20]:
They are usually used to model complex relationships between inputs and
outputs, to find patterns in data, or to capture the statistical structure in
an unknown joint probability distribution between observed variables.
brain processes light and sound into vision and hearing. Some successful
applications of deep learning are computer vision and speech recognition.
Note: Neural networks use machine learning algorithms extensively, whereas machine
learning is an application of artificial intelligence that automates analytical model
building by using algorithms that iteratively learn from data without being explicitly
programmed [1].
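The idea of algorithms that "iteratively learn from data" can be shown with the perceptron, a single artificial neuron and one of the earliest neural-network learning rules. This is a minimal sketch, not a deep learning model: it learns the logical OR function from four labeled examples by nudging its weights after each mistake. The learning rate, epoch limit, and function names are assumptions chosen for the illustration.

```python
def predict(weights, bias, x):
    """Step activation: output 1 if the weighted sum exceeds zero, else 0."""
    activation = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1 if activation > 0 else 0

def train_perceptron(samples, labels, learning_rate=0.1, max_epochs=20):
    """Train a single artificial neuron (perceptron) on labeled examples.
    Weights start at zero and are adjusted after every misclassification."""
    weights = [0.0] * len(samples[0])
    bias = 0.0
    for _ in range(max_epochs):
        mistakes = 0
        for x, target in zip(samples, labels):
            error = target - predict(weights, bias, x)  # -1, 0, or +1
            if error:
                mistakes += 1
                weights = [w + learning_rate * error * xi for w, xi in zip(weights, x)]
                bias += learning_rate * error
        if mistakes == 0:  # a full pass with no errors: training has converged
            break
    return weights, bias

samples = [(0, 0), (0, 1), (1, 0), (1, 1)]
labels = [0, 1, 1, 1]  # logical OR
weights, bias = train_perceptron(samples, labels)
print([predict(weights, bias, x) for x in samples])  # [0, 1, 1, 1]
```

Because OR is linearly separable, the perceptron rule is guaranteed to converge; a single perceptron cannot learn a function such as XOR, which is one reason multilayer networks were developed.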
of operations management can be divided into three main areas: (a)
planning, (b) analysis, and (c) control tools. The analysis part is the
prescriptive part, which uses operations research, management science,
and simulation. The control part is used to monitor and control product
and service quality. The prescriptive analytics models are shown in
Figure 1.5.
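Monitoring and controlling quality, the "control" area mentioned above, is commonly done with statistical process control charts. The sketch below computes limits for one standard chart, the individuals (X) chart, estimating process variation from the average moving range divided by the constant d2 = 1.128 (for moving ranges of two consecutive points). The fill-weight readings and function names are hypothetical, and real control charting usually applies additional run rules beyond the simple out-of-limits check shown here.

```python
import statistics

def individuals_chart_limits(measurements):
    """Centre line and 3-sigma limits for an individuals (X) control chart.
    Sigma is estimated as the average moving range divided by d2 = 1.128."""
    moving_ranges = [abs(b - a) for a, b in zip(measurements, measurements[1:])]
    sigma_hat = statistics.mean(moving_ranges) / 1.128
    centre = statistics.mean(measurements)
    return centre - 3 * sigma_hat, centre, centre + 3 * sigma_hat

def out_of_control(measurements, lcl, ucl):
    """Indices of points that fall outside the control limits."""
    return [i for i, x in enumerate(measurements) if x < lcl or x > ucl]

baseline = [10.1, 9.9, 10.0, 10.2, 9.8, 10.0, 10.1, 9.9]  # hypothetical fill weights (kg)
lcl, centre, ucl = individuals_chart_limits(baseline)
print(f"LCL={lcl:.3f}  CL={centre:.3f}  UCL={ucl:.3f}")

new_readings = [10.0, 10.3, 10.8]
print("out-of-control points:", out_of_control(new_readings, lcl, ucl))
```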
Figure 1.6 outlines the descriptive, predictive, and prescriptive
analytics tools together. This flow chart is helpful in outlining the
differences between the tools for each type of analytics and their
details. It also shows the vast areas of business analytics (BA) that come
under the umbrella of business intelligence (BI).
Types of Models
(i) Graphical models, (ii) quantitative models, (iii) algebraic models,
(iv) spreadsheet models, (v) simulation models, (vi) process optimization
models, and (vii) other—predictive and prescriptive models.
The first volume of this book provided the details of descriptive analytics
and outlined the tools of predictive and prescriptive analytics. Predictive
analytics is about predicting future business outcomes. This second volume
is about predictive modeling; it provides the background and the models
used in predictive modeling, with applications and cases. We have explained
the distinction among descriptive, predictive, and prescriptive analytics.
Prescriptive analytics is about optimizing certain business activities. A
complete treatment of the topics in predictive and prescriptive analytics is
not possible in one brief volume; therefore, this Volume II focuses on
predictive modeling.
Summary
Business analytics (BA) uses data, statistical analysis, mathematical and
statistical modeling, data mining, and advanced analytics tools, including
forecasting and simulation, to explore, investigate, and understand
business performance. Through data, BA helps companies gain insight and
drive business planning and decisions. The tools of BA focus on
understanding business performance based on the data and on a number of
models derived from statistics, management science, and different types of
analytics tools.
BA helps companies make informed business decisions and can be used to
automate and optimize business processes. Data-driven companies treat their
data as a corporate asset and leverage it for competitive advantage.
Successful business analytics depends on data quality and on skilled
analysts who understand the technologies. BA is an organizational commitment
to data-driven decision making.
This chapter provided an overview of the field of BA. The tools of BA,
including descriptive, predictive, and prescriptive analytics along with
advanced analytics tools, were discussed. This chapter also introduced a
number of terms related to and used in conjunction with BA. Flow diagrams
outlining the tools of descriptive, predictive, and prescriptive analytics
were presented. This second volume of the business analytics book is a
continuation of the first volume. A preview of this second volume, entitled
Business Analytics: A Data-Driven Decision-Making Approach for Business:
Volume II, is provided in this chapter.
Gartner was credited with the three “Vs” of big data. Gartner’s definition of big
data is as follows: high-volume, high-velocity, and high-variety information assets
that demand cost-effective, innovative forms of information processing that
enable enhanced insight, decision making, and process automation.
Gartner is referring to the size of the data (volume), the speed with which the
data is generated (velocity), and the different types of data (variety); this
seems to align with the combined definitions of Wikipedia and O’Reilly
Media.
Mike Gualtieri of Forrester said that the three “Vs” mentioned by Gartner
are just measures of data. He argued that the following definition is more
actionable:
Big data is the frontier of a firm’s ability to store, process, and access (SPA)
all the data it needs to operate effectively, make decisions, reduce risks, and serve
customers.
Algorithm A mathematical formula or statistical process used to analyze data.
Analytics Involves drawing insights from data, including big data. Analytics
uses simple to advanced tools, depending on the objectives. Analytics may
involve visual display of data (charts and graphs), descriptive statistics, making
predictions, forecasting future outcomes, or optimizing business processes. A
more recent term is Big Data Analytics, which involves making inferences using
very large data sets. Thus, analytics can take different forms depending on the
objectives and the decisions to be made: descriptive, predictive, or
prescriptive analytics. These are briefly described here.
Descriptive Analytics If you are using charts and graphs or time series plots to
study demand or sales patterns, or the trend of the stock market, you are
using descriptive analytics. Calculating statistics from the data, such as the
mean, variance, median, or percentiles, is also an example of descriptive
analytics. Some recent software is designed to create dashboards that are useful
in analyzing business outcomes; these dashboards are examples of descriptive
analytics. Of course, many more details can be extracted from the data by
plotting and performing simple analyses.
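Studying a sales trend from a time series plot, as described above, often starts with smoothing the series. A trailing moving average is one simple way to do this; the sketch below uses hypothetical monthly figures and a hypothetical helper name, `moving_average`, and is not a method prescribed by the text.

```python
def moving_average(series, window):
    """Trailing moving average: the value for period i averages periods
    i - window + 1 through i, so the first window - 1 periods have no value."""
    if not 1 <= window <= len(series):
        raise ValueError("window must be between 1 and len(series)")
    return [sum(series[i - window + 1 : i + 1]) / window
            for i in range(window - 1, len(series))]

monthly_units = [100, 120, 110, 130, 150, 140]  # hypothetical monthly sales
print(moving_average(monthly_units, 3))  # [110.0, 120.0, 130.0, 140.0]
```

The smoothed series rises steadily even though the raw figures go up and down, which is the kind of trend a time series plot is meant to reveal. A longer window smooths more but reacts more slowly to genuine changes.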
Predictive Analytics As the name suggests, predictive analytics is about predicting
future business outcomes. It also involves forecasting demand, sales, and
profits for a company. The commonly used techniques for predictive analytics are
different types of regression and forecasting models. Some advanced techniques
are data mining, machine learning, neural networks, and advanced statistical
models. We will discuss regression and forecasting techniques, as well as the
related terms, later in this book.
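As a minimal illustration of regression used for forecasting, the sketch below fits a least-squares straight-line trend to a short sales series and extends it one period ahead. The data and helper names are hypothetical, and a linear trend is only the simplest of the regression and forecasting models the text refers to; it ignores seasonality and gives no measure of forecast uncertainty.

```python
def fit_linear_trend(y):
    """Least-squares line y = a + b*t through periods t = 1, 2, ..., n."""
    n = len(y)
    t = list(range(1, n + 1))
    t_mean = sum(t) / n
    y_mean = sum(y) / n
    sxy = sum((ti - t_mean) * (yi - y_mean) for ti, yi in zip(t, y))
    sxx = sum((ti - t_mean) ** 2 for ti in t)
    slope = sxy / sxx
    intercept = y_mean - slope * t_mean
    return intercept, slope

def forecast(intercept, slope, period):
    """Value of the fitted trend line at the given period."""
    return intercept + slope * period

quarterly_sales = [20, 22, 25, 27, 30]  # hypothetical, in $ millions
a, b = fit_linear_trend(quarterly_sales)
print(f"trend: sales = {a:.2f} + {b:.2f} * t")
print(f"forecast for period 6: {forecast(a, b, 6):.2f}")
```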
Prescriptive Analytics Prescriptive analytics involves analyzing the results of
predictive analytics and “prescribing” the best category to target in order to
minimize or maximize the objective(s). It builds on predictive analytics and often
suggests the best course of action, leading to the best possible solution. It is
about optimizing (maximizing or minimizing) an objective function. The tools of
prescriptive analytics are now used with big data to make data-driven decisions by
selecting the best course of action involving multicriteria decision variables.
Some examples of prescriptive analytics models are linear and nonlinear
optimization models, different types of simulations, and others.
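A linear optimization model can be shown with a small product-mix problem. The sketch assumes two products with hypothetical profit margins of 40 and 30 per unit and three hypothetical resource limits. It uses vertex enumeration, which works only for tiny two-variable problems with a bounded optimum; practical models use simplex or interior-point solvers, for example SciPy's `scipy.optimize.linprog`.

```python
from itertools import combinations

def solve_2d_lp(objective, constraints, tol=1e-9):
    """Maximize c1*x + c2*y subject to a*x + b*y <= rhs for each constraint.
    For a feasible, bounded two-variable LP the optimum lies at a vertex of the
    feasible region, so intersect every pair of constraint boundaries, keep the
    feasible intersection points, and return the best one."""
    best_point, best_value = None, None
    for (a1, b1, r1), (a2, b2, r2) in combinations(constraints, 2):
        det = a1 * b2 - a2 * b1
        if abs(det) < tol:  # parallel boundaries never meet in a single point
            continue
        # Cramer's rule for a1*x + b1*y = r1 and a2*x + b2*y = r2.
        x = (r1 * b2 - r2 * b1) / det
        y = (a1 * r2 - a2 * r1) / det
        if all(a * x + b * y <= r + tol for a, b, r in constraints):
            value = objective[0] * x + objective[1] * y
            if best_value is None or value > best_value:
                best_point, best_value = (x, y), value
    return best_point, best_value

# Each constraint (a, b, rhs) means a*x + b*y <= rhs (hypothetical limits).
constraints = [
    (2, 1, 100),  # machine hours
    (1, 1, 80),   # labor hours
    (1, 0, 40),   # market limit on product X
    (-1, 0, 0),   # x >= 0
    (0, -1, 0),   # y >= 0
]
point, profit = solve_2d_lp((40, 30), constraints)
print(point, profit)  # (20.0, 60.0) 2600.0
```

The "prescription" here is the production plan itself: make 20 units of X and 60 units of Y, with both the machine-hour and labor-hour limits fully used.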
Data Mining Data mining involves finding meaningful patterns and deriving
insights from large data sets. It is closely related to analytics. Data mining uses
statistics, machine learning, and artificial intelligence techniques to derive
meaningful patterns.
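One classic kind of meaningful pattern is an association rule from market-basket analysis, such as "customers who buy bread also tend to buy butter." The sketch below is a simplified, pairs-only version of this idea that measures each rule by its support (how often the two items occur together) and confidence (how often the right-hand item appears given the left-hand item). The baskets, thresholds, and function name are hypothetical; full algorithms such as Apriori also handle larger item sets.

```python
from itertools import combinations

def pair_rules(transactions, min_support=0.3, min_confidence=0.6):
    """Find rules lhs -> rhs between pairs of items.
    support(A, B)      = share of baskets containing both A and B
    confidence(A -> B) = support(A, B) / support(A)"""
    n = len(transactions)
    baskets = [set(t) for t in transactions]
    items = sorted(set().union(*baskets))

    def support(*itemset):
        return sum(1 for b in baskets if set(itemset) <= b) / n

    rules = []
    for a, b in combinations(items, 2):
        s_ab = support(a, b)
        if s_ab < min_support:  # the pair is too rare to be interesting
            continue
        for lhs, rhs in ((a, b), (b, a)):
            confidence = s_ab / support(lhs)
            if confidence >= min_confidence:
                rules.append((lhs, rhs, s_ab, confidence))
    return rules

baskets = [  # hypothetical shopping baskets
    ["bread", "milk"],
    ["bread", "butter", "milk"],
    ["bread", "butter"],
    ["milk", "eggs"],
    ["bread", "butter", "milk", "eggs"],
]
for lhs, rhs, support_ab, confidence in pair_rules(baskets):
    print(f"{lhs} -> {rhs}: support={support_ab:.2f}, confidence={confidence:.2f}")
```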
Analytical Models The most commonly used models that are part of descriptive,
predictive, or prescriptive analytics are graphical models, quantitative models,
algebraic models, spreadsheet models, simulation models, process models,
and other analytic models, such as predictive and prescriptive models.
IoT Stands for Internet of Things. It refers to the interconnection, via the
Internet, of computing devices embedded in objects (sensors, cars, fridges, etc.)
that are capable of sending or receiving data. IoT devices generate huge
amounts of data, providing opportunities for big data applications and data
analytics.
Machine Learning Machine learning is a method of designing systems that can
learn, adjust, and improve based on the data fed to them. Machine learning works
on the basis of predictive and statistical algorithms that are provided to these
machines. The algorithms are designed to learn and improve as more data flow
through the system. Fraud detection, e-mail spam filtering, and GPS systems are
some examples of machine learning applications.
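E-mail spam filtering, one of the applications named above, is a convenient way to show a model that improves as more data flow through the system. The sketch uses multinomial naive Bayes with add-one (Laplace) smoothing, one common textbook technique for this task; it is not a description of how any particular product works. The class name, tokenizer, and training messages are hypothetical.

```python
import math
import re
from collections import Counter

class NaiveBayesSpamFilter:
    """Multinomial naive Bayes with add-one smoothing.
    Each call to learn() updates the counts, so the model keeps adapting
    as more labeled messages arrive."""

    def __init__(self):
        self.word_counts = {"spam": Counter(), "ham": Counter()}
        self.message_counts = Counter()

    @staticmethod
    def tokenize(text):
        return re.findall(r"[a-z']+", text.lower())

    def learn(self, text, label):
        self.message_counts[label] += 1
        self.word_counts[label].update(self.tokenize(text))

    def score(self, text, label):
        """Log of P(label) times the product of smoothed P(word | label).
        Assumes both labels have at least one training message."""
        vocab = set(self.word_counts["spam"]) | set(self.word_counts["ham"])
        total_words = sum(self.word_counts[label].values())
        log_prob = math.log(self.message_counts[label] / sum(self.message_counts.values()))
        for word in self.tokenize(text):
            count = self.word_counts[label][word]
            log_prob += math.log((count + 1) / (total_words + len(vocab)))
        return log_prob

    def classify(self, text):
        return max(("spam", "ham"), key=lambda label: self.score(text, label))

spam_filter = NaiveBayesSpamFilter()
for text in ["win a free prize now", "free money claim now"]:
    spam_filter.learn(text, "spam")
for text in ["meeting agenda for monday", "lunch with the team on monday"]:
    spam_filter.learn(text, "ham")
print(spam_filter.classify("claim your free prize"))   # spam
print(spam_filter.classify("team meeting on monday"))  # ham
```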
R “R” is a programming language for statistical computing. It is a popular
language in data science.
Structured vs. Unstructured Data These terms relate to the “volume” and
“variety” Vs of big data. Structured data is data that can be stored in
relational databases; it can be organized and analyzed in a way that relates it
to other data via tables. Unstructured data cannot be put directly into
databases, nor can it be analyzed or organized directly. Examples include
e-mail and text messages, social media posts, and recorded human speech.
CHAPTER 2
Business Analytics and Business Intelligence
Chapter Highlights
• Business Analytics and Business Intelligence—Overview
• Types of Business Analytics and Their Objectives
• Input to Business Analytics, Types of Business Analytics, and
Their Purpose
• Business Intelligence and Business Analytics: Differences
• Business Intelligence and Business Analytics: A Comparison
• Summary
are used interchangeably in the literature and are related to each other.
Analytics is a more general term; it is about analyzing data using data
visualization and statistical modeling to help companies make effective
business decisions. The tools used in analytics, BA, and BI often overlap.
The overall analytics process includes descriptive analytics, which involves
processing and analyzing big data, applying statistical techniques
(numerical methods of describing data, such as measures of central
tendency, measures of variation, etc.), and statistical modeling to describe
the data. Analytics also uses predictive analytics methods, such as
regression, forecasting, and data mining, and the prescriptive analytics
tools of management science and operations research. All these tools help
businesses make informed business decisions. The analytics tools are also
critical in automating and optimizing business processes.
The types of analytics are divided into different categories. According to
the Institute for Operations Research and the Management Sciences
(INFORMS, www.informs.org), the field of analytics is divided into three
broad categories: descriptive, predictive, and prescriptive. We discussed
each of the three categories along with the tools used in each one. The
tools used in analytics may overlap, and the use of one or another type of
analytics depends on the applications. A firm may use only descriptive
analytics tools or a combination of descriptive and predictive analytics,
depending on the types of applications, analyses, and decisions it
encounters.
tical tools.
• Prerequisite for predictive modeling: (a) probability and probability
distributions and their role in decision making, (b) sampling and inference
procedures, (c) estimation and confidence intervals, (d) hypothesis
testing/inference procedures for one and two population parameters, and
(e) chi-square and nonparametric tests.
• Other tools of predictive analytics: machine learning, artificial
intelligence, neural networks, and deep learning (discussed later).
Figure 2.4 Comparing business intelligence (BI) and business analytics (BA)
the business. The information about what went wrong or what is happening
in the business provides opportunities for improvement.
BI may be seen as the descriptive part of data analysis, but when combined with
other areas of analytics (predictive, advanced, and data analytics), it provides a
powerful combination of tools. These tools enable analysts and data scientists to
examine the business data and the current state of the business, and to use
predictive, prescriptive, and data analytics tools, as well as the powerful tools
of data mining, to guide an organization in business planning, predict future
outcomes, and make effective data-driven decisions.
The flow chart in Figure 2.4 also outlines the purpose of a BA program and
briefly mentions the tools and objectives of BA. The different types of analytics
and their tools were discussed earlier and are shown in Table 2.2.
The terms business analytics (BA) and business intelligence (BI) are used
interchangeably, and often the tools are combined and referred to as a business
analytics or business intelligence program. Figure 2.5 shows the tools of BI and
BA. Note that the tools overlap in the two areas; some of these tools are common
to both.
Figure 2.5 Business intelligence (BI) and business analytics (BA) tools
Summary
This chapter provided an overview of business analytics (BA) and business
intelligence (BI) and outlined the similarities and differences between them. BA,
the different types of analytics (descriptive, predictive, and prescriptive), and
the overall analytics process were explained using a flow diagram. The input to
the analytics process and the types of questions each type of analytics attempts
to answer, along with their tools, were discussed in detail. The chapter also
discussed BI. The tools used in each type of analytics (descriptive, predictive,
and prescriptive) and their relationships were described. The tools of analytics
overlap in applications, and in many cases, a combination of these tools is used.
The interconnections among the different types of analytics tools were explained.
Finally, BI and BA were compared. BA, data analytics, and advanced analytics
fall under the broad area of BI. The broad scope of BI and the distinction
between BI and BA tools were outlined.
CHAPTER 3
Analytics, Business
Analytics, Data Analytics,
and How They Fit into the
Broad Umbrella of Business
Intelligence
Chapter Highlights
• Introduction: Analytics, Business Analytics, and Data Analytics
  ◦ Analytics
  ◦ Business Analytics
• Business Intelligence—Defined
  ◦ Process Mining
  ◦ Web Analytics
  ◦ Financial Analytics
• Advanced Analytics
• BI Programs in Companies
• Specific Areas of BI Applications in an Enterprise
• Success Factors for BI Applications
• Comparing BI with BA
• Difference between BA and BI
• Glossary of Terms Related to Business Intelligence
• Summary
Analytics
(BA) goes beyond simply presenting data and creating visuals, crunching numbers,
and computing statistics. The essence of analytics lies in the application: making
sense of the data using prescribed statistical methods, tools, and logic to draw
meaningful conclusions from it. It uses logic, learning, intelligence, and mental
models that enable us to reason, organize, analyze, solve problems, understand the
data, learn, and make data-driven decisions.
Business Analytics
Business analytics (BA) covers a vast area. It is a complex field that
encompasses visualization, statistics, statistical analysis, and modeling. It
uses descriptive, predictive, and prescriptive analytics, including text and
speech analytics, web analytics, decision processes, and much more.
Before the data can be used effectively for analysis, the following data
preparation steps are essential:
1. Data cleansing
2. Scripting
3. Data transformation
4. Data warehousing
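Data cleansing and data transformation can be illustrated with a short script, which is itself an example of the scripting step. The sketch below is a minimal example under assumed conditions: a hypothetical CSV export containing stray whitespace, inconsistent capitalization, a duplicate row, and a missing value. The function names are hypothetical, and loading the cleaned records into a data warehouse is not shown.

```python
import csv
import io

# Hypothetical raw export with typical problems: stray whitespace,
# inconsistent capitalization, a duplicate row, and a missing amount.
RAW = """order_id,region,amount
1001, East ,250.00
1002,west,175.50
1002,west,175.50
1003,NORTH,
1004,South,320.25
"""

def cleanse(rows):
    """Data cleansing: trim whitespace, drop exact duplicates and rows missing an amount."""
    seen, cleaned = set(), []
    for row in rows:
        row = {key: value.strip() for key, value in row.items()}
        signature = tuple(row.values())
        if signature in seen or not row["amount"]:
            continue
        seen.add(signature)
        cleaned.append(row)
    return cleaned

def transform(rows):
    """Data transformation: convert types and standardize category labels."""
    return [
        {"order_id": int(r["order_id"]),
         "region": r["region"].title(),
         "amount": float(r["amount"])}
        for r in rows
    ]

records = transform(cleanse(csv.DictReader(io.StringIO(RAW))))
for record in records:
    print(record)
```

Dropping the incomplete row is only one policy; depending on the analysis, a missing amount might instead be flagged for follow-up or filled from another source.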
• Accuracy
• Completeness
• Update status
• Relevance
• Consistency across data sources
• Reliability
• Appropriate presentation
• Accessibility
Business Intelligence—Defined
According to David Loshin, business intelligence (BI) is “…the processes,
technologies and tools needed to turn data into information, information into
knowledge, and knowledge into plans that drive profitable business actions.”
According to Larissa Moss, BI is “… an architecture and a collection of
integrated operational as well as decision-support applications and databases
that provide the business community easy access to business data.”
BI is a technology-driven process for processing and analyzing data to make sense
of the huge quantities of data that businesses collect and obtain from various
sources. In a broad sense, BI is both visualization and analytics. The purpose of
visualization, or graphic presentation of data, is to obtain meaningful and useful
information that helps management, business managers, and other end users make
better-informed business decisions. BI uses a wide variety of tools, applications,
and methodologies that enable organizations to collect data from internal systems
and processes as well as from external sources. The collected data may be both
structured and unstructured. The first challenge is to prepare the data to run
queries, perform analysis, and create reports.
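Running queries and creating reports on prepared data, as described above, is commonly done with SQL. The sketch below uses Python's built-in `sqlite3` module and an in-memory database so that it is self-contained; the `sales` table, its columns, and the figures are hypothetical stand-ins for data collected from internal systems.

```python
import sqlite3

# Hypothetical sales table standing in for data collected from internal systems.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, product TEXT, revenue REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [("East", "A", 1200.0), ("East", "B", 800.0),
     ("West", "A", 950.0), ("West", "B", 1500.0), ("North", "A", 400.0)],
)

# A typical reporting query: total revenue by region, largest first.
report = conn.execute(
    """SELECT region, SUM(revenue) AS total_revenue, COUNT(*) AS n_rows
       FROM sales
       GROUP BY region
       ORDER BY total_revenue DESC"""
).fetchall()
conn.close()

for region, total, n_rows in report:
    print(f"{region:<6} {total:>9.2f}  ({n_rows} rows)")
```

The same query could feed a chart or a dashboard tile; in production BI systems the table would typically live in a data warehouse rather than an in-memory database.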
One of the major tasks is to create dashboards and other forms of
technology and computing power, visuals and data dashboards are commonly
used in business reporting.
The BI tools, technologies, and technical architectures are used in the
collection, analysis, presentation, and dissemination of business information.
The analysis of business data provides historical, current, and future views of
business performance. Specialized data analysis software that is capable of
processing and analyzing big data is now available. Such software can create
multiple views of business performance in the form of dashboards, which are
extremely helpful in displaying current business performance. Big data software
is now being used to analyze vast amounts of data and is extremely helpful in the
decision-making process. Besides data visualization, a number of models described
earlier are used to predict and optimize future business outcomes.
Reporting
Data mining involves discovering new patterns and relationships in the
collected data. Data mining is a part of predictive analytics. It involves
processing and analyzing huge amounts of data to extract useful information
and patterns hidden in the data. The overall goal of data min-
Process Mining
deriving patterns within the structured data, and finally evaluation and
interpretation of the output. “High quality” in text mining usually refers
to a combination of criteria, including relevance (how well a retrieved
document or set of documents meets the information need of the user).
Typical text mining tasks include text categorization, text clustering [1],
concept/entity extraction, production of granular taxonomies, sentiment
analysis, document summarization, and entity relation modeling (i.e.,
learning relations between named entities).
Text analysis involves information retrieval, lexical analysis to study
word frequency distributions, pattern recognition, information extraction,
data mining techniques including link and association analysis, visualiza-
tion, and predictive analytics. The overall goal is to transform text into data
for analysis using natural language processing (NLP) [2] and analytical
methods.
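As a rough illustration of the lexical-analysis step (studying word frequency distributions), the Python sketch below counts word tokens across a few short documents. The review texts and the function name `word_frequencies` are hypothetical, and the regular-expression tokenizer is a deliberate simplification; an NLP toolkit would normally also handle stop words, stemming, and punctuation more carefully.

```python
import re
from collections import Counter


def word_frequencies(documents, top_n=5):
    """Return the top_n most frequent word tokens across all documents."""
    counts = Counter()
    for doc in documents:
        # Crude lexical analysis: lowercase, then keep runs of letters/apostrophes.
        tokens = re.findall(r"[a-z']+", doc.lower())
        counts.update(tokens)
    return counts.most_common(top_n)


# Hypothetical customer comments (not from the book)
reviews = [
    "Fast shipping and great price.",
    "Great product, slow shipping.",
    "Shipping was fast; the price was great.",
]

print(word_frequencies(reviews, top_n=3))
```

Here "shipping" and "great" each occur three times, which is the kind of frequency signal that later steps such as text categorization or sentiment analysis build on.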
A typical application is to scan a set of documents written in a natural
language, also known as an ordinary language: any language that has
evolved naturally in humans through use and repetition without conscious
planning or premeditation. Natural languages can take different forms,
such as speech or signing (sign language). They are distinguished from
constructed and formal languages, such as those used to program computers
or to study logic [17].
Text Analytics
The term text analytics describes a set of linguistic applications (the sci-
sources internal to the business such as financial and operations data
(internal data). When combined, external and internal data can provide a
more complete picture, which, in effect, creates an “intelligence” that
cannot be derived from any single set of data [3].
BI, along with BA, empowers organizations to gain a better understanding
of existing markets and customer behavior. BI tools are being used to
study markets, analyze massive amounts of data to learn about customer
behavior, conduct risk analysis, assess the demand for and suitability of
products and services in different market segments, and predict and
optimize business processes, to name a few uses [10–12].
Summary
This chapter discussed analytics, business analytics (BA), data analyt-
ics (DA), and business intelligence (BI) as decision-making tools in
CHAPTER 4
Descriptive Analytics—
Overview, Applications,
and a Case
Chapter Highlights
• Overview: Descriptive Analytics
• Descriptive Analytics—Applications—A Business
Analytics Case
• Case Study: Buying Pattern of Online Customers in a Large
Department Store
• Summary
Analytics Case
A case analysis showing different aspects of descriptive analytics is
presented here. The case demonstrates the graphical and numerical analyses
performed on an online order database of a retail store and is described
below.
Figure 4.1 Tools and methods of descriptive analytics
customers placing orders online. As the orders are placed, customer
information is recorded in the database. Data on several categorical and
numerical variables are recorded. The categorical variables shown in the
data file are day of the week, time (morning, midday, etc.), payment type
(credit card, debit card, etc.), region of the country from which the order
was placed, order volume, sale or promotion item, free shipping offer,
gender, and customer survey rating. The quantitative variables include
order quantity and the dollar value of the order placed, or “Total Orders.”
Table 4.1 shows part of the data.
The operations manager of the store wants to understand the buying
pattern of the customers by summarizing and displaying the data visually
and numerically. He believes that descriptive analytics tools, including
data visualization tools, numerical methods, graphical displays,
dashboards, and tables of the collected data, can provide more insight
into the online order process and reveal opportunities for improving it.
The manager hired an intern and gave her the responsibility to pre-
pare a descriptive analytics summary of the customer data using graphical
and numerical tools that can help understand the buying pattern of the
customers and help improve the online order process to attract more on-
line customers to the store.
The intern was familiar with one of the tools available in Excel, the
Pivot Table/Pivot Chart, which she thought could be used to extract
information from a large database. In this case, pivot tables can break
the data down by category so that useful insight can be obtained. For
example, this tool can create a table of orders received by geographical
region or summarize the orders by day of the week or time of day. She
analyzed the data to answer the questions and concerns the manager had
expressed in the meeting. As part of the analysis, the following graphs,
tables, and numerical analyses were produced.
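The intern's pivot tables were built in Excel. For readers working outside Excel, a minimal Python sketch of the same "Count of" and "Sum of" summaries is shown below, using only the standard library and the six Table 4.1 rows reproduced in this section. The field names (`day`, `time`, `payment`, `region`, `total`, `gender`, `rating`) and the helper functions are hypothetical labels of our own, not the book's; with only six rows, the results are illustrative and say nothing about the full 500-order data set.

```python
from collections import Counter, defaultdict, namedtuple

# Hypothetical field names for the Table 4.1 columns used here; the order
# volume, quantity, promotion and free-shipping columns are left out.
Order = namedtuple("Order", "day time payment region total gender rating")

orders = [
    Order("Wed", "Afternoon", "MasterCard", "South", 215.69, "Male", "Excellent"),
    Order("Wed", "Afternoon", "Store Card", "South", 80.89, "Male", "Excellent"),
    Order("Thurs", "Afternoon", "MasterCard", "South", 184.19, "Male", "Good"),
    Order("Fri", "Afternoon", "Store Card", "South", 181.28, "Male", "Good"),
    Order("Fri", "Afternoon", "MasterCard", "South", 158.96, "Male", "Poor"),
    Order("Fri", "Afternoon", "Store Card", "South", 198.28, "Male", "Poor"),
]


def count_pivot(rows, field):
    """'Count of' pivot: number of orders in each category of `field`."""
    return Counter(getattr(r, field) for r in rows)


def sum_pivot(rows, field, value="total"):
    """'Sum of' pivot: total of `value` (dollars by default) per category."""
    sums = defaultdict(float)
    for r in rows:
        sums[getattr(r, field)] += getattr(r, value)
    return {k: round(v, 2) for k, v in sums.items()}


def cross_tab(rows, row_field, col_field):
    """Two-way count table with row_field as row labels (cf. Table 4.3)."""
    table = defaultdict(Counter)
    for r in rows:
        table[getattr(r, row_field)][getattr(r, col_field)] += 1
    return {row: dict(cols) for row, cols in table.items()}


print(count_pivot(orders, "day"))
print(sum_pivot(orders, "day"))
print(cross_tab(orders, "gender", "rating"))
```

Swapping "day" for "time", "region", or "payment" gives the other one-way breakdowns used in the analysis.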
1. A pivot table summarizing the number of orders received on each day of
the week was created, along with a bar chart and a pie chart of the pivot
table, to show visually the orders received by the online department on
each day (Figures 4.2 and 4.3). The table and graphs show that the largest
numbers of orders were received on Saturday and Sunday.
Wed Afternoon MasterCard South High 7 1 Yes 215.69 Male Excellent
Wed Afternoon Store Card South Low 3 0 No 80.89 Male Excellent
Thurs Afternoon MasterCard South High 8 0 Yes 184.19 Male Good
Fri Afternoon Store Card South Medium 4 1 Yes 181.28 Male Good
Fri Afternoon MasterCard South Medium 4 1 Yes 158.96 Male Poor
Fri Afternoon Store Card South Medium 4 1 Yes 198.28 Male Poor
2. Table 4.2 and Figure 4.4 show the number of orders by time of day
(morning, midday, etc.). A bar chart and a pie chart of the pivot table were
created to show visually the orders received online by time of day. The
pie chart shows both the count and the percentage for each category. The
table and the pie chart indicate that the most orders are placed during
the night hours.
3. Orders by region: The bar chart and the pie chart (Figures 4.5 and 4.6)
summarize the number of orders by region. These plots show that the
largest numbers of orders were received from the North and South regions,
so marketing efforts are needed to target the other regions.
4. A pivot table (Table 4.3) and a bar graph (Figure 4.7) were created to
summarize customer ratings by gender; the row labels show “Gender” and
the column labels show the count of “Customer Survey Ratings” (excellent,
good, fair, poor). A bar chart with the count of “Customer Survey Ratings”
on the y-axis and gender on the x-axis is shown below the table. This
information captured customer opinion and was important for evaluating
and improving the process.
Row Labels    Excellent   Fair   Good   Poor   Grand Total
Female               25     48     45     38           156
Male                 89     62    110     83           344
Grand Total         114    110    155    121           500
5. The descriptive statistics of “total orders ($)” were calculated and are
displayed in Table 4.4 and the plot below. The statistics show the
measures of central tendency and the measures of variation, along with
other useful statistics of the total orders.
Variable          N   N*    Mean  SE Mean  StDev  Minimum      Q1  Median      Q3  Maximum
Total order ($) 500    0  223.87     3.77  84.23    30.09  167.95  252.62  287.54   371.40
clude that the total orders data are left skewed, so Chebyshev’s rule,
rather than the empirical rule, should be applied. Chebyshev’s rule applies
to any distribution, symmetrical or skewed, and relates the mean and
standard deviation to provide more insight. Because it is so general, it
gives only lower bounds: for example, at least 75% of the values lie within
two standard deviations of the mean. More definite conclusions can be drawn
using the other widely used rule, known as the empirical rule, which applies
to symmetrical or normal distributions. This rule also relates the mean and
standard deviation of the data, stating that approximately 68%, 95%, and
99.7% of the values lie within one, two, and three standard deviations of
the mean, respectively.
7. If the total orders data can be assumed to be approximately symmetri-
cal, what conclusions can we draw about the “total orders” (Figure 4.8)
received? Use the mean and standard deviation calculated in part (5).
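A short worked sketch of item 7, together with the Chebyshev comparison from item 6, plugs the Table 4.4 mean ($223.87) and standard deviation ($84.23) into both rules. The empirical-rule shares rely on the approximate-symmetry assumption that item 7 makes; the Chebyshev bounds hold for any distribution. The helper names are illustrative only.

```python
MEAN, SD = 223.87, 84.23  # total orders ($), from Table 4.4

# Approximate shares within k standard deviations for symmetric,
# bell-shaped (approximately normal) data: the empirical rule.
EMPIRICAL_SHARE = {1: 0.68, 2: 0.95, 3: 0.997}


def interval(mean, sd, k):
    """Return (mean - k*sd, mean + k*sd), rounded to cents."""
    return round(mean - k * sd, 2), round(mean + k * sd, 2)


def chebyshev_min_share(k):
    """Chebyshev's rule: for any distribution, at least 1 - 1/k**2 of the
    values lie within k standard deviations of the mean (0 when k <= 1)."""
    return max(0.0, 1 - 1 / k**2)


for k in (1, 2, 3):
    low, high = interval(MEAN, SD, k)
    print(
        f"mean ± {k} SD: {low:.2f} to {high:.2f} | "
        f"empirical rule ≈ {EMPIRICAL_SHARE[k]:.1%} | "
        f"Chebyshev: at least {chebyshev_min_share(k):.1%}"
    )
```

Under the symmetry assumption, roughly 68% of orders would fall between $139.64 and $308.10 and roughly 95% between $55.41 and $392.33, while Chebyshev's rule guarantees only that at least 75% fall in the latter range. The negative lower limit at three standard deviations (−$28.82), an impossible order value, is a reminder that symmetry is only an approximation for these left-skewed data.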
Figure 4.8 Graphical summary of the total orders data
Figure 4.9 A dashboard of online orders data
gain insight and learn from the data. These tools help us understand what
has happened in the past, which is very helpful in predicting future
business outcomes. Predictive analytics tools help answer these questions.
The rest of the book explores predictive analytics tools and applications.
Summary
In this chapter, we provided a brief description of descriptive analytics
and a case to illustrate the tools and applications of visual techniques
used in descriptive analytics. Descriptive analytics is critical in studying
the current state of the business and in learning, from the company’s
data, what has happened in the past. This knowledge lays a foundation for
further analysis and is used to create predictive analytics models.
The subsequent chapters discuss predictive analytics and the background
information needed for it, along with the analytical tools. Specific
predictive analytics models and their applications are the topics of the
chapters that follow; the rest of this book mostly covers predictive
analytics.
CHAPTER 5
Chapter Highlights
• What Is Predictive Analytics and How Is It Different from
Descriptive Analytics?
• Exploring the Relationships between the Variables—Qualitative
Tools
• An Example of Logic-Driven Model—Cause-and-Effect
Diagram
• Data-Driven Predictive Models and Their Applications—
Quantitative Models
• Prerequisites and Background for Predictive Analytics
• Summary
and explaining data to reveal trends and patterns and to obtain information
not otherwise apparent. The objective is to obtain useful information that
can help organizations achieve their goals. Predictive analytics is about
identifying future business trends and creating and describing predictive
models to explore those trends and relationships. While descriptive
analytics tools are useful in visualizing some of the trends and
relationships among the variables, predictive analytics provides
information on what types of predictive models can be used to predict
future business outcomes.
Figure 5.1 Logic-driven model of predictive analytics
Table 5.1 outlines the statistical tools, their brief description, and ap-
plication areas of predictive analytics models.
The next chapter discusses the details of the above data-driven predic-
tive models with applications.
probability of success of the outcome.
Statistical Tools and Models Brief Description Application Areas
Probability distributions: Discrete and continuous probability distributions
Although probabilities are the way of dealing with … data is usually
described using a normal distribution, whereas customer arrivals or calls
coming to a call center are random and can be modeled using a Poisson
distribution.
Computer simulation is often used to study the … clinic, etc.
…analysis. It tells us that if we take a large sample, that is, a sample
size of 30 or more (n ≥ 30), we can use the normal distribution to
calculate the probability and draw conclusions about the population
parameter.
… samples? Why do we need to have a homogeneous sample? What are the
different ways of taking samples? What is a sampling distribution, and what
is its purpose?
This is an example of using a sample of voters to predict the proportion
of voters in the population who favor certain candidates.
In data analysis, sample data is used to draw conclusions about the
population. … known, and therefore, we must use the sample data to
estimate them.
A population is described by its parameters (population parameters) and a
sample is described … methods of data analysis and statistical quality
control. Here we explain the concept of estimation.
The first example studied the mean, whereas the poll example is about
studying a proportion. … adults, the margin of sampling error is ±5.0
percentage points at the 95% confidence level.
Estimation is the simplest form of inferential statistics, in which a
sample statistic is used to draw …
Parts of the claims made here may not make any sense, and perhaps you are
wondering about some …
… may be evaluated and a conclusion can be reached about the validity of
this claim. A hypothesis may test a claim, a design specification, a
belief, or a theory, and sample data are used to verify these.
… consistent with or supported by the sample data.
The following are the claims made by some of the manufacturers of hybrid
cars: Toyota Prius claims to provide about 50 mpg in the city and 48 mpg on
the highway. It tops the list of fuel-efficient hybrids. The estimated
annual fuel cost is less than $800. Ford Fusion Hybrid claims to provide
41 mpg in the city and 36 mpg on the highway. The average …
… tells us how weak or strong the relationship is between variables.
… and its value is between −1 and +1. The rxy value tells us the degree of
association between the two variables. It also tells us how strong or weak
the correlation is between the two variables.
… sales and advertisement expenditures for a company. Similarly, we can
study the relationship between home-heating cost and average temperature
using correlation analysis. Usually, correlation analysis starts by
constructing a scatter plot.
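The correlation coefficient described above can be computed directly from its definition, rxy = Σ(x − x̄)(y − ȳ) / √[Σ(x − x̄)² Σ(y − ȳ)²]. The Python sketch below does this for sales and advertising expenditures; the six data points and the function name `pearson_r` are made up for illustration and are not from the book. As the text suggests, a scatter plot should normally be inspected first, because r measures only the strength of a linear association.

```python
import math


def pearson_r(x, y):
    """Sample correlation coefficient r_xy, which always lies in [-1, +1]."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined when either variable is constant")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical monthly data (not from the book), both in $000
advertising = [10, 12, 15, 17, 20, 22]
sales = [110, 118, 135, 140, 158, 165]

print(f"r_xy = {pearson_r(advertising, sales):.3f}")
```

A value near +1 indicates a strong positive linear association, a value near −1 a strong negative one, and a value near 0 little linear association.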
Summary
Predictive analytics is about predicting future business outcomes. This
phase of analytics uses a number of models that can be divided into
logic-driven models and data-driven models. We discussed both types of
models and the difference between the two. The key aim of this chapter was
to introduce readers to a number of tools and statistical models, an
understanding of which is critical to understanding and applying
predictive analytics models. This is background information, and we call
these topics prerequisites to predictive analytics. The chapter provided a
brief description and the application areas of these prerequisite tools:
probability concepts, probability distributions, sampling and sampling
distributions, correlation analysis, estimation and confidence intervals,
and hypothesis testing. These topics are investigated in detail in the
Appendix that accompanies this book, which is available as a free
download.
Appendix A–D
The appendix contains the topics that are prerequisite to data-driven
predictive analytics models. The concepts discussed here are essential in
applying predictive models. Appendices A–D discuss the following
statistical tools and models of business analytics: the concept of
probability, the role of probability distributions in decision making,
sampling and sampling distributions, inference procedures for estimation
and confidence intervals, and inference procedures for one- and
two-population parameters (hypothesis testing).
Note: The following chapters discuss predictive analytics models: regression
analysis, modeling, time series forecasting, and data mining.
CHAPTER 6
Chapter Highlights
• Key Predictive Analytics Models and Their Brief Description
and Applications
• Regression Models
• Forecasting Models
• Analysis of Variance (ANOVA)
• Data Mining
• Simple Regression, Multiple Regression, Nonlinear Regression
• Forecasting Models
• Summary
Table 6.1 outlines key predictive analytics tools, the types of questions
they try to answer, and briefly explains the applications of the tools.
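Two of the time series methods listed in Table 6.1, the simple moving average and simple exponential smoothing, are easy to sketch. In the Python example below, the demand figures are hypothetical, and the smoothing constant alpha = 0.3 and the choice to start the smoothed series at the first observation are assumptions (other initializations are common). Trend and seasonal variants are not shown.

```python
def moving_average_forecast(series, window):
    """Simple moving average: the forecast for the next period is the mean
    of the last `window` observations."""
    if not 0 < window <= len(series):
        raise ValueError("window must be between 1 and len(series)")
    return sum(series[-window:]) / window


def exponential_smoothing_forecast(series, alpha):
    """Simple exponential smoothing, F(t+1) = alpha*Y(t) + (1 - alpha)*F(t).
    Assumption: the first forecast is set equal to the first observation."""
    if not series:
        raise ValueError("series must not be empty")
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in (0, 1]")
    forecast = series[0]
    for y in series:
        forecast = alpha * y + (1 - alpha) * forecast
    return forecast


# Hypothetical demand for six periods (not from the book)
demand = [120, 132, 128, 141, 150, 147]

print(f"3-period moving average forecast: {moving_average_forecast(demand, 3):.2f}")
print(f"exponential smoothing forecast (alpha=0.3): "
      f"{exponential_smoothing_forecast(demand, 0.3):.2f}")
```

For this series the 3-period moving average gives 146.00 and exponential smoothing gives about 139.17; the smoothed forecast lags further behind the recent rise because alpha = 0.3 still places substantial weight on older demand.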
The descriptions and application areas of the statistical tools in predic-
tive analytics are outlined in Table 6.2.
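Simple linear regression, the starting point for the regression models outlined in Table 6.2, can be sketched in a few lines. The Python code below computes the least-squares estimates b0 and b1 for y = b0 + b1x and the coefficient of determination R². The advertising and sales numbers are hypothetical stand-ins for data like those plotted in Figure 6.2, and the function names are our own; a statistics package would also report standard errors and p-values.

```python
def fit_simple_regression(x, y):
    """Least-squares estimates (b0, b1) for the line y = b0 + b1*x."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x must not be constant")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b1 = sxy / sxx               # slope
    b0 = mean_y - b1 * mean_x    # intercept: the line passes through (x-bar, y-bar)
    return b0, b1


def r_squared(x, y, b0, b1):
    """Coefficient of determination, 1 - SSE/SST."""
    mean_y = sum(y) / len(y)
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    sst = sum((yi - mean_y) ** 2 for yi in y)
    if sst == 0:
        raise ValueError("R^2 is undefined when y is constant")
    return 1 - sse / sst


# Hypothetical data (not from the book): advertising spend and sales, $000
advertising = [10, 12, 15, 17, 20, 22]
sales = [110, 118, 135, 140, 158, 165]

b0, b1 = fit_simple_regression(advertising, sales)
print(f"sales = {b0:.2f} + {b1:.2f} * advertising, "
      f"R^2 = {r_squared(advertising, sales, b0, b1):.3f}")
print(f"predicted sales at advertising = 18: {b0 + b1 * 18:.2f}")
```

The same least-squares idea extends to the multiple regression model in Table 6.2, where matrix methods replace these one-variable formulas.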
Table 6.1 Predictive analytics, questions they attempt to answer, and their tools
Predictive Analytics Attempts to Answer Tools and Applications
Attempts to answer: … time series models as well as data mining
techniques. How can the forecast be used in short-term and long-term
business planning?
Tools and applications: …ical or indicator variables, and other regression
models
• Regression-based models that use regression analysis to forecast future
trends. Other time series forecasting models are simple moving average,
moving average with trend, exponential smoothing, exponential smoothing
with trend, and forecasting seasonal data.
ANOVA (analysis of variance)
Attempts to answer: ANOVA in its simplest form is a way to study multiple
means. Single-factor, two- and multiple factor ANOVA …
Tools and applications: ANOVA and DOE techniques include single-factor
ANOVA, two-factor ANOVA, and multiple factor ANOVA. Factorial designs and …
Figure 6.2 Scatter plot of sales versus advertising
Statistical Tools
and Models Brief Description Application Areas
Multiple regression models
Brief description: In regression analysis, we have one dependent or
response variable y and one or more independent variables, …
y = b0 + b1x1 + b2x2 + b3x3 + … + bkxk,
where b0, b1, b2, …, bk are the regression coefficients and x1, x2, …, xk
are the independent variables.
Application areas: A pharmaceutical company is concerned about declining
sales of one of its drugs. The drug was introduced in the market
approximately two and a half years ago. In the last few months, the sales
of this product have been in constant decline, and the company is
concerned about losing its market share, as it is one of the major …
Figure 6.3 A multiple regression model
Nonlinear regression (quadratic and polynomial …)
Brief description: The above models, simple and multiple regression, are
based …
Application areas: A nonlinear (second-order) regression model is described
here: The life of an electronic component is believed to be related to the
temperature in the operating environment … Figure 6.5 shows a second-order
model with the regression equation that can be used to predict the life of
the components using temperature.
Multiple regression using dummy or indicator variables
Brief description: In regression we often encounter qualitative or
indicator variables that need to be included as one of … dummy or
indicator variables needed is one less than the number of categories of
the qualitative variable to be included in the model.
Application areas: Application of a regression model with a dummy
variable: We would like to write a model relating the mean profit of a
grocery chain. It is believed that the profit to a large extent depends on
the location of the stores. Suppose that the management is interested in
three specific …
All subset and stepwise regression
Brief description: Finding the best set of predictor variables to be included in the …
This model is appropriate when x and y have an inverse relationship. Note that the inverse relationship is not linear.
Log transformation of x variable
Brief description: The logarithmic transformation is of the form
$$y = \beta_0 + \beta_1 \ln(x) + \varepsilon$$
This is a useful curvilinear form, where ln(x) is the natural logarithm of x and x > 0.
Log transformation of x and y variables
Brief description:
$$\ln(y) = \beta_0 + \beta_1 \ln(x) + \varepsilon$$
The purpose of this transformation is to achieve a linear relationship. The model is valid for positive values of x and y. This transformation is …
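The sketch below shows why the log-log form is useful. It fits ln(y) = β0 + β1 ln(x) by ordinary least squares on the transformed values. The data are synthetic and noise-free, generated from the hypothetical power curve y = 3x^0.5, so the fit should recover β1 = 0.5 and e^β0 = 3. Real data would scatter around the curve.

```python
import math
from typing import Sequence, Tuple


def least_squares(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Return (intercept, slope) of the least squares line y = b0 + b1 x."""
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxy = sum(a * b for a, b in zip(x, y))
    sxx = sum(a * a for a in x)
    b1 = (n * sxy - sx * sy) / (n * sxx - sx ** 2)
    b0 = sy / n - b1 * sx / n
    return b0, b1


def fit_log_log(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Fit ln(y) = beta0 + beta1 ln(x); every x and y must be strictly positive."""
    if any(v <= 0 for v in x) or any(v <= 0 for v in y):
        raise ValueError("log-log model needs strictly positive x and y")
    return least_squares([math.log(v) for v in x], [math.log(v) for v in y])


# Synthetic data from the hypothetical curve y = 3 * x**0.5 (no noise).
xs = [1.0, 2.0, 4.0, 8.0, 16.0]
ys = [3.0 * v ** 0.5 for v in xs]
beta0, beta1 = fit_log_log(xs, ys)
print(round(beta1, 6), round(math.exp(beta0), 6))  # 0.5 3.0
```

After the transformation, the model is an ordinary straight line in ln(x) and ln(y). Any simple linear regression routine can therefore fit it.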
…forecasts. These methods are typically used when historical data on the variable being forecast are not available. They are also known as judgmental methods because they use subjective inputs.
Figure 6.6 Plot of demand over time
These forecasts may be based on consumer surveys, opinions of sales …
Techniques for trend …
Linear trend equation (similar to simple regression)
Table 6.2 (Continued)
ANOVA (analysis of variance)
Brief description: A single-factor completely randomized design is the simplest experimental … studied using specially designed experiments.
Application areas: Consider an example in which the marketing manager of a franchise wants to know whether there is a difference in the average profit among four of their stores. He randomly selected four stores and recorded the …
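A single-factor ANOVA compares group means using the ratio of the between-group mean square to the within-group mean square. Below is a minimal pure-Python sketch. The store profits are hypothetical placeholders, because the recorded values are not reproduced here. The resulting F would be compared with an F table value for (k − 1, N − k) degrees of freedom, where k is the number of groups and N the total number of observations.

```python
from typing import Dict, List, Tuple


def one_way_anova(groups: Dict[str, List[float]]) -> Tuple[float, int, int]:
    """Return (F, df_between, df_within) for a single-factor completely randomized design."""
    all_values = [v for g in groups.values() for v in g]
    n_total, k = len(all_values), len(groups)
    grand_mean = sum(all_values) / n_total

    # Between-group (treatment) sum of squares: spread of group means around the grand mean.
    ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups.values())
    # Within-group (error) sum of squares: spread of observations around their own group mean.
    ss_within = sum(sum((v - sum(g) / len(g)) ** 2 for v in g) for g in groups.values())

    df_between, df_within = k - 1, n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical weekly profits (in $1,000s) for four stores.
profits = {
    "store_1": [10.0, 12.0, 11.0],
    "store_2": [14.0, 15.0, 13.0],
    "store_3": [9.0, 8.0, 10.0],
    "store_4": [12.0, 13.0, 14.0],
}
f_stat, df1, df2 = one_way_anova(profits)
print(f"F = {f_stat:.3f} with ({df1}, {df2}) degrees of freedom")
```

A large F, relative to the table value, suggests that at least one store's mean profit differs from the others.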
Data mining
Brief description: Data mining involves exploring new patterns and relationships from the collected data, a part of predictive analytics that involves … information harvesting, business intelligence, analytics, etc. Besides statistics, data mining uses artificial intelligence, machine learning, database systems, advanced statistical tools, and pattern recognition.
Application areas: Data mining is one of the major tools of predictive analytics. In business, data mining is used to analyze business data. Business transaction data, along with other customer- and product-related data, are continuously stored in databases. Data mining software is used to analyze the vast amount of customer data to reveal hidden patterns, trends, and customer behavior. Businesses use data mining to perform market … this is the growing interest in knowledge management and in moving from data to information and finally to knowledge discovery.
In this age of technology, companies automatically collect massive amounts of data by various means.
Brief description: …trends, patterns, and relationships in the historical data. The algorithms are designed to learn iteratively from data without being explicitly programmed. In a way, machine learning automates model building.
Application areas: …performance. With current research and the use of newer technology, the fields of machine learning and artificial intelligence are becoming more promising.
Summary
This chapter provided a brief description and applications of key predic-
tive analytics models. These models are the core of predictive analytics
and are used to predict future business outcomes.
CHAPTER 7
Regression Analysis and Modeling
Chapter Highlights
• Introduction to Regression and Correlation
• Linear Regression
  ◦ Regression Model
Linear Regression
Regression analysis is used to investigate the relationship between two or more variables. Often we are interested in predicting a variable y using one or more independent variables x1, x2, …, xk. For example, we might be interested in the relationship between two variables such as sales and profit for a chain of stores, the number of hours required to produce a certain number of products, the number of accidents vs. blood alcohol level, advertising expenditures and sales, or the heights of parents compared to those of their children. In all these cases, regression analysis can be applied to investigate the relationship between the two variables.
In general, we have one dependent or response variable y and one or more independent variables x1, x2, …, xk. The independent variables are also called predictors. If there is only one independent variable x that we are trying to relate to the dependent variable y, this is a case of simple regression. If, on the other hand, two or more independent variables are related to a single response or dependent variable, we have a case of multiple regression. In this section, we discuss simple regression, or more specifically, simple linear regression. This means that the relationship we obtain between the dependent or response variable y and the independent variable x will be linear. In this case, there is only one predictor or independent variable (x) of interest that will be used to predict the dependent variable (y).
In regression analysis, the dependent or response variable y is a random variable, whereas the independent variable or variables x1, x2, …, xk are measured with negligible error and are controlled by the analyst. The …
$$y = \beta_0 + \beta_1 x + \varepsilon \tag{7.1}$$
$$E(y) = \beta_0 + \beta_1 x \tag{7.2}$$
Figure 7.1 Possible linear relationship between E(y) and x in simple linear regression
$$\hat{y} = b_0 + b_1 x \tag{7.3}$$
where
ŷ = point estimator of E(y), the mean value of y for a given value of x
b0 = y-intercept of the regression line
b1 = slope of the regression line
The values of b0 and b1 in equation (7.3) are determined using the least squares method. Before we discuss the least squares method in detail, we describe the process of estimating the regression equation, which Figure 7.2 illustrates.
Figure 7.2 Estimating the regression equation
Figure 7.3 shows a scatter plot of the data in Table 7.1. Scatter plots are often used to investigate the relationship between two variables. The plot shows a positive relationship between sales and advertising expenditures; therefore, the manager would like to predict sales from advertising expenditure using a simple regression model.
Sales (y)   Advertising (x)
330         26
400         31
458         33
410         30
628         41
553         38
728         44
498         40
708         48
719         47
658         45
$$\hat{y} = -150.9 + 18.33x$$
The vertical distance of each point from the line is known as the error
or residual. Note that the residual or error of a point can be positive, nega-
tive, or zero depending upon whether the point is above, below, or on the
fitted line. If the point is above the line, the error is positive, whereas if
the point is below the fitted line, the error is negative.
Figure 7.4 shows graphically the errors for a few points. To demon-
strate how the error or residual for a point is calculated, refer to the data
in Table 7.1.
Figure 7.4 Fitting the regression line to the sales and advertising data of Table 7.1
Figure 7.4 shows this error value. This error is negative because the
point y = 498 lies below the fitted regression line.
Now consider the advertising expenditure x = 44. The observed sales for this value are 728, that is, y = 728 (from Table 7.1). The predicted sales ŷ for x = 44 is the height of the fitted regression line at x = 44. The error is the vertical distance from y = 728 to that line. The predicted value is calculated as:
This value is shown in Figure 7.4. The error for this point is the difference between the observed value and the predicted (estimated) value, which is:
This value of the error is positive because the point y = 728 lies
above the fitted line.
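The two residual calculations can be checked with a few lines of Python. This sketch uses the rounded fitted line ŷ = −150.9 + 18.33x. Its predictions may therefore differ in the last digits from values computed with unrounded coefficients. The function name `predict_sales` is illustrative.

```python
def predict_sales(x: float) -> float:
    """Predicted sales from the rounded fitted line y-hat = -150.9 + 18.33 x."""
    return -150.9 + 18.33 * x


# (advertising x, observed sales y) pairs from Table 7.1 discussed in the text.
points = [(40, 498), (44, 728)]
for x, y in points:
    y_hat = predict_sales(x)
    residual = y - y_hat  # positive above the line, negative below it
    print(f"x = {x}: y = {y}, y_hat = {y_hat:.2f}, residual = {residual:.2f}")
```

The sign of each residual matches the text. The point y = 498 lies below the line and gives a negative residual, and y = 728 lies above it and gives a positive one.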
The errors for the other observed values can be calculated in a similar
way. The vertical deviation of a point from the fitted regression line rep-
resents the amount of error associated with that point. The least squares
method determines the values b0 and b1 in the fitted regression line
ŷ = b0 + b1 x that will minimize the sum of the squares of the errors.
Minimizing the sum of the squares of the errors gives a unique line through the data points. For this line, the sum of the squared vertical distances of the points from the fitted line is as small as possible.
Since the least squares criterion requires that the sum of the squares of the errors be minimized, we minimize the following quantity:
$$\sum (y - \hat{y})^2 = \sum (y - b_0 - b_1 x)^2 \tag{7.4}$$
where y is the observed value and ŷ is the estimated value of the dependent variable. …
$$\begin{aligned} \sum y &= n b_0 + b_1 \sum x \\ \sum xy &= b_0 \sum x + b_1 \sum x^2 \end{aligned} \tag{7.5}$$
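Equations (7.5) follow from (7.4), a step not written out in the text. The sum of squared errors is minimized where its partial derivatives with respect to b0 and b1 are both zero:

$$\frac{\partial}{\partial b_0}\sum (y - b_0 - b_1 x)^2 = -2\sum (y - b_0 - b_1 x) = 0 \;\Rightarrow\; \sum y = n b_0 + b_1 \sum x$$
$$\frac{\partial}{\partial b_1}\sum (y - b_0 - b_1 x)^2 = -2\sum x\,(y - b_0 - b_1 x) = 0 \;\Rightarrow\; \sum xy = b_0 \sum x + b_1 \sum x^2$$

The first condition also shows that the least squares residuals y − ŷ always sum to zero.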
These equations are known as the normal equations. They can be solved algebraically for the two unknowns, the y-intercept b0 and the slope b1. Solving them yields the following results.
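$$b_1 = \frac{n\sum xy - (\sum x)(\sum y)}{n\sum x^2 - (\sum x)^2} \tag{7.6}$$
and
$$b_0 = \bar{y} - b_1 \bar{x} \tag{7.7}$$
where
$$\bar{y} = \frac{\sum y}{n} \quad \text{and} \quad \bar{x} = \frac{\sum x}{n}$$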
The values of b0 and b1 calculated from equations (7.6) and (7.7) minimize the sum of the squares of the vertical deviations, or errors. These values are easily calculated from the data points (xi, yi), which are the observed values of the independent and dependent variables (the collected data in Table 7.1).
… (7.7) above. These values provide the line ŷ = b0 + b1x, which can be used to predict sales (y) from advertising expenditures (x).
To evaluate b0 and b1, we need to perform some intermediate calculations, shown in Table 7.2. We must first calculate Σx, Σy, Σxy, Σx², x̄, and ȳ from the data points x and y. Later calculations also need Σy², so an extra column for y², the squares of the dependent variable (y), is added to this table.
$$\bar{x} = \frac{\sum x}{n} = \frac{546}{15} = 36.4 \qquad \bar{y} = \frac{\sum y}{n} = \frac{7{,}742}{15} = 516.133$$
Using the values in Table 7.2 and equations (7.6) and (7.7), we first calculate the slope b1:
$$b_1 = \frac{n\sum xy - (\sum x)(\sum y)}{n\sum x^2 - (\sum x)^2} = \frac{15(295{,}509) - (546)(7{,}742)}{15(20{,}622) - (546)^2} = 18.326$$
This gives us the following equation for the estimated regression line:
$$\hat{y} = -150.9 + 18.33x$$
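Equations (7.6) and (7.7) need only five summary quantities. The whole calculation can therefore be reproduced from the Table 7.2 totals quoted above: n = 15, Σx = 546, Σy = 7,742, Σxy = 295,509, Σx² = 20,622. Here is a minimal Python sketch; the function name is illustrative.

```python
from typing import Tuple


def slope_intercept_from_sums(n: int, sum_x: float, sum_y: float,
                              sum_xy: float, sum_x2: float) -> Tuple[float, float]:
    """Least squares (b0, b1) from summary sums, following equations (7.6) and (7.7)."""
    b1 = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)  # equation (7.6)
    x_bar, y_bar = sum_x / n, sum_y / n
    b0 = y_bar - b1 * x_bar                                        # equation (7.7)
    return b0, b1


# Totals for the sales (y) vs. advertising (x) data, as reported in the text.
b0, b1 = slope_intercept_from_sums(n=15, sum_x=546, sum_y=7742,
                                   sum_xy=295509, sum_x2=20622)
print(f"y_hat = {b0:.1f} + {b1:.2f} x")  # y_hat = -150.9 + 18.33 x
```

This also supplies the intercept step the text skips: b0 = 516.133 − 18.326(36.4) ≈ −150.9.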
…and interpret the results. All the formulas are written in terms of the values calculated in Table 7.4.
The above plot clearly shows an increasing trend. It shows a linear re-
lationship between x and y; therefore, the data can be approximated using
a straight line with a positive slope.
$$\hat{y} = b_0 + b_1 x$$
$$b_1 = \frac{n\sum xy - (\sum x)(\sum y)}{n\sum x^2 - (\sum x)^2} = \frac{30(357{,}055) - (24{,}132)(431.23)}{30(20{,}467{,}220) - (24{,}132)^2} = 0.00964$$
and
$$\hat{y} = b_0 + b_1 x = 6.62 + 0.00964x$$
The regression equation or the equation of the “best” fitting line can
also be written as:
The error is also known as the residual. Figure 7.7 shows the least squares
line and the residuals for each of the points as the vertical distance from
the point to the estimated regression line.
[Note: The estimated line is denoted by ŷ, and the residual for a point yi is given by (yi − ŷi).]
Recall that the error or residual for a point is given by (y − ŷ), which is the vertical distance of the point from the estimated line. Figure 7.8 shows the fitted regression line over the scatter plot.
$$\hat{y} = 6.62 + 0.00964x$$
In this equation of the fitted line, 6.62 is the y-intercept and 0.00964 is the slope. The line describes the relationship between the number of hours and the number of units produced. For each one-unit increase in x (the number of units produced), the predicted number of hours (y) increases by 0.00964. The value 6.62 represents the portion of the hours that is not affected by the number of units. Because x = 0 lies well outside the observed range of units, this reading of the intercept should be treated with caution.
…of the product. Note that making a prediction outside this range may introduce substantial error in the predicted value. For example, suppose we want to predict the time for producing 2,000 units. This prediction would fall outside the data range: in Table 7.3, the x values range from 445 to 1,125. The value x = 2,000 is far greater than all the other x values in the data. The scatter plot shows that a straight line with an increasing trend fits the data. However, we should be cautious about assuming that this straight-line trend continues to hold for values as large as x = 2,000. Therefore, it may not be reasonable to make predictions for values far beyond the range of the data.
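One way to build this caution into code is to refuse predictions outside the observed x range. The sketch below uses the fitted line ŷ = 6.62 + 0.00964x and the x range 445 to 1,125 from Table 7.3. The function name `predict_hours` is hypothetical, and raising an error is a design choice for illustration; a warning would serve equally well.

```python
X_MIN, X_MAX = 445, 1125  # range of units (x) observed in Table 7.3


def predict_hours(units: float) -> float:
    """Predicted hours from y-hat = 6.62 + 0.00964 x, restricted to the observed x range."""
    if not X_MIN <= units <= X_MAX:
        raise ValueError(
            f"x = {units} is outside the data range [{X_MIN}, {X_MAX}]; "
            "the linear trend may not hold there"
        )
    return 6.62 + 0.00964 * units


print(round(predict_hours(800), 3))  # 14.332
try:
    predict_hours(2000)
except ValueError as err:
    print(err)
```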
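$$s = \sqrt{\frac{\sum (y - \hat{y})^2}{n - 2}} \tag{7.7A}$$
Equivalently, the standard error of the estimate can be calculated from b0, b1, and the values in Table 7.4:
$$s = \sqrt{\frac{\sum y^2 - b_0 \sum y - b_1 \sum xy}{n - 2}} = \sqrt{\frac{6{,}302.3 - 6.62(431.23) - 0.00964(357{,}055)}{28}}$$
s = 0.4481
This shortcut form is sensitive to rounding. With the coefficients rounded to 6.62 and 0.00964 as shown, the expression evaluates to about 0.445. The reported 0.4481 reflects unrounded coefficients and totals.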
A small value of s indicates less scatter of the data points around the fitted regression line (see Figure 7.8). The value s = 0.4481 indicates that the observed hours typically deviate from the fitted line by about 0.4481 hours. Note that s is measured in the units of the dependent variable y.
…used to judge the adequacy of the regression model. The value of r² lies between 0 and 1 (0 ≤ r² ≤ 1), or 0 to 100 percent. Because r² is the proportion of the variation in y explained by the regression model, the closer r² is to 1 (100 percent), the better the model fits the data.
Figure 7.9 shows the relationship between the explained, unexplained,
and the total variation.
In regression, the total sum of squares (SST) is partitioned into two components: the regression sum of squares (SSR) and the error sum of squares (SSE). This gives the following relationship:
$$SST = SSR + SSE$$
$$SST = \sum (y - \bar{y})^2 = \sum y^2 - \frac{(\sum y)^2}{n} \tag{7.9}$$
and
$$SSE = \sum (y - \hat{y})^2 = \sum y^2 - b_0 \sum y - b_1 \sum xy \tag{7.10}$$
Note that we can calculate SSR from SST and SSE, since SSR = SST − SSE. The coefficient of determination is then
$$r^2 = \frac{SSR}{SST} \tag{7.11}$$
$$SST = \sum (y - \bar{y})^2 = \sum y^2 - \frac{(\sum y)^2}{n} = 6{,}302.3 - \frac{(431.23)^2}{30} = 103.68$$
Since …
Therefore, …
$$r^2 = \frac{SSR}{SST} = \frac{98.057}{103.680} = 0.946$$
or r² = 94.6 percent.
This means that 94.6 percent of the variation in the dependent variable y is explained by the variation in x. The remaining 5.4 percent is unexplained, or due to error.
$$r = \pm\sqrt{r^2} \tag{7.13}$$
where r takes the sign of the slope b1. Since b1 = 0.00964 is positive,
$$r = +\sqrt{0.946} = 0.973$$
$$-1 \le r \le 1 \tag{7.14}$$
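The quantities above can be reproduced from the sums of squares reported in the text: SST = 103.680, SSR = 98.057, n = 30. In this sketch the error sum of squares is taken as SSE = SST − SSR. Plugging that into (7.7A) also recovers the standard error of the estimate, s = 0.4481. The sign of r is taken from the slope b1 = 0.00964.

```python
import math

n = 30
sst, ssr = 103.680, 98.057  # total and regression sums of squares from the text
b1 = 0.00964                # slope of the fitted line; r takes its sign

sse = sst - ssr                              # SST = SSR + SSE
r_squared = ssr / sst                        # equation (7.11)
r = math.copysign(math.sqrt(r_squared), b1)  # equation (7.13), sign from b1
s = math.sqrt(sse / (n - 2))                 # equation (7.7A)

print(f"r^2 = {r_squared:.3f}, r = {r:.3f}, s = {s:.4f}")
# r^2 = 0.946, r = 0.973, s = 0.4481
```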
$$r = \frac{\sum xy - \dfrac{(\sum x)(\sum y)}{n}}{\sqrt{\sum x^2 - \dfrac{(\sum x)^2}{n}} \times \sqrt{\sum y^2 - \dfrac{(\sum y)^2}{n}}} \tag{7.15}$$
Using the values in Table 7.4, we can calculate r from equation (7.15).
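Equation (7.15) can be evaluated directly from the Table 7.4 totals quoted earlier in this section: n = 30, Σx = 24,132, Σy = 431.23, Σxy = 357,055, Σx² = 20,467,220, Σy² = 6,302.3. Here is a Python sketch; the function name is illustrative. Because the totals are rounded, the result agrees with r = 0.973 to about three decimal places.

```python
import math


def correlation_from_sums(n: int, sum_x: float, sum_y: float, sum_xy: float,
                          sum_x2: float, sum_y2: float) -> float:
    """Coefficient of correlation r from summary sums, following equation (7.15)."""
    s_xy = sum_xy - sum_x * sum_y / n  # numerator of (7.15)
    s_xx = sum_x2 - sum_x ** 2 / n     # first term under the square roots
    s_yy = sum_y2 - sum_y ** 2 / n     # second term under the square roots (equals SST)
    return s_xy / (math.sqrt(s_xx) * math.sqrt(s_yy))


r = correlation_from_sums(n=30, sum_x=24132, sum_y=431.23,
                          sum_xy=357055, sum_x2=20467220, sum_y2=6302.3)
print(round(r, 3))  # 0.973
```

Unlike taking the square root of r², this formula produces the sign of r automatically, since its numerator has the same sign as the slope b1.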
…the equation of the best-fitting line, (c) interpreting the fitted regression line, and (d) making predictions using the fitted regression equation. Other measures critical to assessing the quality of the regression model were also calculated and explained:
(a) the standard error of the estimate (s), which measures the variation or scatter of the points around the fitted regression line;
(b) the coefficient of determination (r²), which measures how well the independent variable predicts the dependent variable, that is, the percent of the variation in the dependent variable y explained by the variation in the independent variable x;
(c) the coefficient of correlation (r), which measures the strength and direction of the linear relationship between x and y.
The instructions in Table 7.5 will produce the regression output shown in
Table 7.6. If you checked the boxes under Residuals and the Line Fit Plots,
the residuals and fitted line plot will be displayed.
5. Select Hours(y) for Input Y Range and Units(x) for Input X Range (including the labels)
6. Check the Labels box
7. Click the circle to the left of Output Range, click the box next to Output Range, and specify where you want to store the output by clicking a blank cell (or select New Worksheet Ply)
8. Check Line Fit Plots under Residuals. Click OK
You may also check the other boxes under Residuals and Normal Probability Plots as desired.
Table 7.6 EXCEL regression output
$$\hat{y} = 6.62 + 0.00964x$$
$$r^2 = \frac{SSR}{SST}$$
The values of SSR, SSE, and SST can be obtained from the ANOVA table, which is part of the EXCEL regression output above. Table 7.7 shows the EXCEL regression output with the SSR and SST values. Using these values, the coefficient of determination is r² = SSR/SST = 0.9458. This value is reported under Regression Statistics in Table 7.7.
The t-test and F-test for the significance of regression can be easily
(1) Conducting the t-Test Using the Regression Output in Table 7.8
$$t_{n-2} = \frac{b_1}{s_{b_1}}$$
Table 7.7 EXCEL regression output
Table 7.8 EXCEL regression output
The values of b1, sb1 (the standard error of the slope), and the test statistic tn−2 are labeled in Table 7.8.
Using the test-statistic value, the hypothesis test for the significance
of regression can be conducted. This test is explained here using the com-
puter results. The appropriate hypotheses for the test are:
$$H_0: \beta_1 = 0$$
$$H_1: \beta_1 \neq 0$$
The null hypothesis states that the slope of the regression line is zero. If the null hypothesis is rejected, we conclude that the regression is significant. A convenient way of testing these hypotheses is the p-value approach. The test statistic tn−2 and the corresponding p-value are reported in the regression output in Table 7.8. Note that the p-value is very close to zero (p = 2.92278E-19). Suppose we test the hypothesis at a 5 percent level of significance (α = 0.05). Then p ≈ 0 is less than α = 0.05, so we reject the null hypothesis and conclude that the regression is significant overall.
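The t statistic in Table 7.8 is b1 divided by its standard error, which this section does not derive. The sketch below therefore assumes the standard simple-regression formula sb1 = s / √(Σx² − (Σx)²/n). It plugs in values quoted above: s = 0.4481, b1 = 0.0096388, n = 30, Σx = 24,132, Σx² = 20,467,220. The result |t| is compared with the two-sided 5 percent critical value 2.048 for 28 degrees of freedom, taken from a t table. Computing the p-value itself would need a t-distribution CDF from a statistics library, so it is not done here.

```python
import math

n = 30
b1 = 0.0096388          # slope reported by the software
s = 0.4481              # standard error of the estimate
sum_x, sum_x2 = 24132, 20467220
T_CRIT = 2.048          # t(0.025, 28) from a t table: two-sided test, alpha = 0.05

s_xx = sum_x2 - sum_x ** 2 / n  # sum of (x - x_bar)^2
s_b1 = s / math.sqrt(s_xx)      # standard error of the slope (assumed standard formula)
t_stat = b1 / s_b1              # t(n-2) = b1 / s_b1

print(f"s_b1 = {s_b1:.6f}, t = {t_stat:.2f}")
print("reject H0: beta1 = 0" if abs(t_stat) > T_CRIT else "do not reject H0")
```

A t value of roughly 22 with 28 degrees of freedom is consistent with the tiny p-value reported by the software.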
[Note: Readers can download a free 30-day trial copy of the MINITAB version 17 or 18 software from www.minitab.com]
The scatter plot in Figure 7.10 shows an increasing, or direct, relationship between the number of units produced (x) and the number of hours (y). Therefore, the data may be approximated by a straight line of the form ŷ = b0 + b1x, where b0 is the y-intercept and b1 is the slope. The fitted line plot with the regression equation from MINITAB is shown in Figure 7.11. The “Regression Analysis” and “Analysis of Variance” tables shown in Table 7.9 are also displayed. We first analyze the regression and analysis of variance tables and then provide further analysis.
Refer to the Regression Analysis part. In this table, the regression equation is printed as Hours(y) = 6.62 + 0.00964 Units(x). This is the equation of the best-fitting line obtained by the least squares method. Just below the regression equation, a table describes the model in more detail. The Coef column lists the coefficients, that is, the regression coefficients b0 and b1. Here b0 is the y-intercept or constant and b1 is the slope of the regression line. In the Predictor column, the coefficient for Units(x) is 0.0096388, which is b1 (the slope of the fitted line), and the Constant is 6.6209. These values form the regression equation.
1. The regression equation or the equation of the “best” fitting line is:
…from each of the points to the line is a minimum. The error, or residual, is the vertical distance of each point from the estimated line. Figure 7.12 shows the least squares line and the residuals. The residual for a point is given by (y − ŷ), which is the vertical distance of the point from the estimated line.
R-Sq = 94.6%
vs. order of data. The residuals can also be plotted against each of the independent variables.
Figures 7.13a and 7.13b are used to check the normality assumption: the regression model assumes that the errors are normally distributed with mean zero. Figure 7.13a shows the normal probability plot. If the plotted points lie on or close to a straight line, the residuals can be taken to be approximately normally distributed. Here the points appear to fall on a straight line, indicating no violation of the normality assumption.
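A normal probability plot pairs the ordered residuals with the quantiles expected from a standard normal distribution. As a rough numerical companion to the visual check, the sketch below computes the correlation between the two; values close to 1 mean the points lie close to a straight line. The plotting positions (i − 0.375)/(n + 0.25) are one common choice and an assumption here (MINITAB's plot may use a different formula), and the residuals are hypothetical.

```python
from math import sqrt
from statistics import NormalDist, fmean

def normal_scores(n):
    """Expected standard-normal quantiles using (i - 0.375)/(n + 0.25) positions."""
    z = NormalDist()
    return [z.inv_cdf((i - 0.375) / (n + 0.25)) for i in range(1, n + 1)]

def pearson_r(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    ma, mb = fmean(a), fmean(b)
    s_ab = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    s_aa = sum((u - ma) ** 2 for u in a)
    s_bb = sum((v - mb) ** 2 for v in b)
    return s_ab / sqrt(s_aa * s_bb)

def probability_plot_r(residuals):
    """Correlation between ordered residuals and normal scores (near 1 = straight line)."""
    ordered = sorted(residuals)
    return pearson_r(normal_scores(len(ordered)), ordered)

# Hypothetical residuals, for illustration only
resid = [-1.2, 0.4, 0.9, -0.3, 0.1, -0.7, 1.5, -0.2, 0.6, -1.1]
print(round(probability_plot_r(resid), 3))
```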
Figure 7.13b shows the histogram of residuals. If the normality assumption holds, the histogram should be symmetrical or approximately symmetrical, and it should be centered at zero because, for a least squares model with an intercept, the residuals always sum to zero. The histogram here is approximately symmetrical, which suggests that the errors are approximately normally distributed. An exactly symmetrical histogram is not expected; a pattern that is approximately symmetrical is what we look for.
In Figure 7.13c, the residuals are plotted against the fitted values. This plot is used to check the linearity assumption: for the linear model to be valid, the points should be scattered randomly around the horizontal line drawn through the zero residual value. As can be seen, the residuals are randomly scattered about this line, indicating that a linear relationship between x and y is adequate.
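To see what a violation of linearity looks like, the sketch below fits a straight line to hypothetical data that are truly quadratic. The residuals against the fitted values then follow a systematic U-shaped pattern (positive, negative, positive) instead of a random scatter around zero.

```python
def fit_line(x, y):
    """Least squares intercept and slope for simple linear regression."""
    n = len(x)
    xb, yb = sum(x) / n, sum(y) / n
    b1 = (sum((a - xb) * (b - yb) for a, b in zip(x, y))
          / sum((a - xb) ** 2 for a in x))
    return yb - b1 * xb, b1

# Hypothetical data with a curved (quadratic) relationship
x = [1, 2, 3, 4, 5, 6, 7]
y = [xi ** 2 for xi in x]
b0, b1 = fit_line(x, y)
fitted = [b0 + b1 * xi for xi in x]
resid = [yi - fi for yi, fi in zip(y, fitted)]
for f, e in zip(fitted, resid):
    print(f"fitted = {f:6.2f}   residual = {e:+6.2f}")
```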
The plot of residuals vs. the order of the data, shown in Figure 7.13d, is used to check the independence of errors. The residuals are plotted in the order or sequence in which the data were collected, and the plot should show no pattern or apparent relationship between consecutive residuals. This plot shows no apparent pattern, indicating that the assumption of independent errors is not violated.
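One simple numerical summary of the "relationship between consecutive residuals" is the sample lag-1 autocorrelation of the residuals taken in collection order. The sketch below is a minimal illustration with hypothetical residual sequences; values near zero are consistent with independent errors, while large positive or negative values indicate a pattern.

```python
def lag1_autocorrelation(resid):
    """Sample lag-1 autocorrelation of residuals listed in time (collection) order."""
    n = len(resid)
    mean = sum(resid) / n
    num = sum((resid[t] - mean) * (resid[t - 1] - mean) for t in range(1, n))
    den = sum((e - mean) ** 2 for e in resid)
    return num / den

# Hypothetical residual sequences, in the order the data were collected
alternating = [1.0, -1.0, 1.0, -1.0, 1.0, -1.0]   # sign flips every period
drifting = [2.0, 1.5, 1.0, -1.0, -1.5, -2.0]      # long runs of the same sign
print(round(lag1_autocorrelation(alternating), 3))  # -0.833
print(round(lag1_autocorrelation(drifting), 3))     # 0.552
```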
Note that checking the independence of errors is more important when the data were collected over time. Data collected over time may show an autocorrelation effect among successive data
Figure 7.13 Plots for residual analysis
The mathematical form of the multiple linear regression model relating the dependent variable y to two or more independent variables $x_1, x_2, \ldots, x_k$, with the associated error term, is:

$$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \cdots + \beta_k x_k + \varepsilon \quad (7.16)$$
$$E(y) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \cdots + \beta_k x_k \quad (7.17)$$

The above equation, relating the mean value of y to the k independent variables, is known as the multiple regression equation.
It is important to note that $\beta_0, \beta_1, \beta_2, \ldots, \beta_k$ are the unknown population parameters, or regression coefficients, and they must be estimated from the sample data to obtain the estimated multiple regression equation. The estimated regression coefficients are denoted by $b_0, b_1, b_2, \ldots, b_k$; these are the point estimates of the parameters $\beta_0, \beta_1, \beta_2, \ldots, \beta_k$. Using these estimates, the estimated multiple regression equation can be written as:

$$\hat{y} = b_0 + b_1 x_1 + b_2 x_2 + b_3 x_3 + \cdots + b_k x_k \quad (7.18)$$
$$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \varepsilon \quad (7.19)$$
the least squares method requires fitting a line through the data points so that the sum of the squares of the errors, or residuals, is minimized. These errors or residuals are the vertical distances of the points from the fitted line. The same concept is used to develop the multiple regression equation.
In multiple regression, the least squares method determines the best-fitting plane (or, with more than two independent variables, hyperplane) through the data points: the one for which the sum of the squares of the vertical distances between the observed points and the plane is a minimum.
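A minimal sketch of fitting the least squares plane, assuming NumPy is available. The data are simulated (hypothetical) from a known plane plus noise; `np.linalg.lstsq` returns the coefficients that minimize the sum of squared vertical distances, and the column of ones in the design matrix supplies the intercept b0.

```python
import numpy as np

# Hypothetical data simulated from y = 2 + 1.5*x1 - 0.8*x2 + noise
rng = np.random.default_rng(seed=1)
n = 40
x1 = rng.uniform(0, 10, n)
x2 = rng.uniform(0, 10, n)
y = 2 + 1.5 * x1 - 0.8 * x2 + rng.normal(0, 1.0, n)

# Design matrix: a column of 1s (for b0), then x1 and x2
X = np.column_stack([np.ones(n), x1, x2])
b, *_ = np.linalg.lstsq(X, y, rcond=None)   # least squares estimates b0, b1, b2
y_hat = X @ b
sse = np.sum((y - y_hat) ** 2)

# Any other plane gives a larger sum of squared vertical distances
b_other = b + np.array([0.0, 0.05, 0.0])
sse_other = np.sum((y - X @ b_other) ** 2)
print("b0, b1, b2 =", np.round(b, 3))
print("SSE at least squares plane:", round(float(sse), 3))
print("SSE at a perturbed plane:  ", round(float(sse_other), 3))
```

At the least squares solution the residuals are orthogonal to every column of X, which is why moving any coefficient away from it increases SSE.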
Figure 7.14 shows a multiple regression model with two independent variables. The response y and the two independent variables x1 and x2 form a regression plane. The observed data points in the figure are shown as dots. The stars on the regression plane mark the corresponding points on the plane that have identical values of x1 and x2. The vertical lines from the observed points to the plane are the errors. The error for a particular point $y_i$ is $(y_i - \hat{y}_i)$, where the estimated value $\hat{y}_i$ is calculated from the regression equation $\hat{y} = b_0 + b_1 x_1 + b_2 x_2$ for the given values of x1 and x2.

Figure 7.14 Scatter plot and regression plane with two independent variables
The least squares criterion requires that the sum of the squares of the errors be minimized:

$$\min \sum (y - \hat{y})^2$$

where y is the observed value and $\hat{y}$ is the estimated value of the dependent variable, given by $\hat{y} = b_0 + b_1 x_1 + b_2 x_2$.
[Note: The terms independent variables, explanatory variables, and predictors have the same meaning and are used interchangeably in this chapter. The dependent variable is often referred to as the response variable in multiple regression.]
As in simple regression, the least squares method uses the sample data to estimate the regression coefficients $b_0, b_1, b_2, \ldots, b_k$ and hence the estimated multiple regression equation. Figure 7.15 shows the process of estimating the regression coefficients and the multiple regression equation.
Figure 7.15 Process of estimating the multiple regression equation
$$y = b_0 + b_1 x_1 + b_2 x_2 \quad (7.20)$$
The graph of the first-order model is shown in Figure 7.16. With two quantitative independent variables x1 and x2, the graph is a plane in three-dimensional space: for every combination $(x_1, x_2)$ of points in the $(x_1, x_2)$ plane, the regression plane gives the corresponding value of y.
The first-order model with two quantitative variables x1 and x2 is based on the assumption that there is no interaction between x1 and x2. This means that the effect on the response y of a change in x1 (for a fixed value of x2) is the same regardless of the value of x2, and the effect on y of a change in x2 (for a fixed value of x1) is the same regardless of the value of x1.
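The no-interaction property can be checked numerically. The sketch below uses hypothetical coefficients and compares the first-order model with a version that adds an x1·x2 product term, introduced here only for contrast: in the first-order model, raising x1 by one unit changes ŷ by the same amount (b1) at every value of x2; with the product term, the change depends on x2.

```python
# Hypothetical coefficients, chosen only for illustration
B0, B1, B2 = 10.0, 2.0, 3.0
B3_INTERACTION = 0.5   # hypothetical coefficient of an x1*x2 term

def first_order(x1, x2):
    """First-order model: y_hat = b0 + b1*x1 + b2*x2 (no interaction)."""
    return B0 + B1 * x1 + B2 * x2

def with_interaction(x1, x2):
    """Same model plus an x1*x2 term, so the effect of x1 depends on x2."""
    return first_order(x1, x2) + B3_INTERACTION * x1 * x2

for x2 in (0.0, 5.0, 10.0):
    d_first = first_order(4.0, x2) - first_order(3.0, x2)
    d_inter = with_interaction(4.0, x2) - with_interaction(3.0, x2)
    print(f"x2 = {x2:4.1f}: change in y_hat when x1 goes 3 -> 4: "
          f"first-order {d_first:.1f}, with interaction {d_inter:.1f}")
```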
In the case of simple regression analysis in the previous chapter, we presented both the manual calculations and the computer analysis of the problem. Most of the concepts we discussed for simple regression also
is not related to the error for any other set of values of the independent variables. This assumption is critical when the data are collected over different time periods, because the errors in one time period may be correlated with those in another.
2. The normality assumption. This means that the errors or residuals ($\varepsilon_i$), calculated using $(y_i - \hat{y}_i)$, are normally distributed. Inferences in regression are fairly robust against departures from normality: unless the distribution of the errors is extremely different from normal, the inferences about the regression parameters $\beta_0, \beta_1, \beta_2, \ldots, \beta_k$ are not seriously affected.
$$E(y) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \cdots + \beta_k x_k$$
lated to the average outside temperature, the size of the house, and the age of the heating furnace. A multiple regression model is to be fitted to investigate the relationship between the heating cost and the three predictors, or independent variables. The data in Table 7.10 show the home heating cost (y), the average temperature (x1), the house size (x2) in thousands of square feet, and the age of the furnace (x3) in years. The home heating cost is the response variable, and the other three variables are predictors. The data for this problem (MINITAB data file: HEAT_COST.MTW; EXCEL data file: HEAT_COST.xlsx) are listed in Table 7.10 below.
26 36 3.6 6 215
27 9 4.3 8 380
28 10 4.0 11 300
29 21 3.0 9 240
30 51 2.5 7 130
$$y = b_0 + b_1 x_1 + b_2 x_2 + b_3 x_3 + b_4 x_4 + b_5 x_5 \quad (7.21)$$
Figure 7.17 Matrix plot of each y vs. each x
and the house size (x2). The scatterplots showing the relationships between pairs of independent variables are obtained from columns 2 and 3 of the matrix plot. The matrix plot is also helpful in visualizing interaction relationships. For fitting the first-order model, a plot of y versus each x is adequate.
The matrix plots in Figures 7.17 and 7.18 show a negative association between the heating cost (y) and the average temperature (x1), and a positive association between the heating cost (y) and the other two explanatory variables: house size (x2) and the age of the furnace (x3). All these relationships appear to be linear, indicating that all three explanatory variables can be used to build a multiple regression model. Constructing the matrix plot and investigating the relationships between the variables can be very helpful in building a correct regression model.
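$$y = b_0 + b_1 x_1 + b_2 x_2 + b_3 x_3$$

where y is the home heating cost, x1 is the average temperature, x2 is the house size (in thousands of square feet), and x3 is the age of the furnace (in years).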
Table 7.10 and the data file HEAT_COST.MTW show the data for this problem. We used MINITAB to run the regression model for this problem.

Table 7.11 shows the results of running the multiple regression problem in MINITAB. In this table, we have marked some of the calculations (e.g., b0, b1, sb0, sb1) for clarity and explanation; these marks are not part of the computer output. The regression computer output has two parts: Regression Analysis and Analysis of Variance.
$$y = b_0 + b_1 x_1 + b_2 x_2 + b_3 x_3$$
or

where y is the response variable (Heating Cost), x1, x2, x3 are the independent variables as described above, and the regression coefficients $b_0, b_1, b_2, b_3$ are listed under the column Coef. In the regression equation these coefficients appear in rounded form.
The regression equation, which can be stated in the form of equation (7.22) or (7.23), is the estimated regression equation relating the heating cost to all three independent variables.
• b1 = −1.65 means that for each unit increase in the average temperature (x1), the heating cost y can be predicted to go down by $1.65 when the house size (x2) and the age of the furnace (x3) are held constant.
• b2 = +57.5 means that for each unit increase in the house size (x2, in thousands of square feet), the heating cost y can be predicted to go up by $57.50 when the average temperature (x1) and the age of the furnace (x3) are held constant.
• b3 = +7.91 means that for each unit increase in the age of the furnace (x3, in years), the heating cost y can be predicted to go up by $7.91 when the average temperature (x1) and the house size (x2) are held constant.
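The "held constant" interpretation can be checked by comparing two predictions that differ in only one predictor. The sketch below uses the rounded coefficients from the bullets above; the fitted intercept b0 is not reproduced in this excerpt, so a hypothetical placeholder of 0 is used, which does not matter because b0 cancels in each difference. The house values (temperature 30, size 3.0, age 6) are also hypothetical.

```python
# Rounded coefficients reported in the text for the heating cost model
B1, B2, B3 = -1.65, 57.5, 7.91
# Hypothetical placeholder: not the fitted intercept; it cancels in differences
B0_PLACEHOLDER = 0.0

def heating_cost(temp, size, age, b0=B0_PLACEHOLDER):
    """Predicted heating cost y_hat = b0 + b1*x1 + b2*x2 + b3*x3 (dollars)."""
    return b0 + B1 * temp + B2 * size + B3 * age

base = heating_cost(temp=30, size=3.0, age=6)          # hypothetical house
d_temp = heating_cost(temp=31, size=3.0, age=6) - base  # one unit warmer
d_size = heating_cost(temp=30, size=4.0, age=6) - base  # 1,000 sq ft larger
d_age = heating_cost(temp=30, size=3.0, age=7) - base   # furnace one year older
print(round(d_temp, 2), round(d_size, 2), round(d_age, 2))   # -1.65 57.5 7.91
```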
s = 37.32 dollars
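The standard error of the estimate is the square root of the mean square error, $s = \sqrt{SSE/[n-(k+1)]}$. The sketch below computes s from a list of residuals (the residual values are hypothetical) and then the ±2s band for the value s = 37.32 reported for the heating cost model.

```python
from math import sqrt

def standard_error_of_estimate(residuals, k):
    """s = sqrt(SSE / (n - (k + 1))) for a model with k predictors plus an intercept."""
    n = len(residuals)
    sse = sum(e * e for e in residuals)
    return sqrt(sse / (n - (k + 1)))

# Hypothetical residuals from a one-predictor model (k = 1)
example_s = standard_error_of_estimate([3.0, -4.0, 1.0, 0.0, -2.0, 2.0], k=1)

# Heating cost example: s = 37.32 dollars as reported in the text
s = 37.32
print(f"+/- 2s = +/- {2 * s:.2f} dollars")   # +/- 2s = +/- 74.64 dollars
```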
The standard error of the estimate is used to check the utility of the model and to provide a measure of the reliability of predictions made from the model. One interpretation of s is that the interval ±2s approximates the accuracy with which the regression model will predict a future value of the response y for given values of the independent variables. Thus, for our example, we can expect the model to provide predictions of heating cost (y) to be within
can be read from the “Analysis of Variance” part of Table 7.11. The value of r² is calculated and reported in the “Regression Analysis” part of Table 7.11. For our example, the coefficient of multiple determination r² (reported as R-sq) is

$$r^2 = 88.0\%$$
The value r² = 88.0% for our example implies that the model using the three independent variables (average temperature, size of the house, and age of the furnace) explains 88.0 percent of the total variation in heating cost (y). The statistic r² tells how well the model fits the data and thus indicates the overall predictive usefulness of the model.
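The value of adjusted R² is also used in comparing two regression models that have the same response variable but different numbers of independent variables, or predictors. Unlike r², which never decreases when a predictor is added, adjusted R² accounts for the number of predictors in the model.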
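Adjusted R² uses the standard formula $1 - (1 - r^2)\,(n-1)/[n-(k+1)]$. The sketch below applies it to the heating cost example (r² = 0.880, n = 30, k = 3). Because it starts from the rounded r², the result may differ in the last digit from the R-sq(adj) that MINITAB prints. The second call is a hypothetical case showing that a nearly useless extra predictor can raise r² slightly while lowering adjusted R².

```python
def adjusted_r2(r2, n, k):
    """Adjusted R^2 = 1 - (1 - R^2)(n - 1) / (n - (k + 1))."""
    return 1 - (1 - r2) * (n - 1) / (n - (k + 1))

# Heating cost example: r^2 = 0.880, n = 30 observations, k = 3 predictors
print(round(adjusted_r2(0.880, 30, 3), 3))   # 0.866

# Hypothetical: a fourth, nearly useless predictor nudges r^2 up to 0.881,
# yet adjusted R^2 goes down.
print(round(adjusted_r2(0.881, 30, 4), 3))   # 0.862
```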
Recall that in simple regression analysis, we tested for significance using a t-test and an F-test, and both tests led to the same conclusion: if the null hypothesis was rejected, we concluded that the slope was not zero, or $\beta_1 \neq 0$. In multiple regression, the t-test and the F-test have somewhat different interpretations. These tests have the following objectives:
The F-test in multiple regression is used to test the overall significance of the regression. This test is conducted to determine whether a significant relationship exists between the response variable y and the set of independent variables, or predictors, $x_1, x_2, \ldots, x_k$.
F-Test
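The null and alternate hypotheses for the multiple regression model $y = b_0 + b_1 x_1 + b_2 x_2 + \cdots + b_k x_k$ are stated as

$$H_0: \beta_1 = \beta_2 = \cdots = \beta_k = 0$$
$$H_1: \text{at least one } \beta_j \neq 0$$

and the test statistic is

$$F = \frac{MSR}{MSE} \quad (7.25)$$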
the larger the explained variation relative to the total variability, the larger the F statistic. The values of MSR, MSE, and the F statistic are calculated in the “Analysis of Variance” table of the multiple regression computer output (see Table 7.12 below).
The critical value for the test is given by $F_{k,\,n-(k+1),\,\alpha}$, where k is the number of independent variables, n is the number of observations, and α is the level of significance. Note that k and n − (k + 1) are the degrees of freedom associated with MSR and MSE, respectively. The null hypothesis is rejected if $F > F_{k,\,n-(k+1),\,\alpha}$, where F is the calculated F value, or the test statistic value, in the Analysis of Variance table.
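For the overall significance of regression, the null and alternate hypotheses are:

$$H_0: \beta_1 = \beta_2 = \beta_3 = 0$$
$$H_1: \text{at least one } \beta_j \neq 0 \quad (7.26)$$

The test statistic is

$$F = \frac{MSR}{MSE} \quad (7.27)$$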
The degrees of freedom (DF) for Regression and Error are k and n − (k + 1), respectively, where k is the number of independent variables (k = 3 for our example) and n is the number of observations (n = 30). Also, the total sum of squares (SST) is partitioned into two parts, the sum of squares due to regression (SSR) and the sum of squares due to error (SSE), with the following relationship:

$$SST = SSR + SSE$$
We have labeled the SST, SSR, and SSE values in Table 7.12. The mean square due to regression (MSR) and the mean square due to error (MSE) are calculated using the following relationships:

$$MSR = \frac{SSR}{k}, \qquad MSE = \frac{SSE}{n-(k+1)}$$
The test statistic value or the F statistic from the ANOVA table (see
Table 7.12) is
F = 63.62
The critical value from the F table for α = 0.05 with k = 3 and n − (k + 1) = 26 degrees of freedom is $F_{3,26,0.05} = 2.98$. Since F = 63.62 > 2.98, we reject the null hypothesis stated in equation (7.26) and conclude that the regression is significant overall. This indicates that there is a significant relationship between the dependent variable and the independent variables.
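Since $r^2 = SSR/SST$, dividing SSR and SSE by SST shows that the overall F statistic can also be written as $(r^2/k)/[(1-r^2)/(n-(k+1))]$. The sketch below implements both forms; plugging in the rounded r² = 0.880 with n = 30 and k = 3 gives about 63.56, close to the F = 63.62 in Table 7.12, with the small difference due to rounding of r². The SSR and SSE values in the test are hypothetical.

```python
def f_statistic(ssr, sse, n, k):
    """Overall F = MSR / MSE, with MSR = SSR/k and MSE = SSE/(n - (k + 1))."""
    msr = ssr / k
    mse = sse / (n - (k + 1))
    return msr / mse

def f_from_r2(r2, n, k):
    """Same statistic written in terms of r^2 = SSR/SST."""
    return (r2 / k) / ((1 - r2) / (n - (k + 1)))

# Consistency check with the heating cost example (r^2 = 0.880, n = 30, k = 3)
print(round(f_from_r2(0.880, 30, 3), 2))   # 63.56
```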
The hypothesis stated in equation (7.26) can also be tested using the p-value approach. The decision rule using the p-value approach is:

If p ≥ α, do not reject H0
If p < α, reject H0

From Table 7.12, the calculated p-value is 0.000 (see the P column), meaning that it is less than 0.0005 when rounded to three decimals. Since p = 0.000 < α = 0.05, we reject the null hypothesis H0 and conclude that the regression is significant overall.
t-tests
$$H_0: \beta_j = 0$$
$$H_1: \beta_j \neq 0 \quad (7.28)$$
This hypothesis test also helps determine whether the model can be made more effective by deleting certain independent variables or by adding extra variables. The information needed to conduct the hypothesis test for each of the independent variables is contained in the “Regression Analysis” part of the computer output, which is reproduced in Table 7.13 below. The columns labeled T and P are used to test the hypotheses. Since there are three independent variables, we will test whether each of the three variables is a significant variable; that is, if each of the independent
$$t = \frac{b_1}{s_{b_1}} \quad (7.30)$$

where $b_1$ is the estimate of the slope $\beta_1$ and $s_{b_1}$ is the estimated standard deviation (standard error) of $b_1$.
Step 3: Determine the value of the test statistic
The values of b1, sb1, and t are all reported in the Regression Analysis part of Table 7.13. From this table, these values for the variable x1, the average temperature (Avg. Temp.), give

$$t = \frac{b_1}{s_{b_1}} = \frac{-1.6457}{0.6967} = -2.36$$
$t_{\alpha/2,\,[n-(k+1)]}$, which is the t-value from the t-table for n − (k + 1) degrees of freedom and α/2, where n is the number of observations (n = 30), k is the number of independent variables (k = 3), and α is the level of significance (0.05 in this case). Thus,
Step 5: Specify the decision rule. The decision rule for the test is:

If p ≥ α, do not reject H0
If p < α, reject H0 (7.31)

From Table 7.14, the p-value for the variable average temperature (Avg. Temp., x1) is 0.026. Since p = 0.026 < α = 0.05, we reject H0 and conclude that the variable average temperature (x1) is a significant variable.
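The same conclusion can be reached with the critical-value form of the test: reject H0 when |t| exceeds $t_{\alpha/2,\,[n-(k+1)]}$. The sketch below uses b1 and sb1 from Table 7.13; the critical value 2.056 is the standard t-table value for α/2 = 0.025 with 26 degrees of freedom, supplied here rather than taken from the excerpt. The helper name `t_test_coefficient` is our own.

```python
def t_test_coefficient(b, s_b, t_crit):
    """Two-sided t-test of H0: beta_j = 0 using t = b_j / s_bj."""
    t = b / s_b
    return t, abs(t) > t_crit

# b1 and s_b1 for average temperature, as reported in the text (Table 7.13).
# t_crit = 2.056: t-table value for alpha/2 = 0.025 with n - (k + 1) = 26 df.
t1, reject_h0 = t_test_coefficient(-1.6457, 0.6967, t_crit=2.056)
print(f"t = {t1:.2f}, reject H0: {reject_h0}")   # t = -2.36, reject H0: True
```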
the null hypothesis will be rejected incorrectly at least once, leading to the conclusion that β differs from 0. Thus, in multiple regression models where a large number of independent variables are involved and a series of t-tests is conducted, there is a chance of including a large number of insignificant variables and excluding some useful ones from the model. To assess the utility of a multiple regression model, we need a test that includes all the β parameters simultaneously; such a test assesses the overall significance of the multiple regression model. Another useful measure of the utility of the model is a statistic such as R² that measures how well the model fits the data.
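The growing risk of at least one incorrect rejection can be illustrated with the formula $1 - (1 - \alpha)^m$ for m tests. This formula assumes the tests are independent, which the t-tests in one regression model generally are not, so the sketch is only an approximate illustration of why the risk rises with the number of tests.

```python
def prob_at_least_one_false_rejection(alpha, m):
    """P(at least one Type I error) across m tests, assuming the tests are independent."""
    return 1 - (1 - alpha) ** m

for m in (1, 3, 10, 20):
    print(m, round(prob_at_least_one_false_rejection(0.05, m), 3))
```

With α = 0.05, the probability is about 0.14 for three tests and about 0.40 for ten.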
A Note on Checking the Utility of a Multiple Regression Model
(Checking the Model Adequacy)
$$H_0: \beta_1 = \beta_2 = \cdots = \beta_k = 0 \quad \text{(No relationship)}$$
Effects of Multicollinearity
A) Consider a regression model where the production cost (y) is related to three independent variables: machine hours (x1), material cost (x2), and labor hours (x3):

$$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3$$

the result of the F-test indicates that at least one of the three variables is significant, that is, it makes a contribution to the prediction of the response y. It is also possible that two or all three variables are contributing to the prediction of y. In that case, the contribution of one variable overlaps with that of the other variable or variables. This is the multicollinearity effect.
B) Multicollinearity may also affect the signs of the parameter estimates. For example, refer to the regression equation in Table 7.15. In this model, the production cost (y) is related to the three explanatory variables: machine hours (x1), material cost (x2), and labor hours (x3). If we check the effect of the variable machine hours (x1),
Detecting Multicollinearity
Several methods are used to detect the presence of multicollinearity in
regression. We will discuss two of them.
(VIF) for each predictor variable, which measures how much the variance of the estimated regression coefficient is inflated compared with the case where the predictor variables are not linearly related. Use the guidelines in Table 7.16 to interpret the VIF.
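The VIF for predictor j is $1/(1 - R_j^2)$, where $R_j^2$ is the r² obtained by regressing $x_j$ on all the other predictors. The sketch below computes it with NumPy (assumed available) for hypothetical predictors in which x3 is constructed to be nearly a linear function of x1, so those two receive large VIFs while the unrelated x2 stays near 1. Interpret the resulting values with the guidelines in Table 7.16.

```python
import numpy as np

def vif(X):
    """VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from regressing column j on the others."""
    n, p = X.shape
    out = []
    for j in range(p):
        target = X[:, j]
        # Other predictors plus an intercept column
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        coef, *_ = np.linalg.lstsq(others, target, rcond=None)
        resid = target - others @ coef
        r2 = 1 - (resid @ resid) / np.sum((target - target.mean()) ** 2)
        out.append(1.0 / (1.0 - r2))
    return np.array(out)

# Hypothetical predictors: x3 is built to be nearly a linear function of x1
rng = np.random.default_rng(seed=7)
x1 = rng.normal(50, 10, 30)
x2 = rng.normal(20, 5, 30)
x3 = 0.8 * x1 + rng.normal(0, 1, 30)
X = np.column_stack([x1, x2, x3])
v = vif(X)
print(np.round(v, 1))
```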
plots were presented. These plots are helpful in the initial stages of model building. Using the computer results, the following key features of the multiple regression model were explained: (a) the multiple regression equation and its interpretation; (b) the standard error of the estimate, a measure used to check the utility of the model and the reliability of predictions made from it; and (c) the coefficient of multiple determination r², which measures the proportion of the variability in the response y explained by the independent variables in the model. Besides these, we discussed the hypothesis tests using the computer results. Step-by-step instructions were provided to conduct the F-test and t-tests. The overall significance of the regression model is tested using the F-test. The t-test is conducted
$$y = b_0 + b_1 x + b_2 x^2 + b_3 x^3 + \cdots + b_n x^n \quad (7.32)$$

In the above equation, n is a positive integer (the order of the polynomial) and $b_0, b_1, \ldots, b_n$ are unknown parameters that must be estimated.
A) First-order Model
The first-order model is given by:

$$y = b_0 + b_1 x$$
or
$$y = b_0 + b_1 x_1 + b_2 x_2 + b_3 x_3 + \cdots + b_n x_n \quad (7.33)$$
B) Second-order Model

$$y = b_0 + b_1 x + b_2 x^2 \quad (7.34)$$
C) Third-order Model
A third-order model can be written as:

$$y = b_0 + b_1 x + b_2 x^2 + b_3 x^3 \quad (7.35)$$

where b0 is the y-intercept and b3 controls the rate of reversal of the curvature of the curve.
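Polynomial models are still linear in the coefficients, so they can be fitted by least squares. The sketch below fits a second-order model with NumPy's `np.polyfit` (NumPy assumed available). The data are hypothetical, generated exactly from y = 3 + 2x − 0.3x², so the fit recovers those coefficients; they are not the life vs. operating temperature data used in the example that follows. Changing `deg=2` to `deg=3` would fit the third-order model.

```python
import numpy as np

# Hypothetical data with curvature, generated from y = 3 + 2x - 0.3x^2
x = np.array([1, 2, 3, 4, 5, 6, 7, 8], dtype=float)
y = 3 + 2 * x - 0.3 * x ** 2

# np.polyfit returns coefficients from the highest power down: [b2, b1, b0]
b2, b1, b0 = np.polyfit(x, y, deg=2)
print(f"b0 = {b0:.3f}, b1 = {b1:.3f}, b2 = {b2:.3f}")   # b0 = 3.000, b1 = 2.000, b2 = -0.300
```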
X(Temp.) 105 90 94 79 91
Figure 7.21 Scatter Plot of Life (y) vs. Operating Temp. (x)
A second-order model was fitted using MINITAB. The regression output of the model is shown in Table 7.20.

A quadratic model can also be run in MINITAB using the fitted line plot option, which produces a fitted line plot (shown in Figure 7.22).

While running the quadratic model, the data values and residuals can be stored and plots of the residuals created.
Figure 7.23 shows the residual plots for this quadratic model. Residual plots are useful for checking the model assumptions and the adequacy of the model.
The analysis of the residual plots for this model is similar to that for simple and multiple regression models. The plots show that the normality assumption is met. The plot of residuals versus the fitted values shows a random pattern, indicating that the quadratic model fitted to the data is adequate.
Figure 7.23 Residual plots for the quadratic model example
$$ y = b_0 + b_1 x + b_2 x^2 $$
In the EXCEL output, the prediction equation can be read from the “coefficients” column.
The r2 value is 95.9 percent, which indicates a strong model: the quadratic model explains about 95.9 percent of the variation in y.
$$ H_0: \beta_2 = 0 $$
$$ H_1: \beta_2 \neq 0 \tag{7.36} $$
Table 7.21 EXCEL computer output for the quadratic model
Summary Output
Regression Statistics
ANOVA
Source        df   SS            MS           F          Significance F
Regression     2   15,011.7720   7,505.8860   259.6872   0.0000
Residual      22      635.8784      28.9036
Total         24   15,647.6504
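The ANOVA entries are linked by simple arithmetic: each mean square is SS/df, F = MSR/MSE, and r2 = SSR/SST. The short check below, with illustrative variable names, reproduces the values in Table 7.21 and the r2 of 95.9 percent quoted above.

```python
# ANOVA entries from Table 7.21 (quadratic model)
ss_regression = 15011.7720
ss_residual = 635.8784
df_regression = 2    # two model terms: x and x^2
df_residual = 22     # n - 3, with n = 25 observations (df total = 24)

ss_total = ss_regression + ss_residual          # SST = SSR + SSE
ms_regression = ss_regression / df_regression   # MSR = SSR / df
ms_residual = ss_residual / df_residual         # MSE = SSE / df
f_stat = ms_regression / ms_residual            # F = MSR / MSE
r_squared = ss_regression / ss_total            # r^2 = SSR / SST

print(f"SST = {ss_total:.4f}")
print(f"MSR = {ms_regression:.4f}, MSE = {ms_residual:.4f}")
print(f"F = {f_stat:.4f}, r^2 = {r_squared:.1%}")
```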
$$ t = \frac{b_2}{s_{b_2}} = 7.93 $$
We reject the null hypothesis and conclude that the second order term in
$$ H_0: \beta_2 = 0 $$
$$ H_1: \beta_2 > 0 $$
This test determines whether the value b2 = 0.0598 in the prediction equation is large enough to conclude that the life of the components increases at an increasing rate with temperature. This hypothesis has the same test statistic and can be tested at α = 0.05.
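The decisions for the two-sided test (7.36) and the one-sided alternative H1: β2 > 0 can be sketched as follows. The critical values 2.074 (t at 0.025 with 22 df) and 1.717 (t at 0.05 with 22 df) are taken from a standard t table. They are assumptions of this sketch, not values printed in the book's output. The implied standard error s_b2 is only approximate because b2 and t are rounded.

```python
t_stat = 7.93        # t = b2 / s_b2 reported in the text
b2 = 0.0598          # estimated coefficient of x^2
df_residual = 22     # residual degrees of freedom from Table 7.21

# Critical values from a standard t table for 22 degrees of freedom (assumed here)
T_CRIT_TWO_SIDED = 2.074   # t_{0.025, 22}: H1 beta2 != 0 at alpha = 0.05
T_CRIT_ONE_SIDED = 1.717   # t_{0.05, 22}:  H1 beta2 > 0 at alpha = 0.05

s_b2 = b2 / t_stat         # implied standard error of b2 (approximate)

reject_two_sided = abs(t_stat) > T_CRIT_TWO_SIDED
reject_one_sided = t_stat > T_CRIT_ONE_SIDED
print(f"implied s_b2 = {s_b2:.5f} (df = {df_residual})")
print("H1: beta2 != 0 -> reject H0:", reject_two_sided)
print("H1: beta2 > 0  -> reject H0:", reject_one_sided)
```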
Figure 7.24 Fitted line plot showing the yield of a chemical process
$$ x_1 = \begin{cases} 1 & \text{if male} \\ 0 & \text{if female} \end{cases} $$
$$ y = b_0 + b_1 x_1 $$
where
$$ x_1 = \begin{cases} 1 & \text{if male} \\ 0 & \text{if female} \end{cases} $$
This coding scheme allows us to compare the mean salary for male and female employees by substituting the appropriate code into the regression equation y = b0 + b1 x1. Substituting x1 = 0 gives the mean salary for female employees, µF = b0; substituting x1 = 1 gives the mean salary for male employees, µM = b0 + b1.
Thus, the mean salary for the female employees is b0. In a 0-1 coding system, the mean response is always b0 for the level of the qualitative variable that is assigned the value 0. This level is also called the base level.
The difference in the mean salary for the male and female employees can be calculated by taking the difference µM − µF = (b0 + b1) − b0 = b1.
This is the difference between the mean response for the level that is assigned the value 1 and the level that is assigned the value 0 (the base level). The mean salary for the male and female employees is shown graphically in Figure 7.25. We can also see that
$$ b_0 = \mu_F $$
$$ b_1 = \mu_M - \mu_F $$
where the stores are located. We will call these locations A, B, and C. In this case, the store location is a single qualitative variable with three levels corresponding to the three locations A, B, and C. The prediction equation relating the mean profit (y) and the three locations can be written as:
$$ y = b_0 + b_1 x_1 + b_2 x_2 $$
where
$$ x_1 = \begin{cases} 1 & \text{if location B} \\ 0 & \text{if not} \end{cases} \qquad x_2 = \begin{cases} 1 & \text{if location C} \\ 0 & \text{if not} \end{cases} $$
The variables x1 and x2 are known as dummy variables; they allow the qualitative variable (location) to be represented in the regression model.
For location A (x1 = 0, x2 = 0):
$$ \mu_A = b_0 + b_1(0) + b_2(0) = b_0 $$
For location B (x1 = 1, x2 = 0):
$$ \mu_B = b_0 + b_1(1) + b_2(0) = b_0 + b_1 = \mu_A + b_1, \quad \text{so} \quad b_1 = \mu_B - \mu_A $$
For location C (x1 = 0, x2 = 1):
$$ \mu_C = b_0 + b_1(0) + b_2(1) = b_0 + b_2 = \mu_A + b_2, \quad \text{so} \quad b_2 = \mu_C - \mu_A $$
$$ \mu_A = b_0 $$
$$ \mu_B = b_0 + b_1, \qquad b_1 = \mu_B - \mu_A $$
$$ \mu_C = b_0 + b_2, \qquad b_2 = \mu_C - \mu_A $$
where µA, µB, µC are the mean profits for locations A, B, and C.
Note that the three levels of the qualitative variable can be described with only two dummy variables. This is because the mean of the base level (in this case, location A) is accounted for by the intercept b0. In general, a qualitative variable with m levels requires (m − 1) dummy variables.
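The (m − 1) coding rule can be sketched as below, assuming NumPy is available. The hypothetical helper `dummy_code` builds one 0/1 column per non-base level, with the first listed level as the base. An ordinary least-squares fit then returns b0 = µA, b1 = µB − µA, and b2 = µC − µA, as derived above. The profit figures are hypothetical.

```python
import numpy as np


def dummy_code(labels, levels):
    """0/1 dummy matrix with one column per non-base level; levels[0] is the base level."""
    return np.array([[1.0 if label == level else 0.0 for level in levels[1:]]
                     for label in labels])


# Hypothetical profit data for stores at locations A, B, and C
locations = ["A", "A", "B", "B", "C", "C"]
profit = np.array([10.0, 12.0, 15.0, 17.0, 20.0, 24.0])

D = dummy_code(locations, levels=["A", "B", "C"])   # columns: x1 (location B), x2 (location C)
X = np.column_stack([np.ones(len(profit)), D])      # column of 1s estimates b0
b, *_ = np.linalg.lstsq(X, profit, rcond=None)

b0, b1, b2 = b
print(f"b0 = {b0:.2f}  (mean profit at A)")
print(f"b1 = {b1:.2f}  (mean at B minus mean at A)")
print(f"b2 = {b2:.2f}  (mean at C minus mean at A)")
```

Here the sample means are 11 (A), 16 (B), and 22 (C), so the fit gives b0 = 11, b1 = 5, and b2 = 11. Three levels need only two dummy columns because the base level is absorbed by the intercept.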
The bar graph in Figure 7.26 shows the values of mean profit (y) for
the three locations.
Figure 7.26 Bar chart showing the mean profit for three locations A, B, C
In the above bar chart, the height of the bar corresponding to location
A is y = b0. Similarly, the heights of the bars corresponding to locations
B and C are y = b0 + b1 and y = b0 + b2, respectively. Note that b1, b2, or both could be negative, which would mean that the mean profit at that location is lower than at location A. In Figure 7.26, b1 and b2 are both positive.
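$$ y = b_0 + b_1 x_1 + b_2 x_2 + b_3 x_3 + b_4 x_4 + b_5 x_5 $$
where
$$ x_4 = \begin{cases} 1 & \text{if zone A} \\ 0 & \text{otherwise} \end{cases} \qquad x_5 = \begin{cases} 1 & \text{if zone B} \\ 0 & \text{otherwise} \end{cases} $$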
Table 7.23 shows the data file for this regression model with the dummy variables. The data can be analyzed using the MINITAB data file [Data File: DummyVar_File(2)] or the EXCEL data file [DummyVar_File(2).xlsx].
We used both MINITAB and EXCEL to run this model. The MINITAB and EXCEL regression outputs are shown in Tables 7.24 and 7.25. Refer to these computer results to answer the following questions.
Table 7.23 Data file for the model with dummy variables
Row  Volume (y)  Advertisement (x1)  Commission (x2)  No. of Salespersons (x3)  Zone A (x4)  Zone B (x5)
1 973.62 580.17 235.48 8 1 0
2 903.12 414.67 240.78 7 1 0
3 1,067.37 420.48 276.07 10 1 0
4 1,193.37 454.59 295.70 14 0 1
5 1,429.62 524.05 286.67 16 0 0
6 1,557.87 623.77 325.66 18 1 0
7 1,590.12 641.89 298.82 17 1 0
8 1,081.62 403.03 210.19 12 0 0
9 1,088.37 415.76 202.91 13 0 0
10 1,132.62 506.73 275.88 11 0 1
11 1,314.87 490.35 337.14 15 1 0
12 1,562.37 624.24 266.30 19 0 0
13 1,050.12 459.56 240.13 10 0 0
14 1,055.37 447.03 254.18 12 0 1
15 1,112.37 493.96 237.49 14 0 1
16 1,235.37 543.84 276.70 16 0 1
17 1,518.12 618.38 271.14 18 1 0
18 1,574.37 690.50 281.94 15 0 0
19 1,644.87 591.27 316.75 20 0 0
20 1,169.37 530.73 297.37 10 0 0
21 1,212.87 541.34 272.77 13 0 1
22 1,304.37 492.20 344.35 11 0 1
23 1,477.62 546.34 295.53 15 0 0
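As a minimal sketch of how this model could be fitted outside MINITAB or EXCEL, assuming NumPy is available, the code below runs ordinary least squares on the 23 rows reproduced in Table 7.23. Because it uses only the rows shown here, its coefficients should be compared with Tables 7.24 and 7.25 rather than assumed to match them. The residual checks confirm the least-squares property: with an intercept, the residuals sum to zero and are orthogonal to every predictor column.

```python
import numpy as np

# Table 7.23 rows: (y, x1 advertisement, x2 commission, x3 salespersons, x4 zone A, x5 zone B)
TABLE_7_23 = [
    (973.62, 580.17, 235.48, 8, 1, 0),
    (903.12, 414.67, 240.78, 7, 1, 0),
    (1067.37, 420.48, 276.07, 10, 1, 0),
    (1193.37, 454.59, 295.70, 14, 0, 1),
    (1429.62, 524.05, 286.67, 16, 0, 0),
    (1557.87, 623.77, 325.66, 18, 1, 0),
    (1590.12, 641.89, 298.82, 17, 1, 0),
    (1081.62, 403.03, 210.19, 12, 0, 0),
    (1088.37, 415.76, 202.91, 13, 0, 0),
    (1132.62, 506.73, 275.88, 11, 0, 1),
    (1314.87, 490.35, 337.14, 15, 1, 0),
    (1562.37, 624.24, 266.30, 19, 0, 0),
    (1050.12, 459.56, 240.13, 10, 0, 0),
    (1055.37, 447.03, 254.18, 12, 0, 1),
    (1112.37, 493.96, 237.49, 14, 0, 1),
    (1235.37, 543.84, 276.70, 16, 0, 1),
    (1518.12, 618.38, 271.14, 18, 1, 0),
    (1574.37, 690.50, 281.94, 15, 0, 0),
    (1644.87, 591.27, 316.75, 20, 0, 0),
    (1169.37, 530.73, 297.37, 10, 0, 0),
    (1212.87, 541.34, 272.77, 13, 0, 1),
    (1304.37, 492.20, 344.35, 11, 0, 1),
    (1477.62, 546.34, 295.53, 15, 0, 0),
]

data = np.array(TABLE_7_23, dtype=float)
y = data[:, 0]
X = np.column_stack([np.ones(len(y)), data[:, 1:]])   # leading column of 1s estimates b0

b, *_ = np.linalg.lstsq(X, y, rcond=None)             # least-squares coefficients b0..b5
residuals = y - X @ b

labels = ["b0 (intercept)", "b1 advertisement", "b2 commission",
          "b3 salespersons", "b4 zone A", "b5 zone B"]
for label, coef in zip(labels, b):
    print(f"{label:>18}: {coef:10.3f}")
print("sum of residuals:", round(float(residuals.sum()), 8))
```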
A) Using the EXCEL data file, run a regression model. Show your regression output.
B) Using the MINITAB or EXCEL regression output, write down the regression equation.
C) Using a 5 percent level of significance and the column “p” in the MINITAB regression output or the “p-value” column in the EXCEL regression output, conduct appropriate hypothesis tests to determine whether the independent variables advertisement, commission paid, and
Solution:
A) The MINITAB regression output is shown in Table 7.24.
B) Table 7.25 shows the EXCEL regression output.
C) From the MINITAB or the EXCEL regression outputs in Tables 7.24
and 7.25, the regression equation is:
The regression equation from the EXCEL output in Table 7.25 can be
written using the coefficients column.
The above hypothesis can be tested using the “p” column in the MINITAB output or the “p-value” column in the EXCEL output. The decision rule for the p-value approach is:
If p ≥ α, do not reject H0
If p < α, reject H0
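The p-value decision rule translates directly into code. In the sketch below, the function name is hypothetical and the p-values are illustrative, not the values in Table 7.26. Note the boundary case: p = 0.05 at α = 0.05 falls under "p ≥ α", so H0 is not rejected.

```python
def p_value_decision(p, alpha=0.05):
    """p-value decision rule: reject H0 if p < alpha, otherwise do not reject H0."""
    if not 0 <= p <= 1:
        raise ValueError("a p-value must lie between 0 and 1")
    return "reject H0" if p < alpha else "do not reject H0"


# Hypothetical p-values for illustration (not the values in Table 7.26)
for p in (0.001, 0.049, 0.05, 0.20):
    print(f"p = {p:<6} -> {p_value_decision(p)}")
```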
Table 7.26 shows the p-value for each of the predictor variables, taken from the MINITAB or EXCEL computer results in Table 7.24 or 7.25 (see the “p” or “p-value” columns in these tables).
From the above table, it can be seen that all three independent variables are significant at the 5 percent level (p < 0.05).
E) As indicated, the overall regression equation is
Separate equations for each zone can be written from this equation.
Zone A: x4 = 1.0, x5 = 0
Therefore, the equation for the sales volume of Zone A can be written as
Similarly, the regression equations for the other two zones are shown
below.
Zone B: x4 = 0, x5 = 1.0
Substituting these values in the overall regression equation of part C:
(x2) + 33.8 No. of Salespersons (x3) − 105, or Sales Volume (y) = −203.2 + 0.884 Advertisement (x1) + 1.81 Commission (x2) + 33.8 No. of Salespersons (x3)
Zone C: x4 = 0, x5 = 0
Substituting these values in the overall regression equation of part C:
Note that in all of the above equations, the slopes are the same but the intercepts are different.
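The substitutions above follow one pattern: each zone's intercept is b0 plus the coefficient of that zone's dummy, and zone C (the base level) keeps b0. The slopes do not change. The hypothetical helper below makes this explicit. In the text, the zone B constant −203.2 is obtained the same way, by combining the intercept with the zone B coefficient (−105). The coefficients in the usage example are made up for illustration and are not the book's estimates.

```python
def zone_equations(b):
    """Split the overall dummy-variable model into one equation per zone.

    b = [b0, b1, b2, b3, b4, b5] for
    y = b0 + b1*x1 + b2*x2 + b3*x3 + b4*x4 + b5*x5,
    where x4 = 1 for zone A and x5 = 1 for zone B (zone C is the base level).
    Returns {zone: (intercept, (slope_x1, slope_x2, slope_x3))}.
    """
    b0, b1, b2, b3, b4, b5 = b
    slopes = (b1, b2, b3)                 # identical for every zone
    return {
        "Zone A": (b0 + b4, slopes),      # x4 = 1, x5 = 0
        "Zone B": (b0 + b5, slopes),      # x4 = 0, x5 = 1
        "Zone C": (b0, slopes),           # x4 = 0, x5 = 0
    }


# Hypothetical coefficients, for illustration only (not the book's estimates)
equations = zone_equations([100.0, 0.9, 1.8, 34.0, -50.0, -80.0])
for zone, (intercept, slopes) in equations.items():
    print(f"{zone}: intercept = {intercept}, slopes = {slopes}")
```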
$$ y = \beta_0 + \beta_1 x_1 + \beta_2 x_2^2 + \cdots + \beta_k x_k + \varepsilon $$
Models with Dummy Variables: general form of the model with one qualitative (dummy) independent variable at m levels
$$ y = b_0 + b_1 x_1 + b_2 x_2 + \cdots + b_{m-1} x_{m-1} $$
All Subset and Stepwise Regression: finding the best set of predictor variables to be included in the model
Note: the Interaction Models and All Subset Regression are not discussed in this chapter.
There are other regression models that are not discussed here but can be developed using the concepts presented for the other models. Some of these models are described below.
$$ \ln(y) = \beta_0 + \beta_1 \ln(x) + \varepsilon $$
The purpose of this transformation is to achieve a linear relationship. The model is valid only for positive values of x and y. This transformation is more involved, and the resulting model is difficult to compare with other models that use y (rather than ln y) as the dependent variable.
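The log-log model is fitted by regressing ln(y) on ln(x). The slope b1 then estimates the exponent, and exp(b0) the multiplier, of the curve y = exp(b0) x^b1. Below is a minimal sketch, assuming NumPy is available. The function name is hypothetical, and the data are noise-free values generated from y = 3x^0.5.

```python
import numpy as np


def fit_log_log(x, y):
    """Fit ln(y) = b0 + b1*ln(x) by least squares; x and y must be positive."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    if np.any(x <= 0) or np.any(y <= 0):
        raise ValueError("the log-log model requires positive x and y")
    # A degree-1 fit returns [slope, intercept]
    b1, b0 = np.polyfit(np.log(x), np.log(y), deg=1)
    return b0, b1


# Hypothetical noise-free data following y = 3 * x**0.5
x = np.array([1.0, 2.0, 4.0, 8.0, 16.0])
y = 3.0 * x**0.5
b0, b1 = fit_log_log(x, y)
print(f"b0 = {b0:.4f}, b1 = {b1:.4f}, implied model y = {np.exp(b0):.4f} * x^{b1:.4f}")
```

Predictions from this model come out on the ln(y) scale. This is one reason the text notes that it is hard to compare directly with models that use y itself as the response.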
Logistic Regression: This model is used when the response variable is categorical. In all the regression models developed in this book, the response variable was quantitative. When the response is categorical or qualitative, the simple and multiple least-squares regression models violate the normality assumption. The appropriate model in this case is logistic regression, which is not discussed in this book.
CHAPTER 8
Chapter Highlights
• Introduction to Forecasting
• Forecasting Methods: An Overview
    ◦ Qualitative Forecasting
Introduction to Forecasting
Forecasting and time series analysis are major tools of predictive analytics. Forecasting involves predicting future business outcomes using a number of qualitative and quantitative methods. In this chapter, we discuss prediction techniques based on forecasting and time series data. Many business planning decisions, such as production, operations, sales, demand, and inventory decisions, are based on forecasts. We discuss here the broad meaning and applications of forecasting, as well as a number of forecasting models. A forecast is a statement about the future value of a variable of interest, such as demand. Forecasting is used to make informed decisions and may be divided into:
• Long range
• Short range
Associative Forecasting
Associative forecasting methods use explanatory variables to predict the
future. These methods use one or several independent variables or factors
to predict the response variable. Simple, multiple, and nonlinear regression models, as well as models with indicator variables, are among the methods used in this category. In this chapter, we mainly focus on quantitative forecasting methods.
Features of Forecasts
• Forecasts are rarely exact because of randomness. Also, more than one forecasting method can often be applied to the same data, and different methods produce different results with different levels of accuracy. Applying the correct forecasting technique is critical to achieving good forecasts. Some forecasting techniques are more complex than others, and applying the correct method requires experience and knowledge of the process.
• Forecast accuracy depends on the randomness and noise present
in the data.
• Forecast accuracy decreases as the time horizon increases.
Trend
Seasonal
These are time series in which the variable of interest shows a combination of trend and seasonal patterns. Forecasting this type of pattern requires a technique that can deal with both trend and seasonality. This can be achieved through time series decomposition, which separates (decomposes) a time series into trend and seasonal components. The methods used to forecast trend and seasonal patterns are usually more involved computationally.
Cyclical
Random Fluctuations
Random fluctuations are the result of chance variation and may be a com-
bination of constant fluctuations followed by trends. An example would
be the demand for electricity in summer. These patterns require special
forecasting techniques and are often complex in nature.
Usually, the first step in forecasting is to plot the historical data. This is critical in identifying the pattern in the time series and applying the correct forecasting method. A plot of the data over time is known as a time series plot: time is plotted on the horizontal axis and the variable of interest on the vertical axis. The data may be weekly, monthly, quarterly, or annual. Some common time series patterns are shown in Figures 8.1 through 8.7.
Figure 8.1 shows demand data that fluctuate around an average. Averaging techniques such as the simple moving average or simple exponential smoothing can be used to forecast such patterns. Figure 8.2 shows the actual data and the forecast for Figure 8.1.
Figure 8.2 Forecast for the demand data in Figure 8.1 (forecasts are
dotted lines)
Figure 8.3 shows the sales data for a company over a period of 65 weeks. The data clearly fluctuate around an average and show an increasing trend. Forecasting techniques such as the double moving average or exponential smoothing with a trend can be used to forecast such patterns. Figure 8.4 shows the sales and forecast for the data in Figure 8.3. Figure 8.5 shows a seasonal pattern.
Figure 8.4 Forecast for the sales data in Figure 8.3 using double
moving average
The other class of models is based on regression. Figure 8.6 shows the relationship between two variables, summer temperature and electricity used. The plot clearly indicates a linear relationship between the two variables. Such a relationship enables us to use a regression model in which one variable is predicted from the other. We explained regression models in the previous chapter. Figure 8.7 shows a nonlinear relationship (quadratic model). A nonlinear or quadratic model, as explained in the previous chapter, can be used in such cases to predict the response variable (yield in this case) from the independent variable (temperature).
as well as how close the forecast values are to the actual data. A good forecast should respond to and closely follow the actual data, with minimal error. The error is the difference between the actual value and the forecast. Forecast accuracy measures summarize these errors in different forms, and several of them are usually calculated. We will see that some accuracy measures are preferred over others. Usually, the most accurate forecast is the one with the smallest errors. Note that different forecasting methods can be used to forecast the same time series data. When different methods are used to forecast the same time series, the forecast accuracy measures are calculated and compared for each method to determine the best forecasting method.
The forecast accuracy is related to the forecast error, which is defined as:
Forecast error = Actual value − Forecast value
Mean Error
The mean (average) forecast error is the simplest measure of forecast accuracy. Because errors can be positive or negative, positive and negative forecast errors tend to offset one another, resulting in a small mean error even when individual errors are large. Therefore, the mean forecast error is not a very useful measure of accuracy.
The mean absolute error (MAE) is also known as the mean absolute deviation (MAD). It is the mean of the absolute values of the forecast errors, which avoids the problem of positive and negative errors offsetting one another. The MAD can be calculated as:
$$ \text{MAD} = \frac{\sum \lvert \text{Actual} - \text{Forecast} \rvert}{n} $$
MAD shows the average size of the error (or average deviation of
forecast from the actual data). Note that n is the number of forecasts
generated.
The mean squared error (MSE) is another measure of forecast error that avoids the problem of positive and negative errors offsetting one another. It is the average of the squared forecast errors and is calculated using:
$$ \text{MSE} = \frac{\sum (\text{Actual} - \text{Forecast})^2}{n} $$
The MAE (MAD) and the MSE depend on the scale of the data. This makes it difficult to compare errors across series measured on different scales or over different time intervals. The mean absolute percentage error (MAPE) provides a relative (percent) error measure that makes such comparisons easier. The MAPE is the average of the absolute percentage forecast errors and is calculated using:
$$ \text{MAPE} = \frac{1}{n} \sum \left( \frac{\lvert \text{Actual} - \text{Forecast} \rvert}{\text{Actual}} \times 100 \right) $$
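The accuracy measures can be computed together from paired actual and forecast values. The sketch below divides by n, the number of forecasts, as in the worked example with Table 8.1 (88373 / 20). The function name and the data are hypothetical. The output illustrates the offsetting problem: the mean error is only 0.5, while the MAD is 3.5.

```python
def accuracy_measures(actual, forecast):
    """Return ME, MAD, MSE, and MAPE (in percent) for paired actual and forecast values."""
    if len(actual) != len(forecast) or not actual:
        raise ValueError("need equal-length, non-empty sequences")
    if any(a == 0 for a in actual):
        raise ValueError("MAPE is undefined when an actual value is 0")
    errors = [a - f for a, f in zip(actual, forecast)]    # error = actual - forecast
    n = len(errors)                                        # number of forecasts
    return {
        "ME": sum(errors) / n,                             # signed errors can cancel
        "MAD": sum(abs(e) for e in errors) / n,
        "MSE": sum(e * e for e in errors) / n,
        "MAPE": sum(abs(e / a) * 100 for e, a in zip(errors, actual)) / n,
    }


# Hypothetical actual and forecast values (not from Table 8.1)
actual = [100, 110, 90, 105]
forecast = [98, 104, 96, 105]
measures = accuracy_measures(actual, forecast)
print({name: round(value, 4) for name, value in measures.items()})
```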
Tracking Signal
$$ \text{Tracking Signal} = \frac{\sum (\text{Actual} - \text{Forecast})}{\text{MAD}} $$
Bias is the persistent tendency for the forecasts to be greater or smaller than the actual values. It indicates whether the forecast is typically too low or too high, and by how much. Thus, the bias, measured as the sum of the forecast errors Σ(Actual − Forecast), shows the total error and its direction.
Tracking signal uses both bias and MAD and can also be calculated as:
$$ \text{Tracking Signal} = \frac{\text{Bias}}{\text{MAD}} $$
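A short sketch of the tracking signal, taking the bias as the sum of the forecast errors (consistent with the two formulas above). The function name and both series are hypothetical. Errors that cancel give a signal near zero, while a forecast that is always 5 units too low gives a signal equal to the number of periods (4.0 after four periods).

```python
def tracking_signal(actual, forecast):
    """Tracking signal = Bias / MAD, where Bias = sum of (actual - forecast)."""
    errors = [a - f for a, f in zip(actual, forecast)]
    if not errors:
        raise ValueError("need at least one forecast")
    bias = sum(errors)                               # cumulative error, keeps its sign
    mad = sum(abs(e) for e in errors) / len(errors)
    if mad == 0:
        raise ValueError("the tracking signal is undefined when MAD = 0")
    return bias / mad


# Hypothetical series: mixed errors vs. a forecast that is always 5 units too low
mixed = tracking_signal([100, 110, 90, 105], [98, 104, 96, 105])
biased = tracking_signal([10, 20, 30, 40], [5, 15, 25, 35])
print(f"mixed errors: {mixed:.3f}, persistent under-forecast: {biased:.3f}")
```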
Forecasting Methods
Naïve Forecasting Method
This method uses the most recent observation in the time series as the forecast for the next time period and generates short-term forecasts.
The weekly demand (for the past 21 weeks) for a particular brand of cell phone is shown in Table 8.1. We will use the naïve forecasting method to forecast one week ahead and evaluate the forecast accuracy by calculating the errors. The data and the forecasts, along with the forecast errors, absolute errors, squared errors, and absolute percent errors, are shown in Table 8.1.
Since this method uses the most recent observation in the time series as the forecast for the next time period, the forecast for the next period is:
$$ \hat{X}_{t+1} = \text{Actual value in period } t $$
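The naïve method shifts the series by one period. The sketch below, with hypothetical function names and demand values, returns the forecasts and errors for periods 2 to n. A series of 21 weeks therefore yields 20 forecasts, which is why n = 20 in the accuracy calculations that follow.

```python
def naive_forecasts(series):
    """Naive method: the forecast for period t+1 is the actual value in period t.

    Returns the forecasts for periods 2..n, aligned with series[1:].
    """
    return list(series[:-1])


def naive_errors(series):
    """Forecast errors (actual - forecast) for periods 2..n."""
    return [a - f for a, f in zip(series[1:], naive_forecasts(series))]


demand = [120, 135, 128, 140, 150]   # hypothetical weekly demand
print("forecasts:", naive_forecasts(demand))
print("errors:   ", naive_errors(demand))
print("forecast for next week:", demand[-1])
```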
Using the values from the Total row in Table 8.1, we can calculate the forecast accuracy measures as shown in Table 8.2. With n = 20 forecasts:
$$ \text{MSE} = \frac{88373}{20} = 4418.65 $$
$$ \text{MAPE} = \frac{500.56}{20} = 25.03\% $$
The above measures are used to select a forecasting method for the data by comparing them with the measures calculated using other methods. Usually, a smaller MAD or MAPE indicates a better forecast.
(a) Simple moving average; (b) weighted moving average; (c) exponential smoothing
The above methods are used for short-range forecasts and are also known as smoothing methods because their objective is to smooth out the random fluctuations in the time series. Computer software is almost always used to study the trend and other characteristics of the time series. The examples below show the analysis of the class of forecasting techniques that are based on averages.
the oldest value is discarded and the average is calculated using the most recent observations in the series. As a result, the average moves (changes) as new observations become available.
The weekly demand (for the past 65 weeks) for a particular brand of cell phone is used to demonstrate the simple moving average method. Partial data are shown in Table 8.3. A plot of the complete data is shown in Figure 8.8.
Figure 8.9 Plot of actual data and six-period moving average forecast
ing average forecast in Figure 8.10 for the same data. Note the effect of lowering the moving average period on the forecasts generated: a smaller value tracks shifts in the time series more quickly and may generate a better forecast. The accuracy measures for the six- and three-period moving averages are shown in Table 8.4. The three-period moving average forecast has a smaller deviation (MAD) and a smaller MAPE, so it should be preferred.
Comparing the forecast accuracies in Table 8.4, the forecasts from the three-period moving average have a smaller deviation and respond better to the actual data than those from the six-period moving average. Usually, though not always, a smaller averaging period produces a better forecast.
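A comparison of averaging periods can be sketched as below. The helper names are hypothetical. The sketch uses only the first 15 weeks of demand from the sample calculations for Table 8.5, not the full 65-week series behind Table 8.4, so its numbers are not expected to match Table 8.4 and may not show the same ranking. Note also that a shorter period leaves more weeks with a forecast (12 for N = 3 versus 9 for N = 6). For a strict comparison, the errors could be evaluated over the same weeks.

```python
def moving_average_errors(series, n):
    """One-step-ahead errors (actual - forecast) for an n-period simple moving average."""
    errors = []
    for t in range(n, len(series)):
        forecast = sum(series[t - n:t]) / n     # average of the n most recent values
        errors.append(series[t] - forecast)
    return errors


def mad(errors):
    return sum(abs(e) for e in errors) / len(errors)


def mape(actuals, errors):
    return sum(abs(e / a) for e, a in zip(errors, actuals)) / len(errors) * 100


# First 15 weeks of demand from the sample calculations for Table 8.5
demand = [158, 222, 248, 216, 226, 239, 206, 178, 169, 177, 290, 245, 318, 158, 274]

results = {}
for n in (6, 3):
    errs = moving_average_errors(demand, n)
    results[n] = (mad(errs), mape(demand[n:], errs))
    print(f"{n}-period MA: {len(errs)} forecasts, "
          f"MAD = {results[n][0]:.2f}, MAPE = {results[n][1]:.2f}%")
```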
$$ M_T = \frac{X_T + X_{T-1} + X_{T-2} + \cdots + X_{T-N+1}}{N} \tag{8.1} $$
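Equation (8.1) can be applied directly by summing the N most recent values at each period. In the sketch below, the function name is hypothetical, and the six demand values are X1 to X6 from the sample calculation of M6 below. With N = 6, the code gives the single average M6 = 218.17. With N = 3, it gives four averages (M3 to M6).

```python
def moving_averages(series, n):
    """M_T from equation (8.1) for T = n, n+1, ..., len(series) (periods are 1-based)."""
    if not 1 <= n <= len(series):
        raise ValueError("n must be between 1 and the length of the series")
    # For 0-based index t = T - 1, series[t - n + 1 : t + 1] holds X_{T-N+1}, ..., X_T
    return [sum(series[t - n + 1:t + 1]) / n for t in range(n - 1, len(series))]


first_six_weeks = [158, 222, 248, 216, 226, 239]   # X1..X6
print([round(m, 2) for m in moving_averages(first_six_weeks, 6)])
print([round(m, 2) for m in moving_averages(first_six_weeks, 3)])
```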
General Equation:
$$M_T = M_{T-1} + \frac{X_T - X_{T-N}}{N} \tag{8.2}$$
$$\hat{X}_{T+\tau}(T) = M_T \tag{8.3}$$
Sample Calculations
Refer to the first 15 values of demand from Table 8.5 for the sample calculations.
Week   Demand   MA        Forecast   Error
1      158      *         *          *
2      222      *         *          *
3      248      *         *          *
4      216      *         *          *
5      226      *         *          *
6      239      218.167   *          *
7      206      226.167   218.167    −12.167
8      178      218.833   226.167    −48.167
9      169      205.667   218.833    −49.833
10     177      199.167   205.667    −28.667
11     290      209.833   199.167    90.833
12     245      210.833   209.833    35.167
13     318      229.500   210.833    107.167
14     158      226.167   229.500    −71.500
15     274      243.667   226.167    47.833
$$M_T = \frac{X_T + X_{T-1} + X_{T-2} + \cdots + X_{T-N+1}}{N}$$
$$M_6 = \frac{X_6 + X_5 + X_4 + X_3 + X_2 + X_1}{6} = \frac{239 + 226 + 216 + 248 + 222 + 158}{6} = 218.17$$
In Table 8.5, Week is the time period, Demand is the actual demand $X_T$, MA is the moving average, Forecast is the one-period-ahead forecast, and Error is the difference between the actual and the forecast values (actual − forecast), a measure of how far the forecast deviates from the actual value.
Using equation (8.2), calculate the remaining moving averages. Note that equation (8.1) is needed only once, for the first moving average.
$$M_T = M_{T-1} + \frac{X_T - X_{T-N}}{N}$$
Set: T = 7, N = 6
$$M_7 = M_6 + \frac{X_7 - X_1}{6} = 218.17 + \frac{206 - 158}{6} = 226.17$$
In the computations shown below, note that each time the most re-
cent value is included in the average and the oldest one is discarded. To
calculate the next moving average,
Set: T = 8, N = 6
$$M_8 = M_7 + \frac{X_8 - X_2}{6} = 226.17 + \frac{178 - 222}{6} = 218.83$$
Set: T = 9, N = 6
$$M_9 = M_8 + \frac{X_9 - X_3}{6} = 218.83 + \frac{169 - 248}{6} = 205.66$$
Set: T = 10, N = 6
$$M_{10} = M_9 + \frac{X_{10} - X_4}{6} = 205.66 + \frac{177 - 216}{6} = 199.167$$
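The recursive calculation above can be checked with a short Python sketch. It applies equation (8.1) once, then equation (8.2) for every later week, and uses equation (8.3) with τ = 1 for the forecasts. The function names are illustrative, and only the first 15 weeks used in the sample calculations are included (weeks 1 through 3 are the values that appear in the $M_6$ calculation).

```python
def moving_averages(x, n):
    """Return {T: M_T} for T = n..len(x), with periods numbered from 1."""
    averages = {n: sum(x[:n]) / n}  # equation (8.1), used only once
    for t in range(n + 1, len(x) + 1):
        # x is 0-indexed: X_T is x[t - 1] and X_{T-N} is x[t - n - 1]
        averages[t] = averages[t - 1] + (x[t - 1] - x[t - n - 1]) / n  # (8.2)
    return averages


def one_step_forecasts(averages):
    """Equation (8.3) with tau = 1: the forecast for period T + 1 is M_T."""
    return {t + 1: m for t, m in averages.items()}


# First 15 weeks of demand used in the sample calculations
demand = [158, 222, 248, 216, 226, 239, 206, 178, 169, 177, 290, 245, 318, 158, 274]
ma = moving_averages(demand, 6)
forecast = one_step_forecasts(ma)

for week in range(7, 16):
    error = demand[week - 1] - forecast[week]  # actual minus forecast
    print(week, demand[week - 1], round(ma[week], 3), round(forecast[week], 3), round(error, 3))
```

Each printed row should agree with the corresponding row of Table 8.5; for example, week 13 has a forecast of 210.833 and an error of 107.167. The recursive update avoids re-summing all N values each period, which is why equation (8.2) is preferred once the first average is known.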
The rest of the moving averages and forecasts are shown in Table 8.5.
Since we calculated a six-period moving average, the forecast for the 7th
$$\text{MSE} = \frac{52846.44}{18} = 2935.91$$
$$\text{MAPE} = \frac{340.43}{18} = 18.91$$
better forecast. Note: forecast errors are used to compare different forecasting methods. Often, more than one method is applied to a set of data; in such cases, the forecast error measures identify the best forecast.
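A small helper makes this comparison routine. The Python sketch below assumes the usual definitions (error = actual − forecast, MAPE in percent); the function name `accuracy` is illustrative. The usage applies it to the nine weeks of Table 8.5 shown earlier (weeks 7 through 15), so its results will differ from the 18-period values above.

```python
def accuracy(actual, forecast):
    """Return (MAD, MSE, MAPE in percent) for paired actual and forecast values."""
    if len(actual) != len(forecast) or not actual:
        raise ValueError("need equal-length, non-empty sequences")
    errors = [a - f for a, f in zip(actual, forecast)]
    n = len(errors)
    mad = sum(abs(e) for e in errors) / n
    mse = sum(e * e for e in errors) / n
    mape = 100 * sum(abs(e) / abs(a) for e, a in zip(errors, actual)) / n
    return mad, mse, mape


# Weeks 7-15 of Table 8.5 (six-period moving average forecasts)
actual = [206, 178, 169, 177, 290, 245, 318, 158, 274]
six_period = [218.167, 226.167, 218.833, 205.667, 199.167,
              209.833, 210.833, 229.500, 226.167]
mad, mse, mape = accuracy(actual, six_period)
print(f"MAD = {mad:.2f}, MSE = {mse:.2f}, MAPE = {mape:.2f}%")
```

To compare two methods, compute the same measures for each method over the same periods and prefer the method with the smaller values.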
the data values. More recent observations are given more weight than older observations. The weights for the data values included in the average usually sum to 1.0.
In Table 8.8, we used Excel to calculate a 4-period simple moving
average and 4-period weighted moving average forecasts for the 21 per-
iods of sales data in column B. Column C shows the 4-period simple
moving average forecasts and column D shows 4-period weighted mov-
ing average forecasts. The weights used for the four data points are 0.1,
0.2, 0.3, and 0.4 and are denoted using W(1) through W(4) shown in
columns A and B. Columns E to H show the forecast errors and absolute
errors for the simple and weighted 4-period forecasts.
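The two 4-period forecasts in Table 8.8 can be sketched in Python as follows. This is an illustration under stated assumptions, not the Excel workbook: W(1) through W(4) are assumed to run from the oldest to the newest value, so the newest observation receives 0.4, and the `sales` list is hypothetical rather than the Table 8.8 data.

```python
def weighted_ma_forecasts(series, weights):
    """Forecast periods k+1..n+1 (1-indexed) from the previous k = len(weights) values.

    Weights are ordered oldest -> newest, so the last weight applies to the most
    recent observation; they should sum to 1.0.
    """
    k = len(weights)
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights should sum to 1.0")
    return [
        sum(w * x for w, x in zip(weights, series[i - k:i]))
        for i in range(k, len(series) + 1)
    ]


def simple_ma_forecasts(series, k):
    """Equal-weight version: each forecast is the plain mean of the last k values."""
    return [sum(series[i - k:i]) / k for i in range(k, len(series) + 1)]


W = [0.1, 0.2, 0.3, 0.4]  # W(1)..W(4); W(4) = 0.4 is assumed to weight the newest value

sales = [20, 22, 21, 25, 27]  # hypothetical sales, not the Table 8.8 data
print(simple_ma_forecasts(sales, 4))    # forecasts for periods 5 and 6
print(weighted_ma_forecasts(sales, W))  # forecasts for periods 5 and 6
```

Because the weighted version leans on the latest values, it reacts faster than the simple average when sales are rising, as in this small example.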
Table 8.8 Four-period simple moving average and weighted moving average
forecasts and errors
$$F_t = \alpha A_{t-1} + (1 - \alpha) F_{t-1}$$
where
$F_t$ = forecast for period t, the next period; $F_{t-1}$ = forecast for period t − 1, the prior period;
$A_{t-1}$ = actual data for period t − 1, the prior period; α = smoothing constant, $0 \le \alpha \le 1$
Period   Actual   Forecast
4        514      391
5        402      403
6        343      403
7        438      397
8        419      401
9        374      403
10       415      400
11       451      402
12       333      407
13       386      399
14       408      398
15       333      399
16                392
Sample Calculations
Forecast for periods 2 through 5 using the forecasting equation:
$$F_t = \alpha A_{t-1} + (1 - \alpha) F_{t-1}$$
Note: the smoothing constant is α = 0.1, and the initial forecast (the forecast for the first period) is $F_1 = 393$.
The forecasts for periods 2, 3, 4, … are shown below:
… and so on.
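The smoothing recursion is short enough to check in code. In this Python sketch the function name is illustrative. Because the actual values for periods 1 through 3 are not reproduced here, the rounded period-4 forecast of 391 from the table above is used as the starting value instead of $F_1 = 393$.

```python
def exponential_smoothing(actual, alpha, first_forecast):
    """Return forecasts F_1..F_{n+1} using F_t = alpha*A_{t-1} + (1 - alpha)*F_{t-1}."""
    if not 0 <= alpha <= 1:
        raise ValueError("alpha must lie between 0 and 1")
    forecasts = [first_forecast]
    for a in actual:
        forecasts.append(alpha * a + (1 - alpha) * forecasts[-1])
    return forecasts


# Actual values for periods 4-15 from the table above; the period-4 forecast
# (391, already rounded) is used as the starting value.
actual_4_to_15 = [514, 402, 343, 438, 419, 374, 415, 451, 333, 386, 408, 333]
forecasts_4_to_16 = exponential_smoothing(actual_4_to_15, alpha=0.1, first_forecast=391)
print([round(f) for f in forecasts_4_to_16])
# [391, 403, 403, 397, 401, 403, 400, 402, 407, 399, 398, 399, 392]
```

The rounded results reproduce the Forecast column for periods 4 through 16, which confirms that the table was generated with α = 0.1. Each new forecast needs only the previous actual value and the previous forecast.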
Another Example on Simple Exponential Smoothing for Inventory Demand. The operations manager at a company talks to an analyst at company headquarters about forecasting monthly demand for inventory from her warehouse. The analyst suggests that she consider using simple exponential smoothing with a smoothing constant of 0.3. The operations manager decides to use the most recent inventory demand data (in thousands of dollars) shown below. Based on past experience, she decides to use 99.727 as the forecast for the first period. Use simple exponential smoothing with α = 0.3 and F1 = 99.727 to develop the forecasts for months 2 through 11 for the data in Table 8.11. What is the MAD?
The results are shown in Table 8.11. MINITAB statistical software
was used to generate the forecast.
The inventory demand data and the forecast are plotted and shown
in Figure 8.14.
To see the effect of the smoothing constant α on the forecasts, two
sets of forecasts were generated with α = 0.3 and α = 0.1 and accuracy
measures were calculated. These are shown in Table 8.12.
Changing α from 0.3 to 0.1 produced a better forecast with smaller errors: both the MAD and the MAPE decreased for the smaller α. Because the exponential smoothing forecast depends on the value of α, using an optimal value of α is recommended; one way to obtain it is to try a range of α values and choose the one that gives the smallest forecast error.
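One simple way to search for a good α is a grid search, sketched below in Python. The choices here are assumptions rather than the book's procedure: α is tried in steps of 0.01, and the criterion is the mean squared one-step error (MAD or MAPE could be substituted). Statistical packages often use a numerical optimizer instead of a grid. The data are periods 4 through 15 of the earlier exponential smoothing example, not the Table 8.11 inventory data.

```python
def exponential_smoothing(actual, alpha, first_forecast):
    """Forecasts F_1..F_{n+1} from F_t = alpha*A_{t-1} + (1 - alpha)*F_{t-1}."""
    forecasts = [first_forecast]
    for a in actual:
        forecasts.append(alpha * a + (1 - alpha) * forecasts[-1])
    return forecasts


def mse_for_alpha(actual, alpha, first_forecast):
    """Mean squared one-step error; the final forecast F_{n+1} has no actual yet."""
    forecasts = exponential_smoothing(actual, alpha, first_forecast)[:-1]
    return sum((a - f) ** 2 for a, f in zip(actual, forecasts)) / len(actual)


def best_alpha(actual, first_forecast, step=0.01):
    """Grid search over alpha = 0, step, 2*step, ..., 1 for the smallest MSE."""
    grid = [round(i * step, 10) for i in range(int(round(1 / step)) + 1)]
    return min(grid, key=lambda a: mse_for_alpha(actual, a, first_forecast))


# Illustrative data: periods 4-15 of the earlier exponential smoothing example
actual = [514, 402, 343, 438, 419, 374, 415, 451, 333, 386, 408, 333]
alpha = best_alpha(actual, first_forecast=391)
print(alpha, round(mse_for_alpha(actual, alpha, 391), 2))
```

Because the grid includes 0.1 and 0.3, the chosen α can never do worse on this data than either of the two values compared in Table 8.12.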
The previous forecasting methods were applied to time series data that did not show any trend. For data showing a trend, the simple moving average method will not provide correct forecasts, because the average lags behind the trend.
A trend in a time series is identified by a gradual shift or movement to relatively higher or lower values over a period of time. A trend may be increasing or decreasing and may be linear or nonlinear. Sometimes the data may fluctuate around an increasing or decreasing trend. Examples of trends include changes in population, the sales and revenue of a company, and increasing or decreasing consumer demand for a particular technology or product.
Figure 8.15 shows the actual sales and the double moving average forecast for a company for the past 65 weeks (the dotted line represents the forecast). Table 8.13 shows partial data. The time series clearly shows an increasing trend. The appropriate method to forecast this pattern is the double moving average, or moving average with a trend. A double moving average is a moving average of the simple moving averages. The forecasting equation in this method is designed to incorporate both the average and the trend components.
Week   Sales
12     55
13     67
14     42
15     61
:      58
:      49
65     74
$$M_T^{[2]} = \frac{M_T + M_{T-1} + M_{T-2} + \cdots + M_{T-N+1}}{N} \tag{8.4}$$
where
$M_T^{[2]}$ = N-period double moving average
N = number of periods in the moving average
T = number of observations (the current period)
$M_T$ = N-period simple moving average
General Equation:
$$M_T^{[2]} = M_{T-1}^{[2]} + \frac{M_T - M_{T-N}}{N} \tag{8.5}$$
$$\hat{X}_{T+\tau}(T) = 2M_T - M_T^{[2]} + \tau\left(\frac{2}{N-1}\right)\left(M_T - M_T^{[2]}\right) \tag{8.6}$$
$$M_T^{[2]} = \frac{M_T + M_{T-1} + M_{T-2} + \cdots + M_{T-N+1}}{N}$$
$$M_9^{[2]} = \frac{M_9 + M_8 + M_7 + M_6 + M_5}{5} = \frac{45.60 + 46.60 + 48.40 + 48.40 + 45.20}{5} = 46.84$$
$$M_T^{[2]} = M_{T-1}^{[2]} + \frac{M_T - M_{T-N}}{N}$$
Using this equation, calculate the other double moving average values as follows:
Set T = 10, N = 5
$$M_{10}^{[2]} = M_9^{[2]} + \frac{M_{10} - M_5}{5} = 46.84 + \frac{44.60 - 45.20}{5} = 46.72$$
Set T = 11, N = 5
… and so on.
$$\hat{X}_{T+\tau}(T) = 2M_T - M_T^{[2]} + \tau\left(\frac{2}{N-1}\right)\left(M_T - M_T^{[2]}\right)$$
Forecast for the 10th week using the first 9 periods of data (τ = 1 because this is a one-period-ahead forecast):
Set T = 9, τ = 1
$$\hat{X}_{9+1}(9) = 2M_9 - M_9^{[2]} + 1\left(\frac{2}{5-1}\right)\left(M_9 - M_9^{[2]}\right)$$
$$\hat{X}_{10}(9) = 2(45.60) - 46.84 + \frac{1}{2}(45.60 - 46.84) = 43.74$$
(shown in Table 8.14, column 6)
$$\hat{X}_{10+1}(10) = 2M_{10} - M_{10}^{[2]} + 1\left(\frac{2}{5-1}\right)\left(M_{10} - M_{10}^{[2]}\right)$$
$$\hat{X}_{11}(10) = 2(44.60) - 46.72 + \frac{1}{2}(44.60 - 46.72) = 41.42$$
… and so on.
The rest of the forecasts and complete data are shown in Appendix A.
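Equations (8.4) through (8.6) can be sketched in Python as follows. The function names are illustrative, and periods are numbered from 1 as in the text. The first two calls reproduce the worked forecasts from the moving averages given above. A useful check on the method is that, for a perfectly linear series, the forecasts fall exactly on the line, because the term $\tfrac{2}{N-1}(M_T - M_T^{[2]})$ recovers the slope.

```python
def forecast_from_averages(m_t, m2_t, n, tau=1):
    """Equation (8.6): forecast tau periods ahead from M_T and M_T^[2]."""
    return 2 * m_t - m2_t + tau * (2 / (n - 1)) * (m_t - m2_t)


def double_ma_forecasts(x, n, tau=1):
    """Return {T: forecast for period T + tau} for every T >= 2n - 1.

    Periods are numbered from 1; x[0] is period 1.
    """
    # N-period simple moving averages M_T for T = n..len(x)
    m = {t: sum(x[t - n:t]) / n for t in range(n, len(x) + 1)}
    result = {}
    for t in range(2 * n - 1, len(x) + 1):
        m2 = sum(m[t - k] for k in range(n)) / n  # equation (8.4)
        result[t] = forecast_from_averages(m[t], m2, n, tau)
    return result


# Worked example with N = 5
print(round(forecast_from_averages(45.60, 46.84, 5), 2))  # 43.74
print(round(forecast_from_averages(44.60, 46.72, 5), 2))  # 41.42
```

The first forecast becomes available only at T = 2N − 1 (period 9 when N = 5), because $M_T^{[2]}$ needs N simple moving averages, each of which needs N observations.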
next-day closing price of XYZ Analytics Inc. common stock. The analyst has obtained the closing stock prices for the past 40 days (see Appendix).
A) Forecast the stock price for days 3 through 41 using a 3-period moving average and calculate the forecast errors: MAD, MAPE, and MSD. Plot the actual data and the forecast on one plot. Then use a 6-period moving average to forecast the stock price data.
B) Use the simple exponential smoothing method to forecast periods 1 through 41 of the stock price. Note that the forecast for period 1 is the actual price of day 1 (43.50). Use a smoothing constant α of 0.4. Then increase α to 0.804 and develop your forecast with this α value. Calculate the MAD, bias, and tracking signal for α = 0.4 and for α = 0.804. Round the forecast and error values to four decimal places.
C) Compare the MAD values in parts (A) and (B) and decide which forecasting approach to use. What do the bias and tracking signal tell you? Make a table as shown below and show your values.
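Bias and tracking signal are not defined in this excerpt, so the following Python sketch assumes common textbook definitions: bias is the mean forecast error (actual − forecast), and the tracking signal is the running sum of forecast errors (RSFE) divided by the MAD. Some texts report bias as the error sum instead, so check the definition you are asked to use. The input values are hypothetical, not the XYZ Analytics prices.

```python
def bias_and_tracking_signal(actual, forecast):
    """Return (bias, tracking signal, MAD) with errors defined as actual - forecast.

    bias            = mean forecast error (positive -> forecasts too low on average)
    tracking signal = running sum of forecast errors (RSFE) / MAD
    """
    errors = [a - f for a, f in zip(actual, forecast)]
    if not errors:
        raise ValueError("need at least one forecast")
    rsfe = sum(errors)
    mad = sum(abs(e) for e in errors) / len(errors)
    bias = rsfe / len(errors)
    tracking_signal = rsfe / mad if mad else 0.0  # all errors zero -> no signal
    return bias, tracking_signal, mad


# Hypothetical values, not the XYZ Analytics data in the Appendix
bias, ts, mad = bias_and_tracking_signal([10, 12, 15], [8, 13, 12])
print(round(bias, 4), round(ts, 4), round(mad, 4))
```

A bias near zero and a tracking signal that stays within a chosen band (limits of about ±4 are often quoted) suggest that the forecast is not consistently over- or under-predicting.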
Figures 8.16 through 8.19 show the plots of actual data and the fore-
casts using moving average and exponential smoothing methods. The fore-
cast accuracies for comparison purposes are provided below the figures.
A close examination of the forecasts shows that all these methods provided good short-term forecasts of the stock values. However, the forecast using exponential smoothing with a smoothing constant α = 0.8 has the smallest MAD and also the smallest MAPE.
nential smoothing method requires only the previous period's actual value and forecast to generate the forecast for the next period.
Step 1. Plot the data: Quarter vs. Sales. Figure 8.20 shows the plot of the
demand. This plot clearly shows a seasonal pattern.
2. Calculate the seasonal index for each quarter as shown in Table 8.16.
The formula to calculate the seasonal index is explained below the table.
4. Plot the deseasonalized data. Figure 8.21 shows the plot of the deseasonalized data.
5. Since the data show an increasing trend (see plot above), perform a
regression analysis on the deseasonalized data (x is quarter and y is
deseasonalized data). The computer result is shown in Figure 8.22
and the regression equation is shown below.
Y = 615.419 + 16.8652 x
S = 22.3799 R-Sq = 89.0%
6. Use the regression equation to forecast for quarters 13, 14, 15, and
16 of the following year or year 11. These are deseasonalized fore-
casts for the next four quarters of next year (note that quarter 13 is
the 1st quarter of the next year, quarter 14 is the 2nd quarter of the
next year, and so on).
y = 615.419 + 16.8652x
y13 = 615.419 + 16.8652(13) = 834.67
y14 = 615.419 + 16.8652(14) = 851.53
y15 = 615.419 + 16.8652(15) = 868.40
y16 = 615.419 + 16.8652(16) = 885.26
7. Multiply the deseasonalized forecast for each quarter by the seasonal index to get the seasonalized forecast. The forecasts are shown in Table 8.18.
8. The actual data (for the first 12 quarters) and the seasonalized forecasts (for the next four quarters, 13 to 16) are shown in Table 8.19.
Quarter   Actual (1–12) / Forecast (13–16)
10        900
11        1,000
12        650
13        675
14        955
15        1,086
16        724
9. Plot the actual data and the forecast. Figure 8.23 shows the plot of the actual data and the forecasts for the next four quarters. Note how the forecast follows the seasonal pattern and the trend.
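The procedure above can be sketched in Python. Two assumptions matter. First, the seasonal index here is a simple average ratio (each quarter's average divided by the overall average), which stands in for the formula given with Table 8.16 and may not reproduce that table's indices. Second, the trend is fitted by ordinary least squares, as in Step 5. Only the regression coefficients 615.419 and 16.8652 come from the text; the function names are hypothetical.

```python
def seasonal_indices(sales, season_length=4):
    """Assumed simple-average method: (average for a season) / (overall average).

    Requires the series to cover a whole number of years.
    """
    if len(sales) % season_length:
        raise ValueError("series must contain whole years of data")
    overall = sum(sales) / len(sales)
    indices = []
    for s in range(season_length):
        season = sales[s::season_length]  # every value for season s + 1
        indices.append((sum(season) / len(season)) / overall)
    return indices


def fit_line(x, y):
    """Ordinary least-squares intercept a and slope b for y = a + b*x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    b = sxy / sxx
    return mean_y - b * mean_x, b


def seasonal_trend_forecast(sales, horizon, season_length=4):
    """Steps 2-7: index, deseasonalize, fit a trend, extrapolate, reseasonalize."""
    idx = seasonal_indices(sales, season_length)
    deseasonalized = [v / idx[t % season_length] for t, v in enumerate(sales)]
    periods = list(range(1, len(sales) + 1))  # quarter numbers 1, 2, ...
    a, b = fit_line(periods, deseasonalized)
    forecasts = []
    for q in range(len(sales) + 1, len(sales) + horizon + 1):
        trend = a + b * q  # deseasonalized forecast for quarter q
        forecasts.append(trend * idx[(q - 1) % season_length])
    return forecasts


# Step 6 with the regression equation reported in the text
trend_13_16 = [round(615.419 + 16.8652 * q, 2) for q in range(13, 17)]
print(trend_13_16)  # [834.67, 851.53, 868.4, 885.26]
```

Multiplying these deseasonalized values by the Table 8.16 indices (Step 7) gives the seasonalized forecasts; `seasonal_trend_forecast` chains all the steps for a raw quarterly series.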
Figure 8.23 Actual demand data (first 12 quarters) and the forecasts
for the next four quarters (quarters 13 through 16)
• Simple Regression
• Multiple Regression Analysis
Summary
This chapter discussed forecasting techniques. Forecasting is a critical part of predictive analytics and involves predicting future business activities, including sales, revenue, workforce requirements, demand, and inventory, to name a few. Forecasts affect decisions and activities throughout an organization. Produce-to-order and produce-to-stock companies depend on forecasts for production and operations planning, and inventory planning and decisions are affected by forecasts. Companies with good forecasting in place are able to balance demand and supply, thereby
CHAPTER 9
Chapter Highlights
• Introduction to Data Mining
• Data Mining Defined
• Some Application Areas of Data Mining
• Machine Learning and Data Mining
• Data Mining and Its Origins and Areas It Interacts with
• Process of Data Mining and Knowledge Discovery in Databases (KDD)
• Data Mining Methodologies and Data Mining Tasks
  ◦ Data Preparation or Data Preprocessing, and
    ▪ Data cleaning
    ▪ Data integration
    ▪ Data selection
    ▪ Data transformation
  ◦ Data Mining
    ▪ Pattern evaluation
    ▪ Knowledge representation
• Data Mining Tasks
  ◦ Descriptive Data Mining
  ◦ clustering,
  ◦ sequence, and
transform data into an understandable structure for further use. The field
of data mining is rapidly growing, and statistics plays a major role in it.
Data mining is also known as knowledge discovery in databases (KDD),
pattern analysis, information harvesting, business intelligence, analytics,
etc. Besides statistics, data mining uses artificial intelligence (AI), ma-
chine learning, database systems, advanced statistical tools, and pattern
recognition.
Successful companies use their data as an asset and use them for
competitive advantage. These companies use business analytics and data
mining tools as an organizational commitment to data-driven decision
making. Business data mining combined with machine learning and AI
techniques.
Another reason to mine data is to discover hidden patterns and relationships in the data. Data often contain hidden information that is not readily apparent and is usually difficult to discover using traditional statistical tools. Sometimes it may take a significant amount of time to discover useful information using traditional methods.
Data mining automatically processes massive amounts of data using
specially designed software. A number of techniques, for example, classi-
fication and clustering, are used to analyze huge quantities of data. These
provide useful information to the analysts and are critical in analyzing
business, financial, or scientific data.
• Data Collection: The goal of this phase is to extract the data rel-
evant to data mining analysis. The data should be stored in a data-
base where data analysis will be applied.
Figure 9.2 The knowledge discovery in data mining (KDD) process
The above steps are necessary to prepare the data for further process-
ing. The steps provide clean or processed data so that data mining tasks
can be performed. The data mining tasks involve:
A) Data mining
B) Pattern evaluation
C) Knowledge representation
Figure 9.3 Data mining (KDD) process: data preprocessing and data mining tasks
Data Cleaning
Data cleaning is the process of preparing data and making it ready for further processing. The collected data are raw data and are usually unstructured, incomplete, noisy, and inconsistent, with missing values. For example, a large set of customer records at a financial company may be missing attributes such as age and gender; such data are incomplete. Data may also contain outliers, extreme values, or recording errors: a person's age recorded as 350 or 200 years is obviously wrong. The data can also be inconsistent; for example, the name of an employee may be stored differently in different data tables or documents. If the data are not clean and structured, the data mining results will be neither reliable nor accurate.
Data cleaning involves a number of techniques including filling in the
missing values manually, combined computer and human inspection, etc.
The output of data cleaning process is adequately cleaned data ready for
further processing.
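As a minimal illustration of these ideas, the following Python sketch cleans a list of customer records. The field names (`name`, `age`), the 0 to 120 plausibility range for age, and the choice to fill missing ages with the median are hypothetical assumptions for illustration; the text itself mentions manual filling and combined computer and human inspection, and real projects choose rules that suit their data.

```python
from statistics import median

MAX_PLAUSIBLE_AGE = 120  # hypothetical validity rule, not taken from the text


def clean_customers(records):
    """Return cleaned copies of customer records (dicts with 'name' and 'age')."""
    cleaned = []
    for record in records:
        rec = dict(record)  # do not modify the caller's data
        # Inconsistency: the same name may be stored with different spacing or case
        if rec.get("name") is not None:
            rec["name"] = " ".join(rec["name"].split()).title()
        # Recording errors: an impossible age (e.g. 350) is treated as missing
        age = rec.get("age")
        if age is not None and not 0 <= age <= MAX_PLAUSIBLE_AGE:
            rec["age"] = None
        cleaned.append(rec)

    # Missing values: fill remaining gaps with the median of the valid ages
    valid = [r["age"] for r in cleaned if r.get("age") is not None]
    fill_value = median(valid) if valid else None
    for rec in cleaned:
        if rec.get("age") is None:
            rec["age"] = fill_value
    return cleaned


raw = [
    {"name": "  ann  LEE ", "age": 34},
    {"name": "Ann Lee", "age": 350},
    {"name": "bob ray", "age": None},
    {"name": "Cy Dow", "age": 40},
]
for row in clean_customers(raw):
    print(row)
```

Median filling is only one option; flagging suspicious records for human review is often safer when an attribute such as age drives later analysis.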
Data Integration
Data integration is the process where data from different data sources are
integrated into one. Data lie in different formats in different locations
and could be stored in databases, text files, spreadsheets, documents, data
cubes, the Internet, and so on. Data integration is a complex and tricky task because data from different sources often do not match. For example, suppose table A contains an attribute named customer-id, whereas table B contains an attribute named "number" instead of customer-id. In such cases, it is difficult to determine whether both attributes refer to the same values. Metadata can be used effectively to reduce errors in the data
integration process. Another issue faced is data redundancy where the
same data may be available in different tables in the same database or are
available in different data sources. Data integration tries to reduce redun-
dancy to the maximum possible level without affecting the reliability of
data.
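The customer-id versus "number" example can be sketched in Python. Here the `key_a` and `key_b` arguments play the role of metadata that record which attributes identify the same customer; the function name, the output key `customer_id`, and the sample rows are hypothetical. Redundant attributes are kept only once, with the first source taking precedence.

```python
def integrate(table_a, table_b, key_a="customer-id", key_b="number"):
    """Merge two record lists whose customer keys have different attribute names.

    key_a and key_b act as metadata: they record that "customer-id" in table A
    and "number" in table B identify the same customer.
    """
    merged = {}
    for source, key in ((table_a, key_a), (table_b, key_b)):
        for row in source:
            target = merged.setdefault(row[key], {"customer_id": row[key]})
            for field, value in row.items():
                if field == key:
                    continue
                # Redundancy: keep one copy of each attribute (first source wins)
                target.setdefault(field, value)
    return list(merged.values())


table_a = [{"customer-id": 1, "name": "Ann"}, {"customer-id": 2, "name": "Bob"}]
table_b = [{"number": 1, "name": "Ann", "city": "Oslo"}, {"number": 3, "city": "Lima"}]
for row in integrate(table_a, table_b):
    print(row)
```

This sketch silently keeps the first value when two sources disagree; a production pipeline would usually flag such conflicts for review rather than discard them.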
Data Selection
Data mining process uses large volumes of historical data for analysis.
Sometimes, the data repository with integrated data may contain much
more data than actually required. Before applying any data mining task or
algorithm, the data of interest needs to be separated, selected, and stored
from the available stored data. Data selection is the process of retrieving
the relevant data for analysis from the database.
Data Transformation
Data Mining
Data mining is the core process that uses a number of complex methods
to extract patterns from data. The purpose of the data mining phase is to analyze the data using appropriate algorithms to discover meaningful patterns and rules and to produce predictive models. This is the most important phase of the KDD cycle.
Data mining process includes a number of tasks such as association,
classification, prediction, clustering, time series analysis, machine learning,
and deep learning. Table 9.1 outlines the data mining tasks.
Data mining tasks, such as classification, prediction, time series analysis, association, clustering, and summarization, can be broadly classified as either descriptive or predictive data mining tasks. Figure 9.4 shows a broad view of data mining tasks.
Descriptive data mining tasks make use of collected data and data min-
ing methodologies to look into the past behavior, relationships, and
patterns to understand and explain what exactly happened in the past.
Predictive analytics employs various predictive data mining and sta-
Data Mining and Tasks
Brief Description: Data mining involves exploring new patterns and relationships in the collected data. It is a part of predictive analytics that involves processing and analyzing huge amounts of data. The data in its raw form have no meaning unless processed and analyzed. Among the several tools and techniques available and currently emerging with the advancement of technology and computers, it is now possible to analyze big data using data mining, machine learning, and AI techniques.
Application Areas: Data mining is one of the major tools of predictive analytics. In business, data mining is used to analyze business data. Business transaction data, along with other customer- and product-related data, are continuously stored in databases. Data mining software is used to analyze […] previously unknown patterns in databases containing massive amounts of data and to make predictions that are critical in decision making and in improving overall system performance. In recent years, data mining combined with machine learning/AI is finding larger and wider applications in analyzing business data and predicting future business outcomes. The reason for this is the growing interest in knowledge management and in moving from data to information and finally to knowledge discovery.
new information from the available data set that is not apparent other-
wise. Businesses use a number of data visualization techniques including
dashboards, heat maps, and a number of other graphical tools to study
the current behavior of their businesses. These visual tools are simple but powerful for studying current business behavior and are used in building predictive analytics models.
Figure 9.5 Data mining methodologies
Classification
Clustering
Cluster Analysis
Prediction
Time series analysis involves data collected over time. A time series is a sequence of historical events ordered in time; time series analysis studies past performance to forecast future events, where the next event is determined by one or more of the preceding events.
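The idea that the next event depends on the preceding ones can be made concrete with one of the simplest such models, a moving-average forecast. It predicts the next value as the mean of the last few observations. This is a minimal Python sketch. The function name, the window of 3, and the sales figures are hypothetical illustrations, not data or methods from this book.

```python
def moving_average_forecast(series, window=3):
    """Forecast the next value as the mean of the last `window` observations."""
    if not 1 <= window <= len(series):
        raise ValueError("window must be between 1 and len(series)")
    recent = series[-window:]  # the preceding observations that drive the forecast
    return sum(recent) / window


# Hypothetical monthly sales figures
sales = [120, 130, 125, 140, 150]
print(round(moving_average_forecast(sales), 2))  # (125 + 140 + 150) / 3 = 138.33
```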
A number of models are used to analyze time series data. The forecast-
ing chapter in this book discussed a number of time series patterns and
Figure 9.6 Supervised and Unsupervised Learning Techniques
firms and impacts future sales, revenue, demand, inventory, and workforce requirements. Time series analysis and forecasting techniques examine data collected over time to extract useful patterns, trends, rules, and statistical models; the chapter on forecasting in this book outlines a number of these models. The models have many business applications. One is stock market prediction, an important application of time series analysis. Another is predicting power requirements during peak summer hours, when electricity demand is highly variable and fluctuates rapidly over a short span of time.
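The Python code below is a small sketch of extracting a trend from data collected over time. It fits a straight-line trend by ordinary least squares and extrapolates it forward. The function names and the demand readings are hypothetical. Real series such as summer electricity loads usually need richer models than a straight line.

```python
def linear_trend(series):
    """Fit y = a + b*t by ordinary least squares, where t = 0, 1, ..., n-1."""
    n = len(series)
    if n < 2:
        raise ValueError("need at least two observations")
    t_mean = (n - 1) / 2
    y_mean = sum(series) / n
    sxy = sum((t - t_mean) * (y - y_mean) for t, y in enumerate(series))
    sxx = sum((t - t_mean) ** 2 for t in range(n))
    b = sxy / sxx            # slope: average change per period
    a = y_mean - b * t_mean  # intercept: fitted value at t = 0
    return a, b


def trend_forecast(series, steps_ahead=1):
    """Extrapolate the fitted trend line `steps_ahead` periods past the last point."""
    a, b = linear_trend(series)
    return a + b * (len(series) - 1 + steps_ahead)


# Hypothetical demand readings taken at equal time intervals
demand = [10, 12, 14, 16]
print(linear_trend(demand))    # (10.0, 2.0)
print(trend_forecast(demand))  # 18.0
```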
Summarization
Deep Learning
human brain processes light and sound into vision and hearing. Some
successful applications of deep learning are computer vision and speech
recognition.
Summary
This chapter introduced and provided an overview of the field of data mining. Businesses today collect vast amounts of data, and data mining is an essential tool for extracting knowledge from these massive amounts of data; this process is known as knowledge discovery in databases (KDD). The extracted information and knowledge are used in different models to predict future business outcomes. Besides the process of data mining and KDD, the chapter explained a number of data mining methodologies and tasks and outlined several areas where data mining finds application. The essential tasks of data mining, including data preparation (data preprocessing), knowledge representation, and pattern evaluation, were discussed. The two broad areas of data mining are descriptive and predictive data mining; we discussed both and outlined the tools in each case along with their objectives.
Data mining techniques are also classified as supervised or unsupervised learning, and we discussed the data mining tasks that fall under each. The key methodologies of data mining, including anomaly (or outlier) detection, association learning, classification, clustering, sequence, prediction, and time series and forecasting
CHAPTER 10
Overview
This book provided an overview of the field of business analytics (BA). BA uses a set of methodologies to extract, explore, and analyze big data; it is about extracting information from big data and using it to make decisions. In short, BA is a data-driven decision-making process.
The field of BA can be broken down into two broad areas: (1) business
intelligence (BI) and (2) statistical analysis. The flow diagram in
Figure 10.1 outlines the broad area of analytics.
Chapters 1, 2, and 3 explained BI and BA. This book focuses mainly on predictive analytics and predictive analytics models, and several chapters are devoted to these models.
The broad area of BA can be broken down into: (1) BI and (2) statistical
analysis
Business Intelligence
BA comes under the broad umbrella of BI, discussed in Chapter 3. BI has evolved from business data reporting, which involves examining historical data to gain insight into the performance of a company over time.
WRAP-UP, OVERVIEW, NOTES ON IMPLEMENTATION 265
Statistical Analysis
The field of analytics is about driving business decisions using data; therefore, statistical analysis is at the core of BA. A number of statistical techniques and models, ranging from descriptive and data visualization tools to analytics models, are applied to draw meaningful conclusions from the data. Statistical analysis involves performing data analysis and creating statistical models, and it can be broken down into the following categories:
(ii) explore the relation of the data to the underlying population, (iii) establish and understand how the collected sample data will be used to draw conclusions about the population, (iv) perform data analysis and create descriptive and predictive analytics models that can be used to predict future outcomes, and (v) prove the validity of the models.
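Step (iii) above, drawing a conclusion about a population from sample data, can be illustrated with a confidence interval for the population mean. The sketch below uses only Python's standard library and works on a hypothetical sample. It assumes a normal (z) approximation; for small samples, an interval based on the t distribution is generally more appropriate.

```python
import math
from statistics import NormalDist, mean, stdev


def mean_confidence_interval(sample, confidence=0.95):
    """Normal-approximation confidence interval for the population mean."""
    n = len(sample)
    if n < 2:
        raise ValueError("need at least two observations")
    x_bar = mean(sample)                            # point estimate of the mean
    std_error = stdev(sample) / math.sqrt(n)        # estimated standard error
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # about 1.96 for 95 percent
    return x_bar - z * std_error, x_bar + z * std_error


# Hypothetical sample drawn from a larger population
sample = [10, 12, 14, 16, 18]
low, high = mean_confidence_interval(sample)
print(round(low, 2), round(high, 2))  # 11.23 16.77
```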
Figure 10.2 Functions of BI and analytics in different areas
data to create models that can be used in predictive analytics. These predictive analytics models are used to predict future business outcomes.
The other component of statistical analysis is data analytics. Statistical analysis and data analytics follow broadly similar approaches, except that data analytics goes beyond statistical analysis to include more elaborate and extensive applications. Data analytics is explained below.
Data Analytics
Data analytics is the process of exploring and investigating a company’s
data to find patterns and relationships in data and applying specialized
Predictive Analytics
Prescriptive Analytics
Advanced Analytics Models
These models are discussed in detail in Chapters 1 and 2. Figure 10.3
outlines the tools and models of BA.
The different types of analytics are briefly explained here.
Figure 10.3 Descriptive, predictive, and prescriptive analytics models
Machine Learning
AI and ML are sometimes used synonymously, but there is a difference between the two: ML is simply one way of achieving AI.
AI can be achieved without ML, but such a system would require a specific program with millions of lines of code encoding complex rules and decision trees. Alternatively, ML algorithms can be developed. ML is a way of “training” an algorithm so that it can learn how to perform a task. Training requires feeding large amounts of data to the algorithm and allowing it to adjust, learn, and improve. One of the most successful applications of ML is computer vision: the ability of a machine to recognize an object in an image or video.
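The “training” loop described above can be sketched in a few lines: feed data, adjust, improve. This minimal illustration assumes the simplest possible model, a straight line y = w·x + b, trained by gradient descent on mean squared error. The learning rate, number of epochs, and data are hypothetical choices. Real ML systems use far larger models and data sets.

```python
def train_linear_model(xs, ys, learning_rate=0.05, epochs=2000):
    """Learn w and b in y ≈ w*x + b by batch gradient descent on mean squared error."""
    w, b = 0.0, 0.0  # start from an untrained model
    n = len(xs)
    for _ in range(epochs):
        # Prediction errors of the current model on the training data
        errors = [(w * x + b) - y for x, y in zip(xs, ys)]
        # Gradients of MSE = (1/n) * sum(error^2) with respect to w and b
        grad_w = (2 / n) * sum(e * x for e, x in zip(errors, xs))
        grad_b = (2 / n) * sum(errors)
        # Adjust the parameters a small step in the direction that reduces the error
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b


# Hypothetical training data generated from y = 2x + 1
xs = [0, 1, 2, 3, 4]
ys = [1, 3, 5, 7, 9]
w, b = train_linear_model(xs, ys)
print(round(w, 3), round(b, 3))  # 2.0 1.0
```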
Deep Learning
Deep learning is a class of ML algorithms and is one of many approaches
Figure 10.6 Background and prerequisites to predictive analytics
https://www.forbes.com/sites/bernardmarr/2017/06/06/the-9-best-free-online-big-data-and-data-science-courses/#6403190343cd
Foundations in Business Analytics — University of Maryland
Summary
In this chapter, we provided an overview of the field of analytics, which can be divided into two broad categories: BI and statistical analysis.
tical analysis and statistical methods. The focus of this book is predictive
analytics. Predictive analytics techniques mostly use statistical models and algorithms to predict future business trends. These techniques include regression, time series analysis and forecasting, data mining, ML, and AI. They also include advanced analytics techniques such as clustering and classification algorithms, used in applications such as marketing analytics. Specific chapters of the book are devoted to these models.
BA initiatives can help businesses increase revenues and profitabil-
ity, improve operational efficiency, optimize marketing campaigns and
APPENDICES
Background and Prerequisite for Predictive Analytics
APPENDIX A
Probability Concepts: Role of Probability in Decision Making
$$0 \le P(A) \le 1$$
2. Permutations
The number of ways of selecting n distinct objects from a group of N objects, where the order of selection is important, is known as the number of permutations of N objects taken n at a time and is written as
$$P_n^N = \frac{N!}{(N-n)!} = N(N-1)(N-2)\cdots(N-n+1)$$
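The permutation formula can be checked with a short Python sketch. It multiplies N(N-1)...(N-n+1) directly and compares the result with the standard library's `math.perm` (Python 3.8+) and with a brute-force listing of ordered selections. `n_permutations` is a hypothetical helper name introduced here for illustration.

```python
import math
from itertools import permutations


def n_permutations(N, n):
    """Ordered selections of n objects from N: N!/(N-n)! = N(N-1)...(N-n+1)."""
    if not 0 <= n <= N:
        raise ValueError("need 0 <= n <= N")
    result = 1
    for k in range(N, N - n, -1):  # multiply N, N-1, ..., N-n+1
        result *= k
    return result


print(n_permutations(5, 3))                 # 60
print(math.perm(5, 3))                      # 60
print(len(list(permutations("ABCDE", 3))))  # 60, by listing every ordered selection
```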
3. Combinations
A combination is a selection of n objects from a total of N objects in which the order of selection is not important. This disregard of arrangement is what distinguishes a combination from a permutation. In general, an experiment will have more permutations than combinations, because each combination of n objects can be arranged in n! different orders.
$$C_n^N = \binom{N}{n} = \frac{N!}{n!\,(N-n)!}$$
Note: 0! = 1 by definition.
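A matching Python sketch for combinations computes N!/(n!(N-n)!) and confirms it against `math.comb` (Python 3.8+) and a brute-force listing. The last line illustrates why there are more permutations than combinations. `n_combinations` is a hypothetical helper name.

```python
import math
from itertools import combinations


def n_combinations(N, n):
    """Unordered selections of n objects from N: N! / (n! (N-n)!)."""
    if not 0 <= n <= N:
        raise ValueError("need 0 <= n <= N")
    # math.factorial(0) == 1, matching the convention 0! = 1
    return math.factorial(N) // (math.factorial(n) * math.factorial(N - n))


print(n_combinations(5, 3))                 # 10
print(math.comb(5, 3))                      # 10
print(len(list(combinations("ABCDE", 3))))  # 10
# Each combination of 3 objects has 3! = 6 orderings: 10 * 6 = 60 permutations
print(n_combinations(5, 3) * math.factorial(3))  # 60
```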
Assigning Probabilities
$$0 \le P(A) \le 1.0$$
$$P(A_1) + P(A_2) + P(A_3) + \cdots + P(A_n) = 1$$
1. Classical Method
2. Relative Frequency Approach
3. Subjective Approach
1. Classical Method
In the classical approach, the probability of an event is defined as the number of favorable outcomes divided by the total number of possible outcomes. All outcomes are assumed to be equally likely. Suppose an experiment has n possible outcomes and event A occurs in m of them; then the probability that event A will occur is
$$P(A) = \frac{m}{n}$$
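The classical method amounts to counting. This Python sketch counts the favorable outcomes m among the n equally likely outcomes of one roll of a fair die and returns m/n as an exact fraction. `classical_probability` is a hypothetical helper name.

```python
from fractions import Fraction


def classical_probability(sample_space, event):
    """Classical P(A) = m / n for equally likely outcomes."""
    outcomes = list(sample_space)
    m = sum(1 for outcome in outcomes if event(outcome))  # favorable outcomes
    n = len(outcomes)                                     # all possible outcomes
    return Fraction(m, n)


die = range(1, 7)  # one roll of a fair die: six equally likely faces
p_even = classical_probability(die, lambda face: face % 2 == 0)
p_six = classical_probability(die, lambda face: face == 6)
print(p_even, p_six)  # 1/2 1/6
```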
$$P(A) + P(\bar{A}) = 1$$
which means that the probability that event A will occur plus the probability that event A will not occur (denoted $\bar{A}$) must equal 1.
2. Relative Frequency Approach
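Probabilities can also be calculated using relative frequency: the probability of an event is estimated as the number of times the event occurs divided by the total number of trials or observations. In many problems, we define probability in this way.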
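A relative frequency estimate can be illustrated by simulation: repeat an experiment many times, then divide the number of occurrences of the event by the number of trials. This Python sketch simulates rolls of two fair dice and estimates the probability that the sum is 7. The classical answer is 6/36 = 1/6. The trial count and the seed are arbitrary choices, and the exact estimate changes slightly with other seeds.

```python
import random


def relative_frequency(trials, event, seed=42):
    """Estimate P(event) as (number of occurrences) / (number of trials)."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    occurrences = sum(1 for _ in range(trials) if event(rng))
    return occurrences / trials


def sum_is_seven(rng):
    # One trial: roll two fair dice and check whether the sum is 7
    return rng.randint(1, 6) + rng.randint(1, 6) == 7


estimate = relative_frequency(100_000, sum_is_seven)
print(round(estimate, 3))  # close to 1/6, about 0.167; exact digits depend on the seed
```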
3. Subjective Probability
Subjective probability is used when the events occur only once or
very few times and when little or no relevant data are available. In as-
signing subjective probability, we may use any information available,
such as our experience, intuition, or expert opinion. In this case the
experimental outcomes may not be clear and relative frequency of
occurrence may not be available. Subjective probability is a measure
of our belief that a particular event will occur. This belief is based on
any information that is available to determine the probability.
If we have two events A and B that are mutually exclusive, then the
probability that A or B will occur is given by
$$P(A \cup B) = P(A) + P(B)$$
Note that the “union” sign is used for “or” probability; that is, $P(A \cup B)$ is the same as P(A or B). This rule can be extended to three or more mutually exclusive events. If three events A, B, and C are mutually exclusive, then the probability that A or B or C will happen is given by
$$P(A \cup B \cup C) = P(A) + P(B) + P(C)$$
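The addition rule for mutually exclusive events can be verified by counting outcomes. This Python sketch uses one roll of a fair die and three hypothetical events that share no outcomes.

```python
from fractions import Fraction

sample_space = set(range(1, 7))  # one roll of a fair die


def prob(event):
    """Classical probability of an event, given as a set of outcomes."""
    return Fraction(len(event & sample_space), len(sample_space))


A, B, C = {1, 2}, {3}, {6}  # hypothetical events with no outcomes in common
assert not (A & B or A & C or B & C)  # confirm they are mutually exclusive
print(prob(A | B))                                     # 1/2 = 1/3 + 1/6
print(prob(A | B | C) == prob(A) + prob(B) + prob(C))  # True
```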
Two events that are not mutually exclusive can occur together. If events A and B are not mutually exclusive, the probability that A or B will occur is given by
$$P(A \cup B) = P(A) + P(B) - P(A \text{ and } B)$$
or
$$P(A \cup B) = P(A) + P(B) - P(A \cap B)$$
For three events A, B, and C that are not mutually exclusive,
$$P(A \cup B \cup C) = P(A) + P(B) + P(C) - P(A \cap B) - P(A \cap C) - P(B \cap C) + P(A \cap B \cap C)$$
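Without a correction, overlapping outcomes would be counted more than once. The general rule therefore subtracts the intersections and, for three events, adds back the triple intersection. This Python sketch checks both formulas against a direct count of the union for one roll of a fair die. The events A, B, and C are hypothetical examples.

```python
from fractions import Fraction

sample_space = set(range(1, 7))  # one roll of a fair die


def prob(event):
    return Fraction(len(event & sample_space), len(sample_space))


A = {2, 4, 6}  # even number
B = {4, 5, 6}  # greater than 3
C = {1, 2, 4}  # hypothetical third event

# Two events: P(A ∪ B) = P(A) + P(B) - P(A ∩ B)
two = prob(A) + prob(B) - prob(A & B)
# Three events: add singles, subtract pairwise overlaps, add back the triple overlap
three = (prob(A) + prob(B) + prob(C)
         - prob(A & B) - prob(A & C) - prob(B & C)
         + prob(A & B & C))

print(two, prob(A | B))        # 2/3 2/3
print(three, prob(A | B | C))  # 5/6 5/6
```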
Equally Likely Events are those that have an equal chance of occurrence
or those where there is no reason to expect one in preference to the other.
In many experiments it is natural to assume that each outcome in the
sample space is equally likely. Suppose that the sample space S consists
When two or more events occur and the occurrence of one event has no effect on the probability of occurrence of any other event, the events are considered independent. There are three types of probabilities under statistical independence:
Statistical Independence
$$P(AB) = P(A)P(B) \quad \text{or} \quad P(A \cap B) = P(A)P(B)$$
$$P(A \mid B) = P(A)$$
This means that if the events are independent, their probabilities are not affected by each other's occurrence: the occurrence of B has no effect on the probability of A. That is, the condition has no effect when the events are independent.
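Independence can be checked numerically on the 36 equally likely outcomes of rolling two fair dice. In this Python sketch, the events (first die even, second die showing 5 or 6) are hypothetical examples. They were chosen because one die cannot influence the other.

```python
from fractions import Fraction
from itertools import product

# Sample space: ordered results of rolling two fair dice (36 equally likely pairs)
space = list(product(range(1, 7), repeat=2))


def prob(event):
    return Fraction(sum(1 for o in space if event(o)), len(space))


def A(o):
    return o[0] % 2 == 0  # first die shows an even number


def B(o):
    return o[1] > 4  # second die shows 5 or 6


p_a, p_b = prob(A), prob(B)
p_ab = prob(lambda o: A(o) and B(o))

print(p_ab, p_a * p_b)  # 1/6 1/6 -> joint probability equals the product
print(p_ab / p_b, p_a)  # 1/2 1/2 -> P(A | B) equals P(A)
```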
Statistical dependence
When two or more events occur and the occurrence of one event has an effect on the probability of occurrence of another, the events are considered dependent.
There are three types of probabilities under statistical dependence.
Statistical Dependence
$$P(A \mid B) = \frac{P(A \cap B)}{P(B)} = \frac{P(A \text{ and } B)}{P(B)}$$
$$P(A \cap B) = P(A \mid B)P(B) \quad \text{or} \quad P(A \text{ and } B) = P(A \mid B)P(B)$$
Similarly,
$$P(B \cap A) = P(B \mid A)P(A) \quad \text{or} \quad P(B \text{ and } A) = P(B \mid A)P(A)$$
$$P(R) = P(D \mid R)P(R) + P(S \mid R)P(R)$$
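The conditional probability and multiplication rules for dependent events can be illustrated by drawing two cards without replacement. Here the first draw changes the odds for the second. This Python sketch enumerates every ordered two-card draw from a hypothetical 52-card deck that contains 4 aces.

```python
from fractions import Fraction
from itertools import permutations

# Hypothetical deck: 4 aces and 48 other cards; two cards drawn without replacement
deck = ["ace"] * 4 + ["other"] * 48
draws = list(permutations(range(len(deck)), 2))  # all 52 * 51 ordered draws


def prob(event):
    return Fraction(sum(1 for d in draws if event(d)), len(draws))


def first_ace(d):
    return deck[d[0]] == "ace"  # event B


def second_ace(d):
    return deck[d[1]] == "ace"  # event A


p_b = prob(first_ace)
p_ab = prob(lambda d: first_ace(d) and second_ace(d))
p_a_given_b = p_ab / p_b  # P(A | B) = P(A ∩ B) / P(B)

print(p_b, p_ab, p_a_given_b)     # 1/13 1/221 1/17
print(p_a_given_b * p_b == p_ab)  # True: P(A ∩ B) = P(A | B) P(B)
print(prob(second_ace))           # 1/13, differs from P(A | B), so A and B are dependent
```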
Bayes’ Theorem
$$P(A_i \mid D) = \frac{P(A_i)P(D \mid A_i)}{P(A_1)P(D \mid A_1) + P(A_2)P(D \mid A_2) + \cdots + P(A_n)P(D \mid A_n)}$$
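Once the prior probabilities P(A_i) and the likelihoods P(D | A_i) are known, Bayes' theorem is straightforward to compute. The Python sketch below uses exact fractions and an entirely hypothetical example. Three machines produce 50, 30, and 20 percent of a plant's parts, with defect rates of 1, 2, and 3 percent. D is the event that a part is defective.

```python
from fractions import Fraction


def bayes_posteriors(priors, likelihoods):
    """P(A_i | D) = P(A_i) P(D | A_i) / sum over j of P(A_j) P(D | A_j)."""
    joints = [p * l for p, l in zip(priors, likelihoods)]  # P(A_i) P(D | A_i)
    total = sum(joints)                                    # P(D), the denominator
    return [j / total for j in joints]


# Hypothetical priors P(A_i): share of parts made by each machine
priors = [Fraction(1, 2), Fraction(3, 10), Fraction(1, 5)]
# Hypothetical likelihoods P(D | A_i): defect rate of each machine
likelihoods = [Fraction(1, 100), Fraction(2, 100), Fraction(3, 100)]

print(bayes_posteriors(priors, likelihoods))
# [Fraction(5, 17), Fraction(6, 17), Fraction(6, 17)]
```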
A random variable that can assume only integer (whole-number) values is known as a discrete random variable. An example is the number of customers arriving at a bank. Another example is the sum of the numbers on the top faces when two dice are rolled. In this case, the results are 2 through 12, and each outcome is a whole number, or discrete quantity. Such a random variable can be described by a discrete probability distribution.
Table A.1 shows, in tabular form, the discrete probability distribution of the sum of the numbers on the top faces when two dice are rolled. The outcome is denoted by x, the random variable representing that sum.
Table A.1
x      2     3     4     5     6     7     8     9     10    11    12
P(x)   1/36  2/36  3/36  4/36  5/36  6/36  5/36  4/36  3/36  2/36  1/36
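Table A.1 can be reproduced by listing the sample space of all 36 equally likely outcomes and counting how many give each sum. The following Python sketch does exactly that, using only the standard library.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# List the sample space: all 36 equally likely (die 1, die 2) outcomes
outcomes = list(product(range(1, 7), repeat=2))
counts = Counter(a + b for a, b in outcomes)  # number of outcomes giving each sum x

distribution = {x: Fraction(counts[x], len(outcomes)) for x in sorted(counts)}
for x, p in distribution.items():
    print(x, f"{counts[x]}/36", p)  # e.g. "7 6/36 1/6"

print(sum(distribution.values()))  # 1, the probabilities sum to one
```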
The outcome x (the sum of the numbers on the top faces) takes on different values each time the pair of dice is rolled. On each trial, the sum will be a number between 2 and 12, but we cannot predict it with certainty in advance; in other words, the occurrence of these numbers is a matter of chance. The probability distribution consists of the outcomes $x_i$ and the probabilities $P(x_i)$ of these outcomes. The probability of each outcome can be found by listing the sample space of all 36 outcomes, and the distribution can be shown in both tabular and graphical form. Figure A.1 shows the
$$\mu_x = E(X) = \sum x_i P(x_i)$$
$$\sigma^2 = \sum (x_i - \mu)^2 P(x_i) \qquad \text{(A)}$$
$$\sigma^2 = \sum x^2 P(x) - \mu^2 \qquad \text{(B)}$$
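The two variance formulas should always agree. The following Python sketch implements the mean and both versions of the variance for a distribution given as a {value: probability} mapping. It applies them to the two-dice distribution of Table A.1 using exact fractions. The helper names are hypothetical.

```python
from fractions import Fraction


def discrete_mean(dist):
    """mu = sum of x_i P(x_i) over a {value: probability} mapping."""
    return sum(x * p for x, p in dist.items())


def variance_a(dist):
    """Equation (A): sum of (x_i - mu)^2 P(x_i)."""
    mu = discrete_mean(dist)
    return sum((x - mu) ** 2 * p for x, p in dist.items())


def variance_b(dist):
    """Equation (B): sum of x^2 P(x), minus mu^2."""
    mu = discrete_mean(dist)
    return sum(x * x * p for x, p in dist.items()) - mu ** 2


# Table A.1 in compact form: P(x) = (6 - |x - 7|) / 36 for x = 2, ..., 12
dice = {x: Fraction(6 - abs(x - 7), 36) for x in range(2, 13)}
print(discrete_mean(dice))                 # 7
print(variance_a(dice), variance_b(dice))  # 35/6 35/6
```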
Example A.1
Table A.2 shows the number of cars sold over the past 500 days for a par-
ticular car dealership in a certain city.
[b] Calculate the expected value or the mean number of cars sold
The expected value is given by:
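$$\mu_x = E(x) = \sum x_i P(x_i)$$
or
$$E(x) = 0(0.08) + 1(0.200) + 2(0.284) + 3(0.132) + 4(0.072) + 5(0.060) + 6(0.052) + 7(0.040) + 8(0.032) + 9(0.028) + 10(0.016) + 11(0.004) = 3.056$$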
$$\sigma^2 = \sum (x - \mu)^2 P(x)$$
$$\begin{aligned}
\sigma^2 &= (0 - 3.056)^2(0.08) + (1 - 3.056)^2(0.200) + (2 - 3.056)^2(0.284) \\
&\quad + (3 - 3.056)^2(0.132) + (4 - 3.056)^2(0.072) + (5 - 3.056)^2(0.060) \\
&\quad + (6 - 3.056)^2(0.052) + (7 - 3.056)^2(0.040) + (8 - 3.056)^2(0.032) \\
&\quad + (9 - 3.056)^2(0.028) + (10 - 3.056)^2(0.016) + (11 - 3.056)^2(0.004) \\
&= 6.068864
\end{aligned}$$
The variance can be calculated more easily using equation (2.5) with (B): $\sigma^2 = \sum x^2 P(x) - \mu^2 = 15.408 - (3.056)^2 = 6.068864$.
The standard deviation for this discrete distribution is
$$\sigma = \sqrt{\sigma^2} = \sqrt{6.068864} \approx 2.46$$
$$\begin{aligned}
P(x < 4) &= P(x = 0) + P(x = 1) + P(x = 2) + P(x = 3) \\
&= 0.08 + 0.200 + 0.284 + 0.132 \\
&= 0.696
\end{aligned}$$
These probability values are obtained from Table A.2 column (3).
$$\begin{aligned}
P(x \le 4) &= P(x = 0) + P(x = 1) + P(x = 2) + P(x = 3) + P(x = 4) \\
&= 0.08 + 0.200 + 0.284 + 0.132 + 0.072 \\
&= 0.768
\end{aligned}$$
[f] What is the probability of selling at least four cars?
The above probability can also be calculated as
$$\begin{aligned}
P(x \ge 4) &= 1 - P(x < 4) \\
&= 1 - [P(x = 0) + P(x = 1) + P(x = 2) + P(x = 3)] \\
&= 1 - [0.08 + 0.200 + 0.284 + 0.132] \\
&= 0.304
\end{aligned}$$
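The calculations in Example A.1 can be checked with a few lines of Python. The probabilities below are the values P(x) for x = 0 through 11 that appear in the worked calculations; Table A.2 itself is not reproduced here. The script recomputes the mean, variance, standard deviation, and the three cumulative probabilities.

```python
import math

# P(x) for x = 0, 1, ..., 11 cars sold per day, as used in Example A.1
p = [0.08, 0.200, 0.284, 0.132, 0.072, 0.060,
     0.052, 0.040, 0.032, 0.028, 0.016, 0.004]
dist = dict(enumerate(p))

mean = sum(x * px for x, px in dist.items())
variance = sum((x - mean) ** 2 * px for x, px in dist.items())
std_dev = math.sqrt(variance)

p_less_4 = sum(px for x, px in dist.items() if x < 4)      # P(x < 4)
p_at_most_4 = sum(px for x, px in dist.items() if x <= 4)  # P(x <= 4)
p_at_least_4 = 1 - p_less_4                                # P(x >= 4), complement rule

print(round(mean, 3), round(variance, 4), round(std_dev, 2))  # 3.056 6.0689 2.46
print(round(p_less_4, 3), round(p_at_most_4, 3), round(p_at_least_4, 3))  # 0.696 0.768 0.304
```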
A random variable that can assume any value over a continuous range of possibilities is known as a continuous random variable. Some examples of continuous variables are physical measurements of length, volume, temperature, or time. These variables can be described using continuous distributions.
The continuous probability distribution is usually described using
a probability density function. The probability density function, f(x), de-
scribes the behavior of a random variable. It may be viewed as the shape
of the data. Figure A.2 shows the histogram of the diameter of a machined
parts with a fitted curve. The diameter can clearly be approximated by a pattern that can be described by a probability distribution.
The shape of the curve in Figure A.2 can be described by a mathematical function, f(x), called the probability density function. The area under the probability density function to the left of a given value x is equal to the probability that the random variable (the diameter in this case) takes a value less than or equal to x. Because the probability density function represents the entire sample space, the total area under it must equal one.
The probability density function f(x) must also be nonnegative for all values of x, because negative probabilities are impossible. Stating these two requirements mathematically,
$$\int_{-\infty}^{\infty} f(x)\,dx = 1 \quad \text{and} \quad f(x) \ge 0 \text{ for all } x$$
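Both requirements can be checked numerically for a given density. Purely for illustration, this Python sketch assumes that the machined-part diameters follow a normal distribution with mean 10.00 mm and standard deviation 0.05 mm; these are hypothetical values. It uses the standard library's `NormalDist.pdf` as f(x) and approximates areas under the curve with the trapezoidal rule.

```python
from statistics import NormalDist


def trapezoid_area(f, a, b, steps=10_000):
    """Approximate the integral of f from a to b with the trapezoidal rule."""
    h = (b - a) / steps
    total = 0.5 * (f(a) + f(b)) + sum(f(a + i * h) for i in range(1, steps))
    return total * h


# Hypothetical diameter distribution: mean 10.00 mm, standard deviation 0.05 mm
diameter = NormalDist(mu=10.0, sigma=0.05)

# Area beyond 10 standard deviations is negligible, so integrate over mean ± 10 sigma
area = trapezoid_area(diameter.pdf, 10.0 - 10 * 0.05, 10.0 + 10 * 0.05)
left_of_mean = trapezoid_area(diameter.pdf, 10.0 - 10 * 0.05, 10.0)

print(round(area, 6))          # 1.0
print(round(left_of_mean, 6))  # 0.5, the area to the left of x = mean
print(min(diameter.pdf(10.0 + k * 0.01) for k in range(-50, 51)) >= 0)  # True
```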
$$\sum_{i=1}^{n} f(x_i) = 1.0 \quad \text{and} \quad f(x_i) > 0$$
$$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-(x-\mu)^2/2\sigma^2}$$
where f(x) is the probability density function, µ is the mean, σ is the standard deviation, and e ≈ 2.71828 is the base of the natural logarithm. The distribution has the following properties:
1. The normal curve is bell-shaped and symmetric about the line x = µ. The mean, median, and mode of the distribution have the same value.
2. The parameters of the normal distribution are the mean µ and the standard deviation σ. Figure A.6 shows how the mean and standard deviation are related in a normal curve.
Figure A.6 illustrates the area property of the normal curve. For a normal curve, approximately 68 percent of the observations lie within ±1σ (one standard deviation) of the mean, approximately 95 percent lie within ±2σ (two standard deviations), and approximately 99.73 percent lie within ±3σ (three standard deviations). This is also known as the empirical rule.
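The density formula and the empirical rule can be checked with a short Python sketch. It codes the formula for f(x) directly, compares it with the density from the standard library's `statistics.NormalDist`, and computes the area within k standard deviations of the mean. The parameters in the first comparison are arbitrary.

```python
import math
from statistics import NormalDist


def normal_pdf(x, mu, sigma):
    """f(x) = exp(-(x - mu)^2 / (2 sigma^2)) / (sigma * sqrt(2 pi))."""
    return math.exp(-((x - mu) ** 2) / (2 * sigma**2)) / (sigma * math.sqrt(2 * math.pi))


def area_within(k):
    """Area under a normal curve between mu - k*sigma and mu + k*sigma.

    The result does not depend on mu or sigma, so the standard normal is used.
    """
    std = NormalDist()  # mean 0, standard deviation 1
    return std.cdf(k) - std.cdf(-k)


# The hand-coded density agrees with the library density (arbitrary values).
print(normal_pdf(1.3, 1.0, 0.5), NormalDist(1.0, 0.5).pdf(1.3))

# Empirical rule: about 68.27, 95.45, and 99.73 percent.
for k in (1, 2, 3):
    print(f"within +/- {k} sd: {100 * area_within(k):.2f} percent")
```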
The shape of the curve depends on the mean (µ) and the standard deviation (σ). The mean µ determines the location of the distribution, whereas the standard deviation σ determines its spread: the larger the standard deviation, the more spread out the curve (see Figure A.7).
$$ P(x_1 \le x \le x_2) = \int_{x_1}^{x_2} \frac{1}{\sigma\sqrt{2\pi}}\, e^{-(x-\mu)^2/(2\sigma^2)}\,dx \qquad \text{(A)} $$
$$ f(z) = \frac{1}{\sqrt{2\pi}}\, e^{-z^2/2}, \qquad -\infty < z < \infty $$
$$ P(Z \le z) = \int_{-\infty}^{z} f(y)\,dy $$
$$ z = \frac{x - \mu}{\sigma} \qquad \text{(B)} $$
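Equation (B) converts a value x from any normal distribution into a standard normal value z, so probabilities involving any normal distribution can be evaluated from a single table of the standard normal distribution.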
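As an illustration of Equation (B), the sketch below computes P(x ≤ value) for a normal distribution in two ways: by standardizing first and reading the standard normal CDF, and by asking `statistics.NormalDist` for the CDF of the original distribution directly. The mean, standard deviation, and value are made-up numbers.

```python
from statistics import NormalDist


def standardize(x, mu, sigma):
    """Equation (B): z = (x - mu) / sigma."""
    return (x - mu) / sigma


def prob_at_most(x, mu, sigma):
    """P(X <= x) for a normal X, read from the standard normal CDF."""
    return NormalDist().cdf(standardize(x, mu, sigma))


mu, sigma, x = 100.0, 15.0, 120.0   # illustrative values only
z = standardize(x, mu, sigma)
print(f"z = {z:.2f}")
print(prob_at_most(x, mu, sigma), NormalDist(mu, sigma).cdf(x))   # the two agree
```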
Example A.2
$$ z = \frac{x - \mu}{\sigma} = \frac{5.15 - 5.07}{0.07} \approx 1.14 \;\rightarrow\; 0.3729 $$
Note: 0.3729 is the area under the standard normal curve between the mean and z = 1.14. It can be read from the table of the normal distribution provided in the Appendix. There are many variations of this table; the one used here gives the area between the mean and a value of z on the right side of the mean.
Therefore, P(x > 5.15) = 0.5 − 0.3729 = 0.1271, or there is a 12.71 percent chance that the piston ring diameter will exceed 5.15 cm.
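Example A.2 can be reproduced with `statistics.NormalDist` as a sketch. The table route rounds z to 1.14, as the text does; computing the probability directly from the unrounded z gives a value that differs only in the fourth decimal place, a difference due entirely to rounding.

```python
from statistics import NormalDist

mu, sigma = 5.07, 0.07    # mean and standard deviation of the diameter (cm), Example A.2
limit = 5.15              # diameter of interest (cm)

z = (limit - mu) / sigma                 # about 1.14
std = NormalDist()                       # standard normal distribution

# Table route: area between the mean and z (z rounded to 1.14), then 0.5 minus it.
area_mean_to_z = std.cdf(round(z, 2)) - 0.5
p_table = 0.5 - area_mean_to_z

# Direct route: P(x > 5.15) without rounding z.
p_direct = 1 - NormalDist(mu, sigma).cdf(limit)

print(f"z = {z:.2f}, area = {area_mean_to_z:.4f}")
print(f"P(x > 5.15): table {p_table:.4f}, direct {p_direct:.4f}")
```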
Example A.3
The percentage of acceptable pipes is the shaded area shown in Figure A.9. The calculation of this area is explained below.
The area 0.4772 is the area between the mean 5.01 and 4.95 (see Figure A.9), so the area to the left of 4.95 is 0.5 − 0.4772 = 0.0228.
The area 0.4082 is the area between the mean 5.01 and 5.05, so the area to the right of 5.05 is 0.5 − 0.4082 = 0.0918.
Therefore, the proportion of pipes that are not acceptable is 0.0228 + 0.0918 = 0.1146, or 11.46 percent, and the remaining 88.54 percent are acceptable. These probabilities can also be calculated using statistical software.
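The same calculation can be sketched with `statistics.NormalDist`. The standard deviation is not restated in this part of the text, so the sketch assumes σ = 0.03, the value implied by the area 0.4772 (z = 2.00) for the distance 5.01 − 4.95 = 0.06. Because (5.05 − 5.01)/0.03 = 1.333… rather than the table's rounded 1.33, the area above 5.05 comes out near 0.091 instead of 0.0918, and the total near 0.114 instead of 0.1146.

```python
from statistics import NormalDist

mu = 5.01               # mean from Example A.3
sigma = 0.03            # ASSUMPTION: inferred from (5.01 - 4.95) / sigma = 2.00, area 0.4772
lower, upper = 4.95, 5.05   # limits for an acceptable pipe

pipes = NormalDist(mu, sigma)
p_below = pipes.cdf(lower)            # area left of 4.95, about 0.0228
p_above = 1 - pipes.cdf(upper)        # area right of 5.05, about 0.091
p_reject = p_below + p_above
p_accept = 1 - p_reject

print(f"below: {p_below:.4f}, above: {p_above:.4f}")
print(f"not acceptable: {p_reject:.4f}, acceptable: {p_accept:.4f}")
```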
Probability Plots
Probability plots are used to determine whether a particular distribution fits sample data. The plot allows us to judge whether a distribution is appropriate and also to estimate the parameters of the fitted distribution. Probability plots are a good way of determining whether the given data follow a normal or any other assumed distribution. In regression analysis, this plot is of great value because it can be used to verify one of the major assumptions of regression: the normality assumption.
MINITAB and other statistical software provide options for creating individual probability plots for the selected distribution for one or more variables. In the probability plotting procedure, the observations are ranked in ascending order, and the plotting position (cumulative percentage) of the ith ranked observation is
$$ PP = \frac{(i - 0.5)\,100}{n} $$
where n is the number of observations.
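The plotting positions can be sketched in Python as follows. Each ordered observation is paired with its plotting position and with the standard normal quantile at that cumulative percentage; when the data are normal, these pairs fall near a straight line. As a rough numerical summary of straightness, the sketch also computes the correlation between the ordered values and the normal quantiles. This correlation is a swapped-in illustration, not the Anderson-Darling test that MINITAB reports, and all data here are synthetic.

```python
import random
from statistics import NormalDist


def plot_points(data):
    """(ordered value, plotting position PP in percent, normal quantile) for each observation."""
    ordered = sorted(data)
    n = len(ordered)
    std = NormalDist()
    points = []
    for i, x in enumerate(ordered, start=1):
        pp = (i - 0.5) * 100 / n
        points.append((x, pp, std.inv_cdf(pp / 100)))
    return points


def straightness(points):
    """Pearson correlation between ordered values and normal quantiles."""
    xs = [p[0] for p in points]
    qs = [p[2] for p in points]
    n = len(xs)
    mx, mq = sum(xs) / n, sum(qs) / n
    sxq = sum((x - mx) * (q - mq) for x, q in zip(xs, qs))
    sxx = sum((x - mx) ** 2 for x in xs)
    sqq = sum((q - mq) ** 2 for q in qs)
    return sxq / (sxx * sqq) ** 0.5


for row in plot_points([3.1, 2.4, 2.9, 3.6]):
    print(row)

# Synthetic data: one normal sample and one right-skewed (exponential) sample.
normal_data = NormalDist(mu=10, sigma=2).samples(200, seed=1)
rng = random.Random(1)
skewed_data = [rng.expovariate(1.0) for _ in range(200)]
print(round(straightness(plot_points(normal_data)), 3))
print(round(straightness(plot_points(skewed_data)), 3))
```

A value close to 1 indicates nearly straight plotted points; the skewed sample gives a noticeably lower value.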
MINITAB constructs the plot using this procedure. To test the hypothesis that the data follow the assumed distribution, the Anderson-Darling (AD) goodness-of-fit statistic and its associated p-value can be used. These values are calculated and displayed on the plot.
If the assumed distribution fits the data:
From the probability plot of the length data (Figure A.11), we can see that the cumulative percentage points lie close to a straight line. The calculated p-value is 0.543. At a 5 percent level of significance (α = 0.05), the p-value is greater than α, so we cannot reject the null hypothesis that the data follow a normal distribution. We conclude that a normal distribution is a reasonable model for the length data.
The probability plot of the failure time data shows that the cumulative percentage points do not form a straight line; the plotted points show a curvilinear pattern. The calculated p-value is less than 0.005. At a 5 percent level of significance (α = 0.05), the p-value is less than α, so we reject the null hypothesis that the data follow a normal distribution. The deviation of the plotted points from a straight line indicates that the failure time data do not follow a normal distribution. This is also evident from the histogram of the failure time data.
Figure A.11 Histograms and Probability Plots of Length and Failure Time Data
Statistics and data analysis cases involve making inferences about the population based on sample data. Several of these inference procedures are discussed in the chapters that follow. Many of them are based on the assumption of normality; that is, the population from which the sample is taken follows a normal distribution. Before we draw conclusions based on the assumption of normality, it is important to determine whether the sample data come from a normally distributed population. Below we present several descriptive methods that can be used to check whether the data follow a normal distribution. The methods most commonly used to assess normality are described in Table A.3.
Check #1
The histogram of the data in Figure A.12 closely resembles a bell shape, that is, a normal distribution. The bell curve superimposed over the histogram shows that the data have a symmetric or approximately normal distribution.
Check #2
The values of the mean and the median in Figure A.12 are 14.124 and 14.200, respectively. If the data are symmetric, as normal data are, the mean and median are very close. Since the mean and median of the waiting time data are very close, the distribution appears to be symmetric, which is consistent with a normal distribution.
Figure A.12: Graphical and Numerical Summary of Waiting Time
Check #3
x̄ ± 2s: 95.3 percent
x̄ ± 3s: 99.3 percent
The percentages of observations falling within these intervals around the mean for the example problem (Table A.4 data) agree closely with the empirical rule for the normal distribution.
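Check #3 can be automated with a short sketch. The function counts the percentage of observations inside x̄ ± ks, using the sample mean and sample standard deviation. The data here are synthetic normal values, not the waiting time data of Table A.4, so the printed percentages only approximate the empirical-rule values of 68.27, 95.45, and 99.73.

```python
from statistics import NormalDist, mean, stdev


def percent_within(data, k):
    """Percentage of observations inside xbar +/- k*s."""
    xbar, s = mean(data), stdev(data)
    inside = sum(1 for x in data if xbar - k * s <= x <= xbar + k * s)
    return 100 * inside / len(data)


sample = NormalDist(mu=14, sigma=3).samples(2000, seed=7)   # synthetic data
for k in (1, 2, 3):
    print(f"xbar +/- {k}s: {percent_within(sample, k):.1f} percent")
```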
Check #4
The box plot of the data in Figure A.12 is approximately symmetric, which indicates that the waiting time data closely follow a normal distribution.
Check #5
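The ratio of the interquartile range (IQR) to the standard deviation is calculated below, using values obtained from Figure A.12. For a normal population, this ratio is approximately 1.3 (1.349 to three decimal places).
The value is close to 1.3, indicating that the data are approximately normal.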
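The benchmark value follows from the quartiles of the standard normal distribution, and the ratio for a data set can be computed as sketched below. Software packages use different quartile conventions; Python's `statistics.quantiles` defaults to the "exclusive" method, which may differ slightly from MINITAB's. The sample is synthetic.

```python
from statistics import NormalDist, quantiles, stdev

std = NormalDist()
# For a normal population, Q3 - Q1 = 2 * 0.6745 sigma, so IQR / sigma is about 1.349.
theoretical_ratio = std.inv_cdf(0.75) - std.inv_cdf(0.25)


def iqr_to_sd_ratio(data):
    """Sample IQR divided by the sample standard deviation."""
    q1, _, q3 = quantiles(data, n=4)    # default 'exclusive' quartile method
    return (q3 - q1) / stdev(data)


sample = NormalDist(mu=14, sigma=3).samples(5000, seed=11)   # synthetic data
print(round(theoretical_ratio, 3), round(iqr_to_sd_ratio(sample), 3))
```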
Check #6
All of the above checks indicate that the waiting time data closely follow a normal distribution.
Student t-Distribution
This is one of the useful sampling distributions related to the normal distribution. It is used, for example, to check the adequacy of regression models. Suppose x is a normally distributed random variable with mean 0 and variance 1, and χ²ₙ is an independent chi-square random variable with n degrees of freedom. Then the random variable tₙ is given by
$$ t_n = \frac{x}{\sqrt{\chi^2_n / n}} $$
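The definition can be illustrated by simulation. The sketch below builds each t value from one standard normal draw and an independent chi-square variable, formed as the sum of squares of n further standard normal draws, and compares the simulated variance with the known value n/(n − 2). The seed, the choice n = 10, and the number of draws are arbitrary.

```python
import random
from statistics import mean, variance


def simulate_t(n, draws, seed=0):
    """Draw values of t_n = x / sqrt(chi2_n / n).

    x is a standard normal draw; chi2_n is an independent chi-square variable,
    built as the sum of squares of n further standard normal draws.
    """
    rng = random.Random(seed)
    values = []
    for _ in range(draws):
        x = rng.gauss(0.0, 1.0)
        chi2 = sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(n))
        values.append(x / (chi2 / n) ** 0.5)
    return values


t10 = simulate_t(n=10, draws=50_000, seed=42)
# Theory for n = 10: mean 0 and variance n / (n - 2) = 1.25, larger than the
# standard normal variance of 1 because the t-distribution has heavier tails.
print(round(mean(t10), 3), round(variance(t10), 3))
```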
of freedom. We will plot the normal distribution and t-distributions on the same graph and compare the shapes of t-distributions with different degrees of freedom to that of the normal distribution. The steps are outlined below.
From Figure A.14, the innermost curve is the probability density function of the t-distribution with one degree of freedom, and the outermost curve is the density function of the normal distribution. As the number of degrees of freedom of the t-distribution increases, its shape approaches that of the normal distribution. Also note that, compared to the normal distribution, the t-distribution is less peaked at the center and higher in the tails.
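The comparison in Figure A.14 can be reproduced numerically. The sketch evaluates the t density from its standard formula at the center and at t = 3, next to the standard normal density. It uses `math.lgamma` rather than `math.gamma` so that large degrees of freedom do not overflow. This is an illustration, not the procedure used to draw the figure.

```python
import math
from statistics import NormalDist


def t_pdf(t, df):
    """Student t density: Gamma((df+1)/2) / (sqrt(df*pi) Gamma(df/2)) * (1 + t^2/df)^(-(df+1)/2)."""
    log_coef = math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
    coef = math.exp(log_coef) / math.sqrt(df * math.pi)
    return coef * (1 + t * t / df) ** (-(df + 1) / 2)


normal = NormalDist()
print(f"{'df':>6} {'f(0)':>8} {'f(3)':>8}")
for df in (1, 5, 30, 1000):
    print(f"{df:>6} {t_pdf(0, df):8.4f} {t_pdf(3, df):8.4f}")
print(f"{'normal':>6} {normal.pdf(0):8.4f} {normal.pdf(3):8.4f}")
```

The f(0) column rises toward the normal value 0.3989 as df grows, while the f(3) column stays above the normal tail value, matching the description above.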
F-distribution
$$ F_{u,v} = \frac{\chi^2_u / u}{\chi^2_v / v} $$
$$ \frac{s_1^2 / \sigma_1^2}{s_2^2 / \sigma_2^2} $$
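The definition of F_{u,v} can also be illustrated by simulation, building each chi-square variable as a sum of squared standard normal draws. For v > 2 the mean of the F-distribution is v/(v − 2), which gives a simple check. The degrees of freedom 5 and 10, the seed, and the number of draws are arbitrary choices.

```python
import random
from statistics import mean


def chi_square(rng, df):
    """Chi-square draw with df degrees of freedom: sum of df squared standard normals."""
    return sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(df))


def simulate_f(u, v, draws, seed=0):
    """Draw values of F_{u,v} = (chi2_u / u) / (chi2_v / v) with independent chi-squares."""
    rng = random.Random(seed)
    return [(chi_square(rng, u) / u) / (chi_square(rng, v) / v) for _ in range(draws)]


f_5_10 = simulate_f(u=5, v=10, draws=50_000, seed=1)
# Theory: the mean of F_{u,v} is v / (v - 2) for v > 2, i.e. 1.25 for v = 10.
print(round(mean(f_5_10), 3))
```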
Summary
In this section, we provided an overview of statistical methods used in analytics. A number of graphical and numerical statistical techniques were presented. These descriptive statistical tools are used in modeling, studying, and solving various problems, and they are also used to describe variation in process data. The graphical tools of descriptive statistics include frequency distributions, histograms, stem-and-leaf plots, and box plots. These are simple but effective tools, and knowledge of them is essential in studying
APPENDIX B
Sampling, Sampling Distribution, and Inference Procedure
most widely used statistics after the mean. Examples are the proportion of defective products and poll results.
If the above measures are calculated from population data, they are called population parameters. These parameters are the population size (N), population mean (µ), population variance (σ²), population standard deviation (σ), and population proportion (p). In most cases, the parameters of the population are unknown, so we take samples and use the sample statistics to estimate them. For example, suppose we want to know the average height of
ucts we buy, for example, a set of tires for our car, have a label indicating an average life of 60,000 miles. A box of bulbs usually has a label indicating an average life of 10,000 hours. These are usually estimated values; we do not know the true mean, µ.
As indicated, many different samples are possible from a population of interest. When we take such samples of size n and calculate the sample mean x̄, each possible random sample has an associated value of x̄. Thus, the sample mean x̄ is a random variable that assigns a number, the calculated value of the sample mean, to each possible sample. Recall that a random variable is a variable that takes on different values as a result of an experiment. Since the samples are chosen randomly, each sample has an equal probability of being selected, and the sample means calculated from these samples vary randomly above and below the true population mean.
Because the sample mean x̄ is a random variable, it can be described using a probability distribution. The probability distribution of a sample statistic is called its sampling distribution, and the probability distribution of the sample mean x̄ is known as the sampling distribution of the sample mean. The sampling distribution of the sample mean has certain properties that are used in making inferences about the population. The central limit theorem plays an important role in the study of sampling distributions. We will also study the central limit theorem and see how its results are applied in analyzing and solving many problems.
The concepts of sampling distribution form the basis for inference procedures. It is important to note that a population parameter is always a constant, whereas a sample statistic is a random variable. Like other random variables, each sample statistic can be described using a probability distribution.
Besides sampling and sampling distributions, other key topics in this section include point and confidence interval estimates of means and proportions. We also discuss the concepts of hypothesis testing. These concepts are important in the study of analytics.
Sampling Distribution
As indicated earlier, in most cases the true values of the population parameters are not known. We must draw a sample or samples and calculate the sample statistic to estimate the population parameter. The sampling error of the sample mean is given by
$$ \text{Sampling error} = \bar{x} - \mu $$
Solution to (2): The last column shows the mean of each sample drawn.
Note that each row represents a sample of size 5.
Solution to (3): Figure 3.2 shows the histogram of the sample means
shown in the last column of Table B.1. The histogram shows that the
Solution to (4): The mean and standard deviation of the sample means
shown in the last column of Table B.1 were calculated using a computer
conclude that the x̄ values (the sample means) have much less variation than the individual observations.
Solution to (5): Based on parts (3) and (4), we conclude that the sample mean x̄ approximately follows a normal distribution, and this distribution is much narrower than the distribution of the individual observations. This is apparent from the standard deviation of the x̄ values, which is 1.1035 (see Table B.2).
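In general, the mean and standard deviation of the random variable x̄ are given as follows. The mean of the sample mean x̄ is
$$ \mu_{\bar{x}} = \mu \quad \text{or} \quad E(\bar{x}) = \mu \qquad \text{(i)} $$
and the standard deviation of the sample mean x̄ is
$$ \sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} \qquad \text{(ii)} $$
For this example,
$$ \mu_{\bar{x}} = \mu = 25 \quad \text{and} \quad \sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{5}{\sqrt{5}} = 2.236 $$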
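From Table B.2, the mean and the standard deviation of the 50 sample means were 25.0942 and 1.1035, respectively. According to equations (i) and (ii), as we take more and more samples of size 5, the mean of the sample means should approach 25 and their standard deviation should approach σ/√n = 2.236.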
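The behavior described by equations (i) and (ii) can be explored with a simulation sketch. It assumes a normal population with µ = 25 and σ = 5 (the values used in the calculation of µ_x̄ and σ_x̄ above; the normal shape is an assumption), draws many samples of size 5, and reports the mean and standard deviation of the sample means. Because the draws are synthetic, the 50-sample row will not reproduce Table B.2; with many samples the results settle near 25 and 2.236.

```python
from statistics import NormalDist, mean, stdev

MU, SIGMA, N = 25, 5, 5                 # population mean, population sd, sample size
population = NormalDist(MU, SIGMA)      # ASSUMPTION: normally distributed population


def sample_means(num_samples, seed):
    """Means of num_samples independent random samples of size N."""
    draws = population.samples(num_samples * N, seed=seed)
    return [sum(draws[i:i + N]) / N for i in range(0, len(draws), N)]


for num_samples in (50, 20_000):
    means = sample_means(num_samples, seed=3)
    print(num_samples, round(mean(means), 3), round(stdev(means), 3))

print("theory", MU, round(SIGMA / N**0.5, 3))   # 25 and 2.236
```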
we can expect of the mean of one or more samples. The standard deviation of the sample mean, σ_x̄, is often called the standard error of the mean. Using equation (ii), it can be shown that a sample of 16 observations (n = 16) is twice as precise as a sample of 4 (n = 4): its standard error, σ/√16 = σ/4, is half of σ/√4 = σ/2. It may be argued that the gain in precision in this case is small relative to the effort of taking 12 additional observations. However, doubling the sample size may be desirable in other cases.
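The precision comparison follows directly from equation (ii). A minimal sketch: since the standard error is σ/√n, quadrupling the sample size halves it, whatever the value of σ (σ = 5.0 below is arbitrary).

```python
import math


def standard_error(sigma, n):
    """Equation (ii): standard error of the mean, sigma / sqrt(n)."""
    return sigma / math.sqrt(n)


sigma = 5.0   # any positive value; the ratio below does not depend on it
for n in (1, 4, 16, 64):
    print(n, round(standard_error(sigma, n), 4))

print(standard_error(sigma, 4) / standard_error(sigma, 16))   # 2.0
```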
Figure B.3 shows a comparison between the probability distribution
of individual observations and the probability distributions of means of
samples of various sizes drawn from the underlying population.
Note that as the sample size increases, the standard error becomes smaller, so the distribution becomes more peaked and the estimate of the mean becomes more precise. It is obvious from Figure B.3 that a sample of one does not tell us anything about the precision of the estimated mean.
This means that if samples of large size (n ≥ 30) are selected from a population, then the sampling distribution of the sample means is approximately normal. This approximation improves with larger samples.
The central limit theorem has major applications in sampling and other areas of statistics. It tells us that if we take a large sample (n ≥ 30), we can use the normal distribution to calculate probabilities and draw conclusions about the population parameter.
These are useful results in drawing conclusions from data. As a rule of thumb, for a sample size of n = 30 or more (a large sample), we can use the normal distribution to draw conclusions about the population mean from the sample data.
$$ z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}} $$
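The central limit theorem can be seen at work in a simulation sketch. The population below is exponential with mean 1 and standard deviation 1, a strongly right-skewed distribution chosen only for illustration. Individual observations have skewness near 2, while means of samples of size 30 are much more symmetric (theory gives 2/√30 ≈ 0.37), and their standard deviation is close to σ/√n = 1/√30 ≈ 0.183. The skewness function is an informal measure added for this demonstration.

```python
import random
from statistics import mean, stdev


def skewness(values):
    """Sample skewness: mean cubed deviation divided by the cubed standard deviation."""
    m, s = mean(values), stdev(values)
    return sum((x - m) ** 3 for x in values) / (len(values) * s**3)


rng = random.Random(2024)
N = 30

# Skewed population (illustrative): exponential with mean 1 and sd 1.
individual = [rng.expovariate(1.0) for _ in range(30_000)]
means_n30 = [sum(rng.expovariate(1.0) for _ in range(N)) / N for _ in range(10_000)]

print("skewness, individual values:", round(skewness(individual), 2))
print("skewness, sample means:     ", round(skewness(means_n30), 2))
print("mean and sd of sample means:", round(mean(means_n30), 3), round(stdev(means_n30), 3))
print("theory:                      1.0", round(1 / N**0.5, 3))
```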
1 Ostle, Bernard and Mensing, Richard W., Statistics in Research, Third Edition, The Iowa State University Press, Ames, Iowa, 1979, p. 76.
APPENDIX C
Review of Estimation, Confidence Intervals, and Hypothesis Testing
There are two types of estimates: (a) point estimates, which are single-value estimates of the population parameter, and (b) interval estimates, or confidence intervals, which are ranges of values expected to contain the parameter with a specified degree of confidence, known as the confidence level. The confidence level is a probability attached to a confidence interval that expresses the reliability of the estimate. In the discussion of estimation, we will also consider the standard error of the estimates, the margin of error, and the sample size requirement.
Point Estimate
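A) The point estimate of the population mean (µ) is the sample mean (x̄),
$$ \bar{x} = \frac{\sum x}{n} $$
B) The point estimate of the population standard deviation (σ) is the sample standard deviation (s),
$$ s = \sqrt{\frac{\sum (x_i - \bar{x})^2}{n-1}} \quad \text{or} \quad s = \sqrt{\frac{\sum x_i^2 - \frac{(\sum x_i)^2}{n}}{n-1}} $$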
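Both forms of the standard deviation formula give the same value, as the sketch below checks against `statistics.stdev`. The data are hypothetical. In floating-point arithmetic the first (definitional) form is generally the safer choice, because the second subtracts two large, nearly equal quantities when the mean is large relative to the spread of the data.

```python
import math
import statistics


def sd_definitional(xs):
    """s = sqrt(sum((x_i - xbar)^2) / (n - 1))."""
    n = len(xs)
    xbar = sum(xs) / n
    return math.sqrt(sum((x - xbar) ** 2 for x in xs) / (n - 1))


def sd_computational(xs):
    """s = sqrt((sum(x_i^2) - (sum(x_i))^2 / n) / (n - 1))."""
    n = len(xs)
    return math.sqrt((sum(x * x for x in xs) - sum(xs) ** 2 / n) / (n - 1))


data = [16.0, 18.5, 17.2, 19.1, 15.8, 17.9]   # hypothetical observations
xbar = sum(data) / len(data)                   # point estimate of mu
print(round(xbar, 4), round(sd_definitional(data), 4),
      round(sd_computational(data), 4), round(statistics.stdev(data), 4))
```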
Interval Estimate
$$ 16.8 \le \mu \le 18.6 \quad \text{or} \quad (16.8 \text{ to } 18.6) \quad \text{or} \quad (16.8\text{–}18.6) $$
$$ L \le \mu \le U $$
$$ P\{L \le \beta \le U\} = 1 - \alpha \qquad \text{(v)} $$
This confidence interval means that if many random samples are collected and a 100(1 − α) percent confidence interval for β is computed from each sample, then about 100(1 − α) percent of these intervals will contain the true value of β.
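This repeated-sampling interpretation can be checked by simulation. The sketch below takes β to be the mean µ of a normal population with known σ (all numbers are arbitrary), builds a 95 percent z-interval from each of many samples, and reports the fraction of intervals that contain µ; the result should be close to 0.95.

```python
import math
import random


def coverage(mu, sigma, n, z, trials, seed=0):
    """Fraction of intervals xbar +/- z*sigma/sqrt(n) that contain the true mean mu."""
    rng = random.Random(seed)
    half_width = z * sigma / math.sqrt(n)
    hits = 0
    for _ in range(trials):
        xbar = sum(rng.gauss(mu, sigma) for _ in range(n)) / n
        if xbar - half_width <= mu <= xbar + half_width:
            hits += 1
    return hits / trials


# Arbitrary population and sample size; z = 1.96 gives 95 percent confidence.
rate = coverage(mu=50.0, sigma=8.0, n=25, z=1.96, trials=20_000, seed=5)
print(f"fraction of intervals containing mu: {rate:.3f}")
```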
In practice, we usually take one sample and calculate the confidence interval. This interval may or may not contain the true value, and it is not reasonable to attach a probability to this specific event. The appropriate statement would be that β lies in the observed interval [L,U]
On the other hand, the wider the interval, the less information we have about
the true value of β. In an ideal situation, we would like to obtain a relatively
short interval with high confidence.
$$ z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}} \qquad \text{(vi)} $$
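$$ P\{-z_{\alpha/2} \le z \le z_{\alpha/2}\} = 1 - \alpha $$
or
$$ P\left\{-z_{\alpha/2} \le \frac{\bar{x} - \mu}{\sigma/\sqrt{n}} \le z_{\alpha/2}\right\} = 1 - \alpha $$
$$ P\left\{\bar{x} - z_{\alpha/2}\frac{\sigma}{\sqrt{n}} \le \mu \le \bar{x} + z_{\alpha/2}\frac{\sigma}{\sqrt{n}}\right\} = 1 - \alpha $$
$$ \bar{x} - z_{\alpha/2}\frac{\sigma}{\sqrt{n}} \le \mu \le \bar{x} + z_{\alpha/2}\frac{\sigma}{\sqrt{n}} \qquad \text{(vii)} $$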
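$$ \bar{x} - z_{\alpha/2}\frac{\sigma}{\sqrt{n}} \le \mu \le \bar{x} + z_{\alpha/2}\frac{\sigma}{\sqrt{n}} \qquad \text{(viii)} $$
$$ E = z_{\alpha/2}\frac{\sigma}{\sqrt{n}} \qquad \text{(ix)} $$
$$ \bar{x} - t_{n-1,\alpha/2}\frac{s}{\sqrt{n}} \le \mu \le \bar{x} + t_{n-1,\alpha/2}\frac{s}{\sqrt{n}} \qquad \text{(x)} $$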
If the population variance is unknown and the sample size is large, the confidence interval for the mean can also be calculated from the normal distribution, with the sample standard deviation s in place of σ:
$$ \bar{x} \pm z_{\alpha/2}\frac{s}{\sqrt{n}} \qquad \text{(xi)} $$
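Equations (viii), (ix), and (xi) can be sketched as small Python functions. The critical value z_{α/2} is obtained from `NormalDist().inv_cdf` instead of a printed table; pass σ when it is known, or s for a large sample. The numbers in the usage line are hypothetical.

```python
import math
from statistics import NormalDist


def z_critical(confidence):
    """z_{alpha/2} for a two-sided interval, e.g. confidence=0.95 gives about 1.96."""
    alpha = 1 - confidence
    return NormalDist().inv_cdf(1 - alpha / 2)


def margin_of_error(sd, n, confidence):
    """Equation (ix): E = z_{alpha/2} * sd / sqrt(n)."""
    return z_critical(confidence) * sd / math.sqrt(n)


def z_interval(xbar, sd, n, confidence):
    """Equations (viii)/(xi): (xbar - E, xbar + E)."""
    e = margin_of_error(sd, n, confidence)
    return xbar - e, xbar + e


# Hypothetical data: xbar = 100, sd = 10, n = 25, 95 percent confidence.
low, high = z_interval(100.0, 10.0, 25, 0.95)
print(f"z = {z_critical(0.95):.3f}, interval = ({low:.2f}, {high:.2f})")
```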
Confidence interval for the mean when the sample size is small and the population standard deviation σ is unknown
When σ is unknown and the sample size is small, use the t-distribution for the confidence interval (this assumes that the population is approximately normal). The t-distribution is characterized by a single parameter, the number of degrees of freedom (df), and its density function is a bell-shaped curve similar to the normal distribution.
The confidence interval using the t-distribution is given by
$$ \bar{x} - t_{n-1,\alpha/2}\frac{s}{\sqrt{n}} \le \mu \le \bar{x} + t_{n-1,\alpha/2}\frac{s}{\sqrt{n}} \qquad \text{(xii)} $$
where t_{n−1,α/2} is the value from the t-table for (n − 1) degrees of freedom with an upper-tail area of α/2, and 100(1 − α) percent is the confidence level.
In this section, we discuss confidence interval estimates for proportions. A proportion is a ratio, fraction, or percentage that indicates the part of the population or sample having a particular trait of interest. The following are examples of proportions: (1) a software company claims that its manufacturing simulation software has 12 percent of the market share, (2) a public policy department of a large university wants to study the difference between the male and female unemployment rates, and (3) a manufacturing company wants to determine the proportion of defective items produced by its assembly line. In all
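We consider the sample size n to be large if
$$ np \ge 5 \quad \text{and} \quad n(1 - p) \ge 5 $$
The confidence interval for a proportion requires:
A) A large sample, so that the sampling distribution of the sample proportion (p̄) approximately follows a normal distribution.
B) The value of the sample proportion, p̄.
C) The level of confidence, represented by the corresponding z-value.
$$ \bar{p} - z_{\alpha/2}\sqrt{\frac{\bar{p}(1-\bar{p})}{n}} \le p \le \bar{p} + z_{\alpha/2}\sqrt{\frac{\bar{p}(1-\bar{p})}{n}} \qquad \text{(xiii)} $$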
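Interval (xiii) can be sketched in Python as follows. For the large-sample condition, the check uses the sample proportion p̄ in place of the unknown p, a common practical substitute. The counts in the usage line are hypothetical.

```python
import math


def proportion_interval(successes, n, z=1.96):
    """Interval (xiii): pbar +/- z * sqrt(pbar * (1 - pbar) / n).

    Raises ValueError when n*pbar or n*(1 - pbar) is below 5, i.e. when the
    normal approximation is doubtful (pbar stands in for the unknown p).
    """
    pbar = successes / n
    if n * pbar < 5 or n * (1 - pbar) < 5:
        raise ValueError("sample too small for the normal approximation")
    e = z * math.sqrt(pbar * (1 - pbar) / n)
    return pbar - e, pbar + e


# Hypothetical: 48 defective items found in a sample of 400.
low, high = proportion_interval(successes=48, n=400)
print(f"{low:.4f} <= p <= {high:.4f}")
```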
A) The margin of error E (also known as the tolerable error level or the accuracy requirement). For example, suppose we want to estimate the population mean salary within $500 or within $200. In the first case, the error E = 500; in the second case, E = 200. A smaller value of E means more precision is required, which in turn requires a larger sample. In general, the smaller the error, the larger the sample size.
B) The desired reliability or confidence level.
C) A good guess for σ.
Both the margin of error E and the reliability are choices made by the investigator, and they affect the cost of sampling and the risks involved. The following formula is used to determine the sample size for estimating a mean:
$$ n = \frac{(z_{\alpha/2})^2 \sigma^2}{E^2} \qquad \text{(xiv)} $$
For estimating a proportion p, the corresponding formula is
$$ n = \frac{(z_{\alpha/2})^2 p(1 - p)}{E^2} \qquad \text{(xv)} $$
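Formulas (xiv) and (xv) rarely give whole numbers, and the usual convention is to round up so that the margin of error is not exceeded. The sketch below does that with `math.ceil`. The inputs are illustrative: a σ guess of 10 with E = 2, the poll figures of Example C.3 (p = 0.48, E = 0.03 at 95 percent confidence), and the conservative choice p = 0.5, which maximizes p(1 − p).

```python
import math


def sample_size_mean(z, sigma, e):
    """Equation (xiv): n = z^2 * sigma^2 / E^2, rounded up."""
    return math.ceil(z**2 * sigma**2 / e**2)


def sample_size_proportion(z, p, e):
    """Equation (xv): n = z^2 * p * (1 - p) / E^2, rounded up."""
    return math.ceil(z**2 * p * (1 - p) / e**2)


print(sample_size_mean(1.96, 10, 2))             # 96.04 rounds up to 97
print(sample_size_proportion(1.96, 0.48, 0.03))  # about 1065.4, rounds up to 1066
print(sample_size_proportion(1.96, 0.5, 0.03))   # p = 0.5 gives the largest n: 1068
```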
Example C.1
Solution:
First, calculate the mean and standard deviation of the 25 values in the data, using a calculator or a computer. The values are
$$ \bar{x} = 22.40, \qquad s = 2.723 $$
$$ \bar{x} \pm t_{n-1,\alpha/2}\frac{s}{\sqrt{n}} $$
$$ 22.40 \pm (2.064)\frac{2.723}{\sqrt{25}} $$
$$ 21.28 \le \mu \le 23.52 $$
The value 2.064 is the t-value from the t-table for n−1 = 24 degrees of
$$ \bar{x} \pm z_{\alpha/2}\frac{\sigma}{\sqrt{n}} $$
$$ 22.40 \pm 1.96\,\frac{2.723}{\sqrt{25}} $$
This interval is

$$21.33 \le \mu \le 23.47$$
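Both intervals in Example C.1 come from the same pattern, x̄ ± (critical value) × s/√n, and differ only in the critical value. A minimal sketch using the summary statistics above; the helper name is hypothetical, and the critical values 2.064 and 1.96 are passed in from the tables rather than computed.

```python
import math

def mean_interval(xbar, sd, n, critical):
    """Return (lower, upper) for xbar ± critical * sd / sqrt(n)."""
    margin = critical * sd / math.sqrt(n)
    return xbar - margin, xbar + margin

# Example C.1 summary statistics: n = 25, x̄ = 22.40, s = 2.723
t_lo, t_hi = mean_interval(22.40, 2.723, 25, 2.064)  # t-table, 24 df
z_lo, z_hi = mean_interval(22.40, 2.723, 25, 1.96)   # normal table

print(f"t-interval: {t_lo:.2f} to {t_hi:.2f}")
print(f"z-interval: {z_lo:.2f} to {z_hi:.2f}")
```

The t-interval is slightly wider because 2.064 > 1.96; the t-distribution has heavier tails to account for estimating σ from a small sample.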
Example C.2
Since the sample size is large (n ≥ 30) and the population standard deviation σ is known, the appropriate confidence interval formula is

$$\bar{x} \pm z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}}$$

The confidence intervals using this formula are shown below.
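$$38{,}000 \pm 1.28\,\frac{3{,}600}{\sqrt{36}}: \qquad 37{,}232 \le \mu \le 38{,}768 \qquad \text{(80 percent)}$$

$$38{,}000 \pm 1.645\,\frac{3{,}600}{\sqrt{36}}: \qquad 37{,}013 \le \mu \le 38{,}987 \qquad \text{(90 percent)}$$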
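$$38{,}000 \pm 1.96\,\frac{3{,}600}{\sqrt{36}}: \qquad 36{,}824 \le \mu \le 39{,}176 \qquad \text{(95 percent)}$$

$$38{,}000 \pm 2.58\,\frac{3{,}600}{\sqrt{36}}: \qquad 36{,}452 \le \mu \le 39{,}548 \qquad \text{(99 percent)}$$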
Note that the z-values in the above confidence interval calculations are obtained from the normal table. Figure C.3 shows the confidence intervals graphically.
Figure C.3 shows that the larger the confidence level, the wider the interval. With a higher confidence level we gain confidence: there is a higher chance that the interval contains the true value of the parameter being estimated. At the same time, we lose precision, because the interval is wider.
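The trade-off in Figure C.3 can be reproduced numerically. This sketch computes z_{α/2} with Python's `statistics.NormalDist` instead of reading it from the normal table; the helper name is hypothetical, and the inputs are the Example C.2 values (x̄ = 38,000, σ = 3,600, n = 36).

```python
import math
from statistics import NormalDist

def z_intervals(xbar, sigma, n, levels):
    """Return {level: (lower, upper)} for xbar ± z_{alpha/2} * sigma / sqrt(n)."""
    se = sigma / math.sqrt(n)  # standard error of the mean
    intervals = {}
    for level in levels:
        z = NormalDist().inv_cdf(1 - (1 - level) / 2)  # unrounded z_{alpha/2}
        intervals[level] = (xbar - z * se, xbar + z * se)
    return intervals

intervals = z_intervals(38_000, 3_600, 36, (0.80, 0.90, 0.95, 0.99))
for level, (lo, hi) in intervals.items():
    print(f"{level:.0%}: {lo:,.0f} to {hi:,.0f} (width {hi - lo:,.0f})")
```

Because `inv_cdf` returns unrounded z-values (about 1.2816 rather than the table's 1.28, for example), some endpoints can differ from the worked values by a unit or two; the 99 percent interval comes out slightly narrower than the one computed with 2.58.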
Example C.3
candidate with a margin of error of ±3 percent. What does this mean? From
this information, determine the sample size that was used in this study.
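Solution: Polls conducted by the news media use a 95 percent confidence interval unless specified otherwise. Using a 95 percent confidence interval, the confidence interval for the proportion is given by

$$\bar{p} \pm 1.96\sqrt{\frac{\bar{p}(1-\bar{p})}{n}}$$

$$0.48 \pm 1.96\sqrt{\frac{0.48(1-0.48)}{n}}$$

Setting the margin of error equal to 0.03 and solving for n:

$$1.96\sqrt{\frac{0.48(1-0.48)}{n}} = 0.03 \quad\Rightarrow\quad n = \frac{(1.96)^2(0.48)(0.52)}{(0.03)^2} \approx 1065.4$$

$$n = 1066$$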
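The poll calculation can be checked in both directions: solve the margin equation for n, then plug n back in to confirm the ±3 percent margin. This sketch assumes, as the solution does, that 0.48 is the sample proportion for the candidate; the helper name is hypothetical.

```python
import math

def proportion_margin(z, p, n):
    """Margin of error z * sqrt(p(1 - p) / n) for a sample proportion."""
    return z * math.sqrt(p * (1 - p) / n)

p_hat, z, target = 0.48, 1.96, 0.03

# Solve z * sqrt(p(1 - p) / n) = E for n, then round up so the
# margin does not exceed the 3-point target.
n_exact = z ** 2 * p_hat * (1 - p_hat) / target ** 2
n = math.ceil(n_exact)

margin = proportion_margin(z, p_hat, n)
print(f"n = {n}, margin = {margin:.4f}")
print(f"95% interval: {p_hat - margin:.3f} to {p_hat + margin:.3f}")
```

Read this way, "48 percent with a margin of error of ±3 percent" means the poll is 95 percent confident that the candidate's true support lies between about 45 and 51 percent.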
Example C.4
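sample mean differs from the population mean by no more than 15 psi is 0.95. From past experience, it is known that the standard deviation for bursting pressures of this seal is 150 psi.

$$n = \left(\frac{z_{\alpha/2}\,\sigma}{E}\right)^2 = \left(\frac{(1.96)(150)}{15}\right)^2 = 384.16 \approx 385$$

The result is rounded up to 385 so that the margin of error does not exceed 15 psi.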
explained below.
Case 1
Assumptions:
If the above assumptions hold, then the confidence interval for the
difference between two population means is given by
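$$(\bar{x}_1 - \bar{x}_2) \pm z_{\alpha/2}\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}} \tag{xvi}$$

or

$$(\bar{x}_1 - \bar{x}_2) - z_{\alpha/2}\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}} \;\le\; \mu_1 - \mu_2 \;\le\; (\bar{x}_1 - \bar{x}_2) + z_{\alpha/2}\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}} \tag{xvii}$$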
Case 2
Assumptions:
If the above assumptions hold, then the confidence interval for the
difference between two population means is given by
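$$(\bar{x}_1 - \bar{x}_2) \pm z_{\alpha/2}\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}} \tag{xviii}$$

or

$$(\bar{x}_1 - \bar{x}_2) - z_{\alpha/2}\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}} \;\le\; \mu_1 - \mu_2 \;\le\; (\bar{x}_1 - \bar{x}_2) + z_{\alpha/2}\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}} \tag{xix}$$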
Case 3
Assumptions:
If the above assumptions hold, then the confidence interval for the
difference between two population means is given by
$$(\bar{x}_1 - \bar{x}_2) \pm t_{n_1+n_2-2,\alpha/2}\sqrt{s_p^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right)} \tag{xx}$$

or

$$(\bar{x}_1 - \bar{x}_2) - t_{n_1+n_2-2,\alpha/2}\sqrt{s_p^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right)} \;\le\; \mu_1 - \mu_2 \;\le\; (\bar{x}_1 - \bar{x}_2) + t_{n_1+n_2-2,\alpha/2}\sqrt{s_p^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right)} \tag{xxi}$$
Example C.4
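$$s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2} = \frac{(14)(2.24)^2 + (19)(1.99)^2}{33} = 4.41$$

$$(\bar{x}_1 - \bar{x}_2) \pm t_{n_1+n_2-2,\alpha/2}\sqrt{s_p^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}, \qquad t_{33,0.025} = 2.035$$

Since n1 − 1 = 14 and n2 − 1 = 19, the sample sizes are n1 = 15 and n2 = 20, and the standard error is $\sqrt{4.41\left(\frac{1}{15} + \frac{1}{20}\right)} \approx 0.72$. Therefore,

$$(14.54 - 15.36) \pm (2.035)(0.72)$$

$$-0.82 \pm 1.47$$

$$-2.29 \le \mu_1 - \mu_2 \le 0.65$$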
Note that this interval (−2.29 to 0.65) contains zero, so a difference of zero is a plausible value for μ1 − μ2. At the 95 percent confidence level, the data do not indicate a significant difference in the average wage of union and nonunion workers.
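Formula (xx) with the pooled variance can be sketched as follows. The helper name is hypothetical; the sample sizes n1 = 15 and n2 = 20 are inferred from the 14 and 19 in the pooled-variance calculation, and the t-value 2.035 is taken from the t-table as in the text.

```python
import math

def pooled_t_interval(xbar1, s1, n1, xbar2, s2, n2, t_crit):
    """Formula (xx): (x̄1 - x̄2) ± t * sqrt(sp^2 (1/n1 + 1/n2))."""
    # Pooled variance weights each sample variance by its degrees of freedom
    sp2 = ((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / (n1 + n2 - 2)
    margin = t_crit * math.sqrt(sp2 * (1 / n1 + 1 / n2))
    diff = xbar1 - xbar2
    return sp2, diff - margin, diff + margin

sp2, lo, hi = pooled_t_interval(14.54, 2.24, 15, 15.36, 1.99, 20, 2.035)
print(f"sp^2 = {sp2:.2f}; interval {lo:.2f} to {hi:.2f}")
print("interval contains zero" if lo <= 0 <= hi else "interval excludes zero")
```

Carrying full precision gives roughly −2.28 to 0.64; the small difference from the worked example comes from rounding the standard error to 0.72 before multiplying. The conclusion (zero lies inside the interval) is unchanged.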
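$$\bar{p}_1 = \frac{x_1}{n_1} \quad\text{and}\quad \bar{p}_2 = \frac{x_2}{n_2}$$

The confidence interval for the difference between the population proportions is given by

$$(\bar{p}_1 - \bar{p}_2) \pm z_{\alpha/2}\sqrt{\bar{p}(1-\bar{p})\left(\frac{1}{n_1} + \frac{1}{n_2}\right)} \tag{xxiii}$$

where p̄ is the pooled proportion:

$$\bar{p} = \frac{x_1 + x_2}{n_1 + n_2} \quad\text{or}\quad \bar{p} = \frac{n_1\bar{p}_1 + n_2\bar{p}_2}{n_1 + n_2}$$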
Example C.5
Proportion defective using the improved method: $\bar{p}_1 = \frac{x_1}{n_1} = \frac{80}{400} = 0.20$

Proportion defective using the old method: $\bar{p}_2 = \frac{x_2}{n_2} = \frac{108}{450} = 0.24$

Combined or “pooled” proportion: $\bar{p} = \frac{x_1 + x_2}{n_1 + n_2} = \frac{80 + 108}{400 + 450} = 0.221$

$$(\bar{p}_1 - \bar{p}_2) \pm z_{\alpha/2}\sqrt{\bar{p}(1-\bar{p})\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}$$

$$(0.20 - 0.24) \pm (1.96)\sqrt{(0.221)(1 - 0.221)\left(\frac{1}{400} + \frac{1}{450}\right)}$$

$$-0.04 \pm 0.06$$

$$-0.10 \le p_1 - p_2 \le 0.02$$
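The Example C.5 interval can be recomputed as follows. This sketch follows formula (xxiii) as the text uses it, with the pooled proportion p̄; the helper name is hypothetical. Many texts instead build the interval from the unpooled standard error √(p̄1(1 − p̄1)/n1 + p̄2(1 − p̄2)/n2) and reserve the pooled estimate for hypothesis tests. With these data the two margins are nearly identical.

```python
import math

def two_proportion_interval(x1, n1, x2, n2, z):
    """Formula (xxiii) as used in Example C.5, with the pooled proportion."""
    p1, p2 = x1 / n1, x2 / n2
    p_pool = (x1 + x2) / (n1 + n2)
    margin = z * math.sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
    return p1 - p2 - margin, p1 - p2 + margin

# Improved method: 80 defective of 400; old method: 108 defective of 450
lo, hi = two_proportion_interval(80, 400, 108, 450, 1.96)
print(f"{lo:.3f} <= p1 - p2 <= {hi:.3f}")
```

Unrounded, the interval is about −0.096 to 0.016. Because it contains zero, the data do not show a significant difference in defect rates at the 95 percent level.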
APPENDIX D
Hypothesis Testing
some parameter is true or false. This statement about the population parameter is called a hypothesis.
Hypothesis testing is a decision-making procedure for determining whether a statement is true or false. The statement concerns a population parameter of interest, such as a population mean, population variance, or population proportion, and the decision is based on the information contained in the sample data.
Hypothesis testing is one of the most useful aspects of statistical inference because many types of decision problems can be formulated as hypothesis testing problems.
The control charts used in statistical process control are closely related to hypothesis testing. Hypothesis tests are also used in many quality control problems and form the basis of many of the statistical process control techniques to be discussed in the coming chapters.
H0: μ = 60 mpg
H1: μ ≠ 60 mpg
• The consumer group would gather the sample data and calculate the sample mean, x̄.
• Compare the difference between the hypothesized value (µ) and
Note that in hypothesis testing, the decision to reject or not to reject the hypothesis is based on a single sample; therefore, there is always a chance of not rejecting a hypothesis that is false or of rejecting a hypothesis that is true. In fact, every hypothesis test carries the risk of two types of errors. These are:
We also use another term known as the power of the test defined as
Note that μ0 is the hypothesized value. There are three possible cases
for testing the population mean. The test statistic or the formulas used to
test the hypothesis are given below.
Case (1): Testing a single mean with known variance or known population standard deviation σ and a large sample: in this case, the sample mean x̄ follows a normal distribution and the test statistic is given by

$$z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}} \tag{ii}$$
Case (2): Testing a single mean with unknown variance or unknown population standard deviation σ and a large sample: in this case, the sample mean x̄ follows a normal distribution and the test statistic is given by
$$z = \frac{\bar{x} - \mu}{s/\sqrt{n}} \tag{iii}$$
Case (3): Testing a single mean with unknown variance or unknown population standard deviation σ and a small (n < 30) sample: in this case, assuming the population is approximately normal, the test statistic follows a t-distribution with n − 1 degrees of freedom and is given by

$$t_{n-1} = \frac{\bar{x} - \mu}{s/\sqrt{n}} \tag{iv}$$

Note that s is the sample standard deviation and n is the sample size.
There are different ways of testing a hypothesis. These will be illustrated with examples.
H0: μ ≥ 60,000
H1: μ < 60,000
is written under the null hypothesis, and the statement µ < 60,000 is
written under the alternate hypothesis.
Note that the alternate hypothesis is the opposite of the null hypothesis. This is an example of a left-sided test: the null hypothesis (H0) is rejected only when the sample evidence falls sufficiently far below the hypothesized value of µ.
The alternate hypothesis is also known as the research hypothesis. If you are trying to establish a certain claim, it should be written as the alternate hypothesis.
The null hypothesis contains the claim or theory being tested, so rejecting it is a strong statement. This is why the conclusion of a hypothesis test is stated as “reject the null hypothesis” or “do not reject the null hypothesis.”
H0: μ ≤ 24
H1: μ > 24

H0: μ = 1.5
H1: μ ≠ 1.5
Example D.3
H0: μ ≥ 600
H1: μ < 600
B) State and explain the type I and type II errors in this situation.
Type I error: rejecting H0: µ ≥ 600 and concluding that the average production cost is less than $600 (µ < $600) when it is actually at least $600. Type II error: not rejecting H0 and concluding that the average cost is at least $600 when it is actually less than $600.
Example D.4
opens and weighs the content, tests the appropriate hypothesis, and decides whether to shut down the line for adjustments. Write the appropriate hypotheses to be tested in this situation and perform the hypothesis test. A significance level of α = 0.05 is selected for the test. The sample results indicate a sample mean of 16.32 oz, and the standard deviation is assumed to be 0.8 oz.
n = 30, α = 0.05, σ = 0.8, x̄ = 16.32
H0: μ = 16
H1: μ ≠ 16
3. Determine the appropriate level of significance (α) or use the given value
of significance, α
4. Select the appropriate distribution and test statistic to perform the test
The sample size is large and the population standard deviation is known; therefore, use the normal distribution with the following test statistic:

$$z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}}$$
5. Based on step 3, find the critical value or values and the area or areas of rejection. Show the critical value(s) and the areas of rejection and nonrejection using a sketch
7. Use the test data (sample data) and find the value of the test statistic

$$z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}} = \frac{16.32 - 16}{0.8/\sqrt{30}} = 2.19$$
8. Find out whether the value of the test statistic is in the rejection or nonrejection region; make the appropriate decision and state your conclusion in terms of the problem
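Steps 4 through 8 for Example D.4 reduce to a few lines. This minimal sketch uses a hypothetical helper implementing formula (ii); the two-sided critical value 1.96 for α = 0.05 is taken from the normal table.

```python
import math

def z_test_mean(xbar, mu0, sigma, n, z_crit):
    """Two-sided z test for a single mean with known sigma (formula (ii))."""
    z = (xbar - mu0) / (sigma / math.sqrt(n))
    reject = abs(z) > z_crit  # rejection region: z < -z_crit or z > z_crit
    return z, reject

z, reject = z_test_mean(16.32, 16, 0.8, 30, 1.96)  # alpha = 0.05, two-sided
print(f"z = {z:.2f}; {'reject H0' if reject else 'do not reject H0'}")
```

Since z ≈ 2.19 exceeds 1.96, the statistic lies in the upper rejection region. This suggests the mean fill differs from 16 oz and the line should be adjusted.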
The p-value is the probability, assuming that the null hypothesis is true, of getting a value of the test statistic at least as extreme as the value actually observed. The p-value is also the smallest level of significance at which the null hypothesis can be rejected. A small p-value, for example, p = 0.05 or less, is a strong indicator that the null hypothesis is not true. The smaller the p-value, the stronger the evidence against the null hypothesis.
If the computed p-value is smaller than the given level of significance α, the null hypothesis H0 is rejected. If the p-value is greater than or equal to α, H0 is not rejected. For example, a p-value of 0.002 indicates that there
We will test the hypothesis using the p-value for the following two-sided test:
H0: μ = 15
H1: μ ≠ 15
If p ≥ α, do not reject H0.
If p < α, reject H0.
First, using the appropriate test statistic formula, calculate the test statistic value.
$$z = \frac{\bar{x} - \mu}{s/\sqrt{n}} = \frac{14.2 - 15}{5/\sqrt{50}} = -1.13$$
For a two-sided test, the p-value is the sum of the two tail areas, P(Z ≤ −1.13) = 0.1292 and P(Z ≥ 1.13) = 0.1292; that is, p = 0.1292 + 0.1292 = 0.2584. Since p = 0.2584 > α = 0.02, do not reject H0.
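The table lookup can be replaced by the standard normal CDF. This sketch uses Python's `statistics.NormalDist` to compute both tail areas directly; the helper name is hypothetical, and the inputs are the values from this example (x̄ = 14.2, μ = 15, s = 5, n = 50, α = 0.02).

```python
import math
from statistics import NormalDist

def two_sided_p_value(z):
    """p-value for a two-sided z test: the sum of both tail areas beyond |z|."""
    return 2 * (1 - NormalDist().cdf(abs(z)))

z = (14.2 - 15) / (5 / math.sqrt(50))
p = two_sided_p_value(z)
alpha = 0.02
print(f"z = {z:.3f}, p = {p:.4f}")
print("reject H0" if p < alpha else "do not reject H0")
```

Using the unrounded z ≈ −1.131 gives p ≈ 0.258, essentially the table-based 0.2584, and the decision is the same.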
esis testing procedures or steps are very similar to those for testing a single mean, but the data structure and the test statistics (formulas) used to test these hypotheses are different. In testing hypotheses involving two populations, samples are drawn from both populations. The hypotheses tested are explained below.
Basic Assumptions:
The hypothesis for testing the two means can be a two-sided test or
a one-sided test. The hypothesis is written in one of the following ways:
H0: μ1 = μ2 or H0: μ1 − μ2 = 0
H1: μ1 ≠ μ2 or H1: μ1 − μ2 ≠ 0 (v)
H0: μ1 ≤ μ2 or H0: μ1 − μ2 ≤ 0
C) Test if one population mean is smaller than the other: a left-sided test
H0: μ1 ≥ μ2 or H0: μ1 − μ2 ≥ 0
the means. To test two means, the test statistics are selected based on the
following cases:
Case 1: Sample sizes n1 and n2 are large (≥ 30) and the population variances σ1² and σ2² are known
If the sample sizes n1 and n2 are large (≥ 30) and the population variances σ1² and σ2² are known, then the sampling distribution of the difference between the sample means follows a normal distribution and the test statistic is given by

$$z = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}} \tag{viii}$$
Case 2: Sample sizes n1 and n2 are large (≥ 30) and the population variances σ1² and σ2² are unknown
If the sample sizes n1 and n2 are large (≥ 30) and the population variances σ1² and σ2² are unknown, then the sampling distribution of the difference between the sample means follows a normal distribution and the test statistic is given by

$$z = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}} \tag{ix}$$
Case 3: Sample sizes n1 and n2 are small (< 30) and the population variances σ1² and σ2² are unknown
If the sample sizes n1 and n2 are small (< 30) and the population variances σ1² and σ2² are unknown, then the sampling distribution of the difference between the sample means follows a t-distribution and the test statistic is given by

$$t_{n_1+n_2-2} = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{s_p^2\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}} \tag{x}$$
Important Note:
In equations (viii), (ix), and (x), the hypothesized difference (μ1 − μ2) is zero in most cases. Also, these equations are valid under the following assumptions:
The assumption that the two population variances are equal may not be correct. In cases where the variances are not equal, the test statistic formula for testing the difference between the two means is different.
Example D.6
Suppose that two independent random samples are taken from two processes with equal variances, and we would like to test the null hypothesis that there is no difference between the means of the two processes, that is, that the means of the two processes are equal:
H0: μ1 − μ2 = 0 or H0: μ1 = μ2
H1: μ1 − μ2 ≠ 0 or H1: μ1 ≠ μ2
n1 = 80, x̄1 = 104, s1 = 8.4
n2 = 70, x̄2 = 106, s2 = 7.6
α = 0.05
Because n1 and n2 are large and σ1 and σ2 are unknown, use the normal distribution. The test statistic for this problem is

$$z = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}}$$
Solution: The test can be done using four methods that are explained
below.
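$$z = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}} = \frac{(104 - 106) - 0}{\sqrt{\dfrac{(8.4)^2}{80} + \dfrac{(7.6)^2}{70}}} = -1.53$$

Because the test statistic z = −1.53 lies between the critical values −1.96 and +1.96, it falls in the nonrejection region; do not reject H0.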
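statistic value z, which was z = −1.53. This test statistic value is converted to a probability (see Figure D.5).
In Figure D.5, z = 1.53 is the absolute value of the test statistic from method 1 above; because the normal distribution is symmetric, both tail areas can be found from it.
From the standard normal table, the area between z = 0 and z = 1.53 is 0.4370. Each tail area is therefore 0.5 − 0.4370 = 0.0630, and the two-sided p-value is 2(0.0630) = 0.1260. Since p = 0.1260 > α = 0.05, do not reject H0.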
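The z statistic and two-sided p-value for Example D.6 can be checked with Python's `statistics.NormalDist` in place of the normal table. This is a minimal sketch; `two_sample_z` is a hypothetical helper implementing formula (ix), with the hypothesized difference μ1 − μ2 defaulting to zero.

```python
import math
from statistics import NormalDist

def two_sample_z(x1, s1, n1, x2, s2, n2, hyp_diff=0.0):
    """Formula (ix): large-sample z statistic for mu1 - mu2 using sample SDs."""
    se = math.sqrt(s1 ** 2 / n1 + s2 ** 2 / n2)
    return ((x1 - x2) - hyp_diff) / se

z = two_sample_z(104, 8.4, 80, 106, 7.6, 70)
p = 2 * (1 - NormalDist().cdf(abs(z)))  # two-sided p-value
print(f"z = {z:.2f}, p = {p:.3f}")
print("reject H0" if p < 0.05 else "do not reject H0")
```

The critical-value method and the p-value method must agree: |z| < 1.96 exactly when p > 0.05 for a two-sided test at α = 0.05.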
from one population may not be independent of the sample values from
the other population. The two populations may be considered dependent
in such cases.
In cases where the populations are considered related, the observations are paired to prevent other factors from inflating the estimate of the variance. This method improves the precision of comparisons between means. The method of testing the difference between two means when the populations are related is also known as the matched-sample test or the paired t-test.
We are interested in testing a two-sided or a one-sided hypothesis for
the difference between the two population means. The hypotheses can be
written as
Two-tailed (two-sided) test: H0: μd = 0, H1: μd ≠ 0
Right-tailed (right-sided) test: H0: μd ≤ 0, H1: μd > 0
Left-tailed (left-sided) test: H0: μd ≥ 0, H1: μd < 0
Test Statistic: If the pairs of data values X1n and X2n are related and not independent, the test statistic based on the average of the differences (d̄) follows a t-distribution with n − 1 degrees of freedom and is given by

$$t_{n-1} = \frac{\bar{d} - \mu_d}{s_d/\sqrt{n}} \tag{xii}$$

where d̄ is the mean of the paired differences, sd is the standard deviation of the differences, and n is the number of pairs.
The confidence interval given below can also be used to test the hypothesis:

$$\bar{d} \pm t_{n-1,\alpha/2}\,\frac{s_d}{\sqrt{n}} \tag{xiii}$$
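Formulas (xii) and (xiii) can be sketched on a small data set. The paired measurements below are hypothetical, made up for illustration and not from the source, and the critical value 2.776 is the t-table value for 4 degrees of freedom and α/2 = 0.025. The helper name `paired_t` is also hypothetical.

```python
import math
from statistics import mean, stdev

def paired_t(before, after):
    """Formula (xii) with mu_d = 0: t = d̄ / (s_d / sqrt(n)) on paired differences."""
    d = [b - a for b, a in zip(before, after)]
    n = len(d)
    d_bar, s_d = mean(d), stdev(d)  # stdev uses the n - 1 denominator
    return d_bar, s_d, n, d_bar / (s_d / math.sqrt(n))

# Hypothetical paired measurements on the same five units
before = [12, 15, 11, 14, 13]
after = [10, 11, 8, 9, 12]
d_bar, s_d, n, t = paired_t(before, after)

t_crit = 2.776  # t-table value, n - 1 = 4 df, alpha/2 = 0.025
half_width = t_crit * s_d / math.sqrt(n)  # formula (xiii)
print(f"mean difference = {d_bar}, t = {t:.3f}")
print(f"95% CI for mu_d: {d_bar - half_width:.2f} to {d_bar + half_width:.2f}")
```

Both routes give the same two-sided decision: |t| exceeds 2.776 exactly when the interval from (xiii) excludes zero.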
Summary
This section discussed three important topics that are critical to analytics: sampling and sampling distributions, estimation and confidence intervals, and hypothesis testing. Samples are used to make inferences about the population, and this is done through the sampling distribution; the probability distribution of a sample statistic is called its sampling distribution. We explained the central limit theorem and its role in sampling, sampling distributions, and sample size determination. Besides sampling and sampling distributions, other key topics covered included point and confidence interval estimates of means and proportions.
Two types of estimates used in inferential statistics were discussed: (a) point estimates, which are single-value estimates of the population parameter, and (b) interval estimates or confidence intervals, which are ranges of numbers that contain the parameter with a specified degree of confidence known as the confidence level. The confidence level is a probability attached to a confidence interval that expresses the reliability of the estimate. In the discussion of estimation, we also discussed the standard error of the estimates, the margin of
Additional Readings
Albright, S. C., and W. Winston. 2015. Business Analytics: Data Analysis and Decision Making. 5th ed. Boston, MA: Cengage Learning.
Albright, S. C., W. Winston, and C. Zappe. 2011. Data Analysis and Decision Making. 4th ed. Boston, MA: South Western Cengage Learning.
Anderson, D. R., D. J. Sweeney, T. A. Williams, J. D. Camm, and J. J. Cochran. 2003. An Introduction to Management Science – Quantitative Approaches to Decision Making. 10th ed. Boston, MA: South Western Cengage Learning.
Benisis, A. 2010. Business Process Management: A Data Cube To Analyze Business Process Simulation Data For Decision Making. Saarbrücken, Germany: VDM Verlag Dr. Müller. p. 204. ISBN: 978-3-639-22216-6.
Bowerman, B. L., R. T. O’Connell, and E. S. Murphree. 2017. Business Statistics in Practice Using Data, Modeling, and Analytics. 8th ed. New York, NY: McGraw-Hill Education.
Box, G. E. P., and G. M. Jenkins. 1976. Time Series Analysis: Forecasting and Control. 2nd ed. San Francisco, CA: Wiley.
Camm, J. D., J. J. Cochran, M. J. Fry, J. W. Ohlmann, D. R. Anderson, D. J. Sweeney, and T. A. Williams. 2015. Essentials of Business Analytics. 1st ed. Boston, MA: Cengage Learning.
Gould, F. J., C. P. Schmidt, J. H. Moore, and L. R. Weatherford. 1998.
Online References
The list of online research and related websites is as follows:
[1] Geisser, S. 1993. Predictive Inference: An Introduction. Chapman & Hall. ISBN 978-0-412-03471-8
[2] Bishop, C. M. 2006. Pattern Recognition and Machine Learning. Berlin, Germany: Springer. ISBN 978-0-387-31073-2
[25] https://en.wikipedia.org/wiki/Deep_learning
About the Author
Dr. Amar Sahay is a professor of decision
sciences engaged in teaching, research,
consulting, and training. He holds a BS in
production engineering (BIT, India), MS
in industrial engineering, and a PhD in
mechanical engineering --both from the
University of Utah, USA. He has taught
and is teaching at several institutions in
Utah, including the University of Utah
(School of Engineering and Manage-
ment), SLCC, Westminster College, and
others. Amar is a certified Six Sigma Master Black Belt and is also lean
manufacturing/lean management certified. He has contributed a number
of research papers in national and international journals/proceedings to
his credit. Amar has authored around 10 books in the areas of data visual-
ization, business analytics, Six Sigma, statistics and data analysis, model-
ing, and applied regression. He is also associated with QMS Global LLC,
a company engaged in data visualization, analytics, quality, lean six sigma,
manufacturing, and systems analysis services. Amar is a senior member
of the Industrial & Systems Engineers, the American Society for Quality
Index
Advanced analytics, 39, 48
Analysis of variance (ANOVA), 86, 98
Analytics. See also specific analytics
  applications of, 43–46
  and business analytics, 2–4
  defined, 2, 34–35
  purpose of, 46
  types of, 46–48
ANN. See Artificial neural networks
Anomaly detection, 253
ANOVA. See Analysis of variance
Artificial intelligence, 86
Artificial neural networks (ANN), 13, 86, 259–260
Association learning, 251
Associative forecasting techniques, 198–199, 236
Autocorrelation, 166–167
Average forecast error. See Mean error

BA. See Business analytics
Bayes’ theorem, 291
BI. See Business intelligence
Bias, 207–208

  in modern business decision, 4–5
  overall process, 25
  overview of, 263
  statistical analysis, 265, 267
  tools of, 6–16, 31
  types of, xi–xii, 3, 5–6, 24
Business intelligence (BI), 5, 263, 265–266
  with analytics, 39
  applications of, 40–43, 49–50
  versus business analytics, 29–32, 51–54
  in companies, 48–49
  defined, 29, 38
  origin of, 38–39
  overview of, 23–24
  success factors for implementation, 51
  and support systems, 40
  tools of, 31, 41
Business process management (BPM), 42–44
Business reporting, 38–39, 42
Standard error of estimate, 122, 155
Standard normal distribution, 302–306
Stata, 127
Statistical analysis, 265
  data analytics, 267–268
  descriptive statistics, 265, 267
  inferential statistics, 267
Statistical dependence, 289–290
Statistical independence, 288–289
Statistical inference, 321. See also Inferential statistics
Subjective probability, 286
Supervised learning, 12

t-distribution, 314–315
  versus normal distribution, 315–317
t-test, 129–132, 161–162
Text analytics, 45–46
Text data mining. See Text mining
Text mining, 44–45
Third order model, 172–173
Time series analysis, 255, 257
Time series forecasting, 198
Tracking signal, 207–208
Trend, 200, 202–203
  forecasting data with, 224–228
  and seasonal patterns, 201

Unsupervised learning, 12

Variables, exploring relationships, 72
Variance, 294
Variance inflation factor (VIF)
  detecting multicollinearity using, 168–169
VIF. See Variance inflation factor

Web analytics, 47–48
Weighted moving average method, 217–219

z-value approach, hypothesis testing, 367–368
OTHER TITLES IN OUR BIG DATA, BUSINESS ANALYTICS,
AND SMART TECHNOLOGY COLLECTION
Mark Ferguson, University of South Carolina, Editor
• a one-time purchase,
• that is owned forever,
• allows for simultaneous readers,
• has no restrictions on printing, and
• can be downloaded as PDFs from within the library community.
Our digital library collections are a great solution to beat the rising cost of textbooks. E-books can be loaded into course management systems or onto students’ e-book readers.
The Business Expert Press digital libraries are very affordable, with no obligation to buy in
future years. For more information, please visit www.businessexpertpress.com/librarians.
To set up a trial in the United States, please email sales@businessexpertpress.com.
Big Data, Business Analytics, and Smart Technology Collection
Mark Ferguson, Editor
This business analytics (BA) text discusses models that use fact-based data to measure past business performance and to guide an organization in visualizing and predicting future business performance and outcomes. It provides a comprehensive overview of analytics in general, with an emphasis on predictive analytics. Given the booming interest in analytics and data science, this book is timely and informative. It brings many terms, tools, and methods of analytics together.
The first three chapters introduce BA, the importance of analytics, and the types of BA (descriptive, predictive, and prescriptive), along with their tools and models. Business intelligence (BI) and a case on descriptive analytics are discussed. Additionally, the book covers the most widely used predictive models, including regression analysis, forecasting, and data mining, and introduces recent applications of predictive analytics such as machine learning, neural networks, and artificial intelligence. The concluding chapter discusses the current state of analytics, its job outlook, and certifications in the field.
certified. He has published a number of research papers in national and international journals and proceedings. Amar has authored around 10 books in the areas of data visualization, business analytics, Six Sigma, statistics and data analysis, modeling, and applied regression. He is also associated with QMS Global LLC, a company engaged in data visualization, analytics, quality, Lean Six Sigma, manufacturing, and systems analysis services.
services. Amar is a senior member of the Industrial & Systems Engineers, the American Society for Quality (ASQ),
and Data Science.