
Business Analytics –

Anna University –
BA4206 - Study Material

Unit 1 – Introduction to Business Analytics

What is business analytics?

Business analytics (BA) is a set of disciplines and technologies for solving business problems
using data analysis, statistical models and other quantitative methods. It involves an iterative,
methodical exploration of an organization's data, with an emphasis on statistical analysis, to
drive decision-making.

Data-driven companies treat their data as a business asset and actively look for ways to turn it
into a competitive advantage. Success with business analytics depends on data quality,
skilled analysts who understand the technologies and the business, and a commitment to
using data to gain insights that inform business decisions.

How business analytics works

Before any data analysis takes place, BA starts with several foundational processes:

• Determine the business goal of the analysis.
• Select an analysis methodology.
• Get business data to support the analysis, often from various systems and sources.
• Cleanse and integrate data into a single repository, such as a data warehouse or data mart.

Initial analysis is typically performed on a smaller sample of the data. Analytics tools
range from spreadsheets with statistical functions to complex data mining and predictive
modeling applications. Patterns and relationships in the raw data are revealed. Then new
questions are asked, and the analytic process iterates until the business goal is met.

Deployment of predictive models involves a statistical process known as scoring and uses
records typically located in a database. Scores help enterprises make more informed, real-
time decisions within applications and business processes.
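As a rough illustration of scoring, here is a minimal Python sketch; the table, feature names, and historical outcomes are all invented, and any trained model and database could stand in for the ones shown:

```python
# Sketch of model scoring: records in a database are assigned probabilities
# by a previously trained model. Data, table, and feature names are invented.
import sqlite3

import pandas as pd
from sklearn.linear_model import LogisticRegression

# Stand-in for a database of customer records awaiting scores.
conn = sqlite3.connect(":memory:")
pd.DataFrame({"id": [1, 2, 3, 4],
              "tenure": [2, 48, 12, 60],
              "monthly_spend": [95.0, 40.0, 80.0, 30.0]}
             ).to_sql("customers", conn, index=False)

# Stand-in for a model trained earlier on historical outcomes (1 = churned).
X_hist = [[1, 100.0], [50, 35.0], [10, 85.0], [55, 25.0]]
y_hist = [1, 0, 1, 0]
model = LogisticRegression().fit(X_hist, y_hist)

# Scoring: read the records, attach a probability to each, store for reuse.
records = pd.read_sql("SELECT * FROM customers", conn)
records["score"] = model.predict_proba(
    records[["tenure", "monthly_spend"]].to_numpy())[:, 1]
records[["id", "score"]].to_sql("customer_scores", conn, index=False)
print(records)
```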

BA also supports tactical decision-making in response to unforeseen events. Often the decision-making is automated using artificial intelligence to support real-time responses.

Types of business analytics

Different types of business analytics include the following:

descriptive analytics, which tracks key performance indicators (KPIs) to understand the present state of a business;

predictive analytics, which analyzes trend data to assess the likelihood of future outcomes; and

prescriptive analytics, which uses past performance to generate recommendations for handling similar situations in the future.
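A toy Python sketch may help make the three types concrete (the monthly sales figures are invented, and the "models" are deliberately naive):

```python
# Toy illustration of the three types of analytics on invented monthly sales.
import pandas as pd

sales = pd.Series([100, 110, 125, 138], index=["Jan", "Feb", "Mar", "Apr"])

# Descriptive: report a KPI that describes the present state.
print("Average monthly sales (KPI):", sales.mean())

# Predictive: extrapolate the recent trend to assess a likely future value.
forecast_may = sales.iloc[-1] + sales.diff().mean()
print("Naive May forecast:", forecast_may)

# Prescriptive: turn the prediction into a recommended action.
if forecast_may > sales.iloc[-1]:
    print("Recommendation: increase stock ahead of expected demand.")
```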

Business analytics vs. business intelligence

The terms business intelligence (BI) and business analytics are often used interchangeably.
However, there are key differences.

Companies usually start with BI before implementing business analytics. BI analyzes business operations to determine what practices have worked and where opportunities for improvement lie. BI uses descriptive analytics.

In contrast, business analytics focuses on predictive analytics, generating actionable insights for decision-makers. Instead of summarizing past data points, BA aims to predict trends.

The data collected using BI lays the groundwork for BA. From that data, companies can
choose specific areas to analyze further using business analytics.

Business analytics vs. data analytics

Data analytics is the analysis of data sets to draw conclusions about the information they
contain. Data analytics does not have to be used in pursuit of business goals or insights. It is a
broader practice that includes business analytics.

BA involves using data analytics tools in pursuit of business insights. However, because it's a
general term, data analytics is sometimes used interchangeably with business analytics.

Business analytics vs. data science

Data science uses analytics to inform decision-making. Data scientists explore data using
advanced statistical methods. They allow the features in the data to guide their analysis. The
more advanced areas of business analytics resemble data science, but there is a distinction
between what data scientists and business analysts do.

Even when advanced statistical algorithms are applied to data sets, it doesn't necessarily mean
data science is involved. That's because true data science uses custom coding and explores
answers to open-ended questions. In contrast, business analytics aims to solve a specific
question or problem.

Common challenges of business analytics

Businesses might encounter both business analytics and business intelligence challenges
when trying to implement a business analytics strategy:

• Too many data sources. There is an increasingly large spectrum of internet-connected devices generating business data. In many cases, they are generating different types of data that must be integrated into an analytics strategy. However, the more complex a data set becomes, the harder it is to use it as part of an analytics framework.
• Lack of skills. The demand for employees with the data analytic skills necessary to
process BA data has grown. Some businesses, particularly small and medium-sized
businesses (SMBs), may have a hard time hiring people with the BA expertise and
skills they need.
• Data storage limitations. Before a business can begin to decide how it will process
data, it must decide where to store it. For instance, a data lake can be used to capture
large volumes of unstructured data.

Roles and responsibilities in business analytics

Business analytics professionals' main responsibility is to collect and analyze data to influence strategic decisions that a business makes. Some initiatives they might provide analysis for include the following:

• identifying strategic opportunities from data patterns;
• identifying potential problems facing the business and solutions;
• creating a budget and business forecast;
• monitoring progress with business initiatives;
• reporting progress on business objectives back to stakeholders;
• understanding KPIs; and
• understanding regulatory and reporting requirements.

Terminologies in Business Analytics

Analytics – Analytics can simply be defined as the process of breaking a problem into
simpler parts and using inferences based on data to drive decisions. Analytics is not a tool or
a technology; rather it is a way of thinking and acting.

Analytics has widespread applications in spheres as diverse as science, astronomy, genetics, financial services, telecom, retail, marketing, sports, gaming and health care.

Business analytics – This term refers to the application of analytics specifically in the sphere
of business. It includes subsets like –

• Marketing analytics

• Risk analytics

• Fraud analytics

• CRM analytics

• Loyalty analytics

• Operations analytics

• HR analytics

Industries which rely extensively on analytics include –

• Financial Services (Banks, Credit Cards, Loans, Insurance etc.)

• Retail

• Telecom

• Health care

• Consumer goods

• Manufacturing

• Sports

• Hotels

• Airlines

• Any industry where large amounts of data are generated

Predictive Analytics – Predictive analytics is one of the most popular analytics terms.
Predictive analytics is used to make predictions on the likelihood of occurrence of an event or
determine some future patterns based on data. Remember, it does not tell whether an event will happen; it only assigns probabilities to future events or patterns.
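A tiny illustration of this point, using invented counts rather than any particular model:

```python
# A predictive model assigns probabilities; it does not declare outcomes.
# Invented counts: of 200 customers who clicked an ad, 40 later bought.
clicked_total = 200
clicked_and_bought = 40

p_buy_given_click = clicked_and_bought / clicked_total
print(f"P(buy | clicked) = {p_buy_given_click:.2f}")  # 0.20

# The 0.20 says the event is likely to some degree, not that any
# particular customer will or will not buy.
```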

Figure: Google Trends analysis of “predictive analytics”

The term emphasizes the predictive nature of analytics (as opposed to, say, the retrospective
nature of tools like OLAP). This is one of those terms that is designed by sales people and
marketers to add glamour to any business. “Predictive analytics” sounds fancier than just
plain “analytics”. In practice, predictive analytics is rarely used in isolation from descriptive
analytics.

Descriptive analytics – Descriptive analytics refers to a set of techniques used to describe, explore or profile any kind of data. Any kind of reporting usually involves descriptive analytics. Data exploration and data preparation are essential ingredients for predictive modelling, and these rely heavily on descriptive analytics.

Inquisitive analytics – Whereas descriptive analytics is used for data presentation and exploration, inquisitive analytics answers questions of why, what, how and what if. Ex: “Why have sales in Q4 dropped?” could be a question on which inquisitive analysis is performed on the data.

Advanced analytics – Like “Predictive analytics”, “Advanced analytics” too is a marketing-driven terminology. “Advanced” adds a little more punch, a little more glamour to “Analytics” and is preferred by marketers.

Big data analytics – When analytics is performed on large data sets with huge volume, variety and velocity of data, it can be termed big data analytics. The annual amount of data we have is expected to grow from 8 zettabytes (trillion gigabytes) in 2015 to 35 zettabytes in 2020.

Growing data sizes inevitably require advanced technology like Hadoop and MapReduce to store and map large chunks of data. Also, a large variety of data (structured and unstructured) is flowing in at a very rapid pace. This requires not only advanced technology but also advanced analytical platforms. So, to summarize, large amounts of data together with the technology and the analytics platforms to get insights out of such data can be called big data analytics.

Data Mining – Data mining is the term that is most interchangeably used with “Analytics”. Data mining is an older term that was more popular in the nineties and the early 2000s. However, data mining began to be confused with OLAP, and that led to a drive to use more descriptive terms like “Predictive analytics”.

According to Google trends, “Analytics” overtook “Data mining” in popularity at some point
in 2005 and is about 5 times more popular now. Incidentally, Coimbatore is one of the only
cities in the world where “Data mining” is still more popular than “Analytics”.

Data Science – Data science and data analytics are mostly used interchangeably. However,
sometimes a data scientist is expected to possess higher mathematical and statistical
sophistication than a data analyst. A Data scientist is expected to be well versed in linear
algebra, calculus, machine learning and should be able to navigate the nitty-gritty details of
mathematics and statistics with much ease.

Artificial Intelligence – During the early stages of computing, there were a lot of comparisons between computing and the human learning process, and this is reflected in the terminology.

The term “Artificial intelligence” was popular in the very early stages of computing and
analytics (in the 70s and 80s) but is now almost obsolete.

Machine learning – involves using statistical methods to create algorithms. It replaces explicit programming, which can become cumbersome due to the large amounts of data, inflexible in adapting to solution requirements, and also sometimes illegible.

It is mostly concerned with algorithms, which can be a black box to interpret, but good models can give highly accurate results compared to conventional statistical methods. Also, visualization, domain knowledge, etc. are not included when we speak about machine learning. Neural networks, support vector machines, etc. are terms generally associated with machine learning algorithms.
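A hedged sketch of the contrast with explicit programming, using an invented toy “spam” data set and scikit-learn (one of many possible libraries):

```python
# Contrast: a hand-written explicit rule vs. a rule learned from examples.
# The tiny "spam" data set below is invented purely for illustration.
from sklearn.tree import DecisionTreeClassifier

# Features: [number of links, message length]; label: 1 = spam.
X = [[9, 50], [8, 40], [0, 400], [1, 350], [7, 60], [0, 500]]
y = [1, 1, 0, 0, 1, 0]

def explicit_rule(links, length):
    # Hand-coded logic: must be rewritten whenever the data changes.
    return 1 if links > 5 and length < 100 else 0

# Machine learning: equivalent logic is inferred from the examples instead.
model = DecisionTreeClassifier().fit(X, y)
print(explicit_rule(6, 80), model.predict([[6, 80]])[0])
```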

Algorithm – Usually refers to a mathematical formula which is output from the tools. The formula summarizes the model. Ex: Amazon’s recommendation algorithm gives a formula that can recommend the next best buy.

Machine Learning – Similar to “Artificial intelligence”, this term too has lost popularity in the recent past to terms like “Analytics” and its derivatives.

OLAP – Online analytical processing refers to descriptive analytic techniques of slicing and
dicing the data to understand it better and discover patterns and insights. The term is derived
from another term “OLTP” – online transaction processing which comes from the data
warehousing world.

Reporting – The term “Reporting” is perhaps the most unglamorous of all terms in the world of analytics. Yet it is also one of the most widely used practices within the field. All businesses use reporting to aid decision making. While it is not “Advanced analytics” or even “Predictive analytics”, effective reporting requires a lot of skill and a good understanding of the data as well as the domain.

Data warehousing – Ok, this may actually be considered more unglamorous than even
“Reporting”. Data warehousing is the process of managing a database and involves
extraction, transformation and loading (ETL) of data. Data warehousing precedes analytics.
The data managed in a data warehouse is usually taken out and used for business analytics.
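As a rough sketch of what ETL means in practice, the following Python fragment extracts invented raw records, transforms them, and loads them into a SQLite stand-in for a warehouse (all names are hypothetical):

```python
# Minimal ETL sketch: extract raw records, transform them, load them into a
# warehouse table. The data, table, and column names are all invented.
import sqlite3

import pandas as pd

# Extract: raw transactions (a stand-in for an operational source system).
raw = pd.DataFrame({"order_id": [1, 2, 2, 3],
                    "amount": ["10.5", "20.0", "20.0", "bad"]})

# Transform: deduplicate, coerce types, drop rows that fail validation.
clean = raw.drop_duplicates().copy()
clean["amount"] = pd.to_numeric(clean["amount"], errors="coerce")
clean = clean.dropna(subset=["amount"])

# Load: write the cleaned rows into the warehouse for later analytics.
warehouse = sqlite3.connect(":memory:")
clean.to_sql("fact_orders", warehouse, index=False)
print(pd.read_sql("SELECT * FROM fact_orders", warehouse))
```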

Statistics – Statistics is the study of the collection, organization, and interpretation of data. Data mining does not replace traditional statistical techniques. Rather, it is an extension of statistical methods that is in part the result of a major change in the statistics community. The development of most statistical techniques was, until recently, based on elegant theory and analytical methods that worked quite well on the modest amounts of data being analyzed. The increased power of computers and their lower cost, coupled with the need to analyze enormous data sets with millions of rows, have allowed the development of new techniques based on a brute-force exploration of possible solutions.

Analytics platform – Software that provides for the computation required to carry out the
statistical methods, descriptive and inquisitive queries, machine learning, visualization and
big data (which is software plus hardware).

Ex: SAS, R, Tableau, Hadoop etc.

Clickstream analytics / Web analytics – Analysis of user imprints created on the web.

Ex: Number of clicks, probability to buy based on the number of times a particular word was searched, etc.

Text analytics – Usually refers to analyzing unstructured (not tabulated) data in the form of
continuous text.

Ex: Facebook data analysis, Twitter analysis, etc.
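A minimal sketch of one text analytics task, word-frequency extraction, on invented posts:

```python
# Text analytics sketch: pull a simple signal (word frequencies) out of
# unstructured text. The "posts" are invented stand-ins for social media data.
import re
from collections import Counter

posts = ["Love the new phone, battery life is great",
         "Battery drains fast, screen is great though",
         "Great phone but the battery could be better"]

words = Counter()
for post in posts:
    words.update(re.findall(r"[a-z']+", post.lower()))

print(words.most_common(5))  # the most frequently discussed terms
```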

Location analytics – With advanced GPS and location data available, location analytics has become quite popular.

Ex: Offers based on customer location, insurance risk calculations based on proximity to
hazards

Sports analytics – Analysis of sports data using analytical tools and methods. Performance as well as revenue data can be subjected to analytical procedures to achieve better results.

8
The 7-step Business Analytics Process

Real-time analysis is an emerging business tool that is changing the traditional ways
enterprises do business. More and more organizations are today exploiting business analytics
to enable proactive decision making; in other words, they are switching from reacting to
situations to anticipating them.

One of the reasons for the flourishing of business analytics as a tool is that it can be applied in any industry where data is captured and accessible. This data can be used for a variety of purposes, ranging from improving customer service and strengthening the organization’s capability to predict fraud, to offering valuable insights on online and digital information.

However business analytics is applied, the key outcome is the same: The solving of business
problems using the relevant data and turning it into insights, providing the enterprise with the
knowledge it needs to proactively make decisions. In this way the enterprise will gain a
competitive advantage in the marketplace.

So, what is business analytics? Essentially, business analytics is a 7-step process, outlined
below.

Step 1. Defining the business needs

The first stage in the business analytics process involves understanding what the business would like to improve on or the problem it wants solved. Sometimes, the goal is broken down into smaller goals. Relevant data needed to solve these business goals are decided upon by the business stakeholders, business users with the domain knowledge and the business analyst. At this stage, key questions such as “what data is available?”, “how can we use it?” and “do we have sufficient data?” must be answered.

Step 2. Explore the data

This stage involves cleaning the data, imputing values for missing data, removing outliers, and transforming combinations of variables to form new variables. Time series
graphs are plotted as they are able to indicate any patterns or outliers. The removal of outliers
from the dataset is a very important task as outliers often affect the accuracy of the model if
they are allowed to remain in the data set. As the saying goes: Garbage in, garbage out
(GIGO)!

Once the data has been cleaned, the analyst will try to make better sense of the data. The
analyst will plot the data using scatter plots (to identify possible correlation or non-linearity).
He will visually check all possible slices of data and summarize the data using appropriate
visualization and descriptive statistics (such as mean, standard deviation, range, mode,
median) that will help provide a basic understanding of the data. At this stage, the analyst is
already looking for general patterns and actionable insights that can be derived to achieve the
business goal.
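A small Python sketch of this exploration step, on an invented sales series with one obvious outlier (pandas is assumed; the 1.5×IQR fence is one common outlier rule, not the only one):

```python
# Step 2 sketch: clean an invented sales series, drop an outlier with the
# common 1.5 * IQR fence, and summarize with descriptive statistics.
import pandas as pd

df = pd.DataFrame({"month": range(1, 9),
                   "sales": [100, 104, 98, 110, 950, 107, 112, 109]})

# Flag values outside the interquartile-range fences as outliers.
q1, q3 = df["sales"].quantile([0.25, 0.75])
fence = 1.5 * (q3 - q1)
df_clean = df[df["sales"].between(q1 - fence, q3 + fence)]

# Descriptive statistics give a basic understanding of the cleaned data.
print(df_clean["sales"].describe())  # count, mean, std, quartiles, etc.
# df_clean.plot.scatter(x="month", y="sales")  # visual check for patterns
```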

Step 3. Analyze the data

At this stage, using statistical analysis methods such as correlation analysis and hypothesis
testing, the analyst will find all factors that are related to the target variable. The analyst will
also perform simple regression analysis to see whether simple predictions can be made. In
addition, different groups are compared using different assumptions and these are tested
using hypothesis testing. Often, it is at this stage that the data is cut, sliced and diced and
different comparisons are made while trying to derive actionable insights from the data.
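A hedged sketch of these three analyses on invented numbers, using scipy (correlation, a two-sample t-test, and a simple regression):

```python
# Step 3 sketch on invented numbers: correlation with the target, a
# two-sample hypothesis test, and a simple regression (scipy assumed).
from scipy import stats

ad_spend = [10, 12, 15, 17, 20, 22, 25, 28]
sales = [40, 44, 50, 55, 61, 64, 70, 77]

# Correlation: how strongly is a candidate factor related to the target?
r, p = stats.pearsonr(ad_spend, sales)
print(f"correlation r = {r:.2f} (p = {p:.4f})")

# Hypothesis test: do two customer groups really differ in average spend?
group_a = [32, 35, 30, 38, 36]
group_b = [41, 44, 39, 46, 43]
t, p = stats.ttest_ind(group_a, group_b)
print(f"t-test p-value: {p:.4f}")  # a small p suggests a real difference

# Simple regression: can the target be predicted from the factor?
slope, intercept, *_ = stats.linregress(ad_spend, sales)
print(f"sales ~ {intercept:.1f} + {slope:.2f} * ad_spend")
```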

Step 4. Predict what is likely to happen

Business analytics is about being proactive in decision making. At this stage, the analyst will
model the data using predictive techniques that include decision trees, neural networks and
logistic regression. These techniques will uncover insights and patterns that highlight
relationships and ‘hidden evidences’ of the most influential variables. The analyst will then
compare the predictive values with the actual values and compute the predictive errors.
Usually, several predictive models are run and the best performing model selected based on
model accuracy and outcomes.
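A minimal sketch of this model-comparison step with scikit-learn; the campaign-response data and the two candidate models are invented for illustration:

```python
# Step 4 sketch: fit two candidate models, compare predictive error on
# held-out data, and keep the better one. The data set is invented.
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Features: [age, prior purchases]; target: 1 = responded to campaign.
X = [[25, 0], [34, 2], [45, 5], [52, 8], [23, 1], [41, 4], [60, 9], [30, 1],
     [48, 6], [36, 3], [55, 7], [28, 0]]
y = [0, 0, 1, 1, 0, 1, 1, 0, 1, 0, 1, 0]

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.33, random_state=0)

candidates = {"logistic": LogisticRegression(),
              "tree": DecisionTreeClassifier(max_depth=2)}
for name, model in candidates.items():
    model.fit(X_tr, y_tr)
    err = 1 - accuracy_score(y_te, model.predict(X_te))
    print(name, "predictive error:", round(err, 2))
```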

Step 5. Optimize (find the best solution)

At this stage the analyst will apply the predictive model coefficients and outcomes to run ‘what-if’ scenarios, using targets set by managers to determine the best solution, with the given constraints and limitations. The analyst will select the optimal solution and model based on the lowest error, management targets and his intuitive recognition of the model coefficients that are most aligned to the organization’s strategic goal.
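A small sketch of the optimization step as a linear program with scipy; the profit coefficients (which in practice might come from the predictive model) and capacity constraints are invented:

```python
# Step 5 sketch: choose the best product mix under capacity constraints.
# Objective coefficients and constraints are invented for illustration.
from scipy.optimize import linprog

# Maximize 40*x1 + 30*x2 (linprog minimizes, so negate the coefficients).
profit = [-40, -30]

# Constraints: 2*x1 + 1*x2 <= 100 machine hours; 1*x1 + 2*x2 <= 80 labor hours.
A = [[2, 1], [1, 2]]
b = [100, 80]

res = linprog(profit, A_ub=A, b_ub=b, bounds=[(0, None), (0, None)])
print("optimal mix:", res.x, "max profit:", -res.fun)  # x = [40, 20], 2200
# 'What-if' scenarios: rerun with different capacities b or profit targets.
```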

Step 6. Make a decision and measure the outcome

The analyst will then make decisions and take action based on the derived insights from the
model and the organizational goals. An appropriate period of time after this action has been
taken, the outcome of the action is then measured.

Step 7. Update the system with the results of the decision

Finally, the results of the decision and action and the new insights derived from the model are recorded and updated into the database. Information such as “was the decision and action effective?”, “how did the treatment group compare with the control group?” and “what was the return on investment?” is uploaded into the database. The result is an evolving database that is continuously updated as new insights and knowledge are derived.

Importance of business analytics

• Organizations employ business analytics so they can make data-driven decisions. Business analytics gives businesses an excellent overview of, and insight into, how they can become more efficient, and these insights enable them to optimize and automate their processes. It is no surprise that data-driven companies that make use of business analytics usually outperform their contemporaries. The reason for this is that the insights gained via business analytics enable them to understand why specific results are achieved, explore more effective business processes, and even predict the likelihood of certain results.

• Business analytics also offers adequate support and coverage for businesses that are looking to make the right proactive decisions. Business analytics also allows organizations to automate their entire decision-making process, so as to deliver real-time responses when needed.

• One apparent benefit of business analytics is that it helps organizations gain essential business insights by presenting the right data to work with. This makes decision making not only more efficient but also easier.

• Efficiency is one area that business analytics helps any organization achieve immediately. Since its inception, business analytics has played a key role in helping businesses improve their efficiency. Business analytics collates a considerable volume of data in a timely manner, and in a way that can easily be analyzed. This allows businesses to make the right decisions faster.

• Business analytics helps organizations to reduce risks. By helping them make the right decisions based on available data such as customer preferences, trends, and so on, it can help businesses curtail short- and long-term risk.

• Business analytics is a methodology or tool for making sound commercial decisions. Hence it impacts the functioning of the whole organization. Business analytics can therefore help improve the profitability of the business, increase market share and revenue, and provide better returns to shareholders.

• Facilitates better understanding of available primary and secondary data, which again affects the operational efficiency of several departments.

• Provides a competitive advantage to companies. In this digital age, the flow of information is almost equal for all players. It is how this information is utilized that makes a company competitive. Business analytics combines available data with various well-thought-out models to improve business decisions.

• Converts available data into valuable information. This information can be presented in any required format, convenient for the decision maker.

Evolution of Business Analytics

Business analytics has been in existence for a very long time and has evolved with the availability of newer and better technologies. It has its roots in operations research, which was extensively used during World War II. Operations research was an analytical way to look at data to conduct military operations. Over a period of time, this technique started being utilized for business. Here operations research evolved into management science. Again, the basis for management science remained the same as operations research: data, decision-making models, etc.

As the economies started developing and companies became more and more competitive,
management science evolved into business intelligence, decision support systems and into PC
software.

Scope of Business Analytics

Business analytics has a wide range of applications and usages. It can be used for descriptive analysis, in which data is utilized to understand past and present situations. This kind of descriptive analysis is used to assess the current market position of the company and the effectiveness of previous business decisions.

It is also used for predictive analysis, which typically draws on past business performance to predict future outcomes.

Business analytics is also used for prescriptive analysis, which is utilized to formulate
optimization techniques for stronger business performance.

For example, business analytics is used to determine the pricing of various products in a departmental store based on past and present sets of information.

Relationship of BA Process and Organization Decision-Making Process

The BA process can solve problems and identify opportunities to improve business
performance. In the process, organizations may also determine strategies to guide operations
and help achieve competitive advantages. Typically, solving problems and identifying
strategic opportunities to follow are organization decision-making tasks. The latter,
identifying opportunities, can be viewed as a problem of strategy choice requiring a solution.
It should come as no surprise that the BA process closely parallels classic organization decision-making processes. As depicted in the figure below, the business analytic process has an inherent relationship to the steps in typical organization decision-making processes.

Figure: Comparison of business analytics and organization decision-making processes

The organization decision-making process (ODMP) developed by Elbing (1970) and presented in Figure 1.2 is focused on decision making to solve problems but could also be
applied to finding opportunities in data and deciding what is the best course of action to take
advantage of them. The five-step ODMP begins with the perception of disequilibrium, or the
awareness that a problem exists that needs a decision. Similarly, in the BA process, the first
step is to recognize that databases may contain information that could both solve problems and find opportunities to improve business performance. Then in Step 2 of the ODMP, an
exploration of the problem to determine its size, impact, and other factors is undertaken to
diagnose what the problem is. Likewise, the BA descriptive analytic analysis explores factors
that might prove useful in solving problems and offering opportunities. The ODMP problem
statement step is similarly structured to the BA predictive analysis to find strategies, paths, or
trends that clearly define a problem or opportunity for an organization to solve problems.
Finally, the ODMP’s last steps of strategy selection and implementation involve the same
kinds of tasks that the BA process requires in the final prescriptive step (make an optimal
selection of resource allocations that can be implemented for the betterment of the
organization).

The decision-making foundation that has served ODMP for many decades parallels the BA
process. The same logic serves both processes and supports organization decision-making
skills and capacities.

Business analytics is the process of gathering data, measuring business performance, and
producing valuable conclusions that can help companies make informed decisions on the
future of the business, through the use of various statistical methods and techniques.

Analytics has become one of the most important tools at an organization’s disposal. When
data and analytics work hand in hand, the benefits become obvious. Companies can leverage
data to improve cost savings, redefine processes, drive market strategy, establish competitive
differentiators and, perhaps most importantly, build exceptional and truly personalized
customer experience.

The Competitive Advantage of Business Analytics

Business analytics is becoming a competitive advantage for organizations, and it is now necessary to apply business analytics, particularly its subset of predictive business analytics. The use of business analytics is a skill that is gaining mainstream value due to the increasingly thinner margin for decision error. It is there to provide insights, predict the future of the business and draw inferences from the treasure chest of raw transactional data, that is, internal and external data that many organizations now store (and will continue to store) as soft copy.

Business analytics enables differentiation. It is primarily about driving change. Business analytics drives competitive advantage by generating economies of scale, economies of scope, and quality improvement. Taking advantage of economies of scale is the first way organizations achieve comparative cost efficiencies and drive competitive advantage against their peers. Taking advantage of economies of scope is the second way organizations achieve comparative cost efficiencies and drive competitive advantage against their peers.

Business analytics improves the efficiency of business operations. The efficiencies that accumulate when a firm embraces big data technology eventually contribute to a ripple effect of increased production and reduced business costs. In the modern world, the vast quantities of data produced by corporations make their manual study and management practically impossible.

One can make the case that, increasingly, the primary source of attaining a competitive advantage will be an organization’s competence in mastering all flavors of analytics. If your management team is analytics-impaired, then your organization is at risk. Predictive business analytics is arguably the next wave for organizations to successfully compete. This will result not only from being able to predict outcomes but also from reaching higher to optimize the use of their resources, assets and trading partners. It may be that the ultimate sustainable business strategy is to foster analytical competency and eventually mastery of analytics among an organization’s workforce.

Analytics gives companies an insight into their customers’ behavior and needs. It also makes
it possible for a company to understand the public opinion of its brand, to follow the results
of various marketing campaigns, and strategize how to create a better marketing strategy to
nurture long and fruitful relationships with its customers.

Business analytics helps organizations know where they stand in the industry or a particular niche, and this provides the company with the needed clarity to develop effective strategies to position itself better in the future.

For a company to remain competitive in the modern marketplace that requires constant
change and growth, it must stay informed on the latest industry trends and best practices. Not
only does business analytics provide the needed knowledge for companies to survive in
today’s constantly changing business environment, but it also makes room for growth and
improvement, providing a detailed look into various opportunities and challenges that
companies face on a day-to-day basis.

The retention of company employees has been a concern for business enterprises, although it is taken more seriously in some niches than in other industries. A recent study conducted by IBM found that a business enterprise had over 5,000 job applications reviewed but only hired 200 employees monthly. Big data has made it possible for companies to quickly analyze long-time workers’ histories to identify the job traits associated with long-term employment prospects.

As a result, corporations and small business enterprises are revamping their recruitment
process which reduces employee turnover significantly. Companies can dedicate resources
that are newly available to activities that are of more productive value to the business and
increase their levels of service delivery. The retention of an experienced pool of employees
can significantly assist a business enterprise to outperform its competitors using their long-
term experiences.

Unit 2 – Managing Resources for Business Analytics

Business Analytics Personnel

One way to identify personnel needed for BA staff is to examine what is required for
certification in BA by organizations that provide BA services. INFORMS
(www.informs.org/Certification-Continuing-Ed/Analytics-Certification), a major academic
and professional organization, announced the startup of a Certified Analytics Professional (CAP) program in 2013. Another more established organization, Cognizure (www.cognizure.com/index.aspx), offers a variety of service products, including business analytics services. It offers a general certification, the Business Analytics Professional (BAP) exam, which measures existing skill sets in BA staff and identifies areas needing improvement (www.cognizure.com/cert/bap.aspx). This is a tool to validate technical proficiency, expertise, and professional standards in BA. The certification consists of three exams covering the content areas listed in Table 3.1.

Table 3.1 Cognizure Organization Certification Exam Content Areas

Most of the content areas in Table 3.1 will be discussed and illustrated in subsequent chapters
and appendixes. The three exams required in the Cognizure certification program can easily be
understood in the context of the three steps of the BA process (descriptive, predictive, and
prescriptive) discussed in previous chapters. The topics in Figure 3.1 of the certification
program are applicable to the three major steps in the BA process. The basic statistical tools
apply to the descriptive analytics step, the more advanced statistical tools apply to the
predictive analytics step, and the operations research tools apply to the prescriptive analytics
step. Some of the tools can be applied to both the descriptive and the predictive steps.
Likewise, tools like simulation can be applied to answer questions in both the predictive and
the prescriptive steps, depending on how they’re used. At the conjunction of all the tools is
the reality of case studies. The use of case studies is designed to provide practical experience,
whereby all tools are employed to answer important questions or seek opportunities.

Figure 3.1 Certification content areas and their relationship to the steps in BA

Other organizations also offer specialized certification programs. These certifications include
other areas of knowledge and skills beyond just analytic tools. IBM, for example, offers a
variety of specialized BA certifications (www-03.ibm.com/certify/certs/ba_index.shtml).
Although these include certifications in several dozen statistical, information systems, and
analytic methodologies related to BA, they also include specialized skill sets related to BA
personnel (administrators, designers, developers, solution experts, and specialists), as
presented in Table 3.2.

Table 3.2 Types of BA Personnel

With the variety of positions and roles participants play in the BA process, the question arises of what skill sets or competencies are needed to function in BA. In a general sense,
BA positions require competencies in business, analytic, and information systems skills. As
listed in Table 3.3, business skills involve basic management of people and processes. BA
personnel must communicate with BA staffers within the organization (the BA team
members) and the other functional areas within a firm (BA customers and users) to be useful.
Because they serve a variety of functional areas within a firm, BA personnel need to possess
customer service skills so they can interact with the firm’s personnel and understand the
nature of the problems they seek to solve. BA personnel also need to sell their services to
users inside the firm. In addition, some must lead a BA team or department, which requires
considerable interpersonal management leadership skills and abilities.

Table 3.3 Select Types of BA Personnel Skills or Competency Requirements

Fundamental to BA is an understanding of the analytic methodologies listed in Table 3.1 and others not listed. In addition to any tool sets, there is a need to know how they are integrated into the BA process to leverage data (structured or unstructured) and obtain the information that customers who will be guided by the analytics desire.

In summary, people who undertake a career in BA are expected to know how to interact with
people and utilize the necessary analytic tools to leverage data into useful information that
can be processed, stored, and shared in information systems in a way that guides a firm to
higher levels of business performance.

3.3 Business Analytics Data

Structured and unstructured data (introduced in Chapter 2, “Why Is Business Analytics Important?”) is needed to generate analytics. As a beginning for organizing data into an understandable framework, statisticians usually categorize data into meaningful groups.

3.3.1 Categorizing Data

There are many ways to categorize business analytics data. Data is commonly categorized by
either internal or external sources (Bartlett, 2013, pp. 238–239). Typical examples of internal
data sources include those presented in Table 3.4. When firms try to solve internal production or service operations problems, internally sourced data may be all that is needed. Typical
external sources of data (see Table 3.5) are numerous and provide great diversity and unique
challenges for BA to process. Data can be measured quantitatively (for example, sales
dollars) or qualitatively by preference surveys (for example, products compared based on
consumers preferring one product over another) or by the amount of consumer discussion
(chatter) on the Web regarding the pluses and minuses of competing products.

Table 3.4 Typical Internal Sources of Data on Which Business Analytics Can Be Based

Table 3.5 Typical External Sources of Data on Which Business Analytics Can Be Based

A major portion of the external data sources are found in the literature. For example, the US
Census and the International Monetary Fund (IMF) are useful data sources at the
macroeconomic level for model building. Likewise, audience and survey data sources might
include Nielsen (www.nielsen.com/us/en.html) for psychographic or demographic data,
financial data from Equifax (www.equifax.com), Dun & Bradstreet (www.dnb.com), and so
forth.

3.3.2 Data Issues

Regardless of the source of data, it has to be put into a structure that makes it usable by BA
personnel. We will discuss data warehousing in the next section, but here we focus on a
couple of data issues that are critical to the usability of any database or data file. Those issues
are data quality and data privacy. Data quality can be defined as data that serves the purpose
for which it is collected. It means different things for different applications, but there are
some commonalities of high-quality data. These qualities usually include accurately
representing reality, measuring what it is supposed to measure, being timely, and having
completeness. When data is of high quality, it helps ensure competitiveness, aids customer
service, and improves profitability. When data is of poor quality, it can provide information
that is contradictory, leading to misguided decision-making. For example, having missing
data in files can prohibit some forms of statistical modeling, and incorrect coding of
information can completely render databases useless. Data quality requires effort on the part of data managers to cleanse data of erroneous information and repair or replace missing data.
We will discuss some of these quality data measures in later chapters.
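A brief sketch of such cleansing on an invented customer file; the repair rules (a coding map and median imputation) are illustrative choices, not the only options:

```python
# Data quality sketch: repair incorrectly coded values and missing values
# in an invented customer file before it is used for analytics.
import numpy as np
import pandas as pd

df = pd.DataFrame({"customer": ["A", "B", "C", "D"],
                   "region": ["north", "nrth", "south", None],
                   "age": [34, np.nan, 29, 41]})

# Repair incorrect coding with an explicit map of known bad values.
df["region"] = df["region"].replace({"nrth": "north"})

# Replace missing numeric values (here with the median) so models can run.
df["age"] = df["age"].fillna(df["age"].median())

# Rows still missing critical fields are flagged for the data managers.
print(df[df["region"].isna()])
```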

Data privacy refers to the protection of shared data such that access is permitted only to those
users for whom it is intended. It is a security issue that requires balancing the need to know
with the risks of sharing too much. There are many risks in leaving unrestricted access to a
company’s database. For example, competitors can steal a firm’s customers by accessing
addresses. Data leaks on product quality failures can damage brand image, and customers can
become distrustful of a firm that shares information given in confidence. To avoid these
issues, a firm needs to abide by the current legislation regarding customer privacy and
develop a program devoted to data privacy.

Collecting and retrieving data and computing analytics requires the use of computers and
information technology. A large part of what BA personnel do is related to managing
information systems to collect, process, store, and retrieve data from various sources.

3.4 Business Analytics Technology

Firms need an information technology (IT) infrastructure that supports personnel in the
conduct of their daily business operations. The general requirements for such a system are
stated in Table 3.6. These types of technology are elemental needs for business analytics
operations.

Table 3.6 General Information Technology (IT) Infrastructure

Of particular importance for BA are the data management technologies listed in Table 3.6. A database management system (DBMS) is data management software that permits firms to centralize data, manage it efficiently, and provide access to stored data by application programs. A DBMS usually serves as an interface between application programs and the physical data files of structured data. A DBMS makes the task of understanding where and how the data is actually stored more efficient. In addition, other DBMS systems can
handle unstructured data. For example, object-oriented DBMS systems are able to store and
retrieve unstructured data, like drawings, images, photographs, and voice data. These types of
technology are necessary to handle the load of big data that most firms currently collect.

DBMS includes capabilities and tools for organizing, managing, and accessing data in
databases. Four of the more important capabilities are its data definition language, data
dictionary, database encyclopedia, and data manipulation language. DBMS has a data
definition capability to specify the structure of content in a database. This is used to create
database tables and characteristics used in fields to identify content. These tables and
characteristics are critical success factors for search efforts as the database grows. These
characteristics are documented in the data dictionary (an automated or manual file that stores
the size, descriptions, format, and other properties needed to characterize data). The database
encyclopedia is a table of contents listing a firm’s current data inventory and the data files
that can be built or purchased. The typical content of the database encyclopedia is presented in Table 3.7. Of particular importance for BA are the data manipulation language tools
included in DBMS. These tools are used to search databases for specific information. An example is Structured Query Language (SQL), which allows users to find specific data through a session of queries and responses in a database.
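A short query-and-response session against an invented employee table, using the sqlite3 module from the Python standard library as the DBMS:

```python
# SQL sketch: a query-and-response session against a small invented table,
# using sqlite3 from the Python standard library as the DBMS.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (name TEXT, age INTEGER, region TEXT)")
conn.executemany("INSERT INTO employees VALUES (?, ?, ?)",
                 [("Asha", 35, "west"), ("Ben", 42, "east"),
                  ("Carl", 35, "west"), ("Dina", 29, "east")])

# A data manipulation language query: find specific data by criteria.
rows = conn.execute(
    "SELECT name FROM employees WHERE age = 35 AND region = 'west'").fetchall()
print(rows)  # [('Asha',), ('Carl',)]
```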

Table 3.7 Database Encyclopedia Content

Data warehouses are databases that store current and historical data of potential interest to
decision makers. What a data warehouse does is make data available to anyone who needs
access to it. In a data warehouse, the data is prohibited from being altered. Data warehouses
also provide a set of query tools, analytical tools, and graphical reporting facilities. Some
firms use intranet portals to make data warehouse information widely available throughout a
firm.

Data marts are focused subsets or smaller groupings within a data warehouse. Firms often
build enterprise-wide data warehouses in which a central data warehouse serves the entire
organization and smaller, decentralized data warehouses (called data marts) are focused on a
limited portion of the organization’s data that is placed in a separate database for a specific
population of users. For example, a firm might develop a smaller database on just product
quality to focus efforts on quality customer and product issues. A data mart can be
constructed more quickly and at lower cost than enterprise-wide data warehouses to
concentrate effort in areas of greatest concern.

Once data has been captured and placed into database management systems, it is available for
analysis with BA tools, including online analytical processing, as well as data, text, and Web mining technologies. Online analytical processing (OLAP) is software that allows users to
view data in multiple dimensions. For example, employees can be viewed in terms of their
age, sex, geographic location, and so on. OLAP would allow identification of the number of
employees who are age 35, male, and in the western region of a country. OLAP allows users
to obtain online answers to ad hoc questions quickly, even when the data is stored in very
large databases.
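A rough sketch of this kind of multidimensional view using pandas as a stand-in for an OLAP tool, with invented employee records:

```python
# OLAP-style sketch: slice invented employee data along several dimensions
# (age, sex, region) and answer an ad hoc question, using pandas.
import pandas as pd

emp = pd.DataFrame({"age": [35, 35, 42, 35, 29],
                    "sex": ["M", "M", "F", "F", "M"],
                    "region": ["west", "west", "east", "west", "east"]})

# A cube-like view: counts for every combination of the dimensions.
print(emp.pivot_table(index="age", columns=["sex", "region"],
                      aggfunc="size", fill_value=0))

# Ad hoc slice: how many employees are age 35, male, and in the west?
print(len(emp.query("age == 35 and sex == 'M' and region == 'west'")))
```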

Data mining is the application of a software discovery-driven process that provides insights
into business data by finding hidden patterns and relationships in big data or large databases
and inferring rules from them to predict future behavior. The observed patterns and rules
guide decision-making. They can also act to forecast the impact of those decisions. It is an
ideal predictive analytics tool used in the BA process mentioned in Chapter 1, “What Is
Business Analytics?” The kinds of information obtained by data mining include those in
Table 3.8.

Table 3.8 Types of Information Obtainable with Data Mining Technology

Text mining (mentioned in Chapter 2) is a software application used to extract key elements
from unstructured data sets, discover patterns and relationships in the text materials, and
summarize the information. Given that the majority of the information stored in businesses is
in the form of unstructured data (e-mails, pictures, memos, transcripts, survey responses,
business receipts, and so on), the need to explore and find useful information will require
increased use of text mining tools in the future.

Web mining seeks to find patterns, trends, and insights into customer behavior from users of
the Web. Marketers, for example, use BA services like Google Trends
(www.google.com/trends/) and Google Insights for Search
(http://google.about.com/od/i/g/google-insights-for-search.htm) to track the popularity of
various words and phrases to learn what consumers are interested in and what they are
buying.

In addition to the general software applications discussed earlier, there are focused software
applications used every day by BA analysts in conducting the three steps of the BA process
(see Chapter 1). These include Microsoft Excel® spreadsheet applications, SAS applications,
and SPSS applications. Microsoft Excel (www.microsoft.com/) spreadsheet systems have
add-in applications specifically used for BA analysis. These add-in applications broaden the
use of Excel into areas of BA. Analysis ToolPak is an Excel add-in that contains a variety of
statistical tools (for example, graphics and multiple regression) for the descriptive and
predictive BA process steps. Another Excel add-in, Solver, contains operations research
optimization tools (for example, linear programming) used in the prescriptive step of the BA
process.

SAS® Analytics Pro (www.sas.com/) software provides a desktop statistical toolset allowing
users to access, manipulate, analyze, and present information in visual formats. It permits
users to access data from nearly any source and transform it into meaningful, usable
information presented in visuals that allow decision makers to gain quick understanding of
critical issues within the data. It is designed for use by analysts, researchers, statisticians,
engineers, and scientists who need to explore, examine, and present data in an easily
understandable way and distribute findings in a variety of formats. It is a statistical package
chiefly useful in the descriptive and predictive steps of the BA process.

IBM’s SPSS software (www-01.ibm.com/software/analytics/spss/) offers users a wide range of statistical and decision-making tools. These tools include methodologies for data collection, statistical manipulation, modeling trends in structured and unstructured data, and optimizing analytics. Depending on the statistical packages acquired, the software can cover all three steps in the BA process.

Other software applications exist to cover the prescriptive step of the BA process. One that
will be used in this book is LINGO® by Lindo Systems (www.lindo.com). LINGO is a
comprehensive tool designed to make building and solving optimization models faster, easier,
and more efficient. LINGO provides a completely integrated package that includes an understandable language for expressing optimization models, a full-featured environment for
building and editing problems, and a set of built-in solvers to handle optimization modeling
in linear, nonlinear, quadratic, stochastic, and integer programming models.

In summary, the technology needed to support a BA program in any organization will entail a general information system architecture, including database management systems, progressing in greater specificity down to the software that BA analysts need to compute their unique contributions to the organization.
have substantially more technology to support BA efforts, but all firms that seek to use BA as
a strategy for competitive advantage will need a substantial investment in technology,
because BA is a technology-dependent undertaking.

How Do We Align Resources to Support Business Analytics within an Organization?

Chapter objectives:

• Explain why a centralized business analytics (BA) organization structure has advantages
over other structures.

• Describe the differences between BA programs, projects, and teams and how they align BA
resources in firms.

• Describe reasons why BA initiatives fail.

• Describe typical BA team roles and reasons for their failures.

• Explain why establishing an information policy is important.

• Explain the advantages and disadvantages of outsourcing BA.

• Describe how data can be scrubbed.

• Explain what change management involves and what its relationship is to BA.

Organization Structures Aligning Business Analytics

According to Isson and Harriott (2013, p. 124), to successfully implement business analytics
(BA) within organizations, the BA in whatever organizational form it takes must be fully
integrated throughout a firm. This requires BA resources to be aligned in a way that permits a
view of customer information within and across all departments, access to customer
information from multiple sources (internal and external to the organization), access to
historical analytics from a central repository, and alignment of technology resources so
they’re accountable for analytic success. The commonality of these requirements is the desire
for an alignment that maximizes the flow of information into and through the BA operation, which in turn processes and shares information to desired users throughout the organization.
Accomplishing this information flow objective requires consideration of differing
organizational structures and managerial issues that help align BA resources to best serve an
organization.

4.1.1 Organization Structures

As mentioned in Chapter 2, “Why Is Business Analytics Important?” most organizations are hierarchical, with senior managers making the strategic planning decisions, middle-level managers making tactical planning decisions, and lower-level managers making operational planning decisions. Within the hierarchy, other organizational structures exist to support the
planning decisions. Within the hierarchy, other organizational structures exist to support the
development and existence of groupings of resources like those needed for BA. These
additional structures include programs, projects, and teams. A program in this context is the
process that seeks to create an outcome and usually involves managing several related
projects with the intention of improving organizational performance. A program can also be a
large project. A project tends to deliver outcomes and can be defined as having temporary
rather than permanent social systems within or across organizations to accomplish particular
and clearly defined tasks, usually under time constraints. Projects are often composed of
teams. A team consists of a group of people with skills to achieve a common purpose. Teams
are especially appropriate for conducting complex tasks that have many interdependent
subtasks.

The relationship of programs, projects, and teams with a business hierarchy is presented in
Figure 4.1. Within this hierarchy, the organization’s senior managers establish a BA program
initiative to mandate the creation of a BA grouping within the firm as a strategic goal. A BA
program does not always have an end-time limit. Middle-level managers reorganize or break
down the strategic BA program goals into doable BA project initiatives to be undertaken in a
fixed period of time. Some firms have only one project (establish a BA grouping) and others,
depending on the organization structure, have multiple BA projects requiring the creation of
multiple BA groupings. Projects usually have an end-time date in which to judge the
successfulness of the project. The projects in some cases are further reorganized into smaller
assignments, called BA team initiatives, to operationalize the broader strategy of the BA
program. BA teams may have a long-standing time limit (for example, to exist as the main
source of analytics for an entire organization) or have a fixed period (for example, to work on
a specific product quality problem and then end).

Figure 4.1 Hierarchical relationships: program, project, and team planning

In summary, one way to look at the alignment of BA resources is to view it as a progression of assigned planning tasks from a BA program, to BA projects, and eventually to BA teams
for implementation. As shown in Figure 4.1, this hierarchical relationship is a way to
examine how firms align planning and decision-making workload to fit strategic needs and
requirements.

BA organization structures usually begin with an initiative that recognizes the need to use and
develop some kind of program in analytics. Fortunately, most firms today recognize this
need. The question then becomes how to match the firm’s needs within the organization to
achieve its strategic, tactical, and operations objectives within resource limitations. Planning
the BA resource allocation within the organizational structure of a firm is a starting place for
the alignment of BA to best serve a firm’s needs.

Aligning the BA resources requires a determination of the number of resources a firm wants
to invest. The outcome of the resource investment might identify only one individual to
compute analytics for a firm. Because of the varied skill sets in information systems,
statistics, and operations research methods, a more common beginning for a BA initiative is
the creation of a BA team organization structure possessing a variety of analytical and
management skills. (We will discuss BA teams in Section 4.1.2.) Another way of aligning
BA resources within an organization is using a project structure. Most firms undertake
projects, and some firms actually use a project structure for their entire organization. For
example, consulting firms might view each client as a project (or product) and align their
resources around the particular needs of that client. A project structure often necessitates multiple BA
teams to deal with a wider variety of analytic needs. Even larger investments in BA resources
might be required by firms that decide to establish a whole BA department containing all the
BA resources for a particular organization. Although some firms create BA departments, the
departments don’t have to be large. Whatever the organization structure that is used, the role of BA is a staff (not line management) role in its advisory and consulting mission for the firm.

In general, there are different ways to structure an organization to align its BA resources to
serve strategic plans. In organizations in which functional departments are structured on a
strict hierarchy, separate BA departments or teams have to be allocated to each functional
area, as presented in Figure 4.2. This functional organization structure may have the benefit
of stricter functional control by the VPs of an organization and greater efficiency in focusing
on just the analytics within each specialized area. On the other hand, this structure does not
promote the cross-department access that is suggested as a critical success factor for the
implementation of a BA program.

Figure 4.2 Functional organization structure with BA

The needs of each firm for BA sometimes dictate positioning BA within existing organization
functional areas. Clearly, many alternative structures can house a BA grouping. For example,
because BA provides information to users, BA could be included in the functional area of
management information systems, with the chief information officer (CIO) acting as both the
director of information systems (which includes database management) and the leader of the
BA grouping.

An alternative organizational structure commonly found in large organizations aligns resources by project or product and is called a matrix organization. As illustrated in Figure 4.3, this structure allows the VPs some indirect control over their related specialists, which would include the BA specialists, but also allows direct control by the project or product manager. This, similar to the functional organizational structure, does not promote the cross-department access suggested for a successful implementation of a BA program.

Figure 4.3 Matrix organization structure

The literature suggests that the organizational structure that best aligns BA resources is one in
which a department, project, or team is formed in a staff structure where access to and from
the BA grouping of resources permits access to all areas within a firm, as illustrated in Figure
4.4 (Laursen and Thorlund, 2010, pp. 191–192; Bartlett, 2013, pp. 109–111; Stubbs, 2011, p.
68). The dashed line indicates a staff (not line management) relationship. This centralized BA
organization structure minimizes investment costs by avoiding duplications found in both the
functional and the matrix styles of organization structures. At the same time, it maximizes
information flow between and across functional areas in the organization. This is a logical
structure for a BA group in its advisory role to the organization. Bartlett (2013, pp. 109–110)
suggests other advantages of a centralized structure like the one in Figure 4.4. These include a
reduction in the filtering of information traveling upward through the organization, insulation
from political interests, breakdown of the siloed functional area communication barriers, a
more central platform for reviewing important analyses that require a broader field of specialists, analytics-based group decision-making efforts, separation of the line management
leadership from potential clients (for example, the VP of marketing would not necessarily
come between the BA group working on customer service issues for a department within
marketing), and better connectivity between BA and all personnel within the area of problem
solving.

Figure 4.4 Centralized BA department, project, or team organization structure

Given the advocacy and logic recommending a centralized BA grouping, there are nonetheless reasons why not all BA groupings are centralized. These reasons help explain why BA initiatives that seek to integrate and align BA resources into any type of BA group within the organization sometimes fail. The listing in Table 4.1 is not exhaustive, but it provides some of the important issues to consider in the process of structuring a BA group.

Table 4.1 Reasons for BA Initiative and Organization Failure

In summary, the organizational structure that a firm may select for the positioning of its BA
grouping can either be aligned within an existing organizational structure, or the BA grouping
can be separate, requiring full integration within all areas of an organization. While some
firms may start with a number of small teams to begin their BA program, other firms may
choose to start with a full-sized BA department. Regardless of the size of the investment in
BA resources, it must be aligned to allow maximum information flow between and across
functional areas to achieve the most benefits BA can deliver.

4.1.2 Teams

When it comes to getting the BA job done, the work tends to fall to a BA team. For firms that
employ BA teams, the participants can be defined by the roles they play in the team effort.
Some of the roles BA team participants undertake and their typical background are presented
in Table 4.2.

Table 4.2 BA Team Participant Roles*

Aligning BA teams to achieve their tasks requires collaborative effort from team members and from their organizations. Collaboration involves working with people to achieve a shared and explicit set of goals consistent with a mission. BA teams likewise have a specific mission to complete, and collaboration through teamwork is the means to accomplish that mission.

Team members’ need for collaboration is motivated by changes in the nature of work (no
more silos to hide behind, much more open environment, and so on), growth in professions
(for example, interactive jobs tend to be more professional, requiring greater variety in
expertise sharing), and the need to nurture innovation (creativity and innovation are fostered by collaboration with a variety of people sharing ideas). To keep their jobs and to progress in any business career, particularly in BA, team members must embrace working with other members inside the team and beyond. For organizations, collaboration is motivated by the changing
nature of information flow (that is, hierarchical flows tend to be downward, whereas in
modern organizations, flow is in all directions) and changes in the scope of business
operations (that is, going from domestic to global allows for a greater flow of ideas and
information from multiple sources in multiple locations).

How does a firm change its culture of work and business operations to encourage
collaboration? One way to affect the culture is to provide the technology to support a more
open, cross-departmental information flow. This includes e-mail, instant messaging, wikis
(collaboratively edited works, like Wikipedia), use of social media and networking through
Facebook and Twitter, and encouragement of activities like collaborative writing, reviewing,
and editing efforts. Other technology supporting collaboration includes webinars, audio and
video conferencing, and even the use of iPads to enhance face-to-face communication. These
can be tools that change the culture of a firm to be more open and communicative.

Reward systems should be put into place to acknowledge team effort. Teams should be
commended for their performance, and individuals should be praised for performance in a
team. While middle-level managers build teams, coordinate their work, and monitor their
performance, senior management should establish collaboration and teamwork as a vital
function.

Despite the collaboration and best of intentions, BA teams sometimes fail. There are many
reasons for this, but knowing some of the more common ones can help managers avoid them.
Some of the more common reasons for team failure are presented in Table 4.3. They also
represent issues that can cause a BA program to become unaligned and unproductive.

Table 4.3 Reasons for BA Team Failures*

4.2 Management Issues

Aligning organizational resources is a management function. There are general management issues that are related to a BA program, and some are specifically important to operating a
BA department, project, or team. The ones covered in this section include establishing an
information policy, outsourcing business analytics, ensuring data quality, measuring business
analytics contribution, and managing change.

4.2.1 Establishing an Information Policy

There is a need to manage information. This is accomplished by establishing an information policy to structure rules on how information and data are to be organized and maintained and
who is allowed to view the data or change it. The information policy specifies organizational
rules for sharing, disseminating, acquiring, standardizing, classifying, and inventorying all
types of information and data. It defines the specific procedures and accountabilities that
identify which users and organizational units can share information, where the information
can be distributed, and who is responsible for updating and maintaining the information.

In small firms, business owners might establish the information policy. For larger firms, data
administration may be responsible for the specific policies and procedures for data
management (Siegel and Shim, 2003, p. 280). Responsibilities could include developing the
information policy, planning data collection and storage, overseeing database design,
developing the data dictionary, as well as monitoring how information systems specialists and
end user groups use data.

A more popular term for many of the activities of data administration is data governance,
which includes establishing policies and processes for managing the availability, usability,
integrity, and security of the data employed in businesses. It is specifically focused on
promoting data privacy, data security, data quality, and compliance with government
regulations.

Such information policy, data administration, and data governance must be in place to guard
and ensure data is managed for the betterment of the entire organization. These steps are also
important in the creation of database management systems (see Chapter 3, “What Resource
Considerations Are Important to Support Business Analytics?”) and their support of BA
tasks.

4.2.2 Outsourcing Business Analytics

Outsourcing can be defined as a strategy by which an organization chooses to allocate some business activities and responsibilities from an internal source to an external source (Schniederjans et al., 2005, pp. 3–4). Outsourcing business operations is a strategy that an
organization can use to implement a BA program, run BA projects, and operate BA teams.
Any business activity can be outsourced, including BA. Outsourcing is an important BA
management activity that should be considered as a viable alternative in planning an
investment in any BA program.

BA is a staff function that is easier to outsource than other line management tasks, such as
running a warehouse. To determine if outsourcing is a useful option in BA programs,
management needs to balance the advantages of outsourcing with its disadvantages. Some of
the advantages of outsourcing BA include those listed in Table 4.4.

Table 4.4 Advantages of Outsourcing BA

Nevertheless, there are disadvantages of outsourcing BA. Some of the disadvantages to outsourcing are presented in Table 4.5.

Table 4.5 Disadvantages of Outsourcing BA

Managing outsourcing of BA does not have to involve the entire department. Most firms
outsource projects or tasks found to be too costly to assign internally. For example, firms
outsource cloud computing services to outside vendors (Laudon and Laudon, 2012, p. 511),
and other firms outsource software development or maintenance of legacy programs to
offshore firms in low-wage areas of the world to cut costs (Laudon and Laudon, 2012, p.
192).

Outsourcing BA can also be used as a strategy to bring BA into an organization (Schniederjans et al., 2005, pp. 24–27). Initially, to learn how to operate a BA program,
project, or team, an outsource firm can be hired for a limited, contracted period. The client
firm can then learn from the outsourcing firm’s experience and instruction. Once the
outsourcing contract is over, the client firm can form its own BA department, project, or
team.

4.2.3 Ensuring Data Quality

Business analytics, to be relevant and useful, must be based on data of high quality. Data quality refers to the accuracy, precision, and completeness of data. High-quality data is considered to correctly reflect the real world from which it is extracted. Poor-quality data caused by data entry
errors, poorly maintained databases, out-of-date data, and incomplete data usually leads to
bad decisions and undermines BA within a firm. Organizationally, the database management
systems (DBMS, mentioned in Chapter 3) personnel are managerially responsible for
ensuring data quality. Because of its importance and the possible location of the BA
department outside the management information systems department (which usually hosts the
DBMS), it is imperative that whoever leads the BA program should seek to ensure data
quality efforts are undertaken.

Ideally, a properly designed database with organization-wide data standards and efforts taken to avoid duplicate or inconsistent data elements should have high-quality data.
Unfortunately, times are changing, and more organizations allow customers and suppliers to
enter data into databases directly via the Web. As a result, most of the quality problems
originate from data input such as misspelled names, transposed numbers, or incorrect or
missing codes.

An organization needs to identify and correct faulty data and establish routines and
procedures for editing data in the database. The analysis of data quality can begin with a data
quality audit, in which a structured survey or inspection of accuracy and level of
completeness of data is undertaken. This audit may be of the entire database, just a sample of
files, or a survey of end users for perceptions of the data quality. If, during the data quality audit, files are found to have errors, a process called data cleansing or data scrubbing is
undertaken to eliminate or repair data. Some of the areas in a data file that should be
inspected in the audit and suggestions on how to correct them are presented in Table 4.6.

Table 4.6 Quality Data Inspection Items and Recommendations
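
To illustrate what a small part of a data cleansing effort might look like in code, the following is a minimal SAS sketch. The data set CUSTOMER_DATA and the variables CUST_ID, ZIP, and NAME are hypothetical names introduced only for illustration.

  /* Minimal data-cleansing sketch; all data set and variable names
     here are hypothetical. */
  data customer_clean;
    set customer_data;
    /* Flag records with missing key fields for later inspection. */
    flag_missing = (missing(cust_id) or missing(zip));
    /* Standardize inconsistent text casing. */
    name = propcase(name);
  run;

  /* Remove exact duplicates on the key field. */
  proc sort data=customer_clean nodupkey out=customer_final;
    by cust_id;
  run;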

4.2.4 Measuring Business Analytics Contribution

The investment in BA must continually be justified by communicating the BA contribution to the organization for ongoing projects. This means that performance analytics should be
computed for every BA project and BA team initiative. These analytics should provide an
estimate of the tangible and intangible values being delivered to the organization. This should
also involve establishing a communication strategy to promote the value being estimated.

Measuring the value and contributions that BA brings to an organization is essential to helping the firm understand why the application of BA is worth the investment. Some BA contribution estimates can be computed using standard financial methods, such as payback period (how long it takes for the initial costs to be returned by profit) or return on investment (ROI) (see Schniederjans et al., 2010, pp. 90–132), where dollar values or quantitative analysis is possible. When intangible contributions are a major part of the contribution being delivered to the firm, other methods like cost/benefit analysis (see Schniederjans et al., 2010, pp. 143–158), which include intangible benefits, should be used.
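
To make these financial measures concrete, consider a small worked sketch with assumed figures: a BA initiative costing $200,000 that returns $80,000 of profit per year. The numbers are hypothetical and serve only to show the arithmetic.

  data _null_;
    cost = 200000;                              /* initial BA investment (assumed) */
    annual_profit = 80000;                      /* yearly return (assumed) */
    payback_years = cost / annual_profit;       /* 2.5 years to recover the cost */
    roi_5yr = (annual_profit*5 - cost) / cost;  /* 1.0, i.e., 100% over five years */
    put payback_years= roi_5yr=;
  run;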

The continued measurement of value that BA brings to a firm is not meant to be self-serving,
but it aids the organization in aligning efforts to solve problems and find new business
opportunities. By continually running BA initiatives, a firm is more likely to identify internal
activities that should and can be enhanced by employing optimization methodologies during
the Prescriptive step of the BA process introduced in Chapter 1, “What Is Business
Analytics?” It can also help identify underperforming assets. In addition, keeping track of
investment payoffs for BA initiatives can identify areas in the organization that should have a
higher priority for analysis. Indeed, past applications and allocations of BA resources that
have shown significant contributions can justify priorities established by the BA leadership
about where there should be allocated analysis efforts within the firm. They can also help
acquire increases in data support, staff hiring, and further investments in BA technology.

4.2.5 Managing Change

Wells (2000) found that what is critical in changing organizations is organizational culture
and the use of change management. Organizational culture is how an organization supports
cooperation, coordination, and empowerment of employees (Schermerhorn, 2001, p. 38).
Change management is defined as an approach for transitioning the organization (individuals,
teams, projects, departments) to a changed and desired future state (Laudon and Laudon,
2012, pp. 540–542). Change management is a means of implementing change in an
organization, such as adding a BA department (Schermerhorn, 2001, pp. 382–390). Changes
in an organization can be either planned (a result of specific and planned efforts at change
with direction by a change leader) or unplanned (spontaneous changes without direction of a
change leader). The application of BA invariably results in both types of change because of BA’s specific problem-solving role (a desired, planned change to solve a problem) and its opportunity-finding exploratory nature (unplanned changes arising from newly discovered knowledge). Change management can also target almost everything that makes up an organization
(see Table 4.7).

Table 4.7 Change Management Targets*

It is not possible to gain the benefits of BA without change. The intended change involves finding new and unique information that indicates where change should take place in people, technology systems, or business conduct. By instituting the concept of change management within an
organization, a firm can align resources and processes to more readily accept changes that BA
may suggest. Instituting the concept of change management in any firm depends on the
unique characteristics of that firm. There are, though, a number of activities in common with
successful change management programs, and they apply equally to changes in BA
departments, projects, or teams. Some of these activities that lead to change management
success are presented as best practices in Table 4.8.

Table 4.8 Change Management Best Practices

Unit 3 – Descriptive Analytics

Unit objectives:

• Explain why we need to visualize and explore data.

• Describe statistical charts and how to apply them.

• Describe descriptive statistics useful in the descriptive business analytics (BA) process.

• Describe sampling methods useful in BA and where to apply them.

• Describe what sampling estimation is and how it can aid in the BA process.

• Describe the use of confidence intervals and probability distributions.

• Explain how to undertake the descriptive analytics step in the BA process.

Introduction

In any BA undertaking, referred to as BA initiatives or projects, a set of objectives is articulated. These objectives are a means to align the BA activities to support strategic goals.
The objectives might be to seek out and find new business opportunities, to solve operational
problems the firm is experiencing, or to grow the organization. It is from the objectives that
exploration via BA originates and is in part guided. The directives that come down, from the
strategic planners in an organization to the BA department or analyst, focus the tactical effort
of the BA initiative or project. Maybe the assignment will be one of exploring internal
marketing data for a new marketing product. Maybe the BA assignment will be focused on
enhancing service quality by collecting engineering and customer service information.
Regardless of the type of BA assignment, the first step is one of exploring data and revealing
new, unique, and relevant information to help the organization advance its goals. Doing this
requires an exploration of data.

This chapter focuses on how to undertake the first step in the BA process: descriptive
analytics. The focus in this chapter is to acquaint readers with the more common descriptive analytic tools used in this step and available in SAS software. The treatment here is not
computational but informational regarding the use and meanings of these analytic tools in
support of BA. For purposes of illustration, we will use the data set in Figure 5.1 representing
four different types of product sales (Sales 1, Sales 2, Sales 3, and Sales 4).

Figure 5.1 Illustrative sales data sets

When using SAS, data sets are placed into files like that in Figure 5.2.

Figure 5.2 SAS coding of sales data set

Creating the data set in Figure 5.2 requires a sequence of SAS steps. Because this is the first
use of SAS, the sequence of instructions is illustrated in Figures 5.3 through 5.9. These steps
should be familiar to experienced SAS users. Depending on the SAS version and intent of the
data file, the images in these figures may be slightly different.

Figure 5.3 Creating a workbook file for the data set

Figure 5.4 Excel data file used in the creation of the SAS data file

Figure 5.5 Step to pull the data set from an Excel (or any) file using SAS

Figure 5.6 Identify file using SAS

Figure 5.7 Label SAS file as SALES_DATA

Figure 5.8 Shows the SALES_DATA SAS file is created

Figure 5.9 Step to import the created file

SAS permits the use of many different sources for data sets or data files to be entered into an
SAS program. Big data files in Excel, SPSS, or other software applications can be brought
into SAS programs using a similar set of steps presented in this section. Once the data sets are
structured for use with SAS, there is still considerable SAS programming effort needed to
glean useful information from any big data or small data files. Fortunately, SAS provides the
means by which any sized data set can be explored and visualized by BA analysts. Because
SAS is a programming language, it permits a higher degree of application customization than
most other statistical software.
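
For readers who prefer code to the menu-driven steps illustrated in Figures 5.3 through 5.9, a PROC IMPORT statement can accomplish the same import. This is a sketch only: the file path and sheet name are assumptions, and DBMS=XLSX requires the SAS/ACCESS Interface to PC Files.

  /* Import the Excel sales file into a SAS data set; the path and
     sheet name are hypothetical. */
  proc import datafile="C:\data\sales_data.xlsx"
              out=work.sales_data
              dbms=xlsx
              replace;
    sheet="Sheet1";
    getnames=yes;
  run;

  proc print data=work.sales_data(obs=5);  /* quick check of the import */
  run;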

5.2 Visualizing and Exploring Data

There is no single best way to explore a data set, but some way of conceptualizing what the
data set looks like is needed for this step of the BA process. Charting is often employed to
visualize what the data might reveal.

When determining the software options to generate charts in SAS, consider that the software
can draft a variety of charts for the selected variables in the data sets. Using the data in Figure
5.1, charts can be created for the illustrative sales data sets. Some of these charts are
discussed in Table 5.1 as a set of exploratory tools that are helpful in understanding the informational value of data sets. The chart to select depends on the objectives set for the
chart. The SAS program statements used to create each chart in Table 5.1 are provided.

Table 5.1 Statistical Charts Useful in BA

The charts presented in Table 5.1 reveal interesting facts. The column chart is useful in
revealing the almost perfect linear trend in the Sales 3 data, whereas the scatter chart reveals
an almost perfect nonlinear function in Sales 4 data. Additionally, the cluttered pie chart with
20 different percentages illustrates that not every chart can or should be used in all situations. Best practice suggests that charting be viewed as an exploratory activity of BA. BA
analysts should run a variety of charts and see which ones reveal interesting and useful
information. Those charts can be further refined to drill down to more detailed information
and more appropriate charts related to the objectives of the BA initiative.
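
As a sketch of the kind of SAS program statements behind two of the more revealing charts in Table 5.1, the following PROC SGPLOT calls draw a column chart and a scatter chart. The variable names MONTH and SALES3/SALES4 are assumptions about how the Figure 5.1 data might be coded.

  /* Column chart: helps reveal the near-linear trend in Sales 3. */
  proc sgplot data=sales_data;
    vbar month / response=sales3;
  run;

  /* Scatter chart: helps reveal the nonlinear shape of Sales 4. */
  proc sgplot data=sales_data;
    scatter x=month y=sales4;
  run;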

Of course, a cursory review of the Sales 4 data in Figure 5.1 reveals the concave pattern without needing the scatter chart in Table 5.1. But most BA problems involve big data—so large that it is impossible to just view the raw values and make judgment calls on structure or appearance. This is why descriptive statistics can be employed to view the data in a parameter-based way in the hope of better understanding the information the data has to reveal.

5.3 Descriptive Statistics

SAS has a number of useful statistics that can be automatically computed for the variables in
the data sets. The SAS printout of the sales data from Figure 5.1 is summarized in Table 5.2.
Some of these descriptive statistics are discussed in Table 5.3 as exploratory tools that are
helpful in understanding the informational value of data sets.

Table 5.2 SAS Descriptive Statistics
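
A summary like Table 5.2 can be requested with a single PROC MEANS step; this sketch assumes the four sales columns are named SALES1 through SALES4.

  proc means data=sales_data n mean median std var range min max
             skewness kurtosis;
    var sales1 sales2 sales3 sales4;
  run;
  /* PROC UNIVARIATE reports further measures, such as the mode. */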

Table 5.3 Descriptive Statistics Useful in BA

Fortunately, we do not need to compute these statistics to know how to use them. Computer
software provides these descriptive statistics when they’re needed or requested. When you
look at the data sets for the four variables in Figure 5.1 and at the statistics in Table 5.2, there
are some obvious conclusions based on the detailed statistics from the data sets. It should be
no surprise that Sales 2, with a few of the largest values and mostly smaller ones making up
the data set, would have the largest variance statistics (standard deviation, sample variance,
range, maximum/minimum). Also, Sales 2 is highly positively skewed (skewness > 1) and highly peaked (kurtosis > 3). Note the similarity of the mean, median, and mode in Sales 1
and the dissimilarity in Sales 2. These descriptive statistics provide a more precise basis to
envision the behavior of the data. Referred to as measures of central tendency, the mean,
median, and mode can also be used to clearly define the direction of a skewed distribution. A
negatively skewed distribution orders these measures such that mean < median < mode, and a positively skewed distribution orders them such that mode < median < mean.

So, what can be learned from these statistics? There are many observations that can be drawn
from this data. Keep in mind that, in dealing with the big data sets, one would only have the
charts and the statistics to serve as a guide in determining what the data looks like. Yet, from
these statistics, one can begin describing the data set. So, in the case of Sales 2, it can be
predicted that the data set is positively skewed and peaked. Figure 5.10 presents the histogram of Sales 2. The SAS chart also overlays a normal distribution (a bell-shaped curve) to reflect the positioning of the mean (highest point on the curve, 167.2) and the way the data appears to fit the normal distribution (not very well in this situation). As expected, the distribution is positively skewed, with a substantial spread between the large values in the data set and the many smaller valued data points.

Figure 5.10 SAS histogram of Sales 2 data
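
A histogram with a fitted normal overlay like Figure 5.10 can be produced with PROC UNIVARIATE; the sketch below assumes the variable is named SALES2.

  proc univariate data=sales_data;
    var sales2;
    histogram sales2 / normal;  /* overlays a fitted bell-shaped curve */
  run;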

We also know from the substantial variance that the data points making up the data set are highly diverse—so much so that it would be difficult to use this variable to predict future behavior or trends. This type of information may be useful in the later steps of the BA process as a means of weeding out data that will not help predict anything useful and therefore would not help an organization improve its operations.

Sometimes big data files become so large that certain statistical software systems cannot
manipulate them. In these instances, a smaller but representative sample of the data can be
obtained if necessary. Obtaining the sample for accurate prediction of business behavior
requires understanding the sampling process and estimation from that process.

5.4 Sampling and Estimation

The estimation of most business analytics requires sample data. In this section, we discuss
various types of sampling methods and follow that with a discussion of how the samples are
used in sampling estimation.

5.4.1 Sampling Methods

Sampling is an important strategy for handling large data. Big data can be cumbersome to
work with, but a smaller sample of items from the big data file can provide a new data file
that seeks to accurately represent the population from which it comes. In sampling data, there
are three components to recognize: a population, a sample, and a sample element (the items
that make up the sample). A firm’s collection of customer service performance documents for
one year could be designated as a population of customer service performance for that year.
From that population, a sample of a lesser number of sample elements (the individual
customer service documents) can be drawn to reduce the effort of working with the larger
data. Several sampling methods can be used to arrive at a representative sample. Some of
these sampling methods are presented in Table 5.4.

Table 5.4 Sampling Methods

The simple, systematic, stratified, and cluster random methods are based on some kind of
probability of their occurrence in a population. The quota and judgment methods are
nonprobability-based tools. Although the randomization process in some methods helps
ensure representative samples being drawn from the population, sometimes because of cost or
time constraints, nonprobability methods are the best choice for sampling.

Which sampling method should be selected for a particular BA analysis? It depends on the
nature of the sample. As mentioned in the application notes in Table 5.4, the size of the population, the size of the sample, the area of application (geography, strata, ordering of the
data, and so on), and even the researchers running the data collection effort impact the
particular methodology selected. A best practices approach might begin with a determination
of any constraints (time allowed and costs) that might limit the selection of a sample
collection effort. That may narrow the choice to something like a quota method. Another best
practices recommendation is to start with the objective(s) of the BA project and use them as a
guide in the selection of the sampling method. For example, suppose the objective of a BA
analysis is to increase sales of a particular product. This might lead to random sampling of
customers or even a stratified sample by income levels, if income is important to the results
of the analysis. Fortunately, there is software to make the data collection process easier and
less expensive.

SAS software can be used with the methods mentioned earlier to aid in sampling analysis.
For example, SAS permits simple, systematic, stratified, and cluster random methods, among
others. Using this software requires a designation of the number of sample elements in each
stratum. (For example, we selected 2 for each stratum in this example.) In Figure 5.11, SAS
has defined seven strata for the Sales 4 data. The logic of this stratification can be observed
by looking at the Sales 4 data in Figure 5.1, where only seven different types of values exist
(1, 5, 9, 12, 18, 19, and 20). The additional SAS printout in Figure 5.11 shows the specific
sample elements that were randomly selected in each stratum, as well as totals and their
percentages in the resulting sample. For example, only 0.33, or 33 percent, of the “21” strata
sample elements were randomly selected by the SAS program.

Figure 5.11 SAS program statements for stratification/random sampling for Sales 4 variable
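
A stratified random sample like the one in Figure 5.11 can be drawn with PROC SURVEYSELECT. The sketch below assumes the stratification variable is named SALES4; it requests two elements per stratum, with an arbitrary seed for reproducibility.

  proc sort data=sales_data out=sales_sorted;
    by sales4;                     /* strata must be grouped before selection */
  run;

  proc surveyselect data=sales_sorted out=sales_sample
                    method=srs n=2 seed=20231;
    strata sales4;                 /* simple random sampling within each stratum */
  run;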

5.4.2 Sampling Estimation

Invariably, using any sampling method can cause errors in the sample results. Most of the
statistical methods listed in Table 5.2 are formulated for population statistics. Once sampling
is introduced into any statistical analysis, the data must be treated as a sample and not as a
population. Many statistical techniques, such as the standard error of the mean and the sample variance, incorporate mathematical correction factors to adjust descriptive analysis statistical tools to compensate for the possibility of sampling error.

One of the methods of compensating for error is to show some degree of confidence in any
sampling statistic. The confidence in the sample statistics used can be expressed in a
confidence interval, which is an interval estimate about the sample statistics. In general, we
can express this interval estimate as follows:

Confidence interval = (sample statistic) ± [(confidence coefficient) × (standard error of the estimate)]

The sample statistic in the confidence interval can be any measure or proportion from a
sample that is to be used to estimate a population parameter, such as a measure of central
tendency like a mean. The confidence coefficient is set as a percentage to define the degree of
confidence to accurately identify the correct sample statistic. The larger the confidence
coefficient, the more likely the population mean from the sample will fall within the
confidence interval. Many software systems set a 95 percent confidence level as the default
confidence coefficient, although any percentage can be used. SAS permits the user to enter a
desired percentage. The standard error of the estimate in the preceding expression can be any
statistical estimate, including proportions used to estimate a population parameter. For
example, using a mean as the sample statistic, we have the following interval estimate
expression:

Confidence interval = mean ± [(95 percent) × (standard error of the mean)]

The output of this expression consists of two values that form high and low values defining
the confidence interval. The interpretation of this interval is that the true population mean
represented by the sample has a 95 percent chance of falling in the interval. In this way, there
is still a 5 percent chance that the true population mean will not fall in the interval due to
sampling error. Because the standard error of the mean is based on variation statistics (the standard deviation), the larger the variance statistics used in this expression, the wider the confidence interval and the less precise the sample mean becomes as an estimate of the true population mean.

SAS computes confidence intervals when analyzing various statistical measures and tests. For
example, the SAS summary printout in Table 5.5 is of the 95 percent confidence interval for the Sales 1 variable. With a sample mean value of 35.15, the confidence interval suggests
there is a 95 percent chance that the true population mean falls between 29.91 and 40.39.
When trying to ascertain if the sample is of any value, this kind of information can be of great
significance. For example, knowing with 95 percent certainty there is at least a mean of 29.91
might make the difference between continuing to sell a product or not because of a needed
requirement for a breakeven point in sales.

Table 5.5 SAS 95 Percent Confidence Interval Summary for Sales 1 Variable
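
The interval in Table 5.5 can be reproduced with the CLM option of PROC MEANS; this sketch assumes the variable is named SALES1.

  proc means data=sales_data mean stderr clm alpha=0.05;
    var sales1;   /* prints the mean, its standard error, and the 95% limits */
  run;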

Confidence intervals are also important for demonstrating the accuracy of some forecasting
models. For example, confidence intervals can be created about a regression equation model
forecast to see how far off the estimates might be if the model is used to predict future sales.
For additional discussion on confidence intervals, see Appendix A, “Statistical Tools.”

5.5 Introduction to Probability Distributions

By taking samples, one seeks to reveal population information. Once a sample is taken on
which to base a forecast or a decision, it may not accurately capture the population
information. No single sample can assure an analyst that the true population information has
been captured. Confidence interval statistics are employed to reflect the possibility of error
from the true population information.

To utilize the confidence interval formula expressed in Section 5.4, you set a confidence
coefficient percentage (95 percent) as a way to express the possibility that the sample
statistics used to represent the population statistics may have a potential for error. The
confidence coefficient used in the confidence interval is usually referred to as a Z value. It is
spatially related to the area (expressed as a percentage or frequency) representing the
probability under the curve of a distribution. The sample standard normal distribution is the
bell-shaped curve illustrated in Figure 5.12. This distribution shows the relationship of the Z
value to the area under the curve. The Z value is the number of standard errors of the mean.

Figure 5.12 Standard normal probability distribution

The confidence coefficient is related to the Z values, which divide the area under a normal
curve into probabilities. Based on the central limit theorem, we assume that all sampling
distributions of sufficient size are normally distributed with a standard deviation equal to the
standard error of the estimate. This means that an interval of plus or minus two standard
errors of the estimate (whatever the estimate is) has a 95.44 percent chance of containing the
true or actual population parameter. Plus or minus three standard errors of the estimate has a 99.74 percent chance of containing the true or actual population parameter. So, the Z value
represents the number of standard errors of the estimate. Table 5.6 has selected Z values for
specific confidence levels representing the probability that the true population parameter is
within the confidence interval and represents the percentage of area under the curve in that
interval.

Table 5.6 Selected Z Values and Confidence Levels
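
The Z values in Table 5.6 can be verified with SAS’s QUANTILE function, as in this short sketch:

  data _null_;
    do conf = 0.90, 0.95, 0.99;
      z = quantile('normal', 1 - (1 - conf)/2);  /* two-sided Z value */
      put conf= 4.2 z= 5.3;                      /* 1.645, 1.960, 2.576 */
    end;
  run;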

The important BA use of the probability distributions and confidence intervals is that they
suggest an assumed parameter based on a sample that has properties that allow analysts to
predict or forecast with some assessed degree of statistical accuracy. In other words, BA
analysts can, with some designated confidence level, use samples from large databases to
accurately predict population parameters.

Another important value of probability distributions is that they can be used to compute the probabilities that certain outcomes, like success with business performance, may occur. In the
exploratory descriptive analytics step of the BA process, assessing the probabilities of some
events occurring can be a useful strategy to guide subsequent steps in an analysis. Indeed,
probability information may be useful in weighing the choices an analyst faces in any of the
steps of the BA process. Suppose, for example, the statistics from the Sales 1 variable in
Table 5.5 are treated as a sample to discover the probability of sales greater than one standard
error of the mean above the current mean of 35.15. In Figure 5.13, the mean (35.15) and
standard error of the mean (2.504) statistics are included at the bottom of the standard
sampling normal distribution. When one standard error of the mean is added to the sample
mean, the resulting value is 37.654. The sum of the area (the shaded region in Figure 5.13)
representing the total probability beyond 37.654 is 15.87 percent (13.59 + 2.15 + 0.13).
So, there is only a 15.87 percent probability that sales will exceed 37.654 based on the
sample information for the Sales 1 variable.

Figure 5.13 Probability function example
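
The same tail probability can be computed directly with SAS’s PROBNORM function. The sketch below uses the Sales 1 statistics quoted above:

  data _null_;
    mean = 35.15;
    se   = 2.504;                       /* standard error of the mean */
    x    = mean + se;                   /* 37.654 */
    p    = 1 - probnorm((x - mean)/se); /* tail area beyond one standard error */
    put p= percent8.2;                  /* about 15.87% */
  run;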

The ability to assess probabilities using this approach is applicable to other types of
probability distributions. For a review of probability concepts and distributions, probability
terminology, and probability applications, see Appendix A.

5.6 Marketing/Planning Case Study Example: Descriptive Analytics Step in the BA Process

In the last section of this chapter and in Chapters 6, “What Is Predictive Analytics?” and 7,
“What Is Prescriptive Analytics?” an ongoing marketing/planning case study of the relevant
BA step discussed in those chapters will be presented to illustrate some of the tools and
strategies used in a BA problem analysis. This is the first installment of the case study
dealing with the descriptive analytics step in BA. The predictive analytics step (in Chapter 6)
and prescriptive analytics step (in Chapter 7) will continue with this ongoing case study.

5.6.1 Case Study Background

A firm has collected a random sample of monthly sales information on a service product
offered infrequently and only for a month at a time. The sale of this service product occurs
only during the month that the promotion efforts are allocated. Basically, promotion funds are allocated at the beginning or during the month, and whatever sales occur are recorded for that
promotion effort. There is no spillover of promotion to another month, because monthly
offerings of the service product are independent and happen randomly during any particular
year. The nature of the product does not appear to be impacted by seasonal or cyclical
variations, which prevents the use of those patterns for forecasting and makes planning the budget difficult.

The firm promotes this service product by using radio commercials, newspaper ads,
television commercials, and point-of-sale (POS) ad cards. The firm has collected the sales
information as well as promotion expenses. Because the promotion expenses are put into
place before the sales take place and on the assumption that the promotion efforts impact
product sales, the four promotion expenses can be viewed as predictive data sets (or what will be
the predictive variables in a forecasting model). Actually, in terms of modeling this problem,
product sales are going to be considered the dependent variable, and the other four data sets
represent independent or predictive variables.

These five data sets, in thousands of dollars, are presented in the SAS printout shown in Figure
5.14. What the firm would like to know is, given a fixed budget of $350,000 for promoting
this service product, when offered again, how best should budget dollars be allocated in the
hope of maximizing future estimated months’ product sales? This is a typical question asked
of any product manager and marketing manager’s promotion efforts. Before the firm
allocates the budget, there is a need to understand how to estimate future product sales. This
requires understanding the behavior of product sales relative to sales promotion. To begin to
learn about the behavior of product sales to promotion efforts, we begin with the first step in
the BA process: descriptive analytics.

Figure 5.14 Data for marketing/planning case study

5.6.2 Descriptive Analytics Analysis

To begin conceptualizing possible relationships in the data, one might compute some
descriptive statistics and graph charts of data (which will end up being some of the variables
in the planned model). SAS can be used to compute these statistics and charts. The SAS
printout in Table 5.7 provides a typical set of basic descriptive statistics (means, ranges,
standard deviations, and so on) and several charts.

Table 5.7 SAS Descriptive Statistics for the Marketing/Planning Case Study

Remember, this is the beginning of an exploration that seeks to describe the data and get a
handle on what it may reveal. This effort may take some exploration to figure out the best
way to express data from a file or database, particularly as the size of the data file increases.

In this simple example, the data sets are small but can still reveal valuable information if
explored well.

In Figure 5.15, five typical SAS charts are presented. Respectively, these charts include a
histogram chart (sales), a block chart (radio), a line chart (TV), a pie chart (paper), and a 3D
chart (POS). These charts are interesting, but they’re not very revealing of behavior that helps
in understanding future sales trends that may be hiding in this data.

Figure 5.15 Preliminary SAS charts for the marketing/planning case study

To expedite the process of revealing potential relational information, think in terms of what
one is specifically seeking. In this instance, it is to predict the future sales of the service
product. That means looking for a graph to show a trend line. One type of simple graph that is related to trend analysis is a line chart. Using SAS again, one can compute line charts for
each of the five data sets. These charts are presented in Figure 5.16. The vertical axis consists
of the dollar values, and the horizontal axis is the number ordering of observations as listed in
the data sets.

Figure 5.16 Preliminary SAS line charts for the marketing/planning case study

While providing a less confusing graphic presentation of the up-and-down behavior of the
data, the charts in these figures still do not clearly reveal any possible trend information.
Because the 20 months of data are not in any particular order and are not related to time, they
are independent values that can be reordered. Reordering data or sorting it can be a part of the
descriptive analytics process. Because trend is usually an upward or downward linear
behavior, one might be able to observe a trend in the product sales data set if that data is
reordered from low to high (or high to low). Reordering the sales by moving the 20 rows of
data around such that sales is arranged from low to high is presented in Figure 5.17. Using
this reordered data set, the SAS results are illustrated in the new line charts in Figure 5.18.
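
Such reordering is a short job in SAS. The sketch below assumes the case study data are in a data set named PROMO with the product sales variable SALES:

  proc sort data=promo out=promo_sorted;
    by sales;                /* low-to-high reordering of product sales */
  run;

  data promo_sorted;
    set promo_sorted;
    order = _n_;             /* new plotting index after the sort */
  run;

  proc sgplot data=promo_sorted;
    series x=order y=sales;  /* line chart of the reordered sales */
  run;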

Figure 5.17 Reordered data for the marketing/planning case study

Figure 5.18 SAS line charts based on reordered data for the marketing/planning case
study

Given the low-to-high reordering of the product sales as a guide, some of the other four line
charts suggest a relationship with product sales. Both radio and TV commercials appear to
have a similar low to high trending relationship that matches product sales. This suggests
these two will be good predictive variables for product sales, whereas newspaper and POS
ads are still volatile in their charted relationships with product sales. Therefore, these two
latter variables might not be useful in a model seeking to predict product sales. They cannot
be ruled out at this point in the analysis, but they are suspected of adding little to a model for
accurately forecasting product sales. Put another way, they appear to add unneeded variation
that may take away from the accuracy of the model. Further analysis is called for to explore
in more detail and sophistication the best set of predictive variables to predict the
relationships in product sales.

In summary, for this case study, the descriptive analytics analysis has revealed a potential
relationship between radio and TV commercials and future product sales, and it questions the
relationship of newspaper and POS ads to sales. The managerial ramifications of these results
might suggest discontinuing investing in newspaper and POS ads and more productively
allocating funds to radio and TV commercials. Before such a reallocation can be justified,
more analysis is needed.

Unit 4 – Predictive Analytics

• Explain what logic-driven models are used for in business analytics (BA).
• Describe what a cause-and-effect diagram is used for in BA.
• Explain the difference between logic-driven and data-driven models.
• Explain how data mining can aid in BA.
• Explain why neural networks can be helpful in determining both associations and
classification tasks required in some BA analyses.
• Explain how clustering is undertaken in BA.
• Explain how step-wise regression can be useful in BA.
• Explain how to use R-Squared adjusted statistics in BA.

Introduction

In Chapter 1, “What Is Business Analytics?” we defined predictive analytics as an application of advanced statistical, information software, or operations research methods to identify
predictive variables and build predictive models to identify trends and relationships not
readily observed in the descriptive analytic analysis. Knowing that relationships exist
explains why one set of independent variables (predictive variables) influences dependent
variables like business performance. Chapter 1 further explained that the purpose of the
descriptive analytics step is to position decision makers to build predictive models designed
to identify and predict future trends.

Picture a situation in which big data files are available from a firm’s sales and customer
information (responses to differing types of advertisements, customer surveys on product
quality, customer surveys on supply chain performance, sale prices, and so on). Assume also
that a previous descriptive analytic analysis suggests that there is a relationship between
certain customer variables, but there is a need to precisely establish a quantitative relationship
between sales and customer behavior. Satisfying this need requires exploration into the big
data to first establish whether a measurable, quantitative relationship does in fact exist and
then develop a statistically valid model in which to predict future events. This is what the
predictive analytics step in BA seeks to achieve.

Many methods can be used in this step of the BA process. Some are just to sort or classify big
data into manageable files in which to later build a precise quantitative model. As previously
mentioned in Chapter 3, “What Resource Considerations Are Important to Support Business Analytics?” predictive modeling and analysis might consist of the use of methodologies,
including those found in forecasting, sampling and estimation, statistical inference, data
mining, and regression analysis. A commonly used methodology is multiple regression. (See
Appendixes A, “Statistical Tools,” and E, “Forecasting,” for a discussion on multiple
regression and ANOVA testing.) This methodology is ideal for establishing whether a
statistical relationship exists between the predictive variables found in the descriptive
analysis and the dependent variable one seeks to forecast. An example of its use will be
presented in the last section of this chapter.

Although single or multiple regression models can often be used to forecast a trend line into
the future, sometimes regression is not practical. In such cases, other forecasting methods,
such as exponential smoothing or smoothing averages, can be applied as predictive analytics
to develop needed forecasts of business activity. (See Appendix E.) Whatever methodology is
used, the identification of future trends or forecasts is the principal output of the predictive
analytics step in the BA process.
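
As a preview of the model fitting that closes this chapter, the sketch below shows how a stepwise multiple regression might be requested in SAS; the data set PROMO and its variable names are assumptions based on the case study that follows.

  proc reg data=promo;
    model sales = radio tv paper pos
          / selection=stepwise slentry=0.05 slstay=0.05;
  run;
  quit;
  /* SELECTION=STEPWISE adds and removes predictors according to the
     significance levels set by SLENTRY and SLSTAY. */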

6.2 Predictive Modeling

Predictive modeling means developing models that can be used to forecast or predict future
events. In business analytics, models can be developed based on logic or data.

6.2.1 Logic-Driven Models

A logic-driven model is one based on experience, knowledge, and logical relationships of variables and constants connected to the desired business performance outcome situation. The
question here is how to put variables and constants together to create a model that can predict
the future. Doing this requires business experience. Model building requires an understanding
of business systems and the relationships of variables and constants that seek to generate a
desirable business performance outcome. To help conceptualize the relationships inherent in
a business system, diagramming methods can be helpful. For example, the cause-and-effect
diagram is a visual aid diagram that permits a user to hypothesize relationships between
potential causes of an outcome (see Figure 6.1). This diagram lists potential causes in terms
of human, technology, policy, and process resources in an effort to establish some basic
relationships that impact business performance. The diagram is used by tracing contributing
and relational factors from the desired business performance goal back to possible causes,
thus allowing the user to better picture sources of potential causes that could affect the
performance. This diagram is sometimes referred to as a fishbone diagram because of its
appearance.

Figure 6.1 Cause-and-effect diagram*

Another useful diagram to conceptualize potential relationships with business performance variables is called the influence diagram. According to Evans (2013, pp. 228–229), influence
diagrams can be useful to conceptualize the relationships of variables in the development of
models. An example of an influence diagram is presented in Figure 6.2. It maps the
relationship of variables and a constant to the desired business performance outcome of
profit. From such a diagram, it is easy to convert the information into a quantitative model
with constants and variables that define profit in this situation:

Profit = Revenue − Cost, or

Profit = (Unit Price × Quantity Sold) − [(Fixed Cost) + (Variable Cost × Quantity Sold)], or

P = (UP × QS) − [FC + (VC × QS)]

Figure 6.2 An influence diagram
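
A worked instance of this model, with assumed values for the constants and variables, confirms the arithmetic:

  data _null_;
    up = 25;        /* unit price (assumed)             */
    qs = 1000;      /* quantity sold (assumed)          */
    fc = 5000;      /* fixed cost (assumed)             */
    vc = 12;        /* variable cost per unit (assumed) */
    profit = (up * qs) - (fc + vc * qs);  /* 25,000 - 17,000 = 8,000 */
    put profit=;
  run;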

The relationships in this simple example are based on fundamental business knowledge.
Consider, however, how complex cost functions might become without some idea of how
they are mapped together. It is necessary to be knowledgeable about the business systems
being modeled in order to capture the relevant business behavior. Cause-and-effect diagrams
and influence diagrams provide tools to conceptualize relationships, variables, and constants,
but it often takes many other methodologies to explore and develop predictive models.

6.2.2 Data-Driven Models

Logic-driven modeling is often used as a first step to establish relationships through data-
driven models (using data collected from many sources to quantitatively establish model
relationships). To avoid duplication of content and focus on conceptual material in the
chapters, we have relegated most of the computational aspects and some computer usage
content to the appendixes. In addition, some of the methodologies are illustrated in the case
problems presented in this book. Please refer to the Additional Information column in Table
6.1 to obtain further information on the use and application of the data-driven models.

Table 6.1 Data-Driven Models

6.3 Data Mining

As mentioned in Chapter 3, data mining is a discovery-driven software application process that provides insights into business data by finding hidden patterns and relationships in big or
small data and inferring rules from them to predict future behavior. These observed patterns
and rules guide decision-making. This is not just numbers, but text and social media
information from the Web. For example, Abrahams et al. (2013) developed a set of text-
mining rules that automobile manufacturers could use to distill or mine specific vehicle
component issues that emerge on the Web but take months to show up in complaints or other
damaging media. These rules cut through the mountainous data that exists on the Web and
are reported to provide marketing and competitive intelligence to manufacturers, distributors,
service centers, and suppliers. Identifying a product’s defects and quickly recalling or correcting the problem before customers experience a failure reduces customer dissatisfaction when problems occur.

6.3.1 A Simple Illustration of Data Mining

Suppose a grocery store has collected a big data file on what customers put into their baskets
at the market (the collection of grocery items a customer purchases at one time). The grocery
store would like to know if there are any associated items in a typical market basket. (For
example, if a customer purchases product A, she will most often associate it or purchase it
with product B.) If the customer generally purchases product A and B together, the store
might only need to advertise product A to gain both product A’s and B’s sales. The value of
knowing this association of products can improve the performance of the store by reducing
the need to spend money on advertising both products. The benefit is real if the association
holds true.

Finding the association and proving it to be valid require some analysis. From the descriptive
analytics analysis, some possible associations may have been uncovered, such as product A’s
and B’s association. With any size data file, the normal procedure in data mining would be to
divide the file into two parts. One is referred to as a training data set, and the other as a
validation data set. The training data set develops the association rules, and the validation
data set tests and proves that the rules work. Starting with the training data set, a common
data mining methodology is what-if analysis using logic-based software. SAS has a what-if
logic-based software application, and so do a number of other software vendors (see Chapter
3). These software applications allow logic expressions. (For example, if product A is
present, then is product B present?) The systems can also provide frequency and probability
information to show the strength of the association. These software systems have differing
capabilities, which permit users to deterministically simulate different scenarios to identify
complex combinations of associations between product purchases in a market basket.

Once a collection of possible associations is identified and their probabilities are computed,
the same logic associations (now considered association rules) are rerun using the validation
data set. A new set of probabilities can be computed, and those can be statistically compared
using hypothesis testing methods to determine their similarity. Other software systems
compute correlations for testing purposes to judge the strength and the direction of the
relationship. In other words, if the consumer buys product A first, it could be referred to as
the Head and product B as the Body of the association (Nisbet et al., 2009, p. 128). If the
same basic probabilities are statistically significant, it lends validity to the association rules
and their use for predicting market basket item purchases based on groupings of products.
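
A minimal sketch of the training/validation split and a simple association check follows; it assumes a BASKETS data set with 0/1 indicator variables PROD_A and PROD_B marking whether each product appears in a basket.

  /* Randomly mark 60% of baskets for training. */
  proc surveyselect data=baskets out=split outall samprate=0.6 seed=101;
  run;

  data train validate;
    set split;
    if selected then output train;   /* SELECTED is created by OUTALL */
    else output validate;
  run;

  /* Joint frequencies show how often products A and B appear together. */
  proc freq data=train;
    tables prod_a * prod_b;
  run;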

6.3.2 Data Mining Methodologies

Data mining is an ideal predictive analytics tool used in the BA process. We mentioned in
Chapter 3 different types of information that data mining can glean, and Table 6.2 lists a
small sampling of data mining methodologies to acquire different types of information. Some
of the same tools used in the descriptive analytics step are used in the predictive step but are employed to establish a model (either based on logical connections or quantitative formulas)
that may be useful in predicting the future.

Table 6.2 Types of Information and Data Mining Methodologies

Several computer-based methodologies listed in Table 6.2 are briefly introduced here. Neural
networks are used to find associations where connections between words or numbers can be
determined. Specifically, neural networks can take large volumes of data and potential
variables and explore variable associations to express a beginning variable (referred to as an
input layer), through middle layers of interacting variables, and finally to an ending variable
(referred to as an output). More than just identifying simple one-on-one associations, neural
networks link multiple association pathways through big data like a collection of nodes in a
network. These nodal relationships constitute a form of classifying groupings of variables as
related to one another, but even more, related in complex paths with multiple associations
(Nisbet et al., 2009, pp. 128–138). Differing software packages have a variety of association
network function capabilities. SAS offers a series of search engines that can identify
associations. SPSS has two versions of neural network software functions: Multilayer Perceptron (MLP)
and Radial Basis Function (RBF). Both procedures produce a predictive model for one or
more dependent variables based on the values of the predictive variables. Both allow a
decision maker to develop, train, and use the software to identify particular traits (such as bad
loan risks for a bank) based on characteristics from data collected on past customers.
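
As a rough illustration of what such a procedure does, the following Python sketch (scikit-learn, standing in for the SPSS MLP procedure) trains a small multilayer perceptron to flag bad loan risks. The applicant features and values are hypothetical.

# Sketch of an MLP classifier in the spirit of the loan-risk example.
# The loan data below is hypothetical.
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Features per past customer: [income in $1000s, years at job, debt in $1000s]
X_train = [[45, 2, 30], [80, 10, 5], [30, 1, 40], [95, 8, 10],
           [38, 3, 35], [70, 12, 8], [25, 1, 45], [88, 6, 12]]
y_train = [1, 0, 1, 0, 1, 0, 1, 0]  # 1 = bad loan risk, 0 = good risk

model = make_pipeline(
    StandardScaler(),  # neural networks train better on scaled inputs
    MLPClassifier(hidden_layer_sizes=(5,), max_iter=2000, random_state=0),
)
model.fit(X_train, y_train)

# Score a new applicant based on traits learned from past customers.
print(model.predict([[50, 4, 25]]))  # e.g., [1] means flagged as a bad risk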

Discriminant analysis is similar to a multiple regression model except that it permits
continuous independent variables and a categorical dependent variable. The analysis
generates a regression function whereby values of the independent variables can be
incorporated to generate a predicted value for the dependent variable. Similarly, logistic
regression is like multiple regression. Like discriminant analysis, its dependent variable can
be categorical. The independent variables in logistic regression can be either continuous or
categorical. For example, in predicting potential outsource providers, a firm might use a
logistic regression, in which the dependent variable would be to classify an outsource
provider as either rejected (represented by the value of the dependent variable being zero) or
acceptable (represented by the value of one for the dependent variable).
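
A minimal Python sketch of that outsource-provider example follows (scikit-learn is used for illustration; the provider scores are hypothetical).

# Sketch of classifying outsource providers with logistic regression.
# 1 = acceptable provider, 0 = rejected; the data is hypothetical.
from sklearn.linear_model import LogisticRegression

# Features per provider: [cost score, quality score, on-time delivery rate]
X = [[0.8, 0.9, 0.95], [0.4, 0.5, 0.60], [0.7, 0.8, 0.90],
     [0.3, 0.4, 0.55], [0.9, 0.7, 0.85], [0.2, 0.3, 0.50]]
y = [1, 0, 1, 0, 1, 0]

clf = LogisticRegression().fit(X, y)
print(clf.predict([[0.6, 0.7, 0.8]]))        # predicted class (0 or 1)
print(clf.predict_proba([[0.6, 0.7, 0.8]]))  # probability of each class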

Hierarchical clustering is a methodology that builds a hierarchy of clusters, so that items can
be grouped at different levels of the hierarchy. Two strategies are suggested for this
methodology: agglomerative and divisive. The agglomerative strategy is a bottom-up
approach, in which one starts with each item in the data and begins to group them. The
divisive strategy is a top-down approach, in which one starts with all the items in one group
and divides the group into clusters. How the clustering takes place can involve many
different types of algorithms and differing software applications. One method commonly
used is to employ a Euclidean distance formula: the square root of the sum of the squared
differences between two variables. Basically, the formula seeks to match up variable
candidates that have the least squared error differences. (In other words, they’re closer
together.)
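
The following Python sketch performs agglomerative (bottom-up) hierarchical clustering with Euclidean distance using SciPy; the two-dimensional points are hypothetical.

# Sketch of agglomerative hierarchical clustering with Euclidean distance.
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage

points = np.array([[1.0, 2.0], [1.2, 1.9], [5.0, 6.0],
                   [5.1, 5.8], [9.0, 1.0], [8.8, 1.2]])

# Ward's method merges the pair of clusters that least increases squared error.
tree = linkage(points, method="ward", metric="euclidean")

# Cut the hierarchy into three flat clusters.
print(fcluster(tree, t=3, criterion="maxclust"))  # e.g., [1 1 2 2 3 3]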

K-means clustering is a classification methodology that permits a set of data to be reclassified
into K groups, where K can be set as the number of groups desired. The algorithmic process
identifies initial candidates for the K groups and then iteratively searches other candidates
in the data set to be averaged into a mean value that represents a particular K group. The
initial K candidates are selected to be as far apart from one another as possible in the initial
run through the list. Each run or iteration through the data set allows the
software to select further candidates for each group.

The K-means clustering process provides a quick way to classify data into differentiated
groups. To illustrate this process, use the sales data in Figure 6.3 and assume these are sales
from individual customers. Suppose a company wants to classify the sales customers into
high and low sales groups.

Figure 6.3 Sales data for cluster classification problem

In SAS, K-means clustering is performed with PROC FASTCLUS (PROC CLUSTER
implements hierarchical clustering). Any integer value can designate the K number of
clusters desired. In this problem set, K=2. The SAS printout of this classification process is
shown in Table 6.3. The Initial Cluster Centers table lists the initial high (20167) and low
(12369) values from the data set as the clustering process begins. As it turns out, the software
divided the customers into 9 high sales customers and 11 low sales customers.

Table 6.3 SAS K-Means Cluster Solution

Consider how large big data sets can be. Then realize this kind of classification capability can
be a useful tool for identifying and predicting sales based on the mean values.
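
For readers who want to reproduce this kind of two-group classification outside SAS, a minimal Python sketch follows; the sales figures are hypothetical stand-ins for the values in Figure 6.3.

# Sketch of a two-group (K=2) clustering of customer sales with scikit-learn.
import numpy as np
from sklearn.cluster import KMeans

sales = np.array([12369, 13100, 12800, 19500, 20167, 13400, 19800, 12950,
                  20050, 13250, 19650, 12700, 19900, 13050, 12600, 19700,
                  20100, 12850, 19600, 13150]).reshape(-1, 1)

km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(sales)
print(km.cluster_centers_.ravel())  # mean sales of the low and high groups
print(km.labels_)                   # which group each customer falls into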

There are so many BA methodologies that no single section, chapter, or even book can
explain or contain them all. The analytic treatment and computer usage in this chapter have
been focused mainly on conceptual use. For a more applied use of some of these
methodologies, note the case study that follows and some of the content in the appendixes.

6.4 Continuation of Marketing/Planning Case Study Example: Predictive Analytics Step in the BA Process

In the last sections of Chapters 5, 6, and 7, an ongoing marketing/planning case study of the
relevant BA step discussed in those chapters is presented to illustrate some of the tools and
strategies used in a BA problem analysis. This is the second installment of the case study
dealing with the predictive analytics analysis step in BA. The prescriptive analysis step
coming in Chapter 7, “What Is Prescriptive Analytics?” will complete the ongoing case
study.

6.4.1 Case Study Background Review

The case study firm had collected a random sample of monthly sales information, presented
in Figure 6.4 in thousands of dollars. What the firm wants to know is, given a fixed budget of
$350,000 for promoting this service product when it is offered again, how best should the
company allocate the budget dollars in hopes of maximizing the future estimated month’s
product sales? Before the firm makes any allocation of budget, there is a need to understand
how to estimate future product sales. This requires understanding the behavior of product
sales relative to sales promotion efforts using radio, newspaper, TV, and point-of-sale (POS)
ads.

Figure 6.4 Data for marketing/planning case study

The previous descriptive analytics analysis in Chapter 5 revealed a potentially strong
relationship between radio and TV commercials that might be useful in predicting future
product sales. The analysis also revealed little regarding the relationship of newspaper and
POS ads to product sales. So, although radio and TV commercials are most promising, a
more in-depth predictive analytics analysis is called for to accurately measure and document
the degree of relationship that may exist in the variables and to determine the best predictors
of product sales.

6.4.2 Predictive Analytics Analysis

An ideal multiple-variable modeling approach that can be used in this situation to explore
variable importance, and that can eventually lead to the development of a predictive model
for product sales, is correlation combined with multiple regression. We will use SAS’s
statistical package to compute the statistics in this step of the BA process.

First, we must consider the four independent variables—radio, TV, newspaper, POS—before
developing the model. One way to see the statistical direction of the relationship (which is
better than just comparing graphic charts) is to compute the Pearson correlation coefficients r
between each of the independent variables with the dependent variable (product sales). The
SAS correlation coefficients and their levels of significance are presented in Table 6.4. The
larger the Pearson correlation (regardless of the sign) and the smaller the significance test
values (these are t-tests measuring the significance of the Pearson r value; see Appendix A),
the more significant the relationship. Both radio and TV have statistically significant
correlations, whereas at a 0.05 level of significance, newspaper and POS are not statistically
significant.

Table 6.4 SAS Pearson Correlation Coefficients: Marketing/Planning Case Study
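
A minimal Python sketch of the same computation follows, with hypothetical monthly figures standing in for the case study data.

# Sketch of computing Pearson r and its p-value for each promotion medium.
from scipy.stats import pearsonr

sales = [1000, 1500, 1200, 1800, 2100, 1700, 2500, 2300]
media = {
    "radio":     [40, 60, 50, 75, 90, 70, 110, 100],
    "tv":        [100, 140, 120, 170, 200, 160, 240, 220],
    "newspaper": [30, 25, 35, 20, 28, 33, 22, 27],
    "pos":       [10, 12, 9, 14, 11, 13, 10, 12],
}

for name, spend in media.items():
    r, p = pearsonr(spend, sales)
    # A large |r| with a small p-value indicates a significant relationship.
    print(f"{name}: r={r:+.3f}, p={p:.4f}")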

Although it can be argued that the positive or negative correlation coefficients should not
automatically discount any variable from what will be a predictive model, the negative
correlation of newspapers suggests that as a firm increases investment in newspaper ads, it
will decrease product sales. This does not make sense in this case study. Given the illogic of
such a relationship, its potential use as an independent variable in a model is questionable.
Also, this negative correlation poses several questions that should be considered. Was the
data set correctly collected? Is the data set accurate? Was the sample large enough to have
included enough data for this variable to show a positive relationship? Should it be included
for further analysis? Although it is possible that a negative relationship can statistically show
up like this, it does not make sense in this case. Based on this reasoning and the fact that the
correlation is not statistically significant, this variable (newspaper ads) will be removed from
further consideration in this exploratory analysis to develop a predictive model.

Some researchers might also exclude POS based on the insignificance (p=0.479) of its
relationship with product sales. However, for purposes of illustration, continue to consider it
a candidate for model inclusion. Also, the other two independent variables (radio and TV)
were found to be significantly related to product sales, as reflected in the correlation
coefficients in the tables.

At this point, there is a dependent variable (product sales) and three candidate independent
variables (POS, TV, and radio) with which to establish a predictive model that can show the
relationship between product sales and those independent variables. Just as a line chart was
employed to reveal the behavior of product sales and the other variables in the descriptive
analytic step, a statistical method can establish a linear model that combines the three
predictive variables. We will use multiple regression, which can incorporate any of the
multiple independent variables, to establish a relational model for product sales in this case
study. Multiple regression also can be used to continue our exploration of the candidacy of
the three independent variables.

The procedure by which multiple regression can be used to evaluate which independent
variables are best to include or exclude in a linear model is called step-wise multiple
regression. It is based on an evaluation of regression models and their validation statistics,
specifically the multiple correlation coefficients and the F-ratio from an ANOVA. SAS
software and many other statistical systems build in the step-wise process, some as backward
step-wise regression (backward selection) and some as forward step-wise regression
(forward selection). The backward step-wise regression starts with all the independent
variables placed in the model, and the step-wise process removes them one at a time, worst
predictors first, until a statistically significant model emerges. The forward step-wise
regression starts with the best-related variable (using correlation analysis as a guide) and then
step-wise adds other variables until adding more will no longer improve the accuracy of the
model. The forward step-wise regression process will be illustrated here manually. The first
step is to generate individual regression models and statistics for each independent variable
with the dependent variable, one at a time. These three SAS models are presented in Tables
6.5, 6.6, and 6.7 for the POS, radio, and TV variables, respectively.
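
This first pass can be sketched in Python with statsmodels, again with hypothetical data in place of the case study file.

# Sketch of the first forward step: one simple regression per candidate
# variable, comparing R-Square and the F-ratio from its ANOVA.
import numpy as np
import statsmodels.api as sm

sales = np.array([1000, 1500, 1200, 1800, 2100, 1700, 2500, 2300])
candidates = {
    "radio": np.array([40, 60, 50, 75, 90, 70, 110, 100]),
    "tv":    np.array([100, 140, 120, 170, 200, 160, 240, 220]),
    "pos":   np.array([10, 12, 9, 14, 11, 13, 10, 12]),
}

for name, x in candidates.items():
    model = sm.OLS(sales, sm.add_constant(x)).fit()
    print(f"{name}: R2={model.rsquared:.4f}, "
          f"F={model.fvalue:.2f}, p={model.f_pvalue:.4f}")

# The best single predictor enters the model first; two- and three-variable
# combinations are then fit and compared the same way.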

Table 6.5 SAS POS Regression Model: Marketing/Planning Case Study

Table 6.6 SAS Radio Regression Model: Marketing/Planning Case Study

Table 6.7 SAS TV Regression Model: Marketing/Planning Case Study

The computer printouts in the tables provide a variety of statistics for comparative purposes.
Discussion will be limited here to just a few. The R-Square statistic is a precise proportional
measure of the variation in the dependent variable that is explained by the independent
variable’s behavior. The closer the R-Square is to 1.00, the more of the variation is explained,
and the better the predictive variable. The three variables’ R-Squares are 0.0002 (POS),
0.9548 (radio), and 0.9177 (TV). Clearly, radio is the best predictor variable of the three,
followed by TV and, with almost no relationship, POS. This latter result was expected based
on the prior Pearson correlation. The TV value means that only 8.23 percent
(1.000 − 0.9177 = 0.0823) of the variation in product sales is left unexplained by TV commercials.

From ANOVA, the F-ratio statistic is useful in actually comparing the regression model’s
capability to predict the dependent variable. As R-Square increases, so does the F-ratio
because of the way in which they are computed and what is measured by both. The larger the
F-ratio (like the R-Square statistic), the greater the statistical significance in explaining the
variable’s relationships. The three variables’ F-ratios from the ANOVA tables are 0.00
(POS), 380.22 (radio), and 200.73 (TV). Both radio and TV are statistically significant, but
POS has an insignificant relationship. To give some idea of how significant the relationships
are, assuming a level of significance where α=0.01, one would only need the F-ratio to
exceed a cut-off value of 8.10 to be designated significant. Not exceeding that F-ratio (as in
the case of POS at 0.00) is the same as saying that the coefficient in the regression model for
POS is no different from a value of zero (no contribution to product sales). Clearly, the independent
variables radio and TV appear to have strong relationships with the dependent variable. The
question is whether the two combined or even three variables might provide a more accurate
forecasting model than just using the one best variable like radio.

Continuing with the step-wise multiple regression procedure, we next determine the possible
combinations of variables to see if a particular combination is better than the single variable
models computed previously. To measure this, we have to determine the possible
combinations for the variables and compute their regression models. The combinations are
(1) POS and radio; (2) POS and TV; (3) POS, radio, and TV; and (4) radio and TV.

The resulting regression model statistics are summarized and presented in Table 6.8. If one is
to base the selection decision solely on the R-Square statistic, there is a tie between the
POS/radio/TV and the radio/TV combination (0.979 R-Square values). If the decision is
based solely on the F-ratio value from ANOVA, one would select just the radio/TV
combination, which one might expect of the two most significantly correlated variables.

Table 6.8 SAS Variable Combinations and Regression Model Statistics: Marketing/Planning Case Study

To aid in supporting a final decision and to ensure these analytics are the best possible
estimates, we can consider an additional statistic. That tie breaker is the R-Squared
(Adjusted) statistic, which is commonly used in multiple regression models.

The R-Square Adjusted statistic does not have the same interpretation as R-Square (a precise,
proportional measure of variation in the relationship). It is instead a comparative measure of
suitability of alternative independent variables. It is ideal for selection between independent
variables in a multiple regression model. The R-Square adjusted seeks to take into account
the phenomenon of the R-Square automatically increasing when additional independent
variables are added to the model. This phenomenon is like a painter putting paint on a canvas,
where more paint additively increases the value of the painting. Yet by continually adding
paint, there comes a point at which some paint covers other paint, diminishing the value of
the original. Similarly, statistically adding more variables should increase the ability of the
model to capture what it seeks to model. On the other hand, putting in too many variables,
some of which may be poor predictors, might bring down the total predictive ability of the
model. The R-Square adjusted statistic provides some information to aid in revealing this
behavior.

The value of the R-Square adjusted statistic can be negative, but it will always be less than or
equal to that of the R-Square to which it is related. Unlike R-Square, the R-Square adjusted
increases when a new independent variable is included only if the new variable improves the
R-Square more than would be expected in the absence of any independent value being added.
If a set of independent variables is introduced into a regression model one at a time in
forward step-wise regression using the highest correlations ordered first, the R-Square
adjusted statistic will end up being equal to or less than the R-Square value of the original
model. By systematic experimentation with the R-Square adjusted recomputed for each added
variable or combination, the value of the R-Square adjusted will reach a maximum and then
decrease. The multiple regression model with the largest R-Square adjusted statistic will be
the most accurate combination of having the best fit without excessive or unnecessary
independent variables. Again, just putting all the variables into a model may add unneeded
variability, which can decrease its accuracy. Thinning out the variables is important.
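
For reference, the standard computation behind this statistic, where n is the number of observations and k is the number of independent variables in the model, is the following (a textbook formula, not taken from the SAS output):

R-Square adjusted = 1 − (1 − R-Square) × (n − 1) / (n − k − 1)

The (n − 1)/(n − k − 1) factor is what penalizes each added independent variable, so a weak predictor can lower the adjusted value even as the raw R-Square rises.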

Finally, in the step-wise multiple regression procedure, a final decision on the variables to be
included in the model is needed. Basing the decision on the R-Square adjusted, the best
combination is radio/TV. The SAS multiple regression model and support statistics are
presented in Table 6.9.

Table 6.9 SAS Best Variable Combination Regression Model and Statistics:
Marketing/Planning Case Study

Although there are many other additional analyses that could be performed to validate this
model, we will use the SAS multiple regression model in Table 6.9 for the firm in this case
study. The forecasting model can be expressed as follows:

Yp = −17150 + 275.69065 X1 + 48.34057 X2

where:

Yp = the estimated number of dollars of product sales

X1 = the number of dollars to invest in radio commercials

X2 = the number of dollars to invest in TV commercials

Because all the data used in the model is expressed as dollars, the interpretation of the model
is made easier than using more complex data. The interpretation of the multiple regression
model suggests that for every dollar allocated to radio commercials (represented by X1), the
firm will receive $275.69 in product sales (represented by Yp in the model). Likewise, for
every dollar allocated to TV commercials (represented by X2), the firm will receive $48.34 in
product sales.
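
As a trivial illustration, the fitted equation can be used directly as a forecasting function; the budget split plugged in below is hypothetical.

# Sketch evaluating the forecasting model from Table 6.9.
def predicted_sales(radio_dollars, tv_dollars):
    """Yp = -17150 + 275.69065*X1 + 48.34057*X2, in dollars of product sales."""
    return -17150 + 275.69065 * radio_dollars + 48.34057 * tv_dollars

print(predicted_sales(100000, 200000))  # about $37.2 million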

A caution should be mentioned on the results of this case study. Many factors might
challenge a result, particularly one derived from using powerful and complex methodologies
like multiple regression. As such, the results may not occur as estimated, because the model
reflects only past performance, which the future may not repeat. What is being suggested
here is that more analysis can always be performed in questionable situations. Also,
additional analysis to confirm a result should be undertaken to strengthen the trust that others
must have in the results to achieve the predicted higher levels of business performance.

In summary, for this case study, the predictive analytics analysis has revealed a more
detailed, quantifiable relationship between the generation of product sales and the sources of
promotion that best predict sales. The best way to allocate the $350,000 budget to maximize
product sales might involve placing the entire budget into radio commercials because they
give the best return per dollar of budget. Unfortunately, there are constraints and limitations
regarding what can be allocated to the different types of promotional methods. Optimizing
the allocation of a resource and maximizing business performance necessitate the use of
special business analytic methods designed to accomplish this task. This requires the
additional step of prescriptive analytics analysis in the BA process, which will be presented in
the last section of Chapter 7.

Unit 5 – Prescriptive Analytics

Unit objectives:

List and describe the commonly used prescriptive analytics in the business analytics (BA)
process.

Explain the role of case studies in prescriptive analytics.

Explain how fitting a curve can be used in prescriptive analytics.

Explain how to formulate a linear programming model.

Explain the value of linear programming in the prescriptive analytics step of BA.

Introduction

After undertaking the descriptive and predictive analytics steps in the BA process, one should
be positioned to undertake the final step: prescriptive analytics analysis. The prior analysis
should provide a forecast or prediction of what future trends in the business may hold. For
example, there may be significant statistical measures of increased (or decreased) sales,
profitability trends accurately measured in dollars for new market opportunities, or measured
cost savings from a future joint venture.

If a firm knows where the future lies by forecasting trends, it can best plan to take advantage
of possible opportunities that the trends may offer. Step 3 of the BA process, prescriptive
analytics, involves the application of decision science, management science, or operations
research methodologies to make best use of allocable resources. These are mathematically
based methodologies and algorithms designed to take variables and other parameters into a
quantitative framework and generate an optimal or near-optimal solution to complex
problems. These methodologies can be used to optimally allocate a firm’s limited resources
to take best advantage of the opportunities it has found in the predicted future trends. Limits
on human, technology, and financial resources prevent any firm from going after all the
opportunities. Using prescriptive analytics allows the firm to allocate limited resources to
optimally or near-optimally achieve the objectives as fully as possible.

In Chapter 3, “What Resource Considerations Are Important to Support Business Analytics?”
the relationships of methodologies to the BA process were expressed as a function of
certification exam content. The listing of the prescriptive analytic methodologies as they are,
in some cases, utilized in the BA process is again presented in Figure 7.1 to form the basis of
this chapter’s content.

Figure 7.1 Prescriptive analytic methodologies

7.2 Prescriptive Modeling

The listing of prescriptive analytic methods and models in Figure 7.1 is but a small grouping
of many operations research, decision science, and management science methodologies that
are applied in this step of the BA process. Most of the methodologies in Table 7.1 are
explained throughout this book. (See the Additional Information Column in Table 7.1.)

Table 7.1 Select Prescriptive Analytic Models

7.3 Nonlinear Optimization

The prescriptive methodologies in Table 7.1 are explained in detail in the referenced chapters
and appendixes, but nonlinear optimization will be discussed here. When business
performance cost or profit functions become too complex for simple linear models to be
useful, exploration of nonlinear functions is a standard practice in BA. Although the
predictive nature of exploring for a mathematical expression to denote a trend or establish a
forecast falls mainly in the predictive analytics step of BA, the use of the nonlinear function
to optimize a decision can fall in the prescriptive analytics step.

As mentioned previously, there are many mathematical programming nonlinear
methodologies and solution procedures designed to generate optimal business performance
solutions. Most of them require careful estimation of parameters that may or may not be
accurate, particularly given the precision required of a solution that can be so precariously
dependent upon parameter accuracy. This precision is further complicated in BA by the large
data files that should be factored into the model-building effort.

To overcome these limitations and be more inclusive in the use of large data, we can apply
regression software. As illustrated in Appendix E, curve-fitting software can be used to
generate predictive analytic models that can also be utilized to aid in making prescriptive
analytic decisions.

For purposes of illustration, SAS’s software will be used to fit data to curves in this chapter.
Suppose that a resource allocation decision is being faced whereby one must decide how
many computer servers a service facility should purchase to optimize the firm’s costs of
running the facility. The firm’s predictive analytics effort has shown a growth trend. A new
facility is called for if costs can be minimized. The firm has a history of setting up large and
small service facilities and has collected the 20 data points in Figure 7.2. Whether there are
20 or 20,000 items in the data file, SAS can be used to fit data based on regression
mathematics to a nonlinear line that best minimizes the distance from the data items to the
line. The software then converts the line into a mathematical expression useful for
forecasting.

Figure 7.2 Data for SAS curve fitting

In this server problem, the basic data has a u-shaped function, as presented in Figure 7.3. This
is a classic shape for most cost functions in business. In this problem, it represents the
balancing of having too few servers (resulting in a costly loss of customer business through
dissatisfaction and complaints with the service) or too many servers (excessive waste in
investment costs because of underutilized servers). Although this is an overly simplified
example with little and nicely ordered data for clarity purposes, in big data situations, cost
functions are considerably less obvious.

Figure 7.3 Server problem basic data cost function

The first step in curve fitting is to generate the best-fitting curve to the data. Using SAS and
the data in Figure 7.2, the regression process seeks to minimize the distance from the data
points by fitting a line for each of eight candidate regression models (graphed in Figure 7.4).
Doing this in SAS requires the selection of a set of functions that the analyst believes might
be a good fit. The number of regression functions selected can be flexible, and SAS offers a
wide number of possible regression models to choose from. The result is a series of
regression models and statistics, including ANOVA and other testing statistics. It is known
from the previous illustration of regression that the adjusted R-Square statistic can reveal the
best estimated relationship between the independent (number of servers) and dependent
(total cost) variables. These statistics are presented in Table 7.2. The best adjusted R-Square
value (the largest) occurs with the quadratic model, followed by the cubic model. The more
detailed supporting statistics for both of these models are presented in Table 7.3. The graph
of all the SAS curve-fitting models appears in Figure 7.4.

Table 7.2 Adjusted R-Square Values of All SAS Models

Table 7.3 Quadratic and Cubic Model SAS Statistics

Figure 7.4 Graph of all SAS curve-fitting models

From Table 7.3, the resulting two statistically significant curve-fitted models follow:

Yp = 35418 − 5589.432 X + 268.445 X2 [Quadratic model]

Yp = 36134 − 5954.738 X + 310.895 X2 − 1.347 X3 [Cubic model]

where:

Yp = the forecasted or predicted total cost

X = the number of computer servers
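
For readers working outside SAS, the same fitting step can be sketched with NumPy's polynomial regression. The (servers, cost) values below are hypothetical stand-ins for Figure 7.2, except for the two lowest costs ($4,533 at 9 servers and $4,678 at 10), which come from the text.

# Sketch of quadratic and cubic curve fitting with NumPy.
import numpy as np

servers = np.arange(1, 21)
cost = np.array([30000, 24500, 19800, 16000, 12900, 10500, 8700, 6200,
                 4533, 4678, 7100, 8200, 9900, 12100, 14800, 18000,
                 21700, 25900, 30600, 35800])

quad = np.polyfit(servers, cost, deg=2)   # coefficients [a2, a1, a0]
cubic = np.polyfit(servers, cost, deg=3)  # coefficients [a3, a2, a1, a0]

quad_model = np.poly1d(quad)  # a callable model Yp(X)
print(quad_model(10))         # predicted total cost at 10 servers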

For purposes of illustration, we will use the quadratic model. In the next step of using the
curve-fitted models, one can either use calculus to derive the cost minimizing value for X
(number of servers) or perform a deterministic simulation where values of X are substituted
into the model to compute and predict the total cost (Yp). The calculus-based approach is
presented in the “Addendum” section of this chapter.

As a simpler solution method to finding the optimal number of servers, simulation can be
used. Representing a deterministic simulation (see Appendix F, Section F.2.1), the resulting
costs of servers can be computed using the quadratic model, as presented in Figure 7.5. These
values were computed by plugging the number of server values (1 to 20) into the Yp
quadratic function one at a time to generate the predicted values for each of the server
possibilities. Note that the lowest value in these predicted values occurs with the acquisition
of 10 servers at $6367.952, and the next lowest is at 11 servers at $6415.865. In the actual
data in Figure 7.2, the minimum total cost point occurs at 9 servers at $4533, whereas the
next lowest total cost is $4678 occurring at 10 servers. The differences are due to the
estimation process of curve fitting. Note in Figure 7.3 that the curve that is fitted does not
touch the lowest 5 cost values. Like regression in general, it is an estimation process, and
although the ANOVA statistics in the quadratic model demonstrate a strong relationship with
the actual values, there is some error. This process provides a near-optimal solution but does
not guarantee one.
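
The same deterministic simulation takes only a few lines of Python using the quadratic coefficients above.

# Sketch of the deterministic simulation: plug server counts 1..20 into the
# fitted quadratic model and keep the count with the lowest predicted cost.
def yp(x):
    """Quadratic model from Table 7.3: Yp = 35418 - 5589.432*X + 268.445*X^2."""
    return 35418 - 5589.432 * x + 268.445 * x * x

costs = {x: yp(x) for x in range(1, 21)}
best = min(costs, key=costs.get)
print(best, round(costs[best], 2))
# Prints 10 servers at about 6368.2; Figure 7.5's 6367.952 reflects the
# fuller-precision coefficients carried internally by SAS.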

Figure 7.5 Predicted total cost in server problem for each server alternative

Like all regression models, curve fitting is an estimation process with risks, but the
supporting statistics, like ANOVA, provide some degree of confidence in the resulting
solution.

Finally, it must be mentioned that many other nonlinear optimization methodologies exist.
Some, like quadratic programming, are considered constrained optimization models (like
LP). These topics are beyond the scope of this book. For additional information on nonlinear
programming, see King and Wallace (2013), Betts (2009), and Williams (2013). Other
methodologies, like the use of calculus in this chapter, are useful in solving for optimal
solutions in unconstrained problem settings. For additional information on calculus methods,
see Spillers and MacBain (2009), Luptacik (2010), and Kwak and Schneiderman (1987).

7.4 Continuation of Marketing/Planning Case Study Example: Prescriptive Step in the BA Analysis

In Chapter 5, “What Is Descriptive Analytics?” and Chapter 6, “What Is Predictive
Analytics?” an ongoing marketing/planning case study was presented to illustrate some of the
tools and strategies used in a BA problem analysis. This is the third and final installment of
the case study dealing with the prescriptive analytics step in BA.

7.4.1 Case Background Review

The predictive analytics analysis in Chapter 6 revealed a statistically strong relationship
between radio and TV commercials and product sales that might be useful in predicting
future product sales. The ramifications of these results suggest a better allocation of funds
away from newspaper and POS ads to radio and TV commercials. Determining how much of
the $350,000 budget should be allocated between the two types of commercials requires the
application of an optimization decision-making methodology.

7.4.2 Prescriptive Analysis

The problem of allocating the budget to purchase radio and TV commercials is a
multivariable (there are two media to consider), constrained (there are some limitations on
how one can allocate the budget funds) optimization problem (BA always seeks to optimize
business performance). Many optimization methods could be employed to determine a
solution to this problem. Considering the singular objective of maximizing estimated product
sales, linear programming (LP) is an ideal methodology to apply in this situation. To employ
LP to model this problem, use the five-step LP formulation procedure explained in Appendix
B.

7.4.2.1 Formulation of LP Marketing/Planning Model

In the process of exploring the allocation options, a number of limitations or constraints on
placing radio and TV commercials were observed. The total budget for all the commercials
was set at a maximum of $350,000 for the next monthly campaign. To receive the radio
commercial price discount requires a minimum budget investment of $15,000. To receive the
TV commercial price discount requires a minimum budget investment of $75,000. Because
the radio and TV stations are owned by the same corporation, there is an agreement that for
every dollar of radio commercials purchased, the client firm must purchase $2 in TV
commercials. Given these limitations and the modeled relationship found in the previous
predictive analysis, one can formulate the budget allocation decision as an LP model using
the five-step LP formulation procedure (see Appendix B, Section B.4.1):

1. Determine the type of problem—This problem seeks to maximize dollar product sales by
determining how to allocate budget dollars over radio and TV commercials. For each dollar
of radio commercials estimated with the regression model, $275.691 will be received, and for
each dollar of TV commercials, $48.341 will be received. Those two parameters are the
product sales values to maximize. Therefore, it will be a maximization model.

2. Define the decision variables—The decision variables for the LP model are derived from
the multiple regression model’s independent variables. The only adjustment is the monthly
timeliness of the allocation of the budget:

X1 = the number of dollars to invest in radio commercials for the next monthly campaign

X2 = the number of dollars to invest in TV commercials for the next monthly campaign

3. Formulate the objective function—Because the multiple regression model defines the
dollar sales as a linear function with the two independent variables, the same dollar
coefficients from the regression model can be used as the contribution coefficients in the
objective function. This results in the following LP model objective function:

Maximize: Z = 275.691 X1 + 48.341 X2

4. Formulate the constraints—Given the information on the limitations in this problem, there
are four constraints:

Constraint 1—No more than $350,000 is allowed for the total budget to allocate to both radio
(X1) and TV (X2) commercials. So, add X1 + X2 and set it less than or equal to 350,000 to
formulate the first constraint as follows:

X1 + X2 ≤ 350000

Constraint 2—To get a discount on radio (X1) commercials, the firm must allocate a
minimum of $15,000 to radio. The constraint for this limitation follows:

X1 ≥ 15000

Constraint 3—Similar to Constraint 2, to get a discount on TV (X2) commercials, the firm
must allocate a minimum of $75,000 to TV. The constraint for this limitation follows:

X2 ≥ 75000

Constraint 4—This is a blending problem constraint (see Appendix B, Section B.6.3). What
is needed is to express the relationship as follows:

X1 / X2 = 1 / 2

which is to say, for each one unit of X1, one must acquire two units of X2. Said differently,
the ratio of one unit of X1 to two units of X2 must be maintained. Given the expression, use
algebra to cross-multiply such that

2 X1 = X2

Convert it into an acceptable constraint with a constant on the right side and the variables on
the left side as follows:

2 X1 − X2 = 0

5. State the nonnegativity and given requirements—With only two variables, this formal
requirement in the formulation of an LP model is expressed as follows:

X1, X2 ≥ 0

Because these variables are in dollars, they do not have to be integer values. (They can be
any nonnegative real number.) The complete LP model formulation is given here:

Maximize: Z = 275.691 X1 + 48.341 X2

Subject to:

X1 + X2 ≤ 350000

X1 ≥ 15000

X2 ≥ 75000

2 X1 − X2 = 0

and

X1, X2 ≥ 0

7.4.2.2 Solution for the LP Marketing/Planning Model

Appendix B explains that both Excel and LINGO software can be used to run the LP model
and solve the budget allocation in this marketing/planning case study problem. For purposes
of brevity, discussion will be limited to just LINGO. As will be presented in Appendix B,
LINGO is a mathematical programming language and software system. It allows the fairly
simple statement of the LP model to be entered into a single window and run to generate LP
solutions.

LINGO opens with a blank window for entering whatever type of model is desired. After the
LP model formulation is entered into the LINGO software, the resulting data entry
information is presented in Figure 7.6.

Figure 7.6 LINGO LP model entry requirements: marketing/planning case study

There are several minor differences in the model entry requirements over the usual LP model
formulation. These differences are required to run a model in LINGO. These include (1)
using the term “Max” instead of “Maximize,” (2) dropping off “Subject to” and “and” in the
model formulation, (3) placing an asterisk and a space between unknowns and constant
values in the objective and constraint functions where multiplication is required, (4) ending
each expression with a semicolon, and (5) omitting the nonnegativity requirements, which
aren’t necessary.

Now that the model is entered into LINGO, a single click on the SOLVE option in the bar at
the top of the window generates a solution. The marketing budget allocation LP model
solution is found in Figure 7.7.

Figure 7.7 LINGO LP model solution: marketing/planning case study

As it turns out, the optimal distribution of the $350,000 promotion budget is to allocate
$116,666.70 to radio commercials and $233,333.30 to TV commercials. The resulting Z
value, which in this model is the total predicted product sales in dollars, is 0.4344352E+08,
or $43,443,524. When we compare that future estimated month’s product sales with the
average current monthly product sales of $16,717,200 presented in Figure 7.7, it does appear
that the firm in this case study will optimally maximize future estimated monthly product
sales if it allocates the budget accordingly (that is, if the multiple regression model estimates
and the other parameters in the LP model hold accurate and true).
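
For readers without LINGO, the same LP can be checked with open-source tools. The sketch below uses SciPy's linprog; because linprog minimizes, the maximization objective is negated.

# Sketch solving the marketing budget LP with SciPy instead of LINGO.
from scipy.optimize import linprog

c = [-275.691, -48.341]                  # maximize 275.691*X1 + 48.341*X2
A_ub, b_ub = [[1, 1]], [350000]          # X1 + X2 <= 350000
A_eq, b_eq = [[2, -1]], [0]              # 2*X1 - X2 = 0
bounds = [(15000, None), (75000, None)]  # discount minimums on X1 and X2

res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, bounds=bounds)
print(res.x)     # approximately [116666.67, 233333.33]
print(-res.fun)  # approximately 43.44 million, matching the LINGO Z value
                 # within rounding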

In summary, the prescriptive analytics analysis step brings the prior statistical analytic steps
into an applied decision-making process where a potential business performance
improvement is shown to better this organization’s ability to use its resources more
effectively. The management job of monitoring performance and checking to see that
business performance is in fact improved is a needed final step in the BA analysis. Without
proof that business performance is improved, it’s unlikely that BA would continue to be used.

7.4.2.3 Final Comment on the Marketing/Planning Model

Although the LP solution methodology used to generate an allocation solution guarantees an
optimal LP solution, it does not guarantee that the firm using this model’s solution will
achieve the results suggested in the analysis. Like any forecasting estimation process, the
numbers are only predictions, not assurances of outcomes. The high levels of significance in
the statistical analysis and the added use of other conformational statistics (R-Square,
adjusted R-Square, ANOVA, and so on) in the model development provide some assurance
of predictive validity. There are many other methods and approaches that could have been
used in this case study. Learning how to use more statistical and decision science tools helps
ensure a better solution in the final analysis.

Summary

This chapter discussed the prescriptive analytics step in the BA process. Specifically, this
chapter revisited and briefly discussed methodologies suggested in BA certification exams.
An illustration of nonlinear optimization was presented to demonstrate how the combination
of software and mathematics can generate useful decision-making information. Finally, this
chapter presented the third installment of a marketing/planning case study illustrating how
prescriptive analytics can benefit the BA process.

We end this book with a final application of the BA process. Once again, several of the
appendixes are designed to augment this chapter’s content by including technical,
mathematical, and statistical tools. For both a greater understanding of the methodologies
discussed in this chapter and a basic review of statistical and other quantitative methods, a
review of the appendixes and chapters is recommended.

Addendum

The differential calculus method for finding the minimum cost point on the quadratic
function that follows involves a couple of steps. It finds the zero-slope point on the cost
function (the point at the bottom of the u-shaped curve where a line could be drawn that
would have a zero slope). There are limitations to its use, and qualifying conditions are
required to prove minimum or maximum positions on a curve. The quadratic model in the
server problem follows:

Yp = 35418 − 5589.432 X + 268.445 X2 [Quadratic model]

Step 1. Given the quadratic function above, take its first derivative.

d(Yp)/dX = –5589.432 + 536.89 X

Step 2. Set the derivative function equal to zero and solve for X.

0 = –5589.432 + 536.89 X

X = 10.410758
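
The same two steps can be checked symbolically in Python with SymPy.

# Sketch of the calculus steps: differentiate the quadratic cost model,
# set the derivative to zero, and solve for X.
import sympy as sp

X = sp.symbols("X")
Yp = 35418 - 5589.432 * X + 268.445 * X**2

dYp = sp.diff(Yp, X)     # -5589.432 + 536.89*X
print(sp.solve(dYp, X))  # [10.4107...]

# The second derivative is positive (536.89), confirming a minimum; rounding
# gives 10 servers, consistent with the deterministic simulation result.
print(sp.diff(Yp, X, 2))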
