Chapter 6 Introduction To Predictive Analytics
• Computer-based predictive analytics began in the 1940s, as governments started using early computers.
Though it has existed for decades, predictive analytics has now developed into a concept whose time
has come.
• In 1689, Lloyd's used predictive analytics to underwrite insurance for sea voyages. Using
data, the company would accept the risk of a sea voyage in return for a premium. Lloyd's used data sets of past
trips to evaluate the risk of these voyages and predict patterns of liability. Lloyd's continues to use
predictive models in all facets of its insurance underwriting, and the idea has become general practice in the
insurance industry.
• Predictive analytics has evolved greatly since the days of Arnold Daniels and Lloyd's, but
the drive remains the same: to use data and patterns to decrease description cost, increase accuracy, and
provide managers with the tools to make the right decision the first time.
www.computerhope.com/issues/ch000984.htm
www.afterinc.com/brief-history-predictive-analytics-part-
What is predictive analytics?
https://www.investopedia.com/terms/p/predictive-analytics.asp
KEY TAKEAWAYS
https://www.investopedia.com/terms/p/predictive-analytics.asp
Types of Predictive Analytical Models
Decision Trees: If you want to understand what leads to someone's decisions, then you may find decision trees
useful. This type of model places data into different sections based on certain variables, such as
price or market capitalization. Just as the name implies, it looks like a tree with individual branches
and leaves. Branches indicate the choices available while individual leaves represent a particular
decision.

Regression: This is the model that is used the most in statistical analysis. Use it when you want to determine
patterns in large sets of data and when there's a linear relationship between the inputs. This method
works by figuring out a formula, which represents the relationship between all the inputs found in the
dataset. For example, you can use regression to figure out how price and other key factors can shape
the performance of a security.

Neural Networks: Neural networks were developed as a form of predictive analytics by imitating the way the human brain
works. This model can deal with complex data relationships using artificial intelligence and pattern
recognition. Use it if you have several hurdles that you need to overcome, such as when you have too much
data on hand, when you don't have the formula you need to help you find a relationship between the
inputs and outputs in your dataset, or when you need to make predictions rather than come up with
explanations.
Decision Trees
A decision tree is a decision support tool that uses a tree-like model of decisions and their possible consequences,
including chance event outcomes.
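As a rough illustration of how a tree places data into sections based on a variable, the sketch below fits the smallest possible tree: a single split on one numeric feature. The function names `fit_stump` and `predict`, and the price data, are made up for this example and are not from the slides; the sketch assumes 0/1 labels and at least two distinct feature values.

```python
def fit_stump(xs, labels):
    """Find the one-feature threshold that misclassifies the fewest rows.

    Assumes 0/1 labels and at least two distinct values in xs.
    """
    points = sorted(set(xs))
    # Candidate thresholds: midpoints between consecutive distinct values.
    candidates = [(a + b) / 2 for a, b in zip(points, points[1:])]
    best = None
    for threshold in candidates:
        # Try both ways of labelling the two branches.
        for left, right in ((0, 1), (1, 0)):
            errors = sum(
                (left if x <= threshold else right) != y
                for x, y in zip(xs, labels)
            )
            if best is None or errors < best[0]:
                best = (errors, threshold, left, right)
    _, threshold, left, right = best
    return threshold, left, right


def predict(stump, x):
    """Follow the single branch: left leaf if x <= threshold, else right leaf."""
    threshold, left, right = stump
    return left if x <= threshold else right


# Hypothetical data: did a customer buy (1) or not (0) at a given price?
prices = [10, 12, 15, 30, 35, 40]
bought = [1, 1, 1, 0, 0, 0]

stump = fit_stump(prices, bought)
print(stump)               # (threshold, left leaf, right leaf)
print(predict(stump, 20))  # prediction for a price of 20
```

A full decision tree would repeat this split recursively inside each branch; the single split is kept here only to make the branch-and-leaf idea visible.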
Regression
A regression is a statistical technique that relates a dependent variable to one or more independent (explanatory)
variables. A regression model is able to show whether changes observed in the dependent variable are associated with
changes in one or more of the explanatory variables.
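To make the idea of relating a dependent variable to an explanatory variable concrete, here is a minimal sketch of simple linear regression using the ordinary least-squares formulas. The function name `fit_line` and the sample numbers are assumptions for illustration only, and the sketch handles just one explanatory variable.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x (one explanatory variable)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance-like and variance-like sums around the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Hypothetical data that follows y = 1 + 2x exactly.
xs = [1, 2, 3, 4, 5]
ys = [3, 5, 7, 9, 11]

intercept, slope = fit_line(xs, ys)
print(intercept, slope)
print(intercept + slope * 6)  # predicted y for x = 6
```

The fitted slope shows how much the dependent variable changes, on average, for a one-unit change in the explanatory variable.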
Neural Networks
Neural networks can help computers make intelligent decisions with limited human assistance.
A neural network is a method in artificial intelligence that teaches computers to process data in a way
that is inspired by the human brain. It is a type of machine learning process, called deep learning, that
uses interconnected nodes or neurons in a layered structure that resembles the human brain.
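The layered structure of interconnected nodes can be sketched as a forward pass through a very small network. Everything specific here is an assumption for illustration: the weights and biases are arbitrary numbers rather than trained values, the sigmoid activation is one common choice among several, and the names `layer` and `forward` are hypothetical.

```python
import math

# Hypothetical weights and biases, chosen only for illustration (not trained).
HIDDEN_W = [[0.5, -0.4], [0.3, 0.8]]  # two hidden nodes, two inputs each
HIDDEN_B = [0.1, -0.2]
OUTPUT_W = [[1.2, -0.7]]              # one output node, two hidden inputs
OUTPUT_B = [0.05]


def sigmoid(z):
    """Squash any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def layer(inputs, weights, biases):
    """One fully connected layer: weighted sum plus bias, then sigmoid, per node."""
    return [
        sigmoid(sum(w * x for w, x in zip(node_weights, inputs)) + b)
        for node_weights, b in zip(weights, biases)
    ]


def forward(inputs):
    """Pass the inputs through the hidden layer, then the output layer."""
    hidden = layer(inputs, HIDDEN_W, HIDDEN_B)
    return layer(hidden, OUTPUT_W, OUTPUT_B)[0]


score = forward([1.0, 2.0])
print(score)  # a value between 0 and 1
```

Training a network means adjusting the weights and biases so that outputs like `score` match known examples; that step is left out of this sketch.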
Predictive models are used for all kinds of applications, including:
• Weather forecasts
• Creating video games
• Translating voice to text for mobile phone messaging
• Customer service
• Investment portfolio development
What are examples of predictive analytics in business?
Reducing risk. Credit scores are used to assess a buyer’s likelihood of default for purchases and
are a well-known example of predictive analytics. A credit score is a number generated by a
predictive model that incorporates all data relevant to a person’s creditworthiness. Other risk-related
uses include insurance claims and collections.
PREDICTIVE ANALYTIC TOOLS
2. Clustering Model
3. Forecast Model
4. Outliers Model
Hard Clustering
Soft Clustering
FORECAST MODEL
One of the most widely used predictive analytics models.
It works by using different data points (taken from the previous year’s
data) to develop a numerical metric that will predict trends within a
specified period.
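One simple way to turn past data points into a numerical forecast is a moving average. The sketch below is only one possible forecast model, not necessarily the one the slides have in mind; the function name `moving_average_forecast`, the window size, and the sales figures are assumptions for illustration.

```python
def moving_average_forecast(history, window=3):
    """Forecast the next value as the mean of the last `window` observations."""
    if len(history) < window:
        raise ValueError("need at least `window` observations")
    recent = history[-window:]
    return sum(recent) / window


# Hypothetical monthly sales figures from the previous period.
monthly_sales = [100, 110, 120, 130, 140, 150]

forecast = moving_average_forecast(monthly_sales, window=3)
print(forecast)  # forecast for the next month
```

A larger window smooths out short-term swings but reacts more slowly to a real change in trend.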
PREDICTIVE PROBLEMS
COMMON PREDICTIVE ALGORITHMS
Machine Learning – involves structured data of the kind we see in a table.
Algorithms for this comprise both linear and nonlinear varieties.
1. Random Forest
4. K-Means
5. Prophet
RANDOM FOREST
One of the most popular classification algorithms; it can handle both
classification and regression.
The name “Random Forest” is derived from the fact that the algorithm
is a combination of decision trees. Each tree depends on the values of
a random vector sampled independently with the same distribution for
all trees in the “forest”.
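The idea of combining many randomized trees can be sketched with a deliberately simplified forest. The assumptions are stated up front: each "tree" here is only a single split (a stump) on one randomly chosen feature, each is trained on a bootstrap sample of the rows, and predictions are combined by majority vote. A real random forest grows deeper trees and samples features at every split. All names and data are hypothetical.

```python
import random
from collections import Counter


def fit_stump(rows, labels, feature):
    """Fit a one-split tree on one feature by minimizing training errors (0/1 labels)."""
    values = sorted({row[feature] for row in rows})
    if len(set(labels)) == 1 or len(values) < 2:
        # Nothing to split on: always predict the majority label.
        majority = Counter(labels).most_common(1)[0][0]
        return feature, float("inf"), majority, majority
    best = None
    for low, high in zip(values, values[1:]):
        threshold = (low + high) / 2
        for left, right in ((0, 1), (1, 0)):
            errors = sum(
                (left if row[feature] <= threshold else right) != label
                for row, label in zip(rows, labels)
            )
            if best is None or errors < best[0]:
                best = (errors, threshold, left, right)
    _, threshold, left, right = best
    return feature, threshold, left, right


def fit_forest(rows, labels, n_trees=25, seed=0):
    """Train n_trees stumps, each on a bootstrap sample and a random feature."""
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        # Bootstrap: draw len(rows) row indices with replacement.
        idx = [rng.randrange(len(rows)) for _ in rows]
        sample_rows = [rows[i] for i in idx]
        sample_labels = [labels[i] for i in idx]
        feature = rng.randrange(len(rows[0]))
        forest.append(fit_stump(sample_rows, sample_labels, feature))
    return forest


def predict(forest, row):
    """Majority vote over all trees in the forest."""
    votes = Counter(
        left if row[feature] <= threshold else right
        for feature, threshold, left, right in forest
    )
    return votes.most_common(1)[0][0]


# Hypothetical two-feature data with 0/1 labels.
rows = [(1, 10), (2, 12), (3, 11), (8, 30), (9, 28), (10, 31)]
labels = [1, 1, 1, 0, 0, 0]

forest = fit_forest(rows, labels)
print(predict(forest, (2, 11)), predict(forest, (9, 30)))
```

Because every tree sees a different random sample, individual trees can be wrong in different ways, and the vote tends to average those errors out.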
Advantages of Random Forest
Accurate and efficient when running on large databases
Multiple trees reduce the variance and bias of a smaller set or single tree
Resistant to overfitting
Data is more expressive, and benchmarked results show that the GBM
method is preferable in terms of the overall thoroughness of the data.
K-MEANS
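The slide gives only the heading, so the following is a hedged sketch of what K-means usually refers to: repeatedly assign each point to its nearest centroid, then move each centroid to the mean of its assigned points, until nothing changes. The sketch assumes one-dimensional data and caller-supplied starting centroids; the function name `kmeans_1d` and the data are hypothetical.

```python
def kmeans_1d(points, centroids, max_iter=100):
    """K-means on one-dimensional data, starting from the given centroids."""
    centroids = list(centroids)
    clusters = [[] for _ in centroids]
    for _ in range(max_iter):
        # Assignment step: each point joins the cluster of its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: abs(p - centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster
        # (an empty cluster keeps its previous centroid).
        new_centroids = [
            sum(c) / len(c) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
        if new_centroids == centroids:
            break  # converged: assignments can no longer change
        centroids = new_centroids
    return centroids, clusters


# Hypothetical data with two obvious groups.
points = [1, 2, 3, 10, 11, 12]
centroids, clusters = kmeans_1d(points, centroids=[1, 12])
print(centroids)
print(clusters)
```

The result depends on the starting centroids, which is why practical implementations usually try several random starts.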
Jamaica R. Zara
[Diagram: DATA COLLECTION with preprocessing steps: Discretization or Normalization, Feature Selection, Noise Reduction, Outlier Detection]
OUTLIER DETECTION
GLOBAL OUTLIERS
COLLECTIVE OUTLIERS
CONTEXTUAL OUTLIERS
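The slides name the three outlier types without defining them. As a hedged sketch of the first one only, and assuming the common meaning of a global outlier (a value far from the rest of the whole data set), the code below flags values by z-score. The function name `global_outliers`, the threshold of 2.0, and the readings are assumptions; contextual and collective outliers need extra information (context or grouping) and are not covered.

```python
import statistics


def global_outliers(values, z_threshold=2.0):
    """Return values whose distance from the mean exceeds z_threshold standard deviations."""
    mean = statistics.mean(values)
    spread = statistics.pstdev(values)  # population standard deviation
    if spread == 0:
        return []  # all values identical: nothing stands out
    return [v for v in values if abs(v - mean) / spread > z_threshold]


# Hypothetical sensor readings with one value far from the rest.
readings = [10, 11, 9, 10, 12, 10, 11, 50]
print(global_outliers(readings))
```

The threshold is kept low here because in a small sample a single extreme value also inflates the standard deviation, which caps how large its own z-score can get.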
[Diagram: DATA COLLECTION with preprocessing steps: Discretization or Normalization, Feature Selection, Noise Reduction, Outlier Detection, Instance Selection]
INSTANCE SELECTION
[Diagram: DATA COLLECTION with preprocessing steps: Discretization or Normalization, Feature Selection, Noise Reduction, Outlier Detection, Instance Selection, Missing Value Imputation]
MISSING VALUE IMPUTATION
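The slide gives only the heading. One of the simplest imputation strategies, shown here purely as an assumed example, is to replace each missing value with the mean of the observed values. The function name `impute_mean`, the use of `None` to mark a missing value, and the data are hypothetical.

```python
def impute_mean(values):
    """Replace None entries with the mean of the observed (non-None) entries."""
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("cannot impute: no observed values")
    mean = sum(observed) / len(observed)
    return [mean if v is None else v for v in values]


# Hypothetical column of ages with two missing entries.
ages = [20, None, 30, 40, None]
filled = impute_mean(ages)
print(filled)
```

Mean imputation keeps the column average unchanged but shrinks its variance, so other strategies (median, model-based) may suit skewed data better.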