
Advanced Data Analytics


ADVANCED DATA ANALYTICS:

What is advanced analytics?


Advanced analytics is a data analysis methodology using predictive
modeling, machine learning algorithms, deep learning, business process
automation and other statistical methods to analyze business information from a
variety of data sources.

Advanced analytics uses data science beyond traditional business intelligence (BI) methods to predict patterns, estimate the likelihood of future events and find
insights in data that experts might miss. Predictive analytics capabilities can help
an organization be more efficient and increase its accuracy in decision-making.

Data scientists often use advanced analytics tools to combine prescriptive analytics and predictive analytics. Using different analytics types together adds
options for enhanced visualization and predictive models.

Why is advanced analytics important?


Advanced analytics is a valuable resource because it enables an organization to
improve data asset functionality, regardless of where the data is stored or what
format it's in. Advanced analytics can also help address some of the more
complex business problems traditional BI reporting cannot.

For example, to create a contextual marketing engine, a consumer-packaged goods manufacturer might need to ask the following questions:

• When is a customer likely to exhaust their supply of an item?


• What time of the day or week are they most receptive to marketing
advertisements?
• What level of profitability is achievable when marketing at the time?
• What price point are they most likely to purchase at?

Organizations can combine consumption models with historical data and artificial intelligence (AI), which enables advanced analytics to determine precise answers to the previous questions and better understand their customers.

Advanced analytics answers different questions and includes different components from business intelligence.

What are the benefits of advanced analytics?


In addition to enabling more efficient use of data assets and providing decision-
makers with higher confidence in data accuracy, advanced analytics offers the
following benefits:

• Accurate forecasting. Using advanced analytics can confirm or refute prediction and forecast models with better accuracy than traditional BI tools, which still carry an element of uncertainty.
• Faster decision-making. Improving the accuracy of predictions allows
executives to act more quickly. They can be confident their quicker
business decisions will achieve the desired results and favorable
outcomes can be repeated.
• Deeper insight. Advanced analytics offers a deeper level of actionable
insight from data, including customer preference, market trends and key
business processes. Better insights empower stakeholders to make data-
driven decisions with direct effects on their strategy.
• Improved risk management. The higher level of accuracy that advanced analytics provides for predictions can help businesses reduce their risk of costly mistakes.
• Anticipate problems and opportunities. Advanced analytics uses
statistical models to reveal potential problems on the business's current
trajectory or identify new opportunities. Stakeholders can quickly
change course and achieve better outcomes.
What are some advanced analytics techniques?
Advanced analytics can help provide organizations with a competitive advantage.
Techniques range from basic statistical or trend analysis to more complex tasks
requiring BI or specialized tools. The most complex techniques can handle big
data, apply machine learning techniques and perform complex tasks. Some
commonly used advanced analytics techniques include the following:

Data mining. The data mining process sorts through large data sets to identify
patterns and establish relationships. It's a key part of successful analytics
operations because BI and advanced analytics applications use the data that
mining generates to solve problems. It has applications across a variety of
industries including healthcare, government, scientific research, mathematics and
sports.

Sentiment analysis. At its core, sentiment analysis is about understanding emotions. It processes text data to determine the attitude or emotion behind the
words, which can be positive, negative or neutral. In a business setting, sentiment
analysis can help the business to understand how customers feel about a brand
based on their reviews, social media comments or direct feedback. Tools used for
sentiment analysis range from basic text analytics software to more advanced
natural language processing (NLP) tools, some of which use machine learning to
improve accuracy.
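As a concrete illustration of this kind of tooling, the short Python sketch below scores a few made-up review strings with NLTK's VADER sentiment analyzer; the example texts and the 0.05 threshold are placeholder choices, not part of the original discussion.

```python
# A minimal sentiment-analysis sketch using NLTK's VADER lexicon
# (assumes nltk is installed and the 'vader_lexicon' resource can be downloaded).
import nltk
from nltk.sentiment import SentimentIntensityAnalyzer

nltk.download("vader_lexicon", quiet=True)  # one-time resource download

analyzer = SentimentIntensityAnalyzer()
reviews = [
    "The delivery was fast and the product works perfectly.",
    "Terrible support, I waited two weeks for a reply.",
]
for text in reviews:
    scores = analyzer.polarity_scores(text)  # returns neg/neu/pos/compound scores
    compound = scores["compound"]
    label = "positive" if compound > 0.05 else "negative" if compound < -0.05 else "neutral"
    print(label, compound, text)
```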
Cluster analysis. Cluster analysis is a method of grouping. It brings together
similar items in a data set. Data groups, or clusters, contain items more similar to
each other than items in other clusters. For example, a telecom company could
use cluster analysis to group customers based on their usage patterns. Then, they
can target each group with a specific marketing strategy.
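A minimal sketch of the telecom example, assuming scikit-learn is available and using synthetic usage data in place of real customer records:

```python
# Hypothetical sketch: grouping telecom customers by usage with k-means.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(42)
# columns: monthly call minutes, data usage in GB (two invented segments)
usage = np.vstack([
    rng.normal([200, 2], [30, 0.5], size=(50, 2)),   # light users
    rng.normal([600, 10], [50, 2], size=(50, 2)),    # heavy users
])

X = StandardScaler().fit_transform(usage)            # scale features before clustering
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
print(labels[:10])  # cluster ID per customer, usable for targeted campaigns
```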

Complex event processing. Complex event processing (CEP) involves analyzing multiple events happening across various systems in real time to detect
patterns. If CEP detects patterns of interest or abnormal behaviors, it can trigger
alerts for immediate action. A practical example is credit card fraud detection:
The system monitors transactions and flags any suspicious patterns for
investigation.

Recommender systems. Recommender systems use past behavior analysis to predict what a user might want, and then personalize suggestions. An everyday
example is when an online shopping site suggests products a customer might
prefer based on their browsing history, or when a streaming service suggests a
show the user may want to watch next.
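The toy sketch below illustrates the underlying idea with a tiny, invented user-item matrix and item-to-item cosine similarity; real recommender systems are far more elaborate.

```python
# Toy item-based recommender: suggest an item similar to ones a user interacted with,
# using cosine similarity on a small made-up user-item matrix.
import numpy as np

# rows = users, columns = items (1 = interacted, 0 = not)
interactions = np.array([
    [1, 1, 0, 0],
    [0, 1, 1, 0],
    [1, 0, 1, 1],
], dtype=float)

# cosine similarity between item columns
norms = np.linalg.norm(interactions, axis=0, keepdims=True)
item_sim = (interactions.T @ interactions) / (norms.T @ norms + 1e-9)

user = interactions[0]                 # user 0's history
scores = item_sim @ user               # score every item against that history
scores[user > 0] = -np.inf             # hide items the user has already seen
print("recommend item:", int(np.argmax(scores)))
```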

Time series analysis. Time series analysis focuses on data changes over time. It
looks at patterns, trends and cycles in the data to predict future points. For
instance, a retailer might use time series analysis to forecast future sales based on
past sales data. The results can help the retailer plan stock levels and manage
resources efficiently.
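A minimal sketch of the retail example, assuming pandas is available; the monthly sales figures and the 3-month moving-average forecast are illustrative only.

```python
# Minimal time-series sketch: a 3-month moving average as a naive sales forecast.
import pandas as pd

sales = pd.Series(
    [120, 135, 150, 160, 158, 170, 182, 190, 205, 210, 230, 245],
    index=pd.date_range("2023-01-01", periods=12, freq="MS"),  # month-start dates
)

trailing_avg = sales.rolling(window=3).mean()     # smooths short-term noise
next_month_forecast = trailing_avg.iloc[-1]       # naive forecast: last smoothed value
print(f"Forecast for next month: {next_month_forecast:.1f}")
```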

Big data analytics. Big data analytics is the process of examining large volumes
of structured, semistructured and unstructured data to uncover information such
as hidden patterns, correlations, market trends and customer preferences. It uses
analytics systems to power predictive models, statistical algorithms and what-if
analysis.

Machine learning. The development of machine learning has dramatically increased the speed of data processing and analysis, facilitating disciplines such
as predictive analytics. Machine learning uses AI to enable software applications
to predict outcomes more accurately. The inputs use historical data to predict new
outputs. Common use cases include recommendation engines, fraud detection
and predictive maintenance.

Data visualization. Data visualization is the process of presenting data in graphical format. It makes data analysis and sharing more accessible across
organizations. Data scientists use visualizations after writing predictive analytics
or machine learning algorithms to visualize outputs, monitor results and ensure
models perform as intended. It's also a quick and effective way to communicate
information to others.
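As a small illustration of the kind of quick check described above, the sketch below plots placeholder actual and predicted values with matplotlib.

```python
# Small visualization sketch: plotting actual vs. predicted values to inspect a model.
import matplotlib.pyplot as plt

actual = [10, 12, 14, 15, 18, 21]       # placeholder observed values
predicted = [11, 12, 13, 16, 17, 22]    # placeholder model outputs

plt.plot(actual, label="actual", marker="o")
plt.plot(predicted, label="predicted", marker="x")
plt.xlabel("Observation")
plt.ylabel("Value")
plt.title("Model output check")
plt.legend()
plt.show()
```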

What are some use cases for advanced analytics?


The following examples show how business processes can benefit from advanced
analytics software:

• Marketing metrics. With advanced analytics, marketing organizations can create customized, targeted marketing campaigns and avoid
wasting money on ineffective strategies. Analyzing future outcomes
also can help an organization identify opportunities to up-sell and
optimize the marketing funnel.
• Supply chain optimization. Advanced analytics can help an organization factor in demand, cost fluctuations and changing consumer preferences to create an agile supply chain that can quickly adapt to changing market conditions.
• Risk management. Advanced analytics can examine particular data
sets and data streams in real time. Data scientists can use the results to
identify potential high-risk-level patterns such as possible payment
fraud or insurance liabilities.
• Business operations. Advanced analytics can help organizations
streamline and adapt their operations to better suit predictions on
changing market conditions or trends and ultimately increase revenue.
Implementing advanced analytics
Advanced analytics implementation begins with a well-crafted plan. The key to
success is not just choosing the right tools, but also building a team with the right
skills. The decision to train existing employees or hire new ones is a crucial part
of the strategic plan.

Training current BI users is a cost-effective approach to ensure continuity, but it requires a significant time investment. Numerous online courses can help users
improve their skills. Existing employees likely know the business and its data;
they already understand the business context, which can prove invaluable in
analytics. One downside is how much time training can take, especially for more
advanced techniques. It requires patience and commitment.

Hiring new staff brings in advanced skills quickly, but it costs more -- data
scientists are expensive to hire now. And hiring new staff can pose integration
challenges: New staff with specialized skills bring immediate access to advanced
capabilities, but they'll need time to understand the nuances of the business and
its data.

A mix of skills is essential in an analytics team. Each data team needs people who
can understand the data, interpret the analysis and translate insights into business
strategies. Perhaps one or two strategic hires can integrate effectively into an
existing team and help bring team members up to speed.

Fostering a data-driven culture is important. Everyone in an organization -- specialist or not -- should understand the value of data and feel empowered to use
it in their decisions. Empowering users enables a business to unlock the potential
of advanced analytics. Investment in advanced analytics is not just about
technology, but also about people and culture. It's just as important to build a
team with the right skills and encourage a data-driven culture as it is to choose
the right tools.

Evaluating advanced analytics tools


Organizations can choose from several advanced analytics platforms. Each offers
different advantages, depending on the use case. Advanced analytics tools have
two categories: open source and proprietary.

Open source tools


Open source tools have become a go-to option for many data scientists doing
machine learning and prescriptive analytics. They include programming
languages, as well as computing environments, such as Hadoop and Spark. Data
scientists typically like open source advanced analytics tools thanks to their
inexpensive price tags. Open source analytics tools also offer strong functionality
and administrators can access support from the user community which
continuously updates the tools.

Proprietary tools
On the proprietary side, vendors such as Microsoft, IBM and the SAS Institute
all offer advanced analytics tools. Most require a deep technical background and
understanding of mathematical techniques.

Self-service analytics tools have matured to make functionality more accessible to business users with offerings from vendors such as Alteryx, Qlik, Sisense and
Tableau.

TECHNOLOGY AND TOOLS:

Discover the different types of data analytics technologies and tools used across
various industries, including statistical analysis tools, business intelligence tools,
database management tools, and machine learning tools. Learn how these tools
are used for finance and investing, academic research, and in broader business
contexts.

Key Takeaways

• Data analytics technologies encompass a range of tools that manipulate and analyze datasets, which can be categorized into statistical analysis tools,
business intelligence tools, database management tools, and machine
learning tools.
• Statistical analysis tools, such as Microsoft Excel, IBM SPSS, and SAS, are most commonly utilized in the finance and investing industry and for academic research. These tools often rely on mathematical functions and formulas.
• Business intelligence tools like Tableau, Microsoft Power BI, and IBM
Cognos Analytics are versatile, allowing comparative data analysis and
aiding in decision-making processes by comparing multiple datasets.
• Database management tools are used for data collection, storage, and pre-
analysis manipulation. Examples include Microsoft SQL Server, MySQL,
and MongoDB.
• Machine learning tools leverage artificial intelligence and automation to
analyze datasets, with examples including Microsoft Azure, IBM Watson,
and Amazon Web Services (AWS).
• Noble Desktop’s Data Science Classes offer comprehensive training on
these data analytics tools, providing bootcamps and certificate programs
that combine practical application with theoretical knowledge.

Data analytics technologies are any data science tool or software used to
manipulate and analyze a dataset. While some data analytics technologies use
complex algorithms and artificial intelligence to draw conclusions about a
dataset, more traditional tools rely on statistical analysis and mathematical
calculations to return more descriptive analytics.

In addition, many data analytics technologies can be described as big data technologies which focus on a more holistic approach to data collection and
analysis. These big data technologies create frameworks or platforms for the
analysis of data within a complex system of tools and techniques that are used to
better understand data correlations. Data analytics technologies encompass many
different tools, depending on the types of data analysis needed.

Types of Data Analytics Technologies and Tools

There are several common types of data analytics technologies depending on the
industry that you are in and the type of data that you are working on. Generally,
data analytics technologies can be categorized as statistical analysis tools,
business intelligence tools, database management tools, and machine learning
tools. While these categories do not include all of the tools available to data
scientists, the following list offers a general introduction to data analytics
technologies and their uses within the industry.

Statistical Analysis Tools

Data analytics technologies that rely on statistical analysis are the most traditional
tools within the data science industry, and they are also the most common tools
used for finance and investing, as well as academic research. From beginner-
friendly spreadsheet software, which is useful for returning descriptive analytics,
to more advanced software, which focuses on prescriptive analytics, statistical
analysis tools have many uses across fields. As the name suggests, statistical
analysis tools rely on mathematical functions and formulas in order to learn more
about a dataset. But, these tools can also require the use of statistics-friendly
programming languages like R. And, while many statistical analysis tools only
require the use of calculations and theories to analyze data, many of these tools
now include features that automate the data analysis process. In addition, many
of these tools can transform data analytics into visualizations, such as charts and
graphs, as well as organize and clean data.

Examples of Statistical Analysis Tools: Microsoft Excel, IBM SPSS, and SAS (Statistical Analytics Software)

Business Intelligence Tools

Business intelligence (BI) tools have also gained considerable popularity within
the data science industry because these tools are more versatile than most
traditional data analytics technologies. And, while the name might suggest
otherwise, business intelligence tools are not just used in business and finance,
but also for any team or individual that requires some form of comparative data
analysis—the process of analyzing different parts of a single dataset or comparing
multiple datasets at the same time or in the same space. So, many business
intelligence tools can be used to not only analyze a dataset but also to create
reports and dashboards which compare several sheets and workbooks full of data.
Many data scientists utilize BI tools when they have a larger collection of data,
but also when they need to make decisions by weighing the costs, benefits, and
risks of various scenarios and potential outcomes.

Examples of Business Intelligence Tools: Tableau, Microsoft Power BI, and IBM Cognos Analytics

Database Management Tools

Prior to analyzing data, data scientists must collect and store that data, which is
generally when database management tools are used. Database management tools
primarily include software for managing and manipulating data, as well as storing
that data in a secure database management system. However, database
management tools have features that go beyond data storage and include tools for
exploratory and diagnostic data analysis. In addition, most database management
tools are categorized based on data type, with some tools relying on SQL
databases and others utilizing NoSQL databases. But, regardless of the
designation, database management tools allow data scientists to search and query
data, as well as pinpoint any early data analytics issues within a dataset, such as
missing values or other errors.
Examples of Database Management Tools: Microsoft SQL Server, MySQL, and MongoDB

Machine Learning Tools

In contrast to more traditional data analytics technologies, machine learning tools rely on automation, artificial intelligence, and the deployment of machine
learning models to analyze a dataset. Similar to business intelligence tools, many
machine learning tools can be used for predictive and prescriptive analytics which
forecast potential future occurrences or outcomes based on past data. And, while
some of these tools are more beginner-friendly and require little to no experience
with programming languages, many machine learning tools require some
knowledge of coding with languages like Python. These tools also tend to
combine multiple aspects of working with data, from the process of collection to
visualization. With that being said, machine learning tools are widely considered
to be the future of data science and analytics.

Examples of Machine Learning Tools: Microsoft Azure, IBM Watson, and Amazon Web Services (AWS)

UNIT – 5

MACHINE LEARNING AND DEEP LEARNING:


What is Machine Learning?
Machine learning is a subfield of artificial intelligence that focuses on the
development of algorithms and statistical models that enable computers to learn
and make predictions or decisions without being explicitly programmed. It
involves training algorithms on large datasets to identify patterns and
relationships and then using these patterns to make predictions or decisions
about new data.

What are the Different Types of Machine Learning?

Machine learning is further divided into categories based on the data on which
we are training our model.
• Supervised Learning – This method is used when we have Training
data along with the labels for the correct answer.
• Unsupervised Learning – In this task our main objective is to find
the patterns or groups in the dataset at hand because we don’t have any
particular labels in this dataset.

What is Deep Learning?
Deep learning, on the other hand, is a subset of machine learning that uses neural
networks with multiple layers to analyze complex patterns and relationships in
data. It is inspired by the structure and function of the human brain and has been
successful in a variety of tasks, such as computer vision, natural language
processing, and speech recognition.
Deep learning models are trained using large amounts of data and algorithms
that are able to learn and improve over time, becoming more accurate as they
process more data. This makes them well-suited to complex, real-world
problems and enables them to learn and adapt to new situations.

Future of Machine Learning and Deep Learning


Both machine learning and deep learning have the potential to transform a wide
range of industries, including healthcare, finance, retail, and transportation, by
providing insights and automating decision-making processes.
• Machine Learning: Machine learning is a subset and application of Artificial Intelligence (AI) that gives a system the ability to learn and improve from experience without being explicitly programmed. Machine learning uses data to train models and produce accurate results, and it focuses on developing computer programs that access data and use it to learn for themselves.
• Deep Learning: Deep Learning is a subset of Machine Learning built on artificial neural networks, including recurrent neural networks. Its algorithms are constructed much like machine learning algorithms, but with many more layers; together, these layered algorithms form an artificial neural network. In simpler terms, it loosely replicates the human brain, where interconnected neurons work together, which is exactly the concept behind deep learning. It solves complex problems with the help of these layered algorithms and their training process.

Difference Between Machine Learning and Deep Learning

Now let’s look at the difference between Machine Learning and Deep
Learning:

1. Machine Learning is a superset of Deep Learning. Deep Learning is a subset of Machine Learning.
2. The data representation used in Machine Learning is quite different from Deep Learning, as it uses structured data. The data representation used in Deep Learning relies on neural networks (ANNs).
3. Machine Learning is an evolution of AI. Deep Learning is an evolution of Machine Learning; essentially, it describes how deep the machine learning goes.
4. Machine learning typically works with thousands of data points. Deep learning works with big data: millions of data points.
5. Outputs of machine learning are numerical values, such as a classification or a score. Deep learning outputs can be anything from numerical values to free-form elements, such as free text and sound.
6. Machine learning uses various types of automated algorithms that learn to model functions and predict future actions from data. Deep learning uses a neural network that passes data through processing layers to interpret data features and relations.
7. Machine learning algorithms are directed by data analysts to examine specific variables in data sets. Deep learning algorithms are largely self-directed in their data analysis once they are put into production.
8. Machine Learning is widely used by organizations to stay competitive and learn new things. Deep Learning solves complex machine-learning problems.
9. Machine learning training can be performed using the CPU (Central Processing Unit). Deep learning generally requires a dedicated GPU (Graphics Processing Unit) for training.
10. In machine learning, more human intervention is involved in getting results. Although more difficult to set up, deep learning requires less intervention once it is running.
11. Machine learning systems can be swiftly set up and run, but their effectiveness may be constrained. Although they require additional setup time, deep learning algorithms can produce results immediately (and the quality is likely to improve over time as more data becomes available).
12. A machine learning model takes less time to train due to its smaller data size. Deep learning training takes a huge amount of time because of the very large number of data points.
13. In machine learning, humans explicitly perform feature engineering. In deep learning, feature engineering is not needed because important features are detected automatically by the neural networks.
14. Machine learning applications are simpler compared to deep learning and can be executed on standard computers. Deep learning systems utilize much more powerful hardware and resources.
15. The results of a machine learning model are easy to explain. The results of deep learning are difficult to explain.
16. Machine learning models can be used to solve straightforward or slightly challenging problems. Deep learning models are appropriate for resolving challenging problems.
17. Banks, doctors' offices, and mailboxes all employ machine learning already. Deep learning technology enables increasingly sophisticated and autonomous algorithms, such as self-driving automobiles or surgical robots.
18. Machine learning involves training algorithms to identify patterns and relationships in data. Deep learning uses complex neural networks with multiple layers to analyze more intricate patterns and relationships.
19. Machine learning algorithms can range from simple linear models to more complex models such as decision trees and random forests. Deep learning algorithms are based on artificial neural networks that consist of multiple layers and nodes.
20. Machine learning algorithms typically require less data than deep learning algorithms, but the quality of the data is more important. Deep learning algorithms require large amounts of data to train the neural networks, but they can learn and improve on their own as they process more data.
21. Machine learning is used for a wide range of applications, such as regression, classification, and clustering. Deep learning is mostly used for complex tasks such as image and speech recognition, natural language processing, and autonomous systems.
22. Machine learning algorithms can handle complex tasks, but they can be more difficult to train for them and may require more computational resources. Deep learning algorithms are generally more accurate than machine learning algorithms on such tasks.

CLUSTERING:
What is Clustering?

The task of grouping data points based on their similarity with each other is
called Clustering or Cluster Analysis. This method is defined under the branch
of Unsupervised Learning, which aims at gaining insights from unlabelled data
points, that is, unlike supervised learning we don’t have a target variable.
Clustering aims at forming groups of homogeneous data points from a
heterogeneous dataset. It evaluates the similarity based on a metric like Euclidean distance, Cosine similarity, Manhattan distance, etc., and then groups the points with the highest similarity score together.
For example, in the graph given below, we can clearly see that there are 3 circular clusters forming on the basis of distance.

Now it is not necessary that the clusters formed must be circular in shape. The shape of clusters can be arbitrary. There are many algorithms that work well at detecting arbitrarily shaped clusters.
For example, in the graph given below, we can see that the clusters formed are not circular in shape.
Types of Clustering:

Broadly speaking, there are 2 types of clustering that can be performed to group
similar data points:
• Hard Clustering: In this type of clustering, each data point either belongs to a cluster completely or not at all. For example, let's say there are 4 data points and we have to cluster them into 2 clusters. So each data point will either belong to cluster 1 or cluster 2.
Data Points Clusters

A C1

B C2

C C2

D C1
• Soft Clustering: In this type of clustering, instead of assigning each data point to a single cluster, a probability or likelihood of that point belonging to each cluster is evaluated. For example, let's say there are 4 data points and we have to cluster them into 2 clusters. So we will be evaluating the probability of a data point belonging to both clusters. This probability is calculated for all data points.
Data Points Probability of C1 Probability of C2

A 0.91 0.09

B 0.3 0.7

C 0.17 0.83

D 1 0

Uses of Clustering:
Now before we begin with types of clustering algorithms, we will go through
the use cases of Clustering algorithms. Clustering algorithms are majorly used
for:
• Market Segmentation – Businesses use clustering to group their
customers and use targeted advertisements to attract more audience.
• Market Basket Analysis – Shop owners analyze their sales and figure
out which items are majorly bought together by the customers. For
example, In USA, according to a study diapers and beers were usually
bought together by fathers.
• Social Network Analysis – Social media sites use your data to
understand your browsing behaviour and provide you with targeted
friend recommendations or content recommendations.
• Medical Imaging – Doctors use Clustering to find out diseased areas
in diagnostic images like X-rays.
• Anomaly Detection – To find outliers in a stream of real-time dataset
or forecasting fraudulent transactions we can use clustering to identify
them.
• Simplify working with large datasets – Each cluster is given a cluster ID after clustering is complete. Now, you can reduce an entire feature set to its cluster ID. Clustering is effective when it can represent a complicated case with a straightforward cluster ID. Using the same principle, clustering data can make complex datasets simpler.
There are many more use cases for clustering, but these are some of the major and common ones. Moving forward we will be discussing clustering algorithms that will help you perform the above tasks.

Types of Clustering Algorithms


At the surface level, clustering helps in the analysis of unstructured data.
Graphing, the shortest distance, and the density of the data points are a few of
the elements that influence cluster formation. Clustering is the process of
determining how related the objects are based on a metric called the similarity
measure. Similarity metrics are easier to locate in smaller sets of features. It gets
harder to create similarity measures as the number of features increases.
Depending on the type of clustering algorithm being utilized in data mining,
several techniques are employed to group the data from the datasets. In this part,
the clustering techniques are described. Various types of clustering algorithms
are:
1. Centroid-based Clustering (Partitioning methods)
2. Density-based Clustering (Model-based methods)
3. Connectivity-based Clustering (Hierarchical clustering)
4. Distribution-based Clustering
We will be going through each of these types in brief.

1. Centroid-based Clustering (Partitioning methods)


Partitioning methods are the easiest clustering algorithms. They group data points on the basis of their closeness. Generally, the similarity measures chosen for these algorithms are Euclidean distance, Manhattan distance or Minkowski distance. The datasets are separated into a predetermined number of clusters, and each cluster is referenced by a vector of values. Each input data point is compared with these vectors and joins the cluster it is closest to. The primary drawback of these algorithms is the requirement to establish the number of clusters, "k", either intuitively or scientifically (using the Elbow Method, as sketched below) before the clustering system starts allocating the data points. Despite this, it is still the most popular type of clustering. K-means and K-medoids clustering are some examples of this type of clustering.
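A minimal sketch of the Elbow Method mentioned above, assuming scikit-learn and synthetic data from make_blobs; the inertia values printed for each k can be plotted to locate the "elbow" where the curve flattens.

```python
# Elbow-method sketch for choosing k in k-means.
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=300, centers=4, random_state=7)  # synthetic data

inertias = []
for k in range(1, 9):
    model = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    inertias.append(model.inertia_)  # within-cluster sum of squares

for k, val in zip(range(1, 9), inertias):
    print(k, round(val, 1))  # look for the k where the drop in inertia levels off
```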

2. Density-based Clustering (Model-based methods)


Density-based clustering, a model-based method, finds groups based on the
density of data points. Contrary to centroid-based clustering, which requires that
the number of clusters be predefined and is sensitive to initialization, density-
based clustering determines the number of clusters automatically and is less
susceptible to beginning positions. They are great at handling clusters of
different sizes and forms, making them ideally suited for datasets with
irregularly shaped or overlapping clusters. These methods manage both dense
and sparse data regions by focusing on local density and can distinguish clusters
with a variety of morphologies.
In contrast, centroid-based grouping, like k-means, has trouble finding arbitrary
shaped clusters. Due to its preset number of cluster requirements and extreme
sensitivity to the initial positioning of centroids, the outcomes can vary.
Furthermore, the tendency of centroid-based approaches to produce spherical or
convex clusters restricts their capacity to handle complicated or irregularly
shaped clusters. In conclusion, density-based clustering overcomes the
drawbacks of centroid-based techniques by autonomously choosing cluster
sizes, being resilient to initialization, and successfully capturing clusters of
various sizes and forms. The most popular density-based clustering algorithm
is DBSCAN.
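A short DBSCAN sketch on synthetic crescent-shaped data, assuming scikit-learn; the eps and min_samples values are illustrative and usually need tuning for real datasets.

```python
# DBSCAN sketch on non-spherical data.
from sklearn.cluster import DBSCAN
from sklearn.datasets import make_moons
from sklearn.preprocessing import StandardScaler

X, _ = make_moons(n_samples=300, noise=0.05, random_state=0)  # two crescent shapes
X = StandardScaler().fit_transform(X)

labels = DBSCAN(eps=0.3, min_samples=5).fit_predict(X)
n_clusters = len(set(labels)) - (1 if -1 in labels else 0)    # label -1 marks noise points
print("clusters found:", n_clusters)
```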

3. Connectivity-based Clustering (Hierarchical clustering)


A method for assembling related data points into hierarchical clusters is called
hierarchical clustering. Each data point is initially taken into account as a
separate cluster, which is subsequently combined with the clusters that are the
most similar to form one large cluster that contains all of the data points.
Think about how you may arrange a collection of items based on how similar
they are. Each object begins as its own cluster at the base of the tree when using
hierarchical clustering, which creates a dendrogram, a tree-like structure. The
closest pairings of clusters are then combined into larger clusters after the
algorithm examines how similar the objects are to one another. When every
object is in one cluster at the top of the tree, the merging process has finished.
Exploring various granularity levels is one of the fun things about hierarchical
clustering. To obtain a given number of clusters, you can select to cut
the dendrogram at a particular height. The more similar two objects are within
a cluster, the closer they are. It’s comparable to classifying items according to
their family trees, where the nearest relatives are clustered together and the
wider branches signify more general connections. There are 2 approaches for
Hierarchical clustering:
• Divisive Clustering: It follows a top-down approach; here we consider all data points to be part of one big cluster, and then this cluster is divided into smaller groups.
• Agglomerative Clustering: It follows a bottom-up approach, here
we consider all data points to be part of individual clusters and then
these clusters are clubbed together to make one big cluster with all data
points.
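A brief agglomerative (bottom-up) sketch using SciPy, assuming the library is available; the linkage matrix is the data behind the dendrogram described above, and cutting it at two clusters is an arbitrary choice for illustration.

```python
# Agglomerative hierarchical clustering sketch with SciPy.
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 0.3, (20, 2)), rng.normal(3, 0.3, (20, 2))])  # two synthetic groups

Z = linkage(X, method="ward")                    # merge history (the dendrogram data)
labels = fcluster(Z, t=2, criterion="maxclust")  # "cut" the tree into 2 clusters
print(labels)
```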

4. Distribution-based Clustering
Using distribution-based clustering, data points are generated and organized
according to their propensity to fall into the same probability distribution (such
as a Gaussian, binomial, or other) within the data. The data elements are grouped
using a probability-based distribution that is based on statistical distributions.
Included are data objects that have a higher likelihood of being in the cluster. A
data point is less likely to be included in a cluster the further it is from the
cluster’s central point, which exists in every cluster.
A notable drawback of density and boundary-based approaches is the need to
specify the clusters a priori for some algorithms, and primarily the definition of
the cluster form for the bulk of algorithms. There must be at least one tuning or
hyper-parameter selected, and while doing so should be simple, getting it wrong
could have unanticipated repercussions. Distribution-based clustering has a
definite advantage over proximity and centroid-based clustering approaches in
terms of flexibility, accuracy, and cluster structure. The key issue is that, in
order to avoid overfitting, many clustering methods only work with simulated
or manufactured data, or when the bulk of the data points certainly belong to a
preset distribution. The most popular distribution-based clustering algorithm
is Gaussian Mixture Model.
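A short Gaussian Mixture Model sketch with scikit-learn on synthetic data; note that predict_proba also yields the soft cluster memberships discussed earlier.

```python
# Gaussian Mixture Model sketch: a distribution-based method with soft memberships.
from sklearn.datasets import make_blobs
from sklearn.mixture import GaussianMixture

X, _ = make_blobs(n_samples=300, centers=3, cluster_std=[0.5, 1.5, 0.8], random_state=3)

gmm = GaussianMixture(n_components=3, random_state=0).fit(X)
hard_labels = gmm.predict(X)           # most likely component per point
soft_probs = gmm.predict_proba(X[:5])  # membership probabilities (soft clustering)
print(hard_labels[:5])
print(soft_probs.round(2))
```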

Applications of Clustering in different fields:

1. Marketing: It can be used to characterize & discover customer segments for marketing purposes.
2. Biology: It can be used for classification among different species of
plants and animals.
3. Libraries: It is used in clustering different books on the basis of
topics and information.
4. Insurance: It is used to acknowledge the customers, their policies
and identifying the frauds.
5. City Planning: It is used to make groups of houses and to study their
values based on their geographical locations and other factors present.
6. Earthquake studies: By learning the earthquake-affected areas we
can determine the dangerous zones.
7. Image Processing: Clustering can be used to group similar images
together, classify images based on content, and identify patterns in
image data.
8. Genetics: Clustering is used to group genes that have similar
expression patterns and identify gene networks that work together in
biological processes.
9. Finance: Clustering is used to identify market segments based on
customer behavior, identify patterns in stock market data, and analyze
risk in investment portfolios.
10. Customer Service: Clustering is used to group customer inquiries and complaints into categories, identify common issues, and develop targeted solutions.
11. Manufacturing: Clustering is used to group similar products together, optimize production processes, and identify defects in manufacturing processes.
12. Medical diagnosis: Clustering is used to group patients with similar symptoms or diseases, which helps in making accurate diagnoses and identifying effective treatments.
13. Fraud detection: Clustering is used to identify suspicious patterns or anomalies in financial transactions, which can help in detecting fraud or other financial crimes.
14. Traffic analysis: Clustering is used to group similar patterns of traffic data, such as peak hours, routes, and speeds, which can help in improving transportation planning and infrastructure.
15. Social network analysis: Clustering is used to identify communities or groups within social networks, which can help in understanding social behavior, influence, and trends.
16. Cybersecurity: Clustering is used to group similar patterns of network traffic or system behavior, which can help in detecting and preventing cyberattacks.
17. Climate analysis: Clustering is used to group similar patterns of climate data, such as temperature, precipitation, and wind, which can help in understanding climate change and its impact on the environment.
18. Sports analysis: Clustering is used to group similar patterns of player or team performance data, which can help in analyzing player or team strengths and weaknesses and making strategic decisions.
19. Crime analysis: Clustering is used to group similar patterns of crime data, such as location, time, and type, which can help in identifying crime hotspots, predicting future crime trends, and improving crime prevention strategies.
Association Rule Learning
Association rule learning is a type of unsupervised learning technique that checks for the dependency of one data item on another data item and maps them accordingly so that the relationship can be used more profitably. It tries to find interesting relations or associations among the variables of a dataset. It is based on different rules to discover the interesting relations between variables in the database.

Association rule learning is one of the very important concepts of machine learning, and it is employed in market basket analysis, web usage mining, continuous production, etc. Here, market basket analysis is a technique used by various big retailers to discover the associations between items. We can understand it by taking the example of a supermarket, as in a supermarket, all products that are purchased together are put together.

For example, if a customer buys bread, he most likely can also buy butter, eggs, or milk, so these products are stored within a shelf or mostly nearby.
Association rule learning can be divided into three types of algorithms:

1. Apriori
2. Eclat
3. F-P Growth Algorithm

We will understand these algorithms in later chapters.

How does Association Rule Learning work?

Association rule learning works on the concept of if/then rules, such as "if A, then B."

Here the "if" element is called the antecedent, and the "then" statement is called the consequent. These types of relationships, in which we can find some association or relation between two items, are known as single cardinality. It is all about creating rules, and if the number of items increases, then cardinality also increases accordingly. So, to measure the associations between thousands of data items, there are several metrics. These metrics are given below:
o Support
o Confidence
o Lift

Let's understand each of them:

Support

Support is the frequency of A, or how frequently an item appears in the dataset. It is defined as the fraction of the transactions T that contain the itemset X. For an itemset X and a set of transactions T, it can be written as:

Support(X) = Freq(X) / T

Confidence

Confidence indicates how often the rule has been found to be true, or how often the items X and Y occur together in the dataset when the occurrence of X is already given. It is the ratio of the number of transactions that contain X and Y to the number of records that contain X:

Confidence(X → Y) = Freq(X ∪ Y) / Freq(X)

Lift

It is the strength of a rule, which can be defined with the following formula:

Lift(X → Y) = Support(X ∪ Y) / (Support(X) × Support(Y))

It is the ratio of the observed support measure to the expected support if X and Y were independent of each other. It has three possible values (a small numeric example of support, confidence and lift follows the list below):

o If Lift = 1: The probability of occurrence of the antecedent and the consequent is independent of each other.
o Lift>1: It determines the degree to which the two itemsets are dependent
to each other.
o Lift<1: It tells us that one item is a substitute for other items, which means
one item has a negative effect on another.
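As noted above, here is a small numeric example that computes support, confidence and lift for a hypothetical rule {bread} → {butter} over five made-up transactions, following the definitions given in this section.

```python
# Computing support, confidence and lift for the rule {bread} -> {butter}
# on a tiny made-up transaction list.
transactions = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"bread", "eggs"},
    {"milk", "eggs"},
    {"bread", "butter", "eggs"},
]
n = len(transactions)

def support(itemset):
    # fraction of transactions that contain every item in the itemset
    return sum(itemset <= t for t in transactions) / n

sup_x = support({"bread"})
sup_y = support({"butter"})
sup_xy = support({"bread", "butter"})

confidence = sup_xy / sup_x
lift = sup_xy / (sup_x * sup_y)
print(f"support={sup_xy:.2f} confidence={confidence:.2f} lift={lift:.2f}")
```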

Types of Association Rule Learning

Association rule learning can be divided into three algorithms:

Apriori Algorithm

This algorithm uses frequent itemsets to generate association rules. It is designed to work on databases that contain transactions. The algorithm uses a breadth-first search and a Hash Tree to calculate the itemsets efficiently.

It is mainly used for market basket analysis and helps to understand the products
that can be bought together. It can also be used in the healthcare field to find drug
reactions for patients.
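A minimal, level-wise (breadth-first) sketch of the Apriori idea in plain Python, using the same style of toy transactions as before; it only finds frequent itemsets and omits the Hash Tree optimization and rule generation that a full implementation (for example, a library such as mlxtend) would provide.

```python
# Level-wise Apriori-style frequent-itemset mining sketch.
transactions = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"bread", "eggs"},
    {"milk", "eggs"},
    {"bread", "butter", "eggs"},
]
min_support = 0.4
n = len(transactions)

def support(itemset):
    return sum(itemset <= t for t in transactions) / n

items = sorted({i for t in transactions for i in t})
frequent = [frozenset([i]) for i in items if support({i}) >= min_support]
level = frequent
while level:
    # build candidates one item larger from the current frequent itemsets,
    # then prune anything below the minimum support (breadth-first expansion)
    candidates = {a | b for a in level for b in level if len(a | b) == len(a) + 1}
    level = [c for c in candidates if support(c) >= min_support]
    frequent.extend(level)

for itemset in frequent:
    print(set(itemset), round(support(itemset), 2))
```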

Eclat Algorithm

The Eclat algorithm stands for Equivalence Class Transformation. This algorithm uses a depth-first search technique to find frequent itemsets in a transaction database. It generally executes faster than the Apriori algorithm.

F-P Growth Algorithm

The F-P Growth algorithm stands for Frequent Pattern Growth, and it is an improved version of the Apriori algorithm. It represents the database in the form of a tree structure known as a frequent pattern tree (FP-tree). The purpose of this tree is to extract the most frequent patterns.

Applications of Association Rule Learning

It has various applications in machine learning and data mining. Below are some
popular applications of association rule learning:

o Market Basket Analysis: It is one of the most popular examples and applications of association rule mining. This technique is commonly used by big retailers to determine the associations between items.
o Medical Diagnosis: With the help of association rules, patients can be
cured easily, as it helps in identifying the probability of illness for a
particular disease.
o Protein Sequence: The association rules help in determining the synthesis
of artificial Proteins.
o It is also used for the Catalog Design and Loss-leader Analysis and many
more other applications.

Linear Regression in Machine learning

Machine Learning is a branch of Artificial Intelligence that focuses on the development of algorithms and statistical models that can learn from and make predictions on data. Linear regression is also a type of machine-learning algorithm, more specifically a supervised machine-learning algorithm, that learns from labelled datasets and maps the data points to the most optimized linear functions, which can then be used for prediction on new datasets.
First of all, we should know what a supervised machine learning algorithm is. It is a type of machine learning where the algorithm learns from labelled data. Labelled data means a dataset whose respective target values are already known. Supervised learning has two types:
• Classification: It predicts the class of the dataset based on the independent input variables. Classes are categorical or discrete values, for example whether the image of an animal is of a cat or a dog.

• Regression: It predicts continuous output variables based on the independent input variables, for example the prediction of house prices based on different parameters such as house age, distance from the main road, location, area, etc.

What is Linear Regression?

Linear regression is a type of supervised machine learning algorithm that computes the linear relationship between the dependent variable and one or more independent features by fitting a linear equation to observed data.
When there is only one independent feature, it is known as Simple Linear
Regression, and when there are more than one feature, it is known as Multiple
Linear Regression.

Similarly, when there is only one dependent variable, it is considered Univariate Linear Regression, while when there is more than one dependent variable, it is known as Multivariate Regression.

Why Linear Regression is Important?

The interpretability of linear regression is a notable strength. The model's equation provides clear coefficients that elucidate the impact of each independent
variable on the dependent variable, facilitating a deeper understanding of the
underlying dynamics. Its simplicity is a virtue, as linear regression is transparent,
easy to implement, and serves as a foundational concept for more complex
algorithms.
Linear regression is not merely a predictive tool; it forms the basis for various
advanced models. Techniques like regularization and support vector machines
draw inspiration from linear regression, expanding its utility. Additionally, linear
regression is a cornerstone in assumption testing, enabling researchers to validate
key assumptions about the data.

Types of Linear Regression

There are two main types of linear regression:


Simple Linear Regression

This is the simplest form of linear regression, and it involves only one
independent variable and one dependent variable. The equation for simple linear
regression is:
y = β0 + β1X

where:

• Y is the dependent variable
• X is the independent variable
• β0 is the intercept
• β1 is the slope

Multiple Linear Regression

This involves more than one independent variable and one dependent variable.
The equation for multiple linear regression is:
y = β0 + β1X1 + β2X2 + ... + βnXn

where:

• Y is the dependent variable
• X1, X2, ..., Xn are the independent variables
• β0 is the intercept
• β1, β2, ..., βn are the slopes

The goal of the algorithm is to find the best Fit Line equation that can predict
the values based on the independent variables.

In regression, a set of records is present with X and Y values, and these values are used to learn a function, so that if you want to predict Y from an unknown X, this learned function can be used. In regression we have to find the value of Y, so a function is required that predicts continuous Y given X as independent features.
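A minimal sketch of fitting such a function with scikit-learn, using invented experience and salary numbers in line with the example used later in this section:

```python
# Sketch: simple linear regression of salary on years of experience.
import numpy as np
from sklearn.linear_model import LinearRegression

experience = np.array([[1], [2], [3], [4], [5], [6]])        # X, years of experience
salary = np.array([35, 42, 50, 55, 63, 68], dtype=float)     # Y, salary in thousands (made up)

model = LinearRegression().fit(experience, salary)
print("intercept (beta0):", round(model.intercept_, 2))
print("slope (beta1):", round(model.coef_[0], 2))
print("prediction for 7 years:", round(model.predict([[7]])[0], 2))
```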

What is the best Fit Line?

Our primary objective while using linear regression is to locate the best-fit line,
which implies that the error between the predicted and actual values should be
kept to a minimum. There will be the least error in the best-fit line.
The best Fit Line equation provides a straight line that represents the relationship
between the dependent and independent variables. The slope of the line indicates
how much the dependent variable changes for a unit change in the independent
variable(s).

Linear Regression

Here Y is called a dependent or target variable and X is called an independent variable, also known as the predictor of Y. There are many types of functions or
modules that can be used for regression. A linear function is the simplest type of
function. Here, X may be a single feature or multiple features representing the
problem.

Linear regression performs the task of predicting a dependent variable value (y) based on a given independent variable (x). Hence, the name is Linear Regression.
In the figure above, X (input) is the work experience and Y (output) is the salary
of a person. The regression line is the best-fit line for our model.
We utilize the cost function to compute the best values in order to get the best fit
line since different values for weights or the coefficient of lines result in different
regression lines.

Hypothesis function in Linear Regression

As we have assumed earlier that our independent feature is the experience i.e X
and the respective salary Y is the dependent variable. Let's assume there is a linear
relationship between X and Y then the salary can be predicted using:
Ŷ = θ1 + θ2X

OR

ŷi = θ1 + θ2xi


Here,

• yi ∈ Y (i = 1, 2, ..., n) are the labels of the data (supervised learning)

• xi ∈ X (i = 1, 2, ..., n) are the input independent training data (univariate: one input variable/parameter)

• ŷi ∈ Ŷ (i = 1, 2, ..., n) are the predicted values.

The model gets the best regression fit line by finding the best θ1 and θ2 values.

• θ1: intercept

• θ2: coefficient of x

Once we find the best θ1 and θ2 values, we get the best-fit line. So when we are
finally using our model for prediction, it will predict the value of y for the input
value of x.

How to update θ1 and θ2 values to get the best-fit line?

To achieve the best-fit regression line, the model aims to predict the target value Ŷ such that the error difference between the predicted value Ŷ and the true value Y is minimal. So it is very important to update the θ1 and θ2 values to reach the best values that minimize the error between the predicted y values and the true y values:

minimize (1/n) Σ (ŷi − yi)², with the sum taken over i = 1, ..., n

Cost function for Linear Regression

The cost function or the loss function is nothing but the error or difference between the predicted value Ŷ and the true value Y.

In Linear Regression, the Mean Squared Error (MSE) cost function is employed, which calculates the average of the squared errors between the predicted values ŷi and the actual values yi. The purpose is to determine the optimal values for the intercept θ1 and the coefficient of the input feature θ2 that provide the best-fit line for the given data points. The linear equation expressing this relationship is ŷi = θ1 + θ2xi.
The MSE cost function can be calculated as:

Cost function J = (1/n) Σ (ŷi − yi)², with the sum taken over i = 1, ..., n

Utilizing the MSE function, the iterative process of gradient descent is applied to update the values of θ1 and θ2. This ensures that the MSE value converges to the global minimum, signifying the most accurate fit of the linear regression line to the dataset.

This process involves continuously adjusting the parameters θ1 and θ2 based on the gradients calculated from the MSE. The final result is a linear regression line that minimizes the overall squared differences between the predicted and actual values, providing an optimal representation of the underlying relationship in the data.
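A bare-bones gradient-descent sketch for this MSE cost, assuming NumPy and synthetic data; the learning rate and iteration count are illustrative choices.

```python
# Gradient descent on the MSE cost for y_hat = theta1 + theta2 * x.
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0, 10, 50)
y = 4.0 + 2.5 * x + rng.normal(0, 1.0, size=x.shape)   # true line plus noise

theta1, theta2 = 0.0, 0.0      # intercept and slope, initialized at zero
lr = 0.01                      # learning rate
for _ in range(2000):
    y_hat = theta1 + theta2 * x
    error = y_hat - y
    # gradients of J = (1/n) * sum(error^2) with respect to theta1 and theta2
    grad1 = 2 * error.mean()
    grad2 = 2 * (error * x).mean()
    theta1 -= lr * grad1
    theta2 -= lr * grad2

print(round(theta1, 2), round(theta2, 2))   # should approach roughly 4 and 2.5
```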

Assumptions of Simple Linear Regression

Linear regression is a powerful tool for understanding and predicting the behavior of a variable; however, it needs to meet a few conditions in order to provide accurate and dependable results.

1. Linearity: The independent and dependent variables have a linear relationship with one another. This implies that changes in the dependent
variable follow those in the independent variable(s) in a linear fashion.
This means that there should be a straight line that can be drawn through
the data points. If the relationship is not linear, then linear regression will
not be an accurate model.

2. Independence: The observations in the dataset are independent of each other. This means that the value of the dependent variable for one
observation does not depend on the value of the dependent variable for
another observation. If the observations are not independent, then linear
regression will not be an accurate model.

3. Homoscedasticity: Across all levels of the independent variable(s), the variance of the errors is constant. This indicates that the amount of the
independent variable(s) has no impact on the variance of the errors. If
the variance of the residuals is not constant, then linear regression will
not be an accurate model.

(Figure: homoscedasticity of residuals in linear regression)


4. Normality: The residuals should be normally distributed. This means
that the residuals should follow a bell-shaped curve. If the residuals are
not normally distributed, then linear regression will not be an accurate
model.

Assumptions of Multiple Linear Regression

For Multiple Linear Regression, all four of the assumptions from Simple Linear Regression apply. In addition to these, below are a few more:

1. No multicollinearity: There is no high correlation between the independent variables. This indicates that there is little or no correlation
between the independent variables. Multicollinearity occurs when two or
more independent variables are highly correlated with each other, which
can make it difficult to determine the individual effect of each variable
on the dependent variable. If there is multicollinearity, then multiple
linear regression will not be an accurate model.

2. Additivity: The model assumes that the effect of changes in a predictor variable on the response variable is consistent regardless of the values of
the other variables. This assumption implies that there is no interaction
between variables in their effects on the dependent variable.

3. Feature Selection: In multiple linear regression, it is essential to carefully select the independent variables that will be included in the
model. Including irrelevant or redundant variables may lead to
overfitting and complicate the interpretation of the model.

4. Overfitting: Overfitting occurs when the model fits the training data too
closely, capturing noise or random fluctuations that do not represent the
true underlying relationship between variables. This can lead to poor
generalization performance on new, unseen data.

Multicollinearity

Multicollinearity is a statistical phenomenon that occurs when two or more independent variables in a multiple regression model are highly correlated,
making it difficult to assess the individual effects of each variable on the
dependent variable.
Detecting Multicollinearity includes two techniques:

• Correlation Matrix: Examining the correlation matrix among the independent variables is a common way to detect multicollinearity. High
correlations (close to 1 or -1) indicate potential multicollinearity.

• VIF (Variance Inflation Factor): VIF is a measure that quantifies how much the variance of an estimated regression coefficient increases if your predictors are correlated. A high VIF (typically above 10) suggests multicollinearity (see the sketch after this list).
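A small sketch of the VIF check, assuming pandas and statsmodels are available; the columns are synthetic, with x2 deliberately constructed to be highly correlated with x1.

```python
# Checking multicollinearity with VIF using statsmodels.
import numpy as np
import pandas as pd
from statsmodels.stats.outliers_influence import variance_inflation_factor
from statsmodels.tools.tools import add_constant

rng = np.random.default_rng(0)
x1 = rng.normal(size=200)
x2 = x1 * 0.95 + rng.normal(scale=0.1, size=200)   # deliberately correlated with x1
x3 = rng.normal(size=200)
X = add_constant(pd.DataFrame({"x1": x1, "x2": x2, "x3": x3}))

for i, col in enumerate(X.columns):
    # expect high VIF for x1 and x2, low VIF for x3
    print(col, round(variance_inflation_factor(X.values, i), 2))
```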

Evaluation Metrics for Linear Regression

A variety of evaluation measures can be used to determine the strength of any linear regression model. These assessment metrics often give an indication of how well the model is producing the observed outputs.

The most common measurements are:

Mean Square Error (MSE)

Mean Squared Error (MSE) is an evaluation metric that calculates the average of
the squared differences between the actual and predicted values for all the data
points. The difference is squared to ensure that negative and positive differences
don't cancel each other out.

MSE = (1/n) Σ (yi − ŷi)², with the sum taken over i = 1, ..., n
Here,

• n is the number of data points.

• yi is the actual or observed value for the ith data point.

• ŷi is the predicted value for the ith data point.

MSE is a way to quantify the accuracy of a model's predictions. MSE is sensitive to outliers, as large errors contribute significantly to the overall score.

Mean Absolute Error (MAE)

Mean Absolute Error is an evaluation metric used to calculate the accuracy of
a regression model. MAE measures the average absolute difference between the
predicted values and actual values.
Mathematically, MAE is expressed as:
[Tex]MAE =\frac{1}{n} \sum_{i=1}^{n}|Y_i - \widehat{Y_i}| [/Tex]

Here,

• n is the number of observations.

• Yi represents the actual values.

• [Tex]\widehat{Y_i} [/Tex] represents the predicted values

A lower MAE value indicates better model performance. MAE is less sensitive to
outliers because absolute differences are used rather than squared differences.

Root Mean Squared Error (RMSE)

The square root of the residuals' variance is the Root Mean Squared Error. It
describes how well the observed data points match the expected values, or the
model's absolute fit to the data.

In mathematical notation, it can be expressed as:

[Tex]RMSE=\sqrt{\frac{RSS}{n}}=\sqrt{\frac{\sum_{i=1}^{n}(y^{actual}_{i}- y_{i}^{predicted})^2}{n}} [/Tex]

Rather than dividing by the total number of data points, one can divide the sum
of the squared residuals by the degrees of freedom to obtain an unbiased
estimate. This figure is then referred to as the Residual Standard Error (RSE).
In mathematical notation, it can be expressed as:
[Tex]RSE=\sqrt{\frac{RSS}{n-2}}=\sqrt{\frac{\sum_{i=1}^{n}(y^{actual}_{i}- y_{i}^{predicted})^2}{n-2}} [/Tex]

RMSE is not as interpretable a metric as R-squared. Because its value depends
on the units of the variables (it is not a normalized measure), RMSE can
fluctuate when the units of the variables vary.

Coefficient of Determination (R-squared)

R-Squared is a statistic that indicates how much variation the developed model
can explain or capture. It is always in the range of 0 to 1. In general, the better the
model matches the data, the greater the R-squared number.
In mathematical notation, it can be expressed as:
[Tex]R^{2}=1-\frac{RSS}{TSS} [/Tex]
• Residual Sum of Squares (RSS): The sum of squares of the residuals for each
data point is known as the residual sum of squares, or RSS. It is a
measurement of the difference between the observed output and the
predicted output.
[Tex]RSS=\sum_{i=1}^{n}(y_{i}-b_{0}-b_{1}x_{i})^{2} [/Tex]

• Total Sum of Squares (TSS): The sum of squared deviations of the data points
from the mean of the response variable is known as the total sum of squares,
or TSS.
[Tex]TSS= \sum_{i=1}^{n}(y_{i}-\overline{y})^2 [/Tex]

The R-squared metric is a measure of the proportion of variance in the
dependent variable that is explained by the independent variables in the model.

Adjusted R-Squared Error

Adjusted R2 measures the proportion of variance in the dependent variable that
is explained by the independent variables in a regression model. Adjusted
R-square accounts for the number of predictors in the model and penalizes the
model for including irrelevant predictors that do not contribute significantly
to explaining the variance in the dependent variable.

Mathematically, adjusted R2 is expressed as:

[Tex]Adjusted \, R^2 = 1 - (\frac{(1-R^2).(n-1)}{n-k-1}) [/Tex]


Here,

• n is the number of observations

• k is the number of predictors in the model

• R2 is the coefficient of determination

Adjusted R-square helps to prevent overfitting. It penalizes the model for
additional predictors that do not contribute significantly to explaining the
variance in the dependent variable.
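
As a quick illustration, the sketch below computes these metrics with scikit-learn for a hypothetical fitted regression model (model) and hypothetical test arrays X_test and y_test; adjusted R2 is derived manually from R2 using the formula above.

Python
import numpy as np
from sklearn.metrics import mean_squared_error, mean_absolute_error, r2_score

# model, X_test and y_test are assumed to exist already
y_pred = model.predict(X_test)

mse = mean_squared_error(y_test, y_pred)     # Mean Squared Error
mae = mean_absolute_error(y_test, y_pred)    # Mean Absolute Error
rmse = np.sqrt(mse)                          # Root Mean Squared Error
r2 = r2_score(y_test, y_pred)                # Coefficient of determination

# Adjusted R2: n observations, k predictors
n, k = X_test.shape
adj_r2 = 1 - (1 - r2) * (n - 1) / (n - k - 1)

print(f"MSE={mse:.3f}, MAE={mae:.3f}, RMSE={rmse:.3f}, R2={r2:.3f}, Adjusted R2={adj_r2:.3f}")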

Regularization Techniques for Linear Models

Lasso Regression (L1 Regularization)

Lasso Regression is a technique used for regularizing a linear regression
model; it adds a penalty term to the linear regression objective function to
prevent overfitting.
The objective function after applying lasso regression is:
[Tex]J(\theta) = \frac{1}{2m} \sum_{i=1}^{m}(\widehat{y_i} - y_i)^2 + \lambda \sum_{j=1}^{n}|\theta_j| [/Tex]

• the first term is the least squares loss, representing the squared difference
between predicted and actual values.

• the second term is the L1 regularization term, it penalizes the sum of
absolute values of the regression coefficients θj.

Ridge Regression (L2 Regularization)

Ridge regression is a linear regression technique that adds a regularization
term to the standard linear objective. Again, the goal is to prevent
overfitting by penalizing large coefficients in the linear regression
equation. It is useful when the dataset has multicollinearity, i.e. when
predictor variables are highly correlated.

The objective function after applying ridge regression is:

[Tex]J(\theta) = \frac{1}{2m} \sum_{i=1}^{m}(\widehat{y_i} - y_i)^2 + \lambda \sum_{j=1}^{n}\theta_{j}^{2} [/Tex]

• the first term is the least squares loss, representing the squared difference
between predicted and actual values.

• the second term is the L2 regularization term, it penalizes the sum of
squares of the regression coefficients θj.

Elastic Net Regression

Elastic Net Regression is a hybrid regularization technique that combines the
power of both L1 and L2 regularization in the linear regression objective.

[Tex]J(\theta) = \frac{1}{2m} \sum_{i=1}^{m}(\widehat{y_i} - y_i)^2 + \alpha \lambda \sum_{j=1}^{n}{|\theta_j|} + \frac{1}{2}(1- \alpha) \lambda \sum_{j=1}^{n} \theta_{j}^{2} [/Tex]

• the first term is the least squares loss.

• the second term is the L1 regularization term and the third is the L2 (ridge) regularization term.

• λ is the overall regularization strength.

• α controls the mix between L1 and L2 regularization.
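
As a minimal sketch (assuming scikit-learn and pre-existing arrays X and y), λ and α above correspond roughly to scikit-learn's alpha and l1_ratio parameters, and the three regularized models can be compared side by side:

Python
from sklearn.linear_model import Lasso, Ridge, ElasticNet

# lam plays the role of the overall strength λ, mix plays the role of α
lam, mix = 0.1, 0.5

lasso = Lasso(alpha=lam)                      # pure L1 penalty
ridge = Ridge(alpha=lam)                      # pure L2 penalty
enet = ElasticNet(alpha=lam, l1_ratio=mix)    # blend of L1 and L2

for name, model in [("lasso", lasso), ("ridge", ridge), ("elastic net", enet)]:
    model.fit(X, y)            # X, y are assumed to exist already
    print(name, model.coef_)   # L1-penalized models drive some coefficients to exactly zero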


Applications of Linear Regression

Linear regression is used in many different fields, including finance, economics,
and psychology, to understand and predict the behavior of a particular variable.
For example, in finance, linear regression might be used to understand the
relationship between a company's stock price and its earnings or to predict the
future value of a currency based on its past performance.

Advantages & Disadvantages of Linear Regression

Advantages of Linear Regression

• Linear regression is a relatively simple algorithm, making it easy to
understand and implement. The coefficients of the linear regression model can
be interpreted as the change in the dependent variable for a one-unit change
in the independent variable, providing insights into the relationships
between variables.

• Linear regression is computationally efficient and can handle large datasets
effectively. It can be trained quickly on large datasets, making it suitable
for real-time applications.

• Linear regression is relatively robust to outliers compared to other machine
learning algorithms. Outliers may have a smaller impact on the overall model
performance.

• Linear regression often serves as a good baseline model for comparison with
more complex machine learning algorithms.

• Linear regression is a well-established algorithm with a rich history and is
widely available in various machine learning libraries and software packages.

Disadvantages of Linear Regression

• Linear regression assumes a linear relationship between the dependent and
independent variables. If the relationship is not linear, the model may not
perform well.

• Linear regression is sensitive to multicollinearity, which occurs when there
is a high correlation between independent variables. Multicollinearity can
inflate the variance of the coefficients and lead to unstable model
predictions.
• Linear regression assumes that the features are already in a suitable form
for the model. Feature engineering may be required to transform features
into a format that can be effectively used by the model.

• Linear regression is susceptible to both overfitting and underfitting.
Overfitting occurs when the model learns the training data too well and fails
to generalize to unseen data. Underfitting occurs when the model is too
simple to capture the underlying relationships in the data.

• Linear regression provides limited explanatory power for complex
relationships between variables. More advanced machine learning techniques
may be necessary for deeper insights.

LOGISTIC REGRESSION:
What is Logistic Regression?
Logistic regression is used for binary classification. It applies the sigmoid
function, which takes the independent variables as input and produces a
probability value between 0 and 1.
For example, suppose we have two classes, Class 0 and Class 1. If the value of
the logistic function for an input is greater than 0.5 (the threshold value),
the input belongs to Class 1; otherwise it belongs to Class 0. It is referred
to as regression because it is an extension of linear regression, but it is
mainly used for classification problems.

Key Points:
• Logistic regression predicts the output of a categorical dependent
variable. Therefore, the outcome must be a categorical or discrete
value.
• It can be either Yes or No, 0 or 1, true or False, etc. but instead of
giving the exact value as 0 and 1, it gives the probabilistic values
which lie between 0 and 1.
• In Logistic regression, instead of fitting a regression line, we fit an
“S” shaped logistic function, which predicts two maximum values (0
or 1).

Logistic Function – Sigmoid Function
• The sigmoid function is a mathematical function used to map the
predicted values to probabilities.
• It maps any real value into another value within a range of 0 and 1.
The value of the logistic regression must be between 0 and 1, which
cannot go beyond this limit, so it forms a curve like the “S” form.
• The S-form curve is called the Sigmoid function or the logistic
function.
• In logistic regression, we use the concept of a threshold value,
which defines the probability of either 0 or 1: values above the
threshold tend to 1, and values below the threshold tend to 0.

Types of Logistic Regression
On the basis of the categories, Logistic Regression can be classified into three
types:
1. Binomial: In binomial Logistic regression, there can be only two
possible types of the dependent variables, such as 0 or 1, Pass or Fail,
etc.
2. Multinomial: In multinomial Logistic regression, there can be 3 or
more possible unordered types of the dependent variable, such as
“cat”, “dogs”, or “sheep”
3. Ordinal: In ordinal Logistic regression, there can be 3 or more
possible ordered types of dependent variables, such as “low”,
“Medium”, or “High”.
Assumptions of Logistic Regression
We will explore the assumptions of logistic regression, as understanding these
assumptions is important to ensure that we are applying the model
appropriately. The assumptions include:
1. Independent observations: Each observation is independent of the
others, meaning the observations are not correlated with one another.
2. Binary dependent variables: It takes the assumption that the
dependent variable must be binary or dichotomous, meaning it can
take only two values. For more than two categories SoftMax functions
are used.
3. Linearity relationship between independent variables and log odds:
The relationship between the independent variables and the log odds
of the dependent variable should be linear.
4. No outliers: There should be no outliers in the dataset.
5. Large sample size: The sample size is sufficiently large.
Terminologies involved in Logistic Regression
Here are some common terms involved in logistic regression:
• Independent variables: The input characteristics or predictor factors
applied to the dependent variable’s predictions.
• Dependent variable: The target variable in a logistic regression
model, which we are trying to predict.
• Logistic function: The formula used to represent how the
independent and dependent variables relate to one another. The
logistic function transforms the input variables into a probability value
between 0 and 1, which represents the likelihood of the dependent
variable being 1 or 0.
• Odds: The ratio of something occurring to something not
occurring. It is different from probability, as probability is the ratio
of something occurring to everything that could possibly occur.
• Log-odds: The log-odds, also known as the logit function, is the
natural logarithm of the odds. In logistic regression, the log odds of
the dependent variable are modeled as a linear combination of the
independent variables and the intercept.
• Coefficient: The logistic regression model’s estimated parameters,
show how the independent and dependent variables relate to one
another.
• Intercept: A constant term in the logistic regression model, which
represents the log odds when all independent variables are equal to
zero.
• Maximum likelihood estimation: The method used to estimate the
coefficients of the logistic regression model, which maximizes the
likelihood of observing the data given the model.

How does Logistic Regression work?
The logistic regression model transforms the continuous output of the linear
regression function into a categorical output using a sigmoid function, which
maps any real-valued combination of the independent variables into a value
between 0 and 1. This function is known as the logistic function.
Let the independent input features be

X = [[x_11, …, x_1m], [x_21, …, x_2m], …, [x_n1, …, x_nm]]

and let the dependent variable Y take only binary values, i.e. 0 or 1:

Y = 0 if Class 1, Y = 1 if Class 2

Then apply the multi-linear function to the input variables X:

z = (Σ_{i=1}^{n} w_i x_i) + b

Here x_i is the ith observation of X, w = [w_1, w_2, w_3, …, w_m] is the vector
of weights or coefficients, and b is the bias term, also known as the
intercept. This can be represented compactly as the dot product of the weights
and the input plus the bias:

z = w·X + b

Everything discussed up to this point is simply linear regression.
Sigmoid Function
Now we use the sigmoid function, where the input is z and we obtain the
predicted probability between 0 and 1, i.e. the predicted y:

σ(z) = 1 / (1 + e^{−z})

Sigmoid function

As shown in the figure above, the sigmoid function converts the continuous
variable data into a probability, i.e. a value between 0 and 1.
• σ(z) tends towards 1 as z → ∞
• σ(z) tends towards 0 as z → −∞
• σ(z) is always bounded between 0 and 1
The probability of belonging to a class can then be measured as:
P(y=1) = σ(z) and P(y=0) = 1 − σ(z)
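
A minimal sketch of this mapping in NumPy, using hypothetical weights w, bias b, and feature vector x, might look like:

Python
import numpy as np

def sigmoid(z):
    # maps any real value into the interval (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

# hypothetical weights, bias and a single feature vector
w = np.array([0.5, -1.2, 0.3])
b = 0.1
x = np.array([2.0, 1.0, -0.5])

z = np.dot(w, x) + b            # linear part: z = w·x + b
p = sigmoid(z)                  # probability that y = 1

predicted_class = int(p > 0.5)  # apply the 0.5 threshold
print(p, predicted_class)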

Logistic Regression Equation

The odds are the ratio of something occurring to something not occurring.
Odds differ from probability, which is the ratio of something occurring to
everything that could possibly occur. So the odds are:

p(x) / (1 − p(x)) = e^z

Applying the natural log to the odds, the log odds are:

log[p(x) / (1 − p(x))] = z
log[p(x) / (1 − p(x))] = w·X + b
p(x) / (1 − p(x)) = e^{w·X + b}            (exponentiate both sides)
p(x) = e^{w·X + b} · (1 − p(x))
p(x) = e^{w·X + b} − e^{w·X + b} · p(x)
p(x) + e^{w·X + b} · p(x) = e^{w·X + b}
p(x) (1 + e^{w·X + b}) = e^{w·X + b}
p(x) = e^{w·X + b} / (1 + e^{w·X + b})

Then the final logistic regression equation will be:

p(X; b, w) = e^{w·X + b} / (1 + e^{w·X + b}) = 1 / (1 + e^{−(w·X + b)})

Likelihood Function for Logistic Regression


The predicted probabilities will be:
• for y = 1, the predicted probability is p(X; b, w) = p(x)
• for y = 0, the predicted probability is 1 − p(X; b, w) = 1 − p(x)

L(b, w) = ∏_{i=1}^{n} p(x_i)^{y_i} (1 − p(x_i))^{1 − y_i}

Taking natural logs on both sides:

log L(b, w) = Σ_{i=1}^{n} [ y_i log p(x_i) + (1 − y_i) log(1 − p(x_i)) ]
            = Σ_{i=1}^{n} [ y_i log p(x_i) + log(1 − p(x_i)) − y_i log(1 − p(x_i)) ]
            = Σ_{i=1}^{n} log(1 − p(x_i)) + Σ_{i=1}^{n} y_i log[ p(x_i) / (1 − p(x_i)) ]
            = Σ_{i=1}^{n} −log(1 + e^{w·x_i + b}) + Σ_{i=1}^{n} y_i (w·x_i + b)

Gradient of the log-likelihood function


To find the maximum likelihood estimates, we differentiate with respect to w:

∂ log L(b, w) / ∂w_j = −Σ_{i=1}^{n} [ e^{w·x_i + b} / (1 + e^{w·x_i + b}) ] x_ij + Σ_{i=1}^{n} y_i x_ij
                     = −Σ_{i=1}^{n} p(x_i; b, w) x_ij + Σ_{i=1}^{n} y_i x_ij
                     = Σ_{i=1}^{n} (y_i − p(x_i; b, w)) x_ij
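
The gradient above can be turned directly into a training loop. Below is a minimal gradient-ascent sketch in NumPy, assuming a feature matrix X of shape (n, m) and a binary label vector y; it illustrates the update rule and is not intended as a production implementation.

Python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_logistic(X, y, lr=0.1, n_iter=1000):
    n, m = X.shape
    w = np.zeros(m)      # weights
    b = 0.0              # bias / intercept
    for _ in range(n_iter):
        p = sigmoid(X @ w + b)     # predicted probabilities p(x_i; b, w)
        grad_w = X.T @ (y - p)     # Σ (y_i − p_i) x_ij, the gradient of the log-likelihood
        grad_b = np.sum(y - p)
        w += lr * grad_w / n       # gradient ascent on the log-likelihood
        b += lr * grad_b / n
    return w, b

# Usage (X and y assumed to exist): w, b = fit_logistic(X, y)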
Code Implementation for Logistic Regression
Binomial Logistic regression:
The target variable can have only 2 possible types, "0" or "1", which may
represent "win" vs "loss", "pass" vs "fail", "dead" vs "alive", etc. In this
case, the sigmoid function is used, as discussed above.
First, import the necessary libraries. This Python code shows how to use the
breast cancer dataset to implement a Logistic Regression model for
classification.

Python
# import the necessary libraries
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# load the breast cancer dataset


X, y = load_breast_cancer(return_X_y=True)

# split the train and test dataset


X_train, X_test,\
y_train, y_test = train_test_split(X, y,
test_size=0.20,
random_state=23)
# LogisticRegression
clf = LogisticRegression(random_state=0)
clf.fit(X_train, y_train)
# Prediction
y_pred = clf.predict(X_test)

acc = accuracy_score(y_test, y_pred)


print("Logistic Regression model accuracy (in %):", acc*100)
Output:
Logistic Regression model accuracy (in %): 95.6140350877193

Multinomial Logistic Regression:


Target variable can have 3 or more possible types which are not ordered (i.e.
types have no quantitative significance) like “disease A” vs “disease B” vs
“disease C”.
In this case, the softmax function is used in place of the sigmoid
function. Softmax function for K classes will be:
softmax(z_i) = e^{z_i} / Σ_{j=1}^{K} e^{z_j}
Here, K represents the number of elements in the vector z, and i, j iterates over
all the elements in the vector.
Then the probability for class c will be:

P(Y = c | X = x) = e^{w_c·x + b_c} / Σ_{k=1}^{K} e^{w_k·x + b_k}
In Multinomial Logistic Regression, the output variable can have more than
two possible discrete outputs. Consider the Digit Dataset.
Python
from sklearn.model_selection import train_test_split
from sklearn import datasets, linear_model, metrics

# load the digit dataset


digits = datasets.load_digits()

# defining feature matrix(X) and response vector(y)


X = digits.data
y = digits.target

# splitting X and y into training and testing sets


X_train, X_test,\
y_train, y_test = train_test_split(X, y,
test_size=0.4,
random_state=1)

# create logistic regression object


reg = linear_model.LogisticRegression()

# train the model using the training sets


reg.fit(X_train, y_train)

# making predictions on the testing set


y_pred = reg.predict(X_test)

# comparing actual response values (y_test)


# with predicted response values (y_pred)
print("Logistic Regression model accuracy(in %):",
metrics.accuracy_score(y_test, y_pred)*100)
Output:
Logistic Regression model accuracy(in %): 96.52294853963839

How to Evaluate Logistic Regression Model?


We can evaluate the logistic regression model using the following metrics:
• Accuracy: Accuracy provides the proportion of correctly classified
instances.
Accuracy = (TruePositives + TrueNegatives) / Total
• Precision: Precision focuses on the accuracy of positive predictions.
Precision = TruePositives / (TruePositives + FalsePositives)
• Recall (Sensitivity or True Positive Rate): Recall measures the
proportion of correctly predicted positive instances among all actual
positive instances.
Recall = TruePositives / (TruePositives + FalseNegatives)
• F1 Score: F1 score is the harmonic mean of precision and recall.
F1 Score = 2 * (Precision * Recall) / (Precision + Recall)
• Area Under the Receiver Operating Characteristic Curve (AUC-
ROC): The ROC curve plots the true positive rate against the false
positive rate at various thresholds. AUC-ROC measures the area under
this curve, providing an aggregate measure of a model’s performance
across different classification thresholds.
• Area Under the Precision-Recall Curve (AUC-PR): Similar to
AUC-ROC, AUC-PR measures the area under the precision-recall
curve, providing a summary of a model’s performance across different
precision-recall trade-offs.
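
As a brief sketch, these metrics can be computed with scikit-learn, reusing the clf, X_test and y_test objects from the breast cancer example above; note that the AUC scores need predicted probabilities rather than hard class labels.

Python
from sklearn.metrics import (accuracy_score, precision_score, recall_score,
                             f1_score, roc_auc_score, average_precision_score)

# hard class predictions and predicted probabilities for the positive class
y_pred = clf.predict(X_test)
y_proba = clf.predict_proba(X_test)[:, 1]

print("Accuracy :", accuracy_score(y_test, y_pred))
print("Precision:", precision_score(y_test, y_pred))
print("Recall   :", recall_score(y_test, y_pred))
print("F1 score :", f1_score(y_test, y_pred))
print("AUC-ROC  :", roc_auc_score(y_test, y_proba))
print("AUC-PR   :", average_precision_score(y_test, y_proba))  # area under the precision-recall curve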

Precision-Recall Tradeoff in Logistic Regression Threshold Setting


Logistic regression becomes a classification technique only when a decision
threshold is brought into the picture. The setting of the threshold value is a very
important aspect of Logistic regression and is dependent on the classification
problem itself.
The decision for the threshold value is majorly affected by the values of
precision and recall. Ideally, we want both precision and recall to be 1, but
this is seldom the case.
In the case of a precision-recall tradeoff, we use the following arguments to
decide upon the threshold:
1. Low Precision/High Recall: In applications where we want to
reduce the number of false negatives without necessarily reducing the
number of false positives, we choose a decision threshold that gives a low
value of precision or a high value of recall. For example, in a cancer
diagnosis application, we do not want any affected patient to be
classified as not affected, without giving much heed to whether a patient
is wrongfully diagnosed with cancer. This is because the absence of cancer
can be confirmed by further medical tests, but the presence of the disease
cannot be detected in an already rejected candidate.
2. High Precision/Low Recall: In applications where we want to
reduce the number of false positives without necessarily reducing the
number of false negatives, we choose a decision value that has a high
value of Precision or a low value of Recall. For example, if we are
classifying customers whether they will react positively or negatively
to a personalized advertisement, we want to be absolutely sure that the
customer will react positively to the advertisement because otherwise,
a negative reaction can cause a loss of potential sales from the
customer.
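
As an illustrative sketch (continuing with the fitted clf, X_test and y_test from the earlier example), the threshold can be moved away from the default 0.5 by thresholding the predicted probabilities directly:

Python
from sklearn.metrics import precision_score, recall_score

y_proba = clf.predict_proba(X_test)[:, 1]   # probability of the positive class

for threshold in [0.3, 0.5, 0.7]:
    y_pred_t = (y_proba >= threshold).astype(int)   # custom decision threshold
    print(f"threshold={threshold}: "
          f"precision={precision_score(y_test, y_pred_t):.3f}, "
          f"recall={recall_score(y_test, y_pred_t):.3f}")

# Lower thresholds favor recall (fewer false negatives);
# higher thresholds favor precision (fewer false positives).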

Differences Between Linear and Logistic Regression

The difference between linear regression and logistic regression is that
linear regression outputs a continuous value that can be anything, while
logistic regression predicts the probability that an instance belongs to a
given class.

• Linear regression is used to predict a continuous dependent variable using a
given set of independent variables; logistic regression is used to predict a
categorical dependent variable using a given set of independent variables.

• Linear regression is used for solving regression problems; logistic
regression is used for solving classification problems.

• In linear regression we predict the values of continuous variables; in
logistic regression we predict the values of categorical variables.

• In linear regression we find the best-fit line; in logistic regression we
find an S-curve.

• Linear regression uses the least squares method to estimate its parameters;
logistic regression uses the maximum likelihood estimation method.

• The output of linear regression must be a continuous value, such as price or
age; the output of logistic regression must be a categorical value, such as
0 or 1, Yes or No.

• Linear regression requires a linear relationship between the dependent and
independent variables; logistic regression does not require a linear
relationship.

• In linear regression there may be collinearity between the independent
variables; in logistic regression there should be little to no collinearity
between the independent variables.

Types of Regression Techniques in ML



A regression problem is when the output variable is a real or continuous value,
such as "salary" or "weight". Many different models can be used; the simplest is
linear regression, which tries to fit the data with the best hyperplane that
passes through the points.

What is Regression Analysis?

Regression Analysis is a statistical process for estimating the relationships
between dependent variables (criterion variables) and one or more independent
variables (predictors). Regression analysis is generally used when the target
variable in the dataset is continuous. It explains changes in the criterion in
terms of changes in selected predictors: the conditional expectation of the
criterion, i.e. the average value of the dependent variable, is estimated for
given values of the independent variables. Three major uses for regression
analysis are determining the strength of predictors, forecasting an effect,
and trend forecasting.

What is the purpose of using Regression Analysis?

There are times when we would like to analyze the effect of different
independent features on the target (dependent) feature. This helps us make
decisions that can move the target variable in the desired direction.
Regression analysis is heavily based on statistics and hence gives reliable
results; for this reason, regression models are used to find both linear and
non-linear relationships between the independent variables and the dependent,
or target, variable.

Types of Regression Techniques

Along with the development of the machine learning domain, regression analysis
techniques have gained popularity and have evolved well beyond the simple
y = mx + c. There are several types of regression techniques, each suited to
different types of data and different types of relationships. The main types
of regression techniques are:
1. Linear Regression
2. Polynomial Regression
3. Stepwise Regression
4. Decision Tree Regression
5. Random Forest Regression
6. Support Vector Regression
7. Ridge Regression
8. Lasso Regression
9. ElasticNet Regression
10. Bayesian Linear Regression
Linear Regression
Linear regression is used for predictive analysis. Linear regression is a linear
approach for modeling the relationship between the criterion or the scalar
response and the multiple predictors or explanatory variables. Linear regression
focuses on the conditional probability distribution of the response given the
values of the predictors. For linear regression, there is a danger of overfitting. The
formula for linear regression is:
Syntax:
y = θx + b
where,
• θ – It is the model weights or parameters
• b – It is known as the bias.

This is the most basic form of regression analysis and is used to model a linear
relationship between a single dependent variable and one or more independent
variables.
Here, a linear regression model is instantiated to fit a linear relationship between
input features (X) and target values (y). This code is used for simple
demonstration of the approach.

from sklearn.linear_model import LinearRegression

# Create a linear regression model


model = LinearRegression()

# Fit the model to the data


model.fit(X, y)

# Predict the response for a new data point


y_pred = model.predict(X_new)
Note: This code demonstrates the basic workflow of creating, training, and
utilizing a linear regression model for predictive modeling tasks.

Polynomial Regression
This is an extension of linear regression and is used to model a non-linear
relationship between the dependent variable and independent variables. Here as
well syntax remains the same but now in the input variables we include some
polynomial or higher degree terms of some already existing features as well.
Linear regression was only able to fit a linear model to the data at hand but
with polynomial features, we can easily fit some non-linear relationship between
the target as well as input features.
Here is the code for simple demonstration of the Polynomial regression approach.

from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures
from sklearn.pipeline import make_pipeline

# Create a polynomial regression model
# (scikit-learn has no PolynomialRegression class; polynomial features
# are combined with a linear model in a pipeline instead)
model = make_pipeline(PolynomialFeatures(degree=2), LinearRegression())

# Fit the model to the data
model.fit(X, y)

# Predict the response for a new data point
y_pred = model.predict(X_new)
Note: This code demonstrates the basic workflow of creating, training, and
utilizing a Polynomial regression model for predictive modeling tasks.

Stepwise Regression
Stepwise regression is used for fitting regression models with predictive
models. It is carried out automatically: at each step, a variable is added to
or removed from the set of explanatory variables. The approaches for stepwise
regression are forward selection, backward elimination, and bidirectional
elimination.

Here is the code for a simple demonstration of the stepwise regression approach.

from sklearn.linear_model import LinearRegression
from sklearn.feature_selection import SequentialFeatureSelector

# scikit-learn has no StepwiseLinearRegression class; forward/backward
# stepwise selection can be approximated with SequentialFeatureSelector
selector = SequentialFeatureSelector(LinearRegression(),
                                     direction='forward')

# Select features and fit a linear model on the chosen subset
selector.fit(X, y)
model = LinearRegression()
model.fit(selector.transform(X), y)

# Predict the response for a new data point
y_pred = model.predict(selector.transform(X_new))
Note: This code demonstrates the basic workflow of creating, training, and
utilizing a Stepwise regression model for predictive modeling tasks.

Decision Tree Regression


A Decision Tree is the most powerful and popular tool for classification and
prediction. A Decision tree is a flowchart-like tree structure, where each internal
node denotes a test on an attribute, each branch represents an outcome of the test,
and each leaf node (terminal node) holds a class label. Decision tree
regression is a non-parametric method used to predict a continuous outcome.
Here is the code for simple demonstration of the Decision Tree regression
approach.

from sklearn.tree import DecisionTreeRegressor

# Create a decision tree regression model


model = DecisionTreeRegressor()

# Fit the model to the data


model.fit(X, y)

# Predict the response for a new data point


y_pred = model.predict(X_new)
Note: This code demonstrates the basic workflow of creating, training, and
utilizing a Decision Tree regression model for predictive modeling tasks.

Random Forest Regression


Random Forest is an ensemble technique capable of performing both regression
and classification tasks with the use of multiple decision trees and a technique
called Bootstrap and Aggregation, commonly known as bagging. The basic idea
behind this is to combine multiple decision trees in determining the final output
rather than relying on individual decision trees.
Random Forest has multiple decision trees as base learning models. We randomly
perform row sampling and feature sampling from the dataset forming sample
datasets for every model. This part is called Bootstrap.
Here is the code for simple demonstration of the Random Forest regression
approach.

from sklearn.ensemble import RandomForestRegressor


# Create a random forest regression model
model = RandomForestRegressor(n_estimators=100)

# Fit the model to the data


model.fit(X, y)

# Predict the response for a new data point


y_pred = model.predict(X_new)
Note: This code demonstrates the basic workflow of creating, training, and
utilizing a Random Forest regression model for predictive modeling tasks.

Support Vector Regression (SVR)


Support vector regression (SVR) is a type of support vector machine (SVM) that
is used for regression tasks. It tries to find a function that best predicts the
continuous output value for a given input value.
SVR can use both linear and non-linear kernels. A linear kernel is a simple dot
product between two input vectors, while a non-linear kernel is a more complex
function that can capture more intricate patterns in the data. The choice of kernel
depends on the data’s characteristics and the task’s complexity.
Here is the code for simple demonstration of the Support vector regression
approach.

from sklearn.svm import SVR

# Create a support vector regression model


model = SVR(kernel='linear')

# Fit the model to the data


model.fit(X, y)

# Predict the response for a new data point


y_pred = model.predict(X_new)
Note: This code demonstrates the basic workflow of creating, training, and
utilizing a Support vector regression model for predictive modeling tasks.

Ridge Regression
Ridge regression is a technique for analyzing multiple regression data. When
multicollinearity occurs, least squares estimates are unbiased, but their
variances are large, so they may be far from the true values. Ridge regression
is a regularized linear regression model: it tries to reduce the model
complexity by adding a penalty term to the cost function. A degree of bias is
added to the regression estimates, and as a result, ridge regression reduces
the standard errors.

Here is the code for simple demonstration of the Ridge regression approach.
from sklearn.linear_model import Ridge

# Create a ridge regression model


model = Ridge(alpha=0.1)

# Fit the model to the data


model.fit(X, y)

# Predict the response for a new data point


y_pred = model.predict(X_new)
Note: This code demonstrates the basic workflow of creating, training, and
utilizing a Ridge regression model for predictive modeling tasks.

Lasso Regression
Lasso regression is a regression analysis method that performs both variable
selection and regularization. Lasso regression uses soft thresholding. Lasso
regression selects only a subset of the provided covariates for use in the final
model.
This is another regularized linear regression model, it works by adding a penalty
term to the cost function, but it tends to zero out some features’ coefficients,
which makes it useful for feature selection.
Here is the code for simple demonstration of the Lasso regression approach.

from sklearn.linear_model import Lasso

# Create a lasso regression model


model = Lasso(alpha=0.1)

# Fit the model to the data


model.fit(X, y)
# Predict the response for a new data point
y_pred = model.predict(X_new)
Note: This code demonstrates the basic workflow of creating, training, and
utilizing a Lasso regression model for predictive modeling tasks.

ElasticNet Regression
Linear Regression suffers from overfitting and can't deal with collinear data.
When there are many features in the dataset, and some of them are not relevant
to the predictive model, the model becomes more complex and makes inaccurate
predictions on the test set (overfitting). Such a model with high variance does
not generalize well on new data. So, to deal with these issues, we include both
the L2 and L1 norm regularization terms to get the benefits of both Ridge and
Lasso at the same time. The resultant model has better predictive power than
Lasso. It performs feature selection and also makes the hypothesis simpler. The
modified cost function for Elastic-Net Regression is given below:

cost = Σ_{i} (y_i − ŷ_i)² + λ1 Σ_{j=1}^{n} |w_j| + λ2 Σ_{j=1}^{n} w_j²
where,
• w(j) represents the weight for the jth feature.
• n is the number of features in the dataset.
• lambda1 is the regularization strength for the L1 norm.
• lambda2 is the regularization strength for the L2 norm.
Here is the code for simple demonstration of the Elasticnet regression approach.

from sklearn.linear_model import ElasticNet

# Create an elastic net regression model


model = ElasticNet(alpha=0.1, l1_ratio=0.5)

# Fit the model to the data


model.fit(X, y)

# Predict the response for a new data point


y_pred = model.predict(X_new)
Note: This code demonstrates the basic workflow of creating, training, and
utilizing a Elastic Net regression model for predictive modeling tasks.
Bayesian Linear Regression
As the name suggests, this algorithm is purely based on Bayes' Theorem. For
this reason, we do not use the least squares method to determine the
coefficients of the regression model. Instead, the technique used to find the
model weights and parameters relies on the posterior distribution of the
parameters given the data, and this provides an extra stability factor to the
resulting regression model.
Here is the code for a simple demonstration of the Bayesian Linear regression
approach.
from sklearn.linear_model import BayesianRidge

# Create a Bayesian linear regression model
# (scikit-learn exposes Bayesian linear regression as BayesianRidge)
model = BayesianRidge()

# Fit the model to the data


model.fit(X, y)

# Predict the response for a new data point


y_pred = model.predict(X_new)
