Lesson 2 - Machine Learning
Lesson 2 - Machine Learning
z
OBJECTIVES
Supervised Learning
- Definition of Supervised Learning
- Advantage and Disadvantage of Supervised
Learning
- Process of Supervised Learning
- Types of Supervised Learning
2
WHAT IS
z
SUPERVISED
LEARNING?
4
SUPERVISED LEARNING
Supervised learning is a branch of machine learning
that involves training an algorithm on a labeled
dataset, where each training example consists of an
input-output pair. The "supervision" in supervised
learning refers to the presence of the correct outputs
(labels) that guide the learning process. The
primary objective is for the model to learn the
mapping from inputs to outputs so that it can predict
the labels for new, unseen data.
5
ADVANTAGE
Predictive Accuracy: Supervised learning can achieve high levels of
accuracy, especially when trained with a large amount of labeled data.
Models can generalize well to unseen data if properly tuned.
Clear Performance Metrics: The performance of supervised learning
models can be evaluated using various metrics (e.g., accuracy, precision,
recall, F1-score), making it easier to assess and compare different models.
Diverse Applications: Supervised learning techniques can be applied
across various domains, including finance (credit scoring), healthcare
(diagnosis prediction), and marketing (customer segmentation).
Algorithm Variety: There are numerous algorithms available, allowing
practitioners to choose the most suitable one for their specific problem,
whether it’s a linear model, tree-based model, or deep learning approach.
6
z
DISADVANTAGE
Data Dependency: Supervised learning requires a large and
representative labeled dataset. Obtaining this data can be resource-
intensive and time-consuming.
Risk of Overfitting: If a model becomes too complex, it may perform
exceptionally well on training data but poorly on unseen data. This
phenomenon, known as overfitting, requires techniques like
regularization or cross-validation to mitigate.
Limited to Available Data: The model's performance is inherently tied
to the quality and diversity of the training data. If the training data is
biased or unrepresentative, the model may produce biased or
inaccurate predictions.
Computationally Intensive: Some algorithms, especially those
involving deep learning, can be computationally intensive, requiring
significant hardware resources and longer training times.
PROCESS
z
OF SUPERVISED
LEARNING
Data Collection: Gather a dataset that includes both features (inputs) and
labels (outputs). This dataset can come from various sources, such as
databases, web scraping, or APIs.
Data Preprocessing: Clean and prepare the data for analysis. This includes:
•Handling missing values (imputation or removal).
•Normalizing or scaling features (especially important for algorithms sensitive
to feature magnitude).
•Encoding categorical variables (using techniques like one-hot encoding).
•Splitting the data into training and test sets to evaluate model performance.
Model Selection: Choose an appropriate supervised learning algorithm
based on the nature of the problem (classification or regression). This choice
can be guided by factors such as the size of the dataset, the dimensionality of
the data, and the specific application requirements.
TYPES OF SUPERVISED
z
LEARNING
EXAMPLE CODE
z
EXAMPLE CODE
OUTPUT
TYPES
z OF SUPERVISED LEARNING
EXAMPLE CODE
OUTPUT