ML-UNIT2
Input Data:
Supervised: labeled data (input-output pairs).
Semi-Supervised: mostly unlabeled, with a small portion labeled.
Unsupervised: only unlabeled data.
Training Complexity:
Supervised: requires a large amount of labeled data.
Semi-Supervised: balances labeled and unlabeled data for efficiency.
Unsupervised: easier to find data but harder to label outputs.
Key Differences:
SSL: Uses a small set of labeled data and a large set of unlabeled data to improve learning.
USL: Focuses only on uncovering patterns in unlabeled data without specific guidance.
REINFORCEMENT LEARNING
Reinforcement Learning is a type of machine learning where an agent learns to make decisions by
interacting with an environment. It receives feedback in the form of rewards or penalties based on its
actions and seeks to maximize cumulative rewards over time.
In Reinforcement Learning, the agent learns automatically from feedback without any labeled data, unlike supervised learning.
Since there is no labeled data, the agent must learn from its own experience.
RL solves a specific type of problem where decision making is sequential, and the goal is long-term,
such as game-playing, robotics, etc.
The agent interacts with the environment and explores it by itself. The primary goal of an agent in reinforcement learning is to improve its performance by collecting the maximum cumulative positive reward.
Key Components:
1. Agent: The learner or decision-maker that interacts with the environment.
2. Environment: The world the agent operates in and receives feedback from.
3. State (S): The current situation the agent observes.
4. Action (A): Possible moves the agent can take in a given state.
5. Reward (R): The feedback signal (positive or negative) received after an action.
6. Policy (π): The strategy that the agent follows to choose actions.
7. Value Function (V): The expected cumulative reward from a given state.
8. Q-Function (Q): The expected cumulative reward from taking an action in a given state.
How RL Works:
1. The agent observes the current state of the environment.
2. It selects an action according to its policy.
3. The environment returns a reward and transitions to a new state.
4. The agent updates its policy based on the received reward and the new state, then repeats the loop (a minimal sketch of this loop follows below).
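This loop can be sketched in a few lines of Python. The environment object below is hypothetical, with Gym-style reset() and step(action) methods; the step limit and the idea of passing the policy in as a function are illustrative assumptions, not part of any specific algorithm.

```python
def run_episode(env, policy, max_steps=100):
    """Run one agent-environment interaction episode and return the total reward."""
    state = env.reset()                          # 1. observe the initial state
    total_reward = 0.0
    for _ in range(max_steps):
        action = policy(state)                   # 2. choose an action according to the policy
        state, reward, done = env.step(action)   # 3. environment returns reward and new state
        total_reward += reward                   # 4. the agent would also update its policy here
        if done:
            break
    return total_reward
```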
Model-Free RL: The agent learns from experience without a model of the environment.
o Value-based:
The value-based approach aims to find the optimal value function, i.e., the maximum value achievable at a state under any policy. The agent therefore estimates the long-term return from any state s while following policy π.
o Policy-based:
The policy-based approach finds the optimal policy for maximizing future rewards without using a value function. The agent directly learns a policy such that the action taken at each step helps maximize the future reward.
The policy-based approach has two main types of policy (a small sketch of both follows after the model-based description below):
Deterministic: The policy (π) always produces the same action in a given state.
Stochastic: The policy defines a probability distribution over actions and samples the action from it.
Model-Based RL: The agent uses a model of the environment to plan actions.
o In the model-based approach, a model of the environment is created, and the agent explores this model to learn about the environment and plan its actions. There is no single algorithm for this approach because the model representation differs from one environment to another.
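A minimal sketch of the deterministic and stochastic policy types described above, using NumPy. The state encoding, the two-action set, and the sigmoid preference are made-up illustrations, not taken from any standard algorithm.

```python
import numpy as np

ACTIONS = ["left", "right"]
rng = np.random.default_rng(0)

def deterministic_policy(state):
    # Deterministic: the same action is always returned for a given state.
    return "right" if state >= 0 else "left"

def stochastic_policy(state):
    # Stochastic: an action is sampled from a state-dependent probability distribution.
    p_right = 1.0 / (1.0 + np.exp(-state))      # illustrative preference for "right"
    return rng.choice(ACTIONS, p=[1.0 - p_right, p_right])

print(deterministic_policy(0.5), stochastic_policy(0.5))
```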
Common Algorithms:
Q-Learning: A model-free algorithm that learns the value of state-action pairs (a tabular sketch follows below).
Policy Gradient Methods: Directly optimize the policy, rather than the value function.
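As an illustration of the Q-Learning algorithm named above, here is a tabular sketch of the update Q(s,a) ← Q(s,a) + α[r + γ·max_a' Q(s',a') − Q(s,a)]. The environment interface (reset(), step(), an actions list) and the hyperparameter values are assumptions made for the example, not a reference implementation.

```python
import random
from collections import defaultdict

def q_learning(env, episodes=500, alpha=0.1, gamma=0.99, epsilon=0.1):
    """Tabular Q-learning for a hypothetical environment with reset()/step() and env.actions."""
    Q = defaultdict(float)                      # Q[(state, action)] -> expected cumulative reward
    for _ in range(episodes):
        state, done = env.reset(), False
        while not done:
            # Epsilon-greedy: explore occasionally, otherwise act greedily on Q.
            if random.random() < epsilon:
                action = random.choice(env.actions)
            else:
                action = max(env.actions, key=lambda a: Q[(state, a)])
            next_state, reward, done = env.step(action)
            # Move Q(s, a) toward the bootstrapped target r + gamma * max_a' Q(s', a').
            best_next = max(Q[(next_state, a)] for a in env.actions)
            Q[(state, action)] += alpha * (reward + gamma * best_next - Q[(state, action)])
            state = next_state
    return Q
```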
Applications: game playing, robotics, and autonomous driving (see the self-driving car example below), among other sequential decision-making tasks.
Strengths: Learns from interactions with the environment.
Challenges: Requires a lot of data and exploration.
Example:
Overview:
Self-driving cars use RL to learn optimal driving policies by interacting with a simulated or real-world
environment. The goal is to maximize safety, efficiency, and passenger comfort while minimizing
accidents and fuel consumption.
Key Components:
Component Explanation
State (S): Current sensor readings (e.g., location, speed, distance from other cars).
Reward (R): Positive rewards for safe driving and reaching destinations efficiently; negative rewards for collisions, traffic violations, and discomfort.
1. Observation: The car observes its surroundings using sensors like LiDAR, cameras, and GPS.
2. Decision-Making: Based on its current state, the car chooses an action (e.g., accelerate,
brake, turn).
3. Feedback: The environment returns a reward or penalty, and the car observes the resulting new state.
4. Learning: The car updates its policy to maximize cumulative rewards using algorithms like Deep Q-Networks (DQN) or Policy Gradient methods (a toy encoding of the reward signal is sketched below).
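The Reward (R) row of the table above could be encoded as a simple shaping function. Every signal and weight below (collision flag, violation flag, progress, jerk) is a hypothetical illustration, not part of any real driving stack.

```python
def driving_reward(collided, violated_rule, progress_m, jerk):
    """Toy reward: reward progress and comfort, penalize collisions and violations."""
    reward = 0.1 * progress_m       # efficiency: progress toward the destination (metres)
    reward -= 0.5 * abs(jerk)       # comfort: penalize abrupt changes in acceleration
    if violated_rule:
        reward -= 10.0              # traffic violation penalty
    if collided:
        reward -= 100.0             # large penalty for a collision
    return reward

print(driving_reward(collided=False, violated_rule=False, progress_m=12.0, jerk=0.2))
```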
Comparison with supervised and unsupervised learning:
Data: Reinforcement Learning uses no labeled data; feedback comes from rewards and penalties. Supervised Learning requires labeled data (input-output pairs). Unsupervised Learning uses only input data (no labels).
Definition: The model is trained on labeled data where each input has a corresponding output
(label).
Examples:
Spam Detection: classify emails as spam or not spam from labeled examples.
Medical Diagnosis: predict a diagnosis (label) from labeled patient records.
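A minimal supervised-learning sketch in the spirit of the spam-detection example, using scikit-learn. The four inline emails and their labels are fabricated purely for illustration.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression

# Labeled data: each input email has a corresponding output label (1 = spam, 0 = not spam).
emails = ["win a free prize now", "meeting at 10am tomorrow",
          "cheap loans click here", "project report attached"]
labels = [1, 0, 1, 0]

vectorizer = CountVectorizer()
X = vectorizer.fit_transform(emails)             # turn text into word-count features
model = LogisticRegression().fit(X, labels)      # learn the input-to-label mapping

new_email = vectorizer.transform(["claim your free prize"])
print(model.predict(new_email))                  # predicted label for an unseen email
```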
Definition: The model is trained on a small amount of labeled data and a large amount of unlabeled
data.
Examples:
Speech Recognition: a small set of transcribed audio combined with a large set of untranscribed recordings.
Customer Segmentation: a few labeled customer profiles combined with a large pool of unlabeled customer records.
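A minimal semi-supervised sketch using scikit-learn's SelfTrainingClassifier, where the few labeled points carry 0/1 labels and the many unlabeled points are marked -1. The one-dimensional toy data is fabricated.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.semi_supervised import SelfTrainingClassifier

# Small labeled set plus a larger unlabeled set (label -1 means "unlabeled").
X = np.array([[0.0], [0.2], [0.9], [1.0], [0.1], [0.8], [0.15], [0.85]])
y = np.array([0, 0, 1, 1, -1, -1, -1, -1])

# The base classifier iteratively pseudo-labels its most confident unlabeled samples.
model = SelfTrainingClassifier(LogisticRegression()).fit(X, y)
print(model.predict([[0.05], [0.95]]))   # expected output: [0 1]
```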
Definition: The model is trained on completely unlabeled data to discover patterns or structures.
Examples:
Anomaly Detection: flag unusual data points (e.g., fraudulent transactions) without labeled examples.
Market Basket Analysis:
o Output: Discover associations (e.g., "people who buy bread also buy butter").
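A minimal unsupervised sketch: K-Means groups unlabeled 2-D points with no target labels anywhere. The six points are fabricated to form two obvious clusters.

```python
import numpy as np
from sklearn.cluster import KMeans

# Unlabeled data only: two loose groups of 2-D points.
X = np.array([[1.0, 1.1], [0.9, 1.0], [1.2, 0.8],
              [8.0, 8.1], [7.9, 8.3], [8.2, 7.9]])

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print(kmeans.labels_)            # cluster assignments discovered from structure alone
print(kmeans.cluster_centers_)   # centres of the two discovered groups
```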
Selecting an appropriate machine learning (ML) technique depends on the problem type, the data
available, and the desired outcomes. Here’s a step-by-step guide to help you make the right choice:
Step 1: Define the Problem Type
1. Supervised Learning
o Use when: You have labeled data and need to predict outcomes.
o Problem types: classification (predict a category) and regression (predict a continuous value).
2. Unsupervised Learning
o Use when: Data is unlabeled, and you want to find hidden patterns.
o Problem types: clustering, dimensionality reduction, association rule mining, and anomaly detection.
3. Semi-Supervised Learning
o Use when: You have a small labeled dataset and a large unlabeled dataset.
4. Reinforcement Learning
o Use when: An agent must learn by interacting with an environment and maximizing rewards (sequential decision-making).
Step 2: Consider the Data Available
1. Size of Data:
o Small datasets might need simpler models (e.g., SVM, Decision Trees).
2. Type of Data:
o Structured data (tabular): Use algorithms like Random Forest, Gradient Boosting.
o Unstructured data (images, text): Use deep learning (CNNs for images,
RNNs/Transformers for text).
3. Label Availability:
o Fully labeled data suits supervised learning; a small labeled portion suits semi-supervised learning; no labels points to unsupervised learning.
Step 3: Match the Desired Outcome
1. Predictions: use supervised learning (classification or regression).
2. Clustering: use unsupervised learning.
3. Anomaly Detection: use unsupervised (or semi-supervised) methods.
4. Decision-Making: use reinforcement learning.
1. Linear-Based Models
These models assume a linear relationship between input features and the output.
Key Characteristics:
Examples:
Linear Regression: predicts a continuous output as a weighted sum of the input features (a least-squares sketch follows below).
Logistic Regression: predicts class probabilities from a linear combination of the features.
Support Vector Machine (SVM):
o Finds a hyperplane that separates data into classes with maximum margin.
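A short NumPy sketch of the linear assumption: fitting y ≈ w·x + b with ordinary least squares. The synthetic data (true slope 2, intercept 1, small noise) is fabricated for illustration.

```python
import numpy as np

# Synthetic data generated from a roughly linear relationship y = 2x + 1 + noise.
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=50)
y = 2.0 * x + 1.0 + rng.normal(0, 0.5, size=50)

# Ordinary least-squares fit of slope w and intercept b.
A = np.column_stack([x, np.ones_like(x)])
(w, b), *_ = np.linalg.lstsq(A, y, rcond=None)
print(f"learned slope = {w:.2f}, intercept = {b:.2f}")   # close to 2 and 1
```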
2. Logic-Based Models
These models use logical rules, decision trees, and algebraic equations to map inputs to outputs.
Key Characteristics:
Examples:
Decision Trees: split the data with a sequence of if/then tests on feature values (a small sketch follows below).
Rule-Based Models: apply explicit if/then rules to assign outputs.
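A small decision-tree sketch showing the if/then style of logic-based models with scikit-learn. The feature values and labels are a toy illustration.

```python
from sklearn.tree import DecisionTreeClassifier, export_text

# Toy features: [age, income in thousands]; label: 1 = buys the product, 0 = does not.
X = [[22, 20], [25, 30], [47, 60], [52, 80], [46, 55], [56, 90]]
y = [0, 0, 1, 1, 1, 1]

tree = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X, y)
print(export_text(tree, feature_names=["age", "income"]))   # human-readable if/then rules
```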
3. Probabilistic Models
Key Characteristics:
Examples:
Bayesian Networks: represent variables and their conditional dependencies as a directed graph.
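Bayesian networks are hard to show in a few lines, so the sketch below uses a simpler probabilistic model, Gaussian Naive Bayes, to illustrate predictions expressed as class probabilities. The one-feature toy data is fabricated.

```python
from sklearn.naive_bayes import GaussianNB

# Toy data: one feature, two classes; the model estimates P(class | feature).
X = [[1.0], [1.2], [0.8], [4.0], [4.3], [3.9]]
y = [0, 0, 0, 1, 1, 1]

nb = GaussianNB().fit(X, y)
print(nb.predict_proba([[1.1], [4.1]]))   # class probabilities rather than hard labels
```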
1. Supervised Learning Models
Models trained on labeled data, where each input has a corresponding output label.
Types:
Classification (predict a discrete category):
o Examples:
Logistic Regression
Decision Trees
Random Forest
Regression (predict a continuous value):
o Examples:
Linear Regression
Polynomial Regression
Ridge/Lasso Regression
2. Unsupervised Learning Models
Models trained on unlabeled data to discover patterns or structure.
Types:
Clustering:
o Examples:
K-Means
Hierarchical Clustering
Anomaly Detection:
o Examples:
Isolation Forest
One-Class SVM
3. Semi-Supervised Learning Models
Models that combine a small labeled dataset with a large unlabeled one.
Examples:
Self-training (bootstrapping)
Label Propagation
Label Spreading
4. Reinforcement Learning Models
Models in which an agent learns by interacting with an environment to maximize cumulative rewards.
Examples:
Q-Learning
Deep Q-Networks (DQN)
5. Ensemble Models
Models that combine multiple base models to improve accuracy and robustness.
Types: bagging (e.g., Random Forest) and boosting (e.g., Gradient Boosting).
6. Deep Learning Models
Neural network-based models, effective for large datasets and complex tasks.
Types:
Convolutional Neural Networks (CNNs): For grid-like data such as images.
Recurrent Neural Networks (RNNs): For sequential data (e.g., time series, text).
Autoencoders: For unsupervised tasks like dimensionality reduction and anomaly detection.
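Of the deep-learning types above, the autoencoder is the easiest to sketch briefly. The PyTorch snippet below compresses 20-dimensional inputs to a 3-dimensional code and reconstructs them; the layer sizes, learning rate, and random stand-in data are illustrative assumptions.

```python
import torch
from torch import nn

class AutoEncoder(nn.Module):
    """Tiny fully connected autoencoder: encode to a small code, then reconstruct."""
    def __init__(self, in_dim=20, code_dim=3):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(in_dim, 8), nn.ReLU(), nn.Linear(8, code_dim))
        self.decoder = nn.Sequential(nn.Linear(code_dim, 8), nn.ReLU(), nn.Linear(8, in_dim))

    def forward(self, x):
        return self.decoder(self.encoder(x))

model = AutoEncoder()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
x = torch.randn(64, 20)                          # stand-in for real unlabeled data
for _ in range(200):                             # minimise the reconstruction error
    loss = nn.functional.mse_loss(model(x), x)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
print(round(loss.item(), 4))                     # reconstruction loss after training
```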