
Unit I: Introduction to Machine Learning

Lecture 1: Overview of Learning

Definition of Learning
Learning is a fundamental process through which systems improve their
performance over time by gaining experience. In the context of artificial intelligence
(AI), learning specifically refers to the acquisition of knowledge and the ability to
make decisions based on data. This involves using algorithms that can identify
patterns, make predictions, and adapt to changes based on the information they
receive.

- Experience: In learning, experience can be thought of as the data or examples that a system processes. For instance, a machine learning model learns from historical data to predict future outcomes.
- Improvement: The goal of learning is to enhance the system’s ability to perform
specific tasks more accurately over time, refining its predictions and decisions as it
is exposed to more data.

Importance of Learning in AI
Learning plays a critical role in the development and functionality of AI systems.
Here are several key aspects highlighting its importance:

1. Adaptability:
- AI systems that learn can adjust to new inputs without requiring reprogramming.
This adaptability enables them to remain relevant and useful in dynamic
environments where conditions frequently change.

2. Predictive Modeling:
- Learning algorithms can analyze historical data to create models that predict
future outcomes. This is invaluable in various applications, such as forecasting sales,
stock prices, or customer behavior.

3. Decision-Making:
- AI systems leverage learning to make informed decisions based on data analysis.
For example, a recommendation system learns from user preferences to suggest
products or services tailored to individual tastes.

4. Automation:
- Learning facilitates the automation of complex tasks that would typically require
human intelligence. This leads to increased efficiency and accuracy in processes like
image recognition, natural language processing, and autonomous driving.

5. Scalability:
- As AI systems learn from more data, they can scale their operations and enhance
their performance. This allows businesses to handle larger datasets and more
complex tasks without compromising efficiency.

6. Foundation for Intelligent Applications:
- Learning algorithms serve as the backbone for various intelligent applications,
including chatbots, fraud detection systems, and medical diagnosis tools. The
effectiveness of these applications hinges on their ability to learn from data.

Types of Learning: Human Learning vs. Machine Learning

Learning is a vital process in AI that allows systems to enhance their performance based on experience and data. Its importance lies in enabling adaptability, predictive modeling, decision-making, and the development of intelligent applications. By comparing human learning to machine learning, we can better understand how both systems acquire knowledge and adapt to their environments, despite the differences in processes and applications.

Lecture 2: Types of Learning

Supervised Learning

Definition: Supervised learning is a type of machine learning where models are trained on labeled data, meaning that each training example is paired with an output label. The goal is to learn a mapping from inputs to outputs so that the model can predict the outcome for new, unseen data.

Key Characteristics:
- Labeled Data: The dataset contains both input features and the corresponding
correct outputs.
- Outcome Known: The learning process is guided by known outcomes, allowing
the model to adjust its parameters based on the errors it makes during training.

Examples:
1. Classification Tasks:
- Spam Detection: The model is trained on emails labeled as "spam" or "not spam"
and learns to classify new emails accordingly.
- Image Recognition: Training a model to identify whether an image contains a cat
or a dog based on labeled images.

2. Regression Tasks:
- Predicting House Prices: A model uses features like location, size, and number
of bedrooms to predict the price of houses based on historical data.

- Sales Forecasting: Using past sales data to predict future sales based on various
influencing factors.
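
To make this concrete, here is a minimal supervised-learning sketch in Python (assuming scikit-learn is installed; the tiny spam-style dataset below is invented purely for illustration):

from sklearn.linear_model import LogisticRegression

# Each row is [word_count, contains_link]; labels: 1 = spam, 0 = not spam.
X_train = [[120, 1], [30, 0], [200, 1], [45, 0], [150, 1], [25, 0]]
y_train = [1, 0, 1, 0, 1, 0]

model = LogisticRegression()
model.fit(X_train, y_train)       # learn a mapping from inputs to labels

print(model.predict([[180, 1]]))  # predict the label for a new, unseen email

Because every training example carries a known label, the model can measure its errors during training and adjust its parameters accordingly.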

Unsupervised Learning

Definition: Unsupervised learning is a type of machine learning that involves training models on unlabeled data. The objective is to find hidden patterns or intrinsic structures within the data without prior knowledge of outcomes.

Key Characteristics:
- Unlabeled Data: The dataset consists only of input features, with no corresponding
output labels.
- Pattern Discovery: The focus is on identifying structures or groupings in the data.

Examples:
1. Clustering:
- Customer Segmentation: Grouping customers based on purchasing behavior to
identify distinct segments for targeted marketing.
- Market Basket Analysis: Identifying sets of products frequently purchased
together.

2. Dimensionality Reduction:
- Principal Component Analysis (PCA): Reducing the number of features in a
dataset while preserving as much variance as possible, making data easier to
visualize and process.
- t-SNE: A technique for visualizing high-dimensional data by reducing it to two
or three dimensions.
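
As a small illustration of dimensionality reduction, the sketch below (assuming scikit-learn and NumPy; the random data stands in for a real dataset) applies PCA to compress 10 features down to 2:

import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 10))        # 100 unlabeled samples, 10 features

pca = PCA(n_components=2)             # keep the 2 directions of highest variance
X_2d = pca.fit_transform(X)

print(X_2d.shape)                     # (100, 2)
print(pca.explained_variance_ratio_)  # variance preserved by each component

Note that no labels appear anywhere: the method looks only at the structure of the inputs.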

Semi-supervised Learning

Definition: Semi-supervised learning is a hybrid approach that combines both supervised and unsupervised learning. It utilizes a small amount of labeled data along with a larger amount of unlabeled data to improve learning performance.

Key Characteristics:
- Limited Labeled Data: It is especially useful when obtaining labeled data is
expensive or time-consuming.
- Leverages Unlabeled Data: The model can learn from both labeled and unlabeled
data, improving its generalization capabilities.

Use Cases:
- Text Classification: When only a few documents are labeled (e.g., sentiment
analysis), the model can use a larger set of unlabeled documents to enhance its
understanding and improve accuracy.
- Image Classification: In scenarios where only a small subset of images is annotated,
semi-supervised learning can help leverage the vast number of unannotated images
for better feature learning.
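
A minimal sketch of the idea, assuming scikit-learn (which marks unlabeled examples with -1), is to hide most of the labels of a standard dataset and let a label-propagation model learn from the mix:

import numpy as np
from sklearn.datasets import load_iris
from sklearn.semi_supervised import LabelPropagation

X, y = load_iris(return_X_y=True)

# Pretend most labels are unknown: unlabeled points are marked with -1.
rng = np.random.default_rng(0)
y_partial = np.copy(y)
y_partial[rng.random(len(y)) < 0.9] = -1   # keep roughly 10% of the labels

model = LabelPropagation()
model.fit(X, y_partial)                    # learns from labeled and unlabeled points
print((model.predict(X) == y).mean())      # accuracy against the true labels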

Reinforcement Learning

Definition: Reinforcement learning (RL) is a type of machine learning where an agent learns to make decisions by interacting with an environment. The agent receives feedback in the form of rewards or penalties based on its actions, guiding its learning process.

Key Characteristics:
- Trial and Error: The agent explores the environment and learns from the
consequences of its actions.
- Feedback Loop: Positive outcomes reinforce behavior, while negative outcomes
discourage it.

Examples:
1. Game Playing:
- AlphaGo: A reinforcement learning system developed by DeepMind that
defeated human champions in the game of Go by learning optimal strategies through
extensive gameplay.

- Dota 2: AI agents that learn to play complex games through self-play, adjusting
strategies based on in-game performance.

2. Robotic Control:
- Autonomous Robots: Robots learn to navigate and manipulate objects in real-
world environments through continuous interaction and adjustment of actions based
on feedback.
- Self-driving Cars: RL algorithms help vehicles learn optimal driving strategies
based on environmental conditions and traffic patterns.
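
The feedback loop can be shown with tabular Q-learning, a classic RL algorithm. The sketch below (a toy environment of my own construction, assuming only NumPy) trains an agent to walk right along a five-state line to reach a reward:

import numpy as np

# States 0..4 on a line; action 0 = step left, 1 = step right; reward only at state 4.
n_states, n_actions = 5, 2
Q = np.zeros((n_states, n_actions))
alpha, gamma, epsilon = 0.1, 0.9, 0.2      # learning rate, discount, exploration

rng = np.random.default_rng(0)
for episode in range(500):
    s = 0
    while s != 4:                          # an episode ends at the goal state
        # Explore randomly sometimes; otherwise exploit the current estimates.
        a = rng.integers(n_actions) if rng.random() < epsilon else int(np.argmax(Q[s]))
        s_next = max(0, s - 1) if a == 0 else min(4, s + 1)
        r = 1.0 if s_next == 4 else 0.0    # feedback from the environment
        # Q-learning update: nudge Q[s, a] toward reward + discounted future value.
        Q[s, a] += alpha * (r + gamma * np.max(Q[s_next]) - Q[s, a])
        s = s_next

print(np.argmax(Q, axis=1))               # learned policy: "right" in every non-goal state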

NOTES
- Supervised Learning: Involves learning from labeled data, focused on predicting
outcomes for new inputs.
- Unsupervised Learning: Deals with unlabeled data to find hidden patterns or
groupings without predefined outputs.
- Semi-supervised Learning: Combines both labeled and unlabeled data to improve
model performance, particularly when labeled data is scarce.
- Reinforcement Learning: Involves learning through interaction with an
environment, using feedback to optimize actions over time.

Lecture 3: Well-defined Learning Problems

Characteristics of Well-defined Learning Problems

1. Clear Objective:
- A well-defined learning problem has a specific goal that outlines what the model
is expected to achieve. This could involve classification (assigning categories),
regression (predicting continuous values), clustering (grouping similar data), or
other tasks.
- Examples:
- Classification: Determine whether an email is spam or not.
- Regression: Predict the price of a house based on various features.

2. Defined Input and Output Variables:
- The problem must clearly specify what input features (independent variables)
will be used to make predictions and what the output variable (dependent variable)
will be.
- This clarity helps in data preparation and model selection.
- Examples:

- For a house price prediction model, inputs might include square footage,
number of bedrooms, and location, while the output is the house price.

3. Measurable Performance Metrics:
- There should be defined metrics to evaluate the model's performance. These
metrics provide a way to quantify how well the model is performing relative to the
objectives set.
- Common performance metrics include:
- Accuracy: The proportion of correct predictions made by the model.
- Precision and Recall: Particularly important in classification tasks, measuring
the quality of the positive predictions.
- Mean Squared Error (MSE): Commonly used in regression to measure the
average of the squares of the errors.
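
In practice these metrics are one-liners. A small sketch using scikit-learn's metrics module (the true and predicted values below are invented):

from sklearn.metrics import accuracy_score, precision_score, recall_score, mean_squared_error

# Classification: 1 = positive class, 0 = negative class.
y_true = [1, 0, 1, 1, 0, 1]
y_pred = [1, 0, 0, 1, 0, 1]
print(accuracy_score(y_true, y_pred))    # fraction of correct predictions (5/6)
print(precision_score(y_true, y_pred))   # of predicted positives, how many are right
print(recall_score(y_true, y_pred))      # of actual positives, how many were found

# Regression: e.g., house prices in thousands.
print(mean_squared_error([200.0, 310.0, 150.0], [210.0, 300.0, 160.0]))  # avg squared error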

Formulating Learning Problems

1. Identifying Input Features (Independent Variables):
- The first step in formulating a learning problem is to identify which features will
be used as inputs. These features should be relevant to the problem and have the
potential to influence the outcome.
- Techniques:
- Domain knowledge: Understanding the field to choose impactful features.
- Feature selection methods: Techniques like correlation analysis, mutual
information, or algorithms to select the most relevant features.

2. Defining the Target Variable (Dependent Variable):
- The target variable is the outcome that the model is trying to predict. It must be
clearly defined and measurable.
- Examples:
- For a spam detection model, the target variable is whether an email is "spam"
or "not spam."
- In a house price prediction model, the target variable is the price of the house.

3. Establishing the Performance Metric to Evaluate Success:
- A key part of formulating a learning problem is deciding how success will be
measured. This involves selecting performance metrics that align with the objectives
of the problem.
- Considerations may include:
- The nature of the problem (classification vs. regression).

- The importance of false positives vs. false negatives (especially in classification
tasks).
- Stakeholder requirements or industry standards for success.

Example Problem Formulation: Predicting Whether an Email is Spam

1. Problem Statement:
- The goal is to classify incoming emails as either "spam" or "not spam."

2. Inputs:
- Features:
- Word Frequency: The occurrence of certain words or phrases (e.g., “free,”
“discount”).
- Length of the Email: The total number of words or characters.
- Presence of Links: Whether the email contains hyperlinks (a common indicator
of spam).

3. Output:
- Label: The target variable is a binary classification: either "spam" or "not spam."

4. Metric:
- Accuracy of Classification: The proportion of correctly classified emails (both
spam and non-spam) out of the total number of emails.
- Additional metrics could include precision, recall, and F1-score, particularly if
the cost of false positives (non-spam classified as spam) is high.
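
Once inputs, output, and metric are fixed, translating the formulation into code is mechanical. A minimal sketch (the feature values and the choice of a decision tree are illustrative assumptions, not part of the formulation):

from sklearn.tree import DecisionTreeClassifier

# Features per the formulation above: [count of "free", email length, has_link].
X = [[3, 120, 1], [0, 40, 0], [5, 200, 1], [0, 80, 0], [2, 60, 1], [0, 30, 0]]
y = [1, 0, 1, 0, 1, 0]                  # target: 1 = spam, 0 = not spam

clf = DecisionTreeClassifier().fit(X, y)
print(clf.predict([[4, 150, 1]]))       # classify a new email
print(clf.score(X, y))                  # the chosen metric: classification accuracy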

NOTES
Well-defined learning problems have clear objectives, specified input and output
variables, and measurable performance metrics, making it easier to design effective
machine learning models. Proper formulation of learning problems involves
identifying relevant features, defining the target variable, and establishing
performance metrics, all of which are crucial for successful model development and
evaluation. The example of predicting spam emails illustrates how these principles
can be applied in a practical context.

Lecture 4: Designing a Learning System

Components of a Learning System

1. Data:

- Quality and Quantity: The effectiveness of a learning system heavily depends on
the quality and quantity of the data used. High-quality data should be accurate,
relevant, and representative of the problem domain. Sufficient quantity ensures that
the model has enough examples to learn from.
- Types of Data:
- Structured Data: Organized data in tabular form (e.g., databases, spreadsheets).
- Unstructured Data: Raw data that doesn’t have a predefined structure (e.g., text,
images, audio).
- Impact on Learning Outcomes: Poor quality or insufficient data can lead to
overfitting (where the model performs well on training data but poorly on unseen
data) or underfitting (where the model fails to capture the underlying trends in the
data).

2. Algorithm:
- Definition: An algorithm is a set of rules or processes used to learn from data. It
dictates how the model will interpret and analyze the input data to make predictions
or decisions.
- Types of Algorithms:
- Supervised Learning Algorithms: Such as linear regression, logistic regression,
decision trees, and neural networks.
- Unsupervised Learning Algorithms: Such as K-means clustering, hierarchical
clustering, and principal component analysis (PCA).
- Reinforcement Learning Algorithms: Such as Q-learning and policy gradient
methods.
- Choosing the Right Algorithm: The choice of algorithm depends on the nature of
the problem (classification, regression, etc.), the type of data, and the specific
requirements for accuracy and interpretability.

3. Model:
- Definition: The model is the output of the learning process, representing the
learned knowledge from the training data. It encapsulates the relationships and
patterns identified by the algorithm.
- Types of Models:
- Predictive Models: Used for making predictions based on input features (e.g.,
predicting house prices).
- Descriptive Models: Used to describe patterns and relationships in data (e.g.,
customer segmentation).
- Model Training: During training, the model adjusts its parameters based on the
data and the algorithm used, resulting in a model that can make predictions on new
data.

Steps in Designing a Machine Learning System

1. Data Collection:
- Purpose: Gathering relevant data is the foundational step for any machine
learning project. The data should be sufficient in quantity and diversity to capture
the variability of the problem domain.
- Methods:
- Surveys and Questionnaires: Collecting data directly from users or subjects.
- Web Scraping: Extracting data from websites.
- Public Datasets: Utilizing existing datasets from repositories or research studies.

2. Data Preprocessing:
- Purpose: Preparing the raw data for analysis is crucial for achieving accurate and
reliable results. This step involves cleaning and transforming the data to make it
suitable for modeling.
- Tasks:
- Cleaning: Handling missing values, removing duplicates, and correcting errors.
- Transformation: Normalizing or standardizing data, encoding categorical
variables, and extracting features.
- Splitting Data: Dividing the dataset into training, validation, and test sets to ensure robust evaluation (see the end-to-end sketch after this list).

3. Model Selection:
- Purpose: Choosing the right algorithm is essential for effectively solving the
learning problem. Different algorithms have different strengths and weaknesses.
- Considerations:
- Problem Type: Determine whether the task is classification, regression,
clustering, etc.
- Data Characteristics: Consider the size, quality, and structure of the data.
- Model Interpretability: Assess how important it is for stakeholders to understand
the model's decisions.

4. Training:
- Purpose: Teaching the model involves using the training dataset to enable the
model to learn patterns and relationships in the data.
- Process:
- Input Data: Feeding the training data into the model.
- Optimization: Adjusting the model’s parameters using optimization techniques
(e.g., gradient descent) to minimize prediction errors.

5. Evaluation:
- Purpose: Assessing model performance ensures that the model generalizes well
to new, unseen data. This step helps identify strengths and weaknesses in the model.
- Methods:
- Performance Metrics: Using metrics like accuracy, precision, recall, F1-score,
or mean squared error, depending on the task.
- Validation Techniques: Employing techniques such as cross-validation to
ensure the model is evaluated on different subsets of data.

6. Deployment:
- Purpose: Implementing the model in a real-world setting allows it to make
predictions on new data and deliver value to users.
- Considerations:
- Integration: Ensuring the model can interact with existing systems (e.g., APIs,
databases).
- Monitoring: Continuously monitoring model performance to detect issues such
as drift in data distributions or declining performance.
- Updates and Maintenance: Periodically retraining the model with new data to
maintain its accuracy and relevance.
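
The end-to-end sketch below ties several of these steps together (assuming scikit-learn and NumPy; the synthetic dataset and the choice of SGDRegressor stand in for real collected data and a chosen model):

import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.linear_model import SGDRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Step 1 stand-in: synthetic data with a known relationship and some missing values.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = 2 * X[:, 0] - X[:, 1] + rng.normal(0, 0.1, size=200)
X[rng.integers(0, 200, size=10), 0] = np.nan

# Step 2: preprocessing -- impute missing values, standardize, split the data.
X = SimpleImputer(strategy="mean").fit_transform(X)
X = StandardScaler().fit_transform(X)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Step 4: training -- stochastic gradient descent minimizes prediction error.
model = SGDRegressor(max_iter=1000, random_state=0).fit(X_train, y_train)

# Step 5: evaluation -- measure error on data the model has never seen.
print(mean_squared_error(y_test, model.predict(X_test)))

Deployment (step 6) would then wrap model.predict behind an interface and monitor its error on live data over time.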

Lecture 5: History of Machine Learning

Machine learning (ML) has evolved significantly since its inception, driven by
advances in algorithms, computational power, and data availability. The field has
undergone various phases, from early concepts to the sophisticated systems we see
today.

Key Milestones in ML Development

1. 1950s: Early Algorithms:
- Perceptron (1958): Developed by Frank Rosenblatt, the perceptron was one of the first algorithms designed for supervised learning. It aimed to mimic how neurons in the brain work, serving as a foundational model for neural networks. The perceptron could separate data into two classes but could not handle non-linearly separable problems.
- Initial Concepts: The concept of machine learning began to take shape, with early explorations into algorithms that could learn from data.

2. 1980s: Revival of Neural Networks:
- Backpropagation (1986): The popularization of the backpropagation algorithm by David Rumelhart, Geoffrey Hinton, and Ronald Williams marked a significant turning point. This technique allowed for the efficient training of multi-layer neural networks by calculating gradients and updating weights, paving the way for deeper networks.
- Neural Network Research: Interest in neural networks surged during this period,
leading to new architectures and learning methods. Despite initial setbacks,
researchers continued to explore the potential of neural networks.

3. 1990s: Support Vector Machines and Decision Trees:
- Support Vector Machines (SVMs): Developed by Vladimir Vapnik and
colleagues, SVMs emerged as powerful tools for classification tasks, especially in
high-dimensional spaces. They introduced the concept of maximizing the margin
between classes.
- Decision Trees: Algorithms like CART (Classification and Regression Trees)
gained popularity, providing interpretable models that could handle both
classification and regression tasks. Their ease of use and interpretability made them
widely adopted in various applications.

4. 2000s: Rise of Ensemble Methods:
- Ensemble Learning: Techniques such as Random Forests and Gradient Boosting
became prominent. These methods combine multiple models to improve predictive
performance and reduce overfitting.
- Increased Data Availability: The growth of the internet and digital data sources
provided abundant data for training machine learning models, leading to improved
accuracy and generalization.

Influential Researchers

1. Geoffrey Hinton:
- Contributions: Known as the "Godfather of Deep Learning," Hinton's work on
neural networks and deep learning has been pivotal. His research on
backpropagation and deep belief networks laid the groundwork for modern deep
learning techniques.
- Impact: Hinton's innovations have influenced various applications, from speech
recognition to image classification.

2. Yann LeCun:
- Development of Convolutional Networks: LeCun is known for his work on convolutional neural networks (CNNs), which revolutionized image processing tasks. His architecture, LeNet-5, was one of the first successful applications of CNNs for digit recognition.
- Applications: CNNs are now fundamental in computer vision, powering
advancements in facial recognition, autonomous vehicles, and more.

3. Andrew Ng:
- Work in Deep Learning: Ng has made significant contributions to machine
learning and deep learning, particularly in online education and practical
applications. He co-founded Google Brain and has been a prominent advocate for
AI and machine learning education.
- Courses and Impact: Ng's online courses have democratized access to machine
learning knowledge, influencing a generation of practitioners.

Recent Advances

1. Deep Learning Breakthroughs (2010s):
- ImageNet Competition (2012): Deep learning, and CNNs in particular, surged in prominence when Alex Krizhevsky, Ilya Sutskever, and Geoffrey Hinton's model won the ImageNet Large Scale Visual Recognition Challenge (ILSVRC) by a substantial margin. This event showcased the potential of deep learning for image classification tasks.
- Natural Language Processing (NLP): Deep learning techniques have also
transformed NLP, leading to advancements in models such as recurrent neural
networks (RNNs) and transformers (e.g., BERT, GPT). These models excel in tasks
like language translation, sentiment analysis, and text generation.
- Real-World Applications: The advancements in deep learning have led to
breakthroughs in various fields, including healthcare (medical imaging analysis),
finance (fraud detection), and autonomous systems (self-driving cars).

Lecture 6: Overview of Machine Learning Approaches

1. Artificial Neural Networks (ANN)

Definition: An ANN is a computational model inspired by the way biological neural networks in the human brain process information. It consists of interconnected layers of nodes (neurons), where each connection (synapse) has an associated weight. The network learns by adjusting these weights based on input data and the desired output.

Applications:
• Image and Speech Recognition: ANNs are widely used in tasks like facial
recognition and voice command interpretation.
• Natural Language Processing (NLP): They power applications such as
chatbots and language translation.
• Medical Diagnosis: ANNs assist in analyzing medical images and
predicting patient outcomes.
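
To show what "layers of weighted connections" means, here is a bare-bones forward pass through a tiny two-layer network (plain NumPy; the random weights stand in for weights that training would normally learn):

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))    # squashes activations into (0, 1)

x = np.array([0.5, -1.2, 3.0])         # one input example with 3 features

rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(4, 3)), np.zeros(4)   # layer 1: 3 inputs -> 4 neurons
h = sigmoid(W1 @ x + b1)                        # hidden activations

W2, b2 = rng.normal(size=(1, 4)), np.zeros(1)   # layer 2: 4 neurons -> 1 output
print(sigmoid(W2 @ h + b2))                     # network output in (0, 1)

Learning would consist of adjusting W1, b1, W2, and b2 (e.g., by backpropagation) so the output matches the desired one.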

2. Clustering
Definition: Clustering is an unsupervised learning method that involves
grouping a set of objects in such a way that objects in the same group
(cluster) are more similar to each other than to those in other groups. It
doesn’t rely on labeled data, making it useful for exploratory data analysis.
Examples:
• K-means Clustering: A partitioning method that divides data into K
distinct clusters based on feature similarity.
• Hierarchical Clustering: Builds a hierarchy of clusters either through a
bottom-up (agglomerative) or top-down (divisive) approach.
Applications:
• Market Segmentation: Businesses use clustering to identify customer
segments for targeted marketing.
• Image Segmentation: In computer vision, clustering helps identify and
delineate objects within images.
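
A minimal K-means sketch (assuming scikit-learn; the six points are contrived so the two groups are obvious):

import numpy as np
from sklearn.cluster import KMeans

# Two visibly separate groups of points, with no labels supplied.
X = np.array([[1.0, 1.0], [1.2, 0.8], [0.9, 1.1],
              [8.0, 8.0], [8.2, 7.9], [7.8, 8.1]])

km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print(km.labels_)            # cluster assignment for each point
print(km.cluster_centers_)   # learned center of each cluster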

3. Reinforcement Learning
Definition: Reinforcement learning (RL) focuses on training agents to
make decisions by interacting with their environment. The agent learns
through trial and error, receiving rewards or penalties based on its actions,
allowing it to maximize cumulative reward over time.
Applications:
• Game AI: RL algorithms are used in training agents for playing games like
chess, Go, and video games.
• Robotics: Robots learn to perform tasks, such as walking or manipulating
objects, through interactions with their surroundings.
• Recommendation Systems: RL can optimize recommendations based on
user interactions over time.

4. Decision Tree Learning

Definition: Decision tree learning is a predictive modeling approach that creates a tree-like model of decisions based on feature values. Each internal node represents a feature, each branch represents a decision rule, and each leaf node represents an outcome.
Applications:
• Classification and Regression: Decision trees are used for both
classification tasks (e.g., identifying spam emails) and regression tasks
(e.g., predicting house prices).
• Healthcare: They assist in diagnostic decision-making by evaluating
symptoms against possible conditions.
• Finance: Used for credit scoring and risk assessment.
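
A short sketch of decision tree learning (assuming scikit-learn; the bundled iris dataset serves as the stand-in data), including a printout of the learned rules:

from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

X, y = load_iris(return_X_y=True)
tree = DecisionTreeClassifier(max_depth=2).fit(X, y)   # shallow, readable tree
print(export_text(tree))   # the learned feature tests and leaf outcomes

The printed tree makes the interpretability point directly: every prediction is a short chain of feature comparisons.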

5. Bayesian Networks
Definition: Bayesian networks are probabilistic graphical models that
represent a set of variables and their conditional dependencies through a
directed acyclic graph (DAG). Each node represents a variable, and edges
represent probabilistic dependencies.
Applications:
• Medical Diagnosis: They help in diagnosing diseases based on symptoms
and medical history.
• Risk Management: Used in finance to assess risks based on various
economic factors.
• Decision Support Systems: Help in making informed decisions under
uncertainty by modeling complex relationships.
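
The core computation in a Bayesian network is conditional-probability inference. A hand-worked two-node sketch (Disease -> Test; all probabilities below are invented for illustration):

# P(D), P(+ | D), P(+ | not D): the network's conditional probability tables.
p_disease = 0.01
p_pos_given_d = 0.95
p_pos_given_not_d = 0.05

# Marginalize over the parent node to get P(+).
p_pos = p_pos_given_d * p_disease + p_pos_given_not_d * (1 - p_disease)

# Bayes' rule: P(D | +) = P(+ | D) * P(D) / P(+)
print(p_pos_given_d * p_disease / p_pos)   # roughly 0.16

Even with a 95%-sensitive test, a positive result here implies only about a 16% chance of disease, because the disease is rare; this is exactly the kind of reasoning under uncertainty these networks automate.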

6. Support Vector Machine (SVM)

Definition: SVM is a supervised learning algorithm used primarily for classification tasks. It works by finding the hyperplane that best separates different classes in the feature space, maximizing the margin between the classes.
Applications:
• Text Classification: Effective in categorizing documents, such as spam
detection.
• Image Classification: Used for recognizing objects in images.
• Bioinformatics: Applied in classifying proteins and gene expression data.
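
A compact SVM sketch (assuming scikit-learn; iris again serves as the stand-in dataset):

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

clf = SVC(kernel="linear").fit(X_train, y_train)   # fits a max-margin hyperplane
print(clf.score(X_test, y_test))                   # accuracy on held-out data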

7. Genetic Algorithm
Definition: Genetic algorithms (GAs) are optimization techniques inspired by
the process of natural selection. They operate on a population of potential
solutions, applying mechanisms like selection, crossover, and mutation to
evolve better solutions over generations.

Applications:
• Optimization Problems: Used in scheduling, routing, and resource allocation.
• Machine Learning: Helps in feature selection and hyperparameter tuning.
• Game Development: Employed to evolve strategies for agents in games.

Genetic Algorithm Example:

- Real-time Example: Traffic signal optimization using genetic algorithms.
- Solution Representation: Each solution represents different traffic light
timings.
- Initial Population: Random sets of timings are tested in simulations.
- Fitness Function: Evaluate how well each timing reduces congestion and
wait time.
- Selection Process: Best timings are chosen for further improvement.
- Crossover & Mutation: New timings are generated by combining and
tweaking the best ones.
- Continuous Optimization: The algorithm adjusts to real-time traffic data
for better flow.
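
The same loop of selection, crossover, and mutation can be shown on a toy problem. The sketch below (plain NumPy; maximizing an invented one-variable fitness function rather than simulating traffic) evolves a population toward the optimum at x = 3:

import numpy as np

rng = np.random.default_rng(0)

def fitness(x):
    return -(x - 3.0) ** 2                   # peak fitness at x = 3

pop = rng.uniform(-10, 10, size=20)          # initial population: random solutions
for generation in range(50):
    parents = pop[np.argsort(fitness(pop))][-10:]   # selection: keep the fittest half
    # Crossover: blend random pairs of parents; mutation: add small noise.
    children = (rng.choice(parents, 10) + rng.choice(parents, 10)) / 2
    children += rng.normal(0.0, 0.1, size=10)
    pop = np.concatenate([parents, children])

print(pop[np.argmax(fitness(pop))])          # close to 3.0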

Lecture 7: Issues in Machine Learning

Common Challenges

1. Overfitting:
- Definition: Overfitting occurs when a model learns the training data too well,
including its noise and outliers. This results in a model that performs excellently on
the training dataset but poorly on new, unseen data.
- Symptoms:
- High accuracy on training data but low accuracy on validation/test data.
- Complex models with too many parameters relative to the amount of training
data.
- Causes:
- Too many features or parameters in the model (e.g., deep neural networks with
many layers).
- Insufficient training data relative to model complexity.
- Mitigation Strategies:
- Regularization: Techniques like L1 (Lasso) and L2 (Ridge) regularization add a penalty for larger coefficients, helping to prevent the model from becoming too complex (a short sketch follows the underfitting discussion below).

- Cross-Validation: Using techniques like k-fold cross-validation to ensure that
the model performs well on different subsets of the data.
- Simplifying the Model: Reducing the complexity of the model (e.g., fewer
layers in a neural network) or using simpler algorithms.

2. Underfitting:
- Definition: Underfitting occurs when a model is too simplistic to capture the
underlying structure of the data, resulting in poor performance on both training and
test datasets.
- Symptoms:
- Low accuracy on both training and validation/test data.
- The model fails to capture important relationships between features.
- Causes:
- Inadequate model complexity (e.g., using a linear model for a non-linear
relationship).
- Insufficient training (not training the model long enough).
- Mitigation Strategies:
- Complex Models: Use more complex algorithms or architectures that can
capture the relationships in the data.
- Feature Engineering: Adding or transforming features to provide the model with
more informative input.
- Increase Training Time: Allowing the model to train longer if it’s underfitting
due to insufficient learning.
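
To make the regularization strategy above concrete, this sketch (assuming scikit-learn and NumPy; the data is synthetic) compares an unregularized linear model against an L2-penalized one under overfit-friendly conditions (few samples, many features):

import numpy as np
from sklearn.linear_model import LinearRegression, Ridge

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 15))                 # few samples, many features
y = X[:, 0] + rng.normal(0, 0.1, size=20)     # only the first feature matters

plain = LinearRegression().fit(X, y)
ridge = Ridge(alpha=1.0).fit(X, y)            # L2 penalty on coefficient size

print(np.abs(plain.coef_).sum())   # large coefficients fitted to noise
print(np.abs(ridge.coef_).sum())   # shrunken coefficients: a simpler model

The ridge model gives up a little training accuracy for coefficients that generalize better, which is the regularization trade-off in miniature.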

Data Quality

1. Importance of Clean, Relevant, and Representative Data:
- Clean Data: Data must be free from errors and inconsistencies to ensure accurate
model training. Poor quality data can lead to misleading results and model
performance issues.
- Relevant Data: The features selected should have a direct correlation with the
target variable. Irrelevant features can confuse the model and lead to overfitting.
- Representative Data: The dataset should reflect the diversity of the real-world
scenarios where the model will be applied. This ensures that the model can
generalize well to unseen data.

2. Common Data Quality Issues:
- Missing Values: Incomplete data can lead to biased or inaccurate predictions.
Strategies to handle missing values include imputation (filling in missing values) or
removing affected records.

- Noise: Random errors or variance in measured variables can distort model
learning. Techniques such as outlier detection and data smoothing can help mitigate
this.
- Imbalanced Datasets: When one class of data is significantly underrepresented,
it can lead to biased models that perform poorly on minority classes. Techniques
such as resampling (oversampling minority classes or undersampling majority
classes) or using specialized algorithms can help address this issue.
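
A minimal sketch of random oversampling (plain NumPy; the 95/5 split is invented to mimic an imbalanced dataset):

import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = np.array([0] * 95 + [1] * 5)         # class 1 is the rare minority class

# Repeat randomly chosen minority rows until the two classes are balanced.
minority_idx = np.where(y == 1)[0]
extra = rng.choice(minority_idx, size=90, replace=True)
X_balanced = np.vstack([X, X[extra]])
y_balanced = np.concatenate([y, y[extra]])
print(np.bincount(y_balanced))           # [95 95]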

Ethical Considerations

1. Bias:
- Definition: Bias in machine learning refers to the systematic favoritism towards
certain outcomes based on training data. This can lead to unfair or discriminatory
models.
- Sources of Bias:
- Historical biases present in the training data (e.g., biased hiring practices
reflected in a recruitment algorithm).
- Sampling bias if certain groups are underrepresented in the data.
- Mitigation Strategies:
- Conducting bias audits and testing models for fairness across different
demographic groups.
- Ensuring diverse and representative training datasets to capture various
perspectives.

2. Transparency:
- Importance of Explainability: As models become more complex (e.g., deep
learning), understanding how they make decisions becomes challenging.
Transparency in model decisions is essential for trust and accountability, especially
in high-stakes domains like healthcare and finance.
- Techniques for Explainability:
- Model-Agnostic Methods: Techniques such as LIME (Local Interpretable
Model-agnostic Explanations) or SHAP (SHapley Additive exPlanations) that help
explain predictions by approximating the model locally.
- Interpretable Models: Using simpler models (like decision trees) where
decision-making can be easily understood.

3. Privacy:
- Safeguarding Personal Data: The use of personal data in training models raises
privacy concerns. It is crucial to ensure that data is collected, stored, and processed
in compliance with regulations (e.g., GDPR).

- Techniques for Enhancing Privacy:
- Data Anonymization: Removing personally identifiable information (PII) from
datasets to protect individual identities.
- Federated Learning: A method where models are trained across decentralized
devices or servers without sharing raw data, enhancing privacy while still learning
from diverse datasets.

NOTES
The challenges of overfitting and underfitting, data quality issues, and ethical
considerations are critical aspects of machine learning. Addressing these challenges
is essential for developing robust, fair, and effective models. Ensuring high-quality
data, mitigating bias, maintaining transparency, and protecting privacy are vital for
responsible and successful machine learning applications.

Lecture 8: Data Science vs. Machine Learning

In brief: data science is the broad discipline of extracting insight from data, spanning collection, cleaning, statistical analysis, visualization, and the communication of results, while machine learning is the subset of AI concerned with algorithms that learn patterns from data to make predictions. Machine learning is thus one tool in the data scientist's toolkit, and data science projects also involve substantial work, such as data engineering and reporting, that is not machine learning.
Lecture 9: Applications of Machine Learning
Machine learning is one of today's most talked-about technologies, and it is growing rapidly. We use machine learning in our daily lives, often without knowing it, through products such as Google Maps, Google Assistant, and Alexa. Below are some of the most prominent real-world applications of machine learning:

1. Image Recognition:
Image recognition is one of the most common applications of machine learning. It is used to identify objects, persons, places, and other content in digital images. A popular use case of image recognition and face detection is automatic friend-tagging suggestions: whenever we upload a photo with our Facebook friends, we automatically get tagging suggestions with names, and the technology behind this is machine learning's face detection and recognition algorithms.
It is based on the Facebook project named "DeepFace," which is responsible for face recognition and person identification in pictures.
2. Speech Recognition:
While using Google, we get a "Search by voice" option; this falls under speech recognition, a popular application of machine learning.
Speech recognition is the process of converting voice instructions into text, and it is also known as "speech to text" or "computer speech recognition." At present, machine learning algorithms are widely used in speech recognition applications: Google Assistant, Siri, Cortana, and Alexa all use speech recognition technology to follow voice instructions.

3. Traffic Prediction:
When we want to visit a new place, we take the help of Google Maps, which shows us the correct path with the shortest route and predicts the traffic conditions.
It predicts traffic conditions, such as whether traffic is clear, slow-moving, or heavily congested, in two ways:
• Real-time location of the vehicle from the Google Maps app and sensors
• Average time taken on past days at the same time of day

Everyone who uses Google Maps is helping to make the app better: it takes information from users and sends it back to its database to improve performance.
4. Product Recommendations:
Machine learning is widely used by e-commerce and entertainment companies such as Amazon and Netflix for product recommendations. When we search for a product on Amazon, we start getting advertisements for the same product while browsing the web, and this is because of machine learning.
Google understands user interests using various machine learning algorithms and suggests products according to customer interest.
Similarly, when we use Netflix, we see recommendations for series, movies, and other entertainment, and this is also done with the help of machine learning.
5. Self-driving Cars:
One of the most exciting applications of machine learning is self-driving cars, where machine learning plays a significant role. Tesla, the most popular electric-car manufacturer, is working on self-driving cars, using deep learning methods to train its models to detect people and objects while driving.
6. Email Spam and Malware Filtering:
Whenever we receive a new email, it is filtered automatically as important, normal, or spam. We receive important mail in our inbox, marked with the important symbol, and spam emails in our spam box; the technology behind this is machine learning. Below are some spam filters used by Gmail:
• Content filters
• Header filters
• General blacklist filters
• Rules-based filters
• Permission filters

Machine learning algorithms such as the Multi-Layer Perceptron, decision trees, and the Naïve Bayes classifier are used for email spam filtering and malware detection.

7. Virtual Personal Assistants:
We have various virtual personal assistants, such as Google Assistant, Alexa, Cortana, and Siri. As the name suggests, they help us find information using voice instructions. These assistants can help us in various ways just through voice commands, such as playing music, calling someone, opening an email, or scheduling an appointment.
Machine learning algorithms are an important part of these virtual assistants. The assistants record our voice instructions, send them to a server in the cloud, decode them using ML algorithms, and act accordingly.
8. Online Fraud Detection:
Machine learning makes our online transactions safe and secure by detecting fraudulent transactions. Whenever we perform an online transaction, fraud can occur in various ways, such as fake accounts, fake IDs, and money being stolen in the middle of a transaction. To detect this, a feed-forward neural network can check whether a transaction is genuine or fraudulent.
For each genuine transaction, the output is converted into hash values, and these values become the input for the next round. Genuine transactions follow a specific pattern that changes for fraudulent transactions, so the network can detect fraud and make our online transactions more secure.
9. Stock Market Trading:
Machine learning is widely used in stock market trading. In the stock market, there is always a risk of ups and downs in share prices, so long short-term memory (LSTM) neural networks are used for predicting stock market trends.
10. Medical Diagnosis:
In medical science, machine learning is used for disease diagnosis. With it, medical technology is advancing rapidly and can build 3D models that predict the exact position of lesions in the brain, which helps in finding brain tumors and other brain-related diseases more easily.
11. Automatic Language Translation:
Nowadays, if we visit a new place and are not familiar with the language, it is not a problem at all, because machine learning helps us by converting text into languages we know. Google's GNMT (Google Neural Machine Translation) provides this feature: a neural machine translation system that translates text into our familiar language, which is called automatic translation.
The technology behind automatic translation is a sequence-to-sequence learning algorithm, which is used with image recognition to translate text from one language to another.
