Unit 1 - Machine Learning - NOTES1 - ML

Uploaded by

mauli.imscit21

Unit 1

Introduction to Machine Learning
Machine Learning

• Machine learning is an application of artificial intelligence (AI) that gives systems the ability to learn and improve automatically from experience without being explicitly programmed.
• Machine learning focuses on developing computer programs that can access data and use it to learn for themselves.
• Machine learning algorithms are used in a wide variety of applications, such as medicine, email filtering, speech recognition, and computer vision.

• The process of learning begins with observations or data, such as examples, direct experience, or instruction, in order to look for patterns in the data and make better decisions.
• Machine learning algorithms use historical data as input to predict new output values.
• The primary aim is to allow computers to learn automatically, without human intervention or assistance, and adjust their actions accordingly.
Machine Learning Examples
• Image Recognition
• Speech Recognition
• Medical Diagnosis
Types of Machine Learning
Supervised Learning

Supervised learning is when the model is trained on a labelled dataset.

A labelled dataset is one that has both input and output parameters.

In this type of learning, both the training and the validation datasets are labelled.

Figure A: A dataset from a shopping store, used to predict whether a customer will purchase a particular product based on his/her gender, age, and salary.

Input: Gender, Age, Salary

Output: Purchased, i.e. 0 or 1; 1 means the customer will purchase the product and 0 means the customer won’t purchase it.

While training the model, the data is usually split in a ratio of 80:20, i.e. 80% as training data and the rest as testing data.

In the training data, we feed the input as well as the output for that 80% of the data. The model learns from the training data only.

Learning here means that the model builds some logic of its own.

Once the model is ready, it can be tested.

At the time of testing, the input is fed from the remaining 20% of the data, which the model has never seen before. The model predicts a value, and we compare it with the actual output to calculate the accuracy.
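The 80:20 workflow above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the rows are made-up values in the shape of Figure A (gender is left out to keep it short), and the "model" is a deliberately simple salary threshold, not the method used in the original notes. The names split_80_20, train, predict, and accuracy are hypothetical helpers.

```python
import random

# Hypothetical rows in the shape of Figure A: (age, salary, purchased).
# The values are illustrative only and are not taken from the original dataset.
ROWS = [
    (22, 20000, 0), (25, 24000, 0), (27, 30000, 0), (30, 28000, 0),
    (31, 35000, 0), (35, 60000, 1), (38, 65000, 1), (41, 72000, 1),
    (45, 80000, 1), (50, 90000, 1),
]

def split_80_20(rows, seed=0):
    """Shuffle a copy of the rows and return (training, testing) parts."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = len(shuffled) * 80 // 100
    return shuffled[:cut], shuffled[cut:]

def train(training_rows):
    """Learn a salary threshold: the midpoint between the two class averages."""
    buyers = [salary for _, salary, purchased in training_rows if purchased == 1]
    others = [salary for _, salary, purchased in training_rows if purchased == 0]
    return (sum(buyers) / len(buyers) + sum(others) / len(others)) / 2

def predict(threshold, salary):
    """Predict 1 (will purchase) when the salary reaches the threshold."""
    return 1 if salary >= threshold else 0

def accuracy(threshold, testing_rows):
    """Fraction of test rows whose prediction matches the actual output."""
    correct = sum(predict(threshold, salary) == purchased
                  for _, salary, purchased in testing_rows)
    return correct / len(testing_rows)

training, testing = split_80_20(ROWS)
threshold = train(training)          # the model only ever sees the training rows
print(len(training), len(testing))   # 8 2
print(accuracy(threshold, testing))
```

The key point is that train() never receives the testing rows, so the accuracy is measured on data the model has not seen.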
Examples of Supervised Learning

Advertisement Popularity

Email Filtering

Face Recognition
Unsupervised Learning

Unsupervised machine learning algorithms are used when the information used to train is neither classified nor labeled.

The model itself finds the hidden patterns and insights in the given data. This can be compared to the learning that takes place in the human brain while learning new things.

The system doesn’t figure out the right output; instead, it explores the data and draws inferences from datasets to describe hidden structures in unlabeled data.
E.g. grouping customers by purchase behaviour, grouping similar images

The goal of unsupervised learning is to find the underlying structure of a dataset, group the data according to similarities, and represent the dataset in a compressed format.

Example: Suppose an unsupervised learning algorithm is given an input dataset containing images of different types of cats and dogs. The algorithm is never trained on labels for this dataset, which means it has no prior idea about the features of the dataset. Its task is to identify the image features on its own.

The unsupervised learning algorithm performs this task by clustering the image dataset into groups according to the similarities between images.

It is a type of learning where we don’t give the model a target while training, i.e. the training data has only input parameter values.

The model has to find by itself how it can learn.

The dataset in Figure A is mall data containing information about the clients who subscribe to the mall. Once subscribed, clients are given a membership card, so the mall has complete information about each customer and his/her every purchase.

Using this data and unsupervised learning techniques, the mall can easily group clients based on the parameters we feed in.
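As a rough illustration of grouping clients without any target column, the sketch below runs a basic k-means loop on a single feature. The notes do not name a specific algorithm, so k-means is an assumption here, as are the function name k_means_1d, the monthly_spend values, and the choice of the smallest and largest value as starting centroids.

```python
def k_means_1d(values, start_centroids, iterations=10):
    """Group numbers around centroids with a basic k-means loop."""
    centroids = list(start_centroids)
    groups = []
    for _ in range(iterations):
        # Assignment step: put every value in the group of its nearest centroid.
        groups = [[] for _ in centroids]
        for value in values:
            nearest = min(range(len(centroids)),
                          key=lambda i: abs(value - centroids[i]))
            groups[nearest].append(value)
        # Update step: move each centroid to the mean of its group.
        centroids = [sum(g) / len(g) if g else c
                     for g, c in zip(groups, centroids)]
    return centroids, groups

# Hypothetical monthly spend per client; no labels are supplied.
monthly_spend = [120, 150, 130, 900, 950, 1000]
centroids, groups = k_means_1d(
    monthly_spend, [min(monthly_spend), max(monthly_spend)])
print(groups)  # [[120, 150, 130], [900, 950, 1000]]
```

The algorithm is told only how many groups to form; which clients count as low or high spenders emerges from the data itself.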
Semi-Supervised Learning

In this type of learning, the algorithm is trained on a combination of labeled and unlabeled data.

This combination typically contains a very small amount of labeled data and a very large amount of unlabeled data.

It uses unsupervised techniques to predict labels and then feeds these labels to supervised techniques. This technique is mostly applicable to image datasets, where usually not all images are labeled.
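One simplified way to picture this is pseudo-labelling: the few labeled examples are used to guess labels for the unlabeled ones, and a supervised model is then trained on everything. The sketch below is a stand-in under stated assumptions, not the exact procedure from the notes: the one-number "features", the nearest-labeled-example rule, and the helper names pseudo_label, train_centroids, and predict are all hypothetical.

```python
# A very small labeled set and a larger unlabeled set (illustrative values).
labeled = [(1.0, "cat"), (9.0, "dog")]
unlabeled = [1.5, 2.0, 0.5, 8.0, 8.5, 9.5, 2.5]

def pseudo_label(labeled_examples, values):
    """Give each unlabeled value the label of its nearest labeled example."""
    result = []
    for x in values:
        _, label = min(labeled_examples, key=lambda pair: abs(pair[0] - x))
        result.append((x, label))
    return result

def train_centroids(examples):
    """Supervised step: average the feature value seen for each label."""
    totals = {}
    for x, label in examples:
        total, count = totals.get(label, (0.0, 0))
        totals[label] = (total + x, count + 1)
    return {label: total / count for label, (total, count) in totals.items()}

def predict(model, x):
    """Return the label whose average is closest to x."""
    return min(model, key=lambda label: abs(model[label] - x))

# Train on the labeled data plus the pseudo-labeled data.
model = train_centroids(labeled + pseudo_label(labeled, unlabeled))
print(predict(model, 3.0), predict(model, 7.0))  # cat dog
```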
Reinforcement Learning

In this technique, the model keeps improving its performance, using reward feedback to learn the behavior or pattern.

It will make a lot of mistakes in the beginning.

So long as we provide some sort of signal to the algorithm that associates good behaviors with a positive signal and bad behaviors with a negative one, the learning algorithm learns to make fewer mistakes than it used to.

E.g. a video game such as the Mario game

These algorithms are specific to a particular problem, e.g. the Google self-driving car, or AlphaGo, where a bot competes with humans and even with itself to become a better and better player of the game of Go.
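The reward-feedback idea can be sketched with a tiny trial-and-error loop. This is a minimal illustration, assuming an epsilon-greedy strategy (mostly pick the best-known action, occasionally try another); the action names, the reward values of -1 and +1, and the function learn are invented for the example and are not from the notes.

```python
import random

# Hypothetical reward signal: a bad behavior gets -1, a good behavior gets +1.
REWARDS = {"walk_into_enemy": -1.0, "jump_over_enemy": 1.0}

def learn(steps=200, epsilon=0.1, seed=0):
    """Estimate the average reward of each action by trial and error."""
    rng = random.Random(seed)
    actions = list(REWARDS)
    value = {a: 0.0 for a in actions}   # running average reward per action
    count = {a: 0 for a in actions}     # how often each action was tried
    for step in range(steps):
        if step < len(actions):
            action = actions[step]                 # try every action once first
        elif rng.random() < epsilon:
            action = rng.choice(actions)           # explore: pick at random
        else:
            action = max(actions, key=value.get)   # exploit: best known action
        reward = REWARDS[action]
        count[action] += 1
        # Incremental mean: nudge the estimate toward the observed reward.
        value[action] += (reward - value[action]) / count[action]
    return value, count

value, count = learn()
print(max(value, key=value.get))  # jump_over_enemy
```

Early steps include mistakes (the negative action is tried at least once), after which the loop mostly repeats the action that earned a positive signal.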
ML Applications

Virtual Personal Assistant - Siri, Alexa, Google Now are some of the
popular examples of virtual personal assistants.

Email Spam and Malware Filtering - Email clients use a number of spam filtering approaches. To ensure that these spam filters are continuously updated, they are powered by machine learning.

Product Recommendations - Product recommendation is a prominent feature of almost every e-commerce website today, and it is an advanced application of machine learning techniques. Using machine learning and AI, websites track your behavior based on your previous purchases, your search patterns, and your cart history, and then make product recommendations.

Online Fraud Detection - Machine learning is proving its potential to make cyberspace a secure place, and tracking monetary fraud online is one example. For instance, PayPal uses ML for protection against money laundering.

Image Recognition - An approach for cataloging and detecting a feature or an object in a digital image, e.g. pattern recognition, face detection, or face recognition.

Sentiment Analysis - Sentiment analysis is a real-time machine
learning application that determines the emotion or opinion of
the speaker or the writer.
Machine Learning Life Cycle

The machine learning life cycle is a cyclic process for building an efficient machine learning project. The main purpose of the life cycle is to find a solution to the problem or project.

Understanding the problem is essential, because a good result depends on a good understanding of the problem.

In the complete life cycle process, to solve a problem we create a machine learning system called a "model", and this model is created by "training" it. But to train a model we need data; hence, the life cycle starts with collecting data.
1. Collecting Data:

Identify the different data sources, as data can be collected from various sources such as files, databases, the internet, or mobile devices.

The quantity and quality of the collected data determine the efficiency of the output. In general, the more data there is, the more accurate the prediction will be.

This step includes the following tasks:

• Identify various data sources
• Collect data
• Integrate the data obtained from different sources

The resulting coherent set of data is called a dataset.

2. Data preparation

In this step, we prepare the data for use in machine learning training.

This step can be further divided into two processes:

Data exploration:
It is used to understand the nature of the data that we have to work with. We need to understand the characteristics, format, and quality of the data.
A better understanding of the data leads to an effective outcome.

Data pre-processing:
The data is pre-processed so that it is ready for analysis.
3. Data Wrangling

Data wrangling is the process of cleaning and converting raw data into a usable format.

It involves cleaning the data, selecting the variables to use, and transforming the data into a proper format to make it more suitable for analysis.

Cleaning of data is required to address quality issues.

Collected data may have various issues, including:

• Missing values
• Duplicate data
• Invalid data
• Noise
4. Data Analysis

This step involves:

• Selecting analytical techniques
• Building models
• Reviewing the result

The aim is to build a machine learning model to analyze the data using various analytical techniques and review the outcome.

It starts with determining the type of problem, where we select a machine learning technique such as Classification, Regression, Cluster analysis, or Association. We then build the model using the prepared data and evaluate the model.
5. Train Model

In this step, we train the model to improve its performance for a better outcome on the problem.

Training a model is required so that it can understand the various patterns, rules, and features.
6. Test Model

Once the machine learning model has been trained on a given dataset, we test the model. In this step, we check the accuracy of our model by providing a test dataset to it.

Testing the model determines the percentage accuracy of the model against the requirements of the project or problem.
7. Deployment


The last step of the machine learning life cycle is deployment, where we deploy the model in a real-world system.
AI vs ML
Artificial Intelligence | Machine Learning
Artificial intelligence is a technology which enables a machine to simulate human behavior. | Machine learning is a subset of AI which allows a machine to learn automatically from past data without being explicitly programmed.
The goal of AI is to make a smart computer system that, like humans, can solve complex problems. | The goal of ML is to allow machines to learn from data so that they can give accurate output.
In AI, we make intelligent systems to perform any task like a human. | In ML, we teach machines with data to perform a particular task and give an accurate result.
Machine learning and deep learning are the two main subsets of AI. | Deep learning is a main subset of machine learning.
AI has a very wide scope. | Machine learning has a limited scope.
AI is working to create an intelligent system which can perform various complex tasks. | Machine learning is working to create machines that can perform only those specific tasks for which they are trained.
An AI system is concerned with maximizing the chances of success. | Machine learning is mainly concerned with accuracy and patterns.
The main applications of AI are Siri, customer support using chatbots, expert systems, online game playing, intelligent humanoid robots, etc. | The main applications of machine learning are online recommender systems, Google search algorithms, Facebook auto friend tagging suggestions, etc.
On the basis of capabilities, AI can be divided into three types: Weak AI, General AI, and Strong AI. | Machine learning can also be divided into three main types: Supervised learning, Unsupervised learning, and Reinforcement learning.
Data in Machine Learning

DATA: Any unprocessed fact, value, text, sound, or picture that has not yet been interpreted or analyzed. Data is the most important part of Data Analytics, Machine Learning, and Artificial Intelligence.

INFORMATION: Data that has been interpreted and manipulated and now carries some meaningful inference for its users.

KNOWLEDGE: A combination of inferred information, experiences, learning, and insights. It results in awareness or concept building for an individual or organization.

Training Data: The part of the data we use to train our model. This is the data that the model actually sees (both input and output) and learns from.

Validation Data: The part of the data used for frequent evaluation of the model fitted on the training dataset, and for tuning the hyperparameters involved (parameters that are set before the model begins learning). This data plays its part while the model is actually training.

Testing Data: Once our model is completely trained, testing data provides an unbiased evaluation. When we feed in the inputs of the testing data, our model predicts some values (without seeing the actual output). After prediction, we evaluate our model by comparing its predictions with the actual output present in the testing data. This is how we see how much our model has learned from the experience fed in as training data.
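The three parts can be produced with one shuffle and two cut points, as in the sketch below. The 60/20/20 proportions are an assumption chosen for illustration (the notes do not give a ratio for a three-way split), and three_way_split is a hypothetical helper name.

```python
import random

def three_way_split(data, train_pct=60, validation_pct=20, seed=0):
    """Shuffle a copy of the data and cut it into three non-overlapping parts."""
    shuffled = list(data)
    random.Random(seed).shuffle(shuffled)
    first_cut = len(shuffled) * train_pct // 100
    second_cut = len(shuffled) * (train_pct + validation_pct) // 100
    return (shuffled[:first_cut],             # seen by the model while learning
            shuffled[first_cut:second_cut],   # used to tune hyperparameters
            shuffled[second_cut:])            # held back for the final evaluation

samples = list(range(100))                    # stand-in for 100 data records
training, validation, testing = three_way_split(samples)
print(len(training), len(validation), len(testing))  # 60 20 20
```

Because the three slices come from the same shuffled list, no record can appear in more than one part, which is what keeps the final test evaluation unbiased.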

Properties of Data:

Volume: The scale of the data. With the growing world population and growing exposure to technology, huge amounts of data are generated every millisecond.

Variety: Different forms of data, e.g. healthcare records, images, videos, audio clips.

Velocity: The rate at which data is streamed and generated.

Value: The meaningfulness of the data in terms of the information that researchers can infer from it.

Veracity: The certainty and correctness of the data we are working on.
Data Processing
Data Cleaning

Data cleaning is the process of identifying the incorrect, incomplete, inaccurate, irrelevant, or missing parts of the data and then modifying, replacing, or deleting them as necessary.
Inconsistent column

If a DataFrame (a two-dimensional data structure, i.e. data aligned in a tabular fashion in rows and columns) contains columns that are irrelevant or will never be used, those columns can be dropped so that the analysis focuses on the remaining columns.
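A minimal sketch of dropping an unused column is shown here. To stay self-contained it represents the table as a plain Python list of dictionaries rather than a DataFrame; the locker_colour column and the helper name drop_columns are hypothetical.

```python
def drop_columns(rows, unwanted):
    """Return new rows that leave out the named columns."""
    return [{name: value for name, value in row.items() if name not in unwanted}
            for row in rows]

# Illustrative table: locker_colour is irrelevant to an analysis of marks.
students = [
    {"roll_no": 1, "math": 50, "science": 55, "locker_colour": "red"},
    {"roll_no": 2, "math": 100, "science": 90, "locker_colour": "blue"},
]
cleaned = drop_columns(students, {"locker_colour"})
print(cleaned[0])  # {'roll_no': 1, 'math': 50, 'science': 55}
```

The function builds new rows instead of editing the originals, so the raw data stays available if the column turns out to be needed later.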
Missing data:

Most datasets contain missing values.

Handling missing values is very important because they may affect your analysis and machine learning models.

If you find any missing values in the dataset, you can do any of these three things:

1. Leave them as they are

2. Fill in the missing values

3. Drop them
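Options 2 and 3 can be sketched as follows. The scores are illustrative, None is assumed to mark a missing value, and filling with the mean of the known values is just one common choice; fill_with_mean and drop_missing are hypothetical helper names.

```python
from statistics import mean

# Illustrative column of scores; None marks a missing value.
scores = [50, None, 80, 110, None]

def fill_with_mean(values):
    """Option 2: replace each missing value with the mean of the known ones."""
    known = [v for v in values if v is not None]
    average = mean(known)
    return [average if v is None else v for v in values]

def drop_missing(values):
    """Option 3: keep only the values that are present."""
    return [v for v in values if v is not None]

print(fill_with_mean(scores))  # [50, 80, 80, 110, 80]
print(drop_missing(scores))    # [50, 80, 110]
```

Filling keeps the number of rows unchanged, while dropping shrinks the dataset; which one is appropriate depends on how much data is missing and why.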
Outliers:

“In statistics, an outlier is a data point that differs significantly
from other observations.”

That means an outlier indicates a data point that is significantly
different from the other data points in the data set.

Outliers can arise from errors in the experiments or from variability in the measurements.

For example, suppose all the values in a math column are in the range 90–95 except one value of 20, which is significantly different from the others. It may be an input error in the dataset, so we can call it an outlier. One caveat should be added here: not all outliers are bad data points. Some are errors, but others are valid values.
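One common way to flag such a point is the 1.5 × IQR rule, sketched below. The notes do not name a detection method, so this rule is an assumption, and the math_scores values are made up to match the description (all between 90 and 95 except one 20).

```python
from statistics import quantiles

def find_outliers(values):
    """Flag values lying more than 1.5 * IQR outside the quartiles."""
    q1, _, q3 = quantiles(values, n=4)   # first quartile, median, third quartile
    spread = q3 - q1                     # interquartile range (IQR)
    low, high = q1 - 1.5 * spread, q3 + 1.5 * spread
    return [v for v in values if v < low or v > high]

# Illustrative math column.
math_scores = [92, 95, 90, 20, 94, 91]
print(find_outliers(math_scores))  # [20]
```

The function only flags candidates; whether a flagged value is an input error or a valid but unusual observation still has to be decided by looking at the data.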
Duplicate rows:

Datasets may contain duplicate entries. Deleting duplicate rows is one of the easiest cleaning tasks.

Roll No | Math | Science
1 | 50 | 55
2 | 100 | 90
3 | 80 | 85
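A short sketch of removing duplicate rows while keeping the original order is shown below. The rows reuse the values from the table above, with one repeated row added purely for illustration; drop_duplicates is a hypothetical helper name.

```python
def drop_duplicates(rows):
    """Keep the first occurrence of each row and preserve the original order."""
    # dict keys are unique and remember insertion order.
    return list(dict.fromkeys(rows))

# (roll_no, math, science); the second (2, 100, 90) is an added duplicate.
rows = [(1, 50, 55), (2, 100, 90), (2, 100, 90), (3, 80, 85)]
print(drop_duplicates(rows))  # [(1, 50, 55), (2, 100, 90), (3, 80, 85)]
```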
Data cleansing tools

OpenRefine

Trifacta Wrangler

TIBCO Clarity

Cloudingo

IBM InfoSphere QualityStage
Tidy data set:

In a tidy dataset, each column represents a separate variable and each row represents an individual observation. In untidy data, by contrast, columns represent values rather than variables. Tidying data helps to fix common data problems.
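As an illustration, suppose "subject" is treated as a variable. A table with one column per subject then has values (math, science) as column names, and it can be reshaped so that each row is one observation. The data, the column names, and the helper to_tidy below are assumptions made for this sketch.

```python
def to_tidy(rows, id_column, variable_name, value_name):
    """Turn one-column-per-value rows into one row per observation."""
    tidy = []
    for row in rows:
        for column, value in row.items():
            if column != id_column:
                tidy.append({id_column: row[id_column],
                             variable_name: column,
                             value_name: value})
    return tidy

# Untidy under this reading: the column names "math" and "science" are values.
untidy = [
    {"roll_no": 1, "math": 50, "science": 55},
    {"roll_no": 2, "math": 100, "science": 90},
]
for record in to_tidy(untidy, "roll_no", "subject", "marks"):
    print(record)
```

After reshaping, every row answers one question (which student, which subject, what marks), which makes filtering and grouping by subject straightforward.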
