A PDF
AN INTERNSHIP REPORT
Submitted by
SAVITHAVANI A 412420106059
BACHELOR OF ENGINEERING
IN
ELECTRONICS AND COMMUNICATION ENGINEERING
BONAFIDE CERTIFICATE
Certified that this internship report "MACHINE LEARNING" is the bona fide work of
"SAVITHAVANI A (412420106059)", who carried out the internship work at
"TECHVOLT SOFTWARE PVT.LTD".
EVALUATORS
ABSTRACT
Present-day computer applications require huge amounts of complex knowledge and data to be
represented in programs, and therefore demand a tremendous amount of work. Our ability
to program computers falls short of the demand for applications. If computers are
endowed with the ability to learn, the burden of coding the machine is eased (or at
least reduced). This is particularly true of expert systems, where the
"bottleneck" is extracting the expert's knowledge and feeding it to the computer.
Present-day computer programs in general (with the exception of some Machine Learning
programs) cannot correct their own errors, improve from past mistakes, or learn to
perform a new task by analogy with a previously seen task. Human beings, in contrast, are
capable of all of these. Machine Learning aims to produce smarter computers capable of
the same intelligent behaviour. The area of Machine Learning deals with the design of
programs that can learn rules from data, adapt to changes, and improve performance with
experience. In addition to being one of the initial dreams of Computer Science, Machine
Learning has become crucial as computers are expected to solve increasingly complex
problems and become more integrated into our daily lives. This is a hard problem, since
making a machine learn from its computational tasks requires work at several levels, and
complexities and ambiguities arise at each of those levels.
INTRODUCTION TO MACHINE LEARNING
Machine learning is the science of getting computers to act without being explicitly
programmed. In the past decade, machine learning has given us self-driving cars, practical
speech recognition, effective web search, and a vastly improved understanding of the
human genome. Machine learning is so pervasive today that you probably use it dozens of
times a day without knowing it. Many researchers also think it is the best way to make
progress towards human-level AI.
Learning?
The general effect of learning in a system is the improvement of the system's capability to
solve problems. It is hard to imagine a system capable of learning that cannot improve its
problem-solving performance. A system with learning capability should be able to change
itself in order to perform better in its future problem-solving.
We also note that learning cannot take place in isolation: we typically learn something (knowledge
K) to perform some task (T), through some experience (E), and whether we have learned
well or not is judged by some performance criterion (P) at the task T.
2) To solve the same problem more effectively and give better-quality solutions.
The goal of ML, in simple words, is to understand the nature of (human and other forms
of) learning, and to build learning capability in computers. To be more specific, there are
three aspects of the goals of ML.
1) To make the computers smarter, more intelligent. The more direct objective
in this aspect is to develop systems (programs) for specific practical learning tasks in
application domains.
Why are the goals of ML important and desirable?
The present day computer programs in general (with the exception of some ML programs)
cannot correct their own errors or improve from past mistakes, or learn to perform a new
task by analogy to a previously seen task. In contrast, human beings are capable of all the
above. ML will produce smarter computers capable of all the above intelligent behavior.
It is clear that our ability to learn is central to our intelligence. Thus a thorough
understanding of the human learning process is crucial to understanding human intelligence. ML
will give us insight into the underlying principles of human learning, which may lead
to the discovery of more effective education techniques. It will also contribute to the design
of machine learning systems.
Before the actual algorithm can be applied, the raw, unstructured data has to be turned into
structured, well-shaped data. This requires a pre-processing ("pre-massaging") step through
which the data is passed. The result is a candidate copy of the data, which
can then be processed by the algorithm to get the actual golden copy.
After the data is pre-processed, we get some good structured data, and this data is now an
input for machine learning. But is this a one-time job? Of course not, the process has to be
iterative, and it has to be iterative until the data is available. In machine learning the major
chunk of time is spent in this process. That is, working on the data to make it structured,
clean, ready and available. Once the data is available, the algorithms could be applied to
the data. Machine learning products offer not only pre-processing tools but also a large
number of machine learning algorithms. The result of applying an algorithm to the data is
a model, but now the question is whether this is the final model we needed.
No, it is the candidate model that we got. Candidate model means the first most appropriate
model that we get, but still it needs to be massaged. But do we get only one candidate
model? Of course not, since this is an iterative process, we do not actually know what the
best candidate model is, until we again and again produce several candidate models through
the iterative process. We do it until we get the model that is good enough to be deployed.
Once the model is deployed, applications start making use of it, so there is iteration at small
levels and at the largest level as well.
We need to repeat the entire process again and again and re-create the model at regular
intervals. The reason is simple: scenarios and factors change, and the model needs to stay
up to date and relevant all the time. Eventually this could also mean processing new data
or applying new algorithms altogether.
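The loop described above (clean the data, build several candidate models, keep the one that performs best on data it has not seen) can be sketched in a few lines of Python. This is only an illustration under assumptions: the records, the two candidate models (a constant "mean" predictor and a nearest-neighbour predictor) and the function names are hypothetical and are not taken from the internship work.

```python
from statistics import mean

def preprocess(rows):
    """Drop records with missing values and return clean (x, y) pairs."""
    return [(float(x), float(y)) for x, y in rows if x is not None and y is not None]

def fit_mean(train):
    """Candidate model 1: always predict the average training target."""
    average = mean(y for _, y in train)
    return lambda x: average

def fit_nearest(train):
    """Candidate model 2: predict the target of the closest training input."""
    return lambda x: min(train, key=lambda pair: abs(pair[0] - x))[1]

def mean_squared_error(model, data):
    return mean((model(x) - y) ** 2 for x, y in data)

def best_candidate(raw_rows, trainers):
    """Pre-process, train every candidate, and keep the one with the lowest held-out error."""
    clean = preprocess(raw_rows)
    train, held_out = clean[::2], clean[1::2]   # simple alternating split
    scored = [(mean_squared_error(fit(train), held_out), name)
              for name, fit in trainers.items()]
    return min(scored)[1]

# Hypothetical raw records; None marks a missing value.
raw = [(1, 2.1), (2, None), (3, 6.2), (4, 7.9), (None, 3.0),
       (5, 10.1), (6, 11.8), (7, 14.2), (8, 15.9)]
chosen = best_candidate(raw, {"mean": fit_mean, "nearest": fit_nearest})
print("Chosen candidate model:", chosen)
```

In practice the same loop would be re-run whenever new data arrives, which is the "iteration at the largest level" mentioned above.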
CLASSIFICATION OF MACHINE LEARNING
SUPERVISED LEARNING:
Supervised learning is the type of machine learning in which machines are trained using
well "labelled" training data and, on the basis of that data, predict the output.
Labelled data means that some input data is already tagged with the correct output.
In supervised learning, the training data provided to the machine works as the supervisor
that teaches it to predict the output correctly. It applies the same concept as a
student learning under the supervision of a teacher. Supervised learning is the process of providing
input data as well as the correct output data to the machine learning model. The aim of a
supervised learning algorithm is to find a mapping function that maps the input variable (x)
to the output variable (y).
In the real world, supervised learning can be used for risk assessment, image
classification, fraud detection, spam filtering, etc.
1. Regression
Regression algorithms are used if there is a relationship between the input variable and the
output variable. It is used for the prediction of continuous variables, such as Weather
forecasting, Market Trends, etc. Below are some popular Regression algorithms which come
under supervised learning:
o Linear Regression
o Regression Trees
o Non Linear Regression
o Bayesian Linear Regression
o Polynomial Regression
2. Classification
Classification algorithms are used when the output variable is categorical, which means it takes one of a
fixed set of classes such as Yes-No, Male-Female, True-False, etc.
o Random Forest
o Logistic regression
o Support vector machine
o Decision tree
UNSUPERVISED LEARNING:
TYPES OF UNSUPERVISED LEARNING:
1.Clustering:
Clustering is a method of grouping objects into clusters such that the objects with the most
similarities remain in one group and have few or no similarities with the objects of another
group. Cluster analysis finds the commonalities between the data objects and categorizes
them according to the presence or absence of those commonalities.
2.Association:
Association rule learning is an unsupervised learning method used for finding
relationships between variables in a large database. It determines the sets of items that
occur together in the dataset. Association rules make marketing strategy more effective:
for example, people who buy item X (say, bread) also tend to purchase item Y (butter or jam).
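The bread-and-butter example can be made concrete with the two measures usually used to score an association rule, support and confidence. The report does not name these measures, so treat them, the function names and the four shopping baskets below as assumptions made for illustration.

```python
def support(transactions, items):
    """Fraction of transactions that contain every item in `items`."""
    items = set(items)
    return sum(items <= basket for basket in transactions) / len(transactions)

def confidence(transactions, antecedent, consequent):
    """Of the baskets containing the antecedent, the share that also contain the consequent."""
    both = set(antecedent) | set(consequent)
    return support(transactions, both) / support(transactions, antecedent)

# Hypothetical shopping baskets.
baskets = [
    {"bread", "butter"},
    {"bread", "jam"},
    {"bread", "butter", "milk"},
    {"milk"},
]

bread_butter_support = support(baskets, {"bread", "butter"})
bread_to_butter = confidence(baskets, {"bread"}, {"butter"})
print(bread_butter_support, bread_to_butter)
```

A rule such as "bread implies butter" is normally kept only when both numbers exceed thresholds chosen by the analyst.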
Semi-supervised learning is an important category that lies between supervised and
unsupervised machine learning. It operates on data that contains a
few labels but consists mostly of unlabelled data. Labels are costly to obtain, so a
dataset collected for corporate purposes may have only a few of them.
Labelled data exists only in very small amounts, while unlabelled data exists in huge
amounts. Initially, similar data is clustered using an unsupervised learning
algorithm, and the clusters then help to turn the unlabelled data into labelled data. This approach is used
because labelled data is comparatively more expensive to acquire than unlabelled data.
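As a rough sketch of how a few labels can be spread over many unlabelled points, the snippet below gives each unlabelled value the label of its nearest labelled neighbour. This is a deliberately simplified stand-in for the cluster-then-label idea described above, not the method used during the internship; the data and the function name `pseudo_label` are hypothetical.

```python
def pseudo_label(labelled, unlabelled):
    """Give each unlabelled value the label of its nearest labelled value."""
    result = []
    for value in unlabelled:
        _, nearest_label = min(labelled, key=lambda pair: abs(pair[0] - value))
        result.append((value, nearest_label))
    return result

# Two expensive hand-made labels and six cheap unlabelled measurements.
labelled = [(1.0, "low"), (9.0, "high")]
unlabelled = [0.5, 1.5, 2.0, 8.0, 8.5, 10.0]

pseudo = pseudo_label(labelled, unlabelled)
for value, label in pseudo:
    print(value, label)
```

The pseudo-labelled points can then be added to the training set of an ordinary supervised model.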
We can picture these approaches with an example. Supervised learning is where a student
is under the supervision of an instructor at home and at college. If that student then analyses
the same concept alone, without any help from the instructor, it comes under
unsupervised learning. Under semi-supervised learning, the student revises the concept alone after
analysing it under the guidance of an instructor at college.
Semi-supervised learning models are becoming more popular in the industries. Some of the main
applications are as follows.
o Protein sequence classification- DNA strands are large, so labelling them requires active human
intervention. As a result, semi-supervised models have come to the fore in this
field.
o Text document classifier- It is often infeasible to obtain a large
amount of labelled text data, so semi-supervised learning is a good fit for
this task.
REINFORCEMENT LEARNING:
In Reinforcement Learning, the agent learns automatically from feedback without any
labelled data, unlike in supervised learning. Since there is no labelled data, the agent is bound
to learn from its experience only.
RL solves a specific type of problem where decision making is sequential, and the goal is
long-term, such as game-playing, robotics, etc. The agent interacts with the environment
and explores it by itself. The primary goal of an agent in reinforcement learning is to
improve the performance by getting the maximum positive rewards.
The agent learns by trial and error and, based on that experience, learns to
perform the task in a better way. Hence, we can say that "Reinforcement learning is a type
of machine learning method where an intelligent agent (computer program) interacts with
the environment and learns to act within that." How a robotic dog learns to move
its limbs is an example of reinforcement learning.
It is a core part of AI, and many AI agents work on the concept of reinforcement learning. Here
we do not need to pre-program the agent, as it learns from its own experience without any
human intervention.
The agent keeps doing three things (taking an action, changing state or remaining in the same
state, and getting feedback), and by doing so it learns and explores the environment.
The agent learns which actions lead to positive feedback (rewards) and which actions
lead to negative feedback (penalties). As a reward, the agent gets a positive point, and
as a penalty, it gets a negative point.
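The action, state and feedback cycle can be illustrated with a small tabular Q-learning agent. Q-learning is one common reinforcement learning algorithm and is chosen here as an assumption, since the report does not name a specific method. The five-cell corridor environment, the reward values and all parameter settings are likewise hypothetical.

```python
import random

def train_corridor_agent(episodes=300, n_states=5, alpha=0.5,
                         gamma=0.9, epsilon=0.2, seed=0):
    """Learn to walk right along a corridor whose last cell is the goal."""
    rng = random.Random(seed)
    moves = (-1, +1)                               # action 0 = left, action 1 = right
    goal = n_states - 1
    q = [[0.0, 0.0] for _ in range(n_states)]      # q[state][action]
    for _ in range(episodes):
        state = 0
        while state != goal:
            # Explore at random sometimes (and on ties); otherwise take the best known action.
            if rng.random() < epsilon or q[state][0] == q[state][1]:
                action = rng.randrange(2)
            else:
                action = 0 if q[state][0] > q[state][1] else 1
            next_state = state + moves[action]
            if next_state < 0:                     # walked into the wall: penalty, stay put
                next_state, reward = 0, -1.0
            elif next_state == goal:               # reached the goal: reward
                reward = 1.0
            else:
                reward = 0.0
            # Move the estimate toward the reward plus the discounted best future value.
            target = reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
    return q

q_table = train_corridor_agent()
policy = ["left" if left > right else "right" for left, right in q_table[:-1]]
print(policy)
```

After training, the learned policy prefers the action that leads towards the reward in every cell, even though the agent was never told which direction is correct.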
TYPES OF MACHINE LEARNING ALGORITHMS
Machine learning comes in many different flavours, depending on the algorithm and its
objectives. You can divide machine learning algorithms into three main groups based on
their purpose:
Y = f(X)
The goal is to approximate the mapping function so well that when you have new input data
(X), you can predict the output variable (Y) for that data. Because we know the correct answers,
the algorithm iteratively makes predictions on the training data and is corrected by the
teacher. Learning stops when the algorithm achieves an acceptable level of performance.
Classification:
A classification problem is when the output variable is a category, such as "red" or
"blue", or "disease" or "no disease".
Regression:
A regression problem is when the output variable is a real (continuous) value, such
as "dollars" or "weight".
Linear Regression:
Linear regression is a linear model, i.e. a model that assumes a linear relationship
between the input variables (x) and the single output variable (y).
When there is a single input variable (x), the method is referred to as simple linear
regression. When there are multiple input variables, literature from statistics often refers to
the method as multiple linear regression. To define the supervised learning problem more
formally: given a training set, the aim is to learn a function h so that h(x) is a good predictor for the
corresponding value of y. This function h is called a hypothesis. The next thing to decide
when designing a learning algorithm is how to represent the hypothesis h as a
function of x. Let us initially assume that the hypothesis is a linear function of x; for a
single input variable, h(x) = θ0 + θ1·x. The coefficients θ0 and θ1 are called parameters.
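For simple linear regression the parameters of a straight-line hypothesis can be computed directly with the ordinary least-squares formulas. The sketch below assumes a hypothesis of the form h(x) = theta0 + theta1 * x; the training values and the names `theta0` and `theta1` are illustrative assumptions.

```python
from statistics import mean

def fit_simple_linear_regression(xs, ys):
    """Return (intercept, slope) of the least-squares line through the points."""
    x_bar, y_bar = mean(xs), mean(ys)
    covariance = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    variance = sum((x - x_bar) ** 2 for x in xs)
    slope = covariance / variance
    intercept = y_bar - slope * x_bar
    return intercept, slope

# Hypothetical training set generated from y = 1 + 2x.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]

theta0, theta1 = fit_simple_linear_regression(xs, ys)

def h(x):
    """Hypothesis: predicted y for a new input x."""
    return theta0 + theta1 * x

print(theta0, theta1, h(5.0))
```

With several input variables the same idea applies, but the parameters are usually found with matrix methods or gradient descent instead of these two closed-form expressions.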
Random Forest is a popular machine learning algorithm that belongs to the supervised
learning technique. It can be used for both Classification and Regression problems in ML. It
is based on the concept of ensemble learning, which is a process of combining multiple
classifiers to solve a complex problem and to improve the performance of the model.
As the name suggests, "Random Forest is a classifier that contains a number of decision trees
on various subsets of the given dataset and takes the average to improve the predictive
accuracy of that dataset." Instead of relying on one decision tree, the random forest takes the
prediction from each tree and predicts the final output based on the majority vote of those
predictions.
The working process can be explained in the steps below:
Step-1: Select K random data points from the training set.
Step-2: Build the decision tree associated with the selected data points (subset).
Step-3: Choose the number N of decision trees that you want to build.
Step-4: Repeat Step-1 and Step-2 until N trees have been built.
Step-5: For new data points, find the predictions of each decision tree, and assign the new data points to
the category that wins the majority vote.
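The sampling and voting idea can be sketched with a very small forest. To keep the example short, each "tree" here is only a one-level decision stump trained on a random sample drawn with replacement, which is a simplification of a real random forest. The data set and all function names are hypothetical.

```python
import random
from collections import Counter

def train_stump(sample, feature):
    """One-level tree: best threshold on one feature, majority label on each side."""
    best = None
    for threshold in sorted({x[feature] for x, _ in sample}):
        left = [y for x, y in sample if x[feature] <= threshold]
        right = [y for x, y in sample if x[feature] > threshold]
        left_label = Counter(left).most_common(1)[0][0]
        right_label = Counter(right).most_common(1)[0][0] if right else left_label
        errors = (sum(y != left_label for y in left)
                  + sum(y != right_label for y in right))
        if best is None or errors < best[0]:
            best = (errors, threshold, left_label, right_label)
    _, threshold, left_label, right_label = best
    return lambda x: left_label if x[feature] <= threshold else right_label

def train_forest(data, n_trees=15, seed=0):
    rng = random.Random(seed)
    n_features = len(data[0][0])
    forest = []
    for _ in range(n_trees):
        sample = [rng.choice(data) for _ in data]   # random data points, with replacement
        feature = rng.randrange(n_features)         # random feature for this tree
        forest.append(train_stump(sample, feature))
    return forest

def predict(forest, x):
    """Collect one vote per tree and return the majority class."""
    votes = Counter(tree(x) for tree in forest)
    return votes.most_common(1)[0][0]

# Hypothetical two-feature data set with two well-separated classes.
data = [((1.0, 1.2), "no"), ((1.5, 0.8), "no"), ((2.0, 1.5), "no"), ((1.2, 2.0), "no"),
        ((5.0, 5.5), "yes"), ((5.5, 4.8), "yes"), ((6.0, 5.2), "yes"), ((4.8, 6.0), "yes")]

forest = train_forest(data)
print(predict(forest, (0.5, 0.5)), predict(forest, (7.0, 7.0)))
```

Because every tree sees a slightly different sample, individual trees can be wrong while the majority vote remains reliable.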
K MEANS CLUSTERING:
K-Means Clustering is an Unsupervised Learning Algorithm, which groups the unlabelled
dataset into different clusters. Here K defines the number of pre-defined clusters that need to
be created in the process: if K=2, there will be two clusters, for K=3 there will be
three clusters, and so on.
It is an iterative algorithm that divides the unlabelled dataset into K different clusters in such a
way that each data point belongs to only one group of points with similar properties.
It allows us to cluster the data into different groups and is a convenient way to discover the categories of
groups in the unlabelled dataset on its own, without the need for any training.
It is a centroid-based algorithm, where each cluster is associated with a centroid. The main
aim of this algorithm is to minimize the sum of distances between the data points and the
centroids of their corresponding clusters.
The algorithm takes the unlabelled dataset as input, divides the dataset into K
clusters, and repeats the process until it finds the best clusters. The value of K should
be predetermined in this algorithm.
The k-means algorithm mainly performs two tasks:
o Determines the best positions for the K centre points, or centroids, by an iterative process.
o Assigns each data point to its closest centroid. The data points nearest to a
particular centroid form a cluster.
Step-1: Choose the number of clusters K.
Step-2: Select K random points as centroids. (They need not be points from the input dataset.)
Step-3: Assign each data point to its closest centroid, which forms the predefined K clusters.
Step-4: Calculate the variance and place a new centroid in each cluster.
Step-5: Repeat the third step, which means reassigning each data point to the new closest centroid.
Step-6: If any reassignment occurs, then go to Step-4; else go to FINISH.
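The steps above translate almost line for line into code. The following is a minimal sketch, assuming the new centroid of a cluster is placed at the mean of its points and the initial centroids are drawn from the data set itself. The six sample points and the function names are hypothetical.

```python
import random

def squared_distance(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))

def closest(point, centroids):
    """Index of the centroid nearest to `point`."""
    return min(range(len(centroids)),
               key=lambda i: squared_distance(point, centroids[i]))

def k_means(points, k, seed=0, max_iterations=100):
    rng = random.Random(seed)
    centroids = rng.sample(points, k)                       # Step-2: random starting centroids
    assignment = [closest(p, centroids) for p in points]    # Step-3: assign to closest centroid
    for _ in range(max_iterations):
        for i in range(k):                                  # Step-4: move each centroid
            members = [p for p, a in zip(points, assignment) if a == i]
            if members:                                     # keep the old centroid if a cluster is empty
                centroids[i] = tuple(sum(c) / len(members) for c in zip(*members))
        new_assignment = [closest(p, centroids) for p in points]   # Step-5: reassign
        if new_assignment == assignment:                    # Step-6: stop when nothing changes
            break
        assignment = new_assignment
    return centroids, assignment

# Hypothetical data: two obvious groups of three points each.
points = [(1.0, 1.0), (1.5, 1.5), (2.0, 2.0), (10.0, 10.0), (10.5, 10.5), (11.0, 11.0)]
centroids, assignment = k_means(points, k=2)
print(sorted(centroids), assignment)
```

On data that is less cleanly separated, the result can depend on the random starting centroids, so k-means is often run several times with different seeds.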
HIERARCHICAL CLUSTERING:
The working of the Hierarchical clustering algorithm can be explained using the below steps:
Step-1: Create each data point as a single cluster. Let's say there are N data points, so the
number of clusters will also be N.
Step-2: Take two closest data points or clusters and merge them to form one cluster. So,
there will now be N-1 clusters.
Step-3: Again, take the two closest clusters and merge them together to form one cluster.
There will be N-2 clusters.
Step-4: Repeat Step-3 until only one cluster is left.
Step-5: Once all the clusters are combined into one big cluster, develop the dendrogram to
divide the clusters as per the problem.
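The merging procedure can be sketched as follows. The text does not say how the distance between two clusters is measured, so this example assumes single linkage (the distance between their two closest members) on one-dimensional data. The recorded list of merges plays the role of the dendrogram, and stopping early at `n_clusters` corresponds to cutting it. All names and values are hypothetical.

```python
def single_link_distance(a, b):
    """Distance between two clusters = distance between their closest members."""
    return min(abs(x - y) for x in a for y in b)

def agglomerative(points, n_clusters=1):
    clusters = [[p] for p in points]                 # Step-1: every point is its own cluster
    merges = []                                      # merge history (the dendrogram, bottom-up)
    while len(clusters) > n_clusters:
        # Step-2 / Step-3: find the two closest clusters.
        i, j = min(
            ((i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters))),
            key=lambda pair: single_link_distance(clusters[pair[0]], clusters[pair[1]]),
        )
        merges.append((clusters[i], clusters[j]))
        merged = clusters[i] + clusters[j]
        clusters = [c for idx, c in enumerate(clusters) if idx not in (i, j)] + [merged]
    return clusters, merges

points = [1.0, 1.2, 5.0, 5.1, 9.0]

three_clusters, _ = agglomerative(points, n_clusters=3)   # cut the tree at three clusters
one_cluster, full_history = agglomerative(points)         # merge all the way to one cluster
print(sorted(sorted(c) for c in three_clusters))
```

Other linkage rules (complete, average, Ward) change only the `single_link_distance` function; the merge loop stays the same.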
NEURAL NETWORKS
Neural Networks are computational models that mimic the complex functions of the human
brain. The neural networks consist of interconnected nodes or neurons that process and
learn from data, enabling tasks such as pattern recognition and decision making in machine
learning. This section looks at how neural networks work and at their architecture.
Learning in a neural network involves the following stages:
• The neural network is stimulated by a new environment.
• Then the free parameters of the neural network are changed as a result of this
stimulation.
• The neural network then responds in a new way to the environment because of
the changes in its free parameters.
The ability of neural networks to identify patterns, solve intricate puzzles, and adjust to
changing surroundings is essential. Their capacity to learn from data has far-reaching
effects, ranging from revolutionizing technology like natural language processing and
self-driving automobiles to automating decision-making processes and increasing efficiency in
numerous industries. The development of artificial intelligence is largely dependent on
neural networks, which also drive innovation and influence the direction of technology.
Forward Propagation:
Input Layer: Each feature in the input layer is represented by a node on the network, which
receives input data.
Weights and Connections: The weight of each neuronal connection indicates how strong
the connection is. Throughout training, these weights are changed.
Hidden Layers: Each hidden layer neuron processes inputs by multiplying them by
weights, adding them up, and then passing them through an activation function. By doing
this, non-linearity is introduced, enabling the network to recognize intricate patterns.
Output: The final result is produced by repeating the process until the output layer is
reached.
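Forward propagation through a small network can be written in a few lines. This sketch assumes a sigmoid activation and a bias term for every neuron, neither of which is specified in the text above. The weights are arbitrary illustrative values, not trained ones.

```python
import math

def sigmoid(z):
    """Activation function that introduces non-linearity."""
    return 1.0 / (1.0 + math.exp(-z))

def layer_forward(inputs, weights, biases, activation):
    """Each neuron multiplies inputs by weights, adds them up with a bias, then activates."""
    return [activation(sum(w * x for w, x in zip(neuron_weights, inputs)) + bias)
            for neuron_weights, bias in zip(weights, biases)]

def forward(inputs, layers):
    """Pass the values through every layer in turn until the output layer is reached."""
    values = inputs
    for weights, biases in layers:
        values = layer_forward(values, weights, biases, sigmoid)
    return values

# Hypothetical network: 2 inputs -> 2 hidden neurons -> 1 output neuron.
layers = [
    ([[0.5, -0.5], [1.0, 1.0]], [0.0, -1.0]),   # hidden layer: one weight row per neuron
    ([[1.0, -1.0]], [0.0]),                     # output layer
]

output = forward([1.0, 1.0], layers)
print(output)
```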
Backpropagation:
Loss Calculation: The network’s output is compared with the actual target values, and a loss
function is used to compute the difference. For a regression problem, the Mean Squared
Error (MSE) is commonly used as the cost function.
Gradient Descent: Gradient descent is then used by the network to reduce the loss. To
lower the inaccuracy, weights are changed based on the derivative of the loss with respect
to each weight.
Adjusting weights: The weights are adjusted at each connection by applying this iterative
process, or backpropagation, backward across the network.
Training: During training with different data samples, the entire process of forward
propagation, loss calculation, and backpropagation is done iteratively, enabling the network
to adapt and learn patterns from the data.
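The full training cycle (forward propagation, loss calculation, backpropagation and a gradient descent step) is sketched below for the smallest possible network: one input, one hidden neuron with a tanh activation, and one linear output neuron. The architecture, starting weights, learning rate and data are assumptions chosen to keep the chain rule easy to follow.

```python
import math

def train_tiny_network(samples, epochs=1000, learning_rate=0.05):
    """Train a 1-1-1 network with batch gradient descent on the mean squared error."""
    w1, b1, w2, b2 = 0.5, 0.0, 0.5, 0.0           # arbitrary starting weights
    n = len(samples)
    history = []                                   # MSE measured at the start of each epoch
    for _ in range(epochs):
        grad_w1 = grad_b1 = grad_w2 = grad_b2 = 0.0
        loss = 0.0
        for x, target in samples:
            # Forward propagation.
            hidden = math.tanh(w1 * x + b1)
            prediction = w2 * hidden + b2
            error = prediction - target
            loss += error ** 2 / n
            # Backpropagation: apply the chain rule from the loss back to each weight.
            d_prediction = 2.0 * error / n
            grad_w2 += d_prediction * hidden
            grad_b2 += d_prediction
            d_hidden = d_prediction * w2 * (1.0 - hidden ** 2)   # derivative of tanh
            grad_w1 += d_hidden * x
            grad_b1 += d_hidden
        history.append(loss)
        # Gradient descent: step each weight against its gradient.
        w1 -= learning_rate * grad_w1
        b1 -= learning_rate * grad_b1
        w2 -= learning_rate * grad_w2
        b2 -= learning_rate * grad_b2
    return (w1, b1, w2, b2), history

# Hypothetical regression data following y = 2x.
samples = [(-1.0, -2.0), (-0.5, -1.0), (0.0, 0.0), (0.5, 1.0), (1.0, 2.0)]
weights, history = train_tiny_network(samples)
print("MSE at start:", history[0], "MSE at end:", history[-1])
```

Real networks have many neurons per layer, so the same gradients are computed with matrix operations, but the order of the steps is unchanged.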
makes it appropriate for a number of applications, such as regression and pattern
recognition.
Multilayer Perceptron (MLP): MLP is a type of feedforward neural network with three or
more layers, including an input layer, one or more hidden layers, and an output layer. It uses
nonlinear activation functions.
Recurrent Neural Network (RNN): An artificial neural network type intended for
sequential data processing is called a Recurrent Neural Network (RNN). It is appropriate
for applications where contextual dependencies are critical, such as time series prediction
and natural language processing, since it makes use of feedback loops, which enable
information to survive within the network.
FUTURE OF MACHINE LEARNING
Unlike a computer with a CPU, you’ll probably take weeks to perform the task, provided
you are a dog expert. In practice, computer vision has great potential in the medical world
and airport security that companies are already starting to explore!
One of the most beneficial advancements of machine learning has to do with understanding
target markets and their preferences. With the increased accuracy of a model, businesses can
now tailor their products and services according to specific needs using recommender
system algorithms.
Machine learning technology helps search engines optimize their output by analysing past
data, such as terms used, preferences, and interactions. To put it into perspective, Google
registers over 8.5 billion searches every day. With so much data at hand, Google algorithms
continue to learn and get better at returning relevant results. For many of you, that’s the
most familiar machine learning technology of our time.
This is another ongoing trend businesses around the globe employ. Chatbot technologies
contribute to improving marketing and customer service operations. You may have seen a
chatbot prompting you to ask a question. This is how these technologies learn—the more
you ask, the better they get.
In 2018, the South Korean car manufacturer KIA launched Kian, a Facebook Messenger
chatbot, for its customers, boosting social media conversion rates up to 21%, which is 3
times higher than on KIA’s official website. And that’s just one example of how powerful
machine learning technology can be.
Many logistics and aviation companies see adopting machine learning technology as a way to increase
efficiency, safety, and estimated time of arrival (ETA) accuracy.
You may be surprised to learn that the actual flying of a plane is predominantly automated
with the help of machine learning. Overall, businesses are keen to unearth
ML’s potential within the transportation industry, so that’s something to look out for in the near future.