Internship Report On Machine Learning
ON
"Heart"
BACHELOR OF ENGINEERING
IN
ELECTRONICS AND COMMUNICATION
ENGINEERING
Submitted By
Name: XYX
USN: 123
CERTIFICATE
Certified that the training work entitled "Internship on Machine Learning Using Matlab" is a bonafide work
carried out in the seventh semester by "xyz" in partial fulfilment of the requirements for the award of the
degree of Bachelor of Engineering in Electronics and Communication Engineering from VTU, Belgaum,
during the academic year 2019-2020.
GUIDE H.O.D
ACKNOWLEDGEMENT
I would like to acknowledge the contributions of the following people without whose help and guidance this
report would not have been completed.
I acknowledge, with respect and gratitude, the counsel and support of our training coordinator, xyz,
Assistant Professor, ECE Department, whose expertise, guidance, support, encouragement, and
enthusiasm have made this report possible. Their feedback vastly improved the quality of this report and
made for an enthralling experience. I am indeed proud and fortunate to be supported by them.
I am also thankful to Prof. (Dr.) XYZ, H.O.D of ECE Department, for his constant encouragement,
valuable suggestions and moral support and blessings.
Although it is not possible to name individually, I shall ever remain indebted to the faculty members for their
persistent support and cooperation extended during this work.
This acknowledgement would remain incomplete if I failed to express my deep sense of obligation to my
parents and God for their consistent blessings and encouragement.
Name: XYX
USN: 123
Chapter 1
Introduction
The name machine learning was coined in 1959 by Arthur Samuel. Tom M. Mitchell provided a widely
quoted, more formal definition of the algorithms studied in the machine learning field: "A computer
program is said to learn from experience E with respect to some class of tasks T and performance
measure P if its performance at tasks in T, as measured by P, improves with experience E." This
follows Alan Turing's proposal in his paper "Computing Machinery and Intelligence", in which the question
"Can machines think?" is replaced with the question "Can machines do what we (as thinking entities) can
do?". Turing's proposal sets out the characteristics that a thinking machine could possess and the various
implications of constructing one.
The types of machine learning algorithms differ in their approach, the type of data they input and output, and
the type of task or problem that they are intended to solve. Broadly, machine learning can be grouped into
four categories:
I. Supervised Learning
II. Unsupervised Learning
III. Reinforcement Learning
IV. Semi-supervised Learning
Machine learning enables the analysis of massive quantities of data. While it generally delivers faster, more
accurate results for identifying profitable opportunities or dangerous risks, it may also require additional
time and resources to train properly.
Supervised Learning
Supervised learning is a type of learning in which we are given a data set and already know what the
correct output should look like, with the idea that there is a relationship between the input and the output.
Basically, it is the task of learning a function that maps an input to an output based on example input-
output pairs. It infers a function from labeled training data consisting of a set of training examples.
Supervised learning problems are categorized into regression and classification problems.
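The idea of learning a function from labeled input-output pairs can be sketched with the simplest supervised model, a least-squares line. Although the training itself used MATLAB, the sketch below is plain Python for brevity; the example data (hours studied versus exam score) is invented purely for illustration.

```python
# Minimal supervised learning: fit y = a*x + b to labeled example pairs
# with ordinary least squares, then predict an output for a new input.

def fit_line(xs, ys):
    """Least-squares slope a and intercept b for y = a*x + b."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    a = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) \
        / sum((x - mx) ** 2 for x in xs)
    return a, my - a * mx

# Labeled training examples: hours studied -> exam score.
hours = [1, 2, 3, 4, 5]
scores = [52, 55, 61, 64, 68]

a, b = fit_line(hours, scores)
print(round(a * 6 + b, 1))   # predicted score for 6 hours of study -> 72.3
```

The "experience" here is the five labeled pairs; the learned function generalizes to inputs it has never seen.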
Unsupervised Learning
Unsupervised learning is a type of learning that lets us approach problems with little or no idea of what
our results should look like. We can derive structure by clustering the data based on relationships
among the variables. With unsupervised learning there is no feedback based on the prediction results.
Basically, it is a type of self-organized learning that helps find previously unknown patterns in a data set
without pre-existing labels.
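The clustering idea above can be sketched with a tiny k-means loop. This is an illustrative Python sketch (the training itself used MATLAB); the one-dimensional points and the starting centers are invented, and no labels are given anywhere.

```python
# Minimal unsupervised learning: 1-D k-means with k=2. The algorithm
# discovers the two groups in the data by itself, without any labels.

def kmeans_1d(points, centers, iters=10):
    clusters = [[] for _ in centers]
    for _ in range(iters):
        # Assignment step: attach each point to its nearest center.
        clusters = [[] for _ in centers]
        for p in points:
            i = min(range(len(centers)), key=lambda i: abs(p - centers[i]))
            clusters[i].append(p)
        # Update step: move each center to the mean of its cluster.
        centers = [sum(c) / len(c) if c else centers[i]
                   for i, c in enumerate(clusters)]
    return centers, clusters

points = [1.0, 1.2, 0.8, 9.0, 9.5, 8.7]
centers, clusters = kmeans_1d(points, [0.0, 10.0])
print(sorted(round(c, 2) for c in centers))
```

The two centers settle near the two natural groups in the data, which is exactly the "previously unknown pattern" the text describes.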
Reinforcement Learning
Reinforcement learning is a learning method in which an agent interacts with its environment by producing
actions and discovering errors or rewards. Trial-and-error search and delayed reward are the most relevant
characteristics of reinforcement learning. This method allows machines and software agents to automatically
determine the ideal behavior within a specific context in order to maximize performance. Simple reward
feedback is all the agent requires to learn which action is best.
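Trial-and-error search with delayed reward can be sketched with tabular Q-learning on a toy corridor. This is an illustrative Python sketch, not part of the training material; the environment, learning rate, discount, and episode count are all invented choices.

```python
# Toy reinforcement learning: Q-learning on a 4-state corridor.
# The agent starts at state 0; entering state 3 yields reward 1 and
# ends the episode. Reward feedback alone shapes the behavior.
import random

N_STATES, GOAL = 4, 3
ACTIONS = (-1, +1)                      # move left, move right
alpha, gamma, eps = 0.5, 0.9, 0.2       # learning rate, discount, exploration
Q = [[0.0, 0.0] for _ in range(N_STATES)]

random.seed(0)                          # reproducible trial-and-error runs
for _ in range(200):                    # episodes of interaction
    s = 0
    while s != GOAL:
        # Epsilon-greedy: mostly exploit the best-known action, sometimes explore.
        if random.random() < eps:
            a = random.randrange(2)
        else:
            a = max((0, 1), key=lambda act: Q[s][act])
        s2 = min(max(s + ACTIONS[a], 0), N_STATES - 1)
        r = 1.0 if s2 == GOAL else 0.0  # delayed reward, only at the goal
        # Q-learning update: nudge Q toward reward plus discounted future value.
        Q[s][a] += alpha * (r + gamma * max(Q[s2]) - Q[s][a])
        s = s2

# The learned greedy policy should choose "right" (action 1) in every state.
policy = [max((0, 1), key=lambda act: Q[s][act]) for s in range(GOAL)]
print(policy)
```

Note that the agent is never told the correct action; it discovers the ideal behavior purely from the reward signal.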
Semi-Supervised Learning
Semi-supervised learning falls somewhere between supervised and unsupervised learning, since it uses
both labeled and unlabeled data for training – typically a small amount of labeled data and a large amount of
unlabeled data. Systems that use this method can considerably improve learning accuracy. Usually,
semi-supervised learning is chosen when labeling the acquired data requires skilled and relevant resources,
whereas acquiring unlabeled data generally doesn't require additional resources.
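One common semi-supervised recipe is self-training: fit a model on the few labeled points, pseudo-label the unlabeled points it is confident about, and absorb them. The Python sketch below illustrates this with a 1-nearest-neighbor rule and invented data; the distance-based "confidence threshold" is an assumption made for the example.

```python
# Minimal self-training sketch: a small labeled set plus a larger
# unlabeled pool. Confidently predicted unlabeled points get pseudo-labels
# and join the training set.

def nearest_label(x, labeled):
    """1-nearest-neighbor prediction over (value, label) pairs."""
    return min(labeled, key=lambda pair: abs(pair[0] - x))[1]

labeled = [(0.0, "low"), (10.0, "high")]      # small labeled set
unlabeled = [0.5, 1.0, 9.0, 9.8, 5.2]         # larger unlabeled pool

# Pseudo-label only unlabeled points close to an existing labeled point
# (a stand-in for "confident" predictions), then absorb them.
for x in unlabeled:
    nearest = min(labeled, key=lambda pair: abs(pair[0] - x))
    if abs(nearest[0] - x) < 2.0:             # confidence threshold
        labeled.append((x, nearest[1]))

print(nearest_label(4.0, labeled))
```

The ambiguous point 5.2 is left unlabeled, while the confident points enlarge the training set and sharpen later predictions.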
Literature Survey
Theory
A core objective of a learner is to generalize from its experience. The computational analysis of machine
learning algorithms and their performance is a branch of theoretical computer science known as
computational learning theory. Because training sets are finite and the future is uncertain, learning theory
usually does not yield guarantees of the performance of algorithms. Instead, probabilistic bounds on the
performance are quite common. The bias–variance decomposition is one way to quantify generalization
error.
For the best performance in the context of generalization, the complexity of the hypothesis should match the
complexity of the function underlying the data. If the hypothesis is less complex than the function, then the
model has underfit the data. If the complexity of the model is increased in response, then the training error
decreases. But if the hypothesis is too complex, then the model is subject to overfitting and generalization
will be poorer.
In addition to performance bounds, learning theorists study the time complexity and feasibility of learning.
In computational learning theory, a computation is considered feasible if it can be done in polynomial time.
There are two kinds of time complexity results. Positive results show that a certain class of functions can be
learned in polynomial time. Negative results show that certain classes cannot be learned in polynomial time.
While there has been much progress in machine learning, there are also challenges. For example, the
mainstream machine learning technologies are black-box approaches, making us concerned about their
potential risks. To tackle this challenge, we may want to make machine learning more explainable and
controllable. As another example, the computational complexity of machine learning algorithms is usually
very high and we may want to invent lightweight algorithms or implementations. Furthermore, in many
domains such as physics, chemistry, biology, and social sciences, people usually seek elegantly simple
equations (e.g., the Schrödinger equation) to uncover the underlying laws behind various phenomena.
Machine learning takes much more time. You have to gather and prepare data, then train the algorithm, and
there are many more uncertainties. That is why, while an experienced team can estimate the time quite
precisely in traditional website or application development, a machine learning project (used, for example, to
provide product recommendations) can take much less or much more time than expected. Why? Because
even the best machine learning engineers don't know how the deep learning networks will behave when
analyzing different sets of data. It also means that machine learning engineers and data scientists cannot
guarantee that the training process of a model can be replicated.
Applications of Machine Learning
Web Search Engine: one of the reasons why search engines like Google and Bing work so well is
that their systems have learnt how to rank pages through complex learning algorithms.
Photo tagging Applications: be it Facebook or any other photo tagging application, the ability to
tag friends makes it even more engaging. This is all possible because of a face recognition algorithm
that runs behind the application.
Spam Detector: our mail agents like Gmail or Hotmail do a lot of hard work for us in classifying
mails and moving spam mails to the spam folder. This is again achieved by a spam classifier
running in the back end of the mail application.
Database Mining for growth of automation: typical applications include web-click data for better
UX, medical records for better automation in healthcare, biological data, and many more.
Applications that cannot be programmed: there are some tasks that cannot be programmed directly, as
the computers we use are not modelled that way. Examples include autonomous driving, recognition
tasks on unordered data (face recognition, handwriting recognition), natural language processing,
and computer vision.
Understanding Human Learning: this is the closest we have come to understanding and mimicking the
human brain, and it is the start of a new revolution, the real AI.
Future Scope
The future of machine learning is as vast as the limits of the human mind. We can always keep learning and
teaching computers how to learn, while wondering how some of the most complex machine learning
algorithms have been running in the back of our own minds so effortlessly all the time.
There is a bright future for machine learning. Companies like Google, Quora, and Facebook hire people with
machine learning skills. There is intense research in machine learning at the top universities in the world. The
global machine-learning-as-a-service market is rising rapidly, mainly due to the Internet revolution. The
process of connecting the world virtually has generated vast amounts of data, which is boosting the adoption
of machine learning solutions. Considering all these applications and the dramatic improvements that ML has
brought us, it is clear that in the coming future we will see more advanced applications of ML, applications
that will stretch its capabilities to an unimaginable level.
Objectives
The main objectives of the training were to learn:
Methodologies
The trainer used several facilitation techniques, including question and answer, brainstorming, group
discussions, case-study discussions, and practical implementation of some of the topics by trainees on flip
charts and paper sheets. This multitude of training methodologies was used to make sure all participants
grasped the concepts fully and practiced what they learned: what is merely heard from trainers can be
forgotten, but what trainees do themselves they will never forget. After the post-tests were administered and
the final course evaluation forms were filled in by the participants, the trainer gave his closing remarks and
reiterated the importance of the training for the trainees in their daily activities and their readiness to apply
the learnt concepts in their assigned tasks. Certificates of completion were distributed among the participants
at the end.
Chapter 2
Technology Implemented
Matlab
What Is MATLAB?
MATLAB is a high-performance language for technical computing. It integrates computation,
visualization, and programming in an easy-to-use environment where problems and solutions are
expressed in familiar mathematical notation. Typical uses include:
Math and computation
Algorithm development
Modeling, simulation, and prototyping
Data analysis, exploration, and visualization
Scientific and engineering graphics
Application development, including Graphical User Interface building.
MATLAB is an interactive system whose basic data element is an array that does not require
dimensioning. This allows you to solve many technical computing problems, especially those with
matrix and vector formulations, in a fraction of the time it would take to write a program in a scalar
non-interactive language such as C or Fortran.
The name MATLAB stands for matrix laboratory. MATLAB was originally written to provide easy
access to matrix software developed by the LINPACK and EISPACK projects, which together
represent the state-of-the-art in software for matrix computation.
MATLAB has evolved over a period of years with input from many users. In university environments,
it is the standard instructional tool for introductory and advanced courses in mathematics, engineering,
and science. In industry, MATLAB is the tool of choice for high-productivity research, development,
and analysis.
MATLAB features a family of application-specific solutions called toolboxes. Very important to most
users of MATLAB, toolboxes allow you to learn and apply specialized technology. Toolboxes are
comprehensive collections of MATLAB functions (M-files) that extend the MATLAB environment to
solve particular classes of problems. Areas in which toolboxes are available include signal processing,
control systems, neural networks, fuzzy logic, wavelets, simulation, and many others.
Handle Graphics.
This is the MATLAB graphics system. It includes high-level commands for two-dimensional and three-
dimensional data visualization, image processing, animation, and presentation graphics. It also includes
low-level commands that allow you to fully customize the appearance of graphics as well as to build
complete Graphical User Interfaces in your MATLAB applications.
MATLAB makes machine learning easy. With tools and functions for handling big data, as well as apps
to make machine learning accessible, MATLAB is an ideal environment for applying machine learning
to your data analytics.
With MATLAB, engineers and data scientists have immediate access to prebuilt functions, extensive
toolboxes, and specialized apps for classification, regression, and clustering.
MATLAB lets you:
Extract features from signals and images using established manual and automated methods.
Compare approaches such as logistic regression, classification trees, support vector machines,
ensemble methods, and deep learning.
Apply AutoML and other model refinement and reduction techniques to create optimized models.
Integrate machine learning models into enterprise systems, clusters, and clouds, and target models
to real-time embedded hardware.
Machine learning algorithms don't work well on raw data. Before we can feed such data to an ML
algorithm, we must preprocess it by applying some transformations. With data preprocessing, we convert
raw data into a clean data set. The following techniques are commonly used -
1. Rescaling Data -
For data with attributes of varying scales, we can rescale attributes to possess the same scale. We rescale
attributes into the range 0 to 1 and call it normalization.
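Although the training used MATLAB, min-max rescaling is easy to sketch in a few lines of Python; the sample ages below are invented for illustration.

```python
# Min-max rescaling: map each value into [0, 1] using the attribute's
# observed minimum and maximum.
def rescale(values):
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

ages = [20, 30, 40, 60]
print(rescale(ages))   # -> [0.0, 0.25, 0.5, 1.0]
```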
2. Standardizing Data -
With standardizing, we can take attributes with a Gaussian distribution and different means and standard
deviations and transform them into a standard Gaussian distribution with a mean of 0 and a standard
deviation of 1.
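The same transformation can be sketched directly from its definition: subtract the mean, divide by the standard deviation. This Python sketch uses invented values.

```python
# Standardization: transform values so the result has mean 0 and
# standard deviation 1.
import math

def standardize(values):
    n = len(values)
    mean = sum(values) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    return [(v - mean) / std for v in values]

z = standardize([2, 4, 6, 8])
print([round(v, 3) for v in z])   # -> [-1.342, -0.447, 0.447, 1.342]
```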
3. Normalizing Data -
In this task, we rescale each observation to a length of 1 (a unit norm). For this, we use the Normalizer
class.
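The idea behind the Normalizer class, scaling one observation to unit Euclidean length, can be sketched as follows (the row values are invented):

```python
# Normalize a single observation (row) to unit Euclidean length.
import math

def to_unit_norm(row):
    length = math.sqrt(sum(v * v for v in row))
    return [v / length for v in row]

print(to_unit_norm([3.0, 4.0]))   # -> [0.6, 0.8]
```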
4. Binarizing Data -
Using a binary threshold, it is possible to transform our data by marking the values above it 1 and those
equal to or below it, 0. For this purpose, we use the Binarizer class.
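The thresholding done by the Binarizer class reduces to a one-line rule, sketched here in Python with invented values and an invented threshold:

```python
# Binarize: values above the threshold become 1, values at or below it
# become 0.
def binarize(values, threshold):
    return [1 if v > threshold else 0 for v in values]

print(binarize([0.2, 0.5, 0.9, 1.4], 0.5))   # -> [0, 0, 1, 1]
```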
5. Mean Removal-
We can remove the mean from each feature to center it on zero.
7. Label Encoding -
Some labels can be words or numbers. Usually, training data is labelled with words to make it readable.
Label encoding converts word labels into numbers to let algorithms work on them.
There are many types of machine learning algorithms, specific to different use cases. As we work with
datasets, a machine learning algorithm works in two stages: we usually split the data around 20%-80%
between the testing and training stages. Under supervised learning, we split a dataset into training data and
test data. The following are some machine learning algorithms -
1. Decision Tree -
A decision tree falls under supervised machine learning algorithms and is used for both classification and
regression, although mostly for classification. This model takes an instance, traverses the tree, and compares
important features against determined conditional statements. Whether it descends to the left child branch or
the right depends on the result. Usually, the more important features are closer to the root.
A decision tree can work on both categorical and continuous dependent variables. Here, we split a
population into two or more homogeneous sets. Tree models where the
target variable can take a discrete set of values are called classification trees; in these tree structures, leaves
represent class labels and branches represent conjunctions of features that lead to those class labels. Decision
trees where the target variable can take continuous values (typically real numbers) are called regression
trees.
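The core splitting step of a classification tree can be sketched with a decision stump, a depth-1 tree that picks the single threshold best separating the classes. This is an illustrative Python sketch with invented data; a full decision tree would apply the same step recursively to each branch.

```python
# Decision stump: find the threshold t on one feature that maximizes the
# number of correctly classified training examples, predicting the
# majority class on each side of the split.

def majority(group):
    """Majority class of a group (None if the group is empty)."""
    return max(set(group), key=group.count) if group else None

def best_stump(xs, ys):
    best = None   # (n_correct, threshold, left_label, right_label)
    for t in sorted(set(xs)):
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        lm, rm = majority(left), majority(right)
        n_correct = sum(1 for x, y in zip(xs, ys)
                        if y == (lm if x <= t else rm))
        if best is None or n_correct > best[0]:
            best = (n_correct, t, lm, rm)
    return best

xs = [1, 2, 3, 10, 11, 12]
ys = ["no", "no", "no", "yes", "yes", "yes"]
n_correct, t, left_label, right_label = best_stump(xs, ys)
print(t, left_label, right_label)
```

On this toy data the stump finds the clean split at the boundary between the two groups, with leaves carrying the class labels exactly as the text describes.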
4. kNN Algorithm -
This is a machine learning algorithm for classification and regression, though mostly for classification. It is
a supervised learning algorithm that uses a distance function, usually Euclidean, to compare a new point
with the stored training points and assigns it to the group of points closest to it. It classifies new cases using
a majority vote of its k nearest neighbors: the class it assigns is the one most common among those k
neighbors. k-NN is a type of instance-based learning, or lazy learning, where the function is only
approximated locally and all computation is deferred until classification. k-NN is a special case of a variable-
bandwidth kernel density "balloon" estimator with a uniform kernel.
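The majority-vote rule can be sketched compactly. This is an illustrative Python sketch (the training itself used MATLAB), and the two-dimensional training points and labels are invented.

```python
# Compact k-nearest-neighbors classifier: classify a query point by
# majority vote among its k closest training points under Euclidean
# distance. All computation happens at classification time (lazy learning).
import math
from collections import Counter

def knn_predict(train, query, k=3):
    # train is a list of ((x, y), label) pairs.
    by_distance = sorted(train, key=lambda item: math.dist(item[0], query))
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]

train = [((1, 1), "red"), ((1, 2), "red"), ((2, 1), "red"),
         ((8, 8), "blue"), ((8, 9), "blue"), ((9, 8), "blue")]

print(knn_predict(train, (2, 2)))   # -> red
print(knn_predict(train, (9, 9)))   # -> blue
```

Note that no model is built in advance: the training set itself is the model, which is what "instance-based" means.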
Chapter 3
Result Discussion
Result
This training has introduced us to machine learning. Now we know that machine learning is a technique of
training machines to perform the activities a human brain can do, albeit a bit faster and better than an
average human being. Today we have seen that machines can beat human champions in games such as
Chess and Mahjong, which are considered very complex. We have seen that machines can be trained to
perform human activities in several areas and can aid humans in living better lives. Machine learning is a
quickly growing field in Information Technology. It has applications in nearly every other field of study and
is already being implemented commercially, because machine learning can solve problems too difficult or
time-consuming for humans to solve. To describe machine learning in general terms: a variety of models are
used to learn patterns in data and make accurate predictions based on the patterns they observe.
Machine learning can be supervised or unsupervised. If we have a smaller amount of clearly labelled data
for training, we opt for supervised learning. Unsupervised learning would generally give better performance
and results for large data sets. If we have a huge data set easily available, we go for deep learning
techniques. We have also learned about reinforcement learning and deep reinforcement learning. We now
know what neural networks are, along with their applications and limitations. Specifically, we have
developed a thought process for approaching problems that machine learning works so well at solving. We
have learnt how machine learning differs from descriptive statistics.
Finally, when it comes to developing machine learning models of our own, we looked at the choices of
various development languages, IDEs, and platforms. The next thing we need to do is start learning and
practicing each machine learning technique. The subject is vast in breadth, but if we consider the depth,
each topic can be learned in a few hours, and each topic is independent of the others. We need to take one
topic at a time: learn it, practice it, and implement its algorithm(s) in a language of our choice. This is the
best way to start studying machine learning. Practicing one topic at a time, we can soon acquire the breadth
that is eventually required of a machine learning expert.
Chapter 4
Project Report
Overview-
A dataset related to adult income is given. This project classifies whether a person will be able to earn more
than 50,000 or not.
The dataset was collected/created; the table below shows a sample of the dataset.
Result-
Our project successfully classifies people based on salary with 83.43% accuracy.
Every coin has two faces, and each face has its own properties and features. It's time to uncover the faces of
ML, a very powerful tool that holds the potential to revolutionize the way things work.
3. Continuous Improvement -
As ML algorithms gain experience, they keep improving in accuracy and efficiency, which lets them make
better decisions. Say we need to build a weather-forecast model: as the amount of data we have keeps
growing, our algorithms learn to make more accurate predictions faster.
5. Wide Applications -
We could be an e-seller or a healthcare provider and still make ML work for us. Where it applies, it can
help deliver a much more personal experience to customers while also targeting the right customers.
Disadvantages of Machine Learning
For all its power and popularity, machine learning isn't perfect. The following factors serve to limit it:
1. Data Acquisition -
Machine learning requires massive data sets to train on, and these should be inclusive/unbiased and of good
quality. There can also be times when we must wait for new data to be generated.
3. Interpretation of Results -
Another major challenge is the ability to accurately interpret the results generated by the algorithms. We
must also carefully choose the algorithms for our purpose.
4. High error-susceptibility -
Machine Learning is autonomous but highly susceptible to errors. Suppose you train an algorithm with data
sets small enough to not be inclusive. You end up with biased predictions coming from a biased training set.
This leads to irrelevant advertisements being displayed to customers. In the case of ML, such blunders can
set off a chain of errors that can go undetected for long periods of time. And when they do get noticed, it
takes quite some time to recognize the source of the issue, and even longer to correct it.
References
https://expertsystem.com/
https://www.geeksforgeeks.org/
https://www.wikipedia.org/
https://www.coursera.org/learn/machine-learning
https://machinelearningmastery.com/
https://towardsdatascience.com/machine-learning/home