Machine Learning
A. INTRODUCTION
I work for BK, a software development company that produces client-server and web
applications. The company has decided to expand its expertise into simulation software, and
machine learning is one of the disciplines that plays an important role in this type of
development. My job is to research the field, investigate new opportunities, and train staff in
preparation.
Contents
A. INTRODUCTION
B. CONTENTS
Part 3: Use Machine Learning to Determine Titanic Survivors
LO1 Analyse the theoretical foundation of machine learning to determine how an intelligent machine works
P1 Analyse the types of learning problems.
P2 Demonstrate the taxonomy of machine learning algorithms.
M1 Evaluate the category of machine learning algorithms with appropriate examples.
D1 Critically evaluate why machine learning is essential to the design of intelligent machines.
LO2 Investigate the most popular and efficient machine learning algorithms used in industry
P3 Investigate a range of machine learning algorithms and how these algorithms solve learning problems.
P4 Demonstrate the efficiency of these algorithms by implementing them using an appropriate programming language or machine learning tool.
M2 Analyse these algorithms using an appropriate example to determine their power.
LO3 Develop a machine learning application using an appropriate programming language or machine learning tool for solving a real-world problem
P5 Choose an appropriate learning problem and prepare the training and test data sets in order to implement a machine learning solution.
P6 Implement a machine learning solution with a suitable machine learning algorithm and demonstrate the outcome.
M3 Test the machine learning application using a range of test data and explain each stage of this activity.
D2 Critically evaluate the implemented learning solution and its effectiveness in meeting end-user requirements.
LO4 Evaluate the outcome or the result of the application to determine the effectiveness of the learning algorithm used in the application
P7 Discuss whether the result is balanced, under-fitting or over-fitting.
P8 Analyse the result of the application to determine the effectiveness of the algorithm.
M4 Evaluate the effectiveness of the learning algorithm used in the application.
C. REFERENCES
B. CONTENTS
Our approach to this machine learning implementation will use the following steps:
1. Perform an exploratory data analysis to see which of the variables we might want to
include in our model.
2. Examine the baseline model, which is based on a single variable (Sex) and yet achieves a
score of about 0.77. Any model we generate must score better than 0.77.
3. Create a decision tree model to see whether we can use multiple variables to achieve a
higher score.
4. Use an AutoML tool to see how much automated preprocessing and hyperparameter tuning
can improve on our hand-built models.
5. Finally, compare the scores from each method and analyse the efficacy of each one.
The Titanic dataset provided by Kaggle is split into train and test files. The training file
contains a variable called Survived (indicating whether each passenger survived), which is our
target. After downloading the dataset, you can run an automated Exploratory Data Analysis
(EDA) to get familiar with the available variables. We will rely on the pandas-profiling
library, as shown below:
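A minimal sketch of how such a profiling report could be generated, assuming the Kaggle train.csv has been downloaded to the working directory (file name and output path are assumptions):

```python
import pandas as pd
from pandas_profiling import ProfileReport  # newer releases ship as ydata-profiling

# Load the Kaggle training file (path is an assumption)
train = pd.read_csv("train.csv")

# Generate and save an automatic EDA report
profile = ProfileReport(train, title="Titanic EDA")
profile.to_file("titanic_eda.html")
```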
The report gives a general overview of the variables, including:
• Number of variables
• Missing values
• Cardinality
• Duplicate rows
For each numerical variable, you also get a histogram showing its distribution and how it
correlates with the other variables. The details provided for categorical variables include the
frequency of each category, as in the description of the Sex variable below:
Now that we know which variables are available, we can explore the data in detail in order to
find patterns that will help us define a useful model. Let’s start with plotting the relationship
between the Sex and Survived variables:
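A minimal plotting sketch, assuming the Kaggle training file is available locally; the same pattern can be reused for Pclass and Embarked:

```python
import pandas as pd
import matplotlib.pyplot as plt

train = pd.read_csv("train.csv")   # Kaggle training file (path is an assumption)

# Survived is 0/1, so the mean per group is the survival rate
train.groupby("Sex")["Survived"].mean().plot(kind="bar")
plt.ylabel("Survival rate")
plt.show()
```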
As you can see, more than 70% of female passengers survived, whereas less than 20% of their
male counterparts made it out alive. We can examine the ticket class (Pclass) versus
the Survived variable in the same way:
The contrast between the three classes is obvious: almost 60% of passengers with first-class
tickets survived. This could give us insight into the evacuation orders, or even tell us how the
lifeboats were loaded (with preference given to first-class passengers). We can also look at
the relationship between the port where the passengers embarked (S=Southampton,
C=Cherbourg, Q=Queenstown) and their survival:
Considering this set of variables, we can come up with several hypotheses about which of
them are most likely to be associated with survivors. For instance, a woman with a first-class
ticket who embarked at Cherbourg appears to have had a far greater chance of surviving than
a man with a third-class ticket who embarked at Southampton. Now, let's move on to our
models.
The data for the competition includes a sample submission file that assumes all female
passengers survived. This is known as a baseline model: the simplest model that can be built
from the data without requiring any deeper analysis beyond a quick check. In this case, the
proportion of female versus male survivors supports the hypothesis that sex is a good
predictor of survival.
The score for this baseline model is over 0.7, and any new model that we submit should have
a better score.
The first step in building a good model is to make sure we start with clean, workable data, so
we'll need to work on the dataset a bit. Since Sex is important but only has two possible
values, we can transform its labels into numerical values using the scikit-learn preprocessing
class LabelEncoder, which assigns a unique integer to each label in a column of the DataFrame:
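A minimal sketch of this encoding step, assuming the column names of the Kaggle Titanic schema:

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

train = pd.read_csv("train.csv")   # Kaggle training file (path is an assumption)

encoder = LabelEncoder()
# Replace the string labels with integer codes
train["Sex"] = encoder.fit_transform(train["Sex"])
train["Embarked"] = encoder.fit_transform(train["Embarked"].astype(str))
```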
Recall our hypothesis that a first-class woman from Cherbourg had a much better chance of
surviving than a third-class man from Southampton? It can be modelled as a decision tree, and
you can train a classifier to make predictions based on this kind of analysis using scikit-learn.
The idea is that the algorithm can infer a set of rules from the features passed as training
data, and then apply those rules to make predictions when given new data. The features we
will use are:
• Sex
• Pclass
• Embarked
cross_val_score performs five iterations in which it selects some of the data for training and
some for testing. It then fits the DecisionTreeClassifier instance and evaluates the results
using the algorithm's default metric, which in this case is accuracy (the number of correct
predictions divided by the total number of predictions). The results were better than the
baseline model in every case, so now we can train a model and predict the outcomes on the
test dataset:
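A minimal sketch of this step, assuming the Kaggle train/test files and the three features above (file names are assumptions):

```python
import pandas as pd
from sklearn.model_selection import cross_val_score
from sklearn.preprocessing import LabelEncoder
from sklearn.tree import DecisionTreeClassifier

train = pd.read_csv("train.csv")
test = pd.read_csv("test.csv")
features = ["Sex", "Pclass", "Embarked"]

# Encode the categorical columns in both files in the same way
for column in ["Sex", "Embarked"]:
    encoder = LabelEncoder()
    train[column] = encoder.fit_transform(train[column].astype(str))
    test[column] = encoder.transform(test[column].astype(str))

X, y = train[features], train["Survived"]

# Five-fold cross-validation using the default accuracy metric
model = DecisionTreeClassifier(random_state=0)
print(cross_val_score(model, X, y, cv=5))

# Fit on the full training data and write a Kaggle-style submission file
model.fit(X, y)
pd.DataFrame({"PassengerId": test["PassengerId"],
              "Survived": model.predict(test[features])}).to_csv("submission.csv",
                                                                 index=False)
```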
As you can see, our code loads the test dataset and applies the same transformations that we
used on the training data. It then makes the predictions and saves the results to a CSV file,
keeping track of the data type of the prediction column, because Kaggle will score the
predictions as incorrect if they are submitted with a different data type than the one used in
training. The results are slightly worse than they were with the single variable:
4. How to use AutoML Tools to Create a Model
We really need a deeper analysis to extract more information from the data. We also need to
experiment with the algorithms and the hyperparameters to properly tune the classification
approach. That would be a lot of work, so instead let's give the automated tooling a chance to
see how much it can improve on our baseline model.
The team behind the MLBox project put together an analysis of the Titanic dataset that
includes full preprocessing, algorithm selection, hyperparameter tuning, training, prediction,
and even packaging the results for submission:
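A condensed sketch of what such an MLBox pipeline could look like, based on the library's documented Reader, Drift_thresholder, Optimiser and Predictor classes; the search space shown here is an assumption:

```python
from mlbox.preprocessing import Reader, Drift_thresholder
from mlbox.optimisation import Optimiser
from mlbox.prediction import Predictor

paths = ["train.csv", "test.csv"]   # Kaggle files (paths are assumptions)
target = "Survived"

# Step 1: read and split the train/test files
data = Reader(sep=",").train_test_split(paths, target)
# Step 2: drop useless and drifting variables
data = Drift_thresholder().fit_transform(data)
# Step 3: search the hyperparameter space and fit the selected algorithm
space = {"est__strategy": {"search": "choice",
                           "space": ["LightGBM", "RandomForest"]}}
best = Optimiser(scoring="accuracy", n_folds=5).optimise(space, data)
# Step 4: predict on the test data and save the results to disk
Predictor().fit_predict(best, data)
```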
In the above code:
1. Step 1 simply uses a reader to load the training and test datasets.
2. Step 2 is the most involved, because it handles the selection process that drops useless
variables and also deals with drifting variables. (A drifting variable changes its statistical
properties between the training dataset and the test dataset.)
3. Step 3 optimizes the hyperparameters by defining a search space and fitting the selected
algorithm to the training data.
4. Step 4 performs the predictions and saves them in a mlbox.csv file.
As you can see, the predictions made by the AutoML model were slightly better than the
baseline model. The lesson is clear: the automatic model was better parametrized, but it still
lacks the feature engineering that a human could contribute.
That said, there is still much more you can do with the data provided in the Titanic dataset.
Our scores from the baseline model, the simple decision tree model, and the AutoML model
are acceptable, but they could be greatly improved by working with the features, algorithms,
and hyperparameters available in the Python machine learning libraries.
1. Linear Regression
Linear regression is perhaps one of the best-known and best-understood algorithms in
statistics and machine learning.
Predictive modeling is primarily concerned with minimizing a model's error, or making the
most accurate predictions possible, at the expense of explainability.
The representation of linear regression is an equation that describes the straight line that best
fits the relationship between the input variables (x) and the output variable (y), by finding
specific weights for the input variables, called the coefficients (B).
For example: y = B0 + B1 * x
We will predict y for a given input x, and the goal of the linear regression learning algorithm
is to find values for the coefficients B0 and B1.
Different techniques can be used to learn the linear regression model from data, such as a
linear-algebra solution for ordinary least squares and gradient-descent optimization.
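As a brief illustration of the gradient-descent option, here is a minimal sketch that learns B0 and B1 for y = B0 + B1 * x on a tiny made-up dataset (the learning rate and iteration count are assumptions):

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([3.1, 4.9, 7.2, 9.0, 10.8])   # roughly y = 1 + 2x

b0, b1, lr = 0.0, 0.0, 0.01
for _ in range(5000):
    pred = b0 + b1 * x
    error = pred - y
    # Gradient of the mean squared error with respect to each coefficient
    b0 -= lr * error.mean()
    b1 -= lr * (error * x).mean()

print(b0, b1)   # should approach B0 ≈ 1 and B1 ≈ 2
```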
2. Logistic Regression
Logistic regression is another technique borrowed by machine learning from the field of
statistics. It is the go-to method for binary classification problems (problems with two class
values).
Logistic regression is like linear regression in that the goal is to find the values for the
coefficients that weight each input variable. Unlike linear regression, the prediction for the
output is transformed using a non-linear function called the logistic function.
The logistic function looks like a big S and will transform any value into the range 0 to 1.
This is useful because we can apply a rule to the output of the logistic function to snap values
to 0 and 1 (e.g. IF less than 0.5 THEN output 0) and predict a class value.
Because of the way the model is learned, the predictions made by logistic regression can also
be used as the probability that a given data instance belongs to class 0 or class 1. This can be
useful for problems where you need to give more rationale for a prediction.
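A minimal sketch of the logistic (sigmoid) transform and the 0.5 threshold rule described above, using made-up coefficient values:

```python
import numpy as np

def logistic(z):
    # Squashes any real value into the range (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical learned coefficients for a single input x
b0, b1 = -4.0, 1.5
x = 3.2
probability = logistic(b0 + b1 * x)            # P(class = 1)
predicted_class = 1 if probability >= 0.5 else 0
print(probability, predicted_class)
```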
3. Linear Discriminant Analysis (LDA)
The representation of LDA is quite simple. It consists of statistical properties of your data,
calculated for each class. For a single input variable, this includes:
• The mean value of the variable for each class
• The variance of the variable, calculated across all classes
4. Decision Trees
Decision trees are an important type of algorithm for predictive machine learning modeling.
The representation of the decision tree model is a binary tree. This is the same binary tree
from algorithms and data structures, nothing too fancy. Each node represents a single input
variable (x) and a split point on that variable (assuming the variable is numeric).
The leaf nodes of the tree contain an output variable (y) that is used for prediction.
Predictions are made by walking the splits of the tree until arriving at a leaf node and
outputting the class value at that leaf node.
Trees are fast to learn and fast to make predictions. They are also often accurate for a broad
range of problems and do not require any special preparation of your data.
5. K-Nearest Neighbors
The KNN algorithm is very simple and very effective. The model representation for KNN is
the entire training dataset. Simple, isn't it?
Predictions are made for a new data point by searching through the entire training set for the
K most similar instances (the neighbors) and summarizing the output variable for those K
instances. For a regression problem, this might be the mean output value; for a classification
problem, it might be the mode (the most common class value).
The simplest technique, if your attributes are all on the same scale (all in inches, for
example), is to use the Euclidean distance, a number you can calculate directly from the
differences between each pair of input values.
KNN can require a lot of memory or space to store all of the data, but it only performs a
computation (or learns) when a prediction is needed, just in time. You can also update and
curate your training instances over time to keep predictions accurate.
1. Supervised learning
2. Unsupervised learning
3. Reinforcement learning
1) Supervised Learning
Supervised learning is a type of machine learning method in which we provide labeled sample
data to the machine learning system in order to train it, and on that basis it predicts the output.
The system creates a model using the labeled data to understand the datasets and learn about
each one. Once the training and processing are done, we test the model by providing sample
data to check whether it predicts the correct output.
The goal of supervised learning is to map input data to output data. Supervised learning is
based on supervision, in the same way that a student learns under the supervision of a teacher.
An example of supervised learning is spam filtering.
Supervised learning can be further divided into two categories of algorithms:
• Classification
• Regression
2) Unsupervised Learning
Unsupervised learning is a learning method in which a machine learns without any supervision.
The training is provided to the machine with the set of data that has not been labeled, classified,
or categorized, and the algorithm needs to act on that data without any supervision. The goal
of unsupervised learning is to restructure the input data into new features or a group of objects
with similar patterns.
In unsupervised learning, we don't have a predetermined result; the machine tries to find
useful insights from a huge amount of data. It can be further divided into two categories of
algorithms:
• Clustering
• Association
3) Reinforcement Learning
Reinforcement learning is a feedback-based learning method in which an agent learns by
performing actions and receiving rewards for good actions and penalties for bad ones. A
robotic dog that automatically learns the movement of its legs is an example of reinforcement
learning.
We will practice right away with a dataset describing the attributes of flowers.
The input is a CSV file with six columns: the first column is the index, the middle four
columns are the values of each attribute, and the last column is the name of the flower.
Our requirement is that, from this data, we can predict the name of a flower based on the
similarity of its attribute values.
Some preprocessing operations include: deleting the first row containing the header, deleting
the first column (the ordinal index), and then using numpy.random's shuffle method to shuffle
the data. The reason for shuffling is that we will take the last 50 rows as test data.
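A minimal sketch of this preprocessing, assuming the CSV layout described above (the file name iris.csv is an assumption):

```python
import numpy as np

# Load as strings, then drop the header row and the ordinal first column
raw = np.genfromtxt("iris.csv", delimiter=",", dtype=str)[1:, 1:]

np.random.shuffle(raw)                     # shuffle so the test rows are a random sample
train_rows, test_rows = raw[:-50], raw[-50:]
```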
It performs the distance calculation between the two input points using the Euclidean
formula: simply iterate over the corresponding attributes of the two points, accumulate the
sum of the squared differences, and finally return the square root of that sum. (If this is hard
to follow, refer back to the Euclidean distance formula.)
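A minimal sketch of such a distance function (the name is an assumption):

```python
import math

def euclidean_distance(point_a, point_b):
    # Sum the squared differences of each attribute, then take the square root
    total = 0.0
    for a, b in zip(point_a, point_b):
        total += (float(a) - float(b)) ** 2
    return math.sqrt(total)
```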
And the last function finds the most common flower among the k nearest neighbours found:
we take the list of their labels and loop through it to find the label that occurs most often.
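A minimal sketch of this helper (the name is an assumption):

```python
def most_common_label(labels):
    # Count how often each label appears and return the most frequent one
    counts = {}
    for label in labels:
        counts[label] = counts.get(label, 0) + 1
    return max(counts, key=counts.get)
```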
Finally, we iterate over the rows of the test set, predict a label for each one, and compare it
with the true label. The results are relatively accurate, and to quantify the accuracy we can
add a counter variable, as in the sketch below:
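A minimal sketch of the prediction loop and accuracy calculation, assuming the helpers defined above and k = 5 (the value of k is an assumption):

```python
k = 5
correct = 0

for row in test_rows:
    features, true_label = row[:-1], row[-1]
    # Distance from this test point to every training point, paired with its label
    distances = [(euclidean_distance(features, train_row[:-1]), train_row[-1])
                 for train_row in train_rows]
    neighbours = sorted(distances)[:k]
    predicted = most_common_label([label for _, label in neighbours])
    if predicted == true_label:
        correct += 1

print("Accuracy:", correct / len(test_rows))
```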
Conclusion:
In this article, we have learned about the KNN algorithm at its most basic level. This example
should help you approach the KNN algorithm more easily.
D1 Critically evaluate why machine learning is essential to the design of intelligent machines.
Why is machine learning important?
The reason machine learning is important is that it provides businesses with insight into trends
in customer behavior and business models, and aids the development of new products. Many
leading companies today, such as Facebook, Google, and Uber, make machine learning a
central part of their operations. Machine learning has become a significant competitive
differentiator for many companies.
Perhaps one of the most famous examples of machine learning in action is Facebook's news
feed-powered recommendation engine.
Facebook uses machine learning to personalize how each member's feed is delivered. If a
member frequently stops to read a particular group's posts, the recommendation engine will
start showing more of that group's activity earlier in the feed.
Behind the scenes, the engine is trying to reinforce known patterns in members' online
behavior. Should a member change this pattern and stop reading posts from that group in the
coming weeks, the news feed will adjust accordingly.
There are basically two ways to group the machine learning algorithms you may come across
in the field.
The first is by the way an algorithm models a problem, that is, by its style of interaction with
the experience or environment (or whatever we want to call the input data).
This way of organizing machine learning algorithms is useful because it forces you to think
about the role of the input data and the model preparation process, so you can choose the
approach that best suits your problem for the best results.
Let's take a look at three different learning styles in machine learning algorithms:
a. Supervised Learning
With a supervised algorithm, the input data is called training data and has a known label or
result, such as spam/not-spam or a stock price at a point in time.
A model is prepared through a training process in which it is required to make predictions and
is corrected when those predictions are wrong. The training process continues until the model
reaches the desired level of accuracy on the training data.
b. Unsupervised learning
In unsupervised learning, the input data is unlabeled and there is no known outcome.
A model is prepared by inferring the structures present in the input data. This may be to
extract general rules, or to systematically reduce redundancy through a mathematical process.
• Example problems are clustering, dimensionality reduction, and association rule learning.
c. Semi-supervised learning
The input data is a mix of labeled and unlabeled examples.
There is a desired prediction problem, but the model must also learn the structure needed to
organize the data as well as to make the predictions.
• Example algorithms are extensions to other flexible methods that make assumptions about
how to model the unlabeled data.
I think this is the most useful way to group machine learning algorithms, and it's the
approach we're going to use here.
a. Regression Algorithm
Regression algorithms are concerned with modeling the relationships between variables,
which we iteratively refine using a measure of error in the predictions made by the model.
b. Instance-based algorithm
Instance-based learning models a decision problem with instances of training data that are
deemed important or required by the model.
Such methods typically build a database of example data and compare new data to the
database using a similarity measure in order to find the best match and make a prediction.
For this reason, instance-based methods are also called winner-take-all methods and
memory-based learning. The focus is placed on the representation of the stored instances and
the similarity measures used between instances.
c. Regularization algorithm
Regularization algorithms are extensions of other methods (typically regression methods)
that penalize models based on their complexity, favoring simpler models that are also better
at generalizing.
I have listed regularization algorithms here because they are popular and powerful, and they
are generally simple modifications made to other methods.
The most popular Regularization algorithms in Machine Learning are:
• Ridge Regression
• Least Absolute Shrinkage and Selection Operator (LASSO)
• Elastic Net
• Least-Angle Regression (LARS)
The decision tree method builds a model of decisions based on the actual values of the
attributes in the data.
Decisions fork in the tree structure until a prediction decision is made for a given record.
Decision trees are trained on data for classification and regression problems. They are often
fast and accurate and are a big favorite in machine learning.
Simple linear regression is an approach for predicting a response using a single feature.
It is assumed that the two variables are linearly related, so we try to find a linear function
that predicts the response value (y) as accurately as possible as a function of the feature or
independent variable (x).
Let us consider a dataset where we have a value of the response y for every feature x. The
regression line is h(x_i) = b_0 + b_1 * x_i, and our task is to find the values of b_0 and b_1
that minimise the total squared error J(b_0, b_1) = (1/2n) * Σ (y_i - h(x_i))², where n is the
number of observations.
Without going into the mathematical details, the standard least-squares result is
b_1 = SS_xy / SS_xx and b_0 = ȳ - b_1 * x̄, where SS_xy is the sum of cross-deviations of y
and x, and SS_xx is the sum of squared deviations of x.
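A minimal sketch of this least-squares computation on a small made-up dataset:

```python
import numpy as np

x = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([1.1, 3.0, 4.9, 7.2, 8.8, 11.1])

# Least-squares estimates: b_1 = SS_xy / SS_xx, b_0 = mean(y) - b_1 * mean(x)
ss_xy = np.sum((x - x.mean()) * (y - y.mean()))
ss_xx = np.sum((x - x.mean()) ** 2)
b1 = ss_xy / ss_xx
b0 = y.mean() - b1 * x.mean()

print("b_0 =", b0, "b_1 =", b1)   # roughly 1 and 2 for this data
```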
Supervised learning is an algorithm that predicts the output (outcome) of new data (a new
input) based on known (input, outcome) pairs. These pairs are often called (data, label).
Supervised learning is the most common group of algorithms in machine learning.
In supervised learning we have a set of input variables X = {x1, x2, …, xN} and a set of
corresponding labels Y = {y1, y2, …, yN}, where the xi and yi are vectors. The pairs of known
data (xi, yi) ∈ X × Y are called the training data set. From this training data, we need to
create a function that maps each element of the set X to a corresponding (approximate)
element of the set Y:
yi ≈ f(xi), ∀i = 1, 2, …, N
The goal is to approximate the function f well enough that, when we have a new input x, we
can compute the corresponding label y = f(x).
Example: handwritten digit recognition. We have thousands of example images of each digit
written by many different people. After feeding these images into the algorithm, it learns a
function whose input is an image and whose output is the digit it contains. When it then
receives a new image the model has never seen, it predicts which digit the picture contains.
This example is very similar to the way people learn as children. We show a child the
alphabet and point out which letter is A and which is B. After being taught many times, the
child can recognise the letters A and B even in books they have never seen before.
Besides this, there are many algorithms to detect faces in images. Facebook, for example, has
used such algorithms to identify faces in photos and suggest that users tag their friends.
This step is concerned with selecting the subset of all available data that you will be working
with. There is always a strong desire to include all the data that is available, in the hope that
the maxim "more is better" will hold.
You need to consider what data you actually need to address the question or problem you are
working on. Make some assumptions about the data you require and be careful to record those
assumptions so that you can test them later if needed.
Below are some questions to help you think through this process:
• What is the extent of the data you have available? For example through time, database
tables, connected systems. Ensure you have a clear picture of everything that you can
use.
• What data is not available that you wish you had available? For example data that is
not recorded or cannot be recorded. You may be able to derive or simulate this data.
• What data don’t you need to address the problem? Excluding data is almost always
easier than including data. Note down which data you excluded and why.
It is only in small problems, like competitions or toy datasets, that the data has already been
selected for you.
After you have selected the data, you need to consider how you are going to use it. This
preprocessing step is about getting the selected data into a form that you can work with.
Three common data preprocessing steps are formatting, cleaning and sampling:
• Formatting: The data you have selected may not be in a format that is suitable for you
to work with. The data may be in a relational database and you would like it in a flat
file, or the data may be in a proprietary file format and you would like it in a relational
database or a text file.
• Cleaning: Cleaning data is the removal or fixing of missing data. There may be data
instances that are incomplete and do not carry the data you believe you need to address
the problem. These instances may need to be removed. Additionally, there may be
sensitive information in some of the attributes and these attributes may need to be
anonymized or removed from the data entirely.
• Sampling: There may be far more selected data available than you need to work with.
More data can result in much longer running times for algorithms and larger
computational and memory requirements. You can take a smaller representative sample
of the selected data that may be much faster for exploring and prototyping solutions
before considering the whole dataset.
It is very likely that the machine learning tools you use on the data will influence the
preprocessing you will be required to perform.
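As a small illustration of the cleaning and sampling steps above, here is a minimal pandas sketch (the file name, column name and sample fraction are assumptions):

```python
import pandas as pd

df = pd.read_csv("raw_data.csv")            # hypothetical input file

# Cleaning: drop rows with missing values and remove a sensitive attribute
df = df.dropna()
df = df.drop(columns=["customer_email"])    # assumed sensitive column

# Sampling: work with a 10% random sample while prototyping
sample = df.sample(frac=0.1, random_state=42)
sample.to_csv("prepared_sample.csv", index=False)
```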
The final step is to transform the preprocessed data. The specific algorithm you are working with
and the knowledge of the problem domain will influence this step and you will very likely have
to revisit different transformations of your preprocessed data as you work on your problem.
Three common data transformations are scaling, attribute decompositions and attribute
aggregations. This step is also referred to as feature engineering.
• Scaling: The preprocessed data may contain attributes with a mixture of scales for
various quantities such as dollars, kilograms and sales volume. Many machine learning
methods like data attributes to have the same scale such as between 0 and 1 for the
smallest and largest value for a given feature. Consider any feature scaling you may
need to perform.
• Decomposition: There may be features that represent a complex concept that may be
more useful to a machine learning method when split into the constituent parts. An
example is a date that may have day and time components that in turn could be split out
further. Perhaps only the hour of day is relevant to the problem being solved. Consider
what feature decompositions you can perform.
• Aggregation: There may be features that can be aggregated into a single feature that
would be more meaningful to the problem you are trying to solve. For example, there
may be a data instances for each time a customer logged into a system that could be
aggregated into a count for the number of logins allowing the additional instances to be
discarded. Consider what types of feature aggregation could be performed.
You can spend a lot of time engineering features from your data and it can be very beneficial
to the performance of an algorithm. Start small and build on the skills you learn.
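As a small illustration of these three transformations, a minimal pandas/scikit-learn sketch (the file and column names are assumptions):

```python
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

logins = pd.read_csv("logins.csv", parse_dates=["timestamp"])   # hypothetical data

# Decomposition: split a timestamp into the part that matters (hour of day)
logins["hour"] = logins["timestamp"].dt.hour

# Aggregation: one row per customer with a login count
per_customer = (logins.groupby("customer_id").size()
                      .rename("login_count").reset_index())

# Scaling: rescale the count to the range [0, 1]
per_customer["login_count_scaled"] = MinMaxScaler().fit_transform(
    per_customer[["login_count"]])
```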
P6 Implement a machine learning solution with a suitable machine learning algorithm and
demonstrate the outcome.
How to Implement a Machine Learning Algorithm
Implementing a machine learning algorithm in code can teach you a lot about the algorithm
and how it works.
In this article, you will learn how to effectively implement machine learning algorithms and
how to maximize your learning from these projects.
You can use the implementation of machine learning algorithms as a strategy for learning about
applied machine learning. You can also carve out a niche and skills in algorithm
implementation.
Algorithm Understanding
Implementing a machine learning algorithm will give you a deep and practical appreciation for
how the algorithm works. This knowledge can also help you to internalize the mathematical
description of the algorithm by thinking of the vectors and matrices as arrays and the
computational intuitions for the transformations on those structures.
Practical Skills
You are developing valuable skills when you implement machine learning algorithms by hand.
Skills such as mastery of the algorithm, skills that can help in the development of production
systems and skills that can be used for classical research in the field.
1. Select programming language: Select the programming language you want to use for
the implementation. This decision may influence the APIs and standard libraries you
can use in your implementation.
2. Select Algorithm: Select the algorithm that you want to implement from scratch. Be
as specific as possible. This means not only the class, and type of algorithm, but also
go as far as selecting a specific description or implementation that you want to
implement.
3. Select Problem: Select a canonical problem or set of problems you can use to test and
validate your implementation of the algorithm. Machine learning algorithms do not
exist in isolation.
4. Research Algorithm: Locate papers, books, websites, libraries and any other
descriptions of the algorithm you can read and learn from. Although, you ideally want
to have one keystone description of the algorithm from which to work, you will want
to have multiple perspectives on the algorithm. This is useful because the multiple
perspectives will help you to internalize the algorithm description faster and overcome
roadblocks from any ambiguities or assumptions made in the description (there are
always ambiguities in algorithm descriptions).
5. Unit Test: Write unit tests for each function, even consider test driven development
from the beginning of the project so that you are forced to understand the purpose and
expectations of each unit of code before you implement them.
Extensions
Once you have implemented an algorithm you can explore making improvements to the
implementation. Some examples of improvements you could explore include:
• Experimentation: You can expose many of the micro-decisions you made in the
algorithms implementation as parameters and perform studies on variations of those
parameters. This can lead to new insights and disambiguation of algorithm
implementations that you can share and promote.
• Optimization: You can explore opportunities to make the implementation more
efficient by using tools, libraries, different languages, different data structures, patterns
and internal algorithms. Knowledge you have of algorithms and data structures for
classical computer science can be very beneficial in this type of work.
• Specialization: You may explore ways of making the algorithm more specific to a
problem. This can be required when creating production systems and is a valuable skill.
Making an algorithm more problem specific can also lead to increases in efficiency
(such as running time) and efficacy (such as accuracy or other performance measures).
• Generalization: Opportunities can be created by making a specific algorithm more
general. Programmers (like mathematicians) are uniquely skilled in abstraction and you
may be able to see how the algorithm could be applied to more general cases of a class
of problem or other problems entirely.
Limitations
You can learn a lot by implementing machine learning algorithms by hand, but there are also
some downsides to keep in mind.
In this post I want to make some suggestions for intuitive algorithms from which you might
like to select your first machine learning algorithm to implement from scratch.
• Ordinary Least Squares Linear Regression: Use two-dimensional data sets and
model y from x. Print out the error for each iteration of the algorithm. Consider plotting
the line of best fit and the predictions for each iteration of the algorithm to see how the
updates affect the model.
• k-Nearest Neighbor: Consider using two dimensional data sets with 2 classes even
ones that you create with graph paper so that you can plot them. Once you can plot and
make predictions, you can plot the relationships created for each prediction decision the
model makes.
• Perceptron: Considered the simplest artificial neural network model and very similar
to a regression model. You can track and graph the performance of the model as it learns
a dataset.
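As an example of the last suggestion, here is a minimal perceptron sketch on a tiny, linearly separable, made-up dataset:

```python
import numpy as np

# Two features, two classes (0/1); the data is linearly separable by design
X = np.array([[2.0, 1.0], [3.0, 4.0], [4.0, 2.0], [1.0, 3.0],
              [6.0, 7.0], [7.0, 8.0], [8.0, 6.0], [6.5, 9.0]])
y = np.array([0, 0, 0, 0, 1, 1, 1, 1])

weights = np.zeros(2)
bias = 0.0
lr = 0.1

for epoch in range(20):
    errors = 0
    for xi, target in zip(X, y):
        prediction = 1 if np.dot(weights, xi) + bias > 0 else 0
        update = lr * (target - prediction)   # perceptron update rule
        weights += update * xi
        bias += update
        errors += int(update != 0)
    print(f"epoch {epoch}: {errors} misclassifications")
```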
Summary
In this article, you learned the benefits of implementing machine learning algorithms by hand.
You have learned that you can understand an algorithm, improve, and develop valuable skills
by following this path.
You've learned a simple process that you can follow and customise when implementing
multiple algorithms from scratch, and you've learned three algorithms that you can choose
from as your first algorithm to implement from scratch.
M3 Test the machine learning application using a range of test data and explain each stage
of this activity.
7 steps to building a machine learning model.
To start, work with the owner of the project and make sure you understand the project's
objectives and requirements. The goal is to convert this knowledge into a suitable problem
definition for the machine learning project and devise a preliminary plan for achieving the
project's objectives. Key questions to answer include the following:
• Have all the necessary technical, business and deployment issues been addressed?
• What are the acceptable parameters for accuracy, precision and confusion matrix
values?
• What are the expected inputs to the model and the expected outputs?
• What are the characteristics of the problem being solved? Is this a classification,
regression or clustering problem?
• What is the "heuristic" -- the quick-and-dirty approach to solving the problem that
doesn't require machine learning? How much better than the heuristic does the
model need to be?
Setting specific, quantifiable goals will help realize measurable ROI from the machine learning
project instead of simply implementing it as a proof of concept that'll be tossed aside later.
In order for a machine learning project to go forward, you need to determine the feasibility of
the effort from a business, data and implementation standpoint.
Once you have a firm understanding of the business requirements and receive approval for the
plan, you can start to build a machine learning model, right? Wrong. Establishing the business
case doesn't mean you have the data needed to create the machine learning model.
A machine learning model is built by learning and generalizing from training data, then
applying that acquired knowledge to new data it has never seen before to make predictions and
fulfill its purpose.
The focus should be on data identification, initial collection, requirements, quality
identification, insights and potentially interesting aspects that are worth further investigation.
Here are some key questions to consider:
• Where are the sources of the data that's needed for training the model?
• How are the test set data and training set data being split?
• Are there special needs for accessing real-time data on edge devices or in more
difficult-to-reach places?
Answering these important questions helps you get a handle on the quantity and quality of data
as well as understand the type of data that's needed to make the model work.
In addition, you need to know how the model will operate on real-world data. For example,
will the model be used offline, operate in batch mode on data that's fed in and processed
asynchronously, or be used in real time, operating with high-performance requirements to
provide instant results? This information will also determine the sort of data needed and data
access requirements.
During this phase of the AI project, it's also important to know if any differences exist between
real-world data and training data as well as test data and training data, and what approach you
will take to validate and evaluate the model for performance.
The above chart outlines different kinds of data and sources needed for machine learning
projects.
Procedures during the data preparation, collection and cleansing process include the following:
• "Multiply" image-based data sets if they aren't sufficient enough for training.
• Select features that identify the most important dimensions and, if necessary, reduce
dimensions using a variety of techniques.
Data preparation and cleansing tasks can take a substantial amount of time. Surveys of machine
learning developers and data scientists show that the data collection and preparation steps can
take up to 80% of a machine learning project's time. As the saying goes, "garbage in, garbage
out." Since machine learning models need to learn from data, the amount of time spent on
prepping and cleansing is well worth it.
The above chart is an overview of the training and inference pipelines used in developing and
updating machine learning models.
• Select the right algorithm based on the learning objective and data requirements.
The resulting model can then be evaluated to determine whether it meets the business and
operational requirements.
In machine learning, an algorithm is the formula or set of instructions to follow to record
experience and improve learning over time. Depending on what type of machine learning
approach you are doing, different algorithms perform better than others.
Model evaluation can be considered the quality assurance of machine learning. Adequately
evaluating model performance against metrics and requirements determines how the model
will work in the real world.
Understanding the concepts of bias and variance helps you find the sweet spot for optimizing
the performance of your machine learning models.
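A minimal sketch of this kind of evaluation using scikit-learn metrics, assuming arrays of true and predicted labels (the values shown are made up):

```python
from sklearn.metrics import (accuracy_score, precision_score,
                             recall_score, confusion_matrix)

y_true = [0, 1, 1, 0, 1, 0, 1, 1]   # hypothetical ground-truth labels
y_pred = [0, 1, 0, 0, 1, 0, 1, 1]   # hypothetical model predictions

print("Accuracy:", accuracy_score(y_true, y_pred))
print("Precision:", precision_score(y_true, y_pred))
print("Recall:", recall_score(y_true, y_pred))
print("Confusion matrix:\n", confusion_matrix(y_true, y_pred))
```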
Step 6. Put the model in operation and make sure it works well
When you're confident that the machine learning model can work in the real world, it's time to
see how it actually operates in the real world -- also known as "operationalizing" the model:
• Deploy the model with a means to continually measure and monitor its
performance.
• Develop a baseline or benchmark against which future iterations of the model can
be measured.
Depending on the requirements, model operationalization can range from simply generating a
report to a more complex, multi-endpoint deployment.
Successful AI projects iterate models to ensure the models continue to provide valuable,
reliable and desirable results in the real world.
Real-world data changes in unexpected ways, and these changes might create new
requirements for deploying the model onto different endpoints or in new systems. The end
may just be a new beginning, so it's best to determine the following:
• the next requirements for the model's functionality;
• solutions to "model drift" or "data drift," which can cause changes in performance
due to changes in real-world data.
The surefire way to achieve success in machine learning model building is to continuously look
for improvements and better ways to meet evolving business requirements.
D2 Critically evaluate the implemented learning solution and its effectiveness in meeting end-user
requirements.
Evaluate algorithm complexity
The growth rate of an algorithm's running time T(n) is expressed as O(f(n)), where T(n) and
f(n) are two non-negative real functions. If an algorithm has an execution time of T(n) =
O(f(n)), we say that the algorithm runs in time of order f(n).
• Loop statement: Suppose the execution time of the body of a loop is O(f(n)) and the
maximum number of iterations of the loop is g(n); then the execution time of the whole
loop is O(f(n)·g(n)). This applies to all for, while, and do...while loops.
• After evaluating the execution time of all instructions in the program, the execution
time of the entire program will be the execution time of the statement with the largest
execution time.
3. Example analysis
Example: Analyze the execution time of the following program segment:
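The original listing is not included in the text; below is a minimal reconstruction, with the statement numbers used in the analysis shown as comments. The actual statements are assumptions chosen to be consistent with the analysis that follows:

```python
n = 10_000                 # problem size

total = 0                  # (1) O(1)
product = 1                # (2) O(1)
i = 0                      # (3) O(1)
for i in range(n):         # (4) loop, n iterations
    total += i             # (5) O(1) body
j = 0                      # (6) O(1)
for j in range(n):         # (7) loop, n iterations
    product += j           # (8) O(1) body
print(total, product)      # (9) O(1)
```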
The execution time of the above program depends on the number n. Analysing it in detail:
• Statements (1), (2), (3), (5), (6), (8), and (9) all have O(1) execution time.
• The for loop at statement (4) performs n iterations, and the statement in its body,
statement (5), takes O(1) time, so the whole loop runs in O(n). The same applies to the
loop at statement (7).
By the rule above, the execution time of the entire program segment is therefore O(n).
LO4 Evaluate the outcome or the result of the application to determine the
effectiveness of the learning algorithm used in the application
P7 Discuss whether the result is balanced, under-fitting or over-fitting.
Tactics To Combat Imbalanced Training Data
You might think it’s silly, but collecting more data is almost always overlooked.
Can you collect more data? Take a second and think about whether you are able to gather more
data on your problem.
More examples of the minority class may be useful later when we look at resampling your dataset.
Accuracy is not the metric to use when working with an imbalanced dataset. We have seen that
it is misleading.
There are metrics that have been designed to tell you a more truthful story when working with
imbalanced classes.
You should consider performance metrics that can provide more insight into model
performance than traditional classification accuracy, such as the confusion matrix, precision,
recall, the F1 score, Cohen's kappa, and ROC curves.
You can change the dataset that you use to build your predictive model to have more balanced
data.
This change is called sampling your dataset and there are two main methods that you can use
to even-up the classes:
1. You can add copies of instances from the under-represented class called over-sampling
(or more formally sampling with replacement), or
2. You can delete instances from the over-represented class, called under-sampling.
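A minimal sketch of both options using scikit-learn's resample utility (the file name, label column and class values are assumptions):

```python
import pandas as pd
from sklearn.utils import resample

df = pd.read_csv("transactions.csv")          # hypothetical imbalanced dataset
majority = df[df["label"] == 0]
minority = df[df["label"] == 1]

# Over-sampling: sample the minority class with replacement up to the majority size
minority_up = resample(minority, replace=True,
                       n_samples=len(majority), random_state=42)
balanced_over = pd.concat([majority, minority_up])

# Under-sampling: randomly drop majority examples down to the minority size
majority_down = resample(majority, replace=False,
                         n_samples=len(minority), random_state=42)
balanced_under = pd.concat([majority_down, minority])
```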
4) Try Generate Synthetic Samples
A simple way to generate synthetic samples is to randomly sample the attributes from instances
in the minority class.
You could sample them empirically within your dataset or you could use a method like Naive
Bayes that can sample each attribute independently when run in reverse. You will have more
and different data, but the non-linear relationships between the attributes may not be preserved.
These approaches are often very easy to implement and fast to run. They are an excellent
starting point.
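One widely used systematic method for generating synthetic samples is SMOTE. A minimal sketch using the imbalanced-learn library, with a synthetic imbalanced dataset standing in for real data:

```python
from collections import Counter
from imblearn.over_sampling import SMOTE
from sklearn.datasets import make_classification

# Hypothetical imbalanced dataset: roughly 95% class 0, 5% class 1
X, y = make_classification(n_samples=1000, weights=[0.95, 0.05], random_state=42)
print("Before:", Counter(y))

X_resampled, y_resampled = SMOTE(random_state=42).fit_resample(X, y)
print("After:", Counter(y_resampled))
```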
As always, I strongly advise you not to use your favorite algorithm on every problem. You
should at least be spot-checking a variety of different types of algorithms on a given problem.
For more on spot-checking algorithms, see my post “Why you should be Spot-Checking
Algorithms on your Machine Learning Problems”.
Penalized classification imposes an additional cost on the model for making classification
mistakes on the minority class during training. These penalties can bias the model to pay more
attention to the minority class.
Often the handling of class penalties or weights are specialized to the learning algorithm. There
are penalized versions of algorithms such as penalized-SVM and penalized-LDA.
There are fields of study dedicated to imbalanced datasets. They have their own algorithms,
measures and terminology.
Taking a look at your problem and thinking about it from these perspectives can sometimes
shake loose some ideas.
This shift in thinking considers the minority class as the outlier class, which might help you
think of new ways to separate and classify samples.
Change detection is similar to anomaly detection except rather than looking for an anomaly it
is looking for a change or difference. This might be a change in behavior of a user as observed
by usage patterns or bank transactions.
Both of these shifts take a more real-time stance to the classification problem that might give
you some new ways of thinking about your problem and maybe some more techniques to try.
Really climb inside your problem and think about how to break it down into smaller problems
that are more tractable.
For example:
…resampling the unbalanced training set into not one balanced set, but several. Running an
ensemble of classifiers on these sets could produce a much better result than one classifier
alone
These are just a few of some interesting and creative ideas you could try.
For more ideas, check out these comments on the reddit post “Classification when 80% of my
training set is of one class“.
P8 Analyse the result of the application to determine the effectiveness of the algorithm
Analyze the efficiency of the algorithm
Usually, when solving a problem, we tend to choose the "best" solution. But what does "good"
mean? In mathematics, a "good" solution might be one that is short and concise, or one based
on easy-to-understand knowledge. For algorithms in informatics, quality is judged on the
following two criteria: the time the algorithm takes to run and the amount of memory it needs.
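The listing referred to below is not included in the text; here is a minimal reconstruction of the naive primality check it describes, trying every candidate divisor from 2 to N - 1:

```python
def is_prime(n):
    # Naive check: try every possible divisor from 2 to n - 1 (N - 2 checks)
    if n < 2:
        return False
    for d in range(2, n):
        if n % d == 0:
            return False
    return True
```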
The above is the simplest implementation of the primality-checking algorithm. It needs N - 2
checks in the loop. Suppose we need to test a number of about 25 digits, and we have a
supercomputer that can perform a hundred trillion (10^14) calculations per second; the total
time needed for the check would then be roughly:
10^25 / (10^14 × 60 × 60 × 24 × 365) ≈ 3170 years!
In fact, we only need to check the candidate divisors from 2 to √N to know whether N has any
divisor in that range:
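A minimal sketch of the improved check described above:

```python
import math

def is_prime_fast(n):
    # Only check divisors from 2 up to the square root of n
    if n < 2:
        return False
    for d in range(2, math.isqrt(n) + 1):
        if n % d == 0:
            return False
    return True
```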
Following this method, for the same 25-digit integer, the check time is reduced to about:
√(10^25) / 10^14 ≈ 0.03 seconds!
C. REFERENCES:
• https://en.wikipedia.org/wiki/Linear_regression
• https://en.wikipedia.org/wiki/Simple_linear_regression
• http://scikit-learn.org/stable/auto_examples/linear_model/plot_ols.html
• http://www.statisticssolutions.com/assumptions-of-linear-regression/
• https://www.oreilly.com/library/view/machine-learning-pocket/9781492047537/