Machine Learning With ML - Net and C# - VB - Net - CodeProject
Machine Learning With ML - Net and C# - VB - Net - CodeProject
Net - CodeProject
Wikipedia_SentimentAnalysis.zip Wikipedia_SentimentAnalysis_VB.zip
YouGotSpam_Analysis.zip YouGotSpam_Analysis_VB.zip
LanguageDetection.zip LanguageDetection_VB.zip
IrisClassification.zip IrisClassification_VB.zip
IrisClassification_uint.zip IrisClassification_uint_VB.zip
Index
Introduction
Background
Overview
Binary Classification
Sentiment Analysis Wikipedia
Training Stage
Prediction Stage
Multiclass Classification
Language Detection
Iris Flower Classification
Version 1
Version 2
Conclusions
References
History
Introduction
This article introduces machine learning in .Net without touching the mathematical side of things. It will focus on essential work-
flows and their structures of the data handling in .Net to facilitate experimentation with what is available in an open source project
ML.Net version 0.2.
The ML.Net project version 0.2 is available for .Net Core 2.0 and .Net Standard 2.0 with support for x64 architecture only (Any
CPU will not compile right now). It should, thus, be applicable in any framework where .Net Standard 2.0 (eg.: .Net Framework
4.6.1) is applicable. The project is currently on review. APIs may change in the future.
Background
https://www.codeproject.com/Articles/1249611/Machine-Learning-with-ML-Net-and-Csharp-VB-Net?display=Print 1/17
28/06/2018 Machine Learning with ML.Net and C#/VB.Net - CodeProject
Learning the basics of machine learning has not not been easy, if you want to use an object oriented language like C# or VB.Net.
Because most of the time you have to learn Python, before anything else, and then you have to find tutorials with sample data that
can teach you more. Even looking at object oriented projects like [1] Accord.Net, Tensor.Flow, or CNTK is not easy because each of
them comes with their own API, way of implementing same things differently, and so on. I was thrilled by the presentations at Build
2018 [2] because they indicated that we can use a generic work-flow approach that allows us to evaluate the subject with local
data, local .Net programs, local models, and results, without having to use a service or another programming language like Python.
Overview
Machine learning is a subset of Artificial Intelligence (AI) and it can answer 5 types of questions [3]:
Supervised
2. Regression
Question: How much or how many?
Unsupervised
1. Ranking
Question: What should I do next?
2. Clustering
Question: How is this organized?
3. Anomaly Detection
Question: Is this weird?
Each type of question has many applications and in order to use the correct machine learning approach we must first try to
determine if we want to answer any of the given questions, and if so, whether we have the data to support it.
A binary classification can be applied when you want to answer a question with a true or false answer. You usually find yourself
sorting an item (an image or text) into one of 2 classes. Consider, for instance, the question of whether a customer feedback to your
recent survey is in a good mood (positive) or not (negative).
Answering this question with machine learning requires us to tag sample items (eg: images or text) as belonging to either group.
The normal work-flow requires two independent sets of tagged data:
1. A Training Data Set (to train the machine learning algorithm) and
2. An Evaluation Data Set (to measure the efficiency of the ml algorithm).
where "1" in the first column denotes a negative sentiment and "0" in the first column denotes a positive sentiment. The rule of
thumb is usually that the ml algorithm will work better if we have more training data. And it should also be assured that the training
data and the data used later on is clean and of high quality to support an effective algorithm.
The overall work-flow to determine an effective algorithm using KPIs is denoted by the diagram on the left side below, where we
(ideally) find a model that reflects our classification problem best. The model is not explained in more detail here. It is, in the case
of ML.Net, a zip file containing the persisted facts learned from the tagged training data.
https://www.codeproject.com/Articles/1249611/Machine-Learning-with-ML-Net-and-Csharp-VB-Net?display=Print 2/17
28/06/2018 Machine Learning with ML.Net and C#/VB.Net - CodeProject
The second independent data set for evaluation is used to determine KPIs towards the efficiency of the learned classification. This
steps estimates how good our algorithm will classify items in the future by comparing the result from the machine learning algorithm
with the available tag (without using the tag in the algorithm). A KPI to measure efficiency is, for example, the percentage of the
number of items classified right versus the wrong classified items. We can always go back to the training step, and adjust
parameters, or swap one algorithm for the other, if we find that our KPIs do not meet our expectations and we need ways to
optimize the model.
The Training stage hopefully ends with an effective model which can be applied in the second Prediction stage to classify each
item that we see in the future. This stage requires the model from the previous stage and the item to classify, which is used to
output a prediction of a classification (eg.: positive or negative sentiment).
This is a brief overview on the work-flow for attended machine learning. We need to understand this to work with the code samples
discussed in this article further below. So, lets look at each sample in turn.
Binary Classification
https://www.codeproject.com/Articles/1249611/Machine-Learning-with-ML-Net-and-Csharp-VB-Net?display=Print 3/17
28/06/2018 Machine Learning with ML.Net and C#/VB.Net - CodeProject
Wikipedia_SentimentAnalysis.zip
Training Stage
The work-flow discussed in the previous section is implemented to some degree in the demo projects attached to this article. The
demo project contains two executable projects:
Training and
Prediction
We get the following output if we compile and start the Training project:
We see here how the program first trains a model and evaluates the result in the second step.
The Training and the Prediction modul share a reference to the previously mentioned Model.zip file (most be copied manually -
see details below), a reference to ML.Net library, and a common model of the data input and the classification output defined in the
Models project:
[Column(ordinal: "1")]
public string Text;
}
<Column("1")>
Public Text As String
End Class
https://www.codeproject.com/Articles/1249611/Machine-Learning-with-ML-Net-and-Csharp-VB-Net?display=Print 4/17
28/06/2018 Machine Learning with ML.Net and C#/VB.Net - CodeProject
The properties defined in ClassificationData map each column into an input that is present in the text input file. The
Label column defines the item that contains the class definition that we want to train against for each line of text. The Text
property itself cannot be labeled as a "Feature" because it consists of more than one "column" (int the text file). This is why we need
to add the new TextFeaturizer("Features", "Text") line in the pipeline below to read the text into the input
data structure.
The ClassificationData is a rough description of our input and how it should be mapped into either a Label or a Feature.
Try removing the Label column definition, compile and execute, to verify that the system will throw an exception, if a column named
Label cannot be found in the input text.
The ClassPrediction states only one binary output result, which is expected to be a Boolean value that maps the input
to either binary class. This part is relevant to:
1. Verify whether learning was succesful (with known input in the test phase) and
2. Determine the actual classifiaction of the machine learning algorithm when using its model in production.
In summery: The ClassificationData is used to descripe how we whish to process the input (always consisting of Label
and Features), and the ClassPrediction maps this input to a learned result.
The Training pipline that consumes the text input via the ClassificationData definition looks like this:
pipeline.Add(new TextLoader(trainingDataFile).CreateFrom<ClassificationData>());
pipeline.Add(New StochasticDualCoordinateAscentBinaryClassifier())
The ML.Net framework comes with an extensible pipeline concept in which the different processing steps can be plugged in as
shown above. The TextLoader step loads the data from the text file and the TextFeaturizer step converts the given
input text into a feature vector, which is a numerical representation of the given text. This numerical representation is then fed into
something that the ML community calls a learner. The learner in this case a FastTreeBinaryClassifier.
https://www.codeproject.com/Articles/1249611/Machine-Learning-with-ML-Net-and-Csharp-VB-Net?display=Print 5/17
28/06/2018 Machine Learning with ML.Net and C#/VB.Net - CodeProject
A learner or trainer is the component that converts the numerical feature vectors into a model that can later be used to classify input
in the future. The documentation for these learners is currently under construction and not all learners are fully implemented and
tested, yet. For binary classifications, there are a few alternative learners that could be used as an alternative (just edit the
constructor as shown below):
Testing all of the above learners and shows that the StochasticDualCoordinateAscentBinaryClassifier
works best based on the measured KPIs. These KPIs are measured by an instance of the
BinaryClassificationMetrics which also offers other KPIs, such as, Precision and Recall. Note that you can still
analyse more KPIs, , such as, memory consumption and processing time, which are also not measured here. The test presented is
rather small and brief. We can also use different settings of individual learners which may still reveal significant improvements.
Being able to play with these different scenarious looks like an interesting excercise when we face the problem of an automated
classification of a large amount of items (text or images etc).
So, this in a nutshell how machine learning can work. The machine consumes data (text), converts it into numerical vectors, and
integrates the vectorized data into a model. The Model is the main output of the first stage. Lets have a look at the Classification
stage to understand the complete work-flow.
Prediction Stage
The Prediction stage is the modul that represents the code that runs in production and classifies data as new items arrive in the
system. This part is implemented in the PredictAsync method of the Prediction project in the Wikipedia_SentimentAnalysis
solution. The code for this method looks like this:
Console.WriteLine("Classification Predictions");
https://www.codeproject.com/Articles/1249611/Machine-Learning-with-ML-Net-and-Csharp-VB-Net?display=Print 6/17
28/06/2018 Machine Learning with ML.Net and C#/VB.Net - CodeProject
IEnumerable<(ClassificationData sentiment, ClassPrediction prediction)>
sentimentsAndPredictions =
predicts.Zip(predictions, (sentiment, prediction) => (sentiment, prediction));
return model;
}
Console.WriteLine()
Console.WriteLine("Classification Predictions")
Console.WriteLine("--------------------------")
Return model
End Function
The PredictionModel.ReadAsync line in the method loads the model from the file system into an in-memory
PredictionModel:
The model loaded is stored in the project's Learned folder. This Model.zip file has to be copied from the Training moduls
output whenever we find a significant improvement and want to take advantage of it in the Prediction modul.
Everything below the model loading code line evaluates input against the loaded model and outputs a predicted classification in the
last part of the method. You can use the interactive input prompt to test sample texts of your own and test on a small scale what
https://www.codeproject.com/Articles/1249611/Machine-Learning-with-ML-Net-and-Csharp-VB-Net?display=Print 7/17
28/06/2018 Machine Learning with ML.Net and C#/VB.Net - CodeProject
was learned and what was not. Remember that the learned data is usually cleaned (not the same as the original input) and that you
can only test on a small scale like this. A better and more reasonable test is probably to feed in the last n text lines from a real data
source, get their classification, and see if an independent reviewer has a closely matching result or not.
We have seen in this section how binary classification can work for sentiment analysis in a very "simple" scenario. But the real
strength of ml is that each type of question (here: Is this A or B?) can be applied in a wide variety of applications. Let's review one
more sample in the next section to review another binary classification use case.
The data for the sample discussed in this section is based on the codeproject article You've Got Spam. The aim of this binary
classification project is that we want to know determine whether a given text should be classified as Spam or not.
The source code attached to this article for the YouGotSpam_Analysis solution is almost identical to the code explained in the last
section. Even the execute-able projects are in fact almost identical. The only difference here is the datasource for training and test
evaluation, which is in this case the test data from the above codeproject article (see Data folder in Training project). The Training
project produces this output:
...which indicates that we can reach the same KPIs as indicated in the original article based on Python.
You can again use the Prediction project to load a model from the file system and test it with further input.
The projects discussed so far have shown that ML.Net can be helpful to determine a binary classification in an automated fashion.
But what if I want to classify more than 2 classes (eg: negative, neutral, and positive sentiment)? The next section examines
classifying data for this use case.
Multiclass Classification
Language Detection
LanguageDetection.zip
LanguageDetection_VB.zip
The data for the sample discussed in this section was downloaded from http://wortschatz.uni-leipzig.de and pre-processed
(removed quote character ") for improved parsing experience.
The multiclass classifications use case discussed here is the detection of a language based on a given text. Just imagine, you have
teams of social media agents and you are trying to relay online customer feedback (eg. chats), in different languages, to the correct
team that speaks that language.
The LanguageDetection solution attached to this section follows the structure of the previously discussed binary classification
samples. We have a Training project, a Prediction project, and a Models class library that is shared between the executables.
The Training project can be used to create a model with a particular learner. A successful model can then be copied from the
Training project to the Prediction project for consumation and multiclass classification of future input.
https://www.codeproject.com/Articles/1249611/Machine-Learning-with-ML-Net-and-Csharp-VB-Net?display=Print 8/17
28/06/2018 Machine Learning with ML.Net and C#/VB.Net - CodeProject
The Prediction project of the LanguageDetection solution differs in the way of how we define the LanguageClass property
in the ClassificationData class and the Class property in the ClassPrediction class. Both properties must be
of the data type float to support multible classifications:
[Column(ordinal: "1")]
public string Text;
}
<Column("1")>
Public Text As String
End Class
The input mapping in ClassificationData is the same as the one in the binary classification problem. The only difference
is not that we have more than two values in the Label column of the text file that is being fed in.
The output mapping in ClassPrediction is different because we now have to map to a float value in order to classify
towards more than one class.
pipeline.Add(new TextLoader(trainingDataFile).CreateFrom<ClassificationData>());
pipeline.Add(new Dictionarizer("Label"));
pipeline.Add(new TextFeaturizer("Features", "Text"));
pipeline.Add(new StochasticDualCoordinateAscentClassifier());
//pipeline.Add(new LogisticRegressionClassifier());
//pipeline.Add(new NaiveBayesClassifier());
// Train the pipeline based on the dataset that has been loaded, transformed.
PredictionModel<ClassificationData, ClassPrediction> model =
pipeline.Train<ClassificationData, ClassPrediction>();
return model;
}
https://www.codeproject.com/Articles/1249611/Machine-Learning-with-ML-Net-and-Csharp-VB-Net?display=Print 9/17
28/06/2018 Machine Learning with ML.Net and C#/VB.Net - CodeProject
pipeline.Add(New Dictionarizer("Label"))
pipeline.Add(New StochasticDualCoordinateAscentClassifier())
'pipeline.Add(new LogisticRegressionClassifier());
'pipeline.Add(new NaiveBayesClassifier());
' Train the pipeline based on the dataset that has been loaded, transformed.
Dim model As PredictionModel(Of ClassificationData, ClassPrediction) =
pipeline.Train(Of ClassificationData, ClassPrediction)()
The Dictionarizer("Label"); step maps each line with a labeled input value (0-5) into a bucket. The
PredictedLabelColumnOriginalValueConverter maps the predicted value (a vector) to the original values
datatype (a float).
PerClassLogLoss:
Class: 0 - 11.18%
Class: 1 - 4.08%
Class: 2 - 5.95%
Class: 3 - 10.43%
Class: 4 - 7.86%
Class: 5 - 5.52%
There are three multiclass classification learners in ML.Net Version 0.2. and their KPIs compare as indicated below:
new StochasticDualCoordinateAscentClassifier()
Accuracy Macro: 98.66%
Accuracy Micro: 98.66%
Top KAccuracy: 0.00%
LogLoss: 7.50%
https://www.codeproject.com/Articles/1249611/Machine-Learning-with-ML-Net-and-Csharp-VB-Net?display=Print 10/17
28/06/2018 Machine Learning with ML.Net and C#/VB.Net - CodeProject
PerClassLogLoss:
Class: 0 - 11.18%
Class: 1 - 4.08%
Class: 2 - 5.95%
Class: 3 - 10.43%
Class: 4 - 7.86%
Class: 5 - 5.52%
new LogisticRegressionClassifier()
Accuracy Macro: 98.52%
Accuracy Micro: 98.52%
Top KAccuracy: 0.00%
LogLoss: 8.63%
PerClassLogLoss:
Class: 0 - 13.32%
Class: 1 - 4.67%
Class: 2 - 7.09%
Class: 3 - 11.50%
Class: 4 - 8.98%
Class: 5 - 6.19%
new NaiveBayesClassifier()
Accuracy Macro: 96.58%
Accuracy Micro: 96.58%
Top KAccuracy: 0.00%
LogLoss: 3,453.88%
PerClassLogLoss:
Class: 0 - 3,453.88%
Class: 1 - 3,453.88%
Class: 2 - 3,453.88%
Class: 3 - 3,453.88%
Class: 4 - 3,453.88%
Class: 5 - 3,453.88%
So, this is how we can multiclass classify text based on one Feature input column. The same machine learning approach (binary of
multiclass) is also available for more than one feature input column, as we will see next.
The Multiclass classification problem discussed in this section is a well known reference test in the pattern recognition community
[4]. The original database was created by Ronald Fisher in 1936 and ML.Net sample reviewed here comes from the Get Started
section of the ML.Net tutorial. The problem statement is to create an algorithm that will accept an input vector of multiple float
values (representing properties of the flower), and the output of that algorithm should be the most likely name of the flower.
Doing this in ML.Net requires us to create an input mapping with more than one column:
[Column("1")]
public float SepalWidth;
[Column("2")]
public float PetalLength;
[Column("3")]
https://www.codeproject.com/Articles/1249611/Machine-Learning-with-ML-Net-and-Csharp-VB-Net?display=Print 11/17
28/06/2018 Machine Learning with ML.Net and C#/VB.Net - CodeProject
public float PetalWidth;
[Column("4")]
[ColumnName("Label")]
public string Label;
}
<Column("1")>
Public SepalWidth As Single
<Column("2")>
Public PetalLength As Single
<Column("3")>
Public PetalWidth As Single
<Column("4")>
<ColumnName("Label")>
Public Label As String
End Class
We are inputing a set of feature columns (namely SepalLength, SepalWidth, PetalLength, PetalWidth) that is later combined into
one Features vector. The Label is in this case a string that is given as last column to identify each data row during the training and
test stage of the algorithm.
The training code for this case is very similar to the previous section:
pipeline.Add(new TextLoader(trainingDataFile).CreateFrom<ClassificationData>(separator:
','));
pipeline.Add(new Dictionarizer("Label"));
pipeline.Add(new StochasticDualCoordinateAscentClassifier());
// Train the pipeline based on the dataset that has been loaded, transformed.
PredictionModel<ClassificationData, ClassPrediction> model =
pipeline.Train<ClassificationData, ClassPrediction>();
await model.WriteAsync(modelPath);
return model;
}
https://www.codeproject.com/Articles/1249611/Machine-Learning-with-ML-Net-and-Csharp-VB-Net?display=Print 12/17
28/06/2018 Machine Learning with ML.Net and C#/VB.Net - CodeProject
pipeline.Add(New Dictionarizer("Label"))
pipeline.Add(New StochasticDualCoordinateAscentClassifier())
' Train the pipeline based on the dataset that has been loaded, transformed.
' Saves the model we trained to a zip file.
Dim model As PredictionModel(Of ClassificationData, ClassPrediction) =
pipeline.Train(Of ClassificationData, ClassPrediction)()
Await model.WriteAsync(modelPath)
There are only two new things here. The raw input data is in this case a comma seperated list, therefore, we have to use a
separator: ',' parameter when loading the data from the text file in the pipeline. And we use the
ColumnConcatenator to convert the set of feature columns into one column consisting of a vector named Features.
The output is similar to what we've seen before (and we can again experiment with the other two learners as shown in the last
section):
PerClassLogLoss:
Class: 0 - 0.72%
Class: 1 - 10.62%
Class: 2 - 13.43%
Again, we can use the Training module of the IrisClassification solution to train different learners and settings and use the
Prediction module to predict new classifications with the previously determined model.
We have seen in this section how 4 input columns (SepalLength, SepalWidth, PetalLength, PetalWidth) are converted into one
vectorized Features column using the ColumnConcatenator converter. An equivalent approach that does not require us to
use a ColumnConcatenator in the pipeline code is to use the following input class definition:
https://www.codeproject.com/Articles/1249611/Machine-Learning-with-ML-Net-and-Csharp-VB-Net?display=Print 13/17
28/06/2018 Machine Learning with ML.Net and C#/VB.Net - CodeProject
[Column("0-3")]
[ColumnName("Features")]
[VectorType(4)] public float[] Features = new float[4];
[Column("4")]
[ColumnName("Label")]
public string Label;
}
<Column("0-3","Features")>
https://www.codeproject.com/Articles/1249611/Machine-Learning-with-ML-Net-and-Csharp-VB-Net?display=Print 14/17
28/06/2018 Machine Learning with ML.Net and C#/VB.Net - CodeProject
<VectorType(4)>
Public Features As Single() = New Single(3) {}
<Column("4", "Label")>
Public Label As String
End Class
But it is a bad practice to define the actual feature set through the ClassificationData definition as shown above. We
should, therefore, remove the [ColumnName("Features")] line and add the new
ColumnConcatenator("Features", nameof(Digit.Features)) in the pipeline code instead. This design
can give us more flexability when trying to evaluate different feature configurations.
Version 2
IrisClassification_uint.zip
IrisClassification_uint_VB.zip
Let us suppose for a moment that we do not want the machine learning algorithm to handle strings (since we really want to localize
that part of the application). It would be a better practice to go back to handling integer values and tread each integer as an index to
indicate the classification (type of flower). But how exactly can this be done? We can change the definition of the input and
predicted output like so:
[Column("0-3")]
[ColumnName("Features")]
[VectorType(4)] public float[] Features = new float[4];
[Column("4")]
[ColumnName("Label")]
public float Label;
}
<Column("0-3")>
<ColumnName("Features")>
<VectorType(4)>
Public Features As Single() = New Single(3) {}
<Column("4")>
<ColumnName("Label")>
Public Label As Single
End Class
Next, we will have to remove the PredictedLabelColumnOriginalValueConverter from the pipeline of the
previous solution and this is how we can adjust for this scenario (assuming we adjusted the data as well). This approach can also
be verified in the attached IrisClassification_uint solution.
Conclusions
The reviewed sample applications have shown that ML.Net has an interesting value (even at version 0.2) when it comes to
delivering machine learning into the .Net framework. We have seen that binary and multiclass classification can be based on
different types of input and output. This input and output always requires:
The data types of the inputs and outputs are flexible because converters can be used to convert values into numbers and vectors
when feeding the input into the engine and the same conversion is obviously possible when we have to interprete the result of a
classification.
I hope this article was useful and helps getting started with the subject. Give me your feedback in the form of stars or let me know if
you see essential things to add or change since this could help us all to develop this ML.Net based apps even further.
References
[1] Machine Learning Frameworks
ML.Net
Accord.Net
Tensor Flow
Microsoft Cognitive Toolkit (CNTK)
https://www.codeproject.com/Articles/1249611/Machine-Learning-with-ML-Net-and-Csharp-VB-Net?display=Print 16/17
28/06/2018 Machine Learning with ML.Net and C#/VB.Net - CodeProject
[2] ML.Net at Build 2018
Data Science for Beginners video 1: The 5 questions data science answers
Demystifying Machine and Deep Learning for Developers : Build 2018
ML.Net Gloassary
History
2018-Jun-18 Added VB.Net samples and minor bugfix in Wikipedia sample (changed default learner to best default learner
instead of using worst learner by default).
License
This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)
The Windows Presentation Foundation (WPF) and C# are among my favorites and so I developed Edi
and a few other projects on GitHub. I am normally an algorithms and structure type person but WPF has such interesting UI
sides that I cannot help myself but get into it.
https://de.linkedin.com/in/dirkbahle
Permalink | Advertise | Privacy | Cookies | Terms of Use | Mobile Article Copyright 2018 by Dirk Bahle
Web03-2016 | 2.8.180627.1 | Last Updated 28 Jun 2018 Everything else Copyright © CodeProject, 1999-2018
https://www.codeproject.com/Articles/1249611/Machine-Learning-with-ML-Net-and-Csharp-VB-Net?display=Print 17/17