Module 1
Azure AI Fundamentals
Article • 03/05/2024
Describe data and compute services for data science and machine learning
Describe model management and deployment capabilities in Azure Machine Learning
Study resources
We recommend that you train and get hands-on experience before you take the exam.
We offer self-study options and classroom training as well as links to documentation,
community sites, and videos.
Get trained: Choose from self-paced learning paths and modules or take an instructor-led course.
Change log
Key to understanding the table: The topic groups (also known as functional groups) are
in bold typeface followed by the objectives within each group. The table is a comparison
between the two versions of the exam skills measured and the third column describes
the extent of the changes.
Introduction to AI
Welcome!
You're presumably here because you want to learn more about artificial intelligence (AI).
Maybe you've heard about AI in the media and want to know more; or maybe you're going to
be adopting AI at work or in school, and want to know more about what to expect.
This training module is designed to provide a high-level overview of some core capabilities of
artificial intelligence (AI) and give you an intuition of how they work. It's not a deeply technical
module, and we won't be writing any code or getting into the mathematical details of machine
learning models. Instead, we'll focus on understanding the kinds of things that AI can do, and
the basic principles on which it is based.
So, let's go! Move on to the next unit and we'll start our exploration of AI.
Generative AI
Generative AI scenarios
Common uses of generative AI include:
Computer vision
Speech
Speech recognition is the ability of AI to "hear" and interpret speech. Usually this
capability takes the form of speech-to-text (where the audio signal for the speech is
transcribed into text).
Speech synthesis is the ability of AI to vocalize words as spoken language. Usually this
capability takes the form of text-to-speech in which information in text format is
converted into an audible signal.
AI speech technology is evolving rapidly to handle challenges like ignoring background
noise, detecting interruptions, and generating increasingly expressive and human-like
voices.
AI speech scenarios
Common uses of AI speech technologies include:
Personal AI assistants in phones, computers, or household devices with which you
interact by talking.
Automated transcription of calls or meetings.
Automating audio descriptions of video or text.
Automated speech translation between languages.
Natural language processing
NLP capabilities are based on models that are trained to do particular types of text
analysis.
While many natural language processing scenarios are handled by generative AI models
today, there are many common text analytics use cases where simpler NLP language
models can be more cost-effective.
Common NLP tasks include:
Entity extraction - identifying mentions of entities like people, places, and organizations in a document.
Text classification - assigning a document to a specific category.
Sentiment analysis - determining whether a body of text is positive, negative, or neutral, and inferring opinions.
Language detection - identifying the language in which text is written.
Note
In this module, we've used the term natural language processing (NLP) to describe AI capabilities that derive meaning from "ordinary" human language. You might also see this area of AI referred to as natural language understanding (NLU).
Extract data and insights
Key points to understand about using AI to extract data and insights include:
The basis for most document analysis solutions is a computer vision technology called
optical character recognition (OCR).
While an OCR model can identify the location of text in an image, more advanced
models can also interpret individual values in the document - and so extract specific
fields.
While most data extraction models have historically focused on extracting fields from text-based forms, more advanced models that can extract information from audio recordings, images, and videos are becoming more readily available.
Responsible AI
Fairness: AI models are trained using data, which is generally sourced and selected by
humans. There's substantial risk that the data selection criteria, or the data itself reflects
unconscious bias that may cause a model to produce discriminatory outputs. AI
developers need to take care to minimize bias in training data and test AI systems for
fairness.
Reliability and safety: Because AI is based on probabilistic models, it isn't infallible. AI-powered applications need to take this into account and mitigate risks accordingly.
Privacy and security: Models are trained using data, which may include personal
information. AI developers have a responsibility to ensure that the training data is kept
secure, and that the trained models themselves can't be used to reveal private personal
or organizational details.
Inclusiveness: The potential of AI to improve lives and drive success should be open to
everyone. AI developers should strive to ensure that their solutions don't exclude some
users.
Transparency: AI can sometimes seem like "magic", but it's important to make users
aware of how the system works and any potential limitations it may have.
Accountability: Ultimately, the people and organizations that develop and distribute AI
solutions are accountable for their actions. It's important for organizations developing AI
models and applications to define and apply a framework of governance to help ensure
that they apply responsible AI principles to their work.
Responsible AI examples
Some examples of scenarios where responsible AI practices should be applied include:
Knowledge check
2. A wildlife conservation app uses AI to locate one or more animals in photos. Which
computer vision capability is being used? *
Image classification
Object detection
Correct. Object detection is used to identify the location of one or more objects in an image.
Entity extraction
3. An AI application reads email aloud to a user. Which AI speech capability is being used? *
Speech recognition
Speech synthesis
Correct. Speech synthesis is used to convert text into spoken language.
Sentiment analysis
Introduction
Machine learning is in many ways the intersection of two disciplines - data science and
software engineering. The goal of machine learning is to use data to create a predictive model
that can be incorporated into a software application or service. To achieve this goal requires
collaboration between data scientists who explore and prepare the data before using it to
train a machine learning model, and software developers who integrate the models into
applications where they're used to predict new data values (a process known as inferencing).
In this module, you'll explore some of the core concepts on which machine learning is based,
learn how to identify different kinds of machine learning model, and examine the ways in
which machine learning models are trained and evaluated. Finally, you'll learn how to use
Microsoft Azure Machine Learning to train and deploy a machine learning model, without
needing to write any code.
Note
Machine learning is based on mathematical and statistical techniques, some of which are
described at a high level in this module. Don't worry if you're not a mathematical expert
though! The goal of the module is to help you gain an intuition of how machine learning
works - we'll keep the mathematics to the minimum required to understand the core
concepts.
What is machine learning?
Machine learning has its origins in statistics and mathematical modeling of data. The
fundamental idea of machine learning is to use data from past observations to predict
unknown outcomes or values. For example:
The proprietor of an ice cream store might use an app that combines historical sales and
weather records to predict how many ice creams they're likely to sell on a given day,
based on the weather forecast.
A doctor might use clinical data from past patients to run automated tests that predict
whether a new patient is at risk from diabetes based on factors like weight, blood
glucose level, and other measurements.
A researcher in the Antarctic might use past observations to automate the identification
of different penguin species (such as Adelie, Gentoo, or Chinstrap) based on
measurements of a bird's flippers, bill, and other physical attributes.
4. Now that the training phase is complete, the trained model can be used for inferencing.
The model is essentially a software program that encapsulates the function produced by
the training process. You can input a set of feature values, and receive as an output a
prediction of the corresponding label. Because the output from the model is a prediction
that was calculated by the function, and not an observed value, you'll often see the
output from the function shown as ŷ (which is rather delightfully verbalized as "y-hat").
Types of machine learning
There are multiple types of machine learning, and you must apply the appropriate type
depending on what you're trying to predict. A breakdown of common types of machine
learning is shown in the following diagram.
Regression
Regression is a form of supervised machine learning in which the label predicted by the model
is a numeric value. For example:
The number of ice creams sold on a given day, based on the temperature, rainfall, and
windspeed.
The selling price of a property based on its size in square feet, the number of bedrooms
it contains, and socio-economic metrics for its location.
The fuel efficiency (in miles-per-gallon) of a car based on its engine size, weight, width,
height, and length.
Classification
Classification is a form of supervised machine learning in which the label represents a
categorization, or class. There are two common classification scenarios.
Binary classification
In binary classification, the label determines whether the observed item is (or isn't) an instance
of a specific class. Or put another way, binary classification models predict one of two mutually
exclusive outcomes. For example:
Whether a patient is at risk for diabetes based on clinical metrics like weight, age, blood
glucose level, and so on.
Whether a bank customer will default on a loan based on income, credit history, age, and
other factors.
Whether a mailing list customer will respond positively to a marketing offer based on
demographic attributes and past purchases.
In all of these examples, the model predicts a binary true/false or positive/negative prediction
for a single possible class.
Multiclass classification
Multiclass classification extends binary classification to predict a label that represents one of
multiple possible classes. For example, the species of a penguin (Adelie, Gentoo, or Chinstrap) might be predicted based on its physical measurements.
In most scenarios that involve a known set of multiple classes, multiclass classification is used
to predict mutually exclusive labels. For example, a penguin can't be both a Gentoo and an
Adelie. However, there are also some algorithms that you can use to train multilabel
classification models, in which there may be more than one valid label for a single observation.
For example, a movie could potentially be categorized as both science fiction and comedy.
Clustering
Clustering is an unsupervised technique that groups observations based on similarities in their features, without using known labels. For example, a clustering model might:
Group similar flowers based on their size, number of leaves, and number of petals.
Identify groups of similar customers based on demographic attributes and purchasing behavior.
In some ways, clustering is similar to multiclass classification, in that it categorizes observations
into discrete groups. The difference is that when using classification, you already know the
classes to which the observations in the training data belong; so the algorithm works by
determining the relationship between the features and the known classification label. In
clustering, there's no previously known cluster label and the algorithm groups the data
observations based purely on similarity of features.
In some cases, clustering is used to determine the set of classes that exist before training a
classification model. For example, you might use clustering to segment your customers into
groups, and then analyze those groups to identify and categorize different classes of customer
(high value - low volume, frequent small purchaser, and so on). You could then use your
categorizations to label the observations in your clustering results and use the labeled data to
train a classification model that predicts to which customer category a new customer might
belong.
Regression
Regression models are trained to predict numeric label values based on training data that
includes both features and known labels. The process for training a regression model (or
indeed, any supervised machine learning model) involves multiple iterations in which you use
an appropriate algorithm (usually with some parameterized settings) to train a model, evaluate
the model's predictive performance, and refine the model by repeating the training process
with different algorithms and parameters until you achieve an acceptable level of predictive
accuracy.
The diagram shows four key elements of the training process for supervised machine learning
models:
1. Split the training data (randomly) to create a dataset with which to train the model while
holding back a subset of the data that you'll use to validate the trained model.
2. Use an algorithm to fit the training data to a model. In the case of a regression model,
use a regression algorithm such as linear regression.
3. Use the validation data you held back to test the model by predicting labels for the
features.
4. Compare the known actual labels in the validation dataset to the labels that the model
predicted. Then aggregate the differences between the predicted and actual label values
to calculate a metric that indicates how accurately the model predicted for the validation
data.
After each train, validate, and evaluate iteration, you can repeat the process with different
algorithms and parameters until an acceptable evaluation metric is achieved.
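To make those four steps concrete, here's a minimal sketch using scikit-learn (an assumption; the module itself is tool-agnostic and no-code), applied to the ice cream data used in the example that follows:

```python
# A minimal sketch of the iterative train/validate/evaluate process described above.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error

X = np.array([[51], [52], [67], [65], [70], [69], [72], [75], [73], [81], [78], [83]])  # temperature (x)
y = np.array([1, 0, 14, 14, 23, 20, 23, 26, 22, 30, 26, 36])                            # ice cream sales (y)

# 1. Randomly split the data, holding back a subset for validation.
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.5, random_state=0)

# 2. Use an algorithm (here, linear regression) to fit the training data to a model.
model = LinearRegression().fit(X_train, y_train)

# 3. Use the held-back validation features to predict labels.
y_pred = model.predict(X_val)

# 4. Compare predicted and actual labels to calculate an evaluation metric.
print("Mean absolute error:", mean_absolute_error(y_val, y_pred))
```

In a real project, you'd repeat steps 2-4 with different algorithms and parameter settings until the metric is acceptable.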
Example - regression
Let's explore regression with a simplified example in which we'll train a model to predict a
numeric label (y) based on a single feature value (x). Most real scenarios involve multiple
feature values, which adds some complexity; but the principle is the same.
For our example, let's stick with the ice cream sales scenario we discussed previously. For our
feature, we'll consider the temperature (let's assume the value is the maximum temperature on
a given day), and the label we want to train a model to predict is the number of ice creams
sold that day. We'll start with some historic data that includes records of daily temperatures (x)
and ice cream sales (y):
Temperature (x) Ice cream sales (y)
51 1
52 0
67 14
65 14
70 23
69 20
72 23
75 26
73 22
81 30
78 26
83 36
To train the model, we'll use a randomly selected subset of this data:
Temperature (x) Ice cream sales (y)
51 1
65 14
69 20
72 23
75 26
81 30
To get an insight into how these x and y values might relate to one another, we can plot them as
coordinates along two axes, like this:
Now we're ready to apply an algorithm to our training data and fit it to a function that applies
an operation to x to calculate y. One such algorithm is linear regression, which works by
deriving a function that produces a straight line through the intersections of the x and y values
while minimizing the average distance between the line and the plotted points, like this:
The line is a visual representation of the function in which the slope of the line describes how
to calculate the value of y for a given value of x. The line intercepts the x axis at 50, so when x
is 50, y is 0. As you can see from the axis markers in the plot, the line slopes so that every
increase of 5 along the x axis results in an increase of 5 up the y axis; so when x is 55, y is 5;
when x is 60, y is 10, and so on. To calculate a value of y for a given value of x, the function
simply subtracts 50; in other words, the function can be expressed like this:
f(x) = x-50
You can use this function to predict the number of ice creams sold on a day with any given
temperature. For example, suppose the weather forecast tells us that tomorrow it will be 77
degrees. We can apply our model to calculate 77-50 and predict that we'll sell 27 ice creams
tomorrow.
But just how accurate is our model?
To find out, we can test the model against the remaining data we held back for validation:
Temperature (x) Ice cream sales (y)
52 0
67 14
70 23
73 22
78 26
83 36
We can use the model to predict the label for each of the observations in this dataset based
on the feature (x) value; and then compare the predicted label (ŷ) to the known actual label
value (y).
Using the model we trained earlier, which encapsulates the function f(x) = x-50, results in the
following predictions:
Temperature (x) Actual sales (y) Predicted sales (ŷ)
52 0 2
67 14 17
70 23 20
73 22 23
78 26 28
83 36 33
We can plot both the predicted and actual labels against the feature values like this:
The predicted labels are calculated by the model so they're on the function line, but there's
some variance between the ŷ values calculated by the function and the actual y values from
the validation dataset; which is indicated on the plot as a line between the ŷ and y values that
shows how far off the prediction was from the actual value.
Regression evaluation metrics
Based on the differences between the predicted and actual values, you can calculate some
common metrics that are used to evaluate a regression model.
Mean absolute error (MAE): the average of the absolute differences between predicted and actual values. In the ice cream example, the mean of the absolute errors (2, 3, 3, 1, 2, and 3) is 2.33.
Mean squared error (MSE): the average of the squared differences, which amplifies larger errors. In our ice cream example, the mean of the squared errors (4, 9, 9, 1, 4, and 9) is 6.
Coefficient of determination (R²): calculated as:
R² = 1 - ∑(y-ŷ)² ÷ ∑(y-ȳ)²
Don't worry too much if that looks complicated; most machine learning tools can calculate the
metric for you. The important point is that the result is a value between 0 and 1 that describes
the proportion of variance explained by the model. In simple terms, the closer to 1 this value
is, the better the model is fitting the validation data. In the case of the ice cream regression
model, the R² calculated from the validation data is 0.95.
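These values can be reproduced in a few lines; a minimal sketch using scikit-learn's metric functions (an assumption) and the validation results from the table above:

```python
# Recomputes MAE, MSE, and R² from the validation table above (actual vs. predicted sales).
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

y_actual = np.array([0, 14, 23, 22, 26, 36])      # known labels (y)
y_predicted = np.array([2, 17, 20, 23, 28, 33])   # model predictions (ŷ) from f(x) = x - 50

print("MAE:", mean_absolute_error(y_actual, y_predicted))  # 2.33
print("MSE:", mean_squared_error(y_actual, y_predicted))   # 6.0
print("R2 :", r2_score(y_actual, y_predicted))             # approximately 0.95
```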
Iterative training
The metrics described above are commonly used to evaluate a regression model. In most real-
world scenarios, a data scientist will use an iterative process to repeatedly train and evaluate a
model, varying:
Feature selection and preparation (choosing which features to include in the model, and
calculations applied to them to help ensure a better fit).
Algorithm selection (we explored linear regression in the previous example, but there are many other regression algorithms).
Algorithm parameters (numeric settings to control algorithm behavior, more accurately called hyperparameters to differentiate them from the x and y parameters).
After multiple iterations, the model that results in the best evaluation metric that's acceptable
for the specific scenario is selected.
Binary classification
Classification, like regression, is a supervised machine learning technique; and therefore follows
the same iterative process of training, validating, and evaluating models. Instead of calculating
numeric values like a regression model, the algorithms used to train classification models
calculate probability values for class assignment and the evaluation metrics used to assess
model performance compare the predicted classes to the actual classes.
Binary classification algorithms are used to train a model that predicts one of two possible
labels for a single class. Essentially, predicting true or false. In most real scenarios, the data
observations used to train and validate the model consist of multiple feature (x) values and a y
value that is either 1 or 0.
For example, consider a simplified scenario in which the feature is a patient's blood glucose level and the label indicates whether or not the patient is diabetic:
Blood glucose (x) Diabetic? (y)
67 0
103 1
114 1
72 0
116 1
65 0
There are many algorithms that can be used for binary classification, such as logistic regression,
which derives a sigmoid (S-shaped) function with values between 0.0 and 1.0, like this:
Note
Despite its name, in machine learning logistic regression is used for classification, not
regression. The important point is the logistic nature of the function it produces, which
describes an S-shaped curve between a lower and upper value (0.0 and 1.0 when used for
binary classification).
The function produced by the algorithm describes the probability of y being true (y=1) for a
given value of x. Mathematically, you can express the function like this:
f(x) = P(y=1 | x)
For three of the six observations in the training data, we know that y is definitely true, so the
probability for those observations that y=1 is 1.0 and for the other three, we know that y is
definitely false, so the probability that y=1 is 0.0. The S-shaped curve describes the probability
distribution so that plotting a value of x on the line identifies the corresponding probability
that y is 1.
The diagram also includes a horizontal line to indicate the threshold at which a model based
on this function will predict true (1) or false (0). The threshold lies at the mid-point for y (P(y) =
0.5). For any values at this point or above, the model will predict true (1); while for any values
below this point it will predict false (0). For example, for a patient with a blood glucose level of
90, the function would result in a probability value of 0.9. Since 0.9 is higher than the threshold
of 0.5, the model would predict true (1) - in other words, the patient is predicted to have
diabetes.
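As a sketch of that logic, the following function applies a sigmoid and a 0.5 threshold. The weight and bias values are hypothetical, chosen only so that a blood glucose level of 90 gives a probability of about 0.9, as in the example above:

```python
import numpy as np

def predict_diabetes(blood_glucose, weight=0.14, bias=-10.4, threshold=0.5):
    """Return the predicted probability P(y=1 | x) and the thresholded class label."""
    z = weight * blood_glucose + bias
    probability = 1 / (1 + np.exp(-z))      # logistic (sigmoid) function, between 0.0 and 1.0
    label = int(probability >= threshold)   # 1 (diabetic) if at or above the threshold, else 0
    return probability, label

print(predict_diabetes(90))   # probability ≈ 0.9, predicted label 1
```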
To test the model, suppose we also have the following validation data with known labels:
Blood glucose (x) Diabetic? (y)
66 0
107 1
112 1
71 0
87 1
89 1
Applying the logistic function we derived previously to the x values results in the following
plot.
Based on whether the probability calculated by the function is above or below the threshold,
the model generates a predicted label of 1 or 0 for each observation. We can then compare
the predicted class labels (ŷ) to the actual class labels (y), as shown here:
Blood glucose (x) Actual diabetes diagnosis (y) Predicted diabetes diagnosis (ŷ)
66 0 0
107 1 1
112 1 1
71 0 0
87 1 0
89 1 1
This comparison of predicted and actual labels is typically visualized in a confusion matrix, which shows the prediction totals where:
ŷ=0 and y=0: True negatives (TN)
ŷ=1 and y=0: False positives (FP)
ŷ=0 and y=1: False negatives (FN)
ŷ=1 and y=1: True positives (TP)
The arrangement of the confusion matrix is such that correct (true) predictions are shown in a
diagonal line from top-left to bottom-right. Often, color-intensity is used to indicate the
number of predictions in each cell, so a quick glance at a model that predicts well should
reveal a deeply shaded diagonal trend.
Accuracy
The simplest metric you can calculate from the confusion matrix is accuracy - the proportion of
predictions that the model got right. Accuracy is calculated as:
(TN+TP) ÷ (TN+FN+FP+TP)
In the case of our diabetes example, the calculation is:
(2+3) ÷ (2+1+0+3)
=5÷6
= 0.83
So for our validation data, the diabetes classification model produced correct predictions 83%
of the time.
Accuracy might initially seem like a good metric to evaluate a model, but consider this.
Suppose 11% of the population has diabetes. You could create a model that always predicts 0,
and it would achieve an accuracy of 89%, even though it makes no real attempt to
differentiate between patients by evaluating their features. What we really need is a deeper
understanding of how the model performs at predicting 1 for positive cases and 0 for negative
cases.
Recall
Recall is a metric that measures the proportion of positive cases that the model identified
correctly. In other words, compared to the number of patients who have diabetes, how many
did the model predict to have diabetes?
TP ÷ (TP+FN)
For our diabetes example:
3 ÷ (3+1)
=3÷4
= 0.75
So our model correctly identified 75% of patients who have diabetes as having diabetes.
Precision
Precision is a similar metric to recall, but measures the proportion of predicted positive cases
where the true label is actually positive. In other words, what proportion of the patients
predicted by the model to have diabetes actually have diabetes?
TP ÷ (TP+FP)
For our diabetes example:
3 ÷ (3+0)
=3÷3
= 1.0
So 100% of the patients predicted by our model to have diabetes do in fact have diabetes.
F1-score
F1-score is an overall metric that combines recall and precision. The formula for F1-score is:
(2 x Precision x Recall) ÷ (Precision + Recall)
For our diabetes example:
(2 x 1.0 x 0.75) ÷ (1.0 + 0.75)
= 1.5 ÷ 1.75
= 0.86
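All four of these metrics can be reproduced from the validation results above; a minimal sketch using scikit-learn (an assumption):

```python
from sklearn.metrics import (accuracy_score, confusion_matrix, f1_score,
                             precision_score, recall_score)

y_actual = [0, 1, 1, 0, 1, 1]      # actual diabetes diagnosis (y)
y_predicted = [0, 1, 1, 0, 0, 1]   # predicted diagnosis (ŷ)

print(confusion_matrix(y_actual, y_predicted))              # [[TN FP], [FN TP]] = [[2 0], [1 3]]
print("Accuracy :", accuracy_score(y_actual, y_predicted))  # 0.83
print("Recall   :", recall_score(y_actual, y_predicted))    # 0.75
print("Precision:", precision_score(y_actual, y_predicted)) # 1.0
print("F1-score :", f1_score(y_actual, y_predicted))        # 0.86
```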
Of course, if we were to change the threshold above which the model predicts true (1), it would affect the number of positive and negative predictions; and therefore change the true positive rate (TPR) and false positive rate (FPR) metrics. These metrics are often used to evaluate a model by plotting a receiver operating characteristic (ROC) curve that compares the TPR and FPR for every possible threshold value between 0.0 and 1.0:
The ROC curve for a perfect model would go straight up the TPR axis on the left and then
across the FPR axis at the top. Since the plot area for the curve measures 1x1, the area under
this perfect curve would be 1.0 (meaning that the model is correct 100% of the time). In
contrast, a diagonal line from the bottom-left to the top-right represents the results that
would be achieved by randomly guessing a binary label; producing an area under the curve of
0.5. In other words, given two possible class labels, you could reasonably expect to guess
correctly 50% of the time.
In the case of our diabetes model, the curve above is produced, and the area under the curve
(AUC) metric is 0.875. Since the AUC is higher than 0.5, we can conclude the model performs
better at predicting whether or not a patient has diabetes than randomly guessing.
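A sketch of how the ROC curve and AUC are typically computed from predicted probabilities. The probability values below are hypothetical (the module doesn't list the ones behind its plot), but they're chosen so the AUC works out to the 0.875 quoted above:

```python
from sklearn.metrics import roc_auc_score, roc_curve

y_actual = [0, 0, 1, 1, 1, 1]                          # actual labels
y_probability = [0.20, 0.60, 0.55, 0.80, 0.90, 0.95]   # hypothetical predicted P(y=1)

fpr, tpr, thresholds = roc_curve(y_actual, y_probability)   # points for plotting the ROC curve
print("AUC:", roc_auc_score(y_actual, y_probability))       # 0.875
```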
" 100 XP
Multiclass classification
Multiclass classification is used to predict a label that represents one of multiple possible classes. In this example, the label is the species of a penguin, encoded as:
0: Adelie
1: Gentoo
2: Chinstrap
Note
As with previous examples in this module, a real scenario would include multiple feature
(x) values. We'll use a single feature to keep things simple.
The training data consists of the flipper length (x) and known species (y) of a sample of penguins:
Flipper length (x) Species (y)
167 0
172 0
225 2
197 1
189 1
232 2
158 0
One-vs-Rest (OvR) algorithms
One approach is to train a separate binary classification function for each class, each calculating the probability that the observation belongs to that class. Each algorithm produces a sigmoid function that calculates a probability value between 0.0 and 1.0. A model trained using this kind of algorithm predicts the class for the function that produces the highest probability output.
Multinomial algorithms
An alternative approach is to use a multinomial algorithm, which creates a single function that returns a multi-valued output. The output is a vector (an array of values) that contains the probability distribution for all possible classes - with a probability score for each class; the scores add up to 1.0:
f(x) =[P(y=0|x), P(y=1|x), P(y=2|x)]
An example of this kind of function is a softmax function, which could produce an output like
the following example:
[0.2, 0.3, 0.5]
The elements in the vector represent the probabilities for classes 0, 1, and 2 respectively; so in
this case, the class with the highest probability is 2.
Regardless of which type of algorithm is used, the model uses the resulting function to
determine the most probable class for a given set of features (x) and predicts the
corresponding class label (y).
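As a sketch, a softmax function takes only a few lines. The input values (logits) below are hypothetical, chosen to produce roughly the probability vector shown in the example:

```python
import numpy as np

def softmax(logits):
    exp = np.exp(logits - np.max(logits))   # subtract the max for numerical stability
    return exp / exp.sum()

probabilities = softmax(np.array([1.0, 1.4, 1.9]))          # hypothetical raw scores for classes 0, 1, 2
print(probabilities)                                        # roughly [0.2, 0.3, 0.5]
print("Predicted class:", int(np.argmax(probabilities)))    # 2 (Chinstrap)
```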
Applying the model to a set of validation observations with known labels produces the following results:
Flipper length (x) Actual species (y) Predicted species (ŷ)
165 0 0
171 0 0
205 2 1
195 1 1
183 1 1
221 2 2
214 2 2
The confusion matrix for a multiclass classifier is similar to that of a binary classifier, except
that it shows the number of predictions for each combination of predicted (ŷ) and actual class
labels (y):
From this confusion matrix, we can determine the metrics for each individual class as follows:
Class TP TN FP FN
0 2 5 0 0
1 2 4 1 0
2 2 4 0 1
To calculate the overall accuracy, recall, and precision metrics, you use the total of the TP, TN,
FP, and FN metrics:
Overall accuracy = (13+6)÷(13+6+1+1) = 0.90
Overall recall = 6÷(6+1) = 0.86
Overall precision = 6÷(6+1) = 0.86
The overall F1-score is calculated using the overall recall and precision metrics:
Overall F1-score = (2x0.86x0.86)÷(0.86+0.86) = 0.86
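A minimal sketch that reproduces the per-class counts and these overall metrics from the validation results above, assuming scikit-learn:

```python
from sklearn.metrics import multilabel_confusion_matrix

y_actual = [0, 0, 2, 1, 1, 2, 2]      # actual species (y)
y_predicted = [0, 0, 1, 1, 1, 2, 2]   # predicted species (ŷ)

# One 2x2 matrix per class, laid out as [[TN, FP], [FN, TP]].
per_class = multilabel_confusion_matrix(y_actual, y_predicted)
TN = per_class[:, 0, 0].sum()
FP = per_class[:, 0, 1].sum()
FN = per_class[:, 1, 0].sum()
TP = per_class[:, 1, 1].sum()

print("Overall accuracy :", (TN + TP) / (TN + TP + FP + FN))  # (13+6) ÷ 21 ≈ 0.90
print("Overall recall   :", TP / (TP + FN))                   # 6 ÷ 7 ≈ 0.86
print("Overall precision:", TP / (TP + FP))                   # 6 ÷ 7 ≈ 0.86
```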
" 100 XP
Clustering
Clustering is a form of unsupervised machine learning in which observations are grouped into
clusters based on similarities in their data values, or features. This kind of machine learning is
considered unsupervised because it doesn't make use of previously known label values to train
a model. In a clustering model, the label is the cluster to which the observation is assigned,
based only on its features.
Example - clustering
For example, suppose a botanist observes a sample of flowers and records the number of
leaves and petals on each flower:
There are no known labels in the dataset, just two features. The goal is not to identify the
different types (species) of flower; just to group similar flowers together based on the number
of leaves and petals.
Leaves (x1) Petals (x2)
0 5
0 6
1 3
1 3
1 6
1 8
2 3
2 7
2 8
One of the most commonly used clustering algorithms is k-means clustering, which consists of the following steps:
1. The feature (x) values are vectorized to define n-dimensional coordinates (where n is the
number of features). In the flower example, we have two features: number of leaves (x1)
and number of petals (x2). So, the feature vector has two coordinates that we can use to
conceptually plot the data points in two-dimensional space ([x1,x2])
2. You decide how many clusters you want to use to group the flowers - call this value k. For
example, to create three clusters, you would use a k value of 3. Then k points are plotted
at random coordinates. These points become the center points for each cluster, so
they're called centroids.
3. Each data point (in this case a flower) is assigned to its nearest centroid.
4. Each centroid is moved to the center of the data points assigned to it based on the mean
distance between the points.
5. After the centroid is moved, the data points may now be closer to a different centroid, so
the data points are reassigned to clusters based on the new closest centroid.
6. The centroid movement and cluster reallocation steps are repeated until the clusters
become stable or a predetermined maximum number of iterations is reached.
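The steps above describe the k-means algorithm. Here's a minimal sketch that applies it to the flower data, assuming scikit-learn; it also prints the silhouette metric described in the list that follows. The exact cluster assignments depend on the randomly placed initial centroids:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

# Leaves (x1) and petals (x2) for each observed flower.
X = np.array([[0, 5], [0, 6], [1, 3], [1, 3], [1, 6], [1, 8], [2, 3], [2, 7], [2, 8]])

model = KMeans(n_clusters=3, n_init=10, random_state=0)   # k = 3 clusters
cluster_labels = model.fit_predict(X)                     # steps 2-6: place centroids, assign, move, repeat

print("Cluster assignments:", cluster_labels)
print("Centroids:\n", model.cluster_centers_)
print("Silhouette:", silhouette_score(X, cluster_labels)) # closer to 1 means better separation
```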
You can evaluate a clustering model based on metrics that include:
Average distance to cluster center: How close, on average, each point in the cluster is to
the centroid of the cluster.
Average distance to other center: How close, on average, each point in the cluster is to
the centroid of all other clusters.
Maximum distance to cluster center: The furthest distance between a point in the cluster
and its centroid.
Silhouette: A value between -1 and 1 that summarizes the ratio of distance between
points in the same cluster and points in different clusters (The closer to 1, the better the
cluster separation).
Deep learning
Deep learning is an advanced form of machine learning that tries to emulate the way the
human brain learns. The key to deep learning is the creation of an artificial neural network that
simulates electrochemical activity in biological neurons by using mathematical functions, as
shown here.
Artificial neural networks are made up of multiple layers of neurons - essentially defining a
deeply nested function. This architecture is the reason the technique is referred to as deep
learning and the models produced by it are often referred to as deep neural networks (DNNs).
You can use deep neural networks for many kinds of machine learning problem, including
regression and classification, as well as more specialized models for natural language
processing and computer vision.
Just like other machine learning techniques discussed in this module, deep learning involves
fitting training data to a function that can predict a label (y) based on the value of one or more
features (x). The function (f(x)) is the outer layer of a nested function in which each layer of the
neural network encapsulates functions that operate on x and the weight (w) values associated
with them. The algorithm used to train the model involves iteratively feeding the feature
values (x) in the training data forward through the layers to calculate output values for ŷ,
validating the model to evaluate how far off the calculated ŷ values are from the known y
values (which quantifies the level of error, or loss, in the model), and then modifying the
weights (w) to reduce the loss. The trained model includes the final weight values that result in
the most accurate predictions.
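To make the idea of iteratively reducing loss concrete, here's a deliberately tiny sketch: a single weight, hypothetical data, and a hand-coded gradient rather than a real deep learning framework:

```python
import numpy as np

# A toy version of the train-evaluate-adjust loop: one "neuron" with a single
# weight (w) learns the mapping y = 2x from hypothetical data.
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([2.0, 4.0, 6.0, 8.0])

w = 0.0               # initial weight
learning_rate = 0.01

for epoch in range(200):
    y_hat = w * x                              # forward pass: calculate predictions (ŷ)
    loss = np.mean((y_hat - y) ** 2)           # how far ŷ is from the known y values
    gradient = np.mean(2 * (y_hat - y) * x)    # direction in which the loss increases
    w -= learning_rate * gradient              # adjust the weight to reduce the loss

print(f"Learned weight: {w:.3f}, final loss: {loss:.6f}")   # w approaches 2.0
```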
The feature data (x) consists of some measurements of a penguin. Specifically, the
measurements are:
The label we're trying to predict (y) is the species of the penguin, and there are three possible species it could be:
Adelie
Gentoo
Chinstrap
This is an example of a classification problem, in which the machine learning model must
predict the most probable class to which an observation belongs. A classification model
accomplishes this by predicting a label that consists of the probability for each class. In other
words, y is a vector of three probability values; one for each of the possible classes: [P(y=0|x),
P(y=1|x), P(y=2|x)].
The process for inferencing a predicted penguin class using this network is:
1. The feature vector for a penguin observation is fed into the input layer of the neural
network, which consists of a neuron for each x value. In this example, the following x
vector is used as the input: [37.3, 16.8, 19.2, 30.0]
2. The functions for the first layer of neurons each calculate a weighted sum by combining
the x value and w weight, and pass it to an activation function that determines if it meets
the threshold to be passed on to the next layer.
3. Each neuron in a layer is connected to all of the neurons in the next layer (an architecture
sometimes called a fully connected network) so the results of each layer are fed forward
through the network until they reach the output layer.
4. The output layer produces a vector of values; in this case, using a softmax or similar
function to calculate the probability distribution for the three possible classes of penguin.
In this example, the output vector is: [0.2, 0.7, 0.1]
5. The elements of the vector represent the probabilities for classes 0, 1, and 2. The second
value is the highest, so the model predicts that the species of the penguin is 1 (Gentoo).
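The following sketch mirrors those inferencing steps with hypothetical weight values, so the printed probabilities won't match the example's [0.2, 0.7, 0.1] exactly:

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.array([37.3, 16.8, 19.2, 30.0])   # 1. feature vector fed into the input layer

W1 = rng.normal(size=(4, 4))             # hypothetical weights: input layer to hidden layer
W2 = rng.normal(size=(4, 3))             # hypothetical weights: hidden layer to 3 output neurons

hidden = np.maximum(0, x @ W1)           # 2-3. weighted sums passed through an activation function
logits = hidden @ W2                     # 4. output layer: one value per penguin class

probabilities = np.exp(logits - logits.max())
probabilities /= probabilities.sum()     # softmax: probability distribution over the classes

print(probabilities)
print("Predicted species:", int(np.argmax(probabilities)))   # 5. class with the highest probability
```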
Note
While it's easier to think of each case in the training data being passed through the
network one at a time, in reality the data is batched into matrices and processed using
linear algebraic calculations. For this reason, neural network training is best performed on
computers with graphical processing units (GPUs) that are optimized for vector and
matrix manipulation.
Azure Machine Learning
Microsoft Azure Machine Learning is a cloud service for training, deploying, and managing
machine learning models. It's designed to be used by data scientists, software engineers,
devops professionals, and others to manage the end-to-end lifecycle of machine learning
projects, including:
Centralized storage and management of datasets for model training and evaluation.
On-demand compute resources on which you can run machine learning jobs, such as
training a model.
Automated machine learning (AutoML), which makes it easy to run multiple training jobs
with different algorithms and parameters to find the best model for your data.
Visual tools to define orchestrated pipelines for processes such as model training or
inferencing.
Integration with common machine learning frameworks such as MLflow, which make it
easier to manage model training, evaluation, and deployment at scale.
Built-in support for visualizing and evaluating metrics for responsible AI, including model
explainability, fairness assessment, and others.
The screenshot shows the Metrics page for a trained model in Azure Machine Learning studio,
in which you can see the evaluation metrics for a trained multiclass classification model.
Knowledge check
1. You want to create a model to predict the cost of heating an office building based on its
size in square feet and the number of employees working there. What kind of machine
learning problem is this? *
Regression
Correct. Regression models predict numeric values.
Classification
Clustering
Incorrect. Clustering models group similar items together.
2. You need to evaluate a classification model. Which metric can you use? *
To evaluate the aggregate difference between predicted and actual label values
Correct. A loss function determines the overall variance, or loss, between predicted and actual label values.
To calculate the cost of training a neural network rather than a statistical model
4. What does automated machine learning in Azure Machine Learning enable you to do? *
Introduction
Artificial Intelligence (AI) is changing our world and there’s hardly an industry that hasn't been
affected. From better healthcare to online safety, AI is helping us to tackle some of society’s
biggest issues.
Azure AI services are a portfolio of AI capabilities that unlock automation for workloads in
language, vision, intelligent search, content generation, and much more. They are
straightforward to implement and don’t require specialist AI knowledge.
Organizations are using Azure AI services in innovative ways, such as within robots to
provide life-like companionship to older people by expressing happiness, concern, and even
laughter. In other use cases, scientists are using AI to protect endangered species by
identifying hard-to-find animals in images. This was previously time-consuming and error-
prone work, which the Azure AI Vision service can complete quickly and with a high degree of
accuracy, freeing scientists to do other work.
In this module you will learn what Azure AI services are, and how you can use them in your
own applications.
Azure AI services
Azure AI services are AI capabilities that can be built into web or mobile applications, in a way
that's straightforward to implement. These AI services include generative AI, image
recognition, natural language processing, speech, AI-powered search, and more. There are
over a dozen different services that can be used separately or together to add AI power to
applications.
Let's take a look at some examples of what can be done with Azure AI services. Azure OpenAI
service provides access to powerful, cutting-edge, generative AI models for application
development. Azure AI Content Safety service can be used to detect harmful content within
text or images, including violent or hateful content, and report on its severity. Azure AI
Language service can be used to summarize text, classify information, or extract key phrases.
Azure AI Speech service provides powerful speech to text and text to speech capabilities,
allowing speech to be accurately transcribed into text, or text to be converted into natural-sounding voice audio.
Note
You can use multiple Azure AI services with Azure AI Foundry, a platform for AI
application development. Azure AI Foundry is covered in its own module: Introduction to
Azure AI Foundry.
Azure AI services are based on three principles that dramatically improve speed-to-market:
Prebuilt and ready to use
Accessed through APIs
Available and secure on Azure
Note
In some situations, AI has the potential to be used in ways that might compromise an
individual's privacy or rights. Microsoft has six Responsible AI principles to help ensure AI
services are ethical and fair. Because of this, certain Azure AI services are restricted to
ensure they're used responsibly.
Create Azure AI service resources
Azure AI services are cloud-based, and like all Azure services you need to create a resource to
use them. There are two types of AI service resources: multi-service or single-service. Your
development requirements and how you want costs to be billed determine the types of
resources you need.
Multi-service resource: a resource created in the Azure portal that provides access to multiple Azure AI services with a single key and endpoint. Use an Azure AI services resource when you need several AI services or are exploring AI capabilities. When you use an Azure AI services resource, all your AI services are billed together.
Single-service resources: a resource created in the Azure portal that provides access to a
single Azure AI service, such as Speech, Vision, Language, etc. Each Azure AI service has a
unique key and endpoint. These resources might be used when you only require one AI
service or want to see cost information separately.
You can create a resource in several ways, such as in the Azure portal.
How to use the Azure portal to create an Azure AI
services resource
To create an Azure AI services resource, sign in to the Azure portal with Contributor access
and select Create a resource. To create a multi-services resource search for Azure AI services in
the marketplace.
To create a single-service resource, search for the specific Azure AI service such as Face,
Language, or Content Safety, and so on. Most AI services have a free price tier to allow you to
explore their capabilities. After clicking Create for the resource you require, you will be
prompted to complete details of your subscription, the resource group to contain the
resource, the region, a unique name, and the price tier.
Use Azure AI services
Once you create an Azure AI service resource, you can build applications using the REST API,
software development kits (SDKs), or visual studio interfaces.
Note
In addition to studios for individual Azure AI services, Microsoft Azure has another portal, Azure AI Foundry portal, which combines access to multiple Azure AI services and generative AI models into one user interface.
As an example, let's look at the Azure AI Content Safety service, which identifies harmful text
or images. To explore what the Content Safety service does, let's use the Content Safety
Studio. First create either a multi-service Azure AI services resource, or a single-service Content
Safety resource. Then, on the Content Safety Studio Settings page, select the resource, and
select Use Resource. The AI service you created is now associated with the Content Safety
Studio, and ready to be used.
Note
When developers incorporate an AI service into their applications, they often use an SDK or the REST API.
Understand authentication
You've now learned how to create an AI service resource. But how do you ensure that only
those authorized have access to your AI service? This is done through authentication, the
process of verifying that the user or service is who they say they are, and that they are
authorized to use the service.
Most Azure AI services are accessed through a RESTful API, although there are other ways. The
API defines what information is passed between two software components: the Azure AI
service and whatever is using it. Having a clearly defined interface is important, because if the
AI service is updated, your application must continue to work correctly.
Part of what an API does is to handle authentication. Whenever a request is made to use an AI
services resource, that request must be authenticated. For example, your subscription and AI
service resource is verified to ensure you have sufficient permissions to access it. This
authentication process uses an endpoint and a resource key.
The endpoint describes how to reach the AI service resource instance that you want to use, in
a similar way to the way a URL identifies a web site. When you view the endpoint for your
resource, it will look something like:
https://myaiservices29.cognitiveservices.azure.com/
The resource key protects the privacy of your resource. To ensure this is always secure, the key
can be changed periodically. You can view the endpoint and key in the Azure portal under
Resource Management and Keys and Endpoint.
When you write code to access the AI service, the keys and endpoint must be included in the
authentication header. The authentication header sends an authorization key to the service to
confirm that the application can use the resource. Learn more about different authentication
requests to Azure AI services here.
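For illustration, here's a minimal sketch of a REST request that passes the resource key in the request header. The endpoint value matches the example above, but the key, request path, API version, and body are placeholders; the exact URL and payload depend on which Azure AI service you're calling:

```python
import requests

endpoint = "https://myaiservices29.cognitiveservices.azure.com/"   # from Keys and Endpoint in the Azure portal
key = "<your-resource-key>"                                        # placeholder - never hard-code real keys

response = requests.post(
    endpoint + "contentsafety/text:analyze?api-version=2023-10-01",  # placeholder path for a Content Safety call
    headers={
        "Ocp-Apim-Subscription-Key": key,     # authentication header carrying the resource key
        "Content-Type": "application/json",
    },
    json={"text": "Sample text to analyze."},
)
print(response.status_code)
print(response.json())
```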
When you use a studio interface with Azure AI services, your credentials are authenticated
when you sign in, and a similar process is happening in the background.
Knowledge check
1. An application requires three separate AI services. To see the cost for each separately,
what type of resource(s) should be created? *
2. After logging into one of the individual Azure studios, what is one task to complete to
begin using the individual studio? *