
Study guide for Exam AI-900: Microsoft Azure AI Fundamentals
Article • 03/05/2024

Describe Artificial Intelligence workloads and considerations (15–20%)
Describe fundamental principles of machine learning on Azure (20–25%)
Describe features of computer vision workloads on Azure (15–20%)
Describe features of Natural Language Processing (NLP) workloads on Azure (15–20%)
Describe features of generative AI workloads on Azure (15–20%)
Describe Artificial Intelligence workloads and considerations (15–20%)

Identify features of common AI workloads


Identify features of content moderation and personalization workloads

Identify computer vision workloads

Identify natural language processing workloads


Identify knowledge mining workloads

Identify document intelligence workloads


Identify features of generative AI workloads

Identify guiding principles for responsible AI


Describe considerations for fairness in an AI solution

Describe considerations for reliability and safety in an AI solution

Describe considerations for privacy and security in an AI solution


Describe considerations for inclusiveness in an AI solution

Describe considerations for transparency in an AI solution


Describe considerations for accountability in an AI solution

Describe fundamental principles of machine learning on Azure (20–25%)

Identify common machine learning techniques


Identify regression machine learning scenarios
Identify classification machine learning scenarios

Identify clustering machine learning scenarios

Identify features of deep learning techniques

Describe core machine learning concepts


Identify features and labels in a dataset for machine learning
Describe how training and validation datasets are used in machine learning

Describe Azure Machine Learning capabilities


Describe capabilities of automated machine learning

Describe data and compute services for data science and machine learning
Describe model management and deployment capabilities in Azure Machine Learning

Describe features of computer vision workloads on Azure (15–20%)

Identify common types of computer vision solution


Identify features of image classification solutions
Identify features of object detection solutions

Identify features of optical character recognition solutions


Identify features of facial detection and facial analysis solutions
Identify Azure tools and services for computer vision tasks
Describe capabilities of the Azure AI Vision service

Describe capabilities of the Azure AI Face detection service

Describe features of Natural Language Processing (NLP) workloads on Azure (15–20%)

Identify features of common NLP workload scenarios


Identify features and uses for key phrase extraction
Identify features and uses for entity recognition

Identify features and uses for sentiment analysis

Identify features and uses for language modeling


Identify features and uses for speech recognition and synthesis

Identify features and uses for translation

Identify Azure tools and services for NLP workloads


Describe capabilities of the Azure AI Language service
Describe capabilities of the Azure AI Speech service

Describe features of generative AI workloads on Azure (15–20%)

Identify features of generative AI solutions


Identify features of generative AI models

Identify common scenarios for generative AI


Identify responsible AI considerations for generative AI

Identify capabilities of Azure OpenAI Service


Describe natural language generation capabilities of Azure OpenAI Service
Describe code generation capabilities of Azure OpenAI Service

Describe image generation capabilities of Azure OpenAI Service

Study resources
We recommend that you train and get hands-on experience before you take the exam.
We offer self-study options and classroom training as well as links to documentation,
community sites, and videos.

Get trained: Choose from self-paced learning paths and modules or take an instructor-led course.
Find documentation: Anomaly Detector; Language Understanding; Azure Machine Learning; Computer Vision; Natural language processing technology; Azure Bot Service; Speech to Text; Speech Translation.
Ask a question: Microsoft Q&A | Microsoft Docs.
Get community support: Artificial Intelligence and Machine Learning Hub.
Follow Microsoft Learn: Microsoft Learn - Microsoft Tech Community.
Find a video: The AI Show; Browse other Microsoft Learn shows.

Change log
Key to understanding the table: The topic groups (also known as functional groups) are
in bold typeface followed by the objectives within each group. The table is a comparison
between the two versions of the exam skills measured and the third column describes
the extent of the changes.

Introduction to AI

Welcome!

You're presumably here because you want to learn more about artificial intelligence (AI).
Maybe you've heard about AI in the media and want to know more; or maybe you're going to
be adopting AI at work or in school, and want to know more about what to expect.

This training module is designed to provide a high-level overview of some core capabilities of
artificial intelligence (AI) and give you an intuition of how they work. It's not a deeply technical
module, and we won't be writing any code or getting into the mathematical details of machine
learning models. Instead, we'll focus on understanding the kinds of things that AI can do, and
the basic principles on which it is based.

So, let's go! Move on to the next unit and we'll start our exploration of AI.

Generative AI

Key points to understand about generative AI include:

Generative AI is a branch of AI that enables software applications to generate new content; often natural language dialogs, but also images, video, code, and other formats.
The ability to generate content is based on a language model, which has been trained with huge volumes of data - often documents from the Internet or other public sources of information.
Generative AI models encapsulate semantic relationships between language elements (that's a fancy way of saying that the models "know" how words relate to one another), and that's what enables them to generate a meaningful sequence of text.
There are large language models (LLMs) and small language models (SLMs) - the difference is based on the volume of data used to train them and the number of parameters in the model. LLMs are very powerful and generalize well, but can be more costly to train and use. SLMs tend to work well in scenarios that are more focused on specific topic areas, and usually cost less.

Generative AI scenarios
Common uses of generative AI include:

Implementing chatbots and AI agents that assist human users.
Creating new documents or other content (often as a starting point for further iterative development).
Automated translation of text between languages.
Summarizing or explaining complex documents.

Computer vision

Key points to understand about computer vision include:

Computer vision is accomplished by using large numbers of images to train a model.


Image classification is a form of computer vision in which a model is trained with images
that are labeled with the main subject of the image (in other words, what it's an image of)
so that it can analyze unlabeled images and predict the most appropriate label -
identifying the subject of the image.
Object detection is a form of computer vision in which the model is trained to identify the
location of specific objects in an image.
There are more advanced forms of computer vision - for example, semantic segmentation
is an advanced form of object detection where, rather than indicate an object's location
by drawing a box around it, the model can identify the individual pixels in the image that
belong to a particular object.
You can combine computer vision and language models to create a multi-modal model
that combines computer vision and generative AI capabilities.

Computer vision scenarios


Common uses of computer vision include:

Auto-captioning or tag-generation for photographs.


Visual search.
Monitoring stock levels or identifying items for checkout in retail scenarios.
Security video monitoring.
Authentication through facial recognition.
Robotics and self-driving vehicles.

Speech

Key points to understand about speech include:

Speech recognition is the ability of AI to "hear" and interpret speech. Usually this
capability takes the form of speech-to-text (where the audio signal for the speech is
transcribed into text).
Speech synthesis is the ability of AI to vocalize words as spoken language. Usually this
capability takes the form of text-to-speech in which information in text format is
converted into an audible signal.
AI speech technology is evolving rapidly to handle challenges like ignoring background
noise, detecting interruptions, and generating increasingly expressive and human-like
voices.

AI speech scenarios
Common uses of AI speech technologies include:
Personal AI assistants in phones, computers, or household devices with which you
interact by talking.
Automated transcription of calls or meetings.
Automating audio descriptions of video or text.
Automated speech translation between languages.

Natural language processing

Key points to understand about natural language processing (NLP) include:

NLP capabilities are based on models that are trained to do particular types of text
analysis.
While many natural language processing scenarios are handled by generative AI models
today, there are many common text analytics use cases where simpler NLP language
models can be more cost-effective.
Common NLP tasks include:
Entity extraction - identifying mentions of entities like people, places, organizations in
a document
Text classification - assigning a document to a specific category.
Sentiment analysis - determining whether a body of text is positive, negative, or
neutral and inferring opinions.
Language detection - identifying the language in which text is written.

Note

In this module, we've used the term natural language processing (NLP) to describe AI
capabilities that derive meaning from "ordinary" human language. You might also see this area
of AI referred to as natural language understanding (NLU).

Natural language processing scenarios


Common uses of NLP technologies include:
Analyzing documents or transcripts of calls and meetings to determine key subjects and
identify specific mentions of people, places, organizations, products, or other entities.
Analyzing social media posts, product reviews, or articles to evaluate sentiment and
opinion.
Implementing chatbots that can answer frequently asked questions or orchestrate
predictable conversational dialogs that don't require the complexity of generative AI.

Extract data and insights

Key points to understand about using AI to extract data and insights include:

The basis for most document analysis solutions is a computer vision technology called
optical character recognition (OCR).
While an OCR model can identify the location of text in an image, more advanced
models can also interpret individual values in the document - and so extract specific
fields.
While most data extraction models have historically focused on extracting fields from
text-based forms, more advanced models that can extract information from audio
recordings, images, and videos are becoming more readily available.

Data and insight extraction scenarios


Common uses of AI to extract data and insights include:

Automated processing of forms and other documents in a business process - for example,
processing an expense claim.
Large-scale digitization of data from paper forms. For example, scanning and archiving
census records.
Indexing documents for search.
Identifying key points and follow-up actions from meeting transcripts or recordings.

Responsible AI

Key points to understand about responsible AI include:

Fairness: AI models are trained using data, which is generally sourced and selected by
humans. There's substantial risk that the data selection criteria, or the data itself reflects
unconscious bias that may cause a model to produce discriminatory outputs. AI
developers need to take care to minimize bias in training data and test AI systems for
fairness.
Reliability and safety: AI is based on probabilistic models, so it is not infallible. AI-powered
applications need to take this into account and mitigate risks accordingly.
Privacy and security: Models are trained using data, which may include personal
information. AI developers have a responsibility to ensure that the training data is kept
secure, and that the trained models themselves can't be used to reveal private personal
or organizational details.
Inclusiveness: The potential of AI to improve lives and drive success should be open to
everyone. AI developers should strive to ensure that their solutions don't exclude some
users.
Transparency: AI can sometimes seem like "magic", but it's important to make users
aware of how the system works and any potential limitations it may have.
Accountability: Ultimately, the people and organizations that develop and distribute AI
solutions are accountable for their actions. It's important for organizations developing AI
models and applications to define and apply a framework of governance to help ensure
that they apply responsible AI principles to their work.

Responsible AI examples
Some examples of scenarios where responsible AI practices should be applied include:

An AI-powered college admissions system should be tested to ensure it evaluates all
applications fairly, taking into account relevant academic criteria but avoiding unfounded
discrimination based on irrelevant demographic factors.
An AI-powered robotic solution that uses computer vision to detect objects should avoid
unintentional harm or damage. One way to accomplish this goal is to use probability
values to determine "confidence" in object identification before interacting with physical
objects, and avoid any action if the confidence level is below a specific threshold.
A facial identification system used in an airport or other secure area should delete
personal images that are used for temporary access as soon as they're no longer
required. Additionally, safeguards should prevent the images being made accessible to
operators or users who have no need to view them.
A web-based chatbot that offers speech-based interaction should also generate text
captions to avoid making the system unusable for users with a hearing impairment.
A bank that uses an AI-based loan-approval application should disclose the use of AI,
and describe features of the data on which it was trained (without revealing confidential
information).

Knowledge check

1. Which is the best description of generative AI?

Generative AI uses a language model to create original content in response to a prompt.
Correct: Generative AI uses language models to create new original content.

Generative AI is an older form of AI that is being replaced by machine learning.

Generative AI is a complex form of AI that can only be used by specialists such as data scientists.

2. A wildlife conservation app uses AI to locate one or more animals in photos. Which
computer vision capability is being used?

Image classification

Object detection
Correct: Object detection is used to identify the location of one or more objects in an image.

Entity extraction

3. An AI application reads email aloud to a user. Which AI speech capability is being used?

Speech recognition

Speech synthesis
Correct: Speech synthesis is used to convert text into spoken language.

Sentiment analysis

Introduction

Machine learning is in many ways the intersection of two disciplines - data science and
software engineering. The goal of machine learning is to use data to create a predictive model
that can be incorporated into a software application or service. To achieve this goal requires
collaboration between data scientists who explore and prepare the data before using it to
train a machine learning model, and software developers who integrate the models into
applications where they're used to predict new data values (a process known as inferencing).

In this module, you'll explore some of the core concepts on which machine learning is based,
learn how to identify different kinds of machine learning model, and examine the ways in
which machine learning models are trained and evaluated. Finally, you'll learn how to use
Microsoft Azure Machine Learning to train and deploy a machine learning model, without
needing to write any code.

Note

Machine learning is based on mathematical and statistical techniques, some of which are
described at a high level in this module. Don't worry if you're not a mathematical expert
though! The goal of the module is to help you gain an intuition of how machine learning
works - we'll keep the mathematics to the minimum required to understand the core
concepts.

What is machine learning?

Machine learning has its origins in statistics and mathematical modeling of data. The
fundamental idea of machine learning is to use data from past observations to predict
unknown outcomes or values. For example:

The proprietor of an ice cream store might use an app that combines historical sales and
weather records to predict how many ice creams they're likely to sell on a given day,
based on the weather forecast.
A doctor might use clinical data from past patients to run automated tests that predict
whether a new patient is at risk from diabetes based on factors like weight, blood
glucose level, and other measurements.
A researcher in the Antarctic might use past observations to automate the identification
of different penguin species (such as Adelie, Gentoo, or Chinstrap) based on
measurements of a bird's flippers, bill, and other physical attributes.

Machine learning as a function


Because machine learning is based on mathematics and statistics, it's common to think about
machine learning models in mathematical terms. Fundamentally, a machine learning model is
a software application that encapsulates a function to calculate an output value based on one
or more input values. The process of defining that function is known as training. After the
function has been defined, you can use it to predict new values in a process called inferencing.

Let's explore the steps involved in training and inferencing.


1. The training data consists of past observations. In most cases, the observations include
the observed attributes or features of the thing being observed, and the known value of
the thing you want to train a model to predict (known as the label).
In mathematical terms, you'll often see the features referred to using the shorthand
variable name x, and the label referred to as y. Usually, an observation consists of
multiple feature values, so x is actually a vector (an array with multiple values), like this:
[x1,x2,x3,...].

To make this clearer, let's consider the examples described previously:


In the ice cream sales scenario, our goal is to train a model that can predict the
number of ice cream sales based on the weather. The weather measurements for
the day (temperature, rainfall, windspeed, and so on) would be the features (x), and
the number of ice creams sold on each day would be the label (y).
In the medical scenario, the goal is to predict whether or not a patient is at risk of
diabetes based on their clinical measurements. The patient's measurements (weight,
blood glucose level, and so on) are the features (x), and the likelihood of diabetes
(for example, 1 for at risk, 0 for not at risk) is the label (y).
In the Antarctic research scenario, we want to predict the species of a penguin
based on its physical attributes. The key measurements of the penguin (length of its
flippers, width of its bill, and so on) are the features (x), and the species (for
example, 0 for Adelie, 1 for Gentoo, or 2 for Chinstrap) is the label (y).
2. An algorithm is applied to the data to try to determine a relationship between the
features and the label, and generalize that relationship as a calculation that can be
performed on x to calculate y. The specific algorithm used depends on the kind of
predictive problem you're trying to solve (more about this later), but the basic principle is
to try to fit the data to a function in which the values of the features can be used to
calculate the label.
3. The result of the algorithm is a model that encapsulates the calculation derived by the
algorithm as a function - let's call it f. In mathematical notation:
y = f(x)

4. Now that the training phase is complete, the trained model can be used for inferencing.
The model is essentially a software program that encapsulates the function produced by
the training process. You can input a set of feature values, and receive as an output a
prediction of the corresponding label. Because the output from the model is a prediction
that was calculated by the function, and not an observed value, you'll often see the
output from the function shown as ŷ (which is rather delightfully verbalized as "y-hat").
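
To make this concrete, here's a minimal Python sketch of a trained model as a function that maps a feature vector x to a predicted label ŷ. The weights, bias, and feature values are hypothetical placeholders, not the result of any real training.

```python
# Minimal sketch: a "trained model" is just a function that maps feature values (x)
# to a predicted label (y-hat). The weights and bias below are hypothetical.

def f(x):
    weights = [1.2, -0.5, 3.0]   # values a training algorithm might have learned
    bias = 0.1
    # weighted sum of the features plus a bias term
    return sum(w * xi for w, xi in zip(weights, x)) + bias

observation = [65.0, 0.2, 10.0]   # feature vector x for a new, unlabeled observation
y_hat = f(observation)            # inferencing: predict the label
print(f"Predicted label (y-hat): {y_hat:.2f}")
```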

Types of machine learning

There are multiple types of machine learning, and you must apply the appropriate type
depending on what you're trying to predict. A breakdown of common types of machine
learning is shown in the following diagram.

Supervised machine learning


Supervised machine learning is a general term for machine learning algorithms in which the
training data includes both feature values and known label values. Supervised machine
learning is used to train models by determining a relationship between the features and labels
in past observations, so that unknown labels can be predicted for features in future cases.

Regression
Regression is a form of supervised machine learning in which the label predicted by the model
is a numeric value. For example:

The number of ice creams sold on a given day, based on the temperature, rainfall, and
windspeed.
The selling price of a property based on its size in square feet, the number of bedrooms
it contains, and socio-economic metrics for its location.
The fuel efficiency (in miles-per-gallon) of a car based on its engine size, weight, width,
height, and length.
Classification
Classification is a form of supervised machine learning in which the label represents a
categorization, or class. There are two common classification scenarios.

Binary classification
In binary classification, the label determines whether the observed item is (or isn't) an instance
of a specific class. Or put another way, binary classification models predict one of two mutually
exclusive outcomes. For example:
Whether a patient is at risk for diabetes based on clinical metrics like weight, age, blood
glucose level, and so on.
Whether a bank customer will default on a loan based on income, credit history, age, and
other factors.
Whether a mailing list customer will respond positively to a marketing offer based on
demographic attributes and past purchases.

In all of these examples, the model predicts a binary true/false or positive/negative prediction
for a single possible class.

Multiclass classification
Multiclass classification extends binary classification to predict a label that represents one of
multiple possible classes. For example,

The species of a penguin (Adelie, Gentoo, or Chinstrap) based on its physical


measurements.
The genre of a movie (comedy, horror, romance, adventure, or science fiction) based on its
cast, director, and budget.

In most scenarios that involve a known set of multiple classes, multiclass classification is used
to predict mutually exclusive labels. For example, a penguin can't be both a Gentoo and an
Adelie. However, there are also some algorithms that you can use to train multilabel
classification models, in which there may be more than one valid label for a single observation.
For example, a movie could potentially be categorized as both science fiction and comedy.

Unsupervised machine learning


Unsupervised machine learning involves training models using data that consists only of
feature values without any known labels. Unsupervised machine learning algorithms determine
relationships between the features of the observations in the training data.
Clustering
The most common form of unsupervised machine learning is clustering. A clustering algorithm
identifies similarities between observations based on their features, and groups them into
discrete clusters. For example:

Group similar flowers based on their size, number of leaves, and number of petals.
Identify groups of similar customers based on demographic attributes and purchasing
behavior.
In some ways, clustering is similar to multiclass classification; in that it categorizes observations
into discrete groups. The difference is that when using classification, you already know the
classes to which the observations in the training data belong; so the algorithm works by
determining the relationship between the features and the known classification label. In
clustering, there's no previously known cluster label and the algorithm groups the data
observations based purely on similarity of features.

In some cases, clustering is used to determine the set of classes that exist before training a
classification model. For example, you might use clustering to segment your customers into
groups, and then analyze those groups to identify and categorize different classes of customer
(high value - low volume, frequent small purchaser, and so on). You could then use your
categorizations to label the observations in your clustering results and use the labeled data to
train a classification model that predicts to which customer category a new customer might
belong.

Regression

Regression models are trained to predict numeric label values based on training data that
includes both features and known labels. The process for training a regression model (or
indeed, any supervised machine learning model) involves multiple iterations in which you use
an appropriate algorithm (usually with some parameterized settings) to train a model, evaluate
the model's predictive performance, and refine the model by repeating the training process
with different algorithms and parameters until you achieve an acceptable level of predictive
accuracy.

The diagram shows four key elements of the training process for supervised machine learning
models:

1. Split the training data (randomly) to create a dataset with which to train the model while
holding back a subset of the data that you'll use to validate the trained model.
2. Use an algorithm to fit the training data to a model. In the case of a regression model,
use a regression algorithm such as linear regression.
3. Use the validation data you held back to test the model by predicting labels for the
features.
4. Compare the known actual labels in the validation dataset to the labels that the model
predicted. Then aggregate the differences between the predicted and actual label values
to calculate a metric that indicates how accurately the model predicted for the validation
data.
After each train, validate, and evaluate iteration, you can repeat the process with different
algorithms and parameters until an acceptable evaluation metric is achieved.

Example - regression
Let's explore regression with a simplified example in which we'll train a model to predict a
numeric label (y) based on a single feature value (x). Most real scenarios involve multiple
feature values, which adds some complexity; but the principle is the same.
For our example, let's stick with the ice cream sales scenario we discussed previously. For our
feature, we'll consider the temperature (let's assume the value is the maximum temperature on
a given day), and the label we want to train a model to predict is the number of ice creams
sold that day. We'll start with some historic data that includes records of daily temperatures (x)
and ice cream sales (y):


Temperature (x) Ice cream sales (y)

51 1

52 0

67 14

65 14

70 23

69 20

72 23

75 26
73 22

81 30

78 26

83 36

Training a regression model


We'll start by splitting the data and using a subset of it to train a model. Here's the training
dataset:


Temperature (x) Ice cream sales (y)

51 1

65 14

69 20

72 23

75 26

81 30

To get an insight of how these x and y values might relate to one another, we can plot them as
coordinates along two axes, like this:
Now we're ready to apply an algorithm to our training data and fit it to a function that applies
an operation to x to calculate y. One such algorithm is linear regression, which works by
deriving a function that produces a straight line through the intersections of the x and y values
while minimizing the average distance between the line and the plotted points, like this:

The line is a visual representation of the function in which the slope of the line describes how
to calculate the value of y for a given value of x. The line intercepts the x axis at 50, so when x
is 50, y is 0. As you can see from the axis markers in the plot, the line slopes so that every
increase of 5 along the x axis results in an increase of 5 up the y axis; so when x is 55, y is 5;
when x is 60, y is 10, and so on. To calculate a value of y for a given value of x, the function
simply subtracts 50; in other words, the function can be expressed like this:
f(x) = x-50
You can use this function to predict the number of ice creams sold on a day with any given
temperature. For example, suppose the weather forecast tells us that tomorrow it will be 77
degrees. We can apply our model to calculate 77-50 and predict that we'll sell 27 ice creams
tomorrow.
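
As a rough sketch of what happens in step 2, the ordinary least-squares formulas for simple linear regression can be applied to the training data above in a few lines of Python. The fitted line comes out very close to, but not exactly, the idealized f(x) = x-50 used in this example.

```python
# Fit a simple (one-feature) linear regression to the ice cream training data
# using the ordinary least-squares formulas.
temps = [51, 65, 69, 72, 75, 81]   # feature x: daily maximum temperature
sales = [1, 14, 20, 23, 26, 30]    # label y: ice creams sold

mean_x = sum(temps) / len(temps)
mean_y = sum(sales) / len(sales)

# slope = sum((x - mean_x)(y - mean_y)) / sum((x - mean_x)^2)
slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(temps, sales)) / \
        sum((x - mean_x) ** 2 for x in temps)
intercept = mean_y - slope * mean_x

print(f"slope = {slope:.2f}, intercept = {intercept:.2f}")   # roughly 1.0 and -50

# Inferencing: predict sales for a forecast of 77 degrees (about 27 ice creams)
print(f"Predicted sales at 77 degrees: {slope * 77 + intercept:.1f}")
```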
But just how accurate is our model?

Evaluating a regression model


To validate the model and evaluate how well it predicts, we held back some data for which we
know the label (y) value. Here's the data we held back:


Temperature (x) Ice cream sales (y)

52 0

67 14

70 23

73 22

78 26

83 36

We can use the model to predict the label for each of the observations in this dataset based
on the feature (x) value; and then compare the predicted label (ŷ) to the known actual label
value (y).
Using the model we trained earlier, which encapsulates the function f(x) = x-50, results in the
following predictions:

Temperature (x) Actual sales (y) Predicted sales (ŷ)

52 0 2

67 14 17

70 23 20

73 22 23

78 26 28

83 36 33

We can plot both the predicted and actual labels against the feature values like this:

The predicted labels are calculated by the model so they're on the function line, but there's
some variance between the ŷ values calculated by the function and the actual y values from
the validation dataset; which is indicated on the plot as a line between the ŷ and y values that
shows how far off the prediction was from the actual value.
Regression evaluation metrics
Based on the differences between the predicted and actual values, you can calculate some
common metrics that are used to evaluate a regression model.

Mean Absolute Error (MAE)


The variance in this example indicates by how many ice creams each prediction was wrong. It
doesn't matter if the prediction was over or under the actual value (so for example, -3 and +3
both indicate a variance of 3). This metric is known as the absolute error for each prediction,
and can be summarized for the whole validation set as the mean absolute error (MAE).

In the ice cream example, the mean (average) of the absolute errors (2, 3, 3, 1, 2, and 3) is 2.33.

Mean Squared Error (MSE)


The mean absolute error metric takes all discrepancies between predicted and actual labels
into account equally. However, it may be more desirable to have a model that is consistently
wrong by a small amount than one that makes fewer, but larger errors. One way to produce a
metric that "amplifies" larger errors is to square the individual errors and calculate the mean
of the squared values. This metric is known as the mean squared error (MSE).

In our ice cream example, the mean of the squared absolute values (which are 4, 9, 9, 1, 4, and
9) is 6.

Root Mean Squared Error (RMSE)


The mean squared error helps take the magnitude of errors into account, but because it
squares the error values, the resulting metric no longer represents the quantity measured by
the label. In other words, we can say that the MSE of our model is 6, but that doesn't measure
its accuracy in terms of the number of ice creams that were mispredicted; 6 is just a numeric
score that indicates the level of error in the validation predictions.
If we want to measure the error in terms of the number of ice creams, we need to calculate the
square root of the MSE; which produces a metric called, unsurprisingly, Root Mean Squared
Error. In this case √6, which is 2.45 (ice creams).

Coefficient of determination (R2)


All of the metrics so far compare the discrepancy between the predicted and actual values in
order to evaluate the model. However, in reality, there's some natural random variance in the
daily sales of ice cream that the model can't fully account for. In a linear regression model, the
training algorithm fits a straight line that minimizes the mean variance between the function
and the known label values. The coefficient of determination (more commonly referred to as
R2 or R-Squared) is a metric that measures the proportion of variance in the validation results
that can be explained by the model, as opposed to some anomalous aspect of the validation
data (for example, a day with a highly unusual number of ice creams sales because of a local
festival).
The calculation for R2 is more complex than for the previous metrics. It compares the sum of
squared differences between predicted and actual labels with the sum of squared differences
between the actual label values and the mean of actual label values, like this:

R² = 1 - ∑(y-ŷ)² ÷ ∑(y-ȳ)²

Don't worry too much if that looks complicated; most machine learning tools can calculate the
metric for you. The important point is that the result is a value between 0 and 1 that describes
the proportion of variance explained by the model. In simple terms, the closer to 1 this value
is, the better the model is fitting the validation data. In the case of the ice cream regression
model, the R2 calculated from the validation data is 0.95.
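
Here's a short sketch of how these four metrics could be computed directly from the validation data shown earlier. Most machine learning libraries provide equivalent functions; this version just spells out the arithmetic.

```python
import math

# Validation data from the ice cream example: actual labels (y) and predictions (y-hat)
actual    = [0, 14, 23, 22, 26, 36]
predicted = [2, 17, 20, 23, 28, 33]

errors = [p - a for p, a in zip(predicted, actual)]

mae  = sum(abs(e) for e in errors) / len(errors)   # 2.33
mse  = sum(e ** 2 for e in errors) / len(errors)   # 6.0
rmse = math.sqrt(mse)                              # about 2.45

mean_y = sum(actual) / len(actual)
ss_residual = sum((a - p) ** 2 for a, p in zip(actual, predicted))
ss_total    = sum((a - mean_y) ** 2 for a in actual)
r2 = 1 - ss_residual / ss_total                    # about 0.95

print(f"MAE={mae:.2f}  MSE={mse:.2f}  RMSE={rmse:.2f}  R2={r2:.2f}")
```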

Iterative training
The metrics described above are commonly used to evaluate a regression model. In most real-
world scenarios, a data scientist will use an iterative process to repeatedly train and evaluate a
model, varying:

Feature selection and preparation (choosing which features to include in the model, and
calculations applied to them to help ensure a better fit).
Algorithm selection (we explored linear regression in the previous example, but there are
many other regression algorithms).
Algorithm parameters (numeric settings to control algorithm behavior, more accurately
called hyperparameters to differentiate them from the x and y parameters).
After multiple iterations, the model that results in the best evaluation metric that's acceptable
for the specific scenario is selected.

Binary classification

Classification, like regression, is a supervised machine learning technique; and therefore follows
the same iterative process of training, validating, and evaluating models. Instead of calculating
numeric values like a regression model, the algorithms used to train classification models
calculate probability values for class assignment and the evaluation metrics used to assess
model performance compare the predicted classes to the actual classes.
Binary classification algorithms are used to train a model that predicts one of two possible
labels for a single class. Essentially, predicting true or false. In most real scenarios, the data
observations used to train and validate the model consist of multiple feature (x) values and a y
value that is either 1 or 0.

Example - binary classification


To understand how binary classification works, let's look at a simplified example that uses a
single feature (x) to predict whether the label y is 1 or 0. In this example, we'll use the blood
glucose level of a patient to predict whether or not the patient has diabetes. Here's the data
with which we'll train the model:


Blood glucose (x) Diabetic? (y)

67 0

103 1

114 1
72 0

116 1

65 0

Training a binary classification model


To train the model, we'll use an algorithm to fit the training data to a function that calculates
the probability of the class label being true (in other words, that the patient has diabetes).
Probability is measured as a value between 0.0 and 1.0, such that the total probability for all
possible classes is 1.0. So for example, if the probability of a patient having diabetes is 0.7,
then there's a corresponding probability of 0.3 that the patient isn't diabetic.

There are many algorithms that can be used for binary classification, such as logistic regression,
which derives a sigmoid (S-shaped) function with values between 0.0 and 1.0, like this:

Note

Despite its name, in machine learning logistic regression is used for classification, not
regression. The important point is the logistic nature of the function it produces, which
describes an S-shaped curve between a lower and upper value (0.0 and 1.0 when used for
binary classification).

The function produced by the algorithm describes the probability of y being true (y=1) for a
given value of x. Mathematically, you can express the function like this:
f(x) = P(y=1 | x)

For three of the six observations in the training data, we know that y is definitely true, so the
probability for those observations that y=1 is 1.0 and for the other three, we know that y is
definitely false, so the probability that y=1 is 0.0. The S-shaped curve describes the probability
distribution so that plotting a value of x on the line identifies the corresponding probability
that y is 1.

The diagram also includes a horizontal line to indicate the threshold at which a model based
on this function will predict true (1) or false (0). The threshold lies at the mid-point for y (P(y) =
0.5). For any values at this point or above, the model will predict true (1); while for any values
below this point it will predict false (0). For example, for a patient with a blood glucose level of
90, the function would result in a probability value of 0.9. Since 0.9 is higher than the threshold
of 0.5, the model would predict true (1) - in other words, the patient is predicted to have
diabetes.
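
A minimal sketch of this idea in Python is shown below. The weight and bias values are hypothetical, chosen only so the sigmoid curve roughly resembles the one described in the text; they aren't derived from the module's training data.

```python
import math

def predict_probability(x, w=0.3, b=-25.0):
    """Logistic (sigmoid) function giving P(y=1 | x). The weight w and bias b
    are hypothetical placeholders, not learned from the training data."""
    return 1 / (1 + math.exp(-(w * x + b)))

def predict_label(x, threshold=0.5):
    # Apply the 0.5 threshold described above to turn a probability into a class.
    return 1 if predict_probability(x) >= threshold else 0

for glucose in [66, 90, 107]:
    p = predict_probability(glucose)
    print(f"glucose={glucose}: P(diabetic)={p:.2f} -> predicted class {predict_label(glucose)}")
```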

Evaluating a binary classification model


As with regression, when training a binary classification model you hold back a random subset
of data with which to validate the trained model. Let's assume we held back the following data
to validate our diabetes classifier:


Blood glucose (x) Diabetic? (y)

66 0

107 1

112 1

71 0

87 1

89 1

Applying the logistic function we derived previously to the x values results in the following
plot.

Based on whether the probability calculated by the function is above or below the threshold,
the model generates a predicted label of 1 or 0 for each observation. We can then compare
the predicted class labels (ŷ) to the actual class labels (y), as shown here:


Blood glucose (x) Actual diabetes diagnosis (y) Predicted diabetes diagnosis (ŷ)

66 0 0

107 1 1

112 1 1

71 0 0

87 1 0

89 1 1

Binary classification evaluation metrics


The first step in calculating evaluation metrics for a binary classification model is usually to
create a matrix of the number of correct and incorrect predictions for each possible class label:

This visualization is called a confusion matrix, and it shows the prediction totals where:
ŷ=0 and y=0: True negatives (TN)
ŷ=1 and y=0: False positives (FP)
ŷ=0 and y=1: False negatives (FN)
ŷ=1 and y=1: True positives (TP)

The arrangement of the confusion matrix is such that correct (true) predictions are shown in a
diagonal line from top-left to bottom-right. Often, color-intensity is used to indicate the
number of predictions in each cell, so a quick glance at a model that predicts well should
reveal a deeply shaded diagonal trend.

Accuracy
The simplest metric you can calculate from the confusion matrix is accuracy - the proportion of
predictions that the model got right. Accuracy is calculated as:

(TN+TP) ÷ (TN+FN+FP+TP)
In the case of our diabetes example, the calculation is:
(2+3) ÷ (2+1+0+3)

= 5 ÷ 6
= 0.83

So for our validation data, the diabetes classification model produced correct predictions 83%
of the time.

Accuracy might initially seem like a good metric to evaluate a model, but consider this.
Suppose 11% of the population has diabetes. You could create a model that always predicts 0,
and it would achieve an accuracy of 89%, even though it makes no real attempt to
differentiate between patients by evaluating their features. What we really need is a deeper
understanding of how the model performs at predicting 1 for positive cases and 0 for negative
cases.

Recall
Recall is a metric that measures the proportion of positive cases that the model identified
correctly. In other words, compared to the number of patients who have diabetes, how many
did the model predict to have diabetes?

The formula for recall is:


TP ÷ (TP+FN)

For our diabetes example:

3 ÷ (3+1)

= 3 ÷ 4
= 0.75
So our model correctly identified 75% of patients who have diabetes as having diabetes.

Precision
Precision is a similar metric to recall, but measures the proportion of predicted positive cases
where the true label is actually positive. In other words, what proportion of the patients
predicted by the model to have diabetes actually have diabetes?

The formula for precision is:

TP ÷ (TP+FP)
For our diabetes example:

3 ÷ (3+0)

= 3 ÷ 3
= 1.0
So 100% of the patients predicted by our model to have diabetes do in fact have diabetes.

F1-score
F1-score is an overall metric that combines recall and precision. The formula for F1-score is:

(2 x Precision x Recall) ÷ (Precision + Recall)

For our diabetes example:


(2 x 1.0 x 0.75) ÷ (1.0 + 0.75)

= 1.5 ÷ 1.75
= 0.86
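
The confusion matrix counts and the four metrics above can be reproduced from the validation results with a short Python sketch:

```python
# Validation results from the diabetes example: actual labels (y) and predictions (y-hat)
actual    = [0, 1, 1, 0, 1, 1]
predicted = [0, 1, 1, 0, 0, 1]

tp = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 1)  # 3
tn = sum(1 for a, p in zip(actual, predicted) if a == 0 and p == 0)  # 2
fp = sum(1 for a, p in zip(actual, predicted) if a == 0 and p == 1)  # 0
fn = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 0)  # 1

accuracy  = (tp + tn) / (tp + tn + fp + fn)                  # 0.83
recall    = tp / (tp + fn)                                   # 0.75
precision = tp / (tp + fp)                                   # 1.0
f1        = (2 * precision * recall) / (precision + recall)  # 0.86

print(f"accuracy={accuracy:.2f} recall={recall:.2f} precision={precision:.2f} f1={f1:.2f}")
```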

Area Under the Curve (AUC)


Another name for recall is the true positive rate (TPR), and there's an equivalent metric called
the false positive rate (FPR) that is calculated as FP÷(FP+TN). We already know that the TPR for
our model when using a threshold of 0.5 is 0.75, and we can use the formula for FPR to
calculate a value of 0÷2 = 0.

Of course, if we were to change the threshold above which the model predicts true (1), it would
affect the number of positive and negative predictions; and therefore change the TPR and FPR
metrics. These metrics are often used to evaluate a model by plotting a receiver operating
characteristic (ROC) curve that compares the TPR and FPR for every possible threshold value
between 0.0 and 1.0:
The ROC curve for a perfect model would go straight up the TPR axis on the left and then
across the FPR axis at the top. Since the plot area for the curve measures 1x1, the area under
this perfect curve would be 1.0 (meaning that the model is correct 100% of the time). In
contrast, a diagonal line from the bottom-left to the top-right represents the results that
would be achieved by randomly guessing a binary label; producing an area under the curve of
0.5. In other words, given two possible class labels, you could reasonably expect to guess
correctly 50% of the time.

In the case of our diabetes model, the curve above is produced, and the area under the curve
(AUC) metric is 0.875. Since the AUC is higher than 0.5, we can conclude the model performs
better at predicting whether or not a patient has diabetes than randomly guessing.
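
Here's a sketch of how TPR, FPR, and AUC might be computed, assuming scikit-learn is available. The probability scores below are hypothetical placeholders standing in for the model's output, so the resulting AUC won't match the 0.875 reported above.

```python
from sklearn.metrics import roc_curve, roc_auc_score

actual = [0, 1, 1, 0, 1, 1]                    # validation labels (y)
scores = [0.10, 0.85, 0.95, 0.20, 0.45, 0.70]  # hypothetical P(y=1) for each case

# TPR and FPR at each threshold the data supports
fpr, tpr, thresholds = roc_curve(actual, scores)
for f, t, th in zip(fpr, tpr, thresholds):
    print(f"threshold={th:.2f}: TPR={t:.2f}, FPR={f:.2f}")

# Area under the ROC curve (1.0 = perfect, 0.5 = random guessing)
print("AUC =", roc_auc_score(actual, scores))
```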

Multiclass classification

Multiclass classification is used to predict to which of multiple possible classes an observation
belongs. As a supervised machine learning technique, it follows the same iterative train,
validate, and evaluate process as regression and binary classification in which a subset of the
training data is held back to validate the trained model.

Example - multiclass classification


Multiclass classification algorithms are used to calculate probability values for multiple class
labels, enabling a model to predict the most probable class for a given observation.
Let's explore an example in which we have some observations of penguins, in which the flipper
length (x) of each penguin is recorded. For each observation, the data includes the penguin
species (y), which is encoded as follows:

0: Adelie
1: Gentoo
2: Chinstrap

Note

As with previous examples in this module, a real scenario would include multiple feature
(x) values. We'll use a single feature to keep things simple.


Flipper length (x) Species (y)

167 0
172 0

225 2

197 1

189 1

232 2

158 0

Training a multiclass classification model


To train a multiclass classification model, we need to use an algorithm to fit the training data
to a function that calculates a probability value for each possible class. There are two kinds of
algorithm you can use to do this:

One-vs-Rest (OvR) algorithms


Multinomial algorithms

One-vs-Rest (OvR) algorithms


One-vs-Rest algorithms train a binary classification function for each class, each calculating the
probability that the observation is an example of the target class. Each function calculates the
probability of the observation being a specific class compared to any other class. For our
penguin species classification model, the algorithm would essentially create three binary
classification functions:
f0(x) = P(y=0 | x)
f1(x) = P(y=1 | x)
f2(x) = P(y=2 | x)

Each algorithm produces a sigmoid function that calculates a probability value between 0.0
and 1.0. A model trained using this kind of algorithm predicts the class for the function that
produces the highest probability output.

Multinomial algorithms
An alternative approach is to use a multinomial algorithm, which creates a single function
that returns a multi-valued output. The output is a vector (an array of values) that contains the
probability distribution for all possible classes - with a probability score for each class which
when totaled add up to 1.0:
f(x) = [P(y=0|x), P(y=1|x), P(y=2|x)]

An example of this kind of function is a softmax function, which could produce an output like
the following example:
[0.2, 0.3, 0.5]

The elements in the vector represent the probabilities for classes 0, 1, and 2 respectively; so in
this case, the class with the highest probability is 2.

Regardless of which type of algorithm is used, the model uses the resulting function to
determine the most probable class for a given set of features (x) and predicts the
corresponding class label (y).
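
As an illustration, here's a minimal softmax function in Python. The raw class scores are hypothetical; with these values the output happens to match the example vector above.

```python
import math

def softmax(scores):
    """Convert a vector of raw scores into a probability distribution that sums to 1.0."""
    exps = [math.exp(s) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical raw scores for classes 0 (Adelie), 1 (Gentoo), and 2 (Chinstrap)
raw_scores = [0.8, 1.2, 1.7]
probabilities = softmax(raw_scores)
print([round(p, 2) for p in probabilities])                 # [0.2, 0.3, 0.5]

predicted_class = probabilities.index(max(probabilities))   # class with highest probability
print("Predicted species:", predicted_class)                # 2
```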

Evaluating a multiclass classification model


You can evaluate a multiclass classifier by calculating binary classification metrics for each
individual class. Alternatively, you can calculate aggregate metrics that take all classes into
account.
Let's assume that we've validated our multiclass classifier, and obtained the following results:


Flipper length (x) Actual species (y) Predicted species (ŷ)

165 0 0

171 0 0

205 2 1

195 1 1

183 1 1

221 2 2

214 2 2

The confusion matrix for a multiclass classifier is similar to that of a binary classifier, except
that it shows the number of predictions for each combination of predicted (ŷ) and actual class
labels (y):

From this confusion matrix, we can determine the metrics for each individual class as follows:

Class  TP  TN  FP  FN  Accuracy  Recall  Precision  F1-Score

0      2   5   0   0   1.0       1.0     1.0        1.0
1      2   4   1   0   0.86      1.0     0.67       0.8
2      2   4   0   1   0.86      0.67    1.0        0.8

To calculate the overall accuracy, recall, and precision metrics, you use the total of the TP, TN,
FP, and FN metrics:
Overall accuracy = (13+6)÷(13+6+1+1) = 0.90
Overall recall = 6÷(6+1) = 0.86
Overall precision = 6÷(6+1) = 0.86

The overall F1-score is calculated using the overall recall and precision metrics:
Overall F1-score = (2x0.86x0.86)÷(0.86+0.86) = 0.86
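
These per-class counts and overall (micro-averaged) metrics can be reproduced from the validation results with a short Python sketch:

```python
# Penguin validation results: actual species (y) and predicted species (y-hat)
actual    = [0, 0, 2, 1, 1, 2, 2]
predicted = [0, 0, 1, 1, 1, 2, 2]
classes = [0, 1, 2]

totals = {"TP": 0, "TN": 0, "FP": 0, "FN": 0}
for c in classes:
    tp = sum(1 for a, p in zip(actual, predicted) if a == c and p == c)
    fp = sum(1 for a, p in zip(actual, predicted) if a != c and p == c)
    fn = sum(1 for a, p in zip(actual, predicted) if a == c and p != c)
    tn = len(actual) - tp - fp - fn
    print(f"class {c}: TP={tp} TN={tn} FP={fp} FN={fn}")
    for key, value in zip(("TP", "TN", "FP", "FN"), (tp, tn, fp, fn)):
        totals[key] += value

# Overall metrics, calculated from the totals as in the text above
accuracy  = (totals["TP"] + totals["TN"]) / sum(totals.values())   # 0.90
recall    = totals["TP"] / (totals["TP"] + totals["FN"])           # 0.86
precision = totals["TP"] / (totals["TP"] + totals["FP"])           # 0.86
f1 = (2 * precision * recall) / (precision + recall)               # 0.86
print(f"overall: accuracy={accuracy:.2f} recall={recall:.2f} "
      f"precision={precision:.2f} f1={f1:.2f}")
```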

Clustering

Clustering is a form of unsupervised machine learning in which observations are grouped into
clusters based on similarities in their data values, or features. This kind of machine learning is
considered unsupervised because it doesn't make use of previously known label values to train
a model. In a clustering model, the label is the cluster to which the observation is assigned,
based only on its features.

Example - clustering
For example, suppose a botanist observes a sample of flowers and records the number of
leaves and petals on each flower:

There are no known labels in the dataset, just two features. The goal is not to identify the
different types (species) of flower; just to group similar flowers together based on the number
of leaves and petals.


Leaves (x1) Petals (x2)

0 5

0 6

1 3

1 3

1 6

1 8

2 3

2 7

2 8

Training a clustering model


There are multiple algorithms you can use for clustering. One of the most commonly used
algorithms is K-Means clustering, which consists of the following steps:

1. The feature (x) values are vectorized to define n-dimensional coordinates (where n is the
number of features). In the flower example, we have two features: number of leaves (x1)
and number of petals (x2). So, the feature vector has two coordinates that we can use to
conceptually plot the data points in two-dimensional space ([x1,x2])
2. You decide how many clusters you want to use to group the flowers - call this value k. For
example, to create three clusters, you would use a k value of 3. Then k points are plotted
at random coordinates. These points become the center points for each cluster, so
they're called centroids.
3. Each data point (in this case a flower) is assigned to its nearest centroid.
4. Each centroid is moved to the center of the data points assigned to it based on the mean
distance between the points.
5. After the centroid is moved, the data points may now be closer to a different centroid, so
the data points are reassigned to clusters based on the new closest centroid.
6. The centroid movement and cluster reallocation steps are repeated until the clusters
become stable or a predetermined maximum number of iterations is reached.

The following animation shows this process:


Evaluating a clustering model
Since there's no known label with which to compare the predicted cluster assignments,
evaluation of a clustering model is based on how well the resulting clusters are separated from
one another.
There are multiple metrics that you can use to evaluate cluster separation, including:

Average distance to cluster center: How close, on average, each point in the cluster is to
the centroid of the cluster.
Average distance to other center: How close, on average, each point in the cluster is to
the centroid of all other clusters.
Maximum distance to cluster center: The furthest distance between a point in the cluster
and its centroid.
Silhouette: A value between -1 and 1 that summarizes the ratio of distance between
points in the same cluster and points in different clusters (The closer to 1, the better the
cluster separation).
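
As a sketch, the flower observations above could be clustered with K-Means and scored with the silhouette metric, assuming scikit-learn is available. The exact cluster assignments can vary from run to run because the initial centroids are chosen at random.

```python
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

# Each observation: [leaves (x1), petals (x2)]
flowers = [[0, 5], [0, 6], [1, 3], [1, 3], [1, 6], [1, 8], [2, 3], [2, 7], [2, 8]]

# k = 3 clusters; random_state fixes the random centroid initialization for repeatability
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)
cluster_labels = kmeans.fit_predict(flowers)

print("Cluster assignments:", cluster_labels)
print("Centroids:", kmeans.cluster_centers_)

# Silhouette: closer to 1 means better-separated clusters
print("Silhouette score:", silhouette_score(flowers, cluster_labels))
```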

Deep learning

Deep learning is an advanced form of machine learning that tries to emulate the way the
human brain learns. The key to deep learning is the creation of an artificial neural network that
simulates electrochemical activity in biological neurons by using mathematical functions, as
shown here.

Biological neural network: Neurons fire in response to electrochemical stimuli. When fired, the
signal is passed to connected neurons.

Artificial neural network: Each neuron is a function that operates on an input value (x) and a
weight (w). The function is wrapped in an activation function that determines whether to pass
the output on.

Artificial neural networks are made up of multiple layers of neurons - essentially defining a
deeply nested function. This architecture is the reason the technique is referred to as deep
learning and the models produced by it are often referred to as deep neural networks (DNNs).
You can use deep neural networks for many kinds of machine learning problem, including
regression and classification, as well as more specialized models for natural language
processing and computer vision.
Just like other machine learning techniques discussed in this module, deep learning involves
fitting training data to a function that can predict a label (y) based on the value of one or more
features (x). The function (f(x)) is the outer layer of a nested function in which each layer of the
neural network encapsulates functions that operate on x and the weight (w) values associated
with them. The algorithm used to train the model involves iteratively feeding the feature
values (x) in the training data forward through the layers to calculate output values for ŷ,
validating the model to evaluate how far off the calculated ŷ values are from the known y
values (which quantifies the level of error, or loss, in the model), and then modifying the
weights (w) to reduce the loss. The trained model includes the final weight values that result in
the most accurate predictions.

Example - Using deep learning for classification


To better understand how a deep neural network model works, let's explore an example in
which a neural network is used to define a classification model for penguin species.

The feature data (x) consists of some measurements of a penguin. Specifically, the
measurements are:

The length of the penguin's bill.
The depth of the penguin's bill.
The length of the penguin's flippers.
The penguin's weight.
In this case, x is a vector of four values, or mathematically, x=[x1,x2,x3,x4].

The label we're trying to predict (y) is the species of the penguin, and there are three
possible species it could be:

Adelie
Gentoo
Chinstrap
This is an example of a classification problem, in which the machine learning model must
predict the most probable class to which an observation belongs. A classification model
accomplishes this by predicting a label that consists of the probability for each class. In other
words, y is a vector of three probability values; one for each of the possible classes: [P(y=0|x),
P(y=1|x), P(y=2|x)].

The process for inferencing a predicted penguin class using this network is:

1. The feature vector for a penguin observation is fed into the input layer of the neural
network, which consists of a neuron for each x value. In this example, the following x
vector is used as the input: [37.3, 16.8, 19.2, 30.0]
2. The functions for the first layer of neurons each calculate a weighted sum by combining
the x value and w weight, and pass it to an activation function that determines if it meets
the threshold to be passed on to the next layer.
3. Each neuron in a layer is connected to all of the neurons in the next layer (an architecture
sometimes called a fully connected network) so the results of each layer are fed forward
through the network until they reach the output layer.
4. The output layer produces a vector of values; in this case, using a softmax or similar
function to calculate the probability distribution for the three possible classes of penguin.
In this example, the output vector is: [0.2, 0.7, 0.1]
5. The elements of the vector represent the probabilities for classes 0, 1, and 2. The second
value is the highest, so the model predicts that the species of the penguin is 1 (Gentoo).
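
To make these steps concrete, here is a minimal sketch using NumPy with hypothetical, randomly
initialized weights rather than trained ones. It shows a single hidden layer and a softmax output
layer producing a probability vector for the three penguin classes:

# A minimal sketch of forward propagation, assuming NumPy; all weights are made up.
import numpy as np

def softmax(z):
    """Convert raw output values into a probability distribution that sums to 1."""
    exp_z = np.exp(z - np.max(z))
    return exp_z / exp_z.sum()

# Feature vector for one penguin observation: [bill length, bill depth, flipper length, weight]
x = np.array([37.3, 16.8, 19.2, 30.0])

# Hypothetical weights and biases for a hidden layer with 4 neurons
# and an output layer with 3 neurons (one per penguin species).
W_hidden = np.random.default_rng(0).normal(size=(4, 4)) * 0.1
b_hidden = np.zeros(4)
W_output = np.random.default_rng(1).normal(size=(4, 3)) * 0.1
b_output = np.zeros(3)

# Hidden layer: weighted sum followed by a ReLU activation function.
hidden = np.maximum(0, x @ W_hidden + b_hidden)

# Output layer: weighted sum followed by softmax to get class probabilities.
y_hat = softmax(hidden @ W_output + b_output)

# The predicted class is the one with the highest probability
# (0 = Adelie, 1 = Gentoo, 2 = Chinstrap).
print("Probabilities:", y_hat)
print("Predicted class:", int(np.argmax(y_hat)))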

How does a neural network learn?


The weights in a neural network are central to how it calculates predicted values for labels.
During the training process, the model learns the weights that will result in the most accurate
predictions. Let's explore the training process in a little more detail to understand how this
learning takes place.
1. The training and validation datasets are defined, and the training features are fed into
the input layer.
2. The neurons in each layer of the network apply their weights (which are initially assigned
randomly) and feed the data through the network.
3. The output layer produces a vector containing the calculated values for ŷ. For example,
an output for a penguin class prediction might be [0.3, 0.1, 0.6].
4. A loss function is used to compare the predicted ŷ values to the known y values and
aggregate the difference (which is known as the loss). For example, if the known class for
the case that returned the output in the previous step is Chinstrap, then the y value
should be [0.0, 0.0, 1.0]. The absolute difference between this and the ŷ vector is [0.3, 0.1,
0.4]. In reality, the loss function calculates the aggregate variance for multiple cases and
summarizes it as a single loss value.
5. Since the entire network is essentially one large nested function, an optimization function
can use differential calculus to evaluate the influence of each weight in the network on
the loss, and determine how they could be adjusted (up or down) to reduce the amount
of overall loss. The specific optimization technique can vary, but usually involves a
gradient descent approach in which each weight is increased or decreased to minimize
the loss.
6. The changes to the weights are backpropagated to the layers in the network, replacing
the previously used values.
7. The process is repeated over multiple iterations (known as epochs) until the loss is
minimized and the model predicts acceptably accurately.
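
The following is a minimal training-loop sketch, assuming PyTorch, with randomly generated data
standing in for real penguin measurements and labels. It follows the same steps: feed features
forward, calculate the loss, backpropagate, and update the weights over multiple epochs:

# A minimal sketch of the training loop, assuming PyTorch; the data is randomly
# generated as a stand-in for real penguin measurements and known labels.
import torch
from torch import nn

torch.manual_seed(0)
X = torch.randn(200, 4)                      # 200 observations, 4 features each
y = torch.randint(0, 3, (200,))              # known labels: classes 0, 1, 2

# A small fully connected network: 4 inputs -> hidden layer -> 3 class outputs.
model = nn.Sequential(nn.Linear(4, 10), nn.ReLU(), nn.Linear(10, 3))
loss_fn = nn.CrossEntropyLoss()              # aggregates the difference between ŷ and y
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)  # gradient descent

for epoch in range(50):                      # each pass over the data is an epoch
    optimizer.zero_grad()
    y_hat = model(X)                         # feed features forward through the layers
    loss = loss_fn(y_hat, y)                 # compare predictions to known labels
    loss.backward()                          # backpropagate the loss to each weight
    optimizer.step()                         # adjust weights to reduce the loss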

Note

While it's easier to think of each case in the training data being passed through the
network one at a time, in reality the data is batched into matrices and processed using
linear algebraic calculations. For this reason, neural network training is best performed on
computers with graphics processing units (GPUs) that are optimized for vector and
matrix manipulation.


Azure Machine Learning


6 minutes

Microsoft Azure Machine Learning is a cloud service for training, deploying, and managing
machine learning models. It's designed to be used by data scientists, software engineers,
DevOps professionals, and others to manage the end-to-end lifecycle of machine learning
projects, including:

Exploring data and preparing it for modeling.
Training and evaluating machine learning models.
Registering and managing trained models.
Deploying trained models for use by applications and services.
Reviewing and applying responsible AI principles and practices.

Features and capabilities of Azure Machine Learning
Azure Machine Learning provides the following features and capabilities to support machine
learning workloads:

Centralized storage and management of datasets for model training and evaluation.
On-demand compute resources on which you can run machine learning jobs, such as
training a model.
Automated machine learning (AutoML), which makes it easy to run multiple training jobs
with different algorithms and parameters to find the best model for your data.
Visual tools to define orchestrated pipelines for processes such as model training or
inferencing.
Integration with common machine learning frameworks such as MLflow, which make it
easier to manage model training, evaluation, and deployment at scale.
Built-in support for visualizing and evaluating metrics for responsible AI, including model
explainability, fairness assessment, and others.

Provisioning Azure Machine Learning resources


The primary resource required for Azure Machine Learning is an Azure Machine Learning
workspace, which you can provision in an Azure subscription. Other supporting resources,
including storage accounts, container registries, virtual machines, and others are created
automatically as needed.
To create an Azure Machine Learning workspace, you can use the Azure portal.
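
A workspace can also be provisioned from code. The following is a hedged sketch assuming the
azure-ai-ml (v2) Python SDK and azure-identity packages; the subscription, resource group, and
workspace values are placeholders, and parameter names may vary between SDK versions:

# A hedged sketch, assuming the azure-ai-ml (v2) SDK and azure-identity are installed;
# the subscription, resource group, and workspace name are placeholders.
from azure.ai.ml import MLClient
from azure.ai.ml.entities import Workspace
from azure.identity import DefaultAzureCredential

ml_client = MLClient(
    credential=DefaultAzureCredential(),
    subscription_id="<your-subscription-id>",
    resource_group_name="<your-resource-group>",
)

# Define and create the workspace; supporting resources such as storage and
# a container registry are created automatically if not specified.
ws = Workspace(name="my-aml-workspace", location="eastus")
created = ml_client.workspaces.begin_create(ws).result()
print(created.name)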

Azure Machine Learning studio


After you've provisioned an Azure Machine Learning workspace, you can use it in Azure
Machine Learning studio, a browser-based portal for managing your machine learning
resources and jobs.

In Azure Machine Learning studio, you can (among other things):

Import and explore data.
Create and use compute resources.
Run code in notebooks.
Use visual tools to create jobs and pipelines.
Use automated machine learning to train models.
View details of trained models, including evaluation metrics, responsible AI information,
and training parameters.
Deploy trained models for on-request and batch inferencing.
Import and manage models from a comprehensive model catalog.

The screenshot shows the Metrics page for a trained model in Azure Machine Learning studio,
in which you can see the evaluation metrics for a trained multiclass classification model.


Knowledge check
Module assessment 3 minutes

Answer 100% of questions correctly in order to pass.

1. You want to create a model to predict the cost of heating an office building based on its
size in square feet and the number of employees working there. What kind of machine
learning problem is this?

Regression
Correct. Regression models predict numeric values.
Classification
Clustering
Incorrect. Clustering models group similar items together.

2. You need to evaluate a classification model. Which metric can you use?

Mean squared error (MSE)
Incorrect. MSE is used to evaluate regression models.
Precision
Correct. Precision is a useful metric for evaluating classification models.
Silhouette

3. In deep learning, what is the purpose of a loss function?

To remove data for which no known label values are provided
To evaluate the aggregate difference between predicted and actual label values
Correct. A loss function determines the overall variance, or loss, between predicted and actual label values.
To calculate the cost of training a neural network rather than a statistical model

4. What does automated machine learning in Azure Machine Learning enable you to do?

Automatically deploy new versions of a model as they're trained
Automatically provision Azure Machine Learning workspaces for new data scientists in an organization
Automatically run multiple training jobs using different algorithms and parameters to find the best model
Correct. Automated machine learning runs multiple training jobs, varying algorithms and parameters, to find the best model for your data.


Introduction
2 minutes

Artificial Intelligence (AI) is changing our world and there’s hardly an industry that hasn't been
affected. From better healthcare to online safety, AI is helping us to tackle some of society’s
biggest issues.

Azure AI services are a portfolio of AI capabilities that unlock automation for workloads in
language, vision, intelligent search, content generation, and much more. They are
straightforward to implement and don’t require specialist AI knowledge.

Organizations are using Azure AI services in innovative ways, such as within robots to
provide life-like companionship to older people by expressing happiness, concern, and even
laughter. In other use cases, scientists are using AI to protect endangered species by
identifying hard-to-find animals in images. This was previously time-consuming and error-
prone work, which the Azure AI Vision service can complete quickly and with a high degree of
accuracy, freeing scientists to do other work.
In this module you will learn what Azure AI services are, and how you can use them in your
own applications.


AI services on the Azure platform


4 minutes

Azure AI services are AI capabilities that can be built into web or mobile applications, in a way
that's straightforward to implement. These AI services include generative AI, image
recognition, natural language processing, speech, AI-powered search, and more. There are
over a dozen different services that can be used separately or together to add AI power to
applications.
Let's take a look at some examples of what can be done with Azure AI services. Azure OpenAI
service provides access to powerful, cutting-edge, generative AI models for application
development. Azure AI Content Safety service can be used to detect harmful content within
text or images, including violent or hateful content, and report on its severity. Azure AI
Language service can be used to summarize text, classify information, or extract key phrases.
Azure AI Speech service provides powerful speech to text and text to speech capabilities,
allowing speech to be accurately transcribed into text, or text to natural sounding voice audio.

Note

You can use multiple Azure AI services with Azure AI Foundry, a platform for AI
application development. Azure AI Foundry is covered in its own module: Introduction to
Azure AI Foundry.

Azure AI services are based on three principles that dramatically improve speed-to-market:
Prebuilt and ready to use
Accessed through APIs
Available and secure on Azure

Azure AI services are prebuilt and ready to use


AI has historically been out of reach for all but the largest technology companies because of several
factors, including the large amounts of data required to train models, the massive amount of
computing power needed, and the budget to hire specialist programmers. Azure AI services
make AI accessible to businesses of all sizes by using pre-trained machine learning models to
deliver AI as a service. Azure AI services use high-performance Azure computing to deploy
advanced AI models as resources, making decades of research available to developers of all
skill levels.
Azure AI services are a portfolio of services, with capabilities suitable for use cases across
sectors and industries.
For example, in education, Immersive Reader is being used to support students by adapting to
their requirements. Learners can have varying needs, such as wanting to read more slowly, get
words or text translated into another language, or see pictures to aid their understanding.
Immersive Reader helps students with different needs learn at their own pace, and in their own
way.
While Azure AI services can be used without any modification, some AI services can be
customized to better fit specific requirements. Customization capabilities in Azure AI Vision,
Azure AI Speech, and Azure OpenAI all allow you to add data to existing models.
For example, in sport, athletes and coaches are customizing Azure AI Vision to improve
performance and reduce injury. One application allows surfers to upload a video and receive
AI-generated insights and analysis. These insights can then be used by coaches, medics,
judges, and event broadcasters.

Azure AI services are accessed through APIs


Azure AI services are designed to be used in different development environments. Developers
can access AI services through REST APIs, client libraries, or integrate them with tools such as
Logic Apps and Power Automate. APIs are application programming interfaces that define the
information that is required for one component to use the services of the other. APIs enable
software components to communicate, so one side can be updated without stopping the
other from working. Find out more about development options for Azure AI services here.
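
As an illustration of the client-library route, the following sketch uses the Azure AI Language
client library for Python (azure-ai-textanalytics) to extract key phrases from text; the endpoint
and key are placeholders for your own provisioned resource:

# A minimal sketch, assuming the azure-ai-textanalytics client library is installed;
# the endpoint and key are placeholders for your own Language resource.
from azure.ai.textanalytics import TextAnalyticsClient
from azure.core.credentials import AzureKeyCredential

endpoint = "https://<your-resource-name>.cognitiveservices.azure.com/"
key = "<your-resource-key>"

client = TextAnalyticsClient(endpoint=endpoint, credential=AzureKeyCredential(key))

documents = ["Azure AI services make it straightforward to add AI to applications."]
result = client.extract_key_phrases(documents)

for doc in result:
    if not doc.is_error:
        print(doc.key_phrases)   # key phrases detected in the input document

The same kind of operation is also available directly over the REST API and through client
libraries for other languages.
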
Azure AI services are available and secure on Azure
Azure AI services are cloud-based and accessed through an Azure resource. This means that
they're managed in the same way as other Azure services, such as platform as a service (PaaS),
infrastructure as a service (IaaS), or a managed database service. The Azure platform and
Resource Manager provide a consistent framework for all your Azure services, from creating or
deleting resources, to availability and billing.
Trust is at the core of all of Microsoft's offerings, and security is at the center of Azure AI
services. From cyberthreat prevention, to data privacy, to compliance, to safe digital
experiences, Azure AI services provide enterprise-grade security. You can learn more about
Azure AI services' security features here.

Note

In some situations, AI has the potential to be used in ways that might compromise an
individual's privacy or rights. Microsoft has six Responsible AI principles to help ensure AI
services are ethical and fair. Because of this, certain Azure AI services are restricted to
ensure they're used responsibly.


Create Azure AI service resources


3 minutes

Azure AI services are cloud-based, and like all Azure services you need to create a resource to
use them. There are two types of AI service resources: multi-service or single-service. Your
development requirements and how you want costs to be billed determine the types of
resources you need.

Multi-service resource: a resource created in the Azure portal that provides access to
multiple Azure AI services with a single key and endpoint. Use an Azure AI services
resource when you need several AI services or are exploring AI capabilities. When you use
an Azure AI services resource, all your AI services are billed together.
Single-service resource: a resource created in the Azure portal that provides access to a
single Azure AI service, such as Speech, Vision, Language, etc. Each Azure AI service has a
unique key and endpoint. These resources might be used when you only require one AI
service or want to see cost information separately.
You can create a resource in several ways, such as in the Azure portal.
How to use the Azure portal to create an Azure AI
services resource
To create an Azure AI services resource, sign in to the Azure portal with Contributor access
and select Create a resource. To create a multi-service resource, search for Azure AI services in
the marketplace.

To create a single-service resource, search for the specific Azure AI service such as Face,
Language, or Content Safety, and so on. Most AI services have a free price tier to allow you to
explore their capabilities. After clicking Create for the resource you require, you will be
prompted to complete details of your subscription, the resource group to contain the
resource, the region, a unique name, and the price tier.


Use Azure AI services


4 minutes

Once you create an Azure AI service resource, you can build applications using the REST API,
software development kits (SDKs), or visual studio interfaces (browser-based portals for
exploring the services).

Using service studio interfaces


Studio interfaces provide a friendly user interface to explore Azure AI services. There are
different studios for different Azure AI services, such as Vision Studio, Language Studio,
Speech Studio, and the Content Safety Studio. You can test out Azure AI services using the
samples provided, or experiment with your own content. A studio-based approach allows you
to explore, demo, and evaluate Azure AI services regardless of your experience with AI or
coding.

Note
In addition to studios for individual Azure AI services, Microsoft Azure has another portal,
Azure AI Foundry portal, which combines access to multiple Azure AI services and
generative AI models into one user interface.

Associate the AI service resource


Before you can use an AI service resource, you must associate it with the studio you want to
use on the Settings page. Select the resource, and then select Use Resource. You are then
ready to explore the Azure AI service within the studio.

As an example, let's look at the Azure AI Content Safety service, which identifies harmful text
or images. To explore what the Content Safety service does, let's use the Content Safety
Studio. First create either a multi-service Azure AI services resource, or a single-service Content
Safety resource. Then, on the Content Safety Studio Settings page, select the resource, and
select Use Resource. The AI service you created is now associated with the Content Safety
Studio, and ready to be used.

Note

When developers incorporate an AI service into their applications, they often use an SDK
or the REST API.


Understand authentication for Azure AI services
3 minutes

You've now learned how to create an AI service resource. But how do you ensure that only
those authorized have access to your AI service? This is done through authentication, the
process of verifying that the user or service is who they say they are, and that they are
authorized to use the service.

Most Azure AI services are accessed through a RESTful API, although there are other ways. The
API defines what information is passed between two software components: the Azure AI
service and whatever is using it. Having a clearly defined interface is important, because if the
AI service is updated, your application must continue to work correctly.

Part of what an API does is to handle authentication. Whenever a request is made to use an AI
services resource, that request must be authenticated. For example, your subscription and AI
service resource is verified to ensure you have sufficient permissions to access it. This
authentication process uses an endpoint and a resource key.
The endpoint describes how to reach the AI service resource instance that you want to use, in
a similar way to the way a URL identifies a web site. When you view the endpoint for your
resource, it will look something like:

https://myaiservices29.cognitiveservices.azure.com/

The resource key protects the privacy of your resource. To ensure this is always secure, the key
can be changed periodically. You can view the endpoint and key in the Azure portal under
Resource Management and Keys and Endpoint.
When you write code to access the AI service, the endpoint identifies the resource instance to
call, and the key must be included in the authentication header. The authentication header sends
an authorization key to the service to confirm that the application can use the resource. Learn
more about different authentication
requests to Azure AI services here.
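
For example, a raw REST call made with the Python requests library combines the endpoint (as the
base of the request URL) with the key (sent in the Ocp-Apim-Subscription-Key header). The path,
API version, and request body below are illustrative placeholders, not a definitive request format:

# A hedged sketch, assuming the requests library; the endpoint, key, path,
# API version, and request body are placeholders, not a definitive format.
import requests

endpoint = "https://myaiservices29.cognitiveservices.azure.com/"
key = "<your-resource-key>"

response = requests.post(
    endpoint + "language/:analyze-text",                 # hypothetical service path
    params={"api-version": "<api-version>"},
    headers={"Ocp-Apim-Subscription-Key": key},          # the authentication header
    json={"kind": "KeyPhraseExtraction",                 # illustrative request body
          "analysisInput": {"documents": [
              {"id": "1", "language": "en", "text": "Hello Azure AI services"}]}},
)
print(response.status_code, response.json())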

When you use a studio interface with Azure AI services, your credentials are authenticated
when you sign in, and a similar process is happening in the background.


Knowledge check
Module assessment 3 minutes


1. An application requires three separate AI services. To see the cost for each separately,
what type of resource(s) should be created?

A multi-service resource that includes all the AI services
A single-service resource for each AI service
Correct. Create a single-service resource for each AI service to see costs separately for each resource.
It's not possible to see costs for individual AI services

2. After logging into one of the individual Azure studios, what is one task to complete to
begin using the individual studio?

Input a key and endpoint into the studio
Customize the API request
Associate a resource with the studio
Correct. To explore the capabilities of the service in the studio, you must first associate the resource with the studio.

3. What is an Azure AI services resource?

A bundle of several AI services in one resource
Correct. An Azure AI services resource is a bundle of several AI services in one resource.
An AI service to recognize faces
