
1

AIRLINE SATISFACTION
DATA SCIENCE SOLUTION ON AZURE
Sanela Nikodinoska
July 2024

2

7/4/2024 Annual Review 2
Agenda
- Introduction
- Automated ML
- Designer
- Notebooks – Python SDK
- Closing

3

Introduction
For the final and most significant course, DP-100:
Designing and Implementing a Data Science Solution
on Azure, part of the Data Science Institute held by Semos
Education, the Airline Satisfaction dataset was provided to
design and implement a data science solution on
Azure. This presentation is an overview of the
solutions implemented with Azure Machine
Learning studio.
Because the Azure subscription was created for learning
purposes only and has since been cancelled, this presentation
is based on screenshots of the most important steps
in developing, training and deploying ML models.

4

Automated ML
Screenshots from Azure
Let’s dive in

5

First steps
Started with an Azure free trial, created a resource group from the UI,
and created the Azure Machine Learning service

6

Data – created a data asset by uploading the local Airline Satisfaction dataset to Azure
(no screenshot available)
Automated ML – created two experiments with different primary
metrics and featurization settings
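A minimal sketch of why the choice of primary metric matters: the same set of candidate pipelines can be ranked differently depending on which metric is optimized. The metric names `accuracy` and `AUC_weighted` are Automated ML primary metrics for classification; the pipeline names and scores below are hypothetical and are not results from these experiments.

```python
# Hypothetical validation scores for three candidate pipelines.
# Illustrative values only; they are NOT results from the deck.
candidates = {
    "pipeline_a": {"accuracy": 0.95, "AUC_weighted": 0.985},
    "pipeline_b": {"accuracy": 0.94, "AUC_weighted": 0.991},
    "pipeline_c": {"accuracy": 0.90, "AUC_weighted": 0.960},
}


def best_by(primary_metric, scores):
    """Return the candidate name with the highest value of primary_metric."""
    return max(scores, key=lambda name: scores[name][primary_metric])


best_accuracy = best_by("accuracy", candidates)
best_auc = best_by("AUC_weighted", candidates)
print("best by accuracy:", best_accuracy)
print("best by AUC_weighted:", best_auc)
```

In this toy data the two metrics pick different winners, which is the reason for running one experiment per primary metric.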

7

Automated ML – the best model in both experiments was
MaxMinScaler, LightGBM; the experiments were ended by the early stopping policy,
which is based on the level of the primary metric
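The idea behind an early stopping policy can be sketched as follows. This is a generic patience-based rule, assumed for illustration; it is not the exact policy Automated ML applies, and the `patience` and `min_delta` parameters and the score history are hypothetical.

```python
def run_with_early_stopping(scores, patience=3, min_delta=0.0):
    """Consume primary-metric scores in order and stop once the best score
    has not improved by more than min_delta for `patience` iterations.

    Returns (best_score, iterations_run).
    """
    best = float("-inf")
    stale = 0          # iterations since the last improvement
    iterations = 0
    for score in scores:
        iterations += 1
        if score > best + min_delta:
            best = score
            stale = 0
        else:
            stale += 1
            if stale >= patience:
                break  # no recent improvement: stop the experiment
    return best, iterations


# Hypothetical primary-metric values, one per completed iteration.
history = [0.90, 0.93, 0.95, 0.95, 0.94, 0.95, 0.96]
best_score, iterations_run = run_with_early_stopping(history, patience=3)
print(best_score, iterations_run)
```

With this history the run stops after the sixth iteration, so the final value of 0.96 is never evaluated. That is the trade-off of early stopping: less compute in exchange for possibly missing a late improvement.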

8

Automated ML – metrics

9

Designer
Screenshots from Azure
Let’s dive in

10

DESIGNER
Authoring
Using components from the Authoring > Designer tab, created two
pipelines with two estimators each and one pipeline with a single
estimator for the feature importance component:
- Two-Class Logistic Regression and Two-Class Decision Forest,
- Two-Class Support Vector Machine and Two-Class Neural Network
(deep-learning model),
- Two-Class Boosted Decision Tree with the Cross Validate Model
component
Jobs / Metrics
After configuring and submitting the pipelines and the environment
images, a job was created. The overview of the job, as well as its
outputs, logs, child jobs and metrics, is presented in the
following snapshots.
Registered Models
The best-performing models from Automated ML and Designer
(Neural Network and LightGBM) were registered as custom and
MLflow models.
Real-time Endpoint
A blue/green deployment of the two best models was made, and the
blue deployment was tested for inference by invoking the endpoint.
Compute targets
A compute instance for profiling the data asset and compute
clusters for training models and running pipeline sweep jobs
were created.
Environments
Custom and curated environments were used for deploying and testing.

11

DESIGNER
Two-Class Logistic Regression and Two-Class Decision Forest pipeline

12

DESIGNER
Two-Class Support Vector Machine and Two-Class Neural Network pipeline

13

DESIGNER
Two-Class Boosted Decision Tree with Feature Importance component and Cross Validate Model component
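As a rough illustration of what cross-validation does, the sketch below splits row indices into k folds so that every row is used for validation exactly once. It is a plain-Python sketch of the general technique, not the Designer component's implementation; the fold count and the function name `k_fold_indices` are assumptions.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_idx, valid_idx) pairs for k contiguous folds.

    Fold sizes differ by at most one when n_samples is not divisible by k.
    """
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0)
                  for i in range(k)]
    start = 0
    for size in fold_sizes:
        stop = start + size
        valid = list(range(start, stop))
        train = [i for i in range(n_samples) if i < start or i >= stop]
        yield train, valid
        start = stop


# Hypothetical example: 10 rows, 3 folds.
folds = list(k_fold_indices(10, 3))
for train_idx, valid_idx in folds:
    print("train:", train_idx, "valid:", valid_idx)
```

A model is trained on each `train` part and scored on the matching `valid` part; the per-fold metrics are then averaged to judge how stable the model is.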

14

Designer - Feature importance
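One common way to compute feature importance is permutation importance: shuffle one feature column at a time and measure how much the metric drops. The sketch below assumes that technique for illustration; the slide does not state which method the component used. The toy model, rows and seed are hypothetical.

```python
import random


def accuracy(model, rows, labels):
    """Fraction of rows for which the model predicts the correct label."""
    return sum(model(r) == y for r, y in zip(rows, labels)) / len(labels)


def permutation_importance(model, rows, labels, n_features, seed=0):
    """Return, per feature, the accuracy drop after shuffling that column."""
    rng = random.Random(seed)
    baseline = accuracy(model, rows, labels)
    importances = []
    for j in range(n_features):
        column = [r[j] for r in rows]
        rng.shuffle(column)
        # Rebuild the rows with only column j replaced by shuffled values.
        permuted = [r[:j] + [v] + r[j + 1:] for r, v in zip(rows, column)]
        importances.append(baseline - accuracy(model, permuted, labels))
    return importances


def toy_model(row):
    """Hypothetical classifier: uses feature 0 only and ignores feature 1."""
    return 1 if row[0] > 3 else 0


rows = [[1, 7], [2, 3], [4, 9], [5, 1], [2, 8], [5, 2]]
labels = [toy_model(r) for r in rows]
importances = permutation_importance(toy_model, rows, labels, n_features=2)
print(importances)
```

Because the toy model never reads feature 1, shuffling it cannot change any prediction and its importance is exactly zero.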

15

JOBS – list of all experiments

16

JOBS – overview of Designer pipeline metrics – Random Forest Classifier,
best-performing model
Note: no snapshots of the successful pipeline runs

17

JOBS – overview of Designer pipeline metrics – Two-Class Neural Network model
Note: no snapshots of the successful pipeline runs

18

JOBS – overview of Designer pipeline metrics – Two-Class Logistic Regression,
lowest-performing model
Note: no snapshots of the successful pipeline runs

19

Registered models

20

Deployment of models

21

Real-time Endpoint
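In a blue/green setup, one endpoint fronts two deployments and a traffic map decides what share of requests each one receives. The sketch below simulates that routing in plain Python. It is a conceptual illustration, not Azure's routing code; the traffic percentages and the function name `pick_deployment` are assumptions.

```python
import random


def pick_deployment(traffic, rng):
    """Pick a deployment name with probability proportional to its traffic share."""
    names = list(traffic)
    weights = [traffic[name] for name in names]
    return rng.choices(names, weights=weights, k=1)[0]


rng = random.Random(42)

# All traffic on blue: green is deployed but receives no live requests yet.
all_blue = {"blue": 100, "green": 0}
routed = [pick_deployment(all_blue, rng) for _ in range(1000)]
print(set(routed))

# Hypothetical gradual rollout: shift a small share of requests to green.
canary = {"blue": 90, "green": 10}
sample = [pick_deployment(canary, rng) for _ in range(1000)]
print(sample.count("blue"), sample.count("green"))
```

Keeping green at zero traffic lets it be tested by addressing it directly before any live requests are shifted to it.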

22

Custom environment for model
deployed to the endpoint

23

Predicting / Invoking the Endpoint
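Invoking a real-time endpoint comes down to an authenticated HTTP POST with a JSON body. The sketch below only builds such a request and does not send it. It assumes the `input_data` schema used by MLflow model deployments; a custom scoring script may expect a different body. The URL, key, column names and values are placeholders, not taken from the deck.

```python
import json
import urllib.request

# Placeholders: a real scoring URI and key come from the endpoint's details page.
scoring_uri = "https://example.invalid/score"
api_key = "<endpoint-key>"

# Hypothetical feature columns; the real dataset schema is not shown in the deck.
payload = {
    "input_data": {
        "columns": ["Age", "Flight Distance"],
        "index": [0],
        "data": [[35, 1200]],
    }
}

request = urllib.request.Request(
    scoring_uri,
    data=json.dumps(payload).encode("utf-8"),
    headers={
        "Content-Type": "application/json",
        "Authorization": f"Bearer {api_key}",
        # Routes the call to one specific deployment instead of the traffic split.
        "azureml-model-deployment": "blue",
    },
    method="POST",
)

# urllib.request.urlopen(request) would send the call; it is left out here
# because the URL above is a placeholder.
print(request.get_method(), request.full_url)
```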

24

Environments – custom and curated environments for deploying and testing

25

Compute targets

26

Notebooks
Screenshots from Azure
Let’s dive in

27

Notebooks – created a pipeline for training and scoring a
RandomForestClassifier model and tuning its hyperparameters
with a sweep job
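Conceptually, a sweep job runs one trial per hyperparameter combination and keeps the trial with the best primary metric. The sketch below shows that loop with grid sampling in plain Python. The search space, the sampling method and the stand-in `evaluate` function are assumptions; in the real pipeline each trial would train a RandomForestClassifier and log its metric.

```python
import itertools

# Hypothetical search space over two RandomForestClassifier hyperparameters.
search_space = {
    "n_estimators": [50, 100, 200],
    "max_depth": [4, 8],
}


def evaluate(params):
    """Stand-in for one training trial; returns a made-up score.

    A real trial would fit the model with `params` and return a validation metric.
    """
    return (1.0
            - abs(params["n_estimators"] - 100) / 1000
            - abs(params["max_depth"] - 8) / 100)


def grid_sweep(space, objective):
    """Evaluate every combination in the space and return (best_score, best_params)."""
    names = list(space)
    trials = []
    for values in itertools.product(*(space[name] for name in names)):
        params = dict(zip(names, values))
        trials.append((objective(params), params))
    return max(trials, key=lambda trial: trial[0])


best_score, best_params = grid_sweep(search_space, evaluate)
print(best_score, best_params)
```

Each combination corresponds to one child job of the sweep, and the best trial's parameters are the ones reported as the best model.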

28

Notebooks – running hyperparameter tuning with a
sweep job

29

Notebooks – results from pipeline – child jobs

30

Notebooks – results from pipeline – best model

31

Notebooks – results from pipeline – model metrics

32

Notebooks – results from pipeline – predicting

33

Summary
- No-code or programmatically: whichever suits you best
- Great business solution: all resources in one place
- New subscription: for future projects
- Getting work done: finished my course project
- So many services yet to discover: Azure Databricks, Azure Synapse
Analytics (for data ingestion), Azure AI Services,
Azure Data Factory, etc.
- Recommend: definitely!

34

Closing
Thank you for your time.
I would welcome your feedback
to help me improve.
Sanela Nikodinoska
snikodinoska@gmail.com
