Azure Machine Learning for Data Scientists
Sergii Baidachnyi
Principal Software Engineer
Microsoft
sbaydach@microsoft.com
@sbaidachni
Looking back
Offering
Platform for emerging data scientists to graphically build and deploy experiments
Key Value Props
• Rapid experiment composition
• More than 100 easily configured modules for data prep, training, and evaluation
• Extensibility through R & Python
• Serverless training and deployment
Numbers
• Hundreds of thousands of deployed models serving billions of requests
Azure Machine Learning Studio
Azure Batch AI
Infrastructure Can Get in Your Way
Clusters
• Provision GPUs
• Install drivers and software
• Interactive use
Scheduling
• Queue work
• Prioritize jobs
• Start MPI
• Monitor
• Handle failures
Data
• Scale access to training data
• Output logs & models
• Secure & compliant
Cost
• Scale up and down
• Share reserved instances
• Low priority
Workflow
• Choose efficient hardware
• Tooling integration
• Laptop to cloud
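The scheduling concerns listed on this slide (queue work, prioritize jobs, handle failures) can be illustrated with a minimal Python sketch. This is not how any Azure service is implemented: `JobQueue`, `submit`, `run_all`, and `max_retries` are hypothetical names chosen for illustration, and jobs run in-process instead of on a cluster.

```python
import heapq
from itertools import count


class JobQueue:
    """Toy scheduler: a lower priority number runs first; failed jobs are retried."""

    def __init__(self, max_retries=2):
        self._heap = []
        self._order = count()  # tie-breaker: keeps submission order within a priority
        self.max_retries = max_retries

    def submit(self, name, fn, priority=10, attempt=0):
        # The unique counter guarantees tuple comparison never reaches `fn`.
        heapq.heappush(self._heap, (priority, next(self._order), name, fn, attempt))

    def run_all(self):
        results, failed = {}, []
        while self._heap:
            priority, _, name, fn, attempt = heapq.heappop(self._heap)
            try:
                results[name] = fn()
            except Exception:
                if attempt < self.max_retries:
                    # Re-queue the job at the same priority for another attempt.
                    self.submit(name, fn, priority, attempt + 1)
                else:
                    failed.append(name)
        return results, failed
```

A real cluster scheduler also has to place jobs on nodes, start MPI processes, and monitor them, which this sketch leaves out.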
• Managed service
• Supports role-based access control
• Run any toolkit (CNTK, TensorFlow, Caffe/Caffe2, Chainer, Keras, …)
• Run experiments in parallel
• Run in containers or directly on a VM
• Supports various shared file systems
• Load-based automatic scaling
• Pay only for storage and compute; the service itself is free
Azure Batch AI Service
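Load-based automatic scaling can be pictured as a rule that maps pending work to a node count within configured bounds. The sketch below is illustrative only: the slide does not describe the service's actual scaling formula, and `target_node_count` with all of its parameters are hypothetical names.

```python
import math


def target_node_count(queued_jobs, running_jobs, jobs_per_node=1,
                      min_nodes=0, max_nodes=4):
    """Return how many nodes a cluster should have for the current load.

    Assumed rule (not taken from the slides): one node handles
    `jobs_per_node` jobs, and the result is clamped to [min_nodes, max_nodes].
    """
    demand = queued_jobs + running_jobs
    needed = math.ceil(demand / jobs_per_node)
    return max(min_nodes, min(max_nodes, needed))
```

Setting `min_nodes=0` lets an idle cluster shrink to nothing, which matches the cost model on the slide where only compute and storage are billed.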
Azure Databricks
Databricks Spark as a managed service on Azure
IaaS and PaaS Big Data Analytics
(Diagram: offerings arranged along a spectrum from CONTROL to EASE OF USE, labelled "Reduced Administration".)
Big data analytics
• IaaS Clusters: Azure Marketplace (HDP | CDH | MapR). Any Hadoop technology, any distribution
• Managed Clusters: Azure HDInsight. Workload optimized, managed clusters
• Big Data as-a-service: Azure Data Lake Analytics. Data engineering in a job-as-a-service model
• Azure Databricks: frictionless & optimized Spark clusters
Big data storage
• Azure Data Lake Store
• Azure Storage
Azure Databricks
Microsoft Azure
Azure Databricks
(Diagram of the platform; labels as shown on the slide.)
• Optimized Databricks Runtime Engine: Apache Spark, Databricks I/O, Serverless
• Collaborative Workspace: data engineer, data scientist, business analyst
• Deploy Production Jobs & Workflows: multi-stage pipelines, job scheduler, notification & logs
• Data sources: cloud storage, data warehouses, Hadoop storage, IoT / streaming data
• Outputs: REST APIs, machine learning models, BI tools, data exports, data warehouses
• Enhance productivity
• Build on secure & trusted cloud
• Scale without limits
Azure Databricks Cluster Architecture
(Diagram with two sides: Databricks’ Azure Account and User’s Azure Account.)
• Services shown: Webapp, Cluster Manager, Jobs, FileSystem Service, Spark History Server, Azure DB for PostgreSQL
• Azure Compute nodes shown: Spark Driver, Spark Worker (×2)
• Log Daemon (×2)
Azure Databricks Core Artifacts
Azure Databricks
Azure Machine Learning
Experimentation and Management
The AI Development lifecycle
(Diagram; labels as shown on the slide.)
• Data sources: Social, LOB, Graph, IoT, Image, CRM
• Stages: INGEST, STORE, PREP & TRAIN, MODEL & SERVE
• Data orchestration and monitoring
• Data lake and storage
• Hadoop/Spark/SQL and ML
• Azure Machine Learning
• Apps + insights
Local machine
Scale up to DSVM
Scale out with Spark on HDInsight
Azure Batch AI (Coming Soon)
ML Server (Coming Soon)
Experiment Anywhere
AZURE ML
EXPERIMENTATION
Command line tools
IDEs
Notebooks in Workbench
VS Code Tools for AI
Transparent Compute
Demo
Experimentation Service
DOCKER
Single node deployment
(cloud/on-prem)
Azure Container Service
Azure IoT Edge
Microsoft ML Server
Spark clusters
SQL Server (Coming Soon)
Deploy Everywhere
AZURE ML
MODEL MANAGEMENT
Model Management
Machine Learning Server
R Server Overview
• Enhances open source R to scale to big data
• Embraces combined open source and commercial innovations
• Allows customers to get the support they trust
• Microsoft innovations:
  • RevoScaleR
  • Parallelized, distributed algorithms
  • Microsoft Machine Learning
  • Fast and deep learning
  • Pretrained models
  • Custom parallel frameworks
ML Services Version 9.2 at a glance
(Diagram rows: Platforms & Data, Tools, Languages, Algorithms, Data Sources.)
• Tools: Rattle, mrsdeploy, visualization tool integration
• Operationalization: RESTful API deployment, real-time scoring, in-database deployment
• Data sources: .csv, Microsoft .XDF, ODBC/JDBC
• Distributed parallelized algorithms: RevoScaleR and RevoScalePy libraries, MicrosoftML library, custom parallelization frameworks
• Open source R algorithms & visualizations: CRAN, Bioconductor
• Plus: deep learning, pretrained models, prebuilt featurizers
Looking forward
Data Science lifecycle
•Primary stages:
Lifecycle
TDSP objective
Integrate DevOps with data science workflows to improve collaboration, quality, robustness and efficiency in data science projects
o Infrastructure as Code (IaC)
o Building
o Testing
o CI / CD
o …
o App performance monitoring
TDSP documentation: https://aka.ms/tdsp
Using TDSP within Azure Machine Learning
Questions?
sbaydach@microsoft.com
@sbaidachni

Sergii Baidachnyi ITEM 2018
