This session provides an analysis of the enterprise machine learning market. The analysis covers vendors, platforms, and best practices that companies should consider when implementing data science solutions at enterprise scale.
Lessons from Large-Scale Cloud Software at Databricks - Matei Zaharia
1) Building cloud software presents unique challenges compared to on-premise software, such as the need for faster release cycles, upgrades without regressions, and multitenancy.
2) Scaling issues are a major cause of outages for cloud systems, including problems reaching resource limits and insufficient isolation between users.
3) Testing cloud systems requires evaluating how they scale and how they handle varying loads; failures can indicate problems along dimensions like output size or number of tasks.
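The idea of probing a system along a scale dimension until it hits a resource limit can be sketched in a few lines. This is an illustrative toy, not Databricks' test harness; `process_batch` and its task limit are hypothetical stand-ins for a real workload.

```python
# Sketch: probe a system along one scale dimension (input size) and record
# where it starts failing. process_batch is a hypothetical workload that
# enforces a hard task limit, the kind of limit cloud tests should find.
def process_batch(records, max_tasks=1000):
    """Toy workload that fails once the number of tasks exceeds a limit."""
    if len(records) > max_tasks:
        raise RuntimeError("task limit exceeded")
    return [r * 2 for r in records]

def probe_scaling(sizes):
    """Run the workload at increasing sizes; report success or failure."""
    results = {}
    for n in sizes:
        try:
            process_batch(list(range(n)))
            results[n] = "ok"
        except RuntimeError as exc:
            results[n] = f"failed: {exc}"
    return results

report = probe_scaling([10, 100, 1000, 10_000])
```

A real test suite would probe several dimensions independently (output size, task count, concurrent users) rather than a single input-size axis.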
1) Databricks provides a machine learning platform for MLOps that includes tools for data ingestion, model training, runtime environments, and monitoring.
2) It offers a collaborative data science workspace for data engineers, data scientists, and ML engineers to work together on projects using notebooks.
3) The platform provides end-to-end governance for machine learning including experiment tracking, reproducibility, and model governance.
Big Data - in the cloud or rather on-premises? - Guido Schmutz
You want to implement a Big Data/IoT solution and would like to know whether it should be implemented in the cloud or on-premises. You are interested in the cloud offerings of various vendors, the benefits they provide, and whether a similar solution would be possible on-premises.
This presentation deals with these and other questions. Starting from a vendor-independent reference architecture and corresponding design patterns, different cloud solutions from various vendors are compared and rated. Additionally, it shows how such a solution could be implemented on-premises and what a hybrid Big Data/IoT solution could look like.
- Cloud computing is important for big data applications as it provides variable expense, elastic capacity, and global reach. Amazon Web Services provides data storage, processing, and analytics services across a global network of regions and availability zones.
- Amazon Redshift is a fully managed data warehouse service that allows for fast queries on petabytes of structured data using standard SQL. It uses a columnar data storage format and data compression techniques to improve performance and reduce costs.
- Amazon EMR allows users to easily run Hadoop frameworks like Hive and Pig on AWS without having to manage hardware. It provides a scalable and cost-effective way to process vast amounts of unstructured data in Amazon S3.
- Amazon Kinesis enables real-time processing of streaming data at scale.
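The columnar-storage advantage that Redshift exploits can be illustrated with a small stdlib-only sketch. The data and layout here are toys, not Redshift's actual on-disk format; the point is only that storing a column's values together removes repeated field names and places similar values adjacently, which compresses well.

```python
import json
import zlib

# Toy dataset: 1,000 records with a repeated region and small sales values.
rows = [{"region": "us-east", "sales": 100 + i % 7} for i in range(1000)]

# Row-oriented layout: every record repeats every column name.
row_bytes = json.dumps(rows).encode()

# Column-oriented layout: each column's values stored together,
# column names appear exactly once.
columns = {name: [r[name] for r in rows] for name in rows[0]}
col_bytes = json.dumps(columns).encode()

# Adjacent similar values also compress effectively.
col_compressed = zlib.compress(col_bytes)
```

Analytic queries that touch only one or two columns also benefit because the engine can skip reading the other columns entirely.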
Fundamentals Big Data and AI Architecture - Guido Schmutz
The right architecture is key for any IT project. This is especially the case for big data projects, where there are no standard architectures which have proven their suitability over years. This session discusses the different Big Data Architectures which have evolved over time, including traditional Big Data Architecture, Streaming Analytics architecture as well as Lambda and Kappa architecture and presents the mapping of components from both Open Source as well as the Oracle stack onto these architectures.
The right architecture is key for any IT project. This holds for big data projects as well, yet there are not many standard architectures that have proven their suitability over the years.
This session discusses different Big Data Architectures which have evolved over time, including traditional Big Data Architecture, Event Driven architecture as well as Lambda and Kappa architecture.
Each architecture is presented in a vendor- and technology-independent way using a standard architecture blueprint. In a second step, these architecture blueprints are used to show how a given architecture can support certain use cases and which popular open source technologies can help to implement a solution based on a given architecture.
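The Lambda architecture mentioned above can be sketched in a technology-independent way: a batch layer that recomputes complete views from the master dataset, and a speed layer that maintains incremental views over recent events. This is an illustrative sketch, not from the session's blueprints.

```python
from collections import defaultdict

# Illustrative event stream: (event_type, count) pairs.
events = [("page_view", 1), ("click", 1), ("page_view", 1),
          ("click", 1), ("page_view", 1)]

def batch_layer(all_events):
    """Recompute complete, accurate counts from the full master dataset."""
    counts = defaultdict(int)
    for key, value in all_events:
        counts[key] += value
    return dict(counts)

class SpeedLayer:
    """Incrementally update low-latency counts as each event arrives."""
    def __init__(self):
        self.counts = defaultdict(int)

    def ingest(self, key, value):
        self.counts[key] += value

speed = SpeedLayer()
for key, value in events:
    speed.ingest(key, value)

# Serving layer: the batch view is authoritative; the speed layer
# covers events that arrived since the last batch run.
batch_view = batch_layer(events)
```

The Kappa architecture simplifies this by dropping the batch layer and reprocessing history through the same streaming path.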
Modern business is fast and needs to make decisions immediately. It cannot wait for a traditional BI task that works on data snapshots taken at some point in time. Social data, the Internet of Things, and just-in-time processes don't understand "snapshots"; they need to work on streaming, live data. Microsoft offers a PaaS solution to satisfy this need with Azure Stream Analytics. Let's see how it works.
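The kind of continuous query Azure Stream Analytics runs, such as a tumbling-window average, can be approximated in plain Python. This is illustrative only, not the actual SAQL engine; the telemetry tuples are invented.

```python
from collections import defaultdict

# Illustrative telemetry: (timestamp_seconds, sensor_id, reading).
stream = [(0, "s1", 10), (2, "s1", 14), (5, "s2", 7),
          (7, "s1", 12), (11, "s2", 9)]

def tumbling_window_avg(events, window_seconds):
    """Average readings per sensor in fixed, non-overlapping time windows,
    similar in spirit to SAQL's TumblingWindow."""
    buckets = defaultdict(list)
    for ts, sensor, value in events:
        window_start = (ts // window_seconds) * window_seconds
        buckets[(window_start, sensor)].append(value)
    return {k: sum(v) / len(v) for k, v in buckets.items()}

averages = tumbling_window_avg(stream, window_seconds=5)
```

In the real service, the equivalent would be a SAQL `GROUP BY TumblingWindow(second, 5), sensor_id` query running continuously over an event hub input.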
The Event Mesh: real-time, event-driven, responsive APIs and beyond - Solace
Phil Scanlon, Head of Technology in Asia Pacific & Japan for Solace, describes "The Event Mesh" at API Days Melbourne in September 2018. Scanlon explains the complexities of the Event Mesh using the evolution to event-driven, the anatomy of an event, and real world examples.
Migrating Your Data Platform At a High Growth Startup - Databricks
Before migrating their data platform from AWS EMR and notebooks to Databricks, Abnormal Security conducted a successful two-week proof of concept. They are now migrating jobs, ranked by cost, to Databricks' configuration framework over the first quarter, aiming to reduce costs by 50% while improving usability, lowering operational overhead, and gaining the ability to scale. Consolidating the platform into a single environment on Databricks will let them build their first data lakehouse and support additional use cases as the company grows rapidly.
Mainframe Modernization with Precisely and Microsoft Azure - Precisely
Today’s businesses are leveraging Microsoft Azure to modernize operations, transform customer experience, and increase profit. However, if the rich data generated by the mainframe applications is missed in the move to the cloud, you miss the mark.
Without the right solutions in place, migrating mainframe data to Microsoft Azure is expensive, time-consuming, and reliant on highly specialized skillsets. Precisely Connect can quickly integrate mainframe data at scale into Microsoft Azure without sacrificing functionality, security, or ease of use.
View this on-demand webinar to hear from Microsoft Azure and Precisely data integration experts. You will:
- Learn how to build highly scalable, reliable data pipelines between the mainframe and Microsoft Azure services
- Understand how to make your Microsoft Azure implementation ready for mainframe
- Dive into case studies of businesses that have successfully included mainframe data in their cloud modernization efforts with Precisely and Microsoft Azure
The Power Platform consists of four main components: Power Apps, Power Automate, Power BI, and Power Virtual Agents. It allows users to build custom apps, automate workflows, analyze data, and create virtual agents. Data can be connected through over 275 available connectors. Triggers prompt automated flows to begin, while actions allow interaction with data sources. Power Apps is a low-code/no-code platform to build apps that work with data from various sources. Features include AI capabilities and model-driven apps. Case studies show how organizations like Heathrow Airport have used Power Apps to empower workers. The document provides demonstrations of building a basic app, using functions, and sharing an app with other users or groups.
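The trigger/action model described above can be sketched with a hand-rolled flow class. This is a conceptual toy, not the Power Automate runtime; the flow engine, the "submitted" status trigger, and the notification action are all invented for illustration.

```python
# Minimal trigger/action pipeline in the spirit of Power Automate:
# a trigger predicate decides whether the flow starts, and a chain of
# actions then interacts with the event data.
class Flow:
    def __init__(self, trigger):
        self.trigger = trigger        # predicate that starts the flow
        self.actions = []             # ordered steps run on the event

    def then(self, action):
        self.actions.append(action)
        return self                   # allow chaining .then(...).then(...)

    def run(self, event):
        if not self.trigger(event):
            return None               # trigger not fired: flow never starts
        result = event
        for action in self.actions:
            result = action(result)
        return result

# Trigger: a new item with status "submitted".
# Actions: enrich with an approver, then produce a notification message.
flow = (Flow(lambda e: e.get("status") == "submitted")
        .then(lambda e: {**e, "approver": "manager@example.com"})
        .then(lambda e: f"notify {e['approver']} about {e['id']}"))

outcome = flow.run({"id": 42, "status": "submitted"})
```

In the real service, the trigger would be one of the 275+ connectors (e.g., "when an item is created"), and each action would call a connector operation instead of a lambda.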
Data Lake and the rise of the microservices - Bigstep
By simply looking at structured and unstructured data, Data Lakes enable companies to understand correlations between existing and new external data - such as social media - in ways traditional Business Intelligence tools cannot.
For this, you need to find the most efficient way to store and access structured or unstructured petabyte-sized data across your entire infrastructure.
In this meetup we'll answer the following questions:
1. Why would someone use a Data Lake?
2. Is it hard to build a Data Lake?
3. What are the main features that a Data Lake should bring in?
4. What’s the role of the microservices in the big data world?
Azure architecture design patterns - proven solutions to common challenges - Ivo Andreev
Building reliable, scalable, secure applications can happen either by following verified design patterns or the hard way, through trial and error. Azure architecture patterns are tested and accepted solutions to common challenges, reducing technical risk to the project by avoiding new and untested designs. Moreover, most of the patterns are relevant to any distributed system, whether hosted on Azure or on other cloud platforms.
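One of the simplest patterns in that catalog, Retry with exponential backoff for transient faults, can be sketched as follows. The `flaky_call` target is a hypothetical stand-in for a remote dependency; this is a sketch of the pattern, not any Azure SDK's built-in retry policy.

```python
import time

# Retry pattern sketch: re-invoke an operation that may fail transiently,
# backing off exponentially between attempts.
def retry(operation, attempts=3, base_delay=0.01):
    for attempt in range(attempts):
        try:
            return operation()
        except ConnectionError:
            if attempt == attempts - 1:
                raise                 # out of attempts: surface the fault
            time.sleep(base_delay * (2 ** attempt))

# Hypothetical dependency that fails twice, then succeeds.
calls = {"count": 0}
def flaky_call():
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionError("transient fault")
    return "ok"

result = retry(flaky_call)
```

Production implementations typically add jitter to the delay and retry only on errors known to be transient, so permanent failures fail fast.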
The Basics of Getting Started With Microsoft Azure - Microsoft Azure
The document describes various capabilities provided by Microsoft Azure including hosting virtual machines and web applications, mobile backend services, cloud services, storage options, SQL databases, media services, integration services, identity and access management, virtual networking, and infrastructure as a service. It provides details on virtual machine sizes, disks, networking, security, backups, and cross-premise connectivity in Azure.
Easy Analytics on AWS with Amazon Redshift, Amazon QuickSight, and Amazon Mac... - Amazon Web Services
AWS has a large and growing portfolio of big data management and analytics services, designed to be integrated into solution architectures that meet the needs of your business. In this session, we look at analytics through the eyes of a business intelligence analyst, a data scientist, and an application developer, and we explore how to quickly leverage Amazon Redshift, Amazon QuickSight, RStudio, and Amazon Machine Learning to create powerful, yet straightforward, business solutions.
This document contains an agenda for a presentation on Azure Stream Analytics. The agenda includes topics such as analytics in a modern world, why developers are interested in analytics, why use the cloud for analytics, an introduction to Azure Stream Analytics, the Azure Stream Analytics architecture, the Stream Analytics Query Language (SAQL), handling time in Azure Stream Analytics, scaling analytics, and conclusions. The document also includes speaker information and notes on various topics from the agenda.
Value Journal, a monthly news journal from Redington Value Distribution, intends to update the channel on the latest vendor news and Redington Value’s Channel Initiatives.
Key stories from the July Edition:
• VMware introduces integrations with Dell EMC to accelerate workforce transformation
• HPE unveils new high-performance computing solution
• Weathering the cybersecurity storm - Hishamul Hasheel VP, Software and Security, Redington Gulf - Value Distribution
• AWS Greengrass is now available
• CyberArk acquires Conjur for $42 Million
• Fidelis Cybersecurity releases endpoint cloud solution
• Fortinet Threat Landscape report highlights most threats are opportunistic
• Fujitsu rolls out new biometric authentication solution
• Trend Micro launches $100 Million venture fund
• Malwarebytes unveils inaugural channel programme
• Red Hat Ceph storage release expands versatility as object store
• Veeam and Microsoft extend alliance to deliver availability for always-on cloud
Introducing Apache Kafka and why it is important to Oracle, Java and IT profe... - Lucas Jellema
Events are playing an increasingly important role in modern application architecture. They represent fast, streaming data, they fuel the interaction between microservices, they are at the core of CQRS and event sourcing. Apache Kafka has quickly emerged as the de facto standard event platform: open source, cross technology, reliable and extremely scalable and available on any platform, in Docker and from the major cloud platforms- including Oracle Cloud’s Event Hub service. This session explains the what, why and how of Apache Kafka. What role does it play, how is it used and what are challenges and tricks for real life applications. How does it fit in with Oracle Database and Fusion Middleware and with Oracle Public Cloud? In several demos, Kafka is seen at work - in real time streaming event analysis through KSQL, in CQRS and microservices scenarios and with user interfaces updated in real time through events and HTML5 server sent events.
This presentation includes a demonstration of remote database synchronization through Twitter.
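Kafka's core abstractions, partitioned append-only logs with consumer-tracked offsets, can be illustrated with a toy in-memory model. No real Kafka client is involved; the `Topic` class and its methods are invented for illustration.

```python
# Toy log-structured topic illustrating Kafka's key ideas: messages with
# the same key land in the same partition (preserving their order), and
# consumers read from an offset they track themselves.
class Topic:
    def __init__(self, partitions=3):
        self.partitions = [[] for _ in range(partitions)]

    def produce(self, key, value):
        """Hash the key to a partition and append; return (partition, offset)."""
        p = hash(key) % len(self.partitions)
        self.partitions[p].append(value)
        return p, len(self.partitions[p]) - 1

    def consume(self, partition, offset):
        """Read everything from a given offset onward; nothing is deleted,
        so multiple consumers can replay the same log independently."""
        return self.partitions[partition][offset:]

topic = Topic()
p, _ = topic.produce("order-1", "created")
topic.produce("order-1", "paid")   # same key -> same partition, order kept
events = topic.consume(p, 0)
```

Real Kafka adds durability, replication, and consumer groups on top of exactly this log-plus-offsets model.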
Modernizing your Application Architecture with Microservices - confluent
Organizations are quickly adopting microservice architectures to achieve better customer service and improve user experience while limiting downtime and data loss. However, transitioning from a monolithic architecture based on stateful databases to truly stateless microservices can be challenging and requires the right set of solutions.
In this webinar, learn from field experts as they discuss how to convert the data locked in traditional databases into event streams using HVR and Apache Kafka®. They will show you how to implement these solutions through a real-world demo use case of microservice adoption.
You will learn:
-How log-based change data capture (CDC) converts database tables into event streams
-How Kafka serves as the central nervous system for microservices
-How the transition to microservices can be realized without throwing away your legacy infrastructure
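The first point, turning table state into an event stream, can be sketched by diffing two table snapshots. Real log-based CDC tools such as HVR read the database transaction log rather than comparing snapshots, so treat this purely as a conceptual illustration.

```python
# Sketch of change data capture: derive an ordered stream of
# insert/update/delete events from two snapshots of a keyed table.
def capture_changes(before, after):
    events = []
    for key, row in after.items():
        if key not in before:
            events.append({"op": "insert", "key": key, "row": row})
        elif before[key] != row:
            events.append({"op": "update", "key": key, "row": row})
    for key in before:
        if key not in after:
            events.append({"op": "delete", "key": key})
    return events

before = {1: {"name": "Ada"}, 2: {"name": "Grace"}}
after = {1: {"name": "Ada L."}, 3: {"name": "Edsger"}}
stream = capture_changes(before, after)
```

Each emitted event could then be published to a Kafka topic keyed by the row's primary key, so downstream microservices consume the table as a stream.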
AI & Data Analytics 2018 - Azure Databricks for data scientists - Alberto Diaz Martin
This document summarizes a presentation given by Alberto Diaz Martin on Azure Databricks for data scientists. The presentation covered how Databricks can be used for infrastructure management, data exploration and visualization at scale, reducing time to value through model iterations and integrating various ML tools. It also discussed challenges for data scientists and how Databricks addresses them through features like notebooks, frameworks, and optimized infrastructure for deep learning. Demo sections showed EDA, ML pipelines, model export, and deep learning modeling capabilities in Databricks.
Introduction to Machine learning and Deep Learning - Nishan Aryal
Overview of machine learning and deep learning, with a brief introduction to different types of BI and reporting tools such as Power BI, SSMS, Cortana, Azure ML, TensorFlow, and others.
Slides from my talk at Big Data Conference 2018 in Vilnius
Doing data science today is far more difficult than it will be in the next 5-10 years. Sharing and collaborating on data science workflows is painful, and pushing models into production is challenging.
Let’s explore what Azure provides to ease Data Scientists’ pains. What tools and services can we choose based on a problem definition, skillset or infrastructure requirements?
In this talk, you will learn about Azure Machine Learning Studio, Azure Databricks, Data Science Virtual Machines and Cognitive Services, with all the perks and limitations.
IncQuery Labs provides cloud-based modeling solutions to enable tool integration in model-based systems engineering (MBSE). Their IncQuery tool suite includes a desktop query authoring tool and backend server that allows running complex queries on large models. IncQuery was used to develop an interoperability platform for Airbus that automates workflows involving transformations between modeling tools and generates reports through a web interface.
Google AutoML, AWS SageMaker and other ML tools automate some but not all steps in machine learning workflows. Learn about problem formulation, data engineering, monitoring, and fairness assessment.
IncQuery Server for Teamwork Cloud - Talk at IW2019 - Istvan Rath
IncQuery Server provides scalable query evaluation over collaborative model repositories. It uses a hybrid database technology that is 10-100x faster than conventional databases and supports large models and complex queries. IncQuery Server integrates with MagicDraw and Teamwork Cloud to enable version control, access control, and customizable queries for model validation and impact analysis.
Getting to 1.5M Ads/sec: How DataXu manages Big Data - Qubole
DataXu sits at the heart of the all-digital world, providing a data platform that manages tens of millions of dollars of digital advertising investments from Global 500 brands. The DataXu data platform evaluates 1.5 million online ad opportunities every second for our customers, allowing them to manage and optimize their marketing investments across all digital channels. DataXu employs a wide range of AWS services: CloudFront, CloudTrail, CloudWatch, Data Pipeline, Direct Connect, DynamoDB, EC2, EMR, Glacier, IAM, Kinesis, RDS, Redshift, Route 53, S3, SNS, SQS, and VPC to run various workloads at scale for the DataXu data platform.
In addition, DataXu uses Qubole Data Service (QDS) to offer a unified analytics interface to DataXu customers. Qubole, a member of the AWS Partner Network (APN), provides self-managing big data infrastructure in the cloud that leverages spot pricing for cost efficiency, delivers fast performance, and, most importantly, offers a streamlined user interface for ease of use.
Attendees will learn how Qubole's self-managing Hadoop clusters in the AWS Cloud accelerated DataXu's batch-oriented analysis jobs, and how Qubole's integration with Amazon Redshift enabled DataXu to perform low-latency, interactive analysis. Further, the session looks at how DataXu opened up QDS access to its customers through the QDS user interface, providing them with a single tool for both batch-oriented and interactive analysis. Using the QDS user interface, buyers of the DataXu data service can perform all manner of analysis against the data stored in their AWS S3 buckets.
Speakers:
Scott Ward
Solutions Architect at Amazon Web Services
Ashish Dubey
Solutions Architect at Qubole
Yekesa Kosuru
VP Engineering at DataXu
The breadth and depth of Azure products that fall under the AI and ML umbrella can be difficult to follow. In this presentation I'll first define exactly what AI, ML, and deep learning are, and then go over the various Microsoft AI and ML products and their use cases.
This presentation covers an overview of analytics and machine learning, as well as Microsoft's contributions in the machine learning space, including Azure ML Studio, a SaaS-based portal to create, experiment with, and share machine learning solutions with the external world.
In this session we delve into the world of Azure Databricks and analyze why it is becoming a fundamental tool for data scientists and data engineers, in conjunction with Azure services.
Machine learning operations (MLOps) brings data science into the world of DevOps. Data scientists create models on their workstations; MLOps adds automation, validation, and monitoring to any environment, including machine learning on Kubernetes. In this session you will hear about the latest developments and see them in action.
Automated machine learning (automated ML) automates feature engineering, algorithm and hyperparameter selection to find the best model for your data. The mission: Enable automated building of machine learning with the goal of accelerating, democratizing and scaling AI. This presentation covers some recent announcements of technologies related to Automated ML, and especially for Azure. The demonstrations focus on Python with Azure ML Service and Azure Databricks.
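The core of automated ML, searching over algorithms and hyperparameters to find the best-scoring model, can be sketched with a hand-rolled grid search. The search space and the scoring function below are invented stand-ins; the real Azure service performs actual cross-validated training behind a similar loop.

```python
import itertools
import random

random.seed(0)  # deterministic for the illustration

# Hypothetical search space over algorithm choice and one hyperparameter.
search_space = {
    "algorithm": ["logistic_regression", "random_forest"],
    "regularization": [0.01, 0.1, 1.0],
}

def evaluate(config):
    """Stand-in for train-plus-validate; returns a mock accuracy score."""
    base = 0.80 if config["algorithm"] == "random_forest" else 0.75
    return base + 0.05 / (1 + config["regularization"]) + random.uniform(0, 0.01)

# Enumerate every combination in the grid and keep the best-scoring one.
best = max(
    (dict(zip(search_space, values))
     for values in itertools.product(*search_space.values())),
    key=evaluate,
)
```

Automated ML systems replace this exhaustive grid with smarter strategies (Bayesian optimization, early stopping) and add automated featurization, but the select-by-validation-score loop is the same.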
A Collaborative Data Science Development Workflow - Databricks
Collaborative data science workflows have several moving parts, and many organizations struggle with developing an efficient and scalable process. Our solution consists of data scientists individually building and testing Kedro pipelines and measuring performance using MLflow tracking. Once a strong solution is created, the candidate pipeline is trained on cloud-agnostic, GPU-enabled containers. If this pipeline is production worthy, the resulting model is served to a production application through MLflow.
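The experiment-tracking step in that workflow can be sketched with a minimal in-memory tracker. The `Tracker` class here is invented for illustration; real MLflow tracking uses `mlflow.start_run`, `mlflow.log_param`, and `mlflow.log_metric` against a tracking server.

```python
import time
import uuid

# Minimal stand-in for experiment tracking: record each run's parameters
# and metrics, then query for the best run by a chosen metric.
class Tracker:
    def __init__(self):
        self.runs = []

    def log_run(self, params, metrics):
        run = {"id": str(uuid.uuid4()), "time": time.time(),
               "params": params, "metrics": metrics}
        self.runs.append(run)
        return run["id"]

    def best_run(self, metric):
        return max(self.runs, key=lambda r: r["metrics"][metric])

tracker = Tracker()
tracker.log_run({"lr": 0.1}, {"f1": 0.81})
tracker.log_run({"lr": 0.01}, {"f1": 0.86})
winner = tracker.best_run("f1")
```

Recording every candidate pipeline this way is what makes the "promote the strongest candidate to training on GPU containers" step reproducible.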
Microsoft Azure BI Solutions in the Cloud - Mark Kromer
This document provides an overview of several Microsoft Azure cloud data and analytics services:
- Azure Data Factory is a data integration service that can move and transform data between cloud and on-premises data stores as part of scheduled or event-driven workflows.
- Azure SQL Data Warehouse is a cloud data warehouse that provides elastic scaling for large BI and analytics workloads. It can scale compute resources on demand.
- Azure Machine Learning enables building, training, and deploying machine learning models and creating APIs for predictive analytics.
- Power BI provides interactive reports, visualizations, and dashboards that can combine multiple datasets and be embedded in applications.
This document discusses cloud-native data and patterns for managing data in microservices architectures. It describes using data services and APIs to interface with existing data sources. Patterns like caching data at the edge with various caching strategies are discussed. The document also covers using multiple small databases with each microservice rather than a shared database. Event sourcing and CQRS patterns are presented as ways to integrate data across services. Finally, the impact on roles like database administrators is considered in cloud-native data environments.
(November 2017; updated from earlier presentations on cloud-native data)
Cloud-native applications form the foundation for modern, cloud-scale digital solutions, and the patterns and practices for cloud-native at the app tier are becoming widely understood – statelessness, service discovery, circuit breakers and more. But little has changed in the data tier. Our modern apps are often connected to monolithic shared databases that have monolithic practices wrapped around them. As a result, the autonomy promised by moving to a microservices application architecture is compromised.
What we need are patterns and practices for cloud-native data. The anti-patterns of shared databases and simple proxy-style web services to front them give way to approaches that include use of caches (Netflix calls caching their hidden microservice), database per service and polyglot persistence, modern versions of ETL and data integration and more. In this session, aimed at the application developer/architect, Cornelia will look at those patterns and see how they serve the needs of the cloud-native application.
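One of the patterns named above, caching in front of a data service, is commonly implemented as cache-aside: check the cache, fall back to the store on a miss, and populate the cache for the next reader. The dictionaries below are toy stand-ins for a real cache and database.

```python
# Cache-aside sketch: the "database" and "cache" dicts stand in for a
# real data store and a distributed cache.
database = {"user:1": {"name": "Ada"}}
cache = {}
stats = {"hits": 0, "misses": 0}

def get(key):
    if key in cache:
        stats["hits"] += 1
        return cache[key]           # fast path: served from cache
    stats["misses"] += 1
    value = database.get(key)       # slow path: hit the backing store
    if value is not None:
        cache[key] = value          # populate so the next read is a hit
    return value

first = get("user:1")               # miss, loads from the database
second = get("user:1")              # hit, served from the cache
```

Production versions add an expiry (TTL) and invalidation on writes, since a stale cache is the classic failure mode of this pattern.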
Microsoft Data Platform Airlift 2017: Machine Learning with SQL S... - Rui Quintino
The document discusses machine learning with SQL Server 2016 and R Services. It provides an overview of machine learning, R programming language, and the challenges of using R with SQL databases prior to SQL Server 2016. SQL Server 2016 introduces R Services, which allows running R code directly in the database for high performance, scalable machine learning. R Services integrates R with SQL Server through in-database deployment and parallel processing capabilities. This eliminates data movement and scaling issues while leveraging existing R and SQL skills.
Deploying ML models in production, with or without CI/CD, is significantly more complicated than deploying traditional applications. That is mainly because ML models do not just consist of the code used for their training, but they also depend on the data they are trained on and on the supporting code. Monitoring ML models also adds additional complexity beyond what is usually done for traditional applications. This talk will cover these problems and best practices for solving them, with special focus on how it's done on the Databricks platform.
1. DevOps and machine learning can be combined through the use of Azure Machine Learning pipelines. Pipelines allow the creation of workflows for data preparation, model training, and model deployment.
2. Azure Machine Learning pipelines support unattended runs, reusability, and tracking of experiments. They can integrate with data sources, compute targets, and model management.
3. Continuous integration and delivery practices like source control, code quality testing, and controlled deployments can be applied to machine learning models through the use of Azure Pipelines and Azure Machine Learning services. This allows models to be deployed and updated reliably in production environments.
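A controlled deployment for a model usually hinges on a quality gate: promote the candidate only if it beats the production model on held-out data. The metric name and threshold below are illustrative, not a prescribed Azure Pipelines configuration.

```python
# Quality-gate sketch for a model CI/CD pipeline: the candidate must
# improve on production accuracy by at least min_gain to be promoted.
def should_promote(candidate_metrics, production_metrics, min_gain=0.01):
    return (candidate_metrics["accuracy"]
            >= production_metrics["accuracy"] + min_gain)

prod = {"accuracy": 0.90}
good_candidate = {"accuracy": 0.93}
weak_candidate = {"accuracy": 0.905}   # better, but under the threshold

decisions = (should_promote(good_candidate, prod),
             should_promote(weak_candidate, prod))
```

In a pipeline, this check would run as a release-gate step after evaluation, blocking deployment when it returns False.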
This document discusses data science and machine learning concepts and tools. It introduces the IBM Data Science Experience (DSX) and Watson Machine Learning (WML) products, which provide environments for data scientists and developers to build machine learning models. DSX offers notebooks, IDEs and collaboration tools, while WML focuses on visual model creation, access to algorithms, full ML workflows and APIs. It then demonstrates these products.
Similar to "A practical guidance of the enterprise machine learning" (20)
DeFi is moving towards more granular "micro-primitives" that break down protocols into smaller, modular units. Examples include Uniswap v4 hooks, EigenLayer marketplaces, and Flashbots' decomposed MEV roles. Micro-primitives enable composability but increase complexity and the attack surface. While they can enrich functionality, there is a risk of further fragmenting DeFi without capturing value through applications.
This presentation presents an overview of the challenges and opportunities of generative artificial intelligence in Web3. It includes a brief research history of generative AI as well as some of its immediate applications in Web3.
Maximal extractable value(MEV) is one of the most debated topics in crypto. This session discusses some of the technical architectures, opportunities and challenges that MEV traders and developers should explore.
This session explores the unique aspects of quantitative trading strategies applied to cryptocurrencies. The session covers topics such as challenges of crypto quant strategies, DeFi and many others.
Yield farming and liquidity mining have been at the core of the recent boom of DeFi protocols. From a trading perspective, yield-generating strategies are producing incredibly attractive returns compared to similar strategies in traditional capital markets. How do you build yield-generating DeFi strategies that correctly balance risk and reward?
This session discusses the new world of DeFi quant yield-generating strategies. We discuss key building blocks required to implement intelligent DeFi quant strategies in an institutional-grade manner. The session will discuss how to think about elements such as risk quantification, back testing , simulations , protocol interactions and many others in the context of DeFi yield-generating strategies.
This session presents some ideas, lessons learned, and techniques used to build high-frequency trading strategies in decentralized finance (DeFi). The deck describes some key practical tips that can help quants build HFT strategies for the new world of DeFi.
Simple DeFi Analytics Any Crypto-Investor Should Know About - Jesus Rodriguez
This session provides an overview of basic indicators that will help traders and investors better understand DeFi protocols. The session covers unique analytics and visualizations that reveal fascinating insights into the top DeFi projects in the market.
This session provides an overview of analytics for decentralized finance(DeFi) protocols. The session also outlines some ideas about the future of market intelligence and DeFi.
DeFi Trading Strategies: Opportunities and Challenges - Jesus Rodriguez
This deck discusses some ideas about trading opportunities in the DeFi ecosystem as well as the challenges and risks. The content presents a conceptual framework for thinking about DeFi quant strategies.
This presentation outlines some of the key principles for building deep learning predictive models for crypto assets. The deck includes best practices and lessons learned that provide perspective on the challenges and solutions of using deep learning models in the crypto space.
Better Technical Analysis with Blockchain Indicators - Jesus Rodriguez
The document discusses how technical analysis of cryptocurrency assets can be improved by incorporating blockchain indicators. It provides examples of how traditional technical analysis indicators like Fibonacci retracement levels, exponential moving averages, and Bollinger bands can be reinforced with complementary blockchain data on in-out money flows, exchange flows, unspent transaction output analysis, and active addresses. By combining on-chain behavioral data with price-based technical analysis, traders may gain a more robust view of market trends and investor sentiment. The document concludes that technical analysis patterns can inform blockchain indicators and vice versa, representing a promising new approach to cryptocurrency market evaluation.
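Two of the price-based indicators the deck proposes pairing with on-chain data, the exponential moving average and Bollinger bands, can be computed with the standard library alone. The price series is invented sample data.

```python
import statistics

# Illustrative closing prices for one asset.
prices = [100, 102, 101, 105, 107, 106, 110, 108, 111, 115]

def ema(series, span):
    """Exponential moving average with smoothing factor 2 / (span + 1)."""
    alpha = 2 / (span + 1)
    value = series[0]
    for price in series[1:]:
        value = alpha * price + (1 - alpha) * value
    return value

def bollinger(series, k=2):
    """Middle band (mean) plus/minus k population standard deviations."""
    middle = statistics.mean(series)
    width = k * statistics.pstdev(series)
    return middle - width, middle, middle + width

latest_ema = ema(prices, span=5)
lower, mid, upper = bollinger(prices)
```

The deck's thesis is that signals like a price touching the upper band become more trustworthy when a complementary on-chain series (exchange inflows, active addresses) moves in agreement.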
This slide deck details some of the lessons we learned building price prediction models for cryptocurrencies. The session provides examples and practical tips about the challenges of price predictions in crypto asset markets.
Fascinating Metrics and Analytics About Cryptocurrencies - Jesus Rodriguez
This document discusses the need for a new approach to analyzing cryptocurrency assets using data science. It argues that cryptocurrencies generate far more behavioral data than traditional assets through their public ledgers. This rich blockchain data can provide insights into metrics like the number of traders profiting from each asset, how long investors hold assets, geographic trading patterns, and concentration among large holders. The document presents examples of data analyses for various cryptocurrencies that could help monitor market trends, predict price movements, and identify risks around exchanges. In conclusion, it advocates applying data science to simplify cryptocurrency analysis and unlock insights from their unique blockchain datasets.
Price Predictions for Crypto-Assets Using Deep Learning - Jesus Rodriguez
This slide deck provides an overview of the universe of prediction techniques applied to cryptocurrencies. The content covers emerging prediction models in the deep learning field and how they apply to crypto-assets.
Demystifying Centralized Crypto Exchanges using Data Science - Jesus Rodriguez
Centralized exchanges are one of the most obscure and difficult to understand elements in the crypto landscape. From fake volumes to transaction transformations, centralized exchanges introduce a level of obfuscation that challenges even the most sophisticated analytic techniques. How can we learn to identify and understand the behavior of centralized crypto exchanges?
This session showcases a series of machine learning and data visualization techniques that help us better understand some of the patterns of crypto exchanges. Using gorgeous data visualizations, we will walk you through a journey that clearly illustrates how exchanges process transactions and distribute crypto-assets across their different addresses. Finally, we will illustrate how certain behaviors of crypto exchanges become relevant to specific patterns in the crypto market.
This session provides an outline of data science techniques for crypto-assets. The content introduces the notion of crypto-asset fundamental analysis and highlights some shocking data about crypto-assets.
Implementing Machine Learning in the Real World - Jesus Rodriguez
This document outlines 15 lessons learned from building large-scale machine learning systems in the real world. Some key challenges discussed include data scientists not being well-suited for engineering work, traditional development methodologies not working for machine learning, the difficulty of data labeling and feature extraction, and the complexities of training, executing, operationalizing, and securing machine learning models at scale. The document provides ideas to address these challenges such as establishing separate data science and engineering teams, implementing automated data labeling strategies, leveraging centralized feature stores, and adopting techniques like transfer learning and continual learning.
In this session, we explored setting up Playwright, an end-to-end testing tool for simulating browser interactions and running TestBox tests. Participants learned to configure Playwright for applications, simulate user interactions to stress-test forms, and handle scenarios like taking screenshots, recording sessions, capturing Chrome dev tools traces, testing login failures, and managing broken JavaScript. The session also covered using Playwright with non-ColdBox sites, providing practical insights into enhancing testing capabilities.
Break data silos with real-time connectivity using Confluent Cloud Connectorsconfluent
Connectors integrate Apache Kafka® with external data systems, enabling you to move away from a brittle spaghetti architecture to one that is more streamlined, secure, and future-proof. However, if your team still spends multiple dev cycles building and managing connectors using just open source Kafka Connect, it’s time to consider a faster and cost-effective alternative.
In this session, we discussed the critical need for comprehensive backups across all aspects of our industry—from code and databases to webservers, file servers, and network configurations. Emphasizing the importance of proactive measures, attendees were urged to ensure their backup systems were tested through restoration processes. The session underscored the risk of discovering backup issues only during crises, highlighting the necessity of verifying backup integrity through restoration tests.
In this session, we explored how the cbfs module empowers developers to abstract and manage file systems seamlessly across their lifecycle. From local development to S3 deployment and customized media providers requiring authentication, cbfs offers flexible solutions. We discussed how cbfs simplifies file handling with enhanced workflow efficiency compared to native methods, along with practical tips to accelerate complex file operations in your projects.
CommandBox was highlighted as a powerful web hosting solution, perfect for developers and businesses alike. Featuring a built-in server and command-line interface, CommandBox simplified web application management. Developers could deploy multiple application instances simultaneously, optimizing development workflows. CommandBox's efficient deployment processes ensured reliable web hosting, seamlessly integrating into existing workflows for scalability and feature enhancements.
Non-Functional Testing Guide_ Exploring Its Types, Importance and Tools.pdfkalichargn70th171
Are you looking for ways to ensure your software development projects are successful? Non-functional testing is an essential part of the process, helping to guarantee that applications and systems meet the necessary non-functional requirements such as availability, scalability, security, and usability.
What is OCR Technology and How to Extract Text from Any Image for FreeTwisterTools
Discover the fascinating world of Optical Character Recognition (OCR) technology with our comprehensive presentation. Learn how OCR converts various types of documents, such as scanned paper documents, PDFs, or images captured by a digital camera, into editable and searchable data. Dive into the history, modern applications, and future trends of OCR technology. Get step-by-step instructions on how to extract text from any image online for free using a simple tool, along with best practices for OCR image preparation. Ideal for professionals, students, and tech enthusiasts looking to harness the power of OCR.
Alluxio Webinar | 10x Faster Trino Queries on Your Data PlatformAlluxio, Inc.
Alluxio Webinar
June. 18, 2024
For more Alluxio Events: https://www.alluxio.io/events/
Speaker:
- Jianjian Xie (Staff Software Engineer, Alluxio)
As Trino users increasingly rely on cloud object storage for retrieving data, speed and cloud cost have become major challenges. The separation of compute and storage creates latency challenges when querying datasets; scanning data between storage and compute tiers becomes I/O bound. On the other hand, cloud API costs related to GET/LIST operations and cross-region data transfer add up quickly.
The newly introduced Trino file system cache by Alluxio aims to overcome the above challenges. In this session, Jianjian will dive into Trino data caching strategies, the latest test results, and discuss the multi-level caching architecture. This architecture makes Trino 10x faster for data lakes of any scale, from GB to EB.
What you will learn:
- Challenges relating to the speed and costs of running Trino in the cloud
- The new Trino file system cache feature overview, including the latest development status and test results
- A multi-level cache framework for maximized speed, including Trino file system cache and Alluxio distributed cache
- Real-world cases, including a large online payment firm and a top ridesharing company
- The future roadmap of Trino file system cache and Trino-Alluxio integration
Explore the latest in ColdBox Debugger v4.2.0, featuring the Hyper Collector for HTTP/S request tracking, Lucee SQL Collector for query profiling, and Heap Dump Support for memory leak debugging. Enhancements like the revamped Request Dock and improved SQL/JSON formatting streamline debugging for optimal ColdBox application performance and stability. Ideal for developers familiar with ColdBox, this session focuses on leveraging advanced debugging tools to enhance development efficiency.
Participants explored how visual and functional coherence strengthened brand identity and streamlined development in this session. They learned to maintain consistency across platforms and enhance user experiences using Design Systems. Ideal for brand designers, UI/UX designers, developers, and product managers who sought to optimize efficiency and ensure consistency across projects.
Discover BoxLang, the innovative JVM programming language developed by Ortus Solutions. Designed to harness the power of the Java Virtual Machine, BoxLang offers a modern approach to application development with robust performance and scalability. Join us as we explore the capabilities of BoxLang, its syntax, and how it enhances productivity in software development.
2. About Us
• Helping great companies become great software companies
• Building software solutions powered by disruptive enterprise software trends
-Machine learning and data science
-Cyber-security
-Enterprise IoT
-Powered by Cloud and Mobile
• Bringing innovation from startups and academic institutions to the enterprise
• Award-winning: Inc 500, American Business Awards, International Business Awards
3. About This Webinar
• Research that brings together big enterprise software trends,
exciting startups and academic research
• Best practices based on real world implementation experience
• No sales pitches
8. Modern Machine Learning
• Advances in storage, compute and data science research are making machine learning part of mainstream technology platforms
• The big data movement
• Machine learning platforms are optimized with developer-friendly interfaces
• Platform-as-a-service providers have drastically lowered the entry barrier for machine learning applications
• R and Python are leading the charge
10. Cloud Machine Learning Platforms: Benefits
• Service abstraction layer over the machine learning infrastructure
• Rich visual modeling tools
• Rich monitoring and tracking interfaces
• Combine multiple platforms: R, Python, etc.
• Enable programmatic access to ML models
11. Cloud Machine Learning Platforms: Challenges
• Integration with on-premise data stores
• Extensibility
• Security and privacy
12. On-Premise Machine Learning Platforms: Benefits
• Control
• Security
• Integration with on-premise data stores
• Integrated with R and Python machine learning frameworks
13. On-Premise Machine Learning Platforms: Challenges
• Code-based modeling interfaces
• Scalability
• Tightly coupled with Hadoop distributions
• Monitoring and management
• Data quality and curation
17. Azure Machine Learning
• Native machine learning capabilities as part of the Azure cloud
• Elastic infrastructure that scales based on the model requirements
• Supports over 30 supervised and unsupervised machine learning algorithms
• Integration with R and Python machine learning libraries
• Exposes machine learning models via programmable interfaces
• Integrated with the Cortana Analytics suite
• Integrated with Power BI
18. • Supports both supervised and unsupervised models
• Integrated with Azure HDInsight
• Large library of models and sample gallery
• Support for R and Python code
Visual Model Creation
19. • Visual dashboard to track the execution of ML models
• Track the execution of the different steps within an ML model
• Integrated monitoring experience with other Azure services
Rich Monitoring and Management Interface
20. • Expose machine learning models as Web Services APIs
• Integrate ML models with Azure API Gateway
• Retrain and extend models via ML APIs
Programmatic Access to ML Models
22. AWS Machine Learning
• Native machine learning service in AWS
• Provides data exploration and visualization tools
• Supports supervised and unsupervised algorithms
• Integrated data transformation models
• APIs for dynamically creating machine learning models
23. • Programmatic creation of machine learning models
• Large number of algorithms and recipes
• Data transformation models included in the language
Sophisticated ML Model Authoring
24. • Sophisticated monitoring for evaluating ML models
• Integrated with AWS CloudWatch
• KPIs that evaluate the efficiency of ML models
Monitoring ML Model Execution
25. • Optimized DSL for data transformation
• Recipes that abstract common transformations
• Reuse transformation recipes across ML models
Embedded Data Transformation
28. Databricks Machine Learning
• Scaling Spark machine learning pipelines
• Integrated data visualization tools
• Sophisticated ML monitoring tools
• Combine Python, Scala and R in a single platform
29. • Implementing machine learning models using notebooks
• Publishing notebooks to a centralized catalog
• Leverage Python, Scala or R to implement machine learning models
Notebook-Based Authoring
30. • Integrate data visualization into machine learning pipelines
• Reuse data visualization notebooks across applications
• Evaluate the efficiency of machine learning pipelines using visualizations
Machine Learning Data Visualization
31. • Monitor the execution of machine learning pipelines
• Run machine learning pipelines manually
• Rapidly modify and deploy machine learning pipelines
Monitoring and Management
33. • Personality Insights
• Tradeoff Analytics
• Relationship Extraction
• Concept Insights
• Speech to Text
• Text to Speech
• Visual Recognition
• Natural Language Classifier
• Language Identification
• Language Translation
• Question and Answer
• Concept Expansion
• Message Resonance
• AlchemyAPI Services
Large Variety of Cognitive Services
34. • Access services via REST APIs
• SDKs available for different languages
• Integration with different services in the Bluemix platform
Rich Developer Interfaces
40. All of Open Source R plus:
• Big Data scalability
• High-performance analytics
• Development and deployment tools
• Data source connectivity
• Application integration framework
• Multi-platform architecture
• Support, Training and Services
Revolution Analytics (Microsoft)
41. Write Once, Deploy Anywhere
Components: DistributedR, ScaleR, ConnectR, DeployR
• In the Cloud: Amazon AWS
• Workstations & Servers: Windows, Red Hat and SUSE Linux
• Clustered Systems: IBM Platform LSF, Microsoft HPC
• EDW: IBM Netezza, Teradata
• Hadoop: Hortonworks, Cloudera
42. DeployR does not provide any application UI. Three integration modes embed real-time R results into existing interfaces: web app, mobile app, desktop app, BI tool, Excel, …
• RBroker Framework: simple, high-performance API for Java, .NET and JavaScript apps; supports transactional, on-demand analytics on a stateless R session
• Client Libraries: flexible control of R services from Java, .NET and JavaScript apps; also supports stateful R integrations (e.g. complex GUIs)
• DeployR Web Services API: integrate R using almost any client language
Integrate R Scripts Into Third-Party Applications
44. • Built on Apache Spark, a fast and general engine for large-scale data processing
• Runs programs up to 100x faster than Hadoop MapReduce in memory, or 10x faster on disk
• Write applications quickly in Java, Scala, or Python
Spark MLlib
45. • Integrated with Spark SQL for data queries and transformations
• Integrated with Spark GraphX for graph processing
• Integrated with Spark Streaming for real-time data processing
Beyond Machine Learning
46. • Run R and machine learning models using the same infrastructure
• Leverage R scripts from Spark MLlib models
• Scale R models as part of a Spark cluster
• Execute R models programmatically using Java APIs
Spark MLlib + SparkR
48. • Makes Python machine learning enterprise-ready
• GraphLab Create
• Dato Distributed
• Dato Predictive Services
Dato
51. Principles:
• Get started fast
• Rapidly iterate
• Combine for new apps
import graphlab as gl
data = gl.SFrame.read_csv('my_data.csv')
model = gl.recommender.create(data,
                              user_id='user',
                              item_id='movie',
                              target='rating')
recommendations = model.recommend(k=5)
Toolkits: Recommender, Image Search, Sentiment Analysis, Data Matching, Auto Tagging, Churn Predictor, Click Prediction, Product Sentiment, Object Detector, Search Ranking, Summarization, …
Sophisticated ML made easy - Toolkits
53. • Powers deep learning capabilities in dozens of Google's products
• Interfaces for modeling machine and deep learning algorithms
• Platform for executing those algorithms
• Scales from mobile devices to clusters with thousands of nodes
• Became one of the most popular projects on GitHub in less than a week
Google's TensorFlow
54. • Based on the principle of a dataflow graph
• Nodes can perform data operations but also send or receive data
• Python and C++ libraries; Node.js, Go and others in the pipeline
TensorFlow Programming Model
# Assumes x, y_, y_conv, keep_prob, sess and the MNIST dataset (mnist)
# are defined earlier in the notebook (TensorFlow 0.x-era API).
cross_entropy = -tf.reduce_sum(y_ * tf.log(y_conv))
train_step = tf.train.AdamOptimizer(1e-4).minimize(cross_entropy)
correct_prediction = tf.equal(tf.argmax(y_conv, 1), tf.argmax(y_, 1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, "float"))
sess.run(tf.initialize_all_variables())
for i in range(20000):
    batch = mnist.train.next_batch(50)
    if i % 100 == 0:
        train_accuracy = accuracy.eval(feed_dict={
            x: batch[0], y_: batch[1], keep_prob: 1.0})
        print("step %d, training accuracy %g" % (i, train_accuracy))
    train_step.run(feed_dict={x: batch[0], y_: batch[1], keep_prob: 0.5})
print("test accuracy %g" % accuracy.eval(feed_dict={
    x: mnist.test.images, y_: mnist.test.labels, keep_prob: 1.0}))
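The dataflow-graph idea behind this programming model can be illustrated with a minimal pure-Python sketch. This is an illustrative toy engine only, not how TensorFlow is actually implemented; all names here are invented for the example.

```python
# Toy dataflow graph: each node holds an operation and its input edges;
# evaluation walks the graph recursively, caching intermediate results.
class Node:
    def __init__(self, op, *inputs):
        self.op = op          # callable combining the input values
        self.inputs = inputs  # upstream Node objects

def const(value):
    # A source node with no inputs that always yields `value`.
    return Node(lambda: value)

def evaluate(node, cache=None):
    # Evaluate a node once, reusing cached results for shared subgraphs.
    if cache is None:
        cache = {}
    if node not in cache:
        args = [evaluate(n, cache) for n in node.inputs]
        cache[node] = node.op(*args)
    return cache[node]

# Build the graph (x + y) * 2, then run it.
x = const(3.0)
y = const(4.0)
s = Node(lambda a, b: a + b, x, y)
out = Node(lambda a: a * 2, s)
print(evaluate(out))  # 14.0
```

Separating graph construction from execution, as above, is what lets a real engine place nodes on different devices before running anything.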
55. • Scales from a single device to a large cluster of nodes
• TensorFlow uses a heuristics-based placement algorithm to assign tasks to the different nodes in a graph
• The execution engine reassigns tasks for fault tolerance
• Linear scalability model
TensorFlow Implementation
56. • TensorFlow includes an engine that enables the visual representation of the execution graph
• Visualizations include summary statistics of the different states of the model
• The visualization engine is included in the current open source release
TensorFlow Graph Visualization
59. • Enable foundational building blocks:
-Data quality
-Data discovery
-Functional and integration testing
• Predictions are tempting, but classification and clustering are easier
• Run multiple models at once
• Enable programmatic interfaces to interact with ML models
• Start small, deliver quickly, iterate…
Machine Learning in the Enterprise
60. • Machine learning is becoming one of the most important elements of modern enterprise solutions
• Innovation in machine learning is happening in both the on-premise and cloud spaces
• Cloud machine learning innovators include Azure ML, AWS ML, Databricks and IBM Watson
• On-premise machine learning innovators include Spark MLlib, Microsoft's Revolution R, Dato and TensorFlow
• Enterprise machine learning solutions should include elements such as data quality and data governance
• Start small and use real use cases
Summary
63. • Extensions to SciPy (Scientific Python) are called SciKits; SciKit-Learn provides machine learning algorithms
• Algorithms for supervised & unsupervised learning
• Built on SciPy and NumPy
• Standard Python API interface
• Sits on top of C libraries: LAPACK, LibSVM, and Cython
• Open source: BSD license
• Probably the best general ML framework out there
Scikit-Learn
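A quick illustration of the uniform scikit-learn API described above (the dataset and model here are just example choices; every estimator follows the same fit/predict pattern):

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Load a small bundled dataset and hold out a test split.
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)

# The standard estimator interface: construct, fit, predict.
model = LogisticRegression(max_iter=500)
model.fit(X_train, y_train)
preds = model.predict(X_test)
print("accuracy: %.2f" % accuracy_score(y_test, preds))
```

Swapping in a different algorithm means changing only the constructor line, which is much of what makes the framework so approachable.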
64. Very Simple Prediction Model
Raw Data → Load & Transform Data → Feature Extraction → Build Model → Feature Evaluation → Evaluate Model
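The stages above can be sketched end-to-end in plain Python. This is a minimal, library-free sketch; the nearest-centroid "model" and the toy data are invented for illustration only.

```python
# Raw data: (feature_vector, label) pairs standing in for loaded records.
raw = [([1.0, 1.2], "a"), ([0.9, 1.1], "a"),
       ([3.0, 3.2], "b"), ([3.1, 2.9], "b")]

def extract_features(record):
    # Feature extraction: here we just pass the numeric vector through.
    return record[0]

def build_model(data):
    # "Training": compute one centroid per class label.
    grouped = {}
    for features, label in data:
        grouped.setdefault(label, []).append(features)
    return {label: [sum(col) / len(col) for col in zip(*vecs)]
            for label, vecs in grouped.items()}

def predict(model, features):
    # Predict the class whose centroid is nearest (squared Euclidean).
    return min(model, key=lambda label: sum(
        (f - c) ** 2 for f, c in zip(features, model[label])))

def evaluate(model, data):
    # Model evaluation: fraction of correct predictions.
    hits = sum(predict(model, f) == label for f, label in data)
    return hits / len(data)

examples = [(extract_features(r), r[1]) for r in raw]
model = build_model(examples)
print(evaluate(model, examples))  # 1.0 on this toy training set
```

Each function maps to one box in the slide's diagram, which is the point: even a "very simple" model benefits from keeping the stages separable.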
65. Assess how the model will generalize to an independent data set (i.e. data not in the training set):
1. Divide data into training and test splits
2. Fit model on training, predict on test
3. Determine accuracy, precision and recall
4. Repeat k times with different splits, then average the scores (e.g. report mean F1)

              Predicted Class A   Predicted Class B
Actual A      True A              False B             #A
Actual B      False A             True B              #B
              #P(A)               #P(B)               total

Simple Programming Model - Cross Validation (classification)
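The procedure above can be sketched in plain Python. This is illustrative only; the threshold classifier and toy data are invented, and the confusion-matrix counts fed to the metric helper are arbitrary example numbers.

```python
import random

def kfold_accuracy(data, train_and_predict, k=5, seed=0):
    """Shuffle, split into k folds, fit on k-1, score on the held-out fold."""
    data = data[:]
    random.Random(seed).shuffle(data)
    folds = [data[i::k] for i in range(k)]
    scores = []
    for i in range(k):
        test = folds[i]
        train = [d for j, fold in enumerate(folds) if j != i for d in fold]
        preds = train_and_predict(train, [x for x, _ in test])
        scores.append(sum(p == y for p, (_, y) in zip(preds, test)) / len(test))
    return sum(scores) / k

def precision_recall_f1(tp, fp, fn):
    """Per-class scores from confusion-matrix counts (True A, False A, False B)."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

# Hypothetical classifier: a fixed threshold on the single feature
# (it ignores the training split, which keeps the sketch tiny).
def threshold_clf(train, xs):
    return ["A" if x > 0.5 else "B" for x in xs]

data = [(x / 10, "A" if x > 5 else "B") for x in range(10)]
print(kfold_accuracy(data, threshold_clf))   # 1.0
print(precision_recall_f1(tp=8, fp=2, fn=1))
```

Real frameworks add stratification so each fold preserves the class balance, but the split/fit/score loop is exactly this shape.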
66. How to evaluate clusters? Visualization (but only in 2D)
Data Visualization
68. • Developer-friendly machine learning platform
• Completely open source
• Based on Apache Spark
PredictionIO
69. • PredictionIO platform: a machine learning stack for building, evaluating and deploying engines with machine learning algorithms
• Event Server: an open source machine learning analytics layer for unifying events from multiple platforms
• Template Gallery: engine templates for different types of machine learning applications
A Simple Architecture
70. • Execute models asynchronously via the event interface
• Query data programmatically via the REST interface
• Various SDKs provided as part of the platform
Model Execution
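Querying a deployed engine via the REST interface typically means POSTing JSON to the engine server (by default PredictionIO serves queries at /queries.json on port 8000; the payload field names below are hypothetical and depend on the engine template):

```python
import json
from urllib import request

# Hypothetical recommender query: field names depend on the engine template.
payload = json.dumps({"user": "u1", "num": 5}).encode("utf-8")

req = request.Request(
    "http://localhost:8000/queries.json",  # default engine server endpoint
    data=payload,
    headers={"Content-Type": "application/json"})

# Uncomment when an engine server is actually running:
# with request.urlopen(req) as resp:
#     print(json.load(resp))
print(json.loads(payload.decode("utf-8")))
```

The SDKs mentioned above wrap exactly this kind of request, so the raw HTTP form is mostly useful for debugging or languages without an SDK.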
71. • Visual interface for model creation
• Integrated with a template gallery
• Ability to test and validate engines
Rich Model Creation Interface