Before we begin…
• Please mute your phone and turn off your video. There are over eighty people who don’t want to see or hear you chewing.
• If you have any suggestions for future topics that you would like this group to cover, please send them to Scott Shaw using Webex’s chat feature.
• We will send out the presentation deck after the meeting. Look for an announcement in the Meetup link for this meeting.
• If you have questions during the presentation, also send them to Scott Shaw using Webex’s chat feature. We will get to as many questions as we can.
Operationalizing Data Science
Adam Doyle
Daugherty Business Solutions
April 1, 2020
Thanks for coming out everyone!
April Fools!
Begin with the end in mind.
Pause in the middle to make sure that you can get to where you are going.
• What is the business intention that you are trying to achieve?
• Minimize Cost
• Maximize Return
• Minimize Risk
• Realize Opportunity
• Engage Stakeholders
• POC vs. production-ready and valued product
Identify your thesis.
Goal vs. intention
SMART goal
Refine to a question that can be answered with data science
Data science – predict, explain, evaluate
Decision science – combination of data science and data engineering
Acquire data.
Sources: Third-Party Data, Internal, API, Streaming
General – Amount, Access, Quality, Labeled?
Third Party
o Assess data quality (value range, adherence, representativeness)
o Data format (automatic vs. hand-generated; similar data from different partners can be vastly different)
o Governed (appropriate use – avoid reidentification, TTL, contractual terms, track access, renewals)
Internal
API (data size limits, unreliability, costs)
Streaming (CDC, device data, standardized?)
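The "assess data quality" bullet (value range, adherence) could look something like the sketch below. It is only an illustration: the `FieldRule` class, the `assess_quality` function, the field names, and the sample partner feed are all made up for this example and are not from the talk.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FieldRule:
    """Hypothetical rule for one field: expected type and allowed range."""
    expected_type: type
    minimum: Optional[float] = None
    maximum: Optional[float] = None


def assess_quality(records, rules):
    """Count missing, mistyped, and out-of-range values for each field."""
    report = {
        name: {"missing": 0, "wrong_type": 0, "out_of_range": 0}
        for name in rules
    }
    for record in records:
        for name, rule in rules.items():
            value = record.get(name)
            if value is None:
                report[name]["missing"] += 1
            elif not isinstance(value, rule.expected_type):
                report[name]["wrong_type"] += 1
            elif rule.minimum is not None and value < rule.minimum:
                report[name]["out_of_range"] += 1
            elif rule.maximum is not None and value > rule.maximum:
                report[name]["out_of_range"] += 1
    return report


# Made-up third-party feed with one bad value of each kind.
records = [
    {"age": 34, "state": "MO"},
    {"age": -5, "state": "IL"},    # outside the allowed range
    {"age": "41", "state": "MO"},  # string where an int is expected
    {"state": "KS"},               # age missing
]
rules = {"age": FieldRule(int, 0, 120), "state": FieldRule(str)}
report = assess_quality(records, rules)
print(report)
```

A report like this is a starting point for the conversation with the data partner; whether the data is representative still needs a separate check against the population you care about.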
Explore the data.
Data Exploration
Statistical
Relationships and Correlations
Profiling
Textual – Word, Stop Words, Bigram, Trigram
Clustering
Check in with SME
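For the textual part of exploration, a minimal sketch of word, bigram, and trigram counts after stop-word removal might look like this. The stop-word list and the two sample sentences are invented for illustration; a real project would likely use a fuller list and a proper tokenizer.

```python
import re
from collections import Counter

# Tiny illustrative stop-word list (an assumption, not a standard list).
STOP_WORDS = {"the", "a", "an", "and", "of", "to", "is", "in", "it"}


def tokenize(text):
    """Lowercase, split into words, and drop stop words."""
    words = re.findall(r"[a-z']+", text.lower())
    return [w for w in words if w not in STOP_WORDS]


def ngrams(tokens, n):
    """Return every run of n consecutive tokens as a tuple."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


docs = [
    "The model is late and the model is wrong",
    "Retrain the model in the morning",
]
tokens_per_doc = [tokenize(d) for d in docs]

# N-grams are built per document so they never span two documents.
word_counts = Counter(w for toks in tokens_per_doc for w in toks)
bigram_counts = Counter(g for toks in tokens_per_doc for g in ngrams(toks, 2))
trigram_counts = Counter(g for toks in tokens_per_doc for g in ngrams(toks, 3))

print(word_counts.most_common(3))
print(bigram_counts.most_common(3))
```

Note that removing stop words first means a bigram can join two words that were not adjacent in the original text. Whether that is desirable is a good question for the SME check-in.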
Cleanse data.
Every block of stone has a statue inside it, and it is the task of the sculptor to discover it.
Data profiling
Deduplication
Outliers
Filter
Imputation
Source Corrections
Data shaping
Sort
Project
Enrichment
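Three of the cleansing steps above (deduplication, outlier filtering, imputation) can be sketched with the standard library. The outlier rule here is a modified z-score based on the median absolute deviation with a 3.5 cutoff; that rule, the field names, and the sample rows are assumptions chosen for this example, not something the slides prescribe.

```python
import statistics


def deduplicate(rows, key):
    """Keep the first row seen for each value of `key`."""
    seen = set()
    unique = []
    for row in rows:
        if row[key] not in seen:
            seen.add(row[key])
            unique.append(row)
    return unique


def is_outlier(value, median, mad, threshold=3.5):
    """Modified z-score test; never flags anything when the MAD is zero."""
    if mad == 0:
        return False
    return abs(0.6745 * (value - median) / mad) > threshold


def impute_median(rows, field):
    """Replace missing values of `field` with the median of observed values."""
    median = statistics.median(r[field] for r in rows if r[field] is not None)
    return [{**r, field: median if r[field] is None else r[field]} for r in rows]


rows = [
    {"id": 1, "amount": 10.0},
    {"id": 2, "amount": 12.0},
    {"id": 2, "amount": 12.0},   # duplicate record
    {"id": 3, "amount": None},   # missing value
    {"id": 4, "amount": 11.0},
    {"id": 5, "amount": 13.0},
    {"id": 6, "amount": 500.0},  # likely outlier
]

deduped = deduplicate(rows, key="id")
observed = [r["amount"] for r in deduped if r["amount"] is not None]
median = statistics.median(observed)
mad = statistics.median(abs(v - median) for v in observed)

# Rows with missing values are kept here and filled in by the next step.
filtered = [
    r for r in deduped
    if r["amount"] is None or not is_outlier(r["amount"], median, mad)
]
cleaned = impute_median(filtered, "amount")
print(cleaned)
```

The order matters: imputing before filtering would let the outlier pull the fill value, and filtering before deduplicating would count the duplicated record twice.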
Create the model and features.
Type of Models (Supervised, Unsupervised, Reinforcement Learning, Neural Networks)
Feature Engineering (Transformations and Aggregations)
Encode Indicator Variables
Binning/Bucketing
Sparse Classes
Interaction Features
Extract Elements (e.g., Time)
Normalization
Feature Selection
Testing your features
Testing your model
Check in with SME
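Several of the feature engineering items above fit in one short sketch: indicator variables, binning, sparse-class grouping, an interaction feature, time extraction, and min-max normalization. All column names, bin edges, thresholds, and sample rows are hypothetical.

```python
from collections import Counter
from datetime import datetime


def one_hot(value, categories):
    """Encode a categorical value as 0/1 indicator variables."""
    return {f"is_{c}": int(value == c) for c in categories}


def bucket(value, edges):
    """Index of the first edge the value falls below, or len(edges) if none."""
    for i, edge in enumerate(edges):
        if value < edge:
            return i
    return len(edges)


def group_sparse(values, min_count, other="other"):
    """Fold classes seen fewer than `min_count` times into one class."""
    counts = Counter(values)
    return [v if counts[v] >= min_count else other for v in values]


def min_max(values):
    """Rescale values to the range 0..1 (all zeros if they are constant)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


rows = [
    {"plan": "basic", "age": 23, "spend": 20.0, "signup": "2020-01-15T09:30:00"},
    {"plan": "pro", "age": 41, "spend": 80.0, "signup": "2020-02-01T18:05:00"},
    {"plan": "basic", "age": 67, "spend": 50.0, "signup": "2020-03-20T23:45:00"},
    {"plan": "trial", "age": 35, "spend": 20.0, "signup": "2020-03-28T07:10:00"},
]
AGE_EDGES = [30, 50, 65]  # hypothetical bin edges

plans = group_sparse([r["plan"] for r in rows], min_count=2)
categories = sorted(set(plans))
spend_scaled = min_max([r["spend"] for r in rows])

features = []
for row, plan, spend in zip(rows, plans, spend_scaled):
    f = one_hot(plan, categories)
    f["age_bucket"] = bucket(row["age"], AGE_EDGES)
    f["signup_hour"] = datetime.fromisoformat(row["signup"]).hour
    f["spend_scaled"] = spend
    f["age_x_spend"] = row["age"] * spend  # simple interaction feature
    features.append(f)

print(features[1])
```

In production the category list, bin edges, and min/max values would be learned on the training data and stored, so that the same encoding is applied to new records.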
Check in with Business
Does what you’ve created address the concerns of the business?
Batch vs. Real-time?
Batch training vs. real-time training
Batch evaluation vs. real-time evaluation
Evaluate the model.
Accuracy
Precision
Recall
MSE
Alignment to Business
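The four metrics on this slide can be computed directly from their definitions. The sketch below uses made-up labels and predictions; the function names are chosen for this example.

```python
def confusion_counts(actual, predicted, positive=1):
    """Return (true positives, false positives, false negatives, true negatives)."""
    tp = fp = fn = tn = 0
    for a, p in zip(actual, predicted):
        if p == positive and a == positive:
            tp += 1
        elif p == positive:
            fp += 1
        elif a == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def accuracy(tp, fp, fn, tn):
    """Share of all predictions that were correct."""
    return (tp + tn) / (tp + fp + fn + tn)


def precision(tp, fp):
    """Share of predicted positives that were actually positive."""
    return tp / (tp + fp) if (tp + fp) else 0.0


def recall(tp, fn):
    """Share of actual positives that the model found."""
    return tp / (tp + fn) if (tp + fn) else 0.0


def mean_squared_error(y_true, y_pred):
    """Average squared difference, for regression-style outputs."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


# Made-up classification results.
actual = [1, 0, 1, 1, 0, 0, 1, 0, 0, 0]
predicted = [1, 0, 0, 1, 0, 1, 1, 0, 1, 0]
tp, fp, fn, tn = confusion_counts(actual, predicted)

acc = accuracy(tp, fp, fn, tn)
prec = precision(tp, fp)
rec = recall(tp, fn)

# Made-up regression results.
mse = mean_squared_error([3.0, 5.0, 2.0, 6.0], [2.0, 5.0, 4.0, 7.0])
print(acc, prec, rec, mse)
```

Which number matters most depends on the business intention: a model that minimizes risk may favor recall, while one that minimizes cost may favor precision.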
Deploy the model.
Automation
Scaling
SLAs
Versioning
Data Pipelines
Ongoing Data Acquisition
Ongoing Data Cleaning
Ongoing Feature Encoding
Integration in application
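One way to read the "Versioning" and "Data Pipelines" bullets together is to run the same versioned cleaning and encoding steps for training data and for live requests. The sketch below assumes that reading; the `Pipeline` class, the step functions, the version string, and the stand-in scoring rule are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Tuple

Record = Dict[str, object]
Step = Callable[[Record], Record]


@dataclass(frozen=True)
class Pipeline:
    """An ordered list of record transformations with a version tag."""
    version: str
    steps: Tuple[Step, ...]

    def run(self, record: Record) -> Record:
        for step in self.steps:
            record = step(record)
        return record


def fill_missing_visits(record: Record) -> Record:
    """Ongoing data cleaning: treat a missing visit count as zero."""
    visits = record.get("visits")
    return {**record, "visits": 0 if visits is None else visits}


def add_is_frequent(record: Record) -> Record:
    """Ongoing feature encoding: flag records with five or more visits."""
    return {**record, "is_frequent": int(record["visits"] >= 5)}


pipeline = Pipeline(
    version="2020-04-01",
    steps=(fill_missing_visits, add_is_frequent),
)


def predict(record: Record) -> Record:
    """Serve one request; the fixed scores stand in for a trained model."""
    features = pipeline.run(record)
    score = 0.9 if features["is_frequent"] else 0.2
    return {"score": score, "pipeline_version": pipeline.version}


# The same pipeline prepares the batch training rows and the live request.
training_rows = [pipeline.run(r) for r in [{"visits": 7}, {"visits": None}]]
response = predict({"visits": 9})
print(training_rows, response)
```

Returning the pipeline version with every prediction makes it possible to trace a surprising result back to the exact transformations that produced it.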
Monitor the model.
Drift
Model degradation
Predictions and their effects
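Drift can be measured in several ways. One common choice, not named in the slides, is the Population Stability Index (PSI), which compares the binned distribution of a feature at training time with its distribution in production. The sketch below assumes fixed bin edges and made-up values; the 0.25 alert level is a widely quoted rule of thumb, not a hard limit.

```python
import math


def histogram_shares(values, edges):
    """Share of values per bin; values outside the edges go to the end bins."""
    counts = [0] * (len(edges) - 1)
    for v in values:
        idx = 0
        while idx < len(counts) - 1 and v >= edges[idx + 1]:
            idx += 1
        counts[idx] += 1
    return [c / len(values) for c in counts]


def psi(expected, actual, edges, eps=1e-6):
    """Population Stability Index between two samples of one feature."""
    total = 0.0
    for e, a in zip(histogram_shares(expected, edges),
                    histogram_shares(actual, edges)):
        e = max(e, eps)  # avoid log(0) and division by zero for empty bins
        a = max(a, eps)
        total += (a - e) * math.log(a / e)
    return total


EDGES = [0, 10, 20, 30]  # hypothetical bin edges for one feature
training = [5, 7, 12, 15, 18, 22, 25, 8, 14, 27]
live = [22, 25, 27, 29, 21, 24, 18, 26, 28, 23]

no_drift = psi(training, training, EDGES)
drift = psi(training, live, EDGES)

ALERT_LEVEL = 0.25  # rule-of-thumb threshold (an assumption)
print(no_drift, drift, drift > ALERT_LEVEL)
```

A check like this only covers input drift. Watching predictions and their downstream effects, as the slide suggests, needs outcome data as well.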
Optimize the model.
Feature Optimization
Retraining
Remodeling
Conclusion
What does it mean to be done?
Explanation as a Result
Questions?
Links
• https://www.dataengineeringpodcast.com/
• https://dataengweekly.com/
• https://www.logicalclocks.com/blog/feature-store-the-missing-data-layer-in-ml-pipelines
• https://www.imperva.com/blog/deployment-isnt-the-final-step-monitoring-machine-learning-models-in-production/

Operationalizing Data Science St. Louis Big Data IDEA


Editor's Notes

  1. Welcome. Introduction.
  2. Welcome. Introduction.
  3. What is the business intention that you are trying to achieve? Minimize Cost Maximize Return Minimize Risk Realize Opportunity Engage Stakeholders POC vs Production ready and valued product
  4. Decision science SMART goal Goal vs. intention Refine to question Data science – predict, explain, evaluate
  5. General – Amount, Access, Quality, Labeled? Third Party o   Assess Data Quality (Value Range, Adherence, Representative) o   Data Format (Automatic vs hand-generated, Similar data from different partners are vastly different) o   Governed (Use appropriate – avoid reidentification, TTL, Contractuals, Track access, renewals) Internal API (Data size limits, unreliability, costs) Streaming (CDC, Device Data, Standardized?)
  6. Data Exploration Statistical Relationships and Correlations Profiling Textual – Word, Stop Words, Bigram, Trigram Clustering Check in with SME
  7. Data profiling Deduplication Outliers Filter Imputation Source Corrections Data shaping Sort Project Enrichment
  8. Type of Models (Supervised, Unsupervised, Reinforcement Learning, Neural Networks) Feature Engineering (Transformations and Aggregations) Encode Indicator Variables Binning/Bucketing Sparse Classes Interaction Features Extract Elements (eg. Time) Normalization Feature Selection Testing your features Testing your model Check in with SME
  9. Check in with Business Does what you’ve created address the concerns of the business?
  10. Batch Training vs. Real-time Training; Batch Evaluation vs. Real-time Evaluation
  11. Truth Matrix; Mean Square Error; Evaluation time
  12. Automation Scaling SLAs Versioning Data Pipelines Ongoing Data Acquisition Ongoing Data Cleaning Ongoing Feature Encoding Integration in application
  13. Drift Degrading the model Predictions and their effects
  14. Feature Optimization Retraining Remodeling
  15. What does it mean to be done? Explanation as a Result