Confidential and Proprietary to Daugherty Business Solutions
Feature Store Overview
Adam Doyle
St. Louis Big Data IDEA
August 2020

The Data Science Process

Features
“A feature is an individual measurable property or characteristic of a phenomenon being observed… Feature data is used both as input to models during training and when models are served in production.” (Source: https://docs.feast.dev/user-guide/features)
Key takeaways
• Features are not data
• Features enumerate information
• Not all features are equal
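
To make the "features are not data" takeaway concrete, here is a minimal sketch. The raw data is a list of purchase events; the feature is a derived per-customer value (purchases in a trailing window, as of a given time). The event schema, the `purchases_last_n_days` function, and the 7-day default window are hypothetical, chosen only for illustration.

```python
from datetime import datetime, timedelta

# Hypothetical raw data: individual purchase events.
events = [
    {"customer_id": "c1", "amount": 20.0, "ts": datetime(2020, 8, 1)},
    {"customer_id": "c1", "amount": 35.5, "ts": datetime(2020, 8, 5)},
    {"customer_id": "c2", "amount": 12.0, "ts": datetime(2020, 7, 20)},
    {"customer_id": "c1", "amount": 8.0, "ts": datetime(2020, 7, 1)},
]


def purchases_last_n_days(events, customer_id, as_of, days=7):
    """Hypothetical feature: purchases by one customer in the `days` before `as_of`.

    Only events strictly before `as_of` are counted, so the same definition
    can be evaluated at a historical time (training) or at "now" (serving).
    """
    window_start = as_of - timedelta(days=days)
    return sum(
        1
        for e in events
        if e["customer_id"] == customer_id and window_start <= e["ts"] < as_of
    )


as_of = datetime(2020, 8, 7)
print(purchases_last_n_days(events, "c1", as_of))  # 2
print(purchases_last_n_days(events, "c2", as_of))  # 0
```

The four events are data; the two counts are features, and the `as_of` parameter is what lets one definition serve both training and production.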

Feature Engineering
Feature engineering is the process of extracting features from raw data.
Feature Engineering Techniques
• Imputation
• Handling Outliers
• Binning
• Numerical Transform
• One-Hot Encoding
• Grouping
• Extraction
• Scaling
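
Four of the techniques listed above (imputation, binning, one-hot encoding, and scaling) can be sketched in plain Python. The specific variants chosen here (mean imputation, fixed bin edges, min-max scaling) are illustrative assumptions, not the only way to apply each technique, and the sample values are hypothetical.

```python
from statistics import mean


def impute_mean(values):
    """Imputation: replace missing values (None) with the mean of observed values."""
    observed = [v for v in values if v is not None]
    fill = mean(observed)
    return [fill if v is None else v for v in values]


def bin_values(values, edges):
    """Binning: index of the first edge a value falls below; len(edges) if none."""
    bins = []
    for v in values:
        for i, edge in enumerate(edges):
            if v < edge:
                bins.append(i)
                break
        else:
            bins.append(len(edges))
    return bins


def one_hot(values):
    """One-hot encoding: one 0/1 column per distinct category (sorted for stability)."""
    categories = sorted(set(values))
    return categories, [[1 if v == c else 0 for c in categories] for v in values]


def min_max_scale(values):
    """Scaling: rescale values linearly into [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


ages = impute_mean([20, None, 40, 60])   # [20, 40, 40, 60]
print(bin_values(ages, edges=[30, 50]))  # [0, 1, 1, 2]
print(one_hot(["red", "blue", "red"]))   # (['blue', 'red'], [[0, 1], [1, 0], [0, 1]])
print(min_max_scale(ages))               # [0.0, 0.5, 0.5, 1.0]
```

In a feature store, each of these functions would be part of a shared feature definition rather than ad hoc notebook code, so that every model applies the same transformation.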

Feature Challenges
• Feature Reuse Between Models
• Consistent Feature Definitions
• Latency / Recency
• Environmental Variation
• Unstable Dependencies
• Governance
• Versioning

Feature Store
[Architecture diagram. Components shown: API; Metadata / Model / Predictions; Offline Data Store; Online Data Store; Batch Engine; Stream Engine; Batch Prediction; Stream Prediction.]
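
The architecture pairs an offline data store with an online data store. One common reading of that split, sketched below as an assumption rather than a description of any particular product, is that the offline store keeps the full timestamped history of feature values (for training and batch prediction) while the online store keeps only the latest value per entity (for low-latency serving). The `DualFeatureStore` class and its methods are hypothetical, and integer timestamps are used for brevity.

```python
from collections import defaultdict


class DualFeatureStore:
    """Hypothetical toy store with an offline (history) and an online (latest) view."""

    def __init__(self):
        # Offline store: (entity, feature) -> list of (timestamp, value), sorted by time.
        self.offline = defaultdict(list)
        # Online store: (entity, feature) -> (timestamp, value) of the newest write.
        self.online = {}

    def write(self, entity, feature, ts, value):
        """Ingest one value, as a batch or stream engine might, into both stores."""
        key = (entity, feature)
        self.offline[key].append((ts, value))
        self.offline[key].sort(key=lambda pair: pair[0])  # stable: ties keep write order
        latest = self.online.get(key)
        if latest is None or ts >= latest[0]:
            self.online[key] = (ts, value)

    def get_online(self, entity, feature):
        """Serving path: the newest known value, or None."""
        latest = self.online.get((entity, feature))
        return None if latest is None else latest[1]

    def get_as_of(self, entity, feature, ts):
        """Training path: the value as it was known at time `ts` (point-in-time lookup)."""
        value = None
        for t, v in self.offline.get((entity, feature), []):
            if t > ts:
                break
            value = v
        return value


store = DualFeatureStore()
store.write("c1", "purchases_7d", 1, 3)
store.write("c1", "purchases_7d", 5, 4)
store.write("c1", "purchases_7d", 3, 2)  # a late-arriving value
print(store.get_online("c1", "purchases_7d"))    # 4
print(store.get_as_of("c1", "purchases_7d", 4))  # 2
```

Keeping the history offline is what lets a training set use only the values that were known at each example's timestamp, which avoids leaking future information into training.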

Feature Store Use Cases
• Retrieve Feature Metadata
• Retrieve Feature Values
• Remove Features
• Store Features
• Stream Store Features
• Stream Retrieve Features
• Feature Versioning
• Model Versioning
• Record Predictions
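
Several of the use cases above map naturally onto an API. The sketch below shows one hypothetical shape for the metadata, store, retrieve, remove, and feature-versioning operations, backed by in-memory dictionaries. The class and method names are assumptions for illustration, not the API of Feast, Hopsworks, or any other product; streaming variants, model versioning, and prediction recording are omitted.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Optional, Tuple


@dataclass
class FeatureMetadata:
    name: str
    version: int
    description: str
    dtype: str


@dataclass
class InMemoryFeatureStore:
    """Hypothetical in-memory API covering a subset of the use cases."""

    # (name, version) -> metadata describing that feature version
    metadata: Dict[Tuple[str, int], FeatureMetadata] = field(default_factory=dict)
    # (name, version) -> {entity_id: value}
    values: Dict[Tuple[str, int], Dict[str, Any]] = field(default_factory=dict)

    def register(self, name: str, description: str, dtype: str) -> int:
        """Feature versioning: registering an existing name adds a new version."""
        version = 1 + max((v for n, v in self.metadata if n == name), default=0)
        self.metadata[(name, version)] = FeatureMetadata(name, version, description, dtype)
        self.values[(name, version)] = {}
        return version

    def get_metadata(self, name: str, version: Optional[int] = None) -> FeatureMetadata:
        """Retrieve feature metadata (latest version unless one is given)."""
        if version is None:
            version = self._latest(name)
        return self.metadata[(name, version)]

    def store(self, name: str, version: int, entity_id: str, value: Any) -> None:
        """Store one feature value for one entity under a specific version."""
        self.values[(name, version)][entity_id] = value

    def retrieve(self, name: str, entity_id: str, version: Optional[int] = None) -> Any:
        """Retrieve one feature value (latest version unless one is given)."""
        if version is None:
            version = self._latest(name)
        return self.values[(name, version)].get(entity_id)

    def remove(self, name: str) -> None:
        """Remove a feature: all of its versions and stored values."""
        for key in [k for k in self.metadata if k[0] == name]:
            del self.metadata[key]
            del self.values[key]

    def _latest(self, name: str) -> int:
        versions = [v for n, v in self.metadata if n == name]
        if not versions:
            raise KeyError(f"unknown feature: {name}")
        return max(versions)


fs = InMemoryFeatureStore()
v1 = fs.register("avg_order_total", "Mean order total over 30 days", "float")
fs.store("avg_order_total", v1, "c1", 25.0)
v2 = fs.register("avg_order_total", "Mean order total over 90 days", "float")
fs.store("avg_order_total", v2, "c1", 31.5)
print(fs.retrieve("avg_order_total", "c1"))             # 31.5
print(fs.retrieve("avg_order_total", "c1", version=1))  # 25.0
```

Pinning a model to an explicit feature version, as in the last call, is one way to keep an older model's inputs stable while a newer definition is rolled out.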

Data Pipeline
• Data engineers interact with a feature store by creating data pipeline definitions.
• Data pipeline definitions combine
– Data Sources
– Business definitions
– Transformation rules
– Streaming/Batch definitions
– Scheduling
• Data pipelines are executed by the feature store engines, and the resulting feature values are stored in the online and offline data stores.
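
A data pipeline definition, as described above, bundles a data source, a business definition, a transformation rule, a batch or streaming mode, and a schedule. The sketch below models such a definition as a Python dataclass. The field names, the cron-style schedule string, and the `run` method are hypothetical; a real feature store would hand the definition to its batch or stream engine rather than run it in-process.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List, Optional


@dataclass(frozen=True)
class PipelineDefinition:
    """Hypothetical pipeline definition combining the elements listed above."""

    name: str                                              # feature produced
    source: Callable[[], Iterable[dict]]                   # data source: yields raw records
    business_definition: str                               # meaning of the feature
    transform: Callable[[List[dict]], Dict[str, float]]    # transformation rule
    mode: str = "batch"                                    # "batch" or "stream"
    schedule: Optional[str] = None                         # e.g. a cron expression

    def __post_init__(self):
        if self.mode not in ("batch", "stream"):
            raise ValueError(f"unsupported mode: {self.mode}")
        if self.mode == "batch" and self.schedule is None:
            raise ValueError("batch pipelines need a schedule")

    def run(self) -> Dict[str, float]:
        """Execute once: read the source, then apply the transformation rule."""
        return self.transform(list(self.source()))


def orders():
    yield {"customer_id": "c1", "total": 20.0}
    yield {"customer_id": "c1", "total": 40.0}
    yield {"customer_id": "c2", "total": 15.0}


def avg_total(records):
    """Transformation rule: average order total per customer."""
    sums, counts = {}, {}
    for r in records:
        cid = r["customer_id"]
        sums[cid] = sums.get(cid, 0.0) + r["total"]
        counts[cid] = counts.get(cid, 0) + 1
    return {cid: sums[cid] / counts[cid] for cid in sums}


pipeline = PipelineDefinition(
    name="avg_order_total",
    source=orders,
    business_definition="Average order total per customer across all orders",
    transform=avg_total,
    schedule="0 2 * * *",  # cron syntax: daily at 02:00
)
print(pipeline.run())  # {'c1': 30.0, 'c2': 15.0}
```

Making the definition immutable and validating it up front reflects the idea that the definition, not the code that happens to run it, is the artifact the feature store manages.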

Feature Registry
• Data scientists interact with the feature store through the Feature Registry.
• They can search for and browse feature definitions.
• They can register data science models as a class of data pipeline.
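
The registry's search-and-browse role can be sketched as a catalog of definitions with keyword and tag lookup. Models are registered in the same catalog with a `kind` field, loosely mirroring the point that a model can be treated as a class of data pipeline. The `FeatureRegistry` class, its fields, and the matching rule (case-insensitive substring match over name and description) are assumptions for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class RegistryEntry:
    name: str
    description: str
    owner: str
    tags: List[str] = field(default_factory=list)
    kind: str = "feature"  # "feature" or "model"


class FeatureRegistry:
    """Hypothetical registry for browsing and searching definitions."""

    def __init__(self):
        self._entries = {}

    def register(self, entry: RegistryEntry) -> None:
        if entry.name in self._entries:
            raise ValueError(f"{entry.name} is already registered")
        self._entries[entry.name] = entry

    def browse(self, kind: str = "feature") -> List[str]:
        """List entry names of one kind, alphabetically."""
        return sorted(e.name for e in self._entries.values() if e.kind == kind)

    def search(self, text: str = "", tag: Optional[str] = None) -> List[str]:
        """Case-insensitive match on name or description, optionally filtered by tag."""
        text = text.lower()
        return sorted(
            e.name
            for e in self._entries.values()
            if (text in e.name.lower() or text in e.description.lower())
            and (tag is None or tag in e.tags)
        )


registry = FeatureRegistry()
registry.register(RegistryEntry("avg_order_total", "Average order total per customer", "data-eng", ["orders"]))
registry.register(RegistryEntry("purchases_7d", "Purchases in the trailing 7 days", "data-eng", ["orders", "recency"]))
registry.register(RegistryEntry("churn_model", "Predicts customer churn from order features", "data-science", kind="model"))
print(registry.browse())               # ['avg_order_total', 'purchases_7d']
print(registry.search("order"))        # ['avg_order_total', 'churn_model']
print(registry.search(tag="recency"))  # ['purchases_7d']
```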

Versioning and Monitoring
• Feature stores can assist with versioning and monitoring data science applications.
• Predictions are recorded through the feature store API, along with the source data, the model used, the version of that model, and the rendered prediction.
• Predictions can be compared with actual outcomes to determine the accuracy of the models.
• Because models and versions are tracked, they can be used to determine the lift provided by a particular instance of a model.
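
The monitoring loop described above (record each prediction with its model and version, attach the observed outcome later, then compare versions) can be sketched as follows. The record fields, the use of accuracy as the metric, and the definition of lift as one version's accuracy divided by a baseline version's accuracy are assumptions for illustration; lift is defined in several ways in practice.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class PredictionRecord:
    request_id: str
    source_data: dict
    model: str
    version: str
    prediction: int
    actual: Optional[int] = None  # filled in once the real outcome is known


class PredictionLog:
    """Hypothetical prediction log for comparing model versions."""

    def __init__(self):
        self._records: Dict[str, PredictionRecord] = {}

    def record(self, rec: PredictionRecord) -> None:
        self._records[rec.request_id] = rec

    def record_outcome(self, request_id: str, actual: int) -> None:
        """Attach the observed outcome to an earlier prediction."""
        self._records[request_id].actual = actual

    def accuracy_by_version(self, model: str) -> Dict[str, float]:
        """Share of correct predictions per version, counting only resolved records."""
        correct, total = defaultdict(int), defaultdict(int)
        for r in self._records.values():
            if r.model == model and r.actual is not None:
                total[r.version] += 1
                correct[r.version] += int(r.prediction == r.actual)
        return {v: correct[v] / total[v] for v in total}

    def lift(self, model: str, version: str, baseline: str) -> float:
        """Ratio of accuracies; raises ZeroDivisionError if the baseline scored 0."""
        acc = self.accuracy_by_version(model)
        return acc[version] / acc[baseline]


log = PredictionLog()
rows = [("r1", "v1", 1, 1), ("r2", "v1", 0, 1), ("r3", "v2", 1, 1), ("r4", "v2", 0, 0)]
for request_id, version, pred, actual in rows:
    log.record(PredictionRecord(request_id, {"customer_id": "c1"}, "churn_model", version, pred))
    log.record_outcome(request_id, actual)
print(log.accuracy_by_version("churn_model"))        # {'v1': 0.5, 'v2': 1.0}
print(log.lift("churn_model", "v2", baseline="v1"))  # 2.0
```

Predictions whose outcome has not arrived yet are simply excluded from the comparison, so the metrics can be recomputed as outcomes trickle in.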

Feature Store Implementations
• Open Source
– Gojek/Google Feast
• Product Offerings
– Logical Clocks Hopsworks
– Scribble Data Enrich
• Presentations Only
– Uber Michelangelo
– Airbnb Zipline
– SurveyMonkey ML Feature Store
– Netflix Metaflow

Links
• http://featurestore.org/
• https://www.scribbledata.io/resources-feature-store-guide
• https://papers.nips.cc/paper/5656-hidden-technical-debt-in-machine-learning-systems.pdf
• https://towardsdatascience.com/feature-stores-components-of-a-data-science-factory-f0f1f73d39b8
• https://towardsdatascience.com/what-are-feature-stores-and-why-are-they-critical-for-scaling-data-science-3f9156f7ab4
• https://www.logicalclocks.com/hopsworks-featurestore
• https://eng.uber.com/michelangelo-machine-learning-platform/
• https://technology.condenast.com/story/accelerating-machine-learning-with-the-feature-store-service
• https://cloud.google.com/blog/products/ai-machine-learning/introducing-feast-an-open-source-feature-store-for-machine-learning
• https://databricks.com/session/zipline-airbnbs-machine-learning-data-management-platform
• https://engineering.linkedin.com/blog/2017/06/building-the-activity-graph--part-i
• https://databricks.com/session/fact-store-scale-for-netflix-recommendations
• https://medium.com/@changshe/rethinking-feature-stores-74963c2596f0