Streamlining
Data Science
Workflows
with a Feature Catalog
Roel Bertens
Principal Data Scientist
What is the problem?
The Challenges
of Custom
Model Pipelines
The risk of different definitions
Do you recognize this?
Solution?
Feature Catalog
The Solution
to Organized and Efficient
Feature Computation
A way to structure and centralize your feature logic code.
Preferably with these goals in mind:
• User-friendly (easy to extend and to use)
• Group / reuse logic
• Balance flexibility and speed
• Autogenerate docs and diagrams
My definition
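One way to picture "structure and centralize your feature logic code" is a small registry that holds each feature definition exactly once and can generate its own documentation. The sketch below is illustrative only and uses plain Python; the names `Feature`, `CATALOG`, `register`, `compute_features` and `generate_docs`, and the example features, are assumptions made for this example and do not come from the talk or its template.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Feature:
    """One catalog entry: a name, a human-readable description, and the logic."""
    name: str
    description: str
    compute: Callable[[dict], float]


# Hypothetical central registry: the single place where feature logic lives.
CATALOG: Dict[str, Feature] = {}


def register(name: str, description: str):
    """Decorator that adds a function to the catalog and rejects duplicate names."""
    def decorator(func: Callable[[dict], float]) -> Callable[[dict], float]:
        if name in CATALOG:
            raise ValueError(f"feature {name!r} is already defined")
        CATALOG[name] = Feature(name, description, func)
        return func
    return decorator


@register("order_count", "Number of orders placed by the customer.")
def order_count(customer: dict) -> float:
    return float(len(customer["orders"]))


@register("avg_order_value", "Mean order amount; 0.0 when there are no orders.")
def avg_order_value(customer: dict) -> float:
    orders = customer["orders"]
    return sum(orders) / len(orders) if orders else 0.0


def compute_features(customer: dict, names: List[str]) -> Dict[str, float]:
    """Compute only the requested features for one customer record."""
    return {name: CATALOG[name].compute(customer) for name in names}


def generate_docs() -> str:
    """Build a plain-text feature overview straight from the registry."""
    entries = sorted(CATALOG.values(), key=lambda feature: feature.name)
    return "\n".join(f"{feature.name}: {feature.description}" for feature in entries)
```

Because every definition passes through `register`, a second team cannot silently introduce a competing definition under the same name, and the documentation is derived from the code rather than maintained separately.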
Benefits?
Benefits of a Feature Catalog
Single source of truth
Iteration speed
Efficient computation
Quality
Collaboration
Re-usable documentation
Consistency PoC and PROD
How does it differ from a Feature Store?
Feature Catalog
vs
Feature Store
vs
Feature Platform
Source: https://huyenchip.com/2023/01/08/self-serve-feature-platforms.html
Without … … and with Feature Store
Do you need a Feature Store?
A Feature Store is a possible next step
Feature Catalog
• Easy to integrate on any platform
• Features computed on demand (slow)
• Only compute what is required (cheap)
• Single use (no caching by the catalog itself)
Feature Store
• Requires a more complex architecture
• Features precomputed (quick)
• Compute everything (expensive)
• Multiple use (cheap)
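The trade-off in this comparison can be made concrete with a toy model that counts how often feature logic runs. This is a minimal sketch, not a real feature store: `OnDemandCatalog`, `PrecomputedStore`, the `computations` counter and the three example features are all hypothetical names chosen for illustration.

```python
from typing import Callable, Dict, List

# Hypothetical feature definitions: name -> function of a customer's order amounts.
FEATURES: Dict[str, Callable[[List[float]], float]] = {
    "total_spend": lambda orders: float(sum(orders)),
    "order_count": lambda orders: float(len(orders)),
    "max_order": lambda orders: float(max(orders, default=0.0)),
}


class OnDemandCatalog:
    """Catalog style: compute only what is asked for, every time it is asked for."""

    def __init__(self, features: Dict[str, Callable[[List[float]], float]]):
        self.features = features
        self.computations = 0  # how many times feature logic has run

    def get(self, orders: List[float], names: List[str]) -> Dict[str, float]:
        result = {}
        for name in names:
            self.computations += 1
            result[name] = self.features[name](orders)
        return result


class PrecomputedStore:
    """Store style: compute every feature for every key up front, then serve lookups."""

    def __init__(self, features: Dict[str, Callable[[List[float]], float]],
                 data: Dict[str, List[float]]):
        self.computations = 0
        self.table: Dict[str, Dict[str, float]] = {}
        for key, orders in data.items():
            row = {}
            for name, func in features.items():
                self.computations += 1
                row[name] = func(orders)
            self.table[key] = row

    def get(self, key: str, names: List[str]) -> Dict[str, float]:
        # A read is a lookup only; no feature logic runs here.
        return {name: self.table[key][name] for name in names}


DATA = {"a": [10.0, 20.0], "b": [5.0]}

catalog = OnDemandCatalog(FEATURES)
catalog.get(DATA["a"], ["total_spend"])
catalog.get(DATA["a"], ["total_spend"])  # same request again: recomputed

store = PrecomputedStore(FEATURES, DATA)
store.get("a", ["total_spend"])
store.get("a", ["total_spend"])  # same request again: served from the table
```

In this toy setup the catalog has run feature logic twice (once per request), while the store ran it six times up front (three features for two customers) and not at all for the reads. Which side is cheaper depends on how often the same features are requested.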
What can a Feature Catalog look like?
Kickstart your Feature Catalog with this template
Simple to use.
Define features once and
use them on multiple
aggregation levels.
Feature groups can build
on top of each other
without redefining or
recomputing.
Don’t worry about loading
all necessary tables; that
is done for you.
Only specify the feature
names of interest.
https://xebia.ai/catalog-code
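The properties listed on this slide can be sketched in a few lines. The example below does not reproduce the template's API (the template uses Spark; this uses plain Python so it runs anywhere). `TRANSACTIONS`, `load_table`, `BASE_FEATURES`, `DERIVED_FEATURES` and `compute_features` are hypothetical names, and the data is made up for illustration.

```python
from collections import defaultdict
from typing import Dict, List

# Made-up source data standing in for a real table.
TRANSACTIONS = [
    {"customer_id": "c1", "store_id": "s1", "amount": 10.0},
    {"customer_id": "c1", "store_id": "s2", "amount": 30.0},
    {"customer_id": "c2", "store_id": "s1", "amount": 20.0},
]


def load_table(name: str) -> List[dict]:
    """Stand-in for reading a table; callers of compute_features never call this."""
    return {"transactions": TRANSACTIONS}[name]


# Base feature group: name -> (source table, aggregation over the rows of one group).
BASE_FEATURES = {
    "total_amount": ("transactions", lambda rows: sum(r["amount"] for r in rows)),
    "n_transactions": ("transactions", lambda rows: len(rows)),
}

# Derived feature group: name -> (required features, function of their values).
DERIVED_FEATURES = {
    "avg_amount": (
        ("total_amount", "n_transactions"),
        lambda f: f["total_amount"] / f["n_transactions"],
    ),
}


def compute_features(level: str, feature_names: List[str]) -> Dict[str, dict]:
    """Compute the requested features aggregated per value of the `level` column."""
    grouped_tables: Dict[str, dict] = {}  # table name -> rows grouped by level
    computed: Dict[str, dict] = {}        # feature name -> {group key: value}

    def grouped(table: str) -> dict:
        # Load and group each table at most once per call.
        if table not in grouped_tables:
            groups = defaultdict(list)
            for row in load_table(table):
                groups[row[level]].append(row)
            grouped_tables[table] = groups
        return grouped_tables[table]

    def resolve(name: str) -> dict:
        # Reuse a feature that was already computed as someone else's dependency.
        if name in computed:
            return computed[name]
        if name in BASE_FEATURES:
            table, func = BASE_FEATURES[name]
            computed[name] = {key: func(rows) for key, rows in grouped(table).items()}
        elif name in DERIVED_FEATURES:
            deps, func = DERIVED_FEATURES[name]
            values = {dep: resolve(dep) for dep in deps}
            keys = values[deps[0]].keys()
            computed[name] = {
                key: func({dep: values[dep][key] for dep in deps}) for key in keys
            }
        else:
            raise KeyError(f"unknown feature {name!r}")
        return computed[name]

    columns = {name: resolve(name) for name in feature_names}
    keys = sorted(next(iter(columns.values()))) if columns else []
    return {key: {name: columns[name][key] for name in feature_names} for key in keys}


per_customer = compute_features("customer_id", ["avg_amount"])
per_store = compute_features("store_id", ["total_amount"])
```

The same definitions serve both calls: `avg_amount` is defined once on top of the base group and evaluated per customer, while `total_amount` is evaluated per store, and in neither case does the caller name a table.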
How does it compare to … ?
Feature Catalog template: https://xebia.ai/catalog-code
An example of how to structure your feature catalog using Spark.
flexible
only a starting point (you still need to do the work)
Featuretools: https://github.com/alteryx/featuretools
A Python library for automated feature engineering.
a lot of functionality out of the box
no complex features (will only fit a limited set of use cases)
dbt Semantic Layer: https://www.getdbt.com/product/semantic-layer
Designed for core business metrics where consistency and precision are of key importance.
a lot of functionality out of the box
focus on metrics not features
There are different tools out there. Which one should you use?
Blog: https://xebia.ai/catalog
Summary
Avoid
confusion and
duplication
Create your
own
Feature
Catalog
Increase
collaboration
and quality
Launch
experiments
and models
faster
GitHub: https://xebia.ai/catalog-code
Disclaimer
Whilst every care has been taken by Xebia to ensure that the information contained in this document is correct
and complete, it is possible that this is not the case. Xebia provides the information "as is", without any warranty
for its soundness, suitability for a different purpose or otherwise. Xebia is not liable for any damage which has
occurred or may occur as a result of or in any respect related to the use of this information. Xebia may change
or terminate this document at any time without further notice and shall not be responsible for any consequence(s)
arising there from. Subject to this disclaimer, Xebia is not responsible for any contributions by
third parties to this information.
Copyright Notice
Copyright © Xebia Nederland B.V., Laapersveld 27, 1213 VB, Hilversum, The Netherlands. All rights reserved.
Xebia® is a registered trademark of Xebia Holding B.V. internationally. All other company references
may be trademarks and/or service marks of their respective owners.
