Unit 1 Full Notes
3. Model Building
• The model-building step involves fitting a model to data.
• The model built here is used to solve the problem defined
in the first step.
• The choice of an appropriate model is essential to the
success of the data science process.
• Again, this choice depends on the field of application and
is enhanced by strong domain knowledge.
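As an illustration of what "fitting a model to data" can mean in the simplest case, the sketch below fits a straight line by ordinary least squares. The linear model, the function name fit_line, and the toy data are assumptions made for this example only; the notes do not prescribe a particular model.

```python
from statistics import mean


def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    x_bar, y_bar = mean(xs), mean(ys)
    # Sum of squared deviations of x, and co-deviations of x and y.
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept


# Hypothetical toy data generated from y = 2x + 1 (no noise).
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]

slope, intercept = fit_line(xs, ys)
print(f"slope={slope:.2f}, intercept={intercept:.2f}")
```

In practice the model family (linear, tree-based, and so on) would be chosen with the application domain in mind, as the bullets above note.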
Data Science Process and Domain Knowledge
4. Performance Measurement
• Performance measurement is the final step in the data science process. It
measures how the model performs on new (out-of-sample) data, that is, data
that was not used while building the model.
• The choice of performance metrics and thresholds is primarily driven by
domain knowledge.
• For example, when building a model to predict credit defaults, a false negative
(predicting a potential defaulter to be in good credit) is costlier than a false
positive (predicting a non-defaulter to be a defaulter).
• Such asymmetries will be different across disciplines, and it would be hard to
detect them without domain knowledge.
• Furthermore, the costs of model failure can be estimated accurately only
by a person with domain knowledge.
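The credit-default asymmetry above can be sketched in code. The example below compares two hypothetical models on the same out-of-sample labels: both have the same accuracy, but they differ sharply once false negatives are weighted more heavily than false positives. The labels, predictions, and cost values (10 per false negative, 1 per false positive) are illustrative assumptions, not figures from the notes; real costs would come from someone with domain knowledge.

```python
def confusion_counts(y_true, y_pred):
    """Count true/false positives and negatives (1 = default, 0 = good credit)."""
    counts = {"tp": 0, "fp": 0, "tn": 0, "fn": 0}
    for t, p in zip(y_true, y_pred):
        if t == 1 and p == 1:
            counts["tp"] += 1
        elif t == 0 and p == 1:
            counts["fp"] += 1
        elif t == 0 and p == 0:
            counts["tn"] += 1
        else:  # t == 1 and p == 0: a defaulter predicted as good credit
            counts["fn"] += 1
    return counts


def accuracy(counts):
    total = sum(counts.values())
    return (counts["tp"] + counts["tn"]) / total


def total_cost(counts, cost_fn, cost_fp):
    """Cost-weighted error: each error type carries its own cost."""
    return counts["fn"] * cost_fn + counts["fp"] * cost_fp


# Hypothetical out-of-sample labels and predictions.
y_true = [1, 0, 1, 0, 0, 1, 0, 0]
model_a = [1, 0, 0, 0, 0, 0, 0, 0]  # misses two defaulters, no false alarms
model_b = [1, 1, 1, 1, 0, 1, 0, 0]  # catches all defaulters, two false alarms

# Assumed costs: a missed defaulter is ten times as costly as a false alarm.
COST_FN, COST_FP = 10, 1

counts_a = confusion_counts(y_true, model_a)
counts_b = confusion_counts(y_true, model_b)
cost_a = total_cost(counts_a, COST_FN, COST_FP)
cost_b = total_cost(counts_b, COST_FN, COST_FP)

print("accuracy:", accuracy(counts_a), accuracy(counts_b))
print("cost:    ", cost_a, cost_b)
```

Accuracy alone cannot separate the two models here; the cost-weighted view can, which is why the choice of metric depends on knowing what each kind of error costs.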
Beyond the Curriculum: Case Study for Domain Knowledge