1 - Course Slides - Data Science and ML Fundamentals
Lester has led a startup fintech data science team and has held VP roles in quantitative analysis and strategy at Wells Fargo. He has held various data consulting roles in other Fortune 500 companies and is an expert in the application of machine learning techniques.
Distinguish data science from business intelligence
Outline the data science and machine learning process
Describe basic data science terms, roles, skills, and applications
Data science is all about creating data-driven insights that help us deal with uncertainty.
Start-up: What proportion of our crowd-sourced investors invested $200 or less?
Bank: What proportion of our loans were issued to at-risk customers?
Store Chain: What proportion of our Q1 forecasted sales come from the Pet Food category?
Financial Institution: Based on past transactions, which of these new transactions are likely fraudulent?
Manufacturing Company: Based on sensor data, when is this critical machine component likely to wear out?
E-commerce Company: Based on sales data, which of our high-value customers are most likely to leave?
Descriptive: Provides a view of the facts of who, where, when, how many, and what exactly happened.
Diagnostic: Provides an analysis to tell us why something is happening (what was the leading cause?).
Predictive: Provides a probable state of the future or an unknown variable.
Prescriptive: Provides the best course of action in order to achieve a given outcome.
[Venn diagram: Data Science sits at the overlap of Machine Learning, Software Dev, and Computer Science & Coding]
BIDA™ - Business Intelligence & Data Analysis
The Data Science Process
Step (Definition / Scenario):
• Data Collection and Storage: Capture information, ensure quality of data, and store data into a database. / Collect transaction data, usernames, credit history. Identify past fraud.
• Transform Data for Projects: Optimize data for the project we're working on and select features of interest. / Combine or manipulate datasets, filter out items, or adjust formatting.
• Statistical & Predictive Analysis: Build models and algorithms that spot patterns in our data. / Train a model to identify the leading indicators of fraudulent transactions.
• Model Evaluation: Test how well the model is performing. / Which model is best at identifying fraudulent transactions? Optimize for business objectives.
• Data Visualization: Present results using data visualization.
• Share Insights: Share dashboards and reports with business users for use in decision making, and deploy our models into operations. / Share real-time information identifying risky transactions.
Data Science Skills
1. Load & Clean Data: Ensure clean and tidy data. Remove errors. Deal with missing data points.
2. Exploratory Data Analysis: What can we learn at a glance? Explore data types or obvious relationships.
3. Feature Engineering: Manipulate input data into an optimal format for analysis. This may include categorization, scaling, one-hot encoding, etc.
4. Model Building: Build models that can analyze data, make predictions or quantify uncertainty. Regression, classification, etc.
5. Model Evaluation & Visualization: Evaluate and compare model performance. Visualize and communicate results.
Model evaluation is the most important part of the data science process from a leadership perspective.
Solve technical challenges and select the best analytical approach.
Provide insight on business objectives, project goals and business costs.
Projects and models that deliver targeted value to the business.
Business leaders and data science teams should work closely to align priorities, objectives and measures of success.
Business leaders need a basic understanding of model outputs, and their impact on decision making.
Identify Fraudulent Transactions (using transaction data): introduces false alarms, flagging transactions that are not fraud.
Automated Nut Filtering (using laser scans of nuts): sorting into high quality and lower quality introduces poor nut selection.
The business must quantify the cost of false alarms and the benefits of correct identification.
As we chase higher accuracy, the marginal cost of the time and resources needed to achieve it grows.
Business leaders should work closely with DS teams to ensure expectations, objectives and resources are aligned.
Confusion matrix layout (spam prediction):

                          Prediction: NOT SPAM (-)    Prediction: SPAM (+)
Actual: NOT SPAM (-)      True Negative               False Positive
Actual: SPAM (+)          False Negative              True Positive

[Chart: BBQ Sales forecast plotted against Temperature]
Leaders should understand the basics of model evaluation to help challenge and discuss outcomes.
The Data Science Process: Data Collection and Storage → Transform Data for Projects → Stat & Predictive Analysis → Model Evaluation / Data Visualization → Share Insights
The goal of regression is to assess the relationship between one or more input variables (X) and a continuous output variable (Y).
Our line of best fit allows us to make predictions about the value of the target variable in a given scenario.
How do we decide where exactly the line of best fit sits? Generally, we try to minimize the size of errors in our predictions.
Errors represent the amount by which the target variable is different from the value predicted.
Metric                        Sensitive to Outliers
Sum of Squared Error (SSE)    Yes
Sum of Absolute Error (SAE)   No
Mean Absolute Error (MAE)     No

The most common approach minimizes the squared errors, and is known as Ordinary Least Squares.
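As a minimal sketch of Ordinary Least Squares, we can fit a straight line to a small hypothetical dataset with NumPy and compute the error metrics above (the X and Y values here are invented for illustration):

```python
import numpy as np

# Hypothetical sample: input variable (X) vs continuous target (Y)
X = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
Y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])

# Ordinary Least Squares: the slope and intercept that minimize squared errors
slope, intercept = np.polyfit(X, Y, deg=1)

predictions = slope * X + intercept
errors = Y - predictions                 # amount by which target differs from prediction
sse = np.sum(errors ** 2)                # Sum of Squared Error
mae = np.mean(np.abs(errors))            # Mean Absolute Error
```

On this toy data the fitted line has a slope of roughly 2, so each unit increase in X predicts about a 2-unit increase in Y.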
[Chart: Marketing Scenario, Sales data]
Coefficient of Determination (R2) is one of the most used metrics to evaluate regression models.
R2 measures how close the data are to the fitted regression line. In other words, how much of the variability
in Y is explained by changes in X.
[Scatter plots with fitted lines: R2 = 1, R2 = 0.86, R2 = 0]
Higher R2 indicates better fit of the model, and therefore smaller errors.
R2 can sometimes be biased, so a related measure called Adjusted R2 can also be used.
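R2 can be computed directly from the residuals: one minus the ratio of the residual sum of squares to the total sum of squares. A small sketch with invented actual and predicted values:

```python
import numpy as np

# Hypothetical observed values and a model's predictions for them
y_actual = np.array([3.0, 5.0, 7.0, 9.0, 11.0])
y_pred   = np.array([2.8, 5.3, 6.9, 9.4, 10.6])

# R2 = 1 - (residual sum of squares / total sum of squares)
ss_res = np.sum((y_actual - y_pred) ** 2)
ss_tot = np.sum((y_actual - y_actual.mean()) ** 2)
r_squared = 1 - ss_res / ss_tot
```

Here nearly all of the variability in Y is explained by the model, so R2 is close to 1.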
Models need to be tested on new data, before we allow them to make real world decisions.
Available Sample Data: approx. 80% is used to teach (train) the model (Training Data); approx. 20% is used to test how well the model performs (Testing Data).
Training Data: used to teach the model what the relationship looks like.
Testing Data: used to test model performance on new data.
Real World Data: used to make real world predictions and decisions.
It is important that models are tested on data they have never seen before.
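The 80/20 split can be sketched with scikit-learn's `train_test_split`; the dataset here is randomly generated purely for illustration:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical dataset: 100 rows, 3 features, one binary target
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = rng.integers(0, 2, size=100)

# Hold out ~20% of rows that the model never sees during training
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)
```

Fixing `random_state` makes the split reproducible, which helps when comparing candidate models.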
[Chart: total error against model complexity, with high bias (underfitting) at low complexity and high variance (overfitting) at high complexity]
Values sampled repeatedly; common in medicine.
Useful for smooth, non-linear relationships.
Best fit may differ by sample region.
Provides a level of certainty with outputs.
Binary Classification: classification tasks that have two class labels (e.g. SPAM vs NOT SPAM). Outcomes must be ONE of the two classes.
Multi-Class Classification: has more than two class labels. Outcomes must be ONE of a range of classes.
Multi-Label Classification: has two or more class labels. Outcome can be ONE or MORE of the class labels.
In the rest of this course, we’ll explore the most common classification algorithms.
Once we understand each technique, we’ll compare and contrast the benefits, and outputs.
Logistic Regression probabilities are estimated using one or more input variables
[Chart: logistic curve of spam probability against % of words misspelt, with a 0.5 threshold separating SPAM from NOT SPAM predictions]
Logistic Regression uses a curved line to summarize our observed data points
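A minimal sketch of the misspelling example using scikit-learn, with invented training data (the % of misspelt words and the spam labels are illustrative only):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical training data: % of words misspelt vs spam label (1 = SPAM)
X = np.array([[0], [1], [2], [3], [8], [10], [12], [15]])
y = np.array([0, 0, 0, 0, 1, 1, 1, 1])

model = LogisticRegression().fit(X, y)

# Estimated probability of spam for an email with 9% of words misspelt
p_spam = model.predict_proba([[9]])[0, 1]
label = "SPAM" if p_spam >= 0.5 else "NOT SPAM"
```

The fitted S-shaped curve maps any misspelling rate to a probability between 0 and 1, and the 0.5 threshold turns that probability into a class prediction.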
The decision tree algorithm can be used to predict both categorical or numeric outcomes.
[Diagrams: candidate decision trees for email spam detection. A new email received is tested at each node with Yes/No questions such as "Spelling errors?", "Grammatical errors?" and "Suspicious domain?", leading to further nodes or to leaf outcomes of Spam or Not Spam. Asking the questions in different orders produces different candidate trees.]
We choose the model which best separates the two classes, in this case Spam and Not Spam.
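A small sketch of a decision tree on invented spam features. Here the toy rule is that an email with at least two warning signs is spam; an unconstrained tree can learn this from the examples:

```python
from sklearn.tree import DecisionTreeClassifier

# Hypothetical email features: [spelling_errors, suspicious_domain, grammar_errors]
# (1 = yes, 0 = no); label 1 = Spam, 0 = Not Spam
X = [
    [1, 1, 1], [1, 1, 0], [1, 0, 1], [0, 1, 1],   # two or more flags -> spam
    [0, 0, 0], [0, 0, 1], [0, 1, 0], [1, 0, 0],   # zero or one flag  -> not spam
]
y = [1, 1, 1, 1, 0, 0, 0, 0]

tree = DecisionTreeClassifier(random_state=0).fit(X, y)

# Classify a new email: spelling errors and a suspicious domain, but no grammar errors
pred = int(tree.predict([[1, 1, 0]])[0])
```

Each internal node the tree learns corresponds to one of the Yes/No questions above; the leaves carry the Spam / Not Spam verdicts.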
KNN assigns output classes based on the most similar observations in our sample space.
[Charts: sample points plotted against Frequency of Grammatical Errors (Y); a new point is assigned the majority class of its nearest neighbours]
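KNN is simple enough to sketch directly in NumPy. The two clusters below (low vs high error rates) are invented for illustration, with k = 3 neighbours:

```python
import numpy as np

# Hypothetical samples: [spelling error rate, grammatical error rate]; 1 = spam
points = np.array([[0.5, 0.8], [1.0, 0.6], [0.7, 1.1],    # not spam: low error rates
                   [7.5, 8.0], [8.2, 7.4], [7.9, 8.6]])   # spam: high error rates
labels = np.array([0, 0, 0, 1, 1, 1])

def knn_predict(query, k=3):
    # distance from the query point to every sample point
    dists = np.linalg.norm(points - query, axis=1)
    nearest = labels[np.argsort(dists)[:k]]    # labels of the k closest points
    return int(np.round(nearest.mean()))       # majority vote

pred = knn_predict(np.array([7.0, 7.2]))
```

A query near the high-error cluster inherits the spam label from its three closest neighbours.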
SVM models try to maximize the separation or margin between classes in the sample space.
Support Vector Machines are an extension of Support Vector Classifiers.
[Diagram: a support vector classifier with two permitted outliers, showing the margin between Class 1 and Class 2]
• By allowing outliers, we make the model less sensitive to the training data.
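A minimal linear SVM sketch on invented 2-D data; the `C` parameter controls how strictly margin violations (outliers) are penalized:

```python
import numpy as np
from sklearn.svm import SVC

# Hypothetical 2-D points for two well-separated classes
X = np.array([[1.0, 1.0], [1.5, 0.5], [0.8, 1.2], [1.2, 0.9],
              [4.0, 4.2], [4.5, 3.8], [3.9, 4.5], [4.2, 4.0]])
y = np.array([0, 0, 0, 0, 1, 1, 1, 1])

# Linear SVM: find the boundary that maximizes the margin between the classes.
# A smaller C permits more outliers, making the model less sensitive to training data.
svm = SVC(kernel="linear", C=1.0).fit(X, y)
pred = int(svm.predict([[4.1, 4.1]])[0])
```

Lowering `C` trades a few misclassified training points for a wider, more robust margin.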
Naïve Bayes is a probabilistic model based on Bayes theorem which calculates conditional probabilities.
Conditional probability is the probability of one event, given the probabilities of other events.
P(A|B) = P(B|A) P(A) / P(B)

Where A is the hypothesis or outcome variable, and B is the evidence or features.

Did you know? The "Naïve" refers to the model's assumption that all input variables are independent.
For example, when we observe a phrase in a potential SPAM email, we might ask:
The Gaussian Naïve Bayes is an extension of Naïve Bayes, and is used to model normally distributed variables.
We plot our sample data and observe two very different but
overlapping distributions.
The Gaussian Naïve Bayes also captures prior probabilities, allowing us to capture additional information.
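A sketch of Gaussian Naïve Bayes on two overlapping normal distributions, mirroring the slide. The feature values are randomly generated for illustration; the learned priors simply reflect the class frequencies in the sample:

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB

# Two overlapping normal distributions for a single hypothetical feature
rng = np.random.default_rng(0)
X = np.concatenate([rng.normal(2.0, 1.0, 50),    # class 0, centred at 2
                    rng.normal(6.0, 1.0, 50)])   # class 1, centred at 6
y = np.array([0] * 50 + [1] * 50)

model = GaussianNB().fit(X.reshape(-1, 1), y)

priors = model.class_prior_            # prior probabilities captured from the sample
p = model.predict_proba([[6.5]])[0, 1] # probability that a value of 6.5 is class 1
```

A value near the centre of the class-1 distribution receives a class-1 probability close to 1, because both the likelihood and the prior support it.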
The confusion matrix helps us compare the predictions we control, vs the actual outcomes that we don’t.
It can help us understand the quality of our predictions, or the trade-offs we must make.
Prediction: Negative (0) Not Spam vs Positive (1) Spam

Scenario 1: Suppose we predict roughly 50% of emails are SPAM, based on several input variables.

                         Predicted Not Spam (0)   Predicted Spam (1)
Actual Not Spam (0)      True Negative: 35        False Positive: 11
Actual Spam (1)          False Negative: 14       True Positive: 40

• Overall, 75 (40 + 35) out of 100 predictions were correct.
• But there are 14 Spam emails that we didn't detect.

Scenario 2: What if we want to increase the number of actual Spam emails that we detect? We previously missed 14 of them!

                         Predicted Not Spam (0)   Predicted Spam (1)
Actual Not Spam (0)      True Negative: 17        False Positive: 29
Actual Spam (1)          False Negative: 7        True Positive: 47

• There are now only 7 missed SPAM emails! Success!
• But now we created 18 more false alarms. This may get annoying for users as they have to search in their junk.
Our model cannot be perfect, and the confusion matrix helps us understand the trade offs.
Understanding Trade Offs
But how do we decide which outcome to favor? False Negatives are undesirable in disease detection. We cannot afford to miss bad outcomes.
[Confusion matrix grid: Prediction Negative (0) / Positive (1) against Actual outcomes]
We must be clear on what we want to achieve, and the costs and benefits of each type of error.
There are several metrics and techniques that can help summarize the observations in the confusion matrix:
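Using the first confusion matrix from the slides (TN = 35, FP = 11, FN = 14, TP = 40), the common summary metrics work out as follows:

```python
# Counts taken from the slide's first spam confusion matrix
tn, fp, fn, tp = 35, 11, 14, 40

accuracy  = (tp + tn) / (tp + tn + fp + fn)  # share of all predictions that were correct
precision = tp / (tp + fp)                   # of emails predicted spam, how many really were
recall    = tp / (tp + fn)                   # of actual spam emails, how many we caught
f1        = 2 * precision * recall / (precision + recall)  # balance of precision and recall
```

Accuracy here is 0.75 (75 correct out of 100), while recall of about 0.74 reflects the 14 spam emails the model missed.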
Underfitting and Overfitting can help us describe classification model outputs too.
Underfitting means the model under-generalises the data. Overfitting means the model learns the training data too well, and misses the more general relationship.
[Charts: KNN decision boundaries with 15 Nearest Neighbours (smoother) vs 1 Nearest Neighbour (overfit to individual points)]
We must find a balance to ensure our model performs according to our evaluation metrics.
The Data Science Process: Data Collection and Storage → Transform Data for Projects → Statistical & Predictive Analysis → Model Evaluation / Data Visualization → Share Insights
[Table row of sample values: 2, 15, 527, 13,000, NO]
It is important that we investigate all errors to understand why they occur and how they should or should not affect our analysis.
• Some errors are clearly incorrect, like a customer age of 186.
• Some errors are suspicious, like a credit score of 999.
• Some errors are ambiguous, like an applicant age of 15.
Duplicated Data
[Sample CSV rows; two out of three rows seem to be duplicated]
We should investigate the reason for the duplicated data.
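A small pandas sketch of spotting and removing duplicated rows; the customer IDs and dates below are invented for illustration:

```python
import pandas as pd

# Hypothetical CSV rows: two of the three are identical
df = pd.DataFrame({
    "customer_id": [101, 101, 102],
    "signup_date": ["2014-01-23", "2014-01-23", "2014-02-01"],
})

dupes = int(df.duplicated().sum())   # count of repeated rows (worth investigating first)
deduped = df.drop_duplicates()       # keep only the first occurrence of each row
```

Counting duplicates before dropping them keeps a record of how much data was affected, which helps explain why the duplication happened.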
EDA helps us gain initial insights from our data, that help us implement our model.
What types of data are present?
How can we describe the data in each feature?
Are there any obvious relationships?
Understanding the data types helps us understand limitations and challenges we may encounter.
Continuous
Can be measured on a scale or timeline (e.g. 0 to 20, or Jan '20 to Jul '21). Continuous variables are the easiest to work with.
Categorical
Categorical features tell us which bucket a data point falls into (e.g. "Student", "4", "X Large").
Numbers are not always continuous: with counts like 1, 326, 4,237 we cannot have 0.5 of a customer, so the scale is not strictly continuous.
Dates are not always continuous: when datapoints belong in buckets, they are considered categorical data.
Exploratory data analysis helps us be curious, ask questions and uncover patterns in our data.
Positive Correlations tell us that as one variable increases, the other tends to also.
Negative Correlations tell us that as one variable increases, the other decreases.
Zero Correlation tells us that one variable has no impact on the other.
[Scatter plots illustrating correlations of 1.00, 0.93, 0.00, -0.50 and -1.00]
Correlation can have a max value of 1. Correlation can have a min value of -1.
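Correlation coefficients are easy to compute with NumPy; the data below is invented to show a strong positive and a strong negative relationship:

```python
import numpy as np

# Hypothetical data: y_pos rises with x, y_neg falls with x
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y_pos = np.array([2.0, 4.1, 5.9, 8.2, 10.0])
y_neg = np.array([9.8, 8.1, 6.0, 3.9, 2.2])

# Pearson correlation coefficients, bounded between -1 and 1
corr_pos = np.corrcoef(x, y_pos)[0, 1]
corr_neg = np.corrcoef(x, y_neg)[0, 1]
```

Values near +1 or -1 indicate nearly perfect linear relationships; values near 0 indicate no linear relationship.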
[Heatmap: correlations between Titanic passenger features (Passenger ID, Ticket Class, Age, Siblings/Spouse, Parents/Children, Fare, Family) and the target, Survived; scale from -0.6 to 0.9]
[Scatter plots: clear non-linear relationship; no relationship; exponential relationship; changing variance; Company Valuation example]
In summary, we are eliminating some columns (features) from our dataset.
[Diagram: features X1 to X5, with some columns removed]
Common Feature Selection methods are Principal Component Analysis and Feature Importance.
Feature Engineering
Feature engineering is the process of modifying the structure or contents of our data to make it more
suitable for analysis, or to help improve the performance of a model.
[Diagrams: numeric values (e.g. standardized scores from -3 to 3, or ages from 30 to 90) grouped into category bins 0 and 1]
Grouping or binning helps us simplify our data to make it more digestible, or remove some unnecessary detail.
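Binning can be sketched with pandas' `cut`, which assigns each value to a labelled bucket; the ages and bin edges below are invented for illustration:

```python
import pandas as pd

# Hypothetical ages grouped into three labelled bins
ages = pd.Series([34, 47, 52, 68, 71, 89])
bins = pd.cut(ages, bins=[30, 50, 70, 90], labels=["30-50", "50-70", "70-90"])

# How many data points fall into each bucket
counts = bins.value_counts().to_dict()
```

The detail of individual ages is traded away for a simpler, more digestible categorical feature.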
High Cardinality
Features with many unique values are referred to as high cardinality. High cardinality provides lots of detail, but results in a small sample set per category.
Solutions: It can help to reduce the number of categories, for example mapping each Zip Code to a broader Region:

Zip Code   Region
22261      22
23621      23
25612      25
23261      23
25211      25
22515      22
26612      26
Color    Red   Yellow   Green
Red      1     0        0
Red      1     0        0
Yellow   0     1        0
Green    0     0        1
Yellow   0     1        0
Dummy Variable Encoding achieves a similar outcome, instead removing the final column.
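Both encodings are available through pandas' `get_dummies`. Note that `drop_first=True` removes the first category column rather than the final one, but the effect is the same: one redundant column is dropped.

```python
import pandas as pd

colors = pd.DataFrame({"color": ["Red", "Red", "Yellow", "Green", "Yellow"]})

# One-hot encoding: one binary column per category
one_hot = pd.get_dummies(colors["color"])

# Dummy variable encoding: drop one column, since it is implied by the others
dummy = pd.get_dummies(colors["color"], drop_first=True)
```

Dropping a column avoids redundancy: if an email is neither Red nor Yellow, it must be Green, so the third column carries no extra information.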
Calculations can help us extract new information or summarize the data we have available.
Start Date    First Review   Review Wait (Months)   Part 1   Part 2   Part 3   Part 4   Avg Performance
01 Apr 2020   01 Sep 2020    5                      30%      20%      45%      50%      36%
01 Jun 2020   15 Jun 2020    0.5                    80%      60%      70%      80%      73%
Training & Testing
Models need to be tested before we use them to predict real-world outcomes.
[Diagram: the available dataset split into training and testing segments]
Tests must be carried out on new data that the model has never seen before.
A further technique: Training, Validation & Test splits the dataset into 3 segments.
What do they all have in common? All techniques are trying to validate results on new, unseen data.
[Diagram: cross-validation, where the sample is split into Data A, Data B, Data C and Data D and each segment is held out in turn, before the final model is used on Live Data]
The purpose of clustering is to group data points into those with similar characteristics.
K Means Clustering
[Charts: Netflix viewer data plotted as Income against Age; the algorithm groups viewers into clusters such as Students, Professionals, Executives and Retirees]
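A minimal K Means sketch on invented age/income data with two obvious groups; the viewer values are illustrative only:

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical viewer data: [age, income in $k], with two clear groups
viewers = np.array([[20, 15], [22, 18], [19, 12],     # student-like viewers
                    [45, 90], [50, 110], [48, 95]])   # professional-like viewers

# K Means groups the points into k clusters of similar characteristics
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(viewers)
labels = km.labels_
```

No target variable is provided: the algorithm discovers the groups purely from the similarity of the data points, which is what makes clustering unsupervised.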
Hierarchical Clustering
Feature Selection: 50 Features (Financial Ratios) reduced to 10 Key Features.
Benefits:
• Improve analysis results
Variable Reduction algorithms are designed to reduce the number of features.
Reinforcement Learning is where machines learn how to navigate scenarios through repetition.
• No memory loss
• Computers hold no recent memory biases
• Computational superiority
Neural Networks are inspired by the structures of neurons in our brains. They consist of nodes organized
into layers.
Deep Learning is an extension of Neural Networks, where the model may retrain itself multiple times.
Deep Learning models are less reliant on humans and may be able to train themselves.
A basic rule-based model. Bots make trades when certain conditions are met.
The basic principles in both scenarios are the same: Computers are simply following instructions.
BIDA Courses:
• Enterprise BI
• Advanced Tableau – LOD Calculations
• Advanced Power BI
• Case Study: Financial Statements in Power BI
• Case Study: Trading Dashboard in Tableau
• SQL Fundamentals
• Tableau Fundamentals
• Power BI Fundamentals
• Power Pivot Fundamentals
• Power Query Fundamentals
• Intro to Business Intelligence
BI Roles:
• Data Architect
• Data Engineer / SQL Developer
• Database Admin (DBA)
• Data Visualization Specialist
• Business Intelligence Developer or Generalist
• Data Analyst