6 Real-World Case Studies: Data Science For Business
A. CONCEPT
[Figure: K-MEANS clustering; axes: AGE vs. SAVINGS]
B. K-MEANS ALGORITHM STEPS
2. Select K random points to serve as the initial centroids, one for
each cluster.
3. Assign each data point to its nearest centroid; doing so creates
“K” clusters.
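Steps 2 and 3 can be sketched in plain Python as follows. The centroid-update step and the repeat-until-stable loop are the usual continuation of the algorithm and are an assumption here, since they are not listed above. The function names and the sample (age, savings) points are hypothetical.

```python
import random

def squared_distance(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def assign_clusters(points, centroids):
    """Step 3: label each point with the index of its nearest centroid."""
    return [
        min(range(len(centroids)), key=lambda k: squared_distance(p, centroids[k]))
        for p in points
    ]

def update_centroids(points, labels, centroids):
    """Assumed follow-up step: move each centroid to the mean of its points."""
    updated = []
    for k, old in enumerate(centroids):
        members = [p for p, label in zip(points, labels) if label == k]
        if members:
            updated.append(tuple(sum(c) / len(members) for c in zip(*members)))
        else:
            updated.append(old)  # keep an empty cluster's centroid where it was
    return updated

def k_means(points, k, max_iter=100, seed=0):
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # step 2: K random points as centroids
    for _ in range(max_iter):
        labels = assign_clusters(points, centroids)
        new_centroids = update_centroids(points, labels, centroids)
        if new_centroids == centroids:  # nothing moved, so we have converged
            break
        centroids = new_centroids
    return centroids, assign_clusters(points, centroids)

# Hypothetical (age, savings) data with two well-separated groups.
points = [(25, 10), (27, 12), (26, 11), (60, 90), (62, 95), (61, 92)]
centroids, labels = k_means(points, k=2)
print(centroids, labels)
```

Which group receives label 0 depends on the random initialization, so only the grouping itself is meaningful.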
• Plot the WCSS vs. K and choose the elbow of the curve as the optimal
number of clusters to use.
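WITHIN-CLUSTER SUM OF SQUARES (WCSS)
$$
\mathrm{WCSS} = \sum_{P_i \in \text{Cluster 1}} \mathrm{distance}(P_i, C_1)^2 + \sum_{P_i \in \text{Cluster 2}} \mathrm{distance}(P_i, C_2)^2 + \sum_{P_i \in \text{Cluster 3}} \mathrm{distance}(P_i, C_3)^2
$$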
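As a rough illustration of the WCSS definition, the sketch below sums the squared distance from each point to the centroid of its assigned cluster. The function name `wcss`, the label encoding (an integer cluster index per point), and the toy data are assumptions made for this example.

```python
def squared_distance(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def wcss(points, centroids, labels):
    """Sum of squared distances from each point to its own cluster centroid."""
    return sum(squared_distance(p, centroids[k]) for p, k in zip(points, labels))

# Toy data: four points forming two tight pairs.
points = [(0, 0), (0, 2), (10, 0), (10, 2)]

# K = 1: a single centroid at the overall mean.
wcss_k1 = wcss(points, [(5, 1)], [0, 0, 0, 0])

# K = 2: one centroid per pair.
wcss_k2 = wcss(points, [(0, 1), (10, 1)], [0, 0, 1, 1])

print(wcss_k1, wcss_k2)  # 104 4
```

The sharp drop from K = 1 to K = 2 is the kind of change the elbow plot is meant to expose; past the elbow, adding clusters reduces WCSS only slightly.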
[Figure: data points Pi and centroids C1, C2, C3 plotted against AGE; the OPTIMAL “K” is marked]
A. CONCEPT
$$
Y = \beta_0 + \sum_{j=1}^{p} f_j(X_j) + \epsilon
$$
The functions f_j(X_j) are unknown smoothing functions fit from the data.
[Figure: SALES vs. TIME, split into PAST and FUTURE]
B. FACEBOOK PROPHET
3. REGRESSION TASKS
A. CONCEPT
$$
y = b_0 + b_1 x_1 + b_2 x_2 + \dots + b_n x_n
$$
• Least squares fitting is a way to find the best fit curve or line for a set
of points.
• The sum of the squares of the offsets (residuals) is minimized to
estimate the best fit curve or line.
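[Figure: REVENUE ($) vs. TEMPERATURE (DegC), showing actual values yi, estimated/fitted values ŷi, and the offset d between them]
$$
d = \hat{y}_i - y_i, \qquad \min \sum_{i} (\hat{y}_i - y_i)^2
$$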
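For the single-feature case, minimizing the sum of squared residuals has a well-known closed-form solution, sketched below. The function name `fit_line` and the temperature and revenue numbers are made up for illustration and do not come from the case study.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = b0 + b1 * x for one feature.

    b1 = sum((x - mean_x) * (y - mean_y)) / sum((x - mean_x) ** 2)
    b0 = mean_y - b1 * mean_x
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b1 = sxy / sxx
    b0 = mean_y - b1 * mean_x
    return b0, b1

# Hypothetical data: temperature (DegC) and revenue ($).
temperature = [10, 20, 30, 40]
revenue = [120, 220, 320, 420]

b0, b1 = fit_line(temperature, revenue)

# Residuals d = y_hat - y and the quantity being minimized.
residuals = [(b0 + b1 * x) - y for x, y in zip(temperature, revenue)]
sum_squared = sum(d ** 2 for d in residuals)
print(b0, b1, sum_squared)
```

Because this toy data lies exactly on a line, the fitted residuals are all zero; real data would leave a positive sum of squares.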
• This value is then converted into a probability that could range from
0 to 1
Linear equation:
y = b0 + b1 * x
[Figure: LOGISTIC REGRESSION MODEL; PASS/FAIL vs. HOURS OF STUDYING]
Apply Sigmoid function:
$$
P(x) = \mathrm{sigmoid}(y) = \frac{1}{1 + e^{-y}} = \frac{1}{1 + e^{-(b_0 + b_1 x)}}
$$
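A minimal sketch of the two-stage computation above: evaluate the linear equation, then pass the result through the sigmoid. The coefficient values and the function name `pass_probability` are hypothetical; in practice b0 and b1 would be learned from data.

```python
import math

def sigmoid(y):
    """Map any real number to (0, 1); written to avoid overflow for large |y|."""
    if y >= 0:
        return 1.0 / (1.0 + math.exp(-y))
    z = math.exp(y)
    return z / (1.0 + z)

def pass_probability(hours, b0, b1):
    """P(x) = sigmoid(b0 + b1 * x) for one input x (hours of studying)."""
    return sigmoid(b0 + b1 * hours)

# Hypothetical coefficients: the decision boundary sits at 4 hours.
b0, b1 = -4.0, 1.0
for hours in (1, 4, 8):
    print(hours, round(pass_probability(hours, b0, b1), 3))
```

A common convention, assumed here rather than stated above, is to predict PASS when the probability is at least 0.5.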
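[Figure: neural network layer with signals labeled P1(t), P2(t), ... and a1k(t), a2k(t), ..., connected through the weight matrix below]
$$
W = \begin{bmatrix} W_{11} & W_{12} & \cdots & W_{1,N_1} \\ W_{21} & W_{22} & \cdots & W_{2,N_1} \\ \vdots & \vdots & \ddots & \vdots \end{bmatrix}
$$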
• CNNs are a type of deep neural network commonly used for image
classification.
• CNNs are formed of (1) Convolutional Layers (Kernels and feature
detectors), (2) Activation Functions (ReLU), (3) Pooling Layers (Max
Pooling or Average Pooling), and (4) Fully Connected Layers
(Multi-layer Perceptron Network).
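To make the first three building blocks concrete, the sketch below applies a convolution, a ReLU activation, and max pooling to a tiny image, then flattens the result. It is a pure-Python illustration rather than how a deep learning library implements these layers; the 6x6 image, the vertical-edge kernel, and all function names are assumptions, and the fully connected layers are omitted.

```python
def convolve2d(image, kernel):
    """Slide the kernel over the image (no padding, stride 1)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[i + di][j + dj] * kernel[di][dj]
                for di in range(kh) for dj in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]

def relu(feature_map):
    """ReLU activation: f(y) = y for y > 0, otherwise 0."""
    return [[max(0, v) for v in row] for row in feature_map]

def max_pool(feature_map, size=2):
    """Downsample by taking the max of each non-overlapping size x size window."""
    return [
        [
            max(feature_map[i + di][j + dj]
                for di in range(size) for dj in range(size))
            for j in range(0, len(feature_map[0]) - size + 1, size)
        ]
        for i in range(0, len(feature_map) - size + 1, size)
    ]

def flatten(feature_map):
    """Turn the 2-D feature map into a 1-D list for the dense layers."""
    return [v for row in feature_map for v in row]

# Hypothetical 6x6 image: dark left half, bright right half.
image = [[0, 0, 0, 1, 1, 1] for _ in range(6)]
# Hypothetical kernel that responds to dark-to-bright vertical edges.
kernel = [[-1, 0, 1], [-1, 0, 1], [-1, 0, 1]]

feature_map = convolve2d(image, kernel)           # 4x4
features = flatten(max_pool(relu(feature_map)))   # 2x2, then 4 values
print(features)
```

In a trained CNN the kernel weights are learned, and the flattened features feed the fully connected layers that score the target classes.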
[Figure: CNN pipeline: Input, CONVOLUTION (convolutional layer; kernels/feature detectors), POOLING (pooling layer, downsampling; pooling filters), FLATTENING, Hidden, Output]
TARGET CLASSES: Airplanes, Cars, Birds, Cats, Deer, Dogs, Frogs, Horses, Ships, Trucks
[Figure: ReLU activation function f(y): f(y) = y for y > 0, f(y) = 0 otherwise]
A. CONCEPT
• True negatives (TN): cases when model predicted FALSE (no disease),
and correct class was FALSE (patient does not have disease).
                            TRUE CLASS
                     +                          −
PREDICTIONS   +      TRUE +                     FALSE + (Type I error)
              −      FALSE − (Type II error)    TRUE −
• Recall = TP/ Actual TRUE = TP/ (TP+FN) (when the class was actually
TRUE, how often did the classifier get it right?)
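The four cells of the confusion matrix, and recall derived from them, can be counted directly. This sketch uses only the standard library; the function names and the label vectors (1 = disease, 0 = no disease) are hypothetical.

```python
def confusion_counts(y_true, y_pred):
    """Return (TP, FP, FN, TN) for binary labels where 1 is the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)  # Type I error
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)  # Type II error
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return tp, fp, fn, tn

def recall(tp, fn):
    """Recall = TP / (TP + FN): share of actual positives that were caught."""
    return tp / (tp + fn)

# Hypothetical labels: 1 = has disease, 0 = does not have disease.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 1, 0, 1, 0]

tp, fp, fn, tn = confusion_counts(y_true, y_pred)
print(tp, fp, fn, tn)   # 3 1 1 3
print(recall(tp, fn))   # 0.75
```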
B. CONFUSION MATRIX IN SKLEARN
C. CLASSIFICATION REPORT
$$
\mathrm{MAE} = \frac{1}{n} \sum_{i=1}^{n} |y_i - \hat{y}_i|
$$
• If MAE is zero, this indicates that the model predictions are perfect
• Mean Square Error (MSE) is very similar to the Mean Absolute Error
(MAE), but instead of absolute values it averages the squares of the
differences between the model predictions and the true values.
• MSE values are generally larger than MAE values since the residuals
are squared.
$$
\mathrm{MSE} = \frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2
$$
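Both metrics translate almost line for line into code. The sketch below is a plain-Python illustration with made-up values; the function names are assumptions.

```python
def mean_absolute_error(y_true, y_pred):
    """MAE = (1/n) * sum(|y_i - y_hat_i|)."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def mean_squared_error(y_true, y_pred):
    """MSE = (1/n) * sum((y_i - y_hat_i) ** 2)."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

# Hypothetical true values and model predictions.
y_true = [3, 5, 2]
y_pred = [2, 5, 4]

mae = mean_absolute_error(y_true, y_pred)  # (1 + 0 + 2) / 3
mse = mean_squared_error(y_true, y_pred)   # (1 + 0 + 4) / 3
print(mae, mse)
```

The residual of size 2 contributes 2 to the MAE sum but 4 to the MSE sum, which is why MSE penalizes large errors more heavily.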
• MAPE is undefined when a true value yi is zero, since the formula
divides by yi.
$$
\mathrm{MAPE} = \frac{100\%}{n} \sum_{i=1}^{n} \left| \frac{y_i - \hat{y}_i}{y_i} \right|
$$
$$
\mathrm{MPE} = \frac{100\%}{n} \sum_{i=1}^{n} \frac{y_i - \hat{y}_i}{y_i}
$$
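The two percentage metrics differ only in the absolute value, which the sketch below makes visible. Raising an error for zero true values is one possible way to handle the division issue and is an assumption of this example, as are the function names and data.

```python
def _relative_errors(y_true, y_pred):
    """(y_i - y_hat_i) / y_i for each pair; rejects zero true values."""
    if any(t == 0 for t in y_true):
        raise ValueError("percentage errors are undefined when a true value is 0")
    return [(t - p) / t for t, p in zip(y_true, y_pred)]

def mape(y_true, y_pred):
    """Mean Absolute Percentage Error, in percent."""
    errors = _relative_errors(y_true, y_pred)
    return 100.0 * sum(abs(e) for e in errors) / len(errors)

def mpe(y_true, y_pred):
    """Mean Percentage Error, in percent; signed errors can cancel out."""
    errors = _relative_errors(y_true, y_pred)
    return 100.0 * sum(errors) / len(errors)

# Hypothetical values: one 10% over-prediction, one 10% under-prediction.
y_true = [100, 200, 50]
y_pred = [110, 180, 50]

print(mape(y_true, y_pred))  # about 6.67
print(mpe(y_true, y_pred))   # 0.0
```

Here MPE is zero even though two of the three predictions are off, because the signed errors cancel. MPE is therefore more useful as an indicator of systematic over- or under-prediction than as a measure of accuracy.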