Machine Learning
A comprehensive overview
Deployment
Ordinal
Cyclical continuous
Data imputation
❑ Use the attribute mean for all subjects of the same class
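A minimal pandas sketch of this rule (the column names and values below are hypothetical): each missing value is replaced with the mean of the same attribute computed within its class.

```python
import pandas as pd
import numpy as np

# Hypothetical toy dataset: 'age' has missing values, 'label' is the class.
df = pd.DataFrame({
    "age":   [23.0, np.nan, 31.0, 44.0, np.nan, 52.0],
    "label": ["A",  "A",    "A",  "B",  "B",    "B"],
})

# Replace each missing 'age' with the mean age of rows sharing the same class.
df["age"] = df.groupby("label")["age"].transform(lambda s: s.fillna(s.mean()))
print(df)
```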
Binning
Regression
Moving averages
Filters
Signal processing
Filter      Smoothness  Responsiveness  Score
KAMA             5            6           11
VIDYA            6            5           11
MAMA             7            1            8
Ehlers           1            3            4
Median           8            7           15
Median-MA        4            8           12
FRAMA            2            4            6
Laguerre         3            2            5
Filters are designed to selectively modify or extract specific frequency components from a signal while attenuating others.
Low-pass filter: Allows low-frequency components to pass through while attenuating higher
frequencies. It is useful for removing high-frequency noise or extracting the slow-changing trends from
a signal.
High-pass filter: Allows high-frequency components to pass through while attenuating lower
frequencies. It is used to remove low-frequency noise or isolate fast-changing features in a signal.
Band-pass filter: Allows a specific range of frequencies to pass through while attenuating others. It is
employed when you want to isolate a specific band of frequencies from a signal.
Notch filter: Attenuates a narrow band of frequencies, often used to remove specific interference or
noise components.
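A sketch of the low-pass case, assuming SciPy is available; the sampling rate, cutoff, and signal below are purely illustrative.

```python
import numpy as np
from scipy.signal import butter, filtfilt

fs = 100.0                      # sampling rate in Hz (illustrative)
t = np.arange(0, 5, 1 / fs)
# Slow 1 Hz trend plus a fast 20 Hz "noise" component.
signal = np.sin(2 * np.pi * 1 * t) + 0.3 * np.sin(2 * np.pi * 20 * t)

# 4th-order Butterworth low-pass with a 5 Hz cutoff (normalized to Nyquist).
b, a = butter(N=4, Wn=5 / (fs / 2), btype="low")
smoothed = filtfilt(b, a, signal)   # zero-phase filtering keeps the trend aligned
```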
Noise reduction: By averaging out nearby data points, moving averages can suppress high-frequency
noise or fluctuations, resulting in a smoother representation of the underlying signal.
Trend and momentum analysis: Moving averages can help identify trends, momentum and patterns
in data by smoothing out short- and longer-term fluctuations. Different types of moving averages, such
as simple moving averages (SMA), weighted moving averages (WMA), and exponential moving
averages (EMA), provide varying emphasis on recent versus older data points.
Forecasting: Moving averages can be used to generate predictions or forecasts by extrapolating the
smoothed trend. They are often employed in time series analysis and financial markets to make short-
term predictions.
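A minimal NumPy sketch of the three moving-average variants; the window sizes and price series are illustrative.

```python
import numpy as np

def sma(x, window):
    """Simple moving average: unweighted mean over the last `window` points."""
    kernel = np.ones(window) / window
    return np.convolve(x, kernel, mode="valid")

def wma(x, window):
    """Weighted moving average: linearly increasing weight on recent points."""
    weights = np.arange(1, window + 1, dtype=float)
    weights /= weights.sum()
    return np.convolve(x, weights[::-1], mode="valid")

def ema(x, span):
    """Exponential moving average with smoothing factor alpha = 2 / (span + 1)."""
    alpha = 2.0 / (span + 1)
    out = np.empty_like(x, dtype=float)
    out[0] = x[0]
    for i in range(1, len(x)):
        out[i] = alpha * x[i] + (1 - alpha) * out[i - 1]
    return out

prices = np.array([10, 11, 12, 11, 13, 14, 13, 15], dtype=float)
print(sma(prices, 3), wma(prices, 3), ema(prices, 3), sep="\n")
```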
Random sampling
Stratified sampling
Normalization and Standardization
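A small NumPy sketch contrasting the two rescalings; the feature values are illustrative.

```python
import numpy as np

x = np.array([2.0, 4.0, 6.0, 8.0, 100.0])   # toy feature with an outlier

# Min-max normalization: rescales values into [0, 1].
x_norm = (x - x.min()) / (x.max() - x.min())

# Standardization (z-score): zero mean, unit standard deviation.
x_std = (x - x.mean()) / x.std()

print(x_norm)
print(x_std)
```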
ML Model Engineering
Supervised Learning
Regression – assesses the relationship between traits in the data and calculates how much a variable changes when other variables change
Model-free vs. model-based approaches
[Workflow: an ML algorithm learns a model from the training dataset; the learned model is then applied to the test dataset]
Induction (predictive + descriptive models) – The process of reasoning in which the premises of an argument are believed to support the conclusion but do not ensure it.
Deduction (only predictive models) – The kind of reasoning in which the conclusion is necessitated by, or reached from, previously known facts (the premises).
ML Algorithms
Dimensionality reduction
❑ PCA is unsupervised and focuses on maximizing variance, while LDA is supervised and
focuses on maximizing class separation
❑ PCA doesn't require labeled data, but LDA does
❑ PCA reduces dimensionality by projecting data onto a lower-dimensional space, while LDA
creates linear combinations of features
❑ PCA outputs principal components that capture variation, while LDA outputs discriminant
functions that separate classes
❑ PCA is commonly used for exploratory data analysis, while LDA is often used for
classification tasks
❑ PCA is generally faster and more computationally efficient, but LDA may be more effective
with labeled data
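A short scikit-learn sketch of the two projections, using the Iris data purely for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

X, y = load_iris(return_X_y=True)

# PCA: unsupervised, projects onto directions of maximum variance (no labels used).
X_pca = PCA(n_components=2).fit_transform(X)

# LDA: supervised, projects onto directions that maximize class separation.
X_lda = LinearDiscriminantAnalysis(n_components=2).fit_transform(X, y)

print(X_pca.shape, X_lda.shape)   # both (150, 2)
```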
Linear Regression
Linear regression is a statistical technique that models the relationship between a dependent
variable and one or more independent variables by fitting a straight line to the data
Benefits
❑ works well when there are linear relationships between the variables in your dataset
❑ straightforward to understand and explain
❑ can be updated easily with new data
Drawbacks
❑ performs poorly when there are non-linear relationships
❑ often outclassed by its regularized counterparts
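A minimal scikit-learn sketch fitting a line to synthetic data; the true slope and intercept below are chosen for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(50, 1))
y = 3.0 * X[:, 0] + 2.0 + rng.normal(scale=1.0, size=50)   # y = 3x + 2 + noise

model = LinearRegression().fit(X, y)
print(model.coef_, model.intercept_)      # recovered slope and intercept
print(model.predict([[5.0]]))             # prediction for a new point
```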
Logistic Regression
Benefits
❑ easy to implement, interpret, and efficient in training
❑ flexible and does not assume specific class distributions
❑ can handle multiple classes and provides a probabilistic view of predictions
❑ measures predictor importance and direction of association
❑ quickly classifies unknown records and performs well with linearly separable datasets
Drawbacks
❑ may lead to overfitting if the number of observations is fewer than the number of features
❑ constructs linear boundaries and assumes linearity between the dependent and independent variables
❑ limited to predicting discrete functions and is not suitable for non-linear problems
❑ requires low or no multicollinearity among independent variables
❑ may struggle to capture complex relationships
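A brief scikit-learn sketch showing the probabilistic output and per-feature coefficients on a synthetic dataset.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=200, n_features=5, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)

clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
print(clf.score(X_te, y_te))          # accuracy on held-out data
print(clf.predict_proba(X_te[:3]))    # probabilistic view of predictions
print(clf.coef_)                      # per-feature weights: importance and direction
```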
Decision Tree
Benefits
❑ requires less effort for data preparation during pre-processing compared to other algorithms
❑ normalization of data is not necessary
Drawbacks
❑ prone to instability
❑ calculations can be more complex
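A minimal scikit-learn sketch; note that no scaling of the inputs is needed (the depth limit is illustrative).

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

X, y = load_iris(return_X_y=True)

# No scaling or normalization needed before fitting the tree.
tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)
print(export_text(tree))      # human-readable view of the learned splits
```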
Random Forest combines the output of multiple decision trees into a single result in order to make predictions or classify data
Benefits
❑ Versatile and easy to use
❑ Handles high-dimensional spaces
❑ Feature importance
❑ Robust to overfitting
❑ Out-of-box predictor
Drawbacks
❑ Computationally demanding
❑ Model interpretability
❑ Overcomplexity
❑ Bias in multiclass problems
❑ Lack of precision
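A short scikit-learn sketch illustrating the ensemble and its built-in feature importances; the dataset and number of trees are illustrative.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# 200 trees vote; more trees cost more compute but reduce variance.
forest = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_tr, y_tr)
print(forest.score(X_te, y_te))
print(forest.feature_importances_[:5])   # built-in feature-importance estimates
```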
Apriori algorithm
The Apriori algorithm finds frequent patterns and associations in a transactional dataset
Benefits
❑ Simplicity and ease of implementation
❑ The rules are human-readable
Drawbacks
❑ Computational complexity
❑ Difficulty handling sparse data
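A minimal pure-Python sketch of the frequent-itemset counting idea behind Apriori; the baskets and support threshold are hypothetical, and a full implementation would also generate association rules.

```python
from itertools import combinations
from collections import Counter

transactions = [                      # hypothetical shopping baskets
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk", "butter"},
]
min_support = 0.5                     # itemset must appear in >= 50% of baskets

# Count frequent single items, then extend only those (the Apriori pruning step).
items = Counter(i for t in transactions for i in t)
frequent_1 = {i for i, c in items.items() if c / len(transactions) >= min_support}

pairs = Counter()
for t in transactions:
    for pair in combinations(sorted(t & frequent_1), 2):
        pairs[pair] += 1
frequent_2 = {p: c / len(transactions) for p, c in pairs.items()
              if c / len(transactions) >= min_support}
print(frequent_1, frequent_2)
```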
Neural Networks
Benefits
❑ have several advantages over traditional algorithms
❑ can learn from data and tackle complex problems
❑ can generalize and identify patterns that traditional algorithms may miss
❑ particularly useful for tasks like image recognition and natural language processing
❑ are efficient at processing large amounts of data with speed and accuracy
Drawbacks
❑ complex and require a significant amount of data to train
❑ overfitting is a concern
❑ lack interpretability
❑ less suited for reasoning or decision-making
❑ lack explanatory capabilities
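A small scikit-learn sketch of a feed-forward network on the digits dataset; real image or language tasks use much larger networks and far more data.

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

X, y = load_digits(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# One hidden layer of 64 units; size and iteration count are illustrative.
net = MLPClassifier(hidden_layer_sizes=(64,), max_iter=500, random_state=0)
net.fit(X_tr, y_tr)
print(net.score(X_te, y_te))
```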
K-Nearest Neighbor
Benefits
❑ simple to implement
❑ robust to the noisy training data
Drawbacks
❑ needs to determine the value of k
❑ computation cost is high
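A brief scikit-learn sketch showing how the choice of k is typically evaluated by cross-validation.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

# The value of k (n_neighbors) must be chosen; a small grid search is common.
for k in (1, 3, 5, 7):
    score = cross_val_score(KNeighborsClassifier(n_neighbors=k), X, y, cv=5).mean()
    print(k, round(score, 3))
```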
K-Means
Benefits
❑ relatively easy to implement and apply
❑ can handle large datasets effectively
❑ guarantees convergence to a final solution
❑ allows for warm-starting, initializing centroids with predefined positions
❑ can easily adapt to new examples and generalize to clusters of different shapes and sizes
Drawbacks
❑ determining the optimal value of k
❑ dependence on initial values can impact the results of k-means clustering
❑ clustering data with varying sizes and density can be challenging
❑ outliers can affect the clustering results
❑ scalability of k-means is influenced by the number of dimensions in the data
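A minimal scikit-learn sketch; k must be given up front, and multiple initializations reduce sensitivity to the starting centroids.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=300, centers=3, random_state=0)

# k must be specified; n_init restarts reduce dependence on initial values.
km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
print(km.cluster_centers_)
print(np.bincount(km.labels_))   # cluster sizes
```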
DBSCAN
Benefits
❑ Handles irregularly shaped and sized clusters
❑ Robust to outliers
❑ Does not require the number of clusters to be specified
❑ Less sensitive to initialization conditions
❑ Relatively fast compared to other clustering algorithms
Drawbacks
❑ Not suitable for datasets with categorical features
❑ Requires a drop in density to detect cluster borders
❑ Struggles with clusters of varying density
❑ Sensitive to scale of variables
❑ Performance tends to degrade in high-dimensional data
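A short scikit-learn sketch on a two-moons toy set; eps and min_samples are the two required parameters and are illustrative here.

```python
from sklearn.cluster import DBSCAN
from sklearn.datasets import make_moons
from sklearn.preprocessing import StandardScaler

# Two interleaving half-moons: irregular shapes that k-means handles poorly.
X, _ = make_moons(n_samples=300, noise=0.05, random_state=0)
X = StandardScaler().fit_transform(X)     # DBSCAN is sensitive to feature scale

# eps (neighborhood radius) and min_samples are the two required parameters.
labels = DBSCAN(eps=0.3, min_samples=5).fit_predict(X)
print(set(labels))    # cluster ids; -1 marks points treated as noise/outliers
```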
Differences between DBSCAN and K-Means
❑ DBSCAN does not require the number of clusters to be specified; K-Means is very sensitive to the number of clusters, so it needs to be specified
❑ Clusters formed by DBSCAN can be of any arbitrary shape; clusters formed by K-Means are spherical or convex in shape
❑ DBSCAN works well with datasets containing noise and outliers; K-Means does not work well with outliers, which can skew its clusters to a very large extent
❑ DBSCAN requires two parameters for training the model; K-Means requires only one
Support Vector Machine
Benefits
❑ works well when the data is linearly separable
❑ more effective in high dimensions
❑ can solve complex non-linear problems with the kernel trick
❑ not very sensitive to outliers
❑ can be used for image classification
Drawbacks
❑ choosing a good kernel is not easy
❑ does not show good results on big datasets
❑ not that easy to fine-tune the hyper-parameters
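A minimal scikit-learn sketch using the RBF kernel trick on data that is not linearly separable; C and gamma are illustrative.

```python
from sklearn.datasets import make_circles
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Concentric circles are not linearly separable in the original feature space.
X, y = make_circles(n_samples=300, factor=0.4, noise=0.05, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# The RBF kernel trick lets the SVM learn a non-linear decision boundary;
# kernel, C, and gamma are the hyper-parameters that usually need tuning.
clf = SVC(kernel="rbf", C=1.0, gamma="scale").fit(X_tr, y_tr)
print(clf.score(X_te, y_te))
```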
Naive Bayes
Benefits
❑ works quickly and can save a lot of time
❑ suitable for solving multi-class prediction problems
❑ can perform better than other models and requires much less training data
❑ better suited for categorical input variables than numerical variables
Drawbacks
❑ assumes that all predictors (or features) are independent
❑ faces the ‘zero-frequency problem’
❑ estimations can be wrong in some cases
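A brief scikit-learn sketch with Gaussian Naive Bayes, which assumes conditionally independent features; the dataset is chosen purely for illustration.

```python
from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

X, y = load_wine(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Gaussian Naive Bayes: features treated as conditionally independent given the class.
nb = GaussianNB().fit(X_tr, y_tr)
print(nb.score(X_te, y_te))
print(nb.predict_proba(X_te[:2]))   # class probabilities for two test samples
```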
ML Model Evaluation
Bias-Variance Tradeoff