ML - Machine Learning PDF
LinkedIn: https://www.linkedin.com/in/cristobal-veas
MACHINE LEARNING
“A subfield of computer science that gives computers the ability to learn without being explicitly
programmed”
SUPERVISED LEARNING
Supervised learning trains a model on labelled data so it can predict the outcomes of future instances. For supervised learning the data must be labelled.
Advantages of Regression:
- Very fast to train and analyze.
- Requires no parameter tuning.
- Easy to understand.
- Highly interpretable.
By Cristóbal Veas
LinkedIn: https://www.linkedin.com/in/cristobal-veas
- Train and test on the same dataset: the test set is a portion of the train set. The result is high training accuracy but low out-of-sample accuracy.
- Train/Test Split: the train and test sets are mutually exclusive, giving a more accurate evaluation of out-of-sample accuracy, but the result is highly dependent on which rows the model is trained and tested on.
- K-Fold Cross Validation: performs multiple train/test splits and averages the results to produce a more consistent accuracy estimate.
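The K-fold procedure above can be sketched in plain Python (a minimal index-splitting illustration; `train_and_score` is a hypothetical callback standing in for any model-fitting routine):

```python
# Minimal K-fold cross-validation sketch: split indices into k folds,
# hold one fold out for testing, train on the rest, then average the scores.
def k_fold_indices(n_samples, k):
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in fold_sizes:
        folds.append(list(range(start, start + size)))
        start += size
    return folds

def cross_validate(data, k, train_and_score):
    folds = k_fold_indices(len(data), k)
    scores = []
    for i, test_idx in enumerate(folds):
        train_idx = [j for f in folds[:i] + folds[i + 1:] for j in f]
        scores.append(train_and_score(train_idx, test_idx))
    return sum(scores) / k   # average accuracy across the k folds
```

In practice a library routine such as scikit-learn's `KFold` would be used instead.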
- Multiple Linear Regression: a model that uses many independent variables to predict a single dependent variable.
- Non-Linear Regression: models to use when the relationship between the variables is not linear. Examples of non-linear regression are polynomial, logarithmic, logistic, cubic, and square-root regressions.
- Ordinary Least Squares: solves for the coefficients with linear algebra operations; suitable for datasets with fewer than about 10k values.
- Optimization Algorithms: use gradient descent for datasets of about 10k values or more.
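As a sketch of the two fitting strategies just mentioned, here is a simple linear regression fitted both with the closed-form least-squares solution and with gradient descent (toy data chosen for illustration):

```python
# Simple linear regression y = w*x + b fitted two ways, as described above:
# a closed-form ordinary-least-squares solution and iterative gradient descent.
def ols_fit(xs, ys):
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    w = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) \
        / sum((x - mean_x) ** 2 for x in xs)
    return w, mean_y - w * mean_x

def gd_fit(xs, ys, lr=0.01, epochs=5000):
    w = b = 0.0
    n = len(xs)
    for _ in range(epochs):
        # gradients of the mean squared error with respect to w and b
        grad_w = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

xs, ys = [0, 1, 2, 3], [1, 3, 5, 7]   # exactly y = 2x + 1
```

Both fits recover w ≈ 2 and b ≈ 1; on small data the closed-form solution is exact, while gradient descent approaches it iteratively.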
CLASSIFICATION
The process of categorizing some unknown items into a discrete set of categories or “classes”. It
corresponds to a supervised learning approach. The target attribute is a categorical variable.
- Decision Trees: used to go from observations about an item (represented in the branches) to conclusions about the item's target value (represented in the leaves). Building the tree is all about choosing, at each split, the feature with the highest information gain, i.e., the largest reduction in weighted entropy.
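The split criterion just described can be made concrete (a minimal sketch; the class labels are arbitrary tags):

```python
from math import log2

# Entropy and information gain, the decision-tree split criterion above.
def entropy(labels):
    n = len(labels)
    return -sum((c / n) * log2(c / n)
                for c in (labels.count(v) for v in set(labels)))

def information_gain(parent, children):
    # parent entropy minus the size-weighted entropy of the child nodes
    n = len(parent)
    weighted = sum(len(c) / n * entropy(c) for c in children)
    return entropy(parent) - weighted
```

A perfectly mixed parent split into two pure children yields the maximum gain of 1 bit.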
- Logistic Regression: a classification algorithm for categorical target variables. It is analogous to linear regression but predicts a categorical variable. This model is suitable when the target is binary, when probabilistic results are required, and when it is important to understand the impact of each feature. Logistic regression passes the linear combination of the inputs through the sigmoid function. The training process is:
1. Initialize θ.
2. Calculate the predicted value for a sample (e.g., a customer).
3. Compare the predicted value with the real value and record the difference as the error.
4. Calculate the total error (cost) over all samples.
5. Change θ to reduce the cost.
6. Go back to step 2.
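The six steps above can be sketched for a single-feature model (toy data; θ here is just a bias b and a weight w):

```python
from math import exp

# Logistic regression training loop following the six steps above:
# theta = (b, w), prediction = sigmoid(b + w*x), cost reduced by gradient descent.
def sigmoid(z):
    return 1.0 / (1.0 + exp(-z))

def train_logistic(xs, ys, lr=0.5, epochs=2000):
    b = w = 0.0                                   # step 1: initialize theta
    n = len(xs)
    for _ in range(epochs):                       # step 6: repeat
        grad_b = grad_w = 0.0
        for x, y in zip(xs, ys):
            p = sigmoid(b + w * x)                # step 2: predicted value
            err = p - y                           # step 3: error vs. real value
            grad_b += err / n                     # step 4: accumulate over all samples
            grad_w += err * x / n
        b -= lr * grad_b                          # step 5: change theta to reduce cost
        w -= lr * grad_w
    return b, w

xs, ys = [-2, -1, 1, 2], [0, 0, 1, 1]
```

After training, the model assigns high probability to the positive class for positive inputs and low probability for negative ones.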
- Support Vector Machine (SVM): a supervised algorithm that classifies data by finding a separator (hyperplane). It can also map the data to a higher-dimensional feature space using different kernel functions (kernelling). It is used for image recognition, text category assignment, spam detection, sentiment analysis, and gene-expression classification.
Advantages and disadvantages of using this algorithm:
1. Advantage: accurate in high-dimensional spaces.
2. Advantage: memory efficient.
3. Disadvantage: prone to over-fitting.
4. Disadvantage: no direct probability estimation.
5. Disadvantage: suited only to small datasets (computationally expensive on large ones).
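The kernelling idea can be illustrated without a full SVM solver: the toy points below are not separable by a threshold in one dimension, but become separable after the (assumed) feature map φ(x) = (x, x²):

```python
# Toy illustration of kernelling: class 1 sits between the class 0 points,
# so no single threshold on x separates them, but after the feature map
# phi(x) = (x, x**2) the classes split cleanly on the x**2 coordinate.
def phi(x):
    return (x, x * x)

points = [(-3, 0), (-2, 0), (-0.5, 1), (0.5, 1), (2, 0), (3, 0)]

def separable_by_threshold(values_and_labels):
    # Simplified check: True if some threshold puts all of class 1 below it
    # and all of class 0 above it.
    ones = [v for v, y in values_and_labels if y == 1]
    zeros = [v for v, y in values_and_labels if y == 0]
    return max(ones) < min(zeros)

raw = [(x, y) for x, y in points]
mapped = [(phi(x)[1], y) for x, y in points]   # use the x**2 coordinate
```

This is only a sketch of the mapping step; a real SVM additionally finds the maximum-margin separator in the mapped space.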
- Confusion Matrix: used to compute metrics such as the F-score; each cell of the matrix counts correct and wrong predictions (true/false positives and negatives). An F-score closer to 1 indicates higher accuracy.
- Log Loss: used when the classifier outputs a probability between 0 and 1 for a class label instead of the label itself. A value closer to 0 indicates better accuracy.
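Both metrics can be computed directly for binary labels (a minimal sketch):

```python
from math import log

# Confusion-matrix counts, F1-score, and log loss for binary classification.
def f1_score(y_true, y_pred):
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

def log_loss(y_true, probs):
    # average negative log-likelihood of the true labels
    return -sum(log(p) if t == 1 else log(1 - p)
                for t, p in zip(y_true, probs)) / len(y_true)
```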
CLUSTERING
Dividing the population or data points into groups such that data points in the same group are more similar to each other than to data points in other groups. Clustering is an unsupervised learning process and is used for exploratory data analysis, summary generation, outlier detection, finding duplicates, as a pre-processing step, etc.
- K-means Algorithm: a partitioning clustering method that divides the data into non-overlapping subsets without any cluster-internal structure. Examples within a cluster are very similar; examples across different clusters are very different. K-means works well for medium and large databases, produces sphere-like clusters, and needs the number of clusters to be specified in advance. It tries to optimize:
Intra-cluster distances: distances between examples inside the same cluster (minimized).
Inter-cluster distances: distances between different clusters (maximized).
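The assign-and-update loop of K-means can be sketched in one dimension (toy data and starting centroids chosen for illustration):

```python
# Minimal 1-D k-means sketch: assign each point to its nearest centroid,
# then move each centroid to the mean of its assigned points; repeat.
def k_means(points, centroids, iterations=10):
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:                          # assignment step
            nearest = min(range(len(centroids)),
                          key=lambda i: abs(p - centroids[i]))
            clusters[nearest].append(p)
        centroids = [sum(c) / len(c) if c else centroids[i]
                     for i, c in enumerate(clusters)]   # update step
    return centroids, clusters

points = [1.0, 1.2, 0.8, 10.0, 10.2, 9.8]
centroids, clusters = k_means(points, centroids=[0.0, 5.0])
```

On this data the centroids converge to the two group means, 1.0 and 10.0.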
- Hierarchical Clustering: builds a hierarchy of clusters where each node is a cluster consisting of the clusters of its daughter nodes. Building the hierarchy bottom-up is the agglomerative approach; splitting it top-down is the divisive approach. The resulting hierarchy is visualized as a dendrogram. The agglomerative steps are:
1. Create n clusters, one for each data point.
2. Compute the proximity matrix.
3. Merge the two closest clusters and update the proximity matrix.
4. Repeat step 3 until only a single cluster remains.
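The agglomerative (bottom-up) approach can be sketched for 1-D points with single linkage (the target number of clusters is an illustrative choice):

```python
# Tiny agglomerative clustering sketch (1-D points, single linkage):
# repeatedly merge the two closest clusters until `k` clusters remain.
def agglomerate(points, k):
    clusters = [[p] for p in points]          # start: one cluster per point
    while len(clusters) > k:
        # find the pair of clusters with the smallest minimum distance
        best = min(((i, j) for i in range(len(clusters))
                    for j in range(i + 1, len(clusters))),
                   key=lambda ij: min(abs(a - b)
                                      for a in clusters[ij[0]]
                                      for b in clusters[ij[1]]))
        i, j = best
        clusters[i] += clusters.pop(j)        # merge the closest pair
    return clusters

clusters = agglomerate([1.0, 1.1, 5.0, 5.1, 9.0], k=3)
```

Stopping the merging early (here at k = 3) corresponds to cutting the dendrogram at a chosen height.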
- Density-Based Clustering: algorithms useful for locating regions of high density and separating out outliers. One of the most important is DBSCAN (Density-Based Spatial Clustering of Applications with Noise), which works based on the density of objects. Each point is labelled as core, border, or outlier. It is based on 2 parameters:
1. Radius of neighborhood: a radius that, if it includes enough points within it, defines a dense area.
2. Min number of neighbors: the minimum number of data points we want in a neighborhood to define a cluster.
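The core/border/outlier labelling can be sketched for 1-D points (the eps and min_pts values here are arbitrary illustrations):

```python
# Label each 1-D point following DBSCAN's rules: a core point has at least
# min_pts points within eps (counting itself); a border point is within eps
# of a core point; everything else is an outlier.
def label_points(points, eps, min_pts):
    neighbors = {p: [q for q in points if abs(p - q) <= eps] for p in points}
    labels = {}
    for p in points:
        if len(neighbors[p]) >= min_pts:
            labels[p] = "core"
    for p in points:
        if p not in labels:
            if any(labels.get(q) == "core" for q in neighbors[p]):
                labels[p] = "border"
            else:
                labels[p] = "outlier"
    return labels

labels = label_points([1.0, 1.2, 1.4, 1.8, 9.0], eps=0.5, min_pts=3)
```

Full DBSCAN then connects core points into clusters; this sketch only shows the point-labelling stage.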
Advantages of DBSCAN:
1. Finds arbitrarily shaped clusters.
2. Robust to outliers (noise points are labelled rather than forced into a cluster).
3. Does not require the number of clusters to be specified in advance.
RECOMMENDER SYSTEMS
A recommender system captures patterns in people's behavior and uses them to predict what else they might want or like. Applications include what to buy, where to eat, which jobs to apply for, who you should be friends with, and personalizing your experience on the web. The advantages are broader exposure, the possibility of continued usage or purchase of products, and a better user experience.
- Memory-Based: uses the entire user-item dataset to generate a recommendation, relying on statistical techniques to find similar users or items (Pearson correlation, cosine similarity, Euclidean distance, etc.).
- Model-Based: develops a model of users to learn their preferences; models can be created with machine learning techniques such as regression, clustering, or classification.
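Two of the similarity measures named above, sketched for equal-length rating vectors:

```python
from math import sqrt

# Cosine similarity and Pearson correlation between two rating vectors.
def cosine_similarity(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v)))

def pearson_correlation(u, v):
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    du = [a - mu for a in u]
    dv = [b - mv for b in v]
    return cosine_similarity(du, dv)   # Pearson = cosine of mean-centered vectors
```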
- Collaborative Filtering: based on the fact that relationships exist between products and people's interests. These algorithms have 2 different approaches:
1. User-based collaborative filtering: based on the similarity of a user to their neighbors (users with similar rating patterns).
2. Item-based collaborative filtering: based on the similarity between items' rating patterns.
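A minimal user-based collaborative filtering sketch: a user's rating for an unseen item is predicted as the similarity-weighted average of other users' ratings (the tiny ratings table is made up for illustration):

```python
# User-based collaborative filtering on a toy dict-of-dicts ratings table.
def cosine(u, v):
    pairs = [(u[i], v[i]) for i in u if i in v]   # items rated by both users
    dot = sum(a * b for a, b in pairs)
    nu = sum(a * a for a, _ in pairs) ** 0.5
    nv = sum(b * b for _, b in pairs) ** 0.5
    return dot / (nu * nv) if nu and nv else 0.0

def predict(ratings, user, item):
    # similarity-weighted average of the neighbors' ratings for `item`
    num = den = 0.0
    for other, theirs in ratings.items():
        if other == user or item not in theirs:
            continue
        sim = cosine(ratings[user], theirs)
        num += sim * theirs[item]
        den += sim
    return num / den if den else 0.0

ratings = {
    "ann": {"m1": 5, "m2": 4},
    "bob": {"m1": 5, "m2": 4, "m3": 5},
    "eve": {"m1": 1, "m2": 2, "m3": 1},
}
score = predict(ratings, "ann", "m3")
```

Because ann's ratings agree with bob's more than with eve's, the prediction for "m3" is pulled toward bob's rating.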
REINFORCEMENT LEARNING
“A subfield of computer science that studies how, given a set of rewards or punishments, an agent learns what actions to take in the future.”
Agent: Entity that perceives its environment and acts upon that environment. (Ex: a person trying to solve a
puzzle).
State: A configuration of the agent and its environment. (Ex: a configuration of the puzzle’s pieces to solve).
Action: Choices that can be made in a state (Ex: The actions taken by the agent to solve the puzzle in a
specific state).
Environment or Transition Model: the environment where the agent takes its actions, and how the state changes in response.
Reward: a numerical value representing whether the action taken was positive or negative.
Exploitation: using the knowledge of actions that the AI already has.
Exploration: trying other actions that the AI may not have explored before.
Markov Decision Process: a model for decision-making that represents states, actions, and their rewards:
- Set of states S
- Set of actions ACTIONS(s) available in each state
- Transition model P(s’|s, a)
- Reward function R(s, a, s’)
Q-Learning: a method for learning a function Q(s, a), an estimate of the value of performing action a in state s.
Pseudocode of Q-Learning:
Start with Q(s, a) = 0 for all s, a
Every time the agent takes an action and receives a reward:
Estimate the value of Q(s, a) based on the current reward and expected future rewards
Update Q(s, a) to take into account the old estimate as well as the new estimate
The update rule is Q(s, a) ← Q(s, a) + α · ((r + estimate of future rewards) − Q(s, a)), where:
r: reward
s: state
a: action
α: learning rate -> how much new information is valued compared with old information
Greedy Decision-Making Policy: used with the Q-learning formula; when in state s, choose the action a with the highest Q(s, a).
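The pseudocode above can be sketched as a tabular Q-learner on a made-up five-state corridor (all states, rewards, and hyperparameters are illustrative assumptions):

```python
import random

# Tabular Q-learning on a toy corridor: states 0..4, actions -1 (left) and
# +1 (right), reward 1 for reaching state 4, 0 otherwise.
random.seed(0)
ALPHA, GAMMA, EPS = 0.5, 0.9, 0.1           # learning rate, discount, exploration
Q = {(s, a): 0.0 for s in range(5) for a in (-1, 1)}

def step(s, a):
    s2 = max(0, min(4, s + a))              # walls at both ends of the corridor
    return s2, (1.0 if s2 == 4 else 0.0), s2 == 4

for _ in range(200):                        # training episodes
    s = 0
    while True:
        if random.random() < EPS:           # epsilon-greedy action choice
            a = random.choice((-1, 1))
        else:
            a = max((1, -1), key=lambda act: Q[(s, act)])
        s2, r, done = step(s, a)
        best_next = 0.0 if done else max(Q[(s2, -1)], Q[(s2, 1)])
        # update: nudge the old estimate toward reward + discounted future value
        Q[(s, a)] += ALPHA * (r + GAMMA * best_next - Q[(s, a)])
        s = s2
        if done:
            break
```

After training, the greedy policy in every state is to move right, toward the reward.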
ε-Greedy: balances exploration and exploitation.
Pseudocode:
Set ε equal to how often we want to move randomly.
With probability 1 − ε, choose the estimated best move (exploitation).
With probability ε, choose a random move (exploration).
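The pseudocode maps directly to a small helper (a sketch; `q_values` is a hypothetical action-value lookup):

```python
import random

# epsilon-greedy selection exactly as the pseudocode above describes:
# with probability eps pick a random action, otherwise the best-valued one.
def epsilon_greedy(q_values, actions, eps):
    if random.random() < eps:
        return random.choice(actions)                    # explore
    return max(actions, key=lambda a: q_values[a])       # exploit

q = {"left": 0.1, "right": 0.9}
```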
Function Approximation: approximating Q(s, a), often by a function combining various features of the state and action, rather than storing one value for every state-action pair; useful when the state space is too large to enumerate.