The document provides an overview of machine learning algorithms, focusing on mathematical principles, linear regression, logistic regression, cost functions, and gradient descent. It introduces the perceptron model as a fundamental building block of neural networks, explaining its components, learning process, and types of perceptron models. Additionally, it discusses the advantages and disadvantages of single-layer and multi-layer perceptrons in solving complex problems.
Fundamental Mathematical Formulae

Learning outcomes
By the end of this session, you should be able to:
Describe some of the mathematical equations implemented in machine learning algorithms that support the learning process.
Demonstrate knowledge and understanding of how machine learning algorithms function.

Introduction
As a subfield of artificial intelligence, machine learning relies heavily on mathematical principles and formulas to create models that can learn and make predictions from data.
Understanding the underlying mathematical concepts is crucial for practitioners to effectively apply machine learning algorithms. In this article, let's explore some common mathematical formulas used in machine learning.

Linear Regression
Linear regression is used to predict the outcome of a continuous variable by fitting the best line on the data points. The best-fitted line defines a relationship between the dependent and the independent variable(s). The algorithm tries to find the best-fitted line for predicting the value of the target variable. The best-fit line is attained by minimizing the sum of the squared differences between the data points and the regression line.

The formula for linear regression (with n input features) can be expressed as:
y = β₀ + β₁x₁ + β₂x₂ + … + βₙxₙ + ɛ
Where:
y is the target variable (dependent variable)
x₁, x₂, …, xₙ are the input features (independent variables)
β₀, β₁, β₂, …, βₙ are the coefficients to be learned (intercept and slopes)
ɛ represents the error term

Logistic Regression
Logistic regression is a classification algorithm used to estimate the outcome of a categorical variable based on the independent variables. It predicts the probability of an event occurring by fitting the data to a logistic function. The coefficients of the independent variables in the logistic function are optimized by maximizing the likelihood function. A decision boundary is chosen such that the cost function is minimal; the cost function can be minimized using gradient descent.

Logistic regression is widely used for classification tasks. It is useful when the dependent variable is binary in nature, and it is usually associated with examining the association of independent variables with one dichotomous dependent variable. By contrast, linear regression is used when the dependent variable is continuous and the line of regression is linear.

The logistic function, or sigmoid function, plays a pivotal role in this algorithm. It is defined as:

σ(z) = 1 / (1 + e^(-z))

Where:
z represents a linear combination of the input features and their corresponding coefficients.
The logistic function maps the linear output to a value between 0 and 1, allowing us to interpret it as a probability.

Cost Functions
Cost functions quantify the error or discrepancy between the predicted values and the actual values.

1. Mean Squared Error (MSE) is a commonly used cost function for regression problems, defined as:

MSE = (1 / N) * ∑(yᵢ - ŷᵢ)²

Where:
N is the number of samples
yᵢ is the actual value
ŷᵢ is the predicted value

For classification problems:

2.
Cross-Entropy Loss is often employed, given by:

CE = -∑(yᵢ * log(pᵢ) + (1 - yᵢ) * log(1 - pᵢ))

Where:
yᵢ is the actual label (0 or 1)
pᵢ is the predicted probability

Gradient Descent
Gradient descent is an optimization algorithm used to minimize the cost function and find the optimal values of the coefficients. The update rule for gradient descent can be expressed as:

θ = θ - α * ∇J(θ)

Where:
θ represents the coefficients
α is the learning rate
J(θ) is the cost function
∇J(θ) is the gradient of the cost function with respect to θ

Gradient Descent Example
Train the model to find the line of best fit (500 iterations in total):
y = x * 1.28 + 1.02, Cost: 123.53
after 100 iterations: y = x * 1.39 + 1.03, Cost: 96.20

Gradient descent is a popular algorithm for solving AI problems, and a simple linear regression model can be used to demonstrate it. The goal of linear regression is to fit a linear graph to a set of (x, y) points. Though this can be solved with a closed-form mathematical formula, a machine learning algorithm can also solve it; this is what the example above does. It starts with a scatter plot and a linear model (y = wx + b), then trains the model to find a line that fits the plot. This is done by altering the weight (slope) and the bias (intercept) of the line.

Machine Learning Algorithms: Perceptron
A perceptron is an artificial neuron and the building block of an artificial neural network. It is inspired by the function of a biological neuron.
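The gradient-descent line-fitting described above can be sketched in plain Python. This is a minimal illustration: the data points, learning rate, and iteration count below are assumptions for the sketch, not the values behind the costs quoted in the example.

```python
# Gradient descent for simple linear regression y = w*x + b,
# minimizing the Mean Squared Error cost.

def gradient_descent(xs, ys, lr=0.01, epochs=5000):
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        # Gradients of MSE with respect to w and b
        dw = (-2 / n) * sum(x * (y - (w * x + b)) for x, y in zip(xs, ys))
        db = (-2 / n) * sum(y - (w * x + b) for x, y in zip(xs, ys))
        w -= lr * dw   # theta = theta - alpha * gradient
        b -= lr * db
    return w, b

def mse(xs, ys, w, b):
    # Mean Squared Error cost for the current line
    return sum((y - (w * x + b)) ** 2 for x, y in zip(xs, ys)) / len(xs)

# Illustrative data lying close to y = 1.3*x + 1.0 (assumed, not from the slide)
xs = [0, 1, 2, 3, 4, 5]
ys = [1.0, 2.3, 3.7, 4.9, 6.2, 7.5]

w, b = gradient_descent(xs, ys)
print(f"w={w:.2f}, b={b:.2f}, cost={mse(xs, ys, w, b):.4f}")
```

Each iteration nudges the weight and bias a small step in the direction that reduces the cost, which is exactly the "alter the slope and intercept" loop the example describes.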
It is the simplest possible Neural Network.
The perceptron is a linear machine learning algorithm used for supervised learning of binary classifiers.
In machine learning and artificial intelligence, the perceptron is one of the most commonly encountered terms.

Perceptron - History
Frank Rosenblatt (1928-1971) invented the perceptron program, simulated on an IBM 704 computer at the Cornell Aeronautical Laboratory. Scientists had discovered that brain cells (neurons) receive input from our senses via electrical signals. The neurons, in turn, use electrical signals to store information and to make decisions based on previous input. Perceptrons are designed to simulate these brain principles, with the ability to learn and make decisions.
The perceptron is a primary step in learning machine learning and deep learning technologies, and it consists of a set of weights, input values (or scores), and a threshold.

Perceptron Model
The perceptron is also understood as an artificial neuron, or neural network unit, that helps detect certain computations on input data, for example in business intelligence.
In machine learning, a binary classifier is a function that decides whether input data, represented as a vector of numbers, belongs to some specific class. Binary classifiers can be considered linear classifiers.

Perceptron - Examples
Imagine a perceptron in your brain and how it helps you make a decision: Will I go to the concert?
Question: Will the perceptron fire, given a threshold of 1.5?
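The question above can be checked with a few lines of Python. The slide's actual inputs and weights are not reproduced here, so the values below are illustrative assumptions; only the threshold of 1.5 comes from the text.

```python
# Hypothetical perceptron for "Will I go to the concert?"
# Each input is 1 (yes) or 0 (no); each weight says how much that factor matters.
inputs = [1, 0, 1]          # e.g. good weather? friend coming? music I like? (assumed)
weights = [0.7, 0.6, 0.5]   # illustrative weights, not from the slide
threshold = 1.5

weighted_sum = sum(x * w for x, w in zip(inputs, weights))
fires = weighted_sum > threshold
print(weighted_sum, fires)  # 1.2 False: the perceptron does not fire
```

With these assumed values the weighted sum (1.2) stays below the threshold (1.5), so the perceptron does not fire and the answer is "no".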
Basic Components of Perceptron
The perceptron model contains three main components:
1. Input Nodes or Input Layer
This is the primary component of the perceptron, which accepts the initial data into the system for further processing. Each input node contains a real numerical value.

2. Weight and Bias
The weight parameter represents the strength of the connection between units. Weight is directly proportional to the strength of the associated input neuron in deciding the output: a higher value means that the input has a stronger influence on the output. Bias can be considered as the intercept in a linear equation.
Example on bias: sometimes, if both inputs are zero, the perceptron might produce an incorrect output. To avoid this, we give the perceptron an extra input with the value of 1. This is called a bias.

3. Activation Function
This component helps determine whether the neuron will fire or not. The activation function can be considered primarily as a step function, and it is typically accompanied by a threshold value. If the result of the activation function exceeds the threshold, the perceptron fires (outputs 1); otherwise it remains inactive (outputs 0).
Note: During training, the perceptron guesses the outcome based on the activation function. Every time the guess is wrong, the perceptron adjusts the weights. After many guesses and adjustments, the weights will be correct.

Backpropagation:
• After each guess, the perceptron calculates how wrong the guess was.
• If the guess is wrong, the perceptron adjusts the bias and the weights so that the guess will be a little more correct the next time.
• This type of learning is called backpropagation.
• After trying (a few thousand times) the perceptron will become quite good at guessing.

There are three types of activation functions:
• Sign function
• Step function
• Sigmoid function

How the Perceptron Works
In machine learning, the perceptron is considered a single-layer neural network that consists of four main parameters:
1. input values (input nodes)
2. weights and bias
3. net sum
4. an activation function

As shown in the previous diagram, the perceptron model works in two important steps, discussed next.

Step 1
First, multiply all input values by their corresponding weight values and then add them to determine the weighted sum. Mathematically, we can calculate the weighted sum as follows:

∑wi*xi = x1*w1 + x2*w2 + … + xn*wn

Next, add a special term called the bias 'b' to this weighted sum to improve the model's performance:

∑wi*xi + b

Step 2
The weighted sum is applied to the activation function 'f' to obtain the desired output as follows:
Y = f(∑wi*xi + b)
The output is either a binary value or a continuous value.
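The two steps above (weighted sum plus bias, then activation) can be sketched directly. The sample inputs, weights, and bias below are illustrative assumptions; the three activation functions are the ones listed earlier.

```python
import math

# Step 1: weighted sum of inputs plus bias
def weighted_sum(xs, ws, b):
    return sum(x * w for x, w in zip(xs, ws)) + b

# Step 2: three common choices for the activation function f
def step(z):      # binary output: 1 if z >= 0, else 0
    return 1 if z >= 0 else 0

def sign(z):      # binary output as +1 / -1
    return 1 if z >= 0 else -1

def sigmoid(z):   # continuous output in (0, 1)
    return 1 / (1 + math.exp(-z))

# Illustrative perceptron: Y = f(sum(wi*xi) + b)
xs, ws, b = [1.0, 0.0, 1.0], [0.5, -0.4, 0.3], -0.2
z = weighted_sum(xs, ws, b)   # 0.5 + 0.0 + 0.3 - 0.2 = 0.6
print(step(z), sign(z), round(sigmoid(z), 3))
```

The step and sign functions produce the binary form of the output, while the sigmoid produces the continuous value mentioned above.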
Perceptron Algorithm - Frank Rosenblatt
Frank Rosenblatt suggested the following algorithm:
1. Set a threshold value
2. Multiply all inputs by their weights
3. Sum all the results
4. Activate the output

Perceptron Learning
The perceptron can learn from examples through a process called training. The learning process presents the perceptron with labeled examples, where the desired output is known. The perceptron compares its output with the desired output and adjusts its weights accordingly, aiming to minimize the error between the predicted and desired outputs. This is typically done using a learning algorithm such as the perceptron learning rule or a backpropagation algorithm. The learning process allows the perceptron to learn the weights that enable it to make accurate predictions for new, unseen inputs.

Types of Perceptron Models
A decision cannot be made by one neuron alone, so other neurons provide more input, and multi-layer perceptrons can be used for more sophisticated decision making. Although perceptrons are limited to learning linearly separable patterns, this limitation is overcome by stacking multiple perceptrons together in layers and incorporating non-linear activation functions. Neural networks built this way can learn more complex patterns.
Perceptron models are divided into two types based on their layers:
1. Single-layer Perceptron Model
2. Multi-layer Perceptron Model

1. Single-Layer Perceptron Model
This is the simplest type of artificial neural network (ANN). It consists of a feed-forward network and includes a threshold transfer function inside the model. The main objective of the single-layer perceptron model is to analyze linearly separable objects with binary outcomes. A single-layer perceptron model has no recorded data to start from, so it begins with randomly allocated values for the weight parameters.
Note that this model produces some discrepancies when multiple weighted input values are fed into it. Hence, to reach the desired output and minimize errors, some changes to the weights are necessary. A single-layer perceptron can learn only linearly separable patterns.

2. Multi-Layer Perceptron Model
A multi-layer perceptron model consists of multiple artificial neural network layers in which the activation function does not remain linear, unlike in a single-layer perceptron model. A multi-layer perceptron model has greater processing power and can process linear and non-linear patterns. Further, it can also implement logic gates such as AND, OR, XOR, NAND, NOT, XNOR, and NOR.
It consists of at least three layers: an input layer, one or more hidden layers, and an output layer.
Like a single-layer perceptron model, a multi-layer perceptron model has the same basic structure but a greater number of hidden layers. The multi-layer perceptron model is trained with the backpropagation algorithm, which executes in two stages:
a. Forward Stage: Activation starts from the input layer in the forward stage and terminates at the output layer.
b. Backward Stage: In the backward stage, weight and bias values are modified according to the model's error. The error between the actual output and the desired output is propagated backward, starting at the output layer and ending at the input layer.

Advantages of the Multi-Layer Perceptron:
1. A multi-layer perceptron model can be used to solve complex non-linear problems.
2. It works well with both small and large input data.
3. It provides quick predictions after training.
4. It achieves a similar accuracy ratio with large as well as small data.

Disadvantages of the Multi-Layer Perceptron:
1. In a multi-layer perceptron, computations are difficult and time-consuming.
2. In a multi-layer perceptron, it is difficult to predict how much each independent variable affects the dependent variable.

Neural Networks
Perceptrons are often used as the building blocks for more complex neural networks, such as multi-layer perceptrons (MLPs) or deep neural networks (DNNs). By combining multiple perceptrons in layers and connecting them in a network structure, these models can learn and represent complex patterns and relationships in data, enabling tasks such as image recognition, natural language processing, and decision making.
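As a closing sketch, the training loop described earlier (guess, compare with the desired output, adjust the weights and bias) can be written out with the classic perceptron learning rule. Here it learns the AND function, a linearly separable pattern a single-layer perceptron can handle; the learning rate, epoch count, and zero-initialized weights are assumptions for the sketch.

```python
# Perceptron learning rule on the linearly separable AND function.
def train_perceptron(samples, lr=0.1, epochs=20):
    w, b = [0.0, 0.0], 0.0
    for _ in range(epochs):
        for xs, target in samples:
            # Guess using a step activation on the weighted sum plus bias
            guess = 1 if sum(x * wi for x, wi in zip(xs, w)) + b >= 0 else 0
            error = target - guess          # 0 when the guess was right
            # Adjust weights and bias in proportion to the error
            w = [wi + lr * error * x for wi, x in zip(w, xs)]
            b += lr * error
    return w, b

# Labeled examples: inputs and the desired AND output
and_samples = [([0, 0], 0), ([0, 1], 0), ([1, 0], 0), ([1, 1], 1)]
w, b = train_perceptron(and_samples)

predict = lambda xs: 1 if sum(x * wi for x, wi in zip(xs, w)) + b >= 0 else 0
print([predict(xs) for xs, _ in and_samples])   # [0, 0, 0, 1]
```

Because AND is linearly separable, the rule converges to weights that classify all four cases correctly; for a non-separable pattern such as XOR, this single-layer loop would never converge, which is exactly the limitation the multi-layer perceptron removes.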