Unit 1 Machine Learning
Unsupervised Learning:
The algorithm is given data without explicit instructions on what to do with it. The
system tries to learn the patterns and the structure from the data without labeled outputs.
Reinforcement Learning:
The algorithm learns by interacting with an environment. It receives feedback in the form
of rewards or penalties as it navigates through a problem space, allowing it to learn the
optimal behavior.
Machine learning has found applications in various domains. Here are some notable
examples:
Healthcare:
ML is applied to disease diagnosis, medical image analysis, and personalized treatment planning, learning patterns from patient and clinical data.
Finance:
ML is utilized for fraud detection, credit scoring, and stock market predictions.
Algorithms analyze financial data to make predictions and optimize decision-making
processes.
Autonomous Vehicles:
ML models process sensor data to perceive the surroundings, detect obstacles, and support driving decisions.
Recommendation Systems:
Platforms use ML to analyze user behavior and suggest relevant products, movies, or other content.
Machine learning algorithms can be classified based on various criteria. Here are two
primary classifications:
By Learning Style:
Supervised, unsupervised, and reinforcement learning, as described above.
By Task:
Classification: Predicting the category or class to which a new data point belongs.
Regression: Predicting a continuous value.
Clustering: Grouping similar data points based on their characteristics.
Dimensionality Reduction: Reducing the number of features in a dataset while
preserving its essential information.
Developing a machine learning model involves several key steps. Here's a step-by-step
procedure that we follow:
Define the Problem:
Clearly define the problem you want to solve. Understand the goal and the expected output of the machine learning model.
Gather Data:
Collect relevant data for your problem. Ensure that your dataset is representative,
comprehensive, and free from biases. Consider the quality and quantity of data available.
Explore and Preprocess the Data:
Explore the dataset to understand its characteristics and to identify missing values, outliers, and potential features. Handle missing data and outliers appropriately. Convert categorical variables into a suitable format if needed (e.g., one-hot encoding), and split the dataset into training and testing sets.
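A brief sketch of this preprocessing step (the column names and values below are invented purely for illustration), using pandas and scikit-learn:

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical dataset with one categorical feature ("city") and a numeric target ("price")
df = pd.DataFrame({
    "city": ["Pune", "Mumbai", "Pune", "Delhi", "Mumbai", "Delhi"],
    "area_sqft": [850, 1200, 640, 980, 1100, 720],
    "price": [75, 160, 58, 110, 150, 80],
})

# Convert the categorical column into one-hot (dummy) columns
df_encoded = pd.get_dummies(df, columns=["city"])

# Split features and target into training and testing sets
X = df_encoded.drop(columns=["price"])
y = df_encoded["price"]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=42)
print(X_train.shape, X_test.shape)
```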
Feature Engineering:
Create new features or transform existing ones to enhance the model's performance.
Feature engineering involves selecting, modifying, or creating features that can improve
the model's ability to make accurate predictions.
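As a small sketch of feature engineering (the housing-style columns here are assumptions made up for illustration), a new feature can be derived from existing ones:

```python
import pandas as pd

# Hypothetical housing data
df = pd.DataFrame({
    "area_sqft": [850, 1200, 640, 980],
    "rooms": [2, 3, 2, 3],
    "price": [75, 160, 58, 110],
})

# Create a new feature, area per room, which may be more predictive than either column alone
df["area_per_room"] = df["area_sqft"] / df["rooms"]
print(df)
```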
Select a Model:
Choose a machine learning algorithm that is suitable for your problem. The choice of
algorithm depends on the nature of the problem (classification, regression, clustering) and
the characteristics of your data.
Train the Model:
Use the training set to train your machine learning model. The model learns the patterns in the data and adjusts its parameters to make accurate predictions.
Evaluate the Model:
Assess the model's performance using the testing set. Common evaluation metrics vary based on the type of problem (e.g., accuracy, precision, recall, and F1-score for classification; mean squared error for regression). Choose metrics relevant to your specific problem.
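As an illustrative sketch (the true and predicted values below are made up), these metrics can be computed with scikit-learn:

```python
from sklearn.metrics import (accuracy_score, precision_score, recall_score,
                             f1_score, mean_squared_error)

# Classification metrics on assumed labels
y_true = [1, 0, 1, 1, 0, 1]
y_pred = [1, 0, 0, 1, 0, 1]
print("Accuracy :", accuracy_score(y_true, y_pred))
print("Precision:", precision_score(y_true, y_pred))
print("Recall   :", recall_score(y_true, y_pred))
print("F1-score :", f1_score(y_true, y_pred))

# Regression metric on assumed values
y_true_reg = [2.5, 0.0, 2.1, 7.8]
y_pred_reg = [3.0, -0.1, 2.0, 8.0]
print("MSE      :", mean_squared_error(y_true_reg, y_pred_reg))
```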
Hyperparameter Tuning:
Fine-tune the hyperparameters of your model to improve its performance. This may
involve using techniques like grid search or random search.
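A minimal sketch of grid search with scikit-learn (the choice of a random forest, the parameter grid, and the toy dataset are assumptions chosen only for illustration):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# Toy classification dataset
X, y = make_classification(n_samples=200, n_features=10, random_state=42)

# Candidate hyperparameter values to try
param_grid = {"n_estimators": [50, 100], "max_depth": [3, 5, None]}

# Exhaustively evaluate each combination with 3-fold cross-validation
search = GridSearchCV(RandomForestClassifier(random_state=42), param_grid, cv=3)
search.fit(X, y)
print("Best parameters:", search.best_params_)
print("Best CV score  :", search.best_score_)
```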
Interpret the Results:
Depending on the type of model used, try to interpret the results. Some models, like decision trees or linear regression, offer insights into feature importance.
Monitor and Maintain:
Regularly monitor the model's performance in a real-world setting. Retrain the model periodically with new data to maintain its accuracy over time.
Iterate and Improve:
Machine learning is an iterative process. Use feedback from the model's performance in a real-world setting to make improvements. Revisit any of the previous steps if necessary.
By following these steps, you can systematically develop and deploy a machine learning
model for various applications. Keep in mind that the specific details of each step may
vary depending on the nature of your problem and the characteristics of your data.
Linear regression
Linear regression is a statistical method used in machine learning to model the
relationship between a dependent variable (target) and one or more independent variables
(features). The goal is to find the best-fit line that minimizes the difference between the
predicted and actual values of the dependent variable. This line is called the "regression
line" or "best-fit line."
The equation for a simple linear regression with one independent variable can be
represented as:
y = mx + b
Here:
y is the dependent variable (target),
x is the independent variable (feature),
m is the slope of the line, and
b is the y-intercept.
In a machine learning context, the values of m and b are learned from the training data to
make accurate predictions on new, unseen data.
Let's go through a simple example using Python and the popular library, scikit-learn:
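A minimal sketch of such an example (the underlying line y = 4 + 3x, the noise level, and the plot styling are assumptions made here for illustration):

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression

# Generate random data points from a linear equation with added noise
rng = np.random.RandomState(42)
X = 2 * rng.rand(100, 1)
y = 4 + 3 * X[:, 0] + rng.randn(100)

# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Create and train a Linear Regression model
model = LinearRegression()
model.fit(X_train, y_train)

# Make predictions on the test data
y_pred = model.predict(X_test)

# Plot the test points and the learned regression line (green)
order = X_test[:, 0].argsort()
plt.scatter(X_test, y_test, label="Actual values")
plt.plot(X_test[order], y_pred[order], color="green", label="Regression line")
plt.xlabel("x")
plt.ylabel("y")
plt.legend()
plt.show()

print("Learned slope (m):", model.coef_[0])
print("Learned intercept (b):", model.intercept_)
```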
In this example:
We generate random data points using a linear equation with some added noise.
Split the data into training and testing sets.
Create a Linear Regression model using scikit-learn.
Train the model on the training data.
Make predictions on the test data and plot the regression line.
The green line in the plot represents the learned regression line. The model aims to
minimize the difference between the predicted and actual values, optimizing the
parameters m and b to best fit the training data. This learned line can then be used to
make predictions on new data.
Cost function
In linear regression, the cost function, also known as the loss function or error function, is
a measure of how well the model's predictions match the actual target values. The goal of
linear regression is to find the best-fitting line (or hyperplane in higher dimensions) that
minimizes this cost function. The most commonly used cost function in linear regression
is the Mean Squared Error (MSE) function.
Mean Squared Error (MSE): The MSE is calculated by taking the average of the
squared differences between the predicted values and the actual target values.
Mathematically, for a dataset with m examples:
MSE = (1/m) · Σᵢ (yᵢ − ŷᵢ)², with the sum taken over i = 1, …, m,
where yᵢ is the actual target value and ŷᵢ is the predicted value for the i-th example.
The goal is to minimize this value, which means finding the parameters (slope and
intercept in simple linear regression) that result in the smallest MSE.
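For instance, a direct NumPy computation of this quantity (the sample values below are arbitrary) looks like:

```python
import numpy as np

def mse(y_true, y_pred):
    """Mean Squared Error: the average squared difference between actual and predicted values."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    return np.mean((y_true - y_pred) ** 2)

# (0.5^2 + 0.5^2 + 1.0^2) / 3 = 0.5
print(mse([3.0, 5.0, 7.0], [2.5, 5.5, 6.0]))
```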
Optimization: To find the parameters that minimize the cost function (MSE),
optimization algorithms such as Gradient Descent are commonly used. Gradient Descent
iteratively adjusts the parameters in the direction that reduces the cost function until
convergence is reached, i.e., until further adjustments do not significantly decrease the
cost.
Other Cost Functions: While MSE is the most commonly used cost function in linear
regression, other cost functions such as Mean Absolute Error (MAE) or Huber loss can
also be used depending on the specific requirements of the problem.
Overall, the cost function in linear regression serves as a quantitative measure of how
well the model is performing, and the aim is to minimize this cost to obtain the best-
fitting line or hyperplane.
Gradient Descent
Gradient Descent is an optimization algorithm commonly used in machine learning to
minimize a cost function. In the context of linear regression, Gradient Descent is used to
find the optimal parameters (coefficients) for the linear model that minimize the cost
function, such as the Mean Squared Error (MSE).
Initialization: Start by initializing the parameters (coefficients) of the linear regression model with random values or zeros.
Compute Gradient: At each iteration, you compute the gradient of the cost function with
respect to each parameter. The gradient indicates the direction of steepest increase of the
cost function. In the case of linear regression with MSE, the gradient with respect to each
parameter (slope and intercept) can be computed analytically using calculus.
Update Parameters: Once you have the gradient, you update the parameters by taking a
small step (determined by a parameter called the learning rate) in the opposite direction
of the gradient. This step is performed to minimize the cost function. The update rule for
each parameter θ is:
θ := θ − α · ∇J(θ)
Where:
α is the learning rate, a hyperparameter that determines the size of the step taken during each iteration.
∇J(θ) is the gradient of the cost function J with respect to the parameter θ.
Repeat: Steps 2 and 3 are repeated iteratively until convergence is reached. Convergence
is typically determined when the change in the cost function between iterations is very
small, or when a maximum number of iterations is reached.
Gradient Descent allows the linear regression model to iteratively adjust its parameters in
the direction that minimizes the cost function, eventually leading to optimal parameter
values that result in the best-fitting line or hyperplane for the given dataset.
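A compact sketch of these steps for simple linear regression (the learning rate, iteration count, and synthetic data are assumptions chosen for illustration):

```python
import numpy as np

def gradient_descent(x, y, learning_rate=0.1, n_iterations=1000):
    """Fit y ≈ m*x + b by minimizing the MSE with batch gradient descent."""
    m_coef, b = 0.0, 0.0                      # Step 1: initialize parameters
    n = len(y)
    for _ in range(n_iterations):             # Step 4: repeat until done
        y_pred = m_coef * x + b
        # Step 2: gradients of the MSE with respect to m and b
        grad_m = (-2.0 / n) * np.sum(x * (y - y_pred))
        grad_b = (-2.0 / n) * np.sum(y - y_pred)
        # Step 3: move each parameter against its gradient, scaled by the learning rate
        m_coef -= learning_rate * grad_m
        b -= learning_rate * grad_b
    return m_coef, b

# Noisy data generated around y = 3x + 4
rng = np.random.RandomState(0)
x = 2 * rng.rand(200)
y = 4 + 3 * x + 0.5 * rng.randn(200)
m_hat, b_hat = gradient_descent(x, y)
print(f"Estimated slope: {m_hat:.2f}, intercept: {b_hat:.2f}")
```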
Logistic regression
Logistic regression is a supervised learning algorithm for binary classification. It models the probability that an input belongs to a particular class by first computing a linear combination of the input features:
z = β0 + β1x1 + β2x2 + … + βnxn
where β0, β1, …, βn are the coefficients (parameters) to be learned, and x1, x2, …, xn are the input features.
This linear combination is passed through the sigmoid (logistic) function to produce a probability:
p̂ = σ(z) = 1 / (1 + e^(−z))
The predicted probability p̂ can then be thresholded at 0.5 (or any other threshold) to make binary predictions.
Overall, logistic regression is a simple yet powerful algorithm for binary classification
tasks, particularly when the relationship between the input features and the target variable
is linear or can be reasonably approximated as such.
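A minimal sketch with scikit-learn's LogisticRegression (the synthetic dataset and split proportion are assumptions):

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

# Toy binary classification dataset
X, y = make_classification(n_samples=300, n_features=4, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=42)

# Fit logistic regression: learns the coefficients of z and applies the sigmoid
clf = LogisticRegression()
clf.fit(X_train, y_train)

# predict_proba returns p̂; predict() thresholds it at 0.5
probs = clf.predict_proba(X_test)[:, 1]
preds = clf.predict(X_test)
print("Accuracy:", accuracy_score(y_test, preds))
```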
Gaussian function
In machine learning, the Gaussian function often refers to the Gaussian distribution, also
known as the normal distribution. It's a type of probability distribution that is commonly
used in various statistical models and machine learning algorithms due to its
mathematical properties and prevalence in natural phenomena.
The Gaussian function is defined by its probability density function (PDF), which takes
the form:
f(x | μ, σ²) = (1 / √(2πσ²)) · exp(−(x − μ)² / (2σ²))
Where:
μ is the mean of the distribution,
σ² is the variance (σ is the standard deviation), and
x is the value at which the density is evaluated.
The Gaussian function describes a symmetric "bell-shaped" curve centered around the mean μ. The spread of the curve is determined by the variance σ², where larger values of σ² result in wider curves, and smaller values result in narrower curves.
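A small sketch evaluating this density directly in NumPy (the sample points and parameters below are arbitrary):

```python
import numpy as np

def gaussian_pdf(x, mu=0.0, sigma2=1.0):
    """Probability density of a Gaussian with mean mu and variance sigma2."""
    return (1.0 / np.sqrt(2 * np.pi * sigma2)) * np.exp(-((x - mu) ** 2) / (2 * sigma2))

# Density of the standard normal (mu = 0, sigma^2 = 1) at a few points:
# the values peak at x = mu and are symmetric about the mean
x = np.array([-2.0, -1.0, 0.0, 1.0, 2.0])
print(gaussian_pdf(x))
```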
In machine learning, the Gaussian function is often used in various contexts, including:
1. Probability Density Estimation: Gaussian distributions are frequently used to
model the underlying probability distributions of data, especially when the data is
continuous and assumes a symmetric, bell-shaped form.
2. Gaussian Mixture Models (GMMs): GMMs are probabilistic models that represent the distribution of data as a mixture of several Gaussian distributions. They are commonly used for clustering and density estimation tasks (see the short sketch after this list).
3. Kernel Density Estimation (KDE): KDE is a non-parametric method used for
estimating the probability density function of a random variable. Gaussian kernels
are often employed in KDE to smooth the estimated density function.
4. Bayesian Inference: Gaussian distributions are often used as prior distributions in
Bayesian inference due to their mathematical tractability and conjugate properties
with certain likelihood functions.
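As a brief sketch of the GMM idea from item 2 (the two synthetic clusters of one-dimensional points are assumptions made for illustration):

```python
import numpy as np
from sklearn.mixture import GaussianMixture

# Two groups of 1-D points drawn from different normal distributions
rng = np.random.RandomState(0)
data = np.concatenate([rng.normal(-3, 1, 200), rng.normal(4, 1.5, 200)]).reshape(-1, 1)

# Fit a mixture of two Gaussians and inspect the recovered components
gmm = GaussianMixture(n_components=2, random_state=0).fit(data)
print("Means     :", gmm.means_.ravel())
print("Variances :", gmm.covariances_.ravel())
print("Weights   :", gmm.weights_)
```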
Overall, the Gaussian function plays a fundamental role in machine learning, providing a
mathematical framework for understanding and modeling uncertainty in data, as well as
forming the basis for many algorithms and statistical techniques.