Machine learning algorithms
Regression
SVM
Naïve Bayes
KNN
ANN
Decision trees
Linear regression
• Linear regression is one of the easiest and most
popular Machine Learning algorithms. It is a
statistical method that is used for predictive
analysis. Linear regression makes predictions for
continuous/real or numeric variables such
as sales, salary, age, product price, etc.
• The linear regression algorithm models a linear
relationship between a dependent variable (y) and one or
more independent variables (x), hence the name
linear regression.
Some popular applications of linear regression are:
o Analyzing trends and sales estimates
o Salary forecasting
o Real estate prediction
o Arriving at ETAs in traffic.
Types
• Simple linear regression (one predictor)
• Multiple linear regression (multiple predictors)
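The simple (one-predictor) case above can be sketched with scikit-learn; the salary-vs-experience numbers below are made up purely for illustration.

```python
# A minimal sketch of simple linear regression with scikit-learn.
import numpy as np
from sklearn.linear_model import LinearRegression

# One predictor (years of experience), one continuous target (salary)
X = np.array([[1], [2], [3], [4], [5]])
y = np.array([30000, 35000, 41000, 44000, 50000])

model = LinearRegression()
model.fit(X, y)

print(model.coef_[0])        # slope of the fitted line
print(model.predict([[6]]))  # salary estimate for 6 years of experience
```

Multiple linear regression looks identical in code: `X` simply gains more columns, one per predictor.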
Logistic Regression
o Binary (0/1, pass/fail)
o Multinomial (cats, dogs, lions)
o Ordinal (low, medium, high)
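The binary case can be sketched as follows; the pass/fail data on study hours is invented for illustration.

```python
# A minimal sketch of binary (0/1) logistic regression.
import numpy as np
from sklearn.linear_model import LogisticRegression

X = np.array([[1], [2], [3], [4], [5], [6]])  # hours studied
y = np.array([0, 0, 0, 1, 1, 1])              # fail (0) / pass (1)

clf = LogisticRegression()
clf.fit(X, y)

print(clf.predict([[1.5], [5.5]]))  # predicted class labels
print(clf.predict_proba([[3.5]]))   # probability of each class
```

Unlike linear regression, the output here is a class label (or a probability), not a continuous value.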
Support vector machine
Support Vector Machine or SVM is one of the
most popular Supervised Learning algorithms,
which is used for Classification as well as
Regression problems. However, primarily, it is
used for Classification problems in Machine
Learning.
• The goal of the SVM algorithm is to create the
best line or decision boundary that can
segregate n-dimensional space into classes so
that we can easily put the new data point in
the correct category in the future. This best
decision boundary is called a hyperplane.
o Kernel: a function used to map lower-dimensional data into a higher-dimensional space.
o Hyperplane: in a general SVM it is the separation line between two classes, but in SVR it is the
line that helps predict the continuous variable and covers most of the datapoints.
o Boundary lines: the two lines on either side of the hyperplane, which create a margin for the datapoints.
o Support vectors: the datapoints that are nearest to the hyperplane.
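The kernel idea can be illustrated on XOR-style toy data (invented here), which no straight line can separate; an RBF kernel implicitly maps the points into a space where they become separable.

```python
# Same classifier, two kernels, on data a linear boundary cannot separate.
import numpy as np
from sklearn.svm import SVC

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([0, 1, 1, 0])  # XOR labels

linear = SVC(kernel="linear").fit(X, y)
rbf = SVC(kernel="rbf").fit(X, y)

print(linear.score(X, y))  # a linear kernel cannot fit XOR
print(rbf.score(X, y))     # the RBF kernel separates it perfectly
```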
Example: Suppose we have an image of a creature that looks similar to both a cat and a dog, but
we want to know whether it is a cat or a dog. For this identification we can use the KNN
algorithm, since it works on a similarity measure. The KNN model finds the features of the
new image that are most similar to the cat and dog images and, based on the most similar
features, puts it in either the cat or the dog category.
Suppose there are two categories, i.e., Category A and Category B,
and we have a new data point x1
Working of KNN
• Step-1: Select the number K of the neighbors
• Step-2: Calculate the Euclidean distance of K
number of neighbors
• Step-3: Take the K nearest neighbors as per the
calculated Euclidean distance.
• Step-4: Among these k neighbors, count the
number of the data points in each category.
• Step-5: Assign the new data point to the
category that has the most neighbours among
the K.
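The five steps above can be sketched from scratch; the Category A/B points below are toy data invented for illustration.

```python
# KNN from scratch: pick K, compute Euclidean distances, take the K
# nearest points, count categories among them, and vote by majority.
from collections import Counter
import math

def knn_predict(train_points, train_labels, new_point, k=3):
    # Step 2: Euclidean distance from the new point to every training point
    distances = [
        (math.dist(p, new_point), label)
        for p, label in zip(train_points, train_labels)
    ]
    # Step 3: take the K nearest neighbours
    nearest = sorted(distances)[:k]
    # Steps 4-5: count categories among them and pick the majority
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Category A clusters near the origin, Category B further out
points = [(1, 1), (1, 2), (2, 1), (6, 6), (6, 7), (7, 6)]
labels = ["A", "A", "A", "B", "B", "B"]
print(knn_predict(points, labels, (2, 2), k=3))  # prints A
```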
Decision Tree Classification Algorithm
import pandas as pd

# Load the Iris data file and name its five columns
url = "/iris.data.txt"
names = ['sepal-length', 'sepal-width', 'petal-length', 'petal-width', 'class']
dataset = pd.read_csv(url, names=names)
Summarize the data and perform
analysis
• Dimensions of data set: Find out how many
rows and columns our dataset has using the
shape property
# shape
print(dataset.shape)
Result: (150, 5), which means our dataset has
150 rows and 5 columns
To check the first 20 rows of our dataset
print(dataset.head(20))
Statistical summary
Class distribution
Visual data analysis
• Random forest
• Logistic Regression
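The model snippets below assume imports, feature/label columns, and a train/test split have already been set up. A minimal sketch of that setup is shown here, using scikit-learn's bundled copy of the Iris dataset as a stand-in for the CSV loaded earlier; the 80/20 split and `random_state=1` are illustrative choices.

```python
# Assumed setup for the model comparisons that follow.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

iris = load_iris()
x, y = iris.data, iris.target  # four measurements, one class label

# Hold out 20% of the rows for testing
x_train, x_test, y_train, y_test = train_test_split(
    x, y, test_size=0.2, random_state=1)
```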
KNN Model
model = KNeighborsClassifier()
model.fit(x_train,y_train)
predictions = model.predict(x_test)
print(accuracy_score(y_test, predictions))
SVM
model = SVC()
model.fit(x_train,y_train)
predictions = model.predict(x_test)
print(accuracy_score(y_test, predictions))
Random forest
model = RandomForestClassifier(n_estimators=10)
model.fit(x_train,y_train)
predictions = model.predict(x_test)
print(accuracy_score(y_test, predictions))
Logistic regression
model = LogisticRegression()
model.fit(x_train,y_train)
predictions = model.predict(x_test)
print(accuracy_score(y_test, predictions))
Model accuracy
K-Nearest Neighbour (KNN) = 0.9
Random forest = 0.8666666666666667
The main reason we use the sigmoid function is that its output lies between
0 and 1. Therefore it is especially used for models where we have to predict a
probability as the output: since a probability exists only in the range
0 to 1, the sigmoid is the right choice.
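A minimal sigmoid, assuming the standard form 1 / (1 + e^(-x)); note how every output stays within (0, 1), so it can be read as a probability.

```python
import numpy as np

def sigmoid(x):
    # Squashes any real input into the open interval (0, 1)
    return 1.0 / (1.0 + np.exp(-x))

print(sigmoid(np.array([-5.0, 0.0, 5.0])))  # large negatives near 0, large positives near 1
```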
Hyperbolic tangent
The range of the tanh function is from (-1 to 1).
tanh is also sigmoidal (s - shaped)
The advantage is that the
negative inputs will be
mapped strongly negative
and the zero inputs will be
mapped near zero in the
tanh graph.
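The behaviour described above can be checked directly with NumPy's built-in tanh: negative inputs map strongly negative, zero maps to zero, and the range stays within (-1, 1).

```python
import numpy as np

# tanh is s-shaped like the sigmoid but zero-centred
out = np.tanh(np.array([-3.0, 0.0, 3.0]))
print(out)  # roughly [-0.995, 0.0, 0.995]
```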
ReLU
• The ReLU is the most used activation function in the world
right now, since it is used in almost all convolutional
neural networks and deep learning models. The ReLU is half rectified
(from the bottom): f(z) is zero when z is less than zero, and f(z) is
equal to z when z is greater than or equal to zero.
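The half-rectified behaviour above is one line with NumPy:

```python
import numpy as np

def relu(z):
    # f(z) = 0 for z < 0, f(z) = z for z >= 0
    return np.maximum(0, z)

print(relu(np.array([-2.0, -0.5, 0.0, 1.5])))  # negatives clipped to zero
```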
Types of Artificial Neural Network