Python For Machine Learning Basics

The document summarizes a workshop on machine learning concepts and applications held at P.S.R. Engineering College, Sivakasi. It covers the installation of Anaconda, popular Python machine learning libraries such as NumPy, SciPy, Pandas, and scikit-learn, and specific algorithms including k-nearest neighbors classification, regression, support vector machines, k-means clustering, and principal component analysis. Code examples are provided for k-NN classification of the iris data and regression on a salary dataset.


Workshop on "Machine Learning Concepts and Applications"

P.S.R. Engineering College,
Sevalpatti, Sivakasi

Session 2: Python for Machine Learning Algorithms and Applications

Dr.R.Meena Prakash
Associate Professor/ECE
P.S.R.Engineering College,
Sevalpatti, Sivakasi

Contents
• Anaconda Installation
• NumPy, SciPy, matplotlib, Pandas, OpenCV
• scikit-learn
• K-Nearest Neighbors Classification
• Linear Regression
• SVM Classifier
• K-Means Image Segmentation
• PCA and Logistic Regression Classifier

Anaconda Installation

https://www.anaconda.com/products/individual

• Anaconda Individual Edition is the world's most popular Python distribution platform, with over 20 million users worldwide.
• Over 7,500 data science and machine learning packages are available; thousands of open-source packages can be installed with the conda install command.
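
For example, the packages used in this session can be installed from the Anaconda prompt (a sketch; package names as distributed on the conda / conda-forge channels):

conda install numpy scipy matplotlib pandas scikit-learn
conda install -c conda-forge opencv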
NumPy, SciPy, matplotlib, OpenCV, Pandas
• NumPy provides support for highly optimized multidimensional arrays, the basic data structure of most state-of-the-art algorithms.
• SciPy uses these arrays to provide a set of fast numerical recipes. It contains modules for optimization, linear algebra, integration, interpolation, special functions, FFT, and signal and image processing.
• matplotlib is a feature-rich library for plotting high-quality graphs with Python.
• OpenCV is a popular library for computer vision.
• Pandas is a Python data analysis and manipulation tool, and a great tool for working with Excel data from Python.
• Scalars (0D tensors): a tensor that contains only one number is called a scalar.
• Vectors (1D tensors): an array of numbers is called a vector, or 1D tensor. A 1D tensor has exactly one axis.
• Matrices (2D tensors): an array of vectors is a matrix, or 2D tensor. A matrix has two axes (often referred to as rows and columns).
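
These ranks can be checked directly in NumPy; a minimal sketch:

import numpy as np
s = np.array(12)                 # scalar: 0 axes
v = np.array([12, 3, 6])         # vector: 1 axis
m = np.array([[1, 2], [3, 4]])   # matrix: 2 axes
print(s.ndim, v.ndim, m.ndim)    # 0 1 2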
>>> import numpy as np
>>> a = np.array([0, 1, 2, 3, 4, 5])
>>> a
array([0, 1, 2, 3, 4, 5])
>>> a.ndim
1
>>> a.shape
(6,)

>>> b = a.reshape((3, 2))
>>> b
array([[0, 1],
       [2, 3],
       [4, 5]])
>>> b.ndim
2
>>> b.shape
(3, 2)

c = a.reshape(3, 1, 2)
print(c)        # [[[0 1]]
                #  [[2 3]]
                #  [[4 5]]]
print(c.ndim)   # 3
print(c.shape)  # (3, 1, 2)

npdata = np.arange(3)
print(npdata)   # [0 1 2]

npdata = np.arange(40)
npdata.shape = (5, 8)
print(npdata)
# [[ 0  1  2  3  4  5  6  7]
#  [ 8  9 10 11 12 13 14 15]
#  [16 17 18 19 20 21 22 23]
#  [24 25 26 27 28 29 30 31]
#  [32 33 34 35 36 37 38 39]]
SciPy Packages

import scipy as sp
# note: sp.genfromtxt / sp.isnan / sp.sum are NumPy functions re-exported by
# older SciPy versions; in newer SciPy use np.genfromtxt / np.isnan / np.sum

# 743 hours of web traffic (hour index, hits/hour) in a tab-separated file
data = sp.genfromtxt("web_traffic.tsv", delimiter="\t")
print(data[:10])
print(data.shape)
x = data[:, 0]   # hours
y = data[:, 1]   # hits per hour
n = sp.sum(sp.isnan(y))   # count missing values (8 here)
print(n)
x = x[~sp.isnan(y)]   # keep only entries where y is present
y = y[~sp.isnan(y)]
print(x)
print(y)
print(x.shape)
import matplotlib.pyplot as plt

# plot the (x, y) points with dots of size 10
plt.scatter(x, y, s=10)
plt.title("Web traffic over the last month")
plt.xlabel("Time")
plt.ylabel("Hits/hour")
plt.xticks([w * 7 * 24 for w in range(10)],
           ['week %i' % w for w in range(10)])   # one tick every 7*24 hours
plt.autoscale(tight=True)
# draw a light grid
plt.grid(True, linestyle='-', color='0.75')
plt.show()
Scikit-learn
• scikit-learn is the standard library for many machine learning tasks, including classification.
• fit(features, labels): the learning step; fits the parameters of the model.
• predict(features): can only be called after fit; returns predictions for one or more inputs.
• Install with: conda install scikit-learn
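
A minimal sketch of this fit/predict pattern, using the iris data that appears later in these slides:

from sklearn.datasets import load_iris
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
clf = KNeighborsClassifier(n_neighbors=3)
clf.fit(X, y)               # learning step: fit the model parameters
print(clf.predict(X[:5]))   # predictions for the first five inputs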

Cross Validation
• Cross-validation is a resampling procedure used to evaluate machine learning models on a limited data sample.
• Shuffle the dataset randomly.
• Split the dataset into k groups.
• For each unique group:
  – Take the group as a hold-out or test data set
  – Take the remaining groups as a training data set
  – Fit a model on the training set and evaluate it on the test set
  – Retain the evaluation score and discard the model
• Summarize the skill of the model using the sample of model evaluation scores. In general, k = 5 or 10. (A code sketch follows.)
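
A minimal sketch of this procedure with scikit-learn, which splits, fits, and scores in one call (the choice of classifier here is illustrative):

from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
scores = cross_val_score(KNeighborsClassifier(n_neighbors=1), X, y, cv=5)
print(scores)         # one evaluation score per fold
print(scores.mean())  # summarize the skill of the model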
K-Nearest Neighbor Algorithm
• KNN is a non-parametric algorithm: it makes no assumption about the underlying data distribution (unlike, e.g., a Gaussian mixture model).
• K is the number of nearest neighbors.
The steps include:
• Calculate distances
• Find the k closest neighbors
• Vote for labels (a from-scratch sketch follows)
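
A from-scratch sketch of these three steps (illustrative only; X_train, y_train and the query point q are assumed to be NumPy arrays):

import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, q, k=3):
    dists = np.linalg.norm(X_train - q, axis=1)   # 1. calculate distances
    nearest = np.argsort(dists)[:k]               # 2. find the k closest neighbors
    votes = Counter(y_train[nearest])             # 3. vote for labels
    return votes.most_common(1)[0][0]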
K Neighbors Classifier for classification of iris data
from matplotlib import pyplot as plt
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.model_selection import KFold

data = load_iris()

features = data.data   # features: lengths and widths of sepals and petals
feature_names = data.feature_names
target = data.target
target_names = data.target_names
labels = target_names[target]

from sklearn.neighbors import KNeighborsClassifier

classifier = KNeighborsClassifier(n_neighbors=1)
X = features
y = target
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.33, random_state=42)   # random_state seeds the random generator
classifier.fit(X_train, y_train)
print(classifier.score(X_test, y_test))
# Output: accuracy = 0.98
Linear Regression
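
(Note: although this section is titled Linear Regression, the code on the following slides fits a DecisionTreeRegressor. For comparison, a minimal linear fit on the same data, assuming the same Position_Salaries.csv layout, might look like this:)

import pandas as pd
from sklearn.linear_model import LinearRegression

dataset = pd.read_csv('Position_Salaries.csv')
X = dataset.iloc[:, 1:2].values   # position level
y = dataset.iloc[:, 2].values     # salary
lin_reg = LinearRegression().fit(X, y)
print(lin_reg.coef_, lin_reg.intercept_)   # fitted slope and intercept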

import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

dataset = pd.read_csv('Position_Salaries.csv')
# iloc selects rows and columns of a pandas DataFrame by position
X = dataset.iloc[:, 1:2].values   # position level
y = dataset.iloc[:, 2].values     # salary

from sklearn.tree import DecisionTreeRegressor
regressor = DecisionTreeRegressor(random_state=0)
regressor.fit(X, y)
n = np.array([6.5]).reshape(1, 1)   # predict the salary at position level 6.5
y_pred = regressor.predict(n)
plt.scatter(X, y, color='red')
plt.plot(X, regressor.predict(X), color='blue')
plt.title('Regression Model')
plt.xlabel('Position level')
plt.ylabel('Salary')
plt.show()

# arange: evenly spaced values within the given range
X_grid = np.arange(min(X), max(X), 0.01)
X_grid = X_grid.reshape((len(X_grid), 1))
plt.scatter(X, y, color='red')
plt.plot(X_grid, regressor.predict(X_grid), color='blue')
plt.title('Example of Decision Tree Regression Model')
plt.xlabel('Position level')
plt.ylabel('Salary')
plt.show()
Output

SVM Classifier

from sklearn import datasets
from sklearn.model_selection import train_test_split

iris = datasets.load_iris()
X = iris.data   # all four features (use iris.data[:, :2] for just the first two)
y = iris.target

from sklearn.svm import SVC
# C is the penalty parameter of the error term; a very large C
# approximates a hard-margin SVM
model = SVC(kernel='linear', C=1e10)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.33, random_state=42)
model.fit(X_train, y_train)
print(model.score(X_test, y_test))
# With 2 features, accuracy = 0.84; with all 4 features, accuracy = 0.98
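
To reproduce the two-feature accuracy quoted above, a sketch using the same split and model:

X2 = iris.data[:, :2]   # only the first two features (sepal length and width)
X_train, X_test, y_train, y_test = train_test_split(
    X2, y, test_size=0.33, random_state=42)
model.fit(X_train, y_train)
print(model.score(X_test, y_test))   # about 0.84 with two features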
K-Means Segmentation of an Image

import cv2
import numpy as np
import matplotlib.pyplot as plt

image = cv2.imread("image.jpg")
image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)   # OpenCV loads BGR; convert to RGB
pixel_values = image.reshape((-1, 3))            # one row per pixel, 3 color values
pixel_values = np.float32(pixel_values)          # cv2.kmeans requires float32
print(pixel_values.shape)
# stop after 100 iterations or when centers move less than epsilon = 0.2
criteria = (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 100, 0.2)

k = 3
_, labels, centers = cv2.kmeans(pixel_values, k, None, criteria, 10,
                                cv2.KMEANS_RANDOM_CENTERS)
# convert back to 8-bit values
centers = np.uint8(centers)
# flatten the labels array
labels = labels.flatten()

# convert all pixels to the color of their centroid
segmented_image = centers[labels]
# reshape back to the original image dimensions
segmented_image = segmented_image.reshape(image.shape)
# show the original and segmented images
plt.imshow(image)
plt.show()
plt.imshow(segmented_image)
plt.show()

Output

Principal Component Analysis

import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

# load the dataset
dataset = pd.read_csv('wine.csv')
# split the dataset into features X and class labels y
X = dataset.iloc[:, 1:13].values   # feature columns
y = dataset.iloc[:, 0].values      # class label column
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)

# preprocessing: standardize features to zero mean and unit variance
from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)   # reuse the scaling fitted on the training set

# apply PCA to the training and test sets of X
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression

pca = PCA(n_components=2)
X1_train = pca.fit_transform(X_train)
X1_test = pca.transform(X_test)
print(X.shape)
print(X1_train.shape)
variance = pca.explained_variance_ratio_   # variance fraction per component

classifier = LogisticRegression(random_state=0)
classifier.fit(X1_train, y_train)
y_pred = classifier.predict(X1_test)
print(classifier.score(X1_test, y_test))   # accuracy with 2 principal components

# refit on the full feature set for comparison
classifier.fit(X_train, y_train)
y1_pred = classifier.predict(X_test)
print(classifier.score(X_test, y_test))    # accuracy with all features
print(np.shape(X_train))
print(np.shape(X1_train))

# scatter plot of the training data in the 2D principal-component space
plt.figure(figsize=(8, 6))
plt.scatter(X1_train[:, 0], X1_train[:, 1], s=10, c=y_train, cmap='rainbow')
plt.xlabel('First principal component')
plt.ylabel('Second principal component')
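
As a quick check of how much information the 2D projection retains, the explained_variance_ratio_ computed above can be printed (exact values depend on the dataset):

print(variance)        # variance fraction captured by each component
print(variance.sum())  # total variance retained by the two components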
Linear vs. Logistic Regression

SVM

References
• Python Data Science Handbook – Jake VanderPlas
• Building Machine Learning Systems with Python – Luis Pedro Coelho, Willi Richert
• Deep Learning – Ian Goodfellow, Yoshua Bengio, Aaron Courville
• Statistics and Machine Learning in Python – Edouard Duchesnay, Tommy Lofstedt
• Other web resources
