Python For Machine Learning Basics

The document summarizes a workshop on machine learning concepts and applications held at P.S.R. Engineering College, Sivakasi. It covers the installation of Anaconda, popular Python machine learning libraries such as NumPy, SciPy, Pandas, and scikit-learn, and specific algorithms including k-nearest neighbors classification, regression, support vector machines, k-means clustering, and principal component analysis. Code examples are provided for k-NN classification of the iris data and regression on a salary dataset.


Workshop on "Machine Learning Concepts and Applications"

P.S.R. Engineering College,
Sevalpatti, Sivakasi

Session 2: Python for Machine Learning Algorithms and Applications

Dr.R.Meena Prakash
Associate Professor/ECE
P.S.R.Engineering College,
Sevalpatti, Sivakasi

Contents
• Anaconda Installation
• NumPy, SciPy, matplotlib, Pandas, OpenCV
• scikit-learn
• K-Nearest Neighbors Classification
• Linear Regression
• SVM Classifier
• K-Means Image Segmentation
• PCA and Logistic Regression Classifier

Anaconda Installation

https://www.anaconda.com/products/individual

• Anaconda Individual Edition is the world's most popular Python distribution platform, with over 20 million users worldwide.
• Over 7,500 data science and machine learning packages are available; thousands of open-source packages can be installed with the conda install command.
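
For example, the packages used in this session can be installed from the Anaconda prompt (a sketch; package names as distributed on the conda / conda-forge channels):

conda install numpy scipy matplotlib pandas scikit-learn
conda install -c conda-forge opencv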
NumPy, SciPy, matplotlib, OpenCV, Pandas
• NumPy provides support for highly optimized multidimensional arrays, the basic data structure of most state-of-the-art algorithms.
• SciPy uses these arrays to provide a set of fast numerical recipes. It contains modules for optimization, linear algebra, integration, interpolation, special functions, FFT, and signal and image processing.
• matplotlib is a feature-rich library for plotting high-quality graphs with Python.
• OpenCV is a popular library for computer vision.
• Pandas is a Python data analysis and manipulation tool, and a great tool for working with Excel data from Python.
• Scalars (0D tensors): a tensor that contains only one number is called a scalar.
• Vectors (1D tensors): an array of numbers is called a vector, or 1D tensor. A 1D tensor has exactly one axis.
• Matrices (2D tensors): an array of vectors is a matrix, or 2D tensor. A matrix has two axes (often referred to as rows and columns).
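
These ranks can be checked directly in NumPy; a minimal sketch:

import numpy as np
s = np.array(12)                 # scalar: 0 axes
v = np.array([12, 3, 6])         # vector: 1 axis
m = np.array([[1, 2], [3, 4]])   # matrix: 2 axes
print(s.ndim, v.ndim, m.ndim)    # 0 1 2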
>>> import numpy as np
>>> a = np.array([0, 1, 2, 3, 4, 5])
>>> a
array([0, 1, 2, 3, 4, 5])
>>> a.ndim
1
>>> a.shape
(6,)

>>> b = a.reshape((3, 2))
>>> b
array([[0, 1],
       [2, 3],
       [4, 5]])
>>> b.ndim
2
>>> b.shape
(3, 2)

c = a.reshape(3, 1, 2)
print(c)        # [[[0 1]]
                #  [[2 3]]
                #  [[4 5]]]
print(c.ndim)   # 3
print(c.shape)  # (3, 1, 2)

npdata = np.arange(3)
print(npdata)   # [0 1 2]

npdata = np.arange(40)
npdata.shape = (5, 8)
print(npdata)
# [[ 0  1  2  3  4  5  6  7]
#  [ 8  9 10 11 12 13 14 15]
#  [16 17 18 19 20 21 22 23]
#  [24 25 26 27 28 29 30 31]
#  [32 33 34 35 36 37 38 39]]
SciPy Packages

import scipy as sp
# note: sp.genfromtxt / sp.isnan / sp.sum are NumPy functions re-exported by
# older SciPy versions; in newer SciPy use np.genfromtxt / np.isnan / np.sum

# 743 hours of web traffic (hour index, hits/hour) in a tab-separated file
data = sp.genfromtxt("web_traffic.tsv", delimiter="\t")
print(data[:10])
print(data.shape)
x = data[:, 0]   # hours
y = data[:, 1]   # hits per hour
n = sp.sum(sp.isnan(y))   # count missing values (8 here)
print(n)
x = x[~sp.isnan(y)]   # keep only entries where y is present
y = y[~sp.isnan(y)]
print(x)
print(y)
print(x.shape)
import matplotlib.pyplot as plt

# plot the (x, y) points with dots of size 10
plt.scatter(x, y, s=10)
plt.title("Web traffic over the last month")
plt.xlabel("Time")
plt.ylabel("Hits/hour")
plt.xticks([w * 7 * 24 for w in range(10)],
           ['week %i' % w for w in range(10)])   # one tick every 7*24 hours
plt.autoscale(tight=True)
# draw a light grid
plt.grid(True, linestyle='-', color='0.75')
plt.show()
Scikit-learn
• scikit-learn is the standard library for many machine learning tasks, including classification.
• fit(features, labels): the learning step; fits the parameters of the model.
• predict(features): can only be called after fit; returns predictions for one or more inputs.
• Install with: conda install scikit-learn
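
A minimal sketch of this fit/predict pattern, using the iris data that appears later in these slides:

from sklearn.datasets import load_iris
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
clf = KNeighborsClassifier(n_neighbors=3)
clf.fit(X, y)               # learning step: fit the model parameters
print(clf.predict(X[:5]))   # predictions for the first five inputs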

Cross Validation
• Cross-validation is a resampling procedure used to evaluate machine learning models on a limited data sample.
• Shuffle the dataset randomly.
• Split the dataset into k groups.
• For each unique group:
  – Take the group as a hold-out or test data set
  – Take the remaining groups as a training data set
  – Fit a model on the training set and evaluate it on the test set
  – Retain the evaluation score and discard the model
• Summarize the skill of the model using the sample of model evaluation scores. In general, k = 5 or 10. (A code sketch follows.)
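
A minimal sketch of this procedure with scikit-learn, which splits, fits, and scores in one call (the choice of classifier here is illustrative):

from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
scores = cross_val_score(KNeighborsClassifier(n_neighbors=1), X, y, cv=5)
print(scores)         # one evaluation score per fold
print(scores.mean())  # summarize the skill of the model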
K-Nearest Neighbor Algorithm
• KNN is a non-parametric algorithm: it makes no assumption about the underlying data distribution (unlike, e.g., a Gaussian mixture model).
• K is the number of nearest neighbors.
The steps include:
• Calculate distances
• Find the k closest neighbors
• Vote for labels (a from-scratch sketch follows)
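
A from-scratch sketch of these three steps (illustrative only; X_train, y_train and the query point q are assumed to be NumPy arrays):

import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, q, k=3):
    dists = np.linalg.norm(X_train - q, axis=1)   # 1. calculate distances
    nearest = np.argsort(dists)[:k]               # 2. find the k closest neighbors
    votes = Counter(y_train[nearest])             # 3. vote for labels
    return votes.most_common(1)[0][0]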
K Neighbors Classifier for classification of iris data
from matplotlib import pyplot as plt
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.model_selection import KFold

data = load_iris()

features = data.data   # features: lengths and widths of sepals and petals
feature_names = data.feature_names
target = data.target
target_names = data.target_names
labels = target_names[target]

from sklearn.neighbors import KNeighborsClassifier

classifier = KNeighborsClassifier(n_neighbors=1)
X = features
y = target
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.33, random_state=42)   # random_state seeds the random generator
classifier.fit(X_train, y_train)
print(classifier.score(X_test, y_test))
# Output: accuracy = 0.98
Linear Regression
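
(Note: although this section is titled Linear Regression, the code on the following slides fits a DecisionTreeRegressor. For comparison, a minimal linear fit on the same data, assuming the same Position_Salaries.csv layout, might look like this:)

import pandas as pd
from sklearn.linear_model import LinearRegression

dataset = pd.read_csv('Position_Salaries.csv')
X = dataset.iloc[:, 1:2].values   # position level
y = dataset.iloc[:, 2].values     # salary
lin_reg = LinearRegression().fit(X, y)
print(lin_reg.coef_, lin_reg.intercept_)   # fitted slope and intercept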

import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

dataset = pd.read_csv('Position_Salaries.csv')
# iloc selects rows and columns of a pandas DataFrame by position
X = dataset.iloc[:, 1:2].values   # position level
y = dataset.iloc[:, 2].values     # salary

from sklearn.tree import DecisionTreeRegressor
regressor = DecisionTreeRegressor(random_state=0)
regressor.fit(X, y)
n = np.array([6.5]).reshape(1, 1)   # predict the salary at position level 6.5
y_pred = regressor.predict(n)
plt.scatter(X, y, color='red')
plt.plot(X, regressor.predict(X), color='blue')
plt.title('Regression Model')
plt.xlabel('Position level')
plt.ylabel('Salary')
plt.show()

# arange: evenly spaced values within the given range
X_grid = np.arange(min(X), max(X), 0.01)
X_grid = X_grid.reshape((len(X_grid), 1))
plt.scatter(X, y, color='red')
plt.plot(X_grid, regressor.predict(X_grid), color='blue')
plt.title('Example of Decision Tree Regression Model')
plt.xlabel('Position level')
plt.ylabel('Salary')
plt.show()
Output

SVM Classifier

from sklearn import datasets
from sklearn.model_selection import train_test_split

iris = datasets.load_iris()
X = iris.data   # all four features (use iris.data[:, :2] for just the first two)
y = iris.target

from sklearn.svm import SVC
# C is the penalty parameter of the error term; a very large C
# approximates a hard-margin SVM
model = SVC(kernel='linear', C=1e10)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.33, random_state=42)
model.fit(X_train, y_train)
print(model.score(X_test, y_test))
# With 2 features, accuracy = 0.84; with all 4 features, accuracy = 0.98
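
To reproduce the two-feature accuracy quoted above, a sketch using the same split and model:

X2 = iris.data[:, :2]   # only the first two features (sepal length and width)
X_train, X_test, y_train, y_test = train_test_split(
    X2, y, test_size=0.33, random_state=42)
model.fit(X_train, y_train)
print(model.score(X_test, y_test))   # about 0.84 with two features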
K-Means Segmentation of an Image

import cv2
import numpy as np
import matplotlib.pyplot as plt

image = cv2.imread("image.jpg")
image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)   # OpenCV loads BGR; convert to RGB
pixel_values = image.reshape((-1, 3))            # one row per pixel, 3 color values
pixel_values = np.float32(pixel_values)          # cv2.kmeans requires float32
print(pixel_values.shape)
# stop after 100 iterations or when centers move less than epsilon = 0.2
criteria = (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 100, 0.2)

k = 3
_, labels, centers = cv2.kmeans(pixel_values, k, None, criteria, 10,
                                cv2.KMEANS_RANDOM_CENTERS)
# convert back to 8-bit values
centers = np.uint8(centers)
# flatten the labels array
labels = labels.flatten()

# convert all pixels to the color of their centroid
segmented_image = centers[labels]
# reshape back to the original image dimensions
segmented_image = segmented_image.reshape(image.shape)
# show the original and segmented images
plt.imshow(image)
plt.show()
plt.imshow(segmented_image)
plt.show()

Output

Principal Component Analysis

import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

# load the dataset
dataset = pd.read_csv('wine.csv')
# split the dataset into features X and class labels y
X = dataset.iloc[:, 1:13].values   # feature columns
y = dataset.iloc[:, 0].values      # class label column
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)

# preprocessing: standardize features to zero mean and unit variance
from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)   # reuse the scaling fitted on the training set

# apply PCA to the training and test sets of X
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression

pca = PCA(n_components=2)
X1_train = pca.fit_transform(X_train)
X1_test = pca.transform(X_test)
print(X.shape)
print(X1_train.shape)
variance = pca.explained_variance_ratio_   # variance fraction per component

classifier = LogisticRegression(random_state=0)
classifier.fit(X1_train, y_train)
y_pred = classifier.predict(X1_test)
print(classifier.score(X1_test, y_test))   # accuracy with 2 principal components

# refit on the full feature set for comparison
classifier.fit(X_train, y_train)
y1_pred = classifier.predict(X_test)
print(classifier.score(X_test, y_test))    # accuracy with all features
print(np.shape(X_train))
print(np.shape(X1_train))

# scatter plot of the training data in the 2D principal-component space
plt.figure(figsize=(8, 6))
plt.scatter(X1_train[:, 0], X1_train[:, 1], s=10, c=y_train, cmap='rainbow')
plt.xlabel('First principal component')
plt.ylabel('Second principal component')
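
As a quick check of how much information the 2D projection retains, the explained_variance_ratio_ computed above can be printed (exact values depend on the dataset):

print(variance)        # variance fraction captured by each component
print(variance.sum())  # total variance retained by the two components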
Linear vs. Logistic Regression

SVM

References
• Python Data Science Handbook – Jake VanderPlas
• Building Machine Learning Systems with Python – Luis Pedro Coelho, Willi Richert
• Deep Learning – Ian Goodfellow, Yoshua Bengio, Aaron Courville
• Statistics and Machine Learning in Python – Edouard Duchesnay, Tommy Lofstedt
• Other web resources
