Recognizing Handwritten Digits With Scikit-Learn: Punam Seal
One application of handwriting recognition that comes to mind is OCR
(Optical Character Recognition) software. OCR software must read
handwritten text, or pages of printed books, and turn them into electronic
documents in which each character is well defined. The problem of
handwriting recognition itself goes further back in time, to the
early 20th century (1920s), when Emanuel Goldberg (1881–1970)
began his studies on the subject and suggested that a statistical
approach would be an optimal choice.
OVERVIEW:
The problem involves reading and interpreting an image of a handwritten
digit and predicting the numeric value it represents. As in other
scikit-learn tasks, you will have an estimator that learns through the fit()
function and, once it has reached a sufficient degree of predictive
capability (a sufficiently valid model), produces a prediction with the
predict() function. We will then discuss the training set and validation
set, created this time from a series of images.
GOAL:
Our goal is to read and interpret an image of a handwritten digit and
predict the numeric value it represents. You can choose a smaller training
set and a different range for validation and still get 100% accurate
predictions, but this will not always be the case. Perform data analysis
and accept the hypothesis if the model predicts the digit correctly at
least 95% of the time; otherwise reject it. Run at least three cases, each
with a different range of training and validation sets.
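The three-case check described above could be sketched roughly as follows. The index ranges in cases are hypothetical, chosen only for illustration, and the SVC settings are assumed to match the ones that appear in the classifier output later in this article.

```python
from sklearn import datasets, svm

digits = datasets.load_digits()

# Hypothetical (training range, validation range) pairs; these are
# illustrative choices, not the ranges used by the author.
cases = [
    (slice(0, 1791), slice(1791, 1797)),
    (slice(0, 1200), slice(1200, 1797)),
    (slice(500, 1797), slice(0, 500)),
]

results = []
for train, valid in cases:
    # Assumed settings: gamma=0.001 and C=100
    clf = svm.SVC(gamma=0.001, C=100.0)
    clf.fit(digits.data[train], digits.target[train])
    accuracy = clf.score(digits.data[valid], digits.target[valid])
    results.append(accuracy)
    # Accept the hypothesis only if accuracy reaches the 95% threshold
    verdict = "accept" if accuracy >= 0.95 else "reject"
    print(f"train={train}, validation={valid}: accuracy={accuracy:.3f} -> {verdict}")
```

Each case trains a fresh estimator, so the result of one range cannot leak into the next.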
TOOLS USED:
DATA ANALYSIS:
First, I import the required libraries and then describe the data
using attributes and functions such as DESCR, dir(), size, and shape.
Import the required libraries
1. I will be using several Python libraries, such as NumPy and
Matplotlib.
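import numpy as np
import matplotlib.pyplot as plt
from sklearn import datasets
# Load the digits dataset bundled with scikit-learn
digits = datasets.load_digits()
print(digits.DESCR)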
Out:
.. _digits_dataset:
.. topic:: References
dir(digits)
digits.data
6. The shape attribute returns the dimensions of an array; our
dataset has 1797 rows and 64 columns.
digits.data.shape
Out:(1797, 64)
digits.target
Out: array([0, 1, 2, ..., 8, 9, 8])
digits.target.size
Out: 1797
Visualizing data
9. The images of the handwritten digits are contained in the digits.images
array. Each element of this array is an image represented by an
8x8 matrix of numerical values that correspond to a grayscale from
white, with a value of 0, to black, with a value of 16.
digits.images[0]
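As a quick check of the layout described above, the following sketch inspects the first image and compares it with the matching row of digits.data. The variable names first_image and same are introduced here only for illustration.

```python
import numpy as np
from sklearn import datasets

digits = datasets.load_digits()

# Each image is an 8x8 grid of grayscale intensities
first_image = digits.images[0]
print(first_image.shape)
print(first_image.min(), first_image.max())

# The matching row of digits.data holds the same 64 values, flattened
same = np.array_equal(first_image.ravel(), digits.data[0])
print(same)
```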
This dataset contains 1,797 elements, so you can use the first
1,791 as a training set and the last six as a validation set.
10. You can see these six handwritten digits in detail by using the
imshow() function, which displays data as an image, i.e. on a 2D
regular raster. Setting cmap to gray_r displays a grayscale image, and
interpolation='nearest' displays the image without trying to
interpolate between pixels when the display resolution differs from the
image resolution. The title() function is used to display a title on
the graph.
plt.figure(figsize=(10,7))
plt.subplot(321)
plt.imshow(digits.images[1791], cmap=plt.cm.gray_r,
interpolation='nearest')
plt.subplot(322)
plt.imshow(digits.images[1792], cmap=plt.cm.gray_r,
interpolation='nearest')
plt.subplot(323)
plt.imshow(digits.images[1793], cmap=plt.cm.gray_r,
interpolation='nearest')
plt.subplot(324)
plt.imshow(digits.images[1794], cmap=plt.cm.gray_r,
interpolation='nearest')
plt.subplot(325)
plt.imshow(digits.images[1795], cmap=plt.cm.gray_r,
interpolation='nearest')
plt.subplot(326)
plt.imshow(digits.images[1796], cmap=plt.cm.gray_r,
interpolation='nearest')
Preparing data
11. Next, I prepare the data for training by declaring a NumPy array,
data, and reshaping it so that its first dimension equals the number of
images, i.e. the number of samples n_samples (obtained with the len()
function), while each 8x8 image is flattened into a single row of 64
values. The dimension of data will therefore be 1797 x 64.
n_samples = len(digits.images)
n_samples
Out: 1797
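The reshaping step itself is not shown in the text. A minimal sketch, assuming the array is named data as the prose suggests, might look like this:

```python
from sklearn import datasets

digits = datasets.load_digits()
n_samples = len(digits.images)

# Flatten each 8x8 image into a row of 64 values; -1 lets NumPy
# infer the second dimension from the total number of elements.
data = digits.images.reshape((n_samples, -1))
print(data.shape)  # (1797, 64)
```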
12. As you can see, the array has been reshaped with the
numpy.reshape() function, which gives an array a new shape without
changing its data.
Ideally, you split your original dataset into input (x) and output (y)
columns, then call the function, passing both arrays, to have them split
appropriately into train and test subsets.
13. Here, we have split data and digits.target, assigning 0.01 as the
test size and setting random_state to zero.
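The split call is not shown in the text. Judging by the x_train and y_train names used afterwards, it was presumably scikit-learn's train_test_split(); the sketch below makes that assumption.

```python
from sklearn import datasets
from sklearn.model_selection import train_test_split

digits = datasets.load_digits()
data = digits.images.reshape((len(digits.images), -1))

# Assumption: the article used train_test_split with these arguments
x_train, x_test, y_train, y_test = train_test_split(
    data, digits.target, test_size=0.01, random_state=0
)
print(x_train.shape, x_test.shape)
```

With 1,797 samples, a test size of 0.01 holds out 18 samples, which matches the 18 predictions shown in the outputs of this article.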
svc_classifier.fit(x_train, y_train)
Out: SVC(C=100.0, gamma=0.001)
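The line that creates svc_classifier is not shown. Based on the parameters echoed in the output above, it could be reconstructed along these lines; the split call is again an assumption.

```python
from sklearn import datasets, svm
from sklearn.model_selection import train_test_split

digits = datasets.load_digits()
data = digits.images.reshape((len(digits.images), -1))
x_train, x_test, y_train, y_test = train_test_split(
    data, digits.target, test_size=0.01, random_state=0
)

# Parameters taken from the "Out:" line: C=100.0, gamma=0.001
svc_classifier = svm.SVC(gamma=0.001, C=100.0)
svc_classifier.fit(x_train, y_train)
```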
16. Using the predict() function, you can test the estimator by making it
interpret the digits of the test set; the result is stored in svc_y_pred.
svc_y_pred = svc_classifier.predict(x_test)
svc_y_pred
Out: array([2, 8, 2, 6, 6, 7, 1, 9, 8, 5, 2, 8, 6, 6, 6, 6, 1, 0])
17. Now, display the digits from 0 to 9, which are stored as arrays, as
images. This uses the figure() function to create a new figure with a
specified size of (12,7) and the zip() function to combine two lists for
easier handling inside the plotting loop.
plt.figure(figsize=(12,7))
images_and_labels = list(zip(digits.images, digits.target))
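The plotting loop that consumes images_and_labels is not shown. A sketch of what it might look like is given below; the 2x5 grid and the "Training" title text are assumptions made for illustration.

```python
import matplotlib.pyplot as plt
from sklearn import datasets

digits = datasets.load_digits()
images_and_labels = list(zip(digits.images, digits.target))

fig = plt.figure(figsize=(12, 7))
# The first ten samples are used here; the grid layout is an assumption
for index, (image, label) in enumerate(images_and_labels[:10]):
    ax = fig.add_subplot(2, 5, index + 1)
    ax.axis("off")
    ax.imshow(image, cmap=plt.cm.gray_r, interpolation="nearest")
    ax.set_title("Training: %i" % label)
```

In a notebook the figure renders automatically; in a script, call plt.show() afterwards.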
18. Plot the images of the predicted digits from the array using the
following code.
images_and_predictions = list(zip(x_test, svc_y_pred))
plt.figure(figsize=(18, 5))
# The test set holds 18 samples, which fill the 2 x 9 grid exactly
for index, (image, prediction) in enumerate(images_and_predictions[:18]):
    plt.subplot(2, 9, index + 1)
    image = image.reshape(8, 8)
    plt.imshow(image, cmap=plt.cm.gray_r, interpolation='nearest')
    plt.title('Prediction: %i' % prediction)
The classification report and confusion matrix are displayed using the
classification_report() and confusion_matrix() functions.
The accuracy score of the model can be obtained using
the score() function.
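The evaluation code for the support vector classifier is not shown. A self-contained sketch, assuming the same split and classifier settings as earlier in the article, might be:

```python
from sklearn import datasets, metrics, svm
from sklearn.model_selection import train_test_split

digits = datasets.load_digits()
data = digits.images.reshape((len(digits.images), -1))
x_train, x_test, y_train, y_test = train_test_split(
    data, digits.target, test_size=0.01, random_state=0
)
svc_classifier = svm.SVC(gamma=0.001, C=100.0).fit(x_train, y_train)
svc_y_pred = svc_classifier.predict(x_test)

# Per-class precision, recall and F1 on the held-out digits
report = metrics.classification_report(y_test, svc_y_pred)
# Rows are true labels, columns are predicted labels
matrix = metrics.confusion_matrix(y_test, svc_y_pred)
# Mean accuracy on the test set
accuracy = svc_classifier.score(x_test, y_test)

print(report)
print(matrix)
print("Accuracy:", accuracy)
```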
21. Using the predict() function, you can test the estimator by making it
interpret the digits of the test set; the result is stored in GNB_y_pred.
GNB_y_pred = GNB_classifier.predict(x_test)
GNB_y_pred
Out: array([2, 8, 2, 6, 6, 7, 1, 9, 8, 5, 2, 8, 6, 6, 6, 6, 1, 0])
plt.figure(figsize=(12,7))
images_and_labels = list(zip(digits.images, digits.target))
images_and_predictions = list(zip(x_test, GNB_y_pred))
plt.figure(figsize=(18, 5))
# The test set holds 18 samples, which fill the 2 x 9 grid exactly
for index, (image, prediction) in enumerate(images_and_predictions[:18]):
    plt.subplot(2, 9, index + 1)
    image = image.reshape(8, 8)
    plt.imshow(image, cmap=plt.cm.gray_r, interpolation='nearest')
    plt.title('Prediction: %i' % prediction)
24. This code will display the classification report, confusion matrix and
accuracy of the Gaussian Naive Bayes Classifier as follows:
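Neither the creation of GNB_classifier nor its evaluation code is shown. The sketch below assumes scikit-learn's GaussianNB with default settings and the same split as before.

```python
from sklearn import datasets, metrics
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

digits = datasets.load_digits()
data = digits.images.reshape((len(digits.images), -1))
x_train, x_test, y_train, y_test = train_test_split(
    data, digits.target, test_size=0.01, random_state=0
)

# Assumption: default GaussianNB parameters
GNB_classifier = GaussianNB()
GNB_classifier.fit(x_train, y_train)
GNB_y_pred = GNB_classifier.predict(x_test)

print(metrics.classification_report(y_test, GNB_y_pred))
print(metrics.confusion_matrix(y_test, GNB_y_pred))
print("Accuracy:", GNB_classifier.score(x_test, y_test))
```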
26. Using the predict() function, you can test the estimator by making it
interpret the digits of the test set; the result is stored in KNN_y_pred.
KNN_y_pred = KNN_classifier.predict(x_test)
KNN_y_pred
Out: array([2, 8, 2, 6, 6, 7, 1, 9, 8, 5, 2, 8, 6, 6, 6, 6, 1, 0])
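The text does not show how KNN_classifier was created. One plausible reconstruction uses scikit-learn's KNeighborsClassifier with its defaults; the number of neighbours the author chose is unknown, so the default value here is an assumption.

```python
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

digits = datasets.load_digits()
data = digits.images.reshape((len(digits.images), -1))
x_train, x_test, y_train, y_test = train_test_split(
    data, digits.target, test_size=0.01, random_state=0
)

# Assumption: default n_neighbors=5; the article does not state the value
KNN_classifier = KNeighborsClassifier()
KNN_classifier.fit(x_train, y_train)
KNN_y_pred = KNN_classifier.predict(x_test)
print(KNN_y_pred)
```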
plt.figure(figsize=(12,7))
images_and_labels = list(zip(digits.images, digits.target))
images_and_predictions = list(zip(x_test, KNN_y_pred))
plt.figure(figsize=(18, 5))
# The test set holds 18 samples, which fill the 2 x 9 grid exactly
for index, (image, prediction) in enumerate(images_and_predictions[:18]):
    plt.subplot(2, 9, index + 1)
    image = image.reshape(8, 8)
    plt.imshow(image, cmap=plt.cm.gray_r, interpolation='nearest')
    plt.title('Prediction: %i' % prediction)
29. This code will display the classification report, confusion matrix and
accuracy of the K Nearest Neighbours Classifier as follows:
# Requires: from sklearn import metrics
print("\nClassification report for K Nearest Neighbours Classifier %s:\n%s\n"
      % (KNN_classifier, metrics.classification_report(y_test, KNN_y_pred)))
# plot_confusion_matrix() was removed in scikit-learn 1.2;
# ConfusionMatrixDisplay.from_estimator() is its replacement.
disp = metrics.ConfusionMatrixDisplay.from_estimator(KNN_classifier, x_test, y_test)
disp.figure_.suptitle("Confusion Matrix of K Nearest Neighbours Classifier")
print("\nConfusion matrix of K Nearest Neighbours Classifier:\n%s"
      % disp.confusion_matrix)
print("\nAccuracy of the K Nearest Neighbours Classifier Algorithm: ",
      KNN_classifier.score(x_test, y_test))
plt.show()
OBSERVATION:
GITHUB LINK:
Suven-Consultants-and-Technology-Tasks/main.ipynb at master
·…
This repository contains Online Coding Internship related to Data
Analytics using Python Domain. …
github.com
CONCLUSION: