Convolutional-Neural-Network-Based Handwritten Character Recognition: An Approach with Massive Multisource Data

foysal haque

Article

Nazmus Saqib 1,*, Khandaker Foysal Haque 2, Venkata Prasanth Yanambaka 1 and Ahmed Abdelgawad 1

1 College of Science and Engineering, Central Michigan University, Mount Pleasant, MI 48859, USA; yanam1v@cmich.edu (V.P.Y.); abdel1a@cmich.edu (A.A.)
2 Institute for the Wireless Internet of Things, Northeastern University, Boston, MA 02115, USA; haque.k@northeastern.edu
* Correspondence: saqib1n@cmich.edu

Citation: Saqib, N.; Haque, K.F.; Yanambaka, V.P.; Abdelgawad, A. Convolutional-Neural-Network-Based Handwritten Character Recognition: An Approach with Massive Multisource Data. Algorithms 2022, 15, 129. https://doi.org/10.3390/a15040129

Academic Editor: Marcos Zampieri
Received: 5 March 2022
Accepted: 12 April 2022

Abstract: Neural networks have made big strides in image classification. Convolutional neural networks (CNNs) apply neural networks directly to images. Handwritten character recognition (HCR) is now a very powerful tool to detect traffic signals, translate language, and extract information from documents. Although handwritten character recognition technology is in use in the industry, its present accuracy is not outstanding, which compromises both performance and usability. Thus, the character recognition technologies in use are still not very reliable and need further improvement before they can be deployed extensively for serious tasks. On this account, recognition of English alphabet characters and of digits is performed by proposing a custom-tailored CNN model with two different datasets of handwritten images, i.e., Kaggle and MNIST, respectively; the model is lightweight but achieves higher accuracies than state-of-the-art models. Twelve models are designed by altering hyper-parameters, and the best two are proposed, showing which models provide the best accuracy for which dataset.
In addition, the classification reports (CRs) of these two proposed models are extensively investigated considering performance metrics such as precision, recall, specificity, and F1 score, which are obtained from the developed confusion matrix (CM). To simulate a practical scenario, the dataset is kept unbalanced and three more averages for the F measurement (micro, macro, and weighted) are calculated, which facilitates better understanding of the performances of the models. The highest accuracy of 99.642% is achieved for digit recognition, with the model using ‘RMSprop’ at a learning rate of 0.001, whereas the highest detection accuracy for alphabet recognition is 99.563%, which is obtained with the proposed model using the ‘ADAM’ optimizer at a learning rate of 0.00001. The macro F1 and weighted F1 scores for the best two models are 0.998, 0.997:0.992, and 0.996, respectively, for digit and alphabet recognition.

Keywords: handwritten character recognition; English character recognition; convolutional neural networks (CNNs); deep learning in character recognition; digit recognition; English alphabet recognition

Published: 14 April 2022

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Copyright: © 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

1. Introduction

Handwriting is the most typical and systematic way of recording facts and information. An individual's handwriting is idiosyncratic and unique. The capability of software or a device to recognize and analyze human handwriting in any language is called a handwritten character recognition (HCR) system. Recognition can be performed from both online and offline handwriting.
In recent years, applications of handwriting recognition have thrived; it is widely used in reading postal addresses, language translation, bank forms and check amounts, digital libraries, keyword spotting, and traffic sign detection.

Image acquisition, preprocessing, segmentation, feature extraction, and classification are the typical processes of an HCR system, as shown in Figure 1. The initial step, image acquisition, is to receive an image of handwritten characters, which serves as the input to preprocessing. In preprocessing, distortions of the scanned images are removed and the images are converted into binary images. Afterward, in the segmentation step, the image is divided into sub-images, one per character. Then, feature extraction derives the characteristic features from each character image. This stage is especially important for the last step of the HCR system, which is called classification [1]. Based on classification accuracy and different approaches to recognize the images, there are many classification methods, i.e., convolutional neural networks (CNNs), support vector machines (SVMs), recurrent neural networks (RNNs), deep belief networks, deep Boltzmann machines, and K-nearest neighbor (KNN) [2].

Figure 1. Representation of a common handwritten character recognition (HCR) system.

Neural networks (NNs), a subclass of machine learning, are information-processing methods inspired by the biological processes of the human brain. Figure 2 represents the basic neural network. In a neural network, the term deep learning refers to the number of layers. Neurons, the information-processing elements, form the foundation of neural networks and draw parallels with biological neural networks. Weights associated with the connection links, bias, inputs, and outputs are the primary components of an NN.
Every node in a neural network (NN) is called a perceptron [3]. Research is being conducted to obtain the best accuracy, but the accuracy using a CNN is not outstanding, which compromises the performance and usability for handwritten character recognition. Hence, the aim of this paper is to obtain the highest accuracy by introducing a handwritten character recognition (HCR) system using a CNN, which can automatically extract the important features from the images better than a multilayer perceptron (MLP) [4–9].

Figure 2. Representation of a basic neural network (NN).

CNNs were first employed in 1980 [10]. The conception of convolutional neural networks (CNNs) was motivated by the human brain. People can identify objects from their childhood because they have seen hundreds of pictures of those objects, which is why a child can guess an object that they have never seen before. CNNs work in a similar way. CNNs used for analyzing visual images are a variation of an MLP deep neural network that is fully connected. Fully connected means that each neuron in the layer is connected to all the neurons in the subsequent layer. Some of the renowned CNN architectures are AlexNet (8 layers), VGG (16, 19 layers), GoogLeNet (22 layers), and ResNet (152 layers) [11]. CNN models can provide an excellent recognition result because they do not need prior knowledge of hand-designed features. As for CNNs, they do not depend on the rotation of input images. CNN models have been broadly applied to HCR systems using the MNIST dataset. Such research has been carried out for several years. A few researchers have found the accuracy to be up to 99% for the recognition of handwritten digits [12]. An experiment was carried out using a combination of multiple CNN models for MNIST digits and had 99.73% accuracy [13].
Afterward, for the same MNIST dataset, the recognition accuracy was improved to 99.77% when this experiment with a 7-net committee was extended to a 35-net committee [14]. Niu and Suen minimized the structural risk by integrating an SVM for MNIST digit recognition and obtained an accuracy of 99.81% [15]. Chinese handwritten character recognition was investigated using a CNN [16]. Recently, Alvear-Sandoval et al. worked on deep neural networks (DNNs) for MNIST and obtained a 0.19% error rate [17]. Nevertheless, after careful investigation, it has been observed that the maximal recognition accuracy on the MNIST dataset has been attained only by using ensemble methods, as these aid in improving the classification accuracy. However, there are tradeoffs, i.e., high computational cost and increased testing complexity [18]. In this paper, a tailored CNN model is proposed which attains higher accuracy with light computational complexity.

Research on HCR technology has been going on for a long time and it is in use by the industry, but the accuracy is low, which compromises the usability and overall performance of the technology. The character recognition technologies in use are still not very dependable and need more development to be deployed broadly for unfailing applications. On this account, recognition of English alphabet characters and of digits is performed in this paper by proposing a custom-tailored CNN model with two different datasets of handwritten images, i.e., Kaggle and MNIST, respectively, which achieves higher accuracies. The important features of this work are as follows:

1. In the proposed CNN model, four 2D convolutional layers are kept the same and unchanged to obtain the maximum comparable recognition accuracy on two different datasets, Kaggle and MNIST, for handwritten letters and digits, respectively. This proves the versatility of our proposed model.
2. A custom-tailored, lightweight, high-accuracy CNN model (with four convolutional layers, three max-pooling layers, and two dense layers) is proposed by keeping in mind that it should not overfit. Thus, the computational complexity of our model is reduced.
3. Two different optimizers are used for each of the datasets, and three different learning rates (LRs) are used for each of the optimizers to evaluate the best models of the twelve models designed. This suitable selection will assist the research community in obtaining a deeper understanding of HCR.
4. To the best of the authors’ knowledge, the novelty of this work is that no researchers to date have worked with the classification report in such detail with a tailored CNN model generalized for both handwritten English alphabet and digit recognition. Moreover, the proposed CNN model gives above 99% recognition accuracy both on the compact MNIST digit dataset and on the extensive Kaggle dataset for alphabets.
5. The distribution of the dataset is imbalanced. Hence, accuracy alone would be ineffectual in evaluating model performance, so further performance measures are analyzed to a great extent with a classification report for the best two proposed models for the Kaggle and MNIST datasets, respectively. Classification reports indicate the F1 score for each of the 10 classes for digits (0–9) and each of the 26 classes for the alphabet (A–Z). In our case of multiclass classification, we examined averaging methods for the F1 score, resulting in different average scores, i.e., micro, macro, and weighted average, which is another novelty of this work.
The rest of the paper is organized as follows: Section 2 describes the review of the literature and related works in the handwritten character recognition research arena; Sections 3 and 4 present datasets and proposed CNN model architecture, respectively; Section 5 discusses the result analysis and provides a comparative analysis; and Section 6 describes the conclusion and suggestions for future directions.

2. Review of Literature and Related Works

Many new techniques have been introduced in research papers to classify handwritten characters and numerals or digits. Shallow networks have already shown promising results for handwriting recognition [19–26]. Hinton et al. investigated deep belief networks (DBNs), which have three layers along with a greedy algorithm, and recorded an accuracy of 98.75% for the MNIST dataset [27]. Pham et al. improved the performance of recurrent neural networks (RNNs), reducing the word error rate (WER) and character error rate (CER) by employing dropout as a regularization method to recognize unconstrained handwriting [28].

The convolutional neural network (CNN) brought a vast change, as it delivers state-of-the-art performance in HCR accuracy [29–33]. In 2003, for visual document analysis, a common CNN architecture was introduced by Simard et al., which eased the training of complex neural network methods [34]. Wang et al. used multilayer CNNs for end-to-end text recognition on benchmark datasets, e.g., street view text and ICDAR 2003, and accomplished brilliant results [35]. Recently, for scene text recognition, Shi et al. introduced a new approach, the convolutional recurrent neural network (CRNN), integrating both the deep CNN (DCNN) and recurrent neural network (RNN), and announced its superiority to traditional methods of character recognition [36]. For semantic segmentation, Badrinarayanan et al. proposed a deep convolutional network architecture where the max-pooling layer was used to obtain good performance; the authors also compared their model with current techniques. The segmentation architecture known as SegNet consists of a pixel-wise classification layer, an encoder network, and a decoder network [37,38].

In offline handwritten character recognition, CNNs have shown outstanding performance for different regional and international languages. Researchers have conducted studies on Chinese handwritten text recognition [39–41]; Arabic language [42]; handwritten Urdu text recognition [43,44]; handwritten Tamil character recognition [45]; Telugu character recognition [46]; and handwritten character recognition on Indic scripts [47]. Gupta et al. used features extracted from a CNN in their model and recognized the informative local regions in [48] from recent character images, accomplishing a recognition accuracy of 95.96% by applying a novel multi-objective optimization framework for HCR which comprises handwritten Bangla numerals, handwritten Devanagari characters, and handwritten English numerals. High performance on the CROHME dataset was observed in the work of Nguyen et al. [49]. The authors employed a multiscale CNN for clustering handwritten mathematical expressions (HMEs) and concluded that their model can be improved by training the CNN with a combination of global, attentive, and max-pooling layers.

Recognition of word location in historical books, for example on Gutenberg’s Bible pages, is addressed in the work of Ziran et al. [50] by developing an R-CNN-based deep learning framework. Ptucha et al. introduced an intelligent character recognition (ICR) system using a convolutional neural network [51]. IAM datasets and French-language-based RIMES lexicon datasets were used to evaluate the model, which reported a commendable result.
The difference between model parameters and hyper-parameters was highlighted in [52]. The hyper-parameters include the number of epochs, hidden units, hidden layers, learning rate (LR), kernel size, activation function, etc., which must be determined before the training begins and which determine the performance of the CNN [53]. It is mentioned that poorly chosen hyper-parameters can lead to bad CNN performance. The total numbers of hyper-parameters of some CNN models are 27, 57, 78, and 150 for AlexNet [54], VGG-16 [55], GoogleNet [56], and ResNet52 [57], respectively. To improve the recognition performance, practicing researchers play an important role in the handwriting recognition field by designing CNN parameters effectively. Tapotosh Ghosh et al. converted the images into black-and-white 28 × 28 forms with white as the foreground color in [58], applying InceptionResNetV2, DenseNet121, and InceptionNetV3 to the CMATERdb dataset.

The accuracy obtained by different researchers, their dataset preprocessing, and the different approaches taken to obtain the best recognition accuracy in recent years have been arranged in tabular form at the end of the paper in Section 5 (Results and Analysis).

3. Datasets

The MNIST digit benchmark dataset is a subset of a larger special dataset available from the National Institute of Standards and Technology (NIST). This benchmark dataset, having two categories (digit and alphabet), is accessible through Keras and comprises a training set of 60,000 examples and a test set of 10,000 examples [59]. Nevertheless, for digit recognition in this project, only 1 type of dataset is used from the list, which comprises 10 classes of MNIST digits. Each digit is of uniform size and, by computing the center of mass of the pixels, each binary image of a handwritten digit is centered into a 28 × 28 image.
The test set consists of 5000 patterns and the training set of 30,000 patterns from each of 2 datasets, written by about 250 different writers; 1 dataset was collected from high school students and the other from Census Bureau employees [1]. To make verification easier, datasets are labeled accordingly. The MNIST images sample distribution is shown in Figure 3.

Figure 3. Total distribution of MNIST digits (0–9).

The Kaggle alphabet dataset was sourced from the National Institute of Standards and Technology (NIST), MNIST, and other Google images [60]. The Kaggle English handwritten alphabet data, with 26 classes, comprise a training set of over 297,000 examples and a test set of 74,490 examples. The total distribution of Kaggle letters is illustrated in Figure 4. Each letter is of uniform size and, by computing the center of mass of the pixels, each binary image of a handwritten letter is centered into a 28 × 28 image.

Figure 4. Total distribution of Kaggle letters (A–Z).

4. Proposed Convolutional Neural Network

Of all the deep learning models in image classification, the CNN has become very popular due to its high performance in recognizing image patterns. This has opened up various application opportunities in our daily life and industries, which include medical image classification, traffic monitoring, autonomous object recognition, facial recognition, and much more. CNNs are sparse, feed-forward neural networks [61]. The idea of an artificial neuron was first conceptualized in 1943. Hubel and Wiesel first found that visual cortex cells play a major role in detecting light in the receptive fields, which greatly inspired building models such as the neocognitron. This model is considered to be the base and predecessor of the CNN. A CNN is formed of artificial neurons which have a self-optimization property, learning like brain neurons.
Due to this self-optimizing property, it can extract and classify the features extracted from images more precisely than any other algorithm. Moreover, it needs very limited preprocessing of the input data, while yielding highly accurate and precise results. CNNs are widely used in object detection and image classification, including medical imaging. In image classification, each pixel is considered a feature for the neural network. The CNN tries to understand and differentiate among the images depending on these features. Conventionally, the first few convolutional layers capture very low-level features, such as the edges, gradient orientation, or color. However, with an increased number of convolutional layers, the network starts extracting high-level features. Due to the higher dimensionality and convolution, the number of parameters of the network grows rapidly. This makes the CNN computationally heavy. However, with the development of computational technology and GPUs, these jobs have become much more efficient. Moreover, the development of CNN algorithms has also made it possible to reduce dimensionality by considering small patches at a time, which reduces the computational burden without losing the important features.

Handwritten character recognition (HCR) with deep learning and CNNs was one of the earliest endeavors of researchers in the field. With increased modeling efficacy and the availability of huge datasets, current models can perform significantly better than the models of ten years ago. However, one of the challenges of the current models is generalization. A model that performs excellently with one dataset may perform poorly with a different one. Thus, it is important to develop a robust model which can perform with the same level of accuracy across different datasets, which would give the model versatility.
Thus, a CNN model is designed which is computationally efficient because of its optimized number of CNN layers, while performing with high accuracy across multisource massive datasets. Owing to the lower resolution of the handwritten character images, the images fed to the input layer were sized 28 × 28 pixels. The input layer feeds the images to the convolutional layers, where the features are convolved. The model has only four convolutional layers, which makes it lightweight and computationally efficient. The first layer is a 2D convolutional layer with a 3 × 3 kernel size and a rectified linear unit (ReLU) activation function. ReLU is one of the most widely used activation functions in deep learning algorithms. ReLU is computationally effective because not all neurons are activated at once, unlike with other activation functions, e.g., tanh [62]. ReLU is a piecewise linear function which is continuous everywhere and differentiable at all points except 0. Besides being simple, it also has a reduced likelihood of vanishing gradients. Because of the abovementioned benefits, and as per the suggestion of the literature that ReLUs tend to converge early, it was chosen for our model. The idea behind ReLU is simple: it passes positive input values directly to the output, whereas negative values are returned as 0, as depicted in Figure 5.

Figure 5. Rectified linear unit (ReLU) function.

The subsequent three layers are 2D convolutional layers with ReLU activation, each followed by one max-pooling layer. Max pooling is a sample-based discretization process which is used to downsize the input. It pools the maximum value from each patch of each feature map, thus helping to reduce the dimensionality of the network.
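The two operations just described can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation (the paper's model was built with TensorFlow); the function names and the 4 × 4 input values are hypothetical, and the pooling assumes a stride of 2 with any trailing odd row or column dropped.

```python
def relu(x):
    """Pass positive values through unchanged; map everything else to 0."""
    return x if x > 0 else 0.0


def max_pool_2x2(feature_map):
    """2 x 2 max pooling with stride 2 on a 2D list.

    Assumption: a trailing odd row or column is dropped, so a
    13 x 13 map would shrink to 6 x 6.
    """
    rows = len(feature_map) // 2
    cols = len(feature_map[0]) // 2
    return [
        [
            max(
                feature_map[2 * r][2 * c],
                feature_map[2 * r][2 * c + 1],
                feature_map[2 * r + 1][2 * c],
                feature_map[2 * r + 1][2 * c + 1],
            )
            for c in range(cols)
        ]
        for r in range(rows)
    ]


# Hypothetical 4 x 4 feature map (values chosen only for illustration).
raw = [
    [-1.0, 2.0, 0.5, -3.0],
    [4.0, -2.0, 1.0, 0.0],
    [0.0, 1.5, -1.0, -2.0],
    [3.0, -0.5, 2.5, 1.0],
]
activated = [[relu(v) for v in row] for row in raw]
pooled = max_pool_2x2(activated)
print(pooled)  # [[4.0, 1.0], [3.0, 2.5]]
```

Each 2 × 2 patch is reduced to a single value, so the 4 × 4 map becomes 2 × 2: a fourfold reduction in the number of activations passed on to the next layer.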
Moreover, it reduces the number of parameters by discarding insignificant ones, which decreases the computational burden as well as helping to avoid overfitting. Thus, a 2 × 2 max-pooling layer is integrated in each of the convolutional layers except for the first one. The output of the fourth convolutional layer is fed to the flattening layer to convert it to a 1D vector, which is then fed to the fully connected layer, i.e., the dense layer. In the fully connected layer, as the name suggests, all the neurons are linked to the activation units of the following layer. In the proposed model, there are two fully connected layers where all the neurons of the first layer are connected to the activation units of the second fully connected layer. In the second fully connected layer, all the inputs are passed to the Softmax activation function, which categorizes the features into multiple classes as needed. Finally, the determined class of any input image is declared in the output. The proposed model is illustrated in Figure 6 and the resultant parameters of each layer are tabulated in Table 1.

Figure 6. Proposed CNN model for character recognition.

Table 1. Details of the proposed model.

Layer (Type)                        Output Shape            Param #
conv_1 (Conv 2D)                    (None, 26, 26, 32)      320
conv_2 (Conv 2D)                    (None, 26, 26, 64)      18,496
max_pooling2D_18 (MaxPooling2D)     (None, 13, 13, 64)      0
conv_3 (Conv 2D)                    (None, 13, 13, 128)     73,856
max_pooling2D_19 (MaxPooling2D)     (None, 6, 6, 128)       0
conv_4 (Conv 2D)                    (None, 6, 6, 256)       295,168
max_pooling2D_20 (MaxPooling2D)     (None, 3, 3, 256)       0
flatten (Flatten)                   (None, 2304)            0
FC_1 (Dense)                        (None, 64)              147,520
FC_2 (Dense)                        (None, 10)              650

Total Params #: 536,010
Trainable Params #: 536,010
Non-Trainable Params #: 0

For generalization, the same proposed model is used to classify both the English alphabets and digits.
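The output shapes and parameter counts in Table 1 can be cross-checked with a short pure-Python sketch. Some settings are assumptions inferred from the table rather than stated in the text: a single-channel 28 × 28 input, stride 1, no padding in conv_1 (28 shrinks to 26), and size-preserving padding in conv_2 to conv_4. The function names and the max_pooling_2 to max_pooling_4 layer labels are hypothetical.

```python
def conv2d(shape, filters, kernel=3, same_padding=True):
    """Return (output shape, parameter count) of a stride-1 2D convolution."""
    height, width, channels = shape
    if not same_padding:  # no padding: each side shrinks by kernel - 1
        height, width = height - kernel + 1, width - kernel + 1
    # kernel * kernel * channels weights plus one bias, per filter
    params = (kernel * kernel * channels + 1) * filters
    return (height, width, filters), params


def max_pool(shape, size=2):
    """Return (output shape, 0): pooling has no trainable parameters."""
    height, width, channels = shape
    return (height // size, width // size, channels), 0


def trace_model(num_classes):
    """List (layer name, output shape, params) for the layer stack of Table 1."""
    rows = []
    shape = (28, 28, 1)  # assumed grayscale input
    shape, params = conv2d(shape, 32, same_padding=False)
    rows.append(("conv_1", shape, params))
    for index, filters in enumerate((64, 128, 256), start=2):
        shape, params = conv2d(shape, filters)
        rows.append((f"conv_{index}", shape, params))
        shape, params = max_pool(shape)
        rows.append((f"max_pooling_{index}", shape, params))
    flat = shape[0] * shape[1] * shape[2]
    rows.append(("flatten", (flat,), 0))
    rows.append(("FC_1", (64,), (flat + 1) * 64))
    rows.append(("FC_2", (num_classes,), (64 + 1) * num_classes))
    return rows


digit_rows = trace_model(num_classes=10)
for name, shape, params in digit_rows:
    print(f"{name:<14} {str(shape):<16} {params:>8,}")
print("total params:", sum(params for _, _, params in digit_rows))
```

Under these assumptions the trace reproduces every row of Table 1 and the total of 536,010 parameters. Calling `trace_model(num_classes=26)` changes only the last layer, which then has 65 × 26 = 1,690 parameters instead of 650.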
The only difference is the number of output classes defined in the last fully connected layer, which is the ‘fully connected + Softmax’ layer, as depicted by Figure 6, and the FC_2 layer, as presented by Table 1. The number of classes is 10 for digit recognition as depicted by the table, and the number of classes is 26 for alphabet recognition. Moreover, for extensive comparative analysis, we also analyzed how the proposed model performs with different optimizers, ‘ADAM’ and ‘RMSprop’, which also include the variation of the learning rates (LRs). This analysis helps in understanding how the model performance might vary with the change of optimizers and variation of learning rates, which are discussed in detail in Section 5 (Results and Analysis).

In order to avoid the difficulties posed by the problem of latency in data processing, this project utilizes Colab-pro by Google, which has a 2.20 GHz Intel Xeon Processor, 128 GB RAM, and Tesla P100 16 GB GPU. The model was designed and tested in Colab-pro, keeping in mind the factor of easy reproducibility by the research community, as Colab-pro has built-in support for GPU-enabled TensorFlow and the necessary support for CUDA acceleration.

5. Results and Analysis

We used two datasets for the handwritten recognition process: the Kaggle dataset for our English letters (A–Z) and MNIST for our numeric characters (0–9). Two optimizers were used for each of the datasets, ‘ADAM’ and ‘RMSprop’, as well as three different learning rates (LRs) of 0.001, 0.0001, and 0.00001 for each of the optimizers. This gives us six CNN models for each of the datasets and twelve models overall. To avoid confusion and repetition, we named our models.
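The experimental grid (two datasets, two optimizers, three learning rates) can be enumerated with a small sketch. It follows the paper's K1–K6 and M1–M6 labels, in which ‘ADAM’ models are odd-numbered, ‘RMSprop’ models are even-numbered, and the number increases as the learning rate decreases. The dictionary layout and function name are hypothetical, and no training is performed here.

```python
from itertools import product

OPTIMIZERS = ("ADAM", "RMSprop")
LEARNING_RATES = (0.001, 0.0001, 0.00001)
DATASET_PREFIXES = {"K": "Kaggle letters (A-Z)", "M": "MNIST digits (0-9)"}


def build_experiment_grid():
    """Map each model name (K1..K6, M1..M6) to its dataset, optimizer, and LR."""
    grid = {}
    for prefix, dataset in DATASET_PREFIXES.items():
        # Learning rate varies slowest, optimizer fastest: (0.001, ADAM) -> 1,
        # (0.001, RMSprop) -> 2, (0.0001, ADAM) -> 3, and so on.
        settings = product(LEARNING_RATES, OPTIMIZERS)
        for index, (learning_rate, optimizer) in enumerate(settings, start=1):
            grid[f"{prefix}{index}"] = {
                "dataset": dataset,
                "optimizer": optimizer,
                "learning_rate": learning_rate,
            }
    return grid


grid = build_experiment_grid()
for name, config in grid.items():
    print(name, config["optimizer"], config["learning_rate"])
```

Looking up `grid["K5"]` gives ‘ADAM’ at 0.00001 and `grid["M2"]` gives ‘RMSprop’ at 0.001, the two configurations the paper reports as best.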
The models were named as follows for the Kaggle dataset: with a learning rate of 0.001, the model under the ‘ADAM’ optimizer is K1 and the one under ‘RMSprop’ is K2; with a learning rate of 0.0001, the model under the ‘ADAM’ optimizer is K3 and the one under ‘RMSprop’ is K4; with a learning rate of 0.00001, the model under the ‘ADAM’ optimizer is K5 and the one under ‘RMSprop’ is K6. The models were named similarly for the MNIST dataset, from model M1 to model M6. Our results indicated that we obtained the best result under the ‘ADAM’ optimizer with a learning rate of 0.00001 for the Kaggle dataset (model K5), and under ‘RMSprop’ with a learning rate of 0.001 for the MNIST dataset (model M2). We then calculated the F1 score (micro, macro, and weighted average) and obtained confusion matrices and two classification reports for the two models that give us the best accuracy for each dataset. Figure 7 simplifies the selection of the best models for each dataset.

Figure 7. Best model selection from the Kaggle and MNIST datasets.

For the alphabet dataset, the overall accuracies using the ‘ADAM’ optimizer in the proposed CNN model for handwritten English alphabet recognition were 99.516%, 99.511%, and 99.563% for LR 0.001, LR 0.0001, and LR 0.00001, respectively. The same model using ‘RMSprop’ achieved accuracies of 99.292%, 99.108%, and 99.191% for LR 0.001, LR 0.0001, and LR 0.00001, respectively. These results clearly show that, in terms of accuracy, the model using the ‘ADAM’ optimizer with LR 0.00001, named model K5, performs better than the other proposed models. All six proposed models for character recognition achieved above 99.00% overall accuracy.

For the digit dataset, the overall accuracies using ‘RMSprop’ for handwritten digit recognition were 99.642%, 99.452%, and 98.142% for LR 0.001, LR 0.0001, and LR 0.00001, respectively.
The same model using the ‘ADAM’ optimizer achieved accuracies of 99.571%, 99.309%, and 98.142% for LR 0.001, LR 0.0001, and LR 0.00001, respectively. Figures 8 and 9 depict validation accuracies and Figures 10 and 11 show the validation losses of all the twelve models with the Kaggle and MNIST datasets, respectively. It is clear that overall accuracy decreases with the decrease in learning rate (LR). This confirms that the model using ‘RMSprop’ with LR 0.001, named model M2, outperformed the other proposed models in terms of accuracy. From Figures 9 and 11, it can be clearly observed that no overfitting happens for digit recognition, whereas for alphabet recognition overfitting occurs when ‘RMSprop’ is used, as depicted in Figures 8d–f and 10d–f. Overfitting occurs when the model performs well on the training data but does not perform as well on the testing set. Here, the model learns unnecessary information within the dataset as it trains for a long time on the training data.

Figure 8. Validation accuracy of the six models for English alphabet recognition. (a) Optimizer: ‘ADAM’; learning rate: 0.001. (b) Optimizer: ‘ADAM’; learning rate: 0.0001. (c) Optimizer: ‘ADAM’; learning rate: 0.00001. (d) Optimizer: ‘RMSprop’; learning rate: 0.001. (e) Optimizer: ‘RMSprop’; learning rate: 0.0001. (f) Optimizer: ‘RMSprop’; learning rate: 0.00001.

Figure 9. Validation accuracy of the six models for digit (0–9) recognition. (a) Optimizer: ‘ADAM’; learning rate: 0.001. (b) Optimizer: ‘ADAM’; learning rate: 0.0001. (c) Optimizer: ‘ADAM’; learning rate: 0.00001. (d) Optimizer: ‘RMSprop’; learning rate: 0.001. (e) Optimizer: ‘RMSprop’; learning rate: 0.0001. (f) Optimizer: ‘RMSprop’; learning rate: 0.00001.

Figure 10. Validation loss of the six models for English alphabet recognition. (a) Optimizer: ‘ADAM’; learning rate: 0.001. (b) Optimizer: ‘ADAM’; learning rate: 0.0001. (c) Optimizer: ‘ADAM’; learning rate: 0.00001. (d) Optimizer: ‘RMSprop’; learning rate: 0.001. (e) Optimizer: ‘RMSprop’; learning rate: 0.0001. (f) Optimizer: ‘RMSprop’; learning rate: 0.00001.

Figure 11. Validation loss of the proposed six models for digit (0–9) recognition. (a) Optimizer: ‘ADAM’; learning rate: 0.001. (b) Optimizer: ‘ADAM’; learning rate: 0.0001. (c) Optimizer: ‘ADAM’; learning rate: 0.00001. (d) Optimizer: ‘RMSprop’; learning rate: 0.001. (e) Optimizer: ‘RMSprop’; learning rate: 0.0001. (f) Optimizer: ‘RMSprop’; learning rate: 0.00001.

The performance evaluation of the models is more obvious and explicit from the metrics of specificity, recall, precision, F1 score, and support. These metrics are computed from the outcomes recorded in the confusion matrix (CM). The CM has four different outcomes: total false positive (TFP), total false negative (TFN), total true positive (TTP), and total true negative (TTN). The CM makes it straightforward to compute the per-class values of recall, precision, specificity, and F1 score for each of the datasets.

Let us consider the scenario where we want the model to detect the letter ‘A’. For simplification, let us also assume that each of the 26 letters in the alphabet (A–Z) has 100 images, totaling 2600 images altogether. If we assume that the model accurately identifies the letter ‘A’ in 97 out of its 100 images, then we say that the accuracy of the model on the letter ‘A’ is 97%. Thus, we can also conclude that the total number of true positives (TTPs) is 97. Under the same assumptions as above, if the letter ‘O’ is incorrectly identified as ‘A’ once, then the number of total false positives (TFPs) in this case would be 1.
If the letter ‘A’ is misidentified as ‘O’ three times, then the total number of false negatives (TFN) is 3. The remaining 2499 of the 2600 images are counted as total true negatives (TTN). Figures 12 and 13 show the confusion matrices of the two best models in terms of overall performance (model K5 for letter recognition and model M2 for digit recognition), which were trained and validated with the Kaggle and MNIST datasets, respectively.

Figure 12. Confusion matrix of model K5 for A–Z recognition.

Precision is the percentage of relevant results among those returned, whereas accuracy states how close the generated values are to the real values. Sensitivity, also called recall, and the true negative rate, known as specificity, are other important measures for evaluating a CNN model. The F1 score is the harmonic mean of precision and recall. Equations (1)–(5) define accuracy, specificity, recall, precision, and F1 score, respectively.

$$\mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \tag{1}$$

$$\mathrm{Specificity} = \frac{TN}{TN + FP} \tag{2}$$

$$\mathrm{Recall} = \frac{TP}{TP + FN} \tag{3}$$

$$\mathrm{Precision} = \frac{TP}{TP + FP} \tag{4}$$

$$\mathrm{F1\ Score} = 2 \times \left( \frac{\mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}} \right) \tag{5}$$

Figure 13. Confusion matrix of model M2 for 0–9 recognition.

With the Kaggle dataset of 74,490 testing images for letter recognition, using the ‘ADAM’ optimizer, model K1 yields 74,130 TTP and 360 TFP images, model K3 yields 74,126 TTP and 364 TFP images, and model K5 yields 74,165 TTP and 325 TFP images. The models using ‘RMSprop’ perform worse, with more than 500 TFP cases each and 73,963, 73,826, and 73,888 TTP cases for models K2, K4, and K6, respectively. The recall of model K5 is 99.56%, the highest among the investigated models. Model K4 attained the lowest recall, 99.10%, which is still above 99%.
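As a minimal sketch of Equations (1)–(5), the following Python snippet recomputes the metrics from the counts quoted in the text. The function name `metrics_from_counts` is an illustrative assumption rather than the authors' code, and the TTN values for models K4 and K5 are taken from Table 2.

```python
def metrics_from_counts(tp, tn, fp, fn):
    """Return the metrics of Equations (1)-(5) as fractions in [0, 1]."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)            # Equation (1)
    specificity = tn / (tn + fp)                          # Equation (2)
    recall = tp / (tp + fn)                               # Equation (3)
    precision = tp / (tp + fp)                            # Equation (4)
    f1 = 2 * precision * recall / (precision + recall)    # Equation (5)
    return {
        "accuracy": accuracy,
        "specificity": specificity,
        "recall": recall,
        "precision": precision,
        "f1": f1,
    }


# Worked example for the letter 'A': 97 TP, 1 FP, 3 FN, 2499 TN.
letter_a = metrics_from_counts(tp=97, tn=2499, fp=1, fn=3)

# One-versus-rest totals for models K5 and K4 as listed in Table 2.
k5 = metrics_from_counts(tp=74165, tn=1861925, fp=325, fn=325)
k4 = metrics_from_counts(tp=73826, tn=1861586, fp=664, fn=664)

for name, result in (("A", letter_a), ("K5", k5), ("K4", k4)):
    summary = ", ".join(f"{key}={value:.4f}" for key, value in result.items())
    print(f"{name}: {summary}")
```

Under these counts, the recall of K5 comes out at roughly 99.56% and that of K4 at roughly 99.11%, in line with the values quoted above. Note that Equation (1) applied to the one-versus-rest totals gives about 99.97% for K5, whereas the overall accuracy of 99.563% reported in Table 2 corresponds to TTP divided by the 74,490 test images.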
The confusion matrices show that all the proposed models achieved a specificity above 99.9% and that model K5 performed best at recognizing handwritten letters. In multiclass classification, the F1 score is computed for each class in a one-versus-the-rest (OvR) manner, rather than as a single overall F1 score as in binary classification. Therefore, the total true negative (TTN) count is important for evaluating the proposed model. With the Kaggle dataset, model K5 yielded 1,861,925 TTN cases, the highest number; model K4 yielded the lowest, 1,861,586. Model K5 showed better overall performance and a higher TTN count than the other models; a higher TTN count is expected for each class when specific letters are recognized correctly. The accuracy of the proposed model K5 was 99.563%, with precision and recall values of about 99.5% each, which is the highest handwritten alphabet recognition accuracy known to the authors for the Kaggle dataset.

For digit recognition with the MNIST dataset, models M1, M3, and M5 (‘ADAM’) achieved accuracies of 99.571%, 99.309%, and 98.238%, respectively, whereas models M2, M4, and M6 (‘RMSprop’) achieved accuracies of 99.642%, 99.452%, and 98.142%, respectively. Model M2 (‘RMSprop’ with LR 0.001) achieved the highest precision of 99.6%, whereas model M6 obtained the lowest precision of 98.1% (which is still above 98%). Similarly, the highest recall was 0.9964, by model M2, and the lowest was 0.9814, by model M6. The confusion matrices also show that, on the 4200 MNIST test images (the total support in Table 3), model M2 performed best among the models, with 4185 TTP and 15 TFP images and the highest TTN count of 37,785.
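The one-versus-the-rest totals discussed above can be derived mechanically from a multiclass confusion matrix. The sketch below assumes that rows are true classes and columns are predicted classes; the helper `ovr_totals` and the 3-class matrix `toy_cm` are hypothetical illustrations, not data or code from this study. The `paper_totals` entries are the K5 and M2 rows of Table 2.

```python
def ovr_totals(cm):
    """Sum one-versus-rest TP, FP, FN, and TN over all classes of a square matrix."""
    n_classes = len(cm)
    n_samples = sum(sum(row) for row in cm)
    ttp = tfp = tfn = ttn = 0
    for k in range(n_classes):
        tp = cm[k][k]
        fn = sum(cm[k]) - tp                                  # class k predicted as another class
        fp = sum(cm[r][k] for r in range(n_classes)) - tp     # other classes predicted as k
        tn = n_samples - tp - fn - fp                         # everything not involving class k
        ttp, tfp, tfn, ttn = ttp + tp, tfp + fp, tfn + fn, ttn + tn
    return ttp, tfp, tfn, ttn


# Hypothetical 3-class confusion matrix with 100 samples per class.
toy_cm = [
    [97, 2, 1],
    [1, 98, 1],
    [0, 3, 97],
]
toy_totals = ovr_totals(toy_cm)
print(toy_totals)

# (TTP, TFP, TFN, TTN, number of classes, number of test images) from Table 2.
paper_totals = {
    "K5": (74165, 325, 325, 1861925, 26, 74490),
    "M2": (4185, 15, 15, 37785, 10, 4200),
}
for model, (ttp, tfp, tfn, ttn, n_classes, n_images) in paper_totals.items():
    print(model, ttp + tfp + tfn + ttn == n_classes * n_images)
```

In any such decomposition, TFP equals TFN, because every misclassified image is a false negative for its true class and a false positive for its predicted class; this is consistent with the identical TFP and TFN columns of Table 2. The four totals also sum to the number of classes times the number of test images, which is why TTN is so much larger than the other counts.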
Model M6 performed worst among the models, with 4122 TTP and 78 TFP cases, and its TTN count was 63 lower than that of model M2; a higher TTN count is expected for each class when specific digits are recognized correctly. Table 2 shows the comparative results of all the models with the Kaggle and MNIST datasets.

Table 2. Confusion matrix parameters of all the investigated models.

| Dataset (task) | Model name | Optimizer | Learning rate | Precision (%) | Specificity (%) | Recall (%) | TFP | TFN | TTP | TTN | Overall accuracy (%) |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Kaggle (alphabet recognition) | K1 | ‘ADAM’ | 0.001 | 99.4 | 99.98 | 99.51 | 360 | 360 | 74,130 | 1,861,890 | 99.516 |
| Kaggle (alphabet recognition) | K2 | ‘RMSprop’ | 0.001 | 99.2 | 99.97 | 99.29 | 527 | 527 | 73,963 | 1,861,723 | 99.292 |
| Kaggle (alphabet recognition) | K3 | ‘ADAM’ | 0.0001 | 99.5 | 99.98 | 99.51 | 364 | 364 | 74,126 | 1,861,886 | 99.511 |
| Kaggle (alphabet recognition) | K4 | ‘RMSprop’ | 0.0001 | 99.0 | 99.96 | 99.10 | 664 | 664 | 73,826 | 1,861,586 | 99.108 |
| Kaggle (alphabet recognition) | K5 | ‘ADAM’ | 0.00001 | 99.5 | 99.98 | 99.56 | 325 | 325 | 74,165 | 1,861,925 | 99.563 |
| Kaggle (alphabet recognition) | K6 | ‘RMSprop’ | 0.00001 | 99.1 | 99.96 | 99.19 | 602 | 602 | 73,888 | 1,861,648 | 99.191 |
| MNIST (digit recognition) | M1 | ‘ADAM’ | 0.001 | 99.5 | 99.95 | 99.57 | 22 | 22 | 4178 | 37,778 | 99.571 |
| MNIST (digit recognition) | M2 | ‘RMSprop’ | 0.001 | 99.6 | 99.96 | 99.64 | 15 | 15 | 4185 | 37,785 | 99.642 |
| MNIST (digit recognition) | M3 | ‘ADAM’ | 0.0001 | 99.2 | 99.92 | 99.30 | 29 | 29 | 4171 | 37,771 | 99.309 |
| MNIST (digit recognition) | M4 | ‘RMSprop’ | 0.0001 | 99.4 | 99.93 | 99.45 | 23 | 23 | 4177 | 37,777 | 99.452 |
| MNIST (digit recognition) | M5 | ‘ADAM’ | 0.00001 | 98.2 | 99.80 | 98.23 | 74 | 74 | 4126 | 37,726 | 98.238 |
| MNIST (digit recognition) | M6 | ‘RMSprop’ | 0.00001 | 98.1 | 99.79 | 98.14 | 78 | 78 | 4122 | 37,722 | 98.142 |

To classify both the English letters and the digits with the two optimizers, the four 2D convolutional layers of the proposed CNN model are kept unchanged. The only difference is the number of output classes, defined in the last fully connected layer, which is the ‘fully connected + Softmax’ layer. The number of classes is 10 for digit recognition and 26 for alphabet recognition. Table 2 shows that the ‘ADAM’ optimizer performs better than ‘RMSprop’ for letter recognition, whereas ‘RMSprop’ is more suitable for digit recognition at LR 0.001 and LR 0.0001. These optimizers are used here to obtain faster results by changing the attributes of the proposed neural networks.

For alphabet recognition with the Kaggle dataset, the ‘ADAM’ models (K1, K3, and K5) reach their best accuracy at the smallest value of one of the hyper-parameters, the learning rate (LR). In contrast, for digit recognition with the MNIST dataset, the models (M1–M6) perform better as the learning rate increases; for example, with ‘RMSprop’, the overall accuracy increases by 1.5 percentage points (about 1.53% in relative terms) when the learning rate increases from 0.00001 to 0.001.

It can be seen from Figures 12 and 13 that the distribution of the datasets is imbalanced. Hence, accuracy alone is insufficient for judging model performance, and the classification report (CR) is needed for an analytical understanding of the model predictions. The classification reports of the best two proposed models (model M2 and model K5) are presented in Tables 3 and 4, respectively.

Table 3. Classification report of model M2 for 0–9 recognition.

| Digit (0–9) | Precision/class | Recall/class | F1 score/class | Support/class | Support proportion/class |
|---|---|---|---|---|---|
| class 0 | 1.00 | 1.00 | 1.00 | 411 | 0.098 |
| class 1 | 1.00 | 1.00 | 1.00 | 485 | 0.115 |
| class 2 | 1.00 | 1.00 | 1.00 | 403 | 0.096 |
| class 3 | 1.00 | 1.00 | 1.00 | 418 | 0.100 |
| class 4 | 1.00 | 0.99 | 0.99 | 461 | 0.110 |
| class 5 | 1.00 | 0.99 | 1.00 | 372 | 0.089 |
| class 6 | 1.00 | 1.00 | 1.00 | 413 | 0.098 |
| class 7 | 1.00 | 1.00 | 1.00 | 446 | 0.106 |
| class 8 | 0.99 | 1.00 | 0.99 | 382 | 0.091 |
| class 9 | 0.99 | 1.00 | 0.99 | 409 | 0.097 |
| Total | | | | 4200 | 1.00 |

Table 4. Classification report of model K5 for A–Z recognition.
| Letters (A–Z) | Precision/class | Recall/class | F1 score/class | Support/class | Support proportion/class |
|---|---|---|---|---|---|
| class A | 0.99 | 0.99 | 0.99 | 1459 | 0.020 |
| class B | 1.00 | 0.99 | 1.00 | 4747 | 0.064 |
| class C | 0.99 | 1.00 | 0.99 | 2310 | 0.031 |
| class D | 1.00 | 1.00 | 1.00 | 5963 | 0.080 |
| class E | 0.99 | 0.99 | 0.99 | 1986 | 0.027 |
| class F | 0.99 | 0.99 | 0.99 | 1161 | 0.016 |
| class G | 1.00 | 0.99 | 0.99 | 1712 | 0.023 |
| class H | 0.99 | 1.00 | 0.99 | 2291 | 0.031 |
| class I | 1.00 | 1.00 | 1.00 | 3894 | 0.052 |
| class J | 0.99 | 1.00 | 0.99 | 2724 | 0.037 |
| class K | 0.99 | 0.99 | 0.99 | 2315 | 0.031 |
| class L | 0.98 | 0.99 | 0.99 | 1109 | 0.015 |
| class M | 1.00 | 0.99 | 1.00 | 3841 | 0.052 |
| class N | 1.00 | 1.00 | 1.00 | 11,524 | 0.155 |
| class O | 0.99 | 0.99 | 0.99 | 2488 | 0.033 |
| class P | 0.99 | 0.99 | 0.99 | 1235 | 0.017 |
| class Q | 1.00 | 1.00 | 1.00 | 4518 | 0.061 |
| class R | 0.99 | 0.99 | 0.99 | 1226 | 0.016 |
| class S | 1.00 | 0.98 | 0.99 | 229 | 0.003 |
| class T | 1.00 | 0.99 | 0.99 | 870 | 0.012 |
| class U | 1.00 | 0.99 | 1.00 | 2045 | 0.027 |
| class V | 1.00 | 1.00 | 1.00 | 9529 | 0.128 |
| class W | 0.99 | 0.98 | 0.99 | 1145 | 0.015 |
| class X | 0.99 | 0.99 | 0.99 | 2165 | 0.029 |
| class Y | 0.97 | 0.96 | 0.97 | 249 | 0.003 |
| class Z | 0.99 | 0.99 | 0.99 | 1755 | 0.024 |
| Total | | | | 74,490 | 1.00 |

The F1 score columns in Tables 3 and 4 (highlighted in yellow in the original tables) give the score for each of the 10 digit classes (0–9) and each of the 26 letter classes (A–Z). Because this is a multiclass problem, the per-class F1 scores are combined using three averaging methods: micro, macro, and weighted averaging. Macro averaging is the most straightforward of the three. It treats all classes equally, regardless of their support values, where support denotes the number of actual occurrences of a class in the dataset. The macro F1 score is the unweighted arithmetic mean of all the per-class F1 scores. We calculated macro F1 scores of 0.998 and 0.992 for model M2 and model K5, respectively.
$$\mathrm{Macro\ F1\ score\ (model\ M2)} = \frac{8 \times 1 + 2 \times 0.99}{10} = 0.998$$

$$\mathrm{Macro\ F1\ score\ (model\ K5)} = \frac{8 \times 1 + 17 \times 0.99 + 0.97}{26} = 0.992$$

The weighted average F1 score is the mean of all per-class F1 scores, with each class weighted by its support proportion, that is, its support relative to the sum of all support values. Using Equation (6), the weighted F1 scores are 0.997 and 0.996 for model M2 and model K5, respectively. Micro averaging computes a global average F1 score from the sums of the TTP, TFN, and TFP counts. The micro F1 score is computed using Equation (7), which is derived from Equation (5). Table 5 shows the micro, macro, and weighted averages of the F1 score for the best two proposed models (model M2 and model K5). The performance of a classification model is evaluated by the following well-accepted F-measures, where the sum in Equation (6) runs over all classes:

$$\mathrm{Weighted\ F1\ score} = \sum_{a}^{b} \left( \mathrm{Per\ class\ F1\ score} \times \mathrm{Support\ proportion} \right) \tag{6}$$

$$\mathrm{Micro\ F1\ score} = \frac{TP}{TP + \frac{1}{2}(FP + FN)} \tag{7}$$

Table 5. Micro, macro, and weighted average of F1 score for the best two proposed models.

| F-measure | Model M2 (digit recognition) | Model K5 (letter recognition) |
|---|---|---|
| Micro F1 score | 0.996 | 0.995 |
| Macro F1 score | 0.998 | 0.992 |
| Weighted F1 score | 0.997 | 0.996 |

The micro F1 score works well with a balanced dataset. In this project, as the datasets are imbalanced, the macro average is the appropriate choice when all classes are equally significant, because it treats all classes equally. The weighted average is preferred when classes with more examples should contribute more, because each class's contribution to the F1 average is weighted by its size.

We obtained good results with multiclass classification. The proposed CNN model for handwritten character recognition (with four convolutional layers, three max-pooling layers, and two dense layers) was designed with the aim of avoiding overfitting.
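The three averaging schemes described above can be sketched in a few lines of Python. The per-class F1 scores and supports below are the rounded values listed for model K5 in Table 4, and the TTP, TFP, and TFN totals are those of Table 2; the function names are illustrative assumptions, not the authors' implementation.

```python
def macro_f1(f1_scores):
    """Unweighted arithmetic mean of the per-class F1 scores."""
    return sum(f1_scores) / len(f1_scores)


def weighted_f1(f1_scores, supports):
    """Equation (6): per-class F1 weighted by each class's support proportion."""
    total = sum(supports)
    return sum(f1 * support / total for f1, support in zip(f1_scores, supports))


def micro_f1(tp, fp, fn):
    """Equation (7): F1 computed from the summed TP, FP, and FN counts."""
    return tp / (tp + 0.5 * (fp + fn))


# Rounded per-class values for model K5, classes A to Z (Table 4).
k5_f1 = [0.99, 1.00, 0.99, 1.00, 0.99, 0.99, 0.99, 0.99, 1.00, 0.99, 0.99, 0.99, 1.00,
         1.00, 0.99, 0.99, 1.00, 0.99, 0.99, 0.99, 1.00, 1.00, 0.99, 0.99, 0.97, 0.99]
k5_support = [1459, 4747, 2310, 5963, 1986, 1161, 1712, 2291, 3894, 2724, 2315, 1109, 3841,
              11524, 2488, 1235, 4518, 1226, 229, 870, 2045, 9529, 1145, 2165, 249, 1755]

k5_macro = macro_f1(k5_f1)
k5_weighted = weighted_f1(k5_f1, k5_support)
k5_micro = micro_f1(tp=74165, fp=325, fn=325)   # totals from Table 2

print(f"macro={k5_macro:.4f} weighted={k5_weighted:.4f} micro={k5_micro:.4f}")
```

With these inputs, the macro and weighted averages come out at approximately 0.992 and 0.996, matching Table 5, and the micro average at approximately 0.9956, which Table 5 lists as 0.995. Because the inputs are rounded to two decimals, small deviations from values computed on unrounded per-class scores are to be expected.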
This work also shows how different learning rates and optimizers affect the models' performance. Additionally, classification reports are presented with micro, macro, and weighted averages. Table 6 compares the best two proposed models with other notable models published by different researchers in recent years. In the Preprocessing column, X indicates that no preprocessing is listed.

Table 6. Comparison of different learning models' approaches based on dataset, preprocessing, and accuracy.

| Sl. No. | Author(s) | Publication year | Approach | Dataset | Preprocessing | Results |
|---|---|---|---|---|---|---|
| 1 | Mor et al. [63] | 2019 | Two convolutional layers and one dense layer. | EMNIST | X | 87.1% |
| 2 | Alom et al. [64] | 2017 | CNN with dropout and Gabor filters. | CMATERdb 3.1.1 | Raw images passed to normalization | 98.78% |
| 3 | Sabour et al. [65] | 2019 | A CNN with 3 convolutional layers and two capsule layers. | EMNIST | X | 95.36% (letters); 99.79% (digits) |
| 4 | Dos Santos et al. [66] | 2019 | Deep convolutional extreme learning machine. | EMNIST Digits | X | 99.775% |
| 5 | Adnan et al. [67] | 2018 | Deep belief network (DBN), stacked autoencoder (AE), DenseNet | CMATERdb 3.11 | 600 images rescaled to 32 × 32 pixels. | 99.13% (digits); 98.31% (alphabets); 98.18% (special characters) |
| 6 | W. Xue et al. [68] | 2020 | Three CNNs combined into a single feature map for classification. | UC Merced, AID, and NWPU-RESISC45 | X | AID: 93.47%; UC Merced: 98.85%; NWPU-RESISC45: 95% |
| 7 | D.S. Prashanth et al. [69] | 2020 | 1. CNN, 2. modified LeNet CNN (MLCNN), and 3. AlexNet CNN (ACNN). | Own dataset of 38,750 images | X | CNN: 94%; MLCNN: 99%; ACNN: 98% |
| 8 | D.S. Joshi and Risodkar [70] | 2018 | K-NN classifier and neural network | Own dataset with 30 samples | RGB-to-gray conversion, skew correction, filtering, morphological operation | 78.6% |
| 9 | Ptucha et al. [51] | 2020 | Introduced an intelligent character recognition (ICR) system | IAM, RIMES lexicon | X | 99% |
| 10 | Shibaprasad et al. [71] | 2018 | Convolutional neural network (CNN) architecture | 1000-character samples | Resized all images to 28 × 28 pixels. | 99.40% |
| 11 | Weng and Xia [72] | 2019 | Deep neural networks (DNNs) | Own dataset of 400 types of pictures | Normalized to 52 × 52 pixels. | 93.3% |
| 12 | Gan et al. [73] | 2019 | 1-D CNN | ICDAR-2013, IAHCC-UCAS2016 | Chinese character images rescaled to 60 × 60 pixels. | 98.11% (ICDAR-2013); 97.14% (IAHCC-UCAS2016) |
| 13 | Kavitha et al. [45] | 2019 | CNN (5 convolution layers, 2 max pooling layers, and fully connected layers) | HPL-Tamil-iso-char | RGB-to-gray conversion | 97.7% |
| 14 | Saha et al. [74] | 2019 | Divide and Merge Mapping (DAMM) | Own dataset with 166,105 images | Resized all images to 128 × 128. | 99.13% |
| 15 | Y. B. Hamdan et al. [75] | 2021 | Support vector machine (SVM) classifiers network graphical methods. | MNIST, CENPARMI | X | 94% |
| 16 | Ukil et al. [76] | 2019 | CNNs | PHD Indic_11 | RGB-to-grayscale conversion and images resized to 28 × 28 pixels. | 95.45% |
| 17 | Cavalin et al. [77] | 2019 | A hierarchical classifier built from the confusion matrix of a flat classifier | EMNIST | X | 99.46% (digits); 93.63% (letters) |
| 18 | Tapotosh Ghosh et al. [58] | 2021 | InceptionResNetV2, DenseNet121, and InceptionNetV3 | CMATERdb | The images were first converted to B&W 28 × 28 form with white as the foreground color. | 97.69% |
| 19 | Proposed model | 2022 | CNN using the ‘RMSprop’ and ‘ADAM’ optimizers, with four convolutional layers, three max pooling layers, and two dense layers, for three different learning rates (LR 0.001, LR 0.0001, and LR 0.00001) for multiclass classification. | MNIST: 60,000 training, 10,000 testing images. Kaggle: 297,000 training, 74,490 testing images. | Each digit/letter is of a uniform size and, by computing the center of mass of the pixels, each binary image of a handwritten digit is centered into a 28 × 28 image. | 99.64% (digits), macro F1 score average: 0.998; 99.56% (letters), macro F1 score average: 0.992 |

6. Conclusions

In modern days, applications of handwritten character recognition (HCR) systems are flourishing.
In this paper, to address HCR as a multiclass classification problem, a CNN-based model is proposed that achieved very good results. The CNN models were trained with the MNIST digit dataset, which comprises 60,000 training and 10,000 testing images. They were also trained with the substantially larger Kaggle alphabet dataset, which comprises over 297,000 training images and a test set of 74,490 images. For the Kaggle dataset, the overall accuracies using the ‘ADAM’ optimizer were 99.516%, 99.511%, and 99.563% for learning rate (LR) 0.001, LR 0.0001, and LR 0.00001, respectively. Meanwhile, the same model using ‘RMSprop’ achieved accuracies of 99.292%, 99.108%, and 99.191% for LR 0.001, LR 0.0001, and LR 0.00001, respectively. For the MNIST dataset, the overall accuracies using ‘RMSprop’ were 99.642%, 99.452%, and 98.142% for LR 0.001, LR 0.0001, and LR 0.00001, respectively. Meanwhile, the same model using the ‘ADAM’ optimizer achieved accuracies of 99.571%, 99.309%, and 98.238% with LR 0.001, LR 0.0001, and LR 0.00001, respectively. For alphabet recognition, the best accuracy was obtained at the lowest learning rate, whereas for digit recognition, overall accuracy increased with the learning rate. In addition, precision, recall, specificity, and F1 score were measured from the confusion matrices. Of the twelve models discussed, the model using the ‘ADAM’ optimizer with LR 0.00001 obtained a recall of 99.56%, and the model using the ‘RMSprop’ optimizer with LR 0.001 obtained a recall of 99.64%; these two models therefore outperform the other models for the Kaggle and MNIST datasets, respectively.
As the distribution of the datasets is imbalanced, accuracy alone would be insufficient for evaluating the models; therefore, classification reports (CR) indicating the F1 score for each of the 10 classes for digits (0–9) and each of the 26 classes for the alphabet (A–Z) were included for the predictions of the best two proposed models. From the CR, we achieved micro, macro, and weighted F1 scores of 0.996, 0.998, and 0.997 for the MNIST dataset and 0.995, 0.992, and 0.996 for the Kaggle dataset, respectively. Furthermore, the results of the best two models presented here were compared with the results of other notable works in this area. As future work, we intend to include several feature extraction methods and to apply a framework similar to the one proposed here to more complex languages, such as Korean, Chinese, Finnish, and Japanese.

Author Contributions: Conceptualization, N.S., K.F.H. and A.A.; methodology, N.S. and K.F.H.; software, N.S. and K.F.H.; validation, N.S. and A.A.; formal analysis, N.S.; investigation, N.S. and K.F.H.; writing—original draft preparation, N.S. and K.F.H.; writing—review and editing, N.S., K.F.H., V.P.Y. and A.A.; visualization, N.S. and V.P.Y.; supervision, V.P.Y. and A.A.; project administration, A.A. All authors have read and agreed to the published version of the manuscript.

Funding: This research received no external funding.

Institutional Review Board Statement: Not applicable.

Informed Consent Statement: Not applicable.

Data Availability Statement: Not applicable.

Conflicts of Interest: The authors declare no conflict of interest.

References

1. Priya, A.; Mishra, S.; Raj, S.; Mandal, S.; Datta, S. Online and offline character recognition: A survey. In Proceedings of the International Conference on Communication and Signal Processing (ICCSP), Institute of Electrical and Electronics Engineers Inc.: Melmaruvathur, Tamilnadu, India, 6–8 April 2016; pp. 967–970.
2. Gunawan, T.S.; Noor, A.F.R.M.; Kartiwi, M.
Development of english handwritten recognition using deep neural network. Indones. J. Electr. Eng. Comput. Sci. 2018, 10, 562–568. https://doi.org/10.11591/ijeecs.v10.i2.
3. Vinh, T.Q.; Duy, L.H.; Nhan, N.T. Vietnamese handwritten character recognition using convolutional neural network. IAES Int. J. Artif. Intell. 2020, 9, 276–283. https://doi.org/10.11591/ijai.v9.i2.
4. Dalal, N.; Triggs, B. Histograms of oriented gradients for human detection. In Proceedings of the 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR), San Diego, CA, USA, 20–25 June 2005; Volume 1, pp. 886–893.
5. Lowe, D.G. Distinctive image features from scale-invariant keypoints. Int. J. Comput. Vis. 2004, 60, 91–110. https://doi.org/10.1023/B:VISI.0000029664.99615.94.
6. Xiao, J.; Zhu, X.; Huang, C.; Yang, X.; Wen, F.; Zhong, M. A New Approach for Stock Price Analysis and Prediction Based on SSA and SVM. Int. J. Inf. Technol. Decis. Mak. 2019, 18, 35–63. https://doi.org/10.1142/S021962201841002X.
7. Wang, D.; Huang, L.; Tang, L. Dissipativity and synchronization of generalized BAM neural networks with multivariate discontinuous activations. IEEE Trans. Neural Netw. Learn. Syst. 2018, 29, 3815–3827. https://doi.org/10.1109/TNNLS.2017.2741349.
8. Kuang, F.; Zhang, S.; Jin, Z.; Xu, W. A novel SVM by combining kernel principal component analysis and improved chaotic particle swarm optimization for intrusion detection. Soft Comput. 2015, 19, 1187–1199. https://doi.org/10.1007/s00500-014-13327.
9. Choudhary, A.; Ahlawat, S.; Rishi, R. A binarization feature extraction approach to OCR: MLP vs. RBF. In Proceedings of the International Conference on Distributed Computing and Technology (ICDCIT), Bhubaneswar, India, 6–9 February 2014; Springer: Cham, Switzerland, 2014; pp. 341–346.
10. Fukushima, K.
Neocognitron: A self-organizing neural network model for a mechanism of pattern recognition unaffected by shift in position. Biol. Cybern. 1980, 36, 193–202. https://doi.org/10.1007/BF00344251.
11. Ahlawat, S.; Choudhary, A.; Nayyar, A.; Singh, S.; Yoon, B. Improved handwritten digit recognition using convolutional neural networks (Cnn). Sensors 2020, 20, 3344. https://doi.org/10.3390/s20123344.
12. Jarrett, K.; Kavukcuoglu, K.; Ranzato, M.; LeCun, Y. What is the best multi-stage architecture for object recognition? In Proceedings of the IEEE 12th International Conference on Computer Vision (ICCV), Kyoto, Japan, 29 September–2 October 2009; pp. 2146–2153.
13. Cireşan, D.C.; Meier, U.; Masci, J.; Gambardella, L.M.; Schmidhuber, J. High-Performance Neural Networks for Visual Object Classification. arXiv 2011, arXiv:1102.0183v1.
14. Ciresan, D.; Meier, U.; Schmidhuber, J. Multi-column deep neural networks for image classification. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Providence, RI, USA, 16–21 June 2012; IEEE: Piscataway, NJ, USA, 2012; pp. 3642–3649.
15. Niu, X.X.; Suen, C.Y. A novel hybrid CNN-SVM classifier for recognizing handwritten digits. Pattern Recognit. 2012, 45, 1318–1325. https://doi.org/10.1016/j.patcog.2011.09.021.
16. Qu, X.; Wang, W.; Lu, K.; Zhou, J. Data augmentation and directional feature maps extraction for in-air handwritten Chinese character recognition based on convolutional neural network. Pattern Recognit. Lett. 2018, 111, 9–15. https://doi.org/10.1016/j.patrec.2018.04.001.
17. Alvear-Sandoval, R.F.; Figueiras-Vidal, A.R. On building ensembles of stacked denoising auto-encoding classifiers and their further improvement. Inf. Fusion 2018, 39, 41–52. https://doi.org/10.1016/j.inffus.2017.03.008.
18. Demir, C.; Alpaydin, E. Cost-conscious classifier ensembles. Pattern Recognit. Lett. 2005, 26, 2206–2214. https://doi.org/10.1016/j.patrec.2005.03.028.
19. Choudhary, A.; Ahlawat, S.; Rishi, R.
A Neural Approach to Cursive Handwritten Character Recognition Using Features Extracted from Binarization Technique. Stud. Fuzziness Soft Comput. 2015, 319, 745–771. https://doi.org/10.1007/978-3-319-128832_26.
20. Cai, Z.W.; Huang, L.H. Finite-time synchronization by switching state-feedback control for discontinuous Cohen–Grossberg neural networks with mixed delays. Int. J. Mach. Learn. Cybern. 2018, 9, 1683–1695. https://doi.org/10.1007/s13042-017-0673-9.
21. Zeng, D.; Dai, Y.; Li, F.; Sherratt, R.S.; Wang, J. Adversarial learning for distant supervised relation extraction. Comput. Mater. Contin. 2018, 55, 121–136. https://doi.org/10.3970/cmc.2018.055.121.
22. Long, M.; Zeng, Y. Detecting iris liveness with batch normalized convolutional neural network. Comput. Mater. Contin. 2019, 58, 493–504. https://doi.org/10.32604/cmc.2019.04378.
23. Huang, C.; Liu, B. New studies on dynamic analysis of inertial neural networks involving non-reduced order method. Neurocomputing 2019, 325, 283–287. https://doi.org/10.1016/j.neucom.2018.09.065.
24. Xiang, L.; Li, Y.; Hao, W.; Yang, P.; Shen, X. Reversible natural language watermarking using synonym substitution and arithmetic coding. Comput. Mater. Contin. 2018, 55, 541–559. https://doi.org/10.3970/cmc.2018.03510.
25. Huang, Y.S.; Wang, Z.Y. Decentralized adaptive fuzzy control for a class of large-scale MIMO nonlinear systems with strong interconnection and its application to automated highway systems. Inf. Sci. (Ny). 2014, 274, 210–224. https://doi.org/10.1016/j.ins.2014.02.132.
26. Ahlawat, S.; Rishi, R. A Genetic Algorithm Based Feature Selection for Handwritten Digit Recognition. Recent Pat. Comput. Sci. 2018, 12, 304–316. https://doi.org/10.2174/2213275911666181120111342.
27. Hinton, G.E.; Osindero, S.; Teh, Y.W. A fast learning algorithm for deep belief nets. Neural Comput. 2006, 18, 1527–1554. https://doi.org/10.1162/neco.2006.18.7.1527.
28. Pham, V.; Bluche, T.; Kermorvant, C.; Louradour, J.
Dropout Improves Recurrent Neural Networks for Handwriting Recognition. In Proceedings of the 14th International Conference on Frontiers in Handwriting Recognition (ICFHR), Heraklion, Greece, 1–4 September 2014; pp. 285–290.
29. Lang, G.; Li, Q.; Cai, M.; Yang, T.; Xiao, Q. Incremental approaches to knowledge reduction based on characteristic matrices. Int. J. Mach. Learn. Cybern. 2017, 8, 203–222. https://doi.org/10.1007/s13042-014-0315-4.
30. Tabik, S.; Alvear-Sandoval, R.F.; Ruiz, M.M.; Sancho-Gómez, J.L.; Figueiras-Vidal, A.R.; Herrera, F. MNIST-NET10: A heterogeneous deep networks fusion based on the degree of certainty to reach 0.1% error rate. Ensembles overview and proposal. Inf. Fusion 2020, 62, 73–80. https://doi.org/10.1016/j.inffus.2020.04.002.
31. Badrinarayanan, V.; Kendall, A.; Cipolla, R. SegNet: A Deep Convolutional Encoder-Decoder Architecture for Image Segmentation. IEEE Trans. Pattern Anal. Mach. Intell. 2017, 39, 2481–2495. https://doi.org/10.1109/TPAMI.2016.2644615.
32. Liang, T.; Xu, X.; Xiao, P. A new image classification method based on modified condensed nearest neighbor and convolutional neural networks. Pattern Recognit. Lett. 2017, 94, 105–111. https://doi.org/10.1016/j.patrec.2017.05.019.
33. Sueiras, J.; Ruiz, V.; Sanchez, A.; Velez, J.F. Offline continuous handwriting recognition using sequence to sequence neural networks. Neurocomputing 2018, 289, 119–128. https://doi.org/10.1016/j.neucom.2018.02.008.
34. Simard, P.Y.; Steinkraus, D.; Platt, J.C. Best practices for convolutional neural networks applied to visual document analysis. In Proceedings of the International Conference on Document Analysis and Recognition (ICDAR), Edinburgh, UK, 3–6 August 2003; Volume 3, pp. 958–963.
35. Wang, T.; Wu, D.J.; Coates, A.; Ng, A.Y. End-to-end text recognition with convolutional neural networks.
In Proceedings of the 21st International Conference on Pattern Recognition, Tsukuba, Japan, 11–15 November 2012; pp. 3304–3308.
36. Shi, B.; Bai, X.; Yao, C. An End-to-End Trainable Neural Network for Image-Based Sequence Recognition and Its Application to Scene Text Recognition. IEEE Trans. Pattern Anal. Mach. Intell. 2017, 39, 2298–2304. https://doi.org/10.1109/TPAMI.2016.2646371.
37. Chen, L.C.; Papandreou, G.; Kokkinos, I.; Murphy, K.; Yuille, A.L. DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution, and Fully Connected CRFs. IEEE Trans. Pattern Anal. Mach. Intell. 2018, 40, 834–848. https://doi.org/10.1109/TPAMI.2017.2699184.
38. Long, J.; Shelhamer, E.; Darrell, T. Fully convolutional networks for semantic segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA, 7–12 June 2015.
39. Wu, Y.C.; Yin, F.; Liu, C.L. Improving handwritten Chinese text recognition using neural network language models and convolutional neural network shape models. Pattern Recognit. 2017, 65, 251–264. https://doi.org/10.1016/j.patcog.2016.12.026.
40. Xie, Z.; Sun, Z.; Jin, L.; Feng, Z.; Zhang, S. Fully convolutional recurrent network for handwritten Chinese text recognition. In Proceedings of the International Conference on Pattern Recognition, Cancun, Mexico, 4–8 December 2016; pp. 4011–4016.
41. Liu, C.L.; Yin, F.; Wang, D.H.; Wang, Q.F. Online and offline handwritten Chinese character recognition: Benchmarking on new datasets. Pattern Recognit. 2013, 46, 155–162. https://doi.org/10.1016/j.patcog.2012.06.021.
42. Boufenar, C.; Kerboua, A.; Batouche, M. Investigation on deep learning for off-line handwritten Arabic character recognition. Cogn. Syst. Res. 2018, 50, 180–195. https://doi.org/10.1016/j.cogsys.2017.11.002.
43. Husnain, M.; Missen, M.M.S.; Mumtaz, S.; Jhanidr, M.Z.; Coustaty, M.; Luqman, M.M.; Ogier, J.M.; Choi, G.S. Recognition of urdu handwritten characters using convolutional neural network. Appl. Sci.
2019, 9, 2758. https://doi.org/10.3390/APP9132758.
44. Ahmed, S.B.; Naz, S.; Swati, S.; Razzak, M.I. Handwritten Urdu character recognition using one-dimensional BLSTM classifier. Neural Comput. Appl. 2019, 31, 1143–1151. https://doi.org/10.1007/s00521-017-3146-x.
45. Kavitha, B.R.; Srimathi, C. Benchmarking on offline Handwritten Tamil Character Recognition using convolutional neural networks. J. King Saud Univ.-Comput. Inf. Sci. 2019, 34, 1183–1190. https://doi.org/10.1016/j.jksuci.2019.06.004.
46. Dewan, S.; Chakravarthy, S. A system for offline character recognition using auto-encoder networks. In Proceedings of the International Conference on Neural Information Processing, Doha, Qatar, 12–15 November 2012.
47. Sarkhel, R.; Das, N.; Das, A.; Kundu, M.; Nasipuri, M. A multi-scale deep quad tree based feature extraction method for the recognition of isolated handwritten characters of popular indic scripts. Pattern Recognit. 2017, 71, 78–93. https://doi.org/10.1016/j.patcog.2017.05.022.
48. Gupta, A.; Sarkhel, R.; Das, N.; Kundu, M. Multiobjective optimization for recognition of isolated handwritten Indic scripts. Pattern Recognit. Lett. 2019, 128, 318–325. https://doi.org/10.1016/j.patrec.2019.09.019.
49. Nguyen, C.T.; Khuong, V.T.M.; Nguyen, H.T.; Nakagawa, M. CNN based spatial classification features for clustering offline handwritten mathematical expressions. Pattern Recognit. Lett. 2020, 131, 113–120. https://doi.org/10.1016/j.patrec.2019.12.015.
50. Ziran, Z.; Pic, X.; Undri Innocenti, S.; Mugnai, D.; Marinai, S. Text alignment in early printed books combining deep learning and dynamic programming. Pattern Recognit. Lett. 2020, 133, 109–115. https://doi.org/10.1016/j.patrec.2020.02.016.
51. Ptucha, R.; Petroski Such, F.; Pillai, S.; Brockler, F.; Singh, V.; Hutkowski, P. Intelligent character recognition using fully convolutional neural networks. Pattern Recognit. 2019, 88, 604–613. https://doi.org/10.1016/j.patcog.2018.12.017.
52. Tso, W.W.; Burnak, B.; Pistikopoulos, E.N.
HY-POP: Hyperparameter optimization of machine learning models through parametric programming. Comput. Chem. Eng. 2020, 139, 106902. https://doi.org/10.1016/j.compchemeng.2020.106902.
53. Cui, H.; Bai, J. A new hyperparameters optimization method for convolutional neural networks. Pattern Recognit. Lett. 2019, 125, 828–834. https://doi.org/10.1016/j.patrec.2019.02.009.
54. Krizhevsky, A.; Sutskever, I.; Hinton, G.E. ImageNet classification with deep convolutional neural networks. Commun. ACM 2017, 60, 84–90. https://doi.org/10.1145/3065386.
55. Simonyan, K.; Zisserman, A. Very deep convolutional networks for large-scale image recognition. In Proceedings of the 3rd International Conference on Learning Representations, San Diego, CA, USA, 7–9 May 2015.
56. Szegedy, C.; Liu, W.; Jia, Y.; Sermanet, P.; Reed, S.; Anguelov, D.; Erhan, D.; Vanhoucke, V.; Rabinovich, A. Going deeper with convolutions. In Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, Boston, MA, USA, 7–12 June 2015; pp. 1–9.
57. He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. In Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 26 June–1 July 2016; pp. 770–778.
58. Ghosh, T.; Abedin, M.H.Z.; Al Banna, H.; Mumenin, N.; Abu Yousuf, M. Performance Analysis of State of the Art Convolutional Neural Network Architectures in Bangla Handwritten Character Recognition. Pattern Recognit. Image Anal. 2021, 31, 60–71. https://doi.org/10.1134/S1054661821010089.
59. LeCun, Y. The Mnist Dataset of Handwritten Digits. Available online: http://yann.lecun.com/exdb/mnist/ (accessed on 26 February 2022).
60. Kaggle: A-Z Handwritten Alphabets in .csv Format. Available online: https://www.kaggle.com/sachinpatel21/az-handwrittenalphabets-in-csv-format/metadata (accessed on 26 February 2022).
Kavitha, M.; Gayathri, R.; Polat, K.; Alhudhaif, A.; Alenezi, F. Performance evaluation of deep e-CNN with integrated spatial-spectral features in hyperspectral image classification. Measurement 2022, 191, 110760. https://doi.org/10.1016/J.MEASUREMENT.2022.110760. Foysal Haque, K.; Farhan Haque, F.; Gandy, L.; Abdelgawad, A. Automatic Detection of COVID-19 from Chest X-ray Images with Convolutional Neural Networks. In Proceedings of the 2020 International Conference on Computing, Electronics and Communications Engineering (ICCECE), 17–18 August 2020; pp. 125–130. Mor, S.S.; Solanki, S.; Gupta, S.; Dhingra, S.; Jain, M.; Saxena, R. Handwritten text recognition: With deep learning and android. Int. J. Eng. Adv. Technol. 2019, 8, 172–178. Alom, M.Z.; Sidike, P.; Taha, T.M.; Asari, V.K. Handwritten Bangla Digit Recognition Using Deep Learning. arXiv 2017, arXiv:1705.02680. Sabour, S.; Frosst, N.; Hinton, G.E. Dynamic routing between capsules. In Proceedings of the 2017 Neural Information Processing Systems (NIPS), Long Beach, CA, USA, 4–9 December 2017; pp. 548–556. dos Santos, M.M.; da Silva Filho, A.G.; dos Santos, W.P. Deep convolutional extreme learning machines: Filters combination and error model validation. Neurocomputing 2019, 329, 359–369. https://doi.org/10.1016/j.neucom.2018.10.063. Adnan, M.; Rahman, F.; Imrul, M.; AL, N.; Shabnam, S. Handwritten Bangla Character Recognition using Inception Convolutional Neural Network. Int. J. Comput. Appl. 2018, 181, 48–59. https://doi.org/10.5120/ijca2018917850. Xue, W.; Dai, X.; Liu, L. Remote Sensing Scene Classification Based on Multi-Structure Deep Features Fusion. IEEE Access 2020, 8, 28746–28755. https://doi.org/10.1109/ACCESS.2020.2968771. Prashanth, D.S.; Mehta, R.V.K.; Sharma, N. Classification of Handwritten Devanagari Number-An analysis of Pattern Recognition Tool using Neural Network and CNN. In Procedia Computer Science; Elsevier: Amsterdam, The Netherlands, 2020; Volume 167, pp. 2445–2457.
Joshi, D.S.; Risodkar, Y.R. Deep Learning Based Gujarati Handwritten Character Recognition. In Proceedings of the 2018 International Conference On Advances in Communication and Computing Technology, Sangamner, India, 8–9 February 2018; pp. 563–566. Sen, S.; Shaoo, D.; Paul, S.; Sarkar, R.; Roy, K. Online handwritten bangla character recognition using CNN: A deep learning approach. In Advances in Intelligent Systems and Computing; Springer: Singapore, 2018; Volume 695, pp. 413–420, ISBN 9789811075650. Weng, Y.; Xia, C. A New Deep Learning-Based Handwritten Character Recognition System on Mobile Computing Devices. Mob. Netw. Appl. 2020, 25, 402–411. https://doi.org/10.1007/s11036-019-01243-5. Gan, J.; Wang, W.; Lu, K. A new perspective: Recognizing online handwritten Chinese characters via 1-dimensional CNN. Inf. Sci. (Ny). 2019, 478, 375–390. https://doi.org/10.1016/j.ins.2018.11.035. Saha, S.; Saha, N. A Lightning fast approach to classify Bangla Handwritten Characters and Numerals using newly structured Deep Neural Network. Procedia Comput. Sci. 2018, 132, 1760–1770. https://doi.org/10.1016/J.PROCS.2018.05.151. Hamdan, Y.B.; Sathish Construction of Statistical SVM based Recognition Model for Handwritten Character Recognition. J. Inf. Technol. Digit. World 2021, 3, 92–107. https://doi.org/10.36548/jitdw.2021.2.003. Ukil, S.; Ghosh, S.; Obaidullah, S.M.; Santosh, K.C.; Roy, K.; Das, N. Improved word-level handwritten Indic script identification by integrating small convolutional neural networks. Neural Comput. Appl. 2020, 32, 2829–2844. https://doi.org/10.1007/s00521-019-04111-1. Cavalin, P.; Oliveira, L. Confusion matrix-based building of hierarchical classification. In Proceedings of the Pattern Recognition, Image Analysis, Computer Vision, and Applications, Havana, Cuba, 28–31 October 2019; Springer: Berlin/Heidelberg, Germany, 2019; Volume 11401, pp. 271–278.
