I received the B.E. degree in Information Technology from Dharmsinh Desai University, Nadiad (Gujarat), India, in 2004, and the M.E. degree in Computer Engineering from the same university in 2009. I am currently an associate professor in the Department of Information Technology, Faculty of Technology, Dharmsinh Desai University, Nadiad, India. My research interests include pattern recognition, document image analysis, and machine learning.
2014 International Conference on Green Computing Communication and Electrical Engineering (ICGCCEE), 2014
ABSTRACT Post-processing is an important part of any document processing system. It can be performed at two levels: word-level correction and sentence-level correction. Word-level correction itself takes two forms: detecting an erroneous word and replacing it with the most similar dictionary word (the dictionary-based approach), or finding the most probable word (the probabilistic approach). To build the probabilistic model, which includes unigram, bigram, and trigram statistics, online resources from various Gujarati newspaper websites are used. The proposed system uses models such as Naïve Bayes and the Hidden Markov Model to correct word-level errors. The system is tested on a synthetic dataset generated by adding random word-level errors to actual documents.
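The abstract names Naïve Bayes and HMM but does not spell out the correction procedure; the sketch below shows a simpler probabilistic corrector in the same spirit, combining edit distance with a smoothed bigram language model. All function and variable names here are illustrative, not taken from the paper.

```python
from collections import Counter

def edit_distance(a, b):
    # Classic dynamic-programming Levenshtein distance, single-row variant.
    dp = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        prev, dp[0] = dp[0], i
        for j, cb in enumerate(b, 1):
            prev, dp[j] = dp[j], min(dp[j] + 1, dp[j - 1] + 1,
                                     prev + (ca != cb))
    return dp[-1]

def correct_word(word, prev_word, vocab, bigram_counts, unigram_counts):
    """Pick the dictionary word that is close in spelling to `word`
    and probable after `prev_word` under an add-one-smoothed bigram model."""
    if word in vocab:
        return word
    candidates = [w for w in vocab if edit_distance(word, w) <= 2]
    if not candidates:
        return word  # no close dictionary word; leave unchanged
    def score(w):
        # P(w | prev_word) with add-one smoothing, penalised by edit distance.
        p = (bigram_counts[(prev_word, w)] + 1) / \
            (unigram_counts[prev_word] + len(vocab))
        return p / (1 + edit_distance(word, w))
    return max(candidates, key=score)
```

A trigram model or an HMM over the whole sentence would score candidate sequences rather than isolated words, but the candidate-generation and scoring structure stays the same.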
2014 International Conference on Green Computing Communication and Electrical Engineering (ICGCCEE), 2014
ABSTRACT Strokes are the most natural way of describing character formation, although most researchers use transform-based features for recognition of offline text. Stroke-based features are very popular in online handwritten text recognition, because a stroke and its sequence are easy to identify by tracing the pen tip, whereas the same information is difficult to obtain from offline text. The aim of this research is to propose a method that separates strokes from a thinned binary image of text and extracts directional features from each separated stroke. The strokes are then categorized using the k-Nearest Neighbor (k-NN) classifier. Recognition of printed Gujarati numerals is selected as a case study to validate the proposed features. The accuracy obtained at the stroke level is 88%; the accuracy at the symbol level is likely to be higher, since every symbol is a collection of multiple strokes. Because no standard dataset is available exclusively for Gujarati text, this research also aims to generate a dataset of isolated Gujarati numerals.
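The abstract does not define the directional features precisely; one plausible reading, sketched below under that assumption, is an 8-bin histogram of Freeman directions along an ordered stroke from the thinned image, classified with a small hand-rolled k-NN.

```python
import numpy as np

# 8-connected neighbour offsets in Freeman chain-code order
# (0 = east, counter-clockwise), as (row, col) deltas.
DIRECTIONS = [(0, 1), (-1, 1), (-1, 0), (-1, -1),
              (0, -1), (1, -1), (1, 0), (1, 1)]

def directional_histogram(stroke_points):
    """8-bin normalised histogram of movement directions along an ordered
    stroke; `stroke_points` is a list of (row, col) pixels from a thinned image."""
    hist = np.zeros(8)
    for (r0, c0), (r1, c1) in zip(stroke_points, stroke_points[1:]):
        hist[DIRECTIONS.index((r1 - r0, c1 - c0))] += 1
    total = hist.sum()
    return hist / total if total else hist

def knn_predict(features, train_X, train_y, k=3):
    """Majority vote among the k nearest training feature vectors."""
    dists = np.linalg.norm(np.asarray(train_X) - features, axis=1)
    nearest = np.argsort(dists)[:k]
    labels = [train_y[i] for i in nearest]
    return max(set(labels), key=labels.count)
```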
ABSTRACT Multi-Relational Data Mining (MRDM) has been an active area of research for the last decade. The relational database is an important source of structured data and hence a rich source of knowledge. Most commercial and application-oriented data use a relational database schema in which multiple relations are linked through primary-key/foreign-key relationships. MRDM deals with the extraction of information from a relational database containing multiple interrelated tables. Extracting useful knowledge requires applying data mining algorithms to such a database, but most of these algorithms work only on a single table. Joining multiple tables into a single table may lose important information, such as the relations between tuples, and is also inefficient in both time and space. In this paper, we propose an approach based on a Probabilistic Graphical Model, the Bayesian Belief Network (BBN), that considers not only the attributes of each table but also the relations between tables. The conditional dependencies between tables are derived from the Semantic Relationship Graph (SRG) of the relational database, whereas tuple-ID propagation helps derive the conditional probabilities of the tables. Our model not only predicts the class label of unknown samples but also estimates the attribute values of a sample when the class label is known.
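Tuple-ID propagation, as named in the abstract, passes the identifiers of target-table tuples along a foreign-key link so that statistics over a related table can be conditioned on the target relation without a physical join. A minimal sketch of that one step, with hypothetical table and column names, follows; the paper's full BBN construction is not reproduced here.

```python
from collections import defaultdict

def propagate_tuple_ids(target_rows, related_rows, key):
    """Attach to each related-table row the set of target-table tuple IDs
    that join to it on `key`. Rows are plain dicts; the `id` and `key`
    column names are illustrative assumptions."""
    ids_by_key = defaultdict(set)
    for row in target_rows:
        ids_by_key[row[key]].add(row["id"])
    # Each related row now carries the IDs of the target tuples it links to,
    # so per-class counts can be accumulated without materialising a join.
    return [dict(row, tuple_ids=ids_by_key[row[key]]) for row in related_rows]
```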
This paper addresses the problem of recognizing handwritten numerals in the Gujarati language. Three methods of feature extraction are presented: one belongs to the spatial domain and the other two to the transform domain. In the first technique, a new spatial-domain method based on the Freeman chain code is proposed. This method obtains the global direction by considering an n × n neighbourhood, thus eliminating the noise introduced by purely local directions. In the second and third methods, 85-dimensional Fourier descriptors and Discrete Cosine Transform coefficients, respectively, were computed and treated as feature vectors. A comparative analysis of the three methods has been carried out. The methods were tested with three classifiers: K-Nearest Neighbour, Support Vector Machine, and Back-Propagation Neural Network. Experimental results were evaluated using 10-fold cross-validation. The highest recognition rates obtained on the full dataset of 3000 digits are 85.67%, 93.60%, and 93.00% using the modified chain code, DFT, and DCT, respectively.
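For the transform-domain features, the abstract says DCT coefficients are treated as a feature vector but does not give the coefficient selection; a common choice, assumed in this sketch, is to keep a low-frequency block of the 2-D DCT-II of the (square) digit image. The transform is written out in plain NumPy rather than calling a library routine.

```python
import numpy as np

def dct2(img):
    """2-D orthonormal DCT-II of a square image, via the separable
    basis-matrix form: X = C @ img @ C.T."""
    n = img.shape[0]
    k = np.arange(n)
    # C[k, i] = sqrt(2/n) * cos(pi * (2i + 1) * k / (2n)), first row scaled.
    C = np.cos(np.pi * (2 * k[None, :] + 1) * k[:, None] / (2 * n))
    C *= np.sqrt(2.0 / n)
    C[0] /= np.sqrt(2.0)
    return C @ img.astype(float) @ C.T

def dct_features(img, block=8):
    """Low-frequency `block` x `block` corner of the DCT coefficients,
    flattened into a feature vector (block size is an assumption)."""
    return dct2(img)[:block, :block].ravel()
```

For a constant image the energy collapses into the DC coefficient, which is a quick sanity check on the orthonormal scaling.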
This paper presents a method for combining the Self-Organizing Map (SOM) with the k-Nearest Neighbor (k-NN) classifier to devise an elegant classification technique, and applies it to the classification of a subset of printed Gujarati characters. Researchers have employed many different models for the classification of printed and handwritten characters in languages all over the globe; among the widely used classifiers are Template Matching, the Artificial Neural Network (ANN), the Hidden Markov Model (HMM), and the Support Vector Machine (SVM). Our attempt is to use a SOM-based k-NN classifier for the classification of a subset of printed Gujarati characters. This approach requires no prior feature-identification stage, and is therefore faster and more general than other approaches. A prototype system was implemented and tested on a sufficiently large dataset; an average accuracy of 82.36% is reported on the test dataset.
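One natural way to combine a SOM with k-NN, sketched below as an assumption since the abstract does not fix the details, is to train the SOM as a vector quantiser, label each prototype by its nearest training sample, and then run k-NN over the (much smaller) set of labelled prototypes.

```python
import numpy as np

def train_som(data, grid=(4, 4), epochs=10, lr=0.5, seed=0):
    """Minimal SOM: returns prototype (codebook) vectors on a small 2-D grid,
    trained with a shrinking Gaussian neighbourhood and learning rate."""
    rng = np.random.default_rng(seed)
    n_units = grid[0] * grid[1]
    W = data[rng.choice(len(data), n_units)].astype(float)  # init from samples
    coords = np.array([(i, j) for i in range(grid[0]) for j in range(grid[1])])
    for t in range(epochs):
        sigma = max(1.0, grid[0] / 2 * (1 - t / epochs))
        alpha = lr * (1 - t / epochs)
        for x in data[rng.permutation(len(data))]:
            bmu = np.argmin(((W - x) ** 2).sum(axis=1))      # best matching unit
            d2 = ((coords - coords[bmu]) ** 2).sum(axis=1)   # grid distances
            h = np.exp(-d2 / (2 * sigma ** 2))               # neighbourhood
            W += alpha * h[:, None] * (x - W)
    return W

def label_prototypes(W, X, y):
    """Give each SOM unit the label of its nearest training sample."""
    return [y[np.argmin(((X - w) ** 2).sum(axis=1))] for w in W]

def som_knn_predict(x, W, proto_labels, k=3):
    """k-NN vote over the labelled SOM prototypes instead of all samples."""
    nearest = np.argsort(((W - x) ** 2).sum(axis=1))[:k]
    votes = [proto_labels[i] for i in nearest]
    return max(set(votes), key=votes.count)
```

Classifying against a few dozen prototypes rather than the full training set is what makes the combination faster than plain k-NN.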
Papers by Mukesh Goswami